Every domain has its own language. The same word can have totally different meanings in different domains. In fact, the same word can mean two separate things in different subdomains; e.g., in biology, "hedgehog" names both an animal and a protein. Domain language is also dominated by abbreviations and terminology used by researchers and experts. A medical scientist understands when "Ca" stands for cancer and when it stands for calcium. Our brain is trained to recognize these differences and understand the context of a word. But how will a computer understand them? Just as the human brain automatically connects meanings and relationships, an ontology does the same for computers!
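As a rough illustration of how context can separate the two meanings of "Ca", the sketch below uses a simplified Lesk-style approach: it picks the sense whose cue words overlap most with the sentence. The sense inventory `SENSES` and the functions `tokenize` and `disambiguate` are hypothetical and chosen only for this example. They are not how any particular ontology product resolves abbreviations.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical sense inventory: each expansion of "Ca" is described by a few
# context words. A real ontology would hold far richer definitions.
SENSES: Dict[str, Set[str]] = {
    "Cancer": {"tumor", "metastasis", "oncology", "chemotherapy", "carcinoma", "malignant"},
    "Calcium": {"serum", "bone", "ion", "vitamin", "hypercalcemia", "channel"},
}


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence: str, senses: Dict[str, Set[str]] = SENSES) -> Optional[str]:
    """Pick the sense whose context words overlap most with the sentence.

    Returns None when no sense shares a word with the sentence; on a tie,
    the sense listed first in `senses` wins.
    """
    tokens = tokenize(sentence)
    scores = {sense: len(tokens & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("Serum Ca was elevated, suggesting hypercalcemia"))  # Calcium
print(disambiguate("Breast Ca treated with chemotherapy after metastasis"))  # Cancer
```

Word overlap is a weak signal on its own. An ontology can strengthen it by drawing the cue words from the concepts related to each sense rather than from a hand-written list.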
Big data is a growing challenge for data scientists because its volume is increasing dramatically. A study by Densen estimated that, by 2020, the body of medical knowledge would double roughly every 73 days. The data is also heterogeneous: it comes in various formats (structured and unstructured) and is interlaced across languages and domains (e.g., Life Sciences, Financial Services). An ontology is a set of domain-specific terms, concepts, and semantic relationships. It gives data meaning by capturing its context, so that valuable information can be extracted even from unstructured data.
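To show what extracting information from unstructured data with an ontology can look like in its simplest form, here is a minimal dictionary-based sketch. It assumes a hypothetical mini-ontology (`ONTOLOGY`, `RELATIONS`, with made-up concept IDs such as `C001`) that maps labels and synonyms to concepts. It then tags those concepts in free text and looks up the stored relationships between them. Production systems use far larger vocabularies and statistical taggers; this only illustrates the idea.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical mini-ontology: concept ID -> preferred label and synonyms.
ONTOLOGY: Dict[str, Dict] = {
    "C001": {"label": "Epidermal growth factor receptor", "synonyms": ["EGFR", "ErbB1", "HER1"]},
    "C002": {"label": "Gefitinib", "synonyms": ["Iressa"]},
    "C003": {"label": "Non-small cell lung cancer", "synonyms": ["NSCLC"]},
}
# Hypothetical semantic relationships between concept IDs.
RELATIONS: List[Tuple[str, str, str]] = [
    ("C002", "INHIBITS", "C001"),
    ("C002", "INDICATED_FOR", "C003"),
]


def build_lexicon(ontology: Dict[str, Dict]) -> Dict[str, str]:
    """Map every lower-cased label and synonym to its concept ID."""
    lexicon = {}
    for cid, concept in ontology.items():
        for term in [concept["label"], *concept["synonyms"]]:
            lexicon[term.lower()] = cid
    return lexicon


def annotate(text: str, ontology: Dict[str, Dict] = ONTOLOGY) -> Set[str]:
    """Return the IDs of concepts whose label or synonym appears in free text."""
    lowered = text.lower()
    found = set()
    for term, cid in build_lexicon(ontology).items():
        # Word boundaries stop "EGFR" from matching inside e.g. "EGFRvIII".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(cid)
    return found


def known_relations(concepts: Set[str], relations=RELATIONS) -> List[Tuple[str, str, str]]:
    """Return the stored relationships whose two ends were both found."""
    return [r for r in relations if r[0] in concepts and r[2] in concepts]


concepts = annotate("Iressa showed activity in NSCLC patients with ErbB1 mutations.")
print(sorted(concepts))  # ['C001', 'C002', 'C003']
print(known_relations(concepts))
```

The sentence never uses the words "gefitinib" or "EGFR". The synonym table still resolves "Iressa" and "ErbB1" to the same concepts, so the stored relationships become available for that text.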
A machine trained on a domain-specific language can learn concepts and associations by itself. The idea is similar to a 5-year-old who has learned the concepts and semantic associations of up to 2,000 English words; over time, with exposure and experience, the child's vocabulary expands. By applying the same principles again and again, a computer learns to recognize new indications, interventions, and so on. Hence, an AI system that has been trained on life science language can crawl, extract, and aggregate data from that exploding universe and understand its context. It can also disambiguate cases where EGFR refers to a target, a biomarker, a gene, or a protein, and identify and analyze semantic associations between concepts.
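One common way for a machine to learn associations from exposure is to count which concepts appear together and score each pair with pointwise mutual information (PMI). The sketch below assumes each document has already been reduced to a set of tagged concepts. The names `pmi_scores` and `association` and the four toy documents are hypothetical. PMI is a standard statistic swapped in for illustration, not the method described in this article.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Optional, Set, Tuple


def pmi_scores(documents: Iterable[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information for every concept pair that co-occurs.

    Each document is the set of concepts tagged in it (e.g. one abstract).
    PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), with probabilities estimated
    as the fraction of documents containing the concept(s).
    """
    docs = [set(d) for d in documents]
    n = len(docs)
    single: Counter = Counter()
    pair: Counter = Counter()
    for concepts in docs:
        single.update(concepts)
        # Sorting gives each unordered pair one canonical key.
        pair.update(combinations(sorted(concepts), 2))
    return {
        (a, b): math.log2((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


def association(scores, a: str, b: str) -> Optional[float]:
    """Look up PMI regardless of argument order; None if a and b never co-occur."""
    return scores.get(tuple(sorted((a, b))))


docs = [
    {"EGFR", "gefitinib", "NSCLC"},
    {"EGFR", "gefitinib"},
    {"EGFR", "NSCLC"},
    {"insulin", "diabetes"},
]
scores = pmi_scores(docs)
print(association(scores, "insulin", "diabetes"))  # 2.0
```

A positive score means two concepts appear together more often than chance would predict. Zero means they co-occur exactly as often as independence would suggest. PMI is unstable for rare concepts, so real pipelines usually apply a minimum count or a smoothed variant.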
At Innoplexus, our research graph expresses the ontology through four components:
- Entity: Represents an object or thing, for example, an author, gene, disease, or drug.
- Relation: Represents a relationship between entities, for example, a Disease-Drug relationship.
- Role: Describes how each entity participates in a relation, for example, disease TREATED_BY drug.
- Resource: Represents a property associated with an entity or a relation, for example, a drug's active ingredient or brand names. Resources hold values of primitive types, such as strings or integers.
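The four components above can be sketched as a small in-memory graph. This is one simple reading, under stated assumptions: each relation links a source entity to a target entity, the role is stored as a directed label such as `TREATED_BY`, and resources are plain key-value pairs of primitive types. The classes `Entity`, `Relation`, and `ResearchGraph` are hypothetical and are not the Innoplexus implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union

# Resources are restricted to primitive values, as described above.
Primitive = Union[str, int, float, bool]


@dataclass
class Entity:
    """An object or thing: an author, gene, disease, drug, ..."""
    id: str
    kind: str
    resources: Dict[str, Primitive] = field(default_factory=dict)


@dataclass
class Relation:
    """A typed link (e.g. "Disease-Drug") whose role says how the entities participate."""
    kind: str
    source: Entity
    role: str  # e.g. "TREATED_BY": the source is treated by the target
    target: Entity
    resources: Dict[str, Primitive] = field(default_factory=dict)


class ResearchGraph:
    """Hypothetical in-memory container for entities and relations."""

    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.relations: List[Relation] = []

    def add(self, entity: Entity) -> Entity:
        self.entities[entity.id] = entity
        return entity

    def relate(self, kind: str, source: Entity, role: str, target: Entity,
               **resources: Primitive) -> Relation:
        # Refuse to link entities that were never registered in the graph.
        for e in (source, target):
            if e.id not in self.entities:
                raise KeyError(f"unknown entity {e.id}")
        rel = Relation(kind, source, role, target, dict(resources))
        self.relations.append(rel)
        return rel

    def neighbors(self, entity_id: str, role: str) -> List[Entity]:
        """Entities reached from entity_id through relations with the given role."""
        return [r.target for r in self.relations
                if r.source.id == entity_id and r.role == role]


g = ResearchGraph()
pdac = g.add(Entity("D1", "disease", {"name": "Pancreatic cancer"}))
gem = g.add(Entity("DR1", "drug", {"name": "Gemcitabine", "brand_name": "Gemzar"}))
g.relate("Disease-Drug", pdac, "TREATED_BY", gem)
print([e.resources["name"] for e in g.neighbors("D1", "TREATED_BY")])  # ['Gemcitabine']
```

Keeping the role separate from the relation type lets one Disease-Drug relation type carry different roles (for example, treatment versus adverse effect) without multiplying relation types.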
A self-learning ontology can bring contextual meaning and reasoning to search. Logical inference can enable the discovery of previously unknown concepts and associations. It can also help measure the value of content and improve search results. Self-learning ontologies help automate data analysis to a significant extent, especially in vast domains such as pharma, healthcare, and Life Sciences. Take cancer, for instance: the research is so extensive, and the indication has so many subtypes, that extracting data on it is confusing, especially for someone looking for information on a specific type of cancer, say pancreatic cancer.
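Logical inference in its simplest form can be illustrated with an is_a hierarchy. Because pancreatic ductal adenocarcinoma is_a pancreatic cancer, and pancreatic cancer is_a cancer, a search for either parent should also return documents tagged only with the subtype. The sketch below assumes a hypothetical hierarchy (`IS_A`) and hypothetical document tags (`DOCS`). It expands a query to all transitive subtypes with a breadth-first traversal.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set

# Hypothetical is_a edges: child concept -> parent concept.
IS_A: Dict[str, str] = {
    "pancreatic ductal adenocarcinoma": "pancreatic cancer",
    "pancreatic acinar cell carcinoma": "pancreatic cancer",
    "pancreatic cancer": "cancer",
    "breast cancer": "cancer",
}

# Hypothetical documents, each tagged with the most specific concepts it mentions.
DOCS: Dict[str, Set[str]] = {
    "doc1": {"pancreatic ductal adenocarcinoma"},
    "doc2": {"breast cancer"},
    "doc3": {"pancreatic acinar cell carcinoma"},
}


def subtypes(concept: str, is_a: Dict[str, str] = IS_A) -> Set[str]:
    """Return the concept plus everything that is_a it, directly or transitively."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in is_a.items():
        children[parent].append(child)
    result, queue = {concept}, deque([concept])
    while queue:  # breadth-first walk down the hierarchy
        for child in children[queue.popleft()]:
            if child not in result:
                result.add(child)
                queue.append(child)
    return result


def search(query: str, docs: Dict[str, Set[str]] = DOCS) -> List[str]:
    """Return IDs of documents tagged with the query concept or any subtype."""
    wanted = subtypes(query)
    return sorted(doc_id for doc_id, tags in docs.items() if tags & wanted)


print(search("pancreatic cancer"))  # ['doc1', 'doc3']
print(search("cancer"))             # ['doc1', 'doc2', 'doc3']
```

Neither pancreatic document contains the literal phrase "pancreatic cancer". They are found only because the hierarchy lets the search infer that their tags are subtypes of it.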
Finding a novel solution also often requires bringing together data from two or more seemingly unrelated sub-domains. An AI developed to understand the language of these sub-domains or subtypes, and to derive relationships between them, can largely reduce the challenges faced by scientists, researchers, and medical practitioners, so that they can concentrate on insights and interpretation of the data. If computers understand the domain ontology well enough to merge knowledge and integrate useful information from different sub-domains, this would make it possible, for example, to develop a new drug for cancer patients with diabetes.
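Merging sub-domains usually starts with mapping each source's labels onto shared concepts, because oncology and diabetes sources rarely name things identically. The minimal sketch below assumes two hypothetical edge lists, a hypothetical synonym table (`CANONICAL`), and placeholder drug names (`drug_A`, `drug_B`, `drug_C`). It normalizes labels, merges the edges, and asks which interventions are linked to both indications.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]

# Toy edges from two hypothetical sub-domain sources that label the same
# diseases differently. Drug names are placeholders, not real findings.
ONCOLOGY: List[Edge] = [
    ("drug_A", "STUDIED_IN", "carcinoma of pancreas"),
    ("drug_B", "TREATS", "pancreatic cancer"),
]
DIABETES: List[Edge] = [
    ("drug_A", "TREATS", "T2DM"),
    ("drug_C", "TREATS", "type 2 diabetes mellitus"),
]
# Hypothetical synonym table mapping source-specific labels to shared concepts.
CANONICAL: Dict[str, str] = {
    "carcinoma of pancreas": "pancreatic cancer",
    "T2DM": "type 2 diabetes",
    "type 2 diabetes mellitus": "type 2 diabetes",
}


def canonical(label: str) -> str:
    """Return the shared concept for a label, or the label itself if unmapped."""
    return CANONICAL.get(label, label)


def merge(*sources: Iterable[Edge]) -> Set[Edge]:
    """Union edge lists after normalizing both ends of every edge."""
    return {(canonical(h), rel, canonical(t)) for edges in sources for h, rel, t in edges}


def linked_to_all(edges: Iterable[Edge], targets: Iterable[str]) -> List[str]:
    """Return the heads that have some edge to every target concept."""
    reach: Dict[str, Set[str]] = defaultdict(set)
    for head, _rel, tail in edges:
        reach[head].add(tail)
    wanted = set(targets)
    return sorted(h for h, tails in reach.items() if wanted <= tails)


merged = merge(ONCOLOGY, DIABETES)
print(linked_to_all(merged, ["pancreatic cancer", "type 2 diabetes"]))  # ['drug_A']
```

Without the normalization step, no drug would be linked to both indications, because "carcinoma of pancreas" and "T2DM" never match the shared labels. A candidate returned here is only a lead for further research.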
Why do pharma and Life Sciences need a self-learning ontology?
Many data analysts focus on, or use, a single type of ontology, e.g., the Gene Ontology. By doing so, pharma limits the scope of insights it can derive from publicly available data. Life Sciences data is heterogeneous and complex, which makes it much harder to reach optimal explanations and derive ideal solutions; a single type of ontology will never work perfectly. To generate useful insights that not only help in drug discovery but also accelerate research in the pharma industry, understanding the relationships between entities is essential. A life sciences ontology that understands more than 35 million terms and concepts, as well as the relationships between proteins, indications, and interventions, can therefore provide better insights and support important business decisions.
When applied to data analytics, a self-learning ontology not only helps find the most relevant results for term-related queries but also enables the discovery of new or previously unknown connections. It enables the discovery of Key Opinion Leaders and facilitates drug discovery and clinical development. It also helps find optimal clinical sites, stratify patients, and gauge patient and physician sentiment. Moreover, it streamlines regulatory and medical affairs, gives an overview of the patent landscape in pharma, and supports the discovery of unmet needs. This kind of ontology automatically adds the most relevant and recent results to help derive real-time insights from life sciences data. Applying a self-learning ontology to integrated enterprise and third-party data enables reliable research and optimal development for the pharmaceutical industry.
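A well-known technique for surfacing previously unknown connections is Swanson's ABC model of literature-based discovery. If A co-occurs with B, and B co-occurs with C, but A and C are never mentioned together, then A-C becomes a candidate hypothesis. The sketch below uses toy co-occurrence pairs loosely modeled on Swanson's classic fish oil and Raynaud's disease example. The function `abc_candidates` and its `min_links` parameter are hypothetical names. This technique is swapped in for illustration and does not describe how the ontology in this article works.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def abc_candidates(pairs: Iterable[Tuple[str, str]], a: str,
                   min_links: int = 1) -> Dict[str, List[str]]:
    """Swanson-style open discovery starting from concept `a`.

    pairs: undirected (x, y) concept pairs that co-occur in the literature.
    Returns {c: sorted bridging B concepts} for every C that is linked to `a`
    through at least `min_links` intermediate concepts but never directly.
    """
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for x, y in pairs:
        neighbors[x].add(y)
        neighbors[y].add(x)
    direct = neighbors[a]
    bridges: Dict[str, Set[str]] = defaultdict(set)
    for b in direct:
        for c in neighbors[b]:
            if c != a and c not in direct:
                bridges[c].add(b)
    return {c: sorted(bs) for c, bs in bridges.items() if len(bs) >= min_links}


# Toy pairs loosely modeled on Swanson's fish oil / Raynaud's example.
PAIRS = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's disease"),
    ("platelet aggregation", "Raynaud's disease"),
    ("platelet aggregation", "aspirin"),
]
print(abc_candidates(PAIRS, "fish oil", min_links=2))
# {"Raynaud's disease": ['blood viscosity', 'platelet aggregation']}
```

Requiring several bridging concepts (`min_links`) trades recall for precision. Every candidate is a hypothesis that still needs expert review.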