Understanding ontology for better insight into the Life Sciences data ocean
The rapid advancement of AI, together with the need to extract knowledge and generate insights from data in order to find sustainable solutions, has made it necessary for computers to clearly understand the language of individual industries. Using the industry-specific terms of a field's research lets data analysis produce relevant insights. With an ontology that captures the relevant terms and connections of a specific domain, the work of identifying core concepts, improving classification results, and unifying data to collate key information becomes streamlined. An ontology assists search and attaches meaning to information so that its relevance and importance can be recognized.
What is an ontology?
An ontology can be defined as an explicit description of a knowledge model, encompassing the common concepts of a domain, their properties, and the relations between those concepts. It is useful because it applies the terms and concepts of a specific industry to big data in order to extract the most relevant industry-specific information. Computer systems that use Natural Language Processing alone, without a domain-specific ontology, rely only on a general dictionary to extract information. The challenge is that a dictionary defines English words in general terms, not in the sense of any particular domain. An ontology, by contrast, also encodes the connections between entities and captures the meaning of industry-specific terms that a dictionary may not contain. This includes interpreting a search query, for example deciding whether a user who searches for ‘Ca’ needs information on cancer or on calcium.
Ontologies include the vocabulary that experts and researchers use to share information particular to an industry or domain, for example, Life Sciences or Financial Services. They define categories, concepts, and the relations between data and entities, limiting complexity by creating what is commonly known as ‘machine-interpretable’ language.
This framework of key concepts is developed to share a common understanding of information and to enable the reuse of knowledge. An ontology acts as an abstraction of a particular domain, with explicitly stated assumptions, that software agents and other AI systems can use to analyze domain knowledge while avoiding confusion between similar terms from different domains.
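To make the definition concrete, here is a minimal Python sketch of an ontology as a data structure: concepts with preferred labels and synonyms, plus relations stored as subject-predicate-object triples. The class names, concept IDs such as `CHEM:calcium`, and the `is_a` predicate are hypothetical choices for illustration; production ontologies are usually expressed in standards such as OWL or RDF and managed with dedicated tooling.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept in a toy ontology: stable ID, preferred label, domain, synonyms."""
    concept_id: str
    label: str
    domain: str
    synonyms: set[str] = field(default_factory=set)


@dataclass
class Ontology:
    concepts: dict[str, Concept] = field(default_factory=dict)
    # Relations are (subject_id, predicate, object_id) triples
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.concept_id] = concept

    def relate(self, subject_id: str, predicate: str, object_id: str) -> None:
        # Refuse relations that point at concepts the ontology does not define
        if subject_id not in self.concepts or object_id not in self.concepts:
            raise KeyError("add both concepts before relating them")
        self.relations.add((subject_id, predicate, object_id))

    def lookup(self, term: str) -> list[Concept]:
        """Return all concepts whose label or a synonym matches term (case-insensitive)."""
        t = term.lower()
        return [
            c for c in self.concepts.values()
            if c.label.lower() == t or t in {s.lower() for s in c.synonyms}
        ]


# Hypothetical example: "Ca" is a synonym of two concepts from different domains
onto = Ontology()
onto.add_concept(Concept("CHEM:calcium", "calcium", "chemistry", {"Ca", "Ca2+"}))
onto.add_concept(Concept("DIS:cancer", "cancer", "disease", {"Ca", "carcinoma"}))
onto.add_concept(Concept("DIS:breast_carcinoma", "breast carcinoma", "disease"))
onto.relate("DIS:breast_carcinoma", "is_a", "DIS:cancer")

print([c.concept_id for c in onto.lookup("Ca")])  # → ['CHEM:calcium', 'DIS:cancer']
```

Looking up ‘Ca’ returns both the calcium and the cancer concept, which is exactly the ambiguity described above; the relations and the surrounding context are what let a system choose between them.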
Why are ontologies important for domain-specific data analysis?
The World Wide Web is made up of data, structured and unstructured, available in various formats. To find, crawl, aggregate, and analyze this data, it is necessary to leverage Artificial Intelligence (AI). However, to filter and process the data most relevant to a particular domain, it is also important to use and understand a domain-specific language. By doing so, the computer can identify the important terms and concepts of the domain and extract the data most relevant to the search query. Therefore, a domain-specific ontology is used in combination with AI-driven tools for data analytics. This form of AI, designed for a specific domain, not only crawls the relevant data efficiently but can also open novel opportunities for discovering unexplored connections.
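As a rough illustration of filtering data by domain language, the sketch below scores each document by the share of its words that belong to a domain vocabulary and keeps those at or above a threshold. It assumes a tiny hand-picked word list (`LIFE_SCIENCES_TERMS`) and an arbitrary threshold of 0.2, both hypothetical, and it uses only the vocabulary side of an ontology while ignoring its relations.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the vocabulary part of a Life Sciences ontology
LIFE_SCIENCES_TERMS = {
    "gene", "protein", "tumor", "kinase", "receptor",
    "clinical", "trial", "dose", "mutation", "pathway",
}


def domain_relevance(text: str, vocabulary: set[str] = LIFE_SCIENCES_TERMS) -> float:
    """Fraction of word tokens in text that are domain terms (0.0 for empty text)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


def filter_relevant(docs: list[str], threshold: float = 0.2) -> list[str]:
    """Keep documents scoring at or above threshold, most relevant first."""
    scored = sorted(
        ((domain_relevance(d), d) for d in docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [doc for score, doc in scored if score >= threshold]


docs = [
    "Great weather for a picnic today",
    "A kinase mutation was found in the tumor",
]
print(filter_relevant(docs))
```

A real pipeline would match multi-word concepts and synonyms from the ontology rather than single words, but the principle of ranking content by how much of the domain's language it uses is the same.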
Every domain speaks its own language. For example, a professional who speaks English may not be able to follow the jargon of an English-speaking advocate. Similarly, a German speaker may not understand a medical research paper written in German, and a native Spanish speaker may not follow the terminology of a Spanish tax advisor. The words and terms we use in daily language are far from those that scientists, advocates, or other domain experts use. Many terms we understand in their everyday sense have completely different meanings in a domain-specific context.
Take, for example, the hedgehog. If you work in human genetics or molecular biology, you would instantly associate this word with cancer. However, someone who is not aware that a protein critical to cell division is also called hedgehog would naturally assume the word refers to the spiky animal we all know. Now suppose we need to extract information about the protein, not the animal: how will the computer know which one is meant? This is where ontologies come into play.
Ontologies apply consistent language to a domain, helping both humans and computer systems understand the language of that domain. They are particularly useful for word-sense disambiguation, that is, deciding which meaning of an ambiguous term is intended. A domain-specific ontology contains concepts and relations covering everything the domain entails, and thus helps computers understand domain-specific terminology in order to precisely find, crawl, and aggregate the most relevant data. Through context-based tagging, ontologies can recommend similar information resources such as articles, searches, results, and concepts, improving the process of data analysis.
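The following sketch shows one simple way an ontology can support word-sense disambiguation: each candidate sense of an ambiguous term carries a set of related context terms, and the sense whose terms overlap most with the surrounding sentence wins. This is a simplified overlap heuristic in the spirit of the Lesk algorithm, not a description of any particular product; the sense IDs and cue words are hypothetical, and real systems draw on far richer relations and statistical models.

```python
from __future__ import annotations

import re

# Hypothetical sense inventory: each ambiguous term maps to candidate concept IDs,
# each described by a small set of related context terms drawn from the ontology.
SENSES = {
    "hedgehog": {
        "ANIMAL:hedgehog": {"spines", "mammal", "nocturnal", "garden", "insects"},
        "PROTEIN:hedgehog": {"signaling", "pathway", "protein", "gene", "cancer", "cell", "division"},
    },
    "ca": {
        "CHEM:calcium": {"serum", "ion", "bone", "mineral", "level", "supplement"},
        "DIS:cancer": {"tumor", "metastasis", "chemotherapy", "oncology", "malignant"},
    },
}


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; letters, digits, and hyphens stay together
    return re.findall(r"[a-z0-9-]+", text.lower())


def disambiguate(term: str, sentence: str) -> str | None:
    """Pick the sense whose context terms overlap most with the sentence.

    Returns None if the term is unknown, nothing overlaps, or senses tie.
    """
    candidates = SENSES.get(term.lower())
    if not candidates:
        return None
    context = set(tokenize(sentence)) - {term.lower()}
    scores = {sense: len(context & cues) for sense, cues in candidates.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


print(disambiguate("hedgehog", "Mutations in the hedgehog signaling pathway are linked to cancer"))
```

Returning `None` on a tie or on zero overlap, rather than guessing, lets a pipeline route uncertain mentions to a human curator. For instance, ‘breast Ca and bone metastasis’ ties here, because ‘bone’ is a calcium cue and ‘metastasis’ a cancer cue, which shows why flat word lists are not enough without the ontology's relations.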
Uses of an ontology in pharma and life sciences
An ontology specific to Life Sciences can map discoverable concepts from all major sources, connect observations, and learn unseen concepts. This can help researchers, academics, and scientists generate associations between diseases, genes, drug targets, molecules, mechanisms of action (MOAs), etc. A search performed using biomedical concepts and terms instead of tagged words also helps minimize manual intervention and automates the identification and tagging of the most relevant content. Moreover, the technology can recommend missing side effects, warnings, etc. through sentiment analysis of drug reviews.
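Generating associations across diseases, genes, targets, and molecules can be pictured as path-finding in a graph built from ontology relations. The sketch below encodes a few well-established relations (the EGFR gene, non-small cell lung cancer, and the EGFR tyrosine kinase inhibitor gefitinib) as triples and uses breadth-first search to find the shortest chain of relations linking two concepts. The triples, predicate names, and the `inverse_` naming convention are illustrative assumptions, not an excerpt from a curated ontology.

```python
from collections import defaultdict, deque

# Illustrative (subject, predicate, object) triples; a real Life Sciences
# ontology would supply a very large set of curated relations instead.
TRIPLES = [
    ("GENE:EGFR", "associated_with", "DISEASE:non-small cell lung cancer"),
    ("GENE:EGFR", "encodes", "TARGET:EGFR protein"),
    ("MOLECULE:gefitinib", "inhibits", "TARGET:EGFR protein"),
    ("MOLECULE:gefitinib", "has_moa", "MOA:tyrosine kinase inhibition"),
]


def build_graph(triples):
    """Adjacency list indexed in both directions so paths can run either way."""
    graph = defaultdict(list)
    for subj, pred, obj in triples:
        graph[subj].append((pred, obj))
        graph[obj].append((f"inverse_{pred}", subj))
    return graph


def find_association(graph, start, goal, max_hops=4):
    """Breadth-first search for the shortest chain of relations from start to goal.

    Returns a list of (node, predicate, neighbor) steps, or None if no chain
    of at most max_hops relations exists.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_hops:
            continue
        for pred, neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(node, pred, neighbor)]))
    return None


graph = build_graph(TRIPLES)
for step in find_association(graph, "DISEASE:non-small cell lung cancer", "MOLECULE:gefitinib"):
    print(step)
```

Searching from the disease to gefitinib yields a three-step chain: the disease is linked to the EGFR gene, the gene encodes the EGFR protein, and gefitinib inhibits that protein. The `max_hops` limit keeps searches over a large graph from drifting into very indirect associations.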
Using Life Sciences language processing leads to improved results and insights. This comes down to one simple fact: the language of Life Sciences is anything but natural. To understand concepts and develop associations, it is important to specify and use the terminology of the domain. General-purpose natural language processing cannot make sense of terms such as HER-1. For that, we need an AI and ML system that combines Life Sciences language with the core techniques of NLP. This will enable computers to comprehend clinical papers, regulatory files, product monographs, etc., and derive useful insights from them without manual intervention. However, this requires expanding the vocabulary of our AI tools from around 10,000 English words to millions of Life Sciences terms. That is not feasible without a domain-specific ontology, which is why such ontologies will remain indispensable in pharma and Life Sciences.
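Part of teaching a system terms like HER-1 is mapping every surface form to one canonical concept. The sketch below assumes a small hypothetical synonym table (the `GENE:` IDs are made up, although HER-1, HER1, ErbB1, and EGFR are established names for the same receptor, as HER2, HER2/neu, and ERBB2 are for another) and tags mentions in free text with a greedy longest-match lookup on a case- and punctuation-insensitive key.

```python
import re

# Hypothetical synonym table keyed by made-up canonical IDs. The surface forms
# themselves are established names: HER-1/ErbB1/EGFR and HER2/neu/ERBB2.
SYNONYMS = {
    "GENE:EGFR": ["EGFR", "HER-1", "HER1", "ErbB1", "epidermal growth factor receptor"],
    "GENE:ERBB2": ["ERBB2", "HER-2", "HER2", "HER2/neu"],
}


def surface_key(term):
    """Case- and punctuation-insensitive key, so 'HER-1', 'her1' and 'HER 1' match."""
    return re.sub(r"[^a-z0-9]", "", term.lower())


def build_index(synonyms):
    """Map every normalized surface form to its canonical concept ID."""
    index = {}
    for concept_id, names in synonyms.items():
        for name in names:
            key = surface_key(name)
            # Reject tables where one key would resolve to two different concepts
            if index.get(key, concept_id) != concept_id:
                raise ValueError(f"{name!r} maps to more than one concept")
            index[key] = concept_id
    return index


def tag_mentions(text, index, max_ngram=4):
    """Greedy longest-match tagging of known terms; returns (mention, concept_id) pairs."""
    tokens = re.findall(r"[A-Za-z0-9/-]+", text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so multi-word names beat their parts
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            key = surface_key("".join(tokens[i:i + n]))
            if key in index:
                found.append((" ".join(tokens[i:i + n]), index[key]))
                i += n
                break
        else:
            i += 1  # no known term starts at this token
    return found


INDEX = build_index(SYNONYMS)
print(tag_mentions("HER-1 (also called EGFR) differs from HER2/neu.", INDEX))
```

Because the key drops hyphens, spaces, and case, ‘HER-1’, ‘her1’, and ‘HER 1’ all resolve to the same concept, and `build_index` refuses to load a table in which one key would point to two different concepts. Scaling this from a handful of entries to millions of terms is precisely the job a curated ontology does.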