Why machines need to speak the language of Life Sciences
Do you remember the days when NLP, short for Neuro-Linguistic Programming, was all the rage in self-help books and motivational speaking? Self-help gurus preached Neuro-Linguistic Programming as a method for achieving near superpowers and attaining your goals by “modelling” the skills of so-called exceptional people. Since its heyday in the late 1970s and 1980s, that NLP has been scientifically discredited. Now there is a new NLP: Natural Language Processing, the area of artificial intelligence concerned with the interactions between computers and human languages.
What exactly is “natural language”? Natural language is how we write or speak in everyday life. When you read a newspaper article, a social media post, or this white paper, you are experiencing natural language. Why is NLP important? It deals with how to program computers to process large amounts of natural language data. The concept isn’t new: it dates back to the 1950s. But with the rise of Big Data and related developments, NLP has become an increasingly common topic in publications about Artificial Intelligence (AI) and Machine Learning (ML).
While NLP is a core concept in AI and ML, I am going to explain why, on its own, it doesn’t matter so much for the Life Sciences. Let’s start with the not-so-bold statement that the language of the Life Sciences is anything but natural. Anyone who has read a clinical paper, a product monograph, or a regulatory file will quickly note that the content is far from everyday English prose. Essentially, the Life Sciences have their own language.
How does this impact the way technology is used in the Life Sciences? Because most tools and technologies rely on general-purpose NLP, the amount of irrelevant content (noise) they return is huge. Try searching Google for any biomedical concept and you will likely find that a large share of the results, often more than half, are not relevant to what you are looking for. Search or extract content from an online movie review and it works. Try the same with a pharma regulatory document or a doctor’s comments on a social media blog and it will fail miserably. The reason is that generic NLP is built for everyday language, not for ours. The result is a significant burden in time and energy spent manually filtering and parsing the content. I use Google as an example, but most data and analytics tools have the same issue. They are built on NLP, and as such, much of the data fed into them is irrelevant. The best computing power and the best algorithms still depend on the quality and relevance of the data. Garbage in, garbage out.
So, what can be done to help those of us in the Life Sciences benefit from Data & Analytics tools without all the noise? Use an NLP that is adapted to the Life Sciences. Sounds simple, right? But to do this, one would need not only to expand the vocabulary from about 10,000 English words to millions of life sciences terms, but also to create a Life Sciences Ontology. Huh? Onto-what? An ontology is to a domain what grammar is to natural language. It is a set of concepts and categories in a specific domain, together with their properties and the relations between them. It defines a common understanding of the structure of information. In the Life Sciences, we already use a number of ontologies: for diseases, for genes, and for various other kinds of knowledge. The challenge is that most programming and data analysis tools leverage either only one specific ontology or none at all, and as such are not capable of Life Science Language Processing at scale.
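To make the idea of an ontology a little more concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular tool works: the concept IDs, the three entries, and the helper names (Concept, find_concepts, ancestors) are all hypothetical, and a real life sciences ontology would hold vastly more terms and richer relations. The sketch shows two things an ontology gives a machine that a plain word list does not: synonyms that map to the same concept, and "is a" relations that link a specific concept to broader ones.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One node in a toy ontology (all fields are illustrative)."""
    concept_id: str
    preferred_name: str
    synonyms: tuple = ()   # alternative names for the same concept
    is_a: tuple = ()       # IDs of broader (parent) concepts


# Hypothetical IDs and entries, invented for this example.
ONTOLOGY = {
    "C1": Concept("C1", "disease"),
    "C2": Concept("C2", "cardiovascular disease", ("CVD",), ("C1",)),
    "C3": Concept("C3", "myocardial infarction", ("heart attack",), ("C2",)),
}


def build_lexicon(ontology):
    """Map every lower-cased name or synonym to its concept ID."""
    lexicon = {}
    for concept in ontology.values():
        for term in (concept.preferred_name, *concept.synonyms):
            lexicon[term.lower()] = concept.concept_id
    return lexicon


def find_concepts(text, ontology):
    """Return IDs of the concepts mentioned in text, in order of appearance."""
    lexicon = build_lexicon(ontology)
    # Try longer terms first so "cardiovascular disease" is not read as "disease".
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [lexicon[m.group(1).lower()] for m in pattern.finditer(text)]


def ancestors(concept_id, ontology):
    """Collect every broader concept by following is_a links upward."""
    seen = set()
    stack = [concept_id]
    while stack:
        for parent in ontology[stack.pop()].is_a:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

With this toy ontology, find_concepts("Patient admitted after a heart attack.", ONTOLOGY) resolves the everyday phrase to the concept for myocardial infarction, and ancestors("C3", ONTOLOGY) tells the machine that it is also a cardiovascular disease and a disease. A general-purpose tool that only sees words has no way of knowing either.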
Going beyond NLP to Life Science Language Processing is not only important for making data analysis useful; it is required. As we consider the hundreds of tools and product offerings available today for the Life Sciences, we need to demand that these tools speak our language. Just as Neuro-Linguistic Programming (NLP) lost its credibility, Natural Language Processing (NLP) will lose its merits for our field unless it takes a life sciences ontology into account to drive a more relevant Life Science Language Processing.