Artificial intelligence, or AI, is gaining more attention in the pharma space these days. At one time evoking images from futuristic science fiction, it now plays a significant role in smartphones, cars, search engines, and financial transactions. But what, exactly, is artificial intelligence? At its core, it is a set of specialized algorithms trained to solve specific problems. These algorithms allow apps on our phones to perform such tasks as answering our spoken questions or giving us turn-by-turn driving directions. AI is continually becoming more advanced as its applications expand into broader use.
So how can we use it to improve the business of pharma? Can it really make a difference? The answer is yes. In fact, AI is key to the future success of the pharmaceutical and biotech industries. To understand what makes it so valuable for pharma, it’s important to understand the kinds of problems it can solve, which components of AI are of practical use for pharma, and how it will change drug discovery and development.
Let’s start by considering how we get reliable clinical results. Current approaches to drug development that rely solely on finding statistical differences between healthy and diseased patients to identify biomarkers, molecular targets, or suitable drugs often fail to provide biologically useful results. Statistical significance alone does not guarantee success: while significant differences can help predict which patients are likely to have misregulation of a gene, receptor, or protein, most of those differences will not be associated with the disease of interest, making them false positives. Statistically significant results do lead to success, however, when they are validated against previously established knowledge or data.
Although we use machines to great advantage in the early stages of drug development, machines can only produce data; they do not know how to validate it. Success therefore depends on human validation, human intervention, and human contextualization. When we get validated results, we publish the information for the consumption of other people, not machines, which means that most of the world’s published information is in an unstructured, narrative format. Machines can parse the text and, sometimes, the tables, but to understand the data they need structured information with clearly defined entities and relationships.
If we want to overcome this problem, we face a fundamental choice: do we change people through standardization of data reporting, or do we change the machines to work the way research scientists and data analysts do? Changing people means forcing them to conform to the way a machine works. Who wants to undertake that across an entire company, much less an entire industry? It would be easier, faster, and far less expensive to get machines to see and work the way we do. When it comes to information structure in particular, we need machines to conform to the way we see and process information.
The major challenge in data analytics is managing the data. Because so much of it is in unstructured formats, analytics professionals have to spend 80% of their time collecting, cleaning, and organizing data so that the computer can understand it. We can try to bring organizations together around a standard method of data presentation, but the reality is that the lack of uniformity in how data is laid out is not going to change. Even within a single pharma company, multiple software programs are often in use, requiring analytics professionals to spend an inordinate amount of time cleaning and organizing the data before comparisons can be made.
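A minimal sketch of the kind of harmonization work described above: the same measurement arrives from two systems under different field names and must be normalized onto one schema before any comparison. All field names and records here are hypothetical.

```python
# Hypothetical aliases mapping source-specific field names onto one shared schema.
FIELD_ALIASES = {
    "pt_id": "patient_id",
    "patientID": "patient_id",
    "hb_g_dl": "hemoglobin_g_dl",
    "Hgb (g/dL)": "hemoglobin_g_dl",
}

def harmonize(record: dict) -> dict:
    """Rename each field to its canonical name, leaving unknown fields as-is."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}

# The same patient reported by two different systems:
source_a = {"pt_id": "A-001", "hb_g_dl": 13.2}
source_b = {"patientID": "A-001", "Hgb (g/dL)": 13.4}

print(harmonize(source_a))  # {'patient_id': 'A-001', 'hemoglobin_g_dl': 13.2}
print(harmonize(source_b))  # {'patient_id': 'A-001', 'hemoglobin_g_dl': 13.4}
```

Only after this step can the two records be compared field by field; at industrial scale, this mapping work is where the 80% of analyst time goes.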
Adding complexity to this issue is the fact that change in the pharmaceutical industry moves relatively slowly while the volume of data grows continuously. By 2020, the amount of medical data is expected to double every 73 days. Any static ontology created to manage this data will be out of date before it can be used. The question is: How will pharma manage the growth in data if the ontologies can’t keep up with it? Lack of data will not be the issue; the usability of the data will be.
To understand how AI can solve this and other problems, we need to understand its components. In pharma, practical applications of AI require four key elements that work together to make data usable by machines: computer vision, information extraction, life sciences language processing, and entity disambiguation.
Computer vision
With computer vision, information can be pulled from tables, graphs, photos, text, and other sources. Unlike standard optical character recognition software, which can only extract the words, computer vision is able to extract information along with the context from which the text originated. For example, Innoplexus has developed computer vision technologies that can identify the sections from which text is extracted, such as whether the text is from an abstract, an introduction, the results, or the discussion section. This is valuable because it enables our AI algorithms to rank the strength of relationships found within an article. A relationship located within the results or discussion section is more likely to be a novel relationship than one mentioned in the introduction.
Computer vision also allows the computer to recognize and extract information from many file types, such as PDFs, PowerPoint presentations, Excel spreadsheets, and web pages, which matters given the variety of formats in which life sciences information is published.
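The section-aware ranking described above can be sketched as a simple scoring rule. The weights here are hypothetical illustrations, not Innoplexus's actual values: mentions in the results or discussion sections count more toward a relationship's strength than mentions in the introduction.

```python
# Hypothetical weights: where a relationship is mentioned affects its score.
SECTION_WEIGHTS = {
    "results": 1.0,
    "discussion": 0.9,
    "abstract": 0.5,
    "introduction": 0.3,
}

def relationship_score(mention_sections: list) -> float:
    """Score a candidate relationship by the article sections its mentions appear in."""
    return sum(SECTION_WEIGHTS.get(section, 0.1) for section in mention_sections)

# One mention in the results section outranks two mentions in the introduction.
print(relationship_score(["results"]))           # 1.0
print(relationship_score(["introduction"] * 2))  # 0.6
```

The design choice is that section context, recovered by computer vision, becomes a feature of each extracted relationship rather than being discarded with the layout.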
Life sciences language processing
Although natural language processing (NLP) is a key element of AI, by itself it is not useful for the life sciences industry. Innoplexus has built an extensive life sciences ontology into the framework of NLP, and the result is a life sciences language processing system. An ontology is like a dictionary that includes not only the definition of a word or concept but also the relationships associated with it. It is not a table or a thesaurus; it is a set of linkages between concepts and terms. A useful ontology allows machines to interpret human inputs and translate them into the context of the life sciences. Furthermore, it enables concept-based searches instead of keyword searches.
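The difference between a thesaurus and an ontology can be made concrete with a toy example. A minimal sketch, assuming (hypothetically) that concepts are nodes and typed relationships are edges; a real life sciences ontology is vastly larger and richer.

```python
# A toy ontology fragment: each concept carries typed relationships, not just synonyms.
ONTOLOGY = {
    "EGFR": {
        "synonyms": ["ERBB1", "HER1"],
        "is_a": ["receptor tyrosine kinase"],
        "associated_with": ["non-small cell lung cancer"],
    },
}

def expand_query(term: str) -> set:
    """Concept-based search: expand a term to its synonyms and related concepts."""
    related = {term}
    for relation, targets in ONTOLOGY.get(term, {}).items():
        related.update(targets)
    return related

# A search for "EGFR" can now also retrieve documents that only say "ERBB1"
# or that discuss the associated disease.
print(sorted(expand_query("EGFR")))
```

This is what makes a search concept-based rather than keyword-based: the query is expanded through the ontology's linkages before it ever touches the document index.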
With NLP alone, machines cannot tell the difference between similar terms, such as EGFR and eGFR. A data set contaminated with nonrelevant entities is useless to the life sciences industry: no amount of computing power or algorithmic sophistication helps if you can’t make sense of the data. This has already happened in the industry. There have been cases where major cancer research centers with great computing power and the best algorithms were unable to make sense of the data they generated because they did not have language processing that could function effectively in the life sciences arena.
At Innoplexus, we understood the data challenge. We realized that in order to effectively serve life sciences, we would need to build an AI machine that can continuously learn from the growing amount of information that the industry is producing. Since 2011, we have been crawling through 97% of the data publicly available on the web. Today, we have the world’s largest life sciences research ontology, and the AI technology behind it is auto-scaling and continuously growing. Having this ontology means that computers can come ever closer to not requiring research scientists to change the way they work, or to conform their data and terminology to someone else’s standard. It is difficult to ask people to change everything they do; with a life sciences ontology, they won’t have to.
Entity disambiguation
Entity disambiguation entails discerning the meaning between two or more different words or concepts that are spelled the same way. For example, searching for “EGFR” in two different major search platforms gives us two very different results: one gives us “eGFR,” estimated glomerular filtration rate, and the other gives us “EGFR,” epidermal growth factor receptor. Lacking in both search platforms is a system by which a user could select their topic of interest.
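A minimal sketch of context-based disambiguation for "EGFR". The context keywords below are hand-picked for illustration; a production system would learn them from the surrounding ontology and corpus rather than hard-code them.

```python
# Hypothetical context keywords for each sense of the ambiguous term "EGFR".
CONTEXTS = {
    "epidermal growth factor receptor": {"kinase", "tumor", "mutation", "lung"},
    "estimated glomerular filtration rate": {"kidney", "renal", "creatinine", "clearance"},
}

def disambiguate(sentence: str) -> str:
    """Pick the sense of 'EGFR' whose context keywords best overlap the sentence."""
    words = set(sentence.lower().split())
    return max(CONTEXTS, key=lambda sense: len(CONTEXTS[sense] & words))

print(disambiguate("EGFR mutation status in lung tumor biopsies"))
# → epidermal growth factor receptor
print(disambiguate("creatinine levels and renal function define EGFR"))
# → estimated glomerular filtration rate
```

The same overlap score could also drive a user-facing prompt, letting a searcher confirm which sense they meant when the context is genuinely ambiguous.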
The advantage of AI for pharma
One last important point about AI: It is not magic. It cannot solve problems that humans could not solve given unlimited time. What AI can do for pharma is free data scientists from lower cognitive tasks, such as manual searches and data validation, and make them available for more valuable, higher cognitive work. It can find, organize, and analyze vast amounts of information in considerably less time than a human would need. In fact, AI machines can accomplish in less than a day what takes humans months or longer.
With the volume of life sciences data growing daily, pharma companies need AI to keep up with it so that they can stay on top of new research results and discoveries. AI can present a picture drawn from a more comprehensive set of sources than is humanly possible in a short period of time, and it may find something that a human analyst might have missed. While AI will reduce the need for manual data collection, curation, and analysis, it will simultaneously open the door to further innovation and new opportunities for those involved in the research.