Solving the Big Data problem in pharma innovation

Summary:

Leverage unstructured data
Data integrity in Life Science
Third wave of AI

I originally published this article at PharmaPhorum.

The effectiveness of Artificial Intelligence applications can be undermined by the volumes of unstructured data prevalent in the pharma industry. What can be done to overcome this issue?

We live in an exciting time for the pharmaceutical industry. Cutting-edge technologies like artificial intelligence and blockchain are making headlines and revolutionizing everything from drug discovery to clinical trials. Many of these innovations are built upon the same foundation: Big Data. But a longstanding challenge within Big Data must be overcome in order for technologies like AI to achieve their full potential. That challenge is unstructured data.

Unstructured data and pharmaceutical Artificial Intelligence

The need to overcome this challenge can be illustrated by examining the consequences of unstructured data for the effectiveness of Artificial Intelligence applications within the pharmaceutical and life science industries.

As I’ve written about in the past, the history of Artificial Intelligence can be seen through the lens of three distinct waves. The first wave brought ‘knowledge engineering’ software that enabled efficient solutions to practical challenges. The second wave brought machine learning programs that enabled automated pattern recognition and advanced statistical analysis. We’ve now entered the third wave of AI, which has the power to generate novel hypotheses by analyzing massive sets of data.

Third-wave AI has the potential to significantly accelerate the research and development process for new drugs, as companies like Merck & Co and Sanofi have begun to discover. Applications of third-wave Artificial Intelligence programs have powered medical discoveries such as the connection between fish oil and Raynaud’s disease.

But third-wave AI applications have also suffered a series of failures in healthcare and pharmaceutical contexts. MD Anderson’s problems with IBM Watson serve as a notable example. In that instance, the problems all started when MD Anderson changed its electronic medical record (EMR) provider, preventing Watson from accessing the data that it needed. This example illustrates the challenge posed by unstructured data and the corresponding need for greater data integrity within life science industries.

Data integrity in life sciences

Many of today’s Artificial Intelligence programs depend on good, clean data in order to operate effectively. If access to such data is compromised, the Artificial Intelligence program’s ability to conduct analysis and generate hypotheses is undermined.

Data sets within the pharmaceutical and life science industries pose a particular challenge for Artificial Intelligence programs because of the unusual density, depth, and diversity of biological data. Because the complexity of biological data renders it incomprehensible to many Artificial Intelligence programs, the majority of pharmaceutical research today is carried out manually. Human researchers curate data, generate hypotheses, and perform experiments in much the same way that they have for decades. Lacking automation, the drug discovery, development, and testing process is inefficient, expensive, and often inaccurate.

The inefficiency of this process causes prolonged delays between the completion of an experiment and the publication of its results in scientific journals or databases. This delay has resulted in a significant problem with publication bias and inaccuracy in the industry. Even the open-science movement, which is attempting to increase access to not-yet-published clinical research results, depends on manually-curated datasets that are usually created by companies with proprietary interests.

Even heavily-curated data sets are often too inconsistent to be meaningfully analyzed by Artificial Intelligence. Take, for example, the challenge posed by abbreviations and acronyms within the pharmaceutical industry. The same abbreviation may carry different meanings depending on its context. ‘Ca’, for instance, could mean ‘cancer’ in one context and ‘calcium’ in another. Most Artificial Intelligence depends on accurate and nuanced contextual information, and manually-curated data sets often fall short of this mark.
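To make the 'Ca' example concrete, here is a minimal sketch of how surrounding words can hint at the intended meaning of an abbreviation. It is an illustration under stated assumptions, not a description of any production system: the cue-word lists and the function name `disambiguate_ca` are hypothetical, and real programs rely on far richer context than simple keyword overlap.

```python
import re

# Hypothetical cue words for each sense of the abbreviation "Ca".
# These lists are illustrative assumptions, not a curated vocabulary.
SENSE_CUES = {
    "cancer": {"tumor", "metastasis", "oncology", "biopsy", "carcinoma"},
    "calcium": {"serum", "mmol", "bone", "channel", "supplement"},
}


def disambiguate_ca(sentence):
    """Return the most likely sense of 'Ca' in a sentence, or None if unclear."""
    # Lowercase the sentence and collect its distinct alphabetic tokens.
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # Score each sense by how many of its cue words appear in the sentence.
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    ranked = sorted(scores.values(), reverse=True)
    # Refuse to guess when no cue matches or the top two senses tie.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return best


if __name__ == "__main__":
    print(disambiguate_ca("Biopsy confirmed Ca of the breast with metastasis"))
    print(disambiguate_ca("Serum Ca was 2.4 mmol/L"))
    print(disambiguate_ca("Ca levels were noted"))
```

Returning `None` when the context gives no clear signal is a deliberate choice in this sketch: a wrong guess about 'Ca' would silently corrupt the data set, whereas an explicit "unclear" can be routed to a human curator.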

Overcoming the unstructured data challenge

Fortunately, some of the world’s leading firms have begun to explore two possible ways to overcome these challenges. One approach is to simply improve the state of available data sets. The HITECH Act of 2009 modeled this approach by standardizing EMR systems to create richer, more comprehensive, and more up-to-date biological data sets. As a result, diverse data from biological patents, clinical trials, academic theses, and other sources can increasingly be analyzed by advanced Artificial Intelligence programs.

The second way to overcome the unstructured data challenge is simply to build better Artificial Intelligence. Recent innovations have brought ‘context normalization’ Artificial Intelligence technology that can process and analyze unstructured, heterogeneous data points using a combination of natural language processing, machine learning, and cutting-edge text analytics. Finally, the most advanced Artificial Intelligence programs are able to utilize disparate, incongruous data to generate novel hypotheses without the need for costly human curation.

Innovations like these are allowing researchers to analyze data, generate hypotheses, and conduct conclusive clinical trials at unprecedented levels of speed and accuracy. This is good news for pharmaceutical companies, medical professionals, and consumers alike.

on September 23, 2020
