Why Unstructured Data Needs to Be Solved for Healthcare Innovation to Advance

Artificial intelligence programs are evolving. AI is currently undergoing a “third wave” of innovation, and this new software is capable of metabolizing complex, heterogeneous data from loosely associated contexts to find patterns in previously unconnected events.


As software designers have known for decades, any cognitive task that is repetitive and algorithmic can be replicated in a computer. The elegance in software, and what often makes or breaks it in the real world, is in the human-computer interface. This discipline involves efficiently offloading repetitive tasks to the computer and efficiently interpreting the results. As J.C.R. Licklider explained in his groundbreaking 1960 essay Man-Computer Symbiosis, the more seamless this translation task can become, the more likely it is that computers will be able to aid us in our cognitive tasks and extend our decision-making capacities.


Almost everyone in the working world has had an “it’s easier to do it myself” moment, wondering whether it would be more efficient to do the work ourselves than to try to communicate the task to a subordinate. When it comes to calculations, computers of the past have essentially been highly rigid (and often intransigent) subordinates. However, our software is evolving, and with good design, the cognitive load of switching from performing a cognitive task to specifying its automation for a computer is becoming less irksome.


The evolution of AI


There are three “waves” of AI software causing ripples in the fabric of the working world.


First-wave AI or “knowledge engineering” software started with the optimization of complex and rapidly evolving calculations, such as tax software adapting to yearly changes in the tax code or navigation software responding to a traffic accident. However, first-wave AI programs must be explicitly programmed to solve complex problems, so they are only useful for implementing known solutions more efficiently through repetitive use.


Second-wave AI or “machine learning” software implemented statistical probability for improved pattern recognition. This evolution enabled programs that could parse complex visual patterns such as retinal scans, in some cases with better-than-human accuracy. However, second-wave AI is essentially a black box. Given a set of complex data (fracture radiographs) and solutions for that data (orthopedic surgeon assessments of those fractures), second-wave AI can replicate the solutions of a human rater and sometimes even improve on them. However, second-wave AI software is not capable of reliably interacting with humans, and its logic (or the “causality” of its decisions) is not transparent.


Third-wave AI or “contextual normalization” software recognizes logical reasons for patterns to exist within complex data. The hope is that a third-wave AI could take widely varying datasets, such as genetic data from DNA, messenger RNA counts, proteomics, electrolyte and complete blood count readings, diagnosis codes, and environmental data, to predict links between genetic and environmental causes of disease. As an example of this kind of task done manually, nutritional food labeling data from 175 countries was normalized and then analyzed for a link between sugar and diabetes. Researchers found that an increase of 150 kcal/person/day in nationwide sugar availability (about one can of soda/day) was associated with a 1.1% increase in diabetes prevalence. This manual research task began with a hypothesis and required extensive manual collation. The hope is that AI could replicate this task by collating the same dataset in real time, continuously and automatically analyzing all food components against all reported disease states, generating novel links between food components and disease states, and explaining the hypothetical causality behind each link.


AI in medicine and the life sciences


Healthcare was one of the biggest adopters of first-wave AI technology. Two areas where AI can contribute significantly in the near future are the synthesis of the medical literature and the digitization of patient records.


The medical literature is too vast for human consumption. Since the advent of the evidence-based medicine ideal in the second half of the 19th century, Western medicine has endeavored to become a data-driven industry. But biological data is deep, dense, and diverse; at current estimates, published health information doubles every 3-4 years. The 8.1 million manually curated and peer-reviewed life sciences articles indexed in the Medline database underwent a 43% annual increase from 1978 to 2001, with more than 200,000 randomized controlled trials published between 1994 and 2001 alone. It is currently recommended that, in order to keep abreast of the medical literature and make up-to-date recommendations to their patients, a general practitioner should read 19 published medical research articles per day; however, most clinicians have only 1 hour per week for this activity. Artificial intelligence programs hold promise to replace literature reviews and synthesize relevant patient information for clinicians to use at the point of care.
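To make the scale of these figures concrete, the short Python sketch below works through the arithmetic. It uses only the numbers quoted above (a doubling time of 3-4 years, 19 articles per day, 1 hour per week) and assumes, for illustration, that growth is smoothly exponential and that the recommended reading applies seven days a week. The function names are invented for this example.

```python
def annual_growth_rate(doubling_time_years: float) -> float:
    """Constant yearly growth rate implied by a doubling time.

    Assumes smooth exponential growth, which is a simplification.
    """
    return 2 ** (1 / doubling_time_years) - 1


def seconds_per_article(articles_per_day: float, hours_per_week: float) -> float:
    """Reading time available per recommended article, in seconds.

    Assumes the daily recommendation applies seven days a week.
    """
    articles_per_week = articles_per_day * 7
    return hours_per_week * 3600 / articles_per_week


if __name__ == "__main__":
    for years in (3, 4):
        rate = annual_growth_rate(years)
        print(f"Doubling every {years} years implies about {rate:.0%} growth per year")
    budget = seconds_per_article(articles_per_day=19, hours_per_week=1)
    print(f"Time available per recommended article: about {budget:.0f} seconds")
```

Under these assumptions, a 3-4 year doubling time corresponds to roughly 19-26% growth per year, and one hour per week spread over 133 recommended articles leaves under half a minute for each.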


Third-wave AI can eliminate accessory information to equate datasets across widely varying contexts. Humans do this naturally. A physician assessing a fracture would take a history from the patient, including the patient’s account of events, assess radiographs, perform a gait analysis, and manipulate the injured limb to feel unusual movements or textures in the bones, ligaments, or soft tissue. These disparate sensory inputs (storyline, movement, texture, and images) are equated in the human brain across contextual normalization curves (normal gait and range of motion, uninjured tissue, and healthy bone morphology). Accessory information (ethnicity, healthy aging, accent, and word choice) is filtered out and normalized. If the record is communicated through a nurse or junior physician, then handwriting and common medical acronyms are interpreted contextually. Together, the information is organically synthesized to triangulate a more accurate diagnosis. The hope is that third-wave AI could parse patient records containing disparate contextual datasets in the same way.
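One very simplified way to picture normalizing disparate measurements against reference curves is to express each one as a z-score relative to a reference population, so that values in different units land on a common scale. The Python sketch below illustrates only that idea. The measurement names and reference values are hypothetical placeholders, not clinical norms, and real contextual normalization would be far richer than a per-variable z-score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    mean: float  # reference-population mean
    sd: float    # reference-population standard deviation


# Hypothetical reference values for illustration only (NOT clinical norms).
REFERENCE = {
    "gait_speed_m_per_s": Reference(mean=1.3, sd=0.2),
    "knee_flexion_deg": Reference(mean=135.0, sd=10.0),
}


def normalize(observations: dict) -> dict:
    """Convert raw measurements to z-scores against their reference values."""
    return {
        name: (value - REFERENCE[name].mean) / REFERENCE[name].sd
        for name, value in observations.items()
    }


def flag_deviations(z_scores: dict, threshold: float = 2.0) -> list:
    """Names of measurements at least `threshold` SDs away from the reference mean."""
    return sorted(name for name, z in z_scores.items() if abs(z) >= threshold)


if __name__ == "__main__":
    patient = {"gait_speed_m_per_s": 0.8, "knee_flexion_deg": 130.0}
    z = normalize(patient)
    print(z)
    print(flag_deviations(z))
```

With these made-up numbers, the gait speed sits about 2.5 standard deviations below the reference mean and is flagged, while the knee flexion is within half a standard deviation and is not. Once on a common scale, measurements from different modalities can be compared or combined.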


Current limitations to AI


There are known problems limiting the normalization of clinical data contained in electronic medical records. These include the limits of natural language processing (NLP), proprietary datasets threatening open innovation, systematic bias in the medical literature due to publication bias and even outright fraud, and the increasing complexity of health data, such that past data is not specific enough to be useful for current prediction. In the traditional medical research paradigm, this noise is filtered out by humans who manually structure noisy data into diagnosis codes, checkboxes, and clinical ontologies.


Now third-wave AI has evolved. In the “data as a service” (DaaS) model, third-wave AI software crawls unstructured datasets and combines NLP, meta-ontologies, and image recognition software with machine learning to automate the structuring of previously unstructured datasets. The more advanced programs combine this with a continuous analytics as a service (CAaS) model, providing enhanced, real-time data visualization for continuous decision support on enterprise and external data.
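As a toy illustration of what “structuring” unstructured text can mean, the Python sketch below maps phrases and acronyms in a free-text clinical note to entries in a small lookup table. Everything in it is an assumption made for illustration: the mini-ontology, the codes (which are invented and do not belong to any real coding system), and the function name. Production systems would rely on trained NLP models and curated ontologies rather than keyword matching.

```python
import re

# Hypothetical mini-ontology: surface phrase -> (concept, made-up code).
# The codes are invented for this sketch and are not from any real coding system.
ONTOLOGY = {
    "heart attack": ("myocardial infarction", "DX001"),
    "mi": ("myocardial infarction", "DX001"),
    "high blood pressure": ("hypertension", "DX002"),
    "htn": ("hypertension", "DX002"),
    "shortness of breath": ("dyspnea", "SX001"),
    "sob": ("dyspnea", "SX001"),
}


def structure_note(note: str) -> list:
    """Extract structured concept records from a free-text note.

    Matches whole words or phrases case-insensitively and de-duplicates
    synonyms that map to the same code. Negation ("no prior MI") and
    misspellings are NOT handled in this sketch.
    """
    text = note.lower()
    found = {}
    for phrase, (concept, code) in ONTOLOGY.items():
        if re.search(rf"\b{re.escape(phrase)}\b", text):
            found[code] = {"code": code, "concept": concept}
    return sorted(found.values(), key=lambda record: record["code"])


if __name__ == "__main__":
    note = "Pt with HTN and prior heart attack, c/o SOB."
    for record in structure_note(note):
        print(record["code"], record["concept"])
```

Even this trivial example hints at why the problem is hard: the same concept appears under several surface forms, and a keyword matcher cannot tell “prior MI” from “no prior MI” without an understanding of context.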


While these systems are largely available at the enterprise level today, in the future the two will be seamlessly delivered on a case-by-case basis. Much like Google searches the entire web to answer an individual query, a patient or physician will be able to parse the entire body of medical knowledge and access a single distilled dashboard of relevant information for individualized and timely treatment.


The only remaining question is how near this future really is, and how our “health” will change when it arrives.


About the author: 


Gunjan Bhardwaj is the founder and CEO of Innoplexus, a leader in AI and analytics as a service for life science industries. With a background at Boston Consulting Group and Ernst & Young, he bridges the worlds of AI, consulting, and life science to drive innovation.

