Using Hana and SageMaker to Solve for Adverse Drug Effects

Wayne Applebaum

Oct 30, 2019

CPOL

A discussion of the challenges in identifying adverse drug effects and how machine learning and big data techniques can help address them.

Introduction

Adverse drug reactions (ADRs) are a serious and complex problem that takes a toll in both human suffering and cost. In 2013, that cost was estimated to be $30.1 billion (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3853675/). It is unlikely that the number has decreased since the study was completed.

The Problem and Capabilities

The increased availability of electronic medical records (EMR) has opened the door to a more in-depth analysis of this problem. The obvious approach is to record which drugs each patient is taking and look for the ones that recur across patients who have adverse effects. But what appears simple on the surface proves far less so on closer examination.

One difficulty in solving this problem is analyzing the massive amounts of data needed to determine the specific interactions between drugs. An interaction isn't limited to two drugs; it could involve three or more. Solving this problem involves machine learning to sift through the massive, changing volume of data and adjust the algorithms as needed. The task also requires big data capabilities to handle the combinatorial explosion of cases across the population. Finally, the analysis needs to include a vast array of personal characteristics.
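To make the combinatorial explosion concrete, here is a minimal Python sketch. It is an illustration only, not part of any Hana or SageMaker workflow; the function names and the `(drugs, had_adverse_event)` record layout are assumptions made for this example. The first function counts how many drug combinations exist for a given formulary size, and the second tallies which combinations actually appear among patients who had an adverse event.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_drug_sets(n_drugs: int, max_size: int) -> int:
    """Number of distinct drug combinations of size 2 through max_size."""
    return sum(comb(n_drugs, k) for k in range(2, max_size + 1))


def combos_in_adverse_patients(records, max_size=3):
    """Count each drug combination seen among patients with an adverse event.

    `records` is an iterable of (drugs, had_adverse_event) pairs; this layout
    is hypothetical and stands in for whatever the EMR extract provides.
    """
    counts = Counter()
    for drugs, had_adverse_event in records:
        if not had_adverse_event:
            continue
        unique = sorted(set(drugs))  # canonical order so (A, B) == (B, A)
        for size in range(2, max_size + 1):
            counts.update(combinations(unique, size))
    return counts


if __name__ == "__main__":
    # With 1,000 drugs, pairs and triples alone exceed 166 million candidates.
    print(count_drug_sets(1000, 3))
    sample = [(["A", "B", "C"], True), (["A", "B"], True), (["A", "C"], False)]
    print(combos_in_adverse_patients(sample).most_common(3))
```

Raw co-occurrence counts like these are only a starting point: a combination can be common among affected patients simply because it is commonly prescribed, so the counts would need to be compared against patients without adverse events.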

For example, suppose we have reason to believe that drugs A, B, and C in combination cause an adverse reaction in 15 percent of the population. What are the characteristics of that 15 percent, the people who should not get the drugs? There may be hundreds of characteristics that may or may not contribute to the effect. The easy ones to identify are things like gender, race, and obesity. Harder are time-related variables such as length of time on the drugs or length of time having a specific condition. Again, the combinations lead to a vast data set, not only in the number of observations but also in the number of variables. Viewed as an Excel spreadsheet, it is both long and wide. This situation requires massive computational power. The solution might not lie in relational database technology; the problem might be more amenable to graph and spatial databases.
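As a rough sketch of the first step in that search, the Python below computes the adverse reaction rate for each value of a single patient characteristic. The record layout (a dict per patient with an `adr` flag) and the field names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def adr_rate_by(patients, characteristic):
    """Return the ADR rate for each value of one patient characteristic.

    `patients` is a list of dicts; each must contain the given
    characteristic and a boolean "adr" field (hypothetical schema).
    """
    totals = defaultdict(int)
    events = defaultdict(int)
    for patient in patients:
        key = patient[characteristic]
        totals[key] += 1
        events[key] += int(patient["adr"])
    return {key: events[key] / totals[key] for key in totals}


if __name__ == "__main__":
    patients = [
        {"obese": True, "adr": True},
        {"obese": True, "adr": False},
        {"obese": False, "adr": False},
        {"obese": False, "adr": False},
    ]
    print(adr_rate_by(patients, "obese"))
```

A single-variable breakdown like this ignores confounding and interactions among characteristics. With hundreds of variables, including the time-related ones, this is where a trained model rather than hand-built tabulations becomes necessary.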

Another part of the problem is identifying when an ADR has occurred. How do we know that an ADR has occurred? Identifying an ADR requires knowing that a patient had specific symptoms. These symptoms are often captured in the text of the treatment notes, so finding and identifying the ADR involves some form of text analysis. Creating an index of words and their proximity to other words, and testing whether they are indicative of an ADR, is both a big data and a machine learning problem. It involves looking at the providers' notes and determining which words are ADR indicators. This analysis would involve developing and adapting learning and testing models. We also want to optimize those models to control for false positives, which would lead to patients not getting drugs that could help them.
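The word-proximity idea can be sketched in a few lines of Python. This is a simplified illustration, not the text search that Hana provides: the tokenizer, the window size, and the function names are all assumptions. The last function shows one way to measure the false positives mentioned above, given predictions and known outcomes.

```python
import re
from collections import Counter


def tokenize(note):
    """Lowercase a note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", note.lower())


def proximity_counts(note, window=3):
    """Count unordered word pairs occurring within `window` tokens of each other."""
    tokens = tokenize(note)
    counts = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


def false_positive_rate(predicted, actual):
    """Share of true non-ADR cases that were flagged as ADRs."""
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    negatives = sum(1 for a in actual if not a)
    return false_pos / negatives if negatives else 0.0


if __name__ == "__main__":
    note = "Patient developed severe rash after starting drug"
    print(proximity_counts(note, window=2)[("rash", "severe")])
    print(false_positive_rate([True, True, False, False],
                              [True, False, False, False]))
```

Pair counts aggregated over many notes could serve as candidate features for a classifier, and the false positive rate is one of the metrics such a model would be tuned against.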

EMR opens the door to a far more sophisticated analysis than is currently being done in the area of ADRs, which are a costly and vital problem. Studying ADRs means studying interactions, not only of drugs with other drugs but also of drugs with patient characteristics. These complex interactions require massive amounts of data to be kept in memory simultaneously. Hana provides this ability. It also provides the flexibility of using not only relational models but document and graph database models as well. Its search capabilities can support text analysis. AWS SageMaker provides the ability to apply machine learning to identify and refine the models that detect ADRs.