Machine Learning and Predictive Analytics Applications in Clinical Data Management

January 4, 2019 James Linnington


A major challenge facing the healthcare industry in the 21st century is the sheer amount of data being generated. The term often used for this is ‘Big Data’.

A key question in the industry is how to manage, store and analyze big data, identify patterns in it and predict trends, so that it yields useful information. The goal is to turn data into information, and information into knowledge.

Big Data is challenging because of four characteristics: Volume, Velocity, Variety and Veracity.

Figure 1: Big Data







Volume: The quantity of the data that is being generated.

Velocity: How fast data is being generated.

Variety: The type and nature of the data, such as structured, semi-structured and unstructured data.

Veracity: How consistent the data is.

Use of Analytics and Machine Learning Techniques

Many organizations are using different analytical methods to manage and analyze this data. With these techniques, companies are trying to answer questions such as ‘What has happened?’, ‘What will happen?’ and ‘What should the course of action be?’ in a given situation.

Predictive analysis is attracting growing interest in combination with Machine Learning (ML). Machine Learning is a subset of Artificial Intelligence (AI) in which algorithms learn from data to build predictive models. Many pharmaceutical companies and CROs already use these techniques for tasks such as protocol generation and site selection.

Applying ML and Predictive Analysis Techniques in Clinical Data Management

Clinical data management faces several key challenges from PEOPLE, PROCESS and TECHNOLOGY perspectives. Some examples are:

  • lack of therapeutic area knowledge
  • difficulty transferring knowledge completely among team members
  • inconsistencies in data acquisition and review methods
  • use of different types of tools

All of these can lead to data quality issues, process issues and missed timelines. Is there a way to minimize these issues using technology? The answer is ‘Yes’. Using Machine Learning methods, PAREXEL can:

  1. Implement standard edit checks for a therapeutic area that can be used for all the studies
  2. Predict adverse events based on historic safety data for the same molecule
  3. Increase the number of auto-coded terms for medication and adverse events coding


Case 1: Implementing Standard Edit Check for Oncology Therapeutic Area Based on RECIST 1.1 Guideline

RECIST 1.1 (Response Evaluation Criteria in Solid Tumors, version 1.1) is the latest guideline for evaluating response in solid tumors. For this experiment, historical tumor data were taken from two studies and used to train a Multiclass Decision Tree algorithm, which produced a predictive model. The model was then applied to tumor data from ongoing studies. Performance metrics generated with this predictive model are shown in Figure 2.
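To make the idea of a RECIST-based edit check concrete, here is a minimal rule-based sketch in Python. It is not the decision tree model described above and is not taken from the experiment. It only encodes the published RECIST 1.1 thresholds for target lesions (at least a 30% decrease from baseline for partial response, at least a 20% and 5 mm increase from the nadir for progression), which are the kind of labels a trained model would be expected to reproduce. The class name, field names and sample values are hypothetical, and the sketch deliberately ignores non-target lesions, new lesions and the lymph node short-axis rule.

```python
from dataclasses import dataclass


@dataclass
class TumorAssessment:
    # All field names are hypothetical; sums are in millimetres.
    baseline_sum_mm: float   # sum of target-lesion diameters at baseline
    nadir_sum_mm: float      # smallest sum recorded so far (baseline included)
    current_sum_mm: float    # sum at the visit being checked
    reported_response: str   # response entered by the site: CR, PR, SD or PD


def derive_target_response(a: TumorAssessment) -> str:
    """Derive the target-lesion response from RECIST 1.1 thresholds (simplified)."""
    if a.baseline_sum_mm <= 0:
        raise ValueError("baseline sum of diameters must be positive")
    if a.current_sum_mm == 0:
        return "CR"  # simplification: all target lesions have disappeared
    increase = a.current_sum_mm - a.nadir_sum_mm
    # Progression needs both a 20% relative and a 5 mm absolute increase.
    relative_ok = a.nadir_sum_mm == 0 or increase / a.nadir_sum_mm >= 0.20
    if increase >= 5 and relative_ok:
        return "PD"
    if (a.baseline_sum_mm - a.current_sum_mm) / a.baseline_sum_mm >= 0.30:
        return "PR"
    return "SD"


def edit_check(assessments):
    """Return (index, reported, derived) for every visit where the two disagree."""
    queries = []
    for i, a in enumerate(assessments):
        derived = derive_target_response(a)
        if derived != a.reported_response:
            queries.append((i, a.reported_response, derived))
    return queries


# Made-up visits, for illustration only.
visits = [
    TumorAssessment(100, 100, 65, "PR"),  # 35% decrease from baseline
    TumorAssessment(100, 60, 75, "SD"),   # 15 mm and 25% above nadir
    TumorAssessment(100, 80, 85, "SD"),   # 5 mm but only about 6% above nadir
    TumorAssessment(100, 40, 0, "CR"),    # no measurable target disease
]
print(edit_check(visits))  # [(1, 'SD', 'PD')]
```

In this made-up example only the second visit is flagged, because the reported stable disease conflicts with the derived progression. A trained model could play the same role for patterns that are harder to write down as explicit rules.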

Figure 2:  Metrics










Case 2: Predicting Adverse Events Presence using Laboratory Data

Chemistry panel data were used to build a predictive model that checks for the presence of adverse events. The Two Class Boosted Decision Tree method was used to create the model. Performance metrics generated with this predictive model are shown in Figure 3.
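As an illustration of how a boosted two-class model combines many weak trees, the following sketch implements AdaBoost with one-split trees (decision stumps) in plain Python. This is a simple member of the boosted-tree family chosen for brevity; it is an assumption, not the implementation used for the model above. The two lab tests (ALT and creatinine), the values and the labels are all made up.

```python
import math


def candidate_thresholds(values):
    """Midpoints between sorted distinct values, plus -inf for a constant rule."""
    v = sorted(set(values))
    return [float("-inf")] + [(a + b) / 2 for a, b in zip(v, v[1:])]


def stump_predict(stump, row):
    """Apply one single-split tree: +sign above the threshold, -sign otherwise."""
    _, feature, threshold, sign = stump
    return sign if row[feature] > threshold else -sign


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) with the lowest error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in candidate_thresholds([row[feature] for row in X]):
            for sign in (1, -1):
                stump = (0.0, feature, threshold, sign)
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(stump, row) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def fit_adaboost(X, y, rounds=3):
    """Train `rounds` stumps, reweighting misclassified rows after each one."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        model.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, row):
    """Weighted vote of all stumps: 1 = adverse event expected, -1 = not."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score > 0 else -1


# Hypothetical rows: [ALT in U/L, creatinine in mg/dL]; labels are made up.
X = [[25, 0.8], [30, 0.9], [25, 0.9], [30, 0.8],   # no adverse event
     [110, 0.8], [95, 0.9],                        # adverse event, raised ALT
     [25, 2.0], [30, 1.8]]                         # adverse event, raised creatinine
y = [-1, -1, -1, -1, 1, 1, 1, 1]

model = fit_adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

On this toy data no single split separates the classes, since an adverse event is associated with either lab value being raised. After two rounds the raised-ALT rows are still misclassified; the third stump corrects them. That is the basic reason boosting several weak trees can outperform one tree.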


Figure 3:  Metrics
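The figure is not reproduced in this text. Metrics commonly reported for a two-class model include accuracy, precision, recall and F1 score; whether Figure 3 shows exactly these is an assumption. A minimal sketch of how they are computed from true and predicted labels, using made-up labels rather than the study results:

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix counts and the usual two-class metrics."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,  # share of predicted events that were real
        "recall": recall,        # share of real events that were predicted
        "f1": f1,
    }


# Made-up labels: 1 = adverse event present, 0 = absent.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For safety data, recall is often the metric to watch most closely, because a missed adverse event is usually more costly than a false alarm that a reviewer can dismiss.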











Used in combination, Machine Learning and Predictive Analytics methods can be of great advantage to clinical data management. Predictive models built with machine learning algorithms can preserve knowledge and be reused for future studies. They help maintain consistent data and quality across a whole program or portfolio instead of just a single study, and they support early decision-making at the program or portfolio level.

By Yashpalsinh Raj,
Senior Clinical Data Analyst,


Further Reading:

Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data

Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms
