Foundational Differentiators in the Health Data Ecosystem

Team Zaloni, April 7th, 2016

Leading pharmaceutical companies and Clinical Research Organizations (CROs) are already leveraging the nascent concept of managed data lakes to accelerate patient recruitment and to reduce costs by implementing risk-based monitoring approaches. For drug companies and their partner CROs to prevail in the competitive process of getting FDA approval for a drug, the data management function has to become lean and agile, in contrast to the time-honored but inefficient practices of previous decades.

Implementing comprehensive end-to-end harmonization and integration of data requires a number of capabilities that managed data lakes enable. Data security, data traceability, data provenance, and clinical concept semantics are all key functions that this foundational data infrastructure will need to provide ubiquitously and reliably.

Some implications of the rapid acceleration and adoption of data-driven evidence-based medicine:

  • High-throughput data production will become an essential part of any future managed data lake as the velocity and volume of clinical/lab data increase exponentially in the immediate future
  • Miniaturized biosensors and nanotechnology embedded in diseased tissues will contribute huge amounts of live streaming data that will need to be handled, ingested, examined and retained
  • The sheer scale and immediacy of data that will need to be addressed will bring techniques such as event stream processing to the forefront
  • Remote monitoring of patients and their adherence to medication regimens will further help avoid hospital readmissions
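
To make the event stream processing idea above a little more concrete, here is a minimal sketch in Python of one common pattern: a sliding-window average over incoming sensor readings that raises an alert when the recent average crosses a threshold. The function name `rolling_alerts`, the window size, and the threshold are all illustrative assumptions, not part of any specific platform or product.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, window_mean) whenever the mean of the most recent
    `window` readings exceeds `threshold`.

    Hypothetical helper for illustration only; a production system would
    typically rely on a dedicated stream processing engine.
    """
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    total = 0.0
    for i, value in enumerate(readings):
        if len(recent) == window:
            total -= recent[0]  # subtract the reading about to be evicted
        recent.append(value)
        total += value
        # Only evaluate once a full window of readings is available.
        if len(recent) == window and total / window > threshold:
            yield i, total / window


# Example: simulated biosensor values, window of 3 readings, alert above 3.0
for index, mean in rolling_alerts([1, 1, 1, 5, 5, 5], window=3, threshold=3.0):
    print(f"alert at reading {index}: rolling mean {mean:.2f}")
```

Because the readings are consumed one at a time and only the current window is kept in memory, the same logic works on an unbounded live stream as well as on a stored batch.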

The arduous drug approval process usually begins in labs and research institutions, in what is called the Academic and Laboratory Research phase or the Preclinical phase. This is followed by Clinical Trial Phases I through III, which culminate in the FDA either approving the drug or not (the latter usually ending further interest in developing the drug). Phase IV studies continue after approval.

Key success factors for a pharma company:

  • One or more successful products on the market that have patent protection (typically 10 years)
  • A large pipeline of candidate drugs with some in late-stage Phase III status for ongoing sustenance
  • The ability to find and recruit patients for the clinical trials in a timely and competitive manner
  • The ability to manage the vast amounts of data generated in Phase III and IV and remain in compliance with multinational regulations for patient safety and good clinical practice
  • Cash to fund the development of their new drug candidates

A Pharma CRO Patient Recruitment Scenario: 

Consider the case of a new special-purpose hypertension drug under clinical development, where stringent inclusion and exclusion criteria for patient recruitment must be managed across multiple sites and countries.

Inclusion criteria:

  • Aged 22–85 years
  • Hypertension diagnosis
  • Patients known to have taken one or two prescriptions for hypertension
  • ACE inhibitor or ARB
  • CAD diagnosis

Exclusion criteria:

  • Patients taking > 3 medications for hypertension
  • Type 1 or Type 2 diabetes diagnosis

A representative illustration of pharmacy drug fill history data that could be leveraged for patient recruitment is provided as an example below:

[Figure: pharmacy drug fill history data]
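
As a rough sketch of how the criteria in this scenario might be applied to such data, the Python example below screens a handful of patient records. Every field name (`patient_id`, `diagnoses`, `hypertension_rx_classes`) is hypothetical, since the actual schema of the fill history data is not specified here. The sketch also assumes that all inclusion criteria must hold together and that "ACE or ARB" means at least one of the patient's hypertension prescriptions belongs to one of those drug classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout, assembled from diagnoses and pharmacy fills.
    patient_id: str
    age: int
    diagnoses: set[str]                # e.g. {"hypertension", "CAD"}
    hypertension_rx_classes: set[str]  # distinct hypertension drug classes filled


def is_eligible(p: Patient) -> bool:
    """Apply the scenario's inclusion and exclusion criteria (as interpreted
    in the assumptions stated in the lead-in) to one patient."""
    rx_count = len(p.hypertension_rx_classes)
    included = (
        22 <= p.age <= 85
        and "hypertension" in p.diagnoses
        and "CAD" in p.diagnoses
        and 1 <= rx_count <= 2
        and bool(p.hypertension_rx_classes & {"ACE", "ARB"})
    )
    excluded = (
        rx_count > 3
        or bool(p.diagnoses & {"type 1 diabetes", "type 2 diabetes"})
    )
    return included and not excluded


def screen(patients: list[Patient]) -> list[Patient]:
    """Return only the patients who pass the eligibility check."""
    return [p for p in patients if is_eligible(p)]


# Made-up example records, purely for illustration.
candidates = [
    Patient("p1", 60, {"hypertension", "CAD"}, {"ACE", "diuretic"}),
    Patient("p2", 60, {"hypertension", "CAD", "type 2 diabetes"}, {"ACE"}),
    Patient("p3", 90, {"hypertension", "CAD"}, {"ARB"}),
]
print([p.patient_id for p in screen(candidates)])
```

In a managed data lake, the same rules would more likely be expressed as a query over harmonized diagnosis and fill tables drawn from many sites; the point of the sketch is only that the criteria reduce to simple, auditable predicates once the data is integrated.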

In the past, techniques for accelerating drug development hinged primarily on identifying key physicians (principal investigators) and key hospitals (sites) where patients go for treatment. This implied an often misplaced reliance on the accuracy and currency of provider, hospital, or CRO databases or public health records to identify candidate patient pools. The attendant process and paperwork to obtain such information often proved onerous and expensive.

However, with the advent of managed data lakes and patient recruitment analytics, existing data from diverse hospitals, practices, retail pharmacies, and social media sites can be leveraged to address time-sensitive patient recruitment challenges and to lock in competitive advantage. What once could be achieved only by paying for access to expensive patient registries and CRO databases is now becoming commoditized by foundational data management platforms that offer these facilities as platform services.

To learn more about managed data lakes and foundational data management platforms for health and life sciences, watch our on-demand webinar on Health Informatics and Managed Data Lakes.
