Case study

Accelerating science with a data lakehouse

A data platform to handle anything

A global biotechnology company wanted a unified data platform to analyze varied clinical data from multiple sites and draw deeper insights from it. The company selected Slalom to establish a new data lakehouse (AWS/Databricks) to support its data engineers, data scientists, clinical informaticists, clinical operations, other internal users, and external partners in cutting-edge research into new and improved therapies for neurodegenerative diseases (MS, SMA, Parkinson’s, Alzheimer’s, etc.) with real-world research network data.

 

Our client is a global biotechnology company that is a leader in developing and commercializing therapies for rare neurodegenerative diseases and an advocate of real-world research networks. The goal of this project was to transform their data platform to manage their research and participation in real-world research networks in neurodegenerative diseases.

 

Researchers in the digital health division use the data platform to quickly identify relevant datasets from several studies and repurpose them for their own research, which accelerates their work. However, the existing data platform had no central patient view, making it difficult to have full visibility into the data ecosystem. The long lead time for onboarding studies also eroded confidence in the quality of the data.

 

They needed a solution that could bring together multiple datasets to support faster diagnosis and generate accurate treatment plans in a timely manner for patients with neurodegenerative diseases.

 

Slalom initiated four agile delivery teams: one focused on the foundational elements of the new AWS/Databricks data platform, the second on biobanking and clinical operations, the third on imaging workflow, and the fourth on sensor and gyroscopic data.

 

Provisioning of AWS infrastructure, Databricks workspaces, and compute clusters was streamlined through infrastructure as code, and configuration-driven Databricks notebooks enabled faster study and data onboarding.

 

Slalom leveraged the latest features of Databricks, including:

  • Delta Live Tables and expectations to persist and validate data while loading it into a medallion architecture
  • Databricks Workflows to manage the orchestration of data
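
To illustrate the idea behind expectation-based validation, here is a minimal sketch in plain Python. It does not use the Databricks API; the Expectation class, the apply_expectations function, the field names, and the sample records are all hypothetical and are not taken from the client's implementation. Each rule either warns, drops the offending record, or fails the load.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict


@dataclass(frozen=True)
class Expectation:
    """A named data-quality rule (hypothetical, for illustration only)."""
    name: str
    check: Callable[[Record], bool]
    action: str = "warn"  # one of "warn", "drop", "fail"


def apply_expectations(records, expectations):
    """Return (kept_records, violation_counts) after applying every rule."""
    kept = []
    violations = {exp.name: 0 for exp in expectations}
    for record in records:
        keep = True
        for exp in expectations:
            if exp.check(record):
                continue
            violations[exp.name] += 1
            if exp.action == "fail":
                # Stop the whole load on a hard failure.
                raise ValueError(f"expectation {exp.name!r} failed for {record!r}")
            if exp.action == "drop":
                # Exclude the record but keep processing the batch.
                keep = False
            # "warn" only counts the violation and keeps the record.
        if keep:
            kept.append(record)
    return kept, violations


# Assumed rules and sample data, invented for this sketch.
EXPECTATIONS = [
    Expectation("has_participant_id", lambda r: bool(r.get("participant_id")), "drop"),
    Expectation("known_site", lambda r: r.get("site") in {"site_a", "site_b"}, "warn"),
]

SAMPLE = [
    {"participant_id": "P001", "site": "site_a"},
    {"participant_id": "", "site": "site_a"},
    {"participant_id": "P003", "site": "site_z"},
]

valid_rows, violation_counts = apply_expectations(SAMPLE, EXPECTATIONS)
```

In this sketch the record with an empty participant ID is dropped, while the record from an unrecognized site is kept and only counted as a violation, so the quality issue stays visible without blocking the load.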

 

Solution

  • A medallion architecture to persist data in three layers:
      1. Raw format (bronze layer)
      2. Standardized terminology (silver layer)
      3. Participant- and patient-centric data model (gold layer)
  • Automated consent checks to ensure patient privacy and consent are adhered to before data is used for research purposes
  • Built-in data validation to ensure ingested data meets the standard required for research
  • Configuration-driven data pipelines that quickly and reliably feed accurate and complete data to clinical informatics
  • Templated workflows for a robust DataOps process
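
The layered flow and the consent check listed above can be sketched roughly as follows. This is a simplified, plain-Python illustration under stated assumptions, not the platform's actual code: the configuration keys, function names, term mappings, and sample rows are all hypothetical, and a real lakehouse would run these steps on Spark tables rather than in-memory lists.

```python
# Hypothetical per-study configuration; onboarding a new study would mean
# adding a new config rather than writing new pipeline code.
STUDY_CONFIG = {
    "study_id": "study_001",
    "term_map": {"MS": "multiple sclerosis", "PD": "Parkinson's disease"},
    "required_consent": "research",
}


def to_bronze(raw_rows, config):
    """Bronze: keep rows as received, only tagging them with the study ID."""
    return [{**row, "study_id": config["study_id"]} for row in raw_rows]


def to_silver(bronze_rows, config):
    """Silver: map local diagnosis codes onto standardized terminology."""
    term_map = config["term_map"]
    return [
        {**row, "diagnosis": term_map.get(row["diagnosis"], row["diagnosis"])}
        for row in bronze_rows
    ]


def to_gold(silver_rows, consents, config):
    """Gold: one entry per participant, built only from consented data."""
    required = config["required_consent"]
    gold = {}
    for row in silver_rows:
        pid = row["participant_id"]
        # Consent check: skip participants without the required consent.
        if required not in consents.get(pid, set()):
            continue
        participant = gold.setdefault(
            pid, {"participant_id": pid, "diagnoses": set(), "studies": set()}
        )
        participant["diagnoses"].add(row["diagnosis"])
        participant["studies"].add(row["study_id"])
    return gold


# Invented sample data for the sketch.
RAW_ROWS = [
    {"participant_id": "P001", "diagnosis": "MS"},
    {"participant_id": "P002", "diagnosis": "PD"},
    {"participant_id": "P001", "diagnosis": "multiple sclerosis"},
]
CONSENTS = {"P001": {"research"}, "P002": {"care"}}

bronze = to_bronze(RAW_ROWS, STUDY_CONFIG)
silver = to_silver(bronze, STUDY_CONFIG)
gold_view = to_gold(silver, CONSENTS, STUDY_CONFIG)
```

Here the two rows for P001 collapse into a single participant record with one standardized diagnosis, and P002 is excluded from the gold layer because no research consent is recorded for that participant.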

 

Outcomes

The data platform provides a flexible and scalable system to onboard, filter, and curate data of different varieties to help advance the treatment of neurodegenerative diseases. 

  • 6x improvement in time to onboard studies
  • 4 different studies consolidated and unified on the data platform

 

The solution has shortened the research and drug development time to market for a new product. With the new processes and templates, studies are onboarded in a matter of weeks instead of months, and the platform can now ingest petabyte-scale sensor and gyroscopic data, which wasn’t possible on the old platform.

 

We believe that by focusing on broader organizational objectives and enabling our life sciences and healthcare clients for success in the long term, we craft the right holistic solutions—and drive incremental consumption for our partners.

 

Want to learn more? Reach out to Amber Sexton at amber.sexton@slalom.com.