Pharma Pipeline Orchestration with Airflow & MWAA

Developed and maintained scalable, automated pipelines for a pharmaceutical analytics platform, using Apache Airflow (MWAA) to orchestrate secure daily and historical data deliveries from de-identified master tables. These deliveries enabled reliable, privacy-compliant analytics for client overlap studies.

Project Overview

  • Developed modular Python DAGs (Directed Acyclic Graphs) to standardize ETL tasks for all clients.
  • Used YAML-based pipeline configuration to simplify modification, scaling, and onboarding of new clients and datasets.
  • Built scheduled jobs (daily/historical) that joined de-identified claims and tokenized tables to produce analytic datasets for downstream customer research.
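
The configuration-driven approach described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the config keys (`clients`, `mode`, `schedule`, `tables`), the `PipelineSpec` class, and the `dag_id` naming scheme are all hypothetical. The config is shown as the dict a YAML parser such as PyYAML's `yaml.safe_load` would return, so the sketch runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical config, written as the dict a YAML parser would return
# for a file with the same keys. All names here are illustrative.
CONFIG = {
    "clients": [
        {
            "name": "client_a",
            "mode": "daily",
            "schedule": "0 6 * * *",  # cron expression: every day at 06:00
            "tables": ["claims_deid", "tokens"],
        },
        {
            "name": "client_b",
            "mode": "historical",
            "schedule": None,  # no cron schedule; triggered on demand
            "tables": ["claims_deid", "tokens"],
        },
    ]
}

VALID_MODES = {"daily", "historical"}


@dataclass(frozen=True)
class PipelineSpec:
    """Everything needed to instantiate one pipeline for one client."""
    dag_id: str
    schedule: Optional[str]
    tables: Tuple[str, ...]


def build_specs(config):
    """Validate the parsed config and return one PipelineSpec per client."""
    specs = []
    for client in config["clients"]:
        mode = client["mode"]
        if mode not in VALID_MODES:
            raise ValueError(
                f"unknown mode {mode!r} for client {client['name']!r}"
            )
        specs.append(
            PipelineSpec(
                dag_id=f"{client['name']}_{mode}_delivery",
                schedule=client.get("schedule"),
                tables=tuple(client["tables"]),
            )
        )
    return specs


SPECS = build_specs(CONFIG)
```

In an Airflow deployment, each spec would typically be turned into one DAG object, so onboarding a client means adding a config entry rather than writing new Python. The DAG construction itself is left out here to keep the sketch dependency-free.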

Infrastructure & Security

  • Configured AWS IAM policies for secure, permissioned writes to client S3 buckets.
  • Automated data shipments with Lambda functions scheduled by AWS EventBridge, ensuring on-time, unattended delivery for all stakeholders.
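
As a rough sketch of the two points above, the snippet below builds an IAM policy document that allows writes only under one prefix of a client bucket, and derives a dated destination key from the `time` field that EventBridge scheduled events carry. The bucket name, prefix, key layout, file name, and function names are assumptions made for illustration, and the actual upload step is omitted.

```python
import json
from datetime import datetime


def build_write_policy(bucket, prefix):
    """Return an IAM policy document allowing object writes under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowDeliveryWrites",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                # Scope writes to a single prefix rather than the whole bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


def delivery_key(event, prefix):
    """Derive a dated S3 key from the 'time' field of a scheduled event."""
    run_time = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ")
    return f"{prefix}/{run_time:%Y/%m/%d}/delivery.parquet"


def handler(event, context):
    """Lambda-style entry point: decide where this run's delivery should land.

    The upload itself (for example via an AWS SDK client) is intentionally
    left out; this sketch only computes the destination.
    """
    return {
        "bucket": "example-client-bucket",  # hypothetical bucket name
        "key": delivery_key(event, prefix="deliveries"),
    }


if __name__ == "__main__":
    print(json.dumps(build_write_policy("example-client-bucket", "deliveries"), indent=2))
    print(handler({"time": "2024-03-05T06:00:00Z"}, None))
```

Limiting the policy to `s3:PutObject` on a single prefix means the delivery role can add objects but cannot read or list the rest of the client's bucket.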

Results & Value Delivered

  • Enabled daily, automated, and secure data transfer for multiple pharma clients.
  • Simplified onboarding for new engagements: no code changes needed for new schedules or data splits.
  • Reduced manual labor and improved compliance and auditability for both teams.

Tech Stack: Apache Airflow (MWAA), Python, YAML, AWS Lambda, AWS S3, EventBridge, IAM Policies