Pharma Pipeline Orchestration with Airflow & MWAA
Developed and maintained scalable, automated pipelines for a pharmaceutical analytics platform, using Apache Airflow on Amazon MWAA to orchestrate secure daily and historical data deliveries from de-identified master tables. These deliveries enabled reliable, privacy-compliant analytics for client overlap studies.
Project Overview
- Developed modular Python DAGs (Directed Acyclic Graphs) to standardize ETL tasks for all clients.
- Used YAML-based pipeline configuration so that pipelines could be modified, scaled, and extended to new clients and datasets without touching DAG code.
- Built scheduled jobs (daily/historical) that joined de-identified claims and tokenized tables to produce analytic datasets for downstream customer research.
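
To illustrate the configuration-driven approach described above, here is a minimal sketch of how per-client settings might be expanded into pipeline definitions. Everything named here is an assumption for illustration: the config keys (`name`, `schedule`, `splits`), the task names, and the `PipelineSpec` type are hypothetical, and the config is shown as the dictionary a YAML loader would return rather than as the project's actual file. In an Airflow deployment, each resulting spec would be turned into a DAG.

```python
from dataclasses import dataclass

# Hypothetical client config, shown as the dict a YAML loader would produce.
CONFIG = {
    "clients": [
        {"name": "client_a", "schedule": "daily", "splits": ["region"]},
        {"name": "client_b", "schedule": "historical", "splits": []},
    ]
}

# The two delivery modes mentioned in the project description.
VALID_SCHEDULES = {"daily", "historical"}


@dataclass(frozen=True)
class PipelineSpec:
    """Plain description of one pipeline; a DAG factory would consume this."""
    dag_id: str
    schedule: str
    tasks: tuple


def build_specs(config):
    """Expand the config into one PipelineSpec per client."""
    specs = []
    for client in config["clients"]:
        schedule = client["schedule"]
        if schedule not in VALID_SCHEDULES:
            raise ValueError(f"unknown schedule: {schedule!r}")
        # Every client gets the same standardized steps (hypothetical names).
        tasks = ["extract_claims", "join_tokens"]
        # Optional data splits are added purely from configuration.
        tasks += [f"split_by_{column}" for column in client.get("splits", [])]
        tasks.append("deliver_to_s3")
        specs.append(
            PipelineSpec(
                dag_id=f"{client['name']}_{schedule}",
                schedule=schedule,
                tasks=tuple(tasks),
            )
        )
    return specs


if __name__ == "__main__":
    for spec in build_specs(CONFIG):
        print(spec.dag_id, "->", " >> ".join(spec.tasks))
```

With a layout like this, onboarding a client or adding a data split means adding an entry to the config, while the task-building logic stays unchanged.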
Infrastructure & Security
- Configured AWS IAM policies for secure, permissioned writes to client S3 buckets.
- Automated data shipments with Lambda functions scheduled by AWS EventBridge, ensuring on-time, unattended delivery for all stakeholders.
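
The two points above can be sketched roughly as follows: a write-only IAM policy scoped to one client prefix, and a Lambda handler that derives the delivery key from the EventBridge schedule event. This is an illustration under stated assumptions, not the project's actual code: the bucket names, prefix, source key, and the `dataset.parquet` file name are hypothetical placeholders, and the handler assumes the standard scheduled-event payload, which carries an ISO 8601 UTC timestamp in its `time` field.

```python
from datetime import datetime

# Hypothetical names; real bucket names and keys are not given in the write-up.
SOURCE_BUCKET = "analytics-staging"
SOURCE_KEY = "latest/dataset.parquet"
DEST_BUCKET = "client-bucket"
DEST_PREFIX = "deliveries"


def build_write_policy(bucket, prefix):
    """Return an IAM policy document allowing writes only under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowClientPrefixWrites",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


def delivery_key(event, prefix):
    """Build a dated object key from the scheduled event's timestamp."""
    run_date = datetime.strptime(event["time"], "%Y-%m-%dT%H:%M:%SZ").date()
    return f"{prefix}/{run_date.isoformat()}/dataset.parquet"


def handler(event, context):
    """Lambda entry point: copy the prepared dataset into the client bucket."""
    import boto3  # imported lazily; provided by the Lambda Python runtime

    key = delivery_key(event, DEST_PREFIX)
    boto3.client("s3").copy_object(
        Bucket=DEST_BUCKET,
        Key=key,
        CopySource={"Bucket": SOURCE_BUCKET, "Key": SOURCE_KEY},
    )
    return {"delivered": key}
```

Scoping the policy to a single prefix keeps each delivery role from writing anywhere else in the client bucket, and deriving the key from the event timestamp means a rerun for the same scheduled time targets the same object.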
Results & Value Delivered
- Enabled daily, automated, and secure data transfer for multiple pharma clients.
- Simplified onboarding for new engagements: new schedules or data splits required no new code.
- Reduced manual labor and improved compliance and auditability for both teams.
Tech Stack: Apache Airflow (MWAA), Python, YAML, AWS Lambda, AWS S3, EventBridge, IAM Policies