Inabia Software & Consulting — Data Engineering for Healthcare Compliance
Problem
Pharmaceutical clients needed to ensure data integrity for regulatory compliance audits while processing massive volumes of sensitive health data efficiently.
Solution
Built ETL pipelines to ingest and normalize 1 TB of sensitive health data, and developed NLP models to flag problematic advertising content.
Key Impact
- Improved regulator trust by developing NLP models to flag problematic advertising
- Enhanced classification accuracy by 15-20% across 500K inquiries
- Reduced miscommunication liability for healthcare organizations
- Boosted processing efficiency by 20%
The Work
During my internship at Inabia Software & Consulting, I worked on critical data infrastructure for pharmaceutical clients navigating complex healthcare compliance requirements. My primary focus was building robust ETL (Extract, Transform, Load) pipelines to handle sensitive health data at scale.
What I Built
ETL Pipelines for Health Data
Built production-grade data pipelines to ingest and normalize 1 TB of sensitive health data from multiple sources, ensuring data integrity for regulatory compliance audits.
Technical Implementation:
- Designed data ingestion workflows for multiple pharmaceutical data sources
- Implemented data normalization and validation logic
- Built error handling and data quality checks
- Ensured HIPAA compliance throughout data processing
Impact: Enabled pharmaceutical clients to process health data 20% faster while maintaining strict compliance standards for regulatory audits.
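The client pipeline code is not reproduced here. As a rough illustration of the normalization, validation, and error-handling steps listed above, here is a minimal standard-library sketch. The field names (`record_id`, `source`, `received_on`, `text`), the accepted date formats, and the helper names are all hypothetical, chosen only for this example, and the sketch makes no claim to cover HIPAA requirements on its own.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical schema and date formats; not taken from the actual project.
REQUIRED_FIELDS = ("record_id", "source", "received_on", "text")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


@dataclass
class BatchResult:
    clean: list = field(default_factory=list)
    # (raw_record, reason) pairs, kept so rejected rows can be audited later.
    rejected: list = field(default_factory=list)


def normalize_date(value: str) -> str:
    """Return an ISO 8601 date string, trying each known source format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def normalize_record(raw: dict) -> dict:
    """Check required fields, trim whitespace, and standardize values."""
    missing = [
        name for name in REQUIRED_FIELDS
        if raw.get(name) is None or not str(raw[name]).strip()
    ]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return {
        "record_id": str(raw["record_id"]).strip(),
        "source": str(raw["source"]).strip().lower(),
        "received_on": normalize_date(str(raw["received_on"])),
        "text": " ".join(str(raw["text"]).split()),  # collapse whitespace
    }


def run_batch(records: list) -> BatchResult:
    """Normalize a batch, routing invalid or duplicate rows to a reject list."""
    result = BatchResult()
    seen_ids = set()
    for raw in records:
        try:
            row = normalize_record(raw)
        except ValueError as exc:
            result.rejected.append((raw, str(exc)))
            continue
        if row["record_id"] in seen_ids:
            result.rejected.append((raw, "duplicate record_id"))
            continue
        seen_ids.add(row["record_id"])
        result.clean.append(row)
    return result
```

Rejected rows are stored with a reason instead of being dropped silently, which is the kind of record an audit would typically want to see.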
NLP Models for Advertising Compliance
Developed natural language processing models to automatically flag problematic advertising content in pharmaceutical communications.
The Challenge: Pharmaceutical companies face strict regulations on advertising claims. Manual review of all communications was time-consuming and prone to inconsistencies.
The Solution: Built NLP classifiers to detect regulatory red flags in advertising copy, improving both speed and consistency of compliance reviews.
Technical Details:
- Trained classification models on 500K+ pharmaceutical inquiries and advertisements
- Implemented text preprocessing and feature extraction pipelines
- Built validation framework to measure model accuracy
- Deployed models for real-time content flagging
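This write-up does not name the model family used, so the sketch below stands in a small multinomial Naive Bayes classifier written with the standard library only. It illustrates the preprocessing, feature extraction, and classification steps listed above. The class name `NaiveBayesFlagger`, the `flag`/`ok` labels, and the training sentences are invented for this example and are not client data.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list:
    """Lowercase, split into word tokens, and append adjacent-word bigrams."""
    words = TOKEN_RE.findall(text.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayesFlagger:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocab = {tok for c in self.token_counts.values() for tok in c}
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)  # class prior
        counts = self.token_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # skip tokens never seen in training
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda lbl: self._log_score(tokens, lbl))


# Invented toy examples, for illustration only.
train_texts = [
    "guaranteed cure with no side effects",
    "this drug cures all patients guaranteed",
    "100% safe and works for everyone",
    "talk to your doctor about possible side effects",
    "see full prescribing information for risks",
    "results may vary ask your doctor",
]
train_labels = ["flag", "flag", "flag", "ok", "ok", "ok"]

model = NaiveBayesFlagger().fit(train_texts, train_labels)
```

With this toy data, `model.predict("guaranteed cure for everyone")` returns `"flag"` and `model.predict("ask your doctor about risks")` returns `"ok"`. A production classifier would need far more data and a held-out evaluation set.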
Results:
- Enhanced classification accuracy by 15-20% across 500K inquiries
- Improved regulator trust through consistent compliance screening
- Reduced miscommunication liability for healthcare organizations
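For the validation framework mentioned under Technical Details, a measurement step can be sketched as follows. This is an assumed, minimal version: the function name `evaluate` and the `flag` label are hypothetical, and the exact metrics tracked on the project are not stated here. Precision and recall are reported alongside accuracy because accuracy alone can look high while flagged content is still being missed, especially when such content is rare.

```python
def evaluate(y_true, y_pred, positive="flag"):
    """Compute accuracy plus precision, recall, and F1 for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot evaluate an empty label set")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Running the same function on a baseline and on a candidate model, against the same held-out labels, gives a like-for-like comparison of the two.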
Impact
What We Achieved:
- ✅ 20% efficiency gain: Boosted data processing speed for compliance audits
- ✅ 1 TB data processed: Built scalable ETL infrastructure for sensitive health data
- ✅ 15-20% accuracy improvement: Enhanced NLP classification for compliance screening
- ✅ Risk reduction: Minimized miscommunication liability through automated flagging
Client Value: Pharmaceutical companies gained faster, more reliable compliance infrastructure, reducing both processing time and regulatory risk.
Stack: Python, SQL, Pandas, ETL Pipelines, NLP, Healthcare Compliance
Timeline: May - August 2023
Organization: Inabia Software & Consulting (Data consulting firm specializing in healthcare and pharmaceutical industries)
Key Achievement: Built production ETL pipelines processing 1 TB of health data and NLP models improving compliance accuracy by 15-20%