Challenge
Building AI features for protected-health-data workloads demands airtight compliance and careful cost control; most teams manage one or the other but rarely both.
Strategy
- Treat compliance as a first-class engineering concern: policy-as-code, automated audits, immutable logs.
- Embed cost-aware routing: select models and hardware dynamically by security tier, latency target, and budget.
- Maintain an end-to-end traceability map linking prompts, model versions, and PHI transformations.
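As a rough illustration of the traceability map, one entry might look like the sketch below. The record layout, the field names, and the helpers `make_trace_record` and `to_log_line` are assumptions made for this example, not the schema actually used. The prompt is stored as a SHA-256 digest so that the raw text is not copied into the trace log.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TraceRecord:
    """One entry in a hypothetical traceability map (illustrative schema)."""
    prompt_sha256: str          # digest of the prompt, not the prompt text
    model_version: str          # identifier of the model that served the call
    phi_transformations: tuple  # ordered PHI-handling steps applied beforehand
    created_at: str             # UTC timestamp in ISO 8601 form


def make_trace_record(prompt: str, model_version: str,
                      phi_transformations: list[str]) -> TraceRecord:
    """Link a prompt digest, a model version, and the PHI steps in one record."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return TraceRecord(
        prompt_sha256=digest,
        model_version=model_version,
        phi_transformations=tuple(phi_transformations),
        created_at=datetime.now(timezone.utc).isoformat(),
    )


def to_log_line(record: TraceRecord) -> str:
    """Serialize a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)
```

A record built with `make_trace_record("...", "distilled-v1", ["redact_names", "mask_dates"])` serializes to a single JSON line carrying the digest, the model version, and the ordered transformation steps. The model name and step names here are placeholders.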
Execution
- Integrated AWS KMS and VPC endpoints so that PHI never left encrypted boundaries during inference.
- Implemented a model-selection broker that chooses GPT-4o, Claude, or an on-premises distilled model based on sensitivity and token budget.
- Wired OpenTelemetry spans to capture prompt, response, latency, and dollar spend, feeding live dashboards and alert thresholds.
- Automated quarterly HIPAA and SOC 2 evidence packs, generated directly from pipeline metadata.
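The model-selection broker could be sketched roughly as follows. Everything specific here is an assumption for illustration: the `Sensitivity` tiers, the `CATALOG` entries, the per-token prices, the quality ranks, and the rule that only the on-premises model is cleared for PHI are placeholders, not the policy or pricing used in the project. The sketch filters models by sensitivity clearance and estimated cost, then prefers the higher quality rank and breaks ties on price.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Sensitivity(IntEnum):
    """Hypothetical data-sensitivity tiers, lowest to highest."""
    PUBLIC = 0
    INTERNAL = 1
    PHI = 2


@dataclass(frozen=True)
class ModelOption:
    name: str
    max_sensitivity: Sensitivity  # highest tier this model is cleared for
    usd_per_1k_tokens: float      # placeholder price, not a real quote
    quality_rank: int             # placeholder ranking, higher is preferred


# Illustrative catalog; clearances, prices, and ranks are made up.
CATALOG = [
    ModelOption("on-prem-distilled", Sensitivity.PHI, 0.002, 1),
    ModelOption("claude", Sensitivity.INTERNAL, 0.010, 2),
    ModelOption("gpt-4o", Sensitivity.INTERNAL, 0.012, 2),
]


def estimated_cost(model: ModelOption, est_tokens: int) -> float:
    """Estimated dollar cost of a request of est_tokens tokens."""
    return est_tokens / 1000 * model.usd_per_1k_tokens


def choose_model(sensitivity: Sensitivity, est_tokens: int,
                 budget_usd: float) -> Optional[ModelOption]:
    """Return the preferred eligible model, or None if nothing fits."""
    eligible = [
        m for m in CATALOG
        if m.max_sensitivity >= sensitivity
        and estimated_cost(m, est_tokens) <= budget_usd
    ]
    if not eligible:
        return None
    # Prefer higher quality rank, then lower price.
    return min(eligible, key=lambda m: (-m.quality_rank, m.usd_per_1k_tokens))
```

With these placeholder values, a PHI request can only route to `on-prem-distilled`, and a request whose budget excludes every model returns `None` so the caller can reject or queue it rather than silently overspend.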
Outcomes
- Passed HIPAA compliance renewal with zero remediation tasks.
- Held average inference cost at or below $0.09 per user session, a 45% drop versus baseline.
- Reduced security-review cycle time from three weeks to four days thanks to traceable, policy-as-code artifacts.
Key Capabilities Demonstrated
- Regulated-AI architecture (HIPAA, SOC 2)
- Inference-cost governance & dynamic model routing
- Audit-ready observability across the entire AI lifecycle