MedSpeech
Overview
A medical voice-to-text service with AI-enhanced processing. Healthcare professionals upload audio recordings and receive structured, enriched transcripts powered by multiple AI providers.
Problem
Medical professionals needed a way to convert spoken clinical notes into structured digital records with additional AI-powered analysis, going beyond what commodity transcription services provide for medical contexts.
Solution
An asynchronous upload pipeline: audio is queued via Celery, transcribed through the OpenAI Whisper, Gemini, and Groq APIs with provider fallback logic, and returned as enriched structured output. Separate microservices handle the landing, dashboard, and processing layers.
Architecture
An upload-first asynchronous pipeline: audio → object storage → Celery task triggered → AI provider chain (OpenAI first, falling back to Gemini, then Groq) → post-processing → results stored in PostgreSQL → served via Django REST Framework (DRF). A status polling endpoint lets the frontend track job progress. Each layer is deployed independently via Docker.
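The job lifecycle behind the status polling endpoint can be illustrated with a small state machine. This is a minimal sketch, not the service's actual code: the `JobStatus` values, the `TranscriptionJob` class, and the payload shape are all hypothetical names chosen for illustration, and persistence (PostgreSQL) and the Celery task itself are left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
import uuid


class JobStatus(str, Enum):
    """Hypothetical pipeline stages, mirroring the flow described above."""
    QUEUED = "queued"
    TRANSCRIBING = "transcribing"
    POST_PROCESSING = "post_processing"
    DONE = "done"
    FAILED = "failed"


# Allowed transitions; any stage may fail, finished jobs never move again.
ALLOWED = {
    JobStatus.QUEUED: {JobStatus.TRANSCRIBING, JobStatus.FAILED},
    JobStatus.TRANSCRIBING: {JobStatus.POST_PROCESSING, JobStatus.FAILED},
    JobStatus.POST_PROCESSING: {JobStatus.DONE, JobStatus.FAILED},
    JobStatus.DONE: set(),
    JobStatus.FAILED: set(),
}


@dataclass
class TranscriptionJob:
    audio_key: str  # object-storage key of the uploaded audio (assumed naming)
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    status: JobStatus = JobStatus.QUEUED
    transcript: Optional[str] = None
    error: Optional[str] = None

    def advance(self, new_status: JobStatus) -> None:
        """Move to the next stage, rejecting transitions the pipeline never makes."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"illegal transition {self.status.value} -> {new_status.value}"
            )
        self.status = new_status

    def to_status_payload(self) -> dict:
        """One possible shape for what a status polling endpoint returns."""
        return {
            "id": self.id,
            "status": self.status.value,
            "ready": self.status is JobStatus.DONE,
        }
```

In a design like this, the worker calls `advance()` as it moves through each stage, and the polling endpoint only ever serializes `to_status_payload()`, so the frontend never sees a job jump backwards.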
Challenges
Orchestrating multi-provider AI calls with graceful fallback when a provider is unavailable or rate-limited.
Designing job status tracking so the frontend could reliably poll or eventually receive webhook callbacks.
Maintaining acceptable transcription latency for clinical use cases without blocking on synchronous AI responses.
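The fallback challenge can be sketched as an ordered list of provider callables that is walked until one succeeds. This is an illustrative sketch under assumptions: `ProviderError`, `AllProvidersFailed`, and the stub functions are hypothetical, and the stubs stand in for real API clients so no vendor SDK calls are shown.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a provider wrapper when it is unavailable or rate-limited."""


class AllProvidersFailed(Exception):
    """Raised when every provider in the chain has failed."""


# A provider is a (name, transcribe) pair; transcribe takes raw audio bytes.
Provider = Tuple[str, Callable[[bytes], str]]


def transcribe_with_fallback(
    audio: bytes, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, transcript)."""
    errors: List[str] = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (no real API calls).
def openai_stub(audio: bytes) -> str:
    raise ProviderError("rate limited")


def gemini_stub(audio: bytes) -> str:
    return "patient reports nausea"


def groq_stub(audio: bytes) -> str:
    return "unused in this example"
```

Only `ProviderError` triggers fallback here; unexpected exceptions propagate, which keeps genuine bugs from being silently masked by the next provider in the chain.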
Outcomes
Production service processing real medical audio for Medical Toxicology clients.
Multi-provider AI chain provides redundancy and the ability to switch providers without code changes.
Reduced manual clinical transcription time by automating the full audio-to-structured-text pipeline.
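One way to get provider switching without code changes is to read the chain order from configuration. The sketch below assumes a hypothetical `TRANSCRIPTION_PROVIDERS` environment variable and a fixed set of known provider names; the real service may use a different mechanism.

```python
import os
from typing import List, Mapping

# Provider names the application knows how to construct (assumed set).
KNOWN_PROVIDERS = ("openai", "gemini", "groq")
DEFAULT_ORDER = "openai,gemini,groq"


def provider_order(env: Mapping[str, str] = os.environ) -> List[str]:
    """Parse a comma-separated provider list into a validated, ordered chain."""
    raw = env.get("TRANSCRIPTION_PROVIDERS", DEFAULT_ORDER)
    order: List[str] = []
    for name in raw.split(","):
        name = name.strip().lower()
        if not name:
            continue  # tolerate stray commas and whitespace
        if name not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {name!r}")
        if name not in order:
            order.append(name)  # keep the first occurrence, drop repeats
    if not order:
        raise ValueError("no providers configured")
    return order
```

With this approach, setting `TRANSCRIPTION_PROVIDERS=groq,openai` in the deployment environment would reorder the chain and drop Gemini on the next restart, with no change to the code.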
Lessons Learned
AI API latency is unpredictable, so an async-first design with polling or webhook patterns is the only production-safe approach.
Provider abstraction layers pay dividends early when you need to swap between OpenAI, Gemini, and Groq.
Medical data requires deliberate storage, encryption, and access-control design from the very first commit.
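On the client side, the async-first lesson implies a polling loop that tolerates unpredictable latency. A minimal sketch with exponential backoff follows; the `fetch_status` callable, the `"done"` and `"failed"` status strings, and the default delays are assumptions for illustration rather than the service's actual contract.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    *,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a job status source until it reports a terminal state.

    The delay doubles after each non-terminal response, capped at max_delay,
    so slow jobs do not flood the status endpoint with requests.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        payload = fetch_status()
        if payload["status"] in ("done", "failed"):
            return payload
        if clock() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

In practice `fetch_status` would wrap an HTTP GET against the status endpoint; `sleep` and `clock` are injectable so the loop can be tested without waiting.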