Safa Global · Voice AI
A 2026 vendor decision for the holding company, spanning Aquiii, Safa Health, and support operations.
Executive audio briefing. Two hosts, about 22 minutes.
The decision
Standardize on ElevenLabs as the holding-wide default. It is the only 2026 vendor that bundles top-tier TTS, accurate STT (Scribe v2), best-in-class voice cloning, and native agent, telephony, and WhatsApp deployment in one ecosystem. Breadth, not raw quality, is why it wins as the default.
Deviate in two cases: an Azure + Deepgram pipeline for Safa Health (HIPAA), and Cartesia + Deepgram + Retell for high-volume telephony. Self-host (Kokoro 82M) only when a workload's text volume stays above 10M characters per month on a sustained basis.
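The decision rule above can be sketched as a small routing function. This is a minimal illustration, not a definitive policy: the `Workload` fields, the `volume_is_sustained` flag, and the order in which the rules are checked (PHI first, then self-hosting, then telephony) are assumptions made for the example.

```python
from dataclasses import dataclass

# Self-hosting threshold from the decision: 10M characters per month.
SELF_HOST_THRESHOLD_CHARS = 10_000_000


@dataclass
class Workload:
    """Hypothetical description of a voice workload (fields are illustrative)."""
    name: str
    handles_phi: bool = False
    high_volume_telephony: bool = False
    monthly_chars: int = 0
    volume_is_sustained: bool = False  # assumption: someone has judged the volume durable


def pick_stack(w: Workload) -> str:
    """Return the stack for a workload. Rule priority is an assumption."""
    if w.handles_phi:
        return "Azure + Deepgram (HIPAA pipeline)"
    if w.volume_is_sustained and w.monthly_chars > SELF_HOST_THRESHOLD_CHARS:
        return "Self-hosted Kokoro 82M"
    if w.high_volume_telephony:
        return "Cartesia + Deepgram + Retell"
    return "ElevenLabs (default)"


if __name__ == "__main__":
    for w in (
        Workload("consumer WhatsApp ordering"),
        Workload("chronic-disease care", handles_phi=True),
        Workload("support phone line", high_volume_telephony=True),
    ):
        print(f"{w.name}: {pick_stack(w)}")
```

A short spike above 10M characters does not trigger self-hosting here, because the flag has to be set deliberately. That mirrors the "sustained" condition in the decision.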
ElevenLabs is no longer the outright quality leader, but it is still the most complete platform.
Reading: for a brand-voice, multi-product holding company, ecosystem breadth beats a few Elo points. Standardizing cuts integration cost across ventures.
Consumer, Spanish-first Mexico, expressive character, WhatsApp ordering.
| Layer | Pick | Why |
|---|---|---|
| TTS | ElevenLabs Eleven v3 | 70+ languages incl. es-MX, deepest emotion via audio tags, cloning to lock the exact mascot voice |
| STT | ElevenLabs Scribe v2 | ~4% WER, 90+ languages, conversational |
| Orchestration | ElevenLabs Agents | Native WhatsApp inbound/outbound (messages + calls) |
| Biggest risk | Latency | v3 trades speed for quality (1-2 s). Use Flash v2.5 for real-time turns; reserve v3 for scripted or expressive content. |
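One way to apply the v3 / Flash v2.5 split is to choose the model per utterance from its latency budget. The sketch below is an assumption-laden illustration: the function name is hypothetical, the returned strings are display labels rather than vendor API model IDs, and the rule (use v3 only when the budget covers its worst-case 2 s) is one reasonable reading of the 1-2 s figure in the table.

```python
from typing import Optional

# v3 latency range from the table above, in seconds.
V3_LATENCY_RANGE_S = (1.0, 2.0)


def choose_tts_model(latency_budget_s: Optional[float]) -> str:
    """Pick a TTS model label for one utterance.

    latency_budget_s is None for scripted or pre-rendered content,
    where nobody is waiting on the audio in real time.
    """
    if latency_budget_s is None:
        return "Eleven v3"
    worst_case_v3 = V3_LATENCY_RANGE_S[1]
    # Use v3 only when the turn can absorb its worst-case latency.
    return "Eleven v3" if latency_budget_s >= worst_case_v3 else "Flash v2.5"


if __name__ == "__main__":
    print(choose_tts_model(None))  # scripted mascot line
    print(choose_tts_model(0.5))   # live conversational turn
```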
Chronic-disease care, ES/EN/AR, handles patient health info (PHI).
| Layer | Pick | Why |
|---|---|---|
| TTS | Azure Neural | 140+ languages incl. Arabic, enterprise HIPAA. Deepgram TTS is English-only; ElevenLabs cloning is not confirmed BAA-covered |
| STT | Deepgram Nova-3 Multilingual | High accuracy on noisy audio, auto language detect, BAA available for PHI |
| Orchestration | Chained pipeline (Pipecat / LiveKit) to a HIPAA-eligible LLM (Azure OpenAI) | OpenAI Realtime audio is NOT HIPAA-eligible as of May 2026 |
| Biggest risk | Loss of real-time speed | Compliance forces a chained pipeline, so expect 1.2-1.8 s end-to-end and less natural prosody. |
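A chained pipeline adds up the latency of each stage, which is why the end-to-end figure lands above one second. The sketch below checks a measured breakdown against the 1.2-1.8 s range from the table. The per-stage numbers are placeholders for illustration, not measurements, and a plain sum ignores any overlap gained from streaming between stages.

```python
from typing import Dict

# Expected end-to-end range from the table above, in milliseconds.
EXPECTED_RANGE_MS = (1200, 1800)


def end_to_end_ms(stage_ms: Dict[str, int]) -> int:
    """Sum per-stage latencies for a strictly sequential STT -> LLM -> TTS chain."""
    return sum(stage_ms.values())


def classify(total_ms: int) -> str:
    """Compare a total against the expected range."""
    low, high = EXPECTED_RANGE_MS
    if total_ms < low:
        return "faster than expected"
    if total_ms > high:
        return "slower than expected"
    return "within expected range"


if __name__ == "__main__":
    # Placeholder values; replace with measurements from your own pipeline.
    measured = {"stt": 300, "llm": 700, "tts": 400}
    total = end_to_end_ms(measured)
    print(total, classify(total))
```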
Inbound/outbound phone, real-time, cost-sensitive at scale.
| Layer | Pick | Why |
|---|---|---|
| TTS | Cartesia Sonic 3 | Latency leader (40ms TTFA), $50 / 1M char |
| STT | Deepgram Nova-3 | Sub-300ms, $0.0048 / min |
| Orchestration | Retell AI | Built for telephony: SIP, inbound routing, warm human handoff |
| Biggest risk | PSTN audio degradation | 8 kHz G.711 phone audio hurts both STT accuracy and TTS naturalness vs web audio. |
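The two unit prices in the table make the speech cost of this stack easy to estimate. The sketch below uses only those prices. The monthly volumes in the example are hypothetical, and orchestration fees are left out because the table gives no price for them.

```python
# Unit prices from the table above.
TTS_USD_PER_MILLION_CHARS = 50.0  # Cartesia Sonic 3
STT_USD_PER_MINUTE = 0.0048       # Deepgram Nova-3


def monthly_speech_cost(call_minutes: float, tts_chars: int) -> dict:
    """Estimate monthly TTS + STT spend in USD (orchestration not included)."""
    tts = tts_chars / 1_000_000 * TTS_USD_PER_MILLION_CHARS
    stt = call_minutes * STT_USD_PER_MINUTE
    return {
        "tts_usd": round(tts, 2),
        "stt_usd": round(stt, 2),
        "total_usd": round(tts + stt, 2),
    }


if __name__ == "__main__":
    # Hypothetical volume: 100,000 call minutes and 20M synthesized characters.
    print(monthly_speech_cost(call_minutes=100_000, tts_chars=20_000_000))
    # {'tts_usd': 1000.0, 'stt_usd': 480.0, 'total_usd': 1480.0}
```

At these illustrative volumes TTS accounts for roughly two thirds of the speech spend, so the character count is the figure to watch as volume grows.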
The load-bearing constraint for Safa Health.
ElevenLabs is HIPAA-eligible only with ALL of: Enterprise tier + signed BAA + Zero Retention Mode ON + LLM restricted to an approved allowlist (Gemini/Claude) or your own API keys.
Zero Retention Mode disables inbound WhatsApp and request-stitching/history, so a single agent cannot run in HIPAA mode and also receive inbound WhatsApp.
Voice cloning is not confirmed BAA-covered and is listed as not zero-retention-eligible. Treat a cloned voice + PHI as non-compliant until legal confirms.
OpenAI Realtime: audio modality is NOT HIPAA-eligible. Any OpenAI-based health voice must chain HIPAA-safe STT + TTS around a text LLM.
Implication: keep RiiiRiii (cloning + WhatsApp) and Safa Health (HIPAA) on separate stacks. Do not force one vendor config to serve both.
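The conditions above can be turned into a pre-deployment check that lists every reason an agent should not touch PHI. This is a minimal sketch under stated assumptions: `AgentConfig` is a hypothetical shape for illustration, not a vendor schema, and the check encodes only the rules stated in this section. It is not a substitute for legal review.

```python
from dataclasses import dataclass
from typing import List

# LLMs named in the allowlist above; lowercase keys are an illustrative choice.
APPROVED_LLMS = {"gemini", "claude"}


@dataclass
class AgentConfig:
    """Hypothetical agent settings (not a vendor schema)."""
    enterprise_tier: bool
    baa_signed: bool
    zero_retention: bool
    llm: str
    own_api_keys: bool = False
    inbound_whatsapp: bool = False
    cloned_voice: bool = False


def phi_blockers(cfg: AgentConfig) -> List[str]:
    """Return reasons this agent must not handle PHI; an empty list means none found."""
    blockers = []
    if not cfg.enterprise_tier:
        blockers.append("Enterprise tier required")
    if not cfg.baa_signed:
        blockers.append("signed BAA required")
    if not cfg.zero_retention:
        blockers.append("Zero Retention Mode must be ON")
    if cfg.llm.lower() not in APPROVED_LLMS and not cfg.own_api_keys:
        blockers.append("LLM must be on the allowlist or use your own API keys")
    if cfg.inbound_whatsapp:
        blockers.append("inbound WhatsApp is disabled under Zero Retention Mode")
    if cfg.cloned_voice:
        blockers.append("cloned voice + PHI is non-compliant until legal confirms")
    return blockers


if __name__ == "__main__":
    mixed = AgentConfig(True, True, True, "claude",
                        inbound_whatsapp=True, cloned_voice=True)
    for reason in phi_blockers(mixed):
        print("BLOCKED:", reason)
```

An agent configured for cloning and inbound WhatsApp fails two checks even with every contractual condition met, which is the practical reason for keeping the two stacks separate.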