Sub-300ms voice.
End-to-end.
STT → LLM → TTS, all co-located on a single edge node. No cross-region hops. No stitched-together APIs. Just fast.
How It Works
One node. Three stages. 300ms total.
Speech-to-Text
Whisper-class STT
User speech transcribed on the same node. No network hop.
LLM Inference
Open models at the edge
Token generation starts immediately after STT — co-located.
Text-to-Speech
Low-latency synthesis
TTS begins streaming before LLM completes. First audio byte fast.
Pricing
Simple per-minute pricing.
Full Pipeline
$0.07/ min
STT + LLM + TTS end-to-end. Sub-300ms.
STT only
$0.004/ min
Whisper-class transcription at the edge.
TTS only
$0.015/ 1K chars
Low-latency synthesis, multiple voices.
Volume discounts from 5–15% starting at $5K/month. Full pricing →
Use Cases
Built for real-time voice products.
Voice AI Agents
Interview copilots, customer support bots, sales assistants. Sub-300ms is the threshold where voice AI stops feeling like a machine.
Real-Time Transcription
Live captions, meeting notes, accessibility tools. STT at the edge means transcription that keeps up with conversation.
Voice Interfaces
Voice-enabled apps, smart devices, IVR replacements. Consistent low latency regardless of user location.
Conversational AI
Multi-turn voice conversations with context. Fast enough to match natural speech cadence.
Ship sub-300ms voice today.
$500 in free credits. OpenAI-compatible. No infrastructure to manage.