Voice Pipeline

Sub-300ms voice.
End-to-end.

STT → LLM → TTS, all co-located on a single edge node. No cross-region hops. No stitched-together APIs. Just fast.

How It Works

One node. Three stages. 300ms total.

1~50ms

Speech-to-Text

Whisper-class STT

User speech transcribed on the same node. No network hop.

2~180ms

LLM Inference

Open models at the edge

Token generation starts immediately after STT — co-located.

3~70ms

Text-to-Speech

Low-latency synthesis

TTS begins streaming before LLM completes. First audio byte fast.

Total end-to-end latency< 300ms

Pricing

Simple per-minute pricing.

Full Pipeline

$0.07/ min

STT + LLM + TTS end-to-end. Sub-300ms.

STT only

$0.004/ min

Whisper-class transcription at the edge.

TTS only

$0.015/ 1K chars

Low-latency synthesis, multiple voices.

Volume discounts from 5–15% starting at $5K/month. Full pricing →

Use Cases

Built for real-time voice products.

Voice AI Agents

Interview copilots, customer support bots, sales assistants. Sub-300ms is the threshold where voice AI stops feeling like a machine.

Real-Time Transcription

Live captions, meeting notes, accessibility tools. STT at the edge means transcription that keeps up with conversation.

Voice Interfaces

Voice-enabled apps, smart devices, IVR replacements. Consistent low latency regardless of user location.

Conversational AI

Multi-turn voice conversations with context. Fast enough to match natural speech cadence.

Ship sub-300ms voice today.

$500 in free credits. OpenAI-compatible. No infrastructure to manage.