Inference API

Top open models.
Edge-fast. OpenAI-compatible.

LLMs, speech-to-text, and text-to-speech — running on PolarGrid's edge GPU network. Drop-in replacement for OpenAI. Dramatically faster responses.

Model catalog

Everything you need on one API.

Large Language Models

Qwen 3.5 27B

Most capable

27B · FP8 · Apache 2.0

$0.20 / $0.75 per 1M tokens

Qwen 3.5 9B

Balanced

9B · FP8 · Apache 2.0

$0.055 / $0.085 per 1M tokens

Llama 3.1 8B

Fastest

Meta · 8B · Llama 3.1 license

$0.05 / $0.08 per 1M tokens

Speech-to-Text

Whisper Large V3 Turbo

Speed-optimized

OpenAI · 809M · Apache 2.0

$0.004 / min

Cohere Transcribe

Multilingual

3B · 14 languages · Apache 2.0

$0.004 / min

Text-to-Speech

Hume AI TADA

Voice cloning

3B · 10 languages · voice cloning

$0.008 / min

Kokoro 82M

Ultra-fast

82M · Apache 2.0

$0.008 / min

Need custom or fine-tuned models? See custom model hosting →
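
To turn the catalog prices into a budget, a rough sketch of the arithmetic may help. It assumes that the two figures on each LLM card are the input price and the output price per 1M tokens, in that order; the page does not say so explicitly. The function names `estimateLlmCost` and `estimateAudioCost` are illustrative only and are not part of any PolarGrid SDK.

```typescript
// Prices in USD per 1M tokens, copied from the catalog.
// ASSUMPTION: first figure = input price, second figure = output price.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const LLM_PRICES: Record<string, TokenPrice> = {
  "Qwen 3.5 27B": { inputPerMillion: 0.2, outputPerMillion: 0.75 },
  "Qwen 3.5 9B": { inputPerMillion: 0.055, outputPerMillion: 0.085 },
  "Llama 3.1 8B": { inputPerMillion: 0.05, outputPerMillion: 0.08 },
};

// Estimated USD cost of one LLM workload (hypothetical helper).
function estimateLlmCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = LLM_PRICES[model];
  if (price === undefined) throw new Error(`Unknown model: ${model}`);
  const microDollars =
    inputTokens * price.inputPerMillion + outputTokens * price.outputPerMillion;
  return microDollars / 1_000_000;
}

// Estimated USD cost of speech workloads billed per minute (hypothetical helper).
function estimateAudioCost(pricePerMinute: number, seconds: number): number {
  return (seconds / 60) * pricePerMinute;
}

// 2M input tokens and 1M output tokens on Qwen 3.5 27B.
console.log(estimateLlmCost("Qwen 3.5 27B", 2_000_000, 1_000_000).toFixed(2));

// 30 minutes of transcription at $0.004 / min.
console.log(estimateAudioCost(0.004, 30 * 60).toFixed(2));
```

Under the same assumption, the first call works out to 2 × $0.20 + 1 × $0.75 = $1.15, and the second to 30 × $0.004 = $0.12.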

Integration

One line to switch.

PolarGrid is a drop-in replacement for the OpenAI API. Change your base URL and you're done — every SDK, framework, and tool you already use continues to work.

pip install polargrid-sdk
npm install @polargrid/polargrid-sdk
quickstart.ts
import OpenAI from "openai";

// Before (commented out so that `client` is declared only once)
// const client = new OpenAI({
//   baseURL: "https://api.openai.com/v1",
// });

// After: edge-routed, faster
const client = new OpenAI({
  baseURL: "https://autorouter.edge.polargrid.ai/v1",
  apiKey: process.env.POLARGRID_API_KEY,
});

// Same interface. Faster.
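
For tools that do not use an SDK, the same switch can be sketched with plain `fetch`. This is a minimal illustration, not PolarGrid's documented client: it assumes the endpoint accepts the standard OpenAI `/chat/completions` path and Bearer-token authentication, which follows from the OpenAI-compatible claim but is not spelled out on this page. The helper names `buildChatRequest` and `chat` are hypothetical, and the model ID is a placeholder to be replaced with whatever identifier the API actually exposes.

```typescript
// Base URL taken from the quickstart.
const POLARGRID_BASE_URL = "https://autorouter.edge.polargrid.ai/v1";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the HTTP request without sending it (hypothetical helper).
// ASSUMPTION: OpenAI-style path, headers, and JSON body.
function buildChatRequest(
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): PreparedRequest {
  return {
    url: `${POLARGRID_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Sends a single user prompt and returns the text of the first choice.
async function chat(apiKey: string, model: string, prompt: string): Promise<string> {
  const { url, init } = buildChatRequest(apiKey, model, [
    { role: "user", content: prompt },
  ]);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  // ASSUMPTION: the response follows the OpenAI chat completion shape.
  const data = (await response.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}

// Example call (needs a real key and a valid model ID, so it is left commented out):
// chat(process.env.POLARGRID_API_KEY ?? "", "MODEL_ID_PLACEHOLDER", "Hello").then(console.log);
```

Keeping request construction separate from sending makes the only PolarGrid-specific piece, the base URL, easy to see and easy to swap back.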

Start with $500 free.

No credit card required. Access every model from day one.