Inference API

Faster inference.
Drop-in replacement.

OpenAI-compatible LLM inference at the edge. Sub-300ms TTFA. Up to 13× cheaper than GPT-4o. Change one line of code.

Integration

One line to switch.

migration.py
# Before
client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key=os.environ["OPENAI_API_KEY"]
)

# After: PolarGrid — edge-routed, up to 13× cheaper
client = OpenAI(
    base_url="https://autorouter.polargrid.ai/v1",
    api_key=os.environ["POLARGRID_API_KEY"]
)

# Everything else stays the same.

Models

Top open models. Owned infrastructure.

ModelInput / 1MOutput / 1M
Qwen 3.5 9BMost Popular

Fast, capable. Best for latency-sensitive applications.

$0.055$0.085
Qwen 3.5 27B

Mid-range. Strong reasoning, longer context.

$0.20$0.75
Llama 3.3 70B

Large model. Best for complex tasks and high-quality output.

$0.55$1.80

Full pricing & model list →

Features

Everything you need. Nothing you don't.

OpenAI-compatible API

Change base_url and API key. Every SDK, framework, and integration you already use continues to work.

Edge-routed requests

Every request automatically routes to the fastest node for that user's location. No config required.

Always-warm models

Models are loaded and warm 24/7. No cold starts, no container spin-up, no first-request penalty.

Streaming support

Server-sent events streaming works out of the box. Tokens arrive as fast as the model generates them.

Python & TS SDKs

Native SDKs available. Or use any OpenAI-compatible library directly — it just works.

Usage dashboard

Real-time usage, costs, and request logs in the console at app.polargrid.ai.

Start inferring at the edge.

$500 in free credits. No credit card required.