Your models.
Our edge. Their speed.
Fine-tuned LLMs, proprietary voice models, custom embeddings — run them on PolarGrid's edge GPU network. Sub-30ms RTT, full data control, no infrastructure to manage.
Bring Your Own Models
Take your model library to the next level on PolarGrid's distributed edge network.
You've fine-tuned for your domain, your voice, your data. Running it on a distant hyperscaler defeats the purpose. PolarGrid brings it to the edge — closer to users, faster responses.
Deployment options
How to deploy.
Fine-Tuned Model Hosting
Custom weights
You've fine-tuned a model on your data. We serve it on PolarGrid edge nodes: same low-latency infrastructure, nothing for you to manage. Bring LoRA adapters, full fine-tunes, or GGUF weights.
- ✓ Any HuggingFace-compatible checkpoint
- ✓ LoRA & full fine-tune support
- ✓ Pay-per-token, same rates as shared models
- ✓ OpenAI-compatible API
- ✓ No cold starts
Proprietary Model Deployment
Enterprise
Your team has built or licensed a model you want running at the edge, not on shared infrastructure. Dedicated GPU allocation, reserved VRAM, production SLA, and full isolation.
- ✓ Dedicated GPU allocation
- ✓ Reserved VRAM: no queue, no sharing
- ✓ Node-level SLA
- ✓ Full model isolation
- ✓ Priority routing across nodes
How it works
Upload. Deploy. Infer.
01
Upload your model.
Point us to your HuggingFace repo, or upload weights directly. Any architecture we support on the inference stack — transformers, diffusers, custom.
02
We deploy to the edge.
Your model is loaded onto PolarGrid GPU nodes and served via our auto-router, which sends each request to the node that offers that user the lowest time to first token (TTFT).
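To make the routing idea concrete, here is a minimal sketch of lowest-TTFT selection. It is an illustration only, not PolarGrid's actual router: the `NodeSample` shape, the `healthy` flag, and the `pickLowestTtftNode` helper are all hypothetical names, and the sketch assumes a per-user TTFT estimate already exists for each node.

```typescript
// Hypothetical shape of a per-node measurement; these names are illustrative
// and not part of any PolarGrid API.
interface NodeSample {
  nodeId: string;
  ttftMs: number;   // estimated time to first token for this user, in ms
  healthy: boolean; // whether the node can currently accept the request
}

// Return the healthy node with the lowest estimated TTFT,
// or undefined if no node qualifies.
function pickLowestTtftNode(samples: NodeSample[]): NodeSample | undefined {
  let best: NodeSample | undefined;
  for (const sample of samples) {
    if (!sample.healthy) continue; // skip nodes that cannot serve right now
    if (best === undefined || sample.ttftMs < best.ttftMs) best = sample;
  }
  return best;
}

// Made-up sample values, for illustration only.
const samples: NodeSample[] = [
  { nodeId: "node-a", ttftMs: 140, healthy: true },
  { nodeId: "node-b", ttftMs: 95, healthy: true },
  { nodeId: "node-c", ttftMs: 60, healthy: false },
];

const chosen = pickLowestTtftNode(samples);
console.log(chosen?.nodeId); // "node-b": node-c is faster but unhealthy
```

A production router would likely weigh more than one signal (queue depth, loaded models, network distance); this sketch keeps to the single criterion named above.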
03
Call it like any other model.
Use the standard OpenAI-compatible API with your custom model ID. Same SDKs, same integration — just your model at the edge.
// Your fine-tuned model, same API as everything else
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://autorouter.edge.polargrid.ai/v1",
  apiKey: process.env.POLARGRID_API_KEY,
});

// Call your model by its custom ID
const response = await client.chat.completions.create({
  model: "your-org/your-fine-tuned-model",
  messages: [{ role: "user", content: "..." }],
});
// Edge-routed. Sub-300ms. Your model.
Ready to bring your model to the edge?
Talk to us about your model, your use case, and your scale. We'll find the right deployment option.