10 providers tracked. Probed every 5 minutes.
→ Groq: The fastest LLM inference on the planet (LPU)
→ Google AI Studio (Gemini): Free Gemini API access — 1M-token context, multimodal
→ OpenRouter: One API, 300+ models — including many free ones
→ Together AI: Wide open-source model catalog with serverless + dedicated
→ Cerebras Inference: Wafer-scale chips → fastest open-model inference (often >2,000 tok/s)
→ Mistral La Plateforme: EU-based; Mistral Small / Codestral with experimental free tier
→ Cohere: Trial keys for Command R+, Embed, Rerank — RAG-friendly
→ Hugging Face Inference: 300k+ open-source models, free serverless inference
→ GitHub Models: Free GPT-4o, Llama, Mistral via your GitHub PAT
→ SambaNova Cloud: RDU-accelerated Llama 3 with very high tok/s
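Several of the providers above expose an OpenAI-compatible `/chat/completions` endpoint, so one request builder can probe them all. A minimal sketch — the base URLs and the model name are assumptions taken from public docs, not guarantees; verify each provider's own API reference before relying on them:

```python
import json
import urllib.request

# Illustrative base URLs for OpenAI-compatible endpoints (assumed; check docs).
OPENAI_COMPATIBLE_BASES = {
    "openrouter": "https://openrouter.ai/api/v1",
    "groq": "https://api.groq.com/openai/v1",
    "cerebras": "https://api.cerebras.ai/v1",
}

def build_chat_request(base_url: str, api_key: str,
                       model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-style /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Hypothetical probe against OpenRouter; the key and model are placeholders.
req = build_chat_request(OPENAI_COMPATIBLE_BASES["openrouter"],
                         "sk-placeholder", "meta-llama/llama-3.1-8b-instruct", "ping")
# urllib.request.urlopen(req) would actually send it; omitted to stay offline.
```

Swapping the base URL and model string is usually all that changes between these providers; the ones with non-OpenAI APIs (e.g. Cohere, Google AI Studio's native endpoint) need their own clients.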