Groq — The fastest LLM inference on the planet (LPU).
Free tier: Generous, with very fast inference; no credit card is required to sign up. Rate-limited per model by RPM, RPD, and TPM.
API is OpenAI-compatible — point your SDK at https://api.groq.com/openai/v1.
Models: llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, and gemma2-9b-it are the most commonly used; the full current list is on Groq's docs page.
Pricing: Dev tier (pay-as-you-go) starts at $0.05 / 1M input tokens for llama-3.1-8b-instant; llama-3.3-70b-versatile is $0.59 / 1M input and $0.79 / 1M output.
On-Demand and Batch tiers exist; contact sales for production SLAs. If your traffic is bursty or seasonal, the free tier may be enough; if you need a guaranteed SLA, upgrade to a paid tier.
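To sanity-check spend at those Dev-tier rates, here is a back-of-envelope sketch; the traffic figures are made-up examples, not Groq numbers:

```python
# Rough Dev-tier cost estimate at the rates quoted above for
# llama-3.3-70b-versatile ($0.59 / 1M input, $0.79 / 1M output).
INPUT_USD_PER_M = 0.59
OUTPUT_USD_PER_M = 0.79

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Return USD cost for a month of traffic (token counts are assumptions)."""
    return input_tokens / 1e6 * INPUT_USD_PER_M + output_tokens / 1e6 * OUTPUT_USD_PER_M

# Example: 50M input tokens + 10M output tokens in a month
print(f"${monthly_cost(50e6, 10e6):.2f}")  # -> $37.40
```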
Base URL: https://api.groq.com/openai/v1
OpenAI-compatible: works with the OpenAI SDK by overriding base_url and passing your Groq API key.
```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello in 5 words"}]
  }'
```
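The same request through the OpenAI Python SDK, as a minimal sketch (assumes the openai v1.x package is installed and GROQ_API_KEY is exported; nothing here beyond the base URL and key is Groq-specific):

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello in 5 words"}],
)
print(resp.choices[0].message.content)
```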
Good for: latency-sensitive apps that need sub-second responses, such as chatbots, autocomplete, and agents doing fast tool use (see the streaming sketch below).
Not for: vision, image generation, or embeddings (Groq is text-only LLM inference).
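For those latency-sensitive cases, streaming is usually what makes the speed visible to users. A hedged sketch, reusing the client from the SDK example above and assuming Groq's endpoint honors the standard stream=True flag of OpenAI-compatible chat completions:

```python
# Stream tokens as they arrive so the user sees output within the first
# few hundred milliseconds instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Suggest three autocomplete endings for 'how to'"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```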