
Groq

The fastest LLM inference on the planet (LPU)

TL;DR

Groq runs LLM inference on its custom LPU hardware and advertises the fastest inference available. The free tier is generous and very fast, but rate-limited per model in RPM, RPD, and TPM. The API is OpenAI-compatible: point your SDK at https://api.groq.com/openai/v1.


Free tier limits

  • 30 requests/min
  • 14,400 requests/day
  • 6,000 tokens/min
  • 500,000 tokens/day
No credit card required.
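Putting the four limits together: if you run at the full request rate, the daily token cap, not the per-minute one, becomes the binding constraint. A quick back-of-envelope check (plain arithmetic over the numbers above):

```python
# Free-tier limits as published above.
RPM, RPD = 30, 14_400          # requests per minute / day
TPM, TPD = 6_000, 500_000      # tokens per minute / day

# Sustained token budget per request under each cap:
tokens_per_req_minute = TPM / RPM   # 200 tokens/request (per-minute cap)
tokens_per_req_day = TPD / RPD      # ~34.7 tokens/request (daily cap)

# Whichever budget is smaller binds first at full request rate.
binding = min(tokens_per_req_minute, tokens_per_req_day)
print(f"Per-minute budget: {tokens_per_req_minute:.0f} tokens/request")
print(f"Daily budget:      {tokens_per_req_day:.1f} tokens/request")
```

In other words, maxing out 14,400 requests/day leaves only ~35 tokens per request before you hit the 500,000 tokens/day ceiling; fewer, larger requests use the daily token budget more effectively.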

Models on free tier

  • llama-3.3-70b-versatile
  • llama-3.1-8b-instant
  • mixtral-8x7b-32768
  • gemma2-9b-it

Upgrade path

Dev tier (pay-as-you-go) starts at $0.05/1M input tokens for llama-3.1-8b-instant and $0.59/1M input for llama-3.3-70b-versatile (output: $0.79/1M).
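To put those rates in perspective, here is a sketch of a cost estimate at the prices quoted above. The `est_cost` helper is hypothetical (not part of any Groq SDK), and current prices should be verified on Groq's pricing page:

```python
# Dev-tier prices quoted on this page, in $ per 1M tokens.
PRICE_PER_M = {
    "llama-3.1-8b-instant":    {"input": 0.05},
    "llama-3.3-70b-versatile": {"input": 0.59, "output": 0.79},
}

def est_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimated cost in dollars for a given monthly token volume."""
    p = PRICE_PER_M[model]
    return (input_tokens * p["input"]
            + output_tokens * p.get("output", 0.0)) / 1_000_000

# Example: 100M input + 20M output tokens on the 70B model.
print(f"${est_cost('llama-3.3-70b-versatile', 100_000_000, 20_000_000):.2f}")
```

At that volume the 70B model costs under $75/month, which is a useful yardstick for deciding when to leave the free tier.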

On-Demand and Batch tiers exist; contact sales for production SLAs.

Endpoint

https://api.groq.com/openai/v1

OpenAI-compatible — works with the OpenAI SDK by overriding base_url.

Quick start
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello in 5 words"}]
  }'
When Groq is the right pick

Stay on free tier when

Latency-sensitive apps that need sub-second responses: chatbots, autocomplete, and agents that depend on fast tool use.

Pick something else when

You need vision, image generation, or embeddings: Groq serves text-only LLM inference.

FAQ
Is Groq's API really free?

Yes. The free tier is generous, with very fast inference, though it is rate-limited per model in RPM, RPD, and TPM. No credit card is required to sign up.

What models can I call on Groq's free tier?

Most commonly used: llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, gemma2-9b-it. The full current list is on Groq's docs page.

Is Groq OpenAI-compatible?

Yes — point the OpenAI SDK's base URL at `https://api.groq.com/openai/v1` and pass your Groq API key.

When should I upgrade from Groq's free tier?

Dev tier (pay-as-you-go) starts at $0.05 / 1M input tokens for llama-3.1-8b-instant, $0.59 for llama-3.3-70b-versatile (output: $0.79/1M). If your traffic is bursty or seasonal, the free tier may be enough; if you need a guaranteed SLA, upgrade.

See also
Groq vs Cerebras Inference

Side-by-side comparison.

Groq vs Together AI

Side-by-side comparison.

Groq vs OpenRouter

Side-by-side comparison.

Groq vs GitHub Models

Side-by-side comparison.