Groq — The fastest LLM inference on the planet (LPU).
Free tier: Generous, with very fast inference; no credit card is required to sign up. Rate-limited per model by RPM, RPD, and TPM.
API is OpenAI-compatible — point your SDK at https://api.groq.com/openai/v1.
Models: llama-3.3-70b-versatile, llama-3.1-8b-instant, mixtral-8x7b-32768, and gemma2-9b-it are the most commonly used; the full current list is on Groq's docs page.
Pricing: Dev tier (pay-as-you-go) starts at $0.05 / 1M input tokens for llama-3.1-8b-instant; llama-3.3-70b-versatile is $0.59 / 1M input and $0.79 / 1M output.
On-Demand and Batch tiers exist; contact sales for production SLAs. If your traffic is bursty or seasonal, the free tier may be enough; if you need a guaranteed SLA, upgrade to a paid tier.
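To sanity-check spend at those Dev-tier rates, here is a back-of-envelope sketch; the traffic figures are made-up examples, not Groq numbers:

```python
# Rough Dev-tier cost estimate at the rates quoted above for
# llama-3.3-70b-versatile ($0.59 / 1M input, $0.79 / 1M output).
INPUT_USD_PER_M = 0.59
OUTPUT_USD_PER_M = 0.79

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Return USD cost for a month of traffic (token counts are assumptions)."""
    return input_tokens / 1e6 * INPUT_USD_PER_M + output_tokens / 1e6 * OUTPUT_USD_PER_M

# Example: 50M input tokens + 10M output tokens in a month
print(f"${monthly_cost(50e6, 10e6):.2f}")  # -> $37.40
```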
Base URL: https://api.groq.com/openai/v1
OpenAI-compatible: works with the OpenAI SDK by overriding base_url and passing your Groq API key.
```bash
curl https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello in 5 words"}]
  }'
```
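The same request through the OpenAI Python SDK, as a minimal sketch (assumes the openai v1.x package is installed and GROQ_API_KEY is exported; nothing here beyond the base URL and key is Groq-specific):

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at Groq's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello in 5 words"}],
)
print(resp.choices[0].message.content)
```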
Good for: latency-sensitive apps that need sub-second responses, such as chatbots, autocomplete, and agents doing fast tool use (see the streaming sketch below).
Not for: vision, image generation, or embeddings (Groq is text-only LLM inference).
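For those latency-sensitive cases, streaming is usually what makes the speed visible to users. A hedged sketch, reusing the client from the SDK example above and assuming Groq's endpoint honors the standard stream=True flag of OpenAI-compatible chat completions:

```python
# Stream tokens as they arrive so the user sees output within the first
# few hundred milliseconds instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Suggest three autocomplete endings for 'how to'"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```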