Hugging Face Inference — 300k+ open-source models, free serverless inference.
Free tier: free serverless inference with strict per-account quotas; models can cold-start. Best for prototyping, not production.
API is OpenAI-compatible — point your SDK at https://api-inference.huggingface.co.
Popular models: meta-llama/Llama-3.3-70B-Instruct, mistralai/Mistral-Nemo-Instruct-2407, Qwen/Qwen2.5-72B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Llama-70B. Paid tiers: Pro plan ($9/mo) → 20× rate limit; HF Inference Endpoints (dedicated) start at ~$0.06/hour for CPU, ~$0.60/hour for a small GPU.
Enterprise: HF Enterprise Hub, private model hosting, SOC 2.
Base URL: https://api-inference.huggingface.co
OpenAI-compatible — works with the OpenAI SDK by overriding base_url.
curl https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello in 5 words"}]
  }'
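The same request from Python, using the OpenAI SDK with base_url overridden. A minimal sketch: it assumes the per-model .../v1 path from the curl example above works as the SDK base URL and that HF_TOKEN is set in the environment.

import os
from openai import OpenAI

# Point the OpenAI SDK at the model-specific endpoint; the token is a
# Hugging Face access token, not an OpenAI key.
client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1",
    api_key=os.environ["HF_TOKEN"],
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello in 5 words"}],
)
print(response.choices[0].message.content)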
Good for: trying any of the 300k+ HF models without setup, and niche or fine-tuned models that other providers do not host.
Not good for: production traffic on the free tier, where cold starts and quotas hurt UX.
The free serverless tier has strict per-account quotas and models can cold-start, so it suits prototyping rather than production. No credit card is required to sign up.
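On a cold model the free tier typically returns a temporary error (commonly HTTP 503) until the model has finished loading. A minimal retry sketch, assuming that behavior; the attempt count and delays below are illustrative, not documented limits.

import os
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def chat_with_retry(messages, attempts=5, base_delay=5.0):
    """Retry while the model is cold-starting (assumed to surface as HTTP 503)."""
    payload = {"model": "meta-llama/Llama-3.3-70B-Instruct", "messages": messages}
    for attempt in range(attempts):
        resp = requests.post(API_URL, headers=HEADERS, json=payload, timeout=120)
        if resp.status_code == 503:
            # Model is still loading; back off and try again.
            time.sleep(base_delay * (attempt + 1))
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("Model did not become available in time")

print(chat_with_retry([{"role": "user", "content": "Hello in 5 words"}]))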
Most commonly used: meta-llama/Llama-3.3-70B-Instruct, mistralai/Mistral-Nemo-Instruct-2407, Qwen/Qwen2.5-72B-Instruct, deepseek-ai/DeepSeek-R1-Distill-Llama-70B. The full current list is on Hugging Face Inference's docs page.
Yes, it is OpenAI-compatible: point the OpenAI SDK's base_url at `https://api-inference.huggingface.co` and pass your Hugging Face token as the API key.
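A streaming variant of the SDK sketch above, assuming the OpenAI-compatible endpoint also honors the standard stream parameter.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/meta-llama/Llama-3.3-70B-Instruct/v1",
    api_key=os.environ["HF_TOKEN"],
)

# Print tokens as they arrive instead of waiting for the full completion.
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello in 5 words"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()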
The Pro plan ($9/mo) raises the rate limit 20×. Dedicated HF Inference Endpoints start at ~$0.06/hour for CPU and ~$0.60/hour for a small GPU. If your traffic is bursty or seasonal, the free tier may be enough; if you need a guaranteed SLA, upgrade to a dedicated endpoint.