Groq delivers some of the fastest LLM inference available, with speeds up to 2,000 tokens/sec on its custom LPU hardware. It offers generous free tiers across multiple models, with high rate limits.

Supported Models

  • Llama 3.1 8B
  • Llama 3.3 70B
  • Llama 4 Maverick
  • Qwen3-32B
  • Kimi K2
  • Whisper Large v3

Key Features

  • Inference speeds up to 2,000 tokens/sec
  • Streaming support
  • Whisper speech-to-text
  • Multiple open-weight models
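To show what the streaming support looks like in practice, here is a minimal sketch of a streaming chat request against Groq's OpenAI-compatible REST endpoint, using only the Python standard library. The endpoint URL and the `llama-3.3-70b-versatile` model name follow Groq's public API conventions, but treat both as assumptions and check the current docs before relying on them.

```python
# Sketch of a streaming chat completion against Groq's OpenAI-compatible API.
# Assumes: endpoint URL and model name match Groq's current public docs,
# and a GROQ_API_KEY environment variable holds your key.
import json
import os
import urllib.request

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt: str, model: str = "llama-3.3-70b-versatile") -> dict:
    """Assemble the JSON body for a streaming chat completion."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # ask for tokens as server-sent events
    }

def stream_completion(prompt: str) -> None:
    """Send the request and print tokens as they arrive."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        GROQ_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # each SSE line looks like b"data: {...}\n"
            line = raw.decode().strip()
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            delta = chunk["choices"][0]["delta"].get("content", "")
            print(delta, end="", flush=True)

if os.environ.get("GROQ_API_KEY"):
    stream_completion("Explain why low latency matters for chat UIs.")
```

Because the API is OpenAI-compatible, the same request body works with the official `openai` client by pointing `base_url` at `https://api.groq.com/openai/v1`.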

Pros

  • Extremely fast
  • High rate limits
  • Great for real-time apps
  • No credit card required

Cons

  • Limited to specific models
  • No access to GPT or Claude models

Best Use Cases

  • Real-time chat
  • Trading bots
  • Live coding assistants
  • Voice applications