AI Providers

01. Google Gemini API

Gemini API Rate Limits

| Model | Category | RPM limit | TPM limit | RPD limit |
| --- | --- | --- | --- | --- |
| gemini-2.5-flash | Text-out models | 5 | 250K | 20 |
| gemini-2.5-flash-lite | Text-out models | 10 | 250K | 20 |
| gemini-2.5-flash-native-audio-dialog | Live API | Unlimited | 1M | Unlimited |
| gemini-2.5-flash-tts | Multi-modal | 3 | 10K | 10 |
| gemini-3-1b | Other models | 30 | 15K | 14.4K |
| gemini-3-2b | Other models | 30 | 15K | 14.4K |
| gemini-3-4b | Other models | 30 | 15K | 14.4K |
| gemini-3-12b | Other models | 30 | 15K | 14.4K |
| gemini-3-flash | Text-out models | 5 | 250K | 20 |
| gemini-embedding-1.0 | Other models | 100 | 30K | 1K |
| gemini-robotics-er-1.5-preview | Other models | 10 | 250K | 20 |
| gemma-3-1b | Other models | 30 | 15K | 14.4K |
| gemma-3-2b | Other models | 30 | 15K | 14.4K |
| gemma-3-4b | Other models | 30 | 15K | 14.4K |
| gemma-3-12b | Other models | 30 | 15K | 14.4K |
| gemma-3-27b | Other models | 30 | 15K | 14.4K |
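
A per-minute cap such as the 5 RPM listed for gemini-2.5-flash can also be respected on the client side before a request is ever sent. The following is a minimal sketch of a rolling-window limiter, not part of any Gemini SDK. The class name `SlidingWindowLimiter` and its methods are hypothetical, and it assumes the limit behaves like a rolling 60-second window, which the table does not state.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side cap on calls per rolling window (e.g. 5 requests per 60 s)."""

    def __init__(self, max_calls, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._stamps = deque()

    def _evict(self, now):
        # Drop timestamps that have left the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()

    def try_acquire(self):
        """Record a call and return True if under the cap, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False

    def seconds_until_free(self):
        """Seconds until the oldest call leaves the window (0.0 if a slot is free)."""
        now = self.clock()
        self._evict(now)
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.window_seconds - (now - self._stamps[0])
```

A caller would check `try_acquire()` before each request and, when it returns `False`, sleep for `seconds_until_free()` before trying again.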

02. Ollama Cloud Models

Ollama Cloud imposes tiered rate limits to manage capacity and prevent abuse. Limits vary by plan, and no exact numerical quotas are published beyond general descriptions.

The free tier is designed for light usage such as chat and quick model tests. It includes hourly and daily caps, plus per-minute restrictions on rapid API calls.

Free Tier: Ollama Cloud Usage
Session usage - Resets in 5 hours
Weekly usage - Resets in 5 days


03. OpenRouter Free Models API

OpenRouter uses global, credit-based limits plus a few hard caps. (source: openrouter)

Rate limits

Credits and key info

Other limitations


04. Groq API

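| Model ID | RPM | RPD | TPM | TPD | ASH | ASD |
| --- | --- | --- | --- | --- | --- | --- |
| allam-2-7b | 30 | 7K | 6K | 500K | - | - |
| canopylabs/orpheus-arabic-saudi | 10 | 100 | 1.2K | 3.6K | - | - |
| canopylabs/orpheus-v1-english | 10 | 100 | 1.2K | 3.6K | - | - |
| groq/compound | 30 | 250 | 70K | - | - | - |
| groq/compound-mini | 30 | 250 | 70K | - | - | - |
| llama-3.1-8b-instant | 30 | 14.4K | 6K | 500K | - | - |
| llama-3.3-70b-versatile | 30 | 1K | 12K | 100K | - | - |
| meta-llama/llama-4-maverick-17b-128e-instruct | 30 | 1K | 6K | 500K | - | - |
| meta-llama/llama-4-scout-17b-16e-instruct | 30 | 1K | 30K | 500K | - | - |
| meta-llama/llama-guard-4-12b | 30 | 14.4K | 15K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-22m | 30 | 14.4K | 15K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-86m | 30 | 14.4K | 15K | 500K | - | - |
| moonshotai/kimi-k2-instruct | 60 | 1K | 10K | 300K | - | - |
| moonshotai/kimi-k2-instruct-0905 | 60 | 1K | 10K | 300K | - | - |
| openai/gpt-oss-120b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-20b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-safeguard-20b | 30 | 1K | 8K | 200K | - | - |
| qwen/qwen3-32b | 60 | 1K | 6K | 500K | - | - |
| whisper-large-v3 | 20 | 2K | - | - | 7.2K | 28.8K |
| whisper-large-v3-turbo | 20 | 2K | - | - | 7.2K | 28.8K |

RPM / RPD: requests per minute / day. TPM / TPD: tokens per minute / day. ASH / ASD: audio seconds per hour / day.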

Groq enforces per-organization limits on requests, tokens, and audio duration, and returns response headers that help you throttle when you hit caps. (source: console.groq)

What is limited

| Header | Value | Notes |
| --- | --- | --- |
| retry-after | 2 | In seconds |
| x-ratelimit-limit-requests | 14400 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-limit-tokens | 18000 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-remaining-requests | 14370 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-remaining-tokens | 17997 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-reset-requests | 2m59.56s | Always refers to Requests Per Day (RPD) |
| x-ratelimit-reset-tokens | 7.66s | Always refers to Tokens Per Minute (TPM) |
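
The headers above can be turned into a wait time before the next request. This is a minimal sketch rather than Groq's own client logic. The helper names `parse_duration` and `seconds_to_wait` are hypothetical. It assumes the headers arrive as a plain dict with lowercase keys, and that reset values always follow the `2m59.56s` / `7.66s` shape shown in the sample values.

```python
import re

# One or more number+unit parts, e.g. "2m59.56s" or "7.66s".
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}


def parse_duration(text):
    """Convert a duration string such as '2m59.56s' into seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in _PART.findall(text))


def seconds_to_wait(headers):
    """Return how long to pause before the next request (0.0 if no pause is needed)."""
    # retry-after is given in seconds; honor it first when present.
    if "retry-after" in headers:
        return float(headers["retry-after"])
    waits = [0.0]
    # Daily request budget exhausted: wait for the request window to reset.
    if int(headers.get("x-ratelimit-remaining-requests", 1)) <= 0:
        waits.append(parse_duration(headers["x-ratelimit-reset-requests"]))
    # Per-minute token budget exhausted: wait for the token window to reset.
    if int(headers.get("x-ratelimit-remaining-tokens", 1)) <= 0:
        waits.append(parse_duration(headers["x-ratelimit-reset-tokens"]))
    return max(waits)
```

With the sample values in the table (14370 requests and 17997 tokens remaining, no `retry-after`), `seconds_to_wait` returns `0.0`, so the next request can go out immediately.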

05. Cohere API

Chat API rate limits

| Model | Trial rate limit | Production rate limit |
| --- | --- | --- |
| Command A Reasoning | 20 req / min | Contact sales |
| Command A Translate | 20 req / min | Contact sales |
| Command A Vision | 20 req / min | Contact sales |
| Command A | 20 req / min | 500 req / min |
| Command R+ | 20 req / min | 500 req / min |
| Command R | 20 req / min | 500 req / min |
| Command R7B | 20 req / min | 500 req / min |

Other endpoints

| Endpoint | Trial rate limit | Production rate limit |
| --- | --- | --- |
| Embed | 2,000 inputs / min | 2,000 inputs / min |
| Embed (Images) | 5 inputs / min | 400 inputs / min |
| Rerank | 10 req / min | 1,000 req / min |
| Tokenize | 100 req / min | 2,000 req / min |
| EmbedJob | 5 req / min | 50 req / min |
| Default (other) | 500 req / min | 500 req / min |

Summary of rate limits and limitations


06. GitHub Models API

| Tier / Model group | Metric | Copilot Free |
| --- | --- | --- |
| Low | Requests per minute | 15 |
| Low | Requests per day | 150 |
| Low | Tokens per request | 8000 in, 4000 out |
| Low | Concurrent requests | 5 |
| High | Requests per minute | 10 |
| High | Requests per day | 50 |
| High | Tokens per request | 8000 in, 4000 out |
| High | Concurrent requests | 2 |

Summary of rate limits and limitations


07. Mistral API

Free plan – feature table

| Feature | Free plan value | Notes |
| --- | --- | --- |
| Price | €0 / $0 | Personal use for life and work. |
| Flash answers | Up to 150 per day | Quick “flash” responses. |
| Web searches | Base quota | Paid plans allow “Up to 5x Free”. |
| Think mode | Base quota | Paid plans: “Up to 30x Free”. |
| Deep research (preview) | Base quota | Paid plans: “Up to 5x Free”. |
| Memories | 500 | Saved and recallable user memories. |
| Libraries / storage | Limited | Higher tiers raise to 15–30 GB. |
| Document uploads | Base quota | Paid: “Up to 20x Free”. |
| Image generation | Base quota | Paid: “Up to 40x Free”. |
| Code interpreter | Base quota | Paid: “Up to 5x Free”. |
| Projects | Unlimited | Can group chats into projects. |
| Connectors directory | Full access | Access to connectors directory. |
| Custom MCP connectors | Not included | Marked as “Custom” only on higher tiers. |
| Voice / canvas / agents | Limited or not listed | Described as “Custom” mainly for paid tiers. |
| Customer support | Help center only | No chat/email support on Free. |

Rate limits and other limitations (Free only)


08. Cerebras API

Free tier rate limits

| Model | TPM | TPH | TPD | RPM | RPH | RPD |
| --- | --- | --- | --- | --- | --- | --- |
| gpt-oss-120b | 60,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| llama3.1-8b | 60,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| llama-3.3-70b | 60,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| qwen-3-32b | 60,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| qwen-3-235b-a22b-instruct-2507 | 60,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| zai-glm-4.7 | 60,000 | 1,000,000 | 1,000,000 | 10 | 100 | 100 |
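
The per-minute, per-hour, and per-day request caps interact: a client that sends at the full per-minute rate runs into the hourly cap well before the hour is over. The arithmetic can be sketched as follows, assuming each cap is a simple total per minute, hour, and day (the exact window semantics are not stated here). Both function names are hypothetical.

```python
def sustained_rpm(rpm, rph, rpd):
    """Highest steady requests-per-minute rate that stays under all three caps."""
    return min(rpm, rph / 60, rpd / (24 * 60))


def minutes_until_exhausted(cap, rate_per_minute):
    """Minutes of sending at rate_per_minute before a cap of `cap` requests is used up."""
    return cap / rate_per_minute


# Figures from the gpt-oss-120b row: 30 RPM, 900 RPH, 14,400 RPD.
print(sustained_rpm(30, 900, 14_400))    # 10.0 requests/min sustainable over a full day
print(minutes_until_exhausted(900, 30))  # 30.0 minutes at the full 30 RPM
```

Under these assumptions the daily cap is the binding one for round-the-clock use: 14,400 requests spread over 1,440 minutes is 10 requests per minute, a third of the headline RPM.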

Summary of free‑tier rate limits and other limitations


09. Cloudflare API

| Feature | Workers Free | Workers Paid |
| --- | --- | --- |
| Request | 100,000 requests/day; 1000 requests/min | No limit |
| Worker memory | 128 MB | 128 MB |
| CPU time | 10 ms | 5 min HTTP request; 15 min Cron Trigger |
| Duration | No limit | No limit for Workers. 15 min duration limit for Cron Triggers, Durable Object Alarms and Queue Consumers |

Cloudflare Workers Free gives you limited daily traffic and tighter per-request resources compared to paid plans. (source: developers.cloudflare)


10. Hugging Face Inference API

Free plan rate limits

| Plan | API requests / 5 min | Resolver requests / 5 min | Pages requests / 5 min |
| --- | --- | --- | --- |
| Free user | 1,000 * | 5,000 * | 200 * |

Summary – rate limits and other limitations (Free only)


11. Nvidia NIM API

The cited NVIDIA developer forum thread does not give concrete free-tier model limits for NIM, so there is no detailed table beyond the one value mentioned there. (source: forums.developer.nvidia)

Details from the discussion

| Aspect | Trial / free experience detail |
| --- | --- |
| Published per-model limits | Not published |
| Example mentioned by the user | 40 requests per minute (from NVIDIA API catalog trial) |
| Token / context window limits | Not disclosed |

Summary – rate limits and other limitations (trial / free)


12. Vercel API

Vercel AI Gateway’s free tier is credit-based rather than having explicit RPM/TPM quotas. (source: vercel)

| Plan | Monthly credit | When free ends |
| --- | --- | --- |
| AI Gateway | $5 | When you purchase any AI Credits |

Summary – rate limits and other limitations (free only)


13. Iflow API

| Limit type | Value / behavior |
| --- | --- |
| Concurrent requests | 1 request per user at a time (global concurrency limit) |
| Over-limit behavior | Additional requests return HTTP 429 |
| Pricing | Free to use |
| Streaming requests | Tokens released immediately after active cancellation |
| Non-streaming | Model continues running; tokens released only after completion, even if canceled |
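
Since over-limit requests return HTTP 429, a client that may overlap requests needs some retry policy. One common choice is exponential backoff with jitter, sketched below. This is a generic pattern and not something the provider prescribes. `send` is a hypothetical zero-argument callable returning a `(status_code, body)` tuple, and the delay values are illustrative assumptions.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    Returns the last (status_code, body) tuple that send() produced.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: wait a random time in
            # [0, min(max_delay, base_delay * 2**attempt)] before retrying.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    return status, body
```

With a limit of one concurrent request per user, serializing calls on the client (for example behind a lock or a single worker) avoids most 429 responses in the first place; the backoff then only covers the remaining races.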

Summary – rate limits and limitations (free only)


14. Perplexity API

A Perplexity Pro subscription affects API billing credits, not API rate limits directly. (source: perplexity)

| Plan | Monthly API credits | Credit refresh timing | Notes |
| --- | --- | --- | --- |
| Perplexity Pro | $5 | 1st day of each month | Auto-applied |

Summary – limits and other conditions (Perplexity Pro)


15. AI ML API

| Plan type | Requests | Models available | Credit / cost limits | Notable exclusions |
| --- | --- | --- | --- | --- |
| Free Unverified | 10 requests per hour | Gemma 3 4b, Gemma 3 12b, Gemma 3n 4b | No explicit daily credit cap mentioned | All other models |
| Free Verified | 10 req/hour on FREE models; 10 req/day on cheap paid models | Any model costing < 50,000 credits per request | 50,000 credits per day; model cost ≤ $0.025 or 50,000 credits | Audio, video, and most image models (typically > 50,000 credits) |

Summary of rate limits and other limitations


16. Opencode Zen API

Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. (source: opencode)

| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
| --- | --- | --- | --- | --- | --- |
| Big Pickle | Free | Free | Free | | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | | Always-free listed model |

Summary – limits and other conditions for “free” usage


17. Pollinations AI API

Pollinations.ai provides a free, open-source API that is compatible with OpenAI-style endpoints for text, image, and audio generation. (source: pollinations)

No signup or API key is needed for basic use, and the service emphasizes privacy with zero data storage. (source: github)

Key Features

Usage Examples

Access via simple URLs:

Integrate it in code, for example with Python requests for images or React hooks for apps. (source: github)

OpenAI Compatibility

It offers OpenAI-compatible text models, which can be listed via the API, and proxy-style access to premium models without your own OpenAI key. (source: enter.pollinations)


18. Google Gemini CLI

For paid Gemini Code Assist subscriptions (in effect the “Gemini Pro” style plans for the CLI), the Gemini CLI documentation lists higher fixed quotas. (source: geminicli)

| Plan | Requests / user / day | Requests / user / minute | Models used |
| --- | --- | --- | --- |
| Code Assist Standard edition | 1500 | 120 | Gemini family (auto-chosen) |

Summary – Gemini “Pro”/paid limits and behavior


19. Iflow CLI

The CLI documentation page describes configuration knobs, not hosted rate limits; the only “limits” you can set or hit are things like session turns and token budgets. (source: platform.iflow)

| Setting / limit | Default value | What it does |
| --- | --- | --- |
| Current RPS guidance | ~1 RPS until 429 | Recommended request rate |
| maxSessionTurns | -1 | Unlimited turns per chat session |
| tokensLimit | 128000 | Maximum context window length |
| compressionTokenThreshold | 0.8 | Auto-compress |
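
If `compressionTokenThreshold` is read as a fraction of `tokensLimit` (an assumption; the table only says "Auto-compress"), the default values imply a concrete token count at which compression would start. A small worked calculation, with hypothetical function names:

```python
def compression_trigger(tokens_limit=128_000, threshold=0.8):
    """Token count at which auto-compression would start, under the assumption
    that the threshold is a fraction of the context window."""
    return round(tokens_limit * threshold)


def should_compress(tokens_used, tokens_limit=128_000, threshold=0.8):
    """True once the session has used at least the trigger amount of tokens."""
    return tokens_used >= compression_trigger(tokens_limit, threshold)


print(compression_trigger())     # 102400
print(should_compress(90_000))   # False
print(should_compress(110_000))  # True
```

Under that reading, a session with the defaults would be compressed once it passes roughly 102K tokens, leaving about 25K tokens of headroom below the 128,000-token window.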

Summary for free usage


20. Kilo Code CLI

Kilo Code itself does not impose numeric rate limits; it relies on free quotas from external providers and on models that are priced at $0. (source: kilo)

| Category | Model / Provider | Cost in Kilo Code |
| --- | --- | --- |
| OpenRouter free models | Qwen3 Coder (free) | Free via OpenRouter |
| OpenRouter free models | Z.AI GLM-4.5 Air (free) | Free via OpenRouter |
| OpenRouter free models | DeepSeek R1 0528 (free) | Free via OpenRouter |
| OpenRouter free models | MoonshotAI Kimi K2 (free) | Free via OpenRouter |

21. Mistral Vibe CLI

Mistral’s docs describe how limits work but do not publish concrete free-tier numbers on this page. (source: docs.mistral)

| Aspect | Free API tier behavior |
| --- | --- |
| Availability | Yes, a dedicated free API tier |
| Purpose | Trying and exploring the API, not production workloads |
| Limit types | Requests per second (RPS), tokens per minute/month |
| Scope | Limits applied at workspace level |
| Configuration/visibility | Exact limits shown only in AI Studio “limits” page per workspace |
| Upgrades | Higher tiers provide higher limits; contact support to increase |

22. Opencode CLI

Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. (source: opencode)

| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
| --- | --- | --- | --- | --- | --- |
| Big Pickle | Free | Free | Free | | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | | Always-free listed model |

Summary – limits and other conditions for “free” usage


23. Qwen CLI

| Free option | Requests per day | Requests per minute | Token limits |
| --- | --- | --- | --- |
| Qwen OAuth | 2,000 | 60 | No token counting |

Summary of rate limits and other limitations


24. GitHub Copilot CLI

| Plan | Monthly price | Premium requests (chat, agents, reviews, CLI) | Code completions | Models access |
| --- | --- | --- | --- | --- |
| Free | $0 | 50 per month | 2,000 per month | Haiku 4.5, GPT-4.1, GPT-5 mini, and more |

Summary – rate limits and other limitations (Free only)