Rate limits per plan
The table
| Plan | Requests per minute | Requests per day |
|---|---|---|
| Free | 10 | 100 |
| Starter | 60 | 3,000 |
| Growth | 200 | 15,000 |
| Business | 1,000 | 50,000 |
| Enterprise | Custom | Custom |
Limits are per API key, not per workspace. If you have two keys on the same Starter workspace, each key gets its own 60-per-minute, 3,000-per-day budget. The aggregate workspace-level cap is the per-day number multiplied by the number of keys you've created, up to the plan's monthly query allowance.
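The per-key arithmetic above fits in a small lookup. Here is a minimal Python sketch that assumes you track your plan name and key count yourself. `PLAN_LIMITS`, `workspace_daily_cap`, and the `monthly_allowance` parameter are hypothetical names. This page doesn't list monthly allowances, so the caller has to supply that figure.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class PlanLimits:
    per_minute: int  # requests per minute, per API key
    per_day: int     # requests per day, per API key


# Per-key defaults from the table above; Enterprise is custom, so it's omitted.
PLAN_LIMITS: Dict[str, PlanLimits] = {
    "free": PlanLimits(per_minute=10, per_day=100),
    "starter": PlanLimits(per_minute=60, per_day=3_000),
    "growth": PlanLimits(per_minute=200, per_day=15_000),
    "business": PlanLimits(per_minute=1_000, per_day=50_000),
}


def workspace_daily_cap(plan: str, num_keys: int,
                        monthly_allowance: Optional[int] = None) -> int:
    """Per-key daily limit times the number of keys, clipped to the
    monthly allowance when the caller provides one (hypothetical helper)."""
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    total = PLAN_LIMITS[plan].per_day * num_keys
    if monthly_allowance is not None:
        total = min(total, monthly_allowance)
    return total
```

For the two-key Starter example, `workspace_daily_cap("starter", 2)` gives 6,000.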
What counts as a "request"
Any HTTP call to /v1/* that returns a 2xx response. Specifically:
- Successful `POST /v1/query` calls count.
- Successful `POST /v1/query/stream` calls count.
- 4xx and 5xx responses don't count against quota.
- `OPTIONS` preflight requests don't count.
- Webhook deliveries from AskVault to your endpoint don't count toward your quota; we eat that cost.
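The counting rules above reduce to a simple predicate. This is an illustrative sketch, not AskVault's metering code. `counts_against_quota` is a hypothetical helper that can be useful for reconciling your own request logs with the quota headers. Webhook deliveries aren't modeled because they never pass through your client.

```python
def counts_against_quota(method: str, path: str, status: int) -> bool:
    """Apply this page's counting rules to one HTTP exchange (hypothetical helper)."""
    if method.upper() == "OPTIONS":  # preflight requests never count
        return False
    if not path.startswith("/v1/"):  # only /v1/* endpoints are metered
        return False
    return 200 <= status < 300       # 4xx and 5xx responses don't count
```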
Response headers
Every response includes five rate-limit headers:
```http
X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 47
X-RateLimit-Limit-Day: 3000
X-RateLimit-Remaining-Day: 2845
X-RateLimit-Reset-Day: 1747353600
```
`X-RateLimit-Reset-Day` is a Unix timestamp (seconds) marking when the daily counter resets. Use it to schedule retries cleanly.
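Here is a minimal sketch that reads these headers into a typed record. Header names are normalized to lowercase first, because HTTP header names are case-insensitive. `RateLimitState`, `parse_rate_limit_headers`, and `seconds_until_day_reset` are hypothetical names. The parser accepts any mapping of header names to strings, such as the headers object your HTTP client returns.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitState:
    limit_minute: int
    remaining_minute: int
    limit_day: int
    remaining_day: int
    reset_day: int  # Unix timestamp (seconds) when the daily counter resets


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # Header names are case-insensitive, so normalize before looking them up.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitState(
        limit_minute=int(h["x-ratelimit-limit-minute"]),
        remaining_minute=int(h["x-ratelimit-remaining-minute"]),
        limit_day=int(h["x-ratelimit-limit-day"]),
        remaining_day=int(h["x-ratelimit-remaining-day"]),
        reset_day=int(h["x-ratelimit-reset-day"]),
    )


def seconds_until_day_reset(state: RateLimitState, now: float) -> float:
    """Seconds from `now` (Unix time) until the daily reset; never negative."""
    return max(0.0, state.reset_day - now)
```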
429 Too Many Requests
When you exceed either limit, AskVault returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 23
Content-Type: application/json

{ "detail": "Per-minute rate limit exceeded. Retry after 23 seconds." }
```
The `Retry-After` header is in seconds. Sleep for that long, then retry. Don't retry immediately; you'll just get another 429.
The minute and day counters are independent. Hitting the per-minute cap means waiting up to 60 seconds. Hitting the per-day cap means waiting until midnight UTC.
Best practices
A few practices for production integrations:
Respect Retry-After. Most rate-limit issues in real apps come from clients that ignore `Retry-After` and retry in tight loops. The right behavior is to back off, then retry. A simple exponential backoff that respects `Retry-After` works: wait `max(retryAfter, base * 2^attempt)` seconds.
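Here is one way to implement that rule, offered as a sketch. `send` is a hypothetical callable that stands in for your HTTP call and returns the status code plus the parsed `Retry-After` seconds (or `None`). `attempt` counts from 0 and `base` defaults to 1 second. Both are assumptions, so tune them to your workload. `sleep` is injectable so you can test the loop without actually waiting.

```python
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, retry_after: Optional[float], base: float = 1.0) -> float:
    """Wait for retry number `attempt` (0-based): max(Retry-After, base * 2^attempt)."""
    return max(retry_after or 0.0, base * (2 ** attempt))


def call_with_retries(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call the hypothetical `send` until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:  # don't sleep after the final attempt
            sleep(backoff_delay(attempt, retry_after, base))
    return 429
```

In the first retry, a `Retry-After: 23` wins over the 1-second exponential term. In later retries without the header, the delay doubles each time.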
Pre-compute backoff. Don't wait for a 429 to tell you to back off. Track your remaining quota in advance using `X-RateLimit-Remaining-Minute` and `X-RateLimit-Remaining-Day`, and slow down when either drops below your buffer threshold.
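Here is a sketch of one possible slow-down policy for the daily budget. It assumes you've already parsed `X-RateLimit-Remaining-Day` and `X-RateLimit-Reset-Day` from the latest response. The buffer of 100 and the strategy of spreading requests evenly are illustrative choices, not AskVault recommendations. `suggested_delay` is a hypothetical helper.

```python
def suggested_delay(remaining_day: int, reset_day: int, now: float,
                    buffer: int = 100) -> float:
    """Seconds to wait before the next request (hypothetical policy).

    Above the buffer, send freely. At or below it, spread the remaining
    daily budget evenly over the time left until the daily reset.
    """
    if remaining_day > buffer:
        return 0.0
    seconds_left = max(0.0, reset_day - now)
    if remaining_day <= 0:
        return seconds_left  # budget exhausted: wait for the reset
    return seconds_left / remaining_day
```

With 50 requests left and an hour to go, this paces you at one request every 72 seconds.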
Distribute work across keys. If you have a multi-tenant integration where each tenant should get its own quota, generate one API key per tenant. Each key has its own per-minute and per-day budget.
Use streaming for large user-facing requests. Streaming responses count the same as synchronous ones, but they let you display partial answers while the rest generates. UX wins.
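Batch with care. AskVault doesn't have a batch endpoint. To process 1,000 queries quickly, send them with controlled concurrency plus a rate limiter that respects the per-minute limit. Capping in-flight requests alone isn't enough. On Growth (200/min), 200 concurrent requests that each finish in about a second would attempt far more than 200 requests per minute. What matters is starting no more than 200 requests in any 60-second window.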
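Here is a sketch of a client-side limiter for this. It uses a sliding-window counter. That's our assumption, not a description of how AskVault counts internally. Capping starts in every rolling 60-second span also caps them in every calendar minute, so it's conservative either way. `SlidingWindowLimiter` is a hypothetical name. The class isn't thread-safe, so guard `acquire` with a lock if several threads share one instance.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` request starts in any rolling `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.starts: Deque[float] = collections.deque()

    def acquire(self) -> None:
        """Block until another request may start, then record its start time."""
        while True:
            now = self.clock()
            # Drop start times that have aged out of the window.
            while self.starts and now - self.starts[0] >= self.window:
                self.starts.popleft()
            if len(self.starts) < self.limit:
                self.starts.append(now)
                return
            # Sleep until the oldest recorded start leaves the window.
            self.sleep(self.window - (now - self.starts[0]))
```

Call `limiter.acquire()` before each request (with `limit=200` on Growth), and let a worker pool supply the concurrency.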
Per-key custom limits
On Starter and above you can tighten per-key limits below the plan default. Under Dashboard > API Keys > [key] > Edit:
- Set the per-minute limit below your plan's default (60 on Starter, 200 on Growth, 1,000 on Business) for less-trusted consumers.
- Set per-day limit below the plan default for cost containment.
Loosening above the plan default isn't supported; you'd need to upgrade your plan.
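If you manage these limits from configuration, a pre-flight check can catch values that would exceed the plan default. Here is a hypothetical sketch that uses the defaults from the table at the top of this page. `validate_key_limits` isn't part of any AskVault SDK, and the dashboard remains the source of truth. Requiring positive values is our assumption.

```python
from typing import Dict, Tuple

# (per-minute, per-day) defaults from the plan table for plans that support
# custom per-key limits. Enterprise limits are custom, so they're omitted.
PLAN_DEFAULTS: Dict[str, Tuple[int, int]] = {
    "starter": (60, 3_000),
    "growth": (200, 15_000),
    "business": (1_000, 50_000),
}


def validate_key_limits(plan: str, per_minute: int, per_day: int) -> None:
    """Raise ValueError if a custom per-key limit would loosen the plan default."""
    if plan not in PLAN_DEFAULTS:
        raise ValueError(f"no published per-key defaults for plan {plan!r}")
    max_minute, max_day = PLAN_DEFAULTS[plan]
    # Positive-value requirement is an assumption, not documented behavior.
    if not 0 < per_minute <= max_minute:
        raise ValueError(f"per-minute limit must be 1..{max_minute} on {plan}")
    if not 0 < per_day <= max_day:
        raise ValueError(f"per-day limit must be 1..{max_day} on {plan}")
```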
High-throughput integrations
If you need more than 1,000 requests per minute or 50,000 per day, you have two options:
- Upgrade to Enterprise. Custom limits scoped to your workload. Reach out to sales@askvault.co with your expected peak request rate.
- Use multiple keys with smart routing. Each key gets its own per-minute budget; spreading load across 5 keys gives you 5x the per-minute throughput on Business (5,000 per minute total). Per-day caps still apply per key, so you also get 5x daily throughput.
The multiple-keys approach is simpler if your traffic is well-distributed. Enterprise is better if you need predictable bursty capacity.
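For the multiple-keys approach, "smart routing" can be as simple as sending each request through whichever key has the most per-minute budget left, as last reported by `X-RateLimit-Remaining-Minute`. Here is a minimal sketch. `KeyRouter` is a hypothetical name. The router only learns about a key's budget when that key's responses come back, so it doesn't model the counter resetting each minute.

```python
from typing import Dict, Iterable


class KeyRouter:
    """Route each request through the key with the most remaining per-minute budget."""

    def __init__(self, keys: Iterable[str], per_minute: int) -> None:
        # Until a response says otherwise, assume every key has its full budget.
        self.remaining: Dict[str, int] = {k: per_minute for k in keys}
        if not self.remaining:
            raise ValueError("need at least one key")

    def pick(self) -> str:
        # max() returns the first maximal key, so ties go to the earliest-registered key.
        return max(self.remaining, key=self.remaining.__getitem__)

    def record(self, key: str, remaining_minute: int) -> None:
        """Update from X-RateLimit-Remaining-Minute on that key's latest response."""
        self.remaining[key] = remaining_minute
```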
Per-user rate limiting (separate from per-key)
If you're building an app where end users call AskVault indirectly through your backend, include a unique `user_id` for each end user in the request body:
```json
{
  "workspace_id": "wt_xxx",
  "message": "...",
  "user_id": "your-app-user-id-42"
}
```
AskVault tracks per-user query rates. Configure caps under Settings > Rate Limits > Per-User. This is useful for preventing one customer from exhausting your shared quota. Available on Growth and above.
Common pitfalls
429 on the first request of the day. Per-day counters reset at midnight UTC, not at midnight in your local timezone. If you hit the daily cap yesterday, requests keep returning 429 until midnight UTC, even if it's already a new day where you are. The first request after midnight UTC works again.
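To work out how long remains until the reset from any local clock, convert to UTC first. Here is a sketch that uses only the standard library. `seconds_until_utc_midnight` is a hypothetical helper, and it expects a timezone-aware `datetime`, because `astimezone` treats a naive one as local time. When you have a recent response, `X-RateLimit-Reset-Day` is the authoritative value.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_utc_midnight(now: datetime) -> float:
    """Seconds from a timezone-aware `now` until the next 00:00 UTC."""
    now_utc = now.astimezone(timezone.utc)
    next_midnight = (now_utc + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return (next_midnight - now_utc).total_seconds()
```

For a user at UTC+9, 08:00 local time is still 23:00 UTC on the previous day, so the daily counter has another hour to go.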
Retry-After: 0. Carrier proxies sometimes round this value down to 0. Treat `Retry-After: 0` as "wait at least 1 second before retrying".
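Here is a sketch of that rule. It assumes the header carries seconds, as described in the 429 section. `effective_retry_after` is a hypothetical helper. Falling back to the 1-second floor when the value is missing or unparseable is a conservative choice on our part.

```python
import math
from typing import Optional


def effective_retry_after(value: Optional[str], minimum: float = 1.0) -> float:
    """Seconds to wait, given the raw Retry-After header value (or None).

    Zero, negative, missing, or unparseable values fall back to `minimum`.
    """
    try:
        seconds = float(value) if value is not None else 0.0
    except ValueError:
        seconds = 0.0
    if not math.isfinite(seconds):  # reject "nan" and "inf"
        seconds = 0.0
    return max(seconds, minimum)
```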
Bursts trigger 429s. A heavy short burst (200 requests in 5 seconds) on Growth uses up the whole per-minute budget at once, so every further request in that minute gets a 429. Stagger requests instead. Spacing them at least 300 ms apart (60,000 ms / 200) keeps you at or under 200/minute.
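The minimum spacing is 60 seconds divided by the per-minute limit. Here is a minimal pacer sketch. `Pacer` is a hypothetical name. Pacing at exactly the limit leaves no headroom for retries, so you may want to pass a slightly lower rate (say 180 on Growth).

```python
import time
from typing import Callable, Optional


class Pacer:
    """Space request starts at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_start: Optional[float] = None  # no request sent yet

    def wait(self) -> None:
        """Block until the next request may start, then reserve the following slot."""
        now = self.clock()
        if self.next_start is not None and now < self.next_start:
            self.sleep(self.next_start - now)
            now = self.next_start
        self.next_start = now + self.interval
```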
Counters drift between client and server. Don't trust your local counter; trust the response headers. Server time is authoritative.
FAQ
Are rate limits per-IP or per-key?
Per-key. Two clients sharing the same key share the budget. Different keys get independent budgets, even from the same IP.
Do streaming requests count differently?
No. A streaming request that yields 100 tokens counts as 1 request, same as a synchronous request that yields the same answer.
Can I get a temporary increase for a launch?
Yes, on Growth and above. Contact support@askvault.co at least 48 hours before the event. We can issue a short-term quota bump.
What's the relationship between rate limits and the monthly query quota?
Daily limits act as a second cap on top of the monthly one. You can't burn through your full monthly allowance in one day; the per-day cap holds you back. This helps with both cost control and abuse prevention.
Does AskVault rate-limit by IP?
Not separately, no. Per-key only. If an attacker tries credential-stuffing your widget, the per-key cap takes effect.