Rate limits per plan

The table

Plan        Requests per minute   Requests per day
Free        10                    100
Starter     60                    3,000
Growth      200                   15,000
Business    1,000                 50,000
Enterprise  Custom                Custom

Limits are per API key, not per workspace. If you have two keys on the same Starter workspace, each key gets its own 60-per-minute, 3,000-per-day budget. The aggregate workspace-level cap is the per-day number multiplied by the number of keys you've created, up to the plan's monthly query allowance.
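The aggregate arithmetic can be sketched as below. This is only an illustration of the rule as stated: the function name is hypothetical, and the monthly allowance is passed in as a parameter because this page does not list allowance figures.

```python
def workspace_daily_cap(per_day_per_key: int, key_count: int,
                        monthly_allowance: int) -> int:
    """Aggregate requests per day across all keys on one workspace.

    Each key carries its own per-day budget, so the workspace total is
    per_day_per_key * key_count, bounded by the monthly query allowance.
    monthly_allowance is a caller-supplied figure (not taken from this page).
    """
    if key_count < 0 or per_day_per_key < 0 or monthly_allowance < 0:
        raise ValueError("all arguments must be non-negative")
    return min(per_day_per_key * key_count, monthly_allowance)
```

For two keys on Starter, the uncapped total is 2 x 3,000 = 6,000 requests per day.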

What counts as a "request"

Any HTTP call to /v1/* that returns a 2xx response. Specifically:

  • Successful POST /v1/query calls count.
  • Successful POST /v1/query/stream calls count.
  • 4xx and 5xx responses don't count against quota.
  • OPTIONS preflight requests don't count.
  • Webhook deliveries from AskVault to your endpoint don't count toward your quota; we eat that cost.
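If you keep a client-side estimate of usage, the counting rules above can be expressed as a small predicate. This is a sketch of the rules as listed here, not AskVault's server logic, and the function name is hypothetical.

```python
def counts_against_quota(method: str, path: str, status: int) -> bool:
    """Return True if a call would count as a request under the rules above."""
    # OPTIONS preflight requests never count.
    if method.upper() == "OPTIONS":
        return False
    # Only calls under /v1/ are metered.
    if not path.startswith("/v1/"):
        return False
    # Only 2xx responses count; 4xx and 5xx responses are free.
    return 200 <= status < 300
```

Treat the result as an estimate only; the response headers remain the authoritative count.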

Response headers

Every response includes five rate-limit headers:

X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 47
X-RateLimit-Limit-Day: 3000
X-RateLimit-Remaining-Day: 2845
X-RateLimit-Reset-Day: 1747353600

X-RateLimit-Reset-Day is a Unix timestamp (seconds) when the daily counter resets. Use it to schedule retries cleanly.
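A minimal sketch of reading these headers on the client follows. The header names come from the example above; the RateLimitState class and both function names are hypothetical, and the code assumes your HTTP client exposes headers as a string-to-string mapping.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitState:
    limit_minute: int
    remaining_minute: int
    limit_day: int
    remaining_day: int
    reset_day: int  # Unix timestamp in seconds


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    """Extract the five rate-limit headers from a response."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitState(
        limit_minute=int(h["x-ratelimit-limit-minute"]),
        remaining_minute=int(h["x-ratelimit-remaining-minute"]),
        limit_day=int(h["x-ratelimit-limit-day"]),
        remaining_day=int(h["x-ratelimit-remaining-day"]),
        reset_day=int(h["x-ratelimit-reset-day"]),
    )


def seconds_until_daily_reset(state: RateLimitState,
                              now: Optional[float] = None) -> float:
    """Seconds from `now` (default: current time) until the daily counter resets."""
    current = time.time() if now is None else now
    return max(0.0, state.reset_day - current)
```

Passing `now` explicitly keeps the function easy to test; in production you would normally omit it.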

429 Too Many Requests

When you exceed either limit, AskVault returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 23
Content-Type: application/json

{ "detail": "Per-minute rate limit exceeded. Retry after 23 seconds." }

The Retry-After header is in seconds. Sleep for that long, then retry. Don't retry immediately; you'll just get another 429.

The minute and day counters are independent. Hitting the per-minute cap means waiting up to 60 seconds. Hitting the per-day cap means waiting until midnight UTC.

Best practices

A few practices for production integrations:

Respect Retry-After. Most rate-limit issues in real apps come from clients that ignore Retry-After and retry in tight loops. The right behavior is to back off, then retry. A simple exponential backoff that respects Retry-After works: wait max(retryAfter, base * 2^attempt) seconds.
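The backoff formula can be sketched as follows. The transport is injected as a `send` callable returning (status, headers, body), so the example does not depend on any particular HTTP library. The function names, the 1-second base, and the 300-second cap on the exponential term are assumptions of this sketch, not AskVault recommendations.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


def backoff_delay(attempt: int, retry_after: float,
                  base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before retrying after failed attempt `attempt` (0-based).

    Computes max(retry_after, base * 2^attempt). Only the exponential term
    is capped, so the result never drops below the server's Retry-After.
    """
    exponential = min(cap, base * (2 ** attempt))
    return max(retry_after, exponential)


def call_with_retries(send: Callable[[], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        # Stop on any non-429 response, or when no attempts are left.
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        # Assumes the header key is spelled "Retry-After" in the mapping.
        retry_after = float(headers.get("Retry-After", "0"))
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")
```

Injecting `sleep` lets you unit-test the retry schedule without actually waiting.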

Pre-compute backoff. Don't wait for a 429 to start slowing down. Track your remaining quota in advance using X-RateLimit-Remaining-Day, and when it drops below your buffer threshold, reduce your request rate.
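One way to act on the remaining-quota header is sketched below. The 10% buffer and the even-spread pacing strategy are assumptions chosen for illustration; both function names are hypothetical.

```python
def should_slow_down(remaining_day: int, limit_day: int,
                     buffer_fraction: float = 0.1) -> bool:
    """True once the remaining daily quota falls below the buffer threshold."""
    return remaining_day < limit_day * buffer_fraction


def min_interval_seconds(remaining_day: int, reset_day: int, now: float) -> float:
    """Spacing between requests that spreads the remaining quota evenly
    over the time left until the daily reset (X-RateLimit-Reset-Day)."""
    seconds_left = max(0.0, reset_day - now)
    if remaining_day <= 0:
        # Nothing left to spend: wait for the reset.
        return seconds_left
    return seconds_left / remaining_day
```

With 100 requests left and 1,000 seconds until reset, this paces the client at one request every 10 seconds.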

Distribute work across keys. If you have a multi-tenant integration where each tenant should get its own quota, generate one API key per tenant. Each key has its own per-minute and per-day budget.

Use streaming for large user-facing requests. Streaming responses don't count differently from synchronous, but they let you display partial answers while the rest generates. UX wins.

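Batch with care. AskVault doesn't have a batch endpoint. To process 1,000 queries quickly, send them with controlled concurrency that respects the per-minute limit. The limit is on request rate, not on concurrency: on Growth (200/min) that means starting no more than 200 requests in any 60-second window, however many are in flight at once, so 1,000 queries take about 5 minutes.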
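A client-side limiter for batch jobs might look like the sketch below. It uses a sliding 60-second window, which is an assumption: this page does not say how the server windows its per-minute counter, and a sliding window on the client is the conservative choice. The class and function names are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most max_requests request starts in any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self._starts: Deque[float] = deque()  # start times, oldest first

    def wait_time(self) -> float:
        """Seconds until another request may start; 0.0 if one may start now."""
        now = self.clock()
        # Forget starts that have aged out of the window.
        while self._starts and now - self._starts[0] >= self.window:
            self._starts.popleft()
        if len(self._starts) < self.max_requests:
            return 0.0
        return self.window - (now - self._starts[0])

    def record(self) -> None:
        """Register that a request is starting now."""
        self._starts.append(self.clock())


def acquire(limiter: SlidingWindowLimiter,
            sleep: Callable[[float], None] = time.sleep) -> None:
    """Block until the limiter has room, then claim a slot."""
    delay = limiter.wait_time()
    while delay > 0:
        sleep(delay)
        delay = limiter.wait_time()
    limiter.record()
```

Call `acquire(limiter)` before each request. This version is single-threaded; a multi-threaded worker pool would need a lock around `wait_time` and `record`.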

Per-key custom limits

On Starter and above you can tighten per-key limits below the plan default. Under Dashboard > API Keys > [key] > Edit:

  • Set a per-minute limit below the plan default (60 on Starter, 200 on Growth) for less-trusted consumers.
  • Set a per-day limit below the plan default for cost containment.

Loosening above the plan default isn't supported; you'd need to upgrade your plan.

High-throughput integrations

If you need more than 1,000 requests per minute or 50,000 per day, you have two options:

  1. Upgrade to Enterprise. Custom limits scoped to your workload. Reach out to sales@askvault.co with your expected peak request rate.
  2. Use multiple keys with smart routing. Each key gets its own per-minute budget; spreading load across 5 keys gives you 5x the per-minute throughput on Business (5,000 per minute total). Per-day caps still apply per key, so you also get 5x daily throughput.

The multiple-keys approach is simpler if your traffic is well-distributed. Enterprise is better if you need predictable bursty capacity.
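The simplest form of routing across keys is round-robin, sketched below. The key strings are placeholders and the function names are hypothetical; a production router might instead pick the key with the most remaining per-minute quota.

```python
import itertools
from typing import Iterator, Sequence


def round_robin_keys(keys: Sequence[str]) -> Iterator[str]:
    """Yield API keys in rotation so load spreads evenly across them."""
    if not keys:
        raise ValueError("at least one API key is required")
    return itertools.cycle(keys)


def combined_per_minute(per_key_limit: int, key_count: int) -> int:
    """Total per-minute throughput when load is spread evenly across keys."""
    return per_key_limit * key_count
```

Call `next()` on the iterator to choose the key for each outgoing request. On Business, `combined_per_minute(1000, 5)` gives the 5,000 per minute figure quoted above.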

Per-user rate limiting (separate from per-key)

If you're building an app where end users call AskVault indirectly through your backend, include a unique user_id for each end user in the request body:

{
  "workspace_id": "wt_xxx",
  "message": "...",
  "user_id": "your-app-user-id-42"
}

AskVault tracks per-user query rates. Configure caps under Settings > Rate Limits > Per-User. This is useful for preventing one customer from exhausting your shared quota. Available on Growth and above.

Common pitfalls

429 on the first request of the day. Per-day counters reset at midnight UTC, not at midnight in your local timezone. If a batch job ran into the daily cap, requests keep returning 429 until midnight UTC, even if a new day has already started locally. The first request after midnight UTC works again.

Retry-After: 0. Carrier proxies sometimes round small values down to 0. Treat Retry-After: 0 as "wait at least 1 second before retrying".
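The one-second floor can be applied when parsing the header, as in this sketch. The function name is hypothetical, and falling back to the floor for a missing or unparseable value is an assumption of this example.

```python
from typing import Optional


def effective_retry_after(header_value: Optional[str],
                          minimum: float = 1.0) -> float:
    """Parse a Retry-After value in seconds, never returning less than `minimum`."""
    try:
        seconds = float(header_value) if header_value is not None else 0.0
    except ValueError:
        # Unparseable value: fall back to the floor rather than retry at once.
        seconds = 0.0
    return max(minimum, seconds)
```

Use the returned value wherever you would otherwise sleep for the raw header value.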

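Bursts trigger 429s. A heavy short burst (200 requests in 5 seconds) on Growth uses up the whole per-minute budget at once, so every further request in that minute gets a 429. Stagger requests with an inter-request delay instead: staying under 200/minute means spacing requests at least 300 ms apart (60 s / 200). A 50 ms gap is not enough, since it works out to 1,200 requests per minute.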

Counters drift between client and server. Don't trust your local counter; trust the response headers. Server time is authoritative.

FAQ

Are rate limits per-IP or per-key?

Per-key. Two clients sharing the same key share the budget. Different keys get independent budgets, even from the same IP.

Do streaming requests count differently?

No. A streaming request that yields 100 tokens counts as 1 request, same as a synchronous request that yields the same answer.

Can I get a temporary increase for a launch?

Yes, on Growth and above. Contact support@askvault.co at least 48 hours before the event. We can issue a short-term quota bump.

What's the relationship between rate limits and the monthly query quota?

Daily limits enforce a moving cap on top of the monthly cap. You can't burn through your full monthly allowance in one day; the per-day cap holds you back. Useful for both cost and abuse prevention.

Does AskVault rate-limit by IP?

Not separately, no. Per-key only. If an attacker tries credential-stuffing your widget, the per-key cap takes effect.
