POST /v1/query endpoint
Endpoint
```
POST https://api.askvault.co/v1/query
```

Authorization header: `Bearer ak_live_xxx`.
Request body
```json
{
  "workspace_id": "ws_xxx",
  "query": "What is your refund policy?",
  "conversation_id": "conv_xxx",
  "user": {
    "user_id": "u_123",
    "hash": "abc123...",
    "name": "Alice",
    "email": "alice@example.co"
  },
  "metadata": {
    "page_url": "https://yoursite.co/pricing"
  }
}
```

Fields:

- `workspace_id` (required). Your workspace ID.
- `query` (required). The question, up to 1,000 characters.
- `conversation_id` (optional). For multi-turn conversations.
- `user` (optional). For identity-verified queries.
- `metadata` (optional). Context such as page URL and custom attributes.
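A request body with these constraints can be assembled client-side before sending. The sketch below is illustrative, not an official SDK; `build_query_payload` and `MAX_QUERY_CHARS` are hypothetical names.

```python
# Sketch: build and validate a /v1/query request body before sending.
# build_query_payload is a hypothetical helper, not part of any official SDK.

MAX_QUERY_CHARS = 1000  # documented query length limit


def build_query_payload(workspace_id, query, conversation_id=None,
                        user=None, metadata=None):
    """Return a request-body dict, enforcing the documented constraints."""
    if not workspace_id or not query:
        raise ValueError("workspace_id and query are required")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS} characters")
    payload = {"workspace_id": workspace_id, "query": query}
    # Optional fields are omitted entirely when unset.
    if conversation_id:
        payload["conversation_id"] = conversation_id
    if user:
        payload["user"] = user
    if metadata:
        payload["metadata"] = metadata
    return payload
```

Validating the 1,000-character limit locally avoids a round trip that would end in a 400.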
Response
```json
{
  "answer": "Our refund policy allows refunds within 14 days...",
  "sources": [
    {
      "title": "Refund Policy",
      "url": "https://yoursite.co/refund-policy",
      "chunk": "...",
      "score": 0.92
    }
  ],
  "conversation_id": "conv_xxx",
  "skills_fired": ["knowledge_search"],
  "latency_ms": 1850,
  "tokens_used": 450
}
```

The sources array contains 1 to 5 retrieved chunks. Each has the source title, URL, exact chunk text, and relevance score.
Example
```bash
curl -X POST https://api.askvault.co/v1/query \
  -H "Authorization: Bearer ak_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"workspace_id":"ws_xxx","query":"What is your refund policy?"}'
```

Returns the answer with sources, typically within about 2 seconds.
Multi-turn conversations
Pass the same conversation_id across calls:
- First call. Omit `conversation_id`; AskVault generates one and returns it.
- Subsequent calls. Pass the returned `conversation_id`.
- Context carries over: the bot remembers earlier turns.
About 80% of real conversations are multi-turn.
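The threading above can be wrapped in a small client-side helper. A minimal sketch, assuming a pluggable `send` callable standing in for your HTTP transport (requests, fetch, etc.); `Conversation` is an illustrative name, not an official SDK class.

```python
# Sketch: threading conversation_id across /v1/query calls.
# `send` is any callable that takes a payload dict and returns the
# parsed JSON response as a dict.

class Conversation:
    def __init__(self, workspace_id, send):
        self.workspace_id = workspace_id
        self.send = send
        self.conversation_id = None

    def ask(self, query):
        payload = {"workspace_id": self.workspace_id, "query": query}
        if self.conversation_id:
            # Subsequent calls pass the server-generated ID back.
            payload["conversation_id"] = self.conversation_id
        response = self.send(payload)
        # First call: remember the ID AskVault generated for this thread.
        self.conversation_id = response["conversation_id"]
        return response["answer"]
```

The first `ask` sends no `conversation_id`; every later one reuses the ID the server returned.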
Identity-verified queries
For logged-in visitors:
- Pass `user.hash` (an HMAC of `user_id` computed with your identity secret).
- The bot applies plan-aware retrieval and identity-gated skills.
- See identity verification for hash computation.
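A minimal sketch of the hash computation, assuming HMAC-SHA256 with a hex digest; the identity verification docs define the exact digest and encoding, so confirm there before relying on this.

```python
import hashlib
import hmac


def compute_user_hash(user_id: str, identity_secret: str) -> str:
    """HMAC of user_id keyed with your identity secret.

    Assumes HMAC-SHA256 with a hex digest; verify the exact scheme in
    the identity verification docs.
    """
    return hmac.new(identity_secret.encode("utf-8"),
                    user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Compute this server-side only; shipping the identity secret to the browser defeats verification.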
Rate limits
Per API key:
- 60 requests per minute default.
- 10,000 requests per day default.
- Customize per key under API Keys settings.
Rate-limited responses return HTTP 429 with a Retry-After header.
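Clients should honor that header rather than retrying immediately. A sketch of the parsing side, assuming the delta-seconds form of Retry-After; `retry_after_seconds` is a hypothetical helper.

```python
# Sketch: how long to wait after an HTTP 429, from the Retry-After header.
# Handles the delta-seconds form ("Retry-After: 30"); the HTTP-date form
# is left to the caller.

def retry_after_seconds(headers, default=1.0):
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        # HTTP-date form or malformed value: fall back to a safe default.
        return default
```

Wire the returned delay into your HTTP client's retry loop before re-sending the query.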
Errors
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | Use answer |
| 400 | Malformed request | Check JSON structure |
| 401 | Auth failed | Check Bearer token |
| 403 | Quota exceeded | Upgrade plan |
| 429 | Rate limited | Wait per Retry-After |
| 500 | Server error | Retry with exponential backoff |
See API errors for full code reference.
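The "retry with exponential backoff" advice for 500s can be sketched as below. The names and defaults are illustrative; `call` stands in for one attempt against the endpoint.

```python
import random
import time

# Sketch: retry on HTTP 500 with exponential backoff plus jitter.
# `call` is any zero-argument function returning (status, body).


def with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    for attempt in range(max_attempts):
        status, body = call()
        if status != 500:
            return status, body
        if attempt < max_attempts - 1:
            # 0.5 s, 1 s, 2 s, ... plus jitter to avoid thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status, body
```

Only 500s are retried here; 4xx responses indicate a request problem that retrying will not fix.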
Latency expectations
- p50: about 1.5 seconds.
- p95: about 3 seconds.
- p99: about 5 seconds.
Factors:
- Knowledge size. Larger workspaces add 200 to 500 ms.
- Identity verification overhead. About 50 ms.
- Multi-skill chains. Each skill adds 300 to 2000 ms.
SDK availability
Today, no official SDKs; use any HTTP client (curl, fetch, requests, etc.).
Official SDKs (JS, Python) on the roadmap.
Limits
- Query length. 1,000 characters.
- Response timeout. 30 seconds (then HTTP 504).
- Concurrent requests per key. 50.
- Per-day cap. Plan-dependent.
Common pitfalls
- 400 with no detail. A required field is missing. Check `workspace_id` and `query`.
- 401 despite a valid key. Wrong workspace ownership. The key must match the workspace.
- Empty answer. Knowledge gap; the bot says "I don't know". Add content covering the topic.
- Hash verification fails. Wrong identity secret. Check your HMAC computation.
FAQ
How is this different from /v1/query/stream?
Streaming returns tokens as generated; useful for chat UIs. Standard endpoint returns full response at once.
Can I cache responses?
Yes, for identical queries against stable knowledge. Use ETag headers.
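One way to use those headers: cache each answer alongside its ETag and revalidate with If-None-Match on repeat queries. A sketch under that assumption; `ETagCache` and the `send` signature are illustrative, not part of any official SDK.

```python
# Sketch: ETag-based caching of identical queries.
# `send(etag)` performs the HTTP call, passing the etag as If-None-Match
# when not None, and returns (status, etag, body); 304 means unchanged.

class ETagCache:
    def __init__(self):
        self.store = {}  # cache key -> (etag, body)

    def fetch(self, key, send):
        cached = self.store.get(key)
        status, etag, body = send(cached[0] if cached else None)
        if status == 304 and cached:
            return cached[1]          # server confirms our copy is fresh
        self.store[key] = (etag, body)
        return body
```

A revalidated 304 still involves a round trip, but skips answer generation, so it returns faster and transfers less.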
Does this count against my query quota?
Yes. Each call counts as 1 query.