POST /v1/query, the chat query endpoint reference
Endpoint
POST https://api.askvault.co/v1/query
Synchronous JSON response. For streaming, use POST /v1/query/stream instead.
Authentication: send your API key in the Authorization header as Bearer ak_xxx. See API authentication.
Minimal request
curl:

    curl -X POST https://api.askvault.co/v1/query \
      -H "Authorization: Bearer ak_xxx" \
      -H "Content-Type: application/json" \
      -d '{
        "workspace_id": "wt_xxx",
        "message": "What is your refund policy?"
      }'

Python:

    import os
    import requests

    r = requests.post(
        "https://api.askvault.co/v1/query",
        headers={"Authorization": f"Bearer {os.environ['ASKVAULT_API_KEY']}"},
        json={"workspace_id": "wt_xxx", "message": "What is your refund policy?"},
        timeout=15,
    )
    print(r.json())

JavaScript:

    const r = await fetch("https://api.askvault.co/v1/query", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.ASKVAULT_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        workspace_id: "wt_xxx",
        message: "What is your refund policy?",
      }),
    });
    console.log(await r.json());

Request parameters
| Field | Type | Required | Description |
|---|---|---|---|
| workspace_id | string | Yes | The workspace to query. Format wt_xxxxxx_xxx. Find it under Dashboard > Settings > General. |
| message | string | Yes | The user's question. 1 to 4,000 characters. Longer messages are truncated. |
| top_k | integer | No | Number of context chunks to retrieve. 1 to 10. Default 5. Higher values use more tokens and can dilute the answer. |
| temperature | number | No | Answer creativity from 0.0 to 1.0. Default 0.3. Keep low for factual support; raise for creative tasks. |
| strictness | string | No | "strict" refuses to answer when the content is not in the knowledge base (KB). "helpful" (default) combines KB content with reasoning when the KB is insufficient. |
| document_ids | string[] | No | Restrict retrieval to specific document IDs. Acts as an allowlist: for this query, the bot can't see content outside these documents. |
| conversation_id | string | No | Continue a multi-turn conversation. Prior turns become context for the new query. |
| user_id | string | No | Anonymous end-user identifier. Enables per-user rate limiting, conversation isolation, and analytics. |
| verification_token | string | No | HMAC-signed user_id for identity-verified queries. Enables audience-based scoping. |
| audience | string[] | No | Audience set this query runs under. Only documents tagged with at least one matching audience are retrievable. Requires verification_token. |
| metadata | object | No | Free-form metadata stored on the conversation. Useful for tracing or tagging. Max 8 KB. |
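
The constraints in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any official SDK: the function name build_query_payload is hypothetical, it covers only a subset of the fields (metadata is left out), and it rejects over-long messages instead of relying on the documented truncation.

```python
from typing import Any, Optional, Sequence

ALLOWED_STRICTNESS = {"strict", "helpful"}


def build_query_payload(
    workspace_id: str,
    message: str,
    *,
    top_k: Optional[int] = None,
    temperature: Optional[float] = None,
    strictness: Optional[str] = None,
    document_ids: Optional[Sequence[str]] = None,
    conversation_id: Optional[str] = None,
    user_id: Optional[str] = None,
    verification_token: Optional[str] = None,
    audience: Optional[Sequence[str]] = None,
) -> dict[str, Any]:
    """Build a /v1/query body (hypothetical helper), enforcing the documented limits."""
    if not workspace_id:
        raise ValueError("workspace_id is required")
    # The API truncates long messages; this sketch fails loudly instead.
    if not 1 <= len(message) <= 4000:
        raise ValueError("message must be 1 to 4,000 characters")
    if top_k is not None and not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if strictness is not None and strictness not in ALLOWED_STRICTNESS:
        raise ValueError('strictness must be "strict" or "helpful"')
    if audience and not verification_token:
        raise ValueError("audience requires verification_token")

    optional: dict[str, Any] = {
        "top_k": top_k,
        "temperature": temperature,
        "strictness": strictness,
        "document_ids": list(document_ids) if document_ids is not None else None,
        "conversation_id": conversation_id,
        "user_id": user_id,
        "verification_token": verification_token,
        "audience": list(audience) if audience is not None else None,
    }
    payload: dict[str, Any] = {"workspace_id": workspace_id, "message": message}
    # Omit unset fields so the server-side defaults apply.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

The returned dict can be passed as the json argument of requests.post in the minimal request. Omitting unset fields, rather than sending nulls, keeps the server defaults (top_k 5, temperature 0.3, "helpful") in effect.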
Response schema
    {
      "answer": "Refunds are available within 30 days of purchase. Submit a request at acme.co/refunds with your order number.",
      "sources": [
        {
          "document_id": "doc_a1b2c3",
          "document_title": "Refund policy",
          "url": "https://acme.co/policies/refunds",
          "relevance_score": 0.94,
          "snippet": "Refunds are available within 30 days of purchase..."
        }
      ],
      "confidence": "high",
      "model": "askvault-standard",
      "tokens_used": 187,
      "latency_ms": 612,
      "request_id": "req_5b45ff_xxx",
      "conversation_id": "conv_5b45ff_xxx"
    }

| Field | Type | Description |
|---|---|---|
| answer | string | The grounded answer text, based on the chunks listed in sources. |
| sources | array | The chunks retrieved to ground the answer. Sorted by relevance_score descending. |
| sources[].document_id | string | Stable ID of the source document. Use for source-link rendering. |
| sources[].document_title | string | Human-readable document title. |
| sources[].url | string | URL of the source document, if it has one. Empty string for uploaded files. |
| sources[].relevance_score | number | Cosine similarity score between 0 and 1. Higher is more relevant. |
| sources[].snippet | string | First 200 characters of the chunk text. |
| confidence | string | One of "high", "medium", or "low". Low confidence means the answer might be wrong. |
| model | string | The model that generated the answer. Always askvault-standard from the API's perspective. |
| tokens_used | integer | Total tokens consumed by this query (input + output). |
| latency_ms | integer | Server-side latency for this query in milliseconds. |
| request_id | string | Unique ID for this request. Store in your logs for debugging. |
| conversation_id | string | The conversation ID. Pass this back in subsequent queries to continue the conversation. |
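
As an illustration of consuming this schema, the sketch below turns the sources array into citation lines for display. The Source dataclass and both function names are hypothetical, and the citation format is only one possible rendering.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Source:
    document_id: str
    document_title: str
    url: str
    relevance_score: float
    snippet: str


def parse_sources(response: dict[str, Any]) -> list[Source]:
    """Convert the documented sources array into typed objects (hypothetical helper)."""
    return [
        Source(
            document_id=item["document_id"],
            document_title=item["document_title"],
            url=item.get("url", ""),
            relevance_score=float(item["relevance_score"]),
            snippet=item.get("snippet", ""),
        )
        for item in response.get("sources", [])
    ]


def format_citations(sources: list[Source]) -> list[str]:
    """Render one numbered citation per source, in the order the API returned them."""
    lines = []
    for index, source in enumerate(sources, start=1):
        # Uploaded files have an empty url, so fall back to the stable document ID.
        target = source.url if source.url else f"uploaded file {source.document_id}"
        lines.append(f"[{index}] {source.document_title}: {target}")
    return lines
```

Because sources arrive sorted by relevance_score descending, preserving the returned order puts the most relevant citation first.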
Conversation continuity
To continue a multi-turn conversation, pass the conversation_id from the previous response:
    {
      "workspace_id": "wt_xxx",
      "message": "Can I get a refund on a sale item?",
      "conversation_id": "conv_5b45ff_xxx"
    }

AskVault loads prior turns of the conversation as context for the new query. The retrieval and answer both factor in the conversation history.
To start a fresh conversation, omit conversation_id. AskVault creates a new one and returns it.
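
The round trip described above can be wrapped in a small helper so callers never handle the ID by hand. This is a sketch under assumptions: the Conversation class is hypothetical, and send stands in for whatever function posts a JSON body to /v1/query and returns the decoded response.

```python
from typing import Any, Callable, Optional

# Hypothetical transport: takes a request body, returns the decoded JSON response.
Send = Callable[[dict[str, Any]], dict[str, Any]]


class Conversation:
    """Remembers conversation_id between turns (illustrative, not an official client)."""

    def __init__(self, workspace_id: str, send: Send) -> None:
        self.workspace_id = workspace_id
        self._send = send
        self.conversation_id: Optional[str] = None

    def ask(self, message: str) -> dict[str, Any]:
        payload: dict[str, Any] = {"workspace_id": self.workspace_id, "message": message}
        # First turn: omit conversation_id so AskVault creates a new conversation.
        if self.conversation_id is not None:
            payload["conversation_id"] = self.conversation_id
        response = self._send(payload)
        # Store the ID from the response so the next turn continues the thread.
        self.conversation_id = response.get("conversation_id", self.conversation_id)
        return response
```

Creating a new Conversation object starts a fresh thread, which matches omitting conversation_id in a raw request.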
Low-confidence handling
When confidence is "low", the answer is likely wrong or incomplete. The bot probably couldn't find the right content in the knowledge base. Two options:
- Display the bot's answer with a disclaimer, for example: "I'm not sure about this one; let me know if you need a human to help."
- Route the conversation to a human via webhook. Subscribe to the knowledge.gap_detected webhook event and route low-confidence queries to your support inbox.
In strict mode ("strictness": "strict"), the bot refuses to answer when confidence is low. The answer field becomes something like "I don't have information about that in my knowledge base. Would you like me to connect you with a human?"
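
The disclaimer option can be handled on the client with a check on the confidence field. The sketch below is one possible shape: present_answer is a hypothetical name, and the needs_human flag is a client-side convenience, not a substitute for the knowledge.gap_detected webhook.

```python
from typing import Any

DISCLAIMER = "I'm not sure about this one; let me know if you need a human to help."


def present_answer(response: dict[str, Any]) -> tuple[str, bool]:
    """Return (text_to_display, needs_human) for a /v1/query response."""
    answer = response.get("answer", "")
    if response.get("confidence") == "low":
        # Low confidence: show the answer, but flag it and suggest a handoff.
        return f"{answer}\n\n{DISCLAIMER}", True
    return answer, False
```

A chat UI could use the boolean to show a "Talk to a human" button next to the flagged answer.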
Document scoping
To restrict retrieval to specific documents (without changing the workspace's overall allowlist):
    {
      "workspace_id": "wt_xxx",
      "message": "What's your warranty policy?",
      "document_ids": ["doc_warranty_us", "doc_warranty_eu"]
    }

The bot can only retrieve from those documents. Useful for jurisdiction-scoped queries, customer-specific knowledge subsets, and audience-restricted content.
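
For the jurisdiction-scoped case, the document list can be chosen from the caller's region before the request is built. This is a sketch only: the WARRANTY_DOCS mapping and the scoped_payload function are hypothetical, and the document IDs are the placeholders from the example above.

```python
from typing import Any

# Hypothetical mapping from region code to the documents that region may see.
WARRANTY_DOCS: dict[str, list[str]] = {
    "us": ["doc_warranty_us"],
    "eu": ["doc_warranty_eu"],
}


def scoped_payload(workspace_id: str, message: str, region: str) -> dict[str, Any]:
    """Build a query body restricted to the documents for one region."""
    document_ids = WARRANTY_DOCS.get(region.lower())
    if not document_ids:
        # Fail rather than fall back to an unscoped query over the whole workspace.
        raise ValueError(f"no documents configured for region {region!r}")
    return {
        "workspace_id": workspace_id,
        "message": message,
        "document_ids": list(document_ids),
    }
```

Raising on an unknown region is a deliberate choice here: silently dropping document_ids would widen retrieval to every document in the workspace.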
HTTP status codes
| Status | Meaning |
|---|---|
| 200 | Query processed successfully. Response body is JSON. |
| 400 | Malformed request. Missing required field or invalid value. |
| 401 | Missing or invalid API key. |
| 403 | API key doesn't have access to the requested workspace. |
| 404 | Workspace not found. |
| 422 | Validation error. The body parsed but a field violates a constraint. |
| 429 | Rate limit exceeded. Check Retry-After header. |
| 500 | Server error. Retry with exponential backoff. |
| 503 | Service temporarily unavailable. Retry. |
Every error response includes a JSON body with a detail field describing the problem:
    { "detail": "workspace_id is required" }

Log detail for debugging; never show raw error bodies to end users.
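
The retry guidance in the status table can be sketched as a small wrapper. Everything here is an assumption layered on that table: query_with_retry, QueryError, and the send callable (which returns status, headers, and decoded body) are hypothetical, and Retry-After is assumed to hold a number of seconds.

```python
import time
from typing import Any, Callable, Mapping

RETRYABLE = {429, 500, 503}

# Hypothetical transport: returns (status_code, response_headers, decoded_json_body).
Send = Callable[[dict[str, Any]], tuple[int, Mapping[str, str], dict[str, Any]]]


class QueryError(Exception):
    """Carries the status code and the detail field of an error response."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def query_with_retry(
    send: Send,
    payload: dict[str, Any],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Retry 429, 500, and 503 with exponential backoff; raise on anything else."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send(payload)
        if status == 200:
            return body
        detail = str(body.get("detail", ""))
        if status not in RETRYABLE or attempt == max_attempts:
            raise QueryError(status, detail)
        # Exponential backoff: base_delay, then 2x, 4x, ...
        delay = base_delay * 2 ** (attempt - 1)
        if status == 429:
            # Assumes a seconds value and a header mapping keyed exactly "Retry-After".
            retry_after = headers.get("Retry-After")
            if retry_after is not None and retry_after.isdigit():
                delay = float(retry_after)
        sleep(delay)
    raise QueryError(0, "retry loop ended without a response")
```

Client errors such as 400, 401, 403, 404, and 422 are raised immediately, since repeating the same request would not change the outcome. The detail value on QueryError is meant for logs, not for end users.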
Performance
Typical latency for a standard workspace:
- Cold workspace, first request: 800 to 1,500 ms
- Warm workspace: 400 to 900 ms
- Workspace with hybrid retrieval enabled (higher tier): 600 to 1,200 ms
If you need first-token latency under 300 ms for a live UX, switch to the streaming endpoint.
Common pitfalls
Empty sources array on every query. Your workspace has no indexed content. Check Knowledge Hub.
Same query returns different answers across calls. Set temperature to 0. The default of 0.3 introduces some variability.
Bot answers from training data instead of your content. Set strictness to "strict". The bot will refuse to answer when content isn't in the KB.
Conversation context doesn't carry across turns. You're not passing the conversation_id from the previous response. Round-trip it.
Latency consistently above 2 seconds. The workspace is cold, or you're hitting the wrong AskVault region. Pre-warm with a no-op query at app start.
Related guides
- Getting started with the AskVault API
- API authentication
- POST /v1/query/stream
- Rate limits per plan
- Error codes
- Webhooks