
POST /v1/query, the chat query endpoint reference


Endpoint

POST https://api.askvault.co/v1/query

Synchronous JSON response. For streaming, use POST /v1/query/stream instead.

Authentication: Authorization: Bearer ak_xxx. See authentication.

Minimal request

curl -X POST https://api.askvault.co/v1/query \
  -H "Authorization: Bearer ak_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "wt_xxx",
    "message": "What is your refund policy?"
  }'

Request parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| workspace_id | string | Yes | The workspace to query. Format wt_xxxxxx_xxx. Find it under Dashboard > Settings > General. |
| message | string | Yes | The user's question. 1 to 4,000 characters. Longer messages are truncated. |
| top_k | integer | No | Number of context chunks to retrieve. 1 to 10. Default 5. Higher values use more tokens and can dilute the answer. |
| temperature | number | No | Answer creativity from 0.0 to 1.0. Default 0.3. Keep low for factual support; raise for creative tasks. |
| strictness | string | No | "strict" to refuse answers not in the knowledge base. "helpful" (default) to combine the KB with reasoning when the KB is insufficient. |
| document_ids | string[] | No | Restrict retrieval to specific document IDs. URL-allowlist behavior: the bot can't see content outside these documents in this query. |
| conversation_id | string | No | Continue a multi-turn conversation. Prior turns become context for the new query. |
| user_id | string | No | Anonymous end-user identifier. Enables per-user rate limiting, conversation isolation, and analytics. |
| verification_token | string | No | HMAC-signed user_id for identity-verified queries. Enables audience-based scoping. |
| audience | string[] | No | Audience set this query runs under. Only documents tagged with at least one matching audience are retrievable. Requires verification_token. |
| metadata | object | No | Free-form metadata stored on the conversation. Useful for tracing or tagging. Max 8 KB. |
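
As a sketch of how verification_token and audience fit together: the exact signing scheme isn't specified on this page, so the following assumes an HMAC-SHA256 signature over user_id with a workspace signing secret, hex-encoded. The secret value and field layout here are illustrative, not the documented scheme.

```python
import hashlib
import hmac

def make_verification_token(user_id: str, signing_secret: str) -> str:
    """Sign a user_id with a workspace signing secret (assumed HMAC-SHA256, hex)."""
    return hmac.new(
        signing_secret.encode("utf-8"),
        user_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

# The token rides alongside user_id and audience in the request body:
payload = {
    "workspace_id": "wt_xxx",
    "message": "What plans include SSO?",
    "user_id": "user_123",
    "verification_token": make_verification_token("user_123", "example_secret"),
    "audience": ["enterprise"],
}
```

Signing must happen on your server, never in the browser, or end users could mint tokens for arbitrary identities.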

Response schema

{
  "answer": "Refunds are available within 30 days of purchase. Submit a request at acme.co/refunds with your order number.",
  "sources": [
    {
      "document_id": "doc_a1b2c3",
      "document_title": "Refund policy",
      "url": "https://acme.co/policies/refunds",
      "relevance_score": 0.94,
      "snippet": "Refunds are available within 30 days of purchase..."
    }
  ],
  "confidence": "high",
  "model": "askvault-standard",
  "tokens_used": 187,
  "latency_ms": 612,
  "request_id": "req_5b45ff_xxx",
  "conversation_id": "conv_5b45ff_xxx"
}
| Field | Type | Description |
| --- | --- | --- |
| answer | string | The grounded answer text, sourced from the sources array. |
| sources | array | The chunks retrieved to ground the answer, sorted by relevance_score descending. |
| sources[].document_id | string | Stable ID of the source document. Use for source-link rendering. |
| sources[].document_title | string | Human-readable document title. |
| sources[].url | string | URL of the source document, if it has one. Empty string for uploaded files. |
| sources[].relevance_score | number | Cosine similarity score between 0 and 1. Higher is more relevant. |
| sources[].snippet | string | First 200 characters of the chunk text. |
| confidence | string | "high", "medium", or "low". Low confidence means the answer might be wrong. |
| model | string | The model that generated the answer. Always askvault-standard from the API's perspective. |
| tokens_used | integer | Total tokens consumed by this query (input + output). |
| latency_ms | integer | Server-side latency for this query in milliseconds. |
| request_id | string | Unique ID for this request. Store it in your logs for debugging. |
| conversation_id | string | The conversation ID. Pass it back in subsequent queries to continue the conversation. |
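
A minimal sketch of rendering a response for an end user, handling the empty-string url that uploaded files produce (the formatting style is a choice, not part of the API):

```python
def render_answer(resp: dict) -> str:
    """Format the answer plus a 'Sources' footer from a /v1/query response."""
    lines = [resp["answer"]]
    sources = resp.get("sources", [])
    if sources:
        lines.append("")
        lines.append("Sources:")
        for src in sources:
            title = src["document_title"]
            # url is an empty string for uploaded files, so fall back to the title alone
            lines.append(f"- {title} ({src['url']})" if src["url"] else f"- {title}")
    return "\n".join(lines)
```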

Conversation continuity

To continue a multi-turn conversation, pass the conversation_id from the previous response:

{
  "workspace_id": "wt_xxx",
  "message": "Can I get a refund on a sale item?",
  "conversation_id": "conv_5b45ff_xxx"
}

AskVault loads prior turns of the conversation as context for the new query. The retrieval and answer both factor in the conversation history.

To start a fresh conversation, omit conversation_id. AskVault creates a new one and returns it.

Low-confidence handling

When confidence is "low", the answer is likely wrong or incomplete. The bot probably couldn't find the right content in the knowledge base. Two options:

  1. Display the bot's answer with a disclaimer. "I'm not sure about this one; let me know if you need a human to help."
  2. Route the conversation to a human via webhook. Subscribe to the knowledge.gap_detected webhook event and route low-confidence queries to your support inbox.

In strict mode ("strictness": "strict"), the bot refuses to answer when confidence is low. The answer field becomes something like "I don't have information about that in my knowledge base. Would you like me to connect you with a human?"
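
Option 1 can be sketched as a branch on the confidence field; the disclaimer text and the needs_human flag are application choices, not API behavior:

```python
DISCLAIMER = "I'm not sure about this one; let me know if you need a human to help."

def present_answer(resp: dict) -> tuple[str, bool]:
    """Return (text to show, needs_human). Appends a disclaimer on low confidence."""
    if resp.get("confidence") == "low":
        return f'{resp["answer"]}\n\n{DISCLAIMER}', True
    return resp["answer"], False
```

The boolean lets the same code path feed option 2: when it is true, forward the conversation to your support inbox instead of (or in addition to) showing the answer.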

Document scoping

To restrict retrieval to specific documents (without changing the workspace's overall allowlist):

{
  "workspace_id": "wt_xxx",
  "message": "What's your warranty policy?",
  "document_ids": ["doc_warranty_us", "doc_warranty_eu"]
}

The bot can only retrieve from those documents. Useful for jurisdiction-scoped queries, customer-specific knowledge subsets, and audience-restricted content.
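
For jurisdiction-scoped queries, one pattern is to keep a mapping from region to document IDs and build the payload from it. The mapping and IDs below are hypothetical, for illustration only:

```python
# Hypothetical mapping from customer region to document IDs.
WARRANTY_DOCS = {
    "us": ["doc_warranty_us"],
    "eu": ["doc_warranty_eu"],
}

def scoped_query(workspace_id: str, message: str, region: str) -> dict:
    """Restrict retrieval to the warranty documents for one jurisdiction."""
    return {
        "workspace_id": workspace_id,
        "message": message,
        "document_ids": WARRANTY_DOCS[region],
    }
```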

HTTP status codes

| Status | Meaning |
| --- | --- |
| 200 | Query processed successfully. Response body is JSON. |
| 400 | Malformed request. Missing required field or invalid value. |
| 401 | Missing or invalid API key. |
| 403 | API key doesn't have access to the requested workspace. |
| 404 | Workspace not found. |
| 422 | Validation error. The body parsed but a field violates a constraint. |
| 429 | Rate limit exceeded. Check the Retry-After header. |
| 500 | Server error. Retry with exponential backoff. |
| 503 | Service temporarily unavailable. Retry. |

Every error response includes a JSON body with a detail field describing the problem:

{ "detail": "workspace_id is required" }

Log detail for debugging; never show raw error bodies to end users.
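
The retry guidance above can be sketched as a generic wrapper. To keep it self-contained, send is any callable returning (status, retry_after_seconds_or_None, body); in practice it would issue the HTTP request and read the Retry-After header. The sleep parameter is injectable for testing:

```python
import time

RETRYABLE = {429, 500, 503}

def query_with_retry(send, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Retry 429/500/503 with exponential backoff, preferring the server's
    Retry-After hint (in seconds) when one is provided."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(retry_after if retry_after is not None else base_delay * (2 ** attempt))
```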

Performance

Typical latency for a standard workspace:

  • Cold workspace, first request: 800 to 1,500 ms
  • Warm workspace: 400 to 900 ms
  • Workspace with hybrid retrieval enabled (higher tier): 600 to 1,200 ms

If you need sub-300 ms first-token latency for live UX, switch to the streaming endpoint.

Common pitfalls

Empty sources array on every query. Your workspace has no indexed content. Check Knowledge Hub.

Same query returns different answers across calls. Set temperature to 0. The default of 0.3 introduces some variability.

Bot answers from training data instead of your content. Set strictness to "strict". The bot will refuse to answer when content isn't in the KB.

Conversation context doesn't carry across turns. You're not passing the conversation_id from the previous response. Round-trip it.

Latency consistently above 2 seconds. The workspace is cold, or you're hitting the wrong AskVault region. Pre-warm with a no-op query at app start.
