
POST /v1/query, the chat query endpoint reference


Endpoint

POST https://api.askvault.co/v1/query

Synchronous JSON response. For streaming, use POST /v1/query/stream instead.

Authentication: Authorization: Bearer ak_xxx. See authentication.

Minimal request

curl -X POST https://api.askvault.co/v1/query \
  -H "Authorization: Bearer ak_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "wt_xxx",
    "message": "What is your refund policy?"
  }'

Request parameters

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| workspace_id | string | Yes | The workspace to query. Format wt_xxxxxx_xxx. Find it under Dashboard > Settings > General. |
| message | string | Yes | The user's question. 1 to 4,000 characters. Longer messages are truncated. |
| top_k | integer | No | Number of context chunks to retrieve. 1 to 10. Default 5. Higher values use more tokens and can dilute the answer. |
| temperature | number | No | Answer creativity from 0.0 to 1.0. Default 0.3. Keep low for factual support; raise for creative tasks. |
| strictness | string | No | "strict" to refuse answers not in the knowledge base. "helpful" (default) to combine the KB with reasoning when the KB is insufficient. |
| document_ids | string[] | No | Restrict retrieval to specific document IDs. URL-allowlist behavior: the bot can't see content outside these documents in this query. |
| conversation_id | string | No | Continue a multi-turn conversation. Prior turns become context for the new query. |
| user_id | string | No | Anonymous end-user identifier. Enables per-user rate limiting, conversation isolation, and analytics. |
| verification_token | string | No | HMAC-signed user_id for identity-verified queries. Enables audience-based scoping. |
| audience | string[] | No | Audience set this query runs under. Only documents tagged with at least one matching audience are retrievable. Requires verification_token. |
| metadata | object | No | Free-form metadata stored on the conversation. Useful for tracing or tagging. Max 8 KB. |
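
As a sketch of how verification_token and audience fit together: the exact signing scheme isn't specified on this page, so the following assumes an HMAC-SHA256 signature over user_id with a workspace signing secret, hex-encoded. The secret value and field layout here are illustrative, not the documented scheme.

```python
import hashlib
import hmac

def make_verification_token(user_id: str, signing_secret: str) -> str:
    """Sign a user_id with a workspace signing secret (assumed HMAC-SHA256, hex)."""
    return hmac.new(
        signing_secret.encode("utf-8"),
        user_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

# The token rides alongside user_id and audience in the request body:
payload = {
    "workspace_id": "wt_xxx",
    "message": "What plans include SSO?",
    "user_id": "user_123",
    "verification_token": make_verification_token("user_123", "example_secret"),
    "audience": ["enterprise"],
}
```

Signing must happen on your server, never in the browser, or end users could mint tokens for arbitrary identities.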

Response schema

{
  "answer": "Refunds are available within 30 days of purchase. Submit a request at acme.co/refunds with your order number.",
  "sources": [
    {
      "document_id": "doc_a1b2c3",
      "document_title": "Refund policy",
      "url": "https://acme.co/policies/refunds",
      "relevance_score": 0.94,
      "snippet": "Refunds are available within 30 days of purchase..."
    }
  ],
  "confidence": "high",
  "model": "askvault-standard",
  "tokens_used": 187,
  "latency_ms": 612,
  "request_id": "req_5b45ff_xxx",
  "conversation_id": "conv_5b45ff_xxx"
}
| Field | Type | Description |
| --- | --- | --- |
| answer | string | The grounded answer text, sourced from the sources array. |
| sources | array | The chunks retrieved to ground the answer, sorted by relevance_score descending. |
| sources[].document_id | string | Stable ID of the source document. Use for source-link rendering. |
| sources[].document_title | string | Human-readable document title. |
| sources[].url | string | URL of the source document, if it has one. Empty string for uploaded files. |
| sources[].relevance_score | number | Cosine similarity score between 0 and 1. Higher is more relevant. |
| sources[].snippet | string | First 200 characters of the chunk text. |
| confidence | string | "high", "medium", or "low". Low confidence means the answer might be wrong. |
| model | string | The model that generated the answer. Always askvault-standard from the API's perspective. |
| tokens_used | integer | Total tokens consumed by this query (input + output). |
| latency_ms | integer | Server-side latency for this query in milliseconds. |
| request_id | string | Unique ID for this request. Store it in your logs for debugging. |
| conversation_id | string | The conversation ID. Pass it back in subsequent queries to continue the conversation. |
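
A minimal sketch of rendering a response for an end user, handling the empty-string url that uploaded files produce (the formatting style is a choice, not part of the API):

```python
def render_answer(resp: dict) -> str:
    """Format the answer plus a 'Sources' footer from a /v1/query response."""
    lines = [resp["answer"]]
    sources = resp.get("sources", [])
    if sources:
        lines.append("")
        lines.append("Sources:")
        for src in sources:
            title = src["document_title"]
            # url is an empty string for uploaded files, so fall back to the title alone
            lines.append(f"- {title} ({src['url']})" if src["url"] else f"- {title}")
    return "\n".join(lines)
```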

Conversation continuity

To continue a multi-turn conversation, pass the conversation_id from the previous response:

{
  "workspace_id": "wt_xxx",
  "message": "Can I get a refund on a sale item?",
  "conversation_id": "conv_5b45ff_xxx"
}

AskVault loads prior turns of the conversation as context for the new query. The retrieval and answer both factor in the conversation history.

To start a fresh conversation, omit conversation_id. AskVault creates a new one and returns it.

Low-confidence handling

When confidence is "low", the answer is likely wrong or incomplete. The bot probably couldn't find the right content in the knowledge base. Two options:

  1. Display the bot's answer with a disclaimer. "I'm not sure about this one; let me know if you need a human to help."
  2. Route the conversation to a human via webhook. Subscribe to the knowledge.gap_detected webhook event and route low-confidence queries to your support inbox.

In strict mode ("strictness": "strict"), the bot refuses to answer when confidence is low. The answer field becomes something like "I don't have information about that in my knowledge base. Would you like me to connect you with a human?"
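
Option 1 can be sketched as a branch on the confidence field; the disclaimer text and the needs_human flag are application choices, not API behavior:

```python
DISCLAIMER = "I'm not sure about this one; let me know if you need a human to help."

def present_answer(resp: dict) -> tuple[str, bool]:
    """Return (text to show, needs_human). Appends a disclaimer on low confidence."""
    if resp.get("confidence") == "low":
        return f'{resp["answer"]}\n\n{DISCLAIMER}', True
    return resp["answer"], False
```

The boolean lets the same code path feed option 2: when it is true, forward the conversation to your support inbox instead of (or in addition to) showing the answer.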

Document scoping

To restrict retrieval to specific documents (without changing the workspace's overall allowlist):

{
  "workspace_id": "wt_xxx",
  "message": "What's your warranty policy?",
  "document_ids": ["doc_warranty_us", "doc_warranty_eu"]
}

The bot can only retrieve from those documents. Useful for jurisdiction-scoped queries, customer-specific knowledge subsets, and audience-restricted content.
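
For jurisdiction-scoped queries, one pattern is to keep a mapping from region to document IDs and build the payload from it. The mapping and IDs below are hypothetical, for illustration only:

```python
# Hypothetical mapping from customer region to document IDs.
WARRANTY_DOCS = {
    "us": ["doc_warranty_us"],
    "eu": ["doc_warranty_eu"],
}

def scoped_query(workspace_id: str, message: str, region: str) -> dict:
    """Restrict retrieval to the warranty documents for one jurisdiction."""
    return {
        "workspace_id": workspace_id,
        "message": message,
        "document_ids": WARRANTY_DOCS[region],
    }
```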

HTTP status codes

| Status | Meaning |
| --- | --- |
| 200 | Query processed successfully. Response body is JSON. |
| 400 | Malformed request. Missing required field or invalid value. |
| 401 | Missing or invalid API key. |
| 403 | API key doesn't have access to the requested workspace. |
| 404 | Workspace not found. |
| 422 | Validation error. The body parsed but a field violates a constraint. |
| 429 | Rate limit exceeded. Check the Retry-After header. |
| 500 | Server error. Retry with exponential backoff. |
| 503 | Service temporarily unavailable. Retry. |

Every error response includes a JSON body with a detail field describing the problem:

{ "detail": "workspace_id is required" }

Log detail for debugging; never show raw error bodies to end users.
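
The retry guidance above can be sketched as a generic wrapper. To keep it self-contained, send is any callable returning (status, retry_after_seconds_or_None, body); in practice it would issue the HTTP request and read the Retry-After header. The sleep parameter is injectable for testing:

```python
import time

RETRYABLE = {429, 500, 503}

def query_with_retry(send, max_attempts: int = 4, base_delay: float = 0.5, sleep=time.sleep):
    """Retry 429/500/503 with exponential backoff, preferring the server's
    Retry-After hint (in seconds) when one is provided."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        sleep(retry_after if retry_after is not None else base_delay * (2 ** attempt))
```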

Performance

Typical latency for a standard workspace:

  • Cold workspace, first request: 800 to 1,500 ms
  • Warm workspace: 400 to 900 ms
  • Workspace with hybrid retrieval enabled (higher tier): 600 to 1,200 ms

If you need sub-300 ms first-token latency for live UX, switch to the streaming endpoint.

Common pitfalls

Empty sources array on every query. Your workspace has no indexed content. Check Knowledge Hub.

Same query returns different answers across calls. Set temperature to 0. The default of 0.3 introduces some variability.

Bot answers from training data instead of your content. Set strictness to "strict". The bot will refuse to answer when content isn't in the KB.

Conversation context doesn't carry across turns. You're not passing the conversation_id from the previous response. Round-trip it.

Latency consistently above 2 seconds. The workspace is cold, or you're hitting the wrong AskVault region. Pre-warm with a no-op query at app start.
