
POST /v1/query endpoint


Endpoint

POST https://api.askvault.co/v1/query

Authorization header: Bearer ak_live_xxx.

Request body

{
  "workspace_id": "ws_xxx",
  "query": "What is your refund policy?",
  "conversation_id": "conv_xxx",
  "user": {
    "user_id": "u_123",
    "hash": "abc123...",
    "name": "Alice",
    "email": "alice@example.co"
  },
  "metadata": {
    "page_url": "https://yoursite.co/pricing"
  }
}

Fields:

  • workspace_id (required). Your workspace ID.
  • query (required). The question, up to 1,000 characters.
  • conversation_id (optional). For multi-turn conversations.
  • user (optional). For identity-verified queries.
  • metadata (optional). Context like page URL, custom attributes.

Response

{
  "answer": "Our refund policy allows refunds within 14 days...",
  "sources": [
    {
      "title": "Refund Policy",
      "url": "https://yoursite.co/refund-policy",
      "chunk": "...",
      "score": 0.92
    }
  ],
  "conversation_id": "conv_xxx",
  "skills_fired": ["knowledge_search"],
  "latency_ms": 1850,
  "tokens_used": 450
}

The sources array contains 1 to 5 retrieved chunks. Each has the source title, URL, exact chunk text, and relevance score.
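
Once decoded, the response is a plain dict; a common pattern is to cite the highest-scoring source alongside the answer. The second source below is invented for illustration.

```python
# Decoded /v1/query response (second source added for illustration).
response = {
    "answer": "Our refund policy allows refunds within 14 days...",
    "sources": [
        {"title": "Refund Policy", "url": "https://yoursite.co/refund-policy",
         "chunk": "...", "score": 0.92},
        {"title": "Terms of Service", "url": "https://yoursite.co/terms",
         "chunk": "...", "score": 0.71},
    ],
}

# Pick the most relevant chunk by score and build a cited answer.
best = max(response["sources"], key=lambda s: s["score"])
citation = f'{response["answer"]} (source: {best["title"]}, {best["url"]})'
```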

Example

curl -X POST https://api.askvault.co/v1/query \
-H "Authorization: Bearer ak_live_xxx" \
-H "Content-Type: application/json" \
-d '{"workspace_id":"ws_xxx","query":"What is your refund policy?"}'

Typically returns the answer with sources within about 2 seconds (see Latency expectations below).

Multi-turn conversations

Pass the same conversation_id across calls:

  • First call. No conversation_id; AskVault generates one.
  • Subsequent calls. Pass the returned conversation_id.
  • Context carries: the bot remembers earlier turns.

About 80% of real conversations are multi-turn.
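
The threading of conversation_id across turns can be sketched as below. `send_query` is a stand-in for the real HTTP call, stubbed so the flow can be exercised offline.

```python
def send_query(payload: dict) -> dict:
    """Stand-in for the real HTTP call to /v1/query. It echoes back a
    conversation_id the way the API does: reuses one if supplied,
    otherwise generates one."""
    return {
        "answer": "...",
        "conversation_id": payload.get("conversation_id") or "conv_generated",
    }

def converse(workspace_id: str, questions: list[str]) -> list[dict]:
    """Send each question in turn, carrying the conversation_id forward
    so the bot keeps the context of earlier turns."""
    conversation_id = None  # first call: no conversation_id; AskVault generates one
    replies = []
    for q in questions:
        payload = {"workspace_id": workspace_id, "query": q}
        if conversation_id:
            payload["conversation_id"] = conversation_id  # subsequent calls
        reply = send_query(payload)
        conversation_id = reply["conversation_id"]  # reuse on the next turn
        replies.append(reply)
    return replies

replies = converse("ws_xxx", ["What is your refund policy?",
                              "Does it cover annual plans?"])
```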

Identity-verified queries

For logged-in visitors:

  • Pass user.hash (HMAC of user_id with your secret).
  • The bot applies plan-aware retrieval and identity-gated skills.
  • See identity verification for hash computation.
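
The hash can be computed with Python's stdlib `hmac` module. HMAC-SHA256 with hex output is an assumption in this sketch; the identity verification page defines the exact algorithm, so confirm there before shipping.

```python
import hashlib
import hmac

IDENTITY_SECRET = b"your_identity_secret"  # placeholder; keep this server-side only

def user_hash(user_id: str) -> str:
    # HMAC of user_id with your secret. SHA-256 and hex encoding are
    # assumed here; check the identity verification docs for the
    # actual algorithm.
    return hmac.new(IDENTITY_SECRET, user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# The user object for the request body.
user = {"user_id": "u_123", "hash": user_hash("u_123"), "name": "Alice"}
```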

Rate limits

Per API key:

  • 60 requests per minute default.
  • 10,000 requests per day default.
  • Customize per key under API Keys settings.

Rate-limited responses return HTTP 429 with Retry-After header.

Errors

Code  Meaning            Action
200   Success            Use answer
400   Malformed request  Check JSON structure
401   Auth failed        Check Bearer token
403   Quota exceeded     Upgrade plan
429   Rate limited       Wait per Retry-After
500   Server error       Retry with exponential backoff
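
The retryable cases (429 and 500) can be handled in one wrapper: honor Retry-After on 429, back off exponentially on 500, and fail fast on everything else. `send` is a stand-in for the HTTP call, returning a (status, headers, body) triple.

```python
import time

def query_with_retry(send, payload: dict, max_attempts: int = 4,
                     base_delay: float = 1.0) -> dict:
    """Retry wrapper for /v1/query. `send(payload)` must return
    (status_code, headers_dict, decoded_body)."""
    delay = base_delay
    for _ in range(max_attempts):
        status, headers, body = send(payload)
        if status == 200:
            return body
        if status == 429:
            # Rate limited: wait as long as the server asks.
            time.sleep(float(headers.get("Retry-After", delay)))
        elif status == 500:
            # Server error: exponential backoff.
            time.sleep(delay)
            delay *= 2
        else:
            # 400/401/403 won't succeed on retry; surface immediately.
            raise RuntimeError(f"query failed with HTTP {status}")
    raise RuntimeError("query failed after retries")
```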

See API errors for full code reference.

Latency expectations

  • p50: about 1.5 seconds.
  • p95: about 3 seconds.
  • p99: about 5 seconds.

Factors:

  • Knowledge size. Larger workspaces add 200 to 500 ms.
  • Identity verification overhead. About 50 ms.
  • Multi-skill chains. Each skill adds 300 to 2000 ms.

SDK availability

Today, no official SDKs; use any HTTP client (curl, fetch, requests, etc.).

Official SDKs (JS, Python) on the roadmap.

Limits

  • Query length. 1,000 characters.
  • Response timeout. 30 seconds (then HTTP 504).
  • Concurrent requests per key. 50.
  • Per-day cap. Plan-dependent.

Common pitfalls

400 with no detail. Missing required field. Check workspace_id and query.

401 despite valid key. Wrong workspace ownership. Key must match the workspace.

Empty answer. Knowledge gap; bot says "I don't know". Add content.

Hash verification fails. Wrong identity secret. Check HMAC computation.

FAQ

How is this different from /v1/query/stream?

Streaming returns tokens as generated; useful for chat UIs. Standard endpoint returns full response at once.

Can I cache responses?

Yes for identical queries with stable knowledge. Use ETag headers.
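
A minimal cache keyed on the exact request payload might look like this. The If-None-Match header and 304 handling are assumptions extrapolated from the ETag note above, not a documented contract.

```python
import json

class ETagCache:
    """Cache /v1/query responses per payload, revalidating with ETags."""

    def __init__(self):
        self._store = {}  # cache key -> (etag, body)

    def _key(self, payload: dict) -> str:
        return json.dumps(payload, sort_keys=True)

    def headers_for(self, payload: dict) -> dict:
        """Conditional header for this payload, if we have a cached copy."""
        cached = self._store.get(self._key(payload))
        return {"If-None-Match": cached[0]} if cached else {}

    def update(self, payload: dict, status: int, etag, body):
        """Record the response; on 304, return the cached body instead."""
        key = self._key(payload)
        if status == 304:          # unchanged: reuse the cached answer
            return self._store[key][1]
        if etag:                   # fresh answer: remember it
            self._store[key] = (etag, body)
        return body
```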

Does this count against my query quota?

Yes. Each call counts as 1 query.
