POST /v1/query endpoint
Endpoint
```
POST https://api.askvault.co/v1/query
```

Authorization header: `Bearer ak_live_xxx`.
Request body
```json
{
  "workspace_id": "ws_xxx",
  "query": "What is your refund policy?",
  "conversation_id": "conv_xxx",
  "user": {
    "user_id": "u_123",
    "hash": "abc123...",
    "name": "Alice",
    "email": "alice@example.co"
  },
  "metadata": {
    "page_url": "https://yoursite.co/pricing"
  }
}
```

Fields:

- `workspace_id` (required). Your workspace ID.
- `query` (required). The question, up to 1,000 characters.
- `conversation_id` (optional). For multi-turn conversations.
- `user` (optional). For identity-verified queries.
- `metadata` (optional). Context such as page URL and custom attributes.
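A request body with these constraints can be assembled client-side before sending. The sketch below is illustrative, not an official SDK; `build_query_payload` and `MAX_QUERY_CHARS` are hypothetical names.

```python
# Sketch: build and validate a /v1/query request body before sending.
# build_query_payload is a hypothetical helper, not part of any official SDK.

MAX_QUERY_CHARS = 1000  # documented query length limit


def build_query_payload(workspace_id, query, conversation_id=None,
                        user=None, metadata=None):
    """Return a request-body dict, enforcing the documented constraints."""
    if not workspace_id or not query:
        raise ValueError("workspace_id and query are required")
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError(f"query exceeds {MAX_QUERY_CHARS} characters")
    payload = {"workspace_id": workspace_id, "query": query}
    # Optional fields are omitted entirely when unset.
    if conversation_id:
        payload["conversation_id"] = conversation_id
    if user:
        payload["user"] = user
    if metadata:
        payload["metadata"] = metadata
    return payload
```

Validating the 1,000-character limit locally avoids a round trip that would end in a 400.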
Response
```json
{
  "answer": "Our refund policy allows refunds within 14 days...",
  "sources": [
    {
      "title": "Refund Policy",
      "url": "https://yoursite.co/refund-policy",
      "chunk": "...",
      "score": 0.92
    }
  ],
  "conversation_id": "conv_xxx",
  "skills_fired": ["knowledge_search"],
  "latency_ms": 1850,
  "tokens_used": 450
}
```

The sources array contains 1 to 5 retrieved chunks. Each has the source title, URL, exact chunk text, and relevance score.
Example
```bash
curl -X POST https://api.askvault.co/v1/query \
  -H "Authorization: Bearer ak_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{"workspace_id":"ws_xxx","query":"What is your refund policy?"}'
```

Returns the answer with sources, typically within about 2 seconds.
Multi-turn conversations
Pass the same conversation_id across calls:
- First call. Omit `conversation_id`; AskVault generates one and returns it.
- Subsequent calls. Pass the returned `conversation_id`.
- Context carries over: the bot remembers earlier turns.
About 80% of real conversations are multi-turn.
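The threading above can be wrapped in a small client-side helper. A minimal sketch, assuming a pluggable `send` callable standing in for your HTTP transport (requests, fetch, etc.); `Conversation` is an illustrative name, not an official SDK class.

```python
# Sketch: threading conversation_id across /v1/query calls.
# `send` is any callable that takes a payload dict and returns the
# parsed JSON response as a dict.

class Conversation:
    def __init__(self, workspace_id, send):
        self.workspace_id = workspace_id
        self.send = send
        self.conversation_id = None

    def ask(self, query):
        payload = {"workspace_id": self.workspace_id, "query": query}
        if self.conversation_id:
            # Subsequent calls pass the server-generated ID back.
            payload["conversation_id"] = self.conversation_id
        response = self.send(payload)
        # First call: remember the ID AskVault generated for this thread.
        self.conversation_id = response["conversation_id"]
        return response["answer"]
```

The first `ask` sends no `conversation_id`; every later one reuses the ID the server returned.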
Identity-verified queries
For logged-in visitors:
- Pass `user.hash` (an HMAC of `user_id` computed with your identity secret).
- The bot applies plan-aware retrieval and identity-gated skills.
- See identity verification for hash computation.
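A minimal sketch of the hash computation, assuming HMAC-SHA256 with a hex digest; the identity verification docs define the exact digest and encoding, so confirm there before relying on this.

```python
import hashlib
import hmac


def compute_user_hash(user_id: str, identity_secret: str) -> str:
    """HMAC of user_id keyed with your identity secret.

    Assumes HMAC-SHA256 with a hex digest; verify the exact scheme in
    the identity verification docs.
    """
    return hmac.new(identity_secret.encode("utf-8"),
                    user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()
```

Compute this server-side only; shipping the identity secret to the browser defeats verification.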
Rate limits
Per API key:
- 60 requests per minute default.
- 10,000 requests per day default.
- Customize per key under API Keys settings.
Rate-limited responses return HTTP 429 with a Retry-After header.
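Clients should honor that header rather than retrying immediately. A sketch of the parsing side, assuming the delta-seconds form of Retry-After; `retry_after_seconds` is a hypothetical helper.

```python
# Sketch: how long to wait after an HTTP 429, from the Retry-After header.
# Handles the delta-seconds form ("Retry-After: 30"); the HTTP-date form
# is left to the caller.

def retry_after_seconds(headers, default=1.0):
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        # HTTP-date form or malformed value: fall back to a safe default.
        return default
```

Wire the returned delay into your HTTP client's retry loop before re-sending the query.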
Errors
| Code | Meaning | Action |
|---|---|---|
| 200 | Success | Use answer |
| 400 | Malformed request | Check JSON structure |
| 401 | Auth failed | Check Bearer token |
| 403 | Quota exceeded | Upgrade plan |
| 429 | Rate limited | Wait per Retry-After |
| 500 | Server error | Retry with exponential backoff |
See API errors for full code reference.
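The "retry with exponential backoff" advice for 500s can be sketched as below. The names and defaults are illustrative; `call` stands in for one attempt against the endpoint.

```python
import random
import time

# Sketch: retry on HTTP 500 with exponential backoff plus jitter.
# `call` is any zero-argument function returning (status, body).


def with_backoff(call, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    for attempt in range(max_attempts):
        status, body = call()
        if status != 500:
            return status, body
        if attempt < max_attempts - 1:
            # 0.5 s, 1 s, 2 s, ... plus jitter to avoid thundering herds.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return status, body
```

Only 500s are retried here; 4xx responses indicate a request problem that retrying will not fix.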
Latency expectations
- p50: about 1.5 seconds.
- p95: about 3 seconds.
- p99: about 5 seconds.
Factors:
- Knowledge size. Larger workspaces add 200 to 500 ms.
- Identity verification overhead. About 50 ms.
- Multi-skill chains. Each skill adds 300 to 2000 ms.
SDK availability
Today, no official SDKs; use any HTTP client (curl, fetch, requests, etc.).
Official SDKs (JS, Python) on the roadmap.
Limits
- Query length. 1,000 characters.
- Response timeout. 30 seconds (then HTTP 504).
- Concurrent requests per key. 50.
- Per-day cap. Plan-dependent.
Common pitfalls
- 400 with no detail. A required field is missing. Check `workspace_id` and `query`.
- 401 despite a valid key. Wrong workspace ownership. The key must match the workspace.
- Empty answer. Knowledge gap; the bot says "I don't know". Add content covering the topic.
- Hash verification fails. Wrong identity secret. Check your HMAC computation.
FAQ
How is this different from /v1/query/stream?
Streaming returns tokens as generated; useful for chat UIs. Standard endpoint returns full response at once.
Can I cache responses?
Yes, for identical queries against stable knowledge. Use ETag headers.
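One way to use those headers: cache each answer alongside its ETag and revalidate with If-None-Match on repeat queries. A sketch under that assumption; `ETagCache` and the `send` signature are illustrative, not part of any official SDK.

```python
# Sketch: ETag-based caching of identical queries.
# `send(etag)` performs the HTTP call, passing the etag as If-None-Match
# when not None, and returns (status, etag, body); 304 means unchanged.

class ETagCache:
    def __init__(self):
        self.store = {}  # cache key -> (etag, body)

    def fetch(self, key, send):
        cached = self.store.get(key)
        status, etag, body = send(cached[0] if cached else None)
        if status == 304 and cached:
            return cached[1]          # server confirms our copy is fresh
        self.store[key] = (etag, body)
        return body
```

A revalidated 304 still involves a round trip, but skips answer generation, so it returns faster and transfers less.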
Does this count against my query quota?
Yes. Each call counts as 1 query.