Getting started with the AskVault API

Written by Aashiq, Founder, AskVault · Reviewed by Aashiq

Last updated: May 15, 2026 · 5 min read

What you can build with the AskVault API

The AskVault REST API exposes the same RAG pipeline that powers the AskVault widget, WhatsApp, Telegram, and live chat. With one HTTP endpoint you can build a custom chat interface, embed AI search inside your own product, drive an internal helpdesk bot, or pipe answers into Slack, Discord, or a customer-success dashboard.

Every API response is grounded in the documents you've indexed in your workspace. The API doesn't hallucinate. If the answer isn't in your knowledge base, it returns confidence: "low" and a graceful fallback message so you can route low-confidence queries to a human via webhook.
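
The low-confidence routing described above can be sketched in a few lines. This is a minimal illustration, not an official client: it assumes the parsed JSON body carries the `confidence` and `answer` fields, and that the fallback message arrives in `answer`. The `escalate_to_human` callback is a hypothetical stand-in for your own webhook or ticketing call.

```python
from typing import Callable


def handle_answer(response: dict, escalate_to_human: Callable[[dict], None]) -> str:
    """Return the text to show the user, escalating low-confidence answers.

    `response` is the parsed JSON body of a /v1/query call.
    `escalate_to_human` is a hypothetical hook (for example, a function that
    posts to your own webhook); it is not part of the AskVault API.
    """
    if response.get("confidence") == "low":
        # Assumption: the fallback message is delivered in the answer field,
        # so it can still be displayed while a human picks up the query.
        escalate_to_human(response)
    return response.get("answer", "")


# Example with stubbed responses and an in-memory escalation queue.
escalated = []
low = {"answer": "I couldn't find that in the docs.", "confidence": "low"}
high = {"answer": "Go to /forgot-password.", "confidence": "high"}

shown_low = handle_answer(low, escalated.append)
shown_high = handle_answer(high, escalated.append)
```

Only the low-confidence response ends up in `escalated`; both calls still return text you can show the user.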

There are two endpoints. POST /v1/query returns a synchronous JSON response. POST /v1/query/stream streams the response as Server-Sent Events. Streaming starts in under 300 ms and completes in 1.5 to 4 seconds depending on response length.

Before you start

You need three things to make your first API call:

An AskVault workspace with content indexed. Sign up at askvault.co/signup (free) and run the onboarding wizard to crawl a website or upload a PDF. Indexing 50 pages takes about 90 seconds.
An API key in the format ak_xxx. Generate one under Dashboard > API Keys.
Any HTTP client. curl, Postman, the Python requests library, or whatever you already use.

The free plan includes API access. Paid plans raise the query quota: Starter is ₹2,499 a month for 3,000 queries, Growth is ₹4,999 a month for 15,000 queries, and Business is ₹8,499 a month for 50,000 queries.

Generate an API key

Open Dashboard > API Keys.
Click Create new key.
Give it a recognizable name like production-website, staging, or mobile-app.
Pick a rate limit. The default matches your plan; tighter per-key limits are useful when you're exposing AskVault to end users from a multi-tenant SaaS.
Click Generate. The key (ak_5b45ff_...) is shown once. Copy it now. AskVault stores only a SHA-256 hash; we can't recover the original.

Treat API keys like passwords. Store them in environment variables (ASKVAULT_API_KEY), not in source code. Rotate them every quarter. If a key leaks, revoke it immediately under Dashboard > API Keys > Revoke.
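
One way to follow this advice in Python is to read the key from the environment at startup and fail fast if it is missing. This is a sketch: the helper names `load_api_key` and `auth_headers` are illustrative, and the `ak_` prefix check is based only on the key format shown in this guide.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Read the AskVault API key from the environment, failing fast if unset."""
    key = env.get("ASKVAULT_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ASKVAULT_API_KEY is not set")
    if not key.startswith("ak_"):
        # Keys in this guide use the ak_ prefix; anything else is probably a
        # copy-paste mistake (for example, a workspace ID).
        raise RuntimeError("ASKVAULT_API_KEY does not look like an AskVault key")
    return key


def auth_headers(key: str) -> dict:
    """Build the headers used by every request in this guide."""
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Passing the environment as a parameter keeps the function easy to test without touching real credentials.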

Your first request

Send your first chat query. Replace ak_xxx with your real API key and wt_xxx with your workspace ID (visible in Dashboard > Settings > General). The same request is shown with curl, Python, and JavaScript.

curl -X POST https://api.askvault.co/v1/query \
-H "Authorization: Bearer ak_xxx" \
-H "Content-Type: application/json" \
-d '{
  "workspace_id": "wt_xxx",
  "message": "How do I reset my password?",
  "top_k": 5
}'

import os

import requests

response = requests.post(
    "https://api.askvault.co/v1/query",
    headers={
        "Authorization": f"Bearer {os.environ['ASKVAULT_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "workspace_id": "wt_xxx",
        "message": "How do I reset my password?",
        "top_k": 5,
    },
    timeout=15,
)
response.raise_for_status()
print(response.json())

const response = await fetch("https://api.askvault.co/v1/query", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.ASKVAULT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    workspace_id: "wt_xxx",
    message: "How do I reset my password?",
    top_k: 5,
  }),
});
// fetch does not reject on 4xx/5xx responses, so check the status explicitly
if (!response.ok) throw new Error(`AskVault request failed: HTTP ${response.status}`);
const data = await response.json();
console.log(data);

A successful response looks like this:

{
  "answer": "To reset your password, go to askvault.co/forgot-password, enter your email, and click the reset link we send you. The link expires in 60 minutes.",
  "sources": [
    {
      "document_id": "doc_a1b2c3",
      "document_title": "Password reset flow",
      "url": "https://docs.askvault.co/account/password-reset/",
      "relevance_score": 0.94,
      "snippet": "Visit /forgot-password, enter your email..."
    }
  ],
  "confidence": "high",
  "model": "askvault-standard",
  "tokens_used": 187,
  "latency_ms": 612,
  "request_id": "req_5b45ff_xxx"
}

The sources array is what makes AskVault answers verifiable. Every claim in answer traces back to a snippet you indexed. Store the request_id in your logs. If a customer ever complains about an answer, you can replay the exact retrieval in the dashboard's request inspector.
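
As a rough sketch of putting `sources` and `request_id` to use, the function below renders an answer with numbered citations and logs the request ID. The function name and the logger name `askvault` are illustrative choices, and the sample dictionary is a trimmed copy of the response above.

```python
import logging

logger = logging.getLogger("askvault")


def render_with_citations(response: dict) -> str:
    """Format an answer followed by a numbered list of its sources."""
    lines = [response["answer"], ""]
    for i, source in enumerate(response.get("sources", []), start=1):
        lines.append(f"[{i}] {source['document_title']} ({source['url']})")
    # Keep the request_id so the call can be found in the request inspector.
    logger.info(
        "askvault request_id=%s confidence=%s",
        response.get("request_id"),
        response.get("confidence"),
    )
    return "\n".join(lines).rstrip()


sample = {
    "answer": "To reset your password, go to askvault.co/forgot-password.",
    "sources": [
        {
            "document_title": "Password reset flow",
            "url": "https://docs.askvault.co/account/password-reset/",
        }
    ],
    "confidence": "high",
    "request_id": "req_5b45ff_xxx",
}
text = render_with_citations(sample)
```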

Stream responses with SSE

For real-time UX (typing-effect chat interfaces, low-perceived-latency interactions), use the streaming endpoint. It uses Server-Sent Events over HTTP.

const response = await fetch("https://api.askvault.co/v1/query/stream", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.ASKVAULT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    workspace_id: "wt_xxx",
    message: "How does pricing work?",
  }),
});

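// fetch does not reject on 4xx/5xx responses, so check the status explicitly
if (!response.ok) throw new Error(`AskVault request failed: HTTP ${response.status}`);

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ""; // holds a partial line until the rest of it arrives

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split(/\r?\n/);
  buffer = lines.pop(); // the last element is an incomplete line (or "")
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const event = JSON.parse(line.slice(6));
    if (event.type === "token") process.stdout.write(event.text);
    else if (event.type === "source") console.log("Source:", event.document_title);
    else if (event.type === "done") console.log("\nLatency:", event.latency_ms, "ms");
    else if (event.type === "error") console.error("\nStream error:", event);
  }
}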

Event types you'll receive: token (partial answer chunks), source (a citation as it's selected), done (final stats), and error (any failure). The full SSE schema is in the streaming reference.
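
If you consume the stream from Python, the parsing step could look like the sketch below. It assumes the same event fields as the JavaScript example (`type`, `text`, `document_title`, `latency_ms`); the helper names and the demo lines are made up for illustration, and the fields of an `error` event are not documented here, so the whole event is surfaced.

```python
import json
from typing import Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data: ` line in an SSE stream."""
    for line in lines:
        if line.startswith("data: "):
            yield json.loads(line[len("data: "):])


def collect_answer(events: Iterable[dict]) -> tuple[str, list[str]]:
    """Join token events into the full answer and gather source titles."""
    tokens, sources = [], []
    for event in events:
        if event["type"] == "token":
            tokens.append(event["text"])
        elif event["type"] == "source":
            sources.append(event["document_title"])
        elif event["type"] == "error":
            # Error fields are not specified in this guide, so raise with
            # the full event rather than guessing at a field name.
            raise RuntimeError(f"stream failed: {event}")
        elif event["type"] == "done":
            break
    return "".join(tokens), sources


# Illustrative input. With the requests library, the lines would come from a
# response opened with stream=True and read via iter_lines(), decoded to str.
demo_lines = [
    'data: {"type": "token", "text": "Plans start "}',
    "",
    'data: {"type": "token", "text": "on the free tier."}',
    "",
    'data: {"type": "source", "document_title": "Pricing"}',
    "",
    'data: {"type": "done", "latency_ms": 900}',
]
answer, source_titles = collect_answer(parse_sse_events(demo_lines))
```

Keeping the parser separate from the HTTP call makes it possible to test the event handling without a network connection.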

Request parameters

Parameter	Type	Required	Description
`workspace_id`	string	Yes	The workspace to query (found in dashboard Settings)
`message`	string	Yes	The user's question (1 to 4,000 characters)
`top_k`	integer	No	Number of context chunks to retrieve (1 to 10, default 5)
`temperature`	number	No	Answer creativity (0.0 to 1.0, default 0.3). Keep it low for factual support answers
`strictness`	string	No	`"strict"` to refuse answers not in KB, `"helpful"` (default) to combine KB with reasoning
`document_ids`	string[]	No	Restrict retrieval to specific document IDs (URL allowlist behavior)
`conversation_id`	string	No	Continues a multi-turn conversation by passing prior messages as context
`user_id`	string	No	Anonymous end-user identifier. Enables per-user rate limiting and analytics
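
To catch bad input before it reaches the API, you could validate parameters client-side against the ranges in the table. The sketch below is one way to do that; `build_query_payload` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def build_query_payload(
    workspace_id: str,
    message: str,
    *,
    top_k: int = 5,
    temperature: float = 0.3,
    strictness: str = "helpful",
    **optional,
) -> dict:
    """Validate parameters against the documented ranges and build the JSON body."""
    if not 1 <= len(message) <= 4000:
        raise ValueError("message must be 1 to 4,000 characters")
    if not 1 <= top_k <= 10:
        raise ValueError("top_k must be between 1 and 10")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if strictness not in ("strict", "helpful"):
        raise ValueError('strictness must be "strict" or "helpful"')

    # Only the optional parameters listed in the table are passed through.
    allowed = {"document_ids", "conversation_id", "user_id"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {
        "workspace_id": workspace_id,
        "message": message,
        "top_k": top_k,
        "temperature": temperature,
        "strictness": strictness,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_query_payload(
    "wt_xxx", "How do I reset my password?", strictness="strict", user_id="u_123"
)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.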

Rate limits

The Starter plan allows 60 requests/minute and 3,000 requests/day per API key. Growth allows 200 requests/minute and 15,000/day. Business allows 1,000 requests/minute and 50,000/day. Enterprise is custom.

Every response includes rate-limit headers so you can avoid hitting the cap:

X-RateLimit-Limit-Minute: 60
X-RateLimit-Remaining-Minute: 47
X-RateLimit-Limit-Day: 3000
X-RateLimit-Remaining-Day: 2845
X-RateLimit-Reset-Day: 1747353600

When you exceed a limit, the API returns HTTP 429 Too Many Requests with a Retry-After header that gives the wait time in seconds. Back off for that long, then retry. Full reference: rate limits per plan.
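
The back-off behavior can be sketched as a small retry wrapper. This is an illustration under a few assumptions: `send` is any zero-argument callable that returns an object with `status_code` and `headers` (as `requests.Response` does), Retry-After is a number of seconds, and the one-second fallback and three-attempt cap are arbitrary defaults.

```python
import time


def post_with_backoff(send, max_attempts: int = 3, sleep=time.sleep):
    """Call send() until it stops returning HTTP 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            break
        if attempt < max_attempts - 1:
            # Wait for the number of seconds the server asked for,
            # falling back to 1 second if the header is absent.
            sleep(float(response.headers.get("Retry-After", 1)))
    return response


class FakeResponse:
    """Stand-in for an HTTP response, used only for this demonstration."""

    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


queue = [FakeResponse(429, {"Retry-After": "2"}), FakeResponse(200)]
waits = []
result = post_with_backoff(lambda: queue.pop(0), sleep=waits.append)
```

With the requests library, `send` could be a lambda that wraps the `requests.post` call from the first example. Injecting `sleep` keeps the demo instant.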

Authentication errors

The most common failures and what to check:

Status	Meaning	Fix
401	Missing or malformed Authorization header	Header must be exactly `Bearer ak_xxx`. No "Token" or "Key" prefix.
401	Invalid or revoked API key	The key was deleted or never existed. Generate a new one in the dashboard.
403	Workspace not owned by this API key	The `workspace_id` doesn't belong to the API key's owner. Use a key from the same workspace owner.
404	Workspace not found	The `workspace_id` is wrong or the workspace was deleted.

Every error response includes a JSON body with a detail field describing the problem. Log it for debugging, but never show raw error bodies to end users.
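
One way to keep raw error bodies out of the UI is to produce two messages per failure: a generic one for the end user and a detailed one for your logs. The sketch below assumes the error body is already parsed into a dictionary with a `detail` key; the function name and the user-facing wording are placeholders.

```python
def user_safe_message(status_code: int, body: dict) -> tuple[str, str]:
    """Return (message_for_user, message_for_logs) for a failed request."""
    log_message = f"AskVault error {status_code}: {body.get('detail', 'no detail')}"
    if status_code in (401, 403, 404):
        # Configuration problems: the end user cannot fix these.
        user_message = "The assistant is temporarily unavailable."
    elif status_code == 429:
        user_message = "The assistant is busy. Please try again in a moment."
    else:
        user_message = "Something went wrong. Please try again."
    return user_message, log_message


user_msg, log_msg = user_safe_message(401, {"detail": "Invalid or revoked API key"})
```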

FAQ

How is API usage billed?

Each /v1/query and /v1/query/stream request that returns a successful answer counts as 1 query against your plan's monthly quota. Failed requests (4xx/5xx) don't count. Indexing operations (uploading documents, crawling URLs) are billed separately by content volume in MB.

Can I use the API from a browser?

Not directly. The API key would be exposed in client-side code. For browser usage, either proxy through your own backend, or use the embeddable widget which authenticates with a public workspace token plus per-domain rate limiting.

How do I support multiple end users?

Pass a unique user_id per end user. AskVault uses it for per-user rate limiting, conversation-history isolation, and analytics. If you're building a multi-tenant SaaS product, also use the conversation_id field to keep each user's conversation context isolated.
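
Because user_id is meant to be an anonymous identifier, you may not want to send internal account IDs or emails as-is. One possible approach, shown here as a sketch and not as an AskVault requirement, is to derive a stable pseudonymous ID with a keyed hash. The `u_` prefix, the 32-character length, and the secret are arbitrary choices for illustration.

```python
import hashlib
import hmac


def anonymous_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible identifier from an internal user ID."""
    digest = hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; the prefix is arbitrary.
    return f"u_{digest[:32]}"


# Example only: load the real secret from your own secret store.
uid = anonymous_user_id("customer-42", b"example-secret")
payload = {
    "workspace_id": "wt_xxx",
    "message": "How do I reset my password?",
    "user_id": uid,
}
```

The same internal ID always maps to the same user_id, so per-user rate limits and analytics stay consistent across requests.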

What's the maximum message length?

4,000 characters. Longer messages get truncated. For document analysis ("summarize this 50-page PDF"), upload the document to the knowledge base instead. The agent reads it via retrieval, not as a raw prompt.

Does the API have an SDK?

Official JavaScript and Python clients are on the roadmap. Until they ship, you can generate a client in any language from the OpenAPI spec at openapi.askvault.co/openapi.json.
