How to use the AskVault REST API as a channel
What "API as a channel" actually means
The website widget, WhatsApp, Slack, and friends are pre-built integrations. The REST API is the same retrieval-augmented agent exposed as a single HTTP endpoint. You build whatever surface you want on top.
Three patterns where the REST API is the right channel:
- Custom chat UI inside your product. You want full control over the look and behavior. The widget doesn't fit your brand or you need to integrate with internal authentication flows.
- Backend automation. A nightly job that processes incoming customer emails, generates draft replies via AskVault, and stores them in your CRM.
- Mobile app. Native iOS or Android app that calls the AskVault API directly from the device with a per-user API key.
The pricing is the same per query as the other channels. You're not paying extra for the API; you're paying for the AI work the API performs.
Setup
Five minutes from zero to first API call.
- Sign up for AskVault. Free plan works for testing.
- Index some content. Run the onboarding wizard to crawl a website or upload a PDF. Indexing 50 pages takes about 90 seconds.
- Generate an API key. Dashboard > API Keys > Create new key. Give it a name like "production-api". Copy the key (ak_xxx...) once; we only store the SHA-256 hash.
- Find your workspace ID. Dashboard > Settings > General > Workspace ID. Format is wt_5b45ff_xxx.
- Send your first request. Example below.
curl:

```shell
curl -X POST https://api.askvault.co/v1/query \
  -H "Authorization: Bearer ak_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "workspace_id": "wt_xxx",
    "message": "What is your refund policy?"
  }'
```

Python:

```python
import os

import requests

response = requests.post(
    "https://api.askvault.co/v1/query",
    headers={"Authorization": f"Bearer {os.environ['ASKVAULT_API_KEY']}"},
    json={"workspace_id": "wt_xxx", "message": "What is your refund policy?"},
    timeout=15,
)
print(response.json())
```

JavaScript:

```javascript
const response = await fetch("https://api.askvault.co/v1/query", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.ASKVAULT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    workspace_id: "wt_xxx",
    message: "What is your refund policy?",
  }),
});
console.log(await response.json());
```

You get back a JSON object with answer, sources, confidence, latency_ms, and request_id. The full schema is in the query endpoint reference.
Streaming responses with SSE
For a typing-effect chat UI with low perceived latency, use the streaming endpoint at /v1/query/stream. Responses start in under 300 ms and complete in 1.5 to 4 seconds.
```javascript
const response = await fetch("https://api.askvault.co/v1/query/stream", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.ASKVAULT_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ workspace_id: "wt_xxx", message: "How does pricing work?" }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact when they span two chunks
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```

The streaming protocol is standard Server-Sent Events. Each chunk carries either a token (partial answer text) or a source (citation as it's selected). The full event reference is in the streaming guide.
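The loop above prints the raw SSE frames as they arrive. To work with individual events you need to split the stream on blank lines and read the `event:` and `data:` fields. Here is a minimal Python sketch of that step, following the standard SSE framing. The event names `token` and `source` and the plain-text payloads in the sample are assumptions for illustration; check the streaming guide for the actual wire format.

```python
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE text lines.

    Handles the standard "event:" and "data:" fields; a blank line
    terminates each event.
    """
    event_name = "message"  # SSE default when no "event:" field is sent
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any.
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)


# Hypothetical frames; the real event names and payloads may differ.
sample = [
    "event: token\n", "data: Refunds are\n", "\n",
    "event: token\n", "data:  available\n", "\n",
    "event: source\n", "data: https://example.com/refunds\n", "\n",
]
answer = "".join(d for name, d in iter_sse_events(sample) if name == "token")
print(answer)  # Refunds are available
```

With the `requests` library, `response.iter_lines(decode_unicode=True)` on a `stream=True` request yields lines you could feed into this function.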
Multi-user usage
If you're building a multi-tenant app where end users chat through your API integration, pass a unique user_id per end user:
```json
{
  "workspace_id": "wt_xxx",
  "message": "What's the status of my order?",
  "user_id": "your-app-user-id-42",
  "conversation_id": "conv-abc-123"
}
```

AskVault uses user_id for per-user rate limiting, conversation-history isolation, and per-user analytics. The conversation_id keeps multi-turn context across requests within the same conversation.
For audience-based scoping (different users see different content subsets), pass a verification_token (HMAC-signed user_id) and an audience array. The bot enforces the audience rules server-side.
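As a rough illustration of the signing step, the sketch below computes an HMAC over the user_id on your backend and attaches it to the request body. The hash function (SHA-256), the hex encoding, the signing secret, and the audience name are all assumptions made for this example; the text above only says the token is an HMAC-signed user_id, so confirm the exact format in the API reference before relying on it.

```python
import hashlib
import hmac


def make_verification_token(user_id: str, signing_secret: str) -> str:
    """Return a hex HMAC-SHA256 of user_id (assumed token format)."""
    return hmac.new(
        signing_secret.encode("utf-8"),
        user_id.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()


def build_scoped_query(workspace_id, message, user_id, audience, signing_secret):
    """Assemble a request body that carries the audience scoping fields."""
    return {
        "workspace_id": workspace_id,
        "message": message,
        "user_id": user_id,
        "audience": list(audience),
        "verification_token": make_verification_token(user_id, signing_secret),
    }


body = build_scoped_query(
    "wt_xxx",
    "What's the status of my order?",
    user_id="your-app-user-id-42",
    audience=["customers"],        # hypothetical audience name
    signing_secret="demo-secret",  # placeholder; load from a secret store
)
print(sorted(body))
```

Compute the token server-side only. If the secret reaches the client, any user can sign someone else's user_id.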
Rate limits
Limits per API key, per plan:
| Plan | Per minute | Per day |
|---|---|---|
| Free | 10 | 100 |
| Starter | 60 | 3,000 |
| Growth | 200 | 15,000 |
| Business | 1,000 | 50,000 |
| Enterprise | Custom | Custom |
Every response includes rate-limit headers (X-RateLimit-Remaining-Minute, X-RateLimit-Remaining-Day, X-RateLimit-Reset-Day) so you can avoid hitting the cap. When you exceed it, you get HTTP 429 Too Many Requests with Retry-After in seconds. Full rate-limit reference at the rate-limits page.
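One way to honor Retry-After is a small wrapper around whatever function sends your request. The sketch below is transport-agnostic: `send` is a hypothetical callable you supply that returns `(status, headers, body)`, and the fake replies exist only to show the flow.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], dict]


def query_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on HTTP 429, wait Retry-After seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, headers, _ = response
        if status != 429 or attempt == max_attempts - 1:
            break
        # Retry-After is given in seconds; fall back to 1 second if absent.
        sleep(float(headers.get("Retry-After", 1)))
    return response


# Fake transport for illustration: one 429, then a 200.
_replies = iter([
    (429, {"Retry-After": "2"}, {}),
    (200, {"X-RateLimit-Remaining-Minute": "9"}, {"answer": "ok"}),
])
waits = []
status, headers, body = query_with_backoff(lambda: next(_replies), sleep=waits.append)
print(status, waits)  # 200 [2.0]
```

The last 429 is returned to the caller rather than raised, so the calling code decides whether to queue the query or surface an error.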
Webhooks (incoming events)
In addition to the request/response pattern, you can subscribe to webhooks for events. AskVault POSTs to your endpoint when:
- conversation.started: a new conversation begins on any channel
- conversation.escalated: the bot escalates to a human
- lead.captured: the collect_lead skill captures a lead
- knowledge.gap_detected: a query had low retrieval confidence
Configure webhooks under Dashboard > Webhooks > Add Endpoint. Useful for piping AskVault events into your CRM, alert system, or analytics pipeline.
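On the receiving side, a webhook endpoint usually comes down to routing on the event name. The sketch below shows that routing in plain Python. The payload shape (a top-level `type` key with a `data` object) and the `conversation_id` and `email` fields are assumptions for illustration, not the documented schema; see the webhooks reference for the real payloads.

```python
import json
from typing import Callable, Dict

handlers: Dict[str, Callable[[dict], str]] = {}


def on(event_type: str):
    """Register a handler for one webhook event type."""
    def register(fn):
        handlers[event_type] = fn
        return fn
    return register


@on("conversation.escalated")
def notify_support(data: dict) -> str:
    return f"page on-call for conversation {data.get('conversation_id')}"


@on("lead.captured")
def push_to_crm(data: dict) -> str:
    return f"create CRM lead for {data.get('email')}"


def handle_webhook(raw_body: bytes) -> str:
    """Route a webhook POST body to its handler; ignore unknown types."""
    payload = json.loads(raw_body)
    handler = handlers.get(payload.get("type"))  # "type" key is an assumption
    if handler is None:
        return "ignored"
    return handler(payload.get("data", {}))


print(handle_webhook(b'{"type": "lead.captured", "data": {"email": "a@example.com"}}'))
# create CRM lead for a@example.com
```

Returning "ignored" for unknown types keeps the endpoint from failing when new event types are added.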
Security
Three things to keep in mind:
- API keys are passwords. Store in environment variables, not source code. Rotate every quarter. If a key leaks, revoke it immediately in the dashboard.
- CORS. The API doesn't allow browser-direct CORS calls (your API key would be exposed). For browser use, either proxy through your backend or use the widget channel which authenticates with a public workspace token plus per-domain rate limiting.
- HTTPS only. Plain HTTP requests are rejected. TLS 1.3 is preferred; TLS 1.2 is the minimum supported version.
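For the backend-proxy option in the CORS item, the key idea is that the browser sends only the message, and your server attaches the API key and workspace ID. A minimal sketch of the server-side piece, using only the standard library; the function name and the shape of the browser payload are hypothetical.

```python
import json
import os
import urllib.request

ASKVAULT_URL = "https://api.askvault.co/v1/query"


def build_upstream_request(browser_payload: dict, workspace_id: str) -> urllib.request.Request:
    """Build the server-side request for a message received from the browser.

    Only the message is taken from the browser. The workspace ID and API
    key come from the backend, so the client cannot override them.
    """
    message = str(browser_payload.get("message", "")).strip()
    if not message:
        raise ValueError("message is required")
    body = json.dumps({"workspace_id": workspace_id, "message": message})
    return urllib.request.Request(
        ASKVAULT_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            # The key is read from the environment and never sent to the browser.
            "Authorization": f"Bearer {os.environ['ASKVAULT_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
```

In a route handler you would pass the result to `urllib.request.urlopen(req, timeout=15)` and relay the JSON response to the browser, after your own session check.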
Common pitfalls
401 Unauthorized. Authorization header is malformed. Must be exactly Bearer ak_xxx. No "Token" or "Key" prefix.
403 Forbidden. API key belongs to a different workspace owner. Use a key from the same account.
429 Too Many Requests. You hit a rate limit. Back off for the Retry-After duration, then retry.
Empty sources array. Your workspace has no indexed content yet. Check Knowledge Hub in the dashboard.
Latency > 4 seconds. First request to a cold workspace is slow. Subsequent requests are fast. Pre-warm by sending a noop query at app start.
FAQ
Is there an official SDK?
JavaScript and Python clients are on the roadmap. Until they ship, the API's OpenAPI spec is published at openapi.askvault.co/openapi.json, so you can generate a client in any language.
Can I batch requests?
Not yet. Each query is one HTTP request. For high-throughput batch use cases, send concurrent requests with a connection pool. Rate limits apply per-key.
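A sketch of the concurrent approach using a thread pool. `ask` is a hypothetical function you supply that sends one query and returns the parsed JSON; in practice it would reuse an HTTP client with connection pooling. Keep `max_workers` modest so a burst does not exhaust your per-minute limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def ask_many(
    questions: List[str],
    ask: Callable[[str], dict],
    max_workers: int = 4,
) -> List[dict]:
    """Send one request per question concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ask, questions))


# Stand-in for a real API call, for illustration only.
def fake_ask(question: str) -> dict:
    return {"answer": f"reply to: {question}"}


results = ask_many(["Refund policy?", "Shipping times?"], fake_ask)
print([r["answer"] for r in results])
# ['reply to: Refund policy?', 'reply to: Shipping times?']
```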
Can I cache responses?
Yes, at your application layer. AskVault doesn't cache responses server-side because the underlying knowledge base can change between requests. If your queries are stable (same question expected to give same answer for hours), cache aggressively in your app.
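An application-layer cache can be as small as a dictionary with timestamps. The sketch below keys on a normalized question and expires entries after a TTL you choose. The class name, the normalization rule, and the `fetch` callable are illustrative choices, not part of the API.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache answers by normalized question for ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, question: str) -> dict:
        # Collapse case and whitespace so trivial variants share an entry.
        key = " ".join(question.lower().split())
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        answer = self._fetch(question)  # cache miss or expired: call the API
        self._entries[key] = (now, answer)
        return answer
```

If you pass user_id or an audience, include them in the cache key so one user's answer is never served to another.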
How do I migrate from another chatbot API?
The /v1/query request shape is intentionally similar to OpenAI's Chat Completions API. For most migrations, replace the base URL and authorization header. Adapt the response parsing to use answer instead of choices[0].message.content.
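During a gradual migration it can help to parse both response shapes with one helper. A minimal sketch, assuming `answer` is a top-level string as described above; the function name is hypothetical.

```python
def answer_text(response_json: dict) -> str:
    """Extract the reply text from either response shape.

    Prefers the AskVault "answer" field and falls back to the
    Chat Completions layout for calls that have not been migrated yet.
    """
    if "answer" in response_json:
        return response_json["answer"]
    return response_json["choices"][0]["message"]["content"]


print(answer_text({"answer": "Refunds take 5 days."}))
# Refunds take 5 days.
print(answer_text({"choices": [{"message": {"content": "Legacy reply"}}]}))
# Legacy reply
```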
Is there a sandbox for testing?
Free plan works for sandboxing. 100 queries per month, no credit card. Pass ?sandbox=true as a query parameter to skip query-count incrementing while you test integrations.
Related guides
- Getting started with the AskVault API
- POST /v1/query reference
- POST /v1/query/stream reference
- Rate limits per plan
- Error codes
- Webhooks