POST /v1/query/stream endpoint
Endpoint
POST https://api.askvault.co/v1/query/stream

Same request body as /v1/query. Response uses Server-Sent Events (SSE).
Event types
Four event types are streamed:

- token. A partial token of the answer.
- source. A citation source.
- done. Streaming complete; final metadata.
- error. An error occurred mid-stream.
Example
```shell
curl -X POST https://api.askvault.co/v1/query/stream \
  -H "Authorization: Bearer ak_live_xxx" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"workspace_id":"ws_xxx","query":"What is your refund policy?"}'
```

Returns:
```
event: token
data: {"text":"Our "}

event: token
data: {"text":"refund "}

event: token
data: {"text":"policy "}

event: source
data: {"title":"Refund Policy","url":"..."}

event: done
data: {"latency_ms":1850,"tokens_used":450}
```

When to use streaming
Three cases:
- Chat UIs. Show the bot's reply as it's typing, like ChatGPT.
- Long responses. Reduce perceived latency.
- Voice TTS streaming. Start speaking before the full answer is ready.
Latency
- First token: about 500 ms (p50).
- Full response: about 1.5 to 3 seconds (p50 to p95).
Streaming reduces perceived latency by about 60 to 80%.
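That figure follows from the numbers above: with streaming, the user sees output at the first token rather than at the end of the full response. A quick back-of-envelope check:

```python
# Perceived latency with streaming is time-to-first-token (~500 ms, p50)
# instead of the full response time (1.5 to 3 s).
first_token_ms = 500

reduction_fast = 1 - first_token_ms / 1500  # vs. fastest full response
reduction_slow = 1 - first_token_ms / 3000  # vs. slowest full response

print(f"{reduction_fast:.0%} to {reduction_slow:.0%}")  # prints: 67% to 83%
```

This lands slightly above the quoted 60 to 80%, which is an approximation.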
Client implementations
Browser (EventSource):
```javascript
const es = new EventSource('/v1/query/stream?...');

es.addEventListener('token', (e) => {
  const { text } = JSON.parse(e.data);
  appendToChat(text);
});

es.addEventListener('done', () => es.close());
```

Node.js: Use the eventsource package, or native fetch with a ReadableStream.
Python: Use httpx with streaming or sseclient.
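As a concrete sketch of the Python route, the loop below consumes SSE lines and dispatches on the four event types. `stream_answer` and its callbacks are illustrative names, not part of any SDK; the commented httpx wiring assumes the same endpoint and headers as the curl example above.

```python
import json

def stream_answer(line_iter, on_token, on_source=None):
    """Consume SSE lines (e.g. from httpx's resp.iter_lines()) and
    dispatch token/source/done/error events. Returns done metadata."""
    event, data = None, []
    for line in line_iter:
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            data.append(line.split(":", 1)[1].strip())
        elif line == "":  # blank line terminates one SSE frame
            payload = json.loads("\n".join(data)) if data else {}
            if event == "token":
                on_token(payload["text"])
            elif event == "source" and on_source:
                on_source(payload)
            elif event == "done":
                return payload
            elif event == "error":
                raise RuntimeError(payload.get("message", "stream error"))
            event, data = None, []
    return None

# Wiring it to httpx (sketch, not run here):
# with httpx.stream("POST", "https://api.askvault.co/v1/query/stream",
#                   headers={"Authorization": "Bearer ak_live_xxx",
#                            "Accept": "text/event-stream"},
#                   json={"workspace_id": "ws_xxx", "query": "..."}) as resp:
#     meta = stream_answer(resp.iter_lines(), on_token=print)
```

Injecting the line iterator keeps the parsing logic testable without a live connection.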
Error handling
Mid-stream errors:
```
event: error
data: {"code":"rate_limit","message":"Rate limit exceeded","retry_after":60}
```

Client should close the connection and retry after the specified delay.
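A minimal sketch of that retry rule in Python; `handle_stream_error` is a hypothetical helper called from the error branch of whatever dispatch loop you use:

```python
import time

def handle_stream_error(payload, sleep=time.sleep):
    """Decide how to react to a mid-stream `error` event.

    Returns True if the caller should reissue the request after
    waiting, False if the error should be surfaced to the user.
    """
    if payload.get("code") == "rate_limit":
        # Honor the server-specified delay before retrying.
        sleep(payload.get("retry_after", 1))
        return True
    return False
```

The `sleep` parameter is injectable so the delay can be stubbed out in tests.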
Limits
- Same as /v1/query. Auth, rate limits, query length.
- Stream duration cap. 30 seconds before forced close.
- Reconnect. No native reconnect; client retries with new request.
Common pitfalls
Browser blocks the connection. Usually a CORS or credentials issue; confirm the Authorization header is allowed by the server's CORS policy.
Stream stalls. Usually network buffering; call flush() in your client framework after each event.
Lost tokens. Caused by a network drop mid-stream; implement client-side retry from the last-known position.
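One way to implement that last retry, sketched in Python: reissue the query, track how much text was already rendered, and drop the repeated prefix. This assumes the regenerated answer starts with the same text, which the API does not guarantee, so treat it as a best-effort heuristic:

```python
def resume_tokens(token_stream, already_received):
    """Yield only the text the client has not rendered yet.

    `token_stream` is the sequence of `text` fields from a reissued
    request; `already_received` is the answer text rendered before
    the network drop.
    """
    seen = len(already_received)
    pos = 0  # characters of the new stream consumed so far
    for text in token_stream:
        end = pos + len(text)
        if end > seen:
            # Emit only the portion past the already-rendered prefix.
            yield text[max(seen - pos, 0):]
        pos = end
```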
FAQ
Does streaming cost more than standard?
No. Same per-query billing.
Can I cancel a stream mid-flight?
Yes, by closing the EventSource. The server stops generating; tokens are billed up to that point.
Does streaming work for skill-triggered responses?
Skill output is short and arrives in batches, so streaming still works but may appear as fewer, larger chunks.