
POST /v1/query/stream endpoint


Endpoint

POST https://api.askvault.co/v1/query/stream

Same request body as /v1/query. Response uses Server-Sent Events (SSE).

Event types

The stream emits four event types:

  • token. A partial token of the answer.
  • source. A citation source.
  • done. Streaming complete; final metadata.
  • error. Error mid-stream.

Example

curl -X POST https://api.askvault.co/v1/query/stream \
  -H "Authorization: Bearer ak_live_xxx" \
  -H "Content-Type: application/json" \
  -H "Accept: text/event-stream" \
  -d '{"workspace_id":"ws_xxx","query":"What is your refund policy?"}'

Returns:

event: token
data: {"text":"Our "}

event: token
data: {"text":"refund "}

event: token
data: {"text":"policy "}

event: source
data: {"title":"Refund Policy","url":"..."}

event: done
data: {"latency_ms":1850,"tokens_used":450}
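A minimal sketch of parsing a stream like the one above in Python, using only the standard library (the `parse_sse` helper is hypothetical; a production client should use a proper SSE library, since real streams also carry comments, `id:` fields, and multi-line `data:`):

```python
import json

def parse_sse(raw: str):
    """Split raw SSE text into (event, data) pairs.
    SSE frames are separated by a blank line."""
    events = []
    for frame in raw.strip().split("\n\n"):
        event, data = None, []
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        if event:
            events.append((event, "\n".join(data)))
    return events

# Sample stream matching the example response above
raw = (
    'event: token\ndata: {"text":"Our "}\n\n'
    'event: token\ndata: {"text":"refund "}\n\n'
    'event: token\ndata: {"text":"policy "}\n\n'
    'event: done\ndata: {"latency_ms":1850,"tokens_used":450}\n\n'
)

# Concatenate the token events to rebuild the answer text
answer = "".join(
    json.loads(data)["text"] for event, data in parse_sse(raw) if event == "token"
)
print(answer)  # -> "Our refund policy "
```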

When to use streaming

Three cases:

  1. Chat UIs. Show the bot's reply as it's typing, like ChatGPT.
  2. Long responses. Reduce perceived latency.
  3. Voice TTS streaming. Start speaking before the full answer is ready.

Latency

  • First token: about 500 ms (p50).
  • Full response: about 1.5 to 3 seconds (p50 to p95).

Streaming reduces perceived latency by about 60 to 80%.
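The figures above can be sanity-checked with a little arithmetic. This is illustrative only, not a measurement; the 2000 ms full-response value is simply a mid-range pick from the 1.5 to 3 second band:

```python
first_token_ms = 500     # p50 time to first token, from the list above
full_response_ms = 2000  # assumed mid-range full-response time (1.5-3 s band)

# With streaming, the user sees output at first-token time instead of
# waiting for the full response, so perceived latency drops by:
reduction = 1 - first_token_ms / full_response_ms
print(f"{reduction:.0%}")  # 75%, inside the quoted 60-80% range
```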

Client implementations

Browser (EventSource):

const es = new EventSource('/v1/query/stream?...');
es.addEventListener('token', (e) => {
  const { text } = JSON.parse(e.data);
  appendToChat(text);
});
es.addEventListener('done', () => es.close());

Note that native EventSource only issues GET requests and cannot set an Authorization header, so this pattern requires a same-origin proxy; to call the POST endpoint directly, use fetch with a ReadableStream instead.

Node.js: Use the eventsource package, or native fetch with a ReadableStream.

Python: Use httpx with streaming, or sseclient.

Error handling

Errors that occur mid-stream are delivered as an error event:

event: error
data: {"code":"rate_limit","message":"Rate limit exceeded","retry_after":60}

The client should close the connection and retry after the specified delay.
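That close-and-retry policy can be sketched as follows. The `handle_error_event` helper and the exception-free return-value style are assumptions for illustration, not part of the API; `sleep` is injectable so the behavior is testable:

```python
import json
import time

def handle_error_event(data: str, sleep=time.sleep) -> str:
    """Decide what to do with a mid-stream `error` event payload.
    For rate limits, wait the server-advertised delay, then signal a retry."""
    payload = json.loads(data)
    if payload["code"] == "rate_limit":
        sleep(payload["retry_after"])  # honor the advertised delay
        return "retry"
    return "abort"  # other error codes: give up rather than loop

# Simulate the error event from the example above, recording sleeps
waits = []
result = handle_error_event(
    '{"code":"rate_limit","message":"Rate limit exceeded","retry_after":60}',
    sleep=waits.append,
)
print(result, waits)  # retry [60]
```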

Limits

  • Same as /v1/query. Auth, rate limits, and query-length limits all apply.
  • Stream duration cap. Streams are force-closed after 30 seconds.
  • Reconnect. No native reconnect; the client must retry with a new request.

Common pitfalls

Browser blocks the connection. Usually CORS: confirm the server allows the Authorization header (via Access-Control-Allow-Headers) and that credentials are permitted.

Stream stalls. Buffering in a proxy or framework between the API and your client; disable response buffering and make sure your client reads chunks as they arrive.

Lost tokens. Network drop mid-stream. Implement client-side retry from last-known position.
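Since there is no native reconnect (see Limits), "retry from last-known position" has to be done client-side: re-issue the query and trim the overlap against what was already shown. A sketch, assuming the replayed answer starts from the beginning (the `merge_retry` helper and the sample strings are hypothetical):

```python
def merge_retry(received: str, replay: str) -> str:
    """After a reconnect, the retried request replays the answer from the
    start; keep only what extends past the text we already displayed."""
    if replay.startswith(received):
        return replay  # replay is a superset of what we had: take it whole
    return received + replay  # no overlap detected: append conservatively

first_attempt = "Our refund "  # tokens received before the network drop
second_attempt = "Our refund policy lasts 30 days."  # full replay on retry
print(merge_retry(first_attempt, second_attempt))
# -> Our refund policy lasts 30 days.
```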

FAQ

Does streaming cost more than standard?

No. Same per-query billing.

Can I cancel a stream mid-flight?

Yes. Close the EventSource; the server stops generating, and tokens are billed only up to that point.

Does streaming work for skill-triggered responses?

Skill output is short and arrives in batches. Streaming still works, but the answer may arrive as fewer, larger chunks.
