AI safety commitments

The five commitments

  1. Source-cited answers. Every bot answer cites a verifiable source. RAG architecture grounds responses; no free-form hallucination.
  2. No customer-data training. Your content and conversations are never used to train shared models.
  3. Audience-tag enforcement. Sensitive content scoped to verified visitors only.
  4. Kill switches per skill. Hard caps the LLM cannot override.
  5. Mandatory escalation paths. Every conversation has a path to a human.

Source-citation as anti-hallucination

Our primary defense against hallucination:

  • Retrieval before generation. The bot retrieves relevant chunks first.
  • Generation is grounded in those chunks via the system prompt.
  • Every answer surfaces its citation with a URL.
  • In strict mode, the bot refuses to answer if no chunks match (see the sketch below).
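
A minimal sketch of that flow, assuming a hypothetical `index.search` retriever and `llm.complete` client; names, thresholds, and prompt wording are illustrative, not our actual API:

```python
# Minimal sketch of retrieval-before-generation with citations.
# `index.search` and `llm.complete` are hypothetical stand-ins, not our real API.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str
    score: float

def answer_with_citations(question, index, llm, min_score=0.75):
    # 1. Retrieval before generation: fetch candidate chunks first.
    chunks = [c for c in index.search(question, top_k=5) if c.score >= min_score]

    # 2. Strict mode: refuse rather than guess when nothing matches.
    if not chunks:
        return {"answer": "I don't have a documented answer for that.", "citations": []}

    # 3. Ground generation in the retrieved chunks via the system prompt.
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    system = (
        "Answer ONLY from the numbered sources below and cite the source "
        "number for every claim. If the sources do not cover the question, "
        "say so instead of guessing.\n\n" + context
    )
    answer = llm.complete(system=system, user=question)

    # 4. Surface a verifiable citation (URL) with every answer.
    return {"answer": answer, "citations": [c.source_url for c in chunks]}
```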

Measured hallucination rate: about 2% on text questions. Most "wrong" answers reflect knowledge gaps, not fabrication.

No customer-data training

A written commitment:

  • Your content is never used to train shared models.
  • Your conversations are never used either.
  • The LLM providers we route through have their "not for training" flags set.

No opt-in alternative exists. If we ever introduced one, it would be explicit, off-by-default, revocable.

Audience-tag enforcement

To prevent leakage:

  • Every retrieval filters by audience.
  • The LLM never sees unauthorized content.
  • Prompt injection cannot bypass the filter, because it operates before the LLM (sketched below).

Tested rigorously; we treat any cross-audience leakage as a P0 priority.
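
A minimal sketch of the pre-LLM filter, assuming each indexed chunk carries audience tags; the retriever interface and filter syntax are illustrative, not our actual implementation:

```python
# Sketch of pre-LLM audience filtering. The filter is part of the retrieval
# query itself, so nothing a visitor types into the prompt can widen the set
# of chunks the LLM ever sees. Interface and filter syntax are illustrative.
def retrieve_for_visitor(index, question, visitor_audiences):
    audiences = sorted(visitor_audiences) or ["public"]
    return index.search(
        question,
        top_k=5,
        # Only chunks tagged for this visitor's verified audiences are eligible.
        filter={"audience": {"$in": audiences}},
    )
```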

Skill kill switches

Some skills have hard limits beyond the LLM's control:

  • discount_negotiator. Per-visitor 15% cap; per-workspace $5,000/mo cap.
  • subscription_manager. Per-visitor refund cap ($500 default).
  • collect_lead. Once per conversation.

These caps are enforced at the policy layer, not in the prompt. If the LLM "decides" to offer 50% off, the policy layer rejects it.
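
A sketch of that policy layer for discount_negotiator, using the caps above; function and field names are assumptions for illustration:

```python
# Sketch of policy-layer enforcement for discount_negotiator.
# The check runs outside the prompt, after the LLM proposes an action,
# so the model cannot talk its way past the caps. Names are illustrative.
PER_VISITOR_DISCOUNT_CAP = 0.15       # 15% cap per visitor
PER_WORKSPACE_MONTHLY_CAP = 5_000.00  # $5,000/month cap per workspace

def enforce_discount_policy(proposed_pct, discount_value, workspace_month_total):
    """Clamp or reject a discount the LLM proposed."""
    # Reject outright if the workspace's monthly discount budget is exhausted.
    if workspace_month_total + discount_value > PER_WORKSPACE_MONTHLY_CAP:
        raise PermissionError("workspace monthly discount cap reached")
    # Otherwise clamp: the LLM may "decide" to offer 50%; the policy returns 15%.
    return min(proposed_pct, PER_VISITOR_DISCOUNT_CAP)
```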

Mandatory escalation

Every conversation surface has a path to a human:

  • "Talk to a human" trigger in widget and channel UIs.
  • escalate_to_human skill auto-triggers on frustration.
  • The bot retries, then escalates after repeated misunderstandings (see the sketch below).

No bot-only mode where customers feel trapped.
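
A sketch of the retry-then-escalate behaviour; the threshold, reply attributes, and the escalate_to_human signature are assumptions standing in for the skill of the same name:

```python
# Sketch of retry-then-escalate. The threshold and reply attributes are
# illustrative; escalate_to_human stands in for the skill of the same name.
MAX_MISUNDERSTANDINGS = 2  # assumed threshold, not a documented value

def escalate_to_human(conversation):
    # Stand-in for the escalate_to_human skill: route to a live agent queue.
    conversation.status = "escalated"
    return conversation

def handle_bot_turn(conversation, bot_reply):
    # Count consecutive turns where the bot failed to understand the visitor.
    if bot_reply.is_clarifying_question or bot_reply.low_confidence:
        conversation.misunderstandings += 1
    else:
        conversation.misunderstandings = 0

    # Retry first; after repeated misses, hand the conversation to a human
    # so the customer is never stuck in a bot-only loop.
    if conversation.misunderstandings > MAX_MISUNDERSTANDINGS:
        return escalate_to_human(conversation)
    return bot_reply
```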

Bias and fairness

Best-effort:

  • In the worst case, the bot inherits the LLM provider's biases.
  • Audience tags prevent serving sensitive content to the wrong segments.
  • Content moderation filters extreme outputs.
  • Continuous monitoring of CSAT and complaints.

No bot is perfectly unbiased. We aim for fairness and monitor for issues.

Privacy

See data handling:

  • Encrypted at rest (AES-256).
  • TLS 1.3 in transit.
  • Workspace isolation at every layer.
  • GDPR data deletion within 30 days of request.
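
On workspace isolation specifically, a minimal illustration of the pattern, assuming a relational store; table and column names are not our actual schema:

```python
# Illustrative only: "workspace isolation at every layer" means every
# data-access path is scoped by workspace_id. Schema names are assumptions.
def fetch_conversations(db, workspace_id, limit=50):
    # There is no query path that omits the workspace_id predicate.
    return db.execute(
        "SELECT id, started_at FROM conversations"
        " WHERE workspace_id = %s"
        " ORDER BY started_at DESC LIMIT %s",
        (workspace_id, limit),
    ).fetchall()
```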

Red-team testing

Periodic adversarial testing:

  • Prompt injection attempts. Tested quarterly.
  • Audience-leak attempts. Continuous.
  • Skill-limit bypass attempts. Tested.
  • Findings remediated before next release.

Sample findings are published in transparency reports.

What we don't promise

Honesty:

  • Zero hallucination. Impossible; we aim for under 2%.
  • Perfect fairness. Best-effort; some LLM bias remains.
  • Catching every prompt injection. Defense-in-depth; not invincible.
  • AGI safety. We deploy current models with current guardrails.

Be skeptical of vendors who promise more.

Limits

  • Hallucination rate. Target under 2% on factual questions.
  • Audience-tag enforcement. Architectural; verified per release.
  • Escalation latency. Under 30 seconds.
  • Red-team cadence. Quarterly.
  • Audit retention. 365 days standard, 6 years on Enterprise.

Common pitfalls

Trusting the bot uncritically. Verify high-stakes answers via the citations.

Assuming all safety problems are solved. The bot is one layer; humans still review.

Removing the human-escalation path. It is required; don't disable it.

FAQ

Will my bot get safer over time?

Yes. We continuously improve the guardrails, and your bot inherits those improvements.

Can I see the system prompts?

Yes, for your workspace, under AI Config. Provider-side prompts are not exposed (industry norm).

What if my bot gives a wrong answer?

Customers can thumbs-down an answer, agents can revise-and-train, and the bot improves iteratively.
