AI safety commitments
The five commitments
- Source-cited answers. Every bot answer cites a verifiable source. RAG architecture grounds responses; no free-form hallucination.
- No customer-data training. Your content and conversations are never used to train shared models.
- Audience-tag enforcement. Sensitive content scoped to verified visitors only.
- Kill switches per skill. Hard caps the LLM cannot override.
- Mandatory escalation paths. Every conversation has a path to a human.
Source-citation as anti-hallucination
Our primary defense against hallucination:
- Retrieval before generation. Bot retrieves relevant chunks first.
- Generation grounded in those chunks via system prompt.
- Citations surface with every answer, including the source URL.
- Strict mode prompts the model to refuse rather than guess when no chunks match.
Measured hallucination rate: about 2% on text questions. Most "wrong" answers reflect knowledge gaps, not fabrication.
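To illustrate how these pieces fit together, here is a minimal sketch of the retrieve-then-generate flow. It assumes a vector retriever and an LLM client are passed in as callables; the function and field names are illustrative, not the product's actual API.

```python
from typing import Callable

def answer_question(
    question: str,
    audience: str,
    retrieve: Callable[[str, str, int], list[dict]],  # vector search over your content
    generate: Callable[[str, list[dict], str], str],  # call to the LLM provider
) -> dict:
    # 1. Retrieval before generation: fetch candidate chunks first.
    chunks = retrieve(question, audience, 5)

    # 2. Strict mode: refuse rather than guess when nothing matches.
    if not chunks:
        return {"answer": "I don't have a documented answer for that.",
                "sources": []}

    # 3. Generation grounded in the retrieved chunks via the system prompt.
    system_prompt = ("Answer only from the provided sources. "
                     "If they do not contain the answer, say so.")
    answer = generate(system_prompt, chunks, question)

    # 4. Citation: surface the source URL of each chunk alongside the answer.
    return {"answer": answer, "sources": [c["url"] for c in chunks]}
```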
No customer-data training
A written commitment:
- Your content never used to train shared models.
- Your conversations never used.
- LLM providers we route through have "not for training" flags set.
No opt-in alternative exists. If we ever introduced one, it would be explicit, off-by-default, revocable.
Audience-tag enforcement
To prevent leakage:
- Every retrieval filters by audience.
- The LLM never sees unauthorized content.
- No prompt injection bypasses the filter (it operates pre-LLM).
Tested rigorously; we treat cross-audience leakage as a P0 priority.
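A minimal sketch of what "pre-LLM" means here, assuming each chunk carries an audience tag (field names are illustrative): the filter runs on the retrieval results themselves, so unauthorized content never appears in the prompt and cannot be coaxed out of the model.

```python
def filter_by_audience(chunks: list[dict], visitor_audiences: set[str]) -> list[dict]:
    """Drop any chunk the visitor is not verified for.

    Runs inside the retrieval step, before any text reaches the LLM,
    so a prompt injection has nothing hidden to reveal.
    """
    # Assumption for this sketch: untagged/public content is always allowed.
    allowed = visitor_audiences | {"public"}
    return [c for c in chunks if c.get("audience", "public") in allowed]
```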
Skill kill switches
Some skills have hard limits beyond LLM control:
- discount_negotiator. Per-visitor 15% cap; per-workspace $5,000/mo cap.
- subscription_manager. Per-visitor refund cap ($500 default).
- collect_lead. Once per conversation.
Enforced at the policy layer, not in the prompt: if the LLM "decides" to offer 50% off, the policy layer rejects it.
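A minimal sketch of such a policy-layer check, using the discount caps listed above as defaults (names and structure are illustrative, not the product's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiscountPolicy:
    per_visitor_cap_pct: float = 15.0           # per-visitor cap
    workspace_monthly_cap_usd: float = 5_000.0  # per-workspace monthly cap

def review_discount(proposed_pct: float,
                    workspace_spend_this_month_usd: float,
                    discount_value_usd: float,
                    policy: DiscountPolicy = DiscountPolicy()) -> Optional[float]:
    """Return the approved discount, or None to reject the LLM's proposal.

    Runs outside the prompt, after the LLM proposes an action and before
    anything reaches the visitor, so the model cannot talk its way past it.
    """
    if proposed_pct > policy.per_visitor_cap_pct:
        return None  # e.g. the LLM "decides" to offer 50% off; rejected here
    if workspace_spend_this_month_usd + discount_value_usd > policy.workspace_monthly_cap_usd:
        return None  # workspace monthly cap would be exceeded
    return proposed_pct
```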
Mandatory escalation
Every conversation surface has a path to a human:
- "Talk to a human" trigger in widget and channel UIs.
- escalate_to_human skill auto-triggers on frustration.
- Bot retries, then escalates on repeated misunderstandings.
No bot-only mode where customers feel trapped.
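A minimal sketch of the retry-then-escalate logic; the retry budget and names are illustrative assumptions, not the product's actual values:

```python
MAX_RETRIES = 2  # assumption: illustrative retry budget, not the real setting

def next_action(state: dict, understood: bool, frustration_detected: bool) -> str:
    """Decide the bot's next move for one conversation turn."""
    if frustration_detected:
        return "escalate_to_human"       # skill auto-triggers on frustration
    if not understood:
        state["misses"] = state.get("misses", 0) + 1
        if state["misses"] > MAX_RETRIES:
            return "escalate_to_human"   # retry budget exhausted
        return "ask_to_rephrase"         # retry before escalating
    state["misses"] = 0
    return "answer"
```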
Bias and fairness
Best-effort:
- At worst, the bot inherits the LLM provider's biases.
- Audience tags prevent serving sensitive content to wrong segments.
- Content moderation filters extreme outputs.
- Continuous monitoring of CSAT, complaints.
No bot is perfectly unbiased. We aim for fairness and monitor for issues.
Privacy
See data handling:
- Encrypted at rest (AES-256).
- TLS 1.3 in transit.
- Workspace isolation at every layer.
- GDPR data deletion within 30 days of request.
Red-team testing
Periodic adversarial testing:
- Prompt injection attempts. Tested quarterly.
- Audience-leak attempts. Continuous.
- Skill-limit bypass attempts. Tested.
- Findings remediated before next release.
Sample findings are published in transparency reports.
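As an illustration, a red-team check for audience leakage can take the shape of an automated test run against the retrieval filter. This sketch reuses the illustrative filter_by_audience function from the audience-tag section above; the data and test name are hypothetical.

```python
def test_public_visitor_cannot_retrieve_enterprise_content():
    chunks = [
        {"text": "Public pricing overview", "audience": "public"},
        {"text": "Enterprise-only security addendum", "audience": "enterprise"},
    ]
    visible = filter_by_audience(chunks, visitor_audiences={"public"})
    # A visitor verified only as "public" must never see enterprise content.
    assert all(c["audience"] == "public" for c in visible)
```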
What we don't promise
Honesty:
- Zero hallucination. Impossible. Aim for under 2%.
- Perfect fairness. Best-effort; some LLM bias remains.
- Catching every prompt injection. Defense-in-depth; not invincible.
- AGI safety. We deploy current models with current guardrails.
Be skeptical of vendors who promise more.
Limits
- Hallucination rate. Target under 2% on factual questions.
- Audience-tag enforcement. Architectural; verified per release.
- Escalation latency. Under 30 seconds.
- Red-team cadence. Quarterly (4 times per year).
- Audit retention. 365 days standard, 6 years on Enterprise.
Common pitfalls
Trusting the bot uncritically. Verify high-stakes answers via citations.
Assuming all safety problems are solved. The bot is one layer; humans still review.
Removing the human-escalation path. It's required; don't disable it.
FAQ
Will my bot get safer over time?
Yes. We continuously improve guardrails, and your bot inherits those improvements.
Can I see the system prompts?
Yes, for your workspace, under AI Config. Provider-side prompts are not exposed (industry norm).
What if my bot gives a wrong answer?
Customers can thumbs-down. Agents can revise-and-train. Iterative improvement.