AI safety commitments

The five commitments

  1. Source-cited answers. Every bot answer cites a verifiable source. RAG architecture grounds responses; no free-form hallucination.
  2. No customer-data training. Your content and conversations are never used to train shared models.
  3. Audience-tag enforcement. Sensitive content scoped to verified visitors only.
  4. Kill switches per skill. Hard caps the LLM cannot override.
  5. Mandatory escalation paths. Every conversation has a path to a human.

Source-citation as anti-hallucination

Our primary defense against hallucination:

  • Retrieval before generation. The bot retrieves relevant chunks first.
  • Generation is grounded in those chunks via the system prompt.
  • Every answer surfaces its citation with a URL.
  • In strict mode, the bot refuses to answer if no chunks match (see the sketch below).
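
A minimal sketch of that flow, assuming a hypothetical `index.search` retriever and `llm.complete` client; names, thresholds, and prompt wording are illustrative, not our actual API:

```python
# Minimal sketch of retrieval-before-generation with citations.
# `index.search` and `llm.complete` are hypothetical stand-ins, not our real API.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source_url: str
    score: float

def answer_with_citations(question, index, llm, min_score=0.75):
    # 1. Retrieval before generation: fetch candidate chunks first.
    chunks = [c for c in index.search(question, top_k=5) if c.score >= min_score]

    # 2. Strict mode: refuse rather than guess when nothing matches.
    if not chunks:
        return {"answer": "I don't have a documented answer for that.", "citations": []}

    # 3. Ground generation in the retrieved chunks via the system prompt.
    context = "\n\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
    system = (
        "Answer ONLY from the numbered sources below and cite the source "
        "number for every claim. If the sources do not cover the question, "
        "say so instead of guessing.\n\n" + context
    )
    answer = llm.complete(system=system, user=question)

    # 4. Surface a verifiable citation (URL) with every answer.
    return {"answer": answer, "citations": [c.source_url for c in chunks]}
```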

Measured hallucination rate: about 2% on text questions. Most "wrong" answers reflect knowledge gaps, not fabrication.

No customer-data training

A written commitment:

  • Your content is never used to train shared models.
  • Your conversations are never used either.
  • The LLM providers we route through have their "not for training" flags set.

No opt-in alternative exists. If we ever introduced one, it would be explicit, off-by-default, revocable.

Audience-tag enforcement

To prevent leakage:

  • Every retrieval filters by audience.
  • The LLM never sees unauthorized content.
  • Prompt injection cannot bypass the filter, because it operates before the LLM (sketched below).

Tested rigorously; we treat any cross-audience leakage as a P0 priority.
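
A minimal sketch of the pre-LLM filter, assuming each indexed chunk carries audience tags; the retriever interface and filter syntax are illustrative, not our actual implementation:

```python
# Sketch of pre-LLM audience filtering. The filter is part of the retrieval
# query itself, so nothing a visitor types into the prompt can widen the set
# of chunks the LLM ever sees. Interface and filter syntax are illustrative.
def retrieve_for_visitor(index, question, visitor_audiences):
    audiences = sorted(visitor_audiences) or ["public"]
    return index.search(
        question,
        top_k=5,
        # Only chunks tagged for this visitor's verified audiences are eligible.
        filter={"audience": {"$in": audiences}},
    )
```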

Skill kill switches

Some skills have hard limits beyond the LLM's control:

  • discount_negotiator. Per-visitor 15% cap; per-workspace $5,000/mo cap.
  • subscription_manager. Per-visitor refund cap ($500 default).
  • collect_lead. Once per conversation.

These caps are enforced at the policy layer, not in the prompt. If the LLM "decides" to offer 50% off, the policy layer rejects it.
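
A sketch of that policy layer for discount_negotiator, using the caps above; function and field names are assumptions for illustration:

```python
# Sketch of policy-layer enforcement for discount_negotiator.
# The check runs outside the prompt, after the LLM proposes an action,
# so the model cannot talk its way past the caps. Names are illustrative.
PER_VISITOR_DISCOUNT_CAP = 0.15       # 15% cap per visitor
PER_WORKSPACE_MONTHLY_CAP = 5_000.00  # $5,000/month cap per workspace

def enforce_discount_policy(proposed_pct, discount_value, workspace_month_total):
    """Clamp or reject a discount the LLM proposed."""
    # Reject outright if the workspace's monthly discount budget is exhausted.
    if workspace_month_total + discount_value > PER_WORKSPACE_MONTHLY_CAP:
        raise PermissionError("workspace monthly discount cap reached")
    # Otherwise clamp: the LLM may "decide" to offer 50%; the policy returns 15%.
    return min(proposed_pct, PER_VISITOR_DISCOUNT_CAP)
```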

Mandatory escalation

Every conversation surface has a path to a human:

  • "Talk to a human" trigger in widget and channel UIs.
  • escalate_to_human skill auto-triggers on frustration.
  • The bot retries, then escalates after repeated misunderstandings (see the sketch below).

No bot-only mode where customers feel trapped.
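
A sketch of the retry-then-escalate behaviour; the threshold, reply attributes, and the escalate_to_human signature are assumptions standing in for the skill of the same name:

```python
# Sketch of retry-then-escalate. The threshold and reply attributes are
# illustrative; escalate_to_human stands in for the skill of the same name.
MAX_MISUNDERSTANDINGS = 2  # assumed threshold, not a documented value

def escalate_to_human(conversation):
    # Stand-in for the escalate_to_human skill: route to a live agent queue.
    conversation.status = "escalated"
    return conversation

def handle_bot_turn(conversation, bot_reply):
    # Count consecutive turns where the bot failed to understand the visitor.
    if bot_reply.is_clarifying_question or bot_reply.low_confidence:
        conversation.misunderstandings += 1
    else:
        conversation.misunderstandings = 0

    # Retry first; after repeated misses, hand the conversation to a human
    # so the customer is never stuck in a bot-only loop.
    if conversation.misunderstandings > MAX_MISUNDERSTANDINGS:
        return escalate_to_human(conversation)
    return bot_reply
```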

Bias and fairness

Best-effort:

  • In the worst case, the bot inherits the LLM provider's biases.
  • Audience tags prevent serving sensitive content to the wrong segments.
  • Content moderation filters extreme outputs.
  • Continuous monitoring of CSAT and complaints.

No bot is perfectly unbiased. We aim for fairness and monitor for issues.

Privacy

See data handling:

  • Encrypted at rest (AES-256).
  • TLS 1.3 in transit.
  • Workspace isolation at every layer.
  • GDPR data deletion within 30 days of request.
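
On workspace isolation specifically, a minimal illustration of the pattern, assuming a relational store; table and column names are not our actual schema:

```python
# Illustrative only: "workspace isolation at every layer" means every
# data-access path is scoped by workspace_id. Schema names are assumptions.
def fetch_conversations(db, workspace_id, limit=50):
    # There is no query path that omits the workspace_id predicate.
    return db.execute(
        "SELECT id, started_at FROM conversations"
        " WHERE workspace_id = %s"
        " ORDER BY started_at DESC LIMIT %s",
        (workspace_id, limit),
    ).fetchall()
```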

Red-team testing

Periodic adversarial testing:

  • Prompt injection attempts. Tested quarterly.
  • Audience-leak attempts. Continuous.
  • Skill-limit bypass attempts. Tested.
  • Findings remediated before next release.

Sample findings are published in transparency reports.

What we don't promise

Honesty:

  • Zero hallucination. Impossible; we aim for under 2%.
  • Perfect fairness. Best-effort; some LLM bias remains.
  • Catching every prompt injection. Defense-in-depth; not invincible.
  • AGI safety. We deploy current models with current guardrails.

Be skeptical of vendors who promise more.

Limits

  • Hallucination rate. Target under 2% on factual questions.
  • Audience-tag enforcement. Architectural; verified per release.
  • Escalation latency. Under 30 seconds.
  • Red-team cadence. Quarterly.
  • Audit retention. 365 days standard, 6 years on Enterprise.

Common pitfalls

Trusting the bot uncritically. Verify high-stakes answers via the citations.

Assuming all safety problems are solved. The bot is one layer; humans still review.

Removing the human-escalation path. It is required; don't disable it.

FAQ

Will my bot get safer over time?

Yes. We continuously improve the guardrails, and your bot inherits those improvements.

Can I see the system prompts?

Yes, for your workspace, under AI Config. Provider-side prompts are not exposed (industry norm).

What if my bot gives a wrong answer?

Customers can thumbs-down an answer, agents can revise-and-train, and the bot improves iteratively.
