
AskVault incident-response policy


Severity classification

We classify incidents into four severity levels:

  • P0 (full outage). Core service unavailable for all customers. The bot cannot respond on any channel, so customer chat is down.
  • P1 (partial outage). A significant subset of customers or channels affected. Bot works on some channels but not others.
  • P2 (degradation). Service works but latency, quality, or feature availability degraded.
  • P3 (minor). Edge-case bugs, dashboard quirks, non-critical features impaired.

Classification happens within 15 minutes of incident detection. Severity drives the response cadence and customer-communication path.
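
The four levels above can be read as a decision ladder. The sketch below is illustrative only: the `Signals` fields and the rule that any channel outage counts as P1 are assumptions, since the policy does not define how large a "significant subset" is.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    P0 = "full outage"
    P1 = "partial outage"
    P2 = "degradation"
    P3 = "minor"


@dataclass(frozen=True)
class Signals:
    """Hypothetical inputs to classification; not an AskVault data model."""
    channels_total: int
    channels_down: int
    degraded: bool  # latency, quality, or feature availability is degraded


def classify(s: Signals) -> Severity:
    # Every channel down: the bot cannot respond anywhere.
    if s.channels_total > 0 and s.channels_down == s.channels_total:
        return Severity.P0
    # Some channels down, others still working (assumed threshold: any).
    if s.channels_down > 0:
        return Severity.P1
    # Service responds, but quality or latency is impaired.
    if s.degraded:
        return Severity.P2
    return Severity.P3
```

For example, `classify(Signals(channels_total=3, channels_down=1, degraded=False))` yields `Severity.P1`.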

Detection sources

Five paths to incident detection:

  1. Automated monitoring. Synthetic checks fire every 60 seconds from multiple regions. Sustained failures alert PagerDuty.
  2. Customer reports. Customers report issues via support@askvault.co or status.askvault.co > Report.
  3. Internal alerts. The engineering team observes anomalies during normal work.
  4. Security disclosures. Researchers report vulnerabilities via security@askvault.co.
  5. Sub-processor notifications. Our SOC 2 certified providers notify us of incidents on their end.

Each path routes into the same response runbook. Detection is followed by triage within 5 minutes.
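
One way to picture "sustained failures" for the 60-second synthetic checks is a run of consecutive failed checks. This is a minimal sketch under that assumption; the threshold of 3 and the class name are hypothetical, and the policy does not say how AskVault actually defines "sustained".

```python
from collections import deque


class SustainedFailureDetector:
    """Flags an alert once the last `threshold` checks have all failed."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        # Only the most recent `threshold` results are kept.
        self._recent = deque(maxlen=threshold)

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True if an alert should fire."""
        self._recent.append(check_passed)
        window_full = len(self._recent) == self.threshold
        return window_full and not any(self._recent)
```

With checks every 60 seconds and a threshold of 3, a single failed check never pages anyone; three failures in a row do.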

Response timeline

For P0/P1 incidents:

  • 0 minutes. Detection. PagerDuty pages the on-call engineer.
  • 5 minutes. On-call engineer acknowledges. Status page updated to "investigating".
  • 15 minutes. Initial severity classification. First customer-facing update on status page.
  • 15 to 60 minutes. Mitigation rolls out. Status page updated at major milestones.
  • At resolution. Status page updated to "resolved". Confirmation that service is healthy.
  • Within 24 hours. Preliminary incident summary posted to status page.
  • Within 14 days. Full postmortem shared with affected Enterprise customers.

For P2 and P3 incidents, the timeline relaxes. P2 typically resolves within 4 hours; P3 within 1 business day. Postmortems for P2/P3 are internal-only unless customer impact warrants disclosure.
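
The P0/P1 timeline can be turned into concrete deadlines. This is a sketch, not AskVault tooling: the function and key names are hypothetical, and because the policy does not say whether the 24-hour and 14-day windows run from detection or from resolution, the code assumes resolution.

```python
from datetime import datetime, timedelta, timezone


def p0_p1_deadlines(detected_at: datetime, resolved_at: datetime) -> dict:
    """Return the latest time by which each P0/P1 milestone is due."""
    return {
        # Measured from detection, as stated in the timeline.
        "acknowledge": detected_at + timedelta(minutes=5),
        "classify_and_first_update": detected_at + timedelta(minutes=15),
        # Assumption: these two windows are measured from resolution.
        "preliminary_summary": resolved_at + timedelta(hours=24),
        "full_postmortem": resolved_at + timedelta(days=14),
    }


detected = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
resolved = datetime(2024, 1, 1, 13, 0, tzinfo=timezone.utc)
deadlines = p0_p1_deadlines(detected, resolved)
```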

Containment actions

When an incident is detected, the on-call engineer chooses from a runbook of containment actions:

  • Kill switch on a specific feature. Disable the offending feature workspace-wide without taking down the whole platform.
  • Roll back a deploy. Revert to the previous known-good version. This typically takes about 90 seconds via our deploy pipeline.
  • Failover to backup region. For database or compute failures, route traffic to standby infrastructure.
  • Rate-limit traffic. For DDoS or runaway-customer scenarios, tighten rate limits on the affected dimension.
  • Isolate the affected workspace. Rare. For incidents where a specific customer's workspace is corrupted or compromised.

Containment runs in parallel with investigation. We don't wait for root cause before mitigating.
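
To illustrate what tightening a rate limit can look like, here is a small token-bucket sketch. The token bucket is a generic technique chosen for illustration; the policy does not say which algorithm AskVault uses, and every name and number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    rate_per_sec: float   # tokens added per second
    capacity: float       # maximum burst size
    tokens: float = 0.0   # tokens currently available
    last_ts: float = 0.0  # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        elapsed = max(0.0, now - self.last_ts)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last_ts = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def tighten(self, factor: float) -> None:
        """Scale the limit down during an incident (0 < factor <= 1)."""
        if not 0 < factor <= 1:
            raise ValueError("factor must be in (0, 1]")
        self.rate_per_sec *= factor
        self.capacity = max(1.0, self.capacity * factor)
        # Drop any burst allowance above the new, smaller capacity.
        self.tokens = min(self.tokens, self.capacity)
```

Calling `tighten(0.1)` on a bucket that allowed 10 requests per second leaves it allowing roughly 1 per second, without blocking the dimension outright.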

Customer notification

Three channels carry incident communication:

  • Status page at status.askvault.co. Real-time. Updated every 15 minutes during an active incident.
  • Email to workspace owners and designated security contacts. Fired at incident start, major milestones, and resolution.
  • In-product banner in the dashboard for affected workspaces.

For security incidents involving customer data, additional direct notification via the customer's primary point of contact happens within 24 hours of confirmation.

Breach notification

A confirmed data breach (unauthorized access to customer data) triggers:

  • Within 24 hours of confirmation. Direct notification to affected customers' security contacts.
  • Within 7 days. Joint root-cause analysis between AskVault and affected customers.
  • Within 14 days. Postmortem report.
  • Coordination on regulatory notification. Under GDPR's 72-hour rule, AskVault provides the technical details and timeline; the customer (as Data Controller) notifies the regulator.

Specific notification channels for security incidents are documented in your Data Processing Agreement (DPA).
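
A customer tracking these commitments internally might check which steps are past due. The sketch below assumes all three windows run from the moment the breach is confirmed (the policy states this explicitly only for the 24-hour notification); the step names are hypothetical labels.

```python
from datetime import datetime, timedelta, timezone

# Windows taken from the list above; the start point for the 7-day and
# 14-day windows is an assumption (confirmation time).
BREACH_WINDOWS = {
    "notify_security_contacts": timedelta(hours=24),
    "joint_root_cause_analysis": timedelta(days=7),
    "postmortem_report": timedelta(days=14),
}


def overdue_steps(confirmed_at: datetime, now: datetime, completed: set) -> list:
    """List steps whose deadline has passed and that are not yet completed."""
    return [
        step
        for step, window in BREACH_WINDOWS.items()
        if step not in completed and now > confirmed_at + window
    ]


confirmed = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
eight_days_later = confirmed + timedelta(days=8)
late = overdue_steps(confirmed, eight_days_later, {"notify_security_contacts"})
```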

Customer's role during incidents

Three things customers can do during active incidents:

  1. Subscribe to status updates. Configure under status.askvault.co > Subscribe. Email, Slack, or webhook delivery.
  2. Avoid duplicate reports. The status page typically has the most current information. Email support only if your issue isn't reflected there.
  3. Document customer impact. If the incident affects your operations, document what your customers experienced. Useful for your own internal postmortem.

For active P0/P1 incidents, the support email queue gets very busy. The status page is the canonical source.
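
If you subscribe by webhook, your receiver can decide which updates deserve your team's attention. This page does not document the webhook payload, so the `severity` and `status` fields below are purely hypothetical; check the real payload before relying on anything like this.

```python
import json


def needs_attention(raw_body: str) -> bool:
    """Return True for an unresolved P0/P1 update.

    Assumes a JSON body with hypothetical "severity" and "status" keys.
    """
    event = json.loads(raw_body)
    severity = event.get("severity")
    status = event.get("status")
    return severity in {"P0", "P1"} and status != "resolved"
```

A receiver could call `needs_attention(request_body)` and forward only matching updates to its own on-call channel.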

Postmortem structure

Postmortems for P0/P1 follow a consistent template:

  • What happened. Timeline from detection to resolution.
  • Customer impact. Who was affected, what they experienced, how long.
  • Root cause. What broke, technically.
  • What we did. Investigation and remediation steps.
  • What we'll do. Action items to prevent recurrence.
  • Action items follow-up. Tracked publicly; status reported in the next postmortem.

Postmortems are blameless. The goal is system improvement, not assigning fault.

SLA credit eligibility

Incidents that breach the SLA targets trigger automatic credit eligibility for affected workspaces. Customers claim credits per the SLA process; we don't auto-apply because some incidents fall outside SLA scope (force majeure, customer-caused, scheduled maintenance).

For Enterprise customers with custom SLAs, credit terms follow the contract.

On-call coverage

We maintain 24/7 on-call coverage with overlapping shifts:

  • Primary on-call. First responder. Engages within 5 minutes of an alert.
  • Secondary on-call. Backup if primary doesn't acknowledge within 10 minutes.
  • Escalation chain. Engineering lead, then leadership.

On-call rotation is documented internally. Customers don't interact with on-call directly; communication routes through the status page and support email.
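
The escalation order can be sketched as a function of how long an alert has gone unacknowledged. The 10-minute handoff to secondary comes from the list above; the interval between later escalation steps is not stated in this policy, so `step_minutes` is a hypothetical parameter.

```python
CHAIN_AFTER_SECONDARY = ["engineering lead", "leadership"]


def paged_so_far(minutes_unacknowledged: float, step_minutes: float = 10) -> list:
    """Return everyone who would have been paged by this point."""
    paged = ["primary on-call"]
    if minutes_unacknowledged >= 10:
        # Primary did not acknowledge within 10 minutes.
        paged.append("secondary on-call")
        # Assumption: each further step fires after another `step_minutes`.
        extra_steps = int((minutes_unacknowledged - 10) // step_minutes)
        paged.extend(CHAIN_AFTER_SECONDARY[:extra_steps])
    return paged
```

For example, `paged_so_far(20)` returns the primary, the secondary, and the engineering lead under the assumed 10-minute step.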

Pen testing and proactive security

In addition to reactive incident response, AskVault contracts annual third-party penetration tests. The most recent test covered API authentication, multi-tenant isolation, injection attacks, XSS, CSRF, session management, file-upload handling, and DoS resilience.

Findings get triaged into the regular sprint. Critical findings are fixed within 7 days, high within 30, and medium within 90.
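
As a worked example of these windows, the sketch below computes a fix-by date for a finding. It assumes the clock starts on the day the finding is reported, which the policy does not spell out; the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Remediation windows in days, as stated above.
FIX_WINDOW_DAYS = {"critical": 7, "high": 30, "medium": 90}


def fix_due(severity: str, reported_on: date) -> Optional[date]:
    """Return the fix-by date, or None if no window is stated for the severity."""
    days = FIX_WINDOW_DAYS.get(severity.lower())
    if days is None:
        return None
    return reported_on + timedelta(days=days)
```

A critical finding reported on 1 January 2024 would be due by 8 January 2024.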

Limits

  • Plan availability. Incident response applies to all paid plans. The Free tier can view the status page but doesn't receive direct notifications.
  • Postmortem distribution. Public postmortems on status page for major incidents. Detailed postmortems to affected Enterprise customers under NDA.

Common questions

How do I report an incident?

Three paths: status.askvault.co > Report (preferred), support@askvault.co, or security@askvault.co for security-specific reports. We respond within 1 business day on every paid plan, and within 1 hour for P0/P1 security reports.

Does AskVault have a bug bounty?

We run a vulnerability disclosure program that gives credit but pays no cash bounty. We offer swag and an Enterprise plan donation to the researcher's preferred non-profit.

What about sub-processor incidents?

Sub-processors are bound by their own SLAs and incident-response processes. When their incidents affect AskVault customers, we surface them on our status page with attribution. Compensation flows through our normal SLA credit process.

Do you publish historical incident data?

Yes. The status page retains 90 days of incident history with full timelines. Older data is available on request.

Can I get a copy of the incident response runbook?

Internal runbook details are confidential. The published policy (this page) summarizes the customer-facing parts. Enterprise customers can request additional detail under NDA.
