Voice IVR configuration

What IVR is for

Interactive Voice Response routes callers to the right answer. Three patterns:

  1. Self-service for common questions. Caller asks "what are your hours?". Bot answers via TTS. Call ends.
  2. Department routing. Caller picks billing, sales, technical support. Routed to the right human queue.
  3. Hybrid. Bot tries first; if it can't resolve, transfers to a human with full context.

When configured well, the voice channel resolves about 15 to 30% of customer issues without human involvement.

Setup walkthrough

Setup takes about 45 minutes:

  1. Connect Twilio Voice. See voice setup.
  2. Purchase a phone number in Twilio.
  3. Configure the IVR flow under Deploy Hub > Voice > IVR.
  4. Test with a real call.
  5. Iterate on the flow based on call data.

IVR menu types

Two interaction modes per menu level:

DTMF (keypad). "Press 1 for billing, press 2 for sales, press 3 for technical support."

  • Pros. Reliable in noisy environments. Works on any phone.
  • Cons. Less natural. Limited to 9 options per menu.

Speech. "Tell us what you're calling about: billing, sales, technical support, or something else?"

  • Pros. Natural. Handles complex inputs ("I want to cancel my subscription").
  • Cons. ASR (automatic speech recognition) accuracy varies; struggles with strong accents or background noise.

Hybrid mode: speech with DTMF fallback. Best for accessibility.
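
A minimal sketch of how hybrid mode might resolve a caller's input. The `MENU` table and the `resolve_choice` helper are hypothetical names for illustration, not part of AskVault: a valid keypad digit is tried first, then keywords in the speech transcript.

```python
from typing import Optional

# Hypothetical menu: keypad digit -> (department, speech keywords).
MENU = {
    "1": ("billing", ("billing", "invoice")),
    "2": ("sales", ("sales", "pricing")),
    "3": ("technical support", ("technical", "support")),
}


def resolve_choice(digit: Optional[str], transcript: Optional[str]) -> Optional[str]:
    """Return the chosen department, or None if nothing matched."""
    # A keypress is unambiguous, so a valid digit wins over speech.
    if digit in MENU:
        return MENU[digit][0]
    # Otherwise fall back to simple keyword spotting in the ASR transcript.
    words = (transcript or "").lower()
    for department, keywords in MENU.values():
        if any(keyword in words for keyword in keywords):
            return department
    return None
```

Returning `None` lets the caller of this helper decide whether to re-prompt or transfer.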

Building the flow

The IVR flow is a tree of menus and actions:

  • Greeting. First message after pickup. Typically about 5 to 10 seconds.
  • Main menu. Routes to sub-menus or actions.
  • Sub-menus. Departmental or topical drilldowns.
  • Actions. Bot answer (TTS), transfer to human, voicemail, hang up.

Example:

Greeting: "Welcome to Acme. I can help with billing, orders, or technical support."
Menu (Speech):
    Match "billing" → Submenu A
    Match "order" → Submenu B
    Match "technical" → Submenu C
    Match "agent" → Transfer to live agent
    No match (3 retries) → Transfer to live agent
Submenu A (Billing):
    "Are you asking about a current invoice, refund, or payment method?"
    Match "invoice" → Bot answer (current invoice via skill)
    Match "refund" → Bot answer + transfer if can't resolve
    Match "payment" → Bot answer
...

Build the flow with the drag-and-drop builder under Deploy Hub > Voice > IVR.
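
To make the tree structure concrete, here is a rough sketch of part of the example flow as plain Python data, with a small walker that follows caller utterances to an action. The `Menu`, `Action`, and `walk` names are hypothetical and only model the idea; the real flow is configured in the builder. "3 retries" is read here as three consecutive misses on one menu.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Action:
    kind: str  # "bot_answer", "transfer", "voicemail", or "hang_up"


@dataclass
class Menu:
    prompt: str
    matches: Dict[str, Union["Menu", Action]] = field(default_factory=dict)
    max_misses: int = 3


TRANSFER = Action("transfer")

billing = Menu(
    prompt="Are you asking about a current invoice, refund, or payment method?",
    matches={
        "invoice": Action("bot_answer"),
        "refund": Action("bot_answer"),
        "payment": Action("bot_answer"),
    },
)
main_menu = Menu(
    prompt="Welcome to Acme. I can help with billing, orders, or technical support.",
    matches={"billing": billing, "agent": TRANSFER},  # other branches omitted
)


def walk(menu: Menu, utterances: List[str]) -> Action:
    """Follow caller utterances through the tree until an action is reached."""
    node, misses = menu, 0
    for utterance in utterances:
        text = utterance.lower()
        target = next((t for key, t in node.matches.items() if key in text), None)
        if target is None:
            misses += 1
            if misses >= node.max_misses:
                return TRANSFER  # too many misses: hand off to a live agent
            continue
        if isinstance(target, Action):
            return target
        node, misses = target, 0  # descend into the sub-menu
    return TRANSFER  # input ran out before an action was reached
```

For example, `walk(main_menu, ["billing please", "my invoice"])` ends at a bot answer, while three unmatched utterances end in a transfer.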

Bot answer mode

When a menu action triggers a bot answer:

  1. ASR transcribes the caller's speech.
  2. The transcribed text passes to AskVault's RAG pipeline.
  3. The pipeline generates an answer.
  4. TTS reads the answer aloud.
  5. The bot offers a follow-up: "Was that helpful? Say yes, no, or transfer."

Typical end-to-end latency: 3 to 6 seconds (ASR + RAG + TTS).
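
Because the latency is the sum of three stages, it can help to see where the time goes. The sketch below is illustrative only: `answer_turn` is a hypothetical helper, and the `asr`, `rag`, and `tts` arguments stand in for whatever services actually perform each stage.

```python
import time
from typing import Callable, Dict, Tuple


def answer_turn(
    audio: bytes,
    asr: Callable[[bytes], str],
    rag: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Tuple[bytes, Dict[str, float]]:
    """Run one ASR -> RAG -> TTS turn and record seconds spent per stage."""
    t0 = time.perf_counter()
    text = asr(audio)      # speech to text
    t1 = time.perf_counter()
    answer = rag(text)     # retrieve and generate the answer
    t2 = time.perf_counter()
    speech = tts(answer)   # text back to speech
    t3 = time.perf_counter()
    timings = {"asr": t1 - t0, "rag": t2 - t1, "tts": t3 - t2, "total": t3 - t0}
    return speech, timings
```

Passing the stages in as callables keeps the sketch testable with stubs and makes it easy to compare per-stage timings against the 3 to 6 second budget.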

Voice and TTS settings

Choose how the bot sounds:

  • Voice model. Several voices per language. Pick one matching your brand.
  • Speaking rate. Normal, slower for accessibility, faster for power users.
  • Pitch. Slight adjustments for personality.
  • Pronunciation overrides. SSML markup for tricky product names (e.g., "AskVault" pronounced as ask-vault, not aks-vault).

Configure under Deploy Hub > Voice > TTS.
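
As an illustration of a pronunciation override, the sketch below wraps text in standard SSML, using the `sub` element to give a spoken alias and `prosody` to set the speaking rate. The `OVERRIDES` table and `to_ssml` helper are hypothetical, and which SSML elements the TTS settings accept is an assumption to verify.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical override table: written form -> how it should be spoken.
OVERRIDES = {"AskVault": "ask vault"}


def to_ssml(text: str, rate: str = "medium") -> str:
    """Wrap text in SSML, substituting spoken aliases for tricky names."""
    body = escape(text)  # escape &, <, > so the result stays well-formed XML
    for written, spoken in OVERRIDES.items():
        tag = f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"
        body = body.replace(escape(written), tag)
    # SSML rate keywords include "slow", "medium", and "fast".
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"
```

Calling `to_ssml("AskVault is ready", "slow")` yields `<speak><prosody rate="slow"><sub alias="ask vault">AskVault</sub> is ready</prosody></speak>`.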

Multi-language

The IVR can serve callers in their own language, selected in one of two ways:

  • Explicit selection. Greeting in two languages (e.g., "Welcome. Press 1 for English, press 2 for Spanish").
  • Auto-detection. Inferred from the caller's first speech utterance.

All TTS and ASR then adapt to the selected language.

Supported languages: about 35 for ASR plus TTS. Configure under Voice > Languages.

Transferring to a human

Three transfer patterns:

Cold transfer. The caller is transferred to the human's queue without context. Simple, but the agent has to re-ask everything.

Warm transfer. The bot summarizes the conversation, sends the summary to the agent as a screen-pop, then connects the call. The agent has context within 5 seconds of pickup.

Conferenced transfer. The bot stays on the line briefly to introduce the issue to the agent, then drops off. A hybrid of cold and warm.

Recommended: warm transfer. Configure under Voice > Transfer Mode.

Human-agent integration

Voice calls plug into your support team's existing tools:

  • The call rings a configured phone number (your agent's softphone or a real desk phone).
  • Or it routes to a queue managed by your call-center software.
  • The conversation is logged in AskVault after the call.
  • The transcript is available for audit and quality review.

Voicemail

When no agent is available:

  1. The bot informs the caller: "Our team is unavailable right now. Want to leave a message?"
  2. The caller records up to 3 minutes.
  3. The voicemail is transcribed automatically.
  4. A conversation is created in AskVault with the audio file and transcript.
  5. An email notification is sent to the support team.

Configure voicemail under Voice > Voicemail.

Business hours

The IVR can behave differently based on time:

  • Business hours. Full menu with transfer-to-agent options.
  • After hours. Bot-only mode plus voicemail.
  • Weekends, holidays. Configured exceptions.

Configure under both Workspace Settings > Business Hours and Voice > Hours Behavior.
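
The time-based behavior amounts to a small decision function. In this sketch the 09:00 to 17:00 window, the holiday date, and the `hours_mode` name are all assumptions for illustration, and `now` is taken to be in the workspace's local time.

```python
from datetime import date, datetime, time

OPEN, CLOSE = time(9, 0), time(17, 0)  # assumed business hours
HOLIDAYS = {date(2025, 12, 25)}        # hypothetical configured exception


def hours_mode(now: datetime) -> str:
    """Pick the IVR mode for a call arriving at local time `now`."""
    # Weekends (Saturday=5, Sunday=6) and holidays are treated as after hours.
    if now.date() in HOLIDAYS or now.weekday() >= 5:
        return "after_hours"
    if OPEN <= now.time() < CLOSE:
        return "business_hours"  # full menu with transfer-to-agent options
    return "after_hours"         # bot-only mode plus voicemail
```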

Call recording

For quality and compliance:

  • All calls are recorded by default (announce this to the caller as regulation requires).
  • Stored 90 days standard, 1 year on Enterprise.
  • Transcripts auto-generated.
  • Searchable in conversation history.

Recording can be disabled for an individual call (rare, e.g., for sensitive callers) via voice prompt or agent action.

Caller authentication

For identity-verified flows:

  • Caller ID match against a known contact.
  • Voice-based PIN ("Please say your 4-digit PIN").
  • Voice biometric (advanced; planned for Enterprise, not yet available).

Authenticated callers can access identity-gated skills (subscription_manager, refund flow).
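
A spoken PIN arrives as a transcript, so it has to be normalized before comparison. The sketch below shows one way to do that; `spoken_pin` and `pin_matches` are hypothetical helpers, and comparing against a plain-text PIN is for illustration only (a real system would compare against a stored hash).

```python
import hmac

# Spoken digit words an ASR transcript might contain.
WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_pin(transcript: str) -> str:
    """Extract digits from a transcript such as 'four two oh seven'."""
    digits = []
    for token in transcript.lower().replace("-", " ").split():
        if token.isdigit():
            digits.append(token)
        elif token in WORDS:
            digits.append(WORDS[token])
    return "".join(digits)


def pin_matches(transcript: str, expected: str) -> bool:
    """Constant-time comparison of the spoken PIN with the expected one."""
    return hmac.compare_digest(spoken_pin(transcript).encode(), expected.encode())
```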

Sample IVR flow

End-to-end for a B2B SaaS:

Hello (TTS in caller's language):
    "Welcome to Acme. How can I help today?"
Caller: "I need help with my subscription."
Bot:
    ASR transcribes; matches "subscription" intent.
    Routes to subscription_manager skill.
    Skill needs identity: asks for account email or 4-digit PIN.
Caller: provides PIN.
Bot:
    Identity verified.
    Reads subscription status: "You're on the Growth plan. Next billing May 28 for $49. Anything else?"
Caller: "Cancel my subscription."
Bot:
    Policy check: cancellation requires confirmation.
    "To confirm cancellation, say YES. To keep your subscription, say NO."
Caller: "Yes."
Bot:
    Cancels subscription via Stripe API.
    "Done. Your subscription will end May 28. You can reactivate any time. Goodbye."
Call ends. Conversation logged with full transcript and audit trail.

Total call time: about 2 minutes. Without IVR, a human agent would take 5 to 10 minutes for the same request.

Sentiment-based routing

If the sentiment_router skill detects frustration:

  • Auto-transfer to a senior agent rather than continuing bot interaction.
  • Skip remaining menu options.
  • Apologize and offer a callback if no agent is available.

This reduces the "screaming at the IVR" pain point.

Audit and quality

Per call:

  • Full transcript.
  • Caller path through menus.
  • Bot decisions with reasoning.
  • Transfer details (when and why).
  • Caller sentiment trajectory.

Visible under Live Chat > Voice conversations. Useful for quality review.

Compliance

The voice channel adds compliance considerations:

  • Call recording disclosure. Most jurisdictions require informing the caller. Enabled by default.
  • TCPA (US). Outbound calls require prior consent. AskVault is inbound-only by default; outbound requires opt-in.
  • GDPR. Voice recordings are personal data. Retention follows your workspace retention policy.

See security overview.

Cost considerations

Voice billing combines:

  • AskVault Business or Enterprise subscription.
  • Twilio per-minute charges (about $0.013 to $0.04 per minute depending on country).
  • TTS and ASR usage (typically rolled into the per-minute Twilio cost).

For a typical voice deployment handling 1,000 calls per month averaging 3 minutes each (3,000 minutes), the per-minute rates above come to about $39 to $120 per month in Twilio usage fees.
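
The per-minute component can be checked with a short calculation. This sketch covers only per-minute charges at the rates quoted above; any fixed fees, such as phone number rental, are not modeled, and `monthly_minute_cost` is a hypothetical helper.

```python
from decimal import Decimal


def monthly_minute_cost(calls: int, avg_minutes: int, rate_per_minute: str) -> Decimal:
    """Per-minute charges only: calls x minutes per call x rate."""
    # Decimal avoids floating-point rounding noise in currency arithmetic.
    return calls * avg_minutes * Decimal(rate_per_minute)


low = monthly_minute_cost(1000, 3, "0.013")   # 3,000 minutes at the low rate
high = monthly_minute_cost(1000, 3, "0.04")   # 3,000 minutes at the high rate
print(f"${low:.2f} to ${high:.2f} per month")  # prints "$39.00 to $120.00 per month"
```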

Plan availability

  • Free, Starter, Growth. No voice channel.
  • Business. Full voice IVR, multi-language, warm transfer.
  • Enterprise. Custom voice models. Voice biometric and SIP trunk support are planned (see the roadmap below).

Planned features (on the roadmap)

These features are not yet available. They are listed here so the current behavior is clear:

  • Voice biometric authentication. Today, voice PIN. Native voice biometric planned for Enterprise.
  • SIP trunk support. Today, Twilio only. Direct SIP integration planned for Enterprise.
  • Outbound campaign mode. Today, inbound-only. Compliance-gated outbound campaigns planned.
  • Real-time agent assist. Today, post-call transcript. Live transcription with agent suggestions during the call planned.

Limits

  • Menu depth. Up to 5 levels.
  • Options per menu. Up to 9 (DTMF) or unlimited (speech).
  • Call recording retention. 90 days standard, 1 year Enterprise.
  • Concurrent calls per number. Twilio-dependent; typically 10 to 100 with default settings.
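
The depth and option limits are easy to check before publishing a flow. The sketch below validates a nested menu dictionary against them; the dictionary shape (`mode`, `options`) and the `check_limits` helper are assumptions for illustration, not an AskVault export format.

```python
from typing import Any, Dict, List

MAX_DEPTH = 5          # menu levels
MAX_DTMF_OPTIONS = 9   # keypad options per menu


def check_limits(menu: Dict[str, Any], depth: int = 1) -> List[str]:
    """Return a list of limit violations for a nested menu dict."""
    problems: List[str] = []
    if depth > MAX_DEPTH:
        problems.append(f"menu depth {depth} exceeds {MAX_DEPTH}")
    options = menu.get("options", {})
    if menu.get("mode") == "dtmf" and len(options) > MAX_DTMF_OPTIONS:
        problems.append(f"{len(options)} DTMF options exceed {MAX_DTMF_OPTIONS}")
    for target in options.values():
        if isinstance(target, dict):  # nested sub-menu; anything else is an action
            problems.extend(check_limits(target, depth + 1))
    return problems
```

An empty list means the flow fits within the documented limits.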

Common pitfalls

Caller drops off in deep menus. Menu depth too high. Cap at 3 levels.

Speech recognition fails on accents. ASR weaker for non-native speakers of the target language. Add DTMF fallback.

Bot answers vague. Knowledge coverage thin for voice queries. Add Q&A pairs explicitly for voice-pattern questions.

Transfer fails. Agent number unavailable or queue full. Fall back to voicemail.

FAQ

Does this work outside the US?

Yes. Twilio numbers are available in 100+ countries. ASR and TTS adapt to local languages.

Can I use my existing phone number?

Yes, via number porting in Twilio. Port the number to Twilio first, then configure it in AskVault.

Are call transcripts accurate?

About 85 to 95% word accuracy on clear English. Lower for noisy environments or strong accents.

Can the bot make outbound calls?

Inbound-only by default. Outbound is planned with compliance gating.

Does voice support multi-language calls?

Yes. Auto-detect from caller's first speech, or explicit menu selection.
