Voice IVR configuration
What IVR is for
Interactive Voice Response routes callers to the right answer. Three patterns:
- Self-service for common questions. Caller asks "what are your hours?". Bot answers via TTS. Call ends.
- Department routing. Caller picks billing, sales, technical support. Routed to the right human queue.
- Hybrid. Bot tries first; if it can't resolve, transfers to a human with full context.
When configured well, the voice channel resolves about 15 to 30% of customer issues without human involvement.
Setup walkthrough
About 45 minutes:
- Connect Twilio Voice. See voice setup.
- Purchase a phone number in Twilio.
- Configure the IVR flow under Deploy Hub > Voice > IVR.
- Test with a real call.
- Iterate on the flow based on call data.
IVR menu types
Two interaction modes per menu level:
DTMF (keypad). "Press 1 for billing, press 2 for sales, press 3 for technical support."
- Pros. Reliable in noisy environments. Works on any phone.
- Cons. Less natural. Limited to 9 options per menu.
Speech. "Tell us what you're calling about: billing, sales, technical support, or something else?"
- Pros. Natural. Handles complex inputs ("I want to cancel my subscription").
- Cons. ASR (automatic speech recognition) accuracy varies; struggles with strong accents or background noise.
Hybrid mode: speech with DTMF fallback. Best for accessibility.
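Hybrid mode is configured in the builder, so no code is required. For readers who want to see what a speech-plus-keypad menu looks like at the Twilio layer, the sketch below builds a TwiML Gather element that accepts either input. This is an illustration only: whether AskVault emits TwiML in this form is not stated, and the function name and the /ivr/main-menu action URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def hybrid_menu_twiml(prompt, hints, action_url):
    """Build a TwiML document for one menu level that accepts keypad or speech."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "dtmf speech",      # accept a key press or a spoken phrase
            "numDigits": "1",            # one key press selects an option
            "hints": ", ".join(hints),   # phrases the recognizer should expect
            "action": action_url,        # hypothetical webhook that receives the result
        },
    )
    # The prompt is read while Twilio listens for input.
    ET.SubElement(gather, "Say").text = prompt
    return ET.tostring(response, encoding="unicode")


twiml = hybrid_menu_twiml(
    "Press 1 or say billing. Press 2 or say sales.",
    ["billing", "sales"],
    "/ivr/main-menu",
)
print(twiml)
```

Listing the expected phrases as hints is one way to help recognition on short menu words; the keypad path remains available when speech fails.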
Building the flow
The IVR flow is a tree of menus and actions:
- Greeting. First message after pickup. About 5 to 10 seconds typical.
- Main menu. Routes to sub-menus or actions.
- Sub-menus. Departmental or topical drilldowns.
- Actions. Bot answer (TTS), transfer to human, voicemail, hang up.
Example:
Greeting: "Welcome to Acme. I can help with billing, orders, or technical support."
Menu (Speech):
  Match "billing" → Submenu A
  Match "order" → Submenu B
  Match "technical" → Submenu C
  Match "agent" → Transfer to live agent
  No match (3 retries) → Transfer to live agent
Submenu A (Billing): "Are you asking about a current invoice, refund, or payment method?"
  Match "invoice" → Bot answer (current invoice via skill)
  Match "refund" → Bot answer + transfer if can't resolve
  Match "payment" → Bot answer
  ...
Drag-and-drop builder in Deploy Hub > Voice > IVR.
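The tree in the example above can be modeled as nested menus whose routes point either to another menu or to an action. The following is a minimal sketch of that idea, not AskVault's internal representation: the Menu class, the action strings, and the keyword matching are all assumptions made for illustration, and "3 retries" is interpreted here as three failed attempts.

```python
from dataclasses import dataclass, field


@dataclass
class Menu:
    prompt: str
    # keyword -> next step: another Menu, or a string naming an action
    routes: dict = field(default_factory=dict)
    max_retries: int = 3
    fallback: str = "transfer_to_agent"


def route(menu, utterances):
    """Walk the tree with successive caller utterances; return the action reached."""
    attempts = 0
    for text in utterances:
        lowered = text.lower()
        target = next((t for kw, t in menu.routes.items() if kw in lowered), None)
        if target is None:
            attempts += 1
            if attempts >= menu.max_retries:
                return menu.fallback  # too many misses: hand off to a human
            continue
        if isinstance(target, str):
            return target  # reached an action (leaf)
        menu, attempts = target, 0  # descend into the sub-menu
    # Caller went silent or hung up; this sketch treats that as no match.
    return menu.fallback


billing = Menu(
    prompt="Are you asking about a current invoice, refund, or payment method?",
    routes={
        "invoice": "bot_answer:invoice",
        "refund": "bot_answer_then_transfer:refund",
        "payment": "bot_answer:payment",
    },
)
main = Menu(
    prompt="I can help with billing, orders, or technical support.",
    routes={
        "billing": billing,
        "order": "submenu_b",       # placeholder: sub-menu not shown in the example
        "technical": "submenu_c",   # placeholder: sub-menu not shown in the example
        "agent": "transfer_to_agent",
    },
)

print(route(main, ["I have a billing question", "It's about my refund"]))
```

Keeping the fallback on every menu means a caller who is never understood still reaches a live agent, which matches the "No match" rule in the example.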
Bot answer mode
When a menu action triggers a bot answer:
- The caller's speech is transcribed via ASR.
- The transcribed text passes to AskVault's RAG pipeline.
- The pipeline generates an answer.
- TTS reads the answer aloud.
- Bot offers follow-up: "Was that helpful? Say yes, no, or transfer."
Typical end-to-end latency: 3 to 6 seconds (ASR + RAG + TTS).
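To see where the 3 to 6 seconds goes, it can help to time each stage separately. The sketch below chains three stand-in functions and records per-stage timings. The stage functions are stubs supplied by the caller, not AskVault APIs, and the function name run_bot_answer is hypothetical.

```python
import time


def run_bot_answer(audio, transcribe, answer, synthesize):
    """Run ASR -> RAG -> TTS in order and record how long each stage takes."""
    timings = {}

    def timed(name, fn, arg):
        start = time.perf_counter()
        result = fn(arg)
        timings[name] = time.perf_counter() - start
        return result

    text = timed("asr", transcribe, audio)      # speech to text
    reply = timed("rag", answer, text)          # retrieve and generate the answer
    speech = timed("tts", synthesize, reply)    # text to audio
    timings["total"] = timings["asr"] + timings["rag"] + timings["tts"]
    return speech, timings


# Stub stages for illustration only.
speech, timings = run_bot_answer(
    b"<caller audio>",
    lambda audio: "what are your hours",
    lambda question: "We are open 9 to 5, Monday to Friday.",
    lambda reply: reply.encode("utf-8"),
)
print(sorted(timings))
```

Because the stages run one after another, the total is the sum of the three, so the slowest stage is the first place to look when calls feel sluggish.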
Voice and TTS settings
Choose how the bot sounds:
- Voice model. Several voices per language. Pick one matching your brand.
- Speaking rate. Normal, slower for accessibility, faster for power users.
- Pitch. Slight adjustments for personality.
- Pronunciation overrides. SSML markup for tricky product names (e.g., "AskVault" pronounced as ask-vault, not aks-vault).
Configure under Deploy Hub > Voice > TTS.
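As an illustration of a pronunciation override, standard SSML provides a sub element whose alias attribute gives the text to speak in place of the written form. The sketch below wraps known terms in that element. It assumes the TTS engine honors sub; which SSML tags the AskVault TTS settings accept is not stated here, and the function name is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text, overrides):
    """Wrap each overridden term in an SSML <sub alias="..."> element."""
    ssml = escape(text)  # escape &, <, > so the result stays well-formed
    for term, spoken in overrides.items():
        safe_term = escape(term)
        ssml = ssml.replace(
            safe_term, f"<sub alias={quoteattr(spoken)}>{safe_term}</sub>"
        )
    return f"<speak>{ssml}</speak>"


print(apply_pronunciations("Welcome to AskVault.", {"AskVault": "ask vault"}))
# <speak>Welcome to <sub alias="ask vault">AskVault</sub>.</speak>
```

This simple replacement suits a short list of distinct product names; overlapping terms would need a more careful pass.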
Multi-language
The IVR can serve callers in their own language, chosen by menu or detected automatically:
- Greeting in two languages (e.g., "Welcome. Press 1 for English, press 2 for Spanish").
- Auto-detection from caller's first speech utterance.
- All TTS and ASR adapt to the selected language.
Supported languages: about 35, each with both ASR and TTS. Configure under Voice > Languages.
Transferring to a human
Three transfer patterns:
Cold transfer. Caller is transferred without context to the human's queue. Simple but the agent re-asks everything.
Warm transfer. Bot summarizes the conversation, sends as a screen-pop to the agent, then connects. Agent has context within 5 seconds of pickup.
Conferenced transfer. Bot stays on the line briefly while introducing the issue to the agent, then drops off. Hybrid of cold + warm.
Recommended: warm transfer. Configure under Voice > Transfer Mode.
Human-agent integration
Voice calls join your support team's existing tools:
- Call rings to a configured phone number (your agent's softphone or a real desk phone).
- Or routes to a queue managed by your call-center software.
- The conversation is logged in AskVault after the call.
- Transcript available for audit and quality review.
Voicemail
When no agent is available:
- Bot informs caller: "Our team is unavailable right now. Want to leave a message?"
- Caller records up to 3 minutes.
- The voicemail is transcribed automatically.
- A conversation is created in AskVault with the audio file plus transcript.
- An email notification is sent to the support team.
Configure voicemail under Voice > Voicemail.
Business hours
The IVR can behave differently based on time:
- Business hours. Full menu with transfer-to-agent options.
- After hours. Bot-only mode plus voicemail.
- Weekends, holidays. Configured exceptions.
Set under Workspace Settings > Business Hours plus Voice > Hours Behavior.
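The time-based behavior amounts to a small decision rule: holidays and weekends first, then opening hours. A minimal sketch follows, assuming 9:00 to 17:00 weekday hours and one example holiday. These values, the mode names, and the ivr_mode function are hypothetical, not AskVault defaults.

```python
from datetime import date, datetime, time, timezone

OPEN, CLOSE = time(9, 0), time(17, 0)   # assumed opening hours
HOLIDAYS = {date(2025, 12, 25)}         # assumed holiday exception


def ivr_mode(now, tz=timezone.utc):
    """Return which IVR behavior applies at the given aware datetime."""
    local = now.astimezone(tz)  # evaluate hours in the workspace's time zone
    if local.date() in HOLIDAYS or local.weekday() >= 5:
        return "after_hours"    # bot-only mode plus voicemail
    if OPEN <= local.time() < CLOSE:
        return "business_hours" # full menu with transfer-to-agent options
    return "after_hours"


print(ivr_mode(datetime(2025, 6, 2, 10, 0, tzinfo=timezone.utc)))  # business_hours
```

In practice the tz argument would be a named zone such as zoneinfo.ZoneInfo("America/New_York") so that daylight saving time is handled; UTC is used here only to keep the example self-contained.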
Call recording
For quality and compliance:
- All calls recorded by default (announce to caller per regulation).
- Stored 90 days standard, 1 year on Enterprise.
- Transcripts auto-generated.
- Searchable in conversation history.
Disable recording per call (rare, e.g., for sensitive callers) via voice prompt or agent action.
Caller authentication
For identity-verified flows:
- Caller ID matches to a known contact.
- Voice-based PIN ("Please say your 4-digit PIN").
- Voice biometric (advanced; planned for Enterprise, see Planned features).
Authenticated callers can access identity-gated skills (subscription_manager, refund flow).
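The gating logic can be pictured as two checks: verify the spoken PIN, then allow identity-gated skills only for verified callers. The sketch below is one possible shape under stated assumptions: PINs are stored as salted PBKDF2 hashes, ASR returns the digits as text, and the skill name refund_flow and all function names are hypothetical. It does not describe how AskVault stores or checks PINs.

```python
import hashlib
import hmac

GATED_SKILLS = {"subscription_manager", "refund_flow"}  # refund_flow is a placeholder name


def hash_pin(pin, salt):
    """Derive a salted hash so the PIN itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 100_000)


def verify_pin(spoken, salt, stored_hash):
    """Compare the transcribed PIN against the stored hash in constant time."""
    digits = "".join(ch for ch in spoken if ch.isdigit())  # drop spaces from ASR output
    return hmac.compare_digest(hash_pin(digits, salt), stored_hash)


def can_use_skill(skill, authenticated):
    """Ungated skills are always allowed; gated skills need a verified caller."""
    return skill not in GATED_SKILLS or authenticated


salt = b"demo-salt"
stored = hash_pin("4821", salt)
verified = verify_pin("4 8 2 1", salt, stored)
print(verified, can_use_skill("subscription_manager", verified))
```

A 4-digit PIN has only 10,000 possible values, so a real deployment would also want to limit the number of attempts per call.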
Sample IVR flow
End-to-end for a B2B SaaS:
Greeting (TTS in caller's language): "Welcome to Acme. How can I help today?"
Caller: "I need help with my subscription."
Bot: ASR transcribes; matches "subscription" intent. Routes to subscription_manager skill. Skill needs identity: asks for account email or 4-digit PIN.
Caller: provides PIN.
Bot: Identity verified. Reads subscription status: "You're on the Growth plan. Next billing May 28 for $49. Anything else?"
Caller: "Cancel my subscription."
Bot: Policy check: cancellation requires confirmation. "To confirm cancellation, say YES. To keep your subscription, say NO."
Caller: "Yes."
Bot: Cancels subscription via Stripe API. "Done. Your subscription will end May 28. You can reactivate any time. Goodbye."
Call ends. The conversation is logged with a full transcript and audit trail.
Total call time: about 2 minutes. Without IVR, a human agent would take 5 to 10 minutes for the same request.
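The confirmation step in this flow is worth isolating: the destructive action runs only on an explicit yes, and anything unclear triggers a re-prompt rather than a guess. A minimal sketch, with hypothetical outcome names and a hypothetical function name:

```python
def confirm_cancellation(reply):
    """Map the caller's reply to the confirmation prompt onto a next step."""
    word = reply.strip().lower().rstrip(".!")
    if word == "yes":
        return "cancel_subscription"
    if word == "no":
        return "keep_subscription"
    return "reprompt"  # unclear reply: ask again instead of guessing


print(confirm_cancellation("Yes."))  # cancel_subscription
```

Treating every unrecognized reply as a re-prompt keeps a misheard word from cancelling a subscription.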
Sentiment-based routing
If the sentiment_router skill detects frustration:
- Auto-transfer to a senior agent rather than continuing bot interaction.
- Skip remaining menu options.
- Apologize and offer a callback if no agent is available.
Reduces "screaming at IVR" customer pain points.
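The routing rule above reduces to a threshold check followed by an availability check. The sketch below assumes a sentiment score from -1 (very negative) to 1 (very positive) and a threshold of -0.5. The scale, the threshold, and the outcome names are assumptions for illustration; the actual output of the sentiment_router skill is not documented here.

```python
def next_step(sentiment_score, agent_available, threshold=-0.5):
    """Decide whether to keep the bot on the call or escalate a frustrated caller."""
    if sentiment_score > threshold:
        return "continue_bot"
    # Frustration detected: skip the remaining menu options.
    if agent_available:
        return "transfer_senior_agent"
    return "apologize_offer_callback"


print(next_step(-0.8, agent_available=True))   # transfer_senior_agent
print(next_step(-0.8, agent_available=False))  # apologize_offer_callback
```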
Audit and quality
Per call:
- Full transcript.
- Caller path through menus.
- Bot decisions with reasoning.
- Transfer details (when and why).
- Caller sentiment trajectory.
Visible under Live Chat > Voice conversations. Useful for quality review.
Compliance
Voice channel adds compliance dimensions:
- Call recording disclosure. Most jurisdictions require informing the caller. Enabled by default.
- TCPA (US). Outbound calls require prior consent. AskVault is inbound-only by default; outbound requires opt-in.
- GDPR. Voice recordings are personal data. Retention follows your workspace retention policy.
See security overview.
Cost considerations
Voice billing combines:
- AskVault Business or Enterprise subscription.
- Twilio per-minute charges (about $0.013 to $0.04 per minute depending on country).
- TTS and ASR usage (typically rolled into the per-minute Twilio cost).
For a typical voice deployment handling 1,000 calls per month averaging 3 minutes each, expect about $60 to $150 per month in Twilio fees.
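The per-minute portion of that estimate can be checked with simple arithmetic: calls times average minutes times rate. The sketch below uses the per-minute range quoted above; the function name is hypothetical. On those rates alone the result is about $39 to $120, so the $60 to $150 figure presumably includes Twilio charges beyond the listed per-minute range, although this page does not break that down.

```python
from decimal import Decimal


def twilio_minute_cost(calls, avg_minutes, rate_per_minute):
    """Monthly per-minute charges: calls x average minutes x rate (USD)."""
    return Decimal(calls) * Decimal(avg_minutes) * Decimal(rate_per_minute)


low = twilio_minute_cost(1000, 3, "0.013")
high = twilio_minute_cost(1000, 3, "0.04")
print(f"${low:.2f} to ${high:.2f}")  # $39.00 to $120.00
```

Decimal is used instead of float so that small per-minute rates multiply out without rounding noise.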
Plan availability
- Free, Starter, Growth. No voice channel.
- Business. Full voice IVR, multi-language, warm transfer.
- Enterprise. Custom voice models. Voice biometric and SIP trunk support are planned (see Planned features).
Planned features (on the roadmap)
Not yet available. Listed here so that current behavior is clear:
- Voice biometric authentication. Today, voice PIN. Native voice biometric planned for Enterprise.
- SIP trunk support. Today, Twilio only. Direct SIP integration planned for Enterprise.
- Outbound campaign mode. Today, inbound-only. Compliance-gated outbound campaigns planned.
- Real-time agent assist. Today, post-call transcript. Live transcription with agent suggestions during the call planned.
Limits
- Menu depth. Up to 5 levels.
- Options per menu. Up to 9 (DTMF) or unlimited (speech).
- Call recording retention. 90 days standard, 1 year Enterprise.
- Concurrent calls per number. Twilio-dependent; typically 10 to 100 with default settings.
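The first two limits are easy to check before publishing a flow. The sketch below validates a flow described as nested dictionaries. That representation, the key names mode and options, and the validate function are assumptions for illustration, not an AskVault export format.

```python
MAX_DEPTH = 5          # menu depth limit from the list above
MAX_DTMF_OPTIONS = 9   # per-menu option limit for keypad menus


def validate(menu, depth=1):
    """Return a list of limit violations found in a nested menu definition."""
    errors = []
    options = menu.get("options", {})
    if depth > MAX_DEPTH:
        errors.append(f"menu at depth {depth} exceeds the {MAX_DEPTH}-level limit")
    if menu.get("mode") == "dtmf" and len(options) > MAX_DTMF_OPTIONS:
        errors.append(f"DTMF menu at depth {depth} has {len(options)} options")
    for child in options.values():
        if isinstance(child, dict):  # a nested menu; strings are treated as actions
            errors.extend(validate(child, depth + 1))
    return errors


flow = {
    "mode": "dtmf",
    "options": {
        "1": {"mode": "speech", "options": {"invoice": "bot_answer"}},
        "2": "transfer_to_agent",
    },
}
print(validate(flow))  # []
```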
Common pitfalls
Caller drops off in deep menus. Menu depth too high. Cap at 3 levels.
Speech recognition fails on accents. ASR weaker for non-native speakers of the target language. Add DTMF fallback.
Bot answers are vague. Knowledge coverage is thin for voice queries. Add Q&A pairs written for the way callers phrase questions aloud.
Transfer fails. Agent number unavailable or queue full. Fall back to voicemail.
FAQ
Does this work outside the US?
Yes. Twilio numbers available in 100+ countries. ASR and TTS adapt to local languages.
Can I use my existing phone number?
Yes, via number porting in Twilio. Port the number to Twilio first, then configure it in AskVault.
Are call transcripts accurate?
About 85 to 95% word accuracy on clear English. Lower for noisy environments or strong accents.
Can the bot make outbound calls?
Inbound-only by default. Outbound is planned with compliance gating.
Does voice support multi-language calls?
Yes. Auto-detect from caller's first speech, or explicit menu selection.
Related guides
- Voice setup
- Identity verification
- Subscription manager skill
- Escalate to human skill
- Sentiment router skill