How to set up voice support with AskVault
What voice support gets you
Voice support is the right channel for three patterns: appointment booking and changes where customers prefer to call rather than type, basic account questions where keyboards are inconvenient (customers who are driving, older demographics), and overflow handling when your phone queue is long and you want a bot to triage before a human picks up.
Voice is on the Business plan and above.
How a call actually flows
When a customer calls your AskVault voice number:
- Twilio answers the call and plays your custom greeting (configurable text).
- The customer speaks their question. Twilio transcribes the audio in real time.
- The transcript goes to AskVault. The AI agent retrieves relevant content from your knowledge base.
- AskVault returns the text answer. Twilio converts it to speech using a voice you pick.
- The customer hears the answer. They can ask a follow-up. Repeat.
Round-trip latency is about 2 to 4 seconds per turn: long enough that the customer notices, but short enough that the conversation still feels natural.
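The call flow can be pictured as a loop over caller turns. The sketch below is a minimal simulation, not AskVault's or Twilio's actual code: run_call and the three stub stages (fake_transcribe, fake_answer, fake_synthesize) are hypothetical stand-ins for speech recognition, knowledge-base retrieval, and text-to-speech.

```python
from typing import Callable, Iterable, List


def run_call(
    greeting: str,
    caller_audio: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    answer: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> List[bytes]:
    """Simulate one call: the greeting first, then one spoken reply per caller turn."""
    played = [synthesize(greeting)]            # greeting is played when the call is answered
    for audio in caller_audio:                 # one iteration per caller utterance
        transcript = transcribe(audio)         # speech to text
        reply_text = answer(transcript)        # knowledge-base lookup, text answer
        played.append(synthesize(reply_text))  # text to speech, played to the caller
    return played


# Stub stages so the sketch runs without any external service (all hypothetical).
KNOWLEDGE_BASE = {"what are your hours": "We are open 9 to 5, Monday to Friday."}


def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8").lower().strip("?!. ")


def fake_answer(question: str) -> str:
    return KNOWLEDGE_BASE.get(question, "Sorry, I don't have that information.")


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


played = run_call(
    "Hi, you've reached Acme support.",
    [b"What are your hours?"],
    fake_transcribe,
    fake_answer,
    fake_synthesize,
)
```

Each pass through the loop corresponds to one turn, so the 2 to 4 second latency applies per iteration.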
Step 1: buy a voice-capable Twilio number
- Sign in to Twilio Console.
- Phone Numbers > Manage > Buy a Number.
- Pick the country, search for numbers with Voice capability enabled.
- Purchase the number. US voice numbers cost about $1 per month plus per-minute fees.
Step 2: connect to AskVault
- In Twilio Console, copy your Account SID and Auth Token.
- In AskVault, open Deploy > Voice > Add Number.
- Paste the Twilio credentials and the number you bought.
- Click Save.
AskVault configures Twilio's voice webhook so inbound calls route through our pipeline. Test it: call the Twilio number from your personal phone. The bot picks up and plays your default greeting.
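If Save fails, a mistyped credential or number is a common cause. The sketch below is an optional pre-check you could run before pasting. The patterns are assumptions based on how these values usually look (an Account SID of "AC" plus 32 hexadecimal characters, and a phone number in E.164 form); the function names are hypothetical and not part of either product.

```python
import re

# Assumed shapes: Account SID = "AC" + 32 hex characters; number = E.164 ("+" and up to 15 digits).
ACCOUNT_SID_RE = re.compile(r"AC[0-9a-fA-F]{32}")
E164_RE = re.compile(r"\+[1-9]\d{1,14}")


def looks_like_account_sid(value: str) -> bool:
    """True if the value has the usual shape of a Twilio Account SID."""
    return ACCOUNT_SID_RE.fullmatch(value.strip()) is not None


def looks_like_e164(number: str) -> bool:
    """True if the number is written in E.164 form, e.g. +14155550123."""
    return E164_RE.fullmatch(number.strip()) is not None
```

A shape check like this only catches copy-and-paste mistakes. It cannot tell you whether the credentials are valid.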
Step 3: customize the greeting
Under Deploy > Voice > Greeting, edit the welcome message. Examples:
Hi, you've reached Acme support. I'm an AI assistant. What can I help you with today?
Or for a more guided experience:
Hello and thanks for calling Acme. You can ask me about your account, your order, or our products. Go ahead, I'm listening.
Keep greetings under 8 seconds. Anything longer and callers start to disconnect.
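To check a draft greeting against the 8-second guideline before you save it, you can estimate its spoken length from the word count. This is a rough sketch: the 170 words-per-minute rate is an assumption, and real text-to-speech voices vary, so treat the result as a first filter and confirm with a test call.

```python
def estimated_seconds(text: str, words_per_minute: int = 170) -> float:
    """Estimate spoken duration from word count (the speaking rate is an assumption)."""
    words = len(text.split())
    return words * 60.0 / words_per_minute


def greeting_fits(text: str, limit_seconds: float = 8.0, words_per_minute: int = 170) -> bool:
    """True if the estimated duration is within the limit."""
    return estimated_seconds(text, words_per_minute) <= limit_seconds


short_greeting = (
    "Hi, you've reached Acme support. I'm an AI assistant. "
    "What can I help you with today?"
)
fits = greeting_fits(short_greeting)
```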
Step 4: configure voice settings
Three settings matter most.
Voice. Pick the text-to-speech voice the bot uses. AskVault offers about 30 voices across major languages. For US English, the default voice sounds natural and is well-tested.
Speech recognition language. Defaults to English. Set per-number for multilingual deployments.
Interrupt mode. Whether the caller can interrupt the bot's speech to ask the next question. We recommend On: more natural conversation, no waiting for the bot to finish.
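For readers curious how these three settings could translate into Twilio's call instructions, here is a minimal sketch that builds a TwiML document with Python's standard library. The Say and Gather verbs and their voice, language, input, bargeIn, action, and speechTimeout attributes are part of TwiML. Whether AskVault generates TwiML in this shape is an assumption, and build_greeting_twiml and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_greeting_twiml(
    greeting: str, voice: str, language: str, interrupt: bool, action_url: str
) -> str:
    """Build a TwiML response that speaks a greeting and listens for speech."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        {
            "input": "speech",                             # collect speech, not keypad digits
            "language": language,                          # speech recognition language
            "bargeIn": "true" if interrupt else "false",   # interrupt mode
            "action": action_url,                          # where the transcript is posted
            "speechTimeout": "auto",
        },
    )
    say = ET.SubElement(gather, "Say", {"voice": voice, "language": language})
    say.text = greeting
    return ET.tostring(response, encoding="unicode")


twiml = build_greeting_twiml(
    greeting="Hi, you've reached Acme support. What can I help you with today?",
    voice="Polly.Joanna",
    language="en-US",
    interrupt=True,
    action_url="https://example.com/voice/turn",  # hypothetical endpoint
)
```

Nesting Say inside Gather is what lets the caller speak over the prompt when bargeIn is true, which is the behavior Interrupt mode describes.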
Optional: IVR menu
For traffic high enough to need routing, configure an IVR (Interactive Voice Response) menu under Voice Settings > Menu:
Hi, you've reached Acme. Press 1 for account help, press 2 for order status, press 3 for anything else.
Press 1 or 2 routes the call to a topic-specific knowledge subset (using audience tags). Press 3 sends the call to the general AI agent.
For most use cases an IVR adds friction and the default "just ask" flow is better. Use an IVR only if your inbound mix is bimodal (e.g., 60% order status and 30% account help) and you want explicit routing.
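The menu routing amounts to a lookup from keypress to audience tag. A minimal sketch follows, assuming hypothetical tag names ("account", "orders") and assuming that any key other than 1 or 2 falls through to the general agent; the article only specifies this for 3.

```python
from typing import Optional

# Hypothetical audience tags; use the tags defined in your own knowledge base.
MENU_TAGS = {"1": "account", "2": "orders"}


def route_keypress(digit: str) -> Optional[str]:
    """Return the audience tag for a keypress, or None for the general AI agent."""
    return MENU_TAGS.get(digit)
```

Returning None for the general agent means adding a fourth menu option later only requires one more dictionary entry.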
Escalating to a human
The escalate_to_human skill works on voice. When triggered, the bot says "Let me transfer you to someone who can help" and hands off the call to a human agent at a configured number. The customer hears hold music for a few seconds, then a human picks up.
Configure the handoff number under Voice Settings > Escalation > Forward to. Multiple numbers can rotate based on time of day, day of week, or skill type.
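Rotation by time of day and day of week can be thought of as a schedule lookup with a fallback. The sketch below illustrates that logic only. The schedule, the fictional 555 numbers, and forward_number are all hypothetical, and the real rotation is configured in the settings screen, not in code.

```python
from datetime import datetime, time

# Hypothetical schedule: (weekdays, start, end, number). Monday is 0, Sunday is 6.
SCHEDULE = [
    ({0, 1, 2, 3, 4}, time(9, 0), time(17, 0), "+15555550101"),  # weekday office hours
    ({5, 6}, time(10, 0), time(14, 0), "+15555550102"),          # weekend cover
]
FALLBACK_NUMBER = "+15555550199"  # e.g. an after-hours line


def forward_number(now: datetime) -> str:
    """Pick the handoff number for the given local time."""
    for weekdays, start, end, number in SCHEDULE:
        if now.weekday() in weekdays and start <= now.time() < end:
            return number
    return FALLBACK_NUMBER
```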
Limits and costs
Twilio voice costs about $0.013 per minute inbound and $0.014 per minute outbound for US numbers, plus per-minute speech-recognition and text-to-speech charges (about $0.05 per minute combined). Total: roughly $0.07 per minute of conversation.
AskVault counts each AI-generated voice reply as 1 query against your monthly quota. A 5-minute conversation with 10 AI turns uses 10 queries.
For high-volume voice deployments (over 5,000 minutes per month), reach out to sales@askvault.co for a volume agreement that flattens per-minute costs.
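To budget a deployment, you can turn the approximate rates above into a quick estimate. This sketch uses the inbound and combined speech rates from this section and leaves out the outbound rate, on the assumption that it applies only to forwarded legs. The function names are hypothetical.

```python
INBOUND_PER_MIN = 0.013  # approximate US inbound rate from this section
SPEECH_PER_MIN = 0.05    # approximate combined speech recognition and text-to-speech rate


def conversation_cost(minutes: float) -> float:
    """Approximate telephony and speech cost in USD for an inbound conversation."""
    return minutes * (INBOUND_PER_MIN + SPEECH_PER_MIN)


def queries_used(ai_turns: int) -> int:
    """Each AI-generated voice reply counts as one query against the monthly quota."""
    return ai_turns


five_minute_call = conversation_cost(5)   # about $0.32
monthly_at_5000 = conversation_cost(5000) # about $315
```

These figures come out slightly under the rounded $0.07 per minute quoted above, so treat them as a lower bound.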
Compliance
Voice has more regulations than text. Three to know:
- Call recording. AskVault records every call by default. In most jurisdictions, you must inform callers at the start of the call that it is being recorded. The default greeting includes "This call may be recorded for quality and training" automatically.
- PII handling. Voice transcripts often contain spoken account numbers, dates of birth, addresses. Audit logs treat voice the same as text: AES-256 at rest, TLS 1.3 in transit, 365-day retention.
- TCPA (US). Outbound automated calls and texts are heavily regulated. AskVault doesn't ship an outbound voice campaign feature for this reason.
Common pitfalls
Caller speaks but bot doesn't reply. This usually means background noise is high enough that speech recognition fails. The bot asks the caller to repeat. If it keeps happening, the caller's connection is probably poor, and the bot escalates to a human after 2 failed attempts.
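The repeat-then-escalate behavior can be sketched as a failure counter. This is an illustration only: it assumes the 2 failed attempts are consecutive and that a successful turn resets the count, neither of which is stated above. actions_for is a hypothetical name.

```python
from typing import List, Optional


def actions_for(results: List[Optional[str]], max_failures: int = 2) -> List[str]:
    """Map recognition results (None or empty = failed) to bot actions."""
    actions: List[str] = []
    failures = 0
    for transcript in results:
        if transcript and transcript.strip():
            failures = 0                 # assumption: a good turn resets the counter
            actions.append("answer")
            continue
        failures += 1
        if failures >= max_failures:
            actions.append("escalate")   # hand off to a human and stop
            break
        actions.append("reprompt")       # ask the caller to repeat
    return actions
```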
Bot misunderstands accents or industry jargon. Speech-to-text accuracy varies by accent and by jargon. Build a vocabulary list under Voice Settings > Custom Vocabulary with your product names, account formats, and technical terms.
Calls drop after 30 seconds. Twilio's default call timeout is 30 seconds of silence. The bot handles silence by re-prompting the customer at 10 and 20 seconds. If your callers go quiet often (e.g., they're looking up information), extend the timeout under Voice Settings > Silence Timeout.
Customer says "agent" but bot keeps talking. Make sure escalate_to_human is enabled and the trigger phrases "agent", "human", and "representative" are in the skill's trigger list.
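To reason about which utterances should trigger a handoff, it can help to test your trigger list against sample transcripts. The sketch below uses simple whole-word matching. How the escalate_to_human skill actually matches triggers is not documented here, so this is an assumption, and wants_human is a hypothetical helper.

```python
import re
from typing import Iterable

TRIGGERS = ("agent", "human", "representative")


def wants_human(transcript: str, triggers: Iterable[str] = TRIGGERS) -> bool:
    """True if any trigger appears as a whole word in the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return any(trigger in words for trigger in triggers)
```

Whole-word matching avoids false positives such as "agency", at the cost of missing plurals like "agents" unless you add them to the list.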
FAQ
Can the bot make outbound calls?
Not in the standard AskVault flow. Outbound automated calling is heavily regulated (TCPA in the US, GDPR and the ePrivacy Directive in the EU). For specific outbound use cases (appointment confirmations, callback campaigns), reach out to sales@askvault.co for a custom integration.
Can I use a non-Twilio voice provider?
Twilio is the default. For Plivo, Vonage, or self-hosted Asterisk, the REST API channel gives you a programmatic interface.
Does the bot work with toll-free numbers?
Yes. US toll-free numbers require a separate Twilio Toll-Free Verification process, which takes 2 to 4 weeks.
How accurate is the speech recognition?
For clear English audio without heavy background noise, transcription accuracy is roughly 92 to 96%. Accent, jargon, and bad audio degrade this. The custom vocabulary feature helps significantly for industry-specific terms.
Can I get a transcript of every call?
Yes. Every call appears in Live Chat > Conversations filtered by Voice channel, with the full transcript and audio recording attached. Export as CSV from Analytics > Export.