Audio support enables your AI agent to understand spoken input, allowing customers to communicate through voice messages for a more natural and efficient support experience.

Why Audio Input Matters

Voice message capabilities transform how customers interact with your AI:
Convenience - Customers can communicate while walking, driving (hands-free), or in situations where typing is inconvenient.
Complex Explanations - Users can verbally explain nuanced problems that would require lengthy text descriptions, making it easier to articulate complex issues.
Accessibility - Voice input particularly benefits users with:
  • Visual impairments
  • Mobility limitations or conditions like arthritis
  • Limited literacy skills in their preferred language
Natural Communication - Speaking feels more natural than typing for many users, leading to more conversational and detailed queries.
Mobile Optimization - Voice is often the preferred input method on mobile devices where typing is cumbersome.

How Audio Input Works

  1. Customer records - User presses microphone button and speaks their question
  2. Audio capture - Browser or app captures audio stream
  3. Speech-to-text - Audio is accurately transcribed to text
  4. AI processing - The AI understands spoken language, identifies tone, and processes verbal descriptions
  5. Text response - AI generates a text response
Audio input is currently one-way: customers can send voice messages, but the AI responds with text. Users with visual impairments rely on screen readers to read responses.
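
The five steps above can be sketched as a small pipeline. This is a minimal illustration, not the product's implementation: `VoiceMessage`, `handle_voice_message`, and the two stand-in functions are hypothetical names, and the real speech-to-text and AI services are replaced by simple stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceMessage:
    audio: bytes        # raw audio captured by the browser or app (steps 1-2)
    language_hint: str  # hypothetical field, e.g. "en"


def handle_voice_message(
    message: VoiceMessage,
    transcribe: Callable[[bytes, str], str],
    answer: Callable[[str], str],
) -> Dict[str, str]:
    """Run the one-way flow: audio in, text out."""
    transcript = transcribe(message.audio, message.language_hint)  # step 3
    if not transcript.strip():
        # Nothing usable was transcribed, so ask the user to retry or type.
        return {
            "transcript": "",
            "reply": "Sorry, I couldn't make that out. Please try again or type your question.",
        }
    reply = answer(transcript)  # step 4
    return {"transcript": transcript, "reply": reply}  # step 5: text only


# Stand-ins for the real speech-to-text and AI services (assumptions).
def fake_transcribe(audio: bytes, language: str) -> str:
    return audio.decode("utf-8")


def fake_answer(text: str) -> str:
    return f"You asked: {text}"


result = handle_voice_message(
    VoiceMessage(b"Where is my order?", "en"), fake_transcribe, fake_answer
)
print(result["reply"])  # You asked: Where is my order?
```

Passing the transcriber and the responder in as functions keeps the flow testable without a microphone or a network connection.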

Multilingual Support

Audio input works across 24+ languages, enabling broader accessibility and international customer support:
  • Automatic language detection
  • Matches your configured agent language
  • Supports multiple languages if your agent is multilingual
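
As a rough sketch of how a detected language might be matched against an agent's configured languages, consider the function below. The name `choose_language` and the fallback to the first configured language are assumptions made for illustration; the product's actual matching rules are not described here.

```python
from typing import Sequence


def choose_language(detected: str, configured: Sequence[str]) -> str:
    """Pick the language to respond in.

    Prefer the detected language when the agent is configured for it;
    otherwise fall back to the first configured language (an assumption).
    """
    if not configured:
        raise ValueError("agent must have at least one configured language")
    # Compare on the primary subtag so that "es-MX" matches "es".
    primary = detected.strip().lower().split("-")[0]
    for language in configured:
        if language.lower().split("-")[0] == primary:
            return language
    return configured[0]


print(choose_language("es-MX", ["en", "es"]))  # es
print(choose_language("de", ["en", "es"]))     # en
```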

Supported Channels

Website Widget:
  • Microphone button available in chat interface
  • Browser-based audio capture
  • Works on desktop and mobile browsers
  • Requires microphone permissions from browser
Mobile SDKs:
  • Native audio capture on iOS and Android
  • Optimized for mobile device microphones
  • Background noise reduction
Integration Channels:
  • Zendesk
  • Salesforce
  • Slack
  • Other integrated platforms (where supported)

Use Cases

Customer Support

Hands-Free Assistance:
Customer: [Speaking while driving] "I need to check the status of
my order. I ordered a blue backpack last week."

AI: [Text response] "I can help you check your order status.
Could you provide your order number or the email address used
for the purchase?"
Complex Problem Descriptions:
Customer: [Voice message] "My washing machine is making this weird
grinding noise during the spin cycle, and it's been getting louder
over the past week. It only happens when I use the heavy-duty setting."

AI: [Text response] "Based on your description, the grinding noise
during the spin cycle on heavy-duty mode could indicate a few issues..."

Mobile-First Scenarios

On-the-Go Support:
  • Customers walking or commuting
  • Users without a keyboard handy
  • Quick questions while multitasking
Accessibility Priority:
  • Users with visual impairments using voice + screen readers
  • Mobility-limited customers who find typing difficult
  • Elderly users more comfortable with speaking

Configuration

Enable audio input for your deployment:
  1. Go to Deploy → [Your Deployment] → Settings
  2. Enable Audio Input
  3. Configure language support
  4. Test with sample voice messages
  5. Deploy to production

Guidance Considerations

Train your AI to handle voice-specific scenarios:
"When customers describe problems verbally, they may include
more context and emotion than typed messages. Pay attention to
tone and ask clarifying questions if the description is unclear."
"For users with accessibility needs, ensure responses are
clear and well-structured for screen reader compatibility."

Best Practices

Clear Instructions

Guide users on how to use voice input:
"Click the microphone icon and speak your question clearly"
"Press and hold to record, release to send"
"Speak naturally - I'll transcribe and understand your message"

Fallback Options

Always provide text input as alternative:
  • Some users prefer typing
  • Audio may not work in all environments (noisy locations, poor connectivity)
  • Privacy concerns in public spaces
  • Some situations require written documentation
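
One possible way to decide when to steer a user toward text input is sketched below. The function name, its parameters, and the retry limit of two are all hypothetical; treat it as an illustration of the fallback idea rather than a description of the widget's behavior.

```python
def next_input_mode(
    connection_ok: bool,
    mic_available: bool,
    failed_attempts: int,
    max_attempts: int = 2,
) -> str:
    """Return "voice" or "text" as the suggested input mode."""
    if not connection_ok or not mic_available:
        # Audio cannot work reliably here, so suggest typing right away.
        return "text"
    if failed_attempts >= max_attempts:
        # Repeated transcription failures: stop asking the user to re-record.
        return "text"
    return "voice"


print(next_input_mode(connection_ok=True, mic_available=True, failed_attempts=0))   # voice
print(next_input_mode(connection_ok=False, mic_available=True, failed_attempts=0))  # text
```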

Audio Quality Tips

Inform users about optimal recording conditions:
"For best results, speak in a quiet environment"
"If your message wasn't transcribed correctly, you can try again
or type your question instead"
"Hold your device's microphone close to your mouth"

Response Formatting

Structure responses for clarity when read by screen readers:
  • Use clear, concise sentences
  • Organize information with bullet points
  • Avoid complex formatting that may not read well audibly
  • Include important information at the beginning
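
If responses contain Markdown-style formatting, a small post-processing step can remove markers that tend to read poorly aloud. The sketch below is one hedged approach using only Python's standard `re` module; `simplify_for_screen_reader` is a hypothetical helper, and it assumes Markdown-like input.

```python
import re


def simplify_for_screen_reader(text: str) -> str:
    """Strip formatting markers that screen readers may announce awkwardly."""
    cleaned = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        line = re.sub(r"^#{1,6}\s*", "", line)                 # heading markers
        line = re.sub(r"^[-*•]\s+", "- ", line)                # one plain bullet style
        line = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", line)   # keep link text only
        line = re.sub(r"\*\*|__|`", "", line)                  # bold and code markers
        cleaned.append(line)
    return "\n".join(cleaned)


sample = "## Order status\n\n* **Shipped** on Tuesday\n* See [tracking](https://example.com/t)"
print(simplify_for_screen_reader(sample))
```

Bullets are normalized before bold markers are removed so that a leading `*` bullet is not confused with emphasis.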

Technical Considerations

Privacy and Security

Audio Data Processing:
  • Audio temporarily processed for transcription
  • Text is stored, audio typically not retained
  • Configurable data retention policies
  • GDPR and privacy law compliance
User Consent:
  • Microphone permission required from browser
  • Clear privacy notices in your deployment
  • User control over when to enable microphone
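
A configurable retention policy can be thought of as a cutoff applied to stored transcripts. The sketch below is illustrative only: `StoredTranscript`, `purge_expired`, and `retention_days` are hypothetical names, not settings taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class StoredTranscript:
    text: str
    stored_at: datetime


def purge_expired(
    items: List[StoredTranscript], retention_days: int, now: datetime
) -> List[StoredTranscript]:
    """Keep only transcripts that are still inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [item for item in items if item.stored_at >= cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
items = [
    StoredTranscript("old question", now - timedelta(days=45)),
    StoredTranscript("recent question", now - timedelta(days=3)),
]
kept = purge_expired(items, retention_days=30, now=now)
print([item.text for item in kept])  # ['recent question']
```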

Safety Measures

Content Moderation:
  • Malware and virus scanning for uploaded files
  • Content moderation detecting inappropriate content
  • Reporting mechanisms for policy violations
  • Team protection from inappropriate material

Performance

Transcription Quality:
  • Accurate transcription across 24+ languages
  • Tone and emotion detection
  • Handles various accents and speaking styles
  • Background noise reduction (especially on mobile)
Latency:
  • Speech-to-text processing adds 1-3 seconds
  • Overall conversation feels natural
  • Optimized for real-time interaction
Bandwidth:
  • Audio streaming requires stable connection
  • Automatic quality adjustment
  • Fallback to text on poor connections

Browser Compatibility

Requirements:
  • Modern browsers (Chrome, Firefox, Safari, Edge)
  • HTTPS required for microphone access
  • Mobile browser support
  • Permissions must be granted by user
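
The requirements above can be expressed as a simple preflight check. In a real widget these values would come from the browser; here they are passed in as plain fields on a hypothetical `AudioEnvironment` object so the logic can be shown on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioEnvironment:
    is_https: bool
    microphone_connected: bool
    permission_granted: bool
    microphone_busy: bool = False  # in use by another application


def audio_blockers(env: AudioEnvironment) -> List[str]:
    """List what must be fixed before voice input can work."""
    problems = []
    if not env.is_https:
        problems.append("Serve the page over HTTPS")
    if not env.microphone_connected:
        problems.append("Connect a microphone")
    if not env.permission_granted:
        problems.append("Grant microphone permission in the browser")
    if env.microphone_busy:
        problems.append("Close other apps using the microphone")
    return problems


env = AudioEnvironment(is_https=False, microphone_connected=True, permission_granted=False)
for problem in audio_blockers(env):
    print(problem)
```

An empty list means none of the checked conditions blocks audio input; it does not guarantee that the browser itself supports recording.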

Frequently Asked Questions

Microphone not working:
Possible causes:
  1. Browser blocked microphone permission
  2. No microphone connected
  3. Microphone used by another application
  4. Not using HTTPS connection
  5. Browser doesn’t support audio input
Solutions:
  • Check browser permission settings
  • Grant microphone access when prompted
  • Close other apps using microphone
  • Ensure website uses HTTPS
  • Try different browser
  • Test microphone with other applications
Transcription is inaccurate:
Possible causes:
  1. Background noise interference
  2. Speaking too quietly or too fast
  3. Strong accent or dialect
  4. Poor microphone quality
  5. Unstable internet connection
Solutions:
  • Move to quieter environment
  • Speak clearly and at moderate pace
  • Position microphone closer to mouth
  • Use headset microphone for better quality
  • Check internet connection stability
  • Try typing complex terms or proper nouns
Voice message fails to send:
Possible causes:
  1. Internet connection interrupted
  2. Audio file size too large
  3. Browser compatibility issue
  4. Microphone permission revoked
Solutions:
  • Check internet connection
  • Try shorter voice messages
  • Refresh page and grant permissions again
  • Update browser to latest version
  • Use text input as fallback
Wrong language detected:
Possible causes:
  1. Speaking multiple languages in one message
  2. Language settings mismatch
  3. Strong accent or dialect
Solutions:
  • Stick to one language per message
  • Verify language settings in deployment
  • Speak more clearly or try text input
  • Contact support if persistent issues

Example Scenarios

Technical Support

Customer scenario: User experiencing software error while commuting
Voice message: “Hey, I’m getting an error message when I try to log in. It says ‘Connection timeout’ or something like that. I’m on my phone using cellular data.”
AI processes: Transcribes message, identifies error type, considers context (mobile, cellular)
AI response: “I can help with that login error. The ‘Connection timeout’ message usually occurs when there’s a network connectivity issue. Since you’re on cellular data, let’s try these steps…”

Product Information

Customer scenario: User shopping while holding items
Voice message: “I’m looking at this blue backpack. Does it have a laptop compartment? And is it waterproof?”
AI processes: Transcribes questions, identifies product context, prepares detailed response
AI response: “Let me provide details about that blue backpack. Yes, it features a dedicated laptop compartment that fits devices up to 15 inches…”

Order Status

Customer scenario: User checking order while driving (hands-free)
Voice message: “Can you tell me where my order is? I ordered it last Tuesday. The order number is… uh… B dash 1 2 3 4 5.”
AI processes: Transcribes order inquiry, extracts order number, handles uncertainty (“uh”)
AI response: “I found your order B-12345 from last Tuesday. It’s currently in transit and expected to arrive on Thursday…”
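
To make the order number extraction in this scenario concrete, here is a minimal sketch of turning a spoken identifier into its written form. The helper name, the filler word list, and the spoken-symbol mapping are assumptions for illustration; how the AI actually extracts identifiers is not specified here.

```python
import re

# Hypothetical lookup tables for this sketch.
SPOKEN_SYMBOLS = {"dash": "-", "hyphen": "-"}
FILLERS = {"uh", "um", "er"}


def normalize_spoken_id(fragment: str) -> str:
    """Join spelled-out characters, e.g. 'B dash 1 2 3 4 5' -> 'B-12345'."""
    tokens = re.findall(r"[A-Za-z0-9]+", fragment)
    parts = []
    for token in tokens:
        lowered = token.lower()
        if lowered in FILLERS:
            continue  # skip hesitation words
        parts.append(SPOKEN_SYMBOLS.get(lowered, token.upper()))
    return "".join(parts)


print(normalize_spoken_id("uh… B dash 1 2 3 4 5"))  # B-12345
```

The function expects only the fragment that contains the identifier, so locating that fragment within the full transcript is left to an earlier step.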

Next Steps

Now that you understand audio support, explore these related topics:
  • Vision - Enable image understanding for visual problems
  • Documents - Allow customers to attach files and documents
  • Escalations - Configure handoff when voice descriptions need human review
  • Deploy - Set up audio-enabled deployments
  • Website Integration - Add audio input to your website widget
Voice message capabilities create more natural, accessible customer interactions. Enable audio input strategically based on your users’ needs and usage contexts.