Why Audio Input Matters
Voice message capabilities transform how customers interact with your AI:
Convenience - Customers can communicate while walking, driving (hands-free), or in situations where typing is inconvenient.
Complex Explanations - Users can verbally explain nuanced problems that would require lengthy text descriptions, making it easier to articulate complex issues.
Accessibility - Voice input particularly benefits users with:
- Visual impairments
- Mobility limitations or conditions like arthritis
- Limited literacy skills in their preferred language
How Audio Input Works
- Customer records - User presses microphone button and speaks their question
- Audio capture - Browser or app captures audio stream
- Speech-to-text - Audio is accurately transcribed to text
- AI processing - The AI understands spoken language, identifies tone, and processes verbal descriptions
- Text response - AI generates a text response
Audio input is currently one-way: customers can send voice messages, but the AI responds with text. Users with visual impairments rely on screen readers for reading responses.
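The flow above can be sketched as a small pipeline. This is an illustration only: `transcribe` and `generate_reply` are hypothetical stand-ins for the platform's speech-to-text and AI stages, which this page does not specify, and the stub functions exist purely so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceMessage:
    audio: bytes   # raw audio captured by the browser or app
    channel: str   # e.g. "website_widget" (hypothetical label)


@dataclass
class TextReply:
    transcript: str  # what the customer said, as text
    reply: str       # the AI's text response (no audio is returned)


def handle_voice_message(
    message: VoiceMessage,
    transcribe: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
) -> TextReply:
    """Run one voice message through the one-way audio flow."""
    if not message.audio:
        raise ValueError("empty recording")
    transcript = transcribe(message.audio).strip()   # speech-to-text
    reply = generate_reply(transcript)               # AI processing
    return TextReply(transcript=transcript, reply=reply)


# Stub stages so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_reply(transcript: str) -> str:
    return f"You asked: {transcript}"


result = handle_voice_message(
    VoiceMessage(audio=b"Where is my order?", channel="website_widget"),
    transcribe=fake_transcribe,
    generate_reply=fake_reply,
)
print(result.reply)
```

Passing the stages in as functions keeps the one-way shape visible: audio goes in, and only text comes out.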
Multilingual Support
Audio input works across 24+ languages, enabling broader accessibility and international customer support:
- Automatic language detection
- Matches your configured agent language
- Supports multiple languages if your agent is multilingual
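The language behaviors listed above could combine along these lines. This is a minimal sketch, assuming the transcription step reports a detected language code and the agent has a configured list of languages. The function name, the regional-variant matching, and the rule of falling back to the first configured language are illustrative assumptions, not documented platform behavior.

```python
def resolve_reply_language(detected: str, agent_languages: list[str]) -> str:
    """Pick the language to respond in.

    detected: language code from automatic detection, e.g. "es" or "pt-BR".
    agent_languages: configured agent languages; the first entry is
        treated as the default (an assumption made for this sketch).
    """
    if not agent_languages:
        raise ValueError("agent must have at least one configured language")
    configured = {lang.lower(): lang for lang in agent_languages}
    code = detected.lower()
    if code in configured:
        return configured[code]
    # Fall back from a regional variant ("pt-br") to its base language ("pt").
    base = code.split("-")[0]
    if base in configured:
        return configured[base]
    # Detected language is not configured: use the agent's default.
    return agent_languages[0]
```

For example, `resolve_reply_language("pt-BR", ["en", "pt"])` returns `"pt"`, while an unconfigured language such as `"ja"` falls back to `"en"`.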
Supported Channels
Website Widget:
- Microphone button available in chat interface
- Browser-based audio capture
- Works on desktop and mobile browsers
- Requires microphone permissions from browser
Mobile Apps:
- Native audio capture on iOS and Android
- Optimized for mobile device microphones
- Background noise reduction
Integrations:
- Zendesk
- Salesforce
- Slack
- Other integrated platforms (where supported)
Use Cases
Customer Support
Hands-Free Assistance:
Mobile-First Scenarios
On-the-Go Support:
- Customers walking or commuting
- Users without a keyboard handy
- Quick questions while multitasking
Accessibility:
- Users with visual impairments using voice + screen readers
- Mobility-limited customers who find typing difficult
- Elderly users more comfortable with speaking
Configuration
Enable audio input for your deployment:
- Go to Deploy → [Your Deployment] → Settings
- Enable Audio Input
- Configure language support
- Test with sample voice messages
- Deploy to production
Guidance Considerations
Train your AI to handle voice-specific scenarios:
Best Practices
Clear Instructions
Guide users on how to use voice input:
Fallback Options
Always provide text input as an alternative:
- Some users prefer typing
- Audio may not work in all environments (noisy locations, poor connectivity)
- Privacy concerns in public spaces
- Some situations require written documentation
Audio Quality Tips
Inform users about optimal recording conditions:
Response Formatting
Structure responses for clarity when read by screen readers:
- Use clear, concise sentences
- Organize information with bullet points
- Avoid complex formatting that may not read well audibly
- Include important information at the beginning
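The formatting guidelines above could be backed by a small post-processing step. This is a minimal sketch, assuming responses may contain Markdown-style markup. The function name and the specific rules are illustrative and not part of the platform.

```python
import re


def simplify_for_screen_reader(text: str) -> str:
    """Strip markup that tends to read poorly aloud, keeping bullet structure."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Table rows: turn "| a | b |" into "a, b"; drop separator rows ("|---|---|").
        if line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            if all(re.fullmatch(r":?-+:?", c) for c in cells):
                continue
            line = ", ".join(c for c in cells if c)
        # Links: keep the label, drop the URL.
        line = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", line)
        # Remove emphasis and inline-code markers.
        line = re.sub(r"(\*\*|__|`)", "", line)
        # Remove heading markers.
        line = re.sub(r"^#+\s*", "", line)
        lines.append(line)
    return "\n".join(lines)
```

Bullet markers are left in place on purpose, since the guidance above recommends organizing information with bullet points.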
Technical Considerations
Privacy and Security
Audio Data Processing:
- Audio temporarily processed for transcription
- Text is stored, audio typically not retained
- Configurable data retention policies
- GDPR and privacy law compliance
- Microphone permission required from browser
- Clear privacy notices in your deployment
- User control over when to enable microphone
Safety Measures
Content Moderation:
- Malware and virus scanning for uploaded files
- Content moderation detecting inappropriate content
- Reporting mechanisms for policy violations
- Team protection from inappropriate material
Performance
Transcription Quality:
- Accurate transcription across 24+ languages
- Tone and emotion detection
- Handles various accents and speaking styles
- Background noise reduction (especially on mobile)
Latency:
- Speech-to-text processing adds 1-3 seconds
- Overall conversation feels natural
- Optimized for real-time interaction
Connectivity:
- Audio streaming requires stable connection
- Automatic quality adjustment
- Fallback to text on poor connections
Browser Compatibility
Requirements:
- Modern browsers (Chrome, Firefox, Safari, Edge)
- HTTPS required for microphone access
- Mobile browser support
- Permissions must be granted by user
Frequently Asked Questions
Microphone not working
Possible causes:
- Browser blocked microphone permission
- No microphone connected
- Microphone used by another application
- Not using HTTPS connection
- Browser doesn’t support audio input
Solutions:
- Check browser permission settings
- Grant microphone access when prompted
- Close other apps using microphone
- Ensure website uses HTTPS
- Try different browser
- Test microphone with other applications
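The causes above can often be told apart from the error the browser reports when microphone capture fails. This is a minimal sketch, assuming the widget forwards the error name raised by the browser's `getUserMedia` call to your support tooling. `NotAllowedError`, `NotFoundError`, and `NotReadableError` are standard browser error names, while the reporting mechanism and the `diagnose_microphone` helper are hypothetical.

```python
from typing import Optional

# Standard getUserMedia() error names mapped to the fixes in this FAQ entry.
MIC_ERROR_HINTS = {
    "NotAllowedError": "Microphone permission was blocked. Check browser "
                       "permission settings and grant access when prompted.",
    "NotFoundError": "No microphone was found. Connect one and test it "
                     "with other applications.",
    "NotReadableError": "The microphone may be in use by another application. "
                        "Close other apps using the microphone.",
}


def diagnose_microphone(error_name: Optional[str], is_https: bool) -> str:
    """Return a troubleshooting hint for a failed microphone capture."""
    # Browsers only expose microphone capture on secure pages.
    # (localhost is also treated as secure; ignored here for brevity.)
    if not is_https:
        return "Ensure the website uses HTTPS."
    if error_name is None:
        return "No error was reported. Test the microphone with other applications."
    # Unknown errors get the most general advice from the list above.
    return MIC_ERROR_HINTS.get(error_name, "Try a different browser.")


print(diagnose_microphone("NotAllowedError", is_https=True))
```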
Poor transcription quality
Possible causes:
- Background noise interference
- Speaking too quietly or too fast
- Strong accent or dialect
- Poor microphone quality
- Unstable internet connection
Solutions:
- Move to quieter environment
- Speak clearly and at moderate pace
- Position microphone closer to mouth
- Use headset microphone for better quality
- Check internet connection stability
- Try typing complex terms or proper nouns
Voice message not sending
Possible causes:
- Internet connection interrupted
- Audio file size too large
- Browser compatibility issue
- Microphone permission revoked
Solutions:
- Check internet connection
- Try shorter voice messages
- Refresh page and grant permissions again
- Update browser to latest version
- Use text input as fallback
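A pre-send check can surface these problems before the upload is attempted. This is a minimal sketch under stated assumptions: this page does not give the actual size limit, so `MAX_AUDIO_BYTES` is a placeholder, and `check_before_send` and its parameters are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder limit: the real maximum is not stated on this page.
MAX_AUDIO_BYTES = 5 * 1024 * 1024


@dataclass
class SendCheck:
    ok: bool
    advice: str  # empty when the message can be sent


def check_before_send(audio_size_bytes: int, online: bool, mic_permitted: bool) -> SendCheck:
    """Check the common failure causes in order and suggest the matching fix."""
    if not mic_permitted:
        return SendCheck(False, "Refresh the page and grant microphone permission again.")
    if not online:
        return SendCheck(False, "Check your internet connection, or use text input instead.")
    if audio_size_bytes > MAX_AUDIO_BYTES:
        return SendCheck(False, "Try a shorter voice message.")
    return SendCheck(True, "")
```

Each failed check returns advice taken from the list above, so the user always has a next step, including falling back to text.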
Language not recognized correctly
Possible causes:
- Speaking multiple languages in one message
- Language settings mismatch
- Strong accent or dialect
Solutions:
- Stick to one language per message
- Verify language settings in deployment
- Speak more clearly or try text input
- Contact support if persistent issues
Example Scenarios
Technical Support
Customer scenario: User experiencing software error while commuting
Voice message: “Hey, I’m getting an error message when I try to log in. It says ‘Connection timeout’ or something like that. I’m on my phone using cellular data.”
AI processes: Transcribes message, identifies error type, considers context (mobile, cellular)
AI response: “I can help with that login error. The ‘Connection timeout’ message usually occurs when there’s a network connectivity issue. Since you’re on cellular data, let’s try these steps…”
Product Information
Customer scenario: User shopping while holding items
Voice message: “I’m looking at this blue backpack. Does it have a laptop compartment? And is it waterproof?”
AI processes: Transcribes questions, identifies product context, prepares detailed response
AI response: “Let me provide details about that blue backpack. Yes, it features a dedicated laptop compartment that fits devices up to 15 inches…”
Order Status
Customer scenario: User checking order while driving (hands-free)
Voice message: “Can you tell me where my order is? I ordered it last Tuesday. The order number is… uh… B dash 1 2 3 4 5.”
AI processes: Transcribes order inquiry, extracts order number, handles uncertainty (“uh”)
AI response: “I found your order B-12345 from last Tuesday. It’s currently in transit and expected to arrive on Thursday…”
Next Steps
Now that you understand audio support:
- Vision - Enable image understanding for visual problems
- Documents - Allow customers to attach files and documents
- Escalations - Configure handoff when voice descriptions need human review
- Deploy - Set up audio-enabled deployments
- Website Integration - Add audio input to your website widget