Traditional customer service metrics were designed for human agents. But AI agents operate differently - they handle conversations autonomously, improve continuously from feedback, and scale infinitely. You need different insights to measure what matters. This guide shows you how to use botBrains Insights to measure AI agent performance, identify improvement opportunities, and demonstrate the value of AI-powered customer service.

Why Traditional Metrics Fall Short

Classic support metrics like first response time, average handle time, and ticket volume were optimized for human agent teams. While these remain relevant, they miss what makes AI agents transformative.
Traditional approach:
  • Measures speed: How fast did an agent respond?
  • Tracks workload: How many tickets per agent?
  • Counts escalations: How many went to tier 2?
AI agent reality:
  • Speed is instant and infinite: AI responds in seconds, 24/7, to unlimited conversations
  • Workload is irrelevant: One AI handles 1 or 10,000 conversations identically
  • Escalations are data: Each handoff reveals a knowledge gap to address
Don’t just apply human agent metrics to AI. Measuring “average response time” for an AI that responds in 2 seconds tells you nothing useful. Focus on quality, knowledge coverage, and autonomous resolution instead.

AI-Era Insights That Matter

botBrains provides analytics designed specifically for measuring and improving AI agent performance.

Autonomous Resolution Tracking

The most critical AI metric: how many conversations does your AI handle completely without human intervention? Autonomous conversations represent pure automation - zero human time required, infinite scalability, immediate availability. This is where AI delivers maximum value. Why it matters:
  • Direct ROI: Each autonomous conversation saves support team time
  • Scalability indicator: Higher autonomy means your AI handles growth without adding headcount
  • Quality signal: Customers getting complete answers without escalation
  • Improvement target: Focus knowledge additions on increasing this percentage
How to interpret:
60%+ Autonomous Rate = Excellent AI performance
40-60% Autonomous = Moderate automation with room to grow
Below 40% = Significant knowledge gaps requiring attention

Benchmark: Mature AI deployments typically achieve 60-70% autonomous handling
View autonomous rate trends in the Insights dashboard to track improvement over time. Click through to view autonomous conversations and identify what’s working well.
Combine autonomous rate with CSAT scores. High autonomy with high satisfaction means your AI is genuinely solving problems, not just responding quickly. If autonomy is high but CSAT is low, investigate whether the AI is providing correct answers.
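If you pull raw conversation counts out of the dashboard or an export, reproducing the rate and its interpretation takes only a few lines. A minimal sketch, assuming you have the counts yourself; the function names and numbers are illustrative, not botBrains API fields, and the bands come from the benchmarks above:

```python
def autonomous_rate(autonomous: int, total: int) -> float:
    """Share of conversations the AI handled with zero human involvement."""
    return 100 * autonomous / total if total else 0.0

def interpret(rate: float) -> str:
    # Benchmark bands from this guide
    if rate >= 60:
        return "Excellent AI performance"
    if rate >= 40:
        return "Moderate automation with room to grow"
    return "Significant knowledge gaps requiring attention"

rate = autonomous_rate(autonomous=430, total=700)      # illustrative counts
print(f"{rate:.1f}% autonomous - {interpret(rate)}")   # 61.4% autonomous - Excellent AI performance
```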

AI Involvement Patterns

Understanding how your AI participates across all conversations reveals optimization opportunities.
Involvement categories:
Fully Autonomous (Green)
AI handled the entire conversation without human involvement. This is your target state for routine questions.
Characteristics:
- Zero human time invested
- Immediate response regardless of time/day
- Scales infinitely
- Lowest cost per conversation

Optimization focus:
Identify topics with low autonomous rates and add targeted knowledge
Public Involvement (Blue)
AI generated customer-visible responses, and a human also participated. This represents human-AI collaboration.
Common scenarios:
- AI provided initial answer, human added nuance
- AI maintained conversation until agent available
- Complex question requiring both AI knowledge and human judgment

Optimization focus:
Review these conversations - can you add knowledge to eliminate human intervention?
Private Involvement (Purple)
AI suggested responses internally to support agents (copilot mode), but all customer messages came from humans.
Use cases:
- High-stakes conversations (legal, billing disputes)
- Enterprise customer white-glove service
- Training new support agents
- Specialized expertise required

Optimization focus:
Track which topics use copilot mode most - consider if autonomous handling is possible
Not Involved (Gray)
Zero AI participation. These are typically imported historical tickets or channels where AI is disabled.
Opportunity:
Declining "not involved" percentage indicates growing AI adoption
If this remains high, investigate why AI isn't being utilized
A healthy involvement distribution for a mature AI deployment: 60-70% fully autonomous, 20-30% public involvement, 5-10% private involvement. Some human involvement is expected and valuable for complex or sensitive issues.
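The four categories can be thought of as a function of who authored which messages. The sketch below is a simplified illustration of that mapping, not the exact logic botBrains applies; the message schema (`author`, `visibility`) is hypothetical:

```python
def involvement(messages: list[dict]) -> str:
    """Classify a conversation into one of the four involvement categories.

    Hypothetical schema: each message has an 'author' ('ai', 'human_agent',
    or 'customer') and a 'visibility' ('public' or 'private').
    """
    ai_public = any(m["author"] == "ai" and m["visibility"] == "public" for m in messages)
    ai_private = any(m["author"] == "ai" and m["visibility"] == "private" for m in messages)
    human_agent = any(m["author"] == "human_agent" for m in messages)

    if ai_public and not human_agent:
        return "Fully Autonomous"      # green: AI handled everything
    if ai_public and human_agent:
        return "Public Involvement"    # blue: human-AI collaboration
    if ai_private:
        return "Private Involvement"   # purple: copilot suggestions only
    return "Not Involved"              # gray: no AI participation

print(involvement([{"author": "ai", "visibility": "public"},
                   {"author": "customer", "visibility": "public"}]))  # Fully Autonomous
```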

Quality of AI Responses

Beyond measuring whether the AI responded, measure whether it provided complete, accurate answers.
Answer Completeness metrics:
Complete Answers
AI provided comprehensive information without indicating missing knowledge. This represents ideal performance.
Partial Answers
AI answered but indicated uncertainty or missing details. These reveal knowledge boundaries and improvement opportunities.
Example AI responses indicating partial knowledge:
"Based on the information I have..."
"I don't have complete details about..."
"You may want to verify this with support..."
No Answer
AI explicitly stated it couldn’t answer due to missing knowledge. While frustrating for users, honest acknowledgment prevents incorrect information.
An honest “I don’t know” is infinitely better than a confident but wrong answer. Monitor partial/no-answer rates to prioritize knowledge additions, but don’t eliminate them entirely - they represent appropriate AI humility.
Using answer completeness data:
  1. Filter to “No Answer” or “Partial Answer” conversations
  2. Read 10-20 examples to identify missing knowledge themes
  3. Add targeted content to your knowledge base
  4. Monitor improvement in completeness percentage over following weeks
  5. Track correlation with CSAT scores
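For a quick offline triage of exported AI responses, you can flag answers that hedge with phrases like the examples above. This is a naive keyword sketch for your own review queue, not how botBrains classifies answer completeness:

```python
HEDGE_PHRASES = [
    "based on the information i have",
    "i don't have complete details",
    "you may want to verify this with support",
]

def looks_partial(response: str) -> bool:
    """Rough heuristic: does this AI response signal incomplete knowledge?"""
    text = response.lower()
    return any(phrase in text for phrase in HEDGE_PHRASES)

responses = [
    "Refunds are processed within 5 business days.",
    "Based on the information I have, international refunds may take longer.",
]
flagged = [r for r in responses if looks_partial(r)]
print(f"{len(flagged)} of {len(responses)} responses look partial")  # 1 of 2 responses look partial
```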

Knowledge Gap Identification

Every unresolved conversation, escalation, or low rating reveals a specific opportunity to improve your AI.
Systematic gap discovery:
Step 1: Identify patterns
  • Topics with low resolution rates indicate missing knowledge
  • Recurring questions in escalated conversations show automation opportunities
  • Common phrases in low-rated conversations reveal answer quality issues
Step 2: Quantify impact
  • How many conversations are affected by this gap?
  • What’s the CSAT score for this topic?
  • Is volume growing or declining?
Step 3: Prioritize fixes
  • High volume + low resolution = highest priority
  • Growing topics require immediate attention
  • Low volume but critical customer segment (enterprise) = high priority
  • Declining topics may solve themselves
Step 4: Measure improvement
  • Add knowledge to address identified gaps
  • Deploy updated AI
  • Monitor same topic’s metrics 1-2 weeks later
  • Confirm resolution rate and CSAT improved
Don’t try to fix everything at once. Focus on high-impact knowledge gaps: topics with high conversation volume and low resolution rates. Small, targeted improvements compound over time.
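The "high volume + low resolution = highest priority" rule above can be turned into a simple score: the expected number of unresolved conversations per topic. A sketch with made-up topic figures:

```python
topics = [
    # (topic, conversations in the period, resolution rate as a fraction)
    ("Refund Policy", 120, 0.45),
    ("API Authentication", 60, 0.55),
    ("Password Reset", 200, 0.90),
]

def gap_score(volume: int, resolution_rate: float) -> float:
    """Expected unresolved conversations - higher means fix this topic first."""
    return volume * (1 - resolution_rate)

ranked = sorted(topics, key=lambda t: gap_score(t[1], t[2]), reverse=True)
for topic, volume, rate in ranked:
    print(f"{topic:20s} gap score: {gap_score(volume, rate):5.1f}")
# "Refund Policy" scores highest (66.0), so it gets knowledge additions first.
```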

Continuous Improvement Loops

AI agents improve through systematic measurement, targeted changes, and validation - not one-time configuration. The improvement cycle:
Week 1: Measure baseline
- Document current metrics (autonomous rate, CSAT, resolution by topic)
- Identify top 3 knowledge gaps
- Screenshot metrics dashboard

Week 2: Make focused improvements
- Add knowledge for 1-2 high-impact topics
- Refine guidance for problematic conversation patterns
- Deploy changes

Week 3: Measure impact
- Compare metrics to baseline
- Filter conversations by improved topics
- Validate resolution rates increased

Week 4: Iterate
- Document what worked
- Identify next improvement opportunities
- Repeat cycle
To track improvement over time, use the Insights dashboard’s trend indicators:
  • Green +X% badges indicate improvement
  • Red -X% badges signal regression requiring investigation
  • Flat trends suggest you’ve reached a plateau - time for new approaches
Establish a weekly review routine: 15 minutes reviewing metrics, identifying one improvement to make, deploying the change. Consistent small improvements compound dramatically over months.

Using the Insights Dashboard

Access comprehensive AI performance analytics through the Insights interface.

Dashboard Overview

Navigate to Analyze → Metrics to view your AI’s performance data. Two specialized views:
General View
Comprehensive overview of all AI activity, customer satisfaction, and conversation outcomes. Use this for overall health monitoring and trend analysis.
Ticketing View
Specialized metrics for support teams managing ticketed workflows with AI involvement. Focus on automation rates, escalation patterns, and weekend coverage.
Switch between views using tabs at the top of the page.

Key Metrics Cards

The dashboard displays critical KPIs with period-over-period comparisons:
Messages
Total message volume across all conversations. Rising message volume with a stable conversation count indicates longer, more complex interactions requiring attention.
Conversations
Unique conversation threads started. Tracks support workload and identifies unusual spikes that may indicate product issues or marketing campaigns.
Unique Users
Distinct users who interacted with your AI. Helps identify whether support burden is concentrated among a few users or distributed broadly.
CSAT Score
Customer satisfaction percentage: (Good + Amazing ratings) / All ratings × 100%. Click the card to filter to highly-rated conversations and learn what’s working well.
CSAT Benchmarks:
80%+ = Excellent (industry-leading)
70-80% = Good (solid performance)
60-70% = Fair (needs improvement)
Below 60% = Poor (urgent attention required)
Resolution Rate
Percentage of conversations successfully resolved: Resolved / (Resolved + Escalated + Unresolved) × 100%. Click to view all resolved conversations.
Resolution Benchmarks:
75%+ = Strong autonomous performance
60-75% = Moderate effectiveness
Below 60% = Significant knowledge gaps
All metric cards show trend comparisons to the previous equivalent period. For example, viewing “Last 30 days” automatically compares to the 30 days before that, showing percentage point changes as green (improvement) or red (decline) badges.
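Both card formulas are plain ratios, so you can reproduce them from exported counts when building your own reports. A minimal sketch of the two formulas above; the counts are illustrative:

```python
def csat(good: int, amazing: int, total_ratings: int) -> float:
    """(Good + Amazing ratings) / All ratings x 100%."""
    return 100 * (good + amazing) / total_ratings if total_ratings else 0.0

def resolution_rate(resolved: int, escalated: int, unresolved: int) -> float:
    """Resolved / (Resolved + Escalated + Unresolved) x 100%."""
    closed = resolved + escalated + unresolved
    return 100 * resolved / closed if closed else 0.0

print(f"CSAT: {csat(good=180, amazing=95, total_ratings=340):.0f}%")  # CSAT: 81%
print(f"Resolution: {resolution_rate(410, 90, 60):.0f}%")             # Resolution: 73%
```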

AI-Specific Performance Charts

Beyond basic metrics, specialized visualizations reveal AI agent behavior:
Involvement Rate Pie Chart
Shows distribution of AI participation levels across all conversations. Use this to understand your automation mix and identify opportunities to shift conversations from human-assisted to fully autonomous.
Answer Completeness Chart
Displays percentage of complete vs. partial vs. no-answer responses. Declining “no answer” percentage indicates improving knowledge coverage.
Conversation Status Over Time
Tracks resolved, escalated, and unresolved conversations across your date range. Look for:
  • Increasing resolved percentage (good)
  • Spikes in escalations (investigate specific dates)
  • Growing unresolved percentage (knowledge gaps emerging)
User Sentiment Analysis
Shows positive, neutral, and negative sentiment in user messages. Combine with CSAT to understand customer frustration:
  • High negative sentiment + low CSAT = Users frustrated with AI responses
  • High negative sentiment + high CSAT = Users frustrated with their problem, but AI helped well

Ticketing-Specific Insights

Switch to the Ticketing view for support team-focused analytics:
Involvement Rate
Percentage of tickets where AI participated in any capacity. Target 80%+ for high ROI.
Involved Tickets Count
Absolute number of tickets where AI assisted. Multiply by average handling time to calculate time saved.
Relative Autonomous Rate
Percentage of involved tickets handled fully autonomously (excludes not-involved tickets). This shows how effective the AI is when it does participate.
Relative Autonomous Rate = Autonomous / (Autonomous + Public + Private) × 100%

Why this differs from regular autonomous rate:
Removes "not involved" tickets from calculation
Shows AI effectiveness when engaged
Better indicator of knowledge quality
Better Monday Score
Percentage of weekend tickets (Saturday/Sunday) that received at least one AI response. Higher scores mean fewer tickets pile up for Monday morning.
Better Monday Score = Weekend AI-Responded Tickets / Total Weekend Tickets × 100%

Target: 70%+ for strong weekend coverage
Result: Reduced Monday morning backlog and improved team morale
Better Monday Score is one of the most tangible ROI metrics. If your AI handles 50 tickets autonomously each weekend, that’s 50 tickets your team doesn’t face Monday morning. Quantify this in your stakeholder reports.
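These ticketing metrics are also simple ratios over ticket counts, and the time-saved estimate mentioned above is just involved tickets multiplied by an average handling time you supply. A sketch with assumed figures:

```python
def relative_autonomous_rate(autonomous: int, public: int, private: int) -> float:
    """Autonomous / (Autonomous + Public + Private) x 100% - not-involved tickets excluded."""
    involved = autonomous + public + private
    return 100 * autonomous / involved if involved else 0.0

def better_monday_score(weekend_ai_responded: int, weekend_total: int) -> float:
    """Weekend tickets with at least one AI response / all weekend tickets x 100%."""
    return 100 * weekend_ai_responded / weekend_total if weekend_total else 0.0

print(f"Relative autonomous rate: {relative_autonomous_rate(320, 110, 40):.0f}%")  # 68%
print(f"Better Monday Score: {better_monday_score(98, 140):.0f}%")                 # 70%

# Rough time saved: involved tickets x your own average handling minutes per ticket
involved_tickets, avg_handle_minutes = 470, 6   # assumed values
print(f"~{involved_tickets * avg_handle_minutes / 60:.0f} hours saved")            # ~47 hours saved
```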
Involvement Flow Sankey Diagram
Visual flow showing how tickets move from involvement types to resolution outcomes. Use this to:
  • Identify which involvement types lead to resolution
  • Spot problematic patterns (e.g., autonomous → escalated)
  • Understand where AI adds value vs. where humans must finish

Actionable Insights

Transform analytics into concrete improvements with systematic analysis.

What to Measure

Weekly monitoring (15 minutes):
  • CSAT and resolution rate trends
  • Any sudden drops or spikes in key metrics
  • New or growing topics
  • Better Monday Score (weekend coverage)
  • Count of 1-2 star ratings
Monthly deep dives (1 hour):
  • Topic-specific performance (resolution rate, CSAT by topic)
  • Involvement rate evolution (is autonomous percentage growing?)
  • Answer completeness trends
  • Channel-specific performance differences
  • Knowledge source usage patterns
Quarterly reviews (2-3 hours):
  • Long-term trend analysis (90-day periods)
  • ROI calculations (time saved, costs reduced)
  • Seasonal pattern identification
  • Strategic goal validation
  • Year-over-year comparisons

How to Interpret

Healthy performance pattern:
CSAT: 80%+
Resolution Rate: 75%+
Autonomous Rate: 60%+
Better Monday Score: 70%+
1-2 Star Ratings: <10%
Average Conversation Length: 4-6 messages

Interpretation: Well-tuned AI providing effective automated support
Knowledge gap pattern:
Resolution Rate: 50-60%
Answer Completeness: 40% "No Answer"
Unresolved Status: 30%+
CSAT: 60-70%
Escalation Rate: 25%+

Interpretation: Systematic knowledge gaps requiring attention
Action: Review unresolved conversations, add missing information
Quality problem pattern:
CSAT: Below 60%
Resolution Rate: 70%+
1-2 Star Ratings: 20%+
Negative Sentiment: 40%+
Conversation Length: 8+ messages

Interpretation: AI marking conversations "resolved" but users disagree
Action: Review low-rated conversations, refine guidance and answer quality
Adoption problem pattern:
Involvement Rate: <50%
Not Involved: 50%+ of tickets
Autonomous Rate: Low overall, but resolution is high when the AI is engaged

Interpretation: AI performs well but isn't being utilized enough
Action: Team training, better escalation rules, integration improvements
Individual metrics tell partial stories. Combine metrics to understand root causes: high resolution with low CSAT, high autonomy with many escalations, etc. Context matters.

What Actions to Take

Increase autonomous resolution:
  1. Filter to public involvement conversations (AI + human collaboration)
  2. Read 15-20 examples to identify why humans intervened
  3. Categorize intervention reasons:
    • Missing knowledge → Add to data providers
    • Unclear guidance → Refine AI instructions
    • Edge cases → Create specific snippets
    • Policy decisions → Consider automation or clear escalation rules
  4. Make targeted improvements for most common patterns
  5. Monitor autonomous rate increase over next 2 weeks
Improve CSAT scores:
  1. Filter conversations with 1-2 star ratings
  2. Read customer feedback and conversation content
  3. Identify patterns in dissatisfaction:
    • Wrong answers → Update knowledge with correct information
    • Incomplete answers → Add more comprehensive content
    • Tone issues → Adjust guidance for empathy and clarity
    • Product problems → Escalate to product team, add honest acknowledgment to AI
  4. Create snippets addressing common frustrations
  5. Track CSAT improvement for affected topics
Reduce unresolved conversations:
  1. Filter to status: unresolved
  2. Examine conversations for common themes
  3. Distinguish causes:
    • User abandoned mid-conversation → May be acceptable
    • AI couldn’t find answer → Add missing knowledge
    • AI found wrong answer → Correct knowledge base
    • Question unclear → Improve AI’s clarifying question ability
  4. Focus on high-volume unresolved topics
  5. Monitor resolution rate increase
Optimize weekend coverage:
  1. Check current Better Monday Score
  2. If below 60%, filter to weekend conversations
  3. Identify topics frequently appearing on weekends
  4. Ensure knowledge coverage for these topics
  5. Verify AI confidence thresholds aren’t too conservative
  6. Monitor Better Monday Score improvement
  7. Quantify impact: tickets handled vs. Monday morning backlog
Always link actions to metrics. Before making changes, note baseline numbers. After deploying improvements, measure the same metrics 1-2 weeks later. This validates that your changes actually worked and builds a library of effective interventions.

Combining Insights with Other Tools

Insights become more powerful when integrated with other botBrains features.

Insights + Topics

Use topic analysis to segment performance by subject matter.
Workflow:
  1. View Insights dashboard metrics
  2. Identify concerning trends (declining CSAT, low resolution)
  3. Navigate to Topics dashboard
  4. Examine topic-specific resolution rates in the treemap
  5. Click red/yellow boxes (low resolution topics)
  6. Filter conversations to that topic
  7. Review conversations to identify knowledge gaps
  8. Add targeted knowledge for that topic
  9. Return to Insights to monitor improvement
Example:
  • Insights shows CSAT dropped from 78% to 71%
  • Topics reveals “Refund Policy” topic has 45% resolution (red box)
  • Click through shows 50+ conversations asking about international refunds
  • Add comprehensive international refund policy to knowledge base
  • Two weeks later: “Refund Policy” topic at 82% resolution, overall CSAT back to 77%

Insights + Labels

Create custom segments for targeted analysis. Use labels to:
  • Track enterprise customer performance separately
  • Monitor specific product areas
  • Flag training examples
  • Mark conversations requiring follow-up
  • Segment by customer lifecycle stage
Workflow:
  1. Create labels for important segments (Enterprise, Billing, Bug Reports)
  2. Apply labels to conversations manually or via automation
  3. In Insights dashboard, filter by label
  4. View metrics for just that segment
  5. Compare performance across different labels
  6. Optimize AI for high-value segments first
Example:
  • Create “Enterprise” label for high-value customers
  • Filter Insights to Enterprise conversations only
  • Discover Enterprise CSAT is 68% vs. 81% overall
  • Review enterprise conversations to identify unique needs
  • Add enterprise-specific knowledge and guidance
  • Monitor enterprise CSAT improvement to 79%

Insights + Conversations

Insights identifies what to investigate; Conversations shows you why.
Workflow:
  1. Insights reveals problem area (e.g., low CSAT for specific topic)
  2. Navigate to Conversations
  3. Apply filters: Topic + Rating 1-2 + Last 30 days
  4. Read 15-20 conversations to understand root causes
  5. Look for patterns in user questions and AI responses
  6. Identify specific missing knowledge or guidance issues
  7. Make targeted improvements
  8. Return to Insights to validate improvement
Example investigation:
Insights: "API Authentication" topic has 55% resolution rate

Conversations filtered to topic:
- 12 conversations ask about OAuth implementation
- AI references general authentication, missing OAuth specifics
- Users rate 1-2 stars, leave feedback "didn't answer my question"

Action:
- Add detailed OAuth documentation to knowledge base
- Create snippet for common OAuth setup questions
- Update guidance to recognize OAuth-specific questions

Result:
- "API Authentication" resolution rate increases to 78%
- CSAT for topic improves from 62% to 81%
Insights and Conversations work together: Insights provides the quantitative “what” (metrics, trends, patterns). Conversations provides the qualitative “why” (actual user questions, AI responses, context). Use both.

Best Practices for AI-Era Measurement

Effective AI agent measurement requires different approaches than traditional support analytics.

Focus on Continuous Improvement, Not Perfection

Don’t aim for 100% metrics:
  • 100% CSAT is unrealistic (some users frustrated with product, not AI)
  • 100% autonomous rate is undesirable (complex issues need human judgment)
  • Some escalations prevent worse outcomes (wrong automated answers)
Do aim for steady progress:
  • 5 percentage point improvement per month is excellent
  • Consistent upward trends over 3-6 months demonstrate real improvement
  • Compound effect: Small weekly gains accumulate to dramatic annual results

Segment Your Analysis

Aggregate metrics hide important patterns.
Segment by:
  • Topic: Some topics have higher knowledge coverage than others
  • Channel: Web chat users expect different responses than email
  • Customer tier: Enterprise customers may need different handling
  • Time period: Weekday vs. weekend performance may differ
  • User language: Multilingual support quality varies
Example insight from segmentation:
Overall CSAT: 76% (looks okay)

Segmented analysis:
- Web chat: 81% CSAT (good)
- Email: 72% CSAT (needs work)
- Slack: 69% CSAT (investigate)

Action: Focus improvements on email and Slack channels specifically
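If you export rated conversations, the same segmentation is a group-by plus the CSAT formula per group. A sketch with hypothetical records; the field names are illustrative, not an export schema:

```python
from collections import defaultdict

# Hypothetical export: one record per rated conversation
rated = [
    {"channel": "web_chat", "rating": "amazing"},
    {"channel": "web_chat", "rating": "good"},
    {"channel": "email", "rating": "bad"},
    {"channel": "email", "rating": "good"},
    {"channel": "slack", "rating": "bad"},
]

totals, positive = defaultdict(int), defaultdict(int)
for record in rated:
    totals[record["channel"]] += 1
    if record["rating"] in ("good", "amazing"):   # CSAT counts Good + Amazing as positive
        positive[record["channel"]] += 1

for channel in totals:
    print(f"{channel:10s} CSAT: {100 * positive[channel] / totals[channel]:.0f}%")
# web_chat 100%, email 50%, slack 0% - segment-level gaps the overall number would hide
```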

Establish Baselines and Targets

Metrics mean nothing without context.
Create baselines:
  1. When launching AI, document day-one metrics
  2. After first month, establish monthly baseline
  3. Use baseline as comparison point for all future measurements
Set realistic targets:
Example goal setting:
Baseline: 55% Autonomous, 75% CSAT, 68% Resolution
3-month target: 65% Autonomous (+10pp), 80% CSAT (+5pp), 75% Resolution (+7pp)

Track monthly:
Month 1: 58% / 76% / 70% (+3pp / +1pp / +2pp) - on track
Month 2: 61% / 78% / 73% (+3pp / +2pp / +3pp) - on track
Month 3: 66% / 81% / 76% (+5pp / +3pp / +3pp) - exceeded target

Correlate Metrics with Actions

Build institutional knowledge by tracking what works.
Create a change log:
Nov 1: Added 15 billing snippets
Nov 8: Updated guidance for technical questions
Nov 15: Enabled weekend deployments
Nov 22: Integrated new help center articles
Review metrics with change context:
Nov 1-7: Resolution 68%, Billing topic 45% resolved
Nov 8-14: Resolution 71%, Billing topic 62% resolved (+17pp) ✓
Nov 15-21: Better Monday jumped from 55% to 78% (+23pp) ✓
Nov 22-28: Resolution 74%, Answer completeness +12% ✓
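One lightweight way to keep change context attached to your metrics is to store both as dated records and print them side by side. A sketch using the example entries above; the year and the weekly resolution figures are illustrative and tracked by you, not pulled from an API:

```python
from datetime import date

changes = {                       # deployment date -> what changed
    date(2024, 11, 1): "Added 15 billing snippets",
    date(2024, 11, 8): "Updated guidance for technical questions",
}
weekly_resolution = {             # week start -> resolution rate (%)
    date(2024, 11, 1): 68,
    date(2024, 11, 8): 71,
}

previous = None
for week, rate in sorted(weekly_resolution.items()):
    delta = f"{rate - previous:+d}pp" if previous is not None else "baseline"
    print(f"{week}  resolution {rate}%  ({delta})  change: {changes.get(week, '-')}")
    previous = rate
```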
Build a playbook:
  • Document which types of changes drove improvement
  • Note which interventions failed to move metrics
  • Create reusable patterns for common issues
  • Train new team members using proven approaches
Screenshot your metrics dashboard weekly. Visual trends are easier to spot than numbers alone, and historical screenshots help you remember what changed when. Store screenshots with your change log for complete context.

Balance Quantitative and Qualitative Analysis

Numbers show you what’s happening. Conversations show you why.
Process:
  1. Use Insights to identify problem areas (quantitative)
  2. Read conversations to understand root causes (qualitative)
  3. Make targeted improvements based on both
  4. Use Insights to validate improvements worked (quantitative)
  5. Read new conversations to confirm quality improved (qualitative)
Example:
Quantitative: "Shipping" topic has 52% resolution rate (Insights)
Qualitative: Users asking about international shipping to specific countries (Conversations)
Improvement: Add country-specific shipping information to knowledge base
Validation: "Shipping" topic now 76% resolution (Insights)
Confirmation: Read recent shipping conversations - comprehensive answers (Conversations)
Don’t rely solely on metrics - they guide you to conversations that need review, but the conversations themselves reveal the actionable insights.

Review Regularly and Systematically

Establish predictable review cadences.
Weekly review (15 minutes):
  • Check CSAT and resolution rate trends
  • Review any sudden drops or spikes
  • Filter to 1-2 star ratings, scan recent poor experiences
  • Note any growing topics or unusual patterns
  • Quick action: Create 1-2 snippets for most common gaps
Monthly deep dive (1 hour):
  • Compare month-over-month metrics across all categories
  • Segment analysis by channel, topic, customer tier
  • Review involvement rate evolution
  • Topic-specific metrics for top 10 topics
  • Export data for stakeholder reports
  • Document improvements made and measured impact
Quarterly strategic review (2-3 hours):
  • Comprehensive trend analysis across 90-day periods
  • Calculate ROI metrics (tickets automated, time saved, costs reduced)
  • Validate long-term strategic goals
  • Identify seasonal patterns for future planning
  • Present findings to leadership with recommendations
  • Set targets for next quarter
Consistency matters more than frequency. Reviewing metrics once a week consistently beats sporadic deep dives. Build the review into your team’s regular rhythm.

Common Patterns and Anti-Patterns

Learn from typical scenarios teams encounter when measuring AI agent performance.

Patterns: What Works

Pattern: The Weekly Improvement Cycle
Team: SaaS company with 500 support conversations/week
Approach:
- Every Monday: 15-minute metrics review meeting
- Identify one topic with low resolution rate
- Add targeted knowledge for that topic
- Deploy by Wednesday
- Review impact the following Monday

Results after 12 weeks:
- Resolution rate: 58% → 79% (+21pp)
- Autonomous rate: 42% → 68% (+26pp)
- CSAT: 72% → 84% (+12pp)
- Support team capacity freed up: ~30 hours/week
Pattern: Segment-Specific Optimization
Team: E-commerce company with mixed customer base
Approach:
- Created labels: Enterprise, SMB, Free
- Analyzed metrics by segment separately
- Discovered Enterprise CSAT lagging (65% vs. 82% overall)
- Investigated enterprise-specific conversation patterns
- Added detailed account management knowledge
- Created escalation rules for billing over $10k

Results after 4 weeks:
- Enterprise CSAT: 65% → 81% (+16pp)
- Enterprise escalations: 45% → 28% (-17pp)
- Enterprise retention improved (qualitative)
Pattern: Weekend Coverage Focus
Team: B2B software with weekend ticket buildup
Approach:
- Measured Better Monday Score: 34% (poor)
- Filtered to weekend conversations
- Identified common weekend topics (login issues, password reset, basic setup)
- Ensured comprehensive knowledge for these topics
- Lowered escalation threshold on weekends (more AI autonomy)

Results after 6 weeks:
- Better Monday Score: 34% → 71% (+37pp)
- Monday morning ticket backlog: -62%
- Team morale improvement (less stressful Mondays)

Anti-Patterns: What to Avoid

Anti-Pattern: Metrics Without Action
Problem:
- Team reviews metrics dashboard weekly
- Identifies issues and discusses them
- Never makes concrete changes to knowledge or guidance
- Metrics stagnate or decline

Why it fails:
- Analysis without action is wasted effort
- Metrics exist to drive improvement, not just observation
- Team becomes demoralized seeing same problems persist

Solution:
- Every metrics review ends with specific action items
- Assign ownership: who will make what change by when
- Track actions in ticketing system or project management tool
- Review action completion in next meeting
Anti-Pattern: Chasing Perfect Metrics
Problem:
- Team obsesses over reaching 100% CSAT, 100% autonomous
- Makes changes that optimize metrics but harm user experience
- Examples: Not offering CSAT survey to avoid low ratings, marking conversations resolved prematurely

Why it fails:
- Perfect metrics are impossible and undesirable
- Gaming metrics destroys their value as signals
- Damages trust in measurement system

Solution:
- Set realistic improvement targets (not perfection)
- Focus on trends over time, not absolute numbers
- Combine metrics with qualitative feedback
- Celebrate steady progress: +5pp/month is excellent
Anti-Pattern: Too Many Changes at Once
Problem:
- Team adds 100 new snippets in one deployment
- Updates all guidance simultaneously
- Changes escalation rules
- Adjusts CSAT survey timing
- Reviews metrics after one week
- Can't determine which change caused improvement or regression

Why it fails:
- Multiple concurrent changes confound attribution
- Can't identify what actually works
- Difficult to debug when metrics decline
- Wastes effort on ineffective changes

Solution:
- Make focused changes: 5-10 snippets per deployment
- Test one major change at a time
- Wait 1-2 weeks between deployments to measure impact
- Document each change with expected metric impact
- Build library of "what works" over time
Anti-Pattern: Ignoring Conversation Context
Problem:
- Team sees escalation rate of 15% and panics
- Tries to eliminate all escalations
- Creates knowledge for edge cases that should escalate
- Autonomous rate increases but CSAT plummets

Why it fails:
- Not all escalations are bad
- Complex, sensitive, or high-stakes issues legitimately need humans
- Forcing AI to handle inappropriate conversations damages trust
- Metrics without context mislead

Solution:
- Read escalated conversations before trying to eliminate escalations
- Distinguish "good escalations" (complex issues) from "bad escalations" (knowledge gaps)
- Accept baseline escalation rate (10-15% is healthy)
- Focus on reducing unnecessary escalations, not all escalations
Anti-Pattern: Analysis Paralysis
Problem:
- Team exports data to spreadsheets for "deeper analysis"
- Creates custom dashboards and complex reports
- Spends hours analyzing correlations
- Delays making improvements while seeking perfect understanding

Why it fails:
- Overthinking prevents action
- Diminishing returns on analysis depth
- Opportunity cost: Time spent analyzing could be spent improving

Solution:
- Use built-in Insights dashboard for 90% of analysis
- Spend 20% of time analyzing, 80% of time improving
- Make decisions with imperfect information
- Iterate quickly: try improvements, measure, adjust
- Deep analysis only for major strategic questions

Case Studies and Examples

Real-world examples of teams using Insights to drive AI agent improvement.

Case Study: SaaS Company Doubles Autonomous Rate

Background:
  • Mid-size SaaS company
  • 800 support conversations/month
  • Launched AI agent with basic knowledge base
  • Initial metrics: 38% autonomous, 71% CSAT, 61% resolution rate
Challenge: The low autonomous rate meant the support team was still handling the majority of conversations, and the ROI was unclear to leadership.
Approach:
Week 1: Baseline measurement
- Documented all metrics
- Identified top 10 topics by volume
- Calculated resolution rate for each topic

Week 2-3: Focused improvement
- Selected 3 topics with lowest resolution rates
- "Billing Questions": 28% resolution
- "API Authentication": 35% resolution
- "Data Export": 42% resolution

Week 4-5: Knowledge addition
- Added 8 billing snippets (common questions)
- Imported API documentation as data provider
- Created step-by-step export guide

Week 6: Measurement
- Billing Questions: 28% → 74% resolution (+46pp)
- API Authentication: 35% → 68% resolution (+33pp)
- Data Export: 42% → 81% resolution (+39pp)
- Overall autonomous rate: 38% → 52% (+14pp)

Weeks 7-12: Iteration
- Repeated process for next 3 topics
- Added knowledge for 6 more topic areas
- Refined guidance based on conversation patterns
Results after 12 weeks:
  • Autonomous rate: 38% → 76% (+38pp)
  • CSAT: 71% → 83% (+12pp)
  • Resolution rate: 61% → 81% (+20pp)
  • Support team hours saved: ~120 hours/month
  • Tickets handled without human involvement: 608/800 (76%)
Key learnings:
  • Focus on high-volume topics first for maximum impact
  • Small, frequent improvements compound rapidly
  • Tracking topic-specific metrics reveals exactly where to invest effort
  • Team morale improved as repetitive questions automated

Example: E-Commerce Weekend Coverage

Situation: Online retailer with high weekend traffic but no weekend support coverage. Initial state:
  • Better Monday Score: 29%
  • Monday morning backlog: Average 85 tickets
  • Weekend conversations: 140/week
  • Weekend AI involvement: Minimal
Analysis: Filtered Insights to weekend conversations only:
  • 45% about order status
  • 22% about shipping questions
  • 18% about returns/exchanges
  • 15% other topics
Actions:
  1. Enriched order status knowledge with detailed tracking explanations
  2. Added comprehensive shipping policy including all carriers and timelines
  3. Created return/exchange policy snippets for all product categories
  4. Adjusted AI confidence threshold to be more autonomous on weekends
Results after 4 weeks:
  • Better Monday Score: 29% → 72% (+43pp)
  • Monday morning backlog: 85 → 32 tickets (-62%)
  • Weekend autonomous handling: 101/140 conversations (72%)
  • Customer feedback: Positive comments about weekend availability
  • Support team: “Mondays went from dreaded to manageable”

Example: Enterprise Customer Optimization

Context: B2B platform with mixed customer base (Enterprise, SMB, Free). Discovery through segmentation:
Overall metrics looked healthy:
- CSAT: 79%
- Resolution rate: 74%
- Autonomous rate: 64%

Segmented by customer tier:
Enterprise (20% of conversations):
- CSAT: 68% (11pp below average)
- Escalation rate: 38% (high)
- Frequent topics: Account management, billing, compliance

SMB (50% of conversations):
- CSAT: 82% (3pp above average)
- Performing well

Free (30% of conversations):
- CSAT: 81% (2pp above average)
- Performing well
Insight: AI optimized for SMB/Free users but missing enterprise-specific knowledge and appropriate escalation. Actions:
  1. Created enterprise-specific snippets (SSO setup, custom contracts, compliance questions)
  2. Added guidance recognizing enterprise customer conversations (via email domain, account size)
  3. Implemented faster escalation for enterprise accounts (to dedicated account managers)
  4. Enhanced knowledge for multi-seat licensing questions
Results after 8 weeks:
  • Enterprise CSAT: 68% → 79% (+11pp)
  • Enterprise escalation rate: 38% → 31% (-7pp, but escalations now go to right team)
  • Enterprise customers reporting improved support experience
  • Account renewals up 12% (partial attribution)
Key learning: Aggregate metrics can mask important segment-specific issues. Always segment by business-critical categories.

Next Steps

You now understand how to measure AI agent performance and drive continuous improvement. Here’s how to apply these insights:

Review Conversations

Dive deep into individual conversations to understand the qualitative context behind your metrics

Analyze Topics

Segment performance by topic to identify specific knowledge gaps and improvement opportunities

Track Metrics Dashboard

Access detailed performance visualizations and export data for reporting

Add Knowledge

Fill gaps identified through insights analysis with targeted content

Improve AI Responses

Refine guidance and instructions based on conversation patterns and metrics

Remember the Core Principles

AI agent measurement is fundamentally different from traditional support metrics:
  1. Autonomous resolution matters most: Track and optimize for conversations the AI handles completely
  2. Knowledge gaps are opportunities: Every unresolved conversation reveals what to add next
  3. Improvement is continuous: Weekly small gains compound to dramatic long-term results
  4. Segment your analysis: Aggregate metrics hide important patterns
  5. Combine quantitative and qualitative: Metrics identify problems, conversations explain causes
  6. Focus on trends, not perfection: Steady improvement over time beats chasing 100% metrics
Start with your current baseline, make one focused improvement per week, and systematically build an AI agent that gets better every month. The insights are there - now put them into action.