Traditional customer service metrics were designed for human agents. But AI agents operate differently - they handle conversations autonomously, improve continuously from feedback, and scale infinitely. You need different insights to measure what matters. This guide shows you how to use botBrains Insights to measure AI agent performance, identify improvement opportunities, and demonstrate the value of AI-powered customer service.

Why Traditional Metrics Fall Short

Classic support metrics like first response time, average handle time, and ticket volume were optimized for human agent teams. While these remain relevant, they miss what makes AI agents transformative.
Traditional approach:
  • Measures speed: How fast did an agent respond?
  • Tracks workload: How many tickets per agent?
  • Counts escalations: How many went to tier 2?
AI agent reality:
  • Speed is instant and infinite: AI responds in seconds, 24/7, to unlimited conversations
  • Workload is irrelevant: One AI handles 1 or 10,000 conversations identically
  • Escalations are data: Each handoff reveals a knowledge gap to address
Don’t just apply human agent metrics to AI. Measuring “average response time” for an AI that responds in 2 seconds tells you nothing useful. Focus on quality, knowledge coverage, and autonomous resolution instead.

AI-Era Insights That Matter

botBrains provides analytics designed specifically for measuring and improving AI agent performance.

Autonomous Resolution Tracking

The most critical AI metric: how many conversations does your AI handle completely without human intervention? Autonomous conversations represent pure automation - zero human time required, infinite scalability, immediate availability. This is where AI delivers maximum value. Why it matters:
  • Direct ROI: Each autonomous conversation saves support team time
  • Scalability indicator: Higher autonomy means your AI handles growth without adding headcount
  • Quality signal: Customers getting complete answers without escalation
  • Improvement target: Focus knowledge additions on increasing this percentage
How to interpret:
60%+ Autonomous Rate = Excellent AI performance
40-60% Autonomous = Moderate automation with room to grow
Below 40% = Significant knowledge gaps requiring attention

Benchmark: Mature AI deployments typically achieve 60-70% autonomous handling
View autonomous rate trends in the Insights dashboard to track improvement over time. Click through to view autonomous conversations and identify what’s working well.
Combine autonomous rate with CSAT scores. High autonomy with high satisfaction means your AI is genuinely solving problems, not just responding quickly. If autonomy is high but CSAT is low, investigate whether the AI is providing correct answers.
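If you pull raw conversation counts out of the dashboard or an export, reproducing the rate and its interpretation takes only a few lines. A minimal sketch, assuming you have the counts yourself; the function names and numbers are illustrative, not botBrains API fields, and the bands come from the benchmarks above:

```python
def autonomous_rate(autonomous: int, total: int) -> float:
    """Share of conversations the AI handled with zero human involvement."""
    return 100 * autonomous / total if total else 0.0

def interpret(rate: float) -> str:
    # Benchmark bands from this guide
    if rate >= 60:
        return "Excellent AI performance"
    if rate >= 40:
        return "Moderate automation with room to grow"
    return "Significant knowledge gaps requiring attention"

rate = autonomous_rate(autonomous=430, total=700)      # illustrative counts
print(f"{rate:.1f}% autonomous - {interpret(rate)}")   # 61.4% autonomous - Excellent AI performance
```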

AI Involvement Patterns

Understanding how your AI participates across all conversations reveals optimization opportunities.
Involvement categories:
Fully Autonomous (Green)
AI handled the entire conversation without human involvement. This is your target state for routine questions.
Characteristics:
- Zero human time invested
- Immediate response regardless of time/day
- Scales infinitely
- Lowest cost per conversation

Optimization focus:
Identify topics with low autonomous rates and add targeted knowledge
Public Involvement (Blue)
AI generated customer-visible responses, and a human also participated. This represents human-AI collaboration.
Common scenarios:
- AI provided initial answer, human added nuance
- AI maintained conversation until agent available
- Complex question requiring both AI knowledge and human judgment

Optimization focus:
Review these conversations - can you add knowledge to eliminate human intervention?
Private Involvement (Purple)
AI suggested responses internally to support agents (copilot mode), but all customer messages came from humans.
Use cases:
- High-stakes conversations (legal, billing disputes)
- Enterprise customer white-glove service
- Training new support agents
- Specialized expertise required

Optimization focus:
Track which topics use copilot mode most - consider if autonomous handling is possible
Not Involved (Gray)
Zero AI participation. These are typically imported historical tickets or channels where AI is disabled.
Opportunity:
Declining "not involved" percentage indicates growing AI adoption
If this remains high, investigate why AI isn't being utilized
A healthy involvement distribution for a mature AI deployment: 60-70% fully autonomous, 20-30% public involvement, 5-10% private involvement. Some human involvement is expected and valuable for complex or sensitive issues.
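The four categories can be thought of as a function of who authored which messages. The sketch below is a simplified illustration of that mapping, not the exact logic botBrains applies; the message schema (`author`, `visibility`) is hypothetical:

```python
def involvement(messages: list[dict]) -> str:
    """Classify a conversation into one of the four involvement categories.

    Hypothetical schema: each message has an 'author' ('ai', 'human_agent',
    or 'customer') and a 'visibility' ('public' or 'private').
    """
    ai_public = any(m["author"] == "ai" and m["visibility"] == "public" for m in messages)
    ai_private = any(m["author"] == "ai" and m["visibility"] == "private" for m in messages)
    human_agent = any(m["author"] == "human_agent" for m in messages)

    if ai_public and not human_agent:
        return "Fully Autonomous"      # green: AI handled everything
    if ai_public and human_agent:
        return "Public Involvement"    # blue: human-AI collaboration
    if ai_private:
        return "Private Involvement"   # purple: copilot suggestions only
    return "Not Involved"              # gray: no AI participation

print(involvement([{"author": "ai", "visibility": "public"},
                   {"author": "customer", "visibility": "public"}]))  # Fully Autonomous
```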

Quality of AI Responses

Beyond measuring whether the AI responded, measure whether it provided complete, accurate answers.
Answer Completeness metrics:
Complete Answers
AI provided comprehensive information without indicating missing knowledge. This represents ideal performance.
Partial Answers
AI answered but indicated uncertainty or missing details. These reveal knowledge boundaries and improvement opportunities.
Example AI responses indicating partial knowledge:
"Based on the information I have..."
"I don't have complete details about..."
"You may want to verify this with support..."
No Answer
AI explicitly stated it couldn’t answer due to missing knowledge. While frustrating for users, honest acknowledgment prevents incorrect information.
An honest “I don’t know” is infinitely better than a confident but wrong answer. Monitor partial/no-answer rates to prioritize knowledge additions, but don’t eliminate them entirely - they represent appropriate AI humility.
Using answer completeness data:
  1. Filter to “No Answer” or “Partial Answer” conversations
  2. Read 10-20 examples to identify missing knowledge themes
  3. Add targeted content to your knowledge base
  4. Monitor improvement in completeness percentage over following weeks
  5. Track correlation with CSAT scores
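For a quick offline triage of exported AI responses, you can flag answers that hedge with phrases like the examples above. This is a naive keyword sketch for your own review queue, not how botBrains classifies answer completeness:

```python
HEDGE_PHRASES = [
    "based on the information i have",
    "i don't have complete details",
    "you may want to verify this with support",
]

def looks_partial(response: str) -> bool:
    """Rough heuristic: does this AI response signal incomplete knowledge?"""
    text = response.lower()
    return any(phrase in text for phrase in HEDGE_PHRASES)

responses = [
    "Refunds are processed within 5 business days.",
    "Based on the information I have, international refunds may take longer.",
]
flagged = [r for r in responses if looks_partial(r)]
print(f"{len(flagged)} of {len(responses)} responses look partial")  # 1 of 2 responses look partial
```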

Knowledge Gap Identification

Every unresolved conversation, escalation, or low rating reveals a specific opportunity to improve your AI.
Systematic gap discovery:
Step 1: Identify patterns
  • Topics with low resolution rates indicate missing knowledge
  • Recurring questions in escalated conversations show automation opportunities
  • Common phrases in low-rated conversations reveal answer quality issues
Step 2: Quantify impact
  • How many conversations are affected by this gap?
  • What’s the CSAT score for this topic?
  • Is volume growing or declining?
Step 3: Prioritize fixes
  • High volume + low resolution = highest priority
  • Growing topics require immediate attention
  • Low volume but critical customer segment (enterprise) = high priority
  • Declining topics may solve themselves
Step 4: Measure improvement
  • Add knowledge to address identified gaps
  • Deploy updated AI
  • Monitor same topic’s metrics 1-2 weeks later
  • Confirm resolution rate and CSAT improved
Don’t try to fix everything at once. Focus on high-impact knowledge gaps: topics with high conversation volume and low resolution rates. Small, targeted improvements compound over time.
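The "high volume + low resolution = highest priority" rule above can be turned into a simple score: the expected number of unresolved conversations per topic. A sketch with made-up topic figures:

```python
topics = [
    # (topic, conversations in the period, resolution rate as a fraction)
    ("Refund Policy", 120, 0.45),
    ("API Authentication", 60, 0.55),
    ("Password Reset", 200, 0.90),
]

def gap_score(volume: int, resolution_rate: float) -> float:
    """Expected unresolved conversations - higher means fix this topic first."""
    return volume * (1 - resolution_rate)

ranked = sorted(topics, key=lambda t: gap_score(t[1], t[2]), reverse=True)
for topic, volume, rate in ranked:
    print(f"{topic:20s} gap score: {gap_score(volume, rate):5.1f}")
# "Refund Policy" scores highest (66.0), so it gets knowledge additions first.
```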

Continuous Improvement Loops

AI agents improve through systematic measurement, targeted changes, and validation - not one-time configuration. The improvement cycle:
Week 1: Measure baseline
- Document current metrics (autonomous rate, CSAT, resolution by topic)
- Identify top 3 knowledge gaps
- Screenshot metrics dashboard

Week 2: Make focused improvements
- Add knowledge for 1-2 high-impact topics
- Refine guidance for problematic conversation patterns
- Deploy changes

Week 3: Measure impact
- Compare metrics to baseline
- Filter conversations by improved topics
- Validate resolution rates increased

Week 4: Iterate
- Document what worked
- Identify next improvement opportunities
- Repeat cycle
To track improvement over time, use the Insights dashboard’s trend indicators:
  • Green +X% badges indicate improvement
  • Red -X% badges signal regression requiring investigation
  • Flat trends suggest you’ve reached a plateau - time for new approaches
Establish a weekly review routine: 15 minutes reviewing metrics, identifying one improvement to make, deploying the change. Consistent small improvements compound dramatically over months.

Using the Insights Dashboard

Access comprehensive AI performance analytics through the Insights interface.

Dashboard Overview

Navigate to Analyze → Metrics to view your AI’s performance data. Two specialized views:
General View
Comprehensive overview of all AI activity, customer satisfaction, and conversation outcomes. Use this for overall health monitoring and trend analysis.
Ticketing View
Specialized metrics for support teams managing ticketed workflows with AI involvement. Focus on automation rates, escalation patterns, and weekend coverage.
Switch between views using tabs at the top of the page.

Key Metrics Cards

The dashboard displays critical KPIs with period-over-period comparisons:
Messages
Total message volume across all conversations. Rising message volume with a stable conversation count indicates longer, more complex interactions requiring attention.
Conversations
Unique conversation threads started. Tracks support workload and identifies unusual spikes that may indicate product issues or marketing campaigns.
Unique Users
Distinct users who interacted with your AI. Helps identify whether support burden is concentrated among a few users or distributed broadly.
CSAT Score
Customer satisfaction percentage: (Good + Amazing ratings) / All ratings × 100%. Click the card to filter to highly-rated conversations and learn what’s working well.
CSAT Benchmarks:
80%+ = Excellent (industry-leading)
70-80% = Good (solid performance)
60-70% = Fair (needs improvement)
Below 60% = Poor (urgent attention required)
Resolution Rate
Percentage of conversations successfully resolved: Resolved / (Resolved + Escalated + Unresolved) × 100%. Click to view all resolved conversations.
Resolution Benchmarks:
75%+ = Strong autonomous performance
60-75% = Moderate effectiveness
Below 60% = Significant knowledge gaps
All metric cards show trend comparisons to the previous equivalent period. For example, viewing “Last 30 days” automatically compares to the 30 days before that, showing percentage point changes as green (improvement) or red (decline) badges.
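Both card formulas are plain ratios, so you can reproduce them from exported counts when building your own reports. A minimal sketch of the two formulas above; the counts are illustrative:

```python
def csat(good: int, amazing: int, total_ratings: int) -> float:
    """(Good + Amazing ratings) / All ratings x 100%."""
    return 100 * (good + amazing) / total_ratings if total_ratings else 0.0

def resolution_rate(resolved: int, escalated: int, unresolved: int) -> float:
    """Resolved / (Resolved + Escalated + Unresolved) x 100%."""
    closed = resolved + escalated + unresolved
    return 100 * resolved / closed if closed else 0.0

print(f"CSAT: {csat(good=180, amazing=95, total_ratings=340):.0f}%")  # CSAT: 81%
print(f"Resolution: {resolution_rate(410, 90, 60):.0f}%")             # Resolution: 73%
```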

AI-Specific Performance Charts

Beyond basic metrics, specialized visualizations reveal AI agent behavior:
Involvement Rate Pie Chart
Shows distribution of AI participation levels across all conversations. Use this to understand your automation mix and identify opportunities to shift conversations from human-assisted to fully autonomous.
Answer Completeness Chart
Displays percentage of complete vs. partial vs. no-answer responses. Declining “no answer” percentage indicates improving knowledge coverage.
Conversation Status Over Time
Tracks resolved, escalated, and unresolved conversations across your date range. Look for:
  • Increasing resolved percentage (good)
  • Spikes in escalations (investigate specific dates)
  • Growing unresolved percentage (knowledge gaps emerging)
User Sentiment Analysis
Shows positive, neutral, and negative sentiment in user messages. Combine with CSAT to understand customer frustration:
  • High negative sentiment + low CSAT = Users frustrated with AI responses
  • High negative sentiment + high CSAT = Users frustrated with their problem, but AI helped well

Ticketing-Specific Insights

Switch to the Ticketing view for support team-focused analytics:
Involvement Rate
Percentage of tickets where AI participated in any capacity. Target 80%+ for high ROI.
Involved Tickets Count
Absolute number of tickets where AI assisted. Multiply by average handling time to calculate time saved.
Relative Autonomous Rate
Percentage of involved tickets handled fully autonomously (excludes not-involved tickets). This shows how effective the AI is when it does participate.
Relative Autonomous Rate = Autonomous / (Autonomous + Public + Private) × 100%

Why this differs from regular autonomous rate:
Removes "not involved" tickets from calculation
Shows AI effectiveness when engaged
Better indicator of knowledge quality
Better Monday Score
Percentage of weekend tickets (Saturday/Sunday) that received at least one AI response. Higher scores mean fewer tickets pile up for Monday morning.
Better Monday Score = Weekend AI-Responded Tickets / Total Weekend Tickets × 100%

Target: 70%+ for strong weekend coverage
Result: Reduced Monday morning backlog and improved team morale
Better Monday Score is one of the most tangible ROI metrics. If your AI handles 50 tickets autonomously each weekend, that’s 50 tickets your team doesn’t face Monday morning. Quantify this in your stakeholder reports.
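These ticketing metrics are also simple ratios over ticket counts, and the time-saved estimate mentioned above is just involved tickets multiplied by an average handling time you supply. A sketch with assumed figures:

```python
def relative_autonomous_rate(autonomous: int, public: int, private: int) -> float:
    """Autonomous / (Autonomous + Public + Private) x 100% - not-involved tickets excluded."""
    involved = autonomous + public + private
    return 100 * autonomous / involved if involved else 0.0

def better_monday_score(weekend_ai_responded: int, weekend_total: int) -> float:
    """Weekend tickets with at least one AI response / all weekend tickets x 100%."""
    return 100 * weekend_ai_responded / weekend_total if weekend_total else 0.0

print(f"Relative autonomous rate: {relative_autonomous_rate(320, 110, 40):.0f}%")  # 68%
print(f"Better Monday Score: {better_monday_score(98, 140):.0f}%")                 # 70%

# Rough time saved: involved tickets x your own average handling minutes per ticket
involved_tickets, avg_handle_minutes = 470, 6   # assumed values
print(f"~{involved_tickets * avg_handle_minutes / 60:.0f} hours saved")            # ~47 hours saved
```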
Involvement Flow Sankey Diagram
Visual flow showing how tickets move from involvement types to resolution outcomes. Use this to:
  • Identify which involvement types lead to resolution
  • Spot problematic patterns (e.g., autonomous → escalated)
  • Understand where AI adds value vs. where humans must finish

Actionable Insights

Transform analytics into concrete improvements with systematic analysis.

What to Measure

Weekly monitoring (15 minutes):
  • CSAT and resolution rate trends
  • Any sudden drops or spikes in key metrics
  • New or growing topics
  • Better Monday Score (weekend coverage)
  • Count of 1-2 star ratings
Monthly deep dives (1 hour):
  • Topic-specific performance (resolution rate, CSAT by topic)
  • Involvement rate evolution (is autonomous percentage growing?)
  • Answer completeness trends
  • Channel-specific performance differences
  • Knowledge source usage patterns
Quarterly reviews (2-3 hours):
  • Long-term trend analysis (90-day periods)
  • ROI calculations (time saved, costs reduced)
  • Seasonal pattern identification
  • Strategic goal validation
  • Year-over-year comparisons

How to Interpret

Healthy performance pattern:
CSAT: 80%+
Resolution Rate: 75%+
Autonomous Rate: 60%+
Better Monday Score: 70%+
1-2 Star Ratings: <10%
Average Conversation Length: 4-6 messages

Interpretation: Well-tuned AI providing effective automated support
Knowledge gap pattern:
Resolution Rate: 50-60%
Answer Completeness: 40% "No Answer"
Unresolved Status: 30%+
CSAT: 60-70%
Escalation Rate: 25%+

Interpretation: Systematic knowledge gaps requiring attention
Action: Review unresolved conversations, add missing information
Quality problem pattern:
CSAT: Below 60%
Resolution Rate: 70%+
1-2 Star Ratings: 20%+
Negative Sentiment: 40%+
Conversation Length: 8+ messages

Interpretation: AI marking conversations "resolved" but users disagree
Action: Review low-rated conversations, refine guidance and answer quality
Adoption problem pattern:
Involvement Rate: <50%
Not Involved: 50%+ of tickets
Autonomous Rate: Low overall, but resolution is high when the AI is engaged

Interpretation: AI performs well but isn't being utilized enough
Action: Team training, better escalation rules, integration improvements
Individual metrics tell partial stories. Combine metrics to understand root causes: high resolution with low CSAT, high autonomy with many escalations, etc. Context matters.

What Actions to Take

Increase autonomous resolution:
  1. Filter to public involvement conversations (AI + human collaboration)
  2. Read 15-20 examples to identify why humans intervened
  3. Categorize intervention reasons:
    • Missing knowledge → Add to data providers
    • Unclear guidance → Refine AI instructions
    • Edge cases → Create specific snippets
    • Policy decisions → Consider automation or clear escalation rules
  4. Make targeted improvements for most common patterns
  5. Monitor autonomous rate increase over next 2 weeks
Improve CSAT scores:
  1. Filter conversations with 1-2 star ratings
  2. Read customer feedback and conversation content
  3. Identify patterns in dissatisfaction:
    • Wrong answers → Update knowledge with correct information
    • Incomplete answers → Add more comprehensive content
    • Tone issues → Adjust guidance for empathy and clarity
    • Product problems → Escalate to product team, add honest acknowledgment to AI
  4. Create snippets addressing common frustrations
  5. Track CSAT improvement for affected topics
Reduce unresolved conversations:
  1. Filter to status: unresolved
  2. Examine conversations for common themes
  3. Distinguish causes:
    • User abandoned mid-conversation → May be acceptable
    • AI couldn’t find answer → Add missing knowledge
    • AI found wrong answer → Correct knowledge base
    • Question unclear → Improve AI’s clarifying question ability
  4. Focus on high-volume unresolved topics
  5. Monitor resolution rate increase
Optimize weekend coverage:
  1. Check current Better Monday Score
  2. If below 60%, filter to weekend conversations
  3. Identify topics frequently appearing on weekends
  4. Ensure knowledge coverage for these topics
  5. Verify AI confidence thresholds aren’t too conservative
  6. Monitor Better Monday Score improvement
  7. Quantify impact: tickets handled vs. Monday morning backlog
Always link actions to metrics. Before making changes, note baseline numbers. After deploying improvements, measure the same metrics 1-2 weeks later. This validates that your changes actually worked and builds a library of effective interventions.

Combining Insights with Other Tools

Insights become more powerful when integrated with other botBrains features.

Insights + Topics

Use topic analysis to segment performance by subject matter.
Workflow:
  1. View Insights dashboard metrics
  2. Identify concerning trends (declining CSAT, low resolution)
  3. Navigate to Topics dashboard
  4. Examine topic-specific resolution rates in the treemap
  5. Click red/yellow boxes (low resolution topics)
  6. Filter conversations to that topic
  7. Review conversations to identify knowledge gaps
  8. Add targeted knowledge for that topic
  9. Return to Insights to monitor improvement
Example:
  • Insights shows CSAT dropped from 78% to 71%
  • Topics reveals “Refund Policy” topic has 45% resolution (red box)
  • Click through shows 50+ conversations asking about international refunds
  • Add comprehensive international refund policy to knowledge base
  • Two weeks later: “Refund Policy” topic at 82% resolution, overall CSAT back to 77%

Insights + Labels

Create custom segments for targeted analysis. Use labels to:
  • Track enterprise customer performance separately
  • Monitor specific product areas
  • Flag training examples
  • Mark conversations requiring follow-up
  • Segment by customer lifecycle stage
Workflow:
  1. Create labels for important segments (Enterprise, Billing, Bug Reports)
  2. Apply labels to conversations manually or via automation
  3. In Insights dashboard, filter by label
  4. View metrics for just that segment
  5. Compare performance across different labels
  6. Optimize AI for high-value segments first
Example:
  • Create “Enterprise” label for high-value customers
  • Filter Insights to Enterprise conversations only
  • Discover Enterprise CSAT is 68% vs. 81% overall
  • Review enterprise conversations to identify unique needs
  • Add enterprise-specific knowledge and guidance
  • Monitor enterprise CSAT improvement to 79%

Insights + Conversations

Insights identifies what to investigate; Conversations shows you why.
Workflow:
  1. Insights reveals problem area (e.g., low CSAT for specific topic)
  2. Navigate to Conversations
  3. Apply filters: Topic + Rating 1-2 + Last 30 days
  4. Read 15-20 conversations to understand root causes
  5. Look for patterns in user questions and AI responses
  6. Identify specific missing knowledge or guidance issues
  7. Make targeted improvements
  8. Return to Insights to validate improvement
Example investigation:
Insights: "API Authentication" topic has 55% resolution rate

Conversations filtered to topic:
- 12 conversations ask about OAuth implementation
- AI references general authentication, missing OAuth specifics
- Users rate 1-2 stars, leave feedback "didn't answer my question"

Action:
- Add detailed OAuth documentation to knowledge base
- Create snippet for common OAuth setup questions
- Update guidance to recognize OAuth-specific questions

Result:
- "API Authentication" resolution rate increases to 78%
- CSAT for topic improves from 62% to 81%
Insights and Conversations work together: Insights provides the quantitative “what” (metrics, trends, patterns). Conversations provides the qualitative “why” (actual user questions, AI responses, context). Use both.

Best Practices for AI-Era Measurement

Effective AI agent measurement requires different approaches than traditional support analytics.

Focus on Continuous Improvement, Not Perfection

Don’t aim for 100% metrics:
  • 100% CSAT is unrealistic (some users frustrated with product, not AI)
  • 100% autonomous rate is undesirable (complex issues need human judgment)
  • Some escalations prevent worse outcomes (wrong automated answers)
Do aim for steady progress:
  • 5 percentage point improvement per month is excellent
  • Consistent upward trends over 3-6 months demonstrate real improvement
  • Compound effect: Small weekly gains accumulate to dramatic annual results

Segment Your Analysis

Aggregate metrics hide important patterns.
Segment by:
  • Topic: Some topics have higher knowledge coverage than others
  • Channel: Web chat users expect different responses than email
  • Customer tier: Enterprise customers may need different handling
  • Time period: Weekday vs. weekend performance may differ
  • User language: Multilingual support quality varies
Example insight from segmentation:
Overall CSAT: 76% (looks okay)

Segmented analysis:
- Web chat: 81% CSAT (good)
- Email: 72% CSAT (needs work)
- Slack: 69% CSAT (investigate)

Action: Focus improvements on email and Slack channels specifically
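If you export rated conversations, the same segmentation is a group-by plus the CSAT formula per group. A sketch with hypothetical records; the field names are illustrative, not an export schema:

```python
from collections import defaultdict

# Hypothetical export: one record per rated conversation
rated = [
    {"channel": "web_chat", "rating": "amazing"},
    {"channel": "web_chat", "rating": "good"},
    {"channel": "email", "rating": "bad"},
    {"channel": "email", "rating": "good"},
    {"channel": "slack", "rating": "bad"},
]

totals, positive = defaultdict(int), defaultdict(int)
for record in rated:
    totals[record["channel"]] += 1
    if record["rating"] in ("good", "amazing"):   # CSAT counts Good + Amazing as positive
        positive[record["channel"]] += 1

for channel in totals:
    print(f"{channel:10s} CSAT: {100 * positive[channel] / totals[channel]:.0f}%")
# web_chat 100%, email 50%, slack 0% - segment-level gaps the overall number would hide
```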

Establish Baselines and Targets

Metrics mean nothing without context.
Create baselines:
  1. When launching AI, document day-one metrics
  2. After first month, establish monthly baseline
  3. Use baseline as comparison point for all future measurements
Set realistic targets:
Example goal setting:
Baseline: 55% Autonomous, 75% CSAT, 68% Resolution
3-month target: 65% Autonomous (+10pp), 80% CSAT (+5pp), 75% Resolution (+7pp)

Track monthly:
Month 1: 58% / 76% / 70% (+3pp / +1pp / +2pp) - on track
Month 2: 61% / 78% / 73% (+3pp / +2pp / +3pp) - on track
Month 3: 66% / 81% / 76% (+5pp / +3pp / +3pp) - exceeded target

Correlate Metrics with Actions

Build institutional knowledge by tracking what works.
Create a change log:
Nov 1: Added 15 billing snippets
Nov 8: Updated guidance for technical questions
Nov 15: Enabled weekend deployments
Nov 22: Integrated new help center articles
Review metrics with change context:
Nov 1-7: Resolution 68%, Billing topic 45% resolved
Nov 8-14: Resolution 71%, Billing topic 62% resolved (+17pp) ✓
Nov 15-21: Better Monday jumped from 55% to 78% (+23pp) ✓
Nov 22-28: Resolution 74%, Answer completeness +12% ✓
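One lightweight way to keep change context attached to your metrics is to store both as dated records and print them side by side. A sketch using the example entries above; the year and the weekly resolution figures are illustrative and tracked by you, not pulled from an API:

```python
from datetime import date

changes = {                       # deployment date -> what changed
    date(2024, 11, 1): "Added 15 billing snippets",
    date(2024, 11, 8): "Updated guidance for technical questions",
}
weekly_resolution = {             # week start -> resolution rate (%)
    date(2024, 11, 1): 68,
    date(2024, 11, 8): 71,
}

previous = None
for week, rate in sorted(weekly_resolution.items()):
    delta = f"{rate - previous:+d}pp" if previous is not None else "baseline"
    print(f"{week}  resolution {rate}%  ({delta})  change: {changes.get(week, '-')}")
    previous = rate
```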
Build a playbook:
  • Document which types of changes drove improvement
  • Note which interventions failed to move metrics
  • Create reusable patterns for common issues
  • Train new team members using proven approaches
Screenshot your metrics dashboard weekly. Visual trends are easier to spot than numbers alone, and historical screenshots help you remember what changed when. Store screenshots with your change log for complete context.

Balance Quantitative and Qualitative Analysis

Numbers show you what’s happening. Conversations show you why.
Process:
  1. Use Insights to identify problem areas (quantitative)
  2. Read conversations to understand root causes (qualitative)
  3. Make targeted improvements based on both
  4. Use Insights to validate improvements worked (quantitative)
  5. Read new conversations to confirm quality improved (qualitative)
Example:
Quantitative: "Shipping" topic has 52% resolution rate (Insights)
Qualitative: Users asking about international shipping to specific countries (Conversations)
Improvement: Add country-specific shipping information to knowledge base
Validation: "Shipping" topic now 76% resolution (Insights)
Confirmation: Read recent shipping conversations - comprehensive answers (Conversations)
Don’t rely solely on metrics - they guide you to conversations that need review, but the conversations themselves reveal the actionable insights.

Review Regularly and Systematically

Establish predictable review cadences.
Weekly review (15 minutes):
  • Check CSAT and resolution rate trends
  • Review any sudden drops or spikes
  • Filter to 1-2 star ratings, scan recent poor experiences
  • Note any growing topics or unusual patterns
  • Quick action: Create 1-2 snippets for most common gaps
Monthly deep dive (1 hour):
  • Compare month-over-month metrics across all categories
  • Segment analysis by channel, topic, customer tier
  • Review involvement rate evolution
  • Topic-specific metrics for top 10 topics
  • Export data for stakeholder reports
  • Document improvements made and measured impact
Quarterly strategic review (2-3 hours):
  • Comprehensive trend analysis across 90-day periods
  • Calculate ROI metrics (tickets automated, time saved, costs reduced)
  • Validate long-term strategic goals
  • Identify seasonal patterns for future planning
  • Present findings to leadership with recommendations
  • Set targets for next quarter
Consistency matters more than frequency. Reviewing metrics once a week consistently beats sporadic deep dives. Build the review into your team’s regular rhythm.

Common Patterns and Anti-Patterns

Learn from typical scenarios teams encounter when measuring AI agent performance.

Patterns: What Works

Pattern: The Weekly Improvement Cycle
Team: SaaS company with 500 support conversations/week
Approach:
- Every Monday: 15-minute metrics review meeting
- Identify one topic with low resolution rate
- Add targeted knowledge for that topic
- Deploy by Wednesday
- Review impact the following Monday

Results after 12 weeks:
- Resolution rate: 58% → 79% (+21pp)
- Autonomous rate: 42% → 68% (+26pp)
- CSAT: 72% → 84% (+12pp)
- Support team capacity freed up: ~30 hours/week
Pattern: Segment-Specific Optimization
Team: E-commerce company with mixed customer base
Approach:
- Created labels: Enterprise, SMB, Free
- Analyzed metrics by segment separately
- Discovered Enterprise CSAT lagging (65% vs. 82% overall)
- Investigated enterprise-specific conversation patterns
- Added detailed account management knowledge
- Created escalation rules for billing over $10k

Results after 4 weeks:
- Enterprise CSAT: 65% → 81% (+16pp)
- Enterprise escalations: 45% → 28% (-17pp)
- Enterprise retention improved (qualitative)
Pattern: Weekend Coverage Focus
Team: B2B software with weekend ticket buildup
Approach:
- Measured Better Monday Score: 34% (poor)
- Filtered to weekend conversations
- Identified common weekend topics (login issues, password reset, basic setup)
- Ensured comprehensive knowledge for these topics
- Lowered escalation threshold on weekends (more AI autonomy)

Results after 6 weeks:
- Better Monday Score: 34% → 71% (+37pp)
- Monday morning ticket backlog: -62%
- Team morale improvement (less stressful Mondays)

Anti-Patterns: What to Avoid

Anti-Pattern: Metrics Without Action
Problem:
- Team reviews metrics dashboard weekly
- Identifies issues and discusses them
- Never makes concrete changes to knowledge or guidance
- Metrics stagnate or decline

Why it fails:
- Analysis without action is wasted effort
- Metrics exist to drive improvement, not just observation
- Team becomes demoralized seeing same problems persist

Solution:
- Every metrics review ends with specific action items
- Assign ownership: who will make what change by when
- Track actions in ticketing system or project management tool
- Review action completion in next meeting
Anti-Pattern: Chasing Perfect Metrics
Problem:
- Team obsesses over reaching 100% CSAT, 100% autonomous
- Makes changes that optimize metrics but harm user experience
- Examples: Not offering CSAT survey to avoid low ratings, marking conversations resolved prematurely

Why it fails:
- Perfect metrics are impossible and undesirable
- Gaming metrics destroys their value as signals
- Damages trust in measurement system

Solution:
- Set realistic improvement targets (not perfection)
- Focus on trends over time, not absolute numbers
- Combine metrics with qualitative feedback
- Celebrate steady progress: +5pp/month is excellent
Anti-Pattern: Too Many Changes at Once
Problem:
- Team adds 100 new snippets in one deployment
- Updates all guidance simultaneously
- Changes escalation rules
- Adjusts CSAT survey timing
- Reviews metrics after one week
- Can't determine which change caused improvement or regression

Why it fails:
- Multiple concurrent changes confound attribution
- Can't identify what actually works
- Difficult to debug when metrics decline
- Wastes effort on ineffective changes

Solution:
- Make focused changes: 5-10 snippets per deployment
- Test one major change at a time
- Wait 1-2 weeks between deployments to measure impact
- Document each change with expected metric impact
- Build library of "what works" over time
Anti-Pattern: Ignoring Conversation Context
Problem:
- Team sees escalation rate of 15% and panics
- Tries to eliminate all escalations
- Creates knowledge for edge cases that should escalate
- Autonomous rate increases but CSAT plummets

Why it fails:
- Not all escalations are bad
- Complex, sensitive, or high-stakes issues legitimately need humans
- Forcing AI to handle inappropriate conversations damages trust
- Metrics without context mislead

Solution:
- Read escalated conversations before trying to eliminate escalations
- Distinguish "good escalations" (complex issues) from "bad escalations" (knowledge gaps)
- Accept baseline escalation rate (10-15% is healthy)
- Focus on reducing unnecessary escalations, not all escalations
Anti-Pattern: Analysis Paralysis
Problem:
- Team exports data to spreadsheets for "deeper analysis"
- Creates custom dashboards and complex reports
- Spends hours analyzing correlations
- Delays making improvements while seeking perfect understanding

Why it fails:
- Overthinking prevents action
- Diminishing returns on analysis depth
- Opportunity cost: Time spent analyzing could be spent improving

Solution:
- Use built-in Insights dashboard for 90% of analysis
- Spend 20% of time analyzing, 80% of time improving
- Make decisions with imperfect information
- Iterate quickly: try improvements, measure, adjust
- Deep analysis only for major strategic questions

Case Studies and Examples

Real-world examples of teams using Insights to drive AI agent improvement.

Case Study: SaaS Company Doubles Autonomous Rate

Background:
  • Mid-size SaaS company
  • 800 support conversations/month
  • Launched AI agent with basic knowledge base
  • Initial metrics: 38% autonomous, 71% CSAT, 61% resolution rate
Challenge: The low autonomous rate meant the support team was still handling the majority of conversations, and the ROI was unclear to leadership.
Approach:
Week 1: Baseline measurement
- Documented all metrics
- Identified top 10 topics by volume
- Calculated resolution rate for each topic

Week 2-3: Focused improvement
- Selected 3 topics with lowest resolution rates
- "Billing Questions": 28% resolution
- "API Authentication": 35% resolution
- "Data Export": 42% resolution

Week 4-5: Knowledge addition
- Added 8 billing snippets (common questions)
- Imported API documentation as data provider
- Created step-by-step export guide

Week 6: Measurement
- Billing Questions: 28% → 74% resolution (+46pp)
- API Authentication: 35% → 68% resolution (+33pp)
- Data Export: 42% → 81% resolution (+39pp)
- Overall autonomous rate: 38% → 52% (+14pp)

Weeks 7-12: Iteration
- Repeated process for next 3 topics
- Added knowledge for 6 more topic areas
- Refined guidance based on conversation patterns
Results after 12 weeks:
  • Autonomous rate: 38% → 76% (+38pp)
  • CSAT: 71% → 83% (+12pp)
  • Resolution rate: 61% → 81% (+20pp)
  • Support team hours saved: ~120 hours/month
  • Tickets handled without human involvement: 608/800 (76%)
Key learnings:
  • Focus on high-volume topics first for maximum impact
  • Small, frequent improvements compound rapidly
  • Tracking topic-specific metrics reveals exactly where to invest effort
  • Team morale improved as repetitive questions automated

Example: E-Commerce Weekend Coverage

Situation: Online retailer with high weekend traffic but no weekend support coverage. Initial state:
  • Better Monday Score: 29%
  • Monday morning backlog: Average 85 tickets
  • Weekend conversations: 140/week
  • Weekend AI involvement: Minimal
Analysis: Filtered Insights to weekend conversations only:
  • 45% about order status
  • 22% about shipping questions
  • 18% about returns/exchanges
  • 15% other topics
Actions:
  1. Enriched order status knowledge with detailed tracking explanations
  2. Added comprehensive shipping policy including all carriers and timelines
  3. Created return/exchange policy snippets for all product categories
  4. Adjusted AI confidence threshold to be more autonomous on weekends
Results after 4 weeks:
  • Better Monday Score: 29% → 72% (+43pp)
  • Monday morning backlog: 85 → 32 tickets (-62%)
  • Weekend autonomous handling: 101/140 conversations (72%)
  • Customer feedback: Positive comments about weekend availability
  • Support team: “Mondays went from dreaded to manageable”

Example: Enterprise Customer Optimization

Context: B2B platform with mixed customer base (Enterprise, SMB, Free). Discovery through segmentation:
Overall metrics looked healthy:
- CSAT: 79%
- Resolution rate: 74%
- Autonomous rate: 64%

Segmented by customer tier:
Enterprise (20% of conversations):
- CSAT: 68% (11pp below average)
- Escalation rate: 38% (high)
- Frequent topics: Account management, billing, compliance

SMB (50% of conversations):
- CSAT: 82% (3pp above average)
- Performing well

Free (30% of conversations):
- CSAT: 81% (2pp above average)
- Performing well
Insight: AI optimized for SMB/Free users but missing enterprise-specific knowledge and appropriate escalation. Actions:
  1. Created enterprise-specific snippets (SSO setup, custom contracts, compliance questions)
  2. Added guidance recognizing enterprise customer conversations (via email domain, account size)
  3. Implemented faster escalation for enterprise accounts (to dedicated account managers)
  4. Enhanced knowledge for multi-seat licensing questions
Results after 8 weeks:
  • Enterprise CSAT: 68% → 79% (+11pp)
  • Enterprise escalation rate: 38% → 31% (-7pp, but escalations now go to right team)
  • Enterprise customers reporting improved support experience
  • Account renewals up 12% (partial attribution)
Key learning: Aggregate metrics can mask important segment-specific issues. Always segment by business-critical categories.

Next Steps

You now understand how to measure AI agent performance and drive continuous improvement. Here’s how to apply these insights:

Review Conversations

Dive deep into individual conversations to understand the qualitative context behind your metrics

Analyze Topics

Segment performance by topic to identify specific knowledge gaps and improvement opportunities

Track Metrics Dashboard

Access detailed performance visualizations and export data for reporting

Add Knowledge

Fill gaps identified through insights analysis with targeted content

Improve AI Responses

Refine guidance and instructions based on conversation patterns and metrics

Remember the Core Principles

AI agent measurement is fundamentally different from traditional support metrics:
  1. Autonomous resolution matters most: Track and optimize for conversations the AI handles completely
  2. Knowledge gaps are opportunities: Every unresolved conversation reveals what to add next
  3. Improvement is continuous: Weekly small gains compound to dramatic long-term results
  4. Segment your analysis: Aggregate metrics hide important patterns
  5. Combine quantitative and qualitative: Metrics identify problems, conversations explain causes
  6. Focus on trends, not perfection: Steady improvement over time beats chasing 100% metrics
Start with your current baseline, make one focused improvement per week, and systematically build an AI agent that gets better every month. The insights are there - now put them into action.