When to use simulations
Simulations are worth the setup when there is a concrete outcome to assert, such as a specific reply, a tool call, a completed procedure, or an escalation. They pay off most for:
- Risky flows that write data or have monetary impact, like cancellations and refunds, where you want every branch covered before deploying.
- High-volume topics where a small quality gain removes a large amount of work.
- Regression protection on behavior you have already fixed once and don’t want to break again.
Create a simulation
Open Simulations and add a simulation to a group. Groups organize simulations by topic or audience, such as Subscriptions or Billing.
| Field | Purpose |
|---|---|
| Name | Describes the scenario, for example Cancel subscription with refund. |
| Group | The group the simulation belongs to. |
| Start message | The first message the synthetic customer sends, which kicks off the run. |
| Situation context | Background the synthetic customer knows but won’t necessarily state upfront, used to drive realistic follow-ups. |
| Checks | The assertions that decide pass or fail. See Checks. |
| Tool overrides | Mocked tool outputs so the agent never calls real systems. See Tool overrides. |
| Context overrides | Override runtime context such as channel, customer user, or time. |
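
As a mental model only, the fields above can be pictured as one record per simulation. The sketch below is hypothetical: simulations are created in the Simulations UI, and the class name, attribute names, check keys, tool name, and sample values are assumptions made for illustration, not an export format or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Simulation:
    """Hypothetical record mirroring the fields in the table above."""
    name: str                     # describes the scenario
    group: str                    # topic or audience grouping
    start_message: str            # first message from the synthetic customer
    situation_context: str = ""   # background the customer knows but may not state
    checks: list[dict[str, str]] = field(default_factory=list)
    tool_overrides: dict[str, Any] = field(default_factory=dict)
    context_overrides: dict[str, Any] = field(default_factory=dict)


# Illustrative values only; the tool name and amounts are invented.
cancel_with_refund = Simulation(
    name="Cancel subscription with refund",
    group="Subscriptions",
    start_message="Hi, I'd like to cancel my subscription.",
    situation_context="The customer was charged yesterday and expects a refund.",
    checks=[
        {"type": "procedure_finished", "procedure": "Cancel subscription"},
        {"type": "ai_replied",
         "condition": "The agent confirmed the cancellation and offered a refund."},
    ],
    tool_overrides={"cancel_subscription": {"status": "cancelled", "refund_amount": 19.99}},
    context_overrides={"channel": "chat"},
)
```

Writing a scenario down this way before opening the form can help confirm that the start message, the hidden context, and the checks all describe the same branch of the flow.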
Checks
Checks are the assertions evaluated at the end of a run. A simulation passes only when all its checks pass. Add as many as you need.
| Check | Passes when |
|---|---|
| Procedure finished | A specific procedure runs to completion during the simulation. |
| Tool used | The agent invokes a tool with the given name at least once. |
| AI replied | A freeform, LLM-judged condition holds at the end of the run, for example “The agent confirmed the cancellation and offered a refund.” |
| Escalated | The agent escalates the conversation to a human. |
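
The rule that a simulation passes only when all its checks pass can be sketched in a few lines. This is a simplified illustration rather than the product's evaluator: `RunRecord`, the helper functions, and the `judge` callable standing in for the LLM judge are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunRecord:
    """Hypothetical summary of what happened during one run."""
    finished_procedures: set[str] = field(default_factory=set)
    tools_called: list[str] = field(default_factory=list)
    escalated: bool = False
    transcript: str = ""


Check = Callable[[RunRecord], bool]


def procedure_finished(name: str) -> Check:
    return lambda run: name in run.finished_procedures


def tool_used(name: str) -> Check:
    # Passes if the tool was invoked at least once.
    return lambda run: name in run.tools_called


def escalated() -> Check:
    return lambda run: run.escalated


def ai_replied(condition: str, judge: Callable[[str, str], bool]) -> Check:
    # `judge` is a stand-in for the LLM-based judgment of a freeform condition.
    return lambda run: judge(condition, run.transcript)


def simulation_passes(run: RunRecord, checks: list[Check]) -> bool:
    # A single failing check fails the whole simulation.
    return all(check(run) for check in checks)


run = RunRecord(
    finished_procedures={"Cancel subscription"},
    tools_called=["cancel_subscription"],
)
checks = [
    procedure_finished("Cancel subscription"),
    tool_used("cancel_subscription"),
    escalated(),
]
print(simulation_passes(run, checks))  # False: the run never escalated
```

Because every check must hold, an Escalated check added to a happy-path simulation would fail it even when the cancellation itself succeeded, so each simulation should carry only the checks that match its branch.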
Tool overrides
Tool overrides return a predetermined output whenever the agent calls a given tool, so simulations stay reproducible and never touch live systems. Use them to:
- Avoid real side effects, such as actually cancelling a subscription.
- Feed specific data, such as a particular refund amount or contract state.
- Test failure handling by mocking an error and confirming the agent escalates instead of falsely confirming success.
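
The idea behind the uses above can be illustrated with a small dispatcher that returns the mocked output instead of calling the real tool. This is a conceptual sketch under stated assumptions: `call_tool`, `ToolError`, the tool name, and the payloads are hypothetical, and the product's actual override mechanism is configured in the simulation form.

```python
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical stand-in for a failed tool call."""


def call_tool(
    name: str,
    args: dict[str, Any],
    real_tools: dict[str, Callable[..., Any]],
    overrides: dict[str, Any],
) -> Any:
    # An override short-circuits the real tool entirely.
    if name in overrides:
        mocked = overrides[name]
        if isinstance(mocked, Exception):
            raise mocked  # mocked failure, used to test error handling
        return mocked
    return real_tools[name](**args)


live_calls: list[str] = []


def real_cancel(subscription_id: str) -> dict[str, str]:
    # In a live system this would actually cancel the subscription.
    live_calls.append(subscription_id)
    return {"status": "cancelled"}


real_tools = {"cancel_subscription": real_cancel}
args = {"subscription_id": "sub_1"}

# Feed specific data without touching the live system.
happy = {"cancel_subscription": {"status": "cancelled", "refund_amount": 19.99}}
result = call_tool("cancel_subscription", args, real_tools, happy)

# Mock an error; a well-behaved agent escalates rather than confirming success.
failing = {"cancel_subscription": ToolError("billing system unavailable")}
try:
    call_tool("cancel_subscription", args, real_tools, failing)
    outcome = "confirmed"
except ToolError:
    outcome = "escalated"
```

In this sketch `live_calls` stays empty in both cases, which is the property that keeps simulations reproducible and free of side effects.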
Run simulations and read results
Run a single simulation from its row, or select several and run them as a batch on your current Production deployment. The queue moves each run through Scheduled, Running, and then Passed, Failed, or Errored. An Errored run means the agent couldn’t complete the scenario end to end, which is itself a signal worth investigating.
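
The difference between the three final states can be summarized as a small decision rule. The sketch assumes that Failed means the run completed but at least one check did not pass, which follows from the pass rule for checks but is not stated explicitly; `RunStatus` and `final_status` are hypothetical names.

```python
from enum import Enum


class RunStatus(Enum):
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    PASSED = "Passed"
    FAILED = "Failed"
    ERRORED = "Errored"


def final_status(completed: bool, check_results: list[bool]) -> RunStatus:
    # Errored: the agent could not complete the scenario end to end.
    if not completed:
        return RunStatus.ERRORED
    # Assumption: Failed means the run completed but some check did not pass.
    return RunStatus.PASSED if all(check_results) else RunStatus.FAILED
```

Under this reading, an Errored run says nothing about the checks themselves, so it is worth opening the run before editing any check.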
Open a run to see the full detail:
- Each check with its pass or fail icon, plus the reasoning for AI-judged checks.
- The conversation transcript between the synthetic customer and the agent, including tool calls.
- The agent version the run executed against, labeled Production or Immutable.
- A history strip of recent runs so you can see when behavior changed.
Build a suite for risky changes
For a risky flow like cancellations, don’t rely on a single manual test. Write one simulation per branch of the procedure (standard case, customer with a second request, ineligible customer, tool error) and group them together. After any future edit, run the whole group with one click and deploy only when every case passes. This replaces repeated manual click testing while keeping the same safety, and it scales as the flow grows more complex.
Next steps
Procedures
Model the multi-step flows your simulations verify.
Testing AI Agents
Compare simulations with live testing and View Alternative.