It's tempting to celebrate a 90% "containment rate" — the share of conversations the AI handled without a human. But containment counts conversations that ended, not conversations that went well. A customer who gave up is also "contained."
Four numbers worth watching
- Resolution rate — did the customer's actual goal get met?
- Escalation quality — when the AI handed off to a human, did it do so at the right moment and with full context?
- First-response and full-resolution latency — speed is part of the experience.
- CSAT on AI-only threads — measured separately from human threads.
If containment goes up but CSAT goes down, you haven't automated support — you've automated frustration.
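To make the difference between containment and quality concrete, here is a minimal Python sketch that computes containment, resolution rate, and AI-only CSAT from conversation records, then flags the "containment up, CSAT down" pattern. The `Conversation` fields, the 1-5 CSAT scale, and the sample data are all assumptions for illustration; map them to whatever your helpdesk actually exports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Conversation:
    # Hypothetical fields, not taken from any specific helpdesk product.
    escalated: bool       # True if a human took over the thread
    resolved: bool        # True if the customer's goal was confirmed met
    csat: Optional[int]   # assumed 1-5 survey score; None if unanswered


def containment_rate(convs):
    """Share of conversations that never reached a human (assumes convs is non-empty)."""
    return sum(not c.escalated for c in convs) / len(convs)


def resolution_rate(convs):
    """Share of conversations where the customer's goal was met (assumes convs is non-empty)."""
    return sum(c.resolved for c in convs) / len(convs)


def ai_only_csat(convs):
    """Mean CSAT over threads with no human involvement, or None if no scores exist."""
    scores = [c.csat for c in convs if not c.escalated and c.csat is not None]
    return mean(scores) if scores else None


def automated_frustration(prev, curr):
    """True when containment rose while AI-only CSAT fell between two periods."""
    prev_csat, curr_csat = ai_only_csat(prev), ai_only_csat(curr)
    if prev_csat is None or curr_csat is None:
        return False  # not enough survey data to compare
    return containment_rate(curr) > containment_rate(prev) and curr_csat < prev_csat


# Made-up data for two consecutive weeks.
last_week = [
    Conversation(escalated=False, resolved=True, csat=5),
    Conversation(escalated=True, resolved=True, csat=4),
    Conversation(escalated=False, resolved=True, csat=4),
    Conversation(escalated=True, resolved=False, csat=2),
]
this_week = [
    Conversation(escalated=False, resolved=False, csat=2),
    Conversation(escalated=False, resolved=True, csat=4),
    Conversation(escalated=False, resolved=False, csat=None),  # gave up, no survey
    Conversation(escalated=True, resolved=True, csat=5),
]

print(containment_rate(last_week), containment_rate(this_week))
print(resolution_rate(this_week))
print(automated_frustration(last_week, this_week))
```

In this made-up data, containment rises from 50% to 75% while AI-only CSAT drops from 4.5 to 3, so the check fires. Note that the customer who gave up without answering the survey raises containment and is invisible to CSAT, which is why resolution rate is tracked alongside both.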
Sample, don't guess
Pull a weekly random sample of transcripts and score them against your standard operating procedures (SOPs). Patterns surface fast: a product the agent keeps getting wrong, a policy it over-applies, a phrasing that confuses it. That review loop is where quality actually compounds.
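The sampling step can be sketched in a few lines of Python. This is one possible approach, not a prescribed workflow: the function names, the sample size, the transcript IDs, and the failure tags are hypothetical, and the scoring itself is assumed to be done by a human reviewer who attaches tags to each transcript.

```python
import random
from collections import Counter


def weekly_sample(transcript_ids, k=50, seed=None):
    """Draw a uniform random sample without replacement.

    Passing a seed makes the draw reproducible, so a second reviewer
    can pull the same set of transcripts.
    """
    rng = random.Random(seed)
    ids = list(transcript_ids)
    return rng.sample(ids, min(k, len(ids)))


def tally_failures(reviews):
    """Count failure tags across reviewed transcripts, most common first.

    `reviews` maps a transcript ID to the list of tags a reviewer assigned;
    an empty list means the transcript passed.
    """
    counts = Counter(tag for tags in reviews.values() for tag in tags)
    return counts.most_common()


# Hypothetical transcript IDs for one week.
ids = [f"t{i}" for i in range(1000)]
sample = weekly_sample(ids, k=5, seed=42)

# Hypothetical reviewer output for three transcripts.
reviews = {
    "t1": ["wrong_product_info"],
    "t2": [],
    "t3": ["wrong_product_info", "policy_over_applied"],
}
print(tally_failures(reviews))
```

Random sampling matters here: reviewing only escalated or complained-about threads would miss the quietly "contained" conversations where the customer simply gave up.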