It's tempting to celebrate a 90% "containment rate" — the share of conversations the AI handled without a human. But containment counts conversations that ended, not conversations that went well. A customer who gave up is also "contained."

Four numbers worth watching

  • Resolution rate — did the customer's actual goal get met?
  • Escalation quality — when it handed off, did it hand off at the right moment with full context?
  • First-response and full-resolution latency — speed is part of the experience.
  • CSAT on AI-only threads — measured separately from human threads.

If containment goes up but CSAT goes down, you haven't automated support; you've automated frustration.
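
To make the gap between containment and the other numbers concrete, here is a minimal Python sketch that computes them side by side. The `Conversation` record and its field names are assumptions for illustration, not the schema of any particular helpdesk tool, and the `resolved` flag is assumed to come from a review or follow-up signal rather than from the conversation simply ending. Escalation quality is left out because it usually needs a human score rather than a formula.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class Conversation:
    # All fields are hypothetical; map them to whatever your system records.
    escalated: bool              # True if the AI handed off to a human
    resolved: bool               # True if the customer's actual goal was met
    csat: Optional[int]          # 1-5 survey score, None if no response
    first_response_s: float      # seconds until the first reply


def summarize(convs):
    """Return containment next to the metrics that show whether it went well."""
    if not convs:
        raise ValueError("need at least one conversation")

    ai_only = [c for c in convs if not c.escalated]
    # CSAT is computed on AI-only threads, kept separate from human threads.
    ai_csat = [c.csat for c in ai_only if c.csat is not None]
    human_csat = [c.csat for c in convs if c.escalated and c.csat is not None]

    return {
        "containment_rate": len(ai_only) / len(convs),
        "resolution_rate": sum(c.resolved for c in convs) / len(convs),
        "ai_only_csat": mean(ai_csat) if ai_csat else None,
        "human_csat": mean(human_csat) if human_csat else None,
        "median_first_response_s": median([c.first_response_s for c in convs]),
    }


# Made-up data: 9 of 10 conversations never reached a human,
# but 6 of those 9 ended without the customer's goal being met.
sample = (
    [Conversation(False, True, 5, 4.0)] * 3
    + [Conversation(False, False, 1, 5.0)] * 6
    + [Conversation(True, True, 4, 6.0)]
)
report = summarize(sample)
print(report)
```

On this made-up sample, containment is 90% while the resolution rate is 40%, which is exactly the pattern a containment-only dashboard would hide.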

Sample, don't guess

Pull a weekly random sample of transcripts and score them against your standard operating procedures (SOPs). Patterns surface fast: a product the agent keeps getting wrong, a policy it over-applies, a phrasing that confuses it. That review loop is where quality actually compounds.
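
The sampling step can be sketched in a few lines of Python. This is one possible approach under stated assumptions: transcript IDs are available as a list, reviewers tag each sampled transcript with free-form issue labels, and the function names, the week label format, and the tag names are all hypothetical. Seeding the generator with the week label keeps the draw random but reproducible, so two reviewers pulling the same week get the same transcripts.

```python
import random
from collections import Counter


def weekly_sample(transcript_ids, k, week_label):
    """Draw up to k transcript IDs uniformly at random, without replacement."""
    rng = random.Random(week_label)   # same label -> same sample
    ids = sorted(transcript_ids)      # fixed order so the seed is meaningful
    return rng.sample(ids, min(k, len(ids)))


def top_issues(reviews, n=3):
    """Count issue tags across reviewed transcripts.

    reviews: iterable of (transcript_id, [issue tags]) pairs,
    where the tags are whatever labels your reviewers apply.
    """
    counts = Counter(tag for _, tags in reviews for tag in tags)
    return counts.most_common(n)


# Hypothetical usage: sample 50 of this week's transcripts, then tally
# what reviewers found after scoring them.
to_review = weekly_sample(range(1000), 50, "2024-W07")

reviews = [
    ("t-101", ["wrong_product_info"]),
    ("t-102", ["wrong_product_info", "policy_over_applied"]),
    ("t-103", []),
]
print(top_issues(reviews))
```

Tallying tags week over week is what turns individual transcript reviews into the recurring patterns described above.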