Engineering tends to file latency under infrastructure. Customers file it under "do they care about me?" Those are very different ledgers.

The felt experience of waiting

A correct answer that arrives late still reads as poor service. Treating response time as a customer-experience metric changes what you prioritise and how you staff.

  • Measure time-to-first-token, not just total time.
  • Set latency targets per channel and defend them.
  • Treat slow as a defect, not a tuning preference.
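
One way to make the first two points concrete is sketched below. It is a rough illustration, not a reference implementation: it times any iterable of response chunks, recording how long the first chunk took as well as the total, and compares the first-chunk time with a per-channel target. The names time_stream, TARGETS_S and breaches_target are hypothetical, the target values are placeholders rather than recommendations, and "first chunk" stands in for "first token" on the assumption that your client streams its output.

```python
import time
from typing import Callable, Iterable, NamedTuple, Optional


class Timing(NamedTuple):
    first_chunk_s: Optional[float]  # None if the stream produced nothing
    total_s: float


def time_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Timing:
    """Consume a stream, recording time to the first chunk and total time."""
    start = clock()
    first: Optional[float] = None
    for _ in chunks:
        if first is None:
            # The customer starts seeing a response at this moment.
            first = clock() - start
    return Timing(first, clock() - start)


# Hypothetical per-channel targets, in seconds. Placeholder values only.
TARGETS_S = {"chat": 1.0, "email": 60.0}


def breaches_target(channel: str, timing: Timing) -> bool:
    """True if the first chunk arrived later than the channel's target.

    A stream that never produced a chunk is counted as a breach.
    """
    if timing.first_chunk_s is None:
        return True
    return timing.first_chunk_s > TARGETS_S[channel]
```

Counting a breach as a failed check rather than a logged warning is one way to act on the third point: the slow response then shows up in the same place as any other defect.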

Fast and right is the bar. Right but slow quietly loses customers you never hear from again.