Engineering tends to file latency under infrastructure. Customers file it under "do they care about me?" Those are very different ledgers.
The felt experience of waiting
A correct answer that arrives late still reads as poor service. Treating response time as a customer-experience metric changes what you prioritise and how you staff.
- Measure time-to-first-token, not just total response time.
- Set latency targets per channel and defend them.
- Treat slow as a defect, not a tuning preference.
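
One way to make those three habits concrete is sketched below. It times a streamed reply for both time-to-first-token and total time, then checks the result against a per-channel target. The channel names, the threshold values, and the helper names (`TARGETS`, `measure`, `breaches`) are all hypothetical, chosen for illustration; the right numbers depend on your own channels and customers.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

# Hypothetical per-channel targets in seconds. Illustrative values only.
TARGETS = {
    "chat": {"first_token": 1.0, "total": 10.0},
    "email": {"first_token": 60.0, "total": 300.0},
}


@dataclass
class Timing:
    first_token: Optional[float]  # None if the stream produced nothing
    total: float


def measure(stream: Iterable[str],
            clock: Callable[[], float] = time.monotonic) -> Timing:
    """Consume a streamed reply, recording time to first chunk and total time."""
    start = clock()
    first = None
    for _chunk in stream:
        if first is None:
            # The first chunk is the moment the customer sees something happen.
            first = clock() - start
    return Timing(first_token=first, total=clock() - start)


def breaches(channel: str, timing: Timing) -> List[str]:
    """Return the names of the targets this reply missed for its channel."""
    target = TARGETS[channel]
    missed = []
    # An empty stream counts as a missed first-token target.
    if timing.first_token is None or timing.first_token > target["first_token"]:
        missed.append("first_token")
    if timing.total > target["total"]:
        missed.append("total")
    return missed
```

A non-empty result from `breaches` can be logged or ticketed like any other defect, which keeps slow replies on the same footing as wrong ones. The `clock` parameter exists only so the timing logic can be exercised with a fake clock.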
Fast and right is the bar. Right but slow quietly loses customers you never hear from again.