How ThoughtWorks Stress-Tested Real-Time AI Agents in Collinear's Simulated Call Center Environment

- 5x faster responses
- 5x lower inference costs
- Thousands of simulated call center interactions
When ThoughtWorks set out to benchmark real-time AI performance for Australian enterprises, they needed more than fast hardware. They needed a realistic environment to test against: a call center with thousands of customers, multi-turn conversations, linguistic variability, and the kind of edge cases that break models under load.
Building that environment from scratch would have taken months. Collecting real call center data raised privacy and compliance issues. ThoughtWorks needed a simulated enterprise environment they could stand up fast and run at scale.
The Challenge
- No realistic test environment. Benchmarking AI performance on scripted or templated conversations produces numbers that don't hold up in production. ThoughtWorks needed an environment that mirrored how a high-volume call center actually operates.
- Multi-layer evaluation required. The pipeline spanned speech-to-text, language model processing, and real-time response generation. Each layer needed diverse, realistic inputs to surface performance differences under load.
- Enterprise-grade validation. Australian enterprises evaluating AI ROI need more than throughput numbers. They need evidence of accuracy, reliability, and consistent behavior under realistic conditions.
The Solution
ThoughtWorks used Collinear's Simulation Lab to stand up a fully simulated call center environment for the benchmark.
- Simulated call center with thousands of realistic interactions. The Simulation Lab generated multi-turn customer conversations with real phrasing patterns, dialect variability, interruptions, and error cases. The environment replicated the conditions of a high-volume enterprise call center at a scale that would have been impossible to assemble manually.
- Real-time verification during live runs. The Simulation Lab's evaluation engine scored model performance as agents processed simulated calls, measuring accuracy, reliability, and response quality under load. ThoughtWorks could compare Groq's LPU against GPU-based alternatives with structured, reproducible results.
- Production-ready from day one. The simulated environment plugged directly into ThoughtWorks' test harness with no reformatting, cleaning, or manual review. Compliance-grade, license-clean, and ready for enterprise presentation.
The Results
- 5x faster responses -- real-time AI at the speed of human conversation, validated in a simulated enterprise environment.
- 5x lower inference costs -- large-scale deployment economics proven under realistic call center load.
- Robust accuracy under volume -- thousands of simulated interactions confirmed consistent model behavior, not just peak throughput on clean inputs.
"Imagine the competitive advantage organisations could build when not having to choose between improved AI performance, user experience or budgets."
- Andy Nolan, Global VP of AI, ThoughtWorks
See what Collinear's Simulation Lab can do for your team.
Book a demo
