December 2025

How ThoughtWorks Stress-Tested Real-Time AI Agents in Collinear's Simulated Call Center Environment

  • 5x faster responses
  • 5x lower inference costs
  • Thousands of simulated call center interactions


When ThoughtWorks set out to benchmark real-time AI performance for Australian enterprises, they needed more than fast hardware. They needed a realistic environment to test against: a call center with thousands of customers, multi-turn conversations, linguistic variability, and the kind of edge cases that break models under load.

Building that environment from scratch would have taken months. Collecting real call center data raised privacy and compliance issues. ThoughtWorks needed a simulated enterprise environment they could stand up fast and run at scale.

The Challenge

  • No realistic test environment. Benchmarking AI performance on scripted or templated conversations produces numbers that don't hold up in production. ThoughtWorks needed an environment that mirrored how a high-volume call center actually operates.
  • Multi-layer evaluation required. The pipeline spanned speech-to-text, language model processing, and real-time response generation. Each layer needed diverse, realistic inputs to surface performance differences under load (a per-layer timing sketch follows this list).
  • Enterprise-grade validation. Australian enterprises evaluating AI ROI need more than throughput numbers. They need evidence of accuracy, reliability, and consistent behavior under realistic conditions.

The Solution

ThoughtWorks used Collinear's Simulation Lab to stand up a fully simulated call center environment for the benchmark.

  • Simulated call center with thousands of realistic interactions. The Simulation Lab generated multi-turn customer conversations with real phrasing patterns, dialect variability, interruptions, and error cases. The environment replicated the conditions of a high-volume enterprise call center at a scale that would have been impossible to assemble manually.
  • Real-time verification during live runs. The Simulation Lab's evaluation engine scored model performance as agents processed simulated calls, measuring accuracy, reliability, and response quality under load. ThoughtWorks could compare Groq's LPU against GPU-based alternatives with structured, reproducible results (an illustrative harness loop follows this list).
  • Production-ready from day one. The simulated environment plugged directly into ThoughtWorks' test harness with no reformatting, cleaning, or manual review. Compliance-grade, license-clean, ready for enterprise presentation.
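As a rough illustration of how a simulated environment can drive a benchmark harness, the loop below replays multi-turn conversations through an agent under test and scores every response. The names sim_conversations, agent_reply, and score_reply are hypothetical placeholders, not the Simulation Lab's actual interface.

    # Illustrative benchmark loop: replay simulated multi-turn calls through an
    # agent and score each response. sim_conversations yields lists of customer
    # turns; agent_reply returns (reply_text, latency_ms); score_reply returns a
    # quality score. All three are assumed placeholders, not real APIs.
    from statistics import mean

    def run_benchmark(sim_conversations, agent_reply, score_reply):
        """Aggregate quality and latency over a full simulated call center run."""
        scores, latencies = [], []
        for conversation in sim_conversations:            # thousands of simulated calls
            history = []
            for customer_turn in conversation:            # multi-turn dialogue
                history.append({"role": "customer", "text": customer_turn})
                reply, latency_ms = agent_reply(history)  # model/backend under test
                history.append({"role": "agent", "text": reply})
                scores.append(score_reply(history, reply))
                latencies.append(latency_ms)
        return {"mean_score": mean(scores), "mean_latency_ms": mean(latencies)}

Because both the conversations and the scoring are generated by the environment, the same run can be repeated against a different serving backend to produce structured, reproducible comparisons.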

The Results

  • 5x faster responses -- real-time AI at the speed of human conversation, validated in a simulated enterprise environment.
  • 5x lower inference costs -- large-scale deployment economics proven under realistic call center load.
  • Robust accuracy under volume -- thousands of simulated interactions confirmed consistent model behavior, not just peak throughput on clean inputs.

"Imagine the competitive advantage organisations could build when not having to choose between improved AI performance, user experience or budgets."

- Andy Nolan, Global VP of AI, ThoughtWorks

See what Collinear's Simulation Lab can do for your team.

Book a demo

Company: ThoughtWorks
Industry: Technology Consulting
Company size: 10,000+ employees
Pain point: Needed a realistic simulated call center environment to credibly benchmark real-time AI performance at enterprise scale
Collinear SimLab Use Case: Agent Testing
About the company

ThoughtWorks is a global technology consultancy that integrates design, engineering, and AI to drive digital innovation. With 10,000+ employees across 18 countries, they help enterprises build scalable, trustworthy AI systems.

Results
  • 1000s of simulated enterprise interactions
  • 5x faster responses
  • 5x lower inference costs
