December 2025

How ThoughtWorks Stress-Tested Real-Time AI Agents in Collinear's Simulated Call Center Environment

  • 5x faster responses
  • 5x lower inference costs
  • Thousands of simulated call center interactions


When ThoughtWorks set out to benchmark real-time AI performance for Australian enterprises, they needed more than fast hardware. They needed a realistic environment to test against: a call center with thousands of customers, multi-turn conversations, linguistic variability, and the kind of edge cases that break models under load.

Building that environment from scratch would have taken months. Collecting real call center data raised privacy and compliance issues. ThoughtWorks needed a simulated enterprise environment they could stand up fast and run at scale.

The Challenge

  • No realistic test environment. Benchmarking AI performance on scripted or templated conversations produces numbers that don't hold up in production. ThoughtWorks needed an environment that mirrored how a high-volume call center actually operates.
  • Multi-layer evaluation required. The pipeline spanned speech-to-text, language model processing, and real-time response generation. Each layer needed diverse, realistic inputs to surface performance differences under load (a per-layer timing sketch follows this list).
  • Enterprise-grade validation. Australian enterprises evaluating AI ROI need more than throughput numbers. They need evidence of accuracy, reliability, and consistent behavior under realistic conditions.

The Solution

ThoughtWorks used Collinear's Simulation Lab to stand up a fully simulated call center environment for the benchmark.

  • Simulated call center with thousands of realistic interactions. The Simulation Lab generated multi-turn customer conversations with real phrasing patterns, dialect variability, interruptions, and error cases. The environment replicated the conditions of a high-volume enterprise call center at a scale that would have been impossible to assemble manually.
  • Real-time verification during live runs. The Simulation Lab's evaluation engine scored model performance as agents processed simulated calls, measuring accuracy, reliability, and response quality under load. ThoughtWorks could compare Groq's LPU against GPU-based alternatives with structured, reproducible results (an illustrative harness loop follows this list).
  • Production-ready from day one. The simulated environment plugged directly into ThoughtWorks' test harness with no reformatting, cleaning, or manual review. Compliance-grade, license-clean, ready for enterprise presentation.
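As a rough illustration of how a simulated environment can drive a benchmark harness, the loop below replays multi-turn conversations through an agent under test and scores every response. The names sim_conversations, agent_reply, and score_reply are hypothetical placeholders, not the Simulation Lab's actual interface.

    # Illustrative benchmark loop: replay simulated multi-turn calls through an
    # agent and score each response. sim_conversations yields lists of customer
    # turns; agent_reply returns (reply_text, latency_ms); score_reply returns a
    # quality score. All three are assumed placeholders, not real APIs.
    from statistics import mean

    def run_benchmark(sim_conversations, agent_reply, score_reply):
        """Aggregate quality and latency over a full simulated call center run."""
        scores, latencies = [], []
        for conversation in sim_conversations:            # thousands of simulated calls
            history = []
            for customer_turn in conversation:            # multi-turn dialogue
                history.append({"role": "customer", "text": customer_turn})
                reply, latency_ms = agent_reply(history)  # model/backend under test
                history.append({"role": "agent", "text": reply})
                scores.append(score_reply(history, reply))
                latencies.append(latency_ms)
        return {"mean_score": mean(scores), "mean_latency_ms": mean(latencies)}

Because both the conversations and the scoring are generated by the environment, the same run can be repeated against a different serving backend to produce structured, reproducible comparisons.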

The Results

  • 5x faster responses -- real-time AI at the speed of human conversation, validated in a simulated enterprise environment.
  • 5x lower inference costs -- large-scale deployment economics proven under realistic call center load.
  • Robust accuracy under volume -- thousands of simulated interactions confirmed consistent model behavior, not just peak throughput on clean inputs.

"Imagine the competitive advantage organisations could build when not having to choose between improved AI performance, user experience or budgets."

- Andy Nolan, Global VP of AI, ThoughtWorks

See what Collinear's Simulation Lab can do for your team.

Book a demo

Company: ThoughtWorks
Industry: Technology Consulting
Company size: 10,000+ employees
Pain point: Needed a realistic simulated call center environment to credibly benchmark real-time AI performance at enterprise scale
Collinear SimLab Use Case: Agent Testing
About the company

ThoughtWorks is a global technology consultancy that integrates design, engineering, and AI to drive digital innovation. With 10,000+ employees across 18 countries, they help enterprises build scalable, trustworthy AI systems.

Results
  • 1000s of simulated enterprise interactions
  • 5x faster responses
  • 5x lower inference costs
