August 2025

ServiceNow’s Apriel-1.5-15b-Thinker matches larger-model accuracy with half the data

~50%

fewer training tokens 

70B

performance in a 12B model

~1.8×

faster SFT cycles

$10M+

annualized savings


ServiceNow set out to expand its GenAI capabilities from pilot projects into core platform products. Early gains came from scaling up (larger models and ever-growing datasets), but that approach quickly hit limits of cost, efficiency, and maintainability. ServiceNow needed a way to deliver enterprise-grade performance without endless scale, so that results could be sustained across multiple product lines and customer-facing workflows.

Challenge

As ServiceNow scaled its GenAI initiatives, the initial strategy of training bigger models on ever-larger datasets hit clear limits.

  • Rising costs and slower cycles: Escalating GPU spend and longer training runs made iteration unsustainable.
  • Noisy data: Synthetic datasets carried too many low-signal or incorrect examples, dragging down efficiency.
  • Performance plateau: Accuracy gains from scaling were flattening, despite higher investment.

ServiceNow needed a way to sustain performance improvements without runaway scale and make those gains repeatable across multiple product lines.

Solution 

To break through the plateau, ServiceNow turned to Collinear’s curator models—specialized evaluators trained to spot high-signal data in agentic reasoning, code, and math.

  • Domain-specific filtering: Curators scored synthetic data for correctness, coherence, and instruction-following, ensuring that only the strongest agentic, code, and math samples were selected.

  • Low false positives: By prioritizing precision over volume, curators avoided the “garbage in, garbage out” problem that often drags down post-training.

  • Repeatable process: Every curation run produced a traceable dataset split that teams could reuse, benchmark, and audit across product lines.

Instead of chasing scale, ServiceNow built a curation-first workflow—turning noisy synthetic data into a dependable foundation for model improvement.
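The curation-first workflow described above can be sketched as a simple threshold filter: each synthetic sample carries curator scores, only samples clearing every cutoff are kept, and each run yields a traceable kept/dropped split. The field names, score scale, and threshold values below are illustrative assumptions, not Collinear's actual curator API:

```python
from dataclasses import dataclass

@dataclass
class Sample:
    text: str
    correctness: float            # curator scores, assumed normalized to [0, 1]
    coherence: float
    instruction_following: float

# Hypothetical per-dimension cutoffs, set high to favor precision over volume.
THRESHOLDS = {"correctness": 0.9, "coherence": 0.8, "instruction_following": 0.85}

def keep(s: Sample) -> bool:
    """A sample survives curation only if every curator score clears its threshold."""
    return (
        s.correctness >= THRESHOLDS["correctness"]
        and s.coherence >= THRESHOLDS["coherence"]
        and s.instruction_following >= THRESHOLDS["instruction_following"]
    )

def curate(samples: list[Sample]) -> tuple[list[Sample], list[Sample]]:
    """Split a pool into (kept, dropped) so each run yields a traceable, auditable split."""
    kept = [s for s in samples if keep(s)]
    dropped = [s for s in samples if not keep(s)]
    return kept, dropped

# Example: one strong math sample survives; one noisy reasoning trace is filtered out.
pool = [
    Sample("Prove 2 + 2 = 4 step by step ...", 0.95, 0.90, 0.90),
    Sample("Rambling, partially incorrect trace", 0.40, 0.70, 0.60),
]
kept, dropped = curate(pool)
```

Requiring every dimension to pass (rather than averaging scores) is what keeps false positives low: a sample that is fluent but incorrect still gets dropped.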

Results

By training smaller models on curated agentic, code, and math datasets, ServiceNow achieved large-model performance at a fraction of the cost and time.

  • ~50% fewer training tokens and compute, with no loss in benchmark accuracy.

  • A 12B-class model matched 32B–70B baselines, even on complex agentic reasoning and coding tasks.

  • ~1.8× faster supervised fine-tuning cycles, accelerating iteration and release cadence.

  • $10M+ in annualized savings from lower training bills and smaller production models.

What had once plateaued with scale now became a repeatable improvement loop—unlocking sustainable performance gains in some of the most challenging enterprise AI domains.

Build Smaller. Deliver Bigger.

ServiceNow proved that bigger isn’t always better. By curating agentic, code, and math data, it unlocked large-model performance at half the cost and time.

If your organization is facing the same plateau, the path forward is clear: curate smarter, train leaner, and deliver enterprise-grade AI without runaway spend.

Let’s explore how this can work for your organization.

Company
ServiceNow
Industry
Software
Company size
Enterprise
Pain point
Lacked high-quality training data in specific formats (e.g., IT service tickets)
Collinear product used
Bespoke curated data pipeline
About the company

A leading cloud-based enterprise software provider that offers a unified platform to automate digital workflows and IT service processes across functions like IT, HR, and customer service, helping large organizations streamline operations and improve efficiency.
