Fortune 500 enterprise software company matches larger-model accuracy with half the data

~50%
fewer training tokens
70B
performance in a 12B model
~1.8×
faster SFT cycles
$10M+
annualized savings
A global Fortune 500 enterprise software provider was expanding GenAI from pilot projects to core platforms. Early gains came from scaling up: larger models and ever-growing datasets. That approach quickly ran into limits of cost, efficiency, and maintainability. The organization needed a way to deliver enterprise-grade performance without endless scale, and to ensure results could be sustained across multiple product lines and customer-facing workflows.
Challenge
As the company scaled its GenAI initiatives, the initial strategy of training bigger models on ever-larger datasets hit clear limits.
- Rising costs and slower cycles: Escalating GPU spend and longer training runs made iteration unsustainable.
- Noisy data: Synthetic datasets carried too many low-signal or incorrect examples, dragging down efficiency.
- Performance plateau: Accuracy gains from scaling were flattening, despite higher investment.
The organization needed a way to sustain performance improvements without runaway scale, and make those gains repeatable across multiple product lines.
Solution
To break through the plateau, the company turned to Collinear’s curator models—specialized evaluators trained to spot high-signal data in agentic reasoning, code, and math.
- Domain-specific filtering: Curators scored synthetic data for correctness, coherence, and instruction-following, ensuring that only the strongest agentic, code, and math samples were selected.
- Low false positives: By prioritizing precision over volume, curators avoided the “garbage in, garbage out” problem that often drags down post-training.
- Repeatable process: Every curation run produced a traceable dataset split that teams could reuse, benchmark, and audit across product lines.
Instead of chasing scale, the company built a curation-first workflow—turning noisy synthetic data into a dependable foundation for model improvement.
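The curation-first workflow described above can be sketched as a simple score-and-filter pass. This is a minimal illustration, not Collinear's actual curator models: the `curator_score` heuristic and the 0.8 threshold are placeholder assumptions standing in for a trained evaluator that scores correctness, coherence, and instruction-following.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    prompt: str
    response: str
    domain: str  # e.g. "agentic", "code", "math"

def curator_score(sample: Sample) -> float:
    """Placeholder for a curator model's quality score in [0, 1].

    A real curator would evaluate correctness, coherence, and
    instruction-following; this trivial length heuristic exists only
    to make the pipeline runnable end to end.
    """
    return min(len(sample.response) / 100.0, 1.0)

def curate(samples: list[Sample], threshold: float = 0.8) -> list[Sample]:
    """Keep only high-signal samples, prioritizing precision over volume.

    A high threshold trades recall for precision: fewer training tokens
    survive, but the ones that do are far less likely to be noise.
    """
    return [s for s in samples if curator_score(s) >= threshold]
```

Because each run is a pure function of its inputs and threshold, the resulting dataset split is reproducible and auditable, which is what makes the process repeatable across product lines.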
Results
By running smaller models on curated agentic, code, and math datasets, the company achieved large-model performance at a fraction of the cost and time.
✔ ~50% fewer tokens and compute — with no loss in benchmark accuracy.
✔ 12B-class model matched 32B–70B baselines, even on complex agentic reasoning and coding tasks.
✔ ~1.8× faster supervised fine-tuning cycles, accelerating iteration and release cadence.
✔ Millions in annualized savings, from lower training bills and smaller production models.
What had once plateaued with scale now became a repeatable improvement loop—unlocking sustainable performance gains in some of the most challenging enterprise AI domains.
Build Smaller. Deliver Bigger.
This Fortune 500 company proved that bigger isn't always better. By curating agentic, code, and math data, it unlocked large-model performance at roughly half the cost and training time.
If your organization is facing the same plateau, the path forward is clear: curate smarter, train leaner, and deliver enterprise-grade AI without runaway spend.
Let’s explore how this can work for your organization.