
Signal

Original article date: Jun 02, 2026

How Oracle Built Veritas: A Blueprint for Scaling Generative AI Evaluation Across Enterprise Teams

As AI teams scale, evaluation becomes the bottleneck — and Oracle built an entire framework to fix it.

When Oracle’s teams expanded their use of large language models in 2023, model evaluation quickly became chaotic. Teams were benchmarking models, testing product scenarios, and comparing systems — but every evaluation lived in its own pile of scripts, datasets, and custom reporting code. Results were useful locally but nearly impossible to compare or reproduce across teams.

The solution was Veritas: an internal evaluation framework designed to turn AI model testing from a collection of one-off scripts into a structured engineering system.

How Veritas Works

Veritas organizes every evaluation pipeline around four stages: Transformation, Generation, Evaluation, and Summarization. This consistent structure makes evaluations reusable and comparable. Instead of each team rebuilding the same infrastructure, they compose new benchmarks from shared predictors, evaluators, and datasets — and only write new code for genuinely novel parts.
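As a rough illustration of the four-stage structure, the sketch below chains Transformation, Generation, Evaluation, and Summarization in plain Python. Every name in it (`Example`, `transform`, `generate`, `evaluate`, `summarize`, `run_pipeline`, the toy predictor, and the exact-match evaluator) is hypothetical. The article does not describe Veritas's actual API, so this shows only the general shape such a pipeline might take.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    prompt: str
    reference: str
    prediction: str = ""
    score: float = 0.0


def transform(raw_rows: list[dict]) -> list[Example]:
    """Transformation: normalize raw dataset rows into one record type."""
    return [
        Example(prompt=row["question"].strip(), reference=row["answer"].strip())
        for row in raw_rows
    ]


def generate(examples: list[Example], predictor: Callable[[str], str]) -> list[Example]:
    """Generation: call a swappable predictor on every prompt."""
    for ex in examples:
        ex.prediction = predictor(ex.prompt)
    return examples


def evaluate(examples: list[Example], evaluator: Callable[[str, str], float]) -> list[Example]:
    """Evaluation: score each prediction against its reference."""
    for ex in examples:
        ex.score = evaluator(ex.prediction, ex.reference)
    return examples


def summarize(examples: list[Example]) -> dict:
    """Summarization: reduce per-example scores to a small report."""
    n = len(examples)
    mean = sum(ex.score for ex in examples) / n if n else 0.0
    return {"n": n, "mean_score": mean}


def run_pipeline(raw_rows, predictor, evaluator) -> dict:
    return summarize(evaluate(generate(transform(raw_rows), predictor), evaluator))


def exact_match(prediction: str, reference: str) -> float:
    """A simple reusable evaluator: case-insensitive exact match."""
    return 1.0 if prediction.strip().lower() == reference.strip().lower() else 0.0


def toy_predictor(prompt: str) -> str:
    """Stand-in for a model call; a real predictor would query an LLM."""
    answers = {"2 + 2 = ?": "4", "Capital of France?": "paris"}
    return answers.get(prompt, "unknown")


rows = [
    {"question": "2 + 2 = ?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Largest planet?", "answer": "Jupiter"},
]
report = run_pipeline(rows, toy_predictor, exact_match)
print(report)
```

Because the predictor and evaluator are passed in as plain callables, a new benchmark in this sketch only needs a new dataset or a new scoring function. The other stages are reused unchanged.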

Key components include:

Testsuites defining end-to-end evaluation pipelines in YAML
Shared predictors and evaluators reusable across workloads
A scheduler that tracks dependencies, runs independent work in parallel, and supports retries and checkpointing
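The scheduler behavior listed above (dependency tracking, parallel execution of independent work, retries, and checkpointing) can be sketched in a few dozen lines. This is a minimal illustration and not Veritas's scheduler: `run_dag`, the task names, and the in-memory `checkpoint` dict are all assumptions made for the example, and a real system would likely persist checkpoints to disk and retry only specific error types. The sketch also runs tasks in waves, so a ready task waits until the current wave finishes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A task receives the results of already finished tasks and returns its own result.
Task = Callable[[dict], object]


def run_dag(
    tasks: dict[str, Task],
    deps: dict[str, set[str]],
    checkpoint: dict[str, object],
    max_retries: int = 2,
    max_workers: int = 4,
) -> dict[str, object]:
    """Run tasks in dependency order, one parallel wave at a time.

    `checkpoint` maps finished task names to results. Tasks already in it
    are skipped, which is how a rerun resumes after a failure.
    """

    def run_with_retries(name: str, inputs: dict) -> object:
        last_error = None
        for _ in range(max_retries + 1):
            try:
                return tasks[name](inputs)
            except Exception as err:  # simplification: retry on any error
                last_error = err
        raise RuntimeError(f"task {name!r} failed after retries") from last_error

    pending = set(tasks) - set(checkpoint)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            done = set(checkpoint)
            ready = sorted(t for t in pending if deps.get(t, set()) <= done)
            if not ready:
                raise ValueError("cycle or missing dependency: " + ", ".join(sorted(pending)))
            snapshot = dict(checkpoint)  # workers read a stable copy
            futures = {n: pool.submit(run_with_retries, n, snapshot) for n in ready}
            errors = []
            for name, future in futures.items():
                try:
                    checkpoint[name] = future.result()  # record each success
                    pending.discard(name)
                except RuntimeError as err:
                    errors.append(err)
            if errors:
                raise errors[0]
    return checkpoint


attempts = {"generate": 0}


def flaky_generate(inputs: dict) -> list[str]:
    """Fails once to simulate a transient error, then succeeds."""
    attempts["generate"] += 1
    if attempts["generate"] == 1:
        raise TimeoutError("simulated transient failure")
    return [row.upper() for row in inputs["transform"]]


tasks = {
    "transform": lambda inputs: ["a", "b"],
    "generate": flaky_generate,
    # These two evaluators depend only on "generate", so they run in parallel.
    "eval_length": lambda inputs: [len(p) for p in inputs["generate"]],
    "eval_upper": lambda inputs: [p.isupper() for p in inputs["generate"]],
    "summarize": lambda inputs: {
        "mean_length": sum(inputs["eval_length"]) / len(inputs["eval_length"]),
        "all_upper": all(inputs["eval_upper"]),
    },
}
deps = {
    "generate": {"transform"},
    "eval_length": {"generate"},
    "eval_upper": {"generate"},
    "summarize": {"eval_length", "eval_upper"},
}
checkpoint: dict[str, object] = {}
results = run_dag(tasks, deps, checkpoint)
print(results["summarize"])
```

Calling `run_dag` again with the same `checkpoint` does no work, because every task is already recorded. A YAML testsuite, as the article describes, would be one natural way to declare the `tasks` and `deps` mappings instead of writing them in code.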

What Changed in Practice

Since launch, Oracle’s AI team has published 150+ testsuites and 50+ model evaluation reports covering reasoning, RAG, responsible AI, code generation, NL2SQL, multimodal tasks, and more. The framework has also enabled an internal leaderboard, so teams can see whether a quality improvement justifies a change in cost or latency for a specific use case.
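One way to reason about the quality versus cost and latency question is a Pareto filter: keep only the models that no other model beats on every axis at once. The sketch below is an assumption about how such a comparison could work, not a description of Oracle's leaderboard. The `Entry` fields, the model names, and all the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str
    quality: float     # benchmark score, higher is better
    cost_usd: float    # cost per 1,000 requests, lower is better
    latency_ms: float  # median latency, lower is better


def dominates(a: Entry, b: Entry) -> bool:
    """True if `a` is at least as good as `b` on every axis and better on one."""
    no_worse = (
        a.quality >= b.quality
        and a.cost_usd <= b.cost_usd
        and a.latency_ms <= b.latency_ms
    )
    strictly_better = (
        a.quality > b.quality
        or a.cost_usd < b.cost_usd
        or a.latency_ms < b.latency_ms
    )
    return no_worse and strictly_better


def pareto_front(entries: list[Entry]) -> list[Entry]:
    """Keep entries that no other entry dominates."""
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


# Illustrative, invented numbers only.
entries = [
    Entry("model-a", quality=0.82, cost_usd=4.0, latency_ms=900),
    Entry("model-b", quality=0.80, cost_usd=1.5, latency_ms=400),
    Entry("model-c", quality=0.78, cost_usd=2.0, latency_ms=600),  # dominated by model-b
    Entry("model-d", quality=0.70, cost_usd=0.5, latency_ms=150),
]
front = [e.model for e in pareto_front(entries)]
print(front)
```

The filter removes only clearly worse options. Choosing among the survivors, for example paying more for `model-a`'s extra quality, remains a judgment call for each use case.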

Key Takeaways

Evaluation fragmentation is a scaling problem, not just a tooling problem: Veritas shows the value of treating evaluation as a first-class engineering system.
Composable YAML pipelines beat custom scripts: reuse becomes practical when components are declared in configuration rather than buried in code.
Agentic evaluation is next: Oracle is packaging benchmark-integration knowledge as “skills” that agentic tools can use to convert research papers into executable evaluations.

References: HELM, OpenAI Evals, lm-evaluation-harness

🔗 Read the full article on Oracle Blogs



