DoD AI Office Partners with Scale AI to Create Benchmark Tests for GenAI Models

Scale AI will develop customized benchmark tests tailored to DoD use cases.

Written by: CDO Magazine Bureau

Updated 6:52 PM UTC, Thu February 22, 2024

The U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) is partnering with Scale AI, a test and evaluation (T&E) partner for artificial intelligence companies, to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Scale AI will develop customized benchmark tests tailored to DoD use cases, integrate them into the T&E platform, and aid the CDAO’s T&E strategy for LLMs.

The move aims to furnish a safety framework for AI deployment, assess model performance, offer real-time feedback to warfighters, and develop specialized evaluation sets for military applications.

“Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Alexandr Wang, founder and CEO of Scale AI.

This effort will enhance the DoD’s T&E policies for generative AI by combining quantitative measurement of data with qualitative assessment of user feedback. The evaluation metrics will help the department identify AI models that are ready for military use and that provide accurate results aligned with DoD terminology and knowledge bases.

In January this year, Michael C. Horowitz, Deputy Assistant Secretary of Defense for Force Development and Emerging Capabilities, said that the DoD has improved its ability to deploy new technologies, particularly AI, due to key organizational and strategy updates.

He pointed to the establishment of the CDAO, which is tasked with overseeing the department’s comprehensive integration of data. He also cited increased DoD investments in research, development, test, and evaluation, along with new initiatives aimed at accelerating experimentation across the department.
