US Federal News Bureau

DoD AI Office Partners with Scale AI to Create Benchmark Tests for GenAI Models

Scale AI will develop customized benchmark tests tailored to DoD use cases.

Written by: CDO Magazine Bureau

Updated 6:52 PM UTC, Thu February 22, 2024

The U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) is partnering with Scale AI, a test and evaluation (T&E) partner for artificial intelligence companies, to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Scale AI will develop customized benchmark tests tailored to DoD use cases, integrate them into the T&E platform, and aid the CDAO’s T&E strategy for LLMs.

The move aims to furnish a safety framework for AI deployment, assess model performance, offer real-time feedback to warfighters, and develop specialized evaluation sets for military applications.

“Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Alexandr Wang, founder and CEO of Scale AI.

This effort will enhance the DoD’s T&E policies for generative AI by quantitatively measuring data and qualitatively assessing user feedback. The evaluation metrics will help the department identify AI models that are ready for military use and that provide accurate results aligned with DoD terminology and knowledge bases.

In January this year, Michael C. Horowitz, Deputy Assistant Secretary of Defense for Force Development and Emerging Capabilities, said that the DoD has improved its ability to deploy new technologies, particularly AI, due to key organizational and strategy updates.

He pointed to the establishment of the CDAO, which is tasked with overseeing the department’s comprehensive integration of data. He also cited increased DoD investments in research, development, test and evaluation, as well as new initiatives aimed at accelerating experimentation across the department.
