DoD AI Office Partners with Scale AI to Create Benchmark Tests for GenAI Models

Scale AI will develop customized benchmark tests tailored to DoD use cases.

Written by: CDO Magazine Bureau

Updated 6:52 PM UTC, Thu February 22, 2024

The U.S. Department of Defense’s (DoD) Chief Digital and Artificial Intelligence Office (CDAO) is partnering with Scale AI, a test and evaluation (T&E) partner for artificial intelligence companies, to create a comprehensive T&E framework for the responsible use of large language models (LLMs) within the DoD.

Scale AI will develop customized benchmark tests tailored to DoD use cases, integrate them into its T&E platform, and support the CDAO’s T&E strategy for LLMs.

The effort aims to provide a safety framework for AI deployment, assess model performance, offer real-time feedback to warfighters, and build specialized evaluation sets for military applications.

“Testing and evaluating generative AI will help the DoD understand the strengths and limitations of the technology, so it can be deployed responsibly. Scale is honored to partner with the DoD on this framework,” said Alexandr Wang, founder and CEO of Scale AI.

This effort will help the DoD mature its T&E policies for generative AI by measuring quantitative data through benchmarking and assessing qualitative feedback from users. The evaluation metrics will help the department identify AI models that are ready for military use, meaning those that deliver accurate results aligned with DoD terminology and knowledge bases.

In January this year, Michael C. Horowitz, Deputy Assistant Secretary of Defense for Force Development and Emerging Capabilities, said that the DoD has improved its ability to deploy new technologies, particularly AI, thanks to key organizational and strategy updates.

He pointed to the establishment of the CDAO, which is tasked with overseeing the department’s comprehensive integration of data. He also cited increased DoD investment in research, development, test and evaluation (RDT&E), as well as new initiatives aimed at accelerating experimentation across the department.