(EMEA) Deepchecks, a leading company in the MLOps space focused on testing AI systems, is thrilled to announce the launch of its LLM Evaluation solution. The new solution is designed to address the unique challenges posed by Large Language Models (LLMs) and is set to revolutionize the way AI systems are validated.
Deepchecks has been at the forefront of AI system validation since January 2022, when it launched its open-source package for testing ML models. The package has garnered widespread recognition, amassing over 3,000 GitHub stars and more than 900,000 downloads. The enthusiastic response from the AI and machine learning community motivated Deepchecks to expand its offerings beyond tabular data testing to meet the diverse needs of its growing user base.
The LLM Evaluation solution comes as a response to the increasing demand for effective evaluation tools for LLM-based applications. Deepchecks recognized the unique challenges that LLMs present, including assessing both accuracy and model safety (addressing bias, toxicity, PII leakage) and the need for flexible testing approaches due to the possibility of multiple valid responses for a single input.
Key features of Deepchecks' LLM Evaluation solution include:
Dual Focus: Evaluating both the quality of LLM responses (accuracy, relevance, and usefulness) and model safety (bias, toxicity, and adherence to privacy policies).
Flexible Testing: Supporting scenarios in which an LLM can produce multiple valid responses for a single input, through flexible testing approaches that include curated "golden sets."
Diverse User Base: Recognizing that LLM-based applications require input and control from a variety of stakeholders, including data curators, product managers, and business analysts, in addition to data scientists and machine learning engineers.
Phased Approach: Acknowledging the distinct phases involved in LLM-based app development, including Experimentation/Development, Staging/Beta Testing, and Production, which require tailored evaluation strategies.
"From what we've been seeing in the market, companies are managing to build 'quick-and-dirty' POCs extremely quickly based on APIs such as OpenAI combined with prompt engineering," said Philip Tannor, CEO at Deepchecks. "However, the next steps leading to a production-ready application are taking a lot longer than initial expectations, largely due to difficulties with quality, consistency and adherence to policies. We believe that our LLM Evaluation solution can really move the needle in terms of delivering LLM-based applications quickly and safely."
Deepchecks recently announced $14M in seed funding. The round was led by Alpha Wave Ventures, with participation from Hetz Ventures and Grove Ventures.