OpenAI researchers have unveiled CriticGPT, an AI model that helps human trainers spot errors in ChatGPT's responses. CriticGPT produces detailed critiques that highlight mistakes, particularly in code outputs. This innovation addresses an inherent limitation of Reinforcement Learning from Human Feedback (RLHF): human reviewers alone miss subtle errors.
In doing so, it offers a scalable supervision mechanism that enhances the precision and reliability of AI systems.
In experiments, the researchers found that human trainers using CriticGPT to assess ChatGPT's code outputs outperformed trainers without this assistance 60% of the time. This result underscores CriticGPT's potential to strengthen human-AI collaboration and to make evaluations of AI outputs more thorough and accurate.
Further, CriticGPT-like models may be integrated into the RLHF labeling pipeline, giving AI trainers explicit AI support when evaluating the outputs of advanced AI systems. This development addresses a core issue of RLHF: as models grow more capable, human trainers increasingly struggle to detect their subtle errors.
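To make that integration concrete, below is a minimal sketch of how a critic model might sit inside a single RLHF comparison-labeling step. The `critic` and `human_choice` callables, and all other names here, are illustrative assumptions for this article, not OpenAI's actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class LabeledComparison:
    prompt: str
    responses: List[str]
    critiques: List[str]   # one AI-generated critique per candidate response
    preferred: int         # index of the response the human trainer picked

def label_with_critic(
    prompt: str,
    responses: List[str],
    critic: Callable[[str, str], str],                          # hypothetical critic-model call
    human_choice: Callable[[str, List[str], List[str]], int],   # stand-in for the trainer's UI
) -> LabeledComparison:
    # Generate a critique for each candidate response, so the trainer
    # reviews responses alongside machine-spotted issues rather than alone.
    critiques = [critic(prompt, r) for r in responses]
    preferred = human_choice(prompt, responses, critiques)
    return LabeledComparison(prompt, responses, critiques, preferred)
```

The design point is simply that the critic augments, rather than replaces, the human judgment that produces the preference label.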
The research team summarized the primary contributions of CriticGPT, which include:
Scalable Oversight Technique: CriticGPT helps humans detect problems in real-world RLHF data far more thoroughly than they can unaided
Enhanced Critique Detection: CriticGPT-generated critiques catch more bugs and are preferred over critiques written by human reviewers
Improved Human-AI Collaboration: Teams pairing critic models with human contractors produce more comprehensive critiques than humans alone, while hallucinating fewer bugs than the model alone
Force Sampling Beam Search (FSBS): The study introduces FSBS, an inference-time sampling and scoring technique that balances the trade-off between surfacing genuine faults and raising spurious concerns in LLM-generated critiques; a rough sketch of the idea follows below.
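As an illustration of that trade-off, the sketch below implements a generic sample-and-rescore selection in the spirit of FSBS: sample several critiques while forcing different numbers of quoted code highlights, then keep the one maximizing a reward-model score plus a length bonus. The function names and the scoring formula are assumptions for illustration; the paper's exact procedure may differ.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Critique:
    text: str
    num_highlights: int  # quoted code sections flagged as problematic

def fsbs_select(
    sample_critique: Callable[[int], Critique],  # hypothetical: samples a critique with >= n forced highlights
    rm_score: Callable[[Critique], float],       # hypothetical reward-model scorer
    forced_highlights: Sequence[int] = (1, 2, 3, 4),
    samples_per_setting: int = 4,
    length_bonus: float = 0.5,
) -> Critique:
    """Return the candidate maximizing rm_score + length_bonus * num_highlights.

    A larger length_bonus favors longer, more comprehensive critiques
    (more real bugs surfaced); a smaller one favors conservative critiques
    (fewer spurious complaints). All values here are illustrative.
    """
    candidates = [
        sample_critique(n)
        for n in forced_highlights
        for _ in range(samples_per_setting)
    ]
    return max(candidates, key=lambda c: rm_score(c) + length_bonus * c.num_highlights)
```

Because the bonus is applied at inference time, practitioners could tune how aggressive the critic is without retraining it.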