Meta Introduces AI Model for Evaluating Other AI Systems

On Friday, Meta announced the release of a new batch of AI models from its research division, including a “Self-Taught Evaluator” that could reduce the need for human involvement in AI development.
The release follows Meta’s introduction of the tool in an August paper, which detailed how it relies on the same “chain of thought” technique used by OpenAI’s recently released o1 models, enabling it to make reliable judgments about model responses.

This technique involves breaking down complex problems into smaller logical steps, which demonstrably improves the accuracy of responses on challenging problems in subjects like science, coding and math.
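To make this concrete, chain-of-thought evaluation is commonly implemented as an “LLM-as-judge” prompt that asks the model to reason step by step before committing to a verdict. The sketch below shows the general shape of such a judge; call_model, the prompt wording, and the verdict format are illustrative assumptions, not Meta’s actual implementation.

```python
# Minimal sketch of chain-of-thought evaluation ("LLM-as-judge").
# call_model is a hypothetical stand-in for any text-generation API;
# the prompt and verdict format are illustrative assumptions.

def call_model(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real API client here."""
    raise NotImplementedError

JUDGE_PROMPT = """You are comparing two answers to the same question.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

Reason step by step: break the question into smaller checks, verify each
answer against them, then end with exactly one line: 'Verdict: A' or
'Verdict: B'."""

def judge(question: str, answer_a: str, answer_b: str) -> str:
    """Elicit step-by-step reasoning, then read the verdict off the last line."""
    output = call_model(JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    last_line = output.strip().splitlines()[-1]
    return "A" if last_line.endswith("A") else "B"
```

Placing the reasoning before the verdict is the point of the technique: the final judgment is conditioned on the intermediate checks rather than produced in a single step.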
Meta’s researchers trained the evaluator model using entirely AI-generated data, eliminating human input at that stage (a sketch of that training loop follows this paragraph).
The ability to use AI to evaluate AI reliably offers a clear pathway toward building autonomous AI agents that can learn from their own mistakes, two of the Meta researchers behind the project told Reuters.
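Roughly, the paper’s recipe is iterative: pair each response with a deliberately worse variant, sample chain-of-thought judgments from the current evaluator, keep only the traces that pick the known-better answer, and fine-tune on them. The sketch below captures that loop at a high level; every name in it (generate, degrade, judge_with_trace, finetune) is a hypothetical stand-in, not Meta’s actual code.

```python
# High-level sketch of one self-taught training iteration, using only
# synthetic (AI-generated) data. Every helper here is hypothetical.

from dataclasses import dataclass

N_SAMPLES = 8  # judgments sampled per pair; an arbitrary illustrative value

@dataclass
class Trace:
    reasoning: str  # the chain-of-thought text
    verdict: str    # "A" or "B"

def generate(model, prompt: str) -> str:
    """Hypothetical: sample a response from the model."""
    raise NotImplementedError

def degrade(prompt: str) -> str:
    """Hypothetical: rewrite the prompt so responses to it come out worse."""
    raise NotImplementedError

def judge_with_trace(model, prompt: str, a: str, b: str) -> Trace:
    """Hypothetical: chain-of-thought judgment of the pair (a, b)."""
    raise NotImplementedError

def finetune(model, traces: list[Trace]):
    """Hypothetical: fine-tune the model on the kept judgment traces."""
    raise NotImplementedError

def self_taught_iteration(model, prompts: list[str]):
    kept = []
    for prompt in prompts:
        good = generate(model, prompt)          # answer to the real prompt
        bad = generate(model, degrade(prompt))  # answer to a corrupted prompt
        for _ in range(N_SAMPLES):
            trace = judge_with_trace(model, prompt, good, bad)
            if trace.verdict == "A":            # "A" is the known-better answer
                kept.append(trace)              # keep only correct judgments
    # Fine-tune on the filtered traces; the result is the next-round evaluator.
    return finetune(model, kept)
```

Because the better answer in each pair is known by construction, the loop can label its own training data without a human in it.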

Such agents are envisioned as digital assistants intelligent enough to carry out a vast array of tasks without human intervention.
Self-improving models could eliminate the need for the expensive and often inefficient process of reinforcement learning from human feedback, which requires human annotators with specialized expertise to label data accurately and verify that answers to complex math and writing queries are correct.

“As AI becomes more and more superhuman, it will undoubtedly improve its ability to check its work, ultimately becoming better than the average human,” said Jason Weston, one of the researchers.
“The ability to self-teach and self-evaluate is essential for reaching this superhuman level of AI,” he said.
