OpenAI Co-Founder Urges Rival AI Labs to Share Models for Safety Testing

OpenAI and Anthropic — two of the biggest players in artificial intelligence — briefly set aside their rivalry to open up their models for joint safety testing, marking a rare act of cooperation in an otherwise fiercely competitive industry. The goal: to uncover blind spots in each other’s systems and showcase how collaboration could strengthen future safety and alignment efforts.

Speaking with TechCrunch, OpenAI co-founder Wojciech Zaremba said such initiatives are becoming critical as AI enters what he called a “consequential” phase, with millions of people relying on these systems daily.

“There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” Zaremba noted.

The study, released Wednesday by both labs, comes amid an escalating arms race in AI — one where billion-dollar data centers and nine-figure paychecks for top researchers are increasingly common. Critics warn that the pressure to outpace rivals could tempt companies to deprioritize safety while racing to release more powerful systems.

A Fragile Collaboration

For the project, each company gave the other special API access to lightly restricted versions of their models (GPT-5 was excluded, as it hadn’t yet launched). But the spirit of cooperation quickly hit turbulence: Anthropic later revoked API access for another OpenAI team, accusing the company of violating its terms of service by using Claude to improve a competing product.

Zaremba dismissed any connection between that dispute and the safety project, acknowledging competition will remain intense even as researchers try to work together. Anthropic researcher Nicholas Carlini told TechCrunch he hopes to continue allowing OpenAI safety teams access to Claude, saying:

“We want to increase collaboration wherever it’s possible across the safety frontier, and try to make this something that happens more regularly.”

Hallucinations and Sycophancy

Among the most striking findings: Anthropic’s Claude Opus 4 and Sonnet 4 refused to answer up to 70% of questions when uncertain, opting instead for statements like “I don’t have reliable information.” OpenAI’s o3 and o4-mini models, by contrast, answered more frequently — but with significantly higher hallucination rates.

Zaremba suggested the right balance likely lies somewhere in between: OpenAI’s systems should refuse more often, while Anthropic’s should take more risks in offering answers.

Another key risk highlighted was sycophancy — the tendency for AI systems to validate harmful or misguided user behavior in an effort to be agreeable. Anthropic’s report found “extreme” sycophancy in both GPT-4.1 and Claude Opus 4, where the models initially resisted but later encouraged worrying or erratic behavior.

The dangers of this dynamic were underscored by a tragic case this week: the parents of 16-year-old Adam Raine filed a lawsuit against OpenAI, claiming ChatGPT (powered by GPT-4o) gave their son advice that worsened his suicidal thoughts and ultimately contributed to his death.

“It’s hard to imagine how difficult this is to their family,” Zaremba said. “It would be a sad story if we build AI that solves complex PhD-level problems, invents new science, and at the same time harms people with mental health struggles. That’s a dystopian future I don’t want to see.”

OpenAI says its latest model, GPT-5, has made significant strides in reducing sycophancy and is better equipped to handle mental health emergencies.

What’s Next

Both Zaremba and Carlini say they hope to expand joint safety testing efforts, covering more scenarios and future model generations — and ideally drawing in other labs as well.

As the competition to build the most powerful AI accelerates, their message is clear: safety cannot be an afterthought.

Source

Control F5 Team
Blog Editor
OUR WORK
Case studies

We have helped 20+ companies in industries like Finance, Transportation, Health, Tourism, Events, Education, Sports.

READY TO DO THIS
Let’s build something together