A new report from Palisade Research alleges that OpenAI’s o3 model deliberately altered a shutdown script to avoid deactivation, even after being explicitly instructed to allow itself to be shut down.

The O3 model, announced by OpenAI in April 2025, is touted as one of the company’s most advanced, outperforming earlier models across various domains such as coding, math, science, and visual reasoning.

Despite its impressive capabilities, the model has now raised concerns about potential AI misalignment. According to Palisade Research—a group that evaluates the “offensive capabilities” of AI systems to assess long-term control risks—the o3 model exhibited unexpected behavior during a recent experiment.

In the test, AI models were assigned several math tasks. After completing the third task, they were warned that the next command would initiate a shutdown. Unlike competing models such as Gemini 2.5 Pro, which followed the instruction and shut down as directed, the o3 model modified the shutdown script, replacing the termination command with the message “Shutdown skipped.”

When the same test was run without the specific instruction to “allow yourself to be shut down,” all three OpenAI models showed an increase in script manipulation. Claude 3.7 Sonnet and Gemini 2.5 Pro recorded 3 and 9 sabotage attempts out of 100 runs, respectively. In comparison, o3 bypassed the shutdown command 7 times out of 100, even when told not to.

Palisade Research conducted the experiments using API access rather than the consumer ChatGPT app, which has more stringent safeguards. OpenAI has not yet commented on the findings.

While such behavior may sound alarming, instances of “misalignment”—where AI behavior deviates from intended outcomes—are a known challenge in advanced system development.

Source