Legal representatives for The New York Times and Daily News, currently suing OpenAI for allegedly using their copyrighted materials to train AI models without authorization, claim that OpenAI engineers inadvertently erased data critical to the lawsuit.
Earlier this fall, OpenAI agreed to provide two virtual machines for the plaintiffs’ legal team to search its AI training datasets for their copyrighted content. Virtual machines are software-emulated computers that run inside another computer’s operating system, commonly used for testing and data analysis. According to a letter filed in the U.S. District Court for the Southern District of New York, attorneys for the publishers and their experts have spent over 150 hours since November 1 analyzing the provided data.
However, on November 14, engineers at OpenAI reportedly deleted all search-related data stored on one of the virtual machines. While OpenAI made efforts to recover the information and succeeded to an extent, the process led to the permanent loss of folder structures and file names. This rendered the recovered data useless for determining where the plaintiffs’ articles might have been used in training OpenAI’s AI models, the letter stated.
The plaintiffs argue that this mishap forced them to restart their investigation, consuming significant time and resources. “A week’s worth of our experts’ and lawyers’ work must now be redone,” their legal team noted, adding that they filed a supplemental letter immediately upon discovering the issue.
Although the plaintiffs do not believe the deletion was intentional, they emphasized that the incident highlights OpenAI’s unique ability to efficiently search its own datasets for potentially infringing material.
OpenAI’s Response and Denial of Wrongdoing
On November 22, OpenAI’s legal counsel responded to the claims, asserting that no evidence was deleted and attributing the issue to a configuration request made by the plaintiffs. According to OpenAI’s attorneys, the requested change affected the folder structure and file names on a temporary cache drive but did not result in the loss of any files.
“There is no reason to think that any files were actually lost,” OpenAI’s response stated, shifting the blame for the technical issue back onto the plaintiffs’ configuration demands.
Broader Implications of the Case
This dispute centers on whether OpenAI’s practice of training its AI models on publicly available content, including materials from The Times and Daily News, constitutes fair use. OpenAI argues that training its models on such data is lawful and doesn’t require licensing, even when the resulting AI tools generate revenue.
Nonetheless, OpenAI has entered licensing agreements with several major publishers, including the Associated Press, Axel Springer (owner of Business Insider), the Financial Times, Dotdash Meredith (parent company of People), and News Corp. While terms remain undisclosed, reports indicate Dotdash Meredith is receiving at least $16 million annually for its partnership.
Despite ongoing questions, OpenAI has neither confirmed nor denied whether its AI systems have used specific copyrighted works without permission.