Google on Thursday introduced a reworked version of its Gemini Deep Research agent, powered by its latest flagship foundation model, Gemini 3 Pro. The launch landed on the same day OpenAI released GPT-5.2, setting up an immediate comparison between the two AI heavyweights.

The updated Gemini Deep Research is no longer limited to generating long-form research reports. Google has expanded it into a developer-ready agent that can be embedded directly into third-party applications. This is made possible through Google’s new Interactions API, which is designed to give developers more granular control as AI systems move deeper into autonomous, agent-based workflows.

At its core, Gemini Deep Research is built to process and synthesize extremely large volumes of information, handling long context windows and complex prompt inputs. Google says customers are already using it for tasks such as corporate due diligence, advanced market analysis, and drug toxicity and safety research.

Google also plans to integrate the new agent into several of its own products, including Google Search, Google Finance, the Gemini app, and NotebookLM. The move reflects a broader shift in how information may be accessed in the future, with AI agents increasingly acting on users’ behalf rather than people manually searching for answers.

According to Google, Deep Research benefits from Gemini 3 Pro’s positioning as its most factual model to date, with training focused on reducing hallucinations during complex reasoning tasks. This is especially important for long-running agentic workflows, where a single incorrect assumption or fabricated detail can compromise an entire chain of decisions.

To support its performance claims, Google introduced a new evaluation benchmark called DeepSearchQA, designed to test agents on multi-step information retrieval and reasoning tasks. The company has open-sourced the benchmark. Gemini Deep Research was also evaluated on Humanity’s Last Exam, an independent and notoriously difficult general knowledge benchmark, as well as BrowserComp, which focuses on browser-based agent tasks.

Unsurprisingly, Google’s agent topped results on its own benchmark and performed strongly on Humanity’s Last Exam. OpenAI’s ChatGPT 5 Pro, however, came in a close second across most tests and slightly outperformed Google on BrowserComp.

Those comparisons became outdated almost immediately. On the same day Google published its results, OpenAI released GPT-5.2, internally codenamed Garlic. OpenAI claims the new model surpasses competing systems, including Google’s, across a range of standard benchmarks, including OpenAI’s own evaluations.

The timing itself was notable. With the AI community anticipating the release of GPT-5.2, Google chose the same day to showcase its latest advances, underscoring how competitive and fast-moving the race for next-generation AI agents has become.

Source