Nvidia’s latest Blackwell AI chips, already impacted by delays, are now grappling with overheating issues in their server setups, raising concerns among customers about meeting timelines for launching new data centers, according to a report by The Information on Sunday.
The overheating occurs when the Blackwell GPUs are installed in server racks designed to hold up to 72 chips. According to people familiar with the matter, the problem has left some customers worried about potential disruptions.
To address the issue, Nvidia has reportedly asked its suppliers to revise the rack designs multiple times. The report cited Nvidia employees, customers, and suppliers who are working on the problem, but it did not name the suppliers involved.
A spokesperson for Nvidia told Reuters, “Nvidia is working closely with leading cloud service providers as an integral part of our engineering team and process. Design refinements are normal and anticipated during development.”
Delays Impact Major Customers
First unveiled in March, the Blackwell chips were initially slated for shipment in the second quarter of this year. However, delays have pushed back their release, potentially affecting key customers, including Meta Platforms, Alphabet’s Google, and Microsoft.
The Blackwell chip is a significant leap forward for Nvidia, combining two silicon squares, each the size of the company's previous GPUs, into a single unit. Nvidia says the design is 30 times faster at tasks such as serving chatbot responses.
Whatever its performance gains, how quickly Nvidia resolves the overheating problem will be critical to the successful rollout of its cutting-edge AI chips.