Amazon has detailed the cause of this week’s massive AWS outage that disrupted everything from banking apps to connected smart beds — a software bug in its automation system that spiraled into a widespread service failure.

In a technical post published Thursday, Amazon Web Services (AWS) outlined how a cascading series of errors brought down thousands of websites and applications worldwide. The root of the problem was a defect in the automation that manages domain name system (DNS) records for DynamoDB, the database service many AWS customers rely on for storing and accessing data.

According to AWS, a “latent defect” in the automated DNS management system created an empty DNS record for DynamoDB in US-East-1, its region in Virginia and one of the company’s busiest. The automation failed to repair itself as intended, forcing engineers to step in manually. Until the record was corrected, many customers were unable to connect to DynamoDB, causing widespread disruptions across services that depend on it.

To prevent further incidents, AWS temporarily disabled its DynamoDB DNS planner and enactor automation systems globally while working on fixes and new safeguards.

The outage rippled across the internet, taking down or disrupting platforms such as Signal, Snapchat, Roblox, and Duolingo, as well as banking websites and Ring doorbells. Monitoring site Downdetector recorded more than 8.1 million outage reports from users around the world.

Even smart home products weren’t spared. Customers of Eight Sleep, a company that makes internet-connected beds, found themselves unable to adjust their bed temperature or incline through the app during the downtime. CEO Matteo Franceschetti apologized to users on X and announced an update that enables Bluetooth controls for critical functions in case of future internet outages.

Experts say the incident underscores how fragile the modern internet has become. Dr. Suelette Dreyfus, a lecturer in computing and information systems at the University of Melbourne, warned that the world’s increasing dependence on a handful of cloud giants creates dangerous single points of failure.

“It’s not just AWS — though they control roughly 30% of the cloud market,” she said. “The real problem is that the internet’s original resilience has been eroded. We’ve concentrated critical services into the hands of a few massive providers, and when one fails, the ripple effects are enormous.”

While AWS restored most services within hours, the outage served as a sharp reminder of just how interconnected — and vulnerable — the digital infrastructure behind daily life has become.
