Flash floods are among the most dangerous natural disasters worldwide, causing more than 5,000 deaths each year. They are also notoriously difficult to predict because they occur suddenly, develop rapidly, and often affect very small geographic areas. Now, Google believes it may have found an unexpected solution: training artificial intelligence to learn from decades of news reporting.
One of the biggest challenges in forecasting flash floods is the lack of consistent data. While meteorologists have long-term datasets for temperature, rainfall, and river levels, flash floods often occur too quickly and in locations without monitoring infrastructure. This creates a critical data gap that limits the effectiveness of modern deep learning weather models.
To address this issue, Google researchers turned to Gemini, the company’s large language model, to analyze around five million news articles published worldwide. From this massive dataset, the system identified approximately 2.6 million individual flood events. These reports were then converted into a structured, geo-tagged timeline dataset called Groundsource, effectively transforming qualitative news coverage into quantitative environmental data.
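To make the idea of a structured, geo-tagged timeline concrete, here is a minimal Python sketch of what such a dataset could look like once flood mentions have been pulled out of the articles. It is an illustration only: the `FloodReport` fields, the 0.25-degree grid, and the sample records are assumptions made for this example. They do not describe the actual Groundsource schema or Google's extraction pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FloodReport:
    """One flood mention extracted from a news article (hypothetical schema)."""
    article_id: str
    event_date: date
    lat: float
    lon: float


def grid_cell(lat, lon, cell_deg=0.25):
    """Snap a coordinate to an integer grid index (cell size is an assumption)."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def build_timeline(reports):
    """Group reports by (grid cell, date) and count distinct source articles."""
    articles = defaultdict(set)
    for r in reports:
        key = (grid_cell(r.lat, r.lon), r.event_date)
        articles[key].add(r.article_id)
    # Sort by date, then by cell, so the result reads as a timeline.
    return sorted(
        ((cell, day, len(ids)) for (cell, day), ids in articles.items()),
        key=lambda row: (row[1], row[0]),
    )


# Made-up sample records, for illustration only.
reports = [
    FloodReport("a1", date(2020, 1, 1), 10.10, 20.10),
    FloodReport("a2", date(2020, 1, 1), 10.20, 20.20),  # same cell, same day
    FloodReport("a3", date(2020, 1, 2), 10.10, 20.10),
]
timeline = build_timeline(reports)
```

In this sketch, two articles describing the same place on the same day collapse into a single event row with a source count of two. That is one simple way to turn overlapping news coverage into countable events.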
Using this dataset as a real-world baseline, researchers trained a forecasting model based on a Long Short-Term Memory (LSTM) neural network. The model combines historical flood information with global weather forecasts to estimate the probability of flash floods in specific areas.
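For readers unfamiliar with LSTMs, the sketch below shows the general mechanism in plain Python: a single-unit LSTM cell reads a sequence of inputs step by step, and its final state is squashed into a probability. Everything specific here is an assumption for illustration, including the two input features (forecast rainfall and a past flood count), the function names, and the placeholder weights. The weights are not trained, so the output is not a real forecast, and nothing here reflects the architecture Google actually uses.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, w):
    """One time step of a single-unit LSTM cell.

    x: list of input features for this step
    h, c: previous hidden state and cell state (scalars)
    w: dict mapping gate name -> (input weights, recurrent weight, bias)
    """
    def pre(gate):
        wx, wh, b = w[gate]
        return sum(wi * xi for wi, xi in zip(wx, x)) + wh * h + b

    i = sigmoid(pre("input"))        # how much new information to admit
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value for the cell state
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def flood_probability(sequence, w, w_out=1.0, b_out=0.0):
    """Run the cell over a sequence and map the last hidden state to (0, 1)."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return sigmoid(w_out * h + b_out)


# Placeholder, untrained weights: (input weights, recurrent weight, bias).
weights = {gate: ([0.5, 0.5], 0.5, 0.0)
           for gate in ("input", "forget", "output", "candidate")}

# Hypothetical features per step: [forecast rainfall, past flood count].
dry = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
wet = [[1.0, 0.0], [2.0, 1.0], [3.0, 1.0]]
p_dry = flood_probability(dry, weights)
p_wet = flood_probability(wet, weights)
```

With these all-positive placeholder weights, the wetter sequence yields the higher score, which is the behavior a trained model would have to learn from historical flood events.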
The system is already in use. Google’s flash flood forecasts are currently integrated into its Flood Hub platform, where they highlight potential risks for urban areas across 150 countries. The platform also shares data with emergency response organizations worldwide, helping them anticipate and respond to flood events more quickly.
Early feedback suggests the system can make a meaningful difference. António José Beleza, an emergency response official at the Southern African Development Community who participated in early testing, said the tool improved their ability to respond to flooding events in vulnerable regions.
However, the system still has limitations. Its predictions currently operate at a relatively coarse resolution, identifying risk across areas of about 20 square kilometers. It is also less precise than systems such as the U.S. National Weather Service’s flood alert infrastructure, which benefits from dense radar networks and real-time precipitation monitoring.
Despite these constraints, the project was designed specifically for regions where advanced weather infrastructure does not exist. Many countries lack radar systems, river monitoring stations, or long-term meteorological records. By aggregating millions of historical reports, the Groundsource dataset helps fill those gaps.
According to Juliet Rothenberg, a program manager on Google’s Resilience team, the approach allows researchers to rebalance the global data landscape. By extracting structured information from written reports, AI models can make informed predictions even in areas where traditional datasets are scarce.
Beyond flood prediction, Google researchers believe this technique could be used to build datasets for other difficult-to-measure environmental events, such as heat waves or landslides. In many cases, these phenomena leave a trail of written documentation but limited structured data.
Experts in environmental modeling see this as part of a broader shift toward AI-driven weather prediction. Marshall Moutenot, CEO of Upstream Tech, noted that data scarcity remains one of the biggest challenges in geophysics. Ironically, there is an abundance of Earth observation data overall, yet very little reliable “ground truth” data available to validate predictive models.
Google’s approach demonstrates how AI can bridge that gap. By converting unstructured human knowledge into structured datasets, large language models may unlock new possibilities for forecasting natural disasters and improving global resilience.