In early June 2025, Google introduced its “Weather Lab” model, an AI-driven tool designed for forecasting the tracks and intensity of tropical cyclones. This model is a part of Google DeepMind’s broader suite of AI-based weather research models. Google announced that the Weather Lab model had shown promising results in pre-launch testing, claiming its accuracy was comparable to, and often exceeded, that of existing physics-based methods.
According to Google’s initial statement, the Weather Lab model was trained on a comprehensive dataset that reconstructed historical weather patterns and on a specialized database containing detailed information on hurricane tracks, intensity, and size. To further evaluate its performance, Google partnered with the National Hurricane Center (NHC), part of the National Oceanic and Atmospheric Administration (NOAA), to assess the model’s capabilities in the Atlantic and East Pacific basins.
The Atlantic hurricane season remained relatively calm until a few weeks prior to the report, with overall activity below normal levels, which left few opportunities to rigorously test the new model in real-world conditions. However, approximately 10 days before the article’s publication, Hurricane Erin rapidly intensified over the open Atlantic Ocean, becoming a Category 5 hurricane as it moved westward.
From a forecasting perspective, it was evident that Erin would not directly impact the United States. Nevertheless, meteorologists closely monitored the storm’s trajectory and intensity. Given Erin’s large size, there were concerns about how close it would pass to the East Coast of the United States, where it could cause significant beach erosion, and about its effects on Bermuda.
During an active storm, it can be challenging to determine which forecasting model provides the most accurate predictions. While real-time performance can offer insights, uncertainties persist until a thorough post-storm analysis is conducted. This analysis involves evaluating each model’s accuracy in predicting the storm’s path and intensity.
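As a rough illustration of what such a post-storm evaluation computes, the sketch below measures track error as the great-circle distance between a forecast position and the verified position, and intensity error as the absolute difference in maximum wind speed. The function names, the dictionary keys, and the data layout are assumptions made for this example; they are not taken from Franklin's analysis or from any NHC tool.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
KM_PER_NMI = 1.852         # kilometers per nautical mile


def track_error_nmi(forecast_pos, observed_pos):
    """Great-circle (haversine) distance in nautical miles.

    Both arguments are (latitude, longitude) pairs in degrees.
    """
    lat1, lon1 = map(math.radians, forecast_pos)
    lat2, lon2 = map(math.radians, observed_pos)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    distance_km = 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
    return distance_km / KM_PER_NMI


def mean_errors(cases):
    """Return (mean track error in nmi, mean absolute intensity error in kt).

    `cases` is a hypothetical list of dicts, one per verified forecast, with
    keys "fcst_pos", "obs_pos" (lat/lon pairs) and "fcst_wind", "obs_wind" (kt).
    """
    if not cases:
        raise ValueError("need at least one verified forecast")
    track = [track_error_nmi(c["fcst_pos"], c["obs_pos"]) for c in cases]
    intensity = [abs(c["fcst_wind"] - c["obs_wind"]) for c in cases]
    return sum(track) / len(track), sum(intensity) / len(intensity)
```

Running `mean_errors` separately for each model and each lead time (24, 48, 72 hours, and so on) would give the kind of side-by-side comparison the article describes, assuming the same set of forecast cases is used for every model.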
With Erin having dissipated, such an evaluation became possible. In what was described as the most significant test of the Atlantic season to date, Google’s Weather Lab reportedly delivered the best performance for forecasts up to 72 hours (three days). These findings were based on data compiled by James Franklin, former chief of the hurricane specialist unit at the National Hurricane Center.
Franklin’s analysis compared the performance of Google’s model (GDMI) against the National Hurricane Center’s official track forecast, as well as various physics-based models, including global forecast models and hurricane-specific models. Physics-based models, also known as numerical weather prediction models, rely on complex equations and initial atmospheric conditions to simulate atmospheric changes over time. These models require substantial computational power and have historically been a cornerstone of meteorological forecasting.
Over the past 25 years, advancements in computer hardware and improvements in the collection and input of real-time atmospheric data have led to significant reductions in hurricane track forecast errors. The data indicated that Google’s model not only outperformed the National Hurricane Center’s official track forecast but also surpassed numerous physics-based models.
In terms of intensity forecasts, Google’s model also outperformed the other models within the first 72 hours, and its accuracy at the 48-hour mark was particularly noteworthy. The TVCN and IVCN models, which are “consensus” models for track and intensity respectively, are closely monitored by forecasters at the hurricane center. These models, which are not typically made public, provide a bias-corrected average of several top-performing models, where bias correction means adjusting for known forecast biases in the individual models. The fact that Google’s model outperformed these consensus models was deemed significant.
From a forecasting standpoint, the three-to-five-day range is crucial for making informed decisions about evacuations and other hurricane preparations. While improvements in AI model performance are desired for this longer forecast range, the overall conclusion was that AI weather modeling is making substantial progress.
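To make the idea of a bias-corrected consensus concrete, here is a minimal sketch: each member model's known average bias is subtracted from its forecast, and the corrected values are averaged. This is a generic illustration of the concept as described in the article, not the operational TVCN or IVCN procedure; the model names and numbers are hypothetical.

```python
def bias_corrected_consensus(forecasts, biases):
    """Average several model forecasts after removing each model's known bias.

    forecasts: dict mapping model name -> forecast value (e.g. wind in kt)
    biases:    dict mapping model name -> that model's known mean bias
               (forecast minus observed); models not listed are left uncorrected
    """
    if not forecasts:
        raise ValueError("need at least one member forecast")
    corrected = [value - biases.get(name, 0.0)
                 for name, value in forecasts.items()]
    return sum(corrected) / len(corrected)


# Hypothetical 48-hour intensity forecasts (kt) from three member models.
member_forecasts = {"model_a": 100.0, "model_b": 110.0, "model_c": 90.0}
# Hypothetical biases: model_a tends to run 5 kt high, model_b 5 kt low.
known_biases = {"model_a": 5.0, "model_b": -5.0}

consensus = bias_corrected_consensus(member_forecasts, known_biases)
```

Because the errors of individual models partly cancel when averaged, a consensus of this kind is usually hard to beat, which is why a single model outperforming it is notable.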
AI weather models are rapidly becoming essential tools for predicting high-impact events such as hurricanes. While Google’s model may not be the best performer for every storm, it will likely be given greater consideration in future forecasts. In a relatively short period, tools like Google’s Weather Lab and other AI weather models have demonstrated skill equivalent to that of the best physics-based models. Continued improvements could establish them as the gold standard for certain types of weather prediction.
Eric Berger, the senior space editor at Ars Technica and a certified meteorologist, emphasized the increasing importance of AI in weather forecasting, noting that these models are quickly becoming a vital component of the forecaster’s toolkit. He cautioned that no single model will be the best for every storm but suggested that AI models like Google’s Weather Lab will be given more weight in future forecasting decisions.
Berger also highlighted the rapid progress of AI weather models, stating that they have already achieved skill levels comparable to the best physics-based models in a relatively short time. He concluded that if these models continue to improve, they could potentially become the gold standard for specific types of weather prediction.




