In a nutshell
Cambridge researchers developed “Aardvark Weather,” the first AI system to produce accurate weather forecasts directly from raw data without using traditional physics-based models
The system matches or outperforms established forecasting models while using only about 8% of the observational data that feeds traditional systems, and it can generate forecasts in seconds rather than hours
This breakthrough could democratize access to sophisticated weather forecasting, especially for regions that can’t afford traditional supercomputer-based systems
CAMBRIDGE, England — People often joke that meteorologists have the best job because they can be completely wrong and still keep their job. It’s why many are wondering if forecasting the weather can ever be a more exact science. Now, researchers have created the first AI system capable of producing accurate weather forecasts directly from raw observational data, bypassing the complex numerical models that have dominated meteorology for decades. Named “Aardvark Weather,” this system could revolutionize forecasting methods and democratize access to sophisticated weather predictions worldwide.
Traditional weather prediction involves a chain of specialized numerical models running on supercomputers—a complex workflow refined over 70 years. You may hear your local meteorologist talking about the European model and the GFS model and the ICON model, but those days could soon be over. Aardvark transforms observations from satellites, weather stations, ships, and other instruments directly into forecasts without relying on conventional physics-based simulations.
The system’s forecasts aren’t just faster—they’re competitive with established operational systems while using only about 8% of the observational data that feeds into traditional numerical weather prediction (NWP) systems. Just a few years ago, experts suggested that replacing the entire NWP pipeline with artificial intelligence would require “a number of fundamental breakthroughs.” The Cambridge-led research team has now demonstrated those breakthroughs have arrived ahead of schedule.
Study authors envision this technology making sophisticated weather forecasting more accessible globally, including in developing regions that lack the resources to run traditional NWP models. The system can be fine-tuned for specific regions and variables, potentially opening doors for custom weather models serving agriculture, renewable energy, insurance, and other weather-dependent sectors.
“This will be transformational for weather prediction,” the researchers note in their paper, published in the journal Nature. They point to the potential for “reducing computational costs, removing bias from inflexible aspects of NWP systems, and enabling fast prototyping and optimization for specific tasks.”
Weather apps rely on the same supercomputer models used in traditional forecasting, but Aardvark could change where these platforms pull information from. (© Pixel-Shot – stock.adobe.com)
Aardvark’s Weather Forecasting Magic
Aardvark combines three interconnected modules:
An encoder that processes raw observations to estimate the current atmospheric state
A processor that generates forecasts for different timeframes
A decoder that converts these forecasts to usable predictions for specific locations
The encoder takes in measurements from satellites, weather stations, ships, and weather balloons. Instead of the processed data that traditional models rely on, it works directly with the raw measurements. For scattered data points such as weather station readings, specialized neural networks convert them to gridded representations.
The processor then takes this state estimate and produces forecasts from one to ten days ahead, covering key atmospheric variables like wind, temperature, humidity, and pressure at multiple levels on a global grid. Finally, the decoder creates local predictions at specific weather station locations.
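The three-stage data flow described above can be sketched in a few lines of Python. This is an illustrative toy, not the researchers' implementation: the function names (`encode`, `process`, `decode`, `run_pipeline`), the one-dimensional "grid," and the arithmetic inside each stage are hypothetical stand-ins for the neural networks in the real system. Only the order of the stages comes from the description.

```python
from typing import Dict, List, Tuple

Observation = Tuple[float, float]  # (position in [0, 1) on a toy 1-D domain, value)


def encode(observations: List[Observation], grid_size: int) -> List[float]:
    """Toy encoder: average the raw observations that fall in each grid cell.

    Hypothetical stand-in for the learned encoder; empty cells get the global mean.
    """
    sums = [0.0] * grid_size
    counts = [0] * grid_size
    for position, value in observations:
        cell = min(int(position * grid_size), grid_size - 1)
        sums[cell] += value
        counts[cell] += 1
    global_mean = sum(v for _, v in observations) / len(observations)
    return [s / c if c else global_mean for s, c in zip(sums, counts)]


def process(state: List[float]) -> List[float]:
    """Toy processor: advance the gridded state one step (here, periodic smoothing)."""
    n = len(state)
    return [(state[(i - 1) % n] + state[i] + state[(i + 1) % n]) / 3.0 for i in range(n)]


def decode(forecast: List[float], stations: Dict[str, float]) -> Dict[str, float]:
    """Toy decoder: read the gridded forecast at each station's location."""
    n = len(forecast)
    return {name: forecast[min(int(pos * n), n - 1)] for name, pos in stations.items()}


def run_pipeline(
    observations: List[Observation], stations: Dict[str, float], grid_size: int = 4
) -> Dict[str, float]:
    """Chain the three stages: observations -> gridded state -> forecast -> stations."""
    state = encode(observations, grid_size)
    forecast = process(state)
    return decode(forecast, stations)


obs = [(0.1, 10.0), (0.15, 12.0), (0.6, 20.0), (0.9, 16.0)]
local = run_pipeline(obs, {"station_a": 0.05, "station_b": 0.65})
```

The point of the sketch is the interface between stages: irregular observations go in, a gridded state passes through the middle, and per-station values come out.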
Breaking Performance Barriers
For global forecasts, Aardvark matched or exceeded the Global Forecast System (GFS) across most variables and timeframes, while approaching the performance of the high-resolution system run by the European Centre for Medium-Range Weather Forecasts (ECMWF), widely considered the gold standard.
For local station forecasts of temperature and wind speed, Aardvark remained accurate up to ten days ahead, competing with more complex operational systems that incorporate human forecaster input. Over regions like West Africa and the Pacific, it outperformed traditional methods consistently.
The computational improvement is dramatic. Generating a complete forecast takes approximately one second on four NVIDIA A100 GPUs, compared to roughly 1,000 node-hours required by traditional systems—a reduction that could make sophisticated forecasting accessible to regions currently unable to afford conventional systems.
It can be incredibly frustrating when the skies open up on a day when the forecast called for clear weather. (Photo by Genaro Servín from Pexels)
The Future of Weather Forecasts
Researchers acknowledge several limitations in the current system. Aardvark doesn’t yet run at the resolution of leading NWP models, lacks ensemble forecasting capabilities (which provide a range of possible outcomes rather than a single prediction), and requires further work to incorporate new observational instruments for which there’s no training data. Future developments could extend the system to higher resolutions, additional variables, and specialized applications like severe weather warnings or seasonal forecasts.
The ability to fine-tune the system for specific regions and variables represents a significant advancement. In traditional NWP, implementing new features can take considerable development time and expertise. End-to-end data-driven systems like Aardvark bypass this complexity, making it easier to create customized models for specific applications—from agriculture to renewable energy to emergency management.
“These results are just the beginning of what Aardvark can achieve,” said lead author Anna Allen. “This end-to-end learning approach can be easily applied to other weather forecasting problems, for example, hurricanes, wildfires, and tornadoes. Beyond weather, its applications extend to broader Earth system forecasting, including air quality, ocean dynamics, and sea ice prediction.”
Paper Summary
Methodology
The research team developed Aardvark as a modular system with three key components, each addressing different aspects of the weather forecasting challenge. The encoder module takes in raw observations from multiple sources and produces a gridded estimate of the current atmospheric state. This includes data from satellites (like scatterometers for surface wind, infrared and microwave sounders for temperature and humidity profiles), surface weather stations, marine platforms, and radiosondes (weather balloons). Rather than using the processed retrievals that feed traditional models, Aardvark works directly with raw measurements along with metadata about the observations. For irregularly spaced data (like weather stations), specialized neural network components called “SetConv layers” handle the conversion to gridded representations.
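To make the conversion from irregular points to a grid more concrete, here is a minimal sketch of a set convolution in the style used by convolutional neural processes. Each grid point receives a Gaussian-weighted average of nearby observations, plus a "density" channel recording how much data was nearby. The function name `set_conv`, the fixed `length_scale`, and the one-dimensional domain are assumptions for illustration. In a trained model the length scale would typically be learned, and the paper's actual layers may differ in detail.

```python
import math
from typing import List, Tuple


def set_conv(
    obs_positions: List[float],
    obs_values: List[float],
    grid_positions: List[float],
    length_scale: float = 0.1,  # assumed fixed here; normally a learned parameter
) -> Tuple[List[float], List[float]]:
    """Map irregularly spaced observations onto a regular grid.

    Returns (density, signal). Density says how much data lies near each grid
    point; signal is the density-normalised, kernel-weighted average value.
    """
    density, signal = [], []
    for g in grid_positions:
        # Gaussian weight of every observation as seen from grid point g.
        weights = [
            math.exp(-0.5 * ((g - x) / length_scale) ** 2) for x in obs_positions
        ]
        d = sum(weights)
        weighted = sum(w * v for w, v in zip(weights, obs_values))
        density.append(d)
        # Guard against division by (near) zero where no data is nearby.
        signal.append(weighted / d if d > 1e-12 else 0.0)
    return density, signal


grid = [i / 4 for i in range(5)]  # 0.0, 0.25, 0.5, 0.75, 1.0
density, signal = set_conv([0.0, 1.0], [10.0, 20.0], grid)
```

With two observations at the ends of the domain, the signal near each end tracks the nearest observation, while the density channel drops toward the middle. That drop tells downstream layers that the gridded value there is weakly supported by data.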
The processor module takes this initial state estimate and produces forecasts for different lead times. It consists of ten separate vision transformers (a type of neural network architecture) that are composed together to generate predictions from one to ten days ahead. These predictions cover key atmospheric variables at multiple pressure levels—including wind components, temperature, humidity, and geopotential—on a global 1.5-degree grid. The system works autoregressively, feeding the output from one time step as input to predict the next.
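The autoregressive rollout can be sketched as a loop in which each step's output becomes the next step's input. The ten `step_models` below are hypothetical placeholder functions standing in for the vision transformers, and `make_toy_model` is invented purely so the example runs. Only the control flow reflects the description above.

```python
from typing import Callable, List

State = List[float]
StepModel = Callable[[State], State]


def rollout(initial_state: State, step_models: List[StepModel]) -> List[State]:
    """Apply one model per lead time, feeding each output into the next model."""
    forecasts = []
    state = initial_state
    for model in step_models:
        state = model(state)      # the forecast for this lead time...
        forecasts.append(state)   # ...is stored and reused as the next input
    return forecasts


def make_toy_model(decay: float) -> StepModel:
    """Hypothetical stand-in for a trained network: relax the state toward its mean."""
    def step(state: State) -> State:
        mean = sum(state) / len(state)
        return [mean + decay * (x - mean) for x in state]
    return step


step_models = [make_toy_model(0.5) for _ in range(10)]  # one model per forecast day
forecasts = rollout([0.0, 10.0], step_models)           # forecasts[k] is day k + 1
```

Because every step consumes the previous step's output, any error made early in the chain is carried into later lead times. This is one reason such systems are usually evaluated separately at each lead time.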
The decoder module then takes these global forecasts and produces local predictions at specific weather station locations. It uses a lightweight convolutional neural network architecture to translate the gridded forecasts to point predictions. The entire system was trained on data from before 2018, with 2018 kept as the test year and 2019 as validation. Researchers first pre-trained individual modules on high-quality reanalysis data, then fine-tuned them together, allowing them to optimize the entire pipeline for specific tasks.
Results Breakdown
Aardvark’s performance proved impressive across multiple metrics. For global forecasts, the system matched or outperformed the Global Forecast System (GFS) across most variables and lead times, with geopotential at 500 hPa being the only exception. It also approached the skill of the more sophisticated HRES system from the European Centre for Medium-Range Weather Forecasts. The system showed particular strength in surface variables and at longer lead times, though it had larger errors at higher atmospheric levels and shorter lead times compared to operational models.
For station forecasts of 2-meter temperature, Aardvark competed effectively with station-corrected HRES over the contiguous United States and Europe, while outperforming it in West Africa and the Pacific. Remarkably, it matched the performance of the National Digital Forecast Database (NDFD) over the US—a system that combines multiple models with human forecaster input. For 10-meter wind speed, Aardvark had higher errors than station-corrected HRES over the US but significantly outperformed the NDFD baseline. Over Europe, it matched HRES up to four days ahead and outperformed it thereafter.
The researchers demonstrated that end-to-end fine-tuning could further improve performance. When optimized for specific regions, Aardvark saw error reductions of 6% for temperature forecasts over Europe, West Africa, the Pacific, and globally, with a 3% improvement over the US. For wind speed, smaller but statistically significant improvements of 1-2% were observed across most regions.
Limitations Discussed
Despite its impressive performance, Aardvark faces several limitations in its current form. The researchers acknowledge that like all current AI-based weather prediction systems, it doesn’t yet operate at the resolution of leading operational NWP models like the IFS. Future work will need to increase grid resolution and develop ensemble forecasting capabilities to provide probabilistic predictions rather than single deterministic forecasts.
The system also needs solutions for integrating new observational instruments for which no training data exists. The team suggests that training on simulated data could address this challenge. Additionally, methods for handling observation drift and changes in data characteristics over time will be necessary, possibly through regular fine-tuning with recent data.
Another limitation is the spectral blurring that occurs at longer lead times, a phenomenon commonly observed in data-driven weather forecasting systems. This effect causes forecasts to lose some small-scale detail as prediction horizons extend further. Although Aardvark maintains meaningful signals even at longer lead times, this remains an area for improvement.
The current implementation also focuses on a limited set of variables and applications. Extending the system to support additional forecast variables, specialized decoder modules for different end-user forecasts (like hurricane, flood, or fire weather warnings), and modeling other Earth system components would increase its utility.
Funding and Disclosures
This work was supported by several institutions and funding sources. The research was conducted with support from The Alan Turing Institute, with computational resources provided through their infrastructure. Anna Allen received funding from the UKRI Centre for Doctoral Training in the Application of Artificial Intelligence to the study of Environmental Risks (AI4ER) at the University of Cambridge, with additional studentship funding from Google DeepMind. Other researchers were supported by various grants and scholarships, including funding from the Cambridge Trust, Qualcomm Innovation Fellowship, Huawei, EPSRC grants, and the Data Sciences Institute at the University of Toronto. Richard E. Turner received support through an EPSRC Prosperity Partnership grant between the University of Cambridge and Microsoft. The authors declared no competing interests.
Publication Information
The paper “End-to-end data-driven weather prediction” was authored by a team led by Anna Allen and Stratis Markou (equal contribution first authors) along with Will Tebbutt, James Requeima, Wessel P. Bruinsma, Tom R. Andersson, Michael Herzog, Nicholas D. Lane, Matthew Chantry, J. Scott Hosking, and Richard E. Turner (corresponding author). The research represents a collaboration between the University of Cambridge, The Alan Turing Institute, Vector Institute (University of Toronto), Microsoft Research, Google DeepMind, the European Centre for Medium-Range Weather Forecasts, and the British Antarctic Survey. The paper was received on July 10, 2024, and published in the journal Nature on March 20, 2025.