SeedAI Crash Course is a series designed to break down emerging and complex AI topics for non-technical and policy audiences. Is there a topic you’d like a Crash Course on? Let us know!
In September 2023, a weather-predicting AI model with a new type of architecture correctly predicted that Hurricane Lee would make landfall in Nova Scotia, Canada, ten days in advance. Not only was this model significantly more accurate (other models could only make this prediction reliably five days ahead of time), but it also required significantly less time and computation to train than models using other architectures.
While large language models like those that power ChatGPT have demonstrated extremely impressive capabilities, they still struggle in non-language-based domains like modeling the physical world. This new architecture, known as a "neural operator," is built to avoid that limitation, promising to significantly improve our ability to model and understand the physical world.
Neural operators are a type of AI architecture that can solve what are known as "partial differential equations": equations that describe how quantities change across space and time, and that are crucial to modern science, especially the physical sciences. Developed by a team of researchers at Caltech and Nvidia in 2020, neural operators enable scientists to solve problems involving complex simulations of physics in an incredible variety of domains. This includes developing weather models at unrivaled scales, modeling long-term tumor growth to potentially facilitate early detection of cancer, designing, at unprecedented speeds, medical devices over 100 times cleaner than existing ones, and advancing theoretical physics by simulating subatomic particle interactions.
Tools like neural operators that allow AI to understand the physical world are thus crucial to the development of AI that can interact with the real world, the next big frontier of opportunities for the field. To fully understand the potential value of neural operators, it is important to understand why the underlying equations they solve, partial differential equations (PDEs), can be so hard to solve with traditional methods.
Why are PDEs so important?
Say you are a rocket scientist trying to figure out how the heat on a certain part of your rocket, say a square panel on the side, will change after takeoff, so you can determine whether the part is strong enough to use. Given some details, like the shape and initial temperature of that panel, you can write an equation that describes how the temperature changes at different locations on the panel over time. This is a well-known PDE called the heat equation, and it is used in virtually all modern engineering that involves heat in any capacity. What makes it a PDE is that it relates the rates of change of an unknown quantity (here, temperature; in general, anything from distance to radioactivity) with respect to several different variables (here, position on the panel and time).
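For readers curious what such an equation actually looks like, the two-dimensional heat equation is compact: T is the temperature, t is time, x and y are positions on the panel, and the constant alpha describes how quickly the panel's material conducts heat.

```latex
\frac{\partial T}{\partial t} = \alpha \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \right)
```

In words: how fast the temperature changes in time at any point depends on how unevenly heat is distributed around that point.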
Now that you have your PDE, you may want to know the temperature at specific points on the panel a few minutes after takeoff. However, to get those values from your equation you would need to know the rates of change (called "derivatives") of temperature across location and time on the panel, since it is a PDE. These are nearly impossible to measure without running expensive and time-consuming real-world tests, rendering your equation not particularly useful on its own. To make it useful, you need to "solve" it by approximating the derivatives on a grid, where each rate of change becomes the difference between values at neighboring grid points. If the grid is fine enough, you can get an accurate approximation of the solution.
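The grid idea above can be sketched in a few lines of code. The following is a toy illustration, not engineering-grade simulation: the panel size, material constant, and starting temperatures are all made-up numbers chosen for demonstration, and the panel's edges are treated as wrapping around for simplicity.

```python
import numpy as np

# Hypothetical setup: a square panel on a 50x50 grid of points.
n = 50                      # grid points per side
alpha = 1e-4                # how fast heat spreads (illustrative value)
dx = 1.0 / (n - 1)          # spacing between grid points
dt = 0.2 * dx**2 / alpha    # time step, kept small so the method stays stable

# Start the panel at 300 degrees with a 400-degree hot spot in the middle.
u = np.full((n, n), 300.0)
u[20:30, 20:30] = 400.0

for _ in range(500):
    # Each derivative becomes a difference between neighboring grid points:
    # compare every point with its four neighbors (edges wrap around).
    neighbors = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
                 + np.roll(u, 1, 1) + np.roll(u, -1, 1))
    u = u + dt * alpha * (neighbors - 4 * u) / dx**2

print(round(u[25, 25], 1))  # temperature at the center of the hot spot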
In simple cases, like the square metal panel, scientists can solve PDEs this way. But most real-world applications, like the rocket as a whole, have far more complicated shapes, making the PDE dramatically harder (in fact, sometimes impossible) to solve. In those cases, scientists rely on methods that necessarily reduce the detail of the solution. When studying complex systems where details are key, this has posed a significant challenge for scientists and engineers.
Moreover, existing methods also have practical limitations. Say you want to model the weather on a small scale for local forecasting and on a large scale to measure trends over years. Both being weather systems, these situations are governed by largely similar sets of PDEs. However, you may want much higher resolution for the small-scale, short-term weather forecasts than for the long-term climate predictions. Current methods require a separate model for each scale even though the underlying physics is the same, a significant resource drain given how costly it is to develop solvers for difficult PDEs.
How neural operators help
While AI can be extremely useful for many kinds of complicated mathematical problems, solving complicated PDEs with traditional mathematical methods, or even with earlier AI approaches, required estimating the solution directly at a fixed set of grid points, which ties the resolution of the answer to the grid chosen up front.
Neural operators are different: they learn the process of solving a PDE. Rather than estimating one solution at fixed points, they learn the mapping from the inputs of a problem (such as an initial temperature pattern) to its solution. The key difference is that neural operators take functions as inputs and outputs, instead of the fixed lists of numbers traditional AI models use. To manipulate these functions internally and find patterns, neural operators use mathematical tools like Fourier transforms, which split a function up into many simple component parts and can then group them back together.
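The split-and-recombine step can be demonstrated in isolation. The sketch below is a toy example of a Fourier transform at work, not an actual neural operator: it takes a signal built from two waves, splits it into component frequencies, keeps only the strongest ones, and recombines them to recover the signal. A Fourier neural operator layer performs this same split, transform, recombine cycle, but with learned weights applied to the components.

```python
import numpy as np

# Sample a simple signal: the sum of a slow wave and a faster, weaker wave.
x = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 3 * x) + 0.5 * np.sin(2 * np.pi * 20 * x)

# The Fourier transform splits the signal into its component frequencies.
coeffs = np.fft.rfft(signal)

# The two ingredient waves show up as the two strongest components.
strongest = np.argsort(np.abs(coeffs))[-2:]
print(sorted(strongest.tolist()))  # frequencies of the two waves

# Keeping only those components and transforming back recovers the signal.
filtered = np.zeros_like(coeffs)
filtered[strongest] = coeffs[strongest]
reconstructed = np.fft.irfft(filtered)
print(np.allclose(reconstructed, signal, atol=1e-6))
```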
Because of this, scientists can train a neural operator at one level of resolution and apply it at another, avoiding the practical problem of training a separate model for every resolution they are interested in. Higher resolution also means more accuracy when simulations run for longer. This means that scientists can use the same AI model for many different tasks that require differing levels of resolution, saving countless hours and dollars on training and compute.
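The resolution trick rests on a simple fact: Fourier components describe a function itself, not any particular grid. The toy sketch below (again, an illustration of the principle, not a real trained model) takes a wave represented by a coarse 64-point grid, extracts its Fourier components once, and evaluates them on a much finer 256-point grid with no loss of accuracy.

```python
import numpy as np

# A solution known only on a coarse 64-point grid.
x_lo = np.linspace(0, 1, 64, endpoint=False)
solution_lo = np.sin(2 * np.pi * 3 * x_lo)

# Extract its Fourier components, normalized so they are grid-independent.
coeffs = np.fft.rfft(solution_lo) / 64

# Evaluate the same components on a finer 256-point grid. Because the
# components describe the underlying function, no retraining is needed.
solution_hi = np.fft.irfft(coeffs * 256, n=256)

# Check against the true function sampled at the finer resolution.
x_hi = np.linspace(0, 1, 256, endpoint=False)
print(np.allclose(solution_hi, np.sin(2 * np.pi * 3 * x_hi), atol=1e-8))
```

A neural operator that learns in this component space inherits the same property, which is why one model can serve both the local forecast and the long-term climate projection.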
Looking ahead
With only five years on the scene, neural operators are already helping build a world more prepared for natural disasters, better equipped to fight diseases, and closer to solving decades-old problems in science. By supporting research on this emerging AI architecture, the U.S. can lead in developing faster, more accurate AI systems with extremely impactful practical and scientific applications.