astrobites.org

Using Machine Learning to Make A Really Big Detailed Simulation

Title: A Gigaparsec-Scale Hydrodynamic Volume Reconstructed with Deep Learning

Authors: Cooper Jacobus, Solene Chabanier, Peter Harrington, JD Emberson, Zarija Lukic, and Salman Habib

First Author’s Institution: University of California, Berkeley

Status: preprint on arXiv

Simulations have become one of the quintessential tools in modern astrophysics. Whether you are trying to understand a star, an active galactic nucleus (AGN) disk, a galaxy, or the universe itself, there is probably a simulation for that. One of the most popular kinds of simulations in modern astrophysics is a hydrodynamic simulation of large-scale structure (e.g., Illustris, EAGLE, Astrid), which is designed to mimic a chunk of the universe containing millions of galaxies. These simulations start with an N-body simulation in which billions of particles representing dark matter evolve under the attractive influence of gravity. In addition to dark matter, baryonic matter (everyday matter like protons, neutrons, and electrons) is simulated as a kind of “fluid” riding on the sea of dark matter, hence the “hydrodynamic” part.

Every simulation of this type is a balancing act between three competing choices: the total size of the simulation, the computing power needed to run it, and the resolution, which, in simple terms, is a combination of the smallest distance and time scales at which the data in the simulation are reliable or meaningful. When designing a simulation, if you want to alter one of these three quantities, you must consider the effect on the other two. For example, if you want a larger simulation volume, you must increase the computing power or lower your resolution. The decisions you make when designing a simulation come down to the resources available and the scientific goals of the simulation, and researchers who create and analyze simulations have developed many clever methods (for example, zoom-in techniques) to push simulations to their computational limits.
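To see why these three choices are so tightly coupled, here is a back-of-the-envelope sketch (not from the paper; the box and cell sizes are invented for illustration) of how the cell count of a cubic simulation grid, and hence the memory and compute, scales with box size and resolution:

```python
# Back-of-the-envelope sketch (numbers invented for illustration): the
# number of cells in a cubic grid grows as the cube of (box size / cell
# size), so both a bigger box and a finer resolution get expensive fast.

def n_cells(box_size_mpc, cell_size_mpc):
    """Number of cells in a cubic grid with the given box and cell size."""
    per_side = box_size_mpc / cell_size_mpc
    return per_side ** 3

base = n_cells(80, 0.05)   # an 80 Mpc box at some fixed cell size

# Doubling the box at fixed resolution costs 8x the cells...
print(round(n_cells(160, 0.05) / base))   # 8

# ...and halving the cell size at fixed box size also costs 8x.
print(round(n_cells(80, 0.025) / base))   # 8
```

This cubic scaling is why growing the box by a factor of ~12 per side, as the authors ultimately do, would naively cost thousands of times more compute at fixed resolution.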

Today’s authors use machine learning (ML) to chip away at the constraints imposed by these three competing factors, using an ML model to replace some computationally expensive aspects of a hydrodynamic simulation. A good ML process needs training and testing data, and today’s authors use the Nyx code to create theirs. They create two pairs of simulations (four in total), each a box 80 megaparsecs on a side. For each pair, they make a high-resolution version and a low-resolution version. Then, they create a custom deep-learning model (the details of which we won’t get into directly) and train it on one pair to recreate the high-resolution output from the low-resolution input (see Figure 1 for an example). Finally, they use the other pair of simulations to test how well their model recreates the high-resolution version from the low-resolution version.
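The train-on-one-pair, test-on-the-other protocol can be sketched in miniature. Everything below is invented for illustration: the toy “degradation” that turns high-res into low-res, and the stand-in least-squares “model” (the paper trains a custom deep network on real Nyx outputs instead). The point is only the workflow: fit on pair 1, evaluate on the held-out pair 2.

```python
import numpy as np

def make_pair(seed, n=8):
    """Toy stand-in for a (low-res, high-res) simulation pair.

    Real pairs come from Nyx runs; here the 'low-res' field is just a
    crude affine degradation of the 'high-res' one, for illustration.
    """
    rng = np.random.default_rng(seed)
    high = rng.lognormal(size=(n, n, n))   # pretend density field
    low = 0.5 * high + 0.1                 # invented degradation
    return low, high

# Pair 1 is for training; pair 2 is only ever used for testing.
low_tr, high_tr = make_pair(seed=1)
low_te, high_te = make_pair(seed=2)

# Stand-in "model": least-squares affine map low -> high.
# (The paper trains a custom deep network instead.)
A = np.vstack([low_tr.ravel(), np.ones(low_tr.size)]).T
(a, b), *_ = np.linalg.lstsq(A, high_tr.ravel(), rcond=None)

# Evaluate on the held-out pair: does the learned map beat raw low-res?
pred = a * low_te + b
err_model = np.mean((pred - high_te) ** 2)
err_raw = np.mean((low_te - high_te) ** 2)
print(err_model < err_raw)  # True
```

Keeping the second pair untouched during training is what lets the authors claim their model generalizes, rather than having memorized one particular box.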

Figure 1 (figure 1 in the paper) – Example of a low-resolution simulation (left), the ML prediction of the high-resolution version based on the low-resolution simulation (middle), and the high-resolution simulation (right). The darker purple color represents a higher density, ⍴, of matter. The ML prediction in the middle recreates some of the finer features of the high-resolution simulation at substantially less computational cost. However, it is not able to fully recreate the high-resolution simulation.

After using their two simulation pairs to train and test their model, they go one step further. They apply their model to a low-resolution simulation roughly 1,000 times bigger in volume than the original simulations, almost 1 gigaparsec in box side length (in the real universe, this vast volume would contain tens to hundreds of millions of galaxies; figure 2 compares the original and larger simulation volumes)! They use their trained ML algorithm to turn that low-resolution simulation into an ML prediction of what a high-resolution box would look like. By doing this, they essentially bypass the huge amount of computing power it would take to run a simulation of that size and resolution directly, while retaining the important details and features of a higher-resolution simulation. Why would the authors want to make a simulation box so much bigger than the one they already have at such high resolution? Aren’t 80 megaparsec boxes big enough?

Figure 2 (figure 5 in the paper) – Size comparison of the training and testing simulations (represented by the box on the right) with the much larger, nearly one gigaparsec simulation box (left) made with the trained machine learning algorithm.

Well, the authors are interested in a particular astrophysics use case. They use this machine learning approach to investigate a phenomenon called the Lyman-alpha forest. Long story short, when light from bright, distant sources like AGN passes through the universe, two major things happen to that light on its way to our telescopes near Earth. First, it passes through neutral hydrogen gas, and second, the light is redshifted by the universe’s expansion. In the first step, the hydrogen absorbs a specific, well-known wavelength called the Lyman-alpha wavelength, which is then missing from the spectrum of light that reaches Earth. In the second step, because of the universe’s expansion, that missing gap shifts to longer wavelengths as the whole spectrum is redshifted. How much it shifts tells us how far away the gas is. If there are multiple gas clouds along the line of sight, we see multiple gaps in the spectrum, resulting in a one-dimensional density profile of the hydrogen gas out to very large distances. Knowing this hydrogen density can tell us all sorts of things about the universe and how it evolved.
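That redshifting step follows a simple, standard relation: the observed wavelength is λ_obs = λ_rest × (1 + z), where the rest-frame Lyman-alpha wavelength is about 1215.67 Å. A quick sketch (the function names are ours, not the paper’s):

```python
LYA_REST_ANGSTROM = 1215.67  # rest-frame Lyman-alpha wavelength

def observed_wavelength(z):
    """Wavelength at which a Lyman-alpha absorber at redshift z appears."""
    return LYA_REST_ANGSTROM * (1 + z)

def absorber_redshift(lambda_obs):
    """Invert the relation: redshift of the gas from the observed dip."""
    return lambda_obs / LYA_REST_ANGSTROM - 1

# A cloud at redshift 2 imprints its absorption dip in the optical:
print(round(observed_wavelength(2.0), 2))  # 3647.01
```

Because each cloud along the line of sight sits at its own redshift, each one imprints its dip at a different observed wavelength, which is what turns a single spectrum into a one-dimensional density profile.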

Figure 3 (figure 2 in the paper) – Illustration of simulated Lyman-alpha absorption line recovery using machine learning. The top panel is a visual diagram of how the Lyman-alpha phenomenon results from photons from a distant quasar passing through neutral hydrogen gas. The third and fourth panels show slices of the baryon density in a high (third panel) and low (fourth panel) resolution simulation. The black line represents a line-of-sight observation, purple represents baryon density at that distance, with dark purple being denser. The bottom panel depicts the ML algorithm’s prediction based on the low-resolution simulation. The second panel shows the amount of light that passes through (values near 1) or is absorbed (values near 0) by hydrogen gas at a particular distance, with the dark purple curve showing absorption from the high-resolution simulation, gray from the low-resolution simulation, and light purple from the machine learning reconstructed output. Note that the reconstructed absorption matches the high-resolution simulation much more closely than the low-resolution simulation.
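The transmission values in that second panel follow the standard relation F = e^(−τ), where τ is the optical depth of the hydrogen gas along the line of sight. A small illustrative sketch (not code from the paper; the function name is ours):

```python
import math

def transmitted_fraction(tau):
    """Fraction of light transmitted through gas of optical depth tau."""
    return math.exp(-tau)

# Nearly transparent gas: transmitted flux close to 1.
print(round(transmitted_fraction(0.1), 3))  # 0.905
# Dense gas, saturated absorption: transmitted flux close to 0.
print(round(transmitted_fraction(5.0), 3))  # 0.007
```

Because τ depends exponentially on the gas density, small density errors in a low-resolution simulation get amplified in the predicted spectrum, which is one reason the ML-reconstructed field matches the high-resolution absorption so much better.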

Researchers, including the authors, are working towards the near future when Lyman-alpha observations will build a three-dimensional map of this hydrogen distribution, a significant step forward from the many one-dimensional maps we currently have. However, using simulations to model the Lyman-alpha forest and test those future surveys requires a combination of size and resolution that is not currently feasible for modern simulations. That is why today’s authors chose the size and resolution goals they did. Given their success in training a machine learning model to recreate such a large volume at higher resolution for lower computational cost, a whole new way to investigate the Lyman-alpha forest, and the evolution of the universe as a whole, is closer than ever.

Edited by Ivey Davis

Image Credit: adapted from Fernandez, Bird and Ho, DOI: 10.48550/arXiv.2309.03943

