It seems like everyone wants a piece of Lu Lu’s research.
Biologists. Physicians. Astronomers. Climatologists. Physicists. Chemists. Materials scientists. The federal government. Even the occasional energy company and insurance firm.
They all see possibilities in Lu’s data science methodologies for effectively using artificial intelligence (AI) to do fundamental science. And Lu, an assistant professor of statistics and data science in Yale’s Faculty of Arts and Sciences, sees the possibilities, too.
“I’m not a domain expert in geoscience, biology, fire insurance, climate change, heart or blood diseases, or astronomy,” Lu said, a wide grin animating his youthful face. “But I have tools that can help in all of those areas.”
Lu is in the vanguard of a movement he refers to as “Physics-Informed Machine Learning.” It is a new way of working with data that infuses traditional data science and machine learning/AI techniques with a non-traditional element: physical principles and physics-based equations.
The concept, in a broad sense, has vaulted into prominence over the past few years, with researchers and institutions worldwide taking a long look at how AI-based methods may be able to conduct cutting-edge science.
Yale News spoke with Lu, who is a faculty member of the Institute for Foundations of Data Science, the Wu Tsai Institute, and the Center for Algorithms, Data, and Market Design at Yale, about his own work in this new field and where it will take him next. The interview has been edited and condensed.
What are the essential differences between your approach to data science and more traditional computing and data science techniques?
Lu Lu: In traditional scientific computing, we apply computational techniques to a physical or biological system in order to simulate it. But to do that, we need concrete mathematical equations that describe the system. In other words, we need a complete understanding of the system we are simulating.
But in biology, for example, there is often no way to fully understand a system before you simulate it. On the other hand, machine learning, especially deep learning, usually requires a large amount of data for training. And in many science and engineering problems, it is difficult to obtain the necessary high-accuracy data. To solve these issues, we do something that is in between.
Something that brings in physics?
Lu: Yes, we take a limited amount of data about the system — a smaller or imperfect data set — and blend the data with physics and machine learning.
A small group of us began experimenting with this approach more than five years ago, integrating partial differential equations with machine learning. In the beginning it was hard and there were many debates about it within the research community. There is still some debate.
At this stage, more people have begun to see this approach as a powerful tool. There has been an exponential increase in the number of people applying this method in their research. It is exciting.
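To make the idea concrete, here is a minimal sketch of a physics-informed loss of the general kind Lu describes, not his own code: a small PyTorch network is fit to a handful of noisy measurements of a toy one-dimensional Poisson problem while also being penalized for violating the differential equation at unlabeled collocation points. The network size, the equation, and the training settings are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Small network approximating the unknown solution u(x) of the toy PDE
#   u''(x) + pi^2 * sin(pi * x) = 0,  with u(0) = u(1) = 0  (exact solution: u = sin(pi x))
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def pde_residual(x):
    """How badly the network violates the differential equation at points x."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    return d2u + torch.pi**2 * torch.sin(torch.pi * x)

# A handful of noisy "measurements" standing in for sparse experimental data.
x_data = torch.rand(10, 1)
u_data = torch.sin(torch.pi * x_data) + 0.01 * torch.randn(10, 1)

x_col = torch.rand(200, 1)               # collocation points: physics only, no labels
x_bc = torch.tensor([[0.0], [1.0]])      # boundary points

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss_data = ((net(x_data) - u_data) ** 2).mean()   # fit the sparse data
    loss_pde = (pde_residual(x_col) ** 2).mean()       # respect the physics
    loss_bc = (net(x_bc) ** 2).mean()                  # respect the boundary conditions
    (loss_data + loss_pde + loss_bc).backward()
    opt.step()
```

The equation term acts as a physics-based regularizer, which is what lets a handful of noisy points stand in for the large training sets deep learning normally needs.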
Let’s talk about some of those applications. What can you tell us about your ongoing work with the U.S. Department of Energy [DOE]?
Lu: This work is being supported by a $4 million grant announced in September, one part of a larger initiative to look at ways of applying artificial intelligence to fundamental science. My project, which builds on my ongoing DOE Early Career Award, deals with data sets that are very large or that carry privacy concerns. We develop federated and distributed AI methods that preserve privacy while using data spread across multiple institutions.
For example, there are many climate centers around the world collecting data. As you can imagine, it is a high volume of data — it would take a long time and a major financial commitment to transfer this amount of data to another server in order to perform new climate predictions. My focus is on developing an AI method for doing efficient and robust climate research without expensive data transfer.
Another application is geoscience data relating to Earth’s subsurface structure, which is used to study earthquakes and energy sources.
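For readers curious what “federated” means in practice, the sketch below follows the classic federated averaging recipe (FedAvg) rather than Lu’s specific method: each center trains a copy of the model on its own data, and only the model weights, never the raw measurements, are sent back to a server and averaged. The model, the synthetic data, and the training settings are placeholders for illustration.

```python
import copy
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for each center's private data; in a real deployment the
# raw data would never leave the center.
center_loaders = [
    DataLoader(TensorDataset(torch.randn(128, 8), torch.randn(128, 1)), batch_size=32)
    for _ in range(3)
]

def local_update(global_model, loader, epochs=1, lr=1e-2):
    """Train a private copy of the model on one center's data; return only its weights."""
    local = copy.deepcopy(global_model)
    opt = torch.optim.SGD(local.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(local(x), y).backward()
            opt.step()
    return local.state_dict()

def federated_average(state_dicts):
    """Server step: average the weight tensors contributed by the centers."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

global_model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
for communication_round in range(10):
    local_states = [local_update(global_model, loader) for loader in center_loaders]
    global_model.load_state_dict(federated_average(local_states))
```

Only weight updates cross institutional boundaries, which is the property that makes this approach attractive when data are too large to move or too sensitive to share.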
You’re also funded by the National Science Foundation. What is that research focused on?
Lu: That grant is also looking at AI for science, but with an emphasis on biological research. We’re trying to understand genomic organization within the cell nucleus.
How is DNA organized within the nucleus? We’re getting insights by imaging cells and looking at how genomic organization evolves. The challenge is finding a way to use sparse and noisy data while still conducting robust and quantitative research. This is a three-year project.
And what research are you doing with an insurance company?
Lu: We’re simulating fire.
FM Global, one of the largest commercial property insurance companies in the world, had seen some of our previous studies and contacted my group. They wanted to know if our approach could be used to estimate the damage a fire can cause to a specific building. There are traditional modeling techniques that the insurance industry uses, but they are very expensive.
We’re applying AI techniques to this question, and, so far, we have preliminary results that are very encouraging.
How does it feel to be engaged in work that touches on so many areas of science — and society?
Lu: It’s very exciting to see my methods being potentially useful. I’ll give you another example. We just published a new paper in Nature Computational Science that involves modeling the geometry of the human heart. Currently, doctors have to do this separately for each patient, and it takes a long time. But what if we could do it much faster using AI? How many patients would benefit?
It’s just very exciting.