“Eventually, my dream would be to simulate a virtual cell.”—Demis Hassabis
The aspiration to build the virtual cell is considered a moonshot for digital biology. Recently, 42 leading life scientists published a paper in Cell on why this is so vital, and how it may ultimately be accomplished. This conversation is with two of the authors: Charlotte Bunne, now at EPFL, and Steve Quake, a Professor at Stanford University who heads up science at the Chan Zuckerberg Initiative.
The audio (above) is available on iTunes and Spotify. The full video is linked here, at the top, and can also be found on YouTube.
TRANSCRIPT WITH LINKS TO AUDIO
Eric Topol (00:06):
Hello, it's Eric Topol with Ground Truths, and we've got a really hot topic today, the virtual cell, and what I think is an extraordinarily important futuristic paper that recently appeared in the journal Cell. The first author is Charlotte Bunne from EPFL, previously at Stanford's Computer Science, and Steve Quake is a young friend of mine for many years who heads up science at the Chan Zuckerberg Initiative (CZI) as well as being a professor at Stanford. So welcome, Charlotte and Steve.
Steve Quake (00:42):
Thanks, Eric. It's great to be here.
Charlotte Bunne:
Thanks for having me.
Eric Topol (00:45):
Yeah. So you wrote this article, Charlotte as the first author and Steve as one of the senior authors, which appeared in Cell in December, and it just grabbed me: “How to build the virtual cell with artificial intelligence: Priorities and opportunities.” It's the holy grail of biology. We're in this era of digital biology, and as you point out in the paper, it's a convergence of what's happening in AI, which is just moving at an extraordinary velocity, and what's happening in biology. So maybe we can start off with this: you had some 42 authors, who I assume congregated for a conference or something. How did you get 42 people to agree to the words in this paper?
Steve Quake (01:33):
We did. We had a meeting at CZI to bring community members together from many different parts of the community, from computer science to bioinformatics, AI experts, biologists who don't trust any of this. We wanted to have some real contrarians in the mix as well and have them have a conversation together about is there an opportunity here? What's the shape of it? What's realistic to expect? And that was sort of the genesis of the article.
Eric Topol (02:02):
And Charlotte, how did you get to be drafting the paper?
Charlotte Bunne (02:09):
So I did my postdoc with Aviv Regev at Genentech and Jure Leskovec at CZI and Jure was part of the residency program of CZI. And so, this is how we got involved and you had also prior work with Steve on the universal cell embedding. So this is how everything got started.
Eric Topol (02:29):
And it's actually amazing because it's a who's who of people who work in life science, AI and digital biology and omics. I mean it's pretty darn impressive. So I thought I'd start off with a quote in the article because it kind of tells a story of where this could go. So the quote was in the paper, “AIVC (artificial intelligence virtual cell) has the potential to revolutionize the scientific process, leading to future breakthroughs in biomedical research, personalized medicine, drug discovery, cell engineering, and programmable biology.” That's a pretty big statement. So maybe we can just kind of toss that around a bit and maybe give it a little more thought and color as to what you were positing there.
Steve Quake (03:19):
Yeah, Charlotte, you want me to take the first shot at that? Okay. So Eric, it is a bold claim and we have a really bold ambition here. We view that over the course of a decade, AI is going to provide the ability to make a transformative computational tool for biology. Right now, cell biology is 90% experimental and 10% computational, roughly speaking. And you've got to do just all kinds of tedious, expensive, challenging lab work to get to the answer. And I don't think AI is going to replace that, but it can invert the ratio. So within 10 years I think we can get to biology being 90% computational and 10% experimental. And the goal of the virtual cell is to build a tool that'll do that.
Eric Topol (04:09):
And I think a lot of people may not understand why it is considered the holy grail, because the cell is the fundamental unit of life and it's incredibly complex. It's not just all the things happening in the cell with atoms and molecules and organelles and everything inside, but then there's also the interactions of the cell with other cells in the outside tissue and world. So I mean it's really quite an extraordinary challenge that you've taken on here. And I guess there's some debate, do we have the right foundation? We're going to get into foundation models in a second. A good friend of mine, and part of this whole process that you got together, Eran Segal from Israel, he said, “We're at this tipping point…All the stars are aligned, and we have all the different components: the data, the compute, the modeling.” And in the paper you describe how we have, over the last couple of decades, built up so many different data sets that are rich, that are global initiatives. But then there are also questions. Do we really have the data? I think Bo Wang especially asked about that. Maybe Charlotte, what are your thoughts about data deficiency? There's a lot of data, but do we really have what we need before we bring it all together for this kind of single model that will get us to the virtual cell?
Charlotte Bunne (05:41):
So I think, I mean, one core idea of building this AIVC is that we basically can leverage all the experimental data that is collected overall. So this also goes back to the point Steve just made, meaning that we basically can integrate data across many different studies, because the AI algorithms or the architectures that power such an AIVC are able to integrate data sets on many different scales. So we are going a bit away from this dogma of designing one algorithm for one dataset, toward this idea of having an architecture that can take in multiple datasets on multiple scales. So this will help us a bit in being somewhat efficient with the type of experiments that we need to make and the type of experiments we need to conduct. And again, as Steve just said, ultimately we can very much steer which data sets we need to collect.
Charlotte Bunne (06:34):
Currently, of course, we don't have all the data that is sufficient. In particular, I think most of the tissues we have are healthy tissues. We don't have all the disease phenotypes that we would like to measure; having patient data is always a very tricky case. We have mostly non-interventional data, meaning we have very limited understanding of the effect of different perturbations, perturbations that happen on many different scales in many different environments. So we need to collect a lot here. I think the overall journey we are on is that we take the data that we have, we make clever decisions on the data that we will collect in the future, and we have this self-improving entity that is aware of what it doesn't know. So we need to be able to understand how well I can predict something in a particular regime; if I cannot, then we should focus our data collection effort there. So I think that's not the present state, but this will basically also guide the future collection.
Eric Topol (07:41):
Speaking of data, one of the things I think is fascinating is that we saw how AlphaFold2 really revolutionized protein prediction. But remember, that was based on an extraordinary resource that had been built, the Protein Data Bank, which enabled it. And for the virtual cell there's no such thing as a protein data bank. It's so much more, as you emphasize Charlotte, it's so much more dynamic, and these perturbations are just all across the board, as you emphasize. Now the Human Cell Atlas, which currently has some tens of millions of cells but is going into a billion cells, we learned that it used to be 200 cell types. Now I guess it's well over 5,000, and with the approximately 37 trillion cells in the average adult's body, it is a formidable map that's being made now. And I guess the idea that you're advancing, and this goes back to a statement you made earlier, Steve, is that everything we did in science was hypothesis driven. But if we could get a computational model of the virtual cell, then we can have AI exploration of the whole field. Is that really the crux of this?
Steve Quake (09:06):
Yes. A couple of thoughts on that. Theo Karaletsos, our lead AI person at CZI, says machine learning is the formalism through which we understand high dimensional data, and I think that's a very deep statement. And biological systems are intrinsically very high dimensional. You've got 20,000 genes in the human genome, and in these cell atlases you're measuring all of them at the same time in each single cell. And there's a lot of structure in the relationships of their gene expression there that is just not evident to the human eye. For example, CELL by GENE, our database that aggregates all of the single cell transcriptomic data, is now over a hundred million cells. And as you mentioned, we're seeing ways to increase that by an order of magnitude in the near future. The project that Jure Leskovec and I worked on together, which Charlotte referenced earlier, was a first attempt to build a foundational model on that data to discover some of the correlations and structure that was there.
Steve Quake (10:14):
And so, with a subset, I think it was 20 or 30 million cells, we built a large language model and began asking it, what do you understand about the structure of this data? And it kind of discovered lineage relationships without us teaching it. We trained on a matrix of numbers, no biological information there, and it learned a lot about the relationships between cell type and lineage. And that emerged from that high dimensional structure, which was super pleasing to us and really, for me personally, gave me the confidence to say this stuff is going to work out. There is a future for the virtual cell. It's not some made up thing. There is real substance there, and this is worth investing an enormous amount of CZI's resources in going forward and trying to rally the community around as a project.
Eric Topol (11:04):
Well yeah, the premise here is that there is a language of life, and you just made a good case that there is, if you can predict, if you can query, if you can generate like that. It is reminiscent of the famous Go match with Lee Sedol, the world champion, and how the machine came up with a move (Move 37) many years ago that no human would have anticipated, and I think that's what you're getting at. And the ability for inference and reasoning now adds to this. So Charlotte, there are two terms in here that are unfamiliar to many of the listeners or viewers of this podcast, universal representations (URs) and virtual instruments (VIs), which you make a pretty significant part of how you are going about this virtual cell model. So could you describe those, and also the embeddings as part of the universal representation, because I think embeddings, or these meaningful relationships, are key to what Steve was just talking about.
Charlotte Bunne (12:25):
Yes. So in order to leverage very different modalities, modalities that take measurements across different scales, the idea is that we have large models, may they be transformer models, that might be very different. If I have imaging data, I have a vision transformer; if I have text data, I have large language models; those designed for DNA, of course, have a very wide context, and so on and so forth. But the idea is that we have models that are connected through the scales of biology, because those scales we know. We know which components are involved in measurements that are happening upstream. So we have this interconnection, a very large model that will be trained on many different data, and we have internal model representations that capture everything they've seen. And so, this is what we call those universal representations (URs) that will exist across the scales of biology.
Charlotte Bunne (13:22):
And what is great about AI, and I think this is a bit like a history of AI in short: in the last years, the ability to predict, then the ability to generate. We can generate new hypotheses, we can generate modalities that we are missing, we can potentially generate certain cellular states or molecular states that have a certain property. But I think what's really coming is this ability to reason. So we see in those very large language models the ability to reason about a hypothesis and how we can test it. So this is what those instruments ultimately need to do. We need to be able to simulate the change of a perturbation on a cellular phenotype. So on the internal representation, the universal representation of a cell state, we need to simulate the effect a mutation has downstream and how this would propagate in our representations upstream. And we need to build many different types of virtual instruments that allow us to design and build all those capabilities the AI virtual cell ultimately needs to possess, capabilities that will then allow us to reason, to generate hypotheses, to predict the next experiment to conduct, to predict the outcome of a perturbation experiment, to design cellular states and molecular states in silico, things like that. And this is why we make the separation between the internal representation and those instruments that operate on those representations.
Eric Topol (14:47):
Yeah, that's what I really liked: you basically described the architecture, how you're going to do this, by putting these URs into the VIs, having a decoder and a manipulator, and you've basically got the idea if you can bring all these different integrations about, which of course is pending. Now there are obviously many naysayers here who say this is impossible. One of them is this guy, Philip Ball. I don't know if you read his book, How Life Works. Now he's a science journalist and he's a prolific writer. He says, “Comparing life to a machine, a robot, a computer, sells it short. Life is a cascade of processes, each with a distinct integrity and autonomy, the logic of which has no parallel outside the living world.” Is he right? There's no way to model this. It's silly, it's too complex.
Steve Quake (15:50):
We don't know, alright. And it's great that there's naysayers. If everyone agreed this was doable, would it be worth doing? I mean the whole point is to take risks and get out and do something really challenging in the frontier where you don't know the answer. If we knew that it was doable, I wouldn't be interested in doing it. So I personally am happy that there's not a consensus.
Eric Topol (16:16):
Well, I mean, to capture people's imagination here: if you're successful and you marshal a global effort, and I don't know who's going to pay for it because it's a lot of work going forward, but if you can do it. The question is, today we talk about, oh, let's make an organoid so we can figure out how to treat this person's cancer or understand this person's rare disease or whatever. And instead of having to wait weeks for this culture and all the expense and whatnot, you could just do it in a computer, in silico, and you have this virtual twin of a person's cells and their tissue and whatnot. So the opportunity here, and I don't know if people get this, is just extraordinary and quick and cheap if you can get there. And it's such a bold initiative and idea. Who will pay for this, do you think?
Steve Quake (17:08):
Well, CZI is putting an enormous amount of resources into it, and it's a major project for us. We have been laying the groundwork for it. We recently put together what I think is, if not the largest, one of the largest GPU supercomputer clusters for nonprofit basic science research, which came online at the end of last year. And in fact, in December we put out an RFA for the scientific community to propose using it to build models. And so we're sharing that resource with the scientific community. As I think you appreciate, one of the real challenges in the field has been access to compute resources; industry has it, academia at a much lower level. We are able to be somewhere in between, not quite at the level of a private tech company, but at a level beyond what most universities are able to do, and we're trying to use that to drive the field forward. We're also planning on launching RFAs this year to help drive this project forward and funding people globally on that. And we are building a substantial internal effort within CZI to help drive this project forward.
Eric Topol (18:17):
I think it has the looks of the Human Genome Project, which at the time, as you know, when it was originally launched, people thought, oh, this is impossible. And then look what happened. It got done. And now sequencing a genome is just a commodity, relatively very inexpensive compared to what it used to be.
Steve Quake (18:36):
I think a lot about those parallels. And I will say one thing: Philip Ball, I will concede him the point, cells are very complicated. The genome project, I mean, the sort of genius there was to turn it from a biology problem into a chemistry problem: there is a test tube with a chemical, and you work out the structure of that chemical. And if you can do that, the problem is solved. I think what it means to have the virtual cell is much more complex and ambiguous, in terms of defining what it's going to do and when you're done. And so, we have our work cut out for us to try to do that. And that's why, a little bit, I established our North Star at CZI for the next decade as understanding the mysteries of the cell, and that word mystery is very important to me. I think the molecules, as you pointed out earlier, are understood: genome sequenced, protein structures solved or predicted; we know a lot about the molecules. Those are, if not solved problems, pretty close to being solved. And the real mystery is how do they work together to create life in the cell? And that's what we're trying to answer with this virtual cell project.
Eric Topol (19:43):
Yeah, I think another thing that is happening concurrently, adding to the likelihood that you'll be successful, is we've never seen foundation models coming out in life science as they have in recent weeks and months. Never. I mean, I have a paper in Science coming out tomorrow summarizing the progress, and not just on RNA, DNA, ligands; I mean the whole idea, AlphaFold3, but now Boltz and so many others. It's just amazing how fast this torrent of new foundation models is coming. So Charlotte, what do you think accounts for this? This is unprecedented in life science, to see foundation models coming out at this clip, on evolution, on, I mean you name it, the design of every different molecule of life, with cells of course included in that. What do you think is going on here?
Charlotte Bunne (20:47):
So on the one hand, of course, we benefit from and inherit all the tremendous efforts that have been made in the last decades in assembling those data sets, which are very, very standardized. CELLxGENE is very AI friendly, as you can say; it is a platform that is easy to feed into algorithms. But at the same time we actually also see really new building mechanisms, design principles of AI algorithms themselves. So I think we have understood that in order to really make progress and build those systems that work well, we need to build AI tools that are designed for biological data. To give you an easy example, if I use a large language model built on text, it's not going to work out of the box for DNA, because we have different reading directions, different context lengths, and many, many more differences.
Charlotte Bunne (21:40):
And if I look at standard computer vision, where we can say AI really excels, and I'm applying standard computer vision, vision transformers, on multiplex images, they're not going to work, because normal computer vision architectures always expect the same three inputs, RGB, right? In multiplex images, I'm measuring up to 150 proteins potentially in a single experiment, but every study will measure different proteins. So I deal with many different, larger scales, and the attention mechanisms that we have in usual computer vision transformers are not going to work anymore; they're not going to scale. And at the same time, I need to be completely flexible with whatever input combination of channels I'm going to face in this experiment. So this is what we did, for example, in our very first work: inheriting the design principles that we laid out in the AI virtual cell paper and then coming up with new AI architectures that deal with these very special requirements that biological data have.
Charlotte Bunne (22:46):
So we now have a lot of computer scientists who work very, very closely with and have a very good understanding of biologists, and biologists who are getting much more into computer science. So people who are fluent in both languages, who are able to build models that are adapted and designed for biological data. And we don't just take computer vision architectures that work well on street scenes and try to apply them to biological data. So it's just a very different way of thinking about it, constructing specialized architectures, besides, of course, the tremendous data efforts that have happened in the past.
Eric Topol (23:24):
Yeah, and we're not even talking about just sequence, because we've also got imaging, which has gone through a revolution, being able to image subcellularly without having to use any types of stains that would disrupt cells. That's another part of the deep learning era that came along. One thing I thought was fascinating in the paper in Cell, you wrote, “For instance, the Short Read Archive of biological sequence data holds over 14 petabytes of information, which is 1,000 times larger than the dataset used to train ChatGPT.” I mean, that's a lot of tokens, that's a lot of stuff, compute resources. It's almost like you're going to need a DeepSeek type of way to get at this, not that DeepSeek, with its claim to be so much more economical, but there's a data challenge here in terms of working with that massive amount, which is different from human language, that is, our language. Wouldn't you say?
Steve Quake (24:35):
So Eric, that brings to mind one of my favorite quotes from Sydney Brenner, who was such a wit. In 2000, at the sort of early first flush of success in genomics, he said, biology is drowning in a sea of data and starving for knowledge. A very deep statement, right? And that's a little bit what the motivation was for putting the Short Read Archive statistic into the paper there. And again, for me, part of the value of this endeavor of creating a virtual cell is that it's a tool to help us translate data into knowledge.
Eric Topol (25:14):
Yeah, well, there are two, I think, phenomenal figures in your Cell paper: the first one, which gets across the capabilities of the virtual cell, and the second, which compares the virtual cell to the real or physical cell. And we'll link those with this in the transcript. The other thing we'll link is a nice Atlantic article, “A Virtual Cell Is a ‘Holy Grail’ of Science. It's Getting Closer.” It might not be quite as close as next week or next year, but it's getting close, and that's good for people who are not well grounded in this, because it's much more taken out of the technical realm. This is really exciting, I mean, what you're onto here. And what's interesting, Steve, since I've known you for so many years: earlier in your career you really worked on omics, that is DNA and RNA, and in recent times you've made this switch to cells. Is that just because you're trying to anticipate the field, or tell us a little bit about your migration.
Steve Quake (26:23):
Yeah, so a big part of my career has been trying to develop new measurement technologies that provide insight into biology. And decades ago that was understanding molecules. Now it's understanding more complex biological things like cells, and it was a natural progression. I mean, we built the sequencers, sequenced the genomes, done. And it was clear that people were just going to do that at scale and create lots of data; hopefully knowledge would come out of that. But for me as an academic, I never thought I'd be in the position I'm in now, let's put it that way. I just wanted to keep running a small research group. So I realized I would have to get out of the genome thing and find the next frontier, and it became this intersection of microfluidics and genomics, which, as you know, I spent a lot of time on, developing microfluidic tools to analyze cells and do single cell biology to understand their heterogeneity. And that, through a winding path, led me to all these cell atlases and to where we are now.
Eric Topol (27:26):
Well, we're fortunate for that and also with your work with CZI to help propel that forward and I think it sounds like we're going to need a lot of help to get this thing done. Now Charlotte, as a computer scientist now at EPFL, what are you going to do to keep working on this and what's your career advice for people in computer science who have an interest in digital biology?
Charlotte Bunne (27:51):
So I work in particular on the prospect of using this to build diagnostic tools and to make diagnostics in the clinic easier, because ultimately we have limited capabilities in the hospital to run deep omics. But the idea of being able to map a cheaper and lighter modality, some diagnostic test, into something much richer, because a model has seen all those different data and can contextualize it, is very interesting. We've seen all those pathology foundation models. If I can always run an H&E but then decide when to run deeper diagnostics to get a better or more accurate prediction, that is very powerful, and it ultimately reduces the costs while improving the precision that we have in hospitals. So my faculty position right now is co-located between the School of Life Sciences and the School of Computer Science, so I have a dual affiliation, and I'm affiliated with the hospitals to actually make this possible. And as career advice: I think don't be shy, and don't just stick to your discipline.
Charlotte Bunne (28:56):
I have a bachelor's in biology, but I never did only biology. I have a PhD in computer science, which you would think a bachelor's in biology doesn't necessarily qualify you for. So I think this interdisciplinarity also requires you to be very fluent, very comfortable in reading many different styles of papers and publications, because a publication in a computer science venue will be very, very different from the way we write in biology. So don't stick to your study program; just be free in selecting whatever course gets you closer to the knowledge you need in order to do the research or whatever task you are building and working on.
Eric Topol (29:39):
Well, Charlotte, the way you're set up there, with this coalescence of life science and computer science, is so ideal and so unusual here in the US, so that's fantastic. That's what we need, and that's really the underpinning of how you're going to get to the virtual cell: getting these two communities together. And Steve, likewise, you were an engineer and somehow became one of the pioneers of digital biology, way back before it had that term, this interdisciplinary, transdisciplinary work. We need so much of that in order for you all to be successful, right?
Steve Quake (30:20):
Absolutely. I mean, there's so much great discovery to be done on the boundary between fields. I trained as a physicist and kind of made my career on this boundary between physics and biology and technology development, and it's just sort of been a gift that keeps on giving. You've got a new way to measure something, you discover something new scientifically, and it just all suggests new things to measure. It's very self-reinforcing.
Eric Topol (30:50):
Now, a couple of people whom you know well have made some pretty big statements about this whole era of digital biology, and I think the virtual cell is perhaps the biggest initiative of all the ongoing digital biology efforts. Jensen Huang wrote, “for the first time in human history, biology has the opportunity to be engineering, not science.” And Demis Hassabis wrote or said, ‘we're seeing engineering science, you have to build the artifact of interest first, and then once you have it, you can use the scientific method to reduce it down and understand its components.’ Well, here there's a lot to do to understand its components. Right now AI drug discovery is in high gear and there are umpteen numbers of companies working on it, but it doesn't account for the cell. I mean, it's basically protein-protein and protein-ligand interactions. What if we had drug discovery that was cell based? Could you comment on that? Because that doesn't even exist right now.
Steve Quake (32:02):
Yeah, I mean I can say something first, Charlotte, if you've got thoughts, I'm curious to hear them. So I do think AI approaches are going to be very useful designing molecules. And so, from the perspective of designing new therapeutics, whether they're small molecules or antibodies, yeah, I mean there's a ton of investment in that area that is a near term fruit, perfect thing for venture people to invest in and there's opportunity there. There's been enough proof of principle. However, I do agree with you that if you want to really understand what happens when you drug a target, you're going to want to have some model of the cell and maybe not just the cell, but all the different cell types of the body to understand where toxicity will come from if you have on-target toxicity and whether you get efficacy on the thing you're trying to do.
Steve Quake (32:55):
And so, we really hope that people will use the virtual cell models we're going to build as part of the drug discovery and development process. I agree with you, it's a little bit of a blind spot, and we think if we make something useful, people will be using it. The other thing I'll say on that point is I'm very enthusiastic about the future of cellular therapies, and one of our big bets at CZI has been starting the New York Biohub, which is aimed at being very ambitious about establishing the engineering and scientific foundations of how to engineer completely, radically more powerful cellular therapies. And the virtual cell is going to help them do that, right? It's going to be essential for them to achieve that mission.
Eric Topol (33:39):
I think you're pointing out one of the most important things going on in medicine today: how we didn't anticipate that live cell therapy, engineered cells, ideally off the shelf or in vivo, not just having to take them out and work on them outside the body, is an ongoing revolution, and it's not just in cancer, it's in autoimmune diseases and many others. So it's part of the virtual cell need. We need this. One of the things that's a misnomer, and I want you both to comment on it: we keep talking about single cell, single cell. And there's a paper on spatial multi-omics this week, five different single cell scales all integrated. It's great, but we don't get to single cell. We're basically looking at 50 cells, 100 cells. We're not doing single cell because we're not going deep enough. Is that just a matter of time? And of course, the more we get down to the single cell or a few cells, the more insights we're going to get. Would you comment on that? Because all this literature on single cell comes out every day, but we're not really there yet.
Steve Quake (34:53):
Charlotte, do you want to take a first pass at that and then I can say something?
Charlotte Bunne (34:56):
Yes. So it depends. I think if we look at certain spatial proteomics methods, we still have subcellular resolution. Of course, we always measure many different cells, but we are able to get down to a resolution where we can look at the colocalization of certain proteins. This also goes back to the point made just before about having a very good environment to study drugs. If I want to build a new drug, if I want to build a new protein, the idea of building this multiscale model allows us to actually simulate binding changes, because we simulate the effect of a drug. Ultimately, the readouts we have are subcellular. But of course, in spatial biology we often have methods that are rather coarse; they have a spot that averages over some cells, like hundreds of cells or a few cells.
Charlotte Bunne (35:50):
But I think we also have more and more technologies that are zooming in, that are subcellular, where we can actually tag or use probe-based methods that allow us to zoom in. There's microscopy of individual cells to really capture them in 3D. These are of course not very high throughput yet, but they give us an idea of the morphology, and how morphology ultimately determines certain cellular properties or cellular phenotypes. So I think there's lots of progress on the experimental side as well, and that will ultimately feed back into the AI virtual cell, since those models will be fed by those data. Similarly, looking at dynamics, looking at live imaging of individual cells and their morphological changes, this is ultimately data that we'll need to get a better understanding of disease mechanisms, cellular phenotypes, functions, and perturbation responses.
Eric Topol (36:47):
Right. Yes, Steve, can you comment on that, and on the amazing progress we have made with spatiotemporal resolution and spatial omics over these years, but that we still could go deeper in terms of getting to individual cells?
Steve Quake (37:06):
So, what can we do with a single cell? I'd say we are very mature in our ability to amplify and sequence the genome of a single cell, and to amplify and sequence the transcriptome of a single cell. You can ask, is one cell enough to make a biological conclusion? And maybe what you're referring to is that people want to see replicates, and so you can ask how many cells you need to see to have confidence in any given biological conclusion, which is a reasonable thing. It's a statistical question in good science. I've been very impressed with what the mass spec people have been doing recently. I think they've finally cracked the ability to look at proteins from single cells, and they can look at a couple thousand proteins. That was, I think, one of those Nature Methods method of the year things at the end of last year, along with deep visual proteomics.
Eric Topol (37:59):
Deep visual proteomics, yes.
Steve Quake (38:00):
Yeah, they are over the hump with single-cell measurements. Part of what's missing right now, I think, is the ability to reliably do all of that on the same cell. This is what Charlotte was referring to: being able to do multimodal measurements on single cells. That's kind of in its infancy; there are a few examples, but there's a lot more work to be done on it. And these measurements are all destructive right now, so you're losing the ability to look at how the cells evolve over time. You've got to say, at this time point I'm going to dissect this thing and look at its state, and I don't get to see what happens further down the road. So that's another future measurement challenge, I think, to be addressed.
Eric Topol (38:42):
And I'm just trying to identify some of the multitude of challenges in this extraordinarily bold initiative, because there's no shortage of them, and that's a good thing about it. It has given people lots of work to do to overcome some of these challenges. Now before we wrap up, besides the fact that you point out that all the work has to be validated in real experiments, not just live in a virtual AI world, you also comment on the safety and ethics of this work, assuming you're going to gradually get there and be successful. Could either or both of you comment on that? Because it's very thoughtful that you're already thinking about it.
Steve Quake (41:10):
As scientists and members of the larger community, we want to be careful and ensure that we're interacting with the people who set policy in a way that ensures these tools are used to advance the cause of science, not to do things that are detrimental to human health, and in a way that respects patient privacy. And so, the ethics around how you use all this with respect to individuals is going to be important to be thoughtful about from the beginning. I also think there's an ethical question around what it means to be publishing papers: you don't want people forging papers using data from the virtual cell without being clear about where it came from, pretending it was a real experiment. So there are issues around those sorts of ethics as well that need to be considered.
Eric Topol (42:07):
And of those 40-some authors around the world, do you have the sense that you'll all work together to achieve this goal? Is there a kind of global bond here that will sustain the collaboration?
Steve Quake (42:23):
I think this effort is going to go way beyond those 40 authors. It's going to include a much larger set of people and I'm really excited to see that evolve with time.
Eric Topol (42:31):
Yeah, no, it's really quite extraordinary how you kicked this thing off, and the paper is the blueprint for something we all anticipate could change a lot of science and medicine. I mean, we saw, as you mentioned, Steve, how deep visual proteomics (DVP) saved lives. It's what I wrote about as spatial medicine, no longer just spatial biology. And so, for the way this can change the future of medicine, I think a lot of people just have to have a little imagination: once we get there with this AIVC, there's a lot in store that's really quite exciting. Well, I think this has been an invigorating review of the paper and some of the issues surrounding it. I couldn't be more enthusiastic for your success and ultimately where this could take us. Did I miss anything during the discussion that we should touch on before we wrap up?
Steve Quake (43:31):
Not from my perspective. It was a pleasure as always Eric, and a fun discussion.
Charlotte Bunne (43:38):
Thanks so much.
Eric Topol (43:39):
Well, thank you both and all the co-authors of this paper. We're going to be following this with great interest, and I think most people listening may not know what is in store for the future. Someday we will get there. One thing to point out right now is that the models we have today, the large language models based on transformer architectures, are going to continue to evolve. We're already seeing so much in inference and reasoning ability being exploited, not asking prompts for immediate answers but waiting days to get back a lot more work from a lot more computing resources. And we're going to get models in the future that fold this together. I think that's one of the things you've touched on in the paper: whatever we have today, in concert with what you've laid out, AI is just going to keep getting better.
Eric Topol (44:39):
These biology foundation models are going to get broader and more compelling in their use cases. So that's why I believe in this. I don't see this as a static situation right now. I just think you're anticipating the future, and we will have better models to integrate this massive amount of what some people would consider disparate data sources. So thank you both and all your colleagues for writing this paper. I don't know how you got the 42 authors to agree to it all, which is great, and it's just the beginning of something that's a new frontier. So thanks very much.
Steve Quake (45:19):
Thank you, Eric.
**********************************************
Thanks for listening, watching or reading Ground Truths. Your subscription is greatly appreciated.
If you found this podcast interesting please share it!
That makes the work involved in putting these together especially worthwhile.
All content on Ground Truths—newsletters, analyses, and podcasts—is free and open-access, with no ads.
Paid subscriptions are voluntary, and all proceeds from them go to support Scripps Research. They do allow for posting comments and questions, which I do my best to respond to. Many thanks to those who have contributed—they have greatly helped fund our summer internship programs for the past two years. And such support is becoming more vital in light of current changes in US biomedical research funding at NIH and other governmental agencies.
Thanks to my producer Jessica Nguyen and to Sinjun Balabanoff for audio and video support at Scripps Research.