Discover how an advanced AI-powered tool, PlantRNA-FM, is revolutionizing plant biology by uncovering hidden RNA codes, paving the way for resilient crops and groundbreaking genetic insights.
The pretraining dataset comprises transcriptomic sequences from 1,124 plant species, consisting of approximately 25.0M RNA sequences and 54.2B RNA bases. The green dots on the global mean temperature map represent the geographical distribution of these plant species across the world. Research: An interpretable RNA foundation model for exploring functional RNA motifs in plants
A research collaboration has launched a pioneering Artificial Intelligence (AI)-powered model that can understand the sequences and structure patterns that make up the genetic "language" of plants.
Structural Insights: PlantRNA-FM incorporates both RNA sequence and structural information, allowing it to predict and classify RNA secondary and tertiary motifs critical for translation and gene regulation.
PlantRNA-FM, a high-performance and interpretable RNA foundation model, is believed to be the first AI model of its kind. It has been developed by a collaboration between plant researchers at the John Innes Centre and computer scientists at the University of Exeter.
According to its creators, the model is a smart technological breakthrough that can drive discovery and innovation in plant science and potentially across the study of invertebrates and bacteria. The model outperforms existing RNA models in key metrics such as genic region annotation and translation efficiency prediction, achieving F1 scores as high as 0.974.
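For readers unfamiliar with the metric, an F1 score is the harmonic mean of precision and recall, so a value of 0.974 means the model is both rarely wrong when it flags something and rarely misses what it should flag. The sketch below is only an illustration of that formula on made-up binary labels; the function name and the toy data are assumptions, and the article does not say how the paper aggregated its scores.

```python
def f1_score(true_labels, predicted_labels, positive=1):
    """Binary F1: harmonic mean of precision and recall (illustrative only)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive calls, so F1 is zero by convention
    precision = tp / (tp + fp)  # share of predicted positives that are right
    recall = tp / (tp + fn)     # share of real positives that were found
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = "inside a genic region", 0 = "outside".
toy_truth = [1, 1, 1, 0, 0]
toy_prediction = [1, 1, 0, 1, 0]
toy_f1 = f1_score(toy_truth, toy_prediction)
```

In the toy data, two of three predicted positives are correct and two of three real positives are found, so precision and recall are both two thirds.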
RNA, like its better-known chemical relative DNA, is an important molecule throughout all organisms, responsible for carrying genetic information in its sequences and structures. In the genome, RNA architecture comprises combinations of building blocks called nucleotides, which are arranged in patterns in the same way that letters of the alphabet combine to make words and phrases in language.
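The alphabet analogy can be made concrete with a small sketch that splits an RNA sequence into short "words" of consecutive nucleotides. This is a hypothetical illustration: the function name, the choice of three-letter words, and the example sequence are assumptions, and the article does not describe how PlantRNA-FM actually breaks sequences into units.

```python
RNA_ALPHABET = set("ACGU")


def kmer_tokens(rna, k=3):
    """Split an RNA string into non-overlapping k-letter 'words'.

    Trailing nucleotides that do not fill a whole word are dropped.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    unknown = set(rna) - RNA_ALPHABET
    if unknown:
        raise ValueError(f"unexpected characters: {sorted(unknown)}")
    return [rna[i:i + k] for i in range(0, len(rna) - k + 1, k)]


# Hypothetical nine-letter sequence read as three 3-letter words.
example_words = kmer_tokens("AUGGCUUAA")
```

Here the four nucleotides A, C, G and U play the role of letters, and each short run of them plays the role of a word that a language-style model can learn from.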
Professor Yiliang Ding's group at the John Innes Centre studies RNA structure, one of the key languages in RNA molecules, where RNAs can fold into complex structures that regulate sophisticated biological functions such as plant growth and stress response.
To better understand the complex language of RNA in its functions, Professor Ding's group collaborated with Dr Ke Li's group at the University of Exeter.
Together, they developed PlantRNA-FM, a model trained on an enormous data set of 54.2 billion pieces of RNA information spanning sequences and structures that make up a genetic alphabet across 1,124 plant species.
When creating PlantRNA-FM, the researchers followed the methodology used to train AI models such as ChatGPT to understand human language. However, instead of focusing on text, the model deciphers RNA sequences and secondary and tertiary structure motifs critical for biological functions. The AI model was taught plant-based language by studying RNA information from plant species worldwide, giving it a comprehensive view of how RNA works across the plant kingdom.
Just as ChatGPT can understand and respond to human language, PlantRNA-FM has learned to understand the grammar and logic of RNA sequences and structures.
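One common way language-style models learn "grammar" without human labels is to hide part of the input and train the model to fill it back in. The sketch below shows only that data-preparation step on a made-up RNA string. It is an assumption for illustration, not the authors' pipeline: the article does not state the training objective, and the function name, the `[MASK]` symbol, and the 15% rate are all hypothetical.

```python
import random

MASK = "[MASK]"  # placeholder symbol; an assumption, not from the article


def mask_nucleotides(rna, mask_rate=0.15, seed=0):
    """Hide a random subset of nucleotides and record the hidden answers."""
    tokens = list(rna)
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # seeded so the example is repeatable
    n_masked = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    targets = {i: tokens[i] for i in positions}  # what a model must predict
    for i in positions:
        tokens[i] = MASK
    return tokens, targets


# Hypothetical 20-nucleotide sequence; three positions are hidden.
original = "AUGGCUUAACGGAUCCGUAA"
masked_tokens, hidden = mask_nucleotides(original)
```

A model trained this way is scored on how well it recovers the entries in `hidden` from the surrounding context, which is what pushes it to pick up recurring sequence patterns.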
The researchers have already used the model to make precise predictions about RNA functions and to identify specific functional RNA structural patterns across the transcriptomes. Notably, the model successfully identified translation-associated RNA motifs and determined how their positions within the genic regions influence translation efficiency. Their predictions have been validated by experiments using dual-luciferase reporter assays, confirming that RNA structures identified by PlantRNA-FM influence the efficiency of translating genetic information into protein.
"While RNA sequences may appear random to the human eye, our AI model has learned to decode the hidden patterns within them," says Dr Haopeng Yu, the postdoc researcher in Professor Yiliang Ding's group at the John Innes Centre.
This successful collaboration was also supported by scientists from Northeast Normal University and the Chinese Academy of Sciences in China, who contributed to this work.
Professor Ding said, "Our PlantRNA-FM is just the beginning. We are working closely with Dr. Li's group to develop more advanced AI approaches to understanding the hidden DNA and RNA languages in nature. By incorporating insights from both RNA sequences and their structural motifs, these models are poised to open new possibilities for understanding and potentially programming plants. This breakthrough opens new possibilities for understanding and potentially programming plants, which could have profound implications for crop improvement and the next generation of AI-based gene design. AI is increasingly instrumental in helping plant scientists tackle challenges, from feeding a global population to developing crops that can thrive in a changing climate."
An Interpretable RNA Foundation Model for Exploring Functional RNA Motifs in Plants appears in Nature Machine Intelligence.
Journal reference:
Yu, H., Yang, H., Sun, W., Yan, Z., Yang, X., Zhang, H., Ding, Y., & Li, K. (2024). An interpretable RNA foundation model for exploring functional RNA motifs in plants. Nature Machine Intelligence, 1-10. DOI: 10.1038/s42256-024-00946-z, https://www.nature.com/articles/s42256-024-00946-z