Scaling a foundational protein language model to 100 billion parameters

xTrimoPGLM, a protein language model scaled to 100 billion parameters, exhibits favorable scaling behavior and excels across a variety of protein-related tasks. This development advances protein understanding and design, and contributes to the evolving landscape of foundation models (comprehensive models designed to serve as a base for various specialized tasks) in protein science.

Fig. 1: Applying xTrimoPGLM to protein understanding and generation tasks.

References

Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023). This paper reports that scaling protein language models to billions of parameters enables direct prediction of high-resolution protein structures from their primary sequences.

Cheng, X. et al. Training compute-optimal protein language models. Preprint at bioRxiv https://doi.org/10.1101/2024.06.06.597716 (2024). This paper presents optimal training of protein language models by considering the effect of factors such as the pretrained dataset, pretrained objective and compute budget.

Hayes, T. et al. Simulating 500 million years of evolution with a language model. Science 387, 850–858 (2025). This paper introduces a multimodal generative model that integrates sequence, structure and text for programmable protein design.

Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature 630, 493–500 (2024). This paper presents accurate joint structure predictions of biomolecular complexes including proteins, nucleic acids, small molecules, ions and modified residues.

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). This paper presents a computational method that can accurately predict protein structures with atomic accuracy.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This is a summary of: Chen, B. et al. xTrimoPGLM: unified 100-billion-parameter pretrained transformer for deciphering the language of proteins. Nat. Methods https://doi.org/10.1038/s41592-025-02636-z (2025).

About this article

Cite this article

Scaling a foundational protein language model to 100 billion parameters. Nat Methods (2025). https://doi.org/10.1038/s41592-025-02637-y

Published: 03 April 2025

DOI: https://doi.org/10.1038/s41592-025-02637-y
