Life begins with a tiny orchestra of stem cells, setting the stage for the body’s grand symphony of development. As the performance unfolds, a dynamic interplay of factors takes center stage, orchestrating the tempo and rhythm of the cells’ differentiation into hundreds of specialized cell types that make up the body. For years, scientists have been working to unravel this intricate symphony, identifying the key factors driving cellular differentiation and uncovering clues that may allow them to turn back the clock on cells nearing their finale.
The capacity to become nearly any cell type makes pluripotent stem cells (PSCs) an attractive source for developing cell therapies for a variety of diseases, including leukemia, heart disease, and cystic fibrosis. However, these potent stem cells are restricted to a narrow developmental window and their collection requires the destruction of embryos, raising ethical concerns around their use.
In the late 1990s and early 2000s, researchers reprogrammed somatic cells either by transferring their nuclear contents into unfertilized eggs or by fusing the cells with embryonic stem cells (ESCs).1 Stem cell researcher Shinya Yamanaka, now at the University of California, San Francisco, wondered if introducing just the essential ingredients from an ESC, rather than fusing the entire cell, would have the same effect. In 2006, he identified a simple concoction of four transcription factors—dubbed the Yamanaka factors—that transformed adult skin cells into induced PSCs (iPSCs)—a finding that revolutionized stem cell research.2 Finally, scientists had a method for generating PSCs that bypassed the need for embryos and ESCs. In the years since, scientists have developed advanced human iPSC technology to generate patient-specific cell lines for personalized medicine, disease modeling, and drug testing.3
Although the technique reliably produces iPSCs, cellular reprogramming with the four Yamanaka factors (Oct4, Sox2, Klf4, and cMyc, collectively known as OSKM) remains inefficient and incomplete, and it poses significant safety concerns, including the risk of introducing cancer-causing mutations. Now, researchers are shaking up the reprogramming cocktail, focusing their efforts on what has long been considered the most indispensable of the transcription factors: Oct4.
A Rocky Road for the Oct4 Transcription Factor
In the late 1980s, under the tutelage of developmental biologist Peter Gruss at the Max Planck Institute for Biophysical Chemistry, Hans Schöler was exploring gene regulation during early mouse embryogenesis. While looking at promoter and enhancer regions in developmentally regulated genes, he noticed a common string of letters—ATGCAAAT—shared by many different genes. To determine which factors were binding to this recurring DNA fingerprint, called an octamer motif, Schöler collected nuclear extracts from different developmental stages in the mouse embryo, incubated them with the octamer motif sequence, and ran an electrophoretic mobility shift assay. He identified a slew of factors, called the octamer-binding proteins (Oct), which interacted with this snippet of DNA.4 “I just numbered them from one to 10 according to their mobility in the band shift,” said Schöler. He was most excited about the factor occupying the fourth band, which he called Oct4, as it was expressed in both germ cells and ESCs.
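As a computational aside, the idea of a short sequence recurring across many regulatory regions can be illustrated with a simple exact-match scan. The sketch below is only an illustration and not the band-shift method Schöler used: the motif string comes from the text, while the function names and the toy sequence are hypothetical.

```python
OCTAMER = "ATGCAAAT"  # octamer motif quoted in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_motif(seq: str, motif: str = OCTAMER) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact match of the motif."""
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequence for illustration only, not a real promoter.
toy_promoter = "GGCATGCAAATCCTTATTTGCATGG"
hits = find_motif(toy_promoter)
print(hits)
```

Real binding sites tolerate mismatches, so an exact-match scan like this would undercount them; position weight matrices are the usual tool for that.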
In the late 1980s, Hans Schöler identified a slew of new octamer-binding proteins, including Oct4, which he and others went on to show plays a key role in pluripotency.
Max Planck Institute for Molecular Biomedicine
Schöler, who was interested in exploring the factors orchestrating early mammalian development, went on, along with others in the field, to demonstrate how critical Oct4 is for early development. For example, Schöler and his colleagues found that Oct4-deficient mouse embryos developed to the blastocyst stage, but the inner cell mass, a group of pluripotent ESCs that eventually give rise to the fetus, never formed.5 His lab and others also went on to show that the transcription factor plays a key role in halting differentiation, maintaining this state of pluripotency.6
These studies helped make Oct4's role in development clear: “Without Oct4, there is no pluripotency,” summarized Schöler.
The importance of Oct4 only increased after Yamanaka demonstrated its essential role in the generation of iPSCs. Although reprogramming mixologists have been shaking up the cocktail to address its limitations, Oct4 was widely considered the most important and indispensable reprogramming factor.7,8 “We always thought Oct4 is the one that is kind of the captain, and the others are supporting Oct4,” said Schöler. But then, Schöler and his team made a surprising discovery that would cause them to rethink their approach to reprogramming.
Back in 2019, Sergiy Velychko, now a stem cell biologist at Harvard Medical School, was a graduate student on Schöler’s team at the Max Planck Institute for Molecular Biomedicine, exploring Oct4’s role in reprogramming mouse embryonic fibroblasts into iPSCs. As a negative control in an experiment, he introduced a reprogramming cassette that lacked Oct4, but he was shocked to discover that these cells successfully reprogrammed into iPSCs.9 No one had ever generated iPSCs without overexpressing Oct4.
Sergiy Velychko and his colleagues generated the first-ever Oct4-free iPSCs, which could be used to produce all-iPSC mice.
Sergiy Velychko
When Velychko and his team tried to find an explanation for these dogma-defying iPSCs, they landed on a key methodological difference between their approach and previous efforts to remove Oct4. To introduce the transcription factors to somatic cells, Velychko and his team used an inducible polycistronic lentiviral vector, whereas other researchers had used monocistronic retroviruses. Velychko was familiar with research showing that pluripotent cells could silence retroviral transgenes, so he ran follow-up experiments to explore whether this could explain the disparate findings.10 Indeed, he found that overexpression of Sox2 and cMyc shuts down the retrovirus in just two days, stalling the reprogramming process, while the presence of Oct4 accelerates the process, allowing for reprogramming to take place before retroviral silencing occurs. However, when they delivered an Oct4-free cocktail using lentiviral or episomal vectors, the reprogramming process proceeded unimpeded, giving rise to the first-ever SKM iPSCs.
“Things take longer, but the iPS cells that you obtain are of much higher quality,” said Schöler.
Not only did the Oct4-free cassette generate less off-target gene activation and fewer epigenetic aberrations in iPSCs than cells created with OSKM, but the resulting cells also outperformed the OSKM iPSCs in the most stringent test of pluripotency: the tetraploid complementation assay. The technique involves fusing two-cell embryos to create a tetraploid embryo, which contains double the number of chromosomes and therefore cannot develop into a fetus but does develop into a functional placenta. However, if normal diploid ESCs or iPSCs are introduced to the tetraploid embryo, then the fetus will be formed entirely from these introduced iPSCs and develop normally, generating so-called all-iPSC animals. “This is how you prove ultimately that your cells are truly pluripotent,” said Velychko. When they introduced their SKM iPSCs into tetraploid embryos, they successfully generated 20 times more living mouse pups compared to iPSCs generated using the conventional OSKM cocktail.
Oct4 and Sox2: A Pioneering Reprogramming Power Couple
Although Velychko and his colleagues demonstrated that Oct4 kept iPSCs from reaching their full pluripotent potential, they didn’t know why. However, Velychko noted that transcription factors do not work in isolation. “They work in cooperation and the level of each factor matters,” he added.
During the earliest stages of embryogenesis, the expression patterns of specific transcription factors ebb and flow to perfectly time genome engagement, gene expression programs, and developmental trajectories. As elite pioneer transcription factors, Oct4, Sox2, and Klf4—but not c-Myc—can engage wound up, transcriptionally silent chromatin as well as open regions.11 “They have this power of binding to closed chromatin and then opening up chromatin for the other factors to bind in,” said Abdenour Soufi, a stem cell biologist at the University of Edinburgh. This is part of what makes these factors trailblazers in reprogramming somatic cells, as they facilitate an epigenetic reset and reactivate dormant pluripotency networks. In development, transcription factors cooperate to delicately unravel the genome and target specific DNA sequences at specific timepoints. However, it's difficult to achieve this level of precision by overexpressing the four factors all at once, as is the case with somatic reprogramming.
“Oct4 is so powerful, it's like a bulldozer. It's opening chromatin like crazy,” said Schöler, who hypothesized that overexpression of the transcription factor in somatic cells that don't normally express it wreaks havoc on the epigenome. “Oct4 is coming in with all its friends having a great party and opening everything, but it's doing a lot of nonsense if you let it go.” When left to its own devices, Schöler said that Oct4 will visit all the octamer binding motifs in the genome, turning on more than just the pluripotency network. For example, when overexpressed, as is the case with the OSKM cocktail, Oct4 can act as a monomer or a dimer and trigger proliferation and differentiation at the same time that it’s inducing early-stage pluripotency. “What we did is to ask, ‘how can we take Oct4 on a short leash?’” said Schöler.
During development, Sox2 and Oct4 form a dimer that binds to SoxOct DNA motifs. These sites control the pluripotency fate and effectively restart life through epigenetic reprogramming, a process that is essential for the establishment and maintenance of pluripotency. Specifically, the researchers wanted to restrict Oct4 binding to the canonical SoxOct motif, which induces naïve pluripotency—a state of pluripotency found in preimplantation embryos. “This is the perfect site for reprogramming,” said Schöler. However, the Oct4-Sox2 dimer also visits other SoxOct motifs, so they wanted to find a way to force Oct4 to bind specifically at the canonical motif. For this, they looked to other Oct4 dimers for some ideas.
Oct4 also links up with Sox17, another member of the Sox family, to bind to the compressed SoxOct motif, which is similar to the canonical SoxOct motif but contains one fewer DNA base pair between the Sox and Oct binding sites. Velychko and his team found that a structural element located on the surface of Sox17, specifically residue 61V, allowed the factor to interact more strongly with Oct4. Residue 61 is also located on the interface between Sox2 and Oct4, but only when they cooperate to bind to the canonical SoxOct motif. So, the researchers introduced a single swap, an A to a V, at position 61 in Sox2 to produce a chimeric Sox2-Sox17 transcription factor, which they called super-SOX.12,13 Swapping Sox2 for super-SOX in the OSKM cocktail enabled a much more stable Oct4-Sox2 dimerization that redistributed Oct4 binding towards the canonical SoxOct motif. This edit resulted in iPSCs with greater differentiation potential, enabling the generation of healthy all-iPSC mice, even in the presence of Oct4 in the cocktail.
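To make the "one letter" change concrete, a single-residue substitution can be sketched in a few lines. This is a minimal illustration only: the sequence below is a hypothetical placeholder rather than the real Sox2 sequence, the function name is invented, and the assumption that position 61 is counted 1-based from the start of the string is ours, since the article does not state the numbering convention.

```python
def substitute(protein: str, position: int, expected: str, replacement: str) -> str:
    """Swap one residue at a 1-based position after checking the original residue."""
    idx = position - 1
    if not 0 <= idx < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[idx] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {protein[idx]}"
        )
    return protein[:idx] + replacement + protein[idx + 1:]


# Hypothetical placeholder, NOT the real Sox2 sequence:
# filler glycines with an alanine (A) placed at position 61.
toy_sox2 = "G" * 60 + "A" + "G" * 20

# A-to-V swap at position 61, as described in the text.
toy_super_sox = substitute(toy_sox2, 61, "A", "V")
print(toy_super_sox[55:66])
```

Checking the expected residue before swapping guards against off-by-one errors in the numbering, which is a common pitfall when a position is defined relative to a protein domain rather than the full-length protein.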
Velychko was excited to find that they could generate more full-term mice in the tetraploid complementation assay when compared to the standard Yamanaka cocktail and comparable numbers relative to the Oct4-free cocktail. “This is the first modification of the Yamanaka factor that could not just improve the number of colonies you get, but the quality is improved. So, you get more healthy mice by just changing one letter,” he said.
Super-SOX also achieved something that the Oct4-free cocktail couldn't: enhanced human iPSC generation. Generating human iPSCs is significantly less efficient than reprogramming mouse cells, meaning that the derivation of patient-specific iPSCs is challenging. However, the human version of super-SOX enhanced the reprogramming of human fibroblast cells 50-fold.
“This was a huge improvement! Super-SOX didn’t just improve the process—in many cases it made it possible for the first time,” said Velychko. “While original Yamanaka cocktail failed completely, super-SOX successfully generated iPSCs for older patients, monkeys, and livestock. Without it, these cells simply couldn’t be reprogrammed at all.”
Illustration that depicts Oct4 as a yellow coil nearby super-Sox, depicted as a blue coil. Small, light blue rods jut out from both coils, with red and dark blue dots at the ends of the rods. A single yellow dot signifies the A to V swap at position 61 in Sox2.
By swapping an A to a V at position 61 in Sox2, Velychko and his team created a chimeric Sox2-Sox17 transcription factor, which they called super-SOX. This allowed for an enhanced dimerization between Oct4 (depicted in yellow) and super-SOX (in blue).
Sergiy Velychko; MacCarthy CM, et al. Cell Stem Cell. 2024.
Now, with super-SOX, the researchers can reprogram a somatic cell to an early pluripotent state and use it to create embryos that successfully develop into live mice. But Velychko and his team went beyond this laboratory model and found that their technology also works in human, pig, cow, and macaque cells. Velychko emphasized he isn’t suggesting scientists use a similar approach to create complete humans from iPSCs and noted, “We first need to validate the technology by creating iPSC-derived mammals beyond the mouse model, particularly non-human primates.” He added, “In the future, this technology will enable unlimited genetic editing of mammalian germline as well as growing patient-specific tissues and organs.”
In addition to improving organoid disease models, Schöler hopes that researchers can use super-SOX to establish libraries of human iPSCs that have enhanced differentiation ability to become any organ needed for transplantation.
Bespoke Reprogramming Factors
Putting Oct4 on a leash allowed Velychko and his team to improve reprogramming efficiency, but this isn’t the only concern that scientists have with the OSKM-cocktail-driven cell transformation. Low-fidelity reprogramming is also an issue. “That's going to be, for most studies, more important than efficiency,” said Soufi. For example, a scientist who is trying to model a neurodegenerative disease that is characterized by a methylation defect needs to make sure that the reprogrammed neurons retain these epigenetic signatures to recapitulate the disease. “But in most cases, the reprogramming itself will wipe out those epigenetic marks,” said Soufi. OSKM factors can also introduce unwanted modifications, including cancer-causing mutations.
“These transcription factors do far more than we think they're doing. We simply don't understand,” said Soufi.
In his lab at the University of Edinburgh, Abdenour Soufi is developing synthetic reprogramming factors to convert fibroblasts into iPSCs.
Abdenour Soufi
To help shine a light on the black box of reprogramming, Soufi studies how transcription factors interact with the genome to transform a somatic cell into an iPSC. In the early 2010s, while he was working as a postdoctoral researcher with developmental biologist Kenneth Zaret at the University of Pennsylvania, Soufi and his colleagues found that the OSKM factors interact with the genome in human fibroblasts differently than they do in ESCs.14 “To our surprise, what we found was [that] they are not bound to the regions we expect them to be,” said Soufi. The factors were not making a beeline for the pluripotency genes in open chromatin, but rather, they were concentrated at closed chromatin sites enriched for nucleosomes, blocked off from the sites that are necessary for complete reprogramming.15
In a recently published study, Soufi and his team used different DNA sequencing techniques to map where in the genome the OSKM factors bind.16 They found that the four factors often flock to the same locations in the genome during reprogramming and that their movements are far from random. Instead, they traverse the genome by following specific DNA patterns, which serve as a sort of “signpost,” directing factors as they navigate chromatin loops in search of their gene targets. Depending on the combination of transcription factors, they can follow separate routes through the genome.
These earliest OSKM binding decisions help set the stage for determining cell identity, but researchers still know relatively little about where and how these factors bind closed chromatin in genomes across cell types, which could hold clues about the inefficiency, heterogeneity, and mutation risk documented in resulting iPSCs.
To address this, Soufi and his team have been tracking the interactions of the pluripotency superstar Oct4 with nucleosomes. In an earlier study, they created a library of human Oct4 mutants by deleting successive stretches of five amino acids throughout the entire sequence of the protein.17 In monitoring their activity, the researchers determined which regions of Oct4 played a role in orchestrating pioneer activity.
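The logic of such a deletion scan can be sketched computationally. The snippet below is a rough illustration under stated assumptions: it tiles a sequence with non-overlapping five-residue windows, which may not match the exact design of the published library, and both the function name and the toy sequence are hypothetical (the sequence is not Oct4).

```python
def deletion_library(protein: str, window: int = 5) -> dict[str, str]:
    """Map labels like 'del_1-5' to the sequence lacking that stretch of residues."""
    variants = {}
    for start in range(0, len(protein), window):
        end = min(start + window, len(protein))  # last window may be shorter
        label = f"del_{start + 1}-{end}"         # 1-based, inclusive residue range
        variants[label] = protein[:start] + protein[end:]
    return variants


# Hypothetical 22-residue toy sequence for illustration only, NOT Oct4.
toy_protein = "ACDEFGHIKLMNPQRSTVWYAC"
library = deletion_library(toy_protein)
for label, variant in library.items():
    print(label, variant)
```

Each variant is then tested on its own, so a loss of activity can be attributed to the stretch that was removed.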
Now, equipped with a library of custom-made Oct4 variants, Soufi and his team aim to identify the minimum components of the transcription factors that are necessary for reprogramming. In a paper, which is currently under review, they found that there are large portions of Oct4 that are not required for reprogramming. “So, you can delete them altogether and still reprogram cells,” said Soufi. Additionally, their findings suggest that Oct4 utilizes distinct DNA binding domains to interact with various motifs, thereby activating different genetic programs. “We found [that] the regions that are important for maintaining pluripotency in stem cells and the regions that are important for inducing pluripotency are actually quite different.” Using this information, Soufi said that they could potentially create synthetic versions of Oct4 that either induce or maintain pluripotency while avoiding off-target binding.
Custom-made, synthetic reprogramming factors could provide scientists with more control over reprogramming and generate high-fidelity iPSCs that are safer for use in the clinic. For example, different versions of Oct4 could either induce or maintain pluripotency while a synthetic factor that selectively adopts c-Myc’s reprogramming functions could avoid its oncogenic properties.
Using AI to Engineer Enhanced Reprogramming Factors
Oct4 has long been considered the most important among the four Yamanaka reprogramming factors. However, as scientists have continued to explore the ins and outs of reprogramming, Oct4’s role has proven to be more nuanced. Rather than serving as a single, master conductor, scientists are identifying various partnerships between Oct4 forms and other factors. Fine-tuning Oct4’s partnerships can enhance the quality of reprogrammed cells. Now, researchers are turning to the power of artificial intelligence (AI) to give Oct4 and the other Yamanaka factors a boost.
For example, earlier this year, OpenAI announced a partnership with Retro Biosciences, a longevity research company with the slogan “We’re adding 10 years to healthy human lifespan.” To achieve this lofty goal, the biotechnology company is exploring ways to prevent and reverse aging and treat age-related diseases, which is where stem cell reprogramming comes into the picture. Stem cell technologies have enormous potential in regenerative medicine, drug discovery, and disease modeling.
Recognizing the limitations of the classic Yamanaka factors, OpenAI is developing an AI model for Retro Biosciences that can predict new protein designs that are better at reprogramming cells into stem cells. While AlphaFold can predict protein structure and aid in the design of de novo proteins, it struggles with intrinsically disordered proteins, and Yamanaka factors contain such intrinsically disordered regions. Therefore, OpenAI’s model took a different approach. With the help of their model, Retro Biosciences scientists created “retroSOX” and “retroKLF”, enhanced versions of the two factors that Velychko and colleagues showed to be crucial for inducing human early-stage pluripotency. OpenAI’s model generated completely novel sequences, which Velychko noted could potentially cause immune responses if delivered to human patients. This is an important consideration if scientists want to use the AI-generated factors for gene therapy purposes, such as to reverse age-related neurodegeneration.18 Moreover, if engineered reprogramming factors contain completely novel sequences, especially in disordered regions, it will be very difficult to determine the mechanism of their action. “The mechanism is key!” said Velychko.
Velychko has some concerns with the preliminary data shared with MIT Technology Review, including images provided by OpenAI that look more like transformed fibroblasts than iPSCs. “Nevertheless, it is exciting to see the interest in our field from big and powerful players,” said Velychko.
Whether or not AI will one day help us concoct a fountain of youth, the reprogramming field is poised to make considerable advances over the next decade.
Disclosure of conflicts of interest: Hans Schöler and Sergiy Velychko filed a patent with Max Planck Innovation on highly cooperative Sox factors and SK naive reset. Velychko serves as an advisor for eGenesis.
1. Cowan CA, et al. Nuclear reprogramming of somatic cells after fusion with human embryonic stem cells. Science. 2005;309(5739):1369-1373.
2. Takahashi K, Yamanaka S. Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006;126(4):663-676.
3. Shi Y, et al. Induced pluripotent stem cell technology: A decade of progress. Nat Rev Drug Discov. 2017;16(2):115-130.
4. Schöler HR, et al. A family of octamer-specific proteins present during mouse embryogenesis: Evidence for germline-specific expression of an Oct factor. EMBO J. 1989;8(9):2543-2550.
5. Nichols J, et al. Formation of pluripotent stem cells in the mammalian embryo depends on the POU transcription factor Oct4. Cell. 1998;95(3):379-391.
6. Pesce M, Schöler HR. Oct-4: Gatekeeper in the beginnings of mammalian development. Stem Cells. 2001;19(4):271-278.
7. Radzisheuskaya A, Silva JC. Do all roads lead to Oct4? The emerging concepts of induced pluripotency. Trends Cell Biol. 2014;24(5):275-284.
8. Nakagawa M, et al. Generation of induced pluripotent stem cells without Myc from mouse and human fibroblasts. Nat Biotechnol. 2008;26(1):101-106.
9. Velychko S, et al. Excluding Oct4 from Yamanaka cocktail unleashes the developmental potential of iPSCs. Cell Stem Cell. 2019;25(6):737-753.e4.
10. Hochedlinger K, Plath K. Epigenetic reprogramming and induced pluripotency. Development. 2009;136(4):509-523.
11. Zaret KS, Carroll JS. Pioneer transcription factors: Establishing competence for gene expression. Genes Dev. 2011;25(21):2227-2241.
12. Jauch R, et al. Conversion of Sox17 into a pluripotency reprogramming factor by reengineering its association with Oct4 on DNA. Stem Cells. 2011;29(6):940-951.
13. MacCarthy CM, et al. Highly cooperative chimeric super-SOX induces naive pluripotency across species. Cell Stem Cell. 2024;31(1):127-147.e9.
14. Soufi A, et al. Facilitators and impediments of the pluripotency reprogramming factors' initial engagement with the genome. Cell. 2012;151(5):994-1004.
15. Soufi A, et al. Pioneer transcription factors target partial DNA motifs on nucleosomes to initiate reprogramming. Cell. 2015;161(3):555-568.
16. O'Dwyer MR, et al. Nucleosome fibre topology guides transcription factor binding to enhancers. Nature. 2025;638(8049):251-260.
17. Roberts GA, et al. Dissecting OCT4 defines the role of nucleosome binding in pluripotency. Nat Cell Biol. 2021;23(8):834-845.
18. Shen YR, et al. Expansion of the neocortex and protection from neurodegeneration by in vivo transient reprogramming. Cell Stem Cell. 2024;31(12):1741-1759.e8.