BGPT: Paper Review: RegFormer: A Single-Cell Foundation Model Powered by Gene Regulatory Hierarchies


Quick Explanation

RegFormer proposes a single-cell foundation model that injects gene regulatory network (GRN) hierarchy into sequence modeling by topologically sorting genes and training with dual objectives (masked expression-value prediction + regulatory-role / next-token prediction) using Mamba Blocks, then evaluates on cell annotation, GRN construction, genetic perturbations, and drug response prediction.

Evidence anchored in the provided paper text: 10.1101/2025.01.24.634217.

Long Explanation

Paper Review (Evidence-grounded): RegFormer

Paper: RegFormer: A Single-Cell Foundation Model Powered by Gene Regulatory Hierarchies

Preprint date in provided text: January 26, 2025.

Model size & pretraining: ~50M parameters, pretrained on ~22M human cells.
Core mechanism: GRN-guided gene order. Cycles (~2% of edges) are removed from the regulatory graph, then topological sorting imposes the order.
Training signals: dual pretraining with masked value prediction + regulatory-role / next-token prediction.

1) What the paper claims (mechanistic + evaluation claims)

Core modeling idea: RegFormer treats scRNA-seq gene expression profiles as a sequence by imposing a GRN-derived order. It builds a regulatory graph from a knowledge base, removes cycles (≈2% of edges) to obtain a DAG, then for each cell induces a subgraph on the genes with non-zero expression and topologically sorts it to obtain that cell's ordered gene sequence.
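The ordering step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the greedy cycle-removal heuristic, the function names (remove_cycles, order_genes_for_cell), and the toy gene names are all assumptions made for the example, and the excerpt does not say which cycle-breaking rule the paper uses.

```python
from graphlib import TopologicalSorter


def reachable(adj, start, goal):
    """Return True if `goal` can be reached from `start` along kept edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(adj.get(node, ()))
    return False


def remove_cycles(edges):
    """Greedy heuristic (an assumption): drop any edge that would close a cycle."""
    adj, kept = {}, []
    for tf, target in edges:
        if reachable(adj, target, tf):  # adding tf -> target would create a cycle
            continue
        adj.setdefault(tf, set()).add(target)
        kept.append((tf, target))
    return kept


def order_genes_for_cell(dag_edges, cell):
    """Topologically sort the subgraph induced on genes with non-zero expression."""
    expressed = {gene for gene, value in cell.items() if value != 0}
    # TopologicalSorter expects a mapping: node -> set of predecessors.
    preds = {gene: set() for gene in sorted(expressed)}
    for tf, target in dag_edges:
        if tf in expressed and target in expressed:
            preds[target].add(tf)
    return list(TopologicalSorter(preds).static_order())


# Toy regulatory graph with one feedback edge (GENE_C -> TF_A).
edges = [("TF_A", "GENE_B"), ("GENE_B", "GENE_C"),
         ("GENE_C", "TF_A"), ("TF_A", "GENE_D")]
dag = remove_cycles(edges)
cell = {"TF_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0, "GENE_D": 0.0}
print(order_genes_for_cell(dag, cell))  # ['TF_A', 'GENE_B', 'GENE_C']
```

Note that inducing the subgraph on expressed genes discards indirect paths that pass through unexpressed genes, so the resulting order can differ from cell to cell even with a fixed knowledge-base graph.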

Dual embeddings: It separately embeds expression values (quantitative) and gene identities (qualitative/context), then uses Mamba Blocks to model dependencies.

Generative pretraining: It uses masked expression-value prediction and a regulatory-role prediction that leverages the sorted order (described as next-token prediction).
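One plausible reading of the two training signals, sketched as target construction only (no model). Everything specific here is an assumption for illustration: the mask sentinel, the 15% default mask rate, the function name build_training_example, and the choice to express the order-based objective as "predict the next gene in the sorted sequence". The excerpt does not give these details.

```python
import random

MASK = -1.0  # hypothetical sentinel marking a hidden expression value


def build_training_example(ordered_genes, values, mask_rate=0.15, seed=0):
    """Build targets for both objectives from one GRN-ordered cell.

    Returns (masked_values, value_targets, next_gene_pairs):
      - masked_values: expression values with a random subset hidden
      - value_targets: {position: true value} for the hidden positions
      - next_gene_pairs: (gene at position i, gene at position i + 1)
    """
    n = len(ordered_genes)
    if n == 0 or n != len(values):
        raise ValueError("need a non-empty gene list with one value per gene")
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * n))
    masked_idx = set(rng.sample(range(n), n_mask))
    masked_values = [MASK if i in masked_idx else v for i, v in enumerate(values)]
    value_targets = {i: values[i] for i in sorted(masked_idx)}
    next_gene_pairs = list(zip(ordered_genes[:-1], ordered_genes[1:]))
    return masked_values, value_targets, next_gene_pairs


genes = ["TF_A", "GENE_B", "GENE_C", "GENE_D"]
values = [3.0, 1.0, 2.0, 5.0]
masked, value_targets, pairs = build_training_example(genes, values, mask_rate=0.5)
print(masked, value_targets, pairs)
```

The point of the sketch is that the second objective only makes sense once an order exists: shuffle the gene sequence and the next-gene targets change, while the masked-value targets do not.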

Reported downstream wins:

Cell annotation: On Zheng68K, the provided text states accuracy 0.86 and Macro-F1 0.77, and reports better clustering separability in UMAPs.
GRN construction: It uses RegFormer-generated gene embeddings and pairs them with SCENIC to infer TF→target relationships, reporting superiority on a functional similarity index (GO BP), and enriched GO pathways across Leiden resolutions.
Genetic perturbations: It integrates with the GEARS perturbation-embedding framework and reports lower MSE and higher Pearson correlation on Adamson, and consistently strong performance on Norman (numerical values not included in the provided excerpt beyond metric directions).
Drug response: It integrates with DeepCDR and reports improvements in correlation metrics (PCC, SRCC), including a leave-drug-out blind test described as outperforming DeepCDR across 200 drugs.
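For readers less familiar with the metrics named in the list above, the following dependency-free sketch shows how Macro-F1, MSE, Pearson correlation (PCC), and Spearman rank correlation (SRCC) are conventionally computed. The function names and toy inputs are illustrative assumptions; the paper's own evaluation code may differ in details such as tie handling or label sets.

```python
from statistics import mean


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither input is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


print(macro_f1(["T", "T", "B", "B"], ["T", "B", "B", "B"]))
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))
```

Macro-F1 weights every cell type equally, which is why it can sit well below accuracy on imbalanced datasets: rare classes that are poorly predicted pull the average down.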

2) Scientific plausibility check (known/assumed vs uncertain)

Known from the paper text:

The ordering step is explicitly GRN-knowledge based via topological sorting after cycle removal.
Pretraining includes masked reconstruction of expression values and a next-token / regulatory-role objective that uses this order.

Uncertainties / potential failure modes (skeptical lens):

GRN incompleteness bias: If the knowledge-base GRN is incomplete or context-mismatched, the model’s imposed “biological order” may encode systematic biases. The paper acknowledges that performance can depend on the quality of regulatory network data, but the provided excerpt doesn’t show stress-tests across alternative GRN sources.
Cycle removal distortion: Removing cycles to force a DAG changes the underlying feedback structure of real GRNs. Even if cycle removal affects only ~2% of edges per the paper, feedback motifs are often biologically important; this can blur interpretation of regulatory directionality.
Artificial order vs true gene-expression “set” nature: scRNA-seq is fundamentally unordered across genes; RegFormer makes it sequential by imposing order. That can help optimization but can also create spurious positional correlations (positional leakage) if the GRN ordering correlates with technical/biological confounders. The provided excerpt does mention ablations for topological sorting, but numeric details are not provided here.
Downstream pairing may dominate gains: Some tasks are “foundation model embeddings + external predictor” (SCENIC, GEARS, DeepCDR). Improvements might reflect better embeddings, but also could reflect how RegFormer outputs align with the inductive biases of these pipelines. Without explicit “end-to-end vs embedding-only” ablations shown in this excerpt, attribution remains uncertain.

3) Evaluation critique: what’s strong vs what’s missing

Strength signals (from provided text)

Multi-task evaluation spanning annotation, GRN inference, perturbations, and drug response.
Pretraining is claimed to be large-scale (~22M cells), which is plausibly helpful for generalization, and the paper reports consistent performance across multiple human tissues/datasets described in the results excerpt.

Missing information / red-flag possibilities (within excerpt constraints)

No explicit statistical testing details in the excerpt (e.g., confidence intervals, repeated runs, significance tests). The results are described qualitatively and with some metric values (e.g., Zheng68K accuracy/Macro-F1), but the uncertainty quantification approach is not visible here.
Attribution is ambiguous: several improvements may come from embedding compatibility with downstream modules rather than from GRN hierarchy per se; the excerpt mentions ablations for topological sorting but details are not provided here.
Biological validation beyond benchmark metrics: The excerpt provides case studies (e.g., FOSB, SOX7) but does not include direct experimental confirmation that these inferred regulons are causally active in the tested datasets. Without wet-lab confirmation, interpretability claims remain correlational.
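As an example of the kind of uncertainty quantification the excerpt does not show, a percentile bootstrap over repeated runs is one common option. The sketch below is an assumption about what such a report could look like, not something taken from the paper; the scores are placeholder values invented for the example and are not results from RegFormer or any baseline.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and collect the mean of each resample.
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return mean(scores), lower, upper


# Placeholder per-seed scores (made up for illustration only).
run_scores = [0.70, 0.72, 0.71, 0.69, 0.73]
point, lower, upper = bootstrap_ci(run_scores)
print(f"mean={point:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
```

With only a handful of runs the interval is coarse, but even that is enough to tell whether a reported gap between two models is larger than seed-to-seed variation.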

4) Reproducibility & implementation signals

Code availability is explicitly claimed with a public repository: https://github.com/BGIResearch/RegFormer.
Data availability is described as using published sources, with supplementary details, but the excerpt does not include exact dataset accession IDs for each benchmark.

5) Bottom-line assessment (with confidence)

Main conclusion

RegFormer’s central novelty is imposing GRN-derived hierarchical gene ordering and then using that order to support generative pretraining with masked expression-value reconstruction plus hierarchy-aware regulatory-role prediction in a Mamba-based architecture. In the provided paper text, this design is reported to improve performance across cell annotation, GRN construction (via SCENIC integration), genetic perturbation prediction (via GEARS integration), and drug response prediction (via DeepCDR integration).

Confidence: moderate (the excerpt contains architecture and some explicit metric values, but lacks uncertainty quantification and many numeric details/ablation specifics needed for a fully rigorous attribution).

What would most challenge the claims?

If alternative GRN knowledge bases (different coverage/completeness) substantially degrade performance, the effect may be more about specific GRN priors than “hierarchies in general.”
If control orderings (e.g., shuffled order, a fixed arbitrary order, or non-hierarchical DAGs) perform as well as the GRN-derived topological order, the purported hierarchy advantage could be due to order imposition in general rather than biology-informed hierarchy.
If improvements in GRN/perturbation/drug tasks do not translate when embedding extraction is frozen and only the downstream head changes (or vice versa), the attribution to the foundation model’s GRN-hierarchical training becomes weaker.
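The ordering controls discussed in this list can be made concrete with a small sketch. It generates a shuffled order and a fixed arbitrary (alphabetical) order for the same gene set, and counts how many regulatory edges each order violates, which gives a rough measure of how much hierarchy a control still carries. The helper names and toy edges are assumptions for illustration; the excerpt does not describe how the paper's ablations were constructed.

```python
import random


def shuffled_order(genes, seed=0):
    """Control 1: a random permutation of the same genes."""
    rng = random.Random(seed)
    out = list(genes)
    rng.shuffle(out)
    return out


def fixed_order(genes):
    """Control 2: a fixed arbitrary order (alphabetical), ignoring the GRN."""
    return sorted(genes)


def count_violations(order, edges):
    """Number of TF -> target edges where the target precedes its regulator."""
    pos = {gene: i for i, gene in enumerate(order)}
    return sum(
        1
        for tf, target in edges
        if tf in pos and target in pos and pos[tf] > pos[target]
    )


edges = [("TF_A", "GENE_B"), ("GENE_B", "GENE_C")]
grn_order = ["TF_A", "GENE_B", "GENE_C"]  # a valid topological order

print(count_violations(grn_order, edges))               # 0
print(count_violations(fixed_order(grn_order), edges))  # 1
print(count_violations(shuffled_order(grn_order), edges))
```

Training the same architecture on each ordering and comparing downstream metrics, alongside the violation counts, would separate "any consistent order helps" from "the regulatory hierarchy itself helps".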


Updated: April 10, 2026