GeneRAG: A Retrieval-Augmented Framework for
Spatially Resolved Gene Expression Prediction

¹Seoul National University, Korea    ²LG CNS, Korea    ³OGQ, Korea    ⁴Asteromorph, Inc., Korea
MICCAI 2026
Corresponding author: kyskim@snu.ac.kr
[Figure: GeneRAG framework overview]

Overview of GeneRAG. At test time, a frozen encoder $\mathcal{E}$ and decoder $\mathcal{D}$ produce a feature embedding $f_{img}$ and an initial gene prediction $\hat{y}_{init}$. These form a Hybrid Query used to retrieve from a pre-constructed Trainset Reference Bank, yielding the full gene-expression prediction $\hat{y}_{full}$ — including genes that were never in the backbone's training panel.

Abstract

Spatial transcriptomics (ST) is pivotal for deciphering molecular organization, yet cross-modal variability challenges accurate H&E-based profiling. Existing models struggle to generalize to unseen genes and lack clinical interpretability. We propose GeneRAG, a model-agnostic Retrieval-Augmented Generation framework with a Dual-Constrained Retrieval module. Unlike conventional black-box networks that rely solely on fixed parameters, GeneRAG explicitly decouples knowledge storage from model training. By optimizing an ElasticNet-based sparse sampling matrix, GeneRAG integrates morphological and biological constraints to fetch relevant samples from a pre-constructed bank. Leveraging conserved gene correlations, this enables accurate reconstruction of comprehensive profiles, including entirely unseen genes.

On the HEST-1k dataset, GeneRAG seamlessly enhances state-of-the-art models in a plug-and-play manner, improving Stem's PCC-10 from 0.8322 to 0.8711 (Breast). For zero-shot generalization to 5,000 genes, Stem + GeneRAG achieves a PCC-5000 of 0.5188, vastly outperforming DeepSpot (0.0748). GeneRAG provides robust, transparent predictions, highlighting its potential for clinical deployment.

5,000: unseen genes predicted (zero-shot)
Plug & Play: model-agnostic, works with any Image-to-ST backbone; validated on Stem, UNI, EXAONE-Path 2.5
~50: reference patches activated per query (sparse and interpretable)

Dual-Constrained Retrieval

GeneRAG formulates retrieval as a convex optimization that simultaneously enforces morphological and biological constraints. Given the Morphology Bank $D_{img}\!\in\!\mathbb{R}^{d\times N}$ (frozen visual embeddings of $N$ training spots), an Anchor Gene Bank $D_{anchor}\!\in\!\mathbb{R}^{k\times N}$ (a subset of the Full Gene Bank), and the hybrid query $(f_{img}, y_{anchor})$, GeneRAG solves the following ElasticNet problem for the sparse sampling vector $\hat{\alpha}\!\in\!\mathbb{R}^{N}$:

$$ \hat{\alpha} \;=\; \arg\min_{\alpha\in\mathbb{R}^{N}} \; \omega\|f_{img}-D_{img}\alpha\|_{2}^{2} \,+\,(1-\omega)\|y_{anchor}-D_{anchor}\alpha\|_{2}^{2} \,+\,\gamma\lambda\|\alpha\|_{2}^{2} \,+\,(1-\gamma)\lambda\|\alpha\|_{1}. \qquad (1) $$

The first two terms balance morphology vs. biology through $\omega\!\in\![0,1]$; the ElasticNet regularizer enforces sparsity (typically $\sim\!50$ active references per query) while stabilizing the selection. Equation (1) is solved as a multi-output ElasticNet via batched FISTA, completing slide-level optimization within minutes on a single GPU.
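A minimal NumPy sketch of how Eq. (1) could be solved is given here. It is not the authors' implementation. It assumes the two data terms are folded into one least-squares problem by stacking $\sqrt{\omega}\,D_{img}$ over $\sqrt{1-\omega}\,D_{anchor}$ (and the queries likewise), uses a fixed step size derived from the spectral norm, and runs on CPU. The function names (`stack_problem`, `soft_threshold`, `elasticnet_objective`, `dual_constrained_retrieval`) and the default values of `omega`, `lam`, `gamma`, and `n_iter` are illustrative assumptions. Queries are stored as columns, so a single call handles a batch of spots.

```python
import numpy as np


def stack_problem(D_img, D_anchor, f_img, y_anchor, omega):
    """Fold the two weighted data terms of Eq. (1) into one least-squares system."""
    A = np.vstack([np.sqrt(omega) * D_img, np.sqrt(1.0 - omega) * D_anchor])
    B = np.vstack([np.sqrt(omega) * f_img, np.sqrt(1.0 - omega) * y_anchor])
    return A, B


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def elasticnet_objective(A, B, alpha, lam, gamma):
    """Value of Eq. (1) for the stacked system, summed over all query columns."""
    resid = B - A @ alpha
    return (np.sum(resid ** 2)
            + gamma * lam * np.sum(alpha ** 2)
            + (1.0 - gamma) * lam * np.sum(np.abs(alpha)))


def dual_constrained_retrieval(D_img, D_anchor, f_img, y_anchor,
                               omega=0.5, lam=0.1, gamma=0.5, n_iter=500):
    """FISTA for Eq. (1).

    Shapes (assumed): D_img (d, N), D_anchor (k, N), f_img (d, Q), y_anchor (k, Q).
    Returns alpha_hat with shape (N, Q), one column per query spot.
    """
    A, B = stack_problem(D_img, D_anchor, f_img, y_anchor, omega)
    # Lipschitz constant of the gradient of the smooth part
    # ||B - A alpha||^2 + gamma * lam * ||alpha||^2.
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + gamma * lam)
    tau = (1.0 - gamma) * lam / L          # shrinkage applied at each step
    alpha = np.zeros((A.shape[1], B.shape[1]))
    z, t = alpha.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * (A.T @ (A @ z - B)) + 2.0 * gamma * lam * z
        alpha_next = soft_threshold(z - grad / L, tau)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        # Momentum (extrapolation) step that distinguishes FISTA from ISTA.
        z = alpha_next + ((t - 1.0) / t_next) * (alpha_next - alpha)
        alpha, t = alpha_next, t_next
    return alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D_img = rng.normal(size=(8, 40))       # toy Morphology Bank
    D_anchor = rng.normal(size=(5, 40))    # toy Anchor Gene Bank
    f_img = rng.normal(size=(8, 2))        # two toy queries
    y_anchor = rng.normal(size=(5, 2))
    alpha_hat = dual_constrained_retrieval(D_img, D_anchor, f_img, y_anchor)
    print("active references per query:", np.count_nonzero(alpha_hat, axis=0))
```

Each column of the returned matrix is the sparse weight vector for one query spot. A GPU version would keep the same update rule and only swap the array library.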

Given $\hat{\alpha}$, the full transcriptomic prediction is a simple linear combination over the Full Gene Bank $D_{full}$:

$$ \hat{y}_{full} \;=\; D_{full}\cdot\hat{\alpha}. \qquad (2) $$

Because $D_{full}$ retains the complete expression panel (including genes beyond the backbone's training subset), retrieval-based reconstruction in Eq. (2) achieves zero-shot extrapolation for unseen genes, leveraging the biologically conserved gene-to-gene correlations among morphologically similar reference patches.
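As a toy illustration of Eq. (2), the sketch below applies a hand-written sparse weight vector to a small made-up Full Gene Bank and lists the reference spots that received nonzero weight, which is what makes a prediction traceable to specific training patches. All numbers and the helper names `reconstruct_full_profile` and `active_references` are hypothetical; only the matrix product itself comes from Eq. (2).

```python
import numpy as np


def reconstruct_full_profile(D_full, alpha_hat):
    """Eq. (2): linear combination of Full Gene Bank columns."""
    return D_full @ alpha_hat


def active_references(alpha_hat, top=5):
    """Indices of reference spots with nonzero weight, largest |weight| first."""
    idx = np.flatnonzero(alpha_hat)
    order = np.argsort(-np.abs(alpha_hat[idx]), kind="stable")
    return idx[order][:top]


# Toy Full Gene Bank: 4 genes (rows) x 5 reference spots (columns).
# Rows 2 and 3 stand in for genes outside the backbone's training panel.
D_full = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [0.0, 1.0, 0.0, 1.0, 0.0],
    [2.0, 2.0, 2.0, 2.0, 2.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
])

# Hand-written sparse weights: only reference spots 1 and 3 are active.
alpha_hat = np.array([0.0, 0.25, 0.0, 0.75, 0.0])

y_full = reconstruct_full_profile(D_full, alpha_hat)
print("predicted full profile:", y_full)
print("active reference spots:", active_references(alpha_hat))
```

The same weights that were fitted on morphology and anchor genes are reused for every row of `D_full`, so genes the backbone never saw are predicted without any extra training.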

[Figure: two-panel diagram. (a) Dual-Constrained Retrieval; (b) Full Gene Reconstruction]
GeneRAG Retrieval and Reconstruction Pipeline. (a) ElasticNet-based Dual-Constrained Retrieval produces the sparse sampling matrix $\hat{\alpha}$ from the morphology query $f_{img}$ and the anchor-gene query $y_{anchor}$. (b) Applying $\hat{\alpha}$ to the Full Gene Bank $D_{full}$ extrapolates the entire expression panel, including unseen genes.

In-Domain Calibration (Core HVG)

On the HEST-1k Core HVG setup (top 300 HVGs for Breast; top 200 HVGs for Kidney and Prostate), plugging GeneRAG on top of any backbone — generative (Stem), H&E-only foundation model (UNI), or multi-modal foundation model (EXAONE-Path 2.5) — consistently calibrates and improves predictive performance.

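| Model | GeneRAG | Breast PCC-10 | Breast PCC-50 | Breast PCC-300 | Kidney PCC-10 | Kidney PCC-50 | Kidney PCC-200 | Prostate PCC-10 | Prostate PCC-50 | Prostate PCC-200 |
|---|---|---|---|---|---|---|---|---|---|---|
| HisToGene | no | 0.6812 | 0.6345 | 0.5250 | 0.4294 | 0.3503 | 0.0905 | 0.4035 | 0.3554 | 0.2235 |
| BLEEP | no | 0.7727 | 0.7141 | 0.5652 | 0.4998 | 0.4221 | 0.3143 | 0.5798 | 0.5102 | 0.3158 |
| TRIPLEX | no | 0.7907 | 0.7394 | 0.5766 | 0.4654 | 0.4105 | 0.3165 | 0.6173 | 0.4953 | 0.3601 |
| Stem | no | 0.8322 | 0.7696 | 0.6228 | 0.5768 | 0.4989 | 0.3354 | 0.6310 | 0.5546 | 0.3937 |
| UNI | no | 0.8301 | 0.7909 | 0.6024 | 0.4828 | 0.3888 | 0.2715 | 0.5548 | 0.4761 | 0.3076 |
| CONCH | no | 0.7799 | 0.7467 | 0.6043 | 0.3583 | 0.3109 | 0.2243 | 0.5660 | 0.4715 | 0.3171 |
| EXAONE-Path 2.5 | no | 0.8217 | 0.7850 | 0.6251 | 0.4584 | 0.4023 | 0.3009 | 0.6204 | 0.5313 | 0.3536 |
| Stem | yes | 0.8711 | 0.8275 | 0.7029 | 0.6068 | 0.5419 | 0.4055 | 0.7001 | 0.6693 | 0.5415 |
| UNI | yes | 0.8670 | 0.8257 | 0.7017 | 0.5529 | 0.4987 | 0.3525 | 0.6801 | 0.6322 | 0.5046 |
| EXAONE-Path 2.5 | yes | 0.8589 | 0.8175 | 0.7002 | 0.5479 | 0.4886 | 0.3347 | 0.6911 | 0.6513 | 0.5371 |

Table 1. Performance comparison on the Core HVG setup (PCC-$k$, higher is better). GeneRAG-augmented rows are marked "yes" in the GeneRAG column.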
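The page does not spell out how PCC-$k$ is computed, so the following is only a sketch under a stated assumption: that PCC-$k$ is the mean per-gene Pearson correlation between predicted and measured expression over the first $k$ genes of a panel that has already been ordered by the benchmark's ranking criterion. The function names `pearson_per_gene` and `pcc_at_k`, and the convention of scoring a constant gene as 0, are hypothetical choices made for illustration.

```python
import numpy as np


def pearson_per_gene(y_true, y_pred):
    """Pearson correlation for each gene column; arrays are (n_spots, n_genes)."""
    yt = y_true - y_true.mean(axis=0, keepdims=True)
    yp = y_pred - y_pred.mean(axis=0, keepdims=True)
    num = (yt * yp).sum(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    # Assumption: a gene with zero variance in either array gets a score of 0.
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, num / safe, 0.0)


def pcc_at_k(y_true, y_pred, k):
    """Mean per-gene correlation over the first k columns (assumed pre-ranked)."""
    return float(pearson_per_gene(y_true[:, :k], y_pred[:, :k]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    measured = rng.normal(size=(100, 300))                        # toy spots x genes
    predicted = measured + rng.normal(scale=0.5, size=measured.shape)
    for k in (10, 50, 300):
        print(f"PCC-{k}: {pcc_at_k(measured, predicted, k):.4f}")
```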

Zero-Shot Extrapolation (Global HVG, 5,000 genes)

On 5,000 unseen genes, GeneRAG-augmented backbones outperform DeepSpot, a parametric model explicitly trained for large-scale gene prediction, by roughly 6× to 11× at PCC-5000 (ratios taken from the PCC-5000 column of Table 2). On the Breast dataset, Stem + GeneRAG attains PCC-5000 = 0.5188 versus DeepSpot's 0.0748 (~7×).

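| Dataset | Model | GeneRAG | PCC-10 | PCC-50 | PCC-300 | PCC-1000 | PCC-2000 | PCC-3000 | PCC-5000 |
|---|---|---|---|---|---|---|---|---|---|
| Breast | DeepSpot | no | 0.3932 | 0.3287 | 0.2469 | 0.1823 | 0.1413 | 0.1142 | 0.0748 |
| Breast | UNI | yes | 0.8716 | 0.8393 | 0.7894 | 0.7297 | 0.6721 | 0.6248 | 0.5465 |
| Breast | EXAONE-Path 2.5 | yes | 0.8619 | 0.8316 | 0.7825 | 0.7254 | 0.6728 | 0.6301 | 0.5591 |
| Breast | Stem | yes | 0.8720 | 0.8347 | 0.7801 | 0.7142 | 0.6505 | 0.6002 | 0.5188 |
| Kidney | DeepSpot | no | 0.2102 | 0.1685 | 0.1334 | 0.0889 | 0.0660 | 0.0510 | 0.0273 |
| Kidney | UNI | yes | 0.5666 | 0.5182 | 0.4189 | 0.3148 | 0.2558 | 0.2236 | 0.1828 |
| Kidney | EXAONE-Path 2.5 | yes | 0.5601 | 0.5069 | 0.4084 | 0.3107 | 0.2495 | 0.2149 | 0.1715 |
| Kidney | Stem | yes | 0.6158 | 0.5656 | 0.4778 | 0.3696 | 0.3060 | 0.2691 | 0.2221 |
| Prostate | DeepSpot | no | 0.1362 | 0.1176 | 0.0868 | 0.0650 | 0.0503 | 0.0406 | 0.0265 |
| Prostate | UNI | yes | 0.6840 | 0.6588 | 0.5875 | 0.4902 | 0.4157 | 0.3663 | 0.1980 |
| Prostate | EXAONE-Path 2.5 | yes | 0.7051 | 0.6800 | 0.6042 | 0.4977 | 0.4187 | 0.3669 | 0.2981 |
| Prostate | Stem | yes | 0.7128 | 0.6846 | 0.6000 | 0.4851 | 0.4015 | 0.3477 | 0.2781 |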

Table 2. Performance comparison on the Global HVG zero-shot extrapolation (PCC-$k$, higher is better).
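The fold-improvements over DeepSpot quoted for this setting can be recomputed from the PCC-5000 column of Table 2. The short script below does only that arithmetic; the dictionary layout and the helper name `fold_over_deepspot` are illustrative, and the values are copied from the table.

```python
# PCC-5000 values copied from Table 2 (the DeepSpot row is the baseline).
PCC_5000 = {
    "Breast":   {"DeepSpot": 0.0748, "UNI": 0.5465, "EXAONE-Path 2.5": 0.5591, "Stem": 0.5188},
    "Kidney":   {"DeepSpot": 0.0273, "UNI": 0.1828, "EXAONE-Path 2.5": 0.1715, "Stem": 0.2221},
    "Prostate": {"DeepSpot": 0.0265, "UNI": 0.1980, "EXAONE-Path 2.5": 0.2981, "Stem": 0.2781},
}


def fold_over_deepspot(scores):
    """Ratio of each GeneRAG-augmented backbone's score to DeepSpot's score."""
    base = scores["DeepSpot"]
    return {model: value / base for model, value in scores.items() if model != "DeepSpot"}


folds = {dataset: fold_over_deepspot(scores) for dataset, scores in PCC_5000.items()}

for dataset, ratios in folds.items():
    for model, ratio in ratios.items():
        print(f"{dataset:9s} {model:16s} {ratio:5.2f}x")
```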

BibTeX

@inproceedings{kim2026generag,
  title     = {GeneRAG: A Retrieval-Augmented Framework for Spatially Resolved Gene Expression Prediction},
  author    = {Kim, Hyeongsub and Kim, Sihyun and Cho, Minyoung and Jo, Sanghyun and Lee, Minhyeong and Kim, Kyungsu},
  booktitle = {Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
  year      = {2026}
}