GeneRAG: A Retrieval-Augmented Framework for Spatially Resolved Gene Expression Prediction

Kim, Hyeongsub; Kim, Sihyun; Cho, Minyoung; Jo, Sanghyun; Lee, Minhyeong; Kim, Kyungsu

Hyeongsub Kim¹,², Sihyun Kim¹, Minyoung Cho¹,², Sanghyun Jo¹,³, Minhyeong Lee¹,⁴, Kyungsu Kim¹,✉

¹Seoul National University, Korea ²LG CNS, Korea ³OGQ, Korea ⁴Asteromorph, Inc., Korea

MICCAI 2026

✉ Corresponding author: kyskim@snu.ac.kr

Overview of GeneRAG. At test time, a frozen encoder $\mathcal{E}$ and decoder $\mathcal{D}$ produce a feature embedding $f_{img}$ and an initial gene prediction $\hat{y}_{init}$. These form a Hybrid Query used to retrieve from a pre-constructed Reference Bank, yielding the full gene-expression prediction $\hat{y}_{full}$ — including genes that were never in the backbone's training panel.

Abstract

Spatial transcriptomics (ST) is pivotal for deciphering molecular organization, yet cross-modal variability challenges accurate H&E-based profiling. Existing models struggle to generalize to unseen genes and lack clinical interpretability. We propose GeneRAG, a model-agnostic Retrieval-Augmented Generation framework with a Dual-Constrained Retrieval module. Unlike conventional black-box networks that rely solely on fixed parameters, GeneRAG explicitly decouples knowledge storage from model training. By optimizing an ElasticNet-based sparse sampling matrix, GeneRAG integrates morphological and biological constraints to fetch relevant samples from a pre-constructed bank. Leveraging conserved gene correlations, this enables accurate reconstruction of comprehensive profiles, including entirely unseen genes.

On the HEST-1k dataset, GeneRAG seamlessly enhances state-of-the-art models in a plug-and-play manner, improving Stem's PCC-10 from 0.8322 to 0.8711 (Breast). For zero-shot generalization to 5,000 genes, Stem + GeneRAG achieves a PCC-5000 of 0.5188, vastly outperforming DeepSpot (0.0748). GeneRAG provides robust, transparent predictions, highlighting its potential for clinical deployment.

Zero-Shot

Predicts 5,000 unseen genes per spot
No fine-tuning, no re-training

Plug & Play

Model-agnostic — works with any Image-to-ST backbone
Validated on Stem, UNI, EXAONE-Path 2.5

Interpretable

Sparse retrieval — ~50 reference patches per query
Each weight traces back to a training spot

Dual-Constrained Retrieval

GeneRAG formulates retrieval as a convex optimization that simultaneously enforces morphological and biological constraints. Given the Morphology Bank $D_{img}\!\in\!\mathbb{R}^{d\times N}$ (frozen visual embeddings of $N$ training spots), an Anchor Gene Bank $D_{anchor}\!\in\!\mathbb{R}^{k\times N}$ (a subset of the Full Gene Bank), and the hybrid query $(f_{img}, y_{anchor})$, GeneRAG solves the following ElasticNet problem for the sparse sampling vector $\hat{\alpha}\!\in\!\mathbb{R}^{N}$:

$$ \hat{\alpha} \;=\; \arg\min_{\alpha\in\mathbb{R}^{N}} \; \omega\|f_{img}-D_{img}\alpha\|_{2}^{2} \,+\,(1-\omega)\|y_{anchor}-D_{anchor}\alpha\|_{2}^{2} \,+\,\gamma\lambda\|\alpha\|_{2}^{2} \,+\,(1-\gamma)\lambda\|\alpha\|_{1}. \qquad (1) $$

The first two terms balance morphology vs. biology through $\omega\!\in\![0,1]$; the ElasticNet regularizer enforces sparsity (typically $\sim\!50$ active references per query) while stabilizing the selection. Equation (1) is solved as a multi-output ElasticNet via batched FISTA, completing slide-level optimization within minutes on a single GPU.
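The optimization in Eq. (1) can be sketched in a few lines of NumPy. Scaling the morphology rows by $\sqrt{\omega}$ and the anchor-gene rows by $\sqrt{1-\omega}$ and stacking them turns the two data terms into a single least-squares term, after which a standard FISTA loop with soft-thresholding applies. The sketch below is a minimal CPU illustration under that reading, not the authors' GPU implementation: the function names, the hyperparameter values, the iteration count, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (element-wise shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def retrieval_objective(D_img, D_anchor, F_img, Y_anchor, alpha,
                        omega=0.5, lam=0.1, gamma=0.5):
    """Value of Eq. (1), summed over all queries (columns of alpha)."""
    fit_img = ((F_img - D_img @ alpha) ** 2).sum()
    fit_anchor = ((Y_anchor - D_anchor @ alpha) ** 2).sum()
    ridge = (alpha ** 2).sum()
    lasso = np.abs(alpha).sum()
    return (omega * fit_img + (1.0 - omega) * fit_anchor
            + gamma * lam * ridge + (1.0 - gamma) * lam * lasso)

def dual_constrained_retrieval(D_img, D_anchor, F_img, Y_anchor,
                               omega=0.5, lam=0.1, gamma=0.5, n_iter=200):
    """Batched FISTA for Eq. (1); queries are the columns of F_img / Y_anchor.

    Hyperparameter defaults are illustrative, not values from the paper.
    """
    # Stack both constraints into one least-squares problem ||B - A alpha||^2.
    A = np.vstack([np.sqrt(omega) * D_img, np.sqrt(1.0 - omega) * D_anchor])
    B = np.vstack([np.sqrt(omega) * F_img, np.sqrt(1.0 - omega) * Y_anchor])

    # Lipschitz constant of the smooth part (least squares + ridge).
    L = 2.0 * (np.linalg.norm(A, 2) ** 2 + gamma * lam)
    thresh = (1.0 - gamma) * lam / L

    alpha = np.zeros((A.shape[1], B.shape[1]))
    z, t = alpha.copy(), 1.0
    for _ in range(n_iter):
        grad = 2.0 * (A.T @ (A @ z - B)) + 2.0 * gamma * lam * z
        alpha_next = soft_threshold(z - grad / L, thresh)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = alpha_next + ((t - 1.0) / t_next) * (alpha_next - alpha)
        alpha, t = alpha_next, t_next
    return alpha

# Synthetic stand-ins: d=16 visual dims, k=8 anchor genes, N=40 references, 3 queries.
rng = np.random.default_rng(0)
D_img, D_anchor = rng.normal(size=(16, 40)), rng.normal(size=(8, 40))
F_img, Y_anchor = rng.normal(size=(16, 3)), rng.normal(size=(8, 3))
alpha_hat = dual_constrained_retrieval(D_img, D_anchor, F_img, Y_anchor)
print("non-zero weights per query:", (alpha_hat != 0).sum(axis=0))
```

Each column of `alpha_hat` is the sampling vector for one query spot, so a whole slide can be processed as one matrix problem; this is presumably what makes a batched GPU solver practical, although the page does not give implementation details.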

Given $\hat{\alpha}$, the full transcriptomic prediction is a simple linear combination over the Full Gene Bank $D_{full}$:

$$ \hat{y}_{full} \;=\; D_{full}\cdot\hat{\alpha}. \qquad (2) $$

Because $D_{full}$ retains the complete expression panel (including genes beyond the backbone's training subset), retrieval-based reconstruction in Eq. (2) achieves zero-shot extrapolation for unseen genes, leveraging the biologically conserved gene-to-gene correlations among morphologically similar reference patches.
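A small numerical illustration of Eq. (2) may help here. The sketch below uses a synthetic Full Gene Bank and a hand-written sparse weight vector; the sizes, the anchor-gene indices, and the helper name `reconstruct_full_profile` are hypothetical. It shows that the same weights that explain the anchor genes also produce values for every other row of the bank, which is where the unseen-gene predictions come from.

```python
import numpy as np

def reconstruct_full_profile(D_full, alpha_hat):
    """Eq. (2): linear combination of reference profiles, y_full = D_full @ alpha."""
    return D_full @ alpha_hat

rng = np.random.default_rng(0)
n_genes, n_refs = 12, 6
D_full = rng.random((n_genes, n_refs))     # synthetic Full Gene Bank (genes x references)
anchor_idx = np.array([0, 3, 7])           # hypothetical anchor-gene rows
D_anchor = D_full[anchor_idx]              # Anchor Gene Bank is a row subset of D_full

# Stand-in for a solved sparse sampling vector: only two references are active.
alpha_hat = np.array([0.0, 0.6, 0.0, 0.0, 0.4, 0.0])

y_full = reconstruct_full_profile(D_full, alpha_hat)

# Anchor rows of the reconstruction match what the retrieval step fitted;
# the remaining rows are genes the backbone never predicted.
unseen_idx = np.setdiff1d(np.arange(n_genes), anchor_idx)
print("anchor genes:", y_full[anchor_idx])
print("unseen genes:", y_full[unseen_idx])
```

Because `D_anchor` is a row subset of `D_full`, the anchor entries of `y_full` coincide with `D_anchor @ alpha_hat`, so the unseen-gene values are tied to the anchor fit only through the shared weights.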

**GeneRAG Retrieval and Reconstruction Pipeline.** (a) ElasticNet-based Dual-Constrained Retrieval produces the sparse sampling matrix $\hat{\alpha}$ from the morphology query $f_{img}$ and the anchor-gene query $y_{anchor}$. (b) Applying $\hat{\alpha}$ to the Full Gene Bank $D_{full}$ extrapolates the entire expression panel, including unseen genes.

In-Domain Calibration (Core HVG)

On the HEST-1k Core HVG setup (top 300 HVGs for Breast; top 200 HVGs for Kidney and Prostate), plugging GeneRAG on top of any backbone — generative model (Stem), H&E-only foundation model (UNI), or multi-modal foundation model (EXAONE-Path 2.5) — consistently calibrates and improves predictive performance.

Model	GeneRAG	Breast PCC-10	Breast PCC-50	Breast PCC-300	Kidney PCC-10	Kidney PCC-50	Kidney PCC-200	Prostate PCC-10	Prostate PCC-50	Prostate PCC-200
HisToGene	✗	0.6812	0.6345	0.5250	0.4294	0.3503	0.0905	0.4035	0.3554	0.2235
BLEEP	✗	0.7727	0.7141	0.5652	0.4998	0.4221	0.3143	0.5798	0.5102	0.3158
TRIPLEX	✗	0.7907	0.7394	0.5766	0.4654	0.4105	0.3165	0.6173	0.4953	0.3601
Stem	✗	0.8322	0.7696	0.6228	0.5768	0.4989	0.3354	0.6310	0.5546	0.3937
UNI	✗	0.8301	0.7909	0.6024	0.4828	0.3888	0.2715	0.5548	0.4761	0.3076
CONCH	✗	0.7799	0.7467	0.6043	0.3583	0.3109	0.2243	0.5660	0.4715	0.3171
EXAONE-Path 2.5	✗	0.8217	0.7850	0.6251	0.4584	0.4023	0.3009	0.6204	0.5313	0.3536
Stem	✓	0.8711	0.8275	0.7029	0.6068	0.5419	0.4055	0.7001	0.6693	0.5415
UNI	✓	0.8670	0.8257	0.7017	0.5529	0.4987	0.3525	0.6801	0.6322	0.5046
EXAONE-Path 2.5	✓	0.8589	0.8175	0.7002	0.5479	0.4886	0.3347	0.6911	0.6513	0.5371

Table 1. Performance comparison on the Core HVG setup (PCC-$k$, higher is better). Rows marked ✓ are GeneRAG-augmented.
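The page does not spell out how PCC-$k$ is computed. A common convention in the Image-to-ST literature is the mean Pearson correlation over the $k$ best-predicted genes, and the sketch below assumes that reading; the function name `pcc_at_k` and the synthetic data are illustrative, and the authors' exact evaluation protocol may differ.

```python
import numpy as np

def pcc_at_k(y_true, y_pred, k):
    """Mean per-gene Pearson correlation over the k highest-scoring genes.

    y_true, y_pred: arrays of shape (n_spots, n_genes).
    Assumed definition; genes with zero variance get a correlation of 0.
    """
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    pcc = np.divide((yt * yp).sum(axis=0), denom,
                    out=np.zeros_like(denom), where=denom > 0)
    top_k = np.sort(pcc)[::-1][:k]   # k largest per-gene correlations
    return float(top_k.mean())

# Synthetic example: 100 spots, 20 genes, predictions = truth + noise.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(100, 20))
y_pred = y_true + rng.normal(scale=1.0, size=(100, 20))
for k in (5, 10, 20):
    print(f"PCC-{k}: {pcc_at_k(y_true, y_pred, k):.4f}")
```

Under this definition the score can only stay flat or fall as $k$ grows, since each larger $k$ adds genes with lower correlation; the PCC-$k$ columns of the tables on this page follow the same decreasing pattern.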

Zero-Shot Extrapolation (Global HVG, 5,000 genes)

On 5,000 unseen genes, GeneRAG-augmented backbones outperform DeepSpot — a parametric model explicitly trained for large-scale gene prediction. On the Breast dataset, Stem + GeneRAG attains PCC-5000 = 0.5188 versus DeepSpot's 0.0748.

Dataset	Model	GeneRAG	PCC-10	PCC-50	PCC-300	PCC-1000	PCC-2000	PCC-3000	PCC-5000
Breast	DeepSpot	✗	0.3932	0.3287	0.2469	0.1823	0.1413	0.1142	0.0748
	UNI	✓	0.8716	0.8393	0.7894	0.7297	0.6721	0.6248	0.5465
	EXAONE-Path 2.5	✓	0.8619	0.8316	0.7825	0.7254	0.6728	0.6301	0.5591
	Stem	✓	0.8720	0.8347	0.7801	0.7142	0.6505	0.6002	0.5188
Kidney	DeepSpot	✗	0.2102	0.1685	0.1334	0.0889	0.0660	0.0510	0.0273
	UNI	✓	0.5666	0.5182	0.4189	0.3148	0.2558	0.2236	0.1828
	EXAONE-Path 2.5	✓	0.5601	0.5069	0.4084	0.3107	0.2495	0.2149	0.1715
	Stem	✓	0.6158	0.5656	0.4778	0.3696	0.3060	0.2691	0.2221
Prostate	DeepSpot	✗	0.1362	0.1176	0.0868	0.0650	0.0503	0.0406	0.0265
	UNI	✓	0.6840	0.6588	0.5875	0.4902	0.4157	0.3663	0.1980
	EXAONE-Path 2.5	✓	0.7051	0.6800	0.6042	0.4977	0.4187	0.3669	0.2981
	Stem	✓	0.7128	0.6846	0.6000	0.4851	0.4015	0.3477	0.2781

Table 2. Performance comparison on the Global HVG zero-shot extrapolation (PCC-$k$, higher is better).

Qualitative Results

Top-5 retrieved reference patches share both morphological and transcriptomic profiles with the target spot, providing transparent, biologically plausible rationales for each prediction.
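Since every entry of the sampling vector corresponds to one training spot, a top-5 list like the one above can be read directly off the weights. The sketch below ranks references by absolute weight, which is an assumption (the page does not state the ranking rule), and `top_references` is a hypothetical helper name.

```python
import numpy as np

def top_references(alpha, k=5):
    """Return (reference index, weight) pairs for the k largest |weights|.

    Ranking by absolute value is an assumption; zero-weight references
    are dropped because they do not contribute to the prediction.
    """
    order = np.argsort(-np.abs(alpha))[:k]
    return [(int(i), float(alpha[i])) for i in order if alpha[i] != 0.0]

# Hand-written sparse weights over 7 reference spots (synthetic).
alpha_hat = np.array([0.0, 0.5, 0.0, -0.1, 0.3, 0.0, 0.05])
for idx, weight in top_references(alpha_hat, k=3):
    print(f"reference spot {idx}: weight {weight:+.2f}")
```

The returned indices can then be mapped back to the stored reference patches and their expression profiles for visual inspection.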
