Commit c59bd58

- Minor update to the docs for clarification on the API and Dataset Recipes.
1 parent 5cb9674 commit c59bd58

3 files changed: 45 additions, 45 deletions

README.md

Lines changed: 40 additions & 41 deletions
````diff
@@ -1,10 +1,34 @@
-# SpatialTranscriptFormer
+# SpatialTranscriptFormer Framework
 
 > [!WARNING]
 > **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.
 
+<!-- -->
+
 > [!TIP]
 > **Framework Release**: SpatialTranscriptFormer has been restructured from a research codebase into a robust framework. You can now use the Python API to train on your own spatial transcriptomics data with custom backbones and architectures.
 
 **SpatialTranscriptFormer** is a modular deep learning framework designed to bridge histology and biological pathways. It leverages transformer architectures to model the interplay between morphological features and gene expression signatures, providing interpretable mapping of the tissue microenvironment.
 
+## Python API: Quick Start
+
+The framework is designed to be integrated programmatically into your scanpy/AnnData workflows:
+
+```python
+from spatial_transcript_former import SpatialTranscriptFormer, Predictor, FeatureExtractor
+from spatial_transcript_former.predict import inject_predictions
+
+# 1. Initialize model and backbone
+model = SpatialTranscriptFormer.from_pretrained("./checkpoints/stf_small/")
+extractor = FeatureExtractor(backbone="phikon", device="cuda")
+predictor = Predictor(model, device="cuda")
+
+# 2. Predict from features
+predictions = predictor.predict_wsi(features, coords) # (1, G)
+
+# 3. Integrate with Scanpy
+inject_predictions(adata, coords, predictions[0], gene_names=model.gene_names)
+```
+
+For more details, see the **[Python API Reference](docs/API.md)**.
+
````
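The Quick Start above passes `features` and `coords` to `predict_wsi` but does not show where they come from. For illustration only, suppose `coords` holds the top-left pixel positions of fixed-size patches tiled over the slide. This commit does not define `coords`, so that is an assumption. Under it, a minimal pure-Python sketch of building such a grid with a hypothetical `grid_patch_coords` helper is:

```python
from typing import List, Optional, Tuple


def grid_patch_coords(
    width: int, height: int, patch_size: int, stride: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (x, y) top-left corners of patches that fit entirely inside the image.

    Hypothetical helper for illustration; not part of spatial_transcript_former.
    """
    step = patch_size if stride is None else stride  # non-overlapping by default
    if patch_size <= 0 or step <= 0:
        raise ValueError("patch_size and stride must be positive")
    coords = []
    # Row-major order: rows (y) in the outer loop, columns (x) in the inner loop.
    for y in range(0, height - patch_size + 1, step):
        for x in range(0, width - patch_size + 1, step):
            coords.append((x, y))
    return coords


if __name__ == "__main__":
    print(grid_patch_coords(512, 256, 256))  # [(0, 0), (256, 0)]
```

This sketch drops partial tiles at the right and bottom edges. A real pipeline might pad those tiles or skip background tiles instead.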
````diff
@@ -11,4 +35,5 @@
 ## Key Technical Pillars
 
+- **Modular Architecture**: Decoupled backbones, interaction modules, and output heads.
 - **Quad-Flow Interaction**: Configurable attention between Pathways and Histology patches (`p2p`, `p2h`, `h2p`, `h2h`).
 - **Pathway Bottleneck**: Interpretable gene expression prediction via 50 MSigDB Hallmark tokens.
````
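The Quad-Flow Interaction bullet names four configurable attention flows between pathway tokens and histology patches. The commit does not define the flag convention, so this sketch makes two assumptions:

- Tokens are concatenated as `[pathways; patches]`.
- A flag such as `p2h` means pathway queries may attend to histology keys.

Under those assumptions, a hypothetical `quad_flow_mask` function can build a boolean attention mask:

```python
from typing import List, Set

FLOWS = {"p2p", "p2h", "h2p", "h2h"}


def quad_flow_mask(n_pathways: int, n_patches: int, enabled: Set[str]) -> List[List[bool]]:
    """Return mask where mask[q][k] is True if query token q may attend to key token k.

    Assumed conventions (hypothetical, not taken from the repository):
    tokens are ordered [pathway tokens..., patch tokens...], and a flag "a2b"
    lets queries of type a attend to keys of type b (p = pathway, h = histology).
    """
    unknown = set(enabled) - FLOWS
    if unknown:
        raise ValueError(f"unknown flows: {sorted(unknown)}")
    n = n_pathways + n_patches

    def kind(i: int) -> str:
        return "p" if i < n_pathways else "h"

    return [[f"{kind(q)}2{kind(k)}" in enabled for k in range(n)] for q in range(n)]


if __name__ == "__main__":
    # Pathways and patches exchange information, but neither group self-attends.
    for row in quad_flow_mask(2, 2, {"p2h", "h2p"}):
        print(["x" if allowed else "." for allowed in row])
```

Most attention implementations expect an additive mask (`0` where attention is allowed, `-inf` where it is blocked), and a boolean mask like this one can be converted to that form. If a query row has no allowed keys at all, softmax over that row is undefined. This happens, for example, when both `h2p` and `h2h` are disabled. A real implementation would need to handle that case.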
````diff
@@ -15,6 +40,7 @@
-- **Spatial Pattern Coherence**: Optimized using a composite **MSE + PCC (Pearson Correlation) loss** to prevent spatial collapse and ensure accurate morphology-expression mapping.
+- **Spatial Pattern Coherence**: Optimized using a composite **MSE + PCC (Pearson Correlation) loss**.
 - **Foundation Model Ready**: Native support for **CTransPath**, **Phikon**, **Hibou**, and **GigaPath**.
-- **Biologically Informed Initialization**: Gene reconstruction weights derived from known hallmark memberships.
+
+---
 
 ## License
 
````
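The Spatial Pattern Coherence bullet names a composite MSE + PCC loss, and the removed wording tied it to preventing spatial collapse. The exact formulation is not part of this diff. A minimal sketch, assuming the following (none of which this diff specifies):

- The PCC term is computed per gene across spots.
- It is converted to a penalty `1 - r` and averaged over genes.
- It is added to the MSE with a hypothetical weight `alpha`.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows = spots, columns = genes


def pearson(x: Sequence[float], y: Sequence[float], eps: float = 1e-8) -> float:
    """Pearson correlation between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # eps avoids division by zero when one vector is constant (returns ~0 then)
    return cov / (sx * sy + eps)


def mse_pcc_loss(pred: Matrix, target: Matrix, alpha: float = 1.0) -> float:
    """MSE over all entries plus alpha * mean over genes of (1 - PCC across spots)."""
    n_spots, n_genes = len(pred), len(pred[0])
    mse = sum(
        (p - t) ** 2 for prow, trow in zip(pred, target) for p, t in zip(prow, trow)
    ) / (n_spots * n_genes)
    pcc_penalty = 0.0
    for g in range(n_genes):
        pred_col = [row[g] for row in pred]
        target_col = [row[g] for row in target]
        pcc_penalty += 1.0 - pearson(pred_col, target_col)
    return mse + alpha * pcc_penalty / n_genes
```

A spatially constant prediction has zero variance, so `pearson` returns 0 and each gene adds a penalty of 1, even when the MSE alone looks acceptable.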

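The Pathway Bottleneck bullet describes predicting gene expression through 50 MSigDB Hallmark tokens. The bullet removed in this commit said the gene reconstruction weights were derived from known hallmark memberships. One way to read that is sketched below. It assumes a linear pathway-to-gene map with row-normalized membership weights, uses a hypothetical `membership_init` name, and uses toy gene sets rather than real Hallmark members.

```python
from typing import Dict, List, Sequence, Set


def membership_init(genes: Sequence[str], pathways: Dict[str, Set[str]]) -> List[List[float]]:
    """Return an (n_genes x n_pathways) weight matrix built from set membership.

    W[g][p] = 1 / (number of pathways containing gene g) if g is in pathway p,
    else 0, so every row for a member gene sums to 1. Genes outside all
    pathways keep an all-zero row in this sketch.
    """
    names = list(pathways)
    weights = []
    for gene in genes:
        row = [1.0 if gene in pathways[name] else 0.0 for name in names]
        total = sum(row)
        weights.append([w / total for w in row] if total else row)
    return weights


def reconstruct(weights: List[List[float]], activations: Sequence[float]) -> List[float]:
    """Gene expression for one spot as W @ pathway_activations."""
    return [sum(w * a for w, a in zip(row, activations)) for row in weights]


if __name__ == "__main__":
    toy_sets = {"PATHWAY_A": {"G1", "G2"}, "PATHWAY_B": {"G2", "G3"}}  # toy data
    W = membership_init(["G1", "G2", "G3", "G4"], toy_sets)
    print(reconstruct(W, [2.0, 4.0]))  # [2.0, 3.0, 4.0, 0.0]
```

Starting from membership weights keeps each gene's prediction traceable to the pathways it belongs to. Training could then adjust these weights away from their initial values.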
````diff
@@ -31,12 +57,14 @@ This project is protected by a **Proprietary Source Code License**. See the [LIC
 
 The core architectural innovations, including the **SpatialTranscriptFormer** interaction logic and spatial masking strategies, are the unique Intellectual Property of the author. For a detailed breakdown, see the [IP Statement](docs/IP_STATEMENT.md).
 
+---
+
 ## Installation
 
 This project requires [Conda](https://docs.conda.io/en/latest/).
 
 1. Clone the repository.
 2. Run the automated setup script:
-3. On Windows: `.\setup.ps1`
-4. On Linux/HPC: `bash setup.sh`
+- On Windows: `.\setup.ps1`
+- On Linux/HPC: `bash setup.sh`
 
````

````diff
@@ -43,17 +71,11 @@
-## Usage: HEST-1k Benchmark Recipe
+## Exemplar Recipe: HEST-1k Benchmark
 
-While the core `SpatialTranscriptFormer` framework can be integrated programmatically with any dataset (see the **[Python API Reference](docs/API.md)** and **[Bring Your Own Data Guide](src/spatial_transcript_former/recipes/custom/README.md)**), this repository includes a complete, out-of-the-box CLI pipeline specifically for reproducing our benchmarks on the [HEST-1k dataset](https://huggingface.co/datasets/MahmoodLab/hest).
+The `SpatialTranscriptFormer` repository includes a complete, out-of-the-box CLI pipeline as an exemplar for reproducing our benchmarks on the [HEST-1k dataset](https://huggingface.co/datasets/MahmoodLab/hest).
 
-### Dataset Access
+### 1. Dataset Access & Preprocessing
 
 ```bash
-# List available filtering options
-stf-download --list-options
-
-# Download a specific subset (e.g., Breast Cancer samples from Visium)
+# Download a specific subset
 stf-download --organ Breast --disease Cancer --tech Visium --local_dir hest_data
-
-# Download all human samples
-stf-download --species "Homo sapiens" --local_dir hest_data
 ```
 
````
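The `stf-download` example above selects samples with `--organ`, `--disease`, and `--tech`, and the removed line also showed `--species`. This commit does not show how the CLI applies these flags. A minimal sketch, assuming AND semantics over per-sample metadata records with hypothetical field names (`id`, `organ`, `disease`, `tech`, `species`), is:

```python
from typing import Dict, Iterable, List, Optional


def select_samples(
    records: Iterable[Dict[str, str]],
    organ: Optional[str] = None,
    disease: Optional[str] = None,
    tech: Optional[str] = None,
    species: Optional[str] = None,
) -> List[str]:
    """Return IDs of records that match every filter that was provided.

    Hypothetical helper: field names are assumptions, filters left as None are
    ignored, and matching is exact and case-sensitive.
    """
    wanted = {"organ": organ, "disease": disease, "tech": tech, "species": species}
    active = {field: value for field, value in wanted.items() if value is not None}
    return [
        rec["id"]
        for rec in records
        if all(rec.get(field) == value for field, value in active.items())
    ]
```

With no filters set, every record is returned. This mirrors the idea that each added flag can only narrow the selection.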

````diff
@@ -60,28 +82,7 @@
-> [!NOTE]
-> The HEST dataset is gated on Hugging Face. Ensure you have accepted the terms at [MahmoodLab/hest](https://huggingface.co/datasets/MahmoodLab/hest) and are logged in via `huggingface-cli login`.
-
-### Train Models
-
-We provide presets for baseline models and scaled versions of the SpatialTranscriptFormer.
+### 2. Training with Presets
 
 ```bash
 # Recommended: Run the Interaction model (Small)
 python scripts/run_preset.py --preset stf_small
-
-# Run the lightweight Tiny version
-python scripts/run_preset.py --preset stf_tiny
-
-# Run baselines
-python scripts/run_preset.py --preset he2rna_baseline
-```
-
-For a complete list of configurations, see the [Training Guide](docs/TRAINING_GUIDE.md).
-
-### Real-Time Monitoring
-
-Monitor training progress, loss curves, and **prediction variance (collapse detector)** via the web dashboard:
-
-```bash
-python scripts/monitor.py --run-dir runs/stf_interaction_l4
 ```
 
````
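The Real-Time Monitoring section removed here described the dashboard's **prediction variance (collapse detector)**. The dashboard's logic is not part of this diff. This sketch assumes collapse is flagged when the mean per-gene variance of the predictions across spots falls below a fraction of the ground-truth variance. The hypothetical `collapse_ratio` and `is_collapsed` names and the 0.1 threshold are illustrative choices.

```python
from statistics import pvariance
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows = spots, columns = genes


def mean_gene_variance(values: Matrix) -> float:
    """Average over genes of the population variance across spots."""
    n_genes = len(values[0])
    return sum(pvariance([row[g] for row in values]) for g in range(n_genes)) / n_genes


def collapse_ratio(pred: Matrix, target: Matrix) -> float:
    """Predicted spatial variance relative to ground truth; near 0 means flat maps."""
    true_var = mean_gene_variance(target)
    if true_var == 0:
        raise ValueError("target has no spatial variance; ratio is undefined")
    return mean_gene_variance(pred) / true_var


def is_collapsed(pred: Matrix, target: Matrix, threshold: float = 0.1) -> bool:
    """Flag predictions whose spatial variance is a small fraction of the truth's."""
    return collapse_ratio(pred, target) < threshold
```

Comparing against the ground-truth variance, rather than using an absolute cutoff, keeps the check roughly independent of the scale of the expression values.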

````diff
@@ -88,12 +89,10 @@
-### Inference & Visualization
-
-Generate spatial maps comparing Ground Truth vs Predictions:
+### 3. Inference & Visualization
 
 ```bash
 stf-predict --data-dir A:\hest_data --sample-id MEND29 --model-path checkpoints/best_model.pth --model-type interaction
 ```
 
-Visualization plots will be saved to the `./results` directory.
+Visualization plots and spatial expression maps will be saved to the `./results` directory. For the full guide, see the **[HEST Recipe Docs](src/spatial_transcript_former/recipes/hest/README.md)**.
 
 ## Documentation
 
````

docs/TRAINING_GUIDE.md

Lines changed: 3 additions & 2 deletions
````diff
@@ -1,6 +1,7 @@
-# Training Guide
+# Training Guide (HEST Benchmark Recipe)
 
-This guide provides command-line recipes for training different architectures and configurations using `spatial_transcript_former.train`.
+> [!NOTE]
+> This guide provides command-line recipes specifically for the **HEST-1k benchmark dataset**. If you are looking to train on your own data using the core API, please see the **[Python API Reference](API.md)**.
 
 ## Prerequisites
 
````

src/spatial_transcript_former/recipes/hest/README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -1,6 +1,6 @@
-# HEST-1k Recipe
+# HEST-1k Recipe (Exemplar)
 
-This directory contains the recipe for training `SpatialTranscriptFormer` on the **HEST-1k** benchmark dataset.
+This directory serves as a comprehensive **exemplar** for training `SpatialTranscriptFormer` on the **HEST-1k** benchmark dataset.
 
 While the core `SpatialTranscriptFormer` framework is dataset-agnostic, this recipe provides a complete, out-of-the-box pipeline for reproducing our benchmarks, including data downloading, preprocessing, and specialized dataloaders.
 
````
