README.md
# SpatialTranscriptFormer Framework
> [!WARNING]
> **Work in Progress**: This project is under active development. Core architectures, CLI flags, and data formats are subject to major changes.

<!---->

> [!TIP]
> **Framework Release**: SpatialTranscriptFormer has been restructured from a research codebase into a robust framework. You can now use the Python API to train on your own spatial transcriptomics data with custom backbones and architectures.

**SpatialTranscriptFormer** is a modular deep learning framework designed to bridge histology and biological pathways. It leverages transformer architectures to model the interplay between morphological features and gene expression signatures, providing interpretable mapping of the tissue microenvironment.
## Python API: Quick Start
The framework is designed to be integrated programmatically into your scanpy/AnnData workflows:
```python
from spatial_transcript_former import SpatialTranscriptFormer, Predictor, FeatureExtractor
from spatial_transcript_former.predict import inject_predictions

# 1. Initialize model and backbone
model = SpatialTranscriptFormer.from_pretrained("./checkpoints/stf_small/")
```
- **Spatial Pattern Coherence**: Optimized using a composite **MSE + PCC (Pearson Correlation) loss**.
- **Foundation Model Ready**: Native support for **CTransPath**, **Phikon**, **Hibou**, and **GigaPath**.
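The composite objective above can be sketched in a few lines. This is a minimal NumPy illustration of an MSE plus Pearson-correlation penalty; the function name, the `alpha` weighting, and the per-gene reduction are illustrative assumptions, not the framework's actual implementation:

```python
import numpy as np

def mse_pcc_loss(pred, target, alpha=0.5, eps=1e-8):
    """Composite loss: mean squared error plus (1 - Pearson correlation).

    Correlation is computed per gene (column) across spots, which penalizes
    predictions that match magnitudes but miss the spatial pattern.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mse = np.mean((pred - target) ** 2)
    # Center per gene, then correlate across spots
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    denom = np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0)) + eps
    pcc = (p * t).sum(axis=0) / denom
    return alpha * mse + (1.0 - alpha) * np.mean(1.0 - pcc)
```

A perfect prediction drives both terms to zero; a prediction that reproduces the spatial pattern but is rescaled in magnitude is still credited by the correlation term while being penalized by the MSE term.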

---
## License
This project is protected by a **Proprietary Source Code License**. See the [LIC
The core architectural innovations, including the **SpatialTranscriptFormer** interaction logic and spatial masking strategies, are the unique Intellectual Property of the author. For a detailed breakdown, see the [IP Statement](docs/IP_STATEMENT.md).

---

## Installation
This project requires [Conda](https://docs.conda.io/en/latest/).
1. Clone the repository.
2. Run the automated setup script:
   - On Windows: `.\setup.ps1`
   - On Linux/HPC: `bash setup.sh`
## Exemplar Recipe: HEST-1k Benchmark
The `SpatialTranscriptFormer` repository includes a complete, out-of-the-box CLI pipeline as an exemplar for reproducing our benchmarks on the [HEST-1k dataset](https://huggingface.co/datasets/MahmoodLab/hest).
### 1. Dataset Access & Preprocessing
```bash
# Download a specific subset
stf-download --organ Breast --disease Cancer --tech Visium --local_dir hest_data
```

> The HEST dataset is gated on Hugging Face. Ensure you have accepted the terms at [MahmoodLab/hest](https://huggingface.co/datasets/MahmoodLab/hest) and are logged in via `huggingface-cli login`.
Visualization plots and spatial expression maps will be saved to the `./results` directory. For the full guide, see the **[HEST Recipe Docs](src/spatial_transcript_former/recipes/hest/README.md)**.
---

docs/TRAINING_GUIDE.md
# Training Guide (HEST Benchmark Recipe)
> [!NOTE]
> This guide provides command-line recipes specifically for the **HEST-1k benchmark dataset**. If you are looking to train on your own data using the core API, please see the **[Python API Reference](API.md)**.
---

src/spatial_transcript_former/recipes/hest/README.md
# HEST-1k Recipe (Exemplar)
This directory serves as a comprehensive **exemplar** for training `SpatialTranscriptFormer` on the **HEST-1k** benchmark dataset.
While the core `SpatialTranscriptFormer` framework is dataset-agnostic, this recipe provides a complete, out-of-the-box pipeline for reproducing our benchmarks, including data downloading, preprocessing, and specialized dataloaders.