DopamineLcy/ALTA

Efficient medical vision-language alignment through adapting masked vision models (ALTA)

This is the official code for the paper: Efficient medical vision-language alignment through adapting masked vision models (TMI 2025).

ALTA

Getting started

1 Requirement

OS: Ubuntu 20.04 LTS.

Language: Python 3.10.8

If you use conda, the environment can be set up with:

  conda env create -f environment.yaml
  pip install -r requirements.txt

2 Data preparation

  • We use MIMIC-CXR-JPG for pre-training. More information about this dataset is available at Johnson et al., MIMIC-CXR-JPG.
  • The dataset directory specified in run.sh contains the MIMIC-CXR-JPG dataset. Prepare the file "train.csv" as described in the paper and put it into the dataset directory MIMIC-CXR_dataset.
  • Each row of "train.csv" includes the columns image_path, auxview_image_path, last_image_path, last_auxview_image_path, and report, which stand for the paths of the current frontal image, current lateral image, prior frontal image, and prior lateral image, and the content of the report, respectively.
  • The validation set of the RSNA Pneumonia dataset is used for validation. Put the dataset into the directory RSNA_dataset. It can be downloaded from https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge.
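
Before launching pre-training, it can help to confirm that "train.csv" carries the columns listed above. The following is a minimal sketch and not part of the repository: the helper name check_train_csv is hypothetical, and treating an empty image_path or report as a problem is an assumption made for illustration. The other path columns are not checked for emptiness, since the README does not say whether they may be blank.

```python
import csv
import io

# Column names as listed in the data preparation notes.
REQUIRED_COLUMNS = [
    "image_path",
    "auxview_image_path",
    "last_image_path",
    "last_auxview_image_path",
    "report",
]


def check_train_csv(handle):
    """Return (missing_columns, data_rows_with_empty_image_path_or_report).

    Hypothetical helper for illustration; not part of the ALTA code base.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    bad_rows = []
    if not missing:
        # Row numbers count data rows from 1 (the header is excluded).
        for row_number, row in enumerate(reader, start=1):
            image_path = (row["image_path"] or "").strip()
            report = (row["report"] or "").strip()
            if not image_path or not report:
                bad_rows.append(row_number)
    return missing, bad_rows


if __name__ == "__main__":
    # In-memory example; for a real file use open("train.csv", newline="").
    sample = io.StringIO(
        "image_path,auxview_image_path,last_image_path,"
        "last_auxview_image_path,report\n"
        "a.jpg,b.jpg,c.jpg,d.jpg,No acute findings.\n"
        ",b.jpg,c.jpg,d.jpg,Row with a missing frontal image path.\n"
    )
    print(check_train_csv(sample))
```

An empty first list means all expected columns are present; the second list gives the data rows that would need attention under the assumption above.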

3 Pre-trained models preparation

4 Start Training

  • Set the data path, GPU IDs, batch size, output directory, and other parameters in run.sh.

  • Start training by running

    chmod a+x run.sh
    ./run.sh
    

5 Evaluation

Here we provide the trained weights of ALTA. Download them from Google Drive and put them into the directory ALTA_weights.

5.1 Image-to-image retrieval on CheXpert 8×200

  • Prepare the dataset following ConVIRT and put the directories "image-retrieval" and "text-retrieval" into CheXpert8X200_dataset.

  • Run

    python CheXpert8X200_img2img.py
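
For orientation, retrieval of this kind is commonly scored by ranking candidates by cosine similarity to a query embedding and reporting the fraction of the top-k results that share the query's label (Precision@k). The sketch below illustrates that general idea in plain Python with made-up 2-D embeddings. It assumes this metric for illustration only and is not the code in CheXpert8X200_img2img.py; the function names and toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def precision_at_k(query, query_label, candidates, candidate_labels, k):
    """Fraction of the k most similar candidates that share the query label."""
    ranked = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    hits = sum(candidate_labels[i] == query_label for i in ranked[:k])
    return hits / k


# Toy embeddings and labels, invented for this example.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["Edema", "Edema", "Atelectasis", "Atelectasis"]
print(precision_at_k([1.0, 0.05], "Edema", candidates, labels, k=2))
```

In practice the embeddings would come from the trained image encoder, and the score would be averaged over all queries.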
    

5.2 Text-to-image retrieval on CheXpert 8×200

  • The dataset has been prepared in 5.1.

  • Run

    python CheXpert8X200_img2img.py
    

5.3 Image-to-text retrieval on CheXpert 5×200

5.4 Zero-shot classification on CheXpert 5×200

  • The dataset has been prepared in 5.3.

  • Run

    python CheXpert5X200_zeroshot.py
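
As a rough picture of what zero-shot classification involves, a vision-language model typically embeds one text prompt per class and assigns an image to the class whose text embedding is closest to the image embedding. The sketch below shows that generic procedure with invented 3-D vectors. It is an assumption about the overall approach, not the implementation in CheXpert5X200_zeroshot.py, and the names zero_shot_predict and class_text_embeddings are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def zero_shot_predict(image_embedding, class_text_embeddings):
    """Pick the class whose text embedding is most similar to the image.

    Returns the predicted class name and the per-class cosine similarities.
    """
    image = normalize(image_embedding)
    scores = {
        name: sum(a * b for a, b in zip(image, normalize(text)))
        for name, text in class_text_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy text embeddings for two classes, invented for this example.
class_text_embeddings = {
    "Edema": [0.2, 0.9, 0.1],
    "Cardiomegaly": [0.9, 0.1, 0.2],
}
label, scores = zero_shot_predict([0.1, 0.8, 0.3], class_text_embeddings)
print(label)
```

With real models, the vectors would be produced by the image and text encoders, and accuracy would be computed over the whole evaluation set.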
    

5.5 Zero-shot classification on RSNA

Acknowledgments

Some code in this repository is borrowed from MAE, MRM, AIM, GLoRIA, and Hugging Face.

License

This project is released under the CC-BY-NC 4.0 license. See LICENSE for details.
