RoseTTAFold2-PPI

A fast deep learning method for large-scale protein-protein interaction screening.

Installation

Clone the repository:

git clone https://github.com/CongLabCode/RoseTTAFold2-PPI.git

Download the weights to RoseTTAFold2-PPI/src/models:

cd RoseTTAFold2-PPI/src/models
wget --no-check-certificate https://conglab.swmed.edu/humanPPI/downloads/RF2-PPI.pt

Choose one of the following installation methods:

A. Download our Singularity image (HH-suite is needed only if you want to use our paired MSA generation script):

cd ../..   # back to the RoseTTAFold2-PPI root (the previous step left you in src/models)
wget --no-check-certificate https://conglab.swmed.edu/humanPPI/downloads/SE3nv-20230612.sif
conda install -c conda-forge -c bioconda hhsuite

B. Create a conda environment (if you cannot use Singularity):

conda create -n rf2ppi python=3.9
conda activate rf2ppi
conda install -c conda-forge -c bioconda hhsuite
pip install numpy==1.21.2
pip install pandas==1.5.3
pip install torch==1.12.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install biopython==1.79
pip install scipy==1.7.1
pip install einops

Usage

Using Singularity:

singularity exec \
  --bind /path/to/input_and_output_directory:/work/users \
  --bind /path/to/rosettafold2-ppi/directory:/home/RoseTTAFold2-PPI \
  --nv SE3nv-20230612.sif \
  /bin/bash -c "cd /work/users && python /home/RoseTTAFold2-PPI/src/predict_list_PPI.py -list_fn input_file -model_file model_file"

Using the conda environment:

conda activate rf2ppi
python [/path/to/]RoseTTAFold2-PPI/src/predict_list_PPI.py -list_fn [input_file] -model_file [/path/to]/RoseTTAFold2-PPI/src/models/RF2-PPI.pt
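For scripted or batch runs, the Singularity command above can be assembled in Python so that the bind mounts and container-side paths stay consistent. This is a minimal sketch: the helper name singularity_predict_cmd and all example paths are hypothetical, and it only builds the argument list that mirrors the command shown above; it does not run anything.

```python
import shlex


def singularity_predict_cmd(io_dir, repo_dir, sif, input_file, model_file):
    """Build the `singularity exec` call from the Usage section as an argv list.

    io_dir and repo_dir are host paths. input_file and model_file must be
    paths as seen inside the container (input_file relative to /work/users).
    """
    inner = (
        "cd /work/users && python /home/RoseTTAFold2-PPI/src/predict_list_PPI.py"
        f" -list_fn {shlex.quote(input_file)} -model_file {shlex.quote(model_file)}"
    )
    return [
        "singularity", "exec",
        "--bind", f"{io_dir}:/work/users",
        "--bind", f"{repo_dir}:/home/RoseTTAFold2-PPI",
        "--nv", sif,
        "/bin/bash", "-c", inner,
    ]


cmd = singularity_predict_cmd(
    io_dir="/data/ppi_run",                  # hypothetical host directory with inputs
    repo_dir="/opt/RoseTTAFold2-PPI",        # hypothetical clone location
    sif="/opt/RoseTTAFold2-PPI/SE3nv-20230612.sif",
    input_file="my_pairs_input",             # hypothetical list file inside /data/ppi_run
    model_file="/home/RoseTTAFold2-PPI/src/models/RF2-PPI.pt",
)
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Returning a list rather than one shell string avoids quoting problems when host paths contain spaces.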

Input

Building your own paired MSAs

The [input_file] (see, e.g., examples/protein_pairs_input) should contain one line per pair with two columns:

File path of the paired multiple sequence alignment (MSA) input.
Length of the first protein.

Note: When using Singularity, the paths in the input file must be valid inside the container. Relative paths are resolved against the mounted working directory (/work/users in the command above); absolute paths must point to locations inside the container after the directories are mounted.
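A small helper can generate the input file from a list of paired MSAs. This sketch assumes the two columns are separated by a single space (compare with the files in examples/ before relying on it) and that you know the sequence of the first protein in each pair; the function name, MSA path and sequence are hypothetical.

```python
def format_input_list(pairs):
    """Build the input-file text: one '<paired MSA path> <length of first protein>' line per pair.

    pairs: iterable of (msa_path, first_protein_sequence) tuples.
    """
    lines = [f"{msa_path} {len(first_seq)}" for msa_path, first_seq in pairs]
    return "\n".join(lines) + "\n"


pairs = [("paired_msas/proteinA_proteinB.a3m", "MKTAYIAKQR")]  # hypothetical path and sequence
text = format_input_list(pairs)
print(text, end="")  # paired_msas/proteinA_proteinB.a3m 10

# Save it as the -list_fn file, e.g.:
# with open("my_pairs_input", "w") as fh:
#     fh.write(text)
```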

A simplified pipeline to generate paired MSAs is as follows:

Search for homologs of each protein using a tool such as HHblits (be sure to turn on the "-all" flag in HHblits so that all hits are output)
From each organism, keep only the hit closest to the query (by sequence identity) and discard the other hits
Combine the MSAs of the two proteins by concatenating, for each organism, the hit sequence of the first protein with the hit sequence of the second
Discard sequences that cannot be paired
Remove redundant sequences at 90% or 95% sequence identity using hhfilter
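The selection and pairing steps of this pipeline can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pairing script: each protein's hits are assumed to be already parsed into hypothetical (organism, identity_to_query, aligned_sequence) tuples (for example, with the organism taken from the taxonomy field of each hit header), and all hits for one protein are assumed to be aligned to the same query length.

```python
def best_hit_per_organism(hits):
    """Keep only the hit with the highest identity to the query in each organism."""
    best = {}
    for organism, identity, seq in hits:
        if organism not in best or identity > best[organism][0]:
            best[organism] = (identity, seq)
    return best


def pair_msas(query_a, hits_a, query_b, hits_b):
    """Return paired MSA rows; the first row is the concatenated query."""
    best_a = best_hit_per_organism(hits_a)
    best_b = best_hit_per_organism(hits_b)
    rows = [query_a + query_b]
    for organism, (_, seq_a) in best_a.items():
        # Organisms found for only one of the two proteins cannot be paired
        if organism in best_b:
            rows.append(seq_a + best_b[organism][1])
    return rows


hits_a = [("Mus musculus", 0.92, "MKTA-"), ("Mus musculus", 0.75, "MRTAY"), ("Danio rerio", 0.60, "MKSAY")]
hits_b = [("Mus musculus", 0.88, "GHLL"), ("Gallus gallus", 0.70, "GHLI")]
rows = pair_msas("MKTAY", hits_a, "GHLL", hits_b)
print(rows)  # ['MKTAYGHLL', 'MKTA-GHLL']
```

The zebrafish and chicken hits are dropped because each appears for only one protein; the result would then be written to an MSA file and filtered with hhfilter.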

Using omicMSA

For most human proteins, you can generate paired MSAs from the single-protein omicMSAs we share at https://conglab.swmed.edu/humanPPI/humanPPI_download.html

We share omicMSAs both for entire proteins (protein_omicMSAs.tar.gz) and for segments (segment_omicMSAs.tar.gz), in which long proteins are broken into shorter segments and low-quality positions in the MSAs are excluded.

Commands for generating paired MSAs from omicMSAs

To generate paired MSAs from the omicMSAs we shared, please use the following commands:

For proteins:

python /path/to/RoseTTAFold2-PPI/generate_protein_pair_MSA.py [list_of_protein_pairs] [directory_with_single_protein_MSAs] [output_directory_with_paired_MSAs]

For segments:

python /path/to/RoseTTAFold2-PPI/generate_segment_pair_MSA.py [list_of_segment_pairs] [directory_with_single_segment_MSAs] [output_directory_with_paired_MSAs]

Best practices

Note: Performance depends on the quality of the paired MSAs. Our benchmarks suggest that the following practices improve the accuracy of RoseTTAFold2-PPI:

Use deeper MSAs
Remove low-quality regions, i.e., poorly conserved intrinsically disordered regions
Include only paired sequences and remove any unpaired ones
Remove redundant sequences at 90% or 95% sequence identity after pairing
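Redundancy removal is normally done with hhfilter (for example, `hhfilter -i paired.a3m -o filtered.a3m -id 90`). To show what such a filter does, here is a simplified greedy stand-in; it is not hhfilter's actual algorithm. It assumes equal-length aligned rows with "-" for gaps (no lowercase a3m insertions), treats the first row as the query, and measures identity only over columns where both rows have a residue.

```python
def identity(s1, s2):
    """Fraction of identical residues over columns where neither row has a gap."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def filter_redundant(rows, max_id=0.90):
    """Greedy filter: drop a row whose identity to any already-kept row is >= max_id.

    rows[0] is the query and is always kept.
    """
    kept = [rows[0]]
    for row in rows[1:]:
        if all(identity(row, k) < max_id for k in kept):
            kept.append(row)
    return kept


rows = ["MKTAYIAKQR", "MKTAYIAKQ-", "MRSAWLGKER"]
print(filter_redundant(rows))  # ['MKTAYIAKQR', 'MRSAWLGKER']
```

The second row is identical to the query over its nine non-gap columns and is removed; the third shares only 4 of 10 residues and is kept.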

Output

The output files are saved as [input_file].npz and [input_file].log; see examples/expected_output for examples.

Log File Format (`[input_file].log`)

The log file contains three columns:

The input MSA file name
Predicted interaction probability for the protein/segment pair
Compute time
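To rank pairs after a large screen, the log can be parsed and sorted. This sketch assumes whitespace-separated columns in the order listed above; check examples/expected_output before relying on it. Because the unit format of the compute-time column is not specified here, it is kept as a raw string. The function names, the demo log lines and the 0.5 cutoff are all made up for illustration.

```python
import os
import tempfile
from pathlib import Path


def read_log(log_path):
    """Parse [input_file].log into (msa_name, probability, compute_time) tuples."""
    records = []
    for line in Path(log_path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        try:
            prob = float(fields[1])
        except ValueError:
            continue  # e.g. a header line, if one is present
        records.append((fields[0], prob, fields[2]))
    return records


def top_hits(records, threshold=0.5):
    """Pairs at or above an (arbitrary) probability threshold, highest first."""
    return sorted((r for r in records if r[1] >= threshold), key=lambda r: r[1], reverse=True)


demo = "pairA.a3m 0.91 3.2\npairB.a3m 0.12 2.8\n"  # made-up log content
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as fh:
    fh.write(demo)
records = read_log(fh.name)
os.remove(fh.name)
print(top_hits(records))  # [('pairA.a3m', 0.91, '3.2')]
```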

NPZ File Format (`[input_file].npz`)

The npz file contains inter-residue interaction probabilities. The input MSA file names are used as keys, each pointing to a NumPy matrix of predicted interaction probabilities between residues of the first protein and residues of the second. Each matrix has shape (L1, L2), where L1 and L2 are the lengths of the first and second proteins, respectively.
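The matrices can be read with NumPy to find the residue pairs most likely to be in contact. In this sketch the helper name strongest_contacts and the tiny demo matrix are hypothetical; residue indices are reported 1-based as (residue in first protein, residue in second protein, probability).

```python
import os
import tempfile

import numpy as np


def strongest_contacts(npz_path, k=5):
    """For each key in the .npz, return the k residue pairs with the highest probability."""
    results = {}
    with np.load(npz_path) as data:
        for key in data.files:
            mat = data[key]  # shape (L1, L2)
            order = np.argsort(mat, axis=None)[::-1][:k]  # flat indices, highest first
            rows, cols = np.unravel_index(order, mat.shape)
            results[key] = [(int(i) + 1, int(j) + 1, float(mat[i, j])) for i, j in zip(rows, cols)]
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.npz")
    # Made-up 2 x 3 matrix standing in for one protein pair (L1 = 2, L2 = 3)
    np.savez(path, pairA=np.array([[0.1, 0.8, 0.2], [0.05, 0.3, 0.9]]))
    top = strongest_contacts(path, k=2)
print(top)  # {'pairA': [(2, 3, 0.9), (1, 2, 0.8)]}
```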

Important Note on Prediction Variability

Similar to AlphaFold2, the interaction probability predicted by RoseTTAFold2-PPI is not deterministic. Our benchmark suggests that the standard deviation of the predicted interaction probability can exceed 0.1 (on a 0 to 1 scale) for about 5% of cases (Fig. A below), and this variability is more pronounced for pairs with intermediate interaction probabilities (Fig. B below).
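One way to account for this variability is to run the same input list several times and summarize each pair. This sketch assumes you have already turned each run's .log file into a dict {msa_name: probability}; the function name and demo values are hypothetical, and reusing 0.1 as the instability cutoff is our illustrative choice based on the figure quoted above, not a threshold recommended by the authors.

```python
import statistics


def summarize_runs(runs, std_cutoff=0.1):
    """Combine per-pair probabilities from repeated runs.

    runs: list of dicts {msa_name: probability}, one dict per run.
    Returns {msa_name: (mean, stdev, unstable_flag)} for pairs present in every run.
    """
    names = set.intersection(*(set(r) for r in runs))
    summary = {}
    for name in sorted(names):
        values = [r[name] for r in runs]
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[name] = (mean, sd, sd > std_cutoff)
    return summary


runs = [  # made-up probabilities from three runs of the same input list
    {"pairA": 0.90, "pairB": 0.40},
    {"pairA": 0.92, "pairB": 0.65},
    {"pairA": 0.91, "pairB": 0.55},
]
summary = summarize_runs(runs)
print([name for name, (_, _, unstable) in summary.items() if unstable])  # ['pairB']
```

Averaging over runs gives a more stable estimate, and the flag highlights pairs whose single-run probability should be interpreted with caution.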

Test

Using Singularity:

cd RoseTTAFold2-PPI
exec_dir=$(pwd)
singularity exec \
    --bind $exec_dir:/home/RoseTTAFold2-PPI \
    --nv SE3nv-20230612.sif \
    /bin/bash -c "cd /home/RoseTTAFold2-PPI && python /home/RoseTTAFold2-PPI/src/predict_list_PPI.py -list_fn examples/protein_pairs_input -model_file src/models/RF2-PPI.pt"

Using the conda environment:

conda activate rf2ppi
cd RoseTTAFold2-PPI/examples
python ../src/predict_list_PPI.py -list_fn segment_pairs_input -model_file ../src/models/RF2-PPI.pt

These commands will generate outputs similar to those in examples/expected_output.

Reference

Jing Zhang*, Ian R Humphreys*, Jimin Pei*, Jinuk Kim, Chulwon Choi, Rongqing Yuan, Jesse Durham, Siqi Liu, Hee-Jung Choi, Minkyung Baek, David Baker, Qian Cong. Computing the Human Interactome. (https://www.biorxiv.org/content/10.1101/2024.10.01.615885v1)
