TAFFISH wrapper for ESMFold, the protein language model structure-prediction interface from the ESM project.
This repository packages fair-esm 2.0.0 with the upstream ESMFold command-line
script as a TAFFISH tool app. The published command is
taf-esm-fold; the in-container upstream command is esm-fold.
Release 2.0.0-r3 is a help-only TAFFISH update. It keeps the upstream
software, Dockerfile, runtime dependencies, smoke tests, and command behavior
unchanged from 2.0.0-r2, and refreshes the terminal taf-esm-fold --help
text.
Install from the public TAFFISH Hub index:
taf update
taf install esm-fold
Install the exact release:
taf install esm-fold 2.0.0-r3
For local testing before the app is published to the public index:
taf install --from .
Show TAFFISH app help:
taf-esm-fold --help
Show upstream ESMFold help:
taf-esm-fold esm-fold -h
taf-esm-fold -- -h
Predict structures from a protein FASTA file:
taf-esm-fold esm-fold -i proteins.fa -o pdb-out
Reduce memory use for long proteins:
taf-esm-fold esm-fold -i proteins.fa -o pdb-out --chunk-size 128
Reduce batch size when short sequences still run out of GPU memory:
taf-esm-fold esm-fold -i proteins.fa -o pdb-out --max-tokens-per-batch 512
The default command is esm-fold, so calls that start with ESMFold options can
instead use the TAFFISH -- separator:
taf-esm-fold -- -i proteins.fa -o pdb-out --chunk-size 128
Because this is a command-mode TAFFISH tool, the first non-option argument is treated as an executable inside the container. For normal ESMFold use, name the upstream command explicitly:
taf-esm-fold esm-fold -i proteins.fa -o pdb-out
taf-esm-fold python -c 'import esm; print(esm.pretrained.esmfold_v1)'
ESMFold is much more practical with GPU acceleration, but the upstream CLI also
supports --cpu-only. This app therefore uses a conservative automatic GPU
policy rather than always adding GPU flags.
By default, TAFFISH_ESM_FOLD_GPU=auto. In auto mode, the wrapper adds the
backend GPU flag only when the host appears to be a Linux NVIDIA environment:
Docker: --gpus all
Podman: --device nvidia.com/gpu=all
Apptainer: --nv
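The backend-to-flag table above can be sketched as a small bash function. This is a minimal sketch, not the wrapper's actual code: the function names `gpu_flag_for_backend` and `looks_like_linux_nvidia` are hypothetical, and the NVIDIA check (a Linux kernel plus `nvidia-smi` on PATH) is only an assumed stand-in for whatever heuristic the wrapper really uses.

```bash
# Hypothetical helper: print the GPU flag for a container backend,
# following the Docker/Podman/Apptainer table above.
gpu_flag_for_backend() {
  case "$1" in
    docker)    printf '%s\n' '--gpus all' ;;
    podman)    printf '%s\n' '--device nvidia.com/gpu=all' ;;
    apptainer) printf '%s\n' '--nv' ;;
    *)         return 1 ;;  # unknown backend: no GPU flag
  esac
}

# Assumption: "appears to be a Linux NVIDIA environment" is approximated
# here as a Linux kernel with nvidia-smi available on PATH.
looks_like_linux_nvidia() {
  [ "$(uname -s)" = "Linux" ] && command -v nvidia-smi >/dev/null 2>&1
}
```

For example, `looks_like_linux_nvidia && gpu_flag_for_backend docker` prints `--gpus all` only on a host that passes the assumed check.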
Force GPU runtime flags when auto-detection is too conservative:
TAFFISH_ESM_FOLD_GPU=1 taf-esm-fold esm-fold -i proteins.fa -o pdb-out
Force CPU mode, with no GPU flags passed to the container:
TAFFISH_ESM_FOLD_GPU=0 taf-esm-fold esm-fold --cpu-only -i proteins.fa -o pdb-out
Valid values are auto, 1/true/yes/on, and 0/false/no/off.
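The accepted values map onto three states. A minimal bash sketch of that normalization, assuming case-sensitive matching (the real wrapper may be more lenient); `normalize_gpu_setting` is a hypothetical name:

```bash
# Hypothetical helper: normalize a TAFFISH_ESM_FOLD_GPU value to auto/on/off.
# An empty or missing value falls back to the documented default, auto.
normalize_gpu_setting() {
  case "${1:-auto}" in
    auto)           echo auto ;;
    1|true|yes|on)  echo on ;;
    0|false|no|off) echo off ;;
    *)
      echo "invalid TAFFISH_ESM_FOLD_GPU value: $1" >&2
      return 2
      ;;
  esac
}
```

Calling `normalize_gpu_setting "${TAFFISH_ESM_FOLD_GPU:-}"` prints `auto` when the variable is unset; rejecting unknown values instead of silently treating them as off keeps typos visible.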
If auto-detection adds GPU flags on a host whose container backend is not yet
configured for NVIDIA access, either configure that backend or set
TAFFISH_ESM_FOLD_GPU=0 for CPU-only runs.
The image patches the upstream ESMFold CLI so --cpu-only converts the model to
float32 before inference. Without this, PyTorch CPU LayerNorm can fail on
half-precision tensors. CPU-only mode is still very slow and memory-heavy; local
validation used a 36 aa test sequence, about 8 GB of cached weights, and a Docker
VM with about 16 GB memory.
TAFFISH_DOCKER_RUN_ARGS, TAFFISH_PODMAN_RUN_ARGS, and
TAFFISH_APPTAINER_RUN_ARGS remain available for local site policy, such as
extra mounts or cluster-specific runtime options. They are appended after the
app-declared runtime arguments.
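The ordering rule (site arguments after app-declared ones) can be illustrated with a bash sketch for the Docker case. The app-declared part here is only the `--platform linux/amd64` option mentioned later for src/main.taf; splitting TAFFISH_DOCKER_RUN_ARGS on whitespace, without honoring quotes, is an assumption about how the wrapper tokenizes it, and `build_docker_run_args` is a hypothetical name.

```bash
# Hypothetical sketch: assemble docker run options, app-declared first,
# then site-local options from TAFFISH_DOCKER_RUN_ARGS.
build_docker_run_args() {
  local -a args=(--platform linux/amd64)  # app-declared (assumed subset)
  local -a site=()
  # Assumption: plain whitespace splitting; quoted values are not honored.
  read -r -a site <<< "${TAFFISH_DOCKER_RUN_ARGS:-}"
  args+=("${site[@]}")
  printf '%s\n' "${args[@]}"              # one option word per line
}
```

Because site options come last, a later duplicate of an app-declared option can take precedence for runtimes that honor the last occurrence, which is what makes these variables useful for local policy.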
Even tiny CPU-only predictions must load the full ESMFold model (downloading it on first use), so they are not used as publish smoke tests.
This app exposes the upstream ESMFold FASTA-to-PDB CLI:
esm-fold -i FASTA -o PDB_DIR
Common upstream options include:
--num-recycles
--max-tokens-per-batch
--chunk-size
--cpu-only
--cpu-offload
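To make the token budget behind --max-tokens-per-batch concrete, here is an illustrative bash/awk sketch. It assumes sequences are sorted by length and packed greedily into batches whose total residue count stays within the budget, with an over-budget sequence getting a batch of its own. It is an approximation for planning runs, not the upstream batching code, and `fasta_batches` is a hypothetical helper.

```bash
# Hypothetical helper: preview greedy batching of a FASTA file under a
# residue budget. Usage: fasta_batches FASTA BUDGET
fasta_batches() {
  local fasta=$1 budget=$2
  # Pass 1: emit "length<TAB>header" per record (whitespace stripped).
  awk '
    /^>/ { if (name != "") print len "\t" name; name = substr($0, 2); len = 0; next }
    { gsub(/[[:space:]]/, ""); len += length($0) }
    END { if (name != "") print len "\t" name }
  ' "$fasta" |
  sort -n -k1,1 |
  # Pass 2: start a new batch when the next sequence would exceed the budget.
  awk -F '\t' -v budget="$budget" '
    {
      if (tokens > 0 && tokens + $1 > budget) { batch++; tokens = 0 }
      tokens += $1
      printf "batch %d\t%s\t%d aa\n", batch + 1, $2, $1
    }
  '
}
```

Running `fasta_batches proteins.fa 512` before a real prediction shows roughly how many sequences share a batch; if a single long protein fills a batch on its own, lowering --chunk-size is the more relevant knob.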
The upstream script supports --cpu-only. In practice, ESMFold model inference
is large and slow without GPU acceleration, so GPU-backed Docker/Podman/Apptainer
is the recommended route for real predictions.
Model weights are not baked into the image. On first real prediction, PyTorch
downloads the ESMFold checkpoint plus the underlying ESM2 model weights into the
user's Torch cache. In local validation, the first-run cache was about 8 GB
(esmfold_3B_v1.pt, esm2_t36_3B_UR50D.pt, and a small contact-regression
file). The wrapper sets TORCH_HOME to
${TORCH_HOME:-$HOME/.cache/taffish/esm-fold/torch} at container runtime, so
the default cache is under the host home directory that TAFFISH mounts into the
container. This keeps the image smaller and lets the same host cache be reused
across runs.
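The default cache rule is ordinary shell parameter expansion. The bash sketch below resolves it and reports what is already cached, assuming torch.hub stores downloaded checkpoints under "$TORCH_HOME/hub/checkpoints"; both function names are hypothetical.

```bash
# Resolve the cache path: an explicit non-empty TORCH_HOME wins, otherwise
# fall back to the per-app directory under $HOME.
esm_fold_torch_home() {
  printf '%s\n' "${TORCH_HOME:-$HOME/.cache/taffish/esm-fold/torch}"
}

# Assumption: checkpoints land in "$TORCH_HOME/hub/checkpoints".
# Reports how much of the first-run download is already on disk.
show_esm_fold_cache() {
  local dir
  dir="$(esm_fold_torch_home)/hub/checkpoints"
  if [ -d "$dir" ]; then
    du -sh "$dir"
    ls "$dir"
  else
    echo "no cached weights yet under $dir"
  fi
}
```

Running `show_esm_fold_cache` on the host before a first prediction is a quick way to tell whether the roughly 8 GB download is still pending.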
For shared scratch or cluster storage, choose an explicit cache directory:
TORCH_HOME=/path/to/torch-cache taf-esm-fold esm-fold -i proteins.fa -o pdb-out
This app does not bundle AlphaFold, ColabFold, template databases, MSA
databases, or protein visualization tools. ESMFold predicts directly from
single protein sequences and does not require an MSA database for the standard
esm-fold path.
This release declares native linux/amd64 only. The container is based on an
NVIDIA CUDA devel image and installs CUDA-enabled PyTorch plus OpenFold.
Apple Silicon and other arm64 hosts are not native targets for this release. For
Docker and Podman, src/main.taf declares --platform linux/amd64, so amd64
emulation may be useful for inspecting the image and running lightweight
commands. It does not expose the Apple GPU to this Linux CUDA container. PyTorch
MPS is a macOS-native backend, not a Docker Linux container GPU passthrough
mechanism.
On Apple Silicon macOS, use Docker emulation and disable app-managed GPU flags:
TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_ESM_FOLD_GPU=0 \
taf-esm-fold esm-fold -h
CPU-only prediction can also be attempted through amd64 emulation:
TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_ESM_FOLD_GPU=0 \
taf-esm-fold esm-fold --cpu-only -i proteins.fa -o pdb-out
This is a compatibility path, not native Apple GPU support. It needs the full model cache, enough Docker Desktop memory, and patience; local validation used about 16 GB of Docker VM memory.
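Scripts that run on both Linux and Apple Silicon hosts could pick these settings automatically. A minimal bash sketch, assuming `uname -s`/`uname -m` report `Darwin`/`arm64` on Apple Silicon macOS; `esm_fold_env_for_host` is a hypothetical helper that only prints settings and runs nothing:

```bash
# Hypothetical helper: print suggested environment settings for an
# OS/architecture pair (defaults to the current host via uname).
esm_fold_env_for_host() {
  local os=${1:-$(uname -s)} arch=${2:-$(uname -m)}
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    # Apple Silicon: Docker amd64 emulation, no app-managed GPU flags.
    echo "TAFFISH_CONTAINER_BACKEND=docker"
    echo "TAFFISH_ESM_FOLD_GPU=0"
  else
    # Other hosts: keep the documented default automatic GPU policy.
    echo "TAFFISH_ESM_FOLD_GPU=auto"
  fi
}
```

The output can be passed through `env`, for example `env $(esm_fold_env_for_host) taf-esm-fold esm-fold -h`.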
name: esm-fold
command: taf-esm-fold
version: 2.0.0-r3
kind: tool
image: ghcr.io/taffish/esm-fold:2.0.0-r3
The container image is built from docker/Dockerfile. It starts from
nvidia/cuda:11.3.1-devel-ubuntu20.04, installs Python 3, CUDA-enabled PyTorch
1.12.1, fair-esm[esmfold] 2.0.0, NVIDIA dllogger, and OpenFold at the
upstream commit recommended by the ESM project.
The Dockerfile also installs the upstream scripts/esmfold_inference.py file as
/usr/local/bin/esm-fold, because the fair-esm wheel provides the Python
package but not a console entry point. The installed script is patched for
CPU-only float32 inference inside this container.
The TAFFISH metadata declares a Docker smoke check:
exist: esm-fold, python, python3, pip, nvcc
test: upstream CLI help is available
test: fair-esm is pinned to 2.0.0
test: PyTorch imports with the pinned 1.12.1 runtime
test: esm, openfold, deepspeed, dllogger, and dm-tree import
test: the CPU-only float32 patch is present
test: esm.data.read_fasta parses a tiny FASTA file
Smoke does not run a full prediction, because that would download large model weights and require a real GPU during index validation.
Manual local validation also completed a CPU-only prediction for a 36 aa FASTA sequence and produced a 300-line PDB file after the model cache was prepared.
- Project: ESM / ESMFold
- Homepage: https://github.com/facebookresearch/esm
- Release: https://github.com/facebookresearch/esm/releases/tag/v2.0.0
- PyPI: https://pypi.org/project/fair-esm/2.0.0/
- License: MIT
- Citation: Lin et al. 2023, doi:10.1126/science.ade2574, PMID:36927031