taf-mmseqs2

TAFFISH wrapper for MMseqs2, an ultra-fast sequence search, clustering, and taxonomy suite for protein and nucleotide datasets.

This repository packages upstream MMseqs2 release 18-8cc5c as the TAFFISH package version 18-r2. The short TAFFISH version keeps package names stable and readable, while the Dockerfile, upstream metadata, and smoke tests pin the exact upstream release tag and binary commit.

Installation

Install from the public TAFFISH Hub index:

taf update
taf install mmseqs2

Install the exact release:

taf install mmseqs2 18-r2

For local testing before the app is published to the public index:

taf install --from .

Usage

Show TAFFISH app help:

taf-mmseqs2 --help

Show the TAFFISH package version:

taf-mmseqs2 --version

Show upstream MMseqs2 version:

taf-mmseqs2 mmseqs version

Show upstream command help:

taf-mmseqs2 mmseqs
taf-mmseqs2 mmseqs easy-search
taf-mmseqs2 mmseqs createdb

Run an easy protein search directly from FASTA files:

taf-mmseqs2 mmseqs easy-search query.fa target.fa result.m8 tmp --threads 8 -s 5.7

Create and index a reusable target database:

taf-mmseqs2 mmseqs createdb target.fa targetDB --threads 8
taf-mmseqs2 mmseqs createindex targetDB tmp --threads 8
taf-mmseqs2 mmseqs easy-search query.fa targetDB result.m8 tmp --threads 8

Cluster sequences:

taf-mmseqs2 mmseqs easy-cluster sequences.fa clusterRes tmp --min-seq-id 0.5 -c 0.8 --cov-mode 1 --threads 8
taf-mmseqs2 mmseqs easy-linclust sequences.fa linclusterRes tmp --threads 8
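
To inspect a clustering result, the following is a minimal sketch that counts members per representative. It assumes the easy-cluster run above wrote a two-column, tab-separated file named clusterRes_cluster.tsv (representative, then member). The helper name cluster_sizes is hypothetical and is not part of MMseqs2 or TAFFISH.

```shell
# cluster_sizes FILE
# Hypothetical helper: prints "representative<TAB>member_count", largest cluster first.
# Assumes FILE has two tab-separated columns: representative, member.
cluster_sizes() {
    awk -F '\t' '
        { n[$1]++ }                                   # count members per representative
        END { for (r in n) printf "%s\t%d\n", r, n[r] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1   # by size descending, then by name
}
```

A typical call would be cluster_sizes clusterRes_cluster.tsv | head.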

Convert database-format search output to a tabular alignment file:

taf-mmseqs2 mmseqs createdb query.fa queryDB
taf-mmseqs2 mmseqs createdb target.fa targetDB
taf-mmseqs2 mmseqs search queryDB targetDB resultDB tmp --threads 8
taf-mmseqs2 mmseqs convertalis queryDB targetDB resultDB result.m8
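
Once result.m8 exists, a small filter can reduce it to one hit per query. This is a sketch under the assumption that the file uses the BLAST-tab-like default column order, with the query in column 1, the e-value in column 11, and the bit score in column 12. The helper name best_hits and its default e-value cutoff are illustrative choices, not upstream defaults.

```shell
# best_hits FILE [MAX_EVALUE]
# Hypothetical helper: keeps, for each query, the row with the highest bit score
# among rows whose e-value is at or below MAX_EVALUE (default 1e-5, an arbitrary choice).
# Assumes tab-separated columns with query=$1, evalue=$11, bits=$12.
best_hits() {
    m8="$1"
    max_evalue="${2:-1e-5}"
    awk -F '\t' -v max="$max_evalue" '
        ($11 + 0) <= (max + 0) && (!($1 in best) || ($12 + 0) > score[$1]) {
            best[$1]  = $0        # remember the full row
            score[$1] = $12 + 0   # and its bit score for later comparisons
        }
        END { for (q in best) print best[q] }
    ' "$m8" | sort                # awk array order is unspecified, so sort the rows
}
```

For example, best_hits result.m8 1e-10 > result.best.m8 would write the filtered table to a new file.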

Download one of the reference databases supported by the upstream databases workflow:

taf-mmseqs2 mmseqs databases UniProtKB/Swiss-Prot swissprot tmp
taf-mmseqs2 mmseqs easy-search query.fa swissprot result.m8 tmp --threads 8

Because this is a command-mode TAFFISH tool, the first non-option argument is the in-container command. MMseqs2's actual executable is named mmseqs, so the clearest form is:

taf-mmseqs2 mmseqs easy-search ...
taf-mmseqs2 mmseqs easy-cluster ...
taf-mmseqs2 mmseqs createdb ...

Do not use taf-mmseqs2 easy-search ... as the normal form. In command mode, TAFFISH will interpret easy-search as an executable inside the container, not as a subcommand of mmseqs.

The -- separator is only needed when the arguments passed to the default mmseqs command begin with an option:

taf-mmseqs2 -- -h
taf-mmseqs2 -- --help

For MMseqs2 modules such as version, easy-search, or createdb, use the explicit taf-mmseqs2 mmseqs <module> ... form.
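
If typing the explicit form repeatedly is tedious, a shell function can add the mmseqs token for you. The sketch below is a convenience only. The function name mm and the TAF_MMSEQS2_BIN override variable are hypothetical and are not provided by TAFFISH.

```shell
# mm MODULE [ARGS...]
# Hypothetical convenience wrapper: always inserts the in-container command name
# "mmseqs" before the module, so "mm easy-search ..." becomes
# "taf-mmseqs2 mmseqs easy-search ...".
# TAF_MMSEQS2_BIN is an assumed override hook, useful for dry runs and testing.
mm() {
    "${TAF_MMSEQS2_BIN:-taf-mmseqs2}" mmseqs "$@"
}
```

Setting TAF_MMSEQS2_BIN=echo before calling mm easy-search query.fa target.fa result.m8 tmp prints the argument list that would be passed to taf-mmseqs2 instead of running it.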

This README lists common usage patterns, not the full upstream manual. The TAFFISH wrapper calls the upstream mmseqs command directly, so official MMseqs2 modules and options are available as upstream implements them. Use taf-mmseqs2 mmseqs or the upstream user guide for the complete module list. Release 18 modules such as fwbw, pairaln, taxonomyreport, and createdmptaxonomy are available through the same taf-mmseqs2 mmseqs ... form even though they are not expanded in the common examples above.

Package

name: mmseqs2
command: taf-mmseqs2
version: 18-r2
kind: tool
image: ghcr.io/taffish/mmseqs2:18-r2
upstream release: 18-8cc5c
upstream binary commit: 8cc5ce367b5638c4306c2d7cfc652dd099a4643f

Container

The container image is built from docker/Dockerfile. It starts from alpine:3.20, downloads official static MMseqs2 release binaries from GitHub, and verifies every downloaded archive with the upstream GitHub release sha256 digests. Release 18-r2 keeps the same upstream MMseqs2 release as 18-r1, but rebuilds the runtime layer on Alpine to reduce image size after confirming that the official binaries do not need glibc dynamic libraries.
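
The digest check described above can be sketched as follows. This is an illustration of the general technique, not the contents of docker/Dockerfile. The function name verify_sha256 is hypothetical, and the sketch assumes a sha256sum utility is available.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HEX.
# Assumes sha256sum (GNU coreutils or BusyBox) is on PATH.
verify_sha256() {
    file="$1"
    expected="$2"
    actual="$(sha256sum "$file" | awk '{ print $1 }')"   # first field is the hex digest
    [ "$actual" = "$expected" ]
}
```

In a build step, a failed check should abort the build, for example verify_sha256 archive.tar.gz "$EXPECTED" || exit 1, where the archive name and the EXPECTED variable are placeholders.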

For linux/amd64, the image installs the official sse2, sse41, and avx2 CPU binaries. The /usr/local/bin/mmseqs launcher selects the fastest available binary at runtime based on /proc/cpuinfo, falling back to SSE2 on older CPUs.
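
The runtime selection can be pictured with a short sketch. This is not the actual /usr/local/bin/mmseqs launcher. It only illustrates the idea, assuming the Linux /proc/cpuinfo flag names avx2 and sse4_1. The function name select_mmseqs_variant is hypothetical.

```shell
# select_mmseqs_variant [CPUINFO_FILE]
# Hypothetical sketch: prints avx2, sse41, or sse2 depending on the CPU flags
# found in CPUINFO_FILE (default /proc/cpuinfo). Falls back to sse2 when the
# file is missing or lists neither avx2 nor sse4_1.
select_mmseqs_variant() {
    cpuinfo="${1:-/proc/cpuinfo}"
    flags="$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null)"   # first "flags : ..." line
    case " ${flags#*:} " in                                # pad so every flag is space-delimited
        *" avx2 "*)   echo avx2 ;;
        *" sse4_1 "*) echo sse41 ;;
        *)            echo sse2 ;;
    esac
}
```

Passing a file path as the first argument makes the selection logic easy to exercise without depending on the host CPU.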

For linux/arm64, the image installs the official mmseqs-linux-arm64 binary. The upstream matrices, examples, README, license, bash-completion script, and user guide are kept under /opt/mmseqs2.

The image includes these user-facing commands and runtime tools:

mmseqs
bash
gawk
grep
curl
wget
xargs
tar
gzip
bzip2
xz

The downloader and shell tools are included because upstream workflows such as mmseqs databases and taxonomy database setup can call external download, archive, and text-processing utilities. This image intentionally keeps the lighter curl/wget/xargs path instead of bundling aria2c. The core search and clustering workflows are provided by the upstream mmseqs binary itself.

This TAFFISH package is a CPU MMseqs2 build. Upstream release 18-8cc5c also publishes GPU archives, but those CUDA-enabled binaries are not bundled in this image. Workflows that pass --gpu 1 require a future GPU-specific TAFFISH app or a custom image with the upstream GPU binary and backend GPU runtime options.

The official precompiled static binaries do not include MPI support. Use a custom MPI build if you need the upstream multi-node RUNNER=mpirun ... execution mode.

The image is built and validated for:

linux/amd64
linux/arm64

The TAFFISH metadata declares a Docker smoke check:

exist: mmseqs, bash, gawk, grep, curl, wget, xargs, tar, gzip, bzip2, xz
test:  mmseqs version reports the pinned upstream binary commit
test:  bundled CPU-specific binary variants report the pinned commit
test:  top-level MMseqs2 help is available
test:  easy-search, createdb, and databases help surfaces are available
test:  selected release 18 and taxonomy-related module help surfaces are available
test:  shell runtime is usable
test:  createdb accepts gzip and bzip2 FASTA inputs
test:  a tiny easy-search produces a tabular hit
test:  a tiny easy-cluster produces a cluster TSV
test:  tiny easy-linclust and easy-rbh workflows run
test:  createdb, search, convertalis, createtsv, and convert2fasta run as a DB workflow

During TAFFISH Hub indexing, this smoke metadata verifies that the published image exposes the expected command surface, reports the pinned upstream binary commit, includes the helper runtime tools needed by upstream workflows, and can run representative local sequence search and clustering tasks. It does not download remote databases or exhaustively validate every MMseqs2 module.

Each smoke command is self-contained because the public index runs every [smoke].test entry in a fresh temporary container. No smoke entry depends on files created by an earlier entry.
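
The pinned-commit test can be approximated locally with a sketch like the one below. It assumes only that the output of mmseqs version contains the commit hash as a substring. The function name reports_pinned_commit is hypothetical, and this is not the actual [smoke].test entry.

```shell
# reports_pinned_commit VERSION_OUTPUT PINNED_COMMIT
# Hypothetical helper: succeeds if PINNED_COMMIT is non-empty and appears
# somewhere in VERSION_OUTPUT.
reports_pinned_commit() {
    version_output="$1"
    pinned_commit="$2"
    [ -n "$pinned_commit" ] || return 1    # an empty pin would match anything
    case "$version_output" in
        *"$pinned_commit"*) return 0 ;;    # quoted, so the hash is matched literally
        *)                  return 1 ;;
    esac
}
```

A local check could then look like reports_pinned_commit "$(taf-mmseqs2 mmseqs version)" 8cc5ce367b5638c4306c2d7cfc652dd099a4643f && echo ok.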

Upstream

Maintainer Notes

Useful checks before publishing:

taf check
taf compile -- mmseqs version
taf publish --release --dry-run
docker build -t ghcr.io/taffish/mmseqs2:18-r2 -f docker/Dockerfile .
docker build --platform linux/amd64 -t ghcr.io/taffish/mmseqs2:18-r2-amd64-test -f docker/Dockerfile .
docker build --platform linux/arm64 -t ghcr.io/taffish/mmseqs2:18-r2-arm64-test -f docker/Dockerfile .
docker run --rm ghcr.io/taffish/mmseqs2:18-r2 mmseqs version
docker run --rm ghcr.io/taffish/mmseqs2:18-r2 mmseqs easy-search

The repository wrapper files are licensed under Apache-2.0. Upstream MMseqs2 is distributed under the MIT license, and bundled third-party components remain under their own upstream licenses.
