taf-evidencemodeler

TAFFISH wrapper for EVidenceModeler (EVM), a eukaryotic gene-structure annotation tool that combines ab initio gene predictions, protein alignments, transcript alignments, and other evidence into weighted consensus gene models.

This repository packages EVidenceModeler 2.1.0 as a TAFFISH tool app. The published command is taf-evidencemodeler; the default in-container upstream command is EVidenceModeler.

Installation

Install from the public TAFFISH Hub index:

taf update
taf install evidencemodeler

Install the exact release:

taf install evidencemodeler 2.1.0-r1

For local testing before the app is published to the public index:

taf install --from .

Usage

Show TAFFISH app help:

taf-evidencemodeler --help

Show upstream EVidenceModeler help:

taf-evidencemodeler EVidenceModeler --help
taf-evidencemodeler -- --help

Run the upstream bundled sample:

mkdir evm-test
cd evm-test
taf-evidencemodeler cp -a /opt/evidencemodeler/testing/. .
taf-evidencemodeler EVidenceModeler \
  --sample_id smalltest \
  --genome genome.fasta \
  --weights weights.txt \
  --gene_predictions gene_predictions.gff3 \
  --protein_alignments protein_alignments.gff3 \
  --transcript_alignments transcript_alignments.gff3 \
  --segmentSize 100000 \
  --overlapSize 10000 \
  --CPU 4

The main outputs from that run are:

smalltest.EVM.gff3
smalltest.EVM.pep
smalltest.EVM.cds
smalltest.EVM.bed
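
A quick sanity check after such a run might look like the sketch below. It is not part of the TAFFISH app: `check_evm_outputs` and `count_genes` are hypothetical helper names, and the gene count assumes the output GFF3 marks gene records with `gene` in the third column.

```shell
# Hypothetical helper: confirm the four expected outputs exist and are non-empty.
check_evm_outputs() {
  local sample_id ext status
  sample_id="$1"
  status=0
  for ext in gff3 pep cds bed; do
    if [ ! -s "${sample_id}.EVM.${ext}" ]; then
      echo "missing or empty: ${sample_id}.EVM.${ext}" >&2
      status=1
    fi
  done
  return "$status"
}

# Hypothetical helper: count "gene" features (column 3) in a GFF3 file,
# ignoring comment lines.
count_genes() {
  awk -F'\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}
```

For the bundled sample this could be called as `check_evm_outputs smalltest && count_genes smalltest.EVM.gff3`.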

For a real run, provide your own genome FASTA, weight file, and GFF3 evidence:

taf-evidencemodeler EVidenceModeler \
  --sample_id sample1 \
  --genome genome.fa \
  --weights weights.txt \
  --gene_predictions gene_predictions.gff3 \
  --protein_alignments protein_alignments.gff3 \
  --transcript_alignments transcript_alignments.gff3 \
  --segmentSize 100000 \
  --overlapSize 10000 \
  --CPU 8

The default command is EVidenceModeler, so calls that begin with an option can pass it after the TAFFISH -- separator:

taf-evidencemodeler -- --version
taf-evidencemodeler -- --help

Because this is a command-mode TAFFISH tool, the first non-option argument is treated as an executable inside the container. For normal EVM use, name the upstream command explicitly:

taf-evidencemodeler EVidenceModeler --help
taf-evidencemodeler create_weights_file.pl -h
taf-evidencemodeler augustus_GFF3_to_EVM_GFF3.pl augustus.gff3 > augustus.evm.gff3
taf-evidencemodeler miniprot_GFF_2_EVM_GFF3.py miniprot.gff > miniprot.evm.gff3

Evidence Inputs

EVidenceModeler does not run gene predictors or aligners for you. It combines evidence that has already been generated and converted into EVM-compatible GFF3 formats.

Required inputs:

--sample_id
--genome
--weights
--gene_predictions
--segmentSize
--overlapSize

Optional but commonly used inputs:

--protein_alignments
--transcript_alignments
--repeats
--terminalExonsFile

The weights.txt file gives each evidence source a class and numeric weight. The bundled helper can infer source names from GFF3 files:

taf-evidencemodeler create_weights_file.pl \
  -A gene_predictions.gff3 \
  -P protein_alignments.gff3 \
  -T transcript_alignments.gff3 > weights.txt

Review and edit the generated weights before real analysis; EVM weights are a biological modeling choice, not just a file-format detail.
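
As an aid to that review, the sketch below lists the source names (GFF3 column 2) found in the evidence files and reports any that have no row in the weights file. It assumes the usual three-column weights layout (evidence class, source name, weight); `gff3_sources` and `unweighted_sources` are hypothetical helper names, not scripts shipped in the image.

```shell
# List distinct source names (GFF3 column 2), skipping comments and short lines.
gff3_sources() {
  awk -F'\t' '!/^#/ && NF >= 9 { print $2 }' "$@" | sort -u
}

# Print sources present in the GFF3 files but absent from column 2 of the
# weights file.
# Usage: unweighted_sources weights.txt evidence1.gff3 [evidence2.gff3 ...]
unweighted_sources() {
  local weights
  weights="$1"
  shift
  gff3_sources "$@" | while IFS= read -r src; do
    # awk exits 0 only if some weights row names this source.
    awk -v s="$src" '$2 == s { found = 1 } END { exit !found }' "$weights" \
      || printf '%s\n' "$src"
  done
}
```

An empty result from `unweighted_sources weights.txt gene_predictions.gff3 protein_alignments.gff3 transcript_alignments.gff3` only shows that every source has a row; it says nothing about whether the chosen weights are sensible.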

Included Commands

The image includes:

EVidenceModeler
ParaFly
EvmUtils/*.pl
EvmUtils/misc/*.pl
EvmUtils/misc/*.py
EvmUtils/misc/GFF2_toolkit/*.pl
PerlLib modules
upstream testing data

Common helpers include:

partition_EVM_inputs.pl
write_EVM_commands.pl
execute_EVM_commands.pl
recombine_EVM_partial_outputs.pl
convert_EVM_outputs_to_GFF3.pl
gff3_file_to_proteins.pl
gene_gff3_to_bed.pl
augustus_GFF3_to_EVM_GFF3.pl
braker_GTF_to_EVM_GFF3.pl
genomeThreader_to_evm_gff3.pl
miniprot_GFF_2_EVM_GFF3.py
BPbtab.pl
prepare_Jigsaw_formats.pl
gff3_genes_to_gff2.pl

The converter helpers support common upstream evidence sources such as AUGUSTUS, BRAKER, SNAP, GeneMark, GlimmerHMM, GenomeThreader, Exonerate, miniprot, MAKER, and TACO outputs. Those external tools are not bundled here; produce their outputs with dedicated TAFFISH apps or your local workflow, then feed the converted GFF3 into EVM.

Some legacy helper scripts are included for older EVM preparation paths, such as BTAB and GFF2/Jigsaw conversion utilities:

taf-evidencemodeler BPbtab.pl < blast.output > blast.output.btab
taf-evidencemodeler prepare_Jigsaw_formats.pl -h
taf-evidencemodeler gff3_genes_to_gff2.pl

These helpers are available for upstream compatibility. They do not replace the modern recommendation to prepare high-quality spliced protein/transcript alignments and EVM-compatible GFF3 evidence before running EVM.

Container Notes

The container builds EVidenceModeler from the official EVidenceModeler-v2.1.0 tag and initializes the bundled ParaFly submodule. The upstream ParaFly build system hard-codes -m64; this app removes that architecture-specific flag at build time so the package can build natively on both linux/amd64 and linux/arm64.

The upstream 2.1.0 script still reports EVidenceModeler-v2.0.0 internally. This image patches only that displayed version string to EVidenceModeler-v2.1.0 so wrapper version checks match the packaged source tag. No algorithmic code is changed.

Runtime dependencies include Perl, Python 3, ParaFly, GNU find, GNU sort, bash, Perl DB_File, Perl URI::Escape, and BioPerl Bio::SearchIO. EUK_MODULES and PERL5LIB are set to the bundled EVM PerlLib directory so older helper scripts that still reference EUK_MODULES can resolve the same modules as the main EVM command. The smoke tests cover these dependencies because the main EVM execution path and the bundled helpers use them.

Platform

This release declares native linux/amd64 and linux/arm64 builds.

On Apple Silicon macOS, Docker or Podman should normally pull the native arm64 image once it is published. If a local backend still tries to use an amd64 image for inspection or compatibility testing, use a backend-specific platform override:

TAFFISH_CONTAINER_BACKEND=docker \
TAFFISH_DOCKER_RUN_ARGS="--platform linux/amd64" \
taf-evidencemodeler EVidenceModeler --help

A global TAFFISH_*_RUN_ARGS override like this is meant for local platform or site-policy needs. The app itself does not need GPU, special devices, or network access at runtime.
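
To see which of the two declared platforms a host would be expected to use natively, a small sketch like the following maps `uname -m` output to the platform names above. `host_platform` is a hypothetical helper, and the mapping covers only the common machine names for these two architectures.

```shell
# Map a machine name (default: this host's `uname -m`) to a platform string.
host_platform() {
  case "${1:-$(uname -m)}" in
    x86_64|amd64)  echo linux/amd64 ;;
    arm64|aarch64) echo linux/arm64 ;;
    *)             echo unknown ;;
  esac
}
```

On Apple Silicon, `uname -m` typically reports `arm64`, so `host_platform` would print `linux/arm64`; a platform override is only worth setting when a different image is wanted.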

Maintenance Status

The upstream GitHub project notes that EVidenceModeler is no longer being actively maintained as of 2024. This TAFFISH app packages the existing 2.1.0 release for reproducible use, but new projects should also evaluate current annotation alternatives when appropriate.

Package

name: evidencemodeler
command: taf-evidencemodeler
version: 2.1.0-r1
kind: tool
image: ghcr.io/taffish/evidencemodeler:2.1.0-r1

Citation

Haas et al. 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7. DOI: 10.1186/gb-2008-9-1-r7; PMID: 18190707.

Upstream EVidenceModeler is distributed under the BSD 3-Clause license. This TAFFISH wrapper repository is distributed under the Apache License 2.0.
