Home
Welcome to the Unipept Database wiki. This repository contains all code that builds and defines the structure of the Unipept Database, a peptide-centric database derived from the UniProtKB resource, which ultimately powers the Unipept metaproteomics analysis platform (see https://unipept.ugent.be).
- Building tables for the suffix array
- Building the UMGAP indexes
- Set up and configure OpenSearch
- Build and import protein metadata into OpenSearch
Because generate_sa_tables.sh and generate_umgap_tables.sh are shell scripts that perform a complex task, we have developed a set of helper tools (written in Rust) that these scripts invoke, each with one specific function.
This Rust workspace consists of a collection of libraries and executables, and serves as a toolkit for parsing and transforming datasets from sources like UniProt and NCBI. Below is an overview of all utilities in the rust-utils workspace.
The workspace includes:
- dat-parser: Parses UniProt .dat files to extract relevant metadata and annotations.
- tables-generator: Converts parsed data into tables for SA or database construction.
- ncbi: Provides some NCBI-specific structs for taxa and their current ranks. If NCBI updates their ranks, only this library will need to be updated.
- utils: Shared utility functions and helper tools used across the pipeline.
- function-calculator: Performs functional aggregation of sequences.
- lca-calculator: Computes the Lowest Common Ancestor (LCA) for taxonomic entries.
- taxdmp-parser: Parses NCBI taxdump files.
- uniprot-parser: Parses UniProt dat files and returns structured entry data.
- uniprot-parser-tryptic: Parses UniProt dat files and returns structured entry and (tryptic) sequence data.
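To illustrate what computing an LCA involves, here is a minimal sketch. It is not the code from lca-calculator. It assumes each taxon is given as a lineage of taxon IDs ordered from the root down, and the function name and the IDs are hypothetical. Under that assumption, the LCA is the last taxon in the prefix that all lineages share.

```rust
/// Returns the lowest common ancestor of several lineages, or None if the
/// input is empty or the lineages share no taxon.
/// Assumption: each lineage lists taxon IDs from the root down to the taxon.
fn lowest_common_ancestor(lineages: &[Vec<u32>]) -> Option<u32> {
    let (first, rest) = lineages.split_first()?;
    // Length of the prefix shared by every lineage seen so far.
    let mut shared = first.len();
    for lineage in rest {
        let common = first
            .iter()
            .zip(lineage.iter())
            .take_while(|(a, b)| a == b)
            .count();
        shared = shared.min(common);
    }
    // The deepest shared taxon is the last element of the common prefix.
    if shared == 0 {
        None
    } else {
        Some(first[shared - 1])
    }
}

fn main() {
    // Hypothetical taxon IDs, chosen for illustration only.
    let lineages = vec![
        vec![1, 10, 20, 30],
        vec![1, 10, 20, 31],
        vec![1, 10, 25],
    ];
    // All three lineages share the prefix [1, 10], so the LCA is taxon 10.
    println!("{:?}", lowest_common_ancestor(&lineages));
}
```

A real implementation would probably also have to handle ranks that are missing from some lineages, which this sketch ignores.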
To build everything in this workspace:
cargo build --release
To build a single crate, run the following from the root directory:
cargo build -p <package_name>
Replace <package_name> with the name of any crate, such as dat-parser.