Pieter Verschaffelt edited this page Jun 30, 2025 · 21 revisions

Welcome to the Unipept Database wiki. This repository contains all code that orchestrates the construction of the Unipept Database and defines its structure. The Unipept Database is a peptide-centric database derived from the UniProtKB resource, and it powers the Unipept metaproteomics analysis platform (see https://unipept.ugent.be).

Guides

Rust Utils Workspace

Because generate_sa_tables.sh and generate_umgap_tables.sh are shell scripts that carry out a complex, multi-step task, we have developed a set of helper tools (written in Rust) that these scripts invoke, each with a single, well-defined function. The Rust workspace consists of a collection of libraries and executables and serves as a toolkit for parsing and transforming datasets from sources such as UniProt and NCBI. Below is an overview of all utilities in the rust-utils workspace.

Structure

The workspace includes:

📚 Libraries

  • dat-parser: Parses UniProt .dat files to extract relevant metadata and annotations.
  • tables-generator: Converts parsed data into tables for SA or database construction.
  • ncbi: Provides some NCBI-specific structs for taxa and their current ranks. If NCBI updates their ranks, only this library will need to be updated.
  • utils: Shared utility functions and helper tools used across the pipeline.
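To give a feel for what parsing a UniProt .dat file involves, here is a minimal sketch. It is not the dat-parser implementation: the EntrySketch struct and the parse_entries function are hypothetical names invented for this illustration. It relies only on general properties of the UniProtKB flat-file format: each line starts with a two-letter line code, AC lines list accession numbers, OX lines carry the NCBI taxon ID, the sequence follows the SQ line, and an entry ends with a line containing "//".

```rust
/// Hypothetical, simplified view of one UniProtKB flat-file entry.
/// The real dat-parser library may expose entirely different types.
#[derive(Debug, Default, PartialEq)]
pub struct EntrySketch {
    pub accession: Option<String>,
    pub taxon_id: Option<u32>,
    pub sequence: String,
}

/// Parses flat-file text into entries, keeping only the primary accession,
/// the NCBI taxon ID and the amino acid sequence.
pub fn parse_entries(input: &str) -> Vec<EntrySketch> {
    let mut entries = Vec::new();
    let mut current = EntrySketch::default();
    let mut in_sequence = false;

    for line in input.lines() {
        if line.starts_with("//") {
            // End of entry: store it and start with a fresh one.
            entries.push(std::mem::take(&mut current));
            in_sequence = false;
        } else if in_sequence {
            // Sequence lines contain residues separated by spaces.
            current
                .sequence
                .extend(line.chars().filter(|c| c.is_ascii_alphabetic()));
        } else if let Some(rest) = line.strip_prefix("AC ") {
            // Only the first accession on the first AC line is kept.
            if current.accession.is_none() {
                current.accession = rest
                    .trim()
                    .split(';')
                    .next()
                    .map(str::trim)
                    .filter(|s| !s.is_empty())
                    .map(str::to_string);
            }
        } else if let Some(rest) = line.strip_prefix("OX ") {
            // Example line: "OX   NCBI_TaxID=9606;"
            if let Some(id) = rest.trim().strip_prefix("NCBI_TaxID=") {
                let digits: String =
                    id.chars().take_while(|c| c.is_ascii_digit()).collect();
                current.taxon_id = digits.parse().ok();
            }
        } else if line.starts_with("SQ ") {
            // Everything until the terminating "//" is sequence data.
            in_sequence = true;
        }
    }
    entries
}
```

Calling parse_entries on the contents of a .dat file yields one EntrySketch per entry. A real parser would also extract cross-references and functional annotations, which this sketch ignores.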

⚙️ Executables

  • function-calculator: Performs functional aggregation of sequences.
  • lca-calculator: Computes the Lowest Common Ancestor (LCA) for taxonomic entries.
  • taxdmp-parser: Parses NCBI taxdump files.
  • uniprot-parser: Parses UniProt .dat files and returns structured entry data.
  • uniprot-parser-tryptic: Parses UniProt .dat files and returns structured entry and (tryptic) sequence data.
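The "tryptic" sequence data mentioned above refers to peptides obtained by in-silico trypsin digestion. The sketch below illustrates the commonly used textbook rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). The function name tryptic_digest is hypothetical, and the actual uniprot-parser-tryptic executable may apply additional rules (for example peptide length filters) that are not shown here.

```rust
/// Splits a protein sequence into tryptic peptides using the textbook rule:
/// cleave after K or R, except when the next residue is P.
/// This is an illustrative assumption, not the workspace's actual code.
pub fn tryptic_digest(sequence: &str) -> Vec<&str> {
    let bytes = sequence.as_bytes();
    let mut peptides = Vec::new();
    let mut start = 0;

    for i in 0..bytes.len() {
        let cleavable = matches!(bytes[i], b'K' | b'R');
        let blocked = bytes.get(i + 1) == Some(&b'P');
        if cleavable && !blocked {
            // The peptide includes the K or R it ends on.
            peptides.push(&sequence[start..=i]);
            start = i + 1;
        }
    }

    // Whatever remains after the last cleavage site is the final peptide.
    if start < bytes.len() {
        peptides.push(&sequence[start..]);
    }
    peptides
}
```

For the sequence MKTAYIAKQRPLK, this returns the peptides MK, TAYIAK and QRPLK: the R in QRPLK is followed by P, so no cleavage happens there.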

🛠️ Building the Workspace

To build everything in this workspace:

cargo build --release

Build a Specific Package

From the root directory, run:

cargo build -p <package_name>

Replace <package_name> with the name of any crate, such as dat-parser.
