Merged
9 changes: 6 additions & 3 deletions .docker/Dockerfile.docs
@@ -1,9 +1,8 @@
# Docs build/verification image: baleen (CPU-only) + MkDocs toolchain.
# Docs build/verification image: baleen + krill (CPU) + MkDocs toolchain.
# Used to run `mkdocs build --strict` so mkdocstrings can import baleen.
FROM python:3.11-slim

ENV DEBIAN_FRONTEND=noninteractive
ENV BALEEN_NO_CUDA=1

RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential zlib1g-dev libhdf5-dev \
@@ -15,7 +14,11 @@ RUN groupadd -g ${GID} app && useradd -m -u ${UID} -g ${GID} app

WORKDIR /app
COPY . .
RUN pip install --no-cache-dir ".[docs]"
# krill (engine) is imported by baleen at import time; mkdocstrings needs it.
RUN pip install --no-cache-dir ".[docs]" \
&& pip install --no-cache-dir numpy scipy pyslow5 pyfastx \
&& pip install --no-cache-dir krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/simple/

USER app
WORKDIR /work
9 changes: 7 additions & 2 deletions .github/workflows/docker.yml
@@ -11,7 +11,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.10", "3.11", "3.12"]

steps:
- name: Checkout
@@ -24,7 +24,12 @@ jobs:

- name: Install and test
run: |
BALEEN_NO_CUDA=1 pip install ".[test]"
pip install ".[test]"
# krill (DTW + eventalign engine) is not on PyPI — install the CPU
# wheel from the project index. Its runtime deps come from PyPI first.
pip install numpy scipy pyslow5 pyfastx
pip install krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/simple/
pytest

build-cpu:
8 changes: 7 additions & 1 deletion .github/workflows/docs.yml
@@ -33,7 +33,13 @@ jobs:
python-version: "3.11"

- name: Install docs dependencies
run: BALEEN_NO_CUDA=1 pip install ".[docs]"
run: |
pip install ".[docs]"
# krill (engine) is not on PyPI; mkdocstrings imports baleen, which
# imports it. Install the CPU wheel from the project index.
pip install numpy scipy pyslow5 pyfastx
pip install krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/simple/

- name: Build site (strict)
run: mkdocs build --strict
67 changes: 37 additions & 30 deletions CLAUDE.md
@@ -5,16 +5,16 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Build and Test Commands

```bash
# Install package (CUDA auto-detected if nvcc available)
# Install package (pure Python — no C extension to build)
pip install .

# Install CPU-only (skip CUDA compilation)
BALEEN_NO_CUDA=1 pip install .

# Target specific GPU archs (comma-separated compute capabilities without dot)
BALEEN_CUDA_ARCHS=86,90 pip install .
# Or auto-detect installed GPU
BALEEN_CUDA_ARCHS=native pip install .
# The DTW + eventalign engine 'krill' is a required runtime dependency that is
# NOT on PyPI. Install it from the project index (GPU cu122 wheel, or CPU):
pip install krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/cu122/simple/ # GPU
pip install krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/simple/ # CPU
# (Or use a prebuilt baleen Docker image, which bundles krill + slow5tools.)

# Run all tests
pytest
@@ -49,20 +49,19 @@ Conventional commits: `feat:`, `fix:`, `perf:`, `build:`, `bench:`, `ci:`, `refa

## Architecture Overview

Baleen is a CUDA-accelerated DTW (Dynamic Time Warping) and nanopore signal analysis pipeline for detecting RNA modifications by comparing native and IVT (in vitro transcribed) nanopore signals.
Baleen is a GPU-accelerated DTW (Dynamic Time Warping) and nanopore signal analysis pipeline for detecting RNA modifications by comparing native and IVT (in vitro transcribed) nanopore signals. The DTW and eventalign engine is provided by **krill**.

### Package Structure

```
baleen/
├── __init__.py # Re-exports public API from eventalign
├── _cuda_dtw/ # CUDA DTW implementation with CPU fallback
│ └── __init__.py # Python wrapper (dtw_distance, dtw_pairwise, etc.)
├── _dtw.py # DTW shim delegating to krill (+ GPU memory helpers)
└── eventalign/ # Main analysis pipeline
├── __init__.py # Public API exports
├── _pipeline.py # run_pipeline(), save/load_results()
├── _bam.py # BAM parsing, contig stats, filtering
├── _f5c.py # f5c eventalign CLI wrapper
├── _eventalign.py # krill eventalign wrapper (f5c-format TSV output)
├── _signal.py # Signal extraction and grouping by position
├── _probability.py # Modification probability algorithms
├── _hierarchical.py # Hierarchical Bayesian + HMM pipeline (V1→V2→V3)
@@ -72,9 +71,9 @@ baleen/
### Data Flow

1. **Input**: Native + IVT BAM/FASTQ/BLOW5 files + reference FASTA
2. **Event alignment**: f5c eventalign produces per-read signal tables per position
2. **Event alignment**: krill aligns each read's signal to its mapped reference subsequence (HMM-free, forced-dense) and emits an f5c-format per-position signal table
3. **Signal grouping**: Group signals by genomic position, find common positions
4. **DTW computation**: Pairwise DTW distance matrices per position (CUDA or tslearn fallback)
4. **DTW computation**: Pairwise DTW distance matrices per position (krill GPU kernel, CPU fallback)
5. **Modification calling**: Three-stage hierarchical pipeline:
- V1: Empirical-Bayes null scoring with hierarchical shrinkage
- V2: Anchored two-component mixture EM
@@ -90,11 +89,14 @@ baleen/

### DTW Backend Selection

The `_cuda_dtw` module auto-selects backend at import time:
- CUDA (GPU) if `_cuda_dtw` C extension compiled successfully
- CPU (tslearn) fallback otherwise
`baleen/_dtw.py` is a thin shim over krill's bundled DTW (same cuDTW++ kernel
the project previously vendored as the `_cuda_dtw` C extension). krill
auto-selects GPU when a device + GPU wheel are present, else CPU.

Use `use_cuda=True/False` to force backend, or `None` for auto-select.
Use `use_cuda=True/False` to force backend, or `None` for auto-select (mapped
to krill's `use_gpu`). The pure-Python GPU memory-planning helpers
(`estimate_gpu_memory`, `get_device_count`, `get_per_device_memory`) live in
the shim since krill does not expose them.
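
The `use_cuda` to `use_gpu` mapping described above could look roughly like the sketch below. This is an illustration and not the contents of `baleen/_dtw.py`: the helper name `call_engine_dtw`, the stub engine, and the choice to omit the keyword when `use_cuda` is `None` are all assumptions. The engine function is passed in as a parameter so the sketch runs without krill installed.

```python
from typing import Callable, List, Optional, Sequence

# Records the keyword arguments the stub engine receives (demo only).
calls: List[dict] = []


def stub_engine(a: Sequence[float], b: Sequence[float], **kwargs) -> float:
    """Hypothetical stand-in for an engine-side DTW function."""
    calls.append(kwargs)
    return 0.0


def call_engine_dtw(
    engine_dtw: Callable[..., float],
    a: Sequence[float],
    b: Sequence[float],
    use_cuda: Optional[bool] = None,
) -> float:
    """Forward a DTW call, translating baleen's use_cuda into use_gpu.

    Assumption: None means "let the engine auto-select", which this sketch
    models by not passing the keyword at all.
    """
    kwargs = {}
    if use_cuda is not None:
        kwargs["use_gpu"] = bool(use_cuda)
    return engine_dtw(a, b, **kwargs)


if __name__ == "__main__":
    call_engine_dtw(stub_engine, [0.0, 1.0], [0.0, 1.0])                  # auto-select
    call_engine_dtw(stub_engine, [0.0, 1.0], [0.0, 1.0], use_cuda=False)  # force CPU
    print(calls)
```

Keeping the translation in one place means callers can keep the legacy `use_cuda` keyword while the engine sees only its own parameter name.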

### Modification Probability Algorithms

@@ -110,22 +112,27 @@ Three modes in `_hmm_training.py`:
- **Semi-supervised**: Platt-scaling calibrator from labeled positions
- **Supervised**: MLE transitions + KDE emissions from labeled trajectories

## CUDA Kernel Architecture
## DTW Engine (krill)

The DTW kernels (GPU + CPU) live in the **krill** package, not in this repo.
krill ships the same cuDTW++ warp-shuffle kernel baleen previously vendored.
The GPU path is bit-identical to that legacy kernel (verified during the swap);
krill's CPU path resamples long signals to fixed buckets (GPU-consistent),
which differs from the old tslearn fallback only on CPU-only installs.

- **FP32 only** — `DTWDistance<float>` template, always float. FP16 would break Pascal consumer GPUs (1/64 FP32 throughput).
- **Wavefront parallelism**: one thread per row of cost matrix, diagonal sweep. `blockDim.x = 1024` (max threads per block). Three rolling diagonals in shared memory (~12 KB).
- **One block per pair** for pairwise mode; grid.x = num_comparisons. Outer loop over reference sequences is serial.
- **Cost function**: squared Euclidean distance, `sqrt` only at the end. Path matrix = nullptr for pairwise (no memory waste).
- **No Sakoe-Chiba band** — a soft-band variant was tried and reverted because setting out-of-band cells to INF without reducing thread count/diagonals is pure overhead. A real band optimization requires skipping diagonals and sizing `blockDim.x` to `min(1024, 2*band_width+1)`.
- Source files: `dtw.hpp` (kernel), `dtw_api.cpp` (Python-C bridge), `multithreading.cpp` (CPU thread pool).
GPU vs CPU is decided by which krill wheel is installed (cu122 vs plain) plus
device presence. krill exposes `dtw_distance`, `dtw_pairwise`,
`dtw_pairwise_varlen`, `dtw_multi_position_pairwise`, `dtw_backend`,
`dtw_available`.
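
For spot-checking results on small inputs, a plain-Python reference DTW can be useful. The sketch below assumes the cost function noted for the legacy kernel (squared Euclidean point cost, `sqrt` applied once at the end, no band) carries over to krill. It is not krill's code, does not resample long signals, and the function name `dtw_distance_reference` is hypothetical.

```python
import math
from typing import Sequence


def dtw_distance_reference(x: Sequence[float], y: Sequence[float]) -> float:
    """Unconstrained DTW: squared-difference cost, sqrt of the final sum."""
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    inf = float("inf")
    # acc[i][j] = minimal accumulated squared cost aligning x[:i] with y[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance x only
                acc[i][j - 1],      # advance y only
                acc[i - 1][j - 1],  # advance both
            )

    # The square root is taken once, on the accumulated total.
    return math.sqrt(acc[n][m])


if __name__ == "__main__":
    print(dtw_distance_reference([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

This runs in O(n·m) time and memory, so it is only suitable for short test signals, not for production-sized reads.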

## External Dependencies

- **f5c**: External CLI tool for nanopore event alignment. Must be on PATH.
- **pysam**: BAM file parsing
- **tslearn**: CPU DTW fallback
- **scipy**: Statistical functions, optimization
- **numba** (optional): JIT-compiled HMM forward-backward kernel (`@njit(cache=True)`), kicks in when installed
- **krill**: DTW + eventalign engine (not on PyPI; install from the project
index — cu122 GPU wheel or plain CPU wheel). Required.
- **slow5tools**: CLI used to index BLOW5 (`slow5tools index`); must be on PATH.
- **pyslow5 / pyfastx / pysam**: BLOW5 signal, reference FASTA, and BAM access.
- **scipy**: Statistical functions, optimization.
- **numba** (optional): JIT-compiled HMM forward-backward kernel (`@njit(cache=True)`), kicks in when installed.


# CLAUDE.md
54 changes: 22 additions & 32 deletions Dockerfile.cpu
@@ -1,44 +1,34 @@
# --- Build stage ---
FROM python:3.11-slim AS builder
# CPU production image: baleen + krill (CPU wheel) + slow5tools.
# No f5c, no CUDA build — krill is the DTW + eventalign engine and is pure
# Python to install (a prebuilt wheel from the project index).
FROM python:3.11-slim

ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update && apt-get install -y --no-install-recommends \
wget build-essential zlib1g-dev libhdf5-dev \
wget ca-certificates zlib1g \
&& apt-cache search '^libhdf5-[0-9]' | head -1 | awk '{print $1}' \
| xargs apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

# Install f5c v1.6 from pre-built binaries (CPU)
RUN wget -q "https://github.com/hasindu2008/f5c/releases/download/v1.6/f5c-v1.6-binaries.tar.gz" \
&& tar xf f5c-v1.6-binaries.tar.gz \
&& cp f5c-v1.6/f5c_x86_64_linux /usr/local/bin/f5c \
&& chmod +x /usr/local/bin/f5c \
&& rm -rf f5c-v1.6 f5c-v1.6-binaries.tar.gz
# slow5tools — baleen indexes BLOW5 via `slow5tools index` (pyslow5 needs .idx).
RUN wget -q "https://github.com/hasindu2008/slow5tools/releases/download/v1.3.0/slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz" \
&& tar xf slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz \
&& cp slow5tools-v1.3.0/slow5tools /usr/local/bin/slow5tools \
&& chmod +x /usr/local/bin/slow5tools \
&& rm -rf slow5tools-v1.3.0 slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz

# Install baleen (CPU only); copy console_script to a known path
# baleen (pure Python now — no C extension to build).
WORKDIR /app
COPY . .
ENV BALEEN_NO_CUDA=1
RUN pip install --no-cache-dir . \
&& BALEEN_BIN="$(which baleen 2>/dev/null)" \
&& if [ -z "$BALEEN_BIN" ]; then \
printf '#!/bin/sh\nexec python3 -m baleen "$@"\n' > /usr/local/bin/baleen; \
elif [ "$BALEEN_BIN" != "/usr/local/bin/baleen" ]; then \
cp "$BALEEN_BIN" /usr/local/bin/baleen; \
fi \
&& chmod +x /usr/local/bin/baleen

# --- Runtime stage ---
FROM python:3.11-slim

RUN apt-get update \
&& apt-get install -y --no-install-recommends zlib1g \
&& apt-cache search '^libhdf5-[0-9]' | head -1 | awk '{print $1}' \
| xargs apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local/bin/f5c /usr/local/bin/f5c
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin/baleen /usr/local/bin/baleen
RUN pip install --no-cache-dir .

# krill engine — follow the package's strict rules:
# 1. runtime deps from PyPI first (NEVER `krill[...]`, NEVER --extra-index-url)
# 2. krill itself ONLY from the project index, --no-deps. CPU wheel here.
RUN pip install --no-cache-dir numpy scipy pyslow5 pyfastx \
&& pip install --no-cache-dir krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/simple/

WORKDIR /data
ENTRYPOINT ["baleen"]
99 changes: 27 additions & 72 deletions Dockerfile.gpu
@@ -1,87 +1,42 @@
# --- Build stage ---
FROM nvidia/cuda:12.6.3-devel-ubuntu22.04 AS builder
# GPU production image: baleen + krill (cu122 GPU wheel) + slow5tools.
# No f5c, no nvcc/CUDA build — krill ships the GPU DTW kernel as a prebuilt
# wheel, so a CUDA *runtime* base (matching the cu122 wheel) is sufficient.
FROM nvidia/cuda:12.2.2-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

# Switch to Azure-hosted Ubuntu mirror — GitHub Actions runners live in
# Azure, so this is effectively an internal hop and avoids the intermittent
# archive.ubuntu.com timeouts that kill the build.
# Azure-hosted Ubuntu mirror — GitHub Actions runners live in Azure; avoids the
# intermittent archive.ubuntu.com timeouts that kill the build.
RUN sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g; \
s|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' \
/etc/apt/sources.list \
&& apt-get update && apt-get install -y --no-install-recommends \
python3 python3-pip python3-dev python3-venv \
wget build-essential zlib1g-dev libhdf5-dev \
&& rm -rf /var/lib/apt/lists/*

# Install f5c v1.6 from pre-built binaries (CPU)
RUN wget -q "https://github.com/hasindu2008/f5c/releases/download/v1.6/f5c-v1.6-binaries.tar.gz" \
&& tar xf f5c-v1.6-binaries.tar.gz \
&& cp f5c-v1.6/f5c_x86_64_linux /usr/local/bin/f5c \
&& chmod +x /usr/local/bin/f5c \
&& rm -rf f5c-v1.6 f5c-v1.6-binaries.tar.gz

# Install baleen with CUDA; ensure console_script is at a known path.
# -v surfaces nvcc compile/link output so silent CUDA fallback is visible in CI.
# After install, verify _cuda_dtw*.so was actually built — fail loud otherwise.
WORKDIR /app
COPY . .
RUN pip3 install pytest \
&& pip3 install --no-cache-dir -v . 2>&1 | tee /tmp/pip-install.log \
&& SO_PATH=$(cd / && python3 -c "import baleen._cuda_dtw, os; print(os.path.dirname(baleen._cuda_dtw.__file__))") \
&& if ! ls "$SO_PATH"/_cuda_dtw*.so >/dev/null 2>&1; then \
echo "ERROR: CUDA extension (.so) was not built — image would be CPU-only." >&2; \
echo "SO_PATH=$SO_PATH" >&2; \
echo "find result:" >&2; \
find /usr/local/lib /usr/lib -name '_cuda_dtw*.so' 2>/dev/null >&2 || true; \
echo "pip install log tail:" >&2; \
tail -80 /tmp/pip-install.log >&2; \
exit 1; \
fi \
&& echo "OK: _cuda_dtw $(ls $SO_PATH/_cuda_dtw*.so)" \
&& BALEEN_BIN="$(which baleen 2>/dev/null)" \
&& if [ -z "$BALEEN_BIN" ]; then \
printf '#!/bin/sh\nexec python3 -m baleen "$@"\n' > /usr/local/bin/baleen; \
elif [ "$BALEEN_BIN" != "/usr/local/bin/baleen" ]; then \
cp "$BALEEN_BIN" /usr/local/bin/baleen; \
fi \
&& chmod +x /usr/local/bin/baleen

# Record the Python version so the runtime stage can verify it matches
RUN python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")' \
> /tmp/python_version.txt

# --- Runtime stage ---
# NOTE: must use the same Ubuntu version as the builder so Python versions match
FROM nvidia/cuda:12.6.3-runtime-ubuntu22.04

ENV DEBIAN_FRONTEND=noninteractive

RUN sed -i 's|http://archive.ubuntu.com|http://azure.archive.ubuntu.com|g; \
s|http://security.ubuntu.com|http://azure.archive.ubuntu.com|g' \
/etc/apt/sources.list \
&& apt-get update \
&& apt-get install -y --no-install-recommends python3 python3-pip zlib1g \
python3 python3-pip wget ca-certificates zlib1g \
&& apt-cache search '^libhdf5-[0-9]' | head -1 | awk '{print $1}' \
| xargs apt-get install -y --no-install-recommends \
&& rm -rf /var/lib/apt/lists/*

COPY --from=builder /usr/local/bin/f5c /usr/local/bin/f5c
COPY --from=builder /usr/local/bin/baleen /usr/local/bin/baleen
COPY --from=builder /tmp/python_version.txt /tmp/python_version.txt
# slow5tools — baleen indexes BLOW5 via `slow5tools index` (pyslow5 needs .idx).
RUN wget -q "https://github.com/hasindu2008/slow5tools/releases/download/v1.3.0/slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz" \
&& tar xf slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz \
&& cp slow5tools-v1.3.0/slow5tools /usr/local/bin/slow5tools \
&& chmod +x /usr/local/bin/slow5tools \
&& rm -rf slow5tools-v1.3.0 slow5tools-v1.3.0-x86_64-linux-binaries.tar.gz

# Copy installed Python packages — Ubuntu 22.04 ships Python 3.10
# Verify the builder used the same version before copying
COPY --from=builder /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=builder /usr/lib/python3/dist-packages /usr/lib/python3/dist-packages
RUN BUILD_VER=$(cat /tmp/python_version.txt) \
&& RUNTIME_VER=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")') \
&& if [ "$BUILD_VER" != "$RUNTIME_VER" ]; then \
echo "ERROR: Python version mismatch: builder=$BUILD_VER runtime=$RUNTIME_VER" >&2; \
echo "Update the COPY paths in Dockerfile.gpu to match python$RUNTIME_VER" >&2; \
exit 1; \
fi \
&& rm /tmp/python_version.txt
# baleen (pure Python now — no C extension to build). Ubuntu 22.04 ships a
# setuptools that predates PEP 621, so upgrade pip/setuptools first or the
# [project] table is ignored and an empty "UNKNOWN" package is built.
WORKDIR /app
COPY . .
RUN pip3 install --no-cache-dir --upgrade pip setuptools wheel \
&& pip3 install --no-cache-dir .

# krill engine — follow the package's strict rules:
# 1. runtime deps from PyPI first (NEVER `krill[...]`, NEVER --extra-index-url)
# 2. krill itself ONLY from the project index, --no-deps. cu122 GPU wheel here.
RUN pip3 install --no-cache-dir numpy scipy pyslow5 pyfastx \
&& pip3 install --no-cache-dir krill --no-deps \
--index-url https://loganylchen.github.io/krill-dist/cu122/simple/

WORKDIR /data
ENTRYPOINT ["baleen"]
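
The two-step krill install used in both Dockerfiles and the CI workflows (runtime dependencies from PyPI first, then krill alone with `--no-deps` from the project index) can be captured in a small helper for local setup scripts. This is a sketch only: the function name `krill_install_commands` is hypothetical, and the package list and index URLs are copied from the diff above.

```python
import sys
from typing import List

# Runtime dependencies installed from PyPI before krill itself.
RUNTIME_DEPS = ["numpy", "scipy", "pyslow5", "pyfastx"]

# Project indexes as they appear in the Dockerfiles.
CPU_INDEX = "https://loganylchen.github.io/krill-dist/simple/"
GPU_INDEX = "https://loganylchen.github.io/krill-dist/cu122/simple/"


def krill_install_commands(gpu: bool, python: str = sys.executable) -> List[List[str]]:
    """Return the two pip invocations, in order, as argv lists.

    Step 1 pulls runtime deps from PyPI. Step 2 installs krill only from
    the project index with --no-deps (never --extra-index-url).
    """
    pip = [python, "-m", "pip", "install", "--no-cache-dir"]
    index = GPU_INDEX if gpu else CPU_INDEX
    return [
        pip + RUNTIME_DEPS,
        pip + ["krill", "--no-deps", "--index-url", index],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for cmd in krill_install_commands(gpu=False):
        print(" ".join(cmd))
```

To actually perform the install, each returned list can be passed to `subprocess.run(cmd, check=True)`. Building argv lists rather than shell strings avoids quoting problems.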