
SFGLab/pMMC


pMMC

Parallel Multiscale Monte Carlo approach to 3D chromatin structure reconstruction from ChIA-PET data.

pMMC (also known as 3D-GNOME) reconstructs high-resolution 3D genome organization by combining hierarchical modeling with GPU-accelerated Monte Carlo simulations. It takes ChIA-PET interaction data (anchors, PET clusters, singletons) and produces 3D polymer models of chromatin at multiple scales.

Based on: Szalaj et al., "3D-GNOME: 3D Genome Organization Modeling Engine", Nucleic Acids Research, 2016.

Features

  • Multiscale reconstruction across 4 hierarchical levels: chromosome, segment (~2 Mb), anchor/interaction block, and subanchor (~10 kb)
  • CUDA-accelerated Monte Carlo simulations with simulated annealing
  • Ensemble modeling to generate multiple independent structures
  • Synthetic data generation and automated benchmarking pipeline
  • Structural comparison metrics including RMSD, Pearson correlation, SCC, and contact decay curves
  • Multiple output formats: HCM (native binary), mmCIF, and PDB
  • Cross-platform: Linux, Windows (MSVC), and Docker
  • Deterministic seeding for reproducible results
  • Memory budget enforcement with graceful shutdown
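Pearson correlation is one of the listed comparison metrics. Below is a minimal sketch of one common way to compute it for two structures. It assumes both structures have already been turned into equal-sized distance maps and that only the upper triangle (i < j) is compared. The function name is hypothetical, and MetricsFramework.cpp may compute the metric differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Pearson correlation between the strict upper triangles (i < j) of two
// square matrices of equal size, e.g. distance maps of the same region.
// Returns NaN if either triangle has zero variance.
double upperTrianglePearson(const Matrix& a, const Matrix& b) {
    assert(a.size() == b.size() && a.size() >= 3);
    const std::size_t n = a.size();

    double sumA = 0.0, sumB = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            sumA += a[i][j];
            sumB += b[i][j];
            ++count;
        }
    const double meanA = sumA / count, meanB = sumB / count;

    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            const double da = a[i][j] - meanA, db = b[i][j] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
    return cov / std::sqrt(varA * varB);
}
```

Restricting to the upper triangle avoids counting each pair twice and excludes the all-zero diagonal, which would otherwise inflate the correlation.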

Requirements

  • NVIDIA GPU with CUDA Compute Capability >= 6.0
  • CUDA Toolkit 11.8+
  • CMake 3.13+
  • C++17 compiler (GCC, Clang, or MSVC 2017+)

Building

Linux

mkdir build && cd build
cmake .. -DCUDA_ARCH="80" -GNinja
ninja

Set CUDA_ARCH to match your GPU architecture:

GPU generation             CUDA_ARCH
Pascal (GTX 10xx)          60
Volta (V100)               70
Turing (RTX 20xx)          75
Ampere (A100)              80
Ampere (RTX 30xx)          86
Ada Lovelace (RTX 40xx)    89

Multiple architectures can be specified: -DCUDA_ARCH="70;75;80"

Windows

Open the Visual Studio solution (MSVC++/pMMC.sln) or use CMake:

mkdir build && cd build
cmake .. -DCUDA_ARCH="80"
cmake --build . --config Release

Docker

docker build -t pmmc:11.8 .
docker run --gpus all -it pmmc:11.8

The Docker image builds for architectures 60, 70, 75, 80, and 86.

Usage

pMMC -a <action> [options]

Actions

Action      Alias    Description
create      c        Reconstruct 3D structure from ChIA-PET data (default)
generate    g        Generate synthetic polymer and ChIA-PET data
benchmark   k        Run full benchmark: generate, reconstruct, and compare
distmap     (none)   Compute distance/contact/frequency maps from structure
metrics     (none)   Compute structural comparison metrics between two structures
ensemble    e        Pairwise distance analysis across an ensemble of structures
extract     r        Extract a genomic fragment from a structure
position    p        Get 3D coordinates for BED regions
smooth      s        Create equidistant smoothed model
distance    d        Compute structural distance matrix
flatten     f        Flatten hierarchical structure to text coordinates
rewiring    w        Loop rewiring analysis across an ensemble
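The distmap action derives maps from a structure's bead coordinates. Below is a minimal sketch of the distance and contact parts. It assumes a plain Euclidean distance map and a binary contact map defined by a distance cutoff. distanceMap, contactMap, and the cutoff parameter are hypothetical, and the frequency map is omitted because its definition is not given here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3 { double x, y, z; };
using Matrix = std::vector<std::vector<double>>;

// Symmetric matrix of pairwise Euclidean distances between beads.
Matrix distanceMap(const std::vector<Point3>& beads) {
    const std::size_t n = beads.size();
    Matrix d(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = beads[i].x - beads[j].x;
            const double dy = beads[i].y - beads[j].y;
            const double dz = beads[i].z - beads[j].z;
            d[i][j] = d[j][i] = std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    return d;
}

// Binary contact map: 1.0 where two beads lie within `cutoff` (hypothetical
// parameter, same units as the coordinates), otherwise 0.0.
Matrix contactMap(const Matrix& dist, double cutoff) {
    const std::size_t n = dist.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            c[i][j] = dist[i][j] <= cutoff ? 1.0 : 0.0;
    return c;
}
```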

Common Options

Option      Description
-s FILE     Settings INI file (required for create)
-c REGION   Chromosomes or region, e.g. genome, chr14, chr1-chr5, chr14:1:2500000
-n LABEL    Label for output file names
-o DIR/     Output directory (include trailing /)
-m N        Ensemble size (number of structures to generate)
-j SEED     Fixed random seed for reproducibility
-i FILE     Input file or directory
-F FORMAT   Additional output format: cif (mmCIF), pdb, or both (HCM is always written)
-M MB       Memory budget in MB (0 = unlimited)
-I METHOD   Initialization method: random (default) or mds
-E          Enable per-step energy trace CSV
-L N        Energy logging interval in MC steps (default: 1000)
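The -c option accepts several region forms. The sketch below parses the four forms listed above into a hypothetical RegionSpec. It only illustrates the syntax, assuming chr1-chr5 means an inclusive range of chromosomes and chr14:1:2500000 means chromosome:start:end. It is not the parser used by main.cpp.

```cpp
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical representation of a -c argument.
struct RegionSpec {
    bool wholeGenome = false;
    std::string firstChr, lastChr;  // equal when a single chromosome is given
    long start = -1, end = -1;      // -1 means "whole chromosome"
};

std::optional<RegionSpec> parseRegion(const std::string& s) {
    RegionSpec r;
    if (s == "genome") { r.wholeGenome = true; return r; }

    std::vector<std::string> parts;
    std::istringstream ss(s);
    for (std::string tok; std::getline(ss, tok, ':');) parts.push_back(tok);

    if (parts.size() == 3) {            // chr14:1:2500000 -> chromosome:start:end
        r.firstChr = r.lastChr = parts[0];
        try {
            r.start = std::stol(parts[1]);
            r.end = std::stol(parts[2]);
        } catch (const std::exception&) {
            return std::nullopt;        // non-numeric or out-of-range coordinate
        }
        if (r.firstChr.empty() || r.start > r.end) return std::nullopt;
        return r;
    }
    if (s.empty() || s.find(':') != std::string::npos) return std::nullopt;

    const auto dash = s.find('-');      // chr1-chr5 -> inclusive chromosome range
    r.firstChr = s.substr(0, dash);
    r.lastChr = dash == std::string::npos ? r.firstChr : s.substr(dash + 1);
    if (r.firstChr.empty() || r.lastChr.empty()) return std::nullopt;
    return r;
}
```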

Examples

Reconstruct a single chromosome:

pMMC -a create -s config.ini -c chr14 -n my_run -o ./output/

Reconstruct entire genome as an ensemble of 10 structures:

pMMC -a create -s config.ini -c genome -m 10 -o ./ensemble/ -F both

Reconstruct a specific genomic region with a fixed seed:

pMMC -a create -s config.ini -c chr14:1:2500000 -j 42 -o ./region/

Generate synthetic data and benchmark:

pMMC -a generate -c chr22 -l 100 -m 20 -o ./synthetic/
pMMC -a benchmark -c chr22 -m 20 -o ./benchmark/

Compare two structures:

pMMC -a metrics -i structure_a.hcm,structure_b.hcm -o ./comparison/

Extract distance and contact maps:

pMMC -a distmap -i model.hcm -c chr14 -r 25000 -o ./maps/

Ensemble pairwise analysis:

pMMC -a ensemble -i ./models/ -p "model_{N}.hcm"

Configuration

Reconstruction is driven by an INI settings file (see config.ini for a complete example with inline documentation). Key sections:

[data] - Input files

data_dir = ../data_GM12878/
anchors = GM12878_anchors.bed           # Anchor BED file (e.g. CTCF binding sites)
clusters = GM12878_clusters.bedpe       # PET cluster files (comma-separated)
factors = CTCF                          # Factor names
singletons = GM12878_singletons.bedpe   # Singleton interaction files
segment_split = GM12878_segments.bed    # Segment boundary definitions
centromeres = hg38.bed                  # Centromere regions

[cuda] - GPU parameters

num_threads = 512         # CUDA threads per block
blocks_multiplier = 16    # Grid size multiplier
milestone_fails = 3       # Max consecutive failures before stopping

[simulation_heatmap] - Heatmap-level Monte Carlo parameters

max_temp_heatmap = 5.0                                    # Starting temperature
delta_temp_heatmap = 0.9999                               # Cooling rate
stop_condition_improvement_threshold_heatmap = 0.99        # Convergence threshold
stop_condition_steps_heatmap = 50000                       # Steps per milestone

[simulation_arcs] - Arc-level Monte Carlo parameters

max_temp = 5.0                                # Starting temperature
delta_temp = 0.9999                           # Cooling rate
stop_condition_improvement_threshold = 0.975   # Convergence threshold
stop_condition_steps = 50000                   # Steps per milestone
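The annealing and stopping parameters plausibly interact as follows. This sketch makes three assumptions. First, delta_temp multiplies the temperature once per MC step. Second, a milestone counts as a failure when the energy after stop_condition_steps steps has not dropped below threshold times the previous milestone's energy. Third, the run stops after milestone_fails consecutive failures (the [cuda] setting). AnnealingConfig, MilestoneStopper, and temperatureAfter are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical mirror of the [simulation_arcs] and [cuda] parameters.
struct AnnealingConfig {
    double maxTemp = 5.0;                   // max_temp
    double deltaTemp = 0.9999;              // delta_temp (assumed per-step multiplier)
    double improvementThreshold = 0.975;    // stop_condition_improvement_threshold
    std::size_t stepsPerMilestone = 50000;  // stop_condition_steps
    int milestoneFails = 3;                 // milestone_fails ([cuda] section)
};

// Temperature after `steps` MC steps under geometric cooling.
double temperatureAfter(const AnnealingConfig& cfg, std::size_t steps) {
    return cfg.maxTemp * std::pow(cfg.deltaTemp, static_cast<double>(steps));
}

// Call update() every cfg.stepsPerMilestone steps. It returns true once
// cfg.milestoneFails consecutive milestones have failed to improve enough.
// Assumes positive energies where lower is better.
class MilestoneStopper {
public:
    MilestoneStopper(const AnnealingConfig& cfg, double initialEnergy)
        : cfg_(cfg), lastEnergy_(initialEnergy) {}

    bool update(double energy) {
        if (energy > cfg_.improvementThreshold * lastEnergy_) ++fails_;  // too little progress
        else fails_ = 0;                                                 // reset: failures must be consecutive
        lastEnergy_ = energy;
        return fails_ >= cfg_.milestoneFails;
    }

private:
    AnnealingConfig cfg_;
    double lastEnergy_;
    int fails_ = 0;
};
```

Under these assumptions, a threshold closer to 1.0 (such as the heatmap-level 0.99) makes a milestone easier to pass, so the simulation keeps running longer before stopping.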

[distance] - Interaction-to-distance mapping

freq_dist_scale = 25.0     # Singleton frequency to 3D distance scale
freq_dist_power = -0.6     # Singleton frequency to 3D distance exponent
count_dist_a = 0.2         # PET count to distance parameter
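The names freq_dist_scale and freq_dist_power suggest a power-law mapping from singleton contact frequency to expected 3D distance. Below is a minimal sketch, assuming distance = freq_dist_scale * frequency^freq_dist_power. count_dist_a is left out because its formula is not documented here, and freqToDistance is a hypothetical name.

```cpp
#include <cmath>
#include <limits>

struct DistanceMapping {
    double scale = 25.0;  // freq_dist_scale
    double power = -0.6;  // freq_dist_power
};

// Assumed power law: expected distance = scale * frequency^power.
// With a negative exponent, frequently contacting regions are placed closer.
double freqToDistance(double frequency, const DistanceMapping& m = {}) {
    if (frequency <= 0.0)
        return std::numeric_limits<double>::infinity();  // no observed contacts
    return m.scale * std::pow(frequency, m.power);
}
```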

[springs] - Polymer constraints

stretch_constant = 0.1     # Linker stretch penalty
squeeze_constant = 0.1     # Linker compression penalty
angular_constant = 0.1     # Bending angle penalty
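The three spring constants might enter the energy as shown below. This sketch assumes a quadratic penalty on linker-length deviation, with stretch_constant applied above the expected length and squeeze_constant below it. It also assumes a bending term proportional to 1 - cos(theta) between consecutive linkers. These functional forms are illustrative assumptions, not taken from the pMMC source.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };

struct SpringConstants {
    double stretch = 0.1;  // stretch_constant
    double squeeze = 0.1;  // squeeze_constant
    double angular = 0.1;  // angular_constant
};

static double length3(const Vec3& v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Quadratic penalty on deviation from the expected linker length, with
// separate constants for too long (stretch) and too short (squeeze).
double linkerEnergy(double length, double expected, const SpringConstants& k) {
    const double dev = length - expected;
    return (dev > 0.0 ? k.stretch : k.squeeze) * dev * dev;
}

// Bending penalty for consecutive beads a-b-c: zero for a straight chain,
// 2 * k.angular when the chain folds straight back on itself.
double angularEnergy(const Vec3& a, const Vec3& b, const Vec3& c, const SpringConstants& k) {
    const Vec3 u{b.x - a.x, b.y - a.y, b.z - a.z};
    const Vec3 v{c.x - b.x, c.y - b.y, c.z - b.z};
    const double lu = length3(u), lv = length3(v);
    if (lu == 0.0 || lv == 0.0) return 0.0;  // coincident beads: angle undefined
    const double cosTheta =
        std::clamp((u.x * v.x + u.y * v.y + u.z * v.z) / (lu * lv), -1.0, 1.0);
    return k.angular * (1.0 - cosTheta);
}
```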

See config.ini for the full list of ~110 configurable parameters.
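External tooling, such as a script that sweeps parameters, can read a settings file in this layout with a small parser. The sketch below is hypothetical and is not the INI parser bundled in thirdparty/. It assumes key = value lines and [section] headers, treats anything after # or ; as an inline comment, and does not handle quoting.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// section -> (key -> raw string value)
using IniData = std::map<std::string, std::map<std::string, std::string>>;

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

IniData parseIni(std::istream& in) {
    IniData data;
    std::string section, line;
    while (std::getline(in, line)) {
        // Drop inline comments (assumes values never contain '#' or ';').
        const auto comment = line.find_first_of("#;");
        if (comment != std::string::npos) line.erase(comment);
        line = trim(line);
        if (line.empty()) continue;
        if (line.front() == '[' && line.back() == ']') {
            section = trim(line.substr(1, line.size() - 2));
            continue;
        }
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // skip malformed lines
        data[section][trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return data;
}
```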

Input Data

pMMC expects ChIA-PET data organized as:

  • Anchors (.bed): Genomic coordinates of protein binding sites (e.g. CTCF peaks)
  • PET clusters (.bedpe): High-confidence chromatin interaction pairs
  • Singletons (.bedpe): Lower-frequency interaction pairs used to build contact heatmaps
  • Segment boundaries (.bed): Defines ~2 Mb segments for the hierarchical decomposition
  • Centromeres (.bed): Centromere positions for chromosome arm separation
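A PET cluster file can be read as shown below. This sketch assumes the common ChIA-PET layout: six BEDPE coordinate columns followed by a PET count in column 7. Check this against your own files. Extra columns are ignored, and lines starting with # are skipped. PetCluster, parseBedpeLine, and readClusters are hypothetical names.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct PetCluster {
    std::string chr1;
    long start1, end1;
    std::string chr2;
    long start2, end2;
    int petCount;  // assumed to be column 7
};

// Parses one whitespace-separated BEDPE line; returns nullopt for comments
// and lines that do not start with the seven expected fields.
std::optional<PetCluster> parseBedpeLine(const std::string& line) {
    if (line.empty() || line[0] == '#') return std::nullopt;
    std::istringstream ss(line);
    PetCluster c;
    if (!(ss >> c.chr1 >> c.start1 >> c.end1 >> c.chr2 >> c.start2 >> c.end2 >> c.petCount))
        return std::nullopt;
    return c;
}

std::vector<PetCluster> readClusters(std::istream& in) {
    std::vector<PetCluster> out;
    std::string line;
    while (std::getline(in, line))
        if (auto c = parseBedpeLine(line)) out.push_back(*c);
    return out;
}
```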

Output Formats

  • HCM (.hcm): Native hierarchical binary format preserving the full 4-level tree structure
  • mmCIF (.cif): Macromolecular Crystallographic Information File for molecular viewers
  • PDB (.pdb): Protein Data Bank format for visualization in tools like PyMOL, Chimera, VMD
  • Text (.txt): Flat 3D coordinate files from flatten and smooth actions
  • Heatmap (.heat): Distance/contact matrices from distmap and distance actions
  • CSV: Energy traces, contact decay curves, and metric reports
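PDB files use fixed-width columns. The sketch below writes one ATOM record per bead according to the PDB column layout, with the serial number in columns 7-11 and x/y/z in columns 31-54 as %8.3f. The atom name C, residue name BEA, and chain A are arbitrary placeholders, not necessarily what PdbWriter.cpp emits. The 5-digit serial field also limits a single file to 99,999 beads.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

struct Bead { double x, y, z; };

// One ATOM record (78 characters, element symbol in columns 77-78).
std::string pdbAtomLine(int serial, const Bead& b, char chain = 'A') {
    char buf[96];
    std::snprintf(buf, sizeof buf,
                  "ATOM  %5d %-4s%c%3s %c%4d%c   %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
                  serial, " C", ' ', "BEA", chain, serial % 10000, ' ',
                  b.x, b.y, b.z, 1.0, 0.0, "C");
    return buf;
}

// Whole file: one ATOM record per bead, terminated by END.
std::string toPdb(const std::vector<Bead>& beads) {
    std::string out;
    for (std::size_t i = 0; i < beads.size(); ++i)
        out += pdbAtomLine(static_cast<int>(i + 1), beads[i]) + "\n";
    out += "END\n";
    return out;
}
```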

Algorithm Overview

pMMC reconstructs 3D structure through a top-down multiscale approach:

  1. Tree construction: Build a 4-level hierarchical tree from input data (chromosome > segment > anchor > subanchor)
  2. Heatmap-guided MC (levels 0-1): Position chromosome territories and segments using singleton contact frequency heatmaps
  3. Arc-distance-guided MC (level 2): Refine anchor positions using PET cluster interaction distances
  4. Smoothing MC (level 3): Generate fine-grained subanchor structure with polymer constraints
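The top-down scheme can be pictured as a tree whose nodes carry a genomic interval and a 3D position. Each level's MC starts from positions inherited from the level above. This sketch uses hypothetical types (TreeNode, Level) and assumes children are seeded at their parent's position plus a small random offset. The actual HierarchicalChromosome implementation may differ.

```cpp
#include <cstddef>
#include <memory>
#include <random>
#include <vector>

enum class Level { Chromosome = 0, Segment = 1, Anchor = 2, Subanchor = 3 };

struct TreeNode {
    Level level = Level::Chromosome;
    long start = 0, end = 0;          // genomic interval
    double x = 0.0, y = 0.0, z = 0.0; // 3D position set by MC at this level
    std::vector<std::unique_ptr<TreeNode>> children;
};

// Number of nodes at `level`, e.g. to size the MC problem for that level.
std::size_t countAtLevel(const TreeNode& node, Level level) {
    if (node.level == level) return 1;
    std::size_t total = 0;
    for (const auto& child : node.children) total += countAtLevel(*child, level);
    return total;
}

// Assumed top-down step: children start near their parent's position before
// the next level's MC refines them. Requires jitter >= 0.
void seedChildren(TreeNode& parent, double jitter, std::mt19937& rng) {
    std::uniform_real_distribution<double> offset(-jitter, jitter);
    for (auto& child : parent.children) {
        child->x = parent.x + offset(rng);
        child->y = parent.y + offset(rng);
        child->z = parent.z + offset(rng);
    }
}
```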

Each level uses parallel CUDA Monte Carlo with Metropolis acceptance and simulated annealing. The energy function combines:

  • Contact frequency matching (heatmap scale)
  • Interaction distance matching (arc/loop scale)
  • Polymer spring constraints (stretch, squeeze, angular)
  • Optional microscopy density constraints
  • Optional CTCF motif orientation scoring

Project Structure

pMMC/
├── src/                    C++ and CUDA source files
│   ├── main.cpp            CLI entry point
│   ├── LooperSolver.cpp    Main reconstruction engine
│   ├── heatmap.cpp         Contact frequency matrices
│   ├── chromosome.cpp      Bead-chain polymer model
│   ├── HierarchicalChromosome.cu   Hierarchical tree (CUDA)
│   ├── ParallelMonteCarlo*.cu      MC simulation kernels
│   ├── SyntheticGenerator.cpp      Synthetic data generation
│   ├── BenchmarkRunner.cpp         Benchmark pipeline
│   ├── MetricsFramework.cpp        Structural comparison metrics
│   ├── CifWriter.cpp / PdbWriter.cpp   Output format writers
│   └── ...
├── include/                Header files with Doxygen documentation
├── thirdparty/             Bundled dependencies (INI parser, RMSD, matrix lib)
├── test/                   Test and CI scripts (.bat)
├── MSVC++/                 Visual Studio solution and project files
├── config.ini              Default configuration with inline docs
├── CMakeLists.txt          CMake build configuration
└── Dockerfile              NVIDIA CUDA container build

Testing

Test scripts are provided in the test/ directory:

test/build_and_test.bat        # Compile and verify basic functionality
test/ci_local.bat              # Full CI pipeline (build + validate + benchmark)
test/test_determinism.bat      # Reproducibility with fixed seeds
test/test_interchrom.bat       # Inter-chromosomal reconstruction
test/run_benchmarks.bat        # Performance benchmarks

License

See the repository for license information.

