compact_pro now contains a full clean-room implementation baseline:
- Pure Zig core for Compact Pro parsing/extraction/writing (RLE + LZH profiles).
- C ABI for embedding.
- C CLI (`compact-pro`) handling all filesystem/resource-fork/metadata I/O.
- Nix-based dev/build/test environment and CI.
- `COMPACT_PRO_CLEANROOM_SPEC.md` - Clean-room binary/codec/container specification and host fork policy.
- `src/crc32jam.zig` - JAMCRC implementation used for header and file payload CRC verification.
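
JAMCRC is the reflected CRC-32 (polynomial `0xEDB88320`, initial value `0xFFFFFFFF`) with no final XOR, so it is the bitwise complement of the standard CRC-32. The C sketch below is a bitwise illustration of that definition. It assumes nothing about how `src/crc32jam.zig` is written, and the names `jamcrc_update`/`jamcrc` are hypothetical. The separate update function lets a caller feed data in chunks.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bitwise reflected CRC-32 step with no final XOR
 * (JAMCRC). A table-driven version would be faster; this favours clarity. */
static uint32_t jamcrc_update(uint32_t crc, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

/* One-shot JAMCRC: start from 0xFFFFFFFF and return the register as-is. */
static uint32_t jamcrc(const uint8_t *data, size_t len) {
    return jamcrc_update(0xFFFFFFFFu, data, len);
}
```

For the standard check input `"123456789"` this yields `0x340BC6D9`, the complement of the familiar CRC-32 check value `0xCBF43926`.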
- `src/rle8182.zig` - Compact Pro RLE (`0x81`/`0x82`) encoder/decoder.
  - Encoder now emits repeat-run opcodes for repeated non-`0x81` bytes (thresholded to avoid size regressions on short runs).
  - Encoder `0x81` path now handles arbitrary runs of `0x81` bytes correctly (including `0x81,0x81,0x81,0x82` patterns) to avoid decode drift and checksum failures in external tools.
  - Encoder hot path now batches literal spans and uses vectorized repeated-byte run scanning plus slice-reserved run emission to reduce compress-time overhead on large inputs.
  - Decoder now writes into preallocated output and fills repeat runs in chunks to reduce extraction-time overhead.
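
The escape rules are not spelled out above. The decoder below is a sketch that assumes the `0x81`/`0x82` semantics implemented by existing third-party Compact Pro readers (for example macutils). The function name and the `(size_t)-1` overflow convention are hypothetical, and treating a dangling trailing `0x81` as a literal is a choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for the Compact Pro 0x81/0x82 RLE layer, assuming:
 *   0x81 0x82 n (n != 0) -> repeat the previous output byte to a run of n
 *   0x81 0x82 0x00       -> literal 0x81 0x82
 *   0x81 0x81            -> literal 0x81; the second 0x81 starts a new escape
 *   0x81 x (other x)     -> literal 0x81 then x
 * Returns bytes written, or (size_t)-1 if `out` is too small. */
static size_t rle8182_decode(const uint8_t *in, size_t in_len,
                             uint8_t *out, size_t out_cap) {
    enum { PLAIN, SEEN_81, SEEN_81_82 } state = PLAIN;
    size_t o = 0;
    uint8_t last = 0; /* most recently emitted byte, source of repeat runs */

#define EMIT(b) do { if (o >= out_cap) return (size_t)-1; \
                     out[o++] = (b); last = (b); } while (0)

    for (size_t i = 0; i < in_len; i++) {
        uint8_t c = in[i];
        switch (state) {
        case PLAIN:
            if (c == 0x81) state = SEEN_81; else EMIT(c);
            break;
        case SEEN_81:
            if (c == 0x82) state = SEEN_81_82;
            else if (c == 0x81) EMIT(0x81);       /* stay in SEEN_81 */
            else { EMIT(0x81); EMIT(c); state = PLAIN; }
            break;
        case SEEN_81_82:
            if (c == 0) { EMIT(0x81); EMIT(0x82); }
            else for (unsigned k = 1; k < c; k++) EMIT(last); /* n-1 more copies */
            state = PLAIN;
            break;
        }
    }
    if (state == SEEN_81) EMIT(0x81); /* dangling escape: assumed literal */
#undef EMIT
    return o;
}
```

Under these rules a raw `0x81` immediately followed by `0x82` is always read as an opcode. An encoder therefore has to write a literal pair as `0x81 0x82 0x00`, for example at the tail of the `0x81,0x81,0x81,0x82` pattern.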
- `src/lzh.zig` - Pure Compact Pro LZH codec with per-block Huffman codebook parsing/writing, LZSS window copy/match finding, and an integrated Compact Pro RLE decode stage.
  - Supports extraction of LZH-compressed forks from legacy archives and creation of LZH-over-RLE fork payloads for new archives.
  - Encoder match finder now prefilters chain candidates using the current best match's boundary bytes to avoid needless byte-by-byte scans on large multi-block inputs.
  - Encode path now supports concurrent block emission: sequential tokenization preserves dictionary semantics, while block Huffman/bitstream encoding runs in parallel with a deterministic ordered merge (`encodeWithWorkerLimit`; `encode` auto-selects the worker count).
  - Encode progress callbacks now span both the tokenization and block-encode phases as a single monotonic stream, so long-running encodes report meaningful live progress/ETA.
  - Tokenization also has a parallel path for multi-segment inputs: deterministic fixed-size segments, each seeded with a window-overlap prefix, are tokenized concurrently and then globally reassembled into valid Compact Pro block boundaries (`block_count >= 0x1fff0`) before block encoding. Single-segment inputs stay on a direct fast path to avoid extra merge overhead.
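
The multi-segment tokenization path can be pictured as a fixed partition of the input. In this sketch `seg_size` and `window` are free parameters, because the actual segment size and LZSS window used by `src/lzh.zig` are not given here. The `segment`/`plan_segments` names are also hypothetical. Each segment emits tokens only for `[start, end)`, but its matches may reach back into the overlap prefix `[prefix_start, start)`.

```c
#include <stddef.h>

/* Hypothetical segment descriptor: bytes in [prefix_start, start) seed the
 * dictionary; tokens are produced only for [start, end). */
typedef struct {
    size_t prefix_start;
    size_t start;
    size_t end;
} segment;

/* Splits `len` bytes into fixed-size segments, each seeded with up to
 * `window` preceding bytes. Returns the number of segments written. */
static size_t plan_segments(size_t len, size_t seg_size, size_t window,
                            segment *out, size_t max_out) {
    size_t n = 0;
    if (seg_size == 0) return 0; /* avoid an infinite loop */
    for (size_t start = 0; start < len && n < max_out; start += seg_size) {
        out[n].prefix_start = (start > window) ? start - window : 0;
        out[n].start = start;
        out[n].end = (len - start < seg_size) ? len : start + seg_size;
        n++;
    }
    return n;
}
```

The plan depends only on the input length and the two constants, so every worker count sees the same segments. In a scheme like this, that is what keeps concurrent tokenization deterministic. Reassembly into `block_count >= 0x1fff0` blocks is not shown.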
- `src/core.zig` - Pure archive engine: metadata parser, recursive entry parsing (directories/files), extraction, archive creation, and add semantics.
  - Writer supports directory-structured entry encoding from slash-delimited archive paths.
  - Parser/writer now use legacy Compact Pro subtree-count semantics for root/directory entry counts (not immediate-child counts), improving external-tool compatibility.
  - Header CRC validation/generation now covers the full metadata envelope expected by legacy tooling: entry count, comment, and all serialized entry records up to payload start.
  - Returns explicit unsupported errors for encrypted paths.
  - Read path decodes LZH forks through `src/lzh.zig`.
  - Write path picks a per-fork compression strategy (`RLE` vs `LZH(RLE)`) by encoded size and emits full multi-block LZH streams using legacy block-count termination semantics (`>= 0x1fff0` with an overshoot-permitted final token).
  - Adds archive-create progress callback plumbing that reports encode work units (RLE + LZH stages) for CLI progress/ETA rendering without introducing I/O into core.
  - Progress accumulator is mutex-protected so callback dispatch remains safe when LZH block encoding runs concurrently.
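
The C sketch below makes the subtree-count versus immediate-child-count distinction concrete. It uses a hypothetical first-child/next-sibling tree, not the project's types. It also assumes that the count stored for a directory covers every file and directory nested below it, excluding the directory itself. That exclusion is an assumption of this example.

```c
#include <stddef.h>

/* Hypothetical tree node (not the project's real entry type). */
typedef struct entry {
    int is_dir;
    const struct entry *first_child;
    const struct entry *next_sibling;
} entry;

/* Immediate-child count: entries directly inside `dir`. */
static size_t child_count(const entry *dir) {
    size_t n = 0;
    for (const entry *c = dir->first_child; c != NULL; c = c->next_sibling) n++;
    return n;
}

/* Subtree count: every file and directory nested anywhere below `dir`
 * (assumed here to exclude `dir` itself). */
static size_t subtree_count(const entry *dir) {
    size_t n = 0;
    for (const entry *c = dir->first_child; c != NULL; c = c->next_sibling) {
        n++;
        if (c->is_dir) n += subtree_count(c);
    }
    return n;
}
```

Consider a root that holds one file and one directory, where that directory holds two files and an empty directory. The root's immediate-child count is 2, while its subtree count is 5.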
- `src/ffi.zig` - C ABI layer over core.
  - Exposes create/add/extract/list functions and matching free functions.
  - Adds the callback-enabled `cp_archive_create_with_progress` entrypoint for encode progress reporting.
  - Maps Zig errors to stable C error codes/messages.
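
One common way to keep C error codes stable across releases is to pin each enum value explicitly and pair it with a static message lookup. The sketch below shows only that C-side pattern. The `EX_*` names, their values, and `ex_status_message` are hypothetical stand-ins, not the codes declared in `include/compact_pro.h`.

```c
/* Hypothetical status codes. Each value is pinned explicitly so that adding
 * a new code later never renumbers the ones C callers already depend on. */
typedef enum {
    EX_OK = 0,
    EX_ERR_INVALID_ARCHIVE = 1,
    EX_ERR_CRC_MISMATCH = 2,
    EX_ERR_UNSUPPORTED_ENCRYPTED = 3,
    EX_ERR_OUT_OF_MEMORY = 4
} ex_status;

/* Maps a code to a static string: callers never free it, and unknown codes
 * (e.g. from a newer library) still get a non-NULL message. */
static const char *ex_status_message(int code) {
    switch (code) {
    case EX_OK:                        return "ok";
    case EX_ERR_INVALID_ARCHIVE:       return "invalid archive";
    case EX_ERR_CRC_MISMATCH:          return "CRC mismatch";
    case EX_ERR_UNSUPPORTED_ENCRYPTED: return "encrypted entries are unsupported";
    case EX_ERR_OUT_OF_MEMORY:         return "out of memory";
    default:                           return "unknown error";
    }
}
```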
- `src/main.zig` - Zig entry shim invoking the C CLI main function.
  - Prints a debug-build warning banner in Debug mode.
- `include/compact_pro.h` - Public C header: input/output structs, list structs, API calls, progress callback type (`cp_progress_fn`), free helpers, and error codes.
- `csrc/compact_pro_cli.c` - CLI command parser and implementations for `compress`, `expand`, `add`, and `list`.
  - Performs all file/resource-fork I/O.
  - `compress`/`add` recursively expand directory inputs into file entries (including paths containing spaces).
  - `compress` defaults to no-clobber output and supports `--force`/`-f` for explicit overwrite.
  - Captures directory metadata recursively from directory inputs (including empty directories) and recreates metadata-recorded empty directories on expand.
  - `compress`/`expand` print completion stats to stderr (bytes, ratio/percent, MB/s, elapsed).
  - Progress UI now renders bar + percent + ETA (`progress: <phase> [====>----] 42% (done/total) (ETA: Ns)`), focused on meaningful long phases (`compress-encode`, `add-encode`, `expand-decode`, `expand-write`) with callback-driven encode updates from core.
  - macOS default resource-fork ingestion now reads only real `com.apple.ResourceFork` xattr content; it no longer blindly ingests `..namedfork/rsrc` bytes that may be filesystem-compression backing data.
  - Implements `expand --path` selective extraction.
  - Implements appended metadata trailer v2 with hierarchical file/dir metadata records and per-field masks.
  - Restores metadata best-effort and emits explicit per-field warnings for unsupported/unrestorable fields (including cross-OS NTFS/Apple metadata cases).
  - `list` prints trailer size accounting (`trailer_size`, `trailer_payload`) in addition to normal entries.
  - `compress` supports optional `-o` (default archive naming), `~` path expansion, an automatic `.cpt` suffix for named outputs, stdin input via `-`, and stdout output via `-`.
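
The sketch below shows one way a line in the progress format above might be rendered. Several details are assumptions: the bar width of 10, the arrow placement, integer percent truncation, and the constant-rate ETA estimate. `format_progress` is a hypothetical name, not a function in `csrc/compact_pro_cli.c`.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical renderer for lines shaped like
 *   progress: <phase> [====>----] 42% (done/total) (ETA: Ns)
 * Bar width, arrow placement, and the ETA model are assumptions. */
static int format_progress(char *buf, size_t cap, const char *phase,
                           uint64_t done, uint64_t total, double elapsed_s) {
    enum { WIDTH = 10 };
    char bar[WIDTH + 1];
    uint64_t t = total ? total : 1;   /* guard against division by zero */
    uint64_t d = done < t ? done : t; /* clamp overshoot to 100% */
    unsigned pct = (unsigned)(d * 100 / t);
    unsigned filled = (unsigned)(d * WIDTH / t);

    for (unsigned i = 0; i < WIDTH; i++)
        bar[i] = (i < filled) ? '=' : '-';
    if (filled > 0 && filled < WIDTH)
        bar[filled - 1] = '>';        /* arrow head marks the leading edge */
    bar[WIDTH] = '\0';

    if (d == 0)                       /* no rate yet, so no estimate */
        return snprintf(buf, cap, "progress: %s [%s] %u%% (%llu/%llu) (ETA: ?)",
                        phase, bar, pct, (unsigned long long)done,
                        (unsigned long long)total);

    /* Constant-rate estimate: remaining = elapsed * (total - done) / done. */
    double eta = elapsed_s * (double)(t - d) / (double)d;
    return snprintf(buf, cap, "progress: %s [%s] %u%% (%llu/%llu) (ETA: %.0fs)",
                    phase, bar, pct, (unsigned long long)done,
                    (unsigned long long)total, eta);
}
```

With `done = 42`, `total = 100`, and 42 s elapsed, this produces `progress: compress-encode [===>------] 42% (42/100) (ETA: 58s)`.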
- `tests/unit/zig_unit_tests.zig` - Unit tests covering:
  - RLE behavior (including repeated-byte compression, literal/run/escape boundary encoding, and the `0x81,0x81,0x81,0x82` regression).
  - Archive roundtrip/create/add.
  - Fixture metadata parse.
  - LZH fixture extraction/hash verification.
  - Single- and multi-block LZH encode/decode roundtrip.
  - Deterministic multi-worker LZH encoding (`worker_limit=1` vs `4`).
  - Deterministic multi-segment LZH encoding (`worker_limit=1` vs `4` on a >16 MiB payload).
  - LZH progress callback phase coverage (token + encode).
  - Write-path LZH flag selection.
  - Compact Pro subtree-count encoding semantics.
  - Header-CRC metadata coverage compatibility.
- `tests/cli/test_cli.sh` - End-to-end CLI tests covering:
  - Help surface, compress/expand/add/list, and selective extraction.
  - Sidecar handling.
  - Directory path roundtrip (including directory input paths with spaces) and empty-directory metadata roundtrip.
  - No-clobber vs `--force` overwrite behavior.
  - Directory metadata restore.
  - LZH fixture extraction.
  - External `unar` compatibility regression for generated archives.
  - Progress flags (including bar + ETA output).
  - Trailer accounting output.
  - Cross-OS metadata warning behavior.
  - macOS default resource-fork gating behavior.
- `build.zig` - Zig build graph for the static library, CLI executable, and unit-test step.
  - Default optimization mode is `ReleaseFast`.
- `flake.nix` - Dependency control and reproducible dev shell/build/check definitions.
  - Defines Garnix-facing CI checks directly in flake outputs (no `garnix.yaml`) for five targets: Linux `x86_64`/`aarch64`, macOS `aarch64`, and Windows `x86_64`/`aarch64`.
- `.github/workflows/ci.yml` - GitHub Actions CI on `yolo`: a platform matrix with Linux (`x86_64` + `aarch64`) and macOS (`aarch64`) Nix-based full test/build jobs, a Windows `x86_64` native Zig test/build/smoke job, and a Windows `aarch64` cross-compile artifact job.
- `build` - Project build wrapper (`ReleaseFast` default; `--debug` and `--test` supported).
- `test` - Full deterministic suite runner (Zig unit tests + CLI integration tests).
- `bm` - Benchmark suite comparing `compact-pro` against `zip` and `gzip`.
  - Measures compression/extraction speed (wall + CPU + throughput) and compression ratio.
  - Writes history to `tests/benchmark/history.tsv` and fails on sudden drift unless explicitly accepted.
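
The drift gate in `bm` could work along the lines of the sketch below. Several parts are hypothetical: the statistic (mean of prior runs), the relative tolerance, and the `drift_exceeds` name. The actual rule applied to `tests/benchmark/history.tsv` is not described here.

```c
#include <stddef.h>

/* Hypothetical drift gate: flags `current` when its relative distance from
 * the mean of prior measurements exceeds `tolerance` (e.g. 0.15 = 15%).
 * Returns 1 on drift, 0 otherwise. */
static int drift_exceeds(const double *history, size_t n, double current,
                         double tolerance) {
    if (n == 0) return 0;                 /* nothing to compare against yet */
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += history[i];
    double mean = sum / (double)n;
    if (mean == 0.0) return current != 0.0;
    double rel = (current - mean) / mean; /* signed relative change */
    if (rel < 0) rel = -rel;
    return rel > tolerance;
}
```

Under this model, accepting a drift would amount to appending the new measurement to the history, so later runs compare against it.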
- `fuzz` - Fuzz entrypoint placeholder.
- `README.md` - Project description, CI badges, command surface, behavior notes, and compatibility caveats.
- `PROJECT_OVERVIEW.md` - Goal and terminology overview.
- `RULES.md` - Local always-on implementation constraints.
- `fixtures/cpt/MacEnvy21.cpt` - Real Compact Pro fixture for metadata/parser compatibility tests.