| 1 | +# Session: Full Mmap Implementation + Benchmarks (2026-04-08) |
| 2 | + |
| 3 | +## What was done (15 commits on branch Claude) |
| 4 | + |
| 5 | +### Steps 1-2: Count-Before-Load + Auto Method Selection |
| 6 | +- `DataLoader::count()`, `--method auto` (default), CLARA auto-scaling |
| 7 | + |
| 8 | +### Steps 3-5: Mmap Distance Matrix |
| 9 | +- llfio integrated via CPM → made non-optional (core dependency) |
| 10 | +- `MmapDistanceMatrix` with 32-byte binary header, llfio backend |
| 11 | +- `std::variant<DenseDistanceMatrix, MmapDistanceMatrix>` in Problem |
| 12 | +- Binary checkpoint (`save_binary_checkpoint`/`load_binary_checkpoint`) |
| 13 | +- CLI: `--restart`, `--mmap-threshold` |
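A minimal sketch of how one call site can serve both backends through `std::variant` and `std::visit`. The two structs are hypothetical stand-ins (the real `DenseDistanceMatrix` and `MmapDistanceMatrix` interfaces are not shown in these notes), and the layout is simplified to row-major N×N:

```cpp
#include <cassert>
#include <cstddef>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the two backends; names and members are assumptions.
struct DenseMatrixSketch {
  std::size_t n;
  std::vector<double> values;  // n * n entries, row-major, owned on the heap
  double get(std::size_t i, std::size_t j) const { return values[i * n + j]; }
};

struct MappedMatrixSketch {
  std::size_t n;
  const double* base;  // in a real backend this would point into the mapped file
  double get(std::size_t i, std::size_t j) const { return base[i * n + j]; }
};

using DistanceMatrixSketch = std::variant<DenseMatrixSketch, MappedMatrixSketch>;

// std::visit selects the active alternative at run time, so callers
// never need to know which backend holds the data.
inline double distance(const DistanceMatrixSketch& m, std::size_t i, std::size_t j) {
  return std::visit(
      [&](const auto& impl) {
        assert(i < impl.n && j < impl.n);
        return impl.get(i, j);
      },
      m);
}
```

Callers would write `distance(matrix, i, j)` regardless of which alternative is active, which is the property the variant in `Problem` appears to rely on.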
| 14 | + |
| 15 | +### Benchmarks (N=5000, ~95MB matrix) |
| 16 | +- Mmap random access: only 5% slower than heap |
| 17 | +- Mmap startup: 78x faster than fread (1.4ms vs 109ms) |
| 18 | +- CLARA views: 48x faster than vector copies |
| 19 | +- `std::visit` overhead: 0.000025% — negligible |
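The ~95MB figure is consistent with a packed triangular matrix of 8-byte doubles. The sketch below checks that arithmetic and shows one possible index mapping. It assumes the lower triangle is stored with the diagonal included and ignores the file header; neither detail is confirmed by these notes:

```cpp
#include <cstdint>

// Number of stored entries for an n x n symmetric matrix,
// assuming the lower triangle including the diagonal is kept.
constexpr std::uint64_t packed_elements(std::uint64_t n) {
  return n * (n + 1) / 2;
}

// Payload size in bytes (header not counted).
constexpr std::uint64_t packed_bytes(std::uint64_t n) {
  return packed_elements(n) * sizeof(double);
}

// Map (i, j) to a position in the packed array; symmetric in i and j.
constexpr std::uint64_t packed_index(std::uint64_t i, std::uint64_t j) {
  return i >= j ? i * (i + 1) / 2 + j : j * (j + 1) / 2 + i;
}
```

For N=5000 this gives 12,502,500 entries, or 100,020,000 bytes with 8-byte doubles, which is about 95.4 MiB.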
| 20 | + |
| 21 | +### Phase 2: llfio non-optional |
| 22 | +- Removed all `#ifdef DTWC_HAS_MMAP` guards (12 files) |
| 23 | +- llfio always linked, always available |
| 24 | + |
| 25 | +### MmapDataStore for time series |
| 26 | +- `dtwc/core/mmap_data_store.hpp` — contiguous mmap cache for series data |
| 27 | +- 64-byte header + offset table + contiguous doubles |
| 28 | +- Supports variable-length, multivariate series |
| 29 | +- Extracted shared `crc32.hpp` utility |
| 30 | +- 6 test cases, all passing |
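To illustrate the "offset table + contiguous doubles" idea, here is a small in-memory model. `SeriesStoreSketch` is a hypothetical class, not the actual `MmapDataStore`: it owns its buffers instead of mapping a file, and it omits the header, the CRC, and multivariate handling:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: all series live back to back in one buffer, and
// offsets_[i] .. offsets_[i + 1] delimits series i.
class SeriesStoreSketch {
public:
  // Append one series; offsets_ always holds count() + 1 entries.
  void append(const std::vector<double>& series) {
    values_.insert(values_.end(), series.begin(), series.end());
    offsets_.push_back(values_.size());
  }

  std::size_t count() const { return offsets_.size() - 1; }

  // Length of series i, derived from two adjacent offsets.
  std::size_t length(std::size_t i) const {
    assert(i < count());
    return static_cast<std::size_t>(offsets_[i + 1] - offsets_[i]);
  }

  // Pointer to the first value of series i; no copy is made.
  const double* data(std::size_t i) const {
    assert(i < count());
    return values_.data() + offsets_[i];
  }

private:
  std::vector<std::uint64_t> offsets_{0};  // starts with a single 0 entry
  std::vector<double> values_;
};
```

Because each series is just a pointer and a length into one buffer, variable-length series need no per-series allocation, and the same pair could later be wrapped in a `std::span<const double>`.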
| 31 | + |
| 32 | +## Test results |
| 33 | +- 67/67 pass, 2 CUDA skipped |
| 34 | + |
| 35 | +## Commits |
| 36 | +``` |
| 37 | +77d0745 docs: add mmap benchmark results to LESSONS.md, update plans and handoff |
| 38 | +52d1e20 feat: implement MmapDataStore with llfio backend |
| 39 | +32061d5 test(RED): add MmapDataStore unit tests |
| 40 | +1f91d71 refactor: make llfio non-optional, remove DTWC_ENABLE_MMAP guards |
| 41 | +adfa375 feat(bench): add mmap access pattern benchmark suite |
| 42 | +55844ef docs: update CHANGELOG, fix --mmap-threshold 0 logic |
| 43 | +60e0a3e feat: add binary checkpoint and --restart/--mmap-threshold CLI flags |
| 44 | +1134695 test(RED): add binary checkpoint tests |
| 45 | +535bdf4 feat: integrate MmapDistanceMatrix into Problem via std::variant |
| 46 | +d8dea97 test(RED): add variant distance matrix integration tests |
| 47 | +9a6cfe0 feat: implement MmapDistanceMatrix with llfio backend |
| 48 | +6457644 Add llfio as optional dependency for memory-mapped distance matrices |
| 49 | +75c760d Add DataLoader::count() and --method auto (default) for CLI |
| 50 | +c4fb30c Add lessons learned, mmap/auto-select implementation plan, session handoff |
| 51 | +``` |
| 52 | + |
| 53 | +## What to do next |
| 54 | + |
| 55 | +### IMMEDIATE: C++20 upgrade + std::span integration (Phase 3c) |
| 56 | +User approved C++20 upgrade. This unlocks: |
| 57 | +1. **Raise the required C++ standard to C++20** (`CMAKE_CXX_STANDARD 20`) |
| 58 | +2. **Change `dtw_fn_` type** from `std::function<data_t(const vector<data_t>&, const vector<data_t>&)>` to `std::function<data_t(std::span<const data_t>, std::span<const data_t>)>` |
| 59 | +3. **std::span is constructible from both `vector<double>` and `{double*, size_t}`** — works seamlessly with heap AND mmap backends |
| 60 | +4. **Replace `TimeSeriesView`** with `std::span<const double>` (standard version) |
| 61 | +5. **Fix CLARA** to pass spans into mmap data instead of copying vectors |
| 62 | +6. **Problem::p_vec(i)** can return `std::span<const data_t>` instead of `const vector<data_t>&` |
| 63 | + |
| 64 | +### Then: Phase 4 (Storage Policy) |
| 65 | +- `StoragePolicy` enum: Auto/Heap/Mmap |
| 66 | +- Auto-select based on an N threshold (benchmarks at N=5000 show mmap random access only 5% slower than heap) |
| 67 | +- Default to mmap for all sizes (benchmarks show negligible overhead) |
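One way the planned policy could resolve to a concrete backend is sketched below. Only the `StoragePolicy` enumerators come from the plan above; the `Backend` enum, `resolve_backend`, and the `mmap_threshold` parameter are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>

enum class StoragePolicy { Auto, Heap, Mmap };
enum class Backend { Heap, Mmap };  // hypothetical: the concrete choice after resolution

// Resolve the requested policy for a problem with n series.
// `mmap_threshold` is a hypothetical knob: Auto picks mmap once n reaches it.
inline Backend resolve_backend(StoragePolicy policy, std::size_t n,
                               std::size_t mmap_threshold) {
  switch (policy) {
    case StoragePolicy::Heap:
      return Backend::Heap;
    case StoragePolicy::Mmap:
      return Backend::Mmap;
    case StoragePolicy::Auto:
      return n >= mmap_threshold ? Backend::Mmap : Backend::Heap;
  }
  assert(false && "unhandled StoragePolicy");
  return Backend::Heap;
}
```

With a threshold of 0, `Auto` maps every problem size, which is one way to express the "default to mmap" idea while keeping an explicit `Heap` override.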
| 68 | + |
| 69 | +### Deferred |
| 70 | +- Streaming CLARA (Step 6 from original plan) |
| 71 | +- Stale cache detection (hash of input filenames in mmap header) |
| 72 | +- Checkpoint warm-start in algorithms (`--restart` loads the checkpoint but doesn't skip re-clustering yet) |
| 73 | + |
| 74 | +## Open bugs |
| 75 | +- hierarchical + SoftDTW crashes |
| 76 | +- `set_if_unset` in YAML overrides CLI values |
| 77 | +- MV banded DTW silently ignores band |
| 78 | +- Pruned distance matrix incompatible with MmapDistanceMatrix (calls `dense_distance_matrix()`) |
| 79 | + |
| 80 | +## Key decisions |
| 81 | +- llfio is now a core dependency (not optional) |
| 82 | +- Mmap has negligible overhead — safe as default backend |
| 83 | +- C++20 approved for std::span (user confirmed) |
| 84 | +- Eigen::Map useful for series (SIMD ops), not for packed triangular distance matrix |
| 85 | +- Binary formats are internal caches, not user-facing — users keep CSV/HDF5/Parquet |