Conversation
…scoring, overhead metric

- Fix base64 classification gap: add a `FORCE_T0_PATTERNS` entry for 40+ char base64 strings and add it to `HARD_T0_REASONS`. Hex/UUID probes already passed; the base64 probe now passes too (see the sketch after this list).
- Sentence adjacency scoring: boost sentences that share entities with their neighbors (+2 for one side, +3 for both) to improve topical coherence in summaries. Uses the exported `extractMessageEntities` from `importance.ts`.
- Compression overhead ratio metric: `computeOverheadRatio()` measures `compress()` wall-clock time vs estimated LLM inference time. Displayed as the `OvhdR` column in quality bench output.
- High-entropy quality bench scenario with a hex dump, UUID array, base64 blob, and mixed entropy+prose messages (4/4 probes pass).
- Unit tests for base64/hex/UUID classification and compress preservation, adjacency scoring behavior, and documented known limitations (UUID gap, camelCase false positive on the base64 pattern).
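A minimal sketch of how the base64 force-tier-0 rule could look, assuming hypothetical shapes for `FORCE_T0_PATTERNS` and `HARD_T0_REASONS` (the real entries, reason strings, and matching logic in the repo may differ):

```typescript
// Hypothetical shapes; the actual FORCE_T0_PATTERNS / HARD_T0_REASONS
// definitions in the repo may be structured differently.
interface ForceT0Pattern {
  reason: string;
  pattern: RegExp;
}

const FORCE_T0_PATTERNS: ForceT0Pattern[] = [
  // 40+ character base64 runs (optionally padded) are preserved verbatim.
  { reason: 'base64_content', pattern: /[A-Za-z0-9+/]{40,}={0,2}/ },
];

// Reasons that cannot be downgraded by later scoring passes.
const HARD_T0_REASONS = new Set<string>(['base64_content']);

// Returns the hard tier-0 reason for a message, or null if none applies.
// Documented limitation: long camelCase identifiers can also match the
// base64 character class (false positive).
function hardT0Reason(text: string): string | null {
  for (const { reason, pattern } of FORCE_T0_PATTERNS) {
    if (pattern.test(text) && HARD_T0_REASONS.has(reason)) {
      return reason;
    }
  }
  return null;
}
```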
v1.4.0 baseline with high-entropy content scenario. Zero regressions vs v1.3.0 across all 13 shared scenarios. New scenario: High-entropy content at 1.35x ratio, 100% entity retention, 4/4 probes.
Settings bar:
- depth dropdown (gentle/moderate/aggressive/auto)
- relevance toggle + threshold input
- flow, importance, contradiction, coreference, clustering toggles
- budget strategy dropdown (binary-search/tiered, visible when budget is on)
- visual divider between v1 and v2 controls

Stats bar:
- quality_score, entity_retention, structural_integrity chips
- messages_relevance_dropped, importance_preserved, contradicted chips
- color-coded: green >=90%, amber >=70%, red <70% (see the sketch after this list)

Examples:
- "Q&A + corrections": demonstrates flow + contradiction detection
- "Topic-scattered": 3 interleaved topics for the clustering demo

Help panel:
- V2 Features section with all new options explained
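The chip color thresholds map directly to the percentages above. A minimal sketch, assuming a hypothetical `chipColor` helper name (the actual stats bar component may implement this differently):

```typescript
type ChipColor = 'green' | 'amber' | 'red';

// Maps a 0-100 percentage to the stats bar chip color:
// green for >= 90%, amber for >= 70%, red below that.
function chipColor(percent: number): ChipColor {
  if (percent >= 90) return 'green';
  if (percent >= 70) return 'amber';
  return 'red';
}
```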
A/B testing showed that adjacency scoring (+2/+3 boost for entity-linked neighbor sentences) produces identical results across all quality bench scenarios: the summarizer budget is wide enough that sentence-selection pressure never triggers the tiebreaker. Removing it to avoid dead complexity.
Summary
- New `base64_content` pattern added to `FORCE_T0_PATTERNS` and `HARD_T0_REASONS`; matching base64 strings are now preserved verbatim.
- `computeOverheadRatio()` measures `compress()` wall-clock time vs estimated LLM inference cost. Displayed as the `OvhdR` column in quality bench output.
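A minimal sketch of what the overhead ratio could compute, assuming a hypothetical token-based inference-time estimate (the throughput constant, token estimator, and `compress()` signature here are illustrative, not the repo's actual code):

```typescript
// Illustrative assumptions only; the real compress() signature, token
// estimator, and throughput constant in the repo may differ.
const ASSUMED_LLM_TOKENS_PER_SECOND = 50;

function estimateTokens(text: string): number {
  // Rough heuristic: ~4 characters per token.
  return Math.ceil(text.length / 4);
}

// Ratio of compression wall-clock time to the estimated time an LLM would
// spend inferring over the uncompressed text. Lower means cheaper overhead.
function computeOverheadRatio(
  compress: (input: string) => string,
  input: string,
): number {
  const start = performance.now();
  compress(input);
  const compressMs = performance.now() - start;

  const inferenceMs =
    (estimateTokens(input) / ASSUMED_LLM_TOKENS_PER_SECOND) * 1000;

  return compressMs / inferenceMs;
}
```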
Verification

- `npm run lint && npm run format:check` clean
- `npm run bench:quality --check` passes

Test plan
- `npm test`: all 671 tests pass
- `npm run bench:quality`: high-entropy probes 4/4
- `npm run bench:quality -- --check`: no regressions vs v1.3.0
- `npm run lint && npm run format:check`: clean