fix(script): rebalance manifest backfill workers #2314
📝 Walkthrough
Refactors `scripts/backfill_manifest_file_sizes.mjs` to claim bounded `manifest.id` windows instead of fixed ranges. Adds `createIdWindowClaimer` and `--id-window-size`, introduces `processIdWindow` for in-window pagination, updates `runRangeWorker` to repeatedly claim windows, and extends progress reporting with `totalWindows`/`claimedWindows` and per-worker current window fields.
Changes: Window-based manifest backfill refactoring
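The window-claiming idea in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the names `createIdWindowClaimer`, `totalWindows`, and `claimedWindows` come from the walkthrough, while the option names (`minId`, `maxId`, `idWindowSize`), the `claim()` method, and the `{ startId, endId }` window shape are assumptions, as is the premise that `manifest.id` is a numeric key.

```javascript
// Hypothetical sketch of a shared window claimer. The real
// createIdWindowClaimer in the script may have a different signature.
function createIdWindowClaimer({ minId, maxId, idWindowSize }) {
  let nextStart = minId; // first id of the next unclaimed window
  let claimedWindows = 0;
  const totalWindows =
    maxId < minId ? 0 : Math.ceil((maxId - minId + 1) / idWindowSize);

  return {
    totalWindows,
    get claimedWindows() {
      return claimedWindows;
    },
    // Returns the next inclusive id window, or null once the range is exhausted.
    claim() {
      if (nextStart > maxId) return null;
      const startId = nextStart;
      const endId = Math.min(startId + idWindowSize - 1, maxId);
      nextStart = endId + 1;
      claimedWindows += 1;
      return { startId, endId };
    },
  };
}

// Usage: drain the claimer the way a worker loop might.
const claimer = createIdWindowClaimer({ minId: 1, maxId: 120000, idWindowSize: 50000 });
const windows = [];
for (let w = claimer.claim(); w !== null; w = claimer.claim()) {
  windows.push(w);
}
```

Because every worker would pull from the same claimer, a worker that lands on a sparse window finishes quickly and simply claims the next one, rather than idling at the end of a fixed range.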
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Merging this PR will not alter performance
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/backfill_manifest_file_sizes.mjs`:
- Line 676: The default id window size calculation for idWindowSize (the
getNumberArg call using Math.max(batchSize * 25, 50000)) uses magic numbers 25
and 50000 without context; add a brief inline comment next to this expression
that documents why 25x batchSize is chosen (e.g., balances worker claim
frequency vs DB scan overhead) and why 50000 is the floor/threshold, referencing
the variables batchSize and the all flag so future maintainers understand the
tuning rationale.
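One possible shape for the requested comment is sketched below. The expression `Math.max(batchSize * 25, 50000)` is taken from the review comment; the helper name `defaultIdWindowSize` is hypothetical (the script computes the value inline in a `getNumberArg` call), and the rationale text only restates the reviewer's suggested example, so the author would need to replace it with the actual tuning reasoning.

```javascript
// Hypothetical helper wrapping the default from the review comment.
function defaultIdWindowSize(batchSize) {
  // 25x batchSize: assumed rationale, per the reviewer's example. A window
  // spanning many batches keeps workers from claiming too often, while
  // staying small enough that one sparse window does not stall a worker.
  // 50000 floor: assumed rationale. Prevents very small batch sizes from
  // producing windows so narrow that claim overhead dominates the id scan.
  return Math.max(batchSize * 25, 50000);
}

const smallBatchDefault = defaultIdWindowSize(1000); // floor applies
const largeBatchDefault = defaultIdWindowSize(5000); // 25x multiplier applies
```

With the 1000-row batch mentioned in the PR description, the floor wins and the default window covers 50000 ids; the multiplier only takes over once `batchSize` exceeds 2000.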
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c401b078-5a53-4d67-8aee-78ef1ca28e13
📒 Files selected for processing (1)
scripts/backfill_manifest_file_sizes.mjs



Summary (AI generated)
Motivation (AI generated)
The previous worker model could still waste workers on sparse ID regions. A shared cursor keeps every worker claiming the next 1000-row batch until no candidates remain, while preserving indexed manifest.id reads.
Business Impact (AI generated)
This should make the production backfill finish faster and produce a concrete CSV audit trail for files whose storage metadata cannot be resolved.
Test Plan (AI generated)