Deduplicate chunked bucket-fill logic; fix #95 and #97 in the process #100

Merged
cmutel merged 2 commits into main from refactor/deduplicate-chunked-fill on Jun 4, 2026

@cmutel cmutel commented Jun 4, 2026

Summary

Collapses create_chunked_structured_array and create_chunked_array into a single public create_chunked function, and fixes two bugs (#95 and #97) that existed in the old implementations. The new API:

  • create_chunked(iterable, dtype, ncols=None, bucket_size=500) handles both cases:
    • Omit ncols → 1D structured array
    • Pass ncols → plain 2D array
  • The two old functions (and the intermediate _fill_chunked helper) are removed entirely.
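
A minimal sketch of how a function with this signature could behave, for readers who want to see the two modes side by side. This is not the merged implementation: only the signature and the ncols semantics come from the description above, while the function body, variable names, and the use of np.zeros and np.concatenate are assumptions made for illustration.

```python
import numpy as np


def create_chunked(iterable, dtype, ncols=None, bucket_size=500):
    """Illustrative sketch: fill fixed-size buckets, then join them along axis 0."""
    # 1D buckets for structured dtypes, 2D buckets when ncols is given.
    shape = (bucket_size,) if ncols is None else (bucket_size, ncols)
    chunks = []
    bucket = np.zeros(shape, dtype=dtype)
    count = 0  # number of rows filled in the current bucket

    for row in iterable:
        bucket[count] = row
        count += 1
        if count == bucket_size:
            # Bucket is full: keep it and start a fresh one.
            chunks.append(bucket)
            bucket = np.zeros(shape, dtype=dtype)
            count = 0

    if count:
        # Copy the partial slice so the full-size bucket is not kept alive.
        chunks.append(bucket[:count].copy())

    if not chunks:
        empty_shape = (0,) if ncols is None else (0, ncols)
        return np.zeros(empty_shape, dtype=dtype)

    # Always join along rows (axis 0), for both the 1D and 2D cases.
    return np.concatenate(chunks, axis=0)


# Plain 2D array: pass ncols.
plain = create_chunked(((i, i + 1, i + 2) for i in range(1200)), np.float64, ncols=3)

# 1D structured array: omit ncols. Rows must be tuples for structured dtypes.
record_dtype = np.dtype([("row", np.int32), ("value", np.float64)])
structured = create_chunked(((i, i * 0.5) for i in range(700)), record_dtype)
```

With the default bucket_size of 500, the first call spans two full buckets plus a partial one, and the second spans one full bucket plus a partial one.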

Bugs fixed

Closes #98. Also fixes #95 and #97.

Test plan

  • test_create_chunked_multiple_full_buckets — shape is (1000, 3), not (500, 6)
  • test_create_chunked_full_plus_partial_bucket — no ValueError when combining differently-sized chunks
  • test_create_chunked_under_one_bucket / test_create_chunked_empty_plain
  • test_create_chunked_structured_multiple_buckets / _under_one_bucket / _empty
  • All previously passing tests continue to pass (211/211)

cmutel added 2 commits June 4, 2026 08:35
create_chunked_structured_array and create_chunked_array shared identical
bucket-fill loops that had to be maintained separately. Extract _fill_chunked,
parameterised by bucket_shape and empty_shape, so both delegate to one place.

The consolidation also fixes two latent bugs in the previous duplicate code:
- create_chunked_array used np.hstack, which concatenates 2D arrays on axis=1
  (columns) instead of axis=0 (rows), silently producing wrong shapes or
  raising ValueError for inputs longer than bucket_size rows. (#95)
- The partial-chunk slice array[:i+1] was a view that kept the full bucket
  buffer alive until the final concatenation; now .copy() is called to release
  the oversized allocation immediately. (#97)

Closes #98. Also fixes #95 and #97.
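
The two bugs described in this commit message can be reproduced with plain NumPy. The snippet below is a sketch under assumed shapes (500-row buckets with 3 columns and a 200-row partial chunk); the array names are hypothetical and do not come from the project's code.

```python
import numpy as np

full_a = np.zeros((500, 3))
full_b = np.zeros((500, 3))
partial = np.zeros((200, 3))

# np.hstack joins 2D arrays along axis 1, so two full buckets end up side by side.
wide = np.hstack([full_a, full_b])               # shape (500, 6)
tall = np.concatenate([full_a, full_b], axis=0)  # shape (1000, 3)

# With a partial chunk the row counts differ and np.hstack raises ValueError.
try:
    np.hstack([full_a, partial])
    hstack_failed = False
except ValueError:
    hstack_failed = True

# A slice is a view that references the whole bucket; .copy() owns its own data.
bucket = np.zeros((500, 3))
view = bucket[:200]
copied = bucket[:200].copy()
view_keeps_bucket = view.base is bucket  # the view holds a reference to the full buffer
copy_is_independent = copied.base is None
```

Joining along axis 0 handles full and partial chunks alike, because only the row counts may differ between chunks.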
…create_chunked

Both functions differed only in bucket shape; the _fill_chunked helper introduced in
this branch already proved they were identical. Merge into a single public create_chunked
that accepts an optional ncols parameter: omit for a 1D structured array, supply for a
plain 2D array. Removes _fill_chunked, create_chunked_structured_array, and
create_chunked_array entirely.
@cmutel cmutel merged commit 0d294a9 into main Jun 4, 2026
6 checks passed
@cmutel cmutel deleted the refactor/deduplicate-chunked-fill branch June 4, 2026 07:02
@cmutel cmutel mentioned this pull request Jun 4, 2026
3 tasks