Fixing corrupt rows #50

Draft
Josef-Haupt wants to merge 2 commits into main from 49-large-number-of-redundant-rows

Conversation

@Josef-Haupt

Member

No description provided.

Josef-Haupt linked an issue on Jun 16, 2026 that may be closed by this pull request

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses data corruption and inconsistent duration calculations in the acoustic inference pipeline by making duration derivation deterministic (frames / samplerate) and by fixing tensor growth to preserve masking for unwritten segments.
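The duration derivation described here can be illustrated with a minimal sketch. `AudioInfo` and `duration_seconds` are hypothetical names used only for this illustration; they stand in for whatever metadata object the producer reads and are not taken from the PR's code.

```python
from typing import NamedTuple


class AudioInfo(NamedTuple):
    """Hypothetical stand-in for the file metadata (frame count and sample rate)."""
    frames: int
    samplerate: int


def duration_seconds(info: AudioInfo) -> float:
    """Derive the duration directly from frames / samplerate."""
    if info.samplerate <= 0:
        raise ValueError("samplerate must be positive")
    return info.frames / info.samplerate


# 144000 frames at 48 kHz is exactly 3 seconds.
print(duration_seconds(AudioInfo(frames=144_000, samplerate=48_000)))
```

Computing the value from the two integers means the result depends only on the frame count and sample rate, so repeated runs over the same file yield the same segment boundaries.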

Changes:

  • Compute audio duration from frames / samplerate instead of relying on sf_info.duration.
  • Replace in-place ndarray.resize() growth with explicit reallocation + copy to ensure newly added segments remain masked/uninitialized as intended.
  • Add focused regression tests for duration computation and tensor resize/masking behavior (including the “initial pointer = 0” case).
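The reallocation-and-copy approach in the second bullet might look roughly like the sketch below. It is not the PR's implementation: `grow_segments` is a hypothetical helper, and the convention that `True` in the mask means "unwritten" is an assumption. The relevant NumPy fact is that in-place `ndarray.resize()` fills new entries with zeros, which for a boolean mask means `False`.

```python
import numpy as np


def grow_segments(data: np.ndarray, mask: np.ndarray, new_capacity: int):
    """Return (data, mask) enlarged along axis 0, with all new rows masked.

    Assumption: mask value True marks a segment as unwritten.
    """
    old_capacity = data.shape[0]
    if new_capacity <= old_capacity:
        return data, mask

    # Allocate fresh arrays; new mask entries start as True (unwritten).
    new_data = np.zeros((new_capacity, *data.shape[1:]), dtype=data.dtype)
    new_mask = np.ones((new_capacity, *mask.shape[1:]), dtype=bool)

    # Copy the existing segments and their mask state into the new arrays.
    new_data[:old_capacity] = data
    new_mask[:old_capacity] = mask
    return new_data, new_mask


data = np.arange(6, dtype=np.float32).reshape(2, 3)  # two written segments
mask = np.zeros(2, dtype=bool)                       # both marked as written
data, mask = grow_segments(data, mask, 4)
print(mask.tolist())  # the two added segments are masked
```

The same helper also covers the zero-pointer case: starting from arrays with zero rows, every row of the grown result is masked until something is written to it.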

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/birdnet/acoustic/inference/core/producer.py | Derives duration from sample frames and samplerate to avoid inaccurate sf_info.duration. |
| src/birdnet/acoustic/inference/core/prediction/prediction_tensor.py | Reworks segment capacity growth to keep unwritten prediction segments masked; adjusts initial allocation for zero pointer. |
| src/birdnet/acoustic/inference/core/encoding/encoding_tensor.py | Mirrors the prediction tensor resize fix for embeddings; adjusts initial allocation for zero pointer. |
| src/birdnet_tests/acoustic_models/inference/producer_py/test_get_file_segments_with_overlap.py | Adds regression test ensuring duration uses frames/samplerate rather than sf_info.duration. |
| src/birdnet_tests/acoustic_models/inference/predictions/prediction_tensor_py/test_prediction_tensor.py | Adds tests verifying resize preserves masking and that zero-pointer initialization starts empty. |
| src/birdnet_tests/acoustic_models/inference/encoding/encoding_tensor_py/test_encoding_tensor.py | Adds tests verifying resize preserves masking and that zero-pointer initialization starts empty. |

Development

Successfully merging this pull request may close these issues.

Large number of redundant rows

2 participants