Skip to content

Add CrcTransformer for checksum calculation#224

Merged
St4NNi merged 13 commits intomainfrom
feat/crc-transformer
Jan 6, 2026
Merged

Add CrcTransformer for checksum calculation#224
St4NNi merged 13 commits intomainfrom
feat/crc-transformer

Conversation

@das-Abroxas
Copy link
Copy Markdown
Contributor

@das-Abroxas das-Abroxas commented Dec 17, 2025

This PR implements CRC checksum calculation support to enable data integrity validation for S3 uploads. By utilizing the crc-fast crate, the new CrcTransformer supports all known CRC-32 and CRC-64 variants.

Changes

  • Added crc-fast dependency for efficient CRC calculations
  • Implemented CrcTransformer with support for CRC32, CRC32C, and CRC64NVME algorithms
  • Added comprehensive unit tests for all three CRC algorithms, multifile situations and combination with other transformers

Testing

All unit tests pass and verify correct checksum calculation for known input data across all three supported CRC algorithms.

Closes #218

@das-Abroxas das-Abroxas added the enhancement New feature or request label Dec 17, 2025
@das-Abroxas das-Abroxas self-assigned this Dec 17, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 95.88477% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 36.49%. Comparing base (9a10bc7) to head (75e2e68).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
...ata_proxy/src/s3_frontend/utils/crc_transformer.rs 95.88% 5 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #224      +/-   ##
==========================================
+ Coverage   35.69%   36.49%   +0.80%     
==========================================
  Files         127      128       +1     
  Lines       17583    17826     +243     
  Branches    17583    17826     +243     
==========================================
+ Hits         6276     6506     +230     
- Misses      10532    10540       +8     
- Partials      775      780       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@St4NNi St4NNi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition, as usual the whole transformer stuff is error prone. So there are a few things to look out for.

Comment thread components/data_proxy/src/s3_frontend/utils/crc_transformer.rs Outdated
Comment thread components/data_proxy/src/s3_frontend/utils/crc_transformer.rs Outdated
Comment thread components/data_proxy/src/s3_frontend/utils/crc_transformer.rs
let Ok((finished, should_flush)) = self.process_messages() else {
return Err(anyhow!("[HashingTransformer] Error processing messages"));
};

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there could be a race condition between receiving a file context and process_bytes. Maybe its best to assert this and fail if there is no file context when processing bytes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that message and byte processing run asynchronously is a fundamental problem of the implementation of the Pithos library, not the transformer.

Of course, additional checks could be built in for this, but that would add a considerable degree of complexity that would not be relevant for our default use case.

Copy link
Copy Markdown
Member

@St4NNi St4NNi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now ! Approved !

@St4NNi St4NNi merged commit 3f9d591 into main Jan 6, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[feat] Add CRC checksum calculation support

2 participants