Merged
simolus3
reviewed
Apr 8, 2026
Contributor
simolus3
left a comment
The approach makes sense to me, I basically just have nits regarding parseDocumentId.
simolus3
previously approved these changes
Apr 8, 2026
This now uses a custom _id parser for snapshots.
6d7e475 to
b68bea1
Contributor
Author
Rebased after merging #591, otherwise unchanged.
simolus3
approved these changes
Apr 13, 2026
Builds on #591.
This changes the internal implementations to operate on a Buffer (plus original _id) for each source document as far as possible, instead of using the deserialized version.
- For change streams, this means parsing each event using `{raw: true}` (which parses top-level fields, but returns a Buffer for each nested document). We then manually deserialize the relevant fields.
- For snapshot queries, this means using `{raw: true}` on the query, and then doing custom parsing to get the last `_id` for subsequent queries.

The result is that we can do the conversion to `SqliteRow` all in one step.

Right now, the implementation still uses a pipeline of `bson.deserialize() -> constructAfterRecord() -> applyRowContext()`. The next step is to replace that with a custom bson parser -> `SqliteRow` implementation, which can give significant performance benefits.

Some synthetic benchmarks compare parsing of insert and update change stream events against the `bson.deserialize` approach used previously. They show a slight performance regression for processing `insert` events (up to 10% for small events), while giving a potential performance boost for `update` events (since this skips parsing of `updateDescription`).

The absolute differences here are quite small: we're targeting around 20k ops/s or 20MiB/s for the entire replication process, with current throughput being around half that.
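For readers unfamiliar with the custom `_id` parsing mentioned for snapshot queries, here is a minimal sketch of how the top-level `_id` element can be located in a raw BSON document without deserializing anything else. It is not the PR's `parseDocumentId`. The names `findRawId`, `valueLength`, and `RawElement` are hypothetical, and the sketch assumes well-formed input and throws on the less common regex and DBPointer types.

```typescript
// Hypothetical helper (not the PR's implementation): find the top-level `_id`
// element in a raw BSON document and return its type byte and raw value bytes.
interface RawElement {
  type: number; // BSON element type byte
  value: Uint8Array; // raw value bytes, a view into the source buffer
}

// Byte length of a value of the given BSON type starting at `offset`.
function valueLength(view: DataView, type: number, offset: number): number {
  switch (type) {
    case 0x01: // double
    case 0x09: // UTC datetime
    case 0x11: // timestamp
    case 0x12: // int64
      return 8;
    case 0x02: // string: int32 length prefix + bytes (including trailing NUL)
    case 0x0d: // JavaScript code
    case 0x0e: // symbol
      return 4 + view.getInt32(offset, true);
    case 0x03: // embedded document: the length prefix counts itself
    case 0x04: // array
    case 0x0f: // code with scope
      return view.getInt32(offset, true);
    case 0x05: // binary: int32 length + subtype byte + payload
      return 4 + 1 + view.getInt32(offset, true);
    case 0x07: // ObjectId
      return 12;
    case 0x08: // boolean
      return 1;
    case 0x06: // undefined
    case 0x0a: // null
    case 0x7f: // max key
    case 0xff: // min key
      return 0;
    case 0x10: // int32
      return 4;
    case 0x13: // decimal128
      return 16;
    default:
      throw new Error(`Unsupported BSON type 0x${type.toString(16)}`);
  }
}

function findRawId(doc: Uint8Array): RawElement | null {
  const view = new DataView(doc.buffer, doc.byteOffset, doc.byteLength);
  const end = view.getInt32(0, true) - 1; // index of the document's trailing 0x00
  let pos = 4; // skip the int32 document length
  while (pos < end) {
    const type = doc[pos++];
    const nameStart = pos;
    // The field name is a NUL-terminated cstring.
    while (pos < end && doc[pos] !== 0) pos++;
    const isId =
      pos - nameStart === 3 &&
      doc[nameStart] === 0x5f && // '_'
      doc[nameStart + 1] === 0x69 && // 'i'
      doc[nameStart + 2] === 0x64; // 'd'
    pos++; // skip the NUL terminator
    const length = valueLength(view, type, pos);
    if (isId) {
      return { type, value: doc.subarray(pos, pos + length) };
    }
    pos += length; // skip values of other fields without decoding them
  }
  return null;
}
```

Because the returned value is a view into the source buffer, a caller could keep the raw document for the later `SqliteRow` conversion and decode only the `_id` bytes when building the next snapshot query. Whether that matches the PR's internal design is an assumption.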