
[MongoDB] Raw buffers #598

Merged
rkistner merged 8 commits into main from mongo-json-direct
Apr 13, 2026



@rkistner
Contributor

@rkistner rkistner commented Apr 8, 2026

Builds on #591.

This changes the internal implementation to operate on a raw BSON Buffer (plus the original _id) for each source document wherever possible, instead of on the deserialized document.

For change streams, this means parsing each event using {raw: true} (which parses the top-level fields but returns a Buffer for each nested document). We then manually deserialize the relevant fields.
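To illustrate the idea of decoding only the top level of an event, here is a hand-rolled walker over a BSON document. It is a sketch, not the PR's code and not the bson library's implementation: `parseTopLevel` is a hypothetical name, and only a handful of BSON types are handled. Embedded documents and arrays come back as byte views, so a field such as `updateDescription` is never decoded unless the caller asks for it. It takes a `Uint8Array`, which also accepts a Node.js `Buffer` (a `Uint8Array` subclass).

```typescript
// Hypothetical sketch: decode only the top level of a BSON document.
// Embedded documents (0x03) and arrays (0x04) are returned as raw byte views,
// similar in spirit to the {raw: true} behaviour described above.
type TopLevelValue = number | bigint | string | boolean | null | Uint8Array;

const utf8 = new TextDecoder("utf-8");

function parseTopLevel(buf: Uint8Array): Map<string, TopLevelValue> {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const size = view.getInt32(0, true); // total document size, little-endian
  if (size < 5 || size > buf.byteLength || buf[size - 1] !== 0) {
    throw new Error("invalid BSON document size");
  }
  const fields = new Map<string, TopLevelValue>();
  let pos = 4;
  while (pos < size - 1) {
    const type = buf[pos++];
    const nameEnd = buf.indexOf(0, pos); // field names are NUL-terminated
    if (nameEnd < 0) throw new Error("unterminated field name");
    const name = utf8.decode(buf.subarray(pos, nameEnd));
    pos = nameEnd + 1;
    switch (type) {
      case 0x01: // double
        fields.set(name, view.getFloat64(pos, true));
        pos += 8;
        break;
      case 0x02: { // string: int32 length (including trailing NUL), then bytes
        const len = view.getInt32(pos, true);
        fields.set(name, utf8.decode(buf.subarray(pos + 4, pos + 3 + len)));
        pos += 4 + len;
        break;
      }
      case 0x03: // embedded document
      case 0x04: { // array; both start with their own int32 total size
        const len = view.getInt32(pos, true);
        fields.set(name, buf.subarray(pos, pos + len)); // a view, no copy
        pos += len;
        break;
      }
      case 0x08: // boolean
        fields.set(name, buf[pos] === 1);
        pos += 1;
        break;
      case 0x0a: // null
        fields.set(name, null);
        break;
      case 0x10: // int32
        fields.set(name, view.getInt32(pos, true));
        pos += 4;
        break;
      case 0x09: // UTC datetime (int64 milliseconds since epoch)
      case 0x12: // int64
        fields.set(name, view.getBigInt64(pos, true));
        pos += 8;
        break;
      case 0x11: // timestamp (uint64), e.g. clusterTime on change events
        fields.set(name, view.getBigUint64(pos, true));
        pos += 8;
        break;
      default:
        throw new Error(`BSON type 0x${type.toString(16)} not handled in this sketch`);
    }
  }
  return fields;
}
```

A nested slice can be passed back into `parseTopLevel` only when its contents are actually needed, which is the "manually deserialize relevant fields" step in miniature.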

For snapshot queries, this means using {raw: true} on the query, and then doing custom parsing to get the last _id for subsequent queries.
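The "custom parsing to get the last _id" step can be sketched as follows: scan the top-level elements of a raw document, skip everything that is not `_id`, and decode only that value. The names `readDocumentId`, `valueLength` and `lastIdOfBatch` are hypothetical, and only a few `_id` types are supported. The sketch assumes the snapshot query returns documents sorted by `_id` ascending, so the last `_id` of a batch can serve as an exclusive lower bound (a `$gt` filter) for the next query; that is an assumption about how "subsequent queries" use it, not something the PR states.

```typescript
// Hypothetical sketch: read the _id of a raw BSON document without
// deserializing the rest of it.
type DocumentId =
  | { type: "objectId"; hex: string }
  | { type: "string"; value: string }
  | { type: "int32"; value: number }
  | { type: "int64"; value: bigint };

const idDecoder = new TextDecoder("utf-8");

// Byte length of an element value, for the BSON types this sketch can skip over.
function valueLength(view: DataView, type: number, pos: number): number {
  switch (type) {
    case 0x01: case 0x09: case 0x11: case 0x12: return 8; // double, datetime, timestamp, int64
    case 0x02: return 4 + view.getInt32(pos, true); // string: length prefix + bytes incl. NUL
    case 0x03: case 0x04: return view.getInt32(pos, true); // document/array: self-sized
    case 0x05: return 5 + view.getInt32(pos, true); // binary: length + subtype + bytes
    case 0x07: return 12; // ObjectId
    case 0x08: return 1; // boolean
    case 0x0a: return 0; // null
    case 0x10: return 4; // int32
    case 0x13: return 16; // decimal128
    default: throw new Error(`BSON type 0x${type.toString(16)} not handled in this sketch`);
  }
}

function readDocumentId(doc: Uint8Array): DocumentId {
  const view = new DataView(doc.buffer, doc.byteOffset, doc.byteLength);
  const size = view.getInt32(0, true);
  let pos = 4;
  while (pos < size - 1) {
    const type = doc[pos++];
    const nameEnd = doc.indexOf(0, pos);
    if (nameEnd < 0) throw new Error("unterminated field name");
    const name = idDecoder.decode(doc.subarray(pos, nameEnd));
    pos = nameEnd + 1;
    if (name !== "_id") {
      pos += valueLength(view, type, pos); // skip fields that precede _id
      continue;
    }
    switch (type) {
      case 0x07: // ObjectId -> 24-char hex
        return { type: "objectId", hex: Array.from(doc.subarray(pos, pos + 12), (b) => b.toString(16).padStart(2, "0")).join("") };
      case 0x02: // string: length prefix includes the trailing NUL
        return { type: "string", value: idDecoder.decode(doc.subarray(pos + 4, pos + 3 + view.getInt32(pos, true))) };
      case 0x10:
        return { type: "int32", value: view.getInt32(pos, true) };
      case 0x12:
        return { type: "int64", value: view.getBigInt64(pos, true) };
      default:
        throw new Error(`_id of BSON type 0x${type.toString(16)} not handled in this sketch`);
    }
  }
  throw new Error("document has no _id");
}

// Assuming the batch is sorted by _id ascending, its last _id bounds the next query.
function lastIdOfBatch(batch: Uint8Array[]): DocumentId | undefined {
  return batch.length === 0 ? undefined : readDocumentId(batch[batch.length - 1]);
}
```

Because MongoDB stores `_id` as the first field of every document, the loop normally stops at the first element; the skip path only matters for documents shaped differently, such as a projection result.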

The result is that we can do the conversion to SqliteRow in a single step:

```ts
rawToSqliteRow(source: Buffer): { row: SqliteRow; replicaId: any }
```

Right now, the implementation still uses a pipeline of bson.deserialize() -> constructAfterRecord() -> applyRowContext(). The next step is to replace that with a custom BSON parser that produces a SqliteRow directly, which could give significant performance benefits.
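The "custom BSON parser -> SqliteRow" idea can be sketched as a single pass that writes column values straight into a row object, without building the intermediate JS object that bson.deserialize() produces. Everything here is an assumption for illustration: `rawToSqliteRowSketch` is a hypothetical name, the `SqliteValue`/`SqliteRow` types are stand-ins that may not match the real ones, the type mapping (booleans as integer 0/1, ObjectId as hex text) is illustrative, and nested documents and arrays are not handled at all.

```typescript
// Assumed stand-in types; the real SqliteRow type may differ.
type SqliteValue = null | number | bigint | string | Uint8Array;
type SqliteRow = Record<string, SqliteValue>;

const textDecoder = new TextDecoder("utf-8");

// Hypothetical single-pass conversion from raw BSON bytes to a row.
// Only scalar top-level types are handled; anything else throws.
function rawToSqliteRowSketch(source: Uint8Array): { row: SqliteRow; replicaId: SqliteValue } {
  const view = new DataView(source.buffer, source.byteOffset, source.byteLength);
  const size = view.getInt32(0, true);
  const row: SqliteRow = {};
  let pos = 4;
  while (pos < size - 1) {
    const type = source[pos++];
    const nameEnd = source.indexOf(0, pos);
    if (nameEnd < 0) throw new Error("unterminated field name");
    const name = textDecoder.decode(source.subarray(pos, nameEnd));
    pos = nameEnd + 1;
    let value: SqliteValue;
    switch (type) {
      case 0x01: value = view.getFloat64(pos, true); pos += 8; break; // double -> REAL
      case 0x02: { // string -> TEXT (length prefix counts the trailing NUL)
        const len = view.getInt32(pos, true);
        value = textDecoder.decode(source.subarray(pos + 4, pos + 3 + len));
        pos += 4 + len;
        break;
      }
      case 0x07: // ObjectId -> 24-char hex TEXT
        value = Array.from(source.subarray(pos, pos + 12), (b) => b.toString(16).padStart(2, "0")).join("");
        pos += 12;
        break;
      case 0x08: value = BigInt(source[pos] === 1 ? 1 : 0); pos += 1; break; // bool -> INTEGER 0/1
      case 0x0a: value = null; break; // null -> NULL
      case 0x10: value = BigInt(view.getInt32(pos, true)); pos += 4; break; // int32 -> INTEGER
      case 0x12: value = view.getBigInt64(pos, true); pos += 8; break; // int64 -> INTEGER
      default:
        throw new Error(`BSON type 0x${type.toString(16)} not handled in this sketch`);
    }
    row[name] = value;
  }
  if (!("_id" in row)) throw new Error("document has no _id");
  return { row, replicaId: row["_id"] };
}
```

The point of the single pass is that each field is decoded once, directly into its final representation; supporting nested documents would need a recursive step (for example emitting JSON text), which this sketch leaves out.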

Below are some synthetic benchmarks comparing the parsing of insert and update change stream events against the bson.deserialize approach used previously. They show a slight performance regression for insert events (up to 10% for small events), and a potential performance improvement for update events, since updateDescription is no longer parsed.

The absolute differences here are quite small: we're targeting around 20k ops/s or 20 MiB/s for the entire replication process, and current throughput is around half that.

Scenario          Full doc      Event         Benchmark                           Ops/s             MiB/s
--------          --------      -----         ---------                           -----             -----
insert 1 KB       1.0 KB        1.3 KB        parseChangeDocument                 596,238           734
                                              parseChangeDocument + fullDocument  193,051           237
                                              bson.deserialize                    211,247           260
insert 10 KB      10 KB         10 KB         parseChangeDocument                 605,646           6,069
                                              parseChangeDocument + fullDocument  67,400            675
                                              bson.deserialize                    68,967            691
insert 100 KB     100 KB        100 KB        parseChangeDocument                 548,519           53,707
                                              parseChangeDocument + fullDocument  8,916             873
                                              bson.deserialize                    9,009             882
update 1 KB       1.0 KB        1.9 KB        parseChangeDocument                 559,738           1,062
                                              parseChangeDocument + fullDocument  185,484           352
                                              bson.deserialize                    148,481           282
update 10 KB      10 KB         20 KB         parseChangeDocument                 540,899           10,534
                                              parseChangeDocument + fullDocument  64,267            1,252
                                              bson.deserialize                    38,016            740
update 100 KB     100 KB        200 KB        parseChangeDocument                 562,278           109,788
                                              parseChangeDocument + fullDocument  8,976             1,753
                                              bson.deserialize                    4,587             896
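As a rough cross-check of the claim that the absolute differences are small, the 1 KB rows of the table can be converted into per-event parse time and compared with the per-event budget implied by the 20k ops/s target. This is back-of-the-envelope arithmetic on the figures above, not an additional measurement:

$$
\begin{aligned}
t_{\text{budget}} &= \frac{1}{20{,}000\ \text{ops/s}} = 50\ \mu\text{s} \\
\Delta t_{\text{insert, 1 KB}} &= \left(\frac{1}{193{,}051} - \frac{1}{211{,}247}\right)\text{s} \approx 5.18 - 4.73 = 0.45\ \mu\text{s} \approx 0.9\%\ \text{of}\ t_{\text{budget}} \\
\Delta t_{\text{update, 1 KB}} &= \left(\frac{1}{148{,}481} - \frac{1}{185{,}484}\right)\text{s} \approx 6.73 - 5.39 = 1.34\ \mu\text{s} \approx 2.7\%\ \text{of}\ t_{\text{budget}}
\end{aligned}
$$

So at the target rate, the insert regression (about 9% in relative throughput, since 211,247 / 193,051 ≈ 1.09) costs under 1% of the per-event budget, while the update path saves a few percent.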

@changeset-bot

changeset-bot bot commented Apr 8, 2026

⚠️ No Changeset found

Latest commit: b68bea1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@rkistner rkistner marked this pull request as ready for review April 8, 2026 12:21
@rkistner rkistner requested a review from simolus3 April 8, 2026 12:21
Contributor

@simolus3 simolus3 left a comment


The approach makes sense to me; I basically just have nits regarding parseDocumentId.

simolus3 previously approved these changes Apr 8, 2026
Base automatically changed from raw-change-streams-2 to main April 13, 2026 11:52
@rkistner rkistner dismissed simolus3’s stale review April 13, 2026 11:52

The base branch was changed.

@rkistner rkistner force-pushed the mongo-json-direct branch from 6d7e475 to b68bea1
@rkistner
Contributor Author

Rebased after merging #591, otherwise unchanged.

@rkistner rkistner requested a review from simolus3 April 13, 2026 12:01
@rkistner rkistner merged commit 7f993e4 into main Apr 13, 2026
44 checks passed
@rkistner rkistner deleted the mongo-json-direct branch April 13, 2026 13:03

Labels

None yet

Projects

None yet


2 participants