[spark] support batch read from fluss cluster #2377
Conversation
wuchong left a comment:
@YannByron thanks, I left some comments.
        return new GenericMap(newMap);
    }

    public static Object copyRow(Object o, DataType type) {
This method name conflicts with copyRow(InternalRow row, RowType rowType) and may be confusing. Can we rename it to copyValue to distinguish it from copyRow?
Besides, since it is only called from InternalRowUtils, we can change the visibility of this method, copyArray, and copyMap to private.
protected val POLL_TIMEOUT: Duration = Duration.ofMillis(100)
protected lazy val conn: Connection = ConnectionFactory.createConnection(flussConfig)
protected lazy val table: Table = conn.getTable(tablePath)
protected lazy val tableInfo: TableInfo = conn.getAdmin.getTableInfo(tablePath).get()
The tableInfo can be obtained directly from table via table.getTableInfo.
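A one-line sketch of the suggested simplification, using the accessor named above (that it is exposed on the Table handle is taken from this comment, not verified against the API):

```scala
// Reuse the already-opened Table handle instead of a separate Admin round trip.
protected lazy val tableInfo: TableInfo = table.getTableInfo
```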
FlussUpsertInputPartition(
  tableBucket,
  snapshotIdOpt.getAsLong,
  logOffsetOpt.getAsLong
Since this is a batch InputPartition, we should add an end offset to make the log split bounded. The latest end offset can be obtained from the OffsetsInitializer.latest().getBucketOffsets(..) method.
We should (see the sketch after this list):
- Fetch the latest kvSnapshots; it is a map<bucket, snapshot_id & log_start_offset>.
- Fetch the latest offsets from OffsetsInitializer.latest; it is a map<bucket, log_end_offset>.
- Join the kvSnapshots with the OffsetsInitializer.latest offsets to generate an input partition for each bucket.
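A minimal sketch of this planning step, under simplified assumptions: the integer bucket ids, the two input maps, and the extra end-offset field on FlussUpsertInputPartition are illustrations of the idea, not the exact Fluss or connector API.

```scala
// Sketch only: bucket ids, the (snapshot_id, log_start_offset) pairs, and the
// logEndOffset field are simplified stand-ins for the real Fluss metadata classes.
case class FlussUpsertInputPartition(
    bucket: Int, snapshotId: Long, logStartOffset: Long, logEndOffset: Long)

val EARLIEST_OFFSET = -2L // stand-in for LogScanner#EARLIEST_OFFSET

def planPartitions(
    kvSnapshots: Map[Int, (Long, Long)], // bucket -> (snapshot_id, log_start_offset)
    latestOffsets: Map[Int, Long]        // bucket -> log_end_offset from OffsetsInitializer.latest()
): Seq[FlussUpsertInputPartition] =
  latestOffsets.toSeq.sortBy(_._1).map { case (bucket, endOffset) =>
    kvSnapshots.get(bucket) match {
      case Some((snapshotId, logStart)) =>
        // Snapshot exists: read the snapshot, then the log from logStart up to endOffset.
        FlussUpsertInputPartition(bucket, snapshotId, logStart, endOffset)
      case None =>
        // No snapshot yet: read the log from the earliest retained offset up to endOffset.
        FlussUpsertInputPartition(bucket, -1L, EARLIEST_OFFSET, endOffset)
    }
  }
```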
  )
} else {
  // No snapshot yet, only read log from beginning
  FlussUpsertInputPartition(tableBucket, -1L, 0L)
We should use org.apache.fluss.client.table.scanner.log.LogScanner#EARLIEST_OFFSET instead of 0L to indicate reading the log from the beginning, because offset 0L may already have been TTL'd, in which case a LogOffsetOutOfRangeException would be thrown.
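A minimal sketch of the suggested replacement, using the constant named above (that it is reachable as a static member of LogScanner is an assumption here):

```scala
import org.apache.fluss.client.table.scanner.log.LogScanner

// Start from the earliest retained offset instead of a literal 0L,
// which may already have been removed by log TTL.
FlussUpsertInputPartition(tableBucket, -1L, LogScanner.EARLIEST_OFFSET)
```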
}

// Poll for more log records
val scanRecords: ScanRecords = logScanner.poll(POLL_TIMEOUT)
logScanner.poll() is a best-effort API: it may return an empty result due to transient issues (e.g., network glitches) even when unread log records remain on the server. Therefore, we should poll in a loop until we reach the known end offset.
The end offset should be determined at job startup using OffsetsInitializer.latest().getBucketOffsets(...), which gives us the high-watermark for each bucket at the beginning of the batch job.
Since there’s no built-in API to read a bounded log split, we must manually (see the sketch below):
- Skip any records with offsets beyond the precomputed end offset, and
- Signal there is no next once all buckets have reached their respective end offsets.
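A minimal sketch of such a bounded read loop. The scanner and record shapes below are simplified placeholders rather than the real Fluss classes, and startOffsets/endOffsets are assumed to hold the per-bucket bounds captured at planning time (the end offsets via OffsetsInitializer.latest().getBucketOffsets(...)).

```scala
import java.time.Duration
import scala.collection.mutable

// Placeholder shapes for illustration only; not the real Fluss scanner API.
case class PolledRecord(bucket: Int, offset: Long, row: AnyRef)
trait BoundedScanner { def poll(timeout: Duration): Seq[PolledRecord] }

def boundedRead(scanner: BoundedScanner,
                startOffsets: Map[Int, Long],  // bucket -> first log offset to read
                endOffsets: Map[Int, Long]     // bucket -> high watermark at job start (exclusive)
               ): Iterator[AnyRef] = {
  // Buckets whose start offset already reaches the end offset have nothing to read.
  val done   = mutable.Set(endOffsets.collect { case (b, end) if startOffsets(b) >= end => b }.toSeq: _*)
  val buffer = mutable.Queue.empty[AnyRef]

  new Iterator[AnyRef] {
    override def hasNext: Boolean = {
      // An empty poll() does not mean the split is finished: keep polling until
      // something is buffered or every bucket has reached its end offset.
      while (buffer.isEmpty && done.size < endOffsets.size) {
        scanner.poll(Duration.ofMillis(100)).foreach { rec =>
          val end = endOffsets(rec.bucket)
          if (rec.offset < end) buffer.enqueue(rec.row)   // keep records below the bound
          if (rec.offset >= end - 1) done += rec.bucket   // bucket reached its end offset
        }
      }
      buffer.nonEmpty
    }
    override def next(): AnyRef = buffer.dequeue()
  }
}
```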
logRecords = bucketRecords.iterator()
if (logRecords.hasNext) {
  val scanRecord = logRecords.next()
  currentRow = convertToSparkRow(scanRecord)
The LogRecord is a changelog that contains -D (delete) and -U (update-before) records. To produce a consistent view, we need to merge these changes with the KV snapshot data in a union-read fashion—just like how we combine data lake snapshots with changelogs.
Fortunately, the KV snapshot scan is already sorted by primary key. We can leverage this by:
- Materializing the delta changes into a temporary delta table;
- Sorting the delta table by primary key using org.apache.fluss.row.encode.KeyEncoder#of(...);
- Performing a sort-merge between the sorted KV snapshot reader and the sorted delta table reader.

This enables an efficient and correct merge without requiring random lookups or hash-based joins (see the sketch below).
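A minimal sketch of that sort-merge, assuming both inputs arrive sorted by the same binary key encoding and the delta has already been reduced to the last change per key; the entry and change types are placeholders, not the real Fluss classes (in the real reader the key bytes would come from KeyEncoder#of(...)).

```scala
sealed trait Change
case object Upsert extends Change // +I / +U (update_after)
case object Delete extends Change // -D; -U (update_before) can be dropped when materializing the delta

case class Entry(key: Array[Byte], row: AnyRef)

// Unsigned byte-wise comparison, matching a binary primary-key encoding.
def compareKeys(a: Array[Byte], b: Array[Byte]): Int = {
  var i = 0
  val n = math.min(a.length, b.length)
  while (i < n) {
    val c = (a(i) & 0xff) - (b(i) & 0xff)
    if (c != 0) return c
    i += 1
  }
  a.length - b.length
}

def sortMerge(snapshot: Iterator[Entry], delta: Iterator[(Entry, Change)]): Iterator[AnyRef] = {
  val s = snapshot.buffered
  val d = delta.buffered
  new Iterator[AnyRef] {
    private var pending: AnyRef = _

    private def advance(): Unit = {
      pending = null
      while (pending == null && (s.hasNext || d.hasNext)) {
        if (!d.hasNext) pending = s.next().row               // remaining snapshot rows
        else if (!s.hasNext) {
          val (e, change) = d.next()
          if (change == Upsert) pending = e.row              // key only present in the delta
        } else compareKeys(s.head.key, d.head._1.key) match {
          case c if c < 0 =>
            pending = s.next().row                           // snapshot key untouched by the delta
          case c if c > 0 =>
            val (e, change) = d.next()
            if (change == Upsert) pending = e.row            // new key appended via the log
          case _ =>
            s.next()                                         // delta change wins over the snapshot row
            val (e, change) = d.next()
            if (change == Upsert) pending = e.row            // Delete removes the key entirely
        }
      }
    }

    advance()
    override def hasNext: Boolean = pending != null
    override def next(): AnyRef = { val r = pending; advance(); r }
  }
}
```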
checkAnswer(
  sql(s"SELECT * FROM $DEFAULT_DATABASE.t ORDER BY orderId"),
  Row(600L, 21L, 601, "addr1", "2026-01-01") ::
  Row(700L, 220L, 602, "addr2_updated", "2026-01-01") ::
  Row(800L, 23L, 603, "addr3", "2026-01-02") ::
  Row(900L, 240L, 604, "addr4_updated", "2026-01-02") ::
  Row(1000L, 25L, 605, "addr5", "2026-01-03") ::
  Row(1100L, 260L, 606, "addr6", "2026-01-03") ::
Currently, the test passes even if changelog reading is not properly implemented. This is because the test base uses a very short KV snapshot interval (1 second), so the reader always falls back to the KV snapshot and never actually consumes the changelog.
I think it’s acceptable to keep this as-is for now, since we plan to refactor the changelog merge-read logic in upcoming PRs, as discussed offline. But please create an issue to track this.
Purpose
Linked issue: close #2376
Brief change log
Tests
API and Format
Documentation