lazy catalog integration#113
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughWalkthroughAdds a callback-driven lazy pg_catalog: Rust bridge and Server API to install a Python lazy source, lazy-capable session context wiring, a DuckDB-backed provider, example code, docs, and integration tests exercising live schema visibility and error propagation. ChangesLazy Catalog Implementation
Sequence DiagramsequenceDiagram
participant Client
participant RiffqServer
participant PyLazyCatalogSource
participant DuckDB
Client->>RiffqServer: CREATE TABLE new_table (...)
RiffqServer->>RiffqServer: execute query (backend may update)
Client->>RiffqServer: SELECT * FROM pg_catalog.pg_class
RiffqServer->>PyLazyCatalogSource: relations(database,schema,callback)
PyLazyCatalogSource->>DuckDB: query system tables / information_schema
DuckDB-->>PyLazyCatalogSource: return rows
PyLazyCatalogSource-->>RiffqServer: emit RelationDef rows
RiffqServer-->>Client: result includes new_table
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
src/lib.rs (1)
2181-2202: 💤 Low valueMinor: Avoid redundant lock acquisition.
Line 2184 re-acquires the
lazy_catalog_sourcelock to check if it'sNone, but you already cloned the value intolazy_sourceat line 2141-2146. You can simplify this to use the local variable:Suggested fix
- if self.lazy_catalog_source.lock().unwrap().is_none() { + if lazy_source.is_none() {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib.rs` around lines 2181 - 2202, The code re-locks self.lazy_catalog_source to check for None even though a local variable lazy_source was previously cloned; replace the second lock check with a check on the existing lazy_source (e.g., if lazy_source.is_none()) so you don't reacquire the mutex, and keep the eager registration logic (calls to register_user_database, register_schema, register_user_tables) unchanged; ensure lazy_source is in scope where you make this change.teleduck/tests/test_lazy_catalog.py (1)
65-101: 💤 Low valueConsider using context manager for the connection.
If an assertion fails before
conn.close(), the connection remains open. While the process terminates anyway in test context, using a context manager is cleaner and consistent with the pattern intests/test_lazy_catalog.py.♻️ Proposed refactor
def test_table_created_after_startup_is_visible(self): - conn = psycopg.connect( + with psycopg.connect( f"postgresql://user:123@127.0.0.1:{self.port}/db", autocommit=True - ) - with conn.cursor() as cur: + ) as conn, conn.cursor() as cur: # The table does not exist yet ... cur.execute("SELECT count(*) FROM pg_catalog.pg_class WHERE relname='late_arrivals'") # ... rest of test ... - conn.close()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teleduck/tests/test_lazy_catalog.py` around lines 65 - 101, The test opens a psycopg connection via conn = psycopg.connect(...) and calls conn.close() at the end, which can leak the connection if an assertion fails; change to use a context manager: replace the top-level conn = psycopg.connect(...) and subsequent conn.close() with a with psycopg.connect(..., autocommit=True) as conn: block and indent the cursor usage accordingly (the existing with conn.cursor() as cur: blocks stay the same) so the connection is always closed even on test failures.teleduck/src/teleduck/server.py (2)
136-143: 💤 Low valueConsider handling materialized views explicitly.
The
kindmapping classifies anything not starting with "VIEW" as a table. DuckDB'sinformation_schema.tablescan reporttable_type = 'MATERIALIZED VIEW'(e.g., via extensions), which would be misclassified as"table"rather than"materialized_view".Not breaking (the row is still visible), but
pg_class.relkindmay be inaccurate.Optional: handle materialized views
for table_name, table_type in rows: - kind = "view" if (table_type or "").upper().startswith("VIEW") else "table" + tt = (table_type or "").upper() + if tt.startswith("VIEW"): + kind = "view" + elif "MATERIALIZED" in tt: + kind = "materialized_view" + else: + kind = "table"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teleduck/src/teleduck/server.py` around lines 136 - 143, The code currently sets kind = "view" when table_type.upper().startswith("VIEW") else "table", which misclassifies "MATERIALIZED VIEW" rows; update the logic in the loop that iterates rows (referencing table_type and kind) to explicitly detect materialized views (e.g., check table_type.strip().upper().startswith("MATERIALIZED VIEW") first) and set kind = "materialized_view", then fall back to "view" and finally "table" so _stable_oid("rel", ...), _stable_oid("type", ...), name, and kind are populated correctly.
69-71: 💤 Low valueSHA1 usage is appropriate here (static analysis false positive).
S324 flags SHA1 as insecure, but this is a non-cryptographic context—SHA1 is fine for generating stable, deterministic identifiers. No security concern.
Minor style note: per RUF005,
(salt, *parts)is slightly cleaner than(salt,) + parts.Optional style tweak
- key = "\x00".join((salt,) + parts) + key = "\x00".join((salt, *parts))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@teleduck/src/teleduck/server.py` around lines 69 - 71, The SHA1 use here is fine for non-crypto deterministic IDs; leave the hashlib.sha1-based logic (variables key, h and the final return) as-is but clean up the tuple construction by replacing "(salt,) + parts" with the clearer "(salt, *parts)" when building key so the join uses (salt, *parts) instead of concatenation.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Cargo.toml`:
- Around line 40-41: The Cargo.toml currently uses a local path for
datafusion_pg_catalog (datafusion_pg_catalog = { path = "../pg_catalog" }),
which will break CI and other developers; fix by restoring a non-local
dependency: either revert to the git-based dependency (uncomment the git line
for datafusion_pg_catalog and remove the path entry), or publish/use crates.io
and point datafusion_pg_catalog to the published version, or push the crate to
the remote git repo and update the dependency to the git URL/branch — ensure the
final Cargo.toml references a non-local source for datafusion_pg_catalog.
In `@teleduck/tests/test_lazy_catalog.py`:
- Around line 30-31: The temporary file descriptor returned by tempfile.mkstemp
is assigned to fd but never used; change the assignment to use an
unused-variable name (e.g., _fd) when calling tempfile.mkstemp in
test_lazy_catalog.py so the unused file descriptor is prefixed with an
underscore and linters stop complaining, keeping the rest of the logic
(assigning cls.db_file and unlinking) intact.
---
Nitpick comments:
In `@src/lib.rs`:
- Around line 2181-2202: The code re-locks self.lazy_catalog_source to check for
None even though a local variable lazy_source was previously cloned; replace the
second lock check with a check on the existing lazy_source (e.g., if
lazy_source.is_none()) so you don't reacquire the mutex, and keep the eager
registration logic (calls to register_user_database, register_schema,
register_user_tables) unchanged; ensure lazy_source is in scope where you make
this change.
In `@teleduck/src/teleduck/server.py`:
- Around line 136-143: The code currently sets kind = "view" when
table_type.upper().startswith("VIEW") else "table", which misclassifies
"MATERIALIZED VIEW" rows; update the logic in the loop that iterates rows
(referencing table_type and kind) to explicitly detect materialized views (e.g.,
check table_type.strip().upper().startswith("MATERIALIZED VIEW") first) and set
kind = "materialized_view", then fall back to "view" and finally "table" so
_stable_oid("rel", ...), _stable_oid("type", ...), name, and kind are populated
correctly.
- Around line 69-71: The SHA1 use here is fine for non-crypto deterministic IDs;
leave the hashlib.sha1-based logic (variables key, h and the final return) as-is
but clean up the tuple construction by replacing "(salt,) + parts" with the
clearer "(salt, *parts)" when building key so the join uses (salt, *parts)
instead of concatenation.
In `@teleduck/tests/test_lazy_catalog.py`:
- Around line 65-101: The test opens a psycopg connection via conn =
psycopg.connect(...) and calls conn.close() at the end, which can leak the
connection if an assertion fails; change to use a context manager: replace the
top-level conn = psycopg.connect(...) and subsequent conn.close() with a with
psycopg.connect(..., autocommit=True) as conn: block and indent the cursor usage
accordingly (the existing with conn.cursor() as cur: blocks stay the same) so
the connection is always closed even on test failures.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1fa72bd3-04f8-4157-bba9-d4fdbf6bd3ff
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
Cargo.tomlpysrc/riffq/connection.pysrc/lib.rsteleduck/src/teleduck.egg-info/SOURCES.txtteleduck/src/teleduck/server.pyteleduck/tests/test_lazy_catalog.pytests/test_lazy_catalog.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
example/lazy_catalog.py (1)
126-126: 💤 Low valueCREATE TABLE parser ignores NOT NULL constraints.
The parser always sets
nullable=Truefor every column, ignoring anyNOT NULLconstraints in the DDL. While this is acceptable for example code demonstrating the lazy catalog mechanism, consider adding a comment documenting this limitation so users understand that production implementations should parse constraints properly.📝 Suggested comment
for part in body.rstrip(")").split(","): bits = part.split() if len(bits) >= 2: + # Note: This naive parser always sets nullable=True, + # ignoring NOT NULL and other constraints for simplicity. cols.append((bits[0].strip('"'), TYPE_OIDS.get(bits[1].lower(), 25), True))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@example/lazy_catalog.py` at line 126, The CREATE TABLE parser currently forces every column to nullable by appending cols.append((bits[0].strip('"'), TYPE_OIDS.get(bits[1].lower(), 25), True)) which ignores NOT NULL constraints; add a clear comment next to that line (or at the top of the parsing function) stating that this example lazy catalog does not parse or enforce column constraints (e.g., NOT NULL) and that production implementations must parse constraints properly and set the nullable flag accordingly; reference the cols.append call, TYPE_OIDS, and the bits parsing logic so reviewers can locate where to document the limitation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@example/lazy_catalog.py`:
- Line 127: The assignment always uses the hardcoded key "appdb"
(CATALOG.setdefault("appdb", {}).setdefault(schema, {})[name] = cols) which
breaks multi-database use; change it to derive the database name from the
connection context (e.g., check kwargs for a "database" or "db" key, or use an
instance property like self.database or a connection helper method) and use that
variable as the top-level key (e.g., db_name = kwargs.get("database") or
self.database with a sensible fallback) before calling
CATALOG.setdefault(db_name, ...). Update the code path in the function/method
that contains that line so new tables are registered under the actual connected
database rather than the literal "appdb".
---
Nitpick comments:
In `@example/lazy_catalog.py`:
- Line 126: The CREATE TABLE parser currently forces every column to nullable by
appending cols.append((bits[0].strip('"'), TYPE_OIDS.get(bits[1].lower(), 25),
True)) which ignores NOT NULL constraints; add a clear comment next to that line
(or at the top of the parsing function) stating that this example lazy catalog
does not parse or enforce column constraints (e.g., NOT NULL) and that
production implementations must parse constraints properly and set the nullable
flag accordingly; reference the cols.append call, TYPE_OIDS, and the bits
parsing logic so reviewers can locate where to document the limitation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2d00c4c2-d4c6-4b92-b1ca-8ed3ab8ae562
📒 Files selected for processing (6)
README.mddocs/catalog.mdexample/lazy_catalog.pyteleduck/README.mdteleduck/TODO.mdtests/test_lazy_catalog.py
✅ Files skipped from review due to trivial changes (2)
- README.md
- docs/catalog.md
| bits = part.split() | ||
| if len(bits) >= 2: | ||
| cols.append((bits[0].strip('"'), TYPE_OIDS.get(bits[1].lower(), 25), True)) | ||
| CATALOG.setdefault("appdb", {}).setdefault(schema, {})[name] = cols |
There was a problem hiding this comment.
Hardcoded database name breaks multi-database scenarios.
Line 127 always mutates CATALOG under the key "appdb", regardless of which database the client connected to. If a client connects with -d other_db and issues CREATE TABLE, the table will be incorrectly added to appdb in the catalog structure, breaking catalog queries for other_db.
Extract the database name from the connection context (likely available in **kwargs or via self) and use it instead of the hardcoded "appdb" literal.
🔧 Proposed fix
- CATALOG.setdefault("appdb", {}).setdefault(schema, {})[name] = cols
+ # Extract database from connection context or default to "appdb"
+ database = kwargs.get("database", "appdb")
+ CATALOG.setdefault(database, {}).setdefault(schema, {})[name] = cols📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| CATALOG.setdefault("appdb", {}).setdefault(schema, {})[name] = cols | |
| # Extract database from connection context or default to "appdb" | |
| database = kwargs.get("database", "appdb") | |
| CATALOG.setdefault(database, {}).setdefault(schema, {})[name] = cols |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@example/lazy_catalog.py` at line 127, The assignment always uses the
hardcoded key "appdb" (CATALOG.setdefault("appdb", {}).setdefault(schema,
{})[name] = cols) which breaks multi-database use; change it to derive the
database name from the connection context (e.g., check kwargs for a "database"
or "db" key, or use an instance property like self.database or a connection
helper method) and use that variable as the top-level key (e.g., db_name =
kwargs.get("database") or self.database with a sensible fallback) before calling
CATALOG.setdefault(db_name, ...). Update the code path in the function/method
that contains that line so new tables are registered under the actual connected
database rather than the literal "appdb".
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@teleduck/tests/test_lazy_catalog.py`:
- Around line 30-31: The test calls tempfile.mkstemp and assigns its return to
fd and cls.db_file but never closes fd, leaking a file descriptor; after
obtaining cls.db_file from tempfile.mkstemp, explicitly close the fd (close the
integer descriptor returned as fd) before calling Path(cls.db_file).unlink() so
the descriptor is released; update the setup code near tempfile.mkstemp / fd /
cls.db_file usage in test_lazy_catalog.py to call os.close(fd) (or fd.close() if
using an open file) immediately after mkstemp and before unlinking.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5215173c-e6d9-4a1b-a2f8-fbbd39facc8a
📒 Files selected for processing (2)
teleduck/tests/test_lazy_catalog.pytests/test_lazy_catalog.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_lazy_catalog.py
| fd, cls.db_file = tempfile.mkstemp(suffix=".db") | ||
| Path(cls.db_file).unlink() # remove so DuckDB can create it |
There was a problem hiding this comment.
Close the file descriptor returned by mkstemp.
tempfile.mkstemp returns an open file descriptor alongside the path. The code never closes fd, causing a resource leak. While the test process will eventually reclaim it, the correct pattern is to close the descriptor immediately after obtaining the path.
🔧 Proposed fix
+import os
+
...
fd, cls.db_file = tempfile.mkstemp(suffix=".db")
+ os.close(fd)
Path(cls.db_file).unlink() # remove so DuckDB can create it🧰 Tools
🪛 Ruff (0.15.15)
[warning] 30-30: Unpacked variable fd is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@teleduck/tests/test_lazy_catalog.py` around lines 30 - 31, The test calls
tempfile.mkstemp and assigns its return to fd and cls.db_file but never closes
fd, leaking a file descriptor; after obtaining cls.db_file from
tempfile.mkstemp, explicitly close the fd (close the integer descriptor returned
as fd) before calling Path(cls.db_file).unlink() so the descriptor is released;
update the setup code near tempfile.mkstemp / fd / cls.db_file usage in
test_lazy_catalog.py to call os.close(fd) (or fd.close() if using an open file)
immediately after mkstemp and before unlinking.
Summary by CodeRabbit
New Features
Integrations
Documentation
Tests
Chores