Convert report API to ADRF views (prep for async embedding trigger)#230
Convert report API to ADRF views (prep for async embedding trigger)#230samuelvkwong wants to merge 29 commits into
Conversation
Capture the design for replacing the DRF ReportViewSet with three adrf.views.APIView subclasses (list/detail/bulk-upsert), preserving the existing API contract. This is the structural prerequisite for a follow-up PR that triggers the async embedding pipeline from the upload path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five-task plan, TDD-flavored: 1. Move bulk_upsert_reports into its own module (pure refactor) 2. Lock the wire contract with end-to-end tests + async-shape guards 3. Add the three ADRF view classes 4. Swap urls.py to the new views; delete ReportViewSet 5. Lint + full test + manual smoke + open PR Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/bulk.py Pure code move with one rename (_bulk_upsert_reports -> bulk_upsert_reports) since it's now the only public symbol of the new module. The DRF viewset becomes a thinner HTTP wrapper. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lock the wire-level contract for all five report endpoints before the ADRF rewrite. The three iscoroutinefunction guards fail today and will go green once the new ADRF view classes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce ReportListAPIView, ReportDetailAPIView, and ReportBulkUpsertAPIView following ADIT's adrf.views.APIView pattern. The classes are unreachable until urls.py is swapped in the next commit; the async-shape guards in test_report_api.py go green now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop DefaultRouter in favor of explicit path() entries wired to the three new ADRF views. Deletes radis/reports/api/viewsets.py. URLs, response shapes, status codes, query-param semantics, and permission behavior are byte-for-byte identical to the prior DRF implementation — guarded by radis/reports/tests/test_report_api.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRewrites the Reports API to a single async ADRF ChangesAsync ADRF Report ViewSet Migration
Estimated code review effort: Possibly related PRs:
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request refactors the radiology report API by replacing the synchronous DRF ReportViewSet with three explicit asynchronous ADRF APIView subclasses (ReportListAPIView, ReportDetailAPIView, and ReportBulkUpsertAPIView) and extracting the bulk upsert logic into a separate module. End-to-end tests and async-shape guards are also introduced to ensure API contract preservation. The code review identified critical concurrency and transaction safety issues in the new async views, including thread-safety concerns when separating report.adelete() from transaction.on_commit registration, potential event loop blocking from accessing request.data inside database_sync_to_async blocks, and an opportunity to optimize external document fetching by utilizing asyncio.gather for parallel execution.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
Four fixes raised by Gemini code-assist review: 1. delete: combine `report.delete()` and `transaction.on_commit()` registration into a single `database_sync_to_async` block wrapped in `transaction.atomic()`. Previously, `await report.adelete()` ran on the async connection and the on_commit registration ran on a separate sync connection, so the callback was not bound to the delete's transaction. 2. post / put / bulk-upsert: materialize `request.data` on the async thread (and capture as a local) before entering any `database_sync_to_async` wrapper. Parsing the ASGI body stream from a worker thread risks SynchronousOnlyOperation under ASGI and is the most likely cause of the failing test_report_api / test_bulk_upsert cases in the previous CI run. 3. get (?full=true): replace the sequential per-fetcher `database_sync_to_async` loop with `asyncio.gather`, so multiple document_fetchers run concurrently instead of one-at-a-time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's test Client dispatches an async view via async_to_sync, which spawns a thread that does not see the test's wrapping atomic transaction. The first DB query inside the async view (e.g. groups field validation in ReportSerializer) then hits "the connection is closed". Fix: mark every HTTP-based test (and the live_server-based radis-client test) with @pytest.mark.django_db(transaction=True) so pytest-django uses TransactionTestCase semantics (truncate after, no hidden transaction). The two tests that exercise the helpers directly (test_bulk_upsert_dedupes_metadata_keys, test_report_data_valid) keep the default marker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The sync Client dispatches an async view via async_to_sync, which nested with our own database_sync_to_async deadlocks asgiref's thread executor under pytest-django (Django 6.0 + channels 4). Switch all HTTP tests in test_report_api.py and the two HTTP-based tests in test_bulk_upsert.py to django.test.AsyncClient + async def + @pytest.mark.asyncio + @pytest.mark.django_db(transaction=True). This runs the async view in the test's event loop with no outer async_to_sync wrapping, eliminating the deadlock. ORM assertions inside async tests now use a* variants (aexists, aget, acount) since we can't use sync ORM from async without sync_to_async. The two helper-direct tests (test_bulk_upsert_dedupes_metadata_keys, test_report_data_valid) stay sync — they don't hit HTTP. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Django's async-unsafe sentinel raises SynchronousOnlyOperation when sync ORM (factory_boy `.create()`, `.objects.create_token()`, `user.groups.add()`) is called directly from an `async def` test. Wrap each helper invocation with `await sync_to_async(...)`: - test_report_api.py: every `_staff_user_and_token()` and `_non_staff_user_and_token()` call now goes through `sync_to_async`. - test_bulk_upsert.py: factor the per-test factory setup into a shared `_create_staff_user_group_token(label)` helper and call it via `sync_to_async`. Helper return types tightened so pyright can resolve `.pk` access. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4203250 to
06cbfd3
Compare
Pull in PR #230's ADRF report-API rewrite (ReportListAPIView, ReportDetailAPIView, ReportBulkUpsertAPIView; native async ORM + serializer wrap via channels.db.database_sync_to_async). Skips the trailing temp CI commit bc2dc07 (marked 'revert before merge' by author) — that change does not belong on this branch. Sets up the inline async embedding wiring on top. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in PR #230's three follow-up test commits (86ac291, 07b8751, 06cbfd3): use transaction=True on the HTTP tests, migrate them to AsyncClient, and wrap sync ORM helpers with sync_to_async. These fix async-safety issues that were causing report-api test failures on this branch after the inline async embedding wiring landed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…trator Replaces the periodic Job/Task orchestrator with inline async embedding triggered directly from the ADRF report views (PR #230). When a report is POSTed, PUT, or bulk-upserted, the view awaits AsyncEmbeddingClient via a new helper `embed_reports_inline`, which fetches the just-created RSV rows, calls `embed_documents` once, and bulk_updates the embeddings. Why this is simpler: no periodic cron, no queue, no system user, no EmbeddingJob/EmbeddingTask models, no embeddings_worker container — and reports become semantically searchable at the moment the API call returns (modulo a sub-second embedding round-trip). Failure policy unchanged in spirit: if the embedding service is unreachable or returns malformed data, the helper logs a WARNING and returns; the RSV row keeps embedding=NULL and the report stays searchable via the FTS half. Never raises into the request handler. Changes: - Add AsyncEmbeddingClient (httpx.AsyncClient sibling of EmbeddingClient) - Refactor EmbeddingClient internals into shared helpers (_resolve_config, _normalize_response) so both clients reuse validation + post-processing - Add radis/pgsearch/utils/inline_embedding.py with embed_reports_inline() - Wire the helper into ReportListAPIView.post, ReportDetailAPIView.put, and ReportBulkUpsertAPIView.post (radis/reports/api/views.py) - Override EMBEDDING_PROVIDER_URL="" in radis/settings/test.py so the inline path fast-fails into the documented WARNING fallback in CI - Delete radis/pgsearch/tasks.py orchestrator entries (kept the FTS bulk_index_reports + enqueue_bulk_index_reports used by bulk.py) - Delete EmbeddingJob/EmbeddingTask models - Slim the squashed migration to schema-only (extension + embedding column + HNSW index); drop the Job/Task tables and system-user RunPython - Remove EMBEDDING_DRAIN_CRON / EMBEDDING_INDEX_PRIORITY / EMBEDDING_SYSTEM_USERNAME settings - Drop the embeddings_worker container from all three compose files - Delete the now-obsolete orchestrator tests (test_embedding_launcher / test_process_embedding_job / test_process_embedding_task / test_migrations_system_user / test_models_embedding) Existing reports with NULL embeddings remain searchable via the FTS half. Operators who need to retroactively embed historical reports can re-PUT each affected document; no backfill command is reinstated. Verified locally: pgsearch (62 tests) and reports (20 tests) suites both pass against the merged feat/adrf-views. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§3 architecture diagram swapped from cron+Job/Task orchestrator to async ADRF view + embed_reports_inline. §3 components table updated to reflect AsyncEmbeddingClient + inline_embedding helper instead of orchestrator tasks + system user. §4.2 squashed migration is schema-only. §4.5 dim-change procedure uses re-PUT rather than launcher.defer. §6 rewritten end-to-end as inline architecture: the helper, view wiring across the three ADRF endpoints, AsyncEmbeddingClient, failure semantics, historical NULL handling. §8.2 drops EMBEDDING_DRAIN_CRON/INDEX_PRIORITY/ SYSTEM_USERNAME constants. §8.4 drops embeddings_worker compose block. §9 error-handling table swaps orchestrator-era rows for the inline-call failure. §10 testing replaces the three orchestrator test files with test_inline_embedding. §11.3 dim-change mitigation now points to §4.5. §12 rollout becomes a 6-step plan keyed off PR #230's ADRF views landing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nqueue The follow-up PR will await the async embedding client from inside the view to embed the report inline during the upload request, so the vector is populated before the API returns. The previous wording suggested enqueueing a background job, which is a different design. ADRF is what lets the view handler `await` the I/O-bound embedding call without holding a worker thread for the duration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch from three explicit `adrf.views.APIView` subclasses
(ReportListAPIView, ReportDetailAPIView, ReportBulkUpsertAPIView) to a
single `adrf.viewsets.GenericViewSet` subclass plus the four async
mixins from `adrf.mixins` and an `@action` for bulk-upsert. URL wiring
goes back to `DefaultRouter.register("", ReportViewSet, basename="report")`.
Why: the legacy class was a `GenericViewSet` + sync mixins; this is the
minimum-diff async equivalent. The router-generated URLs, route names
(`report-list` / `report-detail` / `report-bulk-upsert`), and default
`lookup_value_regex` ([^/.]+) match the legacy contract verbatim, so no
test or client change is needed beyond the async-shape guards. The
browsable API root at /api/reports/ comes back for free.
PATCH still returns 405, now via `http_method_names` instead of an
explicit `apartial_update` override.
The async-shape guard tests are folded into a single test that pins
every dispatched method (acreate / aretrieve / aupdate / adestroy /
bulk_upsert) to `inspect.iscoroutinefunction`. The risk this guards is
specific to the viewset shape: `adrf.mixins.*ModelMixin` inherits from
DRF's sync mixins, so each class has both `create` and `acreate` (etc.)
on the MRO; an accidental sync override would silently flip dispatch
back to sync.
Spec + plan updated to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is a sync-DRF → async-ADRF conversion of the existing
ReportViewSet, not a file restructure. Undoes the earlier
extract-into-bulk.py + rename-to-views.py steps:
- Move bulk_upsert_reports back into radis/reports/api/viewsets.py
(delete radis/reports/api/bulk.py).
- Rename radis/reports/api/views.py back to viewsets.py.
- Point urls.py at `from .viewsets import ReportViewSet`.
- Update test_bulk_upsert.py and radis-client/tests/test_client.py
to import / patch under `radis.reports.api.viewsets`.
- Update spec + plan to match: one module, no file split, no rename.
Behaviour is unchanged from the previous viewset commit (7215e2f).
CI on that commit was green; the file moves here are import-path
only and should keep CI green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c overrides DRF's `rest_framework.routers.DefaultRouter` maps HTTP methods to the SYNC action names (POST → 'create', PUT → 'update', GET → 'retrieve', DELETE → 'destroy'). Because adrf.mixins.* inherit from DRF's sync mixins, those sync names exist on the class as fully-functional sync methods — so router-driven dispatch silently calls the inherited sync mixin implementations, NEVER our acreate/aretrieve/aupdate/adestroy overrides. That broke the four tests that exercise behaviour added by our overrides: - test_post_creates_report_and_fires_created_handler (no on_commit) - test_get_full_includes_documents_from_fetchers (no `?full=`) - test_put_upsert_creates_when_missing (no `?upsert=`) - test_delete_removes_report_and_fires_deleted_handler (no on_commit) `adrf.routers.DefaultRouter` rewrites the action mapping to the a-prefixed names whenever `view_is_async=True`, so POST → 'acreate', PUT → 'aupdate', GET → 'aretrieve', DELETE → 'adestroy'. Dispatch now hits our overrides. Verified with `resolve(...).func.actions` showing the a-prefixed mapping. Spec + plan call out that the router choice is load-bearing (the async-shape guard test catches override identity but cannot catch a mis-wired router). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/superpowers/specs/2026-06-08-adrf-report-views-design.md (2)
189-189:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winClarify “router removal” wording to “router migration.”
Given this design keeps routing via
adrf.routers.DefaultRouter, “router removal” is inaccurate and may confuse regression intent for the route-resolution test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` at line 189, Update the test wording and assertion guidance: change the comment/commit message that says “router removal” to “router migration” and ensure the regression guard in radis/reports/tests/test_bulk_upsert.py asserts that the bulk-upsert endpoint still resolves under the current routing (using adrf.routers.DefaultRouter); specifically, add one explicit assertion that the route resolves (e.g., client.get/post to the bulk-upsert route returns the expected status) and update any related test text to reference “router migration” instead of “router removal.”
19-25:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve the architecture contradiction in “In scope.”
This section says to drop
ReportViewSet/router and move to three APIViews + explicitpath()routes, but later sections define the chosen design as a single ADRFReportViewSetonadrf.routers.DefaultRouter. Keep one architecture only, or this spec will drive conflicting implementations/reviews.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` around lines 19 - 25, The spec contains a contradiction between using three adrf.views.APIView subclasses (ReportListAPIView, ReportDetailAPIView, ReportBulkUpsertAPIView) with explicit path() routes and using a single ADRF ReportViewSet registered on adrf.routers.DefaultRouter; pick one approach and make the entire document consistent: either remove all mentions of ReportViewSet/DefaultRouter/router registration and ensure radis/reports/api/urls.py and the “In scope” list describe the three APIViews and the rename of _bulk_upsert_reports to bulk_upsert_reports, or keep ReportViewSet/DefaultRouter as the single chosen design and remove the APIView references and explicit path() instructions (also revert or document how bulk_upsert_reports is exposed via the viewset action). Ensure every section (In scope, endpoints list, URL wiring, and helper rename) aligns with the single chosen architecture and reference the exact symbols above when updating text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md`:
- Around line 87-90: Update the example imports to use ADRF modules and the
viewsets module: replace the DRF router import so it uses DefaultRouter from
adrf.routers (symbol DefaultRouter) and change the ReportViewSet import to come
from .viewsets (symbol ReportViewSet) instead of rest_framework.routers and
.views; keep the rest of the wiring unchanged so the router and ReportViewSet
integration matches the async dispatch design.
- Line 27: The spec incorrectly references Django's synchronous Client for the
new async ADRF contract; update all mentions (e.g., the line describing the new
test file `radis/reports/tests/test_report_api.py` and any other occurrences) to
recommend using Django's AsyncClient and async test plumbing (async test
functions or pytest-asyncio/Django async test utilities) so tests exercise async
dispatch end-to-end rather than the synchronous Client.
In `@radis/reports/api/viewsets.py`:
- Around line 283-296: The commit callback that reindexes touched_report_ids
(calling bulk_upsert_report_search_vectors or enqueue_bulk_index_reports) is
only present in the bulk path's on_commit closure; update the single-record
handlers in acreate() and aupdate() to mirror that logic: when a single Report
is created/updated, register the same on_commit behavior (collect the single
document_id into touched_report_ids and, inside the on_commit callback, call
bulk_upsert_report_search_vectors(touched_report_ids) if
settings.PGSEARCH_SYNC_INDEXING else
enqueue_bulk_index_reports(touched_report_ids)) and still invoke
reports_created_handlers / reports_updated_handlers as before so single-record
POST/PUT commits also update the PGSEARCH index.
- Around line 354-359: The document fetches wrapped by _fetch use
database_sync_to_async with its default thread_sensitive=True which serializes
work onto one worker thread; update the calls in the _fetch coroutine to use
database_sync_to_async(fetcher.fetch, thread_sensitive=False)(report) (or
convert fetcher.fetch implementations to be async) so asyncio.gather can run
them in parallel; locate _fetch, fetcher.fetch, document_fetchers and
asyncio.gather in this block and apply the thread_sensitive=False change (or
switch the fetchers to native async) accordingly.
---
Outside diff comments:
In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md`:
- Line 189: Update the test wording and assertion guidance: change the
comment/commit message that says “router removal” to “router migration” and
ensure the regression guard in radis/reports/tests/test_bulk_upsert.py asserts
that the bulk-upsert endpoint still resolves under the current routing (using
adrf.routers.DefaultRouter); specifically, add one explicit assertion that the
route resolves (e.g., client.get/post to the bulk-upsert route returns the
expected status) and update any related test text to reference “router
migration” instead of “router removal.”
- Around line 19-25: The spec contains a contradiction between using three
adrf.views.APIView subclasses (ReportListAPIView, ReportDetailAPIView,
ReportBulkUpsertAPIView) with explicit path() routes and using a single ADRF
ReportViewSet registered on adrf.routers.DefaultRouter; pick one approach and
make the entire document consistent: either remove all mentions of
ReportViewSet/DefaultRouter/router registration and ensure
radis/reports/api/urls.py and the “In scope” list describe the three APIViews
and the rename of _bulk_upsert_reports to bulk_upsert_reports, or keep
ReportViewSet/DefaultRouter as the single chosen design and remove the APIView
references and explicit path() instructions (also revert or document how
bulk_upsert_reports is exposed via the viewset action). Ensure every section (In
scope, endpoints list, URL wiring, and helper rename) aligns with the single
chosen architecture and reference the exact symbols above when updating text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3aface82-eb2b-4979-99f9-467994501c38
📒 Files selected for processing (7)
docs/superpowers/plans/2026-06-08-adrf-report-views.mddocs/superpowers/specs/2026-06-08-adrf-report-views-design.mdradis-client/tests/test_client.pyradis/reports/api/urls.pyradis/reports/api/viewsets.pyradis/reports/tests/test_bulk_upsert.pyradis/reports/tests/test_report_api.py
✅ Files skipped from review due to trivial changes (1)
- docs/superpowers/plans/2026-06-08-adrf-report-views.md
🚧 Files skipped from review as they are similar to previous changes (1)
- radis/reports/api/urls.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/superpowers/specs/2026-06-08-adrf-report-views-design.md (2)
189-189:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winClarify “router removal” wording to “router migration.”
Given this design keeps routing via
adrf.routers.DefaultRouter, “router removal” is inaccurate and may confuse regression intent for the route-resolution test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` at line 189, Update the test wording and assertion guidance: change the comment/commit message that says “router removal” to “router migration” and ensure the regression guard in radis/reports/tests/test_bulk_upsert.py asserts that the bulk-upsert endpoint still resolves under the current routing (using adrf.routers.DefaultRouter); specifically, add one explicit assertion that the route resolves (e.g., client.get/post to the bulk-upsert route returns the expected status) and update any related test text to reference “router migration” instead of “router removal.”
19-25:⚠️ Potential issue | 🟠 Major | ⚡ Quick winResolve the architecture contradiction in “In scope.”
This section says to drop
ReportViewSet/router and move to three APIViews + explicitpath()routes, but later sections define the chosen design as a single ADRFReportViewSetonadrf.routers.DefaultRouter. Keep one architecture only, or this spec will drive conflicting implementations/reviews.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` around lines 19 - 25, The spec contains a contradiction between using three adrf.views.APIView subclasses (ReportListAPIView, ReportDetailAPIView, ReportBulkUpsertAPIView) with explicit path() routes and using a single ADRF ReportViewSet registered on adrf.routers.DefaultRouter; pick one approach and make the entire document consistent: either remove all mentions of ReportViewSet/DefaultRouter/router registration and ensure radis/reports/api/urls.py and the “In scope” list describe the three APIViews and the rename of _bulk_upsert_reports to bulk_upsert_reports, or keep ReportViewSet/DefaultRouter as the single chosen design and remove the APIView references and explicit path() instructions (also revert or document how bulk_upsert_reports is exposed via the viewset action). Ensure every section (In scope, endpoints list, URL wiring, and helper rename) aligns with the single chosen architecture and reference the exact symbols above when updating text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md`:
- Around line 87-90: Update the example imports to use ADRF modules and the
viewsets module: replace the DRF router import so it uses DefaultRouter from
adrf.routers (symbol DefaultRouter) and change the ReportViewSet import to come
from .viewsets (symbol ReportViewSet) instead of rest_framework.routers and
.views; keep the rest of the wiring unchanged so the router and ReportViewSet
integration matches the async dispatch design.
- Line 27: The spec incorrectly references Django's synchronous Client for the
new async ADRF contract; update all mentions (e.g., the line describing the new
test file `radis/reports/tests/test_report_api.py` and any other occurrences) to
recommend using Django's AsyncClient and async test plumbing (async test
functions or pytest-asyncio/Django async test utilities) so tests exercise async
dispatch end-to-end rather than the synchronous Client.
In `@radis/reports/api/viewsets.py`:
- Around line 283-296: The commit callback that reindexes touched_report_ids
(calling bulk_upsert_report_search_vectors or enqueue_bulk_index_reports) is
only present in the bulk path's on_commit closure; update the single-record
handlers in acreate() and aupdate() to mirror that logic: when a single Report
is created/updated, register the same on_commit behavior (collect the single
document_id into touched_report_ids and, inside the on_commit callback, call
bulk_upsert_report_search_vectors(touched_report_ids) if
settings.PGSEARCH_SYNC_INDEXING else
enqueue_bulk_index_reports(touched_report_ids)) and still invoke
reports_created_handlers / reports_updated_handlers as before so single-record
POST/PUT commits also update the PGSEARCH index.
- Around line 354-359: The document fetches wrapped by _fetch use
database_sync_to_async with its default thread_sensitive=True which serializes
work onto one worker thread; update the calls in the _fetch coroutine to use
database_sync_to_async(fetcher.fetch, thread_sensitive=False)(report) (or
convert fetcher.fetch implementations to be async) so asyncio.gather can run
them in parallel; locate _fetch, fetcher.fetch, document_fetchers and
asyncio.gather in this block and apply the thread_sensitive=False change (or
switch the fetchers to native async) accordingly.
---
Outside diff comments:
In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md`:
- Line 189: Update the test wording and assertion guidance: change the
comment/commit message that says “router removal” to “router migration” and
ensure the regression guard in radis/reports/tests/test_bulk_upsert.py asserts
that the bulk-upsert endpoint still resolves under the current routing (using
adrf.routers.DefaultRouter); specifically, add one explicit assertion that the
route resolves (e.g., client.get/post to the bulk-upsert route returns the
expected status) and update any related test text to reference “router
migration” instead of “router removal.”
- Around line 19-25: The spec contains a contradiction between using three
adrf.views.APIView subclasses (ReportListAPIView, ReportDetailAPIView,
ReportBulkUpsertAPIView) with explicit path() routes and using a single ADRF
ReportViewSet registered on adrf.routers.DefaultRouter; pick one approach and
make the entire document consistent: either remove all mentions of
ReportViewSet/DefaultRouter/router registration and ensure
radis/reports/api/urls.py and the “In scope” list describe the three APIViews
and the rename of _bulk_upsert_reports to bulk_upsert_reports, or keep
ReportViewSet/DefaultRouter as the single chosen design and remove the APIView
references and explicit path() instructions (also revert or document how
bulk_upsert_reports is exposed via the viewset action). Ensure every section (In
scope, endpoints list, URL wiring, and helper rename) aligns with the single
chosen architecture and reference the exact symbols above when updating text.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3aface82-eb2b-4979-99f9-467994501c38
📒 Files selected for processing (7)
docs/superpowers/plans/2026-06-08-adrf-report-views.mddocs/superpowers/specs/2026-06-08-adrf-report-views-design.mdradis-client/tests/test_client.pyradis/reports/api/urls.pyradis/reports/api/viewsets.pyradis/reports/tests/test_bulk_upsert.pyradis/reports/tests/test_report_api.py
✅ Files skipped from review due to trivial changes (1)
- docs/superpowers/plans/2026-06-08-adrf-report-views.md
🚧 Files skipped from review as they are similar to previous changes (1)
- radis/reports/api/urls.py
🛑 Comments failed to post (4)
docs/superpowers/specs/2026-06-08-adrf-report-views-design.md (2)
27-27:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate test client guidance to AsyncClient consistently.
The spec still says Django
Client, but this migration’s contract is async ADRF behavior and async test plumbing. Keeping sync-client guidance here risks false confidence on async dispatch coverage.Also applies to: 191-191
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` at line 27, The spec incorrectly references Django's synchronous Client for the new async ADRF contract; update all mentions (e.g., the line describing the new test file `radis/reports/tests/test_report_api.py` and any other occurrences) to recommend using Django's AsyncClient and async test plumbing (async test functions or pytest-asyncio/Django async test utilities) so tests exercise async dispatch end-to-end rather than the synchronous Client.
87-90:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winFix the URL wiring sample to ADRF imports.
The snippet currently shows DRF router and
.viewsimport, which contradicts the async dispatch design and the implemented wiring. It should usefrom adrf.routers import DefaultRouterand importReportViewSetfrom.viewsets.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/superpowers/specs/2026-06-08-adrf-report-views-design.md` around lines 87 - 90, Update the example imports to use ADRF modules and the viewsets module: replace the DRF router import so it uses DefaultRouter from adrf.routers (symbol DefaultRouter) and change the ReportViewSet import to come from .viewsets (symbol ReportViewSet) instead of rest_framework.routers and .views; keep the rest of the wiring unchanged so the router and ReportViewSet integration matches the async dispatch design.radis/reports/api/viewsets.py (2)
283-296:
⚠️ Potential issue | 🟠 Major | ⚡ Quick winMirror the bulk-upsert PGSEARCH callback in
acreate()andaupdate().The bulk path reindexes
touched_report_idson commit, but the single-record POST/PUT paths only invokereports_created_handlers/reports_updated_handlers. That leaves/api/reports/and/api/reports/{document_id}/able to commit successfully while the PGSEARCH index stays stale for those reports.Suggested fix
def on_commit(): for handler in reports_created_handlers: logger.debug( f"{handler.name} - handle newly created reports: " f"{[report.document_id]}" ) handler.handle([report]) + report_ids = [report.pk] + if settings.PGSEARCH_SYNC_INDEXING: + bulk_upsert_report_search_vectors(report_ids) + else: + enqueue_bulk_index_reports(report_ids)def on_commit(): handlers = ( reports_created_handlers if report is None else reports_updated_handlers ) event = "newly created" if report is None else "updated" for handler in handlers: logger.debug( f"{handler.name} - handle {event} reports: " f"{[saved.document_id]}" ) handler.handle([saved]) + report_ids = [saved.pk] + if settings.PGSEARCH_SYNC_INDEXING: + bulk_upsert_report_search_vectors(report_ids) + else: + enqueue_bulk_index_reports(report_ids)Also applies to: 327-336, 392-406
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/reports/api/viewsets.py` around lines 283 - 296, The commit callback that reindexes touched_report_ids (calling bulk_upsert_report_search_vectors or enqueue_bulk_index_reports) is only present in the bulk path's on_commit closure; update the single-record handlers in acreate() and aupdate() to mirror that logic: when a single Report is created/updated, register the same on_commit behavior (collect the single document_id into touched_report_ids and, inside the on_commit callback, call bulk_upsert_report_search_vectors(touched_report_ids) if settings.PGSEARCH_SYNC_INDEXING else enqueue_bulk_index_reports(touched_report_ids)) and still invoke reports_created_handlers / reports_updated_handlers as before so single-record POST/PUT commits also update the PGSEARCH index.
354-359:
⚠️ Potential issue | 🟠 Major🧩 Analysis chain
🌐 Web query:
In Channels 4.2.0 / asgiref, does channels.db.database_sync_to_async default to thread_sensitive=True such that multiple calls awaited via asyncio.gather execute serially on the same thread executor?💡 Result:
Yes, channels.db.database_sync_to_async (which inherits from asgiref.sync.SyncToAsync) defaults to thread_sensitive=True [1]. In this mode, multiple calls to database_sync_to_async that are awaited via asyncio.gather will execute serially because they share a single-threaded executor [2][3][4]. The thread_sensitive=True setting is designed to ensure that all synchronous functions marked as "thread-sensitive" run in the same specific thread [4][5]. This is necessary for compatibility with code that relies on thread-local storage or non-thread-safe resources, such as many database adapters used by the Django ORM [2][4][5]. When the outermost program is asynchronous, this shared resource is a single dedicated sub-thread [4]. Consequently, if you launch multiple synchronous tasks using asyncio.gather, they will queue up and execute one after the other within that single, shared thread, rather than running in parallel across multiple threads [2][4]. If you set thread_sensitive=False, each call will instead run in its own brand-new thread, which would allow for parallel execution but may lead to issues if your code depends on thread-local state or specific database connection constraints [4][5].
Citations:
- 1: https://github.com/django/channels/blob/main/channels/db.py
- 2: https://docs.djangoproject.com/en/4.2/_modules/asgiref/sync/
- 3: https://github.com/django/asgiref/blob/main/asgiref/sync.py
- 4: https://github.com/django/asgiref/blob/main/README.rst
- 5: https://docs.djangoproject.com/en/dev/topics/async/
🏁 Script executed:
#!/bin/bash set -euo pipefail # Show the exact code around the referenced lines python3 - <<'PY' from pathlib import Path path = Path("radis/reports/api/viewsets.py") start, end = 330, 390 lines = path.read_text().splitlines() for i in range(start, end+1): if 1 <= i <= len(lines): print(f"{i}:{lines[i-1]}") PY # Find where document_fetchers is built and where fetcher.fetch is defined/used rg -n "document_fetchers|_fetch\\(|fetcher\\.fetch\\(|database_sync_to_async\\(" radis/reports/api/viewsets.py # If fetchers come from elsewhere, try to locate "document_fetchers" usage/definition across the repo rg -n "document_fetchers" radis | head -n 50 # Locate the fetcher type/class (best-effort) by searching for ".source" and "fetch(" patterns rg -n "class .*Fetcher|def fetch\\(|fetch\\(" radis/reports | head -n 50Repository: openradx/radis
Length of output: 3788
🏁 Script executed:
#!/bin/bash set -euo pipefail # Inspect imports and database_sync_to_async definition/usage near top of the file python3 - <<'PY' from pathlib import Path p = Path("radis/reports/api/viewsets.py") lines = p.read_text().splitlines() for i in range(1, min(120, len(lines))+1): print(f"{i}:{lines[i-1]}") PY # Show definition of DocumentFetcher and how fetcher.fetch is constructed python3 - <<'PY' from pathlib import Path p = Path("radis/reports/site.py") lines = p.read_text().splitlines() for i in range(1, min(140, len(lines))+1): print(f"{i}:{lines[i-1]}") PY # Find where DocumentFetcher fetch callables come from; also locate stub/fetchers usage rg -n "DocumentFetcher\\(|fetchers|document_fetchers\\[" radis/reports | head -n 200 # Locate implementations of fetch functions (best-effort) rg -n "def fetch\\(|async def fetch\\(" radis/reports | head -n 200 # Locate any database_sync_to_async calls that override thread_sensitive rg -n "database_sync_to_async\\(.*thread_sensitive|thread_sensitive=" radis | head -n 200Repository: openradx/radis
Length of output: 8683
🏁 Script executed:
#!/bin/bash set -euo pipefail # 1) Locate all document fetcher registrations (to inspect what fetcher.fetch actually is) rg -n "register_document_fetcher\\(" radis || true # 2) Inspect the exact fetcher callables used in registrations (common names: fetch, get_document, etc.) # We search for functions passed as the 2nd arg to register_document_fetcher rg -n "register_document_fetcher\\([^,]+,\\s*([A-Za-z0-9_\\.]+)\\)" radis || true # 3) Look for any thread_sensitive overrides in the repo (esp. around database_sync_to_async) rg -n "database_sync_to_async\\(.*thread_sensitive" radis || true rg -n "thread_sensitive\\s*=" radis || true # 4) Also list all database_sync_to_async usages inside radis/reports for context rg -n "database_sync_to_async\\(" radis/reports || trueRepository: openradx/radis
Length of output: 461
🏁 Script executed:
#!/bin/bash set -euo pipefail python3 - <<'PY' from pathlib import Path p = Path("radis/reports/tests/test_report_api.py") lines = p.read_text().splitlines() start, end = 140, 240 for i in range(start, end+1): if 1 <= i <= len(lines): print(f"{i}:{lines[i-1]}") PYRepository: openradx/radis
Length of output: 3887
asyncio.gather()won’t parallelize these document fetches
channels.db.database_sync_to_asyncdefaults tothread_sensitive=True, so thefetcher.fetch()calls wrapped here will queue onto the same single worker thread. As a result,?full=truewill still scale roughly linearly with the number ofdocument_fetchersdespite usingasyncio.gather. Iffetcher.fetchdoesn’t rely on Django/thread-local constraints, switch todatabase_sync_to_async(..., thread_sensitive=False)(or make the fetchers native async).async def _fetch(fetcher): return fetcher.source, await database_sync_to_async(fetcher.fetch)(report) results = await asyncio.gather(*(_fetch(f) for f in document_fetchers.values()))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/reports/api/viewsets.py` around lines 354 - 359, The document fetches wrapped by _fetch use database_sync_to_async with its default thread_sensitive=True which serializes work onto one worker thread; update the calls in the _fetch coroutine to use database_sync_to_async(fetcher.fetch, thread_sensitive=False)(report) (or convert fetcher.fetch implementations to be async) so asyncio.gather can run them in parallel; locate _fetch, fetcher.fetch, document_fetchers and asyncio.gather in this block and apply the thread_sensitive=False change (or switch the fetchers to native async) accordingly.
|
My review : Right now the lexical indexing (which is cheaper to calculate compared to embedding vector) is done locally (on radx server) in batches, routed through the background Procrastinate queue. If you are still not convinced, think of the following scenario : today nothing in the indexing path can be "down," because there's no external service in it. The outage risk is entirely created by the proposed "inline embedding" design. This is what I mean - you have two ways of wiring things up - Ordering A — embed, then save. Request arrives → await embed(text) → embedding service is down → the await raises → request returns a 5xx → the report is never saved. During an embedding outage, ingestion stops dead. Every upload fails; the radis-client or the ETL pipeline either errors out or retry-loops against a dead service. Ordering B — save, then embed. Request → save report → await embed → store vector → respond. Embedding down → the report is saved but the embed step throws. Now either the request 500s (the client thinks it failed and may retry, creating duplicate/re-upsert churn) or you swallow the error and return 201 — and either way you're left with a saved-but-unembedded report: it exists, but it's invisible to semantic search, silently inconsistent, with nothing scheduled to fix it. (and you havent written any test cases to catch this behaviour .. ;)) All this is happening because you have chosen to do the upload and embedding together. You could just reuse the Procrastinate stuff of lexical indexing .. i think that is the more technically fitter solution .. But ALSO::: we dont have a lot of work to do to get ninja running for our async support. useing adrf is arguably the poorer solution .. especially now that there is nothing aprt from reportviewsets that uses DRF .. we have very little change to manage the migration to Ninja ... there would not be a better time. My strong sugesstion would be to use Ninja .. Cheers |
09b0a3c to
103f36b
Compare
Restructure the viewset and bulk_upsert_reports so the async/sync seam
is uniform and reads predictably across every handler:
- Native async ORM (aget, async-for comprehensions, abulk_create) for
anything that does NOT need to be atomic.
- One sync helper closure per atomic block, decorated with:
@sync_to_async(thread_sensitive=True)
@transaction.atomic
The sync helper owns the entire atomic write plus the
transaction.on_commit registration, on a single thread and a
single DB connection. `thread_sensitive=True` ensures the helper
runs on Django's shared sync thread so the transaction semantics
hold.
Changes:
- bulk_upsert_reports is now `async def`. Preflight (Language /
Modality dedupe + ensure-exists, plus the existing-Reports lookup)
uses native async ORM. The atomic write block (Phase 4) is a
nested sync helper with the new decorator stack.
- acreate / aupdate / adestroy switched from
`database_sync_to_async def _x()` to
`@sync_to_async(thread_sensitive=True) @transaction.atomic def _x()`.
adestroy no longer needs the explicit `with transaction.atomic`
inside the body — the decorator does it.
- aretrieve's serializer.data + fetcher fetches switch to plain
sync_to_async(..., thread_sensitive=True) (no atomic needed).
- The bulk_upsert action splits per-payload DRF validation (sync, no
atomic) from the now-async bulk_upsert_reports call.
- test_bulk_upsert_dedupes_metadata_keys becomes async to call the
now-async bulk_upsert_reports.
Caveat documented in the module docstring: Django 6.0/6.1's native
async ORM methods (aget, abulk_create, ...) still wrap sync_to_async
internally — there is no native async DB backend in Django today (PR
#17275 stalled since 2024). The Phase 2 native async preflight in
bulk_upsert_reports does not parallelize at the DB layer today; the
win is architectural clarity. When Django ships native async DB
support, the code shape is positioned to benefit automatically.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…elpers
`ReportSerializer.create` and `ReportSerializer.update` already open
their own `with transaction.atomic():` block to wrap the multi-step
Language → Report → groups → Metadata → Modalities write. Decorating
the outer `_create` / `_save` helpers with another `@transaction.atomic`
opened a redundant outer transaction whose only inner work was a
savepoint from the serializer's own block — functionally a no-op but
misleading about who owns atomicity at each layer.
Remove the decorator from `acreate._create` and `aupdate._save`. The
`transaction.on_commit(...)` registration following `serializer.save()`
in those helpers now runs outside any active transaction in production
(serializer.save committed already) and fires immediately — same
end-state as before, just without claiming ownership we don't have.
`adestroy._delete_and_schedule` keeps `@transaction.atomic` because the
`report.delete()` + `transaction.on_commit` registration must be bound
to the same transaction (Gemini's original review fix).
`bulk_upsert_reports._do_atomic_writes` keeps it because the multi-row
churn must commit atomically.
Module docstring updated to describe the criterion ("does this helper
own a transaction?") rather than blanket-applying the stack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restructure the report API to follow the canonical async-with-atomic
pattern: domain writes live as `async def` operations that use native
async ORM (`aget_or_create`, `acreate`, `asave`, `aset`, `aclear`,
`adelete`); view-level atomic helpers hold the transaction and invoke
the async operations via `async_to_sync(...)`. `thread_sensitive=True`
on the outer wrapper keeps the entire call chain on one Django thread,
so the transaction context applies to every write.
Files:
- New `radis/reports/api/operations.py`:
- `create_report_from_validated(validated_data)` — creates a
Report + Language/get-or-create + groups + Metadata bulk +
Modalities/get-or-create via native async ORM.
- `update_report_from_validated(report, validated_data)` —
replaces all mutable fields + nested associations (metadata
delete + recreate, modalities aclear+aset, groups aset).
- `delete_report(report)` — `await report.adelete()`.
None of these own a transaction; the caller does.
- `radis/reports/api/serializers.py`:
- `ReportSerializer.create` / `.update` are sync shims that
delegate to `async_to_sync(operations.*_report_from_validated)`.
DRF's standard `serializer.save()` works unchanged for any
future caller; the `with transaction.atomic():` block formerly
owned by the serializer is gone — atomicity moves up to the
view's helper where it belongs.
- `radis/reports/api/viewsets.py`:
- `acreate` / `aupdate` re-add `@transaction.atomic` on the
sync_to_async helper now that the serializer no longer owns
atomicity. The helper body is just `serializer.is_valid` +
`serializer.save()` + `transaction.on_commit(...)`.
- `adestroy` replaces the bare `report.delete()` call inside the
atomic helper with `async_to_sync(operations.delete_report)(report)`,
matching the pattern across the file.
- Phase 4 of `bulk_upsert_reports` keeps inline sync ORM
(bulk_create / bulk_update / through-table churn) because those
are already single-statement bulk operations; decomposing them
into per-entity async ops would be noise. The pattern applies
uniformly to single-entity write paths.
- Module docstring updated to describe the operations.py +
async_to_sync architecture.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ReportSerializer now subclasses adrf.serializers.ModelSerializer.
The sync `create`/`update` shims that wrapped operations in
async_to_sync are replaced by native `async def acreate`/`aupdate`
that `await` the operations directly:
async def acreate(self, validated_data):
return await operations.create_report_from_validated(validated_data)
async def aupdate(self, report, validated_data):
return await operations.update_report_from_validated(report, validated_data)
The view's atomic helpers now bridge to the async serializer via
`async_to_sync(serializer.asave)()` inside the
`@sync_to_async(thread_sensitive=True) @transaction.atomic` block.
Net bridge count is unchanged (the `async_to_sync` simply moved from
inside the serializer up to the view's atomic helper) but two things
get cleaner:
- The serializer reads as native async — no sync shim importing
`async_to_sync` purely to bridge back to an async operation.
- Async callers (notably the forthcoming inline-embedding flow that
awaits the embedding client) can `await serializer.asave()`
directly, eliminating a double bridge (async → sync_to_async →
sync → async_to_sync → async) that the sync-shim version would
have required.
Other changes:
- `cast(ReportSerializer, self.get_serializer(...))` is used inside
the view's atomic helpers so pyright resolves `.asave` against
the actual ADRF type rather than DRF's `BaseSerializer` stub.
- `serializers.py` imports `operations` at module scope (no longer
needs a function-scope import for the bridge).
- `serializers.py` drops the now-unused `async_to_sync` import.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`ReportSerializer.acreate` and `aupdate` now each open their own
`@sync_to_async(thread_sensitive=True) @transaction.atomic` helper
that invokes the corresponding operation via `async_to_sync`. The
view's `acreate` / `aupdate` handlers are now pure async orchestration:
- `serializer = self.get_serializer(...)`
- `await sync_to_async(serializer.is_valid)(raise_exception=True)`
- `report = await serializer.asave()` ← transaction lives inside
- `transaction.on_commit(on_commit)` ← fires post-commit
- `await sync_to_async(lambda: serializer.data)()` for the response
Bridge accounting (per request):
- Old shape (atomic in view): 1 sync_to_async outer + 1 async_to_sync
inside (for serializer.asave) = 2 bridges per write.
- New shape (atomic in serializer): 1 sync_to_async (is_valid) +
1 sync_to_async + 1 async_to_sync inside serializer.acreate +
1 sync_to_async (serializer.data) = 4 bridges per write.
The 2 extra bridges are sync_to_async hops with µs-level dispatch
overhead, negligible against the SQL latency they straddle. The win is
that the serializer reads as the unit that owns "save + its
transaction", and the view reads as pure orchestration around an
async-native serializer that can be awaited directly by future async
callers (notably the forthcoming inline-embedding flow).
`adestroy` and `bulk_upsert_reports` keep their existing
`@sync_to_async @transaction.atomic` shape because neither involves a
serializer.
Module docstring updated to describe the new layout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phases 1 (payload dedupe) and 3 (build new_reports / updated_reports
lists) of `bulk_upsert_reports` are pure-CPU work that used to run
inline in the outer async function — which means they were running on
the event loop thread and blocking it during the iteration. Wrap them
both in `@sync_to_async(thread_sensitive=True)` helpers so they execute
on the asgiref thread pool and the event loop stays free.
- Phase 1 (`_dedupe_payload`) — dedupes the input payload by
`document_id` and emits a warning on duplicates. Returns the deduped
list back to the outer scope.
- Phase 3 (`_build_report_lists`) — iterates the validated payload,
builds the `new_reports` / `updated_reports` lists using
`language_by_code` and `existing_by_document_id` from Phase 2.
Returns the four collections back to the outer scope.
`report_field_names` is hoisted above Phase 1 since both Phase 3 and
Phase 4 reference it. The dedupe helpers (`_dedupe_by_key`,
`_dedupe_metadata`, `_dedupe_groups`) move inside `_do_atomic_writes`
because that's their only call site, keeping the outer scope clean.
Performance note (documented in the module docstring): there's no
real wall-clock benefit today because Django's `aget`, `abulk_create`
and friends are still `sync_to_async`-wrapped sync ORM internally, so
Phase 2's "native async" path schedules on the same thread pool as
Phases 1/3/4. The win is purely structural: event loop stays free of
CPU work, and the code reads as "async coordination dispatching CPU
work to threads", which composes correctly the moment Django ships
native async DB support.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xpected cleanup
The previous docstring blurb pointed at PR #17275 (stale since 2024)
as the canonical "native async DB" tracker. That's wrong — #17275
has had no maintainer engagement since mid-2024. The actual active
work is:
- django/django PR #18408 (Arfey) — AsyncConnectionHandler,
AsyncCursor, async_atomic, async test bases. Open, not draft,
~1300 LOC. Django core's stated gate is production evidence from
an external prototype before merging.
- `django-async-backend` on PyPI (also Arfey) — the external
prototype. Async reads shipped in v0.0.3 (March 2026): aget,
acount, aexists, async for, filtering, `async_atomic`. Writes are
in-progress; select_related / prefetch_related / multi-connection
`gather` parallelism remain unimplemented.
The comment now spells out both pointers and lists the concrete
cleanup the codebase is positioned for once async writes land:
- ReportSerializer.acreate / aupdate collapse from
`@sync_to_async @transaction.atomic def _atomic: async_to_sync(operations.X)`
to `async with async_atomic(): return await operations.X(...)`
— 6 lines per method → 3.
- adestroy and bulk_upsert_reports Phase 4 follow the same pattern.
- operations.py does not change; its `await` calls just stop
being internally sync_to_async-wrapped.
- Phases 1 and 3 of bulk_upsert_reports stay sync_to_async-wrapped
because they're pure CPU; correct regardless of the DB backend.
Comment-only change; no behavior impact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bulk-upsert batch size for `bulk_upsert_reports` was hardcoded as a module-level constant in viewsets.py. Move it to settings as `REPORTS_BULK_DB_BATCH_SIZE` (env-overridable, default 1000) so it can be tuned per-environment without a code change. Aligns with the existing `PGSEARCH_BULK_INSERT_BATCH_SIZE` next to it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
radis/reports/api/operations.py (1)
82-87: 💤 Low valueMinor:
aclear()beforeaset()is redundant.
aset()already replaces the entire M2M relation, so the precedingaclear()is unnecessary. This doesn't affect correctness but adds an extra query.♻️ Optional simplification
- await report.modalities.aclear() modality_instances: list[Modality] = [] for modality in modalities: instance, _ = await Modality.objects.aget_or_create(**modality) modality_instances.append(instance) await report.modalities.aset(modality_instances)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@radis/reports/api/operations.py` around lines 82 - 87, The code calls report.modalities.aclear() before building modality_instances and then report.modalities.aset(modality_instances); remove the redundant await report.modalities.aclear() since aset() replaces the entire M2M relation itself. Keep the loop that uses Modality.objects.aget_or_create(...) to populate modality_instances and then call await report.modalities.aset(modality_instances) directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@radis/reports/api/serializers.py`:
- Around line 48-56: The docstring for ReportSerializer is incorrect about
transaction ownership; update it to state that acreate and aupdate do own/manage
their own transactions via the inner `@_atomic`() helper (used in functions
acreate and aupdate) rather than relying on the caller to provide
`@transaction.atomic`; reference ReportSerializer, acreate, aupdate, and the inner
_atomic helper when rewording the docstring so it accurately reflects that these
methods run their own atomic blocks.
In `@radis/reports/api/viewsets.py`:
- Around line 432-433: In aretrieve and adestroy replace the current exception
re-raise that does "raise Http404" inside the "except Report.DoesNotExist"
blocks with "raise Http404 from None" so the original Report.DoesNotExist is
suppressed and the traceback is not chained; locate the except blocks in the
aretrieve and adestroy methods and update the raise statements accordingly.
- Around line 512-516: In adestroy, the Report.DoesNotExist except block should
suppress the original exception per Ruff B904; change the handler to capture the
exception (e.g., "except Report.DoesNotExist as err:") and re-raise Http404 with
explicit chaining to suppress the original (use "raise Http404 from None") so
the original traceback is not shown; update the except clause in the adestroy
method accordingly.
In `@radis/settings/base.py`:
- Around line 167-168: REPORTS_BULK_DB_BATCH_SIZE is read from the environment
and must be validated at startup: after you read REPORTS_BULK_DB_BATCH_SIZE in
radis/settings/base.py add a guard that raises
django.core.exceptions.ImproperlyConfigured if REPORTS_BULK_DB_BATCH_SIZE <= 0
(ensure you import ImproperlyConfigured from django.core.exceptions at top of
the file); this guarantees a clear error during startup rather than passing an
invalid batch_size into functions like
radis.reports.api.viewsets.bulk_upsert_reports which call
abulk_create/bulk_create/bulk_update.
---
Nitpick comments:
In `@radis/reports/api/operations.py`:
- Around line 82-87: The code calls report.modalities.aclear() before building
modality_instances and then report.modalities.aset(modality_instances); remove
the redundant await report.modalities.aclear() since aset() replaces the entire
M2M relation itself. Keep the loop that uses
Modality.objects.aget_or_create(...) to populate modality_instances and then
call await report.modalities.aset(modality_instances) directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 63a41d2d-461b-4bb6-8a9f-407e069d7d9f
📒 Files selected for processing (5)
radis/reports/api/operations.pyradis/reports/api/serializers.pyradis/reports/api/viewsets.pyradis/reports/tests/test_bulk_upsert.pyradis/settings/base.py
🚧 Files skipped from review as they are similar to previous changes (1)
- radis/reports/tests/test_bulk_upsert.py
| except Report.DoesNotExist: | ||
| raise Http404 |
There was a problem hiding this comment.
Both aretrieve and adestroy should use raise Http404 from None.
Both methods raise Http404 within an except Report.DoesNotExist clause without exception chaining. Use from None to explicitly suppress the original exception and avoid confusing tracebacks.
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 433-433: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@radis/reports/api/viewsets.py` around lines 432 - 433, In aretrieve and
adestroy replace the current exception re-raise that does "raise Http404" inside
the "except Report.DoesNotExist" blocks with "raise Http404 from None" so the
original Report.DoesNotExist is suppressed and the traceback is not chained;
locate the except blocks in the aretrieve and adestroy methods and update the
raise statements accordingly.
| async def adestroy(self, request: Request, *args: Any, **kwargs: Any) -> Response: | ||
| try: | ||
| report = await Report.objects.aget(document_id=kwargs[self.lookup_field]) | ||
| except Report.DoesNotExist: | ||
| raise Http404 |
There was a problem hiding this comment.
Add exception chaining to suppress the original exception.
Per Ruff B904, within an except clause, use raise ... from None to indicate the original exception is intentionally suppressed, or raise ... from err to chain it.
🔧 Proposed fix
async def adestroy(self, request: Request, *args: Any, **kwargs: Any) -> Response:
try:
report = await Report.objects.aget(document_id=kwargs[self.lookup_field])
except Report.DoesNotExist:
- raise Http404
+ raise Http404 from None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async def adestroy(self, request: Request, *args: Any, **kwargs: Any) -> Response: | |
| try: | |
| report = await Report.objects.aget(document_id=kwargs[self.lookup_field]) | |
| except Report.DoesNotExist: | |
| raise Http404 | |
| async def adestroy(self, request: Request, *args: Any, **kwargs: Any) -> Response: | |
| try: | |
| report = await Report.objects.aget(document_id=kwargs[self.lookup_field]) | |
| except Report.DoesNotExist: | |
| raise Http404 from None |
🧰 Tools
🪛 Ruff (0.15.15)
[warning] 516-516: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@radis/reports/api/viewsets.py` around lines 512 - 516, In adestroy, the
Report.DoesNotExist except block should suppress the original exception per Ruff
B904; change the handler to capture the exception (e.g., "except
Report.DoesNotExist as err:") and re-raise Http404 with explicit chaining to
suppress the original (use "raise Http404 from None") so the original traceback
is not shown; update the except clause in the adestroy method accordingly.
Source: Linters/SAST tools
A single operator-driven command that drives the same async embedding code
path the ADRF views use: load NULL-embedding RSV rows, batch them, await
embed_reports_inline per batch. One embedding code path in the system, two
call sites.
Serves three operationally distinct scenarios with the same mechanics:
1. Historical backfill — reports loaded before the inline wiring shipped.
2. Dim or model change — after §4.5 (or
`ReportSearchVector.objects.update(embedding=None)` for a same-dim
model swap), every row is NULL.
3. Outage recovery — reports whose ADRF write succeeded but whose inline
embedding step caught EmbeddingClientError.
Properties:
- Idempotent (filter is `embedding IS NULL`; nothing-to-do is a no-op).
- Resumable (no checkpoint state; re-runs pick up remaining NULLs).
- Race-tolerant with live API traffic (deterministic model => concurrent
overlapping writes produce identical vectors; cost is one wasted HTTP
call per overlap, no corruption).
- Failure-tolerant within a run (embed_reports_inline catches
EmbeddingClientError; loop continues; next run retries the NULLs).
Tests cover empty pool, full drain in batches, --limit, and service-failure
no-crash. Tests use `transaction=True` so the helper's
database_sync_to_async writes are visible across connections — same pattern
as the test_report_api.py suite from PR #230.
Spec §4.5 (dim change), §5.4 (dev/prod model swap), §6.5 (recovery),
§11.3 (model-swap mitigation), and §12 step 5 (rollout) updated to point
at the command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- operations.py docstring trimmed: drop the calling-pattern paragraph
(decorator stack + async_to_sync). That belongs at the call sites,
not in the operations module.
- serializers.py ReportSerializer docstring corrected: the serializer
NOW owns the atomic block via `@sync_to_async @transaction.atomic`
inside acreate/aupdate. The previous wording claimed the view's
helper owned the transaction — stale from before that ownership
moved into the serializer.
- viewsets.py module docstring cut from ~70 lines to ~15. The bullet
list of strategy notes moves to where the relevant code lives:
* bulk_upsert_reports gets a function docstring describing the
4-phase model + the caveat that Django's `a*` methods are still
internally sync_to_async-wrapped.
* adestroy gets an inline comment explaining why it owns its
own atomic helper instead of delegating to a serializer.
* Per-line comments at is_valid / asave / on_commit /
serializer.data / clone_request were already accurate and stay.
Comment-only change; no behavior impact.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… constant Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…/aupdate transaction.on_commit internally calls ensure_connection(), which is sync-only and raises SynchronousOnlyOperation when invoked from an async handler. adestroy and bulk_upsert_reports were already safe because they register on_commit inside a @sync_to_async @transaction.atomic helper; acreate and aupdate registered it directly in the async body and 500'd on every request. Wrap the registration call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop comments that reference the prior DRF code, describe how the conversion was performed, or restate what the code already shows. Keep the comments that capture a genuine non-obvious invariant (the router-dispatch trap, the async-unsafe transaction.on_commit wrap, the 403-not-404 upsert permission re-check, and the bulk phase outline). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Convert the report API from sync DRF to async ADRF as a 1:1 viewset translation, with the async/sync seam expressed uniformly across every handler.
radis/reports/api/viewsets.pykeeps its name andReportViewSetkeeps its place; the structural changes are:viewsets.GenericViewSet→adrf.viewsets.GenericViewSetwithadrf.mixins.*async mixins.def create/retrieve/update/destroy→async def acreate/aretrieve/aupdate/adestroy.@action(detail=False, …)forbulk_upsertstays a single class member, nowasync def.radis/reports/api/operations.py— async write operations (create_report_from_validated,update_report_from_validated,delete_report) using native async ORM (aget_or_create,acreate,asave,aset,aclear,adelete). Operations do not own transactions.ReportSerializerbecomesadrf.serializers.ModelSerializerwithasync def acreate/aupdate. Each of those is a@sync_to_async(thread_sensitive=True) @transaction.atomichelper that bridges to the corresponding operation viaasync_to_sync(...). The serializer owns its atomicity; the view does not._bulk_upsert_reportshelper (legacy) is renamed tobulk_upsert_reports, stays inviewsets.py, and is nowasync def. Its phases are explicitly split:@sync_to_async(thread_sensitive=True)helper.async for,await abulk_create).new_reports/updated_reportslists, pure CPU) —@sync_to_async(thread_sensitive=True)helper.@sync_to_async(thread_sensitive=True) @transaction.atomichelper with inline sync ORM (bulk_create / bulk_update / through-table churn).URLs are wired through
adrf.routers.DefaultRouter(not DRF's). See "Router choice is load-bearing" below.Why now: the follow-up PR will embed each uploaded report inline, during the upload request, by
awaiting the async embedding client from inside the view so the report's vector is populated by the time the API returns (no eventual-consistency window). The embedding call is I/O-bound, so an async view lets us yield to the event loop while it's in flight instead of pinning a worker thread. This PR is the structural prerequisite — no inline embedding wiring yet.Bridge architecture
Each view handler is a pure async coordinator:
await sync_to_async(serializer.is_valid)(raise_exception=True)— DRFis_validis sync; sync_to_async-wrapped from the view.report = await serializer.asave()— async-native, the serializer'sacreate/aupdateowns the atomic block.transaction.on_commit(on_commit)— registered afterasavereturns; fires immediately (atomic already committed) or queues under the test fixture (django_capture_on_commit_callbacks).await sync_to_async(lambda: serializer.data)()— DRFto_representationis sync.The atomic-owning helper inside
ReportSerializer.acreate/aupdateis:thread_sensitive=Trueensures the outer wrapper, the innerasync_to_syncevent loop, and the sync adapters Django's async ORM calls internally all run on the same Django thread, so the transaction context applies to every write the operation performs.adestroyfollows the same shape directly (no serializer involved): a@sync_to_async(thread_sensitive=True) @transaction.atomichelper callingasync_to_sync(operations.delete_report)(report).bulk_upsert_reportsPhase 4 keeps inline sync ORM in its atomic helper because the writes are single-statement bulk operations that don't decompose into per-entity ops.Router choice is load-bearing
DRF's
rest_framework.routers.DefaultRoutermaps HTTP methods to the sync action names (POST →create, PUT →update, GET →retrieve, DELETE →destroy). Becauseadrf.mixins.*inherit from DRF's sync mixins, those sync method names exist on the class as fully-functional sync methods — so DRF-router-driven dispatch silently calls the inherited sync mixin implementations and never reaches ouracreate/aretrieve/aupdate/adestroyoverrides. The four tests that exercise behavior added by our overrides (created/deleted on_commit handlers,?full=truedocuments,?upsert=true201) fail invisibly under this misconfiguration.adrf.routers.DefaultRouterrewrites the action mapping to thea-prefixed names wheneverview_is_async=True, so dispatch reaches our overrides. Verified locally withresolve(...).func.actions == {"post": "acreate", "put": "aupdate", ...}.The async-shape guard in
test_report_api.pypins every dispatched method toinspect.iscoroutinefunctionto catch a future contributor accidentally overriding the sync sibling — but that guard cannot catch a mis-wired router. The router choice is part of the architectural contract.API contract — byte-for-byte unchanged
URLs (
/api/reports/,/api/reports/{document_id}/,/api/reports/bulk-upsert/), route names (report-list,report-detail,report-bulk-upsert), response shapes, status codes, query params (?upsert,?full,?replace), 405-for-PATCH, and the 403-vs-404 semantics onPUT?upsert=trueagainst an unknown id are all preserved. The router's defaultlookup_value_regex = [^/.]+already forbids.indocument_id— the legacy behavior — without explicit configuration.The browsable API root at
/api/reports/(auto-generatedDefaultRouterindex view) is preserved as well.Contract is locked by
radis/reports/tests/test_report_api.py— written before the rewrite, passing against both the legacy DRF viewset (during intermediate commits) and the new ADRF viewset.Test plumbing
HTTP-based tests use
django.test.AsyncClient+async def+@pytest.mark.asyncio+@pytest.mark.django_db(transaction=True). The syncClientwould dispatch the async view viaasync_to_sync, which nested with ourdatabase_sync_to_asyncdeadlocks asgiref's thread executor (Django 6.0 + channels 4).AsyncClientruns the view in the test's event loop with no outer wrapping.transaction=Trueis needed because thedatabase_sync_to_asyncthread doesn't see aTestCase-wrapped transaction.Sync ORM helpers (
UserFactory.create,Token.objects.create_token) are wrapped inawait sync_to_async(...)()at the call site to satisfy Django'sasync_unsafeguard.Note on Django's current async ORM surface
Every
a*method onQuerySet/Manager(aget,aget_or_create,abulk_create,aset,acreate,asave,adelete, ...) is literallyawait sync_to_async(self.X)()in Django 6.0 / 6.1 — there is no native async DB backend in core today. Phase 2'sasync for/awaitcalls dispatch to the asgiref thread pool just like our explicitsync_to_asynccalls. The win today is architectural clarity, not runtime concurrency. Once a native async DB backend ships, the@sync_to_async @transaction.atomic+async_to_sync(operations.X)helpers in the serializer,adestroy, and Phase 4 collapse toasync with async_atomic(): return await operations.X(...).operations.pydoes not change; Phases 1 and 3 staysync_to_async-wrapped because they are pure CPU.Out of scope
radis.search,radis.chats,radis.extractions, ...).Design + plan docs
docs/superpowers/specs/2026-06-08-adrf-report-views-design.mddocs/superpowers/plans/2026-06-08-adrf-report-views.mdBoth docs explain the router-choice trap explicitly so the next person reading the spec doesn't repeat the mistake this PR made and recovered from.
Test plan
uv run cli lint— 0 errors, 0 warnings, 0 informations (ruff + pyright + djlint)uv run cli test -- --cov— 243 passed on the latest CI run.test_report_api.py(URL resolution, POST/GET/PUT/DELETE/PATCH→405,?upsert=true201/403,?full=truedocument fetchers, bulk-upsert reject paths, async-shape guard pinning every dispatched method toiscoroutinefunction).🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Behavior Changes / Bug Fixes
Tests
Documentation
Chores