
Implement null detections (calculate what has been processed or not)#1093

Merged
mihow merged 22 commits into main from 484-make-it-clear-what-images-have-not-been-processed on Mar 21, 2026

Conversation


@vanessavmac vanessavmac commented Jan 20, 2026

Summary

If a pipeline doesn't return detections but still successfully processed the image, it will have a "null" detection. This allows distinguishing "not yet processed" from "processed but found nothing."
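The sentinel idea can be sketched in plain Python (a simplified stand-in for the project's actual schema classes; `is_null_detection` is a hypothetical helper used here only for illustration):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionResponse:
    """Simplified stand-in for the real schema; bbox is now optional."""
    source_image_id: str
    bbox: Optional[list] = None  # None marks a "null" sentinel detection


def is_null_detection(d: DetectionResponse) -> bool:
    # A null detection records "this image was processed, nothing was found"
    return d.bbox is None


# Pipeline ran on this image but found nothing:
sentinel = DetectionResponse(source_image_id="42")
# A real detection with a bounding box:
real = DetectionResponse(source_image_id="43", bbox=[0.1, 0.2, 0.5, 0.6])
```

An image with no detection rows at all is "not yet processed"; an image with only sentinels like the one above is "processed, nothing found".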

Image selection logic for processing

When a pipeline runs, filter_processed_images() decides which images to send. Each image is checked against its existing detections from the pipeline's algorithms:

Yield (needs processing):

  1. No existing detections — image has never been processed by this pipeline
  2. Has real detections without classifications — detector ran but classifier didn't
  3. Has real detections with classifications, but not from all pipeline classifiers (e.g. a new classifier was added since last run)

Skip (already processed):

  4. Only null detections exist (bbox=None) — pipeline ran but found nothing
  5. Real detections exist and are fully classified by all pipeline classifiers

Null detections are sentinels that mark an image as "processed, nothing found." They are excluded from classification checks so they don't trigger reprocessing.
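The five rules above can be condensed into a small predicate. This is a plain-Python sketch over in-memory objects, not the actual Django queryset code; `Det` and `needs_processing` are hypothetical names:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Det:
    """Stand-in for a stored detection row."""
    bbox: Optional[list]                       # None marks a null sentinel detection
    classifier_keys: set = field(default_factory=set)


def needs_processing(detections: list, pipeline_classifiers: set) -> bool:
    if not detections:
        return True                            # rule 1: never processed by this pipeline
    real = [d for d in detections if d.bbox is not None]
    if not real:
        return False                           # rule 4: only null sentinels -> already processed
    for d in real:
        if not d.classifier_keys:
            return True                        # rule 2: detected but never classified
        if not pipeline_classifiers <= d.classifier_keys:
            return True                        # rule 3: a newer classifier hasn't run yet
    return False                               # rule 5: fully classified -> skip
```

Note how the null sentinels are stripped out before the classification checks, which is what keeps them from triggering reprocessing.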

List of Changes

Backend (pipeline processing)

  • Made bbox field optional in DetectionRequest and DetectionResponse schemas
  • Added create_null_detections_for_undetected_images() to generate null detection records for images where the pipeline found nothing
  • Updated save_results() to append null detections before saving
  • Updated filter_processed_images() to skip images that have only null detections (already processed, nothing found) while still yielding images that have real unclassified detections
  • Added algorithm-specific lookup for null detections in get_or_create_detection() so different pipelines each track their own processed status per-image. Uses bbox__isnull=True (not bbox=None) to correctly match SQL NULL in Django's JSONField.
  • Added NULL_DETECTIONS_FILTER constant and DetectionQuerySet.null_detections() for consistent null detection filtering
  • Excluded null detections from get_detections_count(), update_detection_counts(), DetectionViewSet, and prefetch_detections()
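The back-fill step can be sketched as follows (illustrative only; the real helper also attaches the detection algorithm reference and a timestamp, and the `DetectionResponse` here is a simplified stand-in):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionResponse:                       # simplified stand-in for the real schema
    source_image_id: str
    bbox: Optional[list] = None                # None = null sentinel


def create_null_detections_for_undetected_images(
    source_image_ids: list,
    detections: list,
) -> list:
    """One null detection per image the pipeline saw but returned nothing for."""
    seen = {d.source_image_id for d in detections}
    return [
        DetectionResponse(source_image_id=i)
        for i in source_image_ids
        if i not in seen
    ]


results = [DetectionResponse("1", bbox=[0, 0, 10, 10])]
nulls = create_null_detections_for_undetected_images(["1", "2", "3"], results)
```

Comparing IDs as strings (rather than casting to int) keeps the membership check robust if non-numeric IDs are ever used.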

Backend (processed status)

  • Added SourceImage.get_was_processed() model method with optional algorithm_key parameter
  • Added SourceImageQuerySet.with_was_processed() annotation for bulk use
  • Updated SourceImageAdmin to use the annotation in get_queryset() to avoid N+1 queries from get_was_processed in list_display
  • Added was_processed to timeline API response (EventTimelineIntervalSerializer)
  • Added source_images_processed_count to SourceImageCollectionSerializer
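The motivation for the bulk annotation can be illustrated with a toy query counter (plain Python, not Django; `FakeDB`, `has_detections`, and `annotate_all` are hypothetical names):

```python
class FakeDB:
    """Toy stand-in for the ORM: counts 'queries' so the N+1 cost is visible."""

    def __init__(self, detections_by_image):
        self.detections_by_image = detections_by_image
        self.query_count = 0

    def has_detections(self, image_id) -> bool:
        self.query_count += 1                  # per-row lookup: one query per image
        return self.detections_by_image.get(image_id, 0) > 0

    def annotate_all(self, image_ids) -> dict:
        self.query_count += 1                  # bulk annotation: one query for all rows
        return {i: self.detections_by_image.get(i, 0) > 0 for i in image_ids}


db = FakeDB({1: 2, 2: 0, 3: 1})
per_row = {i: db.has_detections(i) for i in [1, 2, 3]}   # N queries (the N+1 pattern)
per_row_queries = db.query_count

db.query_count = 0
bulk = db.annotate_all([1, 2, 3])                        # 1 query
bulk_queries = db.query_count
```

This mirrors why the admin list view reads the `with_was_processed()` annotation instead of calling `get_was_processed()` once per row.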

Frontend (capture sets)

  • Added "Processed captures" column to the capture sets table showing count and percentage

Tests

  • Added test_image_with_null_detection for null detection creation and get_was_processed() behavior
  • Added test_filter_processed_images_skips_null_only_image and test_filter_processed_images_yields_image_with_null_and_real_unclassified for the skip/yield logic
  • Added test_filter_processed_images_skips_null_and_fully_classified for the null + real fully-classified scenario
  • Added test_null_detections_are_algorithm_specific and test_null_detection_deduplication_same_pipeline for cross-pipeline isolation
  • Updated test_process_nats_pipeline_result_mixed_results to include a detection algorithm

Related Issues

Closes #484

Follow-up PR

Session detail timeline UI: #1179

How to Test the Changes

docker compose run --rm django python manage.py test ami.ml.tests.TestPipeline --keepdb
  • View an existing Capture Set that shows a % processed (before this change, that number never increases, and the % of captures processed always matches the % of captures with detections)
  • Process the Capture Set
  • Now the % of captures processed should reach 100% (and be greater or equal to the % captures with detections)
  • Process the Capture Set again, no images should be sent to the processing service (unless "always reprocess" has been manually turned on)

Deployment Notes

No migrations required. Null detections will be created for new pipeline runs going forward. Existing images will not have null detections until they are reprocessed.

Checklist

  • I have tested these changes appropriately.
  • I have added and/or modified relevant tests.
  • I updated relevant documentation or comments.
  • I have verified that this PR follows the project's coding standards.
  • Any dependent changes have already been merged to main.

@netlify

netlify bot commented Jan 20, 2026

Deploy Preview for antenna-preview canceled.

🔨 Latest commit: fd36611
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-preview/deploys/69bdb2a8e698950008beab53

@netlify

netlify bot commented Jan 20, 2026

Deploy Preview for antenna-ssec canceled.

🔨 Latest commit: fd36611
🔍 Latest deploy log: https://app.netlify.com/projects/antenna-ssec/deploys/69bdb2a8c0192700084e26f2

@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds processed-image tracking by introducing null-detections and annotations across pipeline processing, models, API, admin, tests, and frontend so images that were sent to ML pipelines (but returned no detections) are recorded and surfaced.

Changes

  • Core models & querysets (ami/main/models.py): add NULL_DETECTIONS_FILTER, DetectionQuerySet/DetectionManager, with_was_processed() and get_was_processed(); update detection-count queries; add with_source_images_processed_count() on collections.
  • Pipeline logic & schema (ami/ml/models/pipeline.py, ami/ml/schemas.py): make detection bbox optional; add create_null_detections_for_undetected_images(); update save_results() to create/persist null detections and adjust its signature.
  • API layer (ami/main/api/views.py, ami/main/api/serializers.py): expose was_processed in timeline responses; add source_images_processed_count to the collection serializer and ordering; use with_was_processed() and exclude NULL_DETECTIONS_FILTER where appropriate.
  • Admin (ami/main/admin.py): show get_was_processed in SourceImageAdmin.list_display and use with_was_processed() in the admin queryset to avoid N+1 queries.
  • Frontend data & UI (ui/src/data-services/models/capture-set.ts, ui/src/utils/language.ts, ui/src/pages/project/capture-sets/...): add getters numImagesProcessed/numImagesProcessedLabel, a translation key for "Processed captures", and a total-processed-captures column, visible by default.
  • ML tests & job tests (ami/ml/tests.py, ami/jobs/tests/test_tasks.py): add tests covering null-detection handling, per-algorithm processing flags, and deduplication; update a pipeline task test to include an Algorithm setup.

Sequence Diagram(s)

sequenceDiagram
  participant ML as ML Pipeline
  participant Pipeline as save_results()
  participant DB as Database / Models
  participant API as Timeline View
  participant UI as Frontend

  ML->>Pipeline: return PipelineResultsResponse (may include empty detections)
  Pipeline->>Pipeline: create_null_detections_for_undetected_images() for images with no detections
  Pipeline->>DB: persist Detection records (including null bbox)
  DB-->>Pipeline: confirm persisted annotations (was_processed annotated)
  API->>DB: query SourceImage objects .with_was_processed(), include counts
  DB-->>API: return annotated images + source_images_processed_count
  UI->>API: request timeline / collection
  API-->>UI: return intervals/images with was_processed and processed counts
  UI->>UI: render processed-count column and labels

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

backend, ml

Suggested reviewers

  • annavik

Poem

🐰 I hopped through logs and database rows,

Marked empty boxes where no bbox grows.
Null detections whisper "this was tried",
Now processed captures won't be denied.
Frontend beams, backend hums — hooray!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 69.70%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
  • Out of Scope Changes (❓ Inconclusive): most changes are in scope for #484, but the algorithm-specific null detection lookup and the timeline was_processed field go slightly beyond the linked issue's immediate requirements, though they are logical extensions needed for a proper implementation. Resolution: verify these were intentional extensions of the core feature, not scope creep.
✅ Passed checks (3 passed)

  • Title check: passed. The title "Implement null detections (calculate what has been processed or not)" clearly and concisely summarizes the main change: introducing null detections as a mechanism to track processed status.
  • Linked Issues check: passed. The PR addresses the primary objectives of issue #484: implementing null detections to distinguish processed images with no detections from unprocessed ones, adding processed-status tracking via get_was_processed(), and exposing the metric in the collections table UI.
  • Description check: passed. The description is comprehensive and follows the template, with all required sections completed: Summary, List of Changes, Related Issues, detailed backend/frontend/test changes, How to Test, and Deployment Notes.




@vanessavmac vanessavmac changed the title from "Add null detections test; update save_results" to "Implement null detections" Feb 11, 2026
@vanessavmac vanessavmac marked this pull request as ready for review February 11, 2026 16:27
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/main/api/views.py (1)

378-439: ⚠️ Potential issue | 🟠 Major

Fix was_processed for empty intervals (and avoid stale values).

Line 439 uses image after the loop; for intervals with no captures (or events with zero captures) this either reuses the prior interval’s image or can crash. Accumulate during the loop and keep False when no captures.

🐛 Suggested fix
@@
             while image_index < len(source_images) and source_images[image_index]["timestamp"] <= interval_end:
                 image = source_images[image_index]
@@
                 if image["detections_count"] >= max(interval_data["detection_counts"]):
                     interval_data["top_capture"] = SourceImage(pk=image["id"])
+                interval_data["was_processed"] = interval_data["was_processed"] or image["was_processed"]
                 image_index += 1
@@
-            interval_data["was_processed"] = image["was_processed"]
+            # was_processed already accumulated; remain False when no captures
🤖 Fix all issues with AI agents
In `@ami/main/models.py`:
- Around line 1839-1841: Remove the redundant Q(bbox=None) in
get_detections_count() (keep Q(bbox__isnull=True) and the empty-list check
Q(bbox=[])) and then make update_detection_counts() use the same exclusion
filter as get_detections_count() so the cached detections_count is computed
consistently; specifically, update the counting logic in
update_detection_counts() to apply .exclude(Q(bbox__isnull=True) | Q(bbox=[]))
on self.detections before calling .count() so both methods return the same
value.

In `@ami/ml/models/pipeline.py`:
- Around line 882-888: The loop that assigns detector_algorithm_reference over
algorithms_known currently overwrites when multiple detection algorithms exist;
update the logic in the section that iterates algorithms_known to make selection
deterministic or to emit a null detection per detector: either pick one
deterministic algorithm (e.g., lowest key or sorted by name) and log/warn when
multiple AlgorithmTaskType.DETECTION entries are present, or instead create an
AlgorithmReference for each known_algorithm with task_type DETECTION (rather
than a single detector_algorithm_reference) so downstream checks use
per-detector null detections; modify the code around
detector_algorithm_reference, algorithms_known and AlgorithmReference
accordingly and add a warning using the existing logger when multiple detectors
are found.
- Around line 896-902: The appended null detection currently uses a naive
timestamp via datetime.datetime.now(); update the DetectionResponse creation
inside the null_detections_to_add.append call to use the timezone-aware now()
function (already imported and used elsewhere) instead of
datetime.datetime.now(), ensuring the timestamp field for DetectionResponse is
timezone-aware and consistent with other parts of the pipeline.

In `@ui/src/data-services/models/collection.ts`:
- Around line 93-101: Rename the getter numImagesProccessed to
numImagesProcessedLabel in the Collection model so the name is spelled correctly
and follows the "*Label" convention; update the getter implementation (currently
computing numProcessed, pct and returning the formatted string) to the new name,
then find and replace all usages (e.g., any occurrences of
item.numImagesProccessed such as in collection-columns.tsx) to
item.numImagesProcessedLabel, and update any type declarations/exports or tests
that reference the old symbol to the new one to ensure no unresolved references
remain.

In `@ui/src/pages/session-details/playback/activity-plot/activity-plot.tsx`:
- Around line 71-96: The "Was processed" trace currently maps wasProcessed→0/1
onto yaxis 'y2' (shared with detectionsMaxCount) so it becomes invisible when
detectionsMaxCount is large and the mapping is inverted; fix by either (A)
giving this trace its own y-axis (e.g., set yaxis: 'y3' and add a yaxis config
with fixed range [0,1]) and keep y as wasProcessed ? 0 : 1 but rename the trace
to "Not processed" (or add a clarifying comment about the inversion), or (B)
scale the y values to the detections axis by mapping y: timeline.map(t =>
t.numCaptures>0 ? (t.wasProcessed ? 0 : detectionsMaxCount) : 0) and keep the
name "Not processed" (or invert the boolean mapping and rename to "Was
processed" accordingly); update hover/customdata text to match the chosen
semantics.
🧹 Nitpick comments (3)
ui/src/data-services/models/timeline-tick.ts (1)

35-37: Consider adding a nullish fallback for wasProcessed, consistent with other getters.

Other getters in this class (e.g., numDetections, numCaptures) use ?? 0 / ?? false to guard against missing server data. If was_processed is absent from the API response (e.g., during rollout or with an older API version), this will return undefined instead of false.

Suggested fix
   get wasProcessed(): boolean {
-    return this._timelineTick.was_processed
+    return this._timelineTick.was_processed ?? false
   }
ami/main/admin.py (1)

256-269: Reduce admin list N+1 by annotating was_processed.

get_was_processed in list_display can trigger a query per row. Consider annotating in get_queryset and reading the annotation in an admin method.

♻️ Suggested adjustment
@@
     def get_queryset(self, request: HttpRequest) -> QuerySet[Any]:
-        return super().get_queryset(request).select_related("event", "deployment", "deployment__data_source")
+        return (
+            super()
+            .get_queryset(request)
+            .select_related("event", "deployment", "deployment__data_source")
+            .with_was_processed()
+        )
+
+    @admin.display(boolean=True, ordering="was_processed", description="Was processed")
+    def get_was_processed(self, obj: SourceImage) -> bool:
+        return getattr(obj, "was_processed", obj.get_was_processed())
ami/main/models.py (1)

3725-3734: Same bbox=None / bbox__isnull=True redundancy as in get_detections_count.

~models.Q(images__detections__bbox__isnull=True) and ~models.Q(images__detections__bbox=None) produce identical SQL. One can be removed for clarity.

Simplify the filter
             source_images_with_detections_count=models.Count(
                 "images",
                 filter=(
                     models.Q(images__detections__isnull=False)
                     & ~models.Q(images__detections__bbox__isnull=True)
-                    & ~models.Q(images__detections__bbox=None)
                     & ~models.Q(images__detections__bbox=[])
                 ),
                 distinct=True,
             )

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/ml/models/pipeline.py (1)

939-944: ⚠️ Potential issue | 🟠 Major

Null detections incorrectly create empty Occurrence records.

create_and_update_occurrences_for_detections receives the full detections list, which includes null Detection objects (where bbox=None). For each null detection without a pre-existing occurrence, it bulk-creates an Occurrence with event, deployment, and project set but no taxon or determination. These phantom occurrences are semantically wrong — a null detection means "no animal found" — and persist in the Occurrence table. Some API endpoints filter out these empty occurrences (checking determination__isnull=False), but the core queryset .valid() only excludes occurrences with zero detections, allowing these to inflate counts in certain views. Filter out null detections before creating occurrences:

🛠️ Proposed fix in save_results
+    real_detections = [d for d in detections if d.bbox is not None]
     create_and_update_occurrences_for_detections(
-        detections=detections,
+        detections=real_detections,
         logger=job_logger,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/models/pipeline.py` around lines 939 - 944, The code passes the full
detections list (including null detections with bbox=None) into
create_and_update_occurrences_for_detections which creates empty Occurrence
records; filter the detections before calling that function (e.g., in
save_results) by keeping only detections where bbox is not None (or detection is
truthy) and then call create_and_update_occurrences_for_detections with that
filtered list so null detections are ignored and no phantom Occurrence rows are
created.
🧹 Nitpick comments (1)
ui/src/data-services/models/capture-set.ts (1)

82-101: numImagesWithDetectionsLabel can render "undefined (0%)" — worth aligning with the new pattern.

The new numImagesProcessedLabel correctly defaults numProcessed to 0 via ?? 0, so it never emits the string "undefined". The adjacent, pre-existing numImagesWithDetectionsLabel uses this.numImagesWithDetections?.toLocaleString() without a fallback; when numImagesWithDetections is undefined, the template literal produces the literal string "undefined (0%)" in the rendered cell.

Since the new getter sets the correct precedent, consider aligning numImagesWithDetectionsLabel for consistency:

♻️ Proposed alignment for numImagesWithDetectionsLabel
 get numImagesWithDetectionsLabel(): string {
+  const numWithDetections = this.numImagesWithDetections ?? 0
   const pct =
-    this.numImagesWithDetections && this.numImages
-      ? (this.numImagesWithDetections / this.numImages) * 100
+    this.numImages && this.numImages > 0
+      ? (numWithDetections / this.numImages) * 100
       : 0

-  return `${this.numImagesWithDetections?.toLocaleString()} (${pct.toFixed(
-    0
-  )}%)`
+  return `${numWithDetections.toLocaleString()} (${pct.toFixed(0)}%)`
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ui/src/data-services/models/capture-set.ts` around lines 82 - 101, The getter
numImagesWithDetectionsLabel can produce "undefined (0%)" because it calls
this.numImagesWithDetections?.toLocaleString() without a fallback; update it to
mirror numImagesProcessedLabel by defaulting numImagesWithDetections to 0 (use
the nullish coalescing operator, e.g. const numWithDetections =
this.numImagesWithDetections ?? 0), compute pct from numWithDetections and
this.numImages, and use numWithDetections.toLocaleString() in the returned
template so the label never renders "undefined".
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/main/models.py`:
- Around line 3741-3744: Remove the redundant null-checks in the queryset: drop
the negated Q(models.Q(images__detections__bbox=None)) since
Q(...__bbox__isnull=True) and Q(...__bbox=None) are equivalent for JSONField,
and also remove the now-unnecessary models.Q(images__detections__isnull=False)
because excluding bbox IS NULL already guarantees a detection row; update the
queryset fragment that currently contains
models.Q(images__detections__isnull=False) &
~models.Q(images__detections__bbox__isnull=True) &
~models.Q(images__detections__bbox=None) &
~models.Q(images__detections__bbox=[]) to only keep the meaningful predicates
(e.g., ~models.Q(images__detections__bbox__isnull=True) and
~models.Q(images__detections__bbox=[])).

In `@ami/ml/models/pipeline.py`:
- Around line 917-924: filter_processed_images is incorrectly treating the null
placeholder detections (created by create_null_detections_for_undetected_images,
which use detections with bbox=None) as "unprocessed" because they have no
classifications; add a branch in filter_processed_images before the existing
classifications__isnull check that detects when existing_detections consists
only of null detections (e.g., all detections have bbox==None or an equivalent
null marker) and treat that case as "processed - no animal found" so the image
is skipped on subsequent runs; update any comments/tests to reflect that
null-only detection sets should not cause re-queuing.
- Around line 830-834: The int() casts on detection.source_image_id and
source_img.id can raise ValueError for non-numeric IDs; change the membership
check to use string comparison instead of int conversion by building
source_images_with_detections from str(detection.source_image_id) (iterating
results.detections) and comparing against str(source_img.id) when iterating
results.source_images (or otherwise guard the cast with a try/except and fall
back to string comparison) so comparisons are robust for UUIDs and other
non-integer IDs.

In `@ui/src/pages/project/capture-sets/capture-set-columns.tsx`:
- Around line 113-114: The renderCell parameter is typed as Collection but that
type isn't imported—replace the incorrect type with CaptureSet (i.e., change the
renderCell signature from item: Collection to item: CaptureSet) to match the
other renderCell entries and ensure CaptureSet is imported at the top of the
file so the BasicTableCell usage of item.numImagesProcessedLabel compiles.

---

Outside diff comments:
In `@ami/ml/models/pipeline.py`:
- Around line 939-944: The code passes the full detections list (including null
detections with bbox=None) into create_and_update_occurrences_for_detections
which creates empty Occurrence records; filter the detections before calling
that function (e.g., in save_results) by keeping only detections where bbox is
not None (or detection is truthy) and then call
create_and_update_occurrences_for_detections with that filtered list so null
detections are ignored and no phantom Occurrence rows are created.

---

Duplicate comments:
In `@ami/ml/models/pipeline.py`:
- Around line 835-840: The loop over algorithms_known silently lets the last
detection algorithm win; modify the logic in the block that sets
detector_algorithm_reference so that if a second AlgorithmTaskType.DETECTION is
encountered (i.e., detector_algorithm_reference is already set) you surface a
clear failure (raise a ValueError/RuntimeError or log+raise) instead of
overwriting; reference the symbols algorithms_known,
AlgorithmTaskType.DETECTION, detector_algorithm_reference, and
AlgorithmReference to implement the check and error with a descriptive message
listing conflicting algorithm names/keys.

---

Nitpick comments:
In `@ui/src/data-services/models/capture-set.ts`:
- Around line 82-101: The getter numImagesWithDetectionsLabel can produce
"undefined (0%)" because it calls this.numImagesWithDetections?.toLocaleString()
without a fallback; update it to mirror numImagesProcessedLabel by defaulting
numImagesWithDetections to 0 (use the nullish coalescing operator, e.g. const
numWithDetections = this.numImagesWithDetections ?? 0), compute pct from
numWithDetections and this.numImages, and use numWithDetections.toLocaleString()
in the returned template so the label never renders "undefined".

Copilot AI review requested due to automatic review settings February 28, 2026 00:36
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
ami/ml/models/pipeline.py (2)

911-918: ⚠️ Potential issue | 🟠 Major

Null detections will cause images to be re-queued on subsequent pipeline runs.

The filter_processed_images function (lines 81-115) yields images when existing_detections.filter(classifications__isnull=True).exists() is True. Since null detections (bbox=None) have no classifications, this condition will always be True for them, defeating the deduplication logic.

filter_processed_images needs an additional branch to recognize null-only detections as "processed, no animal found — skip" before the classifications__isnull check.

🛠️ Proposed fix in filter_processed_images
     for image in images:
         existing_detections = image.detections.filter(detection_algorithm__in=pipeline_algorithms)
+        real_detections = existing_detections.filter(bbox__isnull=False)
+        null_only = existing_detections.exists() and not real_detections.exists()
+
         if not existing_detections.exists():
             task_logger.debug(f"Image {image} needs processing: has no existing detections from pipeline's detector")
             yield image
-        elif existing_detections.filter(classifications__isnull=True).exists():
+        elif null_only:
+            task_logger.debug(f"Image {image} was processed with no detections (null detection exists), skipping")
+            continue
+        elif real_detections.filter(classifications__isnull=True).exists():
             task_logger.debug(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/models/pipeline.py` around lines 911 - 918, filter_processed_images is
treating any detection with classifications__isnull as unprocessed, but your
create_null_detections_for_undetected_images adds detections with bbox=None
which also have no classifications and thus get re-queued; update
filter_processed_images to first check for "null-only" detections (e.g.,
detections where bbox is None or bbox==NULL and there are no other non-null
bboxes) and consider those images as processed and skip them before falling back
to the existing classifications__isnull check; locate the logic in
filter_processed_images and use the same detection attributes/naming as in
create_null_detections_for_undetected_images and results.detections to identify
null-only detections and exclude those images from being yielded.

830-835: ⚠️ Potential issue | 🟡 Minor

Use string comparison instead of int() cast to avoid potential ValueError.

The source_image_id field is typed as str in DetectionResponse. Casting to int() could raise ValueError if UUIDs or other non-numeric IDs are used in the future.

🛠️ Proposed fix
-    source_images_with_detections = {int(detection.source_image_id) for detection in results.detections}
+    source_images_with_detections = {str(detection.source_image_id) for detection in results.detections}
     null_detections_to_add = []
     detection_algorithm_reference = AlgorithmReference(name=detection_algorithm.name, key=detection_algorithm.key)

     for source_img in results.source_images:
-        if int(source_img.id) not in source_images_with_detections:
+        if str(source_img.id) not in source_images_with_detections:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/models/pipeline.py` around lines 830 - 835, The code currently casts
detection.source_image_id to int which can raise ValueError for non-numeric IDs;
change the set and membership check to use string values instead: build
source_images_with_detections = {detection.source_image_id for detection in
results.detections} (or explicitly str(detection.source_image_id)) and compare
using if str(source_img.id) not in source_images_with_detections so all checks
around results.detections, source_images_with_detections, and the loop over
results.source_images use string comparisons rather than int() casts (leave
AlgorithmReference(name=detection_algorithm.name, key=detection_algorithm.key)
unchanged).
🧹 Nitpick comments (3)
ui/src/pages/session-details/playback/activity-plot/activity-plot.tsx (1)

11-11: Consider using a more muted green for visual consistency.

The bright green #00ff1a stands out significantly against the muted palette used for other traces (#4E4F57 captures, #5F8AC6 detections). Consider a softer green like #4CAF50 or #66BB6A for better visual harmony.

ami/ml/models/pipeline.py (2)

817-821: Remove unused logger parameter or use it for logging.

The logger parameter is declared but never used in the function body. Either add logging statements or remove the parameter to avoid confusion.

♻️ Proposed fix

Option 1: Remove unused parameter:

 def create_null_detections_for_undetected_images(
     results: PipelineResultsResponse,
     detection_algorithm: Algorithm,
-    logger: logging.Logger = logger,
 ) -> list[DetectionResponse]:

Option 2: Add logging:

     for source_img in results.source_images:
         if int(source_img.id) not in source_images_with_detections:
+            logger.debug(f"Creating null detection for source image {source_img.id}")
             null_detections_to_add.append(

895-900: Consider chaining exceptions for better debugging.

Per static analysis hint (B904), re-raising with `from err` preserves the exception chain for better diagnostics (`from None` explicitly suppresses it instead of leaving an implicit chain).

♻️ Proposed fix
     try:
         detection_algorithm = pipeline.algorithms.get(task_type=AlgorithmTaskType.DETECTION)
-    except Algorithm.DoesNotExist:
-        raise ValueError("Pipeline does not have a detection algorithm")
-    except Algorithm.MultipleObjectsReturned:
-        raise NotImplementedError("Multiple detection algorithms per pipeline are not supported")
+    except Algorithm.DoesNotExist as err:
+        raise ValueError("Pipeline does not have a detection algorithm") from err
+    except Algorithm.MultipleObjectsReturned as err:
+        raise NotImplementedError("Multiple detection algorithms per pipeline are not supported") from err
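A minimal standalone illustration of why `from err` matters (the function name and data shapes here are hypothetical, not the project's API):

```python
def get_detection_algorithm(algorithms: list[dict]) -> dict:
    """Return the single detection algorithm, chaining the cause on failure."""
    detectors = [a for a in algorithms if a.get("task_type") == "detection"]
    try:
        (detector,) = detectors  # tuple unpacking enforces exactly one match
    except ValueError as err:
        # "from err" preserves the original exception as __cause__, so the
        # traceback shows both the domain error and what triggered it.
        raise ValueError("Pipeline does not have exactly one detection algorithm") from err
    return detector
```

Without `from err`, Python still prints an implicit "During handling of the above exception…" chain, but B904 asks for the explicit form so intent is unambiguous.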

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd6d22c and 93aa8b0.

📒 Files selected for processing (2)
  • ami/ml/models/pipeline.py
  • ui/src/pages/session-details/playback/activity-plot/activity-plot.tsx

Copilot AI (Contributor) left a comment
Pull request overview

Adds explicit “processed” tracking for captures by introducing “null” detections when a pipeline succeeds but returns no detections, and surfaces this in timeline + capture set statistics in the UI/API.

Changes:

  • Backend: create/save “null” detections and adjust detection-counting logic to exclude null/empty bboxes while still tracking “processed”.
  • API/UI: expose and visualize was_processed in the session timeline; add “Total Processed Captures” to capture sets.
  • Tests: add coverage for saving results with null detections.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
ui/src/utils/language.ts Adds label string for “Total Processed Captures”.
ui/src/pages/session-details/playback/activity-plot/activity-plot.tsx Adds “Was processed” series to activity plot.
ui/src/pages/project/capture-sets/capture-sets.tsx Enables the new processed-captures column.
ui/src/pages/project/capture-sets/capture-set-columns.tsx Adds processed-captures column rendering.
ui/src/data-services/models/timeline-tick.ts Adds wasProcessed mapping from API field.
ui/src/data-services/models/capture-set.ts Adds processed capture count/label derivations.
ami/ml/tests.py Adds test for pipeline results producing null detections.
ami/ml/schemas.py Allows bbox to be optional in detection request/response schemas.
ami/ml/models/pipeline.py Creates null detections for undetected images; updates bbox serialization handling.
ami/main/models.py Adds queryset annotation for was_processed; excludes null/empty bbox from detection counts.
ami/main/api/views.py Adds was_processed to timeline payload; adds processed count annotation for collections; filters detections endpoint.
ami/main/api/serializers.py Adds was_processed to timeline interval serializer; includes processed count field for collections.
ami/main/admin.py Shows get_was_processed in SourceImage admin list display.


@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
ami/ml/models/pipeline.py (1)

87-90: ⚠️ Potential issue | 🟠 Major

Null-detection skip is too broad and can mask reprocessing needs.

Line 87 currently skips when any null detection exists. If a capture has both null and real detections, it will skip before checking real detections/classifications.

Suggested fix
-        elif existing_detections.null_detections().exists():  # type: ignore
-            task_logger.debug(f"Image {image} has a null detection from pipeline {pipeline}, skipping! ")
-            continue
-        elif existing_detections.filter(classifications__isnull=True).exists():
+        real_detections = existing_detections.exclude(models.Q(bbox__isnull=True) | models.Q(bbox=[]))
+        null_only = existing_detections.exists() and not real_detections.exists()
+        if null_only:
+            task_logger.debug(f"Image {image} has only null detections from pipeline {pipeline}, skipping!")
+            continue
+        elif real_detections.filter(classifications__isnull=True).exists():
             # Check if there are detections with no classifications
             task_logger.debug(
                 f"Image {image} needs processing: has existing detections with no classifications "
                 "from pipeline {pipeline}"
             )
             yield image
         else:
             # If there are existing detections with classifications,
             # Compare their classification algorithms to the current pipeline's algorithms
             pipeline_algorithm_ids = set(classification_algorithms.values_list("id", flat=True))
-            detection_algorithm_ids = set(existing_detections.values_list("classifications__algorithm_id", flat=True))
+            detection_algorithm_ids = set(real_detections.values_list("classifications__algorithm_id", flat=True))
🧹 Nitpick comments (1)
ami/ml/models/pipeline.py (1)

899-903: Use the same detection-task criteria here as in filter_processed_images.

Line 899 uses only AlgorithmTaskType.DETECTION, but earlier logic uses Algorithm.detection_task_types. Aligning these avoids inconsistent behavior across pipeline paths.

Suggested fix
-    try:
-        detection_algorithm = pipeline.algorithms.get(task_type=AlgorithmTaskType.DETECTION)
-    except Algorithm.DoesNotExist:
+    detection_algorithms = pipeline.algorithms.filter(task_type__in=Algorithm.detection_task_types)
+    if not detection_algorithms.exists():
         raise ValueError("Pipeline does not have a detection algorithm")
-    except Algorithm.MultipleObjectsReturned:
+    if detection_algorithms.count() > 1:
         raise NotImplementedError("Multiple detection algorithms per pipeline are not supported")
+    detection_algorithm = detection_algorithms.get()
Actionable inline comment — ami/main/api/views.py, line 904: the queryset uses inverted logic. `Detection.objects.exclude(~NULL_DETECTIONS_FILTER)` should be `exclude(NULL_DETECTIONS_FILTER)` so null detections are filtered out and valid ones are returned (the expression also calls `select_related("source_image", "detection_algorithm")`). Drop the bitwise negation so the behavior matches the other uses of `NULL_DETECTIONS_FILTER` in the codebase.

ℹ️ Review info


📥 Commits

Reviewing files that changed from the base of the PR and between 93aa8b0 and 1b12e14.

📒 Files selected for processing (4)
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/ml/models/pipeline.py
  • ui/src/pages/project/capture-sets/capture-set-columns.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • ui/src/pages/project/capture-sets/capture-set-columns.tsx

@mihow mihow requested a review from annavik March 2, 2026 06:38
@mihow mihow changed the title Implement null detections Implement null detections (calculate what has been processed or not) Mar 2, 2026
@mihow (Collaborator) commented Mar 2, 2026

@annavik this includes some important backend changes for processing, but we also can start showing what has been or hasn't been processed in the UI! My suggestion is that we add it to the collections view for now, but spend some more time on the session detail view in a follow-up PR (perhaps it's time for a vertical bar chart or another type of visualization).

@annavik annavik self-assigned this Mar 2, 2026
@annavik (Member) commented Mar 2, 2026

@annavik this includes some important backend changes for processing, but we also can start showing what has been or hasn't been processed in the UI! My suggestion is that we add it to the collections view for now, but spend some more time on the session detail view in a follow-up PR (perhaps it's time for a vertical bar chart or another type of visualization).

Happy to take this on! This PR looks like a great start. I'll review shortly.

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/ml/models/pipeline.py (1)

422-426: ⚠️ Potential issue | 🟠 Major

Don’t de-duplicate null detections across algorithms.

For bbox=None, this lookup merges placeholders from different pipelines into the first matching row on the image. That breaks algorithm-scoped processed tracking: SourceImage.get_was_processed(algorithm_key=...) and filter_processed_images() can no longer tell that pipeline B processed the image if pipeline A wrote the null marker first.

💡 Suggested fix
-    existing_detection = Detection.objects.filter(
-        source_image=source_image,
-        bbox=serialized_bbox,
-    ).first()
+    lookup = {
+        "source_image": source_image,
+        "bbox": serialized_bbox,
+    }
+    if serialized_bbox is None:
+        assert detection_resp.algorithm, f"No detection algorithm was specified for detection {detection_repr}"
+        detection_algo = algorithms_known[detection_resp.algorithm.key]
+        lookup["detection_algorithm"] = detection_algo
+
+    existing_detection = Detection.objects.filter(**lookup).first()
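In plain terms, the deduplication key must include the algorithm whenever the bbox is null. A dict/tuple sketch of the idea (illustrative only, not the ORM code):

```python
def detection_lookup_key(source_image_id, bbox, algorithm_key):
    """Key used to decide whether an existing detection can be reused.

    Real detections dedupe on (image, bbox) alone. Null sentinels
    (bbox is None) must also be scoped by algorithm, so pipeline B
    never reuses pipeline A's "processed, nothing found" marker.
    """
    if bbox is None:
        return (source_image_id, None, algorithm_key)
    return (source_image_id, tuple(bbox))
```

With this key, two pipelines that both find nothing on the same image each keep their own null marker, so algorithm-scoped `was_processed` checks stay accurate.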
♻️ Duplicate comments (1)
ami/ml/models/pipeline.py (1)

87-89: ⚠️ Potential issue | 🟠 Major

Only treat an image as “processed with no detections” when all its detections are null placeholders.

This branch skips as soon as any null detection exists. After a reprocess, an image can legitimately have a legacy null placeholder plus real detections; in that case this short-circuits the classification checks below and incorrectly suppresses reprocessing.

💡 Suggested fix
-        elif existing_detections.null_detections().exists():  # type: ignore
+        elif not existing_detections.exclude(bbox__isnull=True).exclude(bbox=[]).exists():
             task_logger.debug(f"Image {image} has a null detection from pipeline {pipeline}, skipping! ")
             continue
Actionable inline comment — ami/main/models.py, lines 3775-3776: `SOURCE_IMAGES_WITH_NULL_DETECTIONS_FILTER` uses `Q(images__detections__isnull=True)`, which matches images with no detection rows at all, not images that have an existing null/empty placeholder detection row. Either rename the constant to `SOURCE_IMAGES_WITH_NO_DETECTIONS_FILTER` to reflect the current semantics, or change the `Q` to explicitly match placeholder detection rows, and update callers to the new name/semantics.

ℹ️ Review info

Run ID: 609f7924-e4b9-431e-9c29-f225175e5742

📥 Commits

Reviewing files that changed from the base of the PR and between f917853 and 15487cb.

📒 Files selected for processing (7)
  • ami/jobs/tests/test_tasks.py
  • ami/main/api/serializers.py
  • ami/main/api/views.py
  • ami/main/models.py
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py
  • ui/src/utils/language.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • ui/src/utils/language.ts
  • ami/main/api/serializers.py
  • ami/ml/tests.py

@coderabbitai coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

Actionable inline comment — ami/main/models.py, lines 1780-1789: the `with_was_processed` queryset currently marks images as processed if any `Detection` exists, but it must be scoped to the same algorithm/pipeline semantics as `SourceImage.get_was_processed(algorithm_key=...)`. Update `with_was_processed` (and the analogous helper around lines ~3794-3800) to accept an optional `algorithm_key` (or pipeline identifier) and use it in the `Exists`/`Count` filter — e.g. `Detection.objects.filter(source_image_id=models.OuterRef("pk"), ...)` with the algorithm or pipeline field included — so the annotation only considers detections from that algorithm. Keep the `OuterRef("pk")` join and return the annotated queryset as before.

ℹ️ Review info

Run ID: b33a1f87-82f7-458a-950c-108bb4fcee10

📥 Commits

Reviewing files that changed from the base of the PR and between 15487cb and 31dea6e.

📒 Files selected for processing (1)
  • ami/main/models.py

@annavik (Member) commented Mar 16, 2026

Hello @vanessavmac and @mihow! It's so great to soon have this information, it will help a lot, thank you!

Some thoughts on the UI side:

  • Currently one tick on the timeline might involve more than one capture (see screenshot). I think for users, this will make "was processed" a bit too unspecific. How can we know whether all, some, or none of the captures were processed for a tick?
  • The color selected for the graph does not belong to our design system.
  • We are planning to rework the timeline component in the near future (see Session ticks look off for generated sessions (short samples) #1163), because it has some problems. We would also like to explore a bar chart layout. I'm a bit unsure whether it's a good idea to extend this component with more functionality before then, but I also don't want to block your work.

What I suggest:

  • Present "Captures processed" (number) instead of "Was processed" (boolean) in the graph. I think this would make it easier to compare with the capture count graph?
  • Use a color from our design system for the new graph; maybe we can try the emerald/success color, since we also use that for the "successful processing status".

Or an alternative:

  • Exclude all timeline component updates in this PR, instead we can include this data when we rework this component. It's already great to see the number of captures processed for the capture set!

Example of a session with more than one capture per tick:

# Detections with null bounding boxes are valid and indicates the image was successfully processed
return self.detections.exclude(NULL_DETECTIONS_FILTER).count()

def get_was_processed(self, algorithm_key: str | None = None) -> bool:
Collaborator:

This could introduce n+1 queries, we have a queryset method for with_was_processed. I know we need a was_processed attribute on the model so the serializer knows it exists, but our pattern so far has been to return None for this placeholder model method and add a docstring pointing to the query annotation instead.

Collaborator:

It appears this is only used in the admin & tests. The was_processed query annotation is what's used in the serializer / API View

Collaborator:

Addressed: get_was_processed now reads from the was_processed annotation when available (hasattr(self, 'was_processed')), falling back to a DB query otherwise. SourceImageAdmin.get_queryset calls .with_was_processed() so the admin list view uses the annotation. See commit 10d8c45.


def get_detections_count(self) -> int:
return self.detections.distinct().count()
# Detections count excludes detections without bounding boxes
Collaborator:

Can this be a queryset method as well? I see it already exists before this PR.

Collaborator:

Not changed in this round — lower priority since it's only used per-instance (admin detail, tests). Could be a follow-up.

task_logger.debug(f"Image {image} needs processing: has no existing detections from pipeline's detector")
# If there are no existing detections from this pipeline, send the image
yield image
elif existing_detections.null_detections().exists(): # type: ignore
Collaborator:

I think this is unnecessary! It's covered by the first check, `if not existing_detections.exists()`, since we are checking for any detection there.

@mihow (Collaborator) commented Mar 17, 2026

Code review

Found 3 issues:

  1. filter_processed_images skips images with both null detections and real unclassified detections. The elif existing_detections.null_detections().exists(): continue fires before the elif existing_detections.filter(classifications__isnull=True).exists(): yield image check. Any image that has a null detection (processed, nothing found) plus a real detection lacking classifications will be skipped — its real detections never get classified. The null check should only skip when there are no real detections.

if not existing_detections.exists():
task_logger.debug(f"Image {image} needs processing: has no existing detections from pipeline's detector")
# If there are no existing detections from this pipeline, send the image
yield image
elif existing_detections.null_detections().exists(): # type: ignore
task_logger.debug(f"Image {image} has a null detection from pipeline {pipeline}, skipping! ")
continue
elif existing_detections.filter(classifications__isnull=True).exists():
# Check if there are detections with no classifications
task_logger.debug(
f"Image {image} needs processing: has existing detections with no classifications "
"from pipeline {pipeline}"
)
yield image
else:
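The intended ordering can be sketched with plain data — a simplified stand-in for the queryset logic, where `bbox is None` marks a null detection (names and dict shapes are illustrative):

```python
def needs_processing(detections: list[dict]) -> bool:
    """Decide whether an image should be re-sent to the pipeline.

    detections: existing detections from this pipeline's algorithms,
    each shaped like {"bbox": [...] or None, "classifications": [...]}.
    """
    if not detections:
        return True  # never processed by this pipeline
    real = [d for d in detections if d["bbox"] is not None]
    if not real:
        return False  # only null sentinels: processed, nothing found
    # Null sentinels are excluded here, so a real detection lacking
    # classifications still triggers reprocessing.
    return any(not d["classifications"] for d in real)
```

The key difference from the buggy version is that the "skip" branch fires only when *all* detections are null, not when *any* null detection exists.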

  2. get_or_create_detection uses filter(source_image=source_image, bbox=None) for null detections, which matches any existing null-bbox detection regardless of algorithm. If two pipelines (A and B) both process the same image with no detections, pipeline B's call finds pipeline A's null detection and reuses it. The null detection retains algorithm A, so get_was_processed(algorithm_key=<B-detector>) returns False even though pipeline B processed the image. The algorithm should be included in the lookup when bbox is None.

# When reprocessing, we don't care which detection algorithm created the existing detection
existing_detection = Detection.objects.filter(
source_image=source_image,
bbox=serialized_bbox,
).first()

  3. was_processed is added to EventTimelineIntervalSerializer, but the TypeScript ServerTimelineTick type is not updated to include it. The field is silently ignored by the frontend; no component reads or renders it.

export type ServerTimelineTick = {
start: string
end: string
first_capture: {
id: number
} | null
top_capture: {
id: number
} | null
captures_count: number
detections_count: number
detections_avg: number
}

🤖 Generated with Claude Code


mihow and others added 3 commits March 17, 2026 15:50
…ed_images

An image with both a null detection and real unclassified detections
was being incorrectly skipped due to elif ordering. The null check now
only fires when all existing detections are null (bbox=None), preserving
reprocessing for images with real unclassified detections.

Co-Authored-By: Claude <noreply@anthropic.com>
…create_detection

Null detections are sentinel values, not physical detections. Previously,
the bbox=None lookup was algorithm-agnostic, so pipeline B would reuse
pipeline A's null detection. This broke get_was_processed(algorithm_key=...)
for any pipeline that wasn't first to process an image.

Also fixed a JSONField gotcha: filter(bbox=None) matches JSON null literal,
not SQL NULL. Null detections now use bbox__isnull=True for correct lookups.

The algorithm resolution is factored out to occur once before the lookup,
since it's needed for both deduplication and creation.

Co-Authored-By: Claude <noreply@anthropic.com>
get_was_processed was in SourceImageAdmin.list_display, firing one
detections.exists() query per row. The admin queryset now uses
with_was_processed() annotation, and the model method reads from the
annotation when available instead of hitting the DB again.

Co-Authored-By: Claude <noreply@anthropic.com>
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
ami/ml/models/pipeline.py (1)

91-107: ⚠️ Potential issue | 🟠 Major

Exclude null detections from the “unclassified” branch.

At Line 95, the query still includes null detections, so images with a historical null sentinel plus fully-classified real detections can be re-queued repeatedly. Restrict the unclassified/classifier-id checks to real detections only.

Suggested fix
     for image in images:
         existing_detections = image.detections.filter(detection_algorithm__in=pipeline_algorithms)
+        real_detections = existing_detections.exclude(NULL_DETECTIONS_FILTER)
         if not existing_detections.exists():
             task_logger.debug(f"Image {image} needs processing: has no existing detections from pipeline's detector")
             # If there are no existing detections from this pipeline, send the image
             yield image
-        elif not existing_detections.exclude(NULL_DETECTIONS_FILTER).exists():  # type: ignore
+        elif not real_detections.exists():  # type: ignore
             # All detections for this image are null (processed but nothing found) — skip
             task_logger.debug(f"Image {image} has only null detections from pipeline {pipeline}, skipping!")
             continue
-        elif existing_detections.filter(classifications__isnull=True).exists():
+        elif real_detections.filter(classifications__isnull=True).exists():
             # Check if there are detections with no classifications
             task_logger.debug(
                 f"Image {image} needs processing: has existing detections with no classifications "
                 "from pipeline {pipeline}"
             )
             yield image
         else:
             pipeline_algorithm_ids = set(classification_algorithms.values_list("id", flat=True))
-            detection_algorithm_ids = set(existing_detections.values_list("classifications__algorithm_id", flat=True))
+            detection_algorithm_ids = set(real_detections.values_list("classifications__algorithm_id", flat=True))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/models/pipeline.py` around lines 91 - 107, The unclassified branch
still includes null sentinel detections, so change the checks and queries that
look for missing classifications and algorithm IDs to exclude
NULL_DETECTIONS_FILTER: update the condition using
existing_detections.filter(classifications__isnull=True).exists() to use
existing_detections.exclude(NULL_DETECTIONS_FILTER).filter(classifications__isnull=True).exists(),
and likewise use
existing_detections.exclude(NULL_DETECTIONS_FILTER).values_list("classifications__algorithm_id",
flat=True) when building detection_algorithm_ids so only real (non-null)
detections are considered in the unclassified/classifier-id logic.
ami/main/models.py (1)

1780-1791: ⚠️ Potential issue | 🟠 Major

Scope processed-state helpers to algorithm/pipeline to prevent cross-pipeline false positives.

Line 1780 and Line 3810 currently mark/count an image as processed if any detection exists. In multi-pipeline projects, this can report processed status for the wrong pipeline.

💡 Suggested fix
-    def with_was_processed(self):
+    def with_was_processed(self, algorithm_key: str | None = None):
         """
         Annotate each SourceImage with a boolean `was_processed` indicating
         whether any detections exist for that image.
@@
-        processed_exists = models.Exists(Detection.objects.filter(source_image_id=models.OuterRef("pk")))
+        processed_qs = Detection.objects.filter(source_image_id=models.OuterRef("pk"))
+        if algorithm_key is not None:
+            processed_qs = processed_qs.filter(detection_algorithm__key=algorithm_key)
+        processed_exists = models.Exists(processed_qs)
         return self.annotate(was_processed=processed_exists)
-    def with_source_images_processed_count(self):
+    def with_source_images_processed_count(self, algorithm_id: int | None = None):
+        processed_filter = models.Q(images__detections__isnull=False)
+        if algorithm_id is not None:
+            processed_filter &= models.Q(images__detections__detection_algorithm_id=algorithm_id)
         return self.annotate(
             source_images_processed_count=models.Count(
                 "images",
-                filter=models.Q(images__detections__isnull=False),
+                filter=processed_filter,
                 distinct=True,
             )
         )

Also applies to: 3810-3816

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/main/models.py` around lines 1780 - 1791, The with_was_processed
annotation currently marks images as processed if any Detection exists, causing
cross-pipeline false positives; change with_was_processed to accept a
pipeline/algorithm identifier (e.g., pipeline_id or algorithm_name) — matching
the signature used by SourceImage.get_was_processed — and add that constraint to
the Exists filter (i.e.,
Detection.objects.filter(source_image_id=models.OuterRef("pk"),
pipeline=selected_pipeline) or algorithm=selected_algorithm). Update the
corresponding helper at ~3810 (the similar was/count logic) the same way so both
annotations are scoped to the provided pipeline/algorithm and default to the
previous behavior when no pipeline/algorithm is passed.
🧹 Nitpick comments (1)
ami/ml/tests.py (1)

782-807: Add one more regression case: null + real + fully-classified should be skipped.

Current additions cover null-only and null+real-unclassified well, but a test for “null + real-classified” would lock in the expected non-requeue behavior and catch false positives in filter_processed_images.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ami/ml/tests.py` around lines 782 - 807, Add a regression test that ensures
an image with a null detection plus a real detection that already has a complete
classification is skipped by filter_processed_images: create an image and
detector as in the existing tests, create a Detection with bbox=None, create
another Detection with a real bbox and attach a complete Classification (using
the same model used elsewhere in tests), then assert
list(filter_processed_images([image], self.pipeline)) returns [] (image should
not be yielded). Reference the existing test setup and use the same Detection
and Classification model creation pattern so this mirrors the other cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@ami/main/models.py`:
- Around line 1780-1791: The with_was_processed annotation currently marks
images as processed if any Detection exists, causing cross-pipeline false
positives; change with_was_processed to accept a pipeline/algorithm identifier
(e.g., pipeline_id or algorithm_name) — matching the signature used by
SourceImage.get_was_processed — and add that constraint to the Exists filter
(i.e., Detection.objects.filter(source_image_id=models.OuterRef("pk"),
pipeline=selected_pipeline) or algorithm=selected_algorithm). Update the
corresponding helper at ~3810 (the similar was/count logic) the same way so both
annotations are scoped to the provided pipeline/algorithm and default to the
previous behavior when no pipeline/algorithm is passed.

In `@ami/ml/models/pipeline.py`:
- Around line 91-107: The unclassified branch still includes null sentinel
detections, so change the checks and queries that look for missing
classifications and algorithm IDs to exclude NULL_DETECTIONS_FILTER: update the
condition using
existing_detections.filter(classifications__isnull=True).exists() to use
existing_detections.exclude(NULL_DETECTIONS_FILTER).filter(classifications__isnull=True).exists(),
and likewise use
existing_detections.exclude(NULL_DETECTIONS_FILTER).values_list("classifications__algorithm_id",
flat=True) when building detection_algorithm_ids so only real (non-null)
detections are considered in the unclassified/classifier-id logic.

---

Nitpick comments:
In `@ami/ml/tests.py`:
- Around line 782-807: Add a regression test that ensures an image with a null
detection plus a real detection that already has a complete classification is
skipped by filter_processed_images: create an image and detector as in the
existing tests, create a Detection with bbox=None, create another Detection with
a real bbox and attach a complete Classification (using the same model used
elsewhere in tests), then assert list(filter_processed_images([image],
self.pipeline)) returns [] (image should not be yielded). Reference the existing
test setup and use the same Detection and Classification model creation pattern
so this mirrors the other cases.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 61e75ec4-c0be-48a6-b3c8-69fd5d8facca

📥 Commits

Reviewing files that changed from the base of the PR and between c0132bd and 10d8c45.

📒 Files selected for processing (4)
  • ami/main/admin.py
  • ami/main/models.py
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py

…processed_images

A null detection has no classifications, so it was triggering the
"unclassified detections" branch and causing unnecessary reprocessing
even when the real detections were fully classified. The classification
check now only considers real detections (non-null bbox).

Added test for the null + real fully-classified scenario.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow
Collaborator

mihow commented Mar 18, 2026

Status update on review comments

Addressed in recent commits:

  1. filter_processed_images elif ordering — only skips images where ALL detections are null. Also excludes null detections from the classification check (a null detection has no classifications, so it was triggering unnecessary reprocessing even when real detections were fully classified). See 8fba25a, 2a61fbb.

  2. get_or_create_detection algorithm collision — null detection lookups are now algorithm-specific (bbox__isnull=True + detection_algorithm=algo). Also fixed a Django JSONField gotcha where filter(bbox=None) matches JSON null literal, not SQL NULL. See 10d8c45.

  3. get_was_processed N+1 — model method now reads from the was_processed annotation when available. SourceImageAdmin.get_queryset adds .with_was_processed(). See 10d8c45.

  4. One detection algorithm per pipeline assertion — already in place at save_results lines 904-909.

New tests added:

  • test_filter_processed_images_skips_null_only_image
  • test_filter_processed_images_yields_image_with_null_and_real_unclassified
  • test_filter_processed_images_skips_null_and_fully_classified
  • test_null_detections_are_algorithm_specific
  • test_null_detection_deduplication_same_pipeline

Not addressed (lower priority):

  • get_detections_count as queryset method — only used per-instance in admin/tests
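The yield/skip rules summarized in points 1 and 2 above can be condensed into a small, self-contained simulation. This is a hedged sketch: the real `filter_processed_images` works on Django querysets, while here each detection is a plain dict, and the classifier set `PIPELINE_CLASSIFIERS` is hypothetical.

```python
PIPELINE_CLASSIFIERS = {"classifier-1", "classifier-2"}  # hypothetical pipeline config

def needs_processing(detections: list[dict]) -> bool:
    """Decide whether an image should be sent to the pipeline.

    Each detection dict has a "bbox" (None for null sentinels) and a
    "classifier_ids" set of classifiers that have labeled it.
    """
    if not detections:
        return True  # never processed by this pipeline
    real = [d for d in detections if d["bbox"] is not None]
    if not real:
        return False  # only null sentinels: processed, nothing found
    if any(not d["classifier_ids"] for d in real):
        return True  # detector ran but a real detection is unclassified
    seen = set().union(*(d["classifier_ids"] for d in real))
    return not PIPELINE_CLASSIFIERS <= seen  # a pipeline classifier hasn't run yet

assert needs_processing([]) is True
assert needs_processing([{"bbox": None, "classifier_ids": set()}]) is False
assert needs_processing([{"bbox": (0, 0, 1, 1), "classifier_ids": set()}]) is True
assert needs_processing(
    [{"bbox": (0, 0, 1, 1), "classifier_ids": {"classifier-1", "classifier-2"}}]
) is False
```

Note that null sentinels are filtered out before the classification checks, which is exactly the fix the review iterations above converged on.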

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ami/ml/models/pipeline.py`:
- Around line 438-445: The except KeyError handler that raises
PipelineNotConfigured when looking up detection_algo from algorithms_known
should chain the original KeyError (e.g., capture it as err) and re-raise using
"raise PipelineNotConfigured(...) from err" to preserve traceback context;
update the block referencing detection_resp.algorithm.key, algorithms_known, and
PipelineNotConfigured accordingly, and apply the identical change to the similar
lookup block that raises PipelineNotConfigured later (the block handling the
other algorithm lookup) so both use "raise ... from err".
- Around line 930-935: In the try/except around
pipeline.algorithms.get(task_type__in=Algorithm.detection_task_types), preserve
exception context by chaining the new exceptions off the original ones;
specifically, when catching Algorithm.DoesNotExist and
Algorithm.MultipleObjectsReturned, re-raise the ValueError and
NotImplementedError respectively using "raise ValueError(... ) from None" and
"raise NotImplementedError(... ) from None" so the original Django exception
traceback is handled per Ruff B904 while keeping your descriptive messages;
update the handlers for Algorithm.DoesNotExist and
Algorithm.MultipleObjectsReturned in the same block that calls
pipeline.algorithms.get.

In `@ami/ml/tests.py`:
- Around line 851-889: The test fails because fake_pipeline_results returns
detections hardcoded with ALGORITHM_CHOICES["random-detector"] which mismatch
pipeline_2.algorithms; update the test so the fake results use the pipeline's
actual algorithm(s) instead of the hardcoded "random-detector" or alternatively
register "random-detector" on pipeline_2 before saving. Concretely, modify the
call path in fake_pipeline_results (or its internals) to read the detector
key(s) from the pipeline object passed in (pipeline.algorithms.first() or
similar) so save_results receives detections whose algorithm.key matches
pipeline_2.algorithms, or call
pipeline_2.algorithms.set([Algorithm.objects.get(key="random-detector"), ...])
before save_results.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c273cd57-eae1-4fa6-a203-7340447f251f

📥 Commits

Reviewing files that changed from the base of the PR and between 10d8c45 and 2a61fbb.

📒 Files selected for processing (2)
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py

mihow and others added 2 commits March 18, 2026 12:31
The function docstring now clearly describes which images are yielded
(needs processing) vs skipped (already processed), including the role
of null detections as sentinels.

Co-Authored-By: Claude <noreply@anthropic.com>
Collaborator

@mihow mihow left a comment


A pragmatic approach to a very important addition and a long-time need. Thank you @vanessavmac

@mihow mihow merged commit ff4b939 into main Mar 21, 2026
9 of 11 checks passed
@mihow mihow deleted the 484-make-it-clear-what-images-have-not-been-processed branch March 21, 2026 02:49

Successfully merging this pull request may close these issues.

Make it clear what images have not been processed

5 participants