Skip to content

feat(memory): pipeline hardening — typed failures, BYO embeddings, doctor, honest status#3076

Open
zahica1234 wants to merge 31 commits into
tinyhumansai:mainfrom
zahica1234:pr/memory-pipeline-hardening
Open

feat(memory): pipeline hardening — typed failures, BYO embeddings, doctor, honest status#3076
zahica1234 wants to merge 31 commits into
tinyhumansai:mainfrom
zahica1234:pr/memory-pipeline-hardening

Conversation

@zahica1234
Copy link
Copy Markdown
Contributor

@zahica1234 zahica1234 commented May 31, 2026

Summary

  • Adds a typed failure model (memory_tree/health) that classifies embed/extract failures as transient (retry) vs unrecoverable (fail fast), with stable codes + i18n remediation keys.
  • The chunk→wiki pipeline no longer masks a broken embeddings config: unrecoverable errors fail fast (no retry-storm), and a missing provider skips embedding (chunk stays re-embeddable) instead of writing all-zero vectors presented as success.
  • Honours a user-configured OpenAI / custom OpenAI-compatible embeddings provider in the memory-tree path (previously ignored → silently fell through to the managed backend), and sends the OpenAI dimensions request param so text-embedding-3-large returns the tree's required 1024 dims.
  • "Build Summary Trees" runs on the configured cloud provider when local AI is off (was a hard error → dead button for cloud-only setups); summarizer now reports partial success instead of aborting the whole run on one node's failure.
  • Surfaces honest status: pipeline_status gains degraded + first_blocking_cause + extraction_coverage; a one-shot memory doctor (RPC + CLI + agent tool); failed jobs auto-retry on the next sync (+ a manual retry_failed RPC); and a vault-ledger fix so a memory reset re-ingests instead of dead-locking at "N files synced · 0 chunks". The Intelligence panel renders an amber "Degraded" pill + an actionable remediation banner.

Problem

A workspace with a broken or unconfigured embeddings provider silently produced zero-vector embeddings and reported "running" over an empty / structure-less wiki. Hard 4xx errors (e.g. "Insufficient budget") were retried like transient 5xx, burning the job queue. A user with OpenAI embeddings configured still hit "managed budget" because the memory-tree embedder factory had no openai/custom branch and ignored their key. "Build Summary Trees" hard-errored requires local_ai to be enabled on any cloud-only routing. After a memory reset, the vault file-ledger went stale and sync hash-skipped every file forever. In short: when memory wasn't populating, there was no way to see why.

Solution

  • Classify at the provider boundary. classify_embed_error parses the Embedding API error (<status>): <body> shape into a typed PipelineFailure; the worker downcasts it and fails fast on the Unrecoverable class, persisting failure_reason/failure_class on mem_tree_jobs (additive migration).
  • Skip, don't fake. build_write_embedder returns OptionNone ⇒ skip embedding + mark semantic_recall degraded (no zero vectors); provider="none" stays a deliberate inert opt-out.
  • BYO embeddings. OpenAiCompatEmbedder adapts the unified EmbeddingProvider (reusing the existing credential resolution — one source of truth) to the memory-tree Embedder trait, tried before the managed-cloud fallback, pinned to EMBEDDING_DIM (1024). OpenAiEmbedding::with_send_dimensions(true) (default off, so Voyage/Cohere/Ollama are unaffected) sends dimensions.
  • Cloud summarizer + partial success. tree_runtime::create_provider returns (provider, model) and falls back to the configured summarization-role cloud provider; the engine threads the resolved model. Propagation loops collect-and-continue.
  • Recovery + visibility. requeue_failed runs at sync start + behind a memory_tree_retry_failed RPC; memory.wipe_all clears the vault ledger; memory_tree_doctor (+ memory_doctor agent tool) walks each stage and reports the single first blocking cause + fix.

Submission Checklist

  • Tests added or updated (happy path + at least one failure / edge case) — typed-failure classification, fail-fast vs retry, skip-not-inert, OpenAI dimensions sent/omitted, openai_compat factory routing (read + write paths), summarizer cloud-fallback + partial-success, per-flag degraded-cause atomics, requeue_failed, extraction coverage, doctor stage ordering, status-panel Vitest.
  • Diff coverage ≥ 80% — each new/changed path has unit + (where applicable) Vitest/RPC coverage. Note: CI's diff-cover Coverage Gate is currently skipped; the red Rust Core Coverage job is a flaky upstream tests/*_raw_coverage_e2e.rs test (a different one trips per run — provider_admin_… then approval_gate_…, both in domains this PR doesn't touch), not a diff-coverage shortfall. Happy to confirm against a green re-run.
  • Coverage matrix updated — N/A (no new rows required; the Coverage Matrix Sync check passes). Happy to add feature-ID rows to docs/TEST-COVERAGE-MATRIX.md on request.
  • No new external network dependencies introduced (mock-backed embedding tests; no live calls).
  • Manual smoke checklist — N/A for CI; live-tested end-to-end in the desktop app (see metadata below: degraded banner, vault re-ingest, OpenAI 3-large @1024, 113 chunks / 288.8 KiB wiki).
  • Linked issue — no pre-existing filed issue; these are proactively-reproduced bugs (spec-kit specs/002-memory-pipeline-hardening). Happy to file + link a tracking issue if maintainers prefer.

Impact

Desktop (Rust core + React + CLI + agent tool). Additive SQLite migration (nullable failure_reason/failure_class columns; no backfill). No new network deps. Behavioural: a broken workspace now reports degraded/error with an actionable cause instead of a misleading running; "Build Summary Trees" works on cloud-only setups. Relaxes the prior local-only summarizer privacy default (a persistent "summaries are cloud-processed" UI indicator is a planned follow-up).

Heads-up for reviewers: now rebased onto current main (was authored against the pre-#3059/#2991 base) — the branch is MERGEABLE. Re-homed across the refactors that landed since: the chunks/store.rs split (#2991), the i18n chunk→flat-locale revert, the tools/impl/memorymemory/tools rename, and the about_app CAPABILITIEScatalog_data.rs move. The vault domain was removed upstream, so its ingest-ledger wipe hook and the auto-retry-on-sync trigger were dropped (the manual memory_tree_retry_failed RPC + requeue_failed remain) — open question: want auto-retry-on-sync re-homed onto the new source-trees sync path, here or as a follow-up? Splitting into smaller PRs is also easy on request.

Related


AI Authored PR Metadata (required for Codex/Linear PRs)

  • Agent: Claude Code (Opus 4.8)
  • Spec: local specs/002-memory-pipeline-hardening/{spec,plan,tasks}.md (spec-kit driven)
  • Human reviewer: @zahica1234 (live-tested end-to-end in the desktop app: degraded banner renders, vault re-ingests, OpenAI 3-large @ 1024 drives memory-tree embeddings — 113 chunks / 288.8 KiB wiki populated).

Summary by CodeRabbit

  • New Features

    • "Degraded" memory pipeline status with amber badge, contextual remediation banner, and recall/structure badges.
    • Memory Pipeline Doctor tool to run a self-diagnostic report.
    • Retry-failed-jobs action for requeueing terminal failures.
    • Support for OpenAI-compatible embedding providers and an opt-in cloud summarization fallback.
  • Improvements

    • Extraction coverage metric (percentage) shown when available.
    • More precise, actionable remediation messages and better failure classification.
  • Documentation & Localization

    • New translations for memory health UI in 10+ languages.

@zahica1234 zahica1234 requested a review from a team May 31, 2026 07:51
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds typed failure/degraded-state tracking, a one-shot doctor diagnostic, optional write-path embedder with degraded signaling, extraction-coverage metric, queue typed failure persistence and requeue, summarizer provider/model refactor, RPC + frontend UI surfaces for degradation, and i18n entries plus tests.

Changes

Memory Pipeline Health & Diagnostics

Layer / File(s) Summary
Frontend UI, types, and tests
app/src/components/intelligence/MemoryTreeStatusPanel.tsx, app/src/components/intelligence/MemoryTreeStatusPanel.test.tsx, app/src/utils/tauriCommands/memoryTree.ts
Adds degraded status styling/label, renders amber blocking-cause banner (localized via remediation_key), conditional recall/structure badges, extraction-coverage percent; updates TypeScript pipeline status types and adds three Vitest tests covering degraded recall, degraded structure, and healthy cases.
i18n additions
app/src/lib/i18n/{en,ar,bn,de,es,fr,hi,id,it,ko,pl,pt,ru,zh-CN}.ts
Adds memoryTree.status.statusDegraded, degradedRecall, degradedStructure, extractionCoverage and multiple memory.health.remediation.* typed remediation messages across locales.
Tauri commands & schema wiring
app/src/utils/tauriCommands/memoryTree.ts, src/openhuman/memory/schema.rs
Extends MemoryTree pipeline kinds to include degraded; introduces MemoryTreePipelineFailure and MemoryTreeDegradedState types; wires new doctor and retry_failed controllers and extends pipeline_status outputs to include degraded snapshot, first blocking cause, and extraction coverage.
Embeddings: send-dimensions & tests
src/openhuman/embeddings/openai.rs, src/openhuman/embeddings/factory.rs, src/openhuman/embeddings/openai_tests.rs
Adds opt-in send_dimensions to OpenAiEmbedding, gates sending "dimensions" to compatible models, propagates flag in provider factories, and adds tests asserting JSON payload includes/omits dimensions.
OpenAI-compatible adapter & write embedder
src/openhuman/memory_tree/score/embed/openai_compat.rs, src/openhuman/memory_tree/score/embed/factory.rs
Introduces OpenAiCompatEmbedder enforcing pinned embedding dim; refactors embedder resolution ladder and adds build_write_embedder returning Ok(None) when no provider is usable, marking semantic-recall degraded; includes tests for write-path behaviors.
Extraction coverage & DB schema
src/openhuman/memory_store/chunks/store.rs, src/openhuman/memory_store/chunks/store_tests.rs, src/openhuman/memory_store/chunks/connection.rs
Adds extraction_coverage(config) helper computing fraction of chunks with entity indexes; extends DB schema DDL to add nullable failure_reason/failure_class columns and unit tests validating coverage.
Typed failure model & doctor diagnostic
src/openhuman/memory_tree/health/mod.rs, src/openhuman/memory_tree/health/doctor.rs
Adds FailureCode/FailureClass, PipelineFailure serializable error with remediation key, embedding error classification, DegradedState with process-global atomics and test_guard(), and run_doctor/async_run_doctor producing DoctorReport with stage health and first blocking cause.
Extraction, sealing, and handlers
src/openhuman/memory_tree/score/extract/llm.rs, src/openhuman/memory_tree/tree/bucket_seal.rs, src/openhuman/memory_queue/handlers/mod.rs
Clears structure degraded only when extraction returns entities/topics; marks structure degraded on transport retry exhaustion; sealing uses build_write_embedder and proceeds with optional embedding: None or aborts with typed error on embed failure; handlers skip embedding when write embedder is unavailable and classify embed errors.
Queue: typed failure persistence & worker
src/openhuman/memory_queue/types.rs, src/openhuman/memory_queue/store.rs, src/openhuman/memory_queue/worker.rs
Adds failure_reason/failure_class to Job; implements mark_failed_typed to persist typed failures on unrecoverable/budget-exhausted terminal failures; adds requeue_failed to reset terminal failures; worker downcasts PipelineFailure and calls typed settlement; unit tests added/updated.
Pipeline RPC & status integration
src/openhuman/memory_tree/tree/rpc.rs, src/openhuman/memory/schema.rs
Extends pipeline_status_rpc to include degraded snapshot, derive degraded precedence, compute first_blocking_cause via latest-failed typed failure fallback, compute extraction_coverage best-effort, and exposes doctor_rpc and retry_failed_rpc.
Summarizer provider refactor & engine
src/openhuman/memory_tree/tree_runtime/ops.rs, src/openhuman/memory_tree/tree_runtime/engine.rs
Refactors create_provider to return (provider, model) with cloud summarization opt-in, adds summarizer_available, parameterizes run_summarization/rebuild_tree with model: &str, and makes ancestor propagation continue-on-error collecting partial failures; tests updated to pass explicit model.
Tools & capability
src/openhuman/memory/tools/doctor.rs, src/openhuman/about_app/catalog_data.rs, src/openhuman/tools/ops.rs
Adds MemoryDoctorTool (agent tool) that runs async_run_doctor and returns pretty JSON; registers capability intelligence.memory_pipeline_doctor and registers tool in runtime tools list.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • graycyrus
  • M3gA-Mind
  • sanil-23
  • oxoxDev

Poem

🐰 I hopped through code with tiny care,

Atom flags and failures, everywhere,
A doctor checks the pipeline's heart,
Remediation keys to play their part,
Hooray — the memory hops to repair!

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. working A PR that is being worked on by the team. labels May 31, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/openhuman/memory/schema.rs (1)

712-779: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the pipeline_status schema to match the new wire contract.

The runtime now returns a degraded status plus degraded, first_blocking_cause, and extraction_coverage fields, but the advertised schema still exposes only the old enum and field set. That leaves schema-driven CLI/JSON-RPC consumers with a stale contract and can make "degraded" look invalid even though the handler now emits it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/schema.rs` around lines 712 - 779, The pipeline_status
ControllerSchema is stale: add the "degraded" enum variant to the status
TypeSchema::Enum and append the three new output fields to the outputs vec for
pipeline_status—add FieldSchema entries for "degraded" (TypeSchema::Bool,
required true), "first_blocking_cause"
(TypeSchema::Option(Box::new(TypeSchema::String)), required false), and
"extraction_coverage" (TypeSchema::F64, required true) so the advertised schema
matches the runtime handler pipeline_status and its emitted fields.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/lib/i18n/chunks/es-1.ts`:
- Around line 209-220: The Spanish i18n entries for the memory health keys are
still in English; update the strings in the Spanish chunk (es-1.ts) for the keys
memoryTree.status.statusDegraded, memoryTree.status.degradedRecall,
memoryTree.status.degradedStructure, memoryTree.status.extractionCoverage,
memory.health.remediation.budget_exhausted,
memory.health.remediation.auth_missing, memory.health.remediation.auth_invalid,
memory.health.remediation.embeddings_unconfigured,
memory.health.remediation.embedding_dim_mismatch,
memory.health.remediation.local_model_unavailable,
memory.health.remediation.extraction_timeout, and
memory.health.remediation.transient with correct Spanish translations (preserve
placeholders like {pct} and numeric values such as 1024 and keep meaning intact)
so the Spanish locale no longer ships English text.

In `@app/src/lib/i18n/en.ts`:
- Around line 320-343: The Polish locale is missing translations for the new
memory/status and memory.health.remediation keys; update app/src/lib/i18n/pl.ts
and the split chunk files (pl-1.ts … pl-5.ts) to include Polish translations for
memoryTree.status.statusDegraded, memoryTree.status.degradedRecall,
memoryTree.status.degradedStructure, memoryTree.status.extractionCoverage (keep
the {pct} placeholder intact), and all memory.health.remediation.* keys
(budget_exhausted, auth_missing, auth_invalid, embeddings_unconfigured,
embedding_dim_mismatch, local_model_unavailable, extraction_timeout, transient).
Provide real Polish text (not English), preserve keys exactly, and ensure string
placeholders and punctuation match the en.ts originals so i18n parity checks
pass.

In `@src/openhuman/memory_tree/health/doctor.rs`:
- Around line 94-106: run_doctor currently performs blocking SQLite reads (calls
to chunks::count_chunks, queue::count_by_status and chunks::extraction_coverage)
which can block Tokio workers when invoked from async contexts; wrap the
synchronous diagnostic in a blocking task instead—either provide an async
wrapper like async_run_doctor that calls tokio::task::spawn_blocking(||
run_doctor(config.clone())) and returns the DoctorReport, or change callers to
offload run_doctor via spawn_blocking before calling it; update references to
run_doctor in RPC and tool call sites to use the new async wrapper to ensure DB
reads run off the async reactor thread.
- Around line 186-200: The doctor currently flags "summary_tree" as bad solely
based on config.local_ai.runtime_enabled; instead, use the same provider
resolution and capability check used by tree_runtime::create_provider to
determine if a summarization provider is available (local or cloud) and only
mark StageHealth::bad when no provider/capability is resolvable; replace the
hard-coded check of config.local_ai.runtime_enabled in the summary_tree branch
with the provider-capability check (the same logic used on the runtime path),
and if no provider is found emit StageHealth::bad with an appropriate
PipelineFailure (rather than FailureCode::LocalModelUnavailable), otherwise
return StageHealth::ok as before.

In `@src/openhuman/memory_tree/health/mod.rs`:
- Around line 327-429: DEGRADED_CAUSE is a single shared slot so whichever
mark_* runs last overwrites the cause for both flags; change to track causes
per-flag by introducing two AtomicU8s (e.g. DEGRADED_CAUSE_SEMANTIC and
DEGRADED_CAUSE_STRUCTURE) and update mark_semantic_recall_degraded,
mark_structure_degraded to store into their respective atomics, update
clear_semantic_recall_degraded and clear_structure_degraded to clear only their
atomic and only zero the respective cause when the corresponding flag is
cleared, modify current_degraded_state to read the correct cause (semantic uses
DEGRADED_CAUSE_SEMANTIC, structure uses DEGRADED_CAUSE_STRUCTURE) and wrap the
cause construction accordingly, and ensure test_guard resets both new atomics to
0.

In `@src/openhuman/memory_tree/tree/rpc.rs`:
- Around line 496-510: The call to latest_failed_job_failure is running on the
async runtime thread and can block; move it off the Tokio worker by executing it
inside a spawn_blocking (or into the same blocking block used for
extraction_coverage) and then await the JoinHandle before using
first_blocking_cause; specifically, change how first_blocking_cause is computed
so that latest_failed_job_failure(config) is invoked inside a
tokio::task::spawn_blocking closure (or folded into the existing spawn_blocking
that computes extraction_coverage) and then unwrap/handle the JoinHandle result
to produce the same Option value for first_blocking_cause.

---

Outside diff comments:
In `@src/openhuman/memory/schema.rs`:
- Around line 712-779: The pipeline_status ControllerSchema is stale: add the
"degraded" enum variant to the status TypeSchema::Enum and append the three new
output fields to the outputs vec for pipeline_status—add FieldSchema entries for
"degraded" (TypeSchema::Bool, required true), "first_blocking_cause"
(TypeSchema::Option(Box::new(TypeSchema::String)), required false), and
"extraction_coverage" (TypeSchema::F64, required true) so the advertised schema
matches the runtime handler pipeline_status and its emitted fields.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 56bbe18e-e9ea-4870-bebb-9ce693795b3a

📥 Commits

Reviewing files that changed from the base of the PR and between fbe89ba and 5685062.

📒 Files selected for processing (54)
  • app/src/components/intelligence/MemoryTreeStatusPanel.test.tsx
  • app/src/components/intelligence/MemoryTreeStatusPanel.tsx
  • app/src/lib/i18n/chunks/ar-1.ts
  • app/src/lib/i18n/chunks/bn-1.ts
  • app/src/lib/i18n/chunks/de-1.ts
  • app/src/lib/i18n/chunks/en-1.ts
  • app/src/lib/i18n/chunks/es-1.ts
  • app/src/lib/i18n/chunks/fr-1.ts
  • app/src/lib/i18n/chunks/hi-1.ts
  • app/src/lib/i18n/chunks/id-1.ts
  • app/src/lib/i18n/chunks/it-1.ts
  • app/src/lib/i18n/chunks/ko-1.ts
  • app/src/lib/i18n/chunks/pt-1.ts
  • app/src/lib/i18n/chunks/ru-1.ts
  • app/src/lib/i18n/chunks/zh-CN-1.ts
  • app/src/lib/i18n/en.ts
  • app/src/utils/tauriCommands/memoryTree.ts
  • src/openhuman/about_app/catalog.rs
  • src/openhuman/embeddings/factory.rs
  • src/openhuman/embeddings/mod.rs
  • src/openhuman/embeddings/openai.rs
  • src/openhuman/embeddings/openai_tests.rs
  • src/openhuman/embeddings/rpc.rs
  • src/openhuman/memory/read_rpc.rs
  • src/openhuman/memory/schema.rs
  • src/openhuman/memory_queue/handlers/mod.rs
  • src/openhuman/memory_queue/store.rs
  • src/openhuman/memory_queue/types.rs
  • src/openhuman/memory_queue/worker.rs
  • src/openhuman/memory_store/chunks/store.rs
  • src/openhuman/memory_store/chunks/store_tests.rs
  • src/openhuman/memory_tree/health/doctor.rs
  • src/openhuman/memory_tree/health/mod.rs
  • src/openhuman/memory_tree/mod.rs
  • src/openhuman/memory_tree/retrieval/integration_test.rs
  • src/openhuman/memory_tree/score/embed/factory.rs
  • src/openhuman/memory_tree/score/embed/mod.rs
  • src/openhuman/memory_tree/score/embed/openai_compat.rs
  • src/openhuman/memory_tree/score/extract/llm.rs
  • src/openhuman/memory_tree/score/extract/llm_tests.rs
  • src/openhuman/memory_tree/tree/bucket_seal.rs
  • src/openhuman/memory_tree/tree/bucket_seal_tests.rs
  • src/openhuman/memory_tree/tree/rpc.rs
  • src/openhuman/memory_tree/tree_runtime/cli.rs
  • src/openhuman/memory_tree/tree_runtime/engine.rs
  • src/openhuman/memory_tree/tree_runtime/engine_tests.rs
  • src/openhuman/memory_tree/tree_runtime/ops.rs
  • src/openhuman/tools/impl/memory/doctor.rs
  • src/openhuman/tools/impl/memory/mod.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/vault/mod.rs
  • src/openhuman/vault/store.rs
  • src/openhuman/vault/sync.rs
  • src/openhuman/vault/tests.rs

Comment thread app/src/lib/i18n/chunks/es-1.ts Outdated
Comment thread app/src/lib/i18n/en.ts
Comment thread src/openhuman/memory_tree/health/doctor.rs
Comment thread src/openhuman/memory_tree/health/doctor.rs Outdated
Comment thread src/openhuman/memory_tree/health/mod.rs
Comment thread src/openhuman/memory_tree/tree/rpc.rs Outdated
zahica1234 pushed a commit to zahica1234/openhuman that referenced this pull request May 31, 2026
Six findings from the CodeRabbit review of PR tinyhumansai#3076:

1. memory/schema.rs — pipeline_status ControllerSchema was stale. Add
   `degraded` to the status enum and append the three #002 output fields
   (`degraded`, `first_blocking_cause`, `extraction_coverage`) with shapes
   matching the response struct (DegradedState / Option<PipelineFailure> as
   Json, f32 as F64). Update the struct doc-comment precedence to include
   `degraded`.

2. i18n — es-1.ts shipped English placeholders for the 12 #002 keys
   (memoryTree.status.* + memory.health.remediation.*). Replace with real
   Spanish translations. Other locales keep English placeholders per the
   repo convention (i18n:check parity holds; 0 missing).

3. doctor — expose run_doctor via an async_run_doctor wrapper that runs the
   blocking diagnostic inside spawn_blocking; route doctor_rpc and
   MemoryDoctorTool through it so the diagnostic never blocks a Tokio worker.

4. doctor stage 5 — stop auto-failing the summary-tree precondition when
   local AI is off. Add tree_runtime::ops::summarizer_available() which
   mirrors create_provider's decision (local AI on → ok; else confirm a
   cloud summarization provider resolves); stage health now reflects it.

5. health — split the single shared DEGRADED_CAUSE atomic into per-flag
   SEMANTIC_RECALL_CAUSE / STRUCTURE_CAUSE so clearing one degradation no
   longer wipes the other's cause. Add a regression test.

6. tree/rpc.rs — fold latest_failed_job_failure + extraction_coverage into a
   single spawn_blocking so neither SQLite read runs on a Tokio worker.

Tests: memory_tree::health (22) + memory_tree::tree::rpc (15) green;
cargo fmt + clippy clean; pnpm i18n:check parity holds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/openhuman/memory_tree/tree_runtime/ops.rs (1)

181-212: ⚡ Quick win

Keep summarizer resolution behind one helper.

summarizer_available() now reimplements the same local/cloud branch that create_provider() already owns. That fixes today's bug, but it reintroduces a drift point between the doctor path and the runtime path the next time provider selection changes. Please extract a single private resolver and have both functions call through it.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree_runtime/ops.rs` around lines 181 - 212,
summarizer_available() duplicates the local-vs-cloud selection logic that
create_provider()/create_chat_provider already implements; extract that decision
into one private helper (e.g., fn resolve_summarizer_availability(config:
&Config) -> (bool, &'static str) or a Result-style resolver) and have both
summarizer_available and the provider factory path
(create_provider/create_chat_provider callers) call that helper instead of
reimplementing the branch; update summarizer_available to return the helper's
tuple and remove the direct call to create_chat_provider so both codepaths share
the single resolver.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_tree/health/doctor.rs`:
- Around line 196-199: The current StageHealth::bad call for "summary_tree"
constructs a PipelineFailure with FailureCode::LocalModelUnavailable even when
the code path is the "no summarization provider available" case; update the
PipelineFailure to use a failure code that reflects "no summarization provider
resolved" (or create/use an existing enum variant like
FailureCode::NoSummarizationProvider or FailureCode::ProviderUnavailable)
instead of FailureCode::LocalModelUnavailable, or conditionally choose
FailureCode::LocalModelUnavailable only when the local-model path was actually
selected; keep the StageHealth::bad("summary_tree", PipelineFailure::new(...),
summary_note) call structure but replace the FailureCode argument accordingly so
first_blocking_cause and remediation banners are correct.

In `@src/openhuman/memory/schema.rs`:
- Around line 802-809: The field "extraction_coverage" in FieldSchema is
currently typed as TypeSchema::F64 and required: true and uses 0.0 to signal
both genuine zero and DB/read failure; change the schema so failures are
distinguishable by either making extraction_coverage nullable (e.g. change
TypeSchema to a nullable/optional float and set required: false) or by adding a
companion boolean/status field (e.g. "extraction_coverage_available" or
"extraction_coverage_status") that indicates whether the metric is valid; update
the FieldSchema for "extraction_coverage" and any consumers of this value
(parsers/serializers/validators) to handle Option<f64> or to consult the new
availability field instead of treating 0.0 as an error sentinel.
- Around line 781-789: The FieldSchema entry for name "degraded" currently
documents a single shared "cause" but the implementation now splits causes
per-flag; update the comment on the FieldSchema (the "degraded" TypeSchema::Json
entry) to describe the new JSON shape that exposes independent tracking for
semantic recall and structure (e.g. an object with separate entries for
"semantic_recall" and "structure", each carrying its degraded boolean and its
own cause field like PipelineFailure|null), and make the rustdoc text explain
that each sub-flag has an independent cause rather than a shared "cause".

---

Nitpick comments:
In `@src/openhuman/memory_tree/tree_runtime/ops.rs`:
- Around line 181-212: summarizer_available() duplicates the local-vs-cloud
selection logic that create_provider()/create_chat_provider already implements;
extract that decision into one private helper (e.g., fn
resolve_summarizer_availability(config: &Config) -> (bool, &'static str) or a
Result-style resolver) and have both summarizer_available and the provider
factory path (create_provider/create_chat_provider callers) call that helper
instead of reimplementing the branch; update summarizer_available to return the
helper's tuple and remove the direct call to create_chat_provider so both
codepaths share the single resolver.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 43f0a5ee-3097-4ebd-a9d1-ba8cddafedab

📥 Commits

Reviewing files that changed from the base of the PR and between 5685062 and 8c1171c.

⛔ Files ignored due to path filters (1)
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • app/src/lib/i18n/chunks/ar-1.ts
  • app/src/lib/i18n/chunks/bn-1.ts
  • app/src/lib/i18n/chunks/de-1.ts
  • app/src/lib/i18n/chunks/en-1.ts
  • app/src/lib/i18n/chunks/es-1.ts
  • app/src/lib/i18n/chunks/fr-1.ts
  • app/src/lib/i18n/chunks/hi-1.ts
  • app/src/lib/i18n/chunks/id-1.ts
  • app/src/lib/i18n/chunks/it-1.ts
  • app/src/lib/i18n/chunks/ko-1.ts
  • app/src/lib/i18n/chunks/pt-1.ts
  • app/src/lib/i18n/chunks/ru-1.ts
  • app/src/lib/i18n/chunks/zh-CN-1.ts
  • src/openhuman/memory/schema.rs
  • src/openhuman/memory_tree/health/doctor.rs
  • src/openhuman/memory_tree/health/mod.rs
  • src/openhuman/memory_tree/tree/rpc.rs
  • src/openhuman/memory_tree/tree_runtime/ops.rs
  • src/openhuman/tools/impl/memory/doctor.rs
✅ Files skipped from review due to trivial changes (4)
  • app/src/lib/i18n/chunks/en-1.ts
  • app/src/lib/i18n/chunks/bn-1.ts
  • app/src/lib/i18n/chunks/id-1.ts
  • app/src/lib/i18n/chunks/pt-1.ts
🚧 Files skipped from review as they are similar to previous changes (9)
  • app/src/lib/i18n/chunks/ko-1.ts
  • app/src/lib/i18n/chunks/it-1.ts
  • app/src/lib/i18n/chunks/fr-1.ts
  • app/src/lib/i18n/chunks/ru-1.ts
  • app/src/lib/i18n/chunks/ar-1.ts
  • app/src/lib/i18n/chunks/es-1.ts
  • app/src/lib/i18n/chunks/zh-CN-1.ts
  • src/openhuman/memory_tree/tree/rpc.rs
  • src/openhuman/memory_tree/health/mod.rs

Comment thread src/openhuman/memory_tree/health/doctor.rs
Comment thread src/openhuman/memory/schema.rs Outdated
Comment on lines +781 to +789
FieldSchema {
name: "degraded",
ty: TypeSchema::Json,
comment: "#002 (FR-002/FR-004): object `{ semantic_recall: bool, \
structure: bool, cause: PipelineFailure | null }`. The \
pipeline ran but output quality is reduced — \
`semantic_recall` when embeddings were skipped, \
`structure` when extraction yielded nothing. Distinct \
from a hard `error`. Always present (serde default).",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the advertised degraded shape to match the per-flag cause split.

This schema still documents a single shared cause, but the hardening work explicitly split degradation causes so semantic-recall and structure can be tracked independently. Since controller schemas are the RPC contract, this will point CLI/JSON-RPC consumers at the wrong JSON shape. As per coding guidelines, "Ship rustdoc / code comments with new/changed Rust behavior; update AGENTS.md or architecture docs when rules or user-visible behavior change".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/schema.rs` around lines 781 - 789, The FieldSchema entry
for name "degraded" currently documents a single shared "cause" but the
implementation now splits causes per-flag; update the comment on the FieldSchema
(the "degraded" TypeSchema::Json entry) to describe the new JSON shape that
exposes independent tracking for semantic recall and structure (e.g. an object
with separate entries for "semantic_recall" and "structure", each carrying its
degraded boolean and its own cause field like PipelineFailure|null), and make
the rustdoc text explain that each sub-flag has an independent cause rather than
a shared "cause".

Comment thread src/openhuman/memory/schema.rs Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/memory_tree/tree/rpc.rs`:
- Around line 505-515: The code silently swallows errors from
latest_failed_job_failure and extraction_coverage; change the spawn_blocking
closure so each fallible call logs diagnostics before returning None — e.g.,
call latest_failed_job_failure(&cfg).map_err(|e| { tracing::debug!("memory-tree:
failed to read latest_failed_job_failure: {:?}", e); e }).ok() (and similarly
for crate::openhuman::memory_store::chunks::store::extraction_coverage(&cfg)),
and avoid the final .unwrap_or((None, None)) swallowing errors without logs;
preserve the None fallbacks but emit a stable, grep-friendly debug/tracing
message with the existing prefix and correlation fields when errors occur so
failures are distinguishable from empty results.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ce4c6f52-a46d-4313-ac23-e8ab3b50d492

📥 Commits

Reviewing files that changed from the base of the PR and between 7daaadd and a2fbc30.

📒 Files selected for processing (20)
  • app/src/lib/i18n/chunks/ar-1.ts
  • app/src/lib/i18n/chunks/bn-1.ts
  • app/src/lib/i18n/chunks/de-1.ts
  • app/src/lib/i18n/chunks/en-1.ts
  • app/src/lib/i18n/chunks/es-1.ts
  • app/src/lib/i18n/chunks/fr-1.ts
  • app/src/lib/i18n/chunks/hi-1.ts
  • app/src/lib/i18n/chunks/id-1.ts
  • app/src/lib/i18n/chunks/it-1.ts
  • app/src/lib/i18n/chunks/ko-1.ts
  • app/src/lib/i18n/chunks/pt-1.ts
  • app/src/lib/i18n/chunks/ru-1.ts
  • app/src/lib/i18n/chunks/zh-CN-1.ts
  • app/src/lib/i18n/en.ts
  • app/src/utils/tauriCommands/memoryTree.ts
  • src/openhuman/memory/schema.rs
  • src/openhuman/memory_tree/health/doctor.rs
  • src/openhuman/memory_tree/health/mod.rs
  • src/openhuman/memory_tree/tree/rpc.rs
  • src/openhuman/memory_tree/tree_runtime/ops.rs
✅ Files skipped from review due to trivial changes (7)
  • app/src/lib/i18n/chunks/it-1.ts
  • app/src/lib/i18n/chunks/de-1.ts
  • app/src/lib/i18n/chunks/ar-1.ts
  • app/src/lib/i18n/chunks/pt-1.ts
  • app/src/lib/i18n/chunks/es-1.ts
  • app/src/lib/i18n/chunks/ko-1.ts
  • app/src/lib/i18n/chunks/fr-1.ts
🚧 Files skipped from review as they are similar to previous changes (12)
  • app/src/lib/i18n/en.ts
  • app/src/utils/tauriCommands/memoryTree.ts
  • app/src/lib/i18n/chunks/hi-1.ts
  • app/src/lib/i18n/chunks/en-1.ts
  • app/src/lib/i18n/chunks/ru-1.ts
  • app/src/lib/i18n/chunks/bn-1.ts
  • app/src/lib/i18n/chunks/zh-CN-1.ts
  • app/src/lib/i18n/chunks/id-1.ts
  • src/openhuman/memory/schema.rs
  • src/openhuman/memory_tree/tree_runtime/ops.rs
  • src/openhuman/memory_tree/health/doctor.rs
  • src/openhuman/memory_tree/health/mod.rs

Comment thread src/openhuman/memory_tree/tree/rpc.rs
zahica1234 pushed a commit to zahica1234/openhuman that referenced this pull request May 31, 2026
Six findings from the CodeRabbit review of PR tinyhumansai#3076:

1. memory/schema.rs — pipeline_status ControllerSchema was stale. Add
   `degraded` to the status enum and append the three #002 output fields
   (`degraded`, `first_blocking_cause`, `extraction_coverage`) with shapes
   matching the response struct (DegradedState / Option<PipelineFailure> as
   Json, f32 as F64). Update the struct doc-comment precedence to include
   `degraded`.

2. i18n — es-1.ts shipped English placeholders for the 12 #002 keys
   (memoryTree.status.* + memory.health.remediation.*). Replace with real
   Spanish translations. Other locales keep English placeholders per the
   repo convention (i18n:check parity holds; 0 missing).

3. doctor — expose run_doctor via an async_run_doctor wrapper that runs the
   blocking diagnostic inside spawn_blocking; route doctor_rpc and
   MemoryDoctorTool through it so the diagnostic never blocks a Tokio worker.

4. doctor stage 5 — stop auto-failing the summary-tree precondition when
   local AI is off. Add tree_runtime::ops::summarizer_available() which
   mirrors create_provider's decision (local AI on → ok; else confirm a
   cloud summarization provider resolves); stage health now reflects it.

5. health — split the single shared DEGRADED_CAUSE atomic into per-flag
   SEMANTIC_RECALL_CAUSE / STRUCTURE_CAUSE so clearing one degradation no
   longer wipes the other's cause. Add a regression test.

6. tree/rpc.rs — fold latest_failed_job_failure + extraction_coverage into a
   single spawn_blocking so neither SQLite read runs on a Tokio worker.

Tests: memory_tree::health (22) + memory_tree::tree::rpc (15) green;
cargo fmt + clippy clean; pnpm i18n:check parity holds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@zahica1234 zahica1234 force-pushed the pr/memory-pipeline-hardening branch from a2fbc30 to 77da42d Compare May 31, 2026 13:57
@zahica1234
Copy link
Copy Markdown
Contributor Author

Rebased onto current main (was based on the pre-refactor base). The branch is now MERGEABLE. Notable re-homing across refactors that landed since the original base:

  • i18n: re-added the #002 keys to the flat <locale>.ts files (the per-locale chunks/<locale>-N.ts split was reverted upstream). pnpm i18n:check parity holds across all locales.
  • memory_store/chunks (refactor(rust): split oversized modules #2991 split): re-applied the failure_reason/failure_class CREATE TABLE columns in store.rs and the add_column_if_missing migration in connection.rs::apply_schema.
  • tools/impl/memorymemory/tools: the pipeline doctor tool landed at the new path; module + export wired.
  • about_app: the Memory Pipeline Doctor capability moved into catalog_data.rs.
  • vault removed upstream: the vault-ingest-ledger wipe hook and the auto-retry-on-sync hook targeted the deleted vault domain and were dropped. The manual memory_tree_retry_failed RPC + requeue_failed remain. Follow-up: if the "auto-retry failed jobs on sync" behaviour is still wanted, it needs a new hook on the source-trees sync path — happy to add it in this PR or a follow-up, your call.

Also addressed both CodeRabbit review passes (typed SummarizerUnavailable cause, per-flag degraded atomics, async_run_doctor off the Tokio worker, extraction_coverage nullable so a DB error ≠ 0%, schema/i18n parity). Local validation: memory_tree/memory_queue/memory_store/embeddings/about_app suites green; tsc clean; cargo fmt/clippy clean.

@zahica1234
Copy link
Copy Markdown
Contributor Author

Re the red "Rust Core Coverage" job — I believe this is a pre-existing flaky *_raw_coverage_e2e test, not anything in this diff:

  • The specific failing test changes between runs: one run dies on provider_admin_model_listing_covers_openrouter_validation_and_local_synthesis, the next on approval_gate_rpc_decision_resumes_parked_tool_and_records_execution (panics on the debug_assert! at src/openhuman/approval/gate.rs:133).
  • Both live in tests/*_raw_coverage_e2e.rs and exercise process-global singletons (the inference-provider registry; the approval gate). Packed into one process by the coverage runner, they contaminate each other — classic flaky-under-coverage.
  • This PR touches none of: approval/, the inference-provider domain, or those test files (git diff --name-only main... confirms).
  • On the latest run, Frontend Coverage (Vitest), Rust E2E, and all four Playwright lanes flipped back to green — so those were transient too. The diff-cover Coverage Gate is skipped, and the Core Coverage job dies because a test panics mid-collection, not because changed-line coverage is low.

Could you kick off a re-run of Rust Core Coverage? I'd rather not push no-op commits just to re-roll the dice. All the deterministic gates that exercise this diff are green: Rust Quality (fmt/clippy), Frontend Quality (tsc/lint/format), i18n parity, Coverage Matrix Sync, and CodeRabbit.

Also updated the PR description: rebased onto current main (now MERGEABLE), filled the submission checklist, and flagged one open question — whether you'd like the auto-retry-on-sync trigger re-homed onto the new source-trees sync path (its old vault::sync host was removed upstream), here or as a follow-up.

@sanil-23 sanil-23 assigned sanil-23 and unassigned sanil-23 May 31, 2026
zahica1234 and others added 14 commits June 1, 2026 08:26
… + embed-error classifier

Foundational layer for the memory-pipeline-hardening work (specs/002):

- New src/openhuman/memory_tree/health/ module: FailureCode / FailureClass /
  PipelineFailure (impls std::error::Error so it downcasts from anyhow) /
  DegradedState. One source of truth for retry-vs-fail + i18n remediation keys.
- classify_embed_error: maps the embed stage's anyhow error (the
  "Embedding API error (<status>): <body>" shape from embeddings::openai, plus
  dim-mismatch text) to a typed PipelineFailure — 401/403 auth_invalid,
  402/429/budget-wording budget_exhausted, dim text embedding_dim_mismatch,
  5xx/transport transient (worker retry/backoff still applies).
- mem_tree_jobs: additive failure_reason / failure_class columns (CREATE-TABLE
  for fresh DBs + idempotent ALTER migration for existing); threaded through the
  Job struct, row_to_job, and the claim_next RETURNING / get_job SELECT lists.

Tests: 182 green across memory_tree::health, memory_queue, memory_store::chunks
(typed-failure roundtrip, fresh+upgraded schema, all classify_embed_error cases).
No behavior change yet — the worker still uses the existing retry path; fail-fast
wiring (T012) and the skip-not-inert factory change (T010/T011) come next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wire the typed PipelineFailure end-to-end so the job worker stops burning
the retry budget on hard, non-recoverable embed errors (specs/002 FR-003):

- store::mark_failed_typed: new variant taking Option<&PipelineFailure>.
  Unrecoverable (budget/auth/dim) → terminate as `failed` immediately on the
  first attempt and persist failure_reason / failure_class; transient (and the
  untyped None case) keep the existing attempts-bounded retry+backoff. The
  claim-token (attempts + started_at_ms) gate is preserved on every branch.
  mark_failed now delegates with None (existing callers/tests unchanged).
- worker::run_once Err-arm downcasts the anyhow chain to PipelineFailure and
  passes it to mark_failed_typed (+ logs the typed reason code).
- The two job-path embed sites (extract handler, bucket_seal) classify the
  embed error via health::classify_embed_error and attach the typed failure
  as the outer (downcastable) error, preserving the original chain as context.

Tests: mark_failed_typed_unrecoverable_terminates_immediately (budget 4xx →
failed on attempt 1, typed cols set) + mark_failed_typed_transient_still_retries
(503 → ready, typed cols unset). memory_queue::store 20/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…vider (T010/T011)

vectors that poison semantic recall and present a degraded result as success:

- health: process-global degraded flags (semantic_recall / structure) with a
  typed cause, plus mark_/clear_/current_degraded_state(). The embed/extract
  stages set them deep in the worker; the status/doctor surface reads them
  without threading a return type through every call site.
- score/embed/factory::build_write_embedder: write-path variant returning
  Result<Option<Box<dyn Embedder>>>. Same resolution as the read-path factory
  for every real provider (Ollama override / local / cloud session), but the
  terminal no-provider branch returns None (skip) instead of InertEmbedder, and
  marks semantic_recall degraded with EmbeddingsUnconfigured. embeddings_provider
  ="none" stays Some(inert) — a deliberate opt-out, not a degradation.
- bucket_seal + extract handler: use build_write_embedder; on None, seal/persist
  the chunk/summary embedding-less (re-embeddable later) and log the skip; on a
  successful embed, clear the degraded flag.
- integration_test: test_config opts into provider="none" so the existing
  "embeddings populate end-to-end" tests keep their deterministic inert path
  (the old silent fallback they relied on is gone by design).

Tests: score::embed 37/0 (incl 4 new build_write_embedder cases — None+degraded,
cloud Some+clears, ollama override, none=inert-not-skip), full memory_tree 390/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…locking cause (T013/T014/T015)

Surface *why* the wiki is empty instead of presenting a degraded run as success
(specs/002 FR-002/FR-004/FR-005):

- score/extract/llm: on transport-retry exhaustion (extraction model timing
  out), mark structure degraded with FailureCode::ExtractionTimeout; clear it
  when an extraction yields entities/topics. Soft-fallback contract (never Err,
  ingest never blocks) is preserved.
- tree/rpc PipelineStatusResponse: additive `degraded: DegradedState` +
  `first_blocking_cause: Option<PipelineFailure>` (both #[serde(default)] for
  back-compat). derive_pipeline_status gains a `degraded` arm — precedence
  paused → error → degraded → syncing → running → idle — so a half-working sync
  reports `degraded` with a human reason, not a misleading `running`.
- pipeline_status_rpc reads health::current_degraded_state() and the most-recent
  failed job's typed reason (new latest_failed_job_failure helper) to populate
  first_blocking_cause; the UI renders it verbatim (resolving remediation_key).
- health: process-global degraded flags get a #[cfg(test)] shared test_guard()
  (lock + reset) so flag-touching tests across factory / extract / rpc modules
  serialise instead of leaking a degraded signal into each other under cargo's
  parallel runner. factory's local lock now delegates to it.

Tests: tree::rpc 15/0 (degraded-precedence cases added), extract::llm 28/0, full
memory_tree 390/0. (Pre-existing config.toml machine-state flake in
raw_chunks/raw_search/composio/query_global is unrelated — passes in isolation.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… status panel (T016/T017/T018)

- memoryTree.ts: extend the wire types to match the hardened pipeline_status
  contract — `'degraded'` status kind, `MemoryTreePipelineFailure` (code + class
  + remediation_key), `MemoryTreeDegradedState`, and the additive
  `degraded` / `first_blocking_cause` fields on MemoryTreePipelineStatus
  (optional for back-compat with older cores).
- MemoryTreeStatusPanel: amber `degraded` status dot + label; an actionable
  first-blocking-cause banner (renders the core's `remediation_key` verbatim via
  useT) with recall / structure degraded badges. Falls back to degraded.cause
  when first_blocking_cause is absent.
- i18n: `statusDegraded`, two badge labels, and 8 `memory.health.remediation.*`
  strings (lead with the local-Ollama fix per the spec) in en.ts + en-1.ts and
  the same chunk-1 across all locales (English placeholder). `pnpm i18n:check`
  green (missing:0 drifted:0).

Tests: MemoryTreeStatusPanel.test 9/9 (3 new — recall banner+badge, structure
badge, healthy=no banner); tsc + eslint clean on changed files.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…-T026)

self-explain an empty/stalled wiki and the user gets one actionable answer:

- memory_tree/health/doctor.rs: run_doctor(config) -> DoctorReport. Walks each
  stage (embeddings config, scheduler gate, job queue, extraction/recall
  degradation from the process-global flags, summary-tree local-AI precondition)
  in pipeline order and returns per-stage StageHealth, the first blocking cause
  (typed code + i18n remediation_key), the degraded snapshot, and counters.
  Config + queue-counter + flag reads only — no live network probe (deterministic,
  fast). Scheduler-gate `off` is reported as a user choice, not a fault.
- tree/rpc.rs doctor_rpc + memory/schema.rs registration → `memory_tree_doctor`
  JSON-RPC method (CLI parity free via the controller registry).
- tools/impl/memory/doctor.rs MemoryDoctorTool (`memory_doctor`) — read-only,
  no-arg agent tool returning the serialized report; registered in tools/ops.rs.

Tests: health::doctor 5/0 (misconfig→embeddings cause first, healthy all-green,
gate-off=choice, local-AI-off→summary-tree precondition, serde roundtrip);
MemoryDoctorTool 2/0; memory_tree 395/0, memory tools 22/0. (T027 JSON-RPC E2E
deferred to a batch with the other E2E.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… AI is off (T019)

#002 FR-007 — "Build Summary Trees" used to hard-error
"tree summarizer requires local_ai to be enabled in config", leaving it dead
for cloud-only setups (Tencent/OpenRouter, no local Ollama). It now runs on the
configured cloud provider:

- tree_runtime/ops.rs create_provider → returns (provider, model). Local AI on:
  Ollama + local chat model (as before). Local AI off: builds the configured
  provider for the "summarization" role via
  inference::provider::factory::create_chat_provider (maps to memory_provider;
  the summarizer sub-agent already declares hint = "summarization"), returning
  that provider's model so the engine targets the right model.
- engine.rs: run_summarization / rebuild_tree / run_hourly_loop now take the
  resolved `model: &str` and thread it through summarize_to_limit / propagate_node,
  replacing the 3 hardcoded `config.local_ai.chat_model_id` reads (which would
  have mis-targeted a cloud provider).

Tests rewritten for the new contract: create_provider local→local-model /
off→cloud-fallback-with-model; run/rebuild on empty buffer skip cleanly (no
precondition error); CLI tests assert the local-AI precondition error is gone.
tree_runtime 99/0, memory_tree 396/0.

UI privacy indicator (summaries cloud-processed) + T020 partial-success report
are follow-ups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ions param (T033-T035)

configured still hit "managed budget": their key was silently ignored.

Two stacked bugs fixed:
1. embeddings/openai.rs: OpenAiEmbedding carried a `dims` field but never sent
   the OpenAI `dimensions` request param — it only validated the response size
   afterward. So any 3-* model returned its native size (3-large → 3072) and
   then failed the `!= dims` check. Add `with_send_dimensions(true)` (opt-in,
   default OFF so Voyage/Cohere/Ollama, which don't accept the field, are
   unchanged); the openai/custom factory branches opt in.
2. memory_tree/score/embed/factory.rs: build_write_embedder + the read-path
   factory had NO branch for `embeddings_provider = "openai"` (or custom), so
   the selection fell through to the managed Voyage backend, ignoring the
   user's key. New openai_compat.rs adapter wraps the unified EmbeddingProvider
   (built via create_embedding_provider_with_credentials, reusing resolve_api_key
   so there's one source of truth) and is tried BEFORE the managed-cloud
   fallback on both paths. Pinned to EMBEDDING_DIM (1024); a wrong-size vector
   surfaces as the standard "expected N dims" error → embedding_dim_mismatch.

This means: OpenAI text-embedding-3-large + 1024 dims now actually drives
memory-tree embeddings instead of the managed-budget backend.

Tests: openai dimensions sent-when-opted-in / omitted-by-default (mock server);
openai_compat None-for-non-openai / Some-for-openai / Some-for-custom;
embeddings 133/0, memory_tree 399/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… longer aborts the run (T020)

#002 FR-008 — a single node's summarization failure (e.g. the LLM times out on
one busy day) used to `?`-abort the entire run_summarization / rebuild_tree,
discarding all the other ancestors even though the hour leaves were already
durably written.

Both propagation loops now collect-and-continue: each propagate_node failure is
logged with its node id + level and recorded, the loop proceeds, and the run
returns Ok (partial success) instead of erroring out wholesale. Failed nodes are
recoverable on the next run. No signature/RPC-shape change — the structured
SummaryRunReport-to-RPC surfacing is deferred (the resilience is the substance
of FR-008).

Test: rebuild_tree_partial_success_when_one_level_fails (large hour leaves force
a day-level LLM call; FailAtLevelProvider fails only "day"; rebuild returns Ok,
hour leaves survive). tree_runtime 100/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…C (T028-T032)

switches to a faster extraction model), jobs that failed under the old config
should re-run without re-ingesting source data.

- memory_queue::store::requeue_failed: flips every terminal `failed` job back
  to `ready` with a fresh attempt budget (attempts=0) and the typed
  failure_reason / failure_class / last_error cleared.
- vault::sync::sync_vault calls it at the start of every sync (best-effort) so
  a fixed cause self-heals automatically — no button click needed.
- memory_tree_retry_failed RPC (tree/rpc.rs::retry_failed_rpc + schema.rs
  registration) backs the manual "Retry failed" button; wakes the worker pool
  so requeued jobs run promptly.

Also fixes a test-isolation leak surfaced by this work: bucket_seal_tests used a
no-provider config that hit build_write_embedder's skip→mark_semantic_recall_
degraded path, leaking the process-global degraded flag into parallel
pipeline_status tests. Switched those tests to provider="none" (deterministic
inert, no degrade) — the same pattern as integration_test.

Tests: requeue_failed_resets_failed_jobs_only; memory_ suite 1754/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…UI (T036/T037)

"empty-but-built wiki" (chunks present, but the extraction model produced no
entities) is visible.

- memory_store::chunks::store::extraction_coverage(config) -> f32: fraction of
  mem_tree_chunks with ≥1 row in mem_tree_entity_index (EXISTS join on
  node_id = chunk id, so it's node-kind-agnostic). 0.0 on an empty store.
- pipeline_status: + extraction_coverage field (additive, best-effort via
  spawn_blocking → 0.0 on error). DoctorReport.counters: + extraction_coverage
  (dropped `Eq` from DoctorCounters/DoctorReport since f32 isn't Eq; PartialEq
  retained for tests).
- UI: MemoryTreeStatusPanel shows "Extraction coverage: N% of chunks have
  structure" under the tiles when chunks exist. New i18n key
  memoryTree.status.extractionCoverage across en.ts + all locale chunk-1
  (i18n:check green).

Tests: extraction_coverage_empty_store_is_zero / _reflects_indexed_fraction
(0 → 0.5 → 1.0); chunks store 31/0, doctor 21/0, panel 9/9; tsc + eslint clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add `intelligence.memory_pipeline_doctor` describing the diagnose surface
(first blocking cause + fix, counters, extraction coverage), the agent
self-diagnosis, auto-retry, and the memory_tree_doctor RPC. about_app 21/0
(catalog integrity passes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Formatting-only pass (cargo fmt) across the memory-pipeline-hardening files so
the CI `cargo fmt --check` gate is clean. No logic changes. memory_ suite
1756/0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Six findings from the CodeRabbit review of PR tinyhumansai#3076:

1. memory/schema.rs — pipeline_status ControllerSchema was stale. Add
   `degraded` to the status enum and append the three #002 output fields
   (`degraded`, `first_blocking_cause`, `extraction_coverage`) with shapes
   matching the response struct (DegradedState / Option<PipelineFailure> as
   Json, f32 as F64). Update the struct doc-comment precedence to include
   `degraded`.

2. i18n — es-1.ts shipped English placeholders for the 12 #002 keys
   (memoryTree.status.* + memory.health.remediation.*). Replace with real
   Spanish translations. Other locales keep English placeholders per the
   repo convention (i18n:check parity holds; 0 missing).

3. doctor — expose run_doctor via an async_run_doctor wrapper that runs the
   blocking diagnostic inside spawn_blocking; route doctor_rpc and
   MemoryDoctorTool through it so the diagnostic never blocks a Tokio worker.

4. doctor stage 5 — stop auto-failing the summary-tree precondition when
   local AI is off. Add tree_runtime::ops::summarizer_available() which
   mirrors create_provider's decision (local AI on → ok; else confirm a
   cloud summarization provider resolves); stage health now reflects it.

5. health — split the single shared DEGRADED_CAUSE atomic into per-flag
   SEMANTIC_RECALL_CAUSE / STRUCTURE_CAUSE so clearing one degradation no
   longer wipes the other's cause. Add a regression test.

6. tree/rpc.rs — fold latest_failed_job_failure + extraction_coverage into a
   single spawn_blocking so neither SQLite read runs on a Tokio worker.

Tests: memory_tree::health (22) + memory_tree::tree::rpc (15) green;
cargo fmt + clippy clean; pnpm i18n:check parity holds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
zahica1234 and others added 8 commits June 1, 2026 08:30
Independent review of the #002 hardening surfaced a coverage gap: the
embedder factory reads two distinct provider fields — top-level
`embeddings_provider` (none / `ollama:` routing) and
`memory.embedding_provider` (OpenAI / custom detection) — and the FR-015
fix (route user-configured OpenAI embeddings instead of silently falling
through to the managed-budget backend) lived in the factory's branch
ordering, but only `openai_compat::try_from_config` was tested in
isolation. A factory refactor could re-break the headline bug with every
test still green.

Add two regression tests asserting both `build_write_embedder` and
`build_embedder_from_config` route a `memory.embedding_provider = "openai"`
config to the OpenAI adapter (name "openai") — and that the write path
clears the semantic-recall degraded flag — with the top-level
`embeddings_provider` unset so the none/ollama branches cannot mask it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three actionable findings + one nitpick from the second review pass:

1. doctor summary_tree stage emitted FailureCode::LocalModelUnavailable even
   for the cloud-only "no provider resolved" case, driving the wrong
   remediation banner. Added FailureCode::SummarizerUnavailable (distinct from
   LocalModelUnavailable, which implies the local path was selected) with its
   own remediation text naming BOTH fixes (enable local AI OR configure a
   cloud model). Wired through as_str/from_str/remediation_key, the per-flag
   atomic encoding (code_to_u8/u8_to_code), the ALL_CODES test array, the
   frontend MemoryTreeFailureCode union, and i18n
   (memory.health.remediation.summarizer_unavailable in en.ts + all enforced
   locale chunk-1 files; real Spanish, English placeholder elsewhere).

3. extraction_coverage overloaded 0.0 to mean both "genuinely 0% structure"
   and "couldn't measure (DB error)". Changed the response field to
   Option<f32> (None on read error) and the schema to a nullable F64
   (required: false), so the honest-status endpoint can't misreport a broken
   measurement as a structure failure. The frontend already guards
   `extraction_coverage != null`, so no UI change was needed.

Nitpick: summarizer_available() reimplemented create_provider()'s local-vs-
   cloud branch — a drift point. It now routes through create_provider()
   directly (the same resolver the runtime uses) and maps the result to
   (available, note); the provider it builds is dropped (construction is
   cheap, no network).

SKIPPED finding #2 (degraded shape doc): the review assumed the per-flag
   cause split changed the wire contract, but DegradedState still serialises a
   single precedence-resolved `cause` (the split is internal to the process-
   global atomics). Kept the single-cause shape — restructuring the doc to
   per-flag causes would misdescribe the actual JSON — but tightened the
   comment: `cause` is OMITTED when no degradation is active (it has
   skip_serializing_if), not "always present"; the object itself is.

Tests: health 22, tree::rpc 15, tree_runtime 100, doctor 5 — all green.
cargo fmt + clippy clean; tsc clean; pnpm i18n:check parity holds (3384 keys).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ening)

Re-homes the #002 work onto the current upstream layout after rebasing past
several refactors that landed since the PR's original base:

- i18n: upstream reverted the per-locale `chunks/<locale>-N.ts` split back to
  flat `<locale>.ts` files. Re-added the 13 #002 keys (4 memoryTree.status.* +
  9 memory.health.remediation.*, incl. summarizer_unavailable) to every
  enforced locale flat file — real Spanish for es.ts, English placeholders
  elsewhere. `pnpm i18n:check` parity holds (0 missing).
- memory_store/chunks: tinyhumansai#2991 split store.rs into connection.rs/etc. Re-applied
  the failure_reason/failure_class CREATE-TABLE columns (store.rs) + the
  add_column_if_missing migration (now in connection.rs::apply_schema).
- tools/impl/memory → memory/tools rename: doctor.rs landed at the new path,
  module + tool export auto-merged.
- about_app: CAPABILITIES moved to catalog_data.rs; the Memory Pipeline Doctor
  capability entry was re-homed there.
- embeddings: kept both upstream's provider_from_config and our pub(crate)
  resolve_api_key re-export.
- vault: upstream REMOVED the vault domain entirely. The vault-ingest-ledger
  wipe hook (read_rpc) and the auto-retry-on-sync hook (vault::sync) target
  deleted code — dropped. The manual `memory_tree_retry_failed` RPC and
  `requeue_failed` store fn remain; auto-retry-on-sync needs re-homing to the
  new source-trees sync path (follow-up). Stale doc on retry_failed_rpc fixed.
- mod_tests.rs Job literal: added the new failure_reason/failure_class fields.

Validation: lib builds; memory_tree::health 22, tree::rpc 11, embed::factory 15,
tree_runtime 102, memory_queue 66, memory_store::chunks 91, embeddings 166,
about_app 24 — all green. tsc 0 errors; i18n:check parity holds; fmt+clippy clean.
(One pre-existing upstream test, memory::query::…fetch_leaves, fails in this
sandbox only because it reads a real ~/.openhuman user config — byte-identical to
upstream, not a rebase regression.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CodeRabbit (3rd pass): in pipeline_status_rpc the best-effort
latest_failed_job_failure + extraction_coverage reads, plus the
spawn_blocking join, silently swallowed errors — making a DB/query/join
failure indistinguishable from "no blocking cause" / "metric unavailable
by design". Keep the None fallbacks (these reads must never fail the
polled status RPC) but emit a grep-friendly `[memory-tree][rpc]
pipeline_status:` warning before discarding each error, per the repo's
verbose-diagnostics rule.

memory_tree::tree::rpc 11/11 green; fmt + clippy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ture

The #002 pipeline-hardening commits added `model: &str` as a parameter to
`engine::run_summarization` and `engine::rebuild_tree`, but three integration
tests still called the old arity. These targets aren't built by `cargo test
--lib`, so the break only surfaced in the Rust Core Coverage CI job
(run 26715157961). Pass a model-agnostic "test-model" placeholder — the
providers in these tests are ScriptedProvider doubles that ignore the model.

- tests/memory_tree_sync_raw_coverage_e2e.rs (run_summarization, rebuild_tree)
- tests/memory_tree_memory_round23_raw_coverage_e2e.rs (run_summarization x2, rebuild_tree)
- tests/memory_tree_summarizer_e2e.rs (run_summarization x4)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ovider (FR-002)

The #002 fail-fast work switched the extract and seal write paths to
`build_write_embedder` (returns None → skip embedding) so a missing/unusable
embeddings provider can't silently persist all-zero `InertEmbedder` vectors and
poison semantic recall. The third write path — `reembed_backfill` — was missed:
it still used the read-path `build_embedder_from_config`, whose no-provider
fallback returns an `InertEmbedder`. Its zero vectors pass `pack_checked` and
were written to the sidecar, exactly the FR-002 hazard the other paths guard.

Route `reembed_backfill` through `build_write_embedder` and, on None, stop the
backfill (`JobOutcome::Done`) rather than embedding: the rows stay
re-embeddable and the next signature change (e.g. once a provider is configured)
re-triggers it. The factory has already set the typed semantic-recall degraded
flag for the status/doctor surface. `embeddings_provider="none"` still yields
`Some(InertEmbedder)` — a deliberate vector-search-off opt-out, not a skip.

Tests: add `reembed_backfill_skips_when_no_provider` (asserts Done + no vector
persisted). The three existing mechanics tests (repopulate / tombstone-orphan /
worker-defer) relied on the old InertEmbedder fallback; pin them to the explicit
`embeddings_provider="none"` opt-out so they still exercise a no-network embedder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…(FR-011)

`requeue_failed` is reachable only from the manual `memory_tree_retry_failed`
RPC — there is no automatic caller. The auto-retry-on-sync hook was planned but
lived on the upstream-removed vault sync path and has not been re-homed (flagged
as a follow-up on the PR). Three places still asserted it runs automatically,
which would mislead a user (and the agent's own diagnosis) into thinking failed
jobs recover on their own after a config fix:

- store.rs: the requeue_failed rustdoc ("auto-retry on next sync hook")
- memory/schema.rs: the retry_failed controller description ("the same requeue
  also runs automatically at the start of each sync")
- about_app/catalog_data.rs: the Memory Pipeline Doctor capability ("failed jobs
  auto-retry on the next sync")

Reword all three to describe reality: manual, on-demand requeue via the RPC;
no automatic requeue-on-sync yet. No behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…l locales

The 13 #002 keys (degraded badges + typed pipeline-failure remediation copy)
were added to every locale during the rebase but only `es` carried real
translations; the other 12 (ar, bn, de, fr, hi, id, it, ko, pl, pt, ru, zh-CN)
shipped verbatim English placeholders. CLAUDE.md requires a real translation in
every locale (English-only is the runtime fallback, not a shipping value).

Translate all 13 keys in the 12 remaining locales, following the established
convention: localize prose and the Settings → AI nav labels; keep brand/loanwords
(OpenHuman, Ollama, Embeddings), the {pct} placeholder, and 1024 verbatim.
fr/it values switch to double-quote delimiters for apostrophes, matching the
repo's existing style. Verified: `i18n:check` parity holds and none of these
keys are flagged by `i18n:english:check`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zahica1234 heads up — CI is currently red (Rust Core Coverage, Rust E2E mock backend, Frontend Vitest, and two Playwright lanes), so i'll hold a full approval until those are green. CodeRabbit has also already left a solid set of change requests (per-flag degraded cause, async-thread SQLite reads in the doctor/rpc, the summary-tree precondition, schema docs) — please work through those; i won't restate them here.

A couple of project-specific things i spotted that CodeRabbit didn't flag, worth folding in while you're iterating:

  1. embeddings/factory.rswith_send_dimensions(true) is now blanket-enabled for every custom / custom: provider, not just OpenAI. openai.rs's own doc comment says to set this only for endpoints that actually implement the dimensions request param. custom/custom: is the escape hatch for arbitrary OpenAI-compatible servers (vLLM, some LocalAI/text-embeddings-inference builds), several of which reject unknown body fields — those users will start getting a 400 on every embed after this change. The 3-large→1024 need is genuinely OpenAI-specific, so consider gating on model id (text-embedding-3-*) rather than provider kind, or keeping dimensions opt-in for the generic custom paths. The plain "openai" branch is fine.

  2. memory_tree/score/embed/factory.rsbuild_write_embedder duplicates build_embedder_from_config's resolution ladder. The doc comment promises "identical resolution for every real provider," but it's enforced by hand across two functions; the only intended difference is the terminal (read returns InertEmbedder, write returns None). If a future provider is added to one and not the other, the read path could retrieve against a different provider/dim than the write path wrote with — a silent, hard-to-debug recall corruption. Extracting a shared private resolver that returns the provider choice, with the two public fns differing only at the terminal, would make the invariant structural instead of aspirational.

Fix CI and CodeRabbit's items first and i'll come back for a proper pass. Nice, thorough writeup on the PR itself — the failure taxonomy and skip-don't-fake direction are the right calls.

)),
"cohere" => Ok(Box::new(CohereEmbedding::new(api_key, model, dims))),
"custom" => {
let url = custom_endpoint.unwrap_or("");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] This enables dimensions in the request body for all custom / custom: providers, but openai.rs explicitly documents with_send_dimensions as "only call this for genuine OpenAI / OpenAI-compatible endpoints that implement the parameter." custom: is exactly the path used for arbitrary OpenAI-compatible servers (vLLM, text-embeddings-inference, stricter LocalAI builds) — those that reject unknown body fields will 400 on every embed after this change.

The 3-large@1024 requirement is OpenAI-specific, so prefer gating on the model id (e.g. model.starts_with("text-embedding-3-")) rather than the provider kind, or keep it opt-in for the generic custom paths. Same applies to the custom branch above and the custom: branch in create_embedding_provider.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still open on the latest commit — the change landed as with_send_dimensions(true) on the custom/custom: arms rather than a model-id gate, so the doc's "only for endpoints that implement the parameter" caveat is violated for exactly the arbitrary-endpoint path. Strict OpenAI-compatible servers (vLLM, text-embeddings-inference) 400 on the unknown dimensions field. Suggest gating on model.starts_with("text-embedding-3"), or leaving it opt-in for custom providers.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zahica1234 CI is failing across multiple jobs (Vitest, Rust E2E mock backend, two Playwright lanes, Rust Core Coverage) so I'm holding a formal pass until those clear. Your claim that the Rust Core Coverage failures are pre-existing flakes in unrelated e2e suites is plausible given the pattern you described, but the Vitest and Rust E2E failures are closer to this diff — those need a clear explanation or a green re-run before I'd be comfortable approving.

CodeRabbit's 9 majors are all real and need addressing before merge — I'm not restating them. sanil-23 flagged two additional things I also spotted (with_send_dimensions(true) blanket over custom providers breaking non-OpenAI endpoints, and build_write_embedder hand-duplicating the resolution ladder), both of which I agree with; fold those in while you're iterating on CodeRabbit's items.

Two things I spotted that neither CodeRabbit nor sanil-23 raised:

[major] Cloud summarizer fallback ships without any privacy notice.

create_provider in tree_runtime/ops.rs now silently routes to the configured cloud chat provider when local AI is off. The PR description explicitly calls the UI indicator a "planned follow-up," but "Build Summary Trees" was previously local-only precisely because memory summaries are sensitive. Users on cloud-only setups — or anyone who disables local AI — now have their workspace memory summarized by an external provider without any in-product signal that this is happening. Shipping the behavior change ahead of the privacy indicator inverts the right order here. Either gate the cloud fallback on an explicit "Use cloud for summarization" opt-in flag that the existing settings path can surface, or land the amber notice alongside this PR. Leaving it for a follow-up makes it easy to forget and means every user who updates in the gap has no visibility.

[minor] Doctor conflates measurement failure with genuine 0% extraction coverage.

run_doctor builds DoctorCounters with:

extraction_coverage: chunks::extraction_coverage(config).unwrap_or(0.0),

The RPC response correctly uses Option<f32> with None on a DB read error to keep the two cases distinct. The doctor collapses them: a SQLite failure during doctor runs maps to 0.0, and an agent using the doctor for self-diagnosis reads that as "extraction is producing nothing" and follows the extraction remediation path — wrong cause, wrong fix. DoctorCounters.extraction_coverage should be Option<f32> to carry the same "measurement failed" signal the RPC exposes.

zahica1234 and others added 2 commits June 1, 2026 20:09
…dening

sanil-23 items:
- A: gate `with_send_dimensions` on `model_supports_dimensions(model)` so
  custom/vLLM endpoints aren't 400'd on an unknown request field
- B: extract shared `EmbedderChoice` enum + `resolve_embedder_choice`
  resolver; `build_embedder_from_config` and `build_write_embedder` now
  share one path, eliminating the silent-drift risk sanil flagged
- Fix `ingest_chat_writes_and_queue_drains_to_admitted_chunk` test to
  assert `is_none()` — FR-002 skip-not-zero-vector is correct behaviour

graycyrus items:
- [minor] `DoctorCounters.extraction_coverage: f32 → Option<f32>` so a
  DB read failure is never misreported as genuine 0% extraction coverage;
  schema and doctor counter comment updated accordingly
- [major] `cloud_summarization_opt_in: bool` (default false) added to
  `MemoryTreeConfig`; `create_provider` in ops.rs now gates the cloud
  fallback on this flag — memory summaries stay local-only by default,
  matching pre-#002 behaviour, and users must explicitly enable cloud
  summarization in Settings → AI → Memory. `summarizer_unavailable`
  remediation strings updated in all 14 locales to reference the opt-in.
  Tests updated: opt-in required test added; cloud-fallback tests now set
  `cloud_summarization_opt_in = true` explicitly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…de + update CLI test

- Add env var override for `cloud_summarization_opt_in` in load.rs so the
  flag is settable from Docker/CI without a config file edit
- Update `run_summarize_skips_cleanly_on_empty_buffer_without_local_ai`
  test: it now correctly asserts that without any provider the CLI
  returns a clean "no summarization provider" error (not a panic), which
  is still better than the old opaque "requires local_ai" crash

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zahica1234
Copy link
Copy Markdown
Contributor Author

Round 3 review addressed — two commits dc8a0993 + 597f2dde on pr/memory-pipeline-hardening.


sanil-23 items — fixed in dc8a0993

A — with_send_dimensions(true) blanket:
Added model_supports_dimensions(model: &str) -> bool helper in embeddings/factory.rs; all 5 .with_send_dimensions(true) calls replaced with .with_send_dimensions(model_supports_dimensions(model)). Only text-embedding-3-* gets the parameter sent.

B — build_write_embedder duplication:
Extracted EmbedderChoice enum + shared resolve_embedder_choice private resolver in memory_tree/score/embed/factory.rs. Both build_embedder_from_config (read→InertEmbedder at NoProvider) and build_write_embedder (write→None+degraded at NoProvider) now share one 6-step resolution ladder — drift risk eliminated.

Ingest coverage test:
ingest_chat_writes_and_queue_drains_to_admitted_chunk now asserts is_none() — FR-002's skip-not-zero-vector is the correct behaviour when no embeddings provider is configured.


graycyrus items — fixed in dc8a0993 + 597f2dde

[minor] DoctorCounters.extraction_coverage: f32 → Option<f32>:
doctor.rs line 72 changed to Option<f32>, builder now uses .ok(). A DB read error is no longer misreported as genuine 0% coverage. Schema comment in schema.rs updated (extraction_coverage: number|null).

[major] Cloud summarizer privacy opt-in:
Added cloud_summarization_opt_in: bool (default false) to MemoryTreeConfig in config/schema/storage_memory.rs. create_provider in tree_runtime/ops.rs now gates the cloud fallback on this flag: when local AI is off and the flag is false, it returns a clean "no summarization provider" error instead of silently routing to cloud. This restores the pre-#002 local-only default; users must explicitly enable cloud summarization via Settings → AI → Memory (or OPENHUMAN_MEMORY_TREE_CLOUD_SUMMARIZATION=true). Env var override wired in config/schema/load.rs. All 14 locale files updated with correct translations for the revised summarizer_unavailable remediation string. 3 affected tests updated; 2 new tests added (create_provider_errors_without_cloud_opt_in / create_provider_uses_cloud_when_opted_in_and_local_ai_off).


Pre-existing test flakes (not caused by this diff)

  • pipeline_status_reports_chunk_aggregates_after_ingest: passes in isolation; fails under parallel load due to process-global flag contamination from sibling tests. git diff main... -- src/openhuman/memory_tree/tree/rpc.rs shows no change to this test.
  • memory_tree_fetch_leaves_mode_dispatches_successfully: fails because it reads from /home/zahica/.openhuman/... (the real user config path) with no workspace isolation guard. Not in our diff.

Both are visible in CI pre-dating this branch. Happy to investigate as a follow-up if the maintainers want.


Still stale: CodeRabbit items already addressed in earlier commits

Per my previous comment, the items raised in CodeRabbit round 1 + 2 (pipeline_status schema, async doctor reads, SummarizerUnavailable failure code, summarizer_available() shared helper) were addressed in fea53943 and 6915e48b. CodeRabbit's auto-pause after rapid commits means its latest review covers a stale snapshot. Running @coderabbitai review on the current head should clear those.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 1, 2026

@zahica1234 Thanks for the detailed round-3 summary — the fixes look solid:

  • model_supports_dimensions helper scoping with_send_dimensions to only text-embedding-3-* is the right precision vs. the blanket approach.
  • EmbedderChoice + resolve_embedder_choice unifying the read/write resolution ladder eliminates the drift risk cleanly.
  • cloud_summarization_opt_in: bool (default false) is exactly the right privacy-first default — silently routing to cloud when local AI is off would have been a surprising behaviour change; gating it behind an explicit flag + env-var restores the pre-#002 contract.
  • DoctorCounters.extraction_coverage: Option<f32> — good catch, a DB error reporting as genuine 0% would have been misleading.

On the pre-existing flake notes: the process-global singleton contamination pattern in *_raw_coverage_e2e tests is a known footgun — flagging it for a follow-up issue sounds reasonable if the maintainers want it tracked.

Kicking off a fresh review against the current head now.

Re-review requested by author after round-3 fixes: model_supports_dimensions helper, EmbedderChoice/resolve_embedder_choice shared resolver, cloud_summarization_opt_in privacy gate, DoctorCounters.extraction_coverageOption<f32>, i18n parity for 14 locales, and associated test updates.

🐇✨

✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 17

♻️ Duplicate comments (1)
src/openhuman/embeddings/factory.rs (1)

56-61: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep dimensions off for generic custom endpoints.

model_supports_dimensions(model) still turns this on for custom / custom: providers, so any OpenAI-compatible server that exposes a text-embedding-3-* model id but does not implement OpenAI's dimensions field will still 400 every embed request. That makes the generic BYO path fragile again. Restrict this to the "openai" provider, or make it an explicit opt-in on custom endpoints instead of inferring support from the model name alone.

For OpenAI-compatible embedding servers such as vLLM, Hugging Face text-embeddings-inference, and LocalAI: do `/embeddings` requests that include OpenAI's `dimensions` field succeed, get ignored, or fail? Please include the relevant docs/issues for each server.
Suggested change
         name if name.starts_with("custom:") => {
             let base_url = name.strip_prefix("custom:").unwrap_or("");
-            Ok(Box::new(
-                OpenAiEmbedding::new(base_url, "", model, dims)
-                    .with_send_dimensions(model_supports_dimensions(model)),
-            ))
+            Ok(Box::new(OpenAiEmbedding::new(base_url, "", model, dims)))
         }
@@
         "custom" => {
             let url = custom_endpoint.unwrap_or("");
-            Ok(Box::new(
-                OpenAiEmbedding::new(url, api_key, model, dims)
-                    .with_send_dimensions(model_supports_dimensions(model)),
-            ))
+            Ok(Box::new(OpenAiEmbedding::new(url, api_key, model, dims)))
         }
@@
         name if name.starts_with("custom:") => {
             let url = custom_endpoint.unwrap_or_else(|| name.strip_prefix("custom:").unwrap_or(""));
-            Ok(Box::new(
-                OpenAiEmbedding::new(url, api_key, model, dims)
-                    .with_send_dimensions(model_supports_dimensions(model)),
-            ))
+            Ok(Box::new(OpenAiEmbedding::new(url, api_key, model, dims)))
         }

Also applies to: 97-109

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/factory.rs` around lines 56 - 61, The custom
provider branch currently enables sending the OpenAI `dimensions` field based
solely on model name via model_supports_dimensions(model), which breaks generic
custom endpoints; update the logic in the custom: handling (where
OpenAiEmbedding::new(...) is created and .with_send_dimensions(...) is called)
to NOT enable dimensions for generic custom providers—either remove the
.with_send_dimensions(...) call for the name.starts_with("custom:") branch or
set it explicitly false unless the provider is exactly "openai" or an explicit
opt-in flag is present; apply the same change to the other similar block
referenced around lines 97-109 so only the real "openai" provider (or an
explicit opt-in) can enable dimensions.
🧹 Nitpick comments (6)
src/openhuman/embeddings/rpc.rs (1)

404-404: ⚡ Quick win

Keep the crate-visible helper parameter camelCase.

provider_name is now part of a pub(crate) signature, so this is a good point to align it with the repo's Rust naming rule (providerName).

As per coding guidelines, "**/*.rs: Use camelCase for variable names in Rust`"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/rpc.rs` at line 404, The crate-visible function
signature for resolve_api_key should use camelCase for its parameter; rename the
parameter provider_name to providerName in the fn signature (pub(crate) fn
resolve_api_key(config: &Config, providerName: &str) -> String) and update all
internal call sites, pattern matches, and any usages in this module (and tests)
to the new identifier to satisfy the repo's Rust naming rule; leave the function
name and types unchanged.
src/openhuman/memory_tree/score/embed/openai_compat.rs (1)

55-77: ⚡ Quick win

Use camelCase for the new local bindings.

api_key and custom_endpoint in this constructor don't follow the repo's Rust naming rule. Please align this new adapter path with apiKey / customEndpoint naming.

As per coding guidelines, "**/*.rs: Use camelCase for variable names in Rust`"

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/score/embed/openai_compat.rs` around lines 55 - 77,
The new locals api_key and custom_endpoint in try_from_config violate the repo's
camelCase Rust naming rule; rename these bindings to apiKey and customEndpoint
and update their uses in the call to create_embedding_provider_with_credentials
(and any other references) so the function parameters and local variable names
match (e.g., replace api_key -> apiKey and custom_endpoint -> customEndpoint
where passed into create_embedding_provider_with_credentials and any subsequent
code).
src/openhuman/memory_tree/score/embed/factory.rs (1)

113-129: ⚡ Quick win

Rename the new resolution-ladder bindings to camelCase.

tree_cfg, timeout_ms, and the new enum field names don't follow the repo's Rust naming rule. Please normalize them in this shared path before more callers copy the pattern.

As per coding guidelines, "**/*.rs: Use camelCase for variable names in Rust`"

Also applies to: 135-168

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/score/embed/factory.rs` around lines 113 - 129, The
enum EmbedderChoice and adjacent bindings use snake_case for local fields (e.g.,
timeout_ms, tree_cfg) which violates the repo's Rust naming rule; rename these
bindings to camelCase (timeoutMs, treeCfg) and similarly convert any newly added
enum field names and pattern bindings in the same block to camelCase, then
update all places that construct, match on, or reference EmbedderChoice variants
(constructors, matches, method params) so identifiers are consistent across the
module (refer to EmbedderChoice, the Ollama variant fields, and any usages
around the 135-168 changes).
src/openhuman/embeddings/openai.rs (1)

17-23: ⚡ Quick win

Rename the new field to camelCase.

send_dimensions doesn't match the repo's Rust naming rule. Please rename the field and its references consistently before this API spreads further.

As per coding guidelines, "**/*.rs: Use camelCase for variable names in Rust`"

Also applies to: 34-45, 132-142

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/embeddings/openai.rs` around lines 17 - 23, The field
send_dimensions and its builder method Self::with_send_dimensions must be
renamed to camelCase (e.g., sendDimensions and withSendDimensions) throughout
the module; update the struct field name, all constructor/builder call sites,
method definitions, docs/comments, and any serde/serialization keys or tests
that reference send_dimensions (also in the other referenced blocks around lines
34-45 and 132-142) so references remain consistent and compilation passes.
Ensure you rename both the field and the accessor/mutator methods, and adjust
any pattern matches or field initializers that use send_dimensions.
src/openhuman/memory_queue/store.rs (1)

375-410: ⚡ Quick win

Keep a single failed-job reset path.

requeue_failed() now clears failure_reason / failure_class, but the older retry_all_failed() still resets rows without nulling those columns. Any caller that still uses the old helper can put a ready job back into circulation with stale typed failure metadata. Please have one API delegate to the other or remove the older entry point.

♻️ Minimal follow-up
 pub fn retry_all_failed(config: &Config) -> Result<u64> {
-    with_connection(config, |conn| {
-        let now_ms = Utc::now().timestamp_millis();
-        let n = conn.execute(
-            "UPDATE mem_tree_jobs
-                SET status = 'ready',
-                    attempts = 0,
-                    available_at_ms = ?1,
-                    locked_until_ms = NULL,
-                    started_at_ms = NULL,
-                    completed_at_ms = NULL,
-                    last_error = NULL
-              WHERE status = 'failed'",
-            params![now_ms],
-        )?;
-        if n > 0 {
-            log::info!("[memory::jobs] retry_all_failed: reset {n} jobs back to ready");
-        }
-        Ok(n as u64)
-    })
+    requeue_failed(config)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_queue/store.rs` around lines 375 - 410, The old
retry_all_failed() duplicates logic and doesn't clear
failure_reason/failure_class—make a single canonical path by having
retry_all_failed delegate to the newer requeue_failed(config) (or remove
retry_all_failed entirely); update retry_all_failed to call requeue_failed and
return its Result<u64>, ensuring the same with_connection/context usage so the
SQL that nulls failure_reason/failure_class (in requeue_failed) is the one
executed for all callers.
src/openhuman/memory_tree/tree_runtime/cli.rs (1)

694-707: ⚡ Quick win

Assert the new no-provider error here instead of accepting any failure.

In this setup, create_provider() fails before any provider call, so allowing arbitrary Err values makes this regression test too weak. It would be better to assert the new "no summarization provider" path for run_rebuild explicitly and drop the duplicate run_summarize coverage from this test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree_runtime/cli.rs` around lines 694 - 707, The
test currently accepts any Err from run_rebuild/run_summarize which is too weak
because create_provider() fails early; change it to assert that
run_rebuild(&["ns".to_string()]) returns an Err whose string contains the
specific "no summarization provider" message (and remove the redundant
run_summarize check in this test), so the test verifies the new no-provider
failure path from create_provider() rather than allowing arbitrary failures;
locate assertions around run_rebuild and run_summarize in this block and update
the assertion to check for "no summarization provider" on run_rebuild and delete
the duplicate run_summarize verification.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/components/intelligence/MemoryTreeStatusPanel.tsx`:
- Around line 271-276: MemoryTreeStatusPanel is rendering
t(blockingCause.remediation_key) with no fallback so unknown backend keys leak
raw identifiers; change the call to pass a fallback using the existing
status.reason so the banner degrades gracefully (e.g., use i18n
fallback/defaultValue behavior of t with blockingCause.remediation_key and
status.reason as the fallback) and ensure you update the element that references
blockingCause.remediation_key to use that fallback.

In `@app/src/lib/i18n/ar.ts`:
- Around line 4371-4377: The Arabic remediation strings still use the English
menu label "Embeddings"; update each affected i18n values (e.g., keys
'memory.health.remediation.auth_missing',
'memory.health.remediation.auth_invalid',
'memory.health.remediation.embeddings_unconfigured', and the preceding string
about exhausted memory embeddings) to replace the English word "Embeddings" with
the localized Arabic term "التضمينات" so the menu label is consistent with other
Arabic translations.

In `@app/src/lib/i18n/ko.ts`:
- Around line 4414-4420: Update the remediation banner strings to use the
localized menu label instead of English; replace every occurrence of "설정 → AI →
Embeddings" with "설정 → AI → 임베딩" in the keys
'memory.health.remediation.auth_missing',
'memory.health.remediation.auth_invalid',
'memory.health.remediation.embeddings_unconfigured' and the earlier message that
mentions embeddings reaching managed budget so the Korean path matches the
translated UI label.

In `@app/src/lib/i18n/zh-CN.ts`:
- Around line 4234-4241: Update the four i18n values for keys
'memory.health.remediation.budget_exhausted',
'memory.health.remediation.auth_missing',
'memory.health.remediation.auth_invalid', and
'memory.health.remediation.embeddings_unconfigured' to use the existing
localized settings label "向量嵌入" instead of the English "Embeddings" in their
navigation hints (i.e., replace occurrences of "设置 → AI → Embeddings" with "设置 →
AI → 向量嵌入") so the guidance matches the rest of the zh-CN locale.

In `@app/src/utils/tauriCommands/memoryTree.ts`:
- Around line 810-815: The RPC type's extraction_coverage currently only allows
a number which loses the nullable "unknown" state; update the declaration for
extraction_coverage in the RPC/interface in memoryTree.ts to allow null (e.g.,
extraction_coverage?: number | null) and then audit usages of
extraction_coverage (rendering/UI logic and any numeric calculations) to handle
null explicitly (render "N/A" or skip numeric math) so a null value doesn't get
treated as 0.

In `@src/openhuman/config/schema/load.rs`:
- Around line 1948-1955: Replace the custom match on
OPENHUMAN_MEMORY_TREE_CLOUD_SUMMARIZATION with the shared parse_env_bool helper
so the same truthy/falsy forms (including "on"/"off") and consistent warning
formatting are used; specifically, call parse_env_bool(env,
"OPENHUMAN_MEMORY_TREE_CLOUD_SUMMARIZATION") and, if it returns Some(bool), set
self.memory_tree.cloud_summarization_opt_in = that value, otherwise leave the
field untouched (parse_env_bool will emit warnings for unknown values).

In `@src/openhuman/memory_queue/handlers/mod_tests.rs`:
- Around line 229-294: This test leaks process-global health-state because
build_write_embedder() flips the global degraded flag; to fix, add the test
guard at the start of the test body by calling
crate::openhuman::memory_tree::health::test_guard() and binding it to an unused
variable (e.g., let _g = ...) inside the async fn
reembed_backfill_skips_when_no_provider() immediately after the use/import block
so the guard is active for the duration of the test and prevents bleed into
other tests that inspect health-state.

In `@src/openhuman/memory_tree/health/doctor.rs`:
- Around line 115-130: The embeddings stage currently treats any non-empty
string as healthy; change the logic around embeddings_provider (the variable
computed from config.memory_tree.embedding_endpoint and
config.embeddings_provider) to treat the literal string "none"
(case-insensitive) as disabled/absent rather than a configured provider, and
surface that as a non-ok stage similar to the scheduler_gate style: when
provider == "none" produce StageHealth::bad (or the project's canonical
"blocked/disabled" stage) with
PipelineFailure::new(FailureCode::EmbeddingsUnconfigured) and the message
indicating embeddings are intentionally disabled rather than calling
StageHealth::ok("embeddings", format!("provider configured: {p}")).

In `@src/openhuman/memory_tree/health/mod.rs`:
- Around line 112-118: The FailureCode::class() implementation misclassifies
ExtractionTimeout as Unrecoverable; update the match in the class(self) ->
FailureClass method to return FailureClass::Transient for
FailureCode::ExtractionTimeout (in addition to FailureCode::Transient) so that
PipelineFailure::new(FailureCode::ExtractionTimeout) reports a
transient/retryable class; locate the class method on FailureCode (the match
over Self) and add ExtractionTimeout to the arm that yields
FailureClass::Transient.
- Around line 377-447: Update the atomic ordering so the cause and flag are
ordered: in mark_semantic_recall_degraded and mark_structure_degraded store the
corresponding CAUSE first (keep Relaxed) and then store the *_DEGRADED flag with
Ordering::Release; in clear_semantic_recall_degraded and
clear_structure_degraded store the CAUSE reset first (Relaxed) and then set the
*_DEGRADED flag to false with Ordering::Release; in current_degraded_state load
the *_DEGRADED flags with Ordering::Acquire and keep reading the CAUSE with
Relaxed so readers observe the CAUSE consistent with the flag. Ensure you apply
these changes to the functions mark_semantic_recall_degraded,
clear_semantic_recall_degraded, mark_structure_degraded,
clear_structure_degraded, and current_degraded_state.

In `@src/openhuman/memory_tree/score/embed/factory.rs`:
- Around line 220-225: In the EmbedderChoice::OptOut branch clear the global
semantic recall degraded flag before returning the inert embedder so
status/doctor no longer shows a stale degraded state; locate the branch handling
EmbedderChoice::OptOut and call the existing flag-clearing API (e.g.
clear_semantic_recall_degraded() or set_semantic_recall_degraded(false))
immediately before returning Some(Box::new(InertEmbedder::new())).

In `@src/openhuman/memory_tree/score/extract/llm_tests.rs`:
- Line 199: The test binding `_g` should be renamed to a camelCase unused
binding (keep the leading underscore) to comply with Rust naming rules: change
the local binding where you call
crate::openhuman::memory_tree::health::test_guard() from `_g` to something like
`_healthGuard` (or another camelCase name) so the call remains `let _healthGuard
= crate::openhuman::memory_tree::health::test_guard();` and preserves the
unused-binding semantics.

In `@src/openhuman/memory_tree/score/extract/llm.rs`:
- Around line 152-163: The branch treating Some(extracted) as always benign
conflates genuine empty extractions with parse-fallback empties returned as
Some(ExtractedEntities::default()) from try_extract(); update try_extract() to
return a richer enum/result (e.g. Result<Option<ExtractedEntities>,
ExtractParseFallback> or Option<ExtractedEntitiesWithCause>) so parse-fallback
cases carry a typed degradation cause, then change the handling in the
Some(extracted) branch of the caller (the block referencing
ExtractedEntities::default(), try_extract(), and clear_structure_degraded()) to
detect parse-fallback empties and set structure-degraded (via
crate::openhuman::memory_tree::health::clear_structure_degraded()/the inverse
setter) or return an error instead of treating them as normal empty-but-valid
results so pipeline_status accurately reflects extractor parse failures.

In `@src/openhuman/memory_tree/tree_runtime/engine.rs`:
- Around line 164-177: The code deletes buffer entries unconditionally even when
per-node propagation failed (variable failed is non-empty), which prevents
automatic retries in run_hourly_loop(); change the logic around
store::buffer_delete so it only runs when failed.is_empty() (i.e., full
success), and otherwise leave the buffer entries intact and emit a clear
warn/info (include namespace and buffer_filenames) so the namespace remains
discoverable for automatic retry by run_hourly_loop(); references: failed,
store::buffer_delete, buffer_filenames, namespace, run_hourly_loop, and the
tree_summarizer log block.

In `@src/openhuman/memory_tree/tree_runtime/ops.rs`:
- Around line 206-220: The summarizer_available function collapses distinct
failure modes into one message; update summarizer_available (and its match on
create_provider) to differentiate "cloud opt-in is off" from actual provider
construction errors by checking create_provider() result and, on Err(e), include
or surface the error details (or return a distinct message) instead of the
generic provider-not-configured string; use the function name
summarizer_available and the create_provider() error value to build two separate
return branches so callers (doctor/status) can show opt-in vs
provider-construction remediation accurately.

In `@src/openhuman/memory_tree/tree/bucket_seal.rs`:
- Around line 49-50: The seal path in bucket_seal.rs currently accepts any Ok(v)
from the embedder and can store wrong-dimension embeddings into the summary
sidecar; restore the summary-side dimension guard by validating returned
embedding length against the expected dimension before persisting: in the seal
handling code (the same logic area as handle_extract() / pack_checked()), call
the dimension-check routine used by pack_checked() or replicate its check to
compare v.len() to the expected embedding_dim and return the
EmbeddingDimMismatch error on mismatch instead of accepting Ok(v), ensuring only
correctly-dimensioned embeddings are packed and written to the summary sidecar.

In `@src/openhuman/memory/schema.rs`:
- Around line 875-880: The schema for the FieldSchema named
"first_blocking_cause" currently uses TypeSchema::Json but should be explicitly
nullable like pipeline_status.first_blocking_cause; change its type to
TypeSchema::Option(Box::new(TypeSchema::Json)) (keeping the comment and
required: false) so the doctor.first_blocking_cause is modeled as an
optional/null JSON value.

---

Duplicate comments:
In `@src/openhuman/embeddings/factory.rs`:
- Around line 56-61: The custom provider branch currently enables sending the
OpenAI `dimensions` field based solely on model name via
model_supports_dimensions(model), which breaks generic custom endpoints; update
the logic in the custom: handling (where OpenAiEmbedding::new(...) is created
and .with_send_dimensions(...) is called) to NOT enable dimensions for generic
custom providers—either remove the .with_send_dimensions(...) call for the
name.starts_with("custom:") branch or set it explicitly false unless the
provider is exactly "openai" or an explicit opt-in flag is present; apply the
same change to the other similar block referenced around lines 97-109 so only
the real "openai" provider (or an explicit opt-in) can enable dimensions.

---

Nitpick comments:
In `@src/openhuman/embeddings/openai.rs`:
- Around line 17-23: The field send_dimensions and its builder method
Self::with_send_dimensions must be renamed to camelCase (e.g., sendDimensions
and withSendDimensions) throughout the module; update the struct field name, all
constructor/builder call sites, method definitions, docs/comments, and any
serde/serialization keys or tests that reference send_dimensions (also in the
other referenced blocks around lines 34-45 and 132-142) so references remain
consistent and compilation passes. Ensure you rename both the field and the
accessor/mutator methods, and adjust any pattern matches or field initializers
that use send_dimensions.

In `@src/openhuman/embeddings/rpc.rs`:
- Line 404: The crate-visible function signature for resolve_api_key should use
camelCase for its parameter; rename the parameter provider_name to providerName
in the fn signature (pub(crate) fn resolve_api_key(config: &Config,
providerName: &str) -> String) and update all internal call sites, pattern
matches, and any usages in this module (and tests) to the new identifier to
satisfy the repo's Rust naming rule; leave the function name and types
unchanged.

In `@src/openhuman/memory_queue/store.rs`:
- Around line 375-410: The old retry_all_failed() duplicates logic and doesn't
clear failure_reason/failure_class—make a single canonical path by having
retry_all_failed delegate to the newer requeue_failed(config) (or remove
retry_all_failed entirely); update retry_all_failed to call requeue_failed and
return its Result<u64>, ensuring the same with_connection/context usage so the
SQL that nulls failure_reason/failure_class (in requeue_failed) is the one
executed for all callers.

In `@src/openhuman/memory_tree/score/embed/factory.rs`:
- Around line 113-129: The enum EmbedderChoice and adjacent bindings use
snake_case for local fields (e.g., timeout_ms, tree_cfg) which violates the
repo's Rust naming rule; rename these bindings to camelCase (timeoutMs, treeCfg)
and similarly convert any newly added enum field names and pattern bindings in
the same block to camelCase, then update all places that construct, match on, or
reference EmbedderChoice variants (constructors, matches, method params) so
identifiers are consistent across the module (refer to EmbedderChoice, the
Ollama variant fields, and any usages around the 135-168 changes).

In `@src/openhuman/memory_tree/score/embed/openai_compat.rs`:
- Around line 55-77: The new locals api_key and custom_endpoint in
try_from_config violate the repo's camelCase Rust naming rule; rename these
bindings to apiKey and customEndpoint and update their uses in the call to
create_embedding_provider_with_credentials (and any other references) so the
function parameters and local variable names match (e.g., replace api_key ->
apiKey and custom_endpoint -> customEndpoint where passed into
create_embedding_provider_with_credentials and any subsequent code).

In `@src/openhuman/memory_tree/tree_runtime/cli.rs`:
- Around line 694-707: The test currently accepts any Err from
run_rebuild/run_summarize which is too weak because create_provider() fails
early; change it to assert that run_rebuild(&["ns".to_string()]) returns an Err
whose string contains the specific "no summarization provider" message (and
remove the redundant run_summarize check in this test), so the test verifies the
new no-provider failure path from create_provider() rather than allowing
arbitrary failures; locate assertions around run_rebuild and run_summarize in
this block and update the assertion to check for "no summarization provider" on
run_rebuild and delete the duplicate run_summarize verification.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 54d34655-80dd-419f-b766-575463d5e693

📥 Commits

Reviewing files that changed from the base of the PR and between 4b26267 and 597f2dd.

📒 Files selected for processing (57)
  • app/src/components/intelligence/MemoryTreeStatusPanel.test.tsx
  • app/src/components/intelligence/MemoryTreeStatusPanel.tsx
  • app/src/lib/i18n/ar.ts
  • app/src/lib/i18n/bn.ts
  • app/src/lib/i18n/de.ts
  • app/src/lib/i18n/en.ts
  • app/src/lib/i18n/es.ts
  • app/src/lib/i18n/fr.ts
  • app/src/lib/i18n/hi.ts
  • app/src/lib/i18n/id.ts
  • app/src/lib/i18n/it.ts
  • app/src/lib/i18n/ko.ts
  • app/src/lib/i18n/pl.ts
  • app/src/lib/i18n/pt.ts
  • app/src/lib/i18n/ru.ts
  • app/src/lib/i18n/zh-CN.ts
  • app/src/utils/tauriCommands/memoryTree.ts
  • src/openhuman/about_app/catalog_data.rs
  • src/openhuman/config/schema/load.rs
  • src/openhuman/config/schema/storage_memory.rs
  • src/openhuman/embeddings/factory.rs
  • src/openhuman/embeddings/mod.rs
  • src/openhuman/embeddings/openai.rs
  • src/openhuman/embeddings/openai_tests.rs
  • src/openhuman/embeddings/rpc.rs
  • src/openhuman/memory/ingest_pipeline.rs
  • src/openhuman/memory/schema.rs
  • src/openhuman/memory/tools.rs
  • src/openhuman/memory/tools/doctor.rs
  • src/openhuman/memory_queue/handlers/mod.rs
  • src/openhuman/memory_queue/handlers/mod_tests.rs
  • src/openhuman/memory_queue/store.rs
  • src/openhuman/memory_queue/types.rs
  • src/openhuman/memory_queue/worker.rs
  • src/openhuman/memory_store/chunks/connection.rs
  • src/openhuman/memory_store/chunks/store.rs
  • src/openhuman/memory_store/chunks/store_tests.rs
  • src/openhuman/memory_tree/health/doctor.rs
  • src/openhuman/memory_tree/health/mod.rs
  • src/openhuman/memory_tree/mod.rs
  • src/openhuman/memory_tree/retrieval/integration_tests.rs
  • src/openhuman/memory_tree/score/embed/factory.rs
  • src/openhuman/memory_tree/score/embed/mod.rs
  • src/openhuman/memory_tree/score/embed/openai_compat.rs
  • src/openhuman/memory_tree/score/extract/llm.rs
  • src/openhuman/memory_tree/score/extract/llm_tests.rs
  • src/openhuman/memory_tree/tree/bucket_seal.rs
  • src/openhuman/memory_tree/tree/bucket_seal_tests.rs
  • src/openhuman/memory_tree/tree/rpc.rs
  • src/openhuman/memory_tree/tree_runtime/cli.rs
  • src/openhuman/memory_tree/tree_runtime/engine.rs
  • src/openhuman/memory_tree/tree_runtime/engine_tests.rs
  • src/openhuman/memory_tree/tree_runtime/ops.rs
  • src/openhuman/tools/ops.rs
  • tests/memory_tree_memory_round23_raw_coverage_e2e.rs
  • tests/memory_tree_summarizer_e2e.rs
  • tests/memory_tree_sync_raw_coverage_e2e.rs

Comment on lines +271 to +276
<div
className="rounded-lg border border-amber-200 dark:border-amber-500/30 bg-amber-50 dark:bg-amber-500/10 px-3 py-2 text-sm text-amber-800 dark:text-amber-200"
data-testid="memory-tree-blocking-cause">
<div className="font-medium" data-testid="memory-tree-blocking-cause-remediation">
{t(blockingCause.remediation_key)}
</div>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a fallback for backend-driven remediation keys.

t(blockingCause.remediation_key) has no safety net, so a new/unknown key will leak the raw memory.health.remediation.* identifier into the banner. This panel already has status.reason and t() supports a fallback, so it should degrade gracefully here.

💡 Minimal fix
-            {t(blockingCause.remediation_key)}
+            {t(
+              blockingCause.remediation_key,
+              status?.reason ?? t('memoryTree.status.statusDegraded')
+            )}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@app/src/components/intelligence/MemoryTreeStatusPanel.tsx` around lines 271 -
276, MemoryTreeStatusPanel is rendering t(blockingCause.remediation_key) with no
fallback so unknown backend keys leak raw identifiers; change the call to pass a
fallback using the existing status.reason so the banner degrades gracefully
(e.g., use i18n fallback/defaultValue behavior of t with
blockingCause.remediation_key and status.reason as the fallback) and ensure you
update the element that references blockingCause.remediation_key to use that
fallback.

Comment thread app/src/lib/i18n/ar.ts Outdated
Comment thread app/src/lib/i18n/ko.ts Outdated
Comment thread app/src/lib/i18n/zh-CN.ts Outdated
Comment thread app/src/utils/tauriCommands/memoryTree.ts Outdated
Comment on lines +152 to +163
Some(extracted) => {
// #002 (T013): a completed extraction that yielded
// structure means the extraction model is working —
// clear any prior "structure degraded" flag so the
// status/doctor surface recovers. (An empty-but-valid
// result, e.g. genuinely entity-free text, is left
// alone — it isn't evidence the model is broken.)
if !extracted.entities.is_empty() || !extracted.topics.is_empty() {
crate::openhuman::memory_tree::health::clear_structure_degraded();
}
return Ok(extracted);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Differentiate parse-fallback empties from legitimate empty extractions.

try_extract() returns Some(ExtractedEntities::default()) for wrong-shape/non-JSON responses, so this branch currently treats those failures as a normal empty result and leaves structure_degraded unset. That lets a broken extractor keep producing empty wiki structure while pipeline_status still looks healthy. Consider returning a richer result from try_extract() so parse-fallback empties can carry a typed degradation cause instead of being merged with genuinely entity-free text.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/score/extract/llm.rs` around lines 152 - 163, The
branch treating Some(extracted) as always benign conflates genuine empty
extractions with parse-fallback empties returned as
Some(ExtractedEntities::default()) from try_extract(); update try_extract() to
return a richer enum/result (e.g. Result<Option<ExtractedEntities>,
ExtractParseFallback> or Option<ExtractedEntitiesWithCause>) so parse-fallback
cases carry a typed degradation cause, then change the handling in the
Some(extracted) branch of the caller (the block referencing
ExtractedEntities::default(), try_extract(), and clear_structure_degraded()) to
detect parse-fallback empties and set structure-degraded (via
crate::openhuman::memory_tree::health::clear_structure_degraded()/the inverse
setter) or return an error instead of treating them as normal empty-but-valid
results so pipeline_status accurately reflects extractor parse failures.

Comment on lines +164 to 177
if !failed.is_empty() {
log::warn!(
"[tree_summarizer] partial summarization for '{namespace}': {propagated} node(s) \
propagated, {} failed ({:?})",
failed.len(),
failed
);
}

// All hour leaves are durably written and propagation is complete.
// Now it's safe to delete the buffer entries.
// Hour leaves are durably written and propagation is complete (modulo the
// per-node failures above, which are recoverable on the next run). Safe to
// delete the buffer entries — re-running would re-derive the same leaves.
store::buffer_delete(config, namespace, &buffer_filenames)
.context("delete buffer entries after successful summarization")?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't clear the only automatic retry signal after partial propagation failure.

When failed is non-empty, Lines 176-177 still delete the buffer. run_hourly_loop() only revisits namespaces discovered from tree/buffer, so a transient day/month/year/root failure is never retried unless new content arrives or someone runs a manual rebuild. That turns these “continue and retry later” failures into sticky degradation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree_runtime/engine.rs` around lines 164 - 177, The
code deletes buffer entries unconditionally even when per-node propagation
failed (variable failed is non-empty), which prevents automatic retries in
run_hourly_loop(); change the logic around store::buffer_delete so it only runs
when failed.is_empty() (i.e., full success), and otherwise leave the buffer
entries intact and emit a clear warn/info (include namespace and
buffer_filenames) so the namespace remains discoverable for automatic retry by
run_hourly_loop(); references: failed, store::buffer_delete, buffer_filenames,
namespace, run_hourly_loop, and the tree_summarizer log block.

Comment on lines +206 to +220
pub fn summarizer_available(config: &Config) -> (bool, &'static str) {
let local = config.local_ai.runtime_enabled;
match create_provider(config) {
Ok(_) if local => (
true,
"local AI enabled — Build Summary Trees runs on the local model",
),
Ok(_) => (
true,
"local AI off — Build Summary Trees runs on the configured cloud provider",
),
Err(_) => (
false,
"no summarization provider available — enable local AI or configure a cloud provider in Settings → AI",
),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't collapse all summarizer failures into one generic note.

summarizer_available() currently turns both “cloud opt-in is off” and “cloud provider couldn't be built” into the same message. Because doctor/status reuse this string, the remediation can point users at configuring a provider even when the real fix is enabling the opt-in toggle. Please preserve the create_provider() error detail here, or at least split the opt-in-missing case from provider-construction failures.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree_runtime/ops.rs` around lines 206 - 220, The
summarizer_available function collapses distinct failure modes into one message;
update summarizer_available (and its match on create_provider) to differentiate
"cloud opt-in is off" from actual provider construction errors by checking
create_provider() result and, on Err(e), include or surface the error details
(or return a distinct message) instead of the generic provider-not-configured
string; use the function name summarizer_available and the create_provider()
error value to build two separate return branches so callers (doctor/status) can
show opt-in vs provider-construction remediation accurately.

Comment on lines +49 to 50
use crate::openhuman::memory_tree::score::embed::build_write_embedder;
use crate::openhuman::memory_tree::score::extract::EntityExtractor;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Restore the summary-side dimension guard.

handle_extract() still calls pack_checked() before persisting embeddings, but this seal path now accepts any Ok(v) from the embedder directly. If a provider returns the wrong dimensionality, the failure either shows up later as an untyped store error or slips into the summary sidecar instead of surfacing the intended EmbeddingDimMismatch.

Suggested fix
-use crate::openhuman::memory_tree::score::embed::build_write_embedder;
+use crate::openhuman::memory_tree::score::embed::{build_write_embedder, pack_checked};
@@
         Some(embedder) => {
             let v = match embedder.embed(&embed_input).await {
                 Ok(v) => v,
                 Err(e) => {
@@
                     )));
                 }
             };
+            pack_checked(&v)
+                .map_err(|e| {
+                    anyhow::Error::new(
+                        crate::openhuman::memory_tree::health::PipelineFailure::new(
+                            crate::openhuman::memory_tree::health::FailureCode::EmbeddingDimMismatch,
+                        )
+                        .with_detail(e.to_string()),
+                    )
+                })
+                .context(format!(
+                    "validate summary embedding dims tree_id={} level={}",
+                    tree.id, level
+                ))?;
             log::debug!(
                 "[tree::bucket_seal] embedded summary tree_id={} level={}→{} bytes={} provider={}",
                 tree.id,

Also applies to: 463-500

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory_tree/tree/bucket_seal.rs` around lines 49 - 50, The seal
path in bucket_seal.rs currently accepts any Ok(v) from the embedder and can
store wrong-dimension embeddings into the summary sidecar; restore the
summary-side dimension guard by validating returned embedding length against the
expected dimension before persisting: in the seal handling code (the same logic
area as handle_extract() / pack_checked()), call the dimension-check routine
used by pack_checked() or replicate its check to compare v.len() to the expected
embedding_dim and return the EmbeddingDimMismatch error on mismatch instead of
accepting Ok(v), ensuring only correctly-dimensioned embeddings are packed and
written to the summary sidecar.

Comment on lines +875 to +880
FieldSchema {
name: "first_blocking_cause",
ty: TypeSchema::Json,
comment: "Typed { code, class, remediation_key, detail? } of the first \
non-ok stage; null when healthy.",
required: false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make doctor.first_blocking_cause explicitly nullable in the schema.

The comment says this field is null when healthy, but the schema advertises plain Json with required: false. pipeline_status.first_blocking_cause already uses TypeSchema::Option(...), so this currently misdescribes healthy doctor responses.

Suggested fix
                 FieldSchema {
                     name: "first_blocking_cause",
-                    ty: TypeSchema::Json,
+                    ty: TypeSchema::Option(Box::new(TypeSchema::Json)),
                     comment: "Typed { code, class, remediation_key, detail? } of the first \
                               non-ok stage; null when healthy.",
                     required: false,
                 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/memory/schema.rs` around lines 875 - 880, The schema for the
FieldSchema named "first_blocking_cause" currently uses TypeSchema::Json but
should be explicitly nullable like pipeline_status.first_blocking_cause; change
its type to TypeSchema::Option(Box::new(TypeSchema::Json)) (keeping the comment
and required: false) so the doctor.first_blocking_cause is modeled as an
optional/null JSON value.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zahica1234 both items from my last review are addressed cleanly.

The cloud summarizer privacy issue is fixed properly — cloud_summarization_opt_in: bool defaults to false, the runtime gates the cloud path on that flag, and all 14 locale remediation strings now point users to the opt-in setting. That's the right approach. The extraction_coverage: Option<f32> change is also clean — using .ok() instead of .unwrap_or(0.0) preserves the "measurement failed" vs "genuine zero" distinction all the way through the doctor report.

sanil-23's two items are also handled: with_send_dimensions is now gated on model_supports_dimensions() (only text-embedding-3-* gets the field), and build_embedder_from_config / build_write_embedder share resolve_embedder_choice — the silent-drift risk is gone.

Two CI checks are still red:

  • E2E Playwright lane 1
  • Rust Core Coverage (which also causes the Coverage Gate to skip)

Once those are green i'll approve. let me know if the failures are related to the opt-in behavior changes or if they're pre-existing.

…sent

FR-002's skip-not-zero-vector path now calls mark_semantic_recall_degraded
when there's no embeddings provider (correct production behaviour). On a
fresh workspace this left a process-global flag set that other tests
(holding test_guard()) could read as 'degraded' — specifically both
pipeline_status_returns_idle_for_empty_store and
pipeline_status_reports_chunk_aggregates_after_ingest were susceptible
to the race under the Coverage runner's single-process parallelism.

The semantic fix: 'degraded' is only meaningful when there is actual
content whose recall/structure has been affected. An empty workspace
with a misconfigured embedder should read 'idle' (nothing has been
embedded yet) — the Doctor tool still reports the config gap via its
dedicated 'embeddings' stage, which is the right surface for pre-ingest
diagnosis. Changing the condition from `degraded.is_degraded()` to
`degraded.is_degraded() && total_chunks > 0` eliminates the false
'degraded' status for fresh workspaces and fully resolves the test race.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zahica1234 zahica1234 force-pushed the pr/memory-pipeline-hardening branch from 7b63983 to c382a8d Compare June 1, 2026 19:15
@zahica1234
Copy link
Copy Markdown
Contributor Author

Thanks for the review confirmation. Addressed both CI failures — commit c382a8d1, then merged upstream/main (16dc1730).


Rust Core Coverage — root cause and fix

This was causally linked to FR-002. Before our change, build_write_embedder with no provider returned Some(InertEmbedder) — no degraded flag set. After FR-002, it returns None + calls mark_semantic_recall_degraded. The ingest_chat_writes_and_queue_drains_to_admitted_chunk test (which uses a no-provider config) now sets the process-global flag during drain_until_idle. Under the Coverage runner's single-process parallelism, this can race with the pipeline_status_returns_idle_for_empty_store test that holds test_guard() and expects status = "idle" — the contaminated flag flips it to "degraded".

Fix: change derive_pipeline_status to only emit "degraded" when total_chunks > 0. An empty workspace with a misconfigured embedder has nothing to degrade — "idle" is the correct status (the Doctor tool's "embeddings" stage still flags the config gap). This is also the right production semantic: the degraded banner in the UI should only fire when there's actual content whose recall has been affected. With no chunks, there's nothing to warn about yet.

Both pipeline_status_returns_idle_for_empty_store and pipeline_status_reports_chunk_aggregates_after_ingest pass in isolation and will now also pass under the Coverage runner's parallel execution.


Playwright lane 1 — unrelated infrastructure failure

50+ tests fail with ~240ms timeouts in shard 1/4; shards 2/3/4 all pass. This is the app binary failing to start in that CI shard — a resource/port contention issue at the runner level, not a test regression from our diff. Our changes have no Playwright-facing behavior (no new routes, no new frontend state, no new RPC in the frontend).

Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the rebase and this round of fixes — solid progress. The two lanes that were red on my last pass, Vitest and Rust E2E (mock backend), are both green now, and graycyrus's two items look properly handled: cloud_summarization_opt_in: bool defaulting to false with the runtime gating the cloud path on it, and extraction_coverage: Option<f32> using .ok() so a measurement failure stays distinct from a genuine 0%.

I'm holding a formal approval for two reasons.

1. CI isn't fully green yet. Two jobs are still red:

  • Rust Core Coverage — your read that this is a pre-existing flake in an unrelated *_raw_coverage_e2e.rs suite is plausible (the failing test rotates and sits in domains this PR doesn't touch), but I'd want a green re-run to confirm rather than waive it.
  • E2E (Playwright lane 1/4) — that whole lane is red across specs unrelated to memory (command palette, composio triggers, gmail connector, even "baseline app shell is ready"), while lanes 2-4 pass. That reads like a lane-level infra flake rather than your diff, but a re-run would settle it.

2. One item from my last pass is still open — in embeddings/factory.rs, with_send_dimensions(true) is now applied to the custom and custom: providers, so the request body sends "dimensions" whenever send_dimensions && dims > 0. The method's own doc says to only call it for genuine OpenAI / OpenAI-compatible endpoints that implement the parameter — but custom: is exactly the arbitrary-endpoint path (vLLM, text-embeddings-inference, stricter LocalAI builds), and the strict ones will 400 on the unknown dimensions field, breaking every embed for those users. The 3-large@1024 requirement is OpenAI-specific, so gate the flag on the model id (text-embedding-3-*) rather than the whole custom provider class, or keep it opt-in for custom. Replied on the existing thread.

CodeRabbit also still has several open majors here worth clearing. Once CI is green and those are resolved, ping me and I'll do the full pass and approve.

@senamakel senamakel self-assigned this Jun 2, 2026
senamakel added 5 commits June 2, 2026 02:21
# Conflicts:
#	app/src/lib/i18n/ar.ts
#	app/src/lib/i18n/bn.ts
#	app/src/lib/i18n/de.ts
#	app/src/lib/i18n/es.ts
#	app/src/lib/i18n/fr.ts
#	app/src/lib/i18n/hi.ts
#	app/src/lib/i18n/id.ts
#	app/src/lib/i18n/it.ts
#	app/src/lib/i18n/ko.ts
#	app/src/lib/i18n/pl.ts
#	app/src/lib/i18n/pt.ts
#	app/src/lib/i18n/ru.ts
#	app/src/lib/i18n/zh-CN.ts
#	src/openhuman/about_app/catalog_data.rs
…cales

Replace English "Embeddings" with each locale's existing translation in
the Settings → AI → Embeddings navigation path within memory health
remediation strings. Addresses @coderabbitai on ar.ts, ko.ts, zh-CN.ts.
…age type

- MemoryTreeStatusPanel: fall back to raw key if t() returns empty for an
  unknown remediation_key (addresses @coderabbitai on MemoryTreeStatusPanel.tsx:276)
- memoryTree.ts: type extraction_coverage as `number | null` to distinguish
  DB-read failures from genuine 0% (addresses @coderabbitai on memoryTree.ts:815)
…MARIZATION

Accept on/off in addition to true/false/1/0/yes/no, consistent with
other boolean env overrides. Addresses @coderabbitai on load.rs:1986.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Net-new user-facing capability or product behavior. memory Memory store, memory tree, recall, summarization, and embeddings in src/openhuman/memory/. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants