feat: add automated model refresh and recall tooling#330
Conversation
|
very chunky pr. hope this is up to par. |
|
@WXBR this is a big one (+1236). The scope is right — automated sleep model refresh is something I've wanted natively — but the 3.13 runner can't download BAAI/bge-small-en-v1.5 and 48 tests fail as a result. All failures trace to the same model download infra issue, not code quality. The PR itself looks structurally sound. Holding for now. Need to fix the model provider injection path before this can land. That's my side — I'll get it sorted and come back to this. What I like: The architecture is clean. Validation pipeline, deterministic accept/reject, no free-blob nonsense. That's exactly how I'd want it wired. Thanks for the substantial work. I will circle back on this one. |
23a685c to
6f3d3a2
Compare
|
Follow-up note on what changed in the latest push: I rebased this branch onto current The new commit adds three main pieces:
I also replaced the previously stubbed Validation after the final push: GitHub Actions is green across docs, build, and Python 3.10 through 3.13. |
|
I checked the latest head (
Both seem worth covering with tests before merge. @AxDSan fyi |
|
@dplush, both concerns land:
@WXBR, address those two items and we merge. The scope of this PR is exactly what the platform needs. Will wait for the fixes, then merge on green CI. Also, to be explicit on the merge order: #338 just landed and touched both provider init files. Rebase onto current main before pushing the fixes so we get a clean diff. |
|
Got it👍 |
9f4c703 to
1437f6c
Compare
|
Addressed the two maintainer-requested items and rebased onto current Latest head: Changes in this push:
Local validation: GitHub Actions has started on the rebased head and is currently pending. |
|
CI is now green on |
Closes #323. Adds parity tests between hermes_memory_provider and mnemosyne_hermes to prevent behavior drift. +717/-17, almost entirely test code (low risk of behavioral regression). Catches the class of bugs already hit: dunder-tool vs canonical-schema path inconsistency, import shadowing. The parity coverage should land before the larger PRs (#353, #330) that touch the same provider files, so any future drift is caught by the new tests.
|
Reviewed head
The rest of the design is also solid:
Need a rebase though. The other six PRs in the queue landed today (#347, #349, #348, #353, #352, #354) and the provider init files, tools, and test_provider_all_15_tools.py now have conflicts with your branch. Can you rebase on current main and push? Once the rebase lands cleanly with CI green, we merge immediately. Merge order matters here: #330 was the last in the queue by design because of the 23-file surface, and now everything smaller has shipped first so the diff against main is just the #330 delta plus the new merge resolution. |
1437f6c to
caa8012
Compare
|
Rebased onto current What changed in the rebase resolution:
Local validation after the final push: GitHub Actions is green on |
AxDSan
left a comment
There was a problem hiding this comment.
Sorry for the delay, was IRL busy the last few days.
@WXBR 2066 lines, 26 files, opt-in by default. The shape is right (env-gated new tools, CanonicalStore owner scoping, dead-stub orchestrator replacement, model-refresh safety rails with MNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=false as the emergency brake). Tool count math checks out: 33 -> 37 with the four new tools (mnemosyne_model_card, mnemosyne_model_refresh, mnemosyne_recall_diagnostics, mnemosyne_task_progress). Tests are substantial, including the 235-line test_model_refresh_stress.py covering apply/reject paths, owner namespace, cron suppression, and conflict supersession.
That said, I need changes before merge. There is one behavior change buried in here that ships silently unless I call it out, plus two process gaps.
Critical: sync_roles default flip is undocumented
hermes_memory_provider/__init__.py and integrations/hermes/src/mnemosyne_hermes/__init__.py both change self._sync_roles default from {"user","assistant"} to {"user"}. The config schema default in sync_roles.default flips the same way. test_sync_roles.py and test_hermes_provider_parity.py were updated to match.
This breaks any deployment that relied on the previous default for assistant-turn autosave. They will not see a loud error; they will see assistant turns silently stop landing in Mnemosyne on upgrade. That is the worst kind of change: invisible, no signal until a user notices their L3 persona stopped updating, and impossible to diagnose from outside.
The PR body does not mention this flip. CHANGELOG has no entry. Three things I need before merge:
- Call the flip out in the PR description under a "Behavior changes" heading. State the old default, the new default, and the migration path for existing deployments (
memory.mnemosyne.sync_roles: ["user", "assistant"]inconfig.yaml). - Add a CHANGELOG entry under
[Unreleased]that names the flip and points to the migration config. - Bump the version to a clear target. Right now
mnemosyne/__init__.pyis on3.10.1(shipped via #373) with a stale duplicate3.9.0line (pre-existing, not yours, but worth cleaning up while you are in there). A behavior change like this warrants MINOR at minimum, so3.11.0.
Warnings (not blockers, but worth knowing)
- Hard-coded provider re-init paths in the new tool handlers.
_handle_model_card,_handle_task_progress,_handle_recall_diagnosticseach repeat thegetattr(self._beam, "canonical", None)+ lazyCanonicalStore(...)+self._beam.canonical = storepattern. Worth extracting a small_get_canonical_store()helper on the provider base so the next tool does not copy-paste it. mnemosyne.core.local_llmimport insidemodel_refresh.infer_model_update_proposalsuses internal helpers_try_host_llmand_call_remote_llm. The docstring acknowledges "internal sibling API used by sleep too". This coupling should be promoted to a documented public seam if not already, otherwise a future refactor inlocal_llmsilently breaks model-refresh and the failure mode is non-obvious._FACT_MATCH_STOPWORDSseeds fromMNEMOSYNE_RECALL_EXTRA_STOPWORDSat import time. Operators who change the env var need a process restart to pick it up. Not a regression, just worth a sentence in the PR body so the limitation is on the record.
Looks good
mnemosyne_recall_diagnosticsis properly gated; default returns{"status": "disabled"}with no snapshot leak.mnemosyne_model_refreshis correctly diagnostic-only: schema'saction.enumis hard-coded to["list"], andtest_model_refresh_tool_is_diagnostic_onlycovers it.mnemosyne_task_progressuses CanonicalStore owner scoping end-to-end (round-trip, profile-isolation, clear validation all covered bytest_task_progress_tool.py).- Orchestrator replacement preserves the old
from mnemosyne.core.orchestrator import ...signature and returns[]on exception instead of raising. Strict improvement over the dead stub. - Model-refresh safety rails (
_EPHEMERAL_RE, min-confidence + min-evidence gates, evidence-IDs subset check) are well-designed. TheMNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=falseemergency brake is documented in the right place. test_query_intent_recall.pycorrectly verifies both the opt-in path and the explicit-weight override (theMNEMOSYNE_QUERY_INTENTbranch only fires whenvec_weight,fts_weight,importance_weightare allNone).canonical_owner_idwiring is consistent across both provider inits and survives standalone use (defaults to"default"/"primary").
Why this is request-changes, not approve
The sync_roles flip is not a "feel like calling it out" nit. It is a silent behavior change for every deployment that did not override the default. We cannot ship that without telling the user, recording it in the changelog, and bumping the version. Once those three are in, this is a fast approve. I want to land it.
Push the changes and I will re-review same day.
caa8012 to
0ee9df6
Compare
|
Addressed the requested release-hygiene fixes on latest head
Validation: |
Summary
This PR adds automated sleep-time model refresh plus opt-in observability/continuity surfaces for Mnemosyne:
mnemosyne_model_cardandmnemosyne_model_refreshfor canonical model-card inspection and model-refresh diagnosticsMNEMOSYNE_QUERY_INTENT=1gated query-intent weight adjustment inBeamMemory.recall()mnemosyne_recall_diagnosticsfor recall tier/fallback counters, gated byMNEMOSYNE_RECALL_DIAGNOSTICS=1mnemosyne_task_progress, an owner-scoped canonicaltask:progresstool for curated “where did we leave off?” statemnemosyne.core.orchestratorstub with a thin compatibility wrapper overBeamMemory.recall()Behavior changes
sync_rolesnow defaults to user-turn autosave only. The old default was["user", "assistant"]; the new default is["user"].config.yaml:3.11.0because the default autosave behavior changes silently for deployments that did not already overridesync_roles.Notes
mnemosyne_recall_diagnosticsreturns disabled unless explicitly enabled via env var.mnemosyne_model_refreshis diagnostic-only; the schema exposesaction: "list"only.mnemosyne_task_progressuses the existingCanonicalStoreowner scoping instead of raw global keys.MNEMOSYNE_RECALL_EXTRA_STOPWORDSis read at import time through_FACT_MATCH_STOPWORDS, so changing that env var requires a process restart to affect recall stopword matching.Verification
Ran locally after rebasing on current
upstream/mainand applying the release-hygiene fixes:Also checked for duplicate upstream issues/PRs by exact symbols earlier in the PR:
mnemosyne_recall_diagnosticsmnemosyne_task_progressMNEMOSYNE_QUERY_INTENT adjust_weightsorchestrate_recall BeamMemoryNo matching issues/PRs found.