feat: add automated model refresh and recall tooling by WXBR · Pull Request #330 · AxDSan/mnemosyne

WXBR · 2026-06-16T08:13:08Z

Summary

This PR adds automated sleep-time model refresh plus opt-in observability/continuity surfaces for Mnemosyne:

add sleep-time model update inference, validation, deterministic auto-apply/reject, and diagnostic listing
add mnemosyne_model_card and mnemosyne_model_refresh for canonical model-card inspection and model-refresh diagnostics
add MNEMOSYNE_QUERY_INTENT=1 gated query-intent weight adjustment in BeamMemory.recall()
expose mnemosyne_recall_diagnostics for recall tier/fallback counters, gated by MNEMOSYNE_RECALL_DIAGNOSTICS=1
add mnemosyne_task_progress, an owner-scoped canonical task:progress tool for curated “where did we leave off?” state
replace the dead mnemosyne.core.orchestrator stub with a thin compatibility wrapper over BeamMemory.recall()

Behavior changes

Hermes sync_roles now defaults to user-turn autosave only. The old default was ["user", "assistant"]; the new default is ["user"].
Existing deployments that want to keep assistant-turn autosave should set this explicitly in config.yaml:

memory:
  mnemosyne:
    sync_roles: ["user", "assistant"]

This changes the release target to 3.11.0 because the default autosave behavior changes silently for deployments that did not already override sync_roles.

Notes

mnemosyne_recall_diagnostics returns disabled unless explicitly enabled via env var.
mnemosyne_model_refresh is diagnostic-only; the schema exposes action: "list" only.
mnemosyne_task_progress uses the existing CanonicalStore owner scoping instead of raw global keys.
Normal recall remains retrieval-only. No LLM reranking or new network dependency is added.
Query-intent weighting is opt-in and explicit caller weights still win.
MNEMOSYNE_RECALL_EXTRA_STOPWORDS is read at import time through _FACT_MATCH_STOPWORDS, so changing that env var requires a process restart to affect recall stopword matching.

Verification

Ran locally after rebasing on current upstream/main and applying the release-hygiene fixes:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
# 1712 passed, 28 skipped in 34.05s

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
# 22 passed in 0.20s

python3 -m py_compile mnemosyne/__init__.py mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py integrations/hermes/src/mnemosyne_hermes/install.py
# passed

git diff --check
# passed

PYTHONPATH=. python3 - <<'PY'
import mnemosyne
assert mnemosyne.__version__ == '3.11.0', mnemosyne.__version__
print('version', mnemosyne.__version__)
PY
# version 3.11.0

Also checked for duplicate upstream issues/PRs by exact symbols earlier in the PR:

mnemosyne_recall_diagnostics
mnemosyne_task_progress
MNEMOSYNE_QUERY_INTENT adjust_weights
orchestrate_recall BeamMemory

No matching issues/PRs found.

WXBR · 2026-06-16T08:17:34Z

very chunky pr. hope this is up to par.

AxDSan · 2026-06-16T19:06:42Z

@WXBR this is a big one (+1236). The scope is right — automated sleep model refresh is something I've wanted natively — but the 3.13 runner can't download BAAI/bge-small-en-v1.5 and 48 tests fail as a result.

All failures trace to the same model download infra issue, not code quality. The PR itself looks structurally sound.

Holding for now. Need to fix the model provider injection path before this can land. That's my side — I'll get it sorted and come back to this.

What I like: The architecture is clean. Validation pipeline, deterministic accept/reject, no free-blob nonsense. That's exactly how I'd want it wired.

Thanks for the substantial work. I will circle back on this one.

WXBR · 2026-06-17T04:53:59Z

Follow-up note on what changed in the latest push:

I rebased this branch onto current upstream/main, resolved the provider config conflict by keeping the upstream user-only sync_roles default, then added the recall/tooling layer on top of the existing automated model-refresh work.

The new commit adds three main pieces:

MNEMOSYNE_QUERY_INTENT=1 gated query-intent weighting in BeamMemory.recall(). This keeps default recall unchanged, and explicit caller weights still override the intent adjustment.
mnemosyne_recall_diagnostics, gated by MNEMOSYNE_RECALL_DIAGNOSTICS=1, for inspecting recall-tier/fallback counters without making diagnostics globally active.
mnemosyne_task_progress, backed by owner-scoped canonical facts, for curated task-continuity state instead of relying on raw transcript recall.

I also replaced the previously stubbed mnemosyne.core.orchestrator path with a compatibility wrapper over BeamMemory.recall(), and fixed two regressions that showed up only when running the broader suite after the rebase: empty MNEMOSYNE_DATA_DIR fallback behavior under isolated HOME, and double-counting accepted sync events when the remember pipeline succeeded.

Validation after the final push:

python -m pytest tests/ -q -o 'addopts='
1634 passed, 1 warning, 4 subtests passed

cd integrations/hermes && python -m pytest tests/ -q -o 'addopts='
16 passed

GitHub Actions is green across docs, build, and Python 3.10 through 3.13.

dplush · 2026-06-17T06:30:41Z

I checked the latest head (9f4c703). CI is green, but I would still hold this before merge for two safety/scoping fixes:

model-refresh auto-apply currently uses owner_id="default", which can write canonical model facts into the wrong namespace for non-default Hermes profiles
model-refresh inference can still fall through to remote LLM calls instead of respecting disabled/force-local LLM expectations

Both seem worth covering with tests before merge.

@AxDSan fyi

AxDSan · 2026-06-17T19:02:21Z

@dplush, both concerns land:

model-refresh auto-apply must scope owner_id properly so non-default Hermes profiles do not get canonical facts written into the wrong namespace
model-refresh auto-apply must skip when triggered from cron

@WXBR, address those two items and we merge. The scope of this PR is exactly what the platform needs. Will wait for the fixes, then merge on green CI.

Also, to be explicit on the merge order: #338 just landed and touched both provider init files. Rebase onto current main before pushing the fixes so we get a clean diff.

WXBR · 2026-06-17T23:31:18Z

Got it👍

WXBR · 2026-06-17T23:38:55Z

Addressed the two maintainer-requested items and rebased onto current upstream/main (v3.9.0).

Latest head: 1437f6c

Changes in this push:

Model-refresh auto-apply now writes canonical model facts through the active Beam/runtime owner namespace instead of hardcoding owner_id="default".
- Hermes providers attach beam.canonical_owner_id = self._canonical_owner() after initialization.
- Explicit canonical tools and sleep-time model-refresh auto-apply now target the same owner namespace for non-default profiles.
Sleep-time model refresh now skips in cron context.
- Providers attach beam.agent_context = self._agent_context.
- BeamMemory.sleep() suppresses model-refresh inference/proposal/auto-apply when agent_context == "cron", so cron maintenance sleep cannot mutate canonical model slots.
Added regression coverage:
- auto-apply uses a non-default Beam owner namespace and does not write into default
- cron-context sleep does not call model-refresh inference and creates no canonical model facts
- active Hermes adapter initializes Beam with canonical owner + agent context

Local validation:

python -m pytest tests/test_model_refresh_stress.py integrations/hermes/tests/test_canonical_tools.py -q -o 'addopts='
11 passed

python -m pytest tests/ -q -o 'addopts='
1652 passed, 1 warning, 4 subtests passed

cd integrations/hermes && python -m pytest tests/ -q -o 'addopts='
17 passed

python -m py_compile mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py
# passed

git diff --check
# passed

GitHub Actions has started on the rebased head and is currently pending.

WXBR · 2026-06-17T23:41:36Z

CI is now green on 1437f6c: docs-check, build, and Python 3.10/3.11/3.12/3.13 all passed.

Closes #323. Adds parity tests between hermes_memory_provider and mnemosyne_hermes to prevent behavior drift. +717/-17, almost entirely test code (low risk of behavioral regression). Catches the class of bugs already hit: dunder-tool vs canonical-schema path inconsistency, import shadowing. The parity coverage should land before the larger PRs (#353, #330) that touch the same provider files, so any future drift is caught by the new tests.

AxDSan · 2026-06-19T03:44:41Z

Reviewed head 1437f6c. Both dplush concerns are addressed cleanly:

owner_id scoping: hermes_memory_provider/__init__.py:1607 sets self._beam.canonical_owner_id = self._canonical_owner() after init, and mnemosyne/core/beam.py:7910 reads self.canonical_owner_id for the auto-apply call. No more hardcoded "default" for non-default profiles.
cron guard: mnemosyne/core/beam.py:7911 skips infer_model_update_proposals when agent_context == "cron". Auto-apply never fires in cron context.

The rest of the design is also solid:

MNEMOSYNE_QUERY_INTENT=1 gates the query-intent weighting, opt-in only
MNEMOSYNE_RECALL_DIAGNOSTICS=1 gates the new tool surface, opt-in only
MNEMOSYNE_AUTO_REFRESH_MODEL gates the auto-apply itself
the validator pipeline (confidence threshold, evidence count, conflict thresholds) is the right shape

Need a rebase though. The other six PRs in the queue landed today (#347, #349, #348, #353, #352, #354) and the provider init files, tools, and test_provider_all_15_tools.py now have conflicts with your branch. Can you rebase on current main and push? Once the rebase lands cleanly with CI green, we merge immediately.

Merge order matters here: #330 was the last in the queue by design because of the 23-file surface, and now everything smaller has shipped first so the diff against main is just the #330 delta plus the new merge resolution.

WXBR · 2026-06-19T06:02:09Z

Rebased onto current upstream/main and pushed latest head caa8012.

What changed in the rebase resolution:

Resolved the provider/tool conflicts from the landed PR queue.
- Kept the current upstream tool surfaces such as mnemosyne_triple_end, sync tools, persona tools, and installer status handling.
- Preserved this PR's model-refresh/model-card tools.
- Added mnemosyne_recall_diagnostics and mnemosyne_task_progress to the root hermes_memory_provider as well as the pip adapter so provider parity holds.
Preserved the PR's intended user-only autosave default.
- Both provider surfaces now agree on sync_roles defaulting to ["user"].
- Updated the provider parity expectation accordingly.
Fixed the integration installer conflict/regression found during local validation.
- Removed the duplicate enum-style PluginState/recursive plugin_state() path so the dataclass-based install status tests pass.

Local validation after the final push:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/test_hermes_provider_parity.py tests/test_provider_all_15_tools.py tests/test_sync_roles.py tests/test_model_refresh_stress.py integrations/hermes/tests/test_canonical_tools.py integrations/hermes/tests/test_recall_diagnostics_tool.py integrations/hermes/tests/test_task_progress_tool.py -q -o 'addopts='
85 passed

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
1707 passed, 28 skipped

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
22 passed

python3 -m py_compile mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py integrations/hermes/src/mnemosyne_hermes/install.py
# passed

git diff --check
# passed

GitHub Actions is green on caa8012: docs-check, build, and Python 3.10/3.11/3.12/3.13 all passed.

AxDSan

Sorry for the delay, was IRL busy the last few days.

@WXBR 2066 lines, 26 files, opt-in by default. The shape is right (env-gated new tools, CanonicalStore owner scoping, dead-stub orchestrator replacement, model-refresh safety rails with MNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=false as the emergency brake). Tool count math checks out: 33 -> 37 with the four new tools (mnemosyne_model_card, mnemosyne_model_refresh, mnemosyne_recall_diagnostics, mnemosyne_task_progress). Tests are substantial, including the 235-line test_model_refresh_stress.py covering apply/reject paths, owner namespace, cron suppression, and conflict supersession.

That said, I need changes before merge. There is one behavior change buried in here that ships silently unless I call it out, plus two process gaps.

Critical: sync_roles default flip is undocumented

hermes_memory_provider/__init__.py and integrations/hermes/src/mnemosyne_hermes/__init__.py both change self._sync_roles default from {"user","assistant"} to {"user"}. The config schema default in sync_roles.default flips the same way. test_sync_roles.py and test_hermes_provider_parity.py were updated to match.

This breaks any deployment that relied on the previous default for assistant-turn autosave. They will not see a loud error; they will see assistant turns silently stop landing in Mnemosyne on upgrade. That is the worst kind of change: invisible, no signal until a user notices their L3 persona stopped updating, and impossible to diagnose from outside.

The PR body does not mention this flip. CHANGELOG has no entry. Three things I need before merge:

Call the flip out in the PR description under a "Behavior changes" heading. State the old default, the new default, and the migration path for existing deployments (memory.mnemosyne.sync_roles: ["user", "assistant"] in config.yaml).
Add a CHANGELOG entry under [Unreleased] that names the flip and points to the migration config.
Bump the version to a clear target. Right now mnemosyne/__init__.py is on 3.10.1 (shipped via #373) with a stale duplicate 3.9.0 line (pre-existing, not yours, but worth cleaning up while you are in there). A behavior change like this warrants MINOR at minimum, so 3.11.0.

Warnings (not blockers, but worth knowing)

Hard-coded provider re-init paths in the new tool handlers. _handle_model_card, _handle_task_progress, _handle_recall_diagnostics each repeat the getattr(self._beam, "canonical", None) + lazy CanonicalStore(...) + self._beam.canonical = store pattern. Worth extracting a small _get_canonical_store() helper on the provider base so the next tool does not copy-paste it.
mnemosyne.core.local_llm import inside model_refresh.infer_model_update_proposals uses internal helpers _try_host_llm and _call_remote_llm. The docstring acknowledges "internal sibling API used by sleep too". This coupling should be promoted to a documented public seam if not already, otherwise a future refactor in local_llm silently breaks model-refresh and the failure mode is non-obvious.
_FACT_MATCH_STOPWORDS seeds from MNEMOSYNE_RECALL_EXTRA_STOPWORDS at import time. Operators who change the env var need a process restart to pick it up. Not a regression, just worth a sentence in the PR body so the limitation is on the record.

Looks good

mnemosyne_recall_diagnostics is properly gated; default returns {"status": "disabled"} with no snapshot leak.
mnemosyne_model_refresh is correctly diagnostic-only: schema's action.enum is hard-coded to ["list"], and test_model_refresh_tool_is_diagnostic_only covers it.
mnemosyne_task_progress uses CanonicalStore owner scoping end-to-end (round-trip, profile-isolation, clear validation all covered by test_task_progress_tool.py).
Orchestrator replacement preserves the old from mnemosyne.core.orchestrator import ... signature and returns [] on exception instead of raising. Strict improvement over the dead stub.
Model-refresh safety rails (_EPHEMERAL_RE, min-confidence + min-evidence gates, evidence-IDs subset check) are well-designed. The MNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=false emergency brake is documented in the right place.
test_query_intent_recall.py correctly verifies both the opt-in path and the explicit-weight override (the MNEMOSYNE_QUERY_INTENT branch only fires when vec_weight, fts_weight, importance_weight are all None).
canonical_owner_id wiring is consistent across both provider inits and survives standalone use (defaults to "default" / "primary").

Why this is request-changes, not approve

The sync_roles flip is not a "feel like calling it out" nit. It is a silent behavior change for every deployment that did not override the default. We cannot ship that without telling the user, recording it in the changelog, and bumping the version. Once those three are in, this is a fast approve. I want to land it.

Push the changes and I will re-review same day.

WXBR · 2026-06-23T09:36:48Z

Addressed the requested release-hygiene fixes on latest head 0ee9df6:

Added a Behavior changes section to the PR body documenting the sync_roles default flip from ["user", "assistant"] to ["user"] and the migration config for deployments that want assistant-turn autosave.
Added a [Unreleased] CHANGELOG entry with the same behavior change and migration path.
Bumped mnemosyne.__version__ from 3.10.1 to 3.11.0.
Kept the branch rebased on current upstream/main, including the chore: release: v3.10.1 — critical JWT signature verification fix (CVE GHSA-xcw4-53cc-hv32) #373 3.10.1 security release, before applying these changes.

Validation:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
1712 passed, 28 skipped

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
22 passed

python3 -m py_compile ...
passed

git diff --check
passed

GitHub Actions on 0ee9df6
all checks passed: docs-check, build, Python 3.10/3.11/3.12/3.13

AxDSan mentioned this pull request Jun 16, 2026

Add reflection budget and cron guardrails #337

Closed

WXBR force-pushed the feat/automated-sleep-model-refresh branch from 23a685c to 6f3d3a2 Compare June 17, 2026 04:49

WXBR changed the title ~~feat: add automated sleep model refresh~~ feat: add automated model refresh and recall tooling Jun 17, 2026

WXBR force-pushed the feat/automated-sleep-model-refresh branch from 9f4c703 to 1437f6c Compare June 17, 2026 23:38

AxDSan mentioned this pull request Jun 19, 2026

test(hermes): add provider parity coverage #348

Merged

WXBR force-pushed the feat/automated-sleep-model-refresh branch from 1437f6c to caa8012 Compare June 19, 2026 05:58

AxDSan requested changes Jun 23, 2026

View reviewed changes

WXBR added 6 commits June 23, 2026 17:32

feat: add automated sleep model refresh

e6b4cf8

feat: add recall diagnostics and task progress tools

5a6ed8d

docs: explain recall tooling rationale

e4b3652

fix: scope model refresh auto-apply

167713d

fix: resolve provider parity after rebase

991665a

docs: call out sync role default change

0ee9df6

WXBR force-pushed the feat/automated-sleep-model-refresh branch from caa8012 to 0ee9df6 Compare June 23, 2026 09:33

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add automated model refresh and recall tooling#330

feat: add automated model refresh and recall tooling#330
WXBR wants to merge 6 commits into
AxDSan:mainfrom
WXBR:feat/automated-sleep-model-refresh

WXBR commented Jun 16, 2026 •

edited

Loading

Uh oh!

WXBR commented Jun 16, 2026

Uh oh!

AxDSan commented Jun 16, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

dplush commented Jun 17, 2026 •

edited

Loading

Uh oh!

AxDSan commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

AxDSan commented Jun 19, 2026

Uh oh!

WXBR commented Jun 19, 2026

Uh oh!

AxDSan left a comment

Uh oh!

WXBR commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

WXBR commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Behavior changes

Notes

Verification

Uh oh!

WXBR commented Jun 16, 2026

Uh oh!

AxDSan commented Jun 16, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

dplush commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

AxDSan commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

WXBR commented Jun 17, 2026

Uh oh!

AxDSan commented Jun 19, 2026

Uh oh!

WXBR commented Jun 19, 2026

Uh oh!

AxDSan left a comment

Choose a reason for hiding this comment

Critical: sync_roles default flip is undocumented

Warnings (not blockers, but worth knowing)

Looks good

Why this is request-changes, not approve

Uh oh!

WXBR commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

WXBR commented Jun 16, 2026 •

edited

Loading

dplush commented Jun 17, 2026 •

edited

Loading