Skip to content

feat: add automated model refresh and recall tooling#330

Open
WXBR wants to merge 6 commits into
AxDSan:mainfrom
WXBR:feat/automated-sleep-model-refresh
Open

feat: add automated model refresh and recall tooling#330
WXBR wants to merge 6 commits into
AxDSan:mainfrom
WXBR:feat/automated-sleep-model-refresh

Conversation

@WXBR

@WXBR WXBR commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR adds automated sleep-time model refresh plus opt-in observability/continuity surfaces for Mnemosyne:

  • add sleep-time model update inference, validation, deterministic auto-apply/reject, and diagnostic listing
  • add mnemosyne_model_card and mnemosyne_model_refresh for canonical model-card inspection and model-refresh diagnostics
  • add MNEMOSYNE_QUERY_INTENT=1 gated query-intent weight adjustment in BeamMemory.recall()
  • expose mnemosyne_recall_diagnostics for recall tier/fallback counters, gated by MNEMOSYNE_RECALL_DIAGNOSTICS=1
  • add mnemosyne_task_progress, an owner-scoped canonical task:progress tool for curated “where did we leave off?” state
  • replace the dead mnemosyne.core.orchestrator stub with a thin compatibility wrapper over BeamMemory.recall()

Behavior changes

  • Hermes sync_roles now defaults to user-turn autosave only. The old default was ["user", "assistant"]; the new default is ["user"].
  • Existing deployments that want to keep assistant-turn autosave should set this explicitly in config.yaml:
memory:
  mnemosyne:
    sync_roles: ["user", "assistant"]
  • This changes the release target to 3.11.0 because the default autosave behavior changes silently for deployments that did not already override sync_roles.

Notes

  • mnemosyne_recall_diagnostics returns disabled unless explicitly enabled via env var.
  • mnemosyne_model_refresh is diagnostic-only; the schema exposes action: "list" only.
  • mnemosyne_task_progress uses the existing CanonicalStore owner scoping instead of raw global keys.
  • Normal recall remains retrieval-only. No LLM reranking or new network dependency is added.
  • Query-intent weighting is opt-in and explicit caller weights still win.
  • MNEMOSYNE_RECALL_EXTRA_STOPWORDS is read at import time through _FACT_MATCH_STOPWORDS, so changing that env var requires a process restart to affect recall stopword matching.

Verification

Ran locally after rebasing on current upstream/main and applying the release-hygiene fixes:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
# 1712 passed, 28 skipped in 34.05s

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
# 22 passed in 0.20s

python3 -m py_compile mnemosyne/__init__.py mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py integrations/hermes/src/mnemosyne_hermes/install.py
# passed

git diff --check
# passed

PYTHONPATH=. python3 - <<'PY'
import mnemosyne
assert mnemosyne.__version__ == '3.11.0', mnemosyne.__version__
print('version', mnemosyne.__version__)
PY
# version 3.11.0

Also checked for duplicate upstream issues/PRs by exact symbols earlier in the PR:

  • mnemosyne_recall_diagnostics
  • mnemosyne_task_progress
  • MNEMOSYNE_QUERY_INTENT adjust_weights
  • orchestrate_recall BeamMemory

No matching issues/PRs found.

@WXBR

WXBR commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

very chunky pr. hope this is up to par.

@AxDSan

AxDSan commented Jun 16, 2026

Copy link
Copy Markdown
Owner

@WXBR this is a big one (+1236). The scope is right — automated sleep model refresh is something I've wanted natively — but the 3.13 runner can't download BAAI/bge-small-en-v1.5 and 48 tests fail as a result.

All failures trace to the same model download infra issue, not code quality. The PR itself looks structurally sound.

Holding for now. Need to fix the model provider injection path before this can land. That's my side — I'll get it sorted and come back to this.

What I like: The architecture is clean. Validation pipeline, deterministic accept/reject, no free-blob nonsense. That's exactly how I'd want it wired.

Thanks for the substantial work. I will circle back on this one.

@WXBR WXBR force-pushed the feat/automated-sleep-model-refresh branch from 23a685c to 6f3d3a2 Compare June 17, 2026 04:49
@WXBR WXBR changed the title feat: add automated sleep model refresh feat: add automated model refresh and recall tooling Jun 17, 2026
@WXBR

WXBR commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Follow-up note on what changed in the latest push:

I rebased this branch onto current upstream/main, resolved the provider config conflict by keeping the upstream user-only sync_roles default, then added the recall/tooling layer on top of the existing automated model-refresh work.

The new commit adds three main pieces:

  1. MNEMOSYNE_QUERY_INTENT=1 gated query-intent weighting in BeamMemory.recall(). This keeps default recall unchanged, and explicit caller weights still override the intent adjustment.
  2. mnemosyne_recall_diagnostics, gated by MNEMOSYNE_RECALL_DIAGNOSTICS=1, for inspecting recall-tier/fallback counters without making diagnostics globally active.
  3. mnemosyne_task_progress, backed by owner-scoped canonical facts, for curated task-continuity state instead of relying on raw transcript recall.

I also replaced the previously stubbed mnemosyne.core.orchestrator path with a compatibility wrapper over BeamMemory.recall(), and fixed two regressions that showed up only when running the broader suite after the rebase: empty MNEMOSYNE_DATA_DIR fallback behavior under isolated HOME, and double-counting accepted sync events when the remember pipeline succeeded.

Validation after the final push:

python -m pytest tests/ -q -o 'addopts='
1634 passed, 1 warning, 4 subtests passed

cd integrations/hermes && python -m pytest tests/ -q -o 'addopts='
16 passed

GitHub Actions is green across docs, build, and Python 3.10 through 3.13.

@dplush

dplush commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

I checked the latest head (9f4c703). CI is green, but I would still hold this before merge for two safety/scoping fixes:

  • model-refresh auto-apply currently uses owner_id="default", which can write canonical model facts into the wrong namespace for non-default Hermes profiles
  • model-refresh inference can still fall through to remote LLM calls instead of respecting disabled/force-local LLM expectations

Both seem worth covering with tests before merge.

@AxDSan fyi

@AxDSan

AxDSan commented Jun 17, 2026

Copy link
Copy Markdown
Owner

@dplush, both concerns land:

  1. model-refresh auto-apply must scope owner_id properly so non-default Hermes profiles do not get canonical facts written into the wrong namespace
  2. model-refresh auto-apply must skip when triggered from cron

@WXBR, address those two items and we merge. The scope of this PR is exactly what the platform needs. Will wait for the fixes, then merge on green CI.

Also, to be explicit on the merge order: #338 just landed and touched both provider init files. Rebase onto current main before pushing the fixes so we get a clean diff.

@WXBR

WXBR commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Got it👍

@WXBR WXBR force-pushed the feat/automated-sleep-model-refresh branch from 9f4c703 to 1437f6c Compare June 17, 2026 23:38
@WXBR

WXBR commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the two maintainer-requested items and rebased onto current upstream/main (v3.9.0).

Latest head: 1437f6c

Changes in this push:

  1. Model-refresh auto-apply now writes canonical model facts through the active Beam/runtime owner namespace instead of hardcoding owner_id="default".

    • Hermes providers attach beam.canonical_owner_id = self._canonical_owner() after initialization.
    • Explicit canonical tools and sleep-time model-refresh auto-apply now target the same owner namespace for non-default profiles.
  2. Sleep-time model refresh now skips in cron context.

    • Providers attach beam.agent_context = self._agent_context.
    • BeamMemory.sleep() suppresses model-refresh inference/proposal/auto-apply when agent_context == "cron", so cron maintenance sleep cannot mutate canonical model slots.
  3. Added regression coverage:

    • auto-apply uses a non-default Beam owner namespace and does not write into default
    • cron-context sleep does not call model-refresh inference and creates no canonical model facts
    • active Hermes adapter initializes Beam with canonical owner + agent context

Local validation:

python -m pytest tests/test_model_refresh_stress.py integrations/hermes/tests/test_canonical_tools.py -q -o 'addopts='
11 passed

python -m pytest tests/ -q -o 'addopts='
1652 passed, 1 warning, 4 subtests passed

cd integrations/hermes && python -m pytest tests/ -q -o 'addopts='
17 passed

python -m py_compile mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py
# passed

git diff --check
# passed

GitHub Actions has started on the rebased head and is currently pending.

@WXBR

WXBR commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

CI is now green on 1437f6c: docs-check, build, and Python 3.10/3.11/3.12/3.13 all passed.

AxDSan pushed a commit that referenced this pull request Jun 19, 2026
Closes #323.

Adds parity tests between hermes_memory_provider and mnemosyne_hermes to prevent behavior drift. +717/-17, almost entirely test code (low risk of behavioral regression). Catches the class of bugs already hit: dunder-tool vs canonical-schema path inconsistency, import shadowing.

The parity coverage should land before the larger PRs (#353, #330) that touch the same provider files, so any future drift is caught by the new tests.
@AxDSan

AxDSan commented Jun 19, 2026

Copy link
Copy Markdown
Owner

Reviewed head 1437f6c. Both dplush concerns are addressed cleanly:

  • owner_id scoping: hermes_memory_provider/__init__.py:1607 sets self._beam.canonical_owner_id = self._canonical_owner() after init, and mnemosyne/core/beam.py:7910 reads self.canonical_owner_id for the auto-apply call. No more hardcoded "default" for non-default profiles.
  • cron guard: mnemosyne/core/beam.py:7911 skips infer_model_update_proposals when agent_context == "cron". Auto-apply never fires in cron context.

The rest of the design is also solid:

  • MNEMOSYNE_QUERY_INTENT=1 gates the query-intent weighting, opt-in only
  • MNEMOSYNE_RECALL_DIAGNOSTICS=1 gates the new tool surface, opt-in only
  • MNEMOSYNE_AUTO_REFRESH_MODEL gates the auto-apply itself
  • the validator pipeline (confidence threshold, evidence count, conflict thresholds) is the right shape

Need a rebase though. The other six PRs in the queue landed today (#347, #349, #348, #353, #352, #354) and the provider init files, tools, and test_provider_all_15_tools.py now have conflicts with your branch. Can you rebase on current main and push? Once the rebase lands cleanly with CI green, we merge immediately.

Merge order matters here: #330 was the last in the queue by design because of the 23-file surface, and now everything smaller has shipped first so the diff against main is just the #330 delta plus the new merge resolution.

@WXBR WXBR force-pushed the feat/automated-sleep-model-refresh branch from 1437f6c to caa8012 Compare June 19, 2026 05:58
@WXBR

WXBR commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main and pushed latest head caa8012.

What changed in the rebase resolution:

  1. Resolved the provider/tool conflicts from the landed PR queue.

    • Kept the current upstream tool surfaces such as mnemosyne_triple_end, sync tools, persona tools, and installer status handling.
    • Preserved this PR's model-refresh/model-card tools.
    • Added mnemosyne_recall_diagnostics and mnemosyne_task_progress to the root hermes_memory_provider as well as the pip adapter so provider parity holds.
  2. Preserved the PR's intended user-only autosave default.

    • Both provider surfaces now agree on sync_roles defaulting to ["user"].
    • Updated the provider parity expectation accordingly.
  3. Fixed the integration installer conflict/regression found during local validation.

    • Removed the duplicate enum-style PluginState/recursive plugin_state() path so the dataclass-based install status tests pass.

Local validation after the final push:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/test_hermes_provider_parity.py tests/test_provider_all_15_tools.py tests/test_sync_roles.py tests/test_model_refresh_stress.py integrations/hermes/tests/test_canonical_tools.py integrations/hermes/tests/test_recall_diagnostics_tool.py integrations/hermes/tests/test_task_progress_tool.py -q -o 'addopts='
85 passed

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
1707 passed, 28 skipped

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
22 passed

python3 -m py_compile mnemosyne/core/beam.py mnemosyne/core/model_refresh.py hermes_memory_provider/__init__.py integrations/hermes/src/mnemosyne_hermes/__init__.py integrations/hermes/src/mnemosyne_hermes/tools.py integrations/hermes/src/mnemosyne_hermes/install.py
# passed

git diff --check
# passed

GitHub Actions is green on caa8012: docs-check, build, and Python 3.10/3.11/3.12/3.13 all passed.

@AxDSan AxDSan left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, was IRL busy the last few days.

@WXBR 2066 lines, 26 files, opt-in by default. The shape is right (env-gated new tools, CanonicalStore owner scoping, dead-stub orchestrator replacement, model-refresh safety rails with MNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=false as the emergency brake). Tool count math checks out: 33 -> 37 with the four new tools (mnemosyne_model_card, mnemosyne_model_refresh, mnemosyne_recall_diagnostics, mnemosyne_task_progress). Tests are substantial, including the 235-line test_model_refresh_stress.py covering apply/reject paths, owner namespace, cron suppression, and conflict supersession.

That said, I need changes before merge. There is one behavior change buried in here that ships silently unless I call it out, plus two process gaps.

Critical: sync_roles default flip is undocumented

hermes_memory_provider/__init__.py and integrations/hermes/src/mnemosyne_hermes/__init__.py both change self._sync_roles default from {"user","assistant"} to {"user"}. The config schema default in sync_roles.default flips the same way. test_sync_roles.py and test_hermes_provider_parity.py were updated to match.

This breaks any deployment that relied on the previous default for assistant-turn autosave. They will not see a loud error; they will see assistant turns silently stop landing in Mnemosyne on upgrade. That is the worst kind of change: invisible, no signal until a user notices their L3 persona stopped updating, and impossible to diagnose from outside.

The PR body does not mention this flip. CHANGELOG has no entry. Three things I need before merge:

  1. Call the flip out in the PR description under a "Behavior changes" heading. State the old default, the new default, and the migration path for existing deployments (memory.mnemosyne.sync_roles: ["user", "assistant"] in config.yaml).
  2. Add a CHANGELOG entry under [Unreleased] that names the flip and points to the migration config.
  3. Bump the version to a clear target. Right now mnemosyne/__init__.py is on 3.10.1 (shipped via #373) with a stale duplicate 3.9.0 line (pre-existing, not yours, but worth cleaning up while you are in there). A behavior change like this warrants MINOR at minimum, so 3.11.0.

Warnings (not blockers, but worth knowing)

  • Hard-coded provider re-init paths in the new tool handlers. _handle_model_card, _handle_task_progress, _handle_recall_diagnostics each repeat the getattr(self._beam, "canonical", None) + lazy CanonicalStore(...) + self._beam.canonical = store pattern. Worth extracting a small _get_canonical_store() helper on the provider base so the next tool does not copy-paste it.
  • mnemosyne.core.local_llm import inside model_refresh.infer_model_update_proposals uses internal helpers _try_host_llm and _call_remote_llm. The docstring acknowledges "internal sibling API used by sleep too". This coupling should be promoted to a documented public seam if not already, otherwise a future refactor in local_llm silently breaks model-refresh and the failure mode is non-obvious.
  • _FACT_MATCH_STOPWORDS seeds from MNEMOSYNE_RECALL_EXTRA_STOPWORDS at import time. Operators who change the env var need a process restart to pick it up. Not a regression, just worth a sentence in the PR body so the limitation is on the record.

Looks good

  • mnemosyne_recall_diagnostics is properly gated; default returns {"status": "disabled"} with no snapshot leak.
  • mnemosyne_model_refresh is correctly diagnostic-only: schema's action.enum is hard-coded to ["list"], and test_model_refresh_tool_is_diagnostic_only covers it.
  • mnemosyne_task_progress uses CanonicalStore owner scoping end-to-end (round-trip, profile-isolation, clear validation all covered by test_task_progress_tool.py).
  • Orchestrator replacement preserves the old from mnemosyne.core.orchestrator import ... signature and returns [] on exception instead of raising. Strict improvement over the dead stub.
  • Model-refresh safety rails (_EPHEMERAL_RE, min-confidence + min-evidence gates, evidence-IDs subset check) are well-designed. The MNEMOSYNE_SLEEP_MODEL_REFRESH_AUTO_APPLY=false emergency brake is documented in the right place.
  • test_query_intent_recall.py correctly verifies both the opt-in path and the explicit-weight override (the MNEMOSYNE_QUERY_INTENT branch only fires when vec_weight, fts_weight, importance_weight are all None).
  • canonical_owner_id wiring is consistent across both provider inits and survives standalone use (defaults to "default" / "primary").

Why this is request-changes, not approve

The sync_roles flip is not a "feel like calling it out" nit. It is a silent behavior change for every deployment that did not override the default. We cannot ship that without telling the user, recording it in the changelog, and bumping the version. Once those three are in, this is a fast approve. I want to land it.

Push the changes and I will re-review same day.

@WXBR WXBR force-pushed the feat/automated-sleep-model-refresh branch from caa8012 to 0ee9df6 Compare June 23, 2026 09:33
@WXBR

WXBR commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Addressed the requested release-hygiene fixes on latest head 0ee9df6:

  • Added a Behavior changes section to the PR body documenting the sync_roles default flip from ["user", "assistant"] to ["user"] and the migration config for deployments that want assistant-turn autosave.
  • Added a [Unreleased] CHANGELOG entry with the same behavior change and migration path.
  • Bumped mnemosyne.__version__ from 3.10.1 to 3.11.0.
  • Kept the branch rebased on current upstream/main, including the chore: release: v3.10.1 — critical JWT signature verification fix (CVE GHSA-xcw4-53cc-hv32) #373 3.10.1 security release, before applying these changes.

Validation:

PYTHONPATH=integrations/hermes/src:. python3 -m pytest tests/ -q -o 'addopts='
1712 passed, 28 skipped

cd integrations/hermes && PYTHONPATH=src:../.. python3 -m pytest tests/ -q -o 'addopts='
22 passed

python3 -m py_compile ...
passed

git diff --check
passed

GitHub Actions on 0ee9df6
all checks passed: docs-check, build, Python 3.10/3.11/3.12/3.13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants