fix(embeddings): respect HERMES_HOME for fastembed cache default#366
fix(embeddings): respect HERMES_HOME for fastembed cache default#366mxfng wants to merge 2 commits into
Conversation
AxDSan
left a comment
There was a problem hiding this comment.
Sorry for the delay, was IRL busy the last few days.
Nice catch. The default ~/.hermes/cache/fastembed path was the only place I leaked Hermes state outside $HERMES_HOME for users on a non-default layout. Confirmed the fix matches the HERMES_HOME pattern used in mcp_tools.py, beam.py, banks.py, memory.py, and memory_browser.py, so this is a consistency fix, not a behavior change for the default install.
Verified locally on pr/366:
python3 -m py_compile mnemosyne/core/embeddings.pyclean.- With
HERMES_HOMEunset:_FASTEMBED_CACHE_DIRresolves to/root/.hermes/cache/fastembed(same asmain). - With
HERMES_HOME=/tmp/hermes-test: resolves to/tmp/hermes-test/cache/fastembed. pytest tests/test_optional_embeddings.py tests/test_embedding_optout.py tests/test_embeddings_multilingual.py-> 24 passed in 2.71s.- CI green: 6/6 jobs SUCCESS (build, docs-check, test on 3.10/3.11/3.12/3.13).
Two small asks before merge:
- CHANGELOG entry under
[Unreleased]for the fix. Even one line ("respect HERMES_HOME for fastembed cache default") keeps the release notes honest. - No version bump. That is fine for this, since it is a no-op for default layouts and a correctness fix for non-default ones. Just noting I am not asking for a bump here.
Diff is small, scoped, and the comment block at the top of the change is actually useful. Approving.
The MNEMOSYNE_FASTEMBED_CACHE_DIR override already lets CI redirect the fastembed model cache, but its default still hardcodes ~/.hermes and ignores HERMES_HOME. Users who relocate HERMES_HOME (e.g. to ~/.config/hermes) get a stray ~/.hermes/cache/fastembed directory holding ~64MB of model weights, separate from the rest of their Hermes state. Default the cache to <HERMES_HOME>/cache/fastembed, falling back to ~/.hermes when HERMES_HOME is unset. This matches the HERMES_HOME handling already used elsewhere in the package (mcp_tools.py) and changes nothing for users on the default layout.
76f15fa to
a789810
Compare
|
Added the CHANGELOG entry under [Unreleased] (no version bump, per your note). Also rebased onto main to pick up v3.10.1; resolved a CHANGELOG conflict from the new [3.10.1] section. Ready to merge. Thanks for the review! |
The
MNEMOSYNE_FASTEMBED_CACHE_DIRoverride (added infix/ci-embedding-model-cache) lets CI redirect the fastembed model cache, but its default still hardcodes~/.hermesand ignoresHERMES_HOME.Problem
Users who relocate
HERMES_HOME— e.g. to~/.config/hermesfor XDG cleanliness — end up with a stray~/.hermes/cache/fastembeddirectory holding ~64MB of BGE model weights, separate from the rest of their Hermes state (config, db, logs, mnemosyne data), all of which correctly live under$HERMES_HOME.Because the cache path is passed as an explicit
cache_dir=argument toTextEmbedding(), no fastembed env var can redirect it — the explicit arg wins. So the default itself has to honorHERMES_HOME.Fix
Default the cache to
<HERMES_HOME>/cache/fastembed, falling back to~/.hermes/cache/fastembedwhenHERMES_HOMEis unset.This matches the
HERMES_HOMEhandling already used elsewhere in the package (mcp_tools.py), and is a no-op for anyone on the default layout (HERMES_HOME unset → identical~/.hermespath). The explicitMNEMOSYNE_FASTEMBED_CACHE_DIRoverride still takes precedence.Testing
ast.parseclean.HERMES_HOME=~/.config/hermes:_FASTEMBED_CACHE_DIRresolves to~/.config/hermes/cache/fastembed, the model loads from there, andembed()returns a(1, 384)vector. No~/.hermesdirectory is recreated.