feat(ci): two-tier Docker base images + Skaffold build optimization by NovusEdge · Pull Request #63 · engrammic-ai/engrammic

NovusEdge · 2026-06-12T14:56:51Z

Summary

Add two-tier base image architecture (base-api, base-dagster) to speed up CI/CD builds
Replace Cloud Build configs with Skaffold orchestration
Add build-base-images.yml workflow that auto-rebuilds bases on uv.lock/pyproject.toml changes
Update deploy-beta.yml to use a single skaffold build command
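
The rebuild rule in the summary can be sketched as a small decision function. This is only an illustration of the two-tier idea, not the workflow itself: `needs_base_rebuild` and `build_plan` are hypothetical names, the step labels are descriptive strings, and the assumption is that only root-level `uv.lock` and `pyproject.toml` count as triggers.

```python
from typing import Iterable, List

# Files named in the PR summary as triggers for a base image rebuild.
BASE_TRIGGER_FILES = {"uv.lock", "pyproject.toml"}


def needs_base_rebuild(changed_files: Iterable[str]) -> bool:
    """Return True when any changed path is one of the dependency files."""
    return any(path in BASE_TRIGGER_FILES for path in changed_files)


def build_plan(changed_files: Iterable[str]) -> List[str]:
    """Return the ordered build steps for a set of changed files."""
    files = list(changed_files)
    steps: List[str] = []
    if needs_base_rebuild(files):
        # Slow tier: dependencies changed, so the base images are stale.
        steps.append("rebuild base images")
    # Fast tier: app images are always rebuilt on top of the bases.
    steps.append("skaffold build")
    return steps
```

A code-only change yields just the fast tier, which is where the expected ~30s deploy would come from.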

Expected Impact

Metric	Before	After
Code-only deploy	10-15 min	~30s
Dep change deploy	10-15 min	3-5 min

Files Changed

New:

docker/Dockerfile.base-api
docker/Dockerfile.base-dagster
skaffold.yaml
.github/workflows/build-base-images.yml

Modified:

docker/Dockerfile.api - inherits from base-api
docker/Dockerfile.dagster - inherits from base-dagster
.github/workflows/deploy-beta.yml - uses Skaffold
justfile - added build commands

Removed:

deploy/cloudbuild/api.yaml
deploy/cloudbuild/dagster.yaml
deploy/cloudbuild/beacon.yaml

Test plan

Base images built and pushed to Artifact Registry
Merge to beta and verify deploy workflow runs successfully
Verify app images build in <1 min (code-only)

Vertex AI text-embedding-005 has a 20K token limit. With large documents like accessibility trees, 4 items at ~4K tokens each stays safely under the limit. Previously, a batch size of 50 would exceed the token limit when processing chunked content. Also includes minor formatting fixes in context_query and fusion tests.
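
The arithmetic behind the new batch size can be sketched as follows. The 20K limit and ~4K tokens per item come from the commit message; the one-item safety margin and the function names are assumptions made for illustration, not the project's code.

```python
from typing import Iterator, List, Sequence

TOKEN_LIMIT = 20_000     # per-request limit quoted in the commit message
TOKENS_PER_ITEM = 4_000  # rough size of one large chunk, per the commit message


def max_safe_batch_size(
    token_limit: int = TOKEN_LIMIT,
    tokens_per_item: int = TOKENS_PER_ITEM,
    margin_tokens: int = TOKENS_PER_ITEM,  # assumed headroom of one item
) -> int:
    """Largest item count that stays under the limit with some headroom."""
    return max(1, (token_limit - margin_tokens) // tokens_per_item)


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With these numbers a batch of 4 uses about 16K tokens, while a batch of 50 would need about 200K.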

LongMemEval questions like 'I am using... True or false: ...' were not being classified as hard queries because they don't start with question words. Now any query containing '?' or 'true or false' is expanded to improve retrieval.
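
A minimal sketch of the new rule, assuming a case-insensitive substring check. `is_hard_query` is a hypothetical name, and the pre-existing question-word check is left out.

```python
def is_hard_query(query: str) -> bool:
    """Return True for queries that should be expanded before retrieval."""
    lowered = query.lower()
    # New rule from the commit: a '?' anywhere, or a 'true or false' prompt.
    return "?" in lowered or "true or false" in lowered
```

Under this rule a statement-style prompt such as "I am using X. True or false: Y" is treated as hard even though it does not start with a question word.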

Add fields for reasoning chain persistence:
- conclusion: chain conclusion text
- conclusion_embedding: vector for consensus matching
- agent_id: originating agent
- source_hypothesis_id: link to WorkingHypothesis
- traced_at: timestamp when traced

Includes alembic migration 0016.

Add extract_claims_task to transform Memory content into Knowledge:
- LLM extraction of verifiable claims from observations
- CITE v2 credibility scaling (source_tier * method_weight * raw_conf)
- Dedup via CORROBORATES edges for similar existing claims
- EXTRACTED_FROM edge linking claims to source Memory
- Idempotency via extracted_at/extraction_version tracking

Removes notification-only status from CHECK_EXTRACTION_TRIGGER.
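
The credibility scaling named above is a plain product of three factors. The sketch below assumes all three lie in [0, 1] and applies the 0.45 cap mentioned in a later commit in this PR. Whether the cap is applied at exactly this point is an assumption, and `scaled_credibility` is a hypothetical name.

```python
CREDIBILITY_CAP = 0.45  # cap quoted in a later commit; placement here is assumed


def scaled_credibility(
    source_tier: float,
    method_weight: float,
    raw_conf: float,
    cap: float = CREDIBILITY_CAP,
) -> float:
    """Return source_tier * method_weight * raw_conf, limited to `cap`."""
    product = source_tier * method_weight * raw_conf
    return min(product, cap)
```

For example, factors of 0.9, 0.5 and 0.8 give roughly 0.36, while a fully trusted claim (all factors 1.0) is held at the cap.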

Config in ExtractionConfig:
- enabled, threshold, max_claims, model, timeout_ms
- reextract_before_version for version-based re-extraction

Metrics in recorder.py:
- extraction_triggered, extraction_skipped, extraction_claims
- extraction_corroborates, extraction_latency

Redis methods for session activity:
- touch_session_activity: set key with TTL on hypothesis activity
- check_session_active: check if session still active

ReactionEventType additions:
- TRACE_REASONING: session ended, persist hypotheses (TX7)
- CHECK_CONSENSUS: multi-agent agreement check (TX6)

Tests for extract_claims_task:
- test_extract_skips_short_content
- test_extract_creates_claims
- test_extract_idempotent
- test_extract_links_to_source
- test_extract_handles_llm_error
- test_extract_dedup_creates_corroborates
- test_extract_credibility_scaled

TX6 CONSENSUS handler:
- ANN search for similar conclusions (0.85 threshold)
- DTW reasoning compatibility check (0.5 threshold)
- K=3 chains from J=2 agents required
- Creates Fact with PROMOTED_FROM + CONSENSUS_FROM dual edges
- Extends existing consensus if found

Type fixes in extract_claims_task:
- build_llm_provider now gets provider/model from settings
- llm.complete takes messages list, returns tuple
- BinaryEdge uses type= not edge_type=, silo_id as UUID
- store_claim uses evidence_refs, returns tuple

Also adds specs and TX7 tests.
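
The quorum part of the consensus check (K=3 chains from J=2 agents, 0.85 conclusion threshold) can be sketched in a few lines. `ChainMatch` and `has_consensus` are hypothetical names, the similarity scores are assumed to come from the ANN search, and the DTW reasoning-compatibility step is omitted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ChainMatch:
    chain_id: str
    agent_id: str
    conclusion_similarity: float  # assumed to be the ANN similarity score


def has_consensus(
    matches: Iterable[ChainMatch],
    min_chains: int = 3,
    min_agents: int = 2,
    conclusion_threshold: float = 0.85,
) -> bool:
    """True when enough distinct chains from enough distinct agents agree."""
    agreeing = [m for m in matches if m.conclusion_similarity >= conclusion_threshold]
    chains = {m.chain_id for m in agreeing}
    agents = {m.agent_id for m in agreeing}
    return len(chains) >= min_chains and len(agents) >= min_agents
```

Requiring two distinct agents keeps a single agent from promoting its own conclusion by repeating it across three chains.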

- Add ConsensusConfig with min_chains, min_agents, conclusion/reasoning thresholds, and trace_on_commit toggle
- Replace hardcoded constants in check_consensus_task and chain_tombstoned_task with settings.consensus.*
- Emit CHAIN_TOMBSTONED in forget() for intelligence layer nodes
- Update TX7 tests for trace_reasoning_task

TX7 TRACE fixes:
- Create stub ReasoningChain graph nodes (edges require both endpoints)
- Upsert conclusion embedding to reasoning_chains Qdrant collection
- Fetch agent_id from query and persist to Postgres
- Fix crystallized filter to use traced_at and crystallized_into fields

TX11 staleness cascade fixes:
- Change edge queries from -[:CONSENSUS_FROM]-> to -[e:EDGE]-> WHERE e.type
- Only emit CHAIN_TOMBSTONED for ReasoningChain nodes, not all intelligence

TX1 EXTRACT fixes:
- Pass scaled credibility (0.45 cap) instead of raw_confidence
- Add Claim type filter to dedup search
- Use payload.node_id instead of Qdrant point id

Query updates:
- GET_WORKING_HYPOTHESES_FOR_SESSION returns agent_id, traced_at, crystallized_into
- GET_NODE_FOR_FORGET returns node_type for specific type checking

…ceptions

- Create src/context_service/exceptions.py with RateLimitExceeded
- Update api/rate_limit.py to re-export from exceptions
- Update mcp/error_boundary.py to import from exceptions
- Remove obsolete _apply_reranking tests (FR handles reranking internally)
- Fix pre-existing SIM300 lint error in query_expander.py

Config consolidation (per context/plans/2026-06-12-selfhosted-config-consolidation.md):
- Delete embeddings.yaml, merge into models.yaml (sparse, qdrant_collection)
- Add SparseConfig to ModelsConfig with BM25/SPLADE support
- Remove legacy settings: litellm_embedding_model, embedding_provider
- Unify Ollama URL: OLLAMA_URL/OLLAMA_BASE_URL/OLLAMA_API_BASE all work
- Add startup validation (config/validation.py) with actionable errors
- Update all load_config("embeddings") callsites to use settings.models
- Add selfhosted.env.ollama.example and selfhosted.env.vertex.example

Mypy stub fixes (460 -> 76 errors):
- Add ignore_missing_imports for optional deps (dagster, redis, qdrant, fastapi, sqlalchemy, torch, litellm, taskiq, pydantic-ai, etc.)
- Add disallow_untyped_decorators=false for FastAPI routes, dagster assets
- Remove now-unused type: ignore comments across codebase
- Remaining 76 errors are pre-existing type issues (subclass Any, no-any-return)
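
One way the unified Ollama URL lookup could work is a first-match scan over the three variable names. The precedence order (as listed in the commit) and the `resolve_ollama_url` helper are assumptions; the commit only says that all three names work.

```python
from typing import Mapping, Optional

# Accepted variable names, in the order listed in the commit message.
# Treating that order as the precedence is an assumption of this sketch.
OLLAMA_URL_VARS = ("OLLAMA_URL", "OLLAMA_BASE_URL", "OLLAMA_API_BASE")


def resolve_ollama_url(env: Mapping[str, str]) -> Optional[str]:
    """Return the first non-empty Ollama URL found, or None if unset."""
    for name in OLLAMA_URL_VARS:
        value = env.get(name, "").strip()
        if value:
            # Normalize so callers can append paths without double slashes.
            return value.rstrip("/")
    return None
```

In practice `env` would be `os.environ`; taking a mapping keeps the function easy to test.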

API changes:
- Add layers/tags params to RecallRequest
- Pass rrf_k config to FusionRetriever
- Pass layers filter to retrieve()

Fusion changes:
- Add grep channel (regex text search via Memgraph)
- Log per-channel hit counts for diagnostics
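
For context on `rrf_k`: reciprocal rank fusion scores each result as the sum of 1 / (k + rank) over the channels that returned it. The sketch below is a generic RRF implementation, not FusionRetriever's code. The function name, the channel names and the default k=60 are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(
    channel_results: Dict[str, Sequence[str]],
    k: int = 60,  # common default in the RRF literature; project default unknown
) -> List[Tuple[str, float]]:
    """Fuse per-channel ranked ID lists into one list sorted by RRF score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in channel_results.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A larger k flattens the gap between top and lower ranks, so results found by several channels count for more than a single first-place hit.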

Replace caller-supplied X-Silo-ID header with get_authenticated_silo dependency that derives silo_id from WorkOS-verified org_id. This closes a HIGH severity auth gap where routes trusted unverified headers. All 18 REST endpoints now use the secure pattern.

NovusEdge added 30 commits June 11, 2026 19:47

feat(engine): canonical confidence interpretation helpers

906786f

fix(services): respect stored zero confidence on read paths

9ddad17

A stored confidence of exactly 0.0 was promoted to 1.0 by falsy `or 1.0` defaults at five read sites. All sites now use the canonical effective_confidence helper (missing -> 1.0, present -> respected).
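
The bug and the fix can be shown side by side. The helper's real signature is not shown in this PR, so the version below is a minimal sketch that assumes a missing value is represented as `None`. `buggy_confidence` is a hypothetical name for the old pattern.

```python
from typing import Optional


def buggy_confidence(stored: Optional[float]) -> float:
    """Old read-site pattern: 0.0 is falsy, so `or` replaces it with 1.0."""
    return stored or 1.0


def effective_confidence(stored: Optional[float]) -> float:
    """Missing (None) -> 1.0; any stored value, including 0.0, is respected."""
    return 1.0 if stored is None else stored
```

The two functions agree on every input except a stored 0.0, which is exactly the case the fix targets.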

feat(services): stamp confidence_formula_version on claim writes

51c1d8b

docs(plans): mark confidence hygiene pre-fix complete

6cc5b65

feat(config): add EpistemicFusionConfig for read-path score fusion

a18b0dc

feat(services): expose superseded_by on QueryResult

38babd3

docs(env): document epistemic fusion settings

f575928

feat(reranking): pure epistemic score fusion module

d6cadc3

feat(reranking): abstention floor compares pre-fusion rerank score

9bc7263

feat(recall): fuse epistemic state into post-rerank ranking

6833652

docs(plans): mark sprint step 1 read-path fusion complete

e9e54a3

plans and shit

135847d

chore(docker): disable SPLADE in dev compose to skip torch/NVIDIA deps

2b5ce46

fix(api): register graph router for /api/v1/nodes and /api/v1/edges endpoints

559a149

feat(rest): add query expansion to REST recall endpoint

fe6916b

The harness uses REST /api/v1/recall which was bypassing the query expansion logic that exists in the MCP tools. Now both paths expand hard queries (those containing ? or 'true or false' patterns).

feat(qdrant): add reasoning_chains collection for TX6 CONSENSUS

ed348d9

Collection stores conclusion embeddings for reasoning chains, enabling consensus detection via ANN similarity search. Created on startup alongside main collection.

feat(reactions): implement TX7 TRACE handler

7e0d50f

trace_reasoning_task persists WorkingHypothesis as ReasoningChain:
- Fetches uncommitted hypotheses for session
- Creates ReasoningChainSteps row in Postgres
- Creates TRACED_FROM edge in graph
- Emits CHECK_CONSENSUS for each chain (TX6)

test(reactions): add TX6 CONSENSUS tests for check_consensus_task

460aa5e

fix(query-expander): pass vertex_ai_location to litellm

a278438

Without an explicit location, litellm falls back to slow discovery/defaulting, causing 3s+ latency instead of <1s.

fix(intelligence): address blocker bugs from re-review

caed6cb

- BUG-01: TX7 now marks only successfully traced hypotheses, not all in session
- BUG-04: TX11 remaining_query now includes silo filter on chain side
- BUG-05: synthesize() and revise_belief() now use f["fact_id"] matching query

NovusEdge added 30 commits June 12, 2026 19:14

perf(query_expander): switch to qwen3-7b for faster query expansion

dd7c474

fix(query_expander): use full model string for litellm, stripped for genai

79dd992

fix: convert Layer enum to string for FusionRetriever layers param

7035680

deps: add google-cloud-aiplatform to llm-core for vertex partner models

63409ba

docker: add google-cloud-aiplatform to Dockerfile.api

d361445

perf(query_expander): use gemini-2.0-flash (qwen MaaS not enabled)

6bd2c75

revert: use qwen3-7b for query expansion (enable MaaS in GCP)

f2f04d9

fix(query_expander): use qwen3-235b-a22b-instruct (correct model ID)

09bb85c

fix(docker): install aiplatform to venv not system python

56c856b

fix(docker): install aiplatform after PATH is set

f63d3f4

fix(docker): use venv python -m pip to install aiplatform

f4fe939

fix(docker): use uv pip install for venv

0e97ea2

fix(query_expander): pass vertex_project and vertex_location to litellm

5c5f9bb

fix(config): use us-south1 for Qwen MaaS

b6a425d

fix(query_expander): remove response_format for Qwen MaaS compatibility

b320dbb

fix(query_expander): use vertex_ai_project/location params for litellm

4f0ab8e

Merge branch 'worktree-mcp-fusion-upgrade' into feat/read-path-epistemic-fusion

ca1d396

debug: add verbose logging to litellm query expansion

b47ebbd

fix(query-expander): switch to gemini-2.5-flash, add json-repair fallback

c8e3541

- Revert from Qwen MaaS to Gemini 2.5 Flash (reliable, no region issues)
- Add json-repair fallback for malformed JSON responses
- Remove debug logging

chore: disable query expansion in dev compose (Gemini latency too high)

cc5fc74

Remove old scripts

49e9287

spec: LoCoMo-Plus benchmark integration design

fb3dd75

feat(grep): use OR pattern with match counting instead of sequential AND

b2cf26d

Search each word independently and rank by match count (pseudo-trigram). This finds results where terms appear in any order, not just sequence.
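
The ranking idea can be sketched in memory with Python's `re` module. The real grep channel runs as a regex search in Memgraph, so this is an illustration of OR matching with match counting, not the project's query. `rank_by_term_matches` is a hypothetical name.

```python
import re
from typing import List, Tuple


def rank_by_term_matches(query: str, documents: List[str]) -> List[Tuple[str, int]]:
    """Score each document by how many distinct query terms it contains."""
    terms = sorted({term.lower() for term in query.split()})
    patterns = [re.compile(re.escape(term), re.IGNORECASE) for term in terms]
    scored: List[Tuple[str, int]] = []
    for doc in documents:
        # Each term is searched independently, so word order does not matter.
        count = sum(1 for pattern in patterns if pattern.search(doc))
        if count:
            scored.append((doc, count))
    # Most matched terms first; the stable sort keeps input order for ties.
    scored.sort(key=lambda pair: -pair[1])
    return scored
```

A sequential AND pattern for "docker build" would miss "build the docker image"; here that document matches both terms and ranks first.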

docs: spec trigram search integration for BM25 channel

0723863

Update lockfile

bcfad9f

test(api): update memory tests for token-based auth

dcdd4b5

NovusEdge wants to merge 120 commits into main from feat/read-path-epistemic-fusion
