feat(ci): two-tier Docker base images + Skaffold build optimization#63

Open
NovusEdge wants to merge 120 commits into main from feat/read-path-epistemic-fusion

@NovusEdge

Contributor

Summary

  • Add two-tier base image architecture (base-api, base-dagster) to speed up CI/CD builds
  • Replace Cloud Build configs with Skaffold orchestration
  • Add build-base-images.yml workflow that auto-rebuilds bases on uv.lock/pyproject.toml changes
  • Update deploy-beta.yml to use a single skaffold build command

Expected Impact

| Metric | Before | After |
| --- | --- | --- |
| Code-only deploy | 10-15 min | ~30s |
| Dep change deploy | 10-15 min | 3-5 min |

Files Changed

New:

  • docker/Dockerfile.base-api
  • docker/Dockerfile.base-dagster
  • skaffold.yaml
  • .github/workflows/build-base-images.yml

Modified:

  • docker/Dockerfile.api - inherits from base-api
  • docker/Dockerfile.dagster - inherits from base-dagster
  • .github/workflows/deploy-beta.yml - uses Skaffold
  • justfile - added build commands

Removed:

  • deploy/cloudbuild/api.yaml
  • deploy/cloudbuild/dagster.yaml
  • deploy/cloudbuild/beacon.yaml

Test plan

  • Base images built and pushed to Artifact Registry
  • Merge to beta and verify deploy workflow runs successfully
  • Verify app images build in <1 min (code-only)

NovusEdge added 30 commits June 11, 2026 19:47
A stored confidence of exactly 0.0 was promoted to 1.0 by falsy
`or 1.0` defaults at five read sites. All sites now use the canonical
effective_confidence helper (missing -> 1.0, present -> respected).
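
The difference between the falsy default and the helper can be shown in a few lines. This is a minimal sketch: only the name `effective_confidence` and its behaviour (missing -> 1.0, present -> respected) come from the commit message; the signature and the `buggy_confidence` contrast function are assumptions made for illustration.

```python
from typing import Optional

DEFAULT_CONFIDENCE = 1.0  # default for a missing value, per the commit message


def effective_confidence(stored: Optional[float]) -> float:
    """Return the stored confidence, defaulting only when it is missing."""
    if stored is None:
        return DEFAULT_CONFIDENCE
    return float(stored)


def buggy_confidence(stored: Optional[float]) -> float:
    """Hypothetical reproduction of the old pattern: 0.0 is falsy, so it becomes 1.0."""
    return stored or 1.0
```

With these definitions, `buggy_confidence(0.0)` returns 1.0 while `effective_confidence(0.0)` returns 0.0, which is the behaviour the fix restores.
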
Vertex AI text-embedding-005 has a 20K-token limit. With large documents
such as accessibility trees, 4 items at ~4K tokens each (~16K tokens total) stay
safely under the limit. The previous batch size of 50 would exceed the token
limit when processing chunked content.

Also includes minor formatting fixes in context_query and fusion tests.
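
The batch-size arithmetic can be sketched as follows. The limit, the approximate chunk size, and the batch size of 4 are taken from the commit message; the `batched` helper and the constant names are hypothetical and are not the project's embedding code.

```python
from typing import Iterator, List, Sequence

TOKEN_LIMIT = 20_000     # limit stated in the commit message
TOKENS_PER_ITEM = 4_000  # approximate size of one large chunk, per the message
BATCH_SIZE = 4           # 4 * 4_000 = 16_000 tokens, under the 20_000 limit


def batched(items: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

A batch of 50 such chunks would be roughly 200K tokens, ten times the limit, which is why the smaller batch size is needed for large documents.
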
LongMemEval questions like 'I am using... True or false: ...' were
not being classified as hard queries because they don't start with
question words. Now any query containing '?' or 'true or false' is
expanded to improve retrieval.
The harness uses the REST endpoint /api/v1/recall, which bypassed the query
expansion logic that exists in the MCP tools. Both paths now expand
hard queries (those containing '?' or 'true or false').
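
The new classification rule described in the two messages above can be sketched as a single predicate. The function name `is_hard_query` and the case-insensitive match are assumptions; the sketch covers only the added rule, not any earlier question-word check.

```python
def is_hard_query(query: str) -> bool:
    """Return True when the query contains '?' or the phrase 'true or false'."""
    lowered = query.lower()  # case-insensitive matching is an assumption
    return "?" in lowered or "true or false" in lowered
```

Sharing one predicate like this between the REST and MCP paths is one way to keep the two from drifting apart again.
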
Add fields for reasoning chain persistence:
- conclusion: chain conclusion text
- conclusion_embedding: vector for consensus matching
- agent_id: originating agent
- source_hypothesis_id: link to WorkingHypothesis
- traced_at: timestamp when traced

Includes alembic migration 0016.
Collection stores conclusion embeddings for reasoning chains,
enabling consensus detection via ANN similarity search.
Created on startup alongside main collection.
Add extract_claims_task to transform Memory content into Knowledge:
- LLM extraction of verifiable claims from observations
- CITE v2 credibility scaling (source_tier * method_weight * raw_conf)
- Dedup via CORROBORATES edges for similar existing claims
- EXTRACTED_FROM edge linking claims to source Memory
- Idempotency via extracted_at/extraction_version tracking

Removes notification-only status from CHECK_EXTRACTION_TRIGGER.
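
The CITE v2 scaling formula named above is a plain product. A minimal sketch, assuming all three factors lie in [0, 1]; the 0.45 upper bound is taken from a later commit in this PR ("scaled credibility (0.45 cap)"), and applying it with `min` is an assumption about how that cap works.

```python
def scaled_credibility(
    source_tier: float,
    method_weight: float,
    raw_conf: float,
    cap: float = 0.45,  # cap value from a later commit; its application here is assumed
) -> float:
    """Compute source_tier * method_weight * raw_conf, bounded above by `cap`."""
    for name, value in (
        ("source_tier", source_tier),
        ("method_weight", method_weight),
        ("raw_conf", raw_conf),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return min(cap, source_tier * method_weight * raw_conf)
```
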
Config in ExtractionConfig:
- enabled, threshold, max_claims, model, timeout_ms
- reextract_before_version for version-based re-extraction

Metrics in recorder.py:
- extraction_triggered, extraction_skipped, extraction_claims
- extraction_corroborates, extraction_latency
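
One way the config fields and the idempotency tracking could fit together is sketched below. Only the field names come from the commit messages; the types, the default values, the reading of `threshold` as a minimum content length, and the `should_extract` helper are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ExtractionConfig:
    enabled: bool = True
    threshold: int = 200               # assumed: minimum content length in characters
    max_claims: int = 10               # hypothetical default
    model: str = "example-model"       # placeholder, not a real model name
    timeout_ms: int = 30_000           # hypothetical default
    reextract_before_version: int = 0  # re-extract anything older than this version


def should_extract(
    cfg: ExtractionConfig,
    content: str,
    extracted_at: Optional[datetime],
    extraction_version: Optional[int],
) -> bool:
    """Decide whether a Memory needs (re-)extraction under the assumed rules."""
    if not cfg.enabled or len(content) < cfg.threshold:
        return False
    if extracted_at is None:
        return True  # never extracted
    return (extraction_version or 0) < cfg.reextract_before_version
```
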
Redis methods for session activity:
- touch_session_activity: set key with TTL on hypothesis activity
- check_session_active: check if session still active

ReactionEventType additions:
- TRACE_REASONING: session ended, persist hypotheses (TX7)
- CHECK_CONSENSUS: multi-agent agreement check (TX6)
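
The two session-activity methods map naturally onto a key with a TTL. The sketch below assumes a client exposing redis-py style `set(name, value, ex=...)` and `exists(name)`; the key format, the TTL value, and the `InMemoryClient` stand-in are hypothetical and exist only so the example runs without a Redis server.

```python
import time
from typing import Callable, Dict, Tuple

SESSION_TTL_SECONDS = 300  # hypothetical TTL


def _key(session_id: str) -> str:
    return f"session:active:{session_id}"  # hypothetical key format


def touch_session_activity(client, session_id: str, ttl: int = SESSION_TTL_SECONDS) -> None:
    """Set (or refresh) the activity key so it expires `ttl` seconds from now."""
    client.set(_key(session_id), "1", ex=ttl)


def check_session_active(client, session_id: str) -> bool:
    """A session counts as active while its key has not expired."""
    return bool(client.exists(_key(session_id)))


class InMemoryClient:
    """Tiny stand-in for a Redis client, for local experimentation only."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def set(self, name: str, value: str, ex: int) -> None:
        self._data[name] = (value, self._clock() + ex)

    def exists(self, name: str) -> int:
        entry = self._data.get(name)
        if entry is None or entry[1] <= self._clock():
            self._data.pop(name, None)  # drop expired keys lazily
            return 0
        return 1
```
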
Tests for extract_claims_task:
- test_extract_skips_short_content
- test_extract_creates_claims
- test_extract_idempotent
- test_extract_links_to_source
- test_extract_handles_llm_error
- test_extract_dedup_creates_corroborates
- test_extract_credibility_scaled
trace_reasoning_task persists WorkingHypothesis as ReasoningChain:
- Fetches uncommitted hypotheses for session
- Creates ReasoningChainSteps row in Postgres
- Creates TRACED_FROM edge in graph
- Emits CHECK_CONSENSUS for each chain (TX6)
TX6 CONSENSUS handler:
- ANN search for similar conclusions (0.85 threshold)
- DTW reasoning compatibility check (0.5 threshold)
- K=3 chains from J=2 agents required
- Creates Fact with PROMOTED_FROM + CONSENSUS_FROM dual edges
- Extends existing consensus if found
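
The consensus condition listed above (K=3 chains from J=2 agents, with the 0.85 and 0.5 thresholds) can be sketched as a pure function. The `Candidate` type, the inclusive comparisons, and the assumption that both scores are "higher is better" are illustrative; the real handler works on ANN search results and DTW scores that are not shown in this PR description.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    chain_id: str
    agent_id: str
    conclusion_similarity: float    # e.g. from the ANN search
    reasoning_compatibility: float  # e.g. from the DTW check


def has_consensus(
    candidates: Iterable[Candidate],
    min_chains: int = 3,
    min_agents: int = 2,
    conclusion_threshold: float = 0.85,
    reasoning_threshold: float = 0.5,
) -> bool:
    """True when enough distinct chains from enough distinct agents agree."""
    agreeing = [
        c for c in candidates
        if c.conclusion_similarity >= conclusion_threshold
        and c.reasoning_compatibility >= reasoning_threshold
    ]
    chains = {c.chain_id for c in agreeing}
    agents = {c.agent_id for c in agreeing}
    return len(chains) >= min_chains and len(agents) >= min_agents
```

Three agreeing chains from a single agent do not qualify under this rule, because the agent count is checked separately from the chain count.
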

Type fixes in extract_claims_task:
- build_llm_provider now gets provider/model from settings
- llm.complete takes messages list, returns tuple
- BinaryEdge uses type= not edge_type=, silo_id as UUID
- store_claim uses evidence_refs, returns tuple

Also adds specs and TX7 tests.
- Add ConsensusConfig with min_chains, min_agents, conclusion/reasoning
  thresholds, and trace_on_commit toggle
- Replace hardcoded constants in check_consensus_task and
  chain_tombstoned_task with settings.consensus.*
- Emit CHAIN_TOMBSTONED in forget() for intelligence layer nodes
- Update TX7 tests for trace_reasoning_task
TX7 TRACE fixes:
- Create stub ReasoningChain graph nodes (edges require both endpoints)
- Upsert conclusion embedding to reasoning_chains Qdrant collection
- Fetch agent_id from query and persist to Postgres
- Fix crystallized filter to use traced_at and crystallized_into fields

TX11 staleness cascade fixes:
- Change edge queries from -[:CONSENSUS_FROM]-> to -[e:EDGE]-> WHERE e.type
- Only emit CHAIN_TOMBSTONED for ReasoningChain nodes, not all intelligence

TX1 EXTRACT fixes:
- Pass scaled credibility (0.45 cap) instead of raw_confidence
- Add Claim type filter to dedup search
- Use payload.node_id instead of Qdrant point id

Query updates:
- GET_WORKING_HYPOTHESES_FOR_SESSION returns agent_id, traced_at, crystallized_into
- GET_NODE_FOR_FORGET returns node_type for specific type checking
Without an explicit location, litellm falls back to slow discovery/defaulting,
causing 3s+ latency instead of <1s.
BUG-01: TX7 now marks only successfully traced hypotheses, not all in session
BUG-04: TX11 remaining_query now includes silo filter on chain side
BUG-05: synthesize() and revise_belief() now use f["fact_id"] matching query
NovusEdge added 30 commits June 12, 2026 19:14
…ceptions

- Create src/context_service/exceptions.py with RateLimitExceeded
- Update api/rate_limit.py to re-export from exceptions
- Update mcp/error_boundary.py to import from exceptions
- Remove obsolete _apply_reranking tests (FR handles reranking internally)
- Fix pre-existing SIM300 lint error in query_expander.py
…back

- Revert from Qwen MaaS to Gemini 2.5 Flash (reliable, no region issues)
- Add json-repair fallback for malformed JSON responses
- Remove debug logging
Config consolidation (per context/plans/2026-06-12-selfhosted-config-consolidation.md):
- Delete embeddings.yaml, merge into models.yaml (sparse, qdrant_collection)
- Add SparseConfig to ModelsConfig with BM25/SPLADE support
- Remove legacy settings: litellm_embedding_model, embedding_provider
- Unify Ollama URL: OLLAMA_URL/OLLAMA_BASE_URL/OLLAMA_API_BASE all work
- Add startup validation (config/validation.py) with actionable errors
- Update all load_config("embeddings") callsites to use settings.models
- Add selfhosted.env.ollama.example and selfhosted.env.vertex.example
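
Accepting several environment variable names for one setting can be sketched as a first-match lookup. The three variable names come from the list above; the precedence order, the trailing-slash normalization, and the function name `resolve_ollama_url` are assumptions.

```python
import os
from typing import Mapping, Optional

# Order of precedence is an assumption; the commit only says all three work.
OLLAMA_URL_VARS = ("OLLAMA_URL", "OLLAMA_BASE_URL", "OLLAMA_API_BASE")


def resolve_ollama_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty Ollama URL found, or None if none is set."""
    for name in OLLAMA_URL_VARS:
        value = env.get(name, "").strip()
        if value:
            return value.rstrip("/")
    return None
```
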

Mypy stub fixes (460 -> 76 errors):
- Add ignore_missing_imports for optional deps (dagster, redis, qdrant,
  fastapi, sqlalchemy, torch, litellm, taskiq, pydantic-ai, etc.)
- Add disallow_untyped_decorators=false for FastAPI routes, dagster assets
- Remove now-unused type: ignore comments across codebase
- Remaining 76 errors are pre-existing type issues (subclass Any, no-any-return)
API changes:
- Add layers/tags params to RecallRequest
- Pass rrf_k config to FusionRetriever
- Pass layers filter to retrieve()

Fusion changes:
- Add grep channel (regex text search via Memgraph)
- Log per-channel hit counts for diagnostics
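
For readers unfamiliar with the `rrf_k` parameter, the standard Reciprocal Rank Fusion formula scores each result as the sum of 1 / (k + rank) over the channels that returned it. The sketch below implements that textbook formula; it is not the project's FusionRetriever, and the default of 60 is a commonly used value rather than this repository's setting.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Sequence


def rrf_fuse(channels: Mapping[str, Sequence[str]], rrf_k: int = 60) -> List[str]:
    """Fuse per-channel rankings with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in channels.values():
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

A result that appears in both a vector channel and the grep channel accumulates two terms, so it tends to outrank results found by only one channel.
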
Search each word independently and rank by match count (pseudo-trigram).
This finds results where the terms appear in any order, not only as an exact sequence.
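
The ranking idea can be illustrated in memory. This is an analogy only: the commit describes a regex search via Memgraph, whereas the sketch below uses case-insensitive substring checks over a list of strings, and the function name is hypothetical.

```python
import re
from typing import List, Sequence, Tuple


def rank_by_word_matches(query: str, documents: Sequence[str]) -> List[Tuple[int, str]]:
    """Score each document by how many distinct query words it contains."""
    words = set(re.findall(r"\w+", query.lower()))
    scored: List[Tuple[int, str]] = []
    for doc in documents:
        text = doc.lower()
        count = sum(1 for word in words if word in text)
        if count:
            scored.append((count, doc))
    # Stable sort: documents with equal counts keep their input order.
    scored.sort(key=lambda pair: -pair[0])
    return scored
```
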
Replace caller-supplied X-Silo-ID header with get_authenticated_silo
dependency that derives silo_id from WorkOS-verified org_id. This
closes a HIGH severity auth gap where routes trusted unverified headers.

All 18 REST endpoints now use the secure pattern.
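
The security property described above is that the silo is derived from verified identity and never from a request header. A framework-free sketch under stated assumptions: the name `get_authenticated_silo` comes from the commit message, but the parameters, the `org_to_silo` lookup table, and the use of `PermissionError` are hypothetical, and WorkOS verification is assumed to have already happened upstream.

```python
from typing import Mapping, Optional
from uuid import UUID


def get_authenticated_silo(
    verified_org_id: Optional[str],
    org_to_silo: Mapping[str, UUID],
    headers: Mapping[str, str],
) -> UUID:
    """Resolve the silo from the verified org_id; request headers are ignored."""
    # `headers` is accepted only to make the point explicit: X-Silo-ID is never read.
    if not verified_org_id:
        raise PermissionError("no verified organization on this request")
    try:
        return org_to_silo[verified_org_id]
    except KeyError:
        raise PermissionError("organization has no silo") from None
```
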