Comprehensive technical documentation covering system architecture, analysis pipeline, data models, enforcement hooks, sharing ecosystem, and development.
- System Overview
- Technology Stack
- Project Structure
- Layered Architecture
- Engine — Local Analysis
- Ignore System — Three Tiers
- Bulk-Content Classifier
- Coordinator — AI Pipeline
- Fast Scan (`/archie-scan`)
- Deep Scan (`/archie-deep-scan`)
- Findings Store
- Pitfalls
- Hooks — Real-Time Enforcement
- Rules — Synthesis and Delivery
- Renderer — Output Generation
- Standalone Scripts
- NPM Package — Distribution
- Claude Code Integration
- Share Pipeline (`/archie-share`)
- StructuredBlueprint Data Model
- Data Flow
- Compound Learning
- Drift Detection
- Cycle Detection
- Telemetry
- No Inline Python Constraint
- Error Handling and Resilience
- Testing
- File Sync Protocol
Archie is a CLI tool + NPM package that bolts onto any codebase to give AI coding agents durable architectural context. No backend server and no database are required for local operation; a lightweight Supabase-backed share ecosystem exists for handing blueprints to teammates via a URL.
The core workflow:
- Scan — deterministic local analysis of the repository (file tree, imports, frameworks, hashing, token counting, skeleton extraction, bulk-content classification). Pure Python, no AI.
- Analyze — Claude Code subagents (Sonnet) gather architectural facts in parallel; a single Opus subagent produces deep reasoning.
- Store — architectural synthesis lands in `blueprint.json`; concrete problems land in the shared, compounding `findings.json` store (4-field shape with id-stable upsert).
- Render — deterministic JSON-to-Markdown generation of CLAUDE.md, AGENTS.md, optional per-folder context, and rule files.
- Enforce — Claude Code hooks validate every file write against extracted rules.
- Share (optional) — `/archie-share` has three modes:
  - Default uploads a bundle to BitRaptors' Supabase edge function and renders via the hosted React viewer at `archie-viewer.vercel.app`.
  - Enterprise (stored credentials) uploads directly to the customer's own S3 bucket via pure-stdlib sigv4 signing — BitRaptors is never in the data path; the GET URL rides in the share URL's fragment (never transmitted to any server); the viewer fetches client-side from the customer bucket.
  - Enterprise (paste URL) is the same flow but with a per-share presigned PUT URL minted by the customer's InfoSec, so no credentials ever live on the dev's laptop.
Archie has four user-facing slash commands (+ one local inspector):
- `/archie-scan` — architecture health check (1–3 min). Runs deterministic scanner for data gathering, then AI acts as a senior architect: analyzes dependencies, finds pattern drift, identifies complexity hotspots, proposes enforceable rules. Writes concrete findings to the shared `.archie/findings.json` store with id-stable upsert. Single AI synthesis, 3 parallel Sonnet agents below it.
- `/archie-deep-scan` — comprehensive baseline (15–20 min). Full 2-wave multi-agent analysis (3–4 parallel Sonnet fact-gatherers + one Opus reasoner). Produces complete blueprint and all outputs. Supports `--incremental` (changed files only, 3–6 min), `--continue` (resume interrupted run), `--from N` (resume from step N), `--reconfigure` (re-prompt monorepo scope). Auto-detects monorepos and offers parallel sub-project analysis. Intent Layer (per-folder CLAUDE.md) is opt-in via an interactive prompt at Step E. Implemented as a modular Claude Code skill — `.claude/skills/archie-deep-scan/` holds a `SKILL.md` router plus self-contained per-step files, fragments, and templates, so the long pipeline survives `/compact` and resumes mid-run.
- `/archie-intent-layer` — standalone per-folder CLAUDE.md regeneration. Phase 0.5 asks Full/Incremental/Auto upfront (Auto uses `detect-changes` against the `last_deep_scan.json` baseline). Hard-requires `blueprint.json` — otherwise tells the user to run `/archie-deep-scan` first, no degraded path. Shares its Phases 1–4 pipeline with `/archie-deep-scan` Step 7 (single source of truth); deep-scan Reads this file and layers its own deltas (telemetry, SCAN_MODE mapping, Compact Checkpoint B).
- `/archie-share` — uploads blueprint + findings + scan report and returns a URL. Dual-mode at share time: default (BitRaptors Supabase, unchanged) or enterprise (BYO customer S3 bucket, zero BitRaptors storage). Enterprise mode supports either stored credentials (one-time `share_setup.py`) or per-share presigned PUT URL paste. See Share Pipeline for the full architecture.
- `/archie-viewer` — local inspector that runs the same React UI as the hosted share viewer. `viewer.py` serves the prebuilt React `dist/` plus a localhost JSON API (`/api/bundle`, `/api/generated-files`, `/api/folder-claude-mds`, `/api/intent-layer-status`, `/api/ignored-rules`, `POST /api/rules`) at `localhost:5847/local`. Two tabs: Blueprint (the full report — health, diagram, decisions, findings, pitfalls, plus inline rule adopt/reject/edit) and Files (per-folder CLAUDE.md browser + click-to-view generated-files tree). The bundle is rebuilt from `.archie/` on every request; the rule actions write back to `.archie/`.
| Component | Technology | Purpose |
|---|---|---|
| Standalone scripts | Python 3.9+ (stdlib only) | Type hints, dataclasses, pathlib, http.server |
| Python package (CLI) | Python 3.9+, Pydantic, Click | Model validation, command dispatch |
| AI agents | Claude Code CLI (`claude -p`) + Anthropic Sonnet + Opus | Subagent execution via subprocess |
| NPM installer | Node.js 18+ | npx @bitraptors/archie distribution |
| Viewer (shared) | React 18 + Vite + TypeScript + Tailwind v3 + React Router | One React codebase serves both the hosted share viewer (archie-viewer.vercel.app) and the local /archie-viewer sidecar |
| Share backend | Supabase Edge Functions (Deno) + Postgres | Upload + blueprint fetch by token + anonymous telemetry ingest |
| Testing | pytest | 50 test files |
| Linting | Ruff | Python linting and formatting |
Standalone scripts (copied to target projects via npx @bitraptors/archie) have zero pip dependencies — Python 3.9+ stdlib only, including viewer.py's HTTP server. The viewer's React/TypeScript stack is a separate concern: its source ships inside the npm package and the installer builds it once into a static dist/ (cached by version); target projects never install or run React at scan time.
archie/
__init__.py
cli/ # Click CLI commands (used by the Python-package path)
main.py # CLI group
init_command.py # Full pipeline orchestration
refresh_command.py # Rescan + change detection
status_command.py # Blueprint freshness, rule stats, health metrics
serve_command.py # FastAPI viewer server (legacy)
check_command.py # CI validation: check files against rules
engine/ # Local codebase analysis (no AI)
models.py # Pydantic: FileEntry, DependencyEntry, FrameworkSignal, RawScan
scan.py # Orchestrator: runs all analysis steps -> RawScan
scanner.py # Walk directory tree, apply ignore + bulk layering
dependencies.py # Parse requirements.txt, package.json, go.mod, Cargo.toml, pyproject.toml
frameworks.py # Detect React, FastAPI, Django, etc. with confidence scores
hasher.py # SHA256 file hashes + token counting
imports.py # Build import graph from source code
coordinator/ # AI pipeline (Python-package path)
planner.py # Group files into token-budgeted SubagentAssignments
prompts.py # Build markdown prompts for subagents and coordinator
runner.py # Spawn `claude -p` subprocesses, parse JSON responses
merger.py # Deep merge partial blueprints into single StructuredBlueprint
hooks/
generator.py # Generate .claude/hooks/*.sh, register in settings.local.json
enforcement.py # Validate files against rules (Python API)
renderer/
render.py # Adapter: calls standalone renderer + intent layer
intent_layer.py # Generate per-folder CLAUDE.md with local patterns
rules/
extractor.py # Legacy blueprint rule extractor — RETIRED in v2.5.0, no longer on the pipeline (kept for tests)
standalone/ # Zero-dependency scripts (30 files, exported to target projects)
_common.py # IgnoreMatcher, BulkMatcher, DECISION_RE, normalize_blueprint
scanner.py # File tree, import graph, framework detection, skeleton extraction, bulk manifest
renderer.py # Generate AGENTS.md (canonical) + CLAUDE.md pointer + .claude/rules/ topic files
intent_layer.py # Per-folder CLAUDE.md via DAG scheduling + AI enrichment + inspect/scan-config/deep-scan-state/save-run-context
viewer.py # Local viewer — serves the React dist/ + /api/* JSON endpoints (stdlib http.server)
drift.py # Mechanical drift detection
validate.py # Cross-reference blueprint against actual codebase
check_rules.py # Check files against rules (CI path)
measure_health.py # Erosion, gini, verbosity, top-20%, waste scores + history append + --compare-history
code_shape.py # Code-shape matching primitives consumed by the pre-validate hook + rule index
detect_cycles.py # Tarjan's SCC on the import graph
install_hooks.py # Install 6 hooks + 29 permissions in settings.local.json
merge.py # Merge blueprint sections from multiple sources
finalize.py # Deep merge + findings upsert into store + pitfalls into blueprint
verify_findings.py # Parallel Haiku verifier — checks each finding's triggering_call_site against real code
apply_verdicts.py # Apply verifier verdicts to findings.json with cross-run hysteresis
rule_index.py # Pre-compute .archie/rule_index.json (keyword / path / always-inject buckets) for hot-path enforcement
align_check.py # Phase 3 semantic alignment classifier — plan/diff intent vs rule description+why+example
migrate_blueprint_rules.py # Migrate legacy blueprint rule sections into proposed_rules.json
arch_review.py # Architectural review checklist for plans and diffs
refresh.py # File change detection (hash comparison)
extract_output.py # rules / deep-drift / recent-files / save-duplications subcommands
telemetry.py # Per-run step-level wall-clock timing + steps-count action
telemetry_sync.py # Anonymous opt-in telemetry — record events, push to Supabase
update_check.py # Anonymous opt-in npm-registry update check + snooze ladder
config.py # Machine-level config at ~/.archie/config.json (telemetry consent, update prefs, install id)
analytics.py # Local analytics dashboard over ~/.archie/analytics/runs.jsonl
upload.py # Build share bundle; default mode POSTs to Supabase, enterprise modes do sigv4-PUT or presigned-PUT to customer bucket + build fragment-embedded viewer URL
share_setup.py # Enterprise share setup wizard: writes ~/.archie/share-profile.json (chmod 600) from flags
lint_gate.py # Opt-in external linter gate (ruff / eslint / golangci-lint / semgrep) behind .archie/enforcement.json
npm-package/
bin/archie.mjs # npx @bitraptors/archie installer entry point
assets/ # Canonical copies of everything the installer ships
*.py # Mirror of every standalone script (30)
archie-*.md # Mirror of the 5 slash commands
_shared/scope_resolution.md # Mirror of the shared Phase 0 fragment
skills/archie-deep-scan/ # Mirror of the deep-scan skill subtree (SKILL.md, steps/, fragments/, templates/)
viewer/ # Mirror of share/viewer/ source — built into dist/ at install time
archieignore.default # Default `.archieignore` template
archiebulk.default # Default `.archiebulk` template (three tier, path-based)
platform_rules.json # 30 predefined architectural checks
package.json
share/
viewer/ # React/Vite app — powers archie-viewer.vercel.app AND the local /archie-viewer sidecar
src/pages/ # HomePage, LocalPage (/local), CoverPage (/r/:token), ReportPage (/r/:token/details)
src/components/ # ReportSections, FixThisButton, MermaidDiagram + local/ (TreeNav, browsers, RuleControls, RuleEditModal)
src/lib/ # api.ts (Bundle/ReportResponse types), findings.ts, fixPrompt.ts, autocode.ts
supabase/
migrations/ # reports table + telemetry_events table (RLS-restricted)
functions/upload/ # POST a bundle, get a token
functions/blueprint/ # GET bundle by token
functions/telemetry-ingest/ # Anonymous telemetry ingest (anon key + insert-only RLS)
tests/ # 50 test files
docs/
ARCHITECTURE.md # This file
enterprise-share-setup.md # Customer-side bucket setup walkthrough (CORS + IAM templates)
landing/ # Landing page
v1/ # Archived V1 web app + landing (FastAPI + Next.js, obsolete)
.claude/
commands/
archie-scan.md # Architecture health check (1–3 min)
archie-deep-scan.md # Thin router — invokes the archie-deep-scan skill
archie-intent-layer.md # Standalone per-folder CLAUDE.md regen (shared pipeline w/ deep-scan Step 7)
archie-share.md # Upload bundle to shared viewer
archie-viewer.md # Local viewer launcher
_shared/scope_resolution.md # Phase 0 scope-resolution fragment, loaded by deep-scan (and inlined by scan)
skills/
archie-deep-scan/ # The deep-scan pipeline as a skill: SKILL.md router + steps/ + fragments/ + templates/
scripts/
verify_sync.py # Pre-commit: verify canonical ↔ asset sync
pyproject.toml # Package metadata
CLAUDE.md # AI agent instructions for this repository
README.md # User-facing documentation
```
User
  |
  v
Claude Code slash commands (archie-scan, archie-deep-scan, archie-share, archie-viewer)
  |
  v   (orchestration is markdown-in-a-slash-command, not Python — the slash command
  |    spawns subagents and calls the standalone scripts via Bash)
  |
  v
Standalone scripts (archie/standalone/*.py)   <-- primary runtime path
  |     scanner, measure_health, detect_cycles,
  |     finalize, merge, intent_layer, drift,
  |     extract_output, telemetry, upload, ...
  v
File system + Claude Code subagent spawning (Agent tool) + Anthropic API
```
A parallel Python-package path exists (archie/cli/ + engine/ + coordinator/) for non-Claude-Code execution — but the shipped user experience runs through the slash commands orchestrating the standalone scripts. The standalone scripts are the canonical implementation; the Python package modules are older scaffolding, kept for CI / tests but not on the primary runtime path.
Separation of concerns:
- Engine / standalone scanner — stateless local analysis. No AI, no blueprint writing. Input: repo path. Output: `scan.json` with file tree, imports, frameworks, hashes, skeletons, `bulk_content_manifest`, `frontend_ratio`.
- Coordinator / slash-command orchestration — spawns subagents, builds prompts, merges outputs. Input: `scan.json`. Output: `blueprint.json` + `findings.json` updates.
- Hooks — real-time Claude Code integration. Registered in `.claude/settings.local.json` at install time.
- Renderer — deterministic file generation. Input: `blueprint.json`. Output: CLAUDE.md, AGENTS.md, per-folder context, rule files.
- Rules — extraction and severity management. Input: blueprint + AI proposals + platform rules. Output: `rules.json`.
- Share — `upload.py` builds a bundle (blueprint + findings + scan_report + health + rules) and POSTs it to the Supabase edge function; the React viewer renders it from a token URL.
The engine runs analysis steps in sequence and produces a RawScan (defined in archie/engine/models.py). The standalone scanner produces the same shape as JSON at .archie/scan.json:
```python
from typing import Optional

from pydantic import BaseModel

# DependencyEntry and FrameworkSignal are defined alongside these models
# in archie/engine/models.py and are omitted from this excerpt.


class FileEntry(BaseModel):
    path: str
    size: int = 0
    extension: str = ""
    bulk: Optional[dict] = None  # Present if matched by .archiebulk


class RawScan(BaseModel):
    file_tree: list[FileEntry]
    token_counts: dict[str, int]
    tokens_by_directory: dict[str, int]
    dependencies: list[DependencyEntry]
    framework_signals: list[FrameworkSignal]
    config_patterns: dict[str, str]
    import_graph: dict[str, list[str]]
    file_hashes: dict[str, str]
    entry_points: list[str]
    frontend_ratio: float
    bulk_content_manifest: dict[str, dict]  # category -> {count, frameworks, files}
```

| # | What it does |
|---|---|
| 1 | Build IgnoreMatcher (.gitignore + .archieignore + SKIP_DIRS fallback) |
| 2 | Build BulkMatcher from .archiebulk |
| 3 | Walk directory tree — pruning ignored dirs, emitting FileEntry per surviving file |
| 4 | Classify each file against BulkMatcher — bulk matches get {category, framework} tag; downstream steps skip reading their contents |
| 5 | Parse manifests (requirements.txt, package.json, go.mod, Cargo.toml, pyproject.toml, etc.) |
| 6 | Detect frameworks with confidence scores + evidence |
| 7 | SHA256 hash + token count every non-bulk file |
| 8 | Parse import statements to build directed import graph (Python, JS/TS, Go, Rust, Kotlin, Swift) |
| 9 | Detect entry points by filename pattern |
| 10 | Read first 500 chars of config files |
| 11 | Extract skeletons (class/function signatures + imports + first lines) for every non-bulk source file |
| 12 | Compute frontend_ratio — counts extension-tagged UI source (tsx/jsx/vue/swift/xib/storyboard/dart/xaml) plus bulk files tagged ui_resource against a denominator of readable source + source-shape bulk categories |
| 13 | Assemble scan.json and skeletons.json; return |
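Step 7 and the later change detection in `refresh.py` both rest on the `file_hashes` map. The following is a minimal sketch of that idea, not the scanner's actual code: the function names `hash_files` and `changed_paths` are hypothetical, entries are assumed to be plain dicts shaped like `FileEntry`, and token counting is left out because the counting method is not specified here.

```python
import hashlib
from pathlib import Path


def hash_files(root, file_tree):
    """SHA-256 every non-bulk entry; bulk entries are inventoried but never read."""
    hashes = {}
    for entry in file_tree:
        if entry.get("bulk"):
            continue  # visible inventory, opaque contents
        data = (Path(root) / entry["path"]).read_bytes()
        hashes[entry["path"]] = hashlib.sha256(data).hexdigest()
    return hashes


def changed_paths(old, new):
    """Paths added, removed, or modified between two file_hashes maps."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Comparing the map from the previous scan with a fresh one yields the changed-files list that an incremental run would need.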
Most-restrictive wins. A path matched by a higher tier is not re-considered by a lower tier.
| Tier | File | Semantics |
|---|---|---|
| 1 | `.gitignore` (root + nested) | Fully skipped — the scanner never sees the path |
| 2 | `.archieignore` | Fully skipped — Archie-specific exclusions on top of `.gitignore` |
| 3 | `.archiebulk` | Visible inventory, opaque contents — recorded in `scan.json.file_tree` with a `bulk: {category, framework}` tag and aggregated in `bulk_content_manifest`, but never skeleton-parsed, hashed, token-counted, or import-parsed |
All three files use gitignore-style globs. .archiebulk patterns carry two whitespace-separated metadata columns (category, optional framework):
```
# <glob>                  <category>    <framework>
**/res/layout/**          ui_resource   android
**/res/drawable*/**       ui_resource   android
**/*.storyboard           ui_resource   ios
**/*.g.dart               generated     flutter
**/*.pb.go                generated     protobuf
**/migrations/**/*.sql    migration     sql
package-lock.json         lockfile      node
```
IgnoreMatcher and BulkMatcher both live in archie/standalone/_common.py. BulkMatcher implements proper ** semantics via _glob_to_regex (translating gitignore-style globs into regex so **/res/layout/** matches at any depth).
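As a rough illustration of the `**` handling described above, the sketch below translates a gitignore-style glob into a regular expression and parses one `.archiebulk` row. It is not the code in `_common.py`: the names `glob_to_regex` and `parse_bulk_line` are hypothetical, and anchoring, character classes, and negation are simplified or omitted by assumption.

```python
import re


def glob_to_regex(glob):
    """Translate a gitignore-style glob into a compiled regex (simplified)."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")        # anything, including '/'
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")     # anything within one path segment
            i += 1
        elif glob[i] == "?":
            out.append("[^/]")      # one character within a segment
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    # Assumption: a pattern without '/' may match at any depth.
    prefix = "" if "/" in glob else "(?:.*/)?"
    return re.compile("^" + prefix + "".join(out) + "$")


def parse_bulk_line(line):
    """Parse '<glob> <category> [framework]'; None for blanks and comments."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    parts = stripped.split()
    if len(parts) < 2:
        return None  # this sketch treats the category column as required
    framework = parts[2] if len(parts) > 2 else None
    return glob_to_regex(parts[0]), {"category": parts[1], "framework": framework}
```

Under these assumptions `**/res/layout/**` matches `app/src/main/res/layout/activity_main.xml` at any depth but not `myres/layout/a.xml`, because the optional directory prefix must end on a `/` boundary.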
Some files are structurally verbose but architecturally inert — Android res/layout/*.xml, iOS .storyboard, generated protobuf Go stubs, minified JS, TypeScript .d.ts, SQL migrations. Their existence and count carry signal (this is an Android project with 248 layouts → frontend_ratio flips to 0.5 → UI Layer agent spawns), but reading their contents burns context for no analytical gain.
| Category | Examples | Counted as source shape? |
|---|---|---|
| `ui_resource` | Android res/layout/, drawable/, values/; iOS storyboards/xibs | yes → boosts frontend_ratio |
| `generated` | .g.dart, .freezed.dart, .pb.go, .min.js, .d.ts, dist/, .next/, .nuxt/, .svelte-kit/ | yes |
| `localization` | locales/, i18n/, .arb, .po, .strings | yes |
| `migration` | db/migrate/, migrations/**/*.sql, prisma/migrations/ | yes |
| `fixture` | fixtures/, seeds/, testdata/, __snapshots__/ | yes |
| `asset` | fonts, mipmaps, Assets.xcassets/, res/raw/ | no (not source shape) |
| `lockfile` | package-lock.json, yarn.lock, pnpm-lock.yaml, Cargo.lock, go.sum | no |
| `dependency` | vendor/ (Go) | no |
| `data` | large CSV/JSON catalogs | no |
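To make the "source shape" column concrete, here is a minimal sketch of how a `frontend_ratio` could be derived from `file_tree` entries. It is an illustration under assumptions, not the scanner's implementation: entries are plain dicts, extensions are assumed to carry a leading dot, and `SOURCE_EXTS` is a hypothetical stand-in for whatever the scanner treats as readable source.

```python
UI_EXTS = {".tsx", ".jsx", ".vue", ".swift", ".xib", ".storyboard", ".dart", ".xaml"}
SOURCE_SHAPE = {"ui_resource", "generated", "localization", "migration", "fixture"}
# Assumption: hypothetical set of "readable source" extensions.
SOURCE_EXTS = UI_EXTS | {".py", ".js", ".ts", ".go", ".rs", ".kt", ".java"}


def frontend_ratio(file_tree):
    """UI-shaped files divided by all source-shaped files (0.0 if none)."""
    ui = total = 0
    for entry in file_tree:
        bulk = entry.get("bulk")
        if bulk:
            # Bulk files count only when their category is source-shaped.
            if bulk["category"] in SOURCE_SHAPE:
                total += 1
                if bulk["category"] == "ui_resource":
                    ui += 1
        elif entry.get("extension") in SOURCE_EXTS:
            total += 1
            if entry["extension"] in UI_EXTS:
                ui += 1
    return ui / total if total else 0.0
```

With three Kotlin files, three Android layouts tagged `ui_resource`, and one lockfile, the lockfile is ignored and the ratio comes out at 3 / 6 = 0.5, which is the kind of shift that lets layout inventory alone trigger the UI Layer agent.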
Every scan / deep-scan agent prompt injects a no-read rule:
> `scan.json.bulk_content_manifest` lists files classified by `.archiebulk` as "visible inventory, not contents". You may reference these paths by name and inventory counts, but you MUST NOT call Read on them. The scanner has already summarised their shape.
Agents can still Read bulk paths in rare surgical cases (resolving a specific finding) — the prompt allows it as an explicit exception.
Project-specific bulks (e.g. a 20k-key product catalog, a giant CLDR dump) are added in one line to .archiebulk. New frameworks require only new rows in the default template — no scanner code changes.
/archie-scan is a single markdown template in .claude/commands/. /archie-deep-scan is a thin router in .claude/commands/ that invokes the archie-deep-scan skill — a SKILL.md orchestrator plus self-contained per-step files under .claude/skills/archie-deep-scan/ (steps/, fragments/, templates/). The skill split keeps each step independently readable and lets the pipeline survive /compact and resume via --continue / --from N. Both orchestrate the pipeline by:
- Calling standalone Python scripts for deterministic steps (`scanner.py`, `measure_health.py`, `detect_cycles.py`, `finalize.py`, `extract_output.py`, `drift.py`, `intent_layer.py`, `telemetry.py`).
- Spawning subagents via the Agent tool for AI steps (3–4 parallel Sonnets in Wave 1, one Opus in Wave 2, one Sonnet for rule synthesis, N Sonnets for Intent Layer if opted in).
- Using `AskUserQuestion` for all single-choice prompts (scope picker, parallel/sequential, Intent Layer opt-in) — no free-text answers to parse.
The Python-package coordinator (archie/coordinator/) is kept for CI/tests and standalone Python usage. It groups files into token-budgeted SubagentAssignments (150k tokens per group, bin-packed by top-level directory), builds prompts, spawns claude -p subprocesses, parses JSON responses with a three-strategy fallback, and merges partial blueprints. It is not exercised by the default slash-command flow.
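The token-budgeted grouping can be pictured as a bin-packing pass over top-level directories. The sketch below uses first-fit decreasing, which is an assumed strategy rather than the one `planner.py` necessarily uses; `plan_groups` and the dict shape of each group are hypothetical.

```python
from collections import defaultdict

TOKEN_BUDGET = 150_000  # per-group budget quoted above


def plan_groups(token_counts, budget=TOKEN_BUDGET):
    """Pack top-level directories into groups, largest directory first."""
    per_dir = defaultdict(int)
    files_by_dir = defaultdict(list)
    for path, tokens in token_counts.items():
        top = path.split("/", 1)[0] if "/" in path else "."
        per_dir[top] += tokens
        files_by_dir[top].append(path)

    groups = []  # each group: {"tokens": int, "files": [paths]}
    for top in sorted(per_dir, key=lambda d: (-per_dir[d], d)):
        for group in groups:
            if group["tokens"] + per_dir[top] <= budget:
                break  # first existing group with room
        else:
            # Assumption: a directory over budget still gets its own group.
            group = {"tokens": 0, "files": []}
            groups.append(group)
        group["tokens"] += per_dir[top]
        group["files"].extend(sorted(files_by_dir[top]))
    return groups
```

Keeping a directory's files together means each subagent sees a coherent slice of the codebase, at the cost of occasionally leaving some of the budget unused.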
/archie-scan takes 1–3 minutes and is designed for frequent runs.
```
Phase 1  Data Gathering (deterministic, seconds)
         scanner.py, measure_health.py, detect_cycles.py, git log
Phase 2  Read accumulated knowledge
         skeletons.json, scan.json, health.json, blueprint.json (if exists),
         findings.json, rules.json, proposed_rules.json
Phase 3  Parallel analysis — 3 Sonnet agents
         Agent A (architecture + dependencies)
         Agent B (health + complexity)
         Agent C (patterns + rules)
         Each agent gets its slice of findings.json and is told to prioritise NEW problems.
Phase 4  Synthesis (single AI pass, Sonnet)
         4a  Evolve blueprint (does NOT touch blueprint.pitfalls — deep-scan-Opus-only)
         4b  Write structured findings to .archie/findings.json (id-stable upsert)
         4c  Write prose scan report to .archie/scan_history/ + .archie/scan_report.md
         4d  Save semantic duplications to .archie/semantic_duplications.json
             (deterministic via `extract_output.py save-duplications`)
Phase 5  Proposed rules to .archie/proposed_rules.json
Phase 6  Telemetry write
```
/archie-deep-scan takes 15–20 minutes on first run; --incremental mode handles later runs in 3–6 min.
Skill structure. /archie-deep-scan is a Claude Code skill, not a monolithic command file. .claude/commands/archie-deep-scan.md is a thin router; the real pipeline lives in .claude/skills/archie-deep-scan/:
- `SKILL.md` — the orchestrator: flag parsing (`--incremental` / `--continue` / `--from N` / `--reconfigure`), the resume preamble, and a step-routing table.
- `steps/step-1-scanner.md` … `step-10-telemetry.md` — one self-contained file per step (Step 3 expands into `step-3-wave1/` with one prompt file per Wave 1 agent + shared grounding rules).
- `fragments/` — cross-step contracts: `telemetry-conventions.md`, `compact-resume-contract.md`, `resume-prelude.md`.
- `templates/scan-report.md` — the scan-report template Step 9 fills in.
- Phase 0 scope resolution is loaded from the shared `.claude/commands/_shared/scope_resolution.md` fragment.
The router Reads only the files for the steps it actually runs, so a --from 7 resume never loads Steps 1–6, and .archie/deep_scan_state.json rehydrates shell variables after a /compact.
```
Phase 0  Scope resolution (interactive, AskUserQuestion)
         - monorepo detection -> whole/per-package/hybrid/single picker
         - if multi-workspace: AskUserQuestion for parallel/sequential
         - AskUserQuestion for Intent Layer opt-in (Step E)
Step 1   Scanner (same as fast scan)
Step 2   Read accumulated knowledge
Step 3   Wave 1 (parallel) — 3–4 Sonnet agents
         Structure, Patterns, Technology [+ UI Layer if frontend_ratio >= 0.20]
         Writes /tmp/archie_agent_*.json
Step 4   Merge Wave 1 outputs into blueprint_raw.json via merge.py
Step 5   Wave 2 (single Opus) — synthesis
         Reads blueprint_raw.json + findings.json
         Runs three probes: A complexity-budget, B invariants & gates, C seams
         Emits decision chain, architectural style, key decisions,
         trade-offs, out-of-scope, findings (upgrade + new),
         pitfalls, architecture diagram, implementation guidelines
         Writes /tmp/archie_sub_x_*.json
         finalize.py deep-merges into blueprint.json and upserts findings into the store
Step 6   Rule synthesis (single Sonnet) — proposes architecturally-grounded rules
Step 7   Intent Layer (opt-in) — per-folder CLAUDE.md via DAG scheduling
         If INTENT_LAYER=no from Step E, this step is skipped and
         telemetry records "skipped": true
Step 8   Cleanup
Step 9   Drift Detection & Architectural Assessment
         Mechanical drift (drift.py) + Deep AI drift (single Sonnet)
         Writes scan_report.md to scan_history/
Step 10  Telemetry write
```
Incremental mode (--incremental) skips Wave 1 entirely. One scoped Reasoning agent receives blueprint.json + blueprint_raw.json + changed-files list + findings.json, returns only the sections that need updating, and finalize.py --patch deep-merges the diff.
- `--continue` resumes from the last completed step (tracked in `.archie/deep_scan_state.json`).
- `--from N` resumes from a specific step.
.archie/findings.json is a shared, compounding store — both /archie-scan and /archie-deep-scan read from it and write back to it. Neither requires the other to have run first.
```
{
  "scanned_at": "YYYY-MM-DDTHHMM",
  "scan_count": N,
  "findings": [
    {
      "id": "f_NNNN",
      "problem_statement": "One sentence, specific.",
      "evidence": ["src/file.py:42", "pattern observed in 7 modules", "..."],
      "root_cause": "Names a decision/pattern/constraint, specific to this codebase.",
      "fix_direction": "Single tactical sentence." | ["step 1", "step 2", "step 3"],
      "severity": "error | warn | info",
      "confidence": 0.85,
      "applies_to": ["src/auth/", "src/routes/profile.py"],
      "source": "scan:structure | scan:health | scan:patterns | deep:synthesis",
      "depth": "draft | canonical",
      "pitfall_id": "pf_NNNN (optional, when root_cause is structural)",
      "first_seen": "YYYY-MM-DDTHHMM",
      "confirmed_in_scan": 1,
      "status": "active | resolved"
    }
  ]
}
```

| Event | Action |
|---|---|
| Scan agent surfaces a problem matching an existing finding (similar problem_statement ∧ overlapping applies_to) | Reuse id, keep first_seen, increment confirmed_in_scan, keep evidence |
| Scan agent surfaces a novel problem | Assign next-free f_NNNN, set first_seen = today, confirmed_in_scan = 1, depth: "draft" |
| Previous finding no longer appears | Flip status: "resolved", add resolved_at. Preserved as history, not deleted |
| Deep-scan Wave 2 Opus upgrades a draft | Preserve id, first_seen, applies_to, evidence; rewrite root_cause with architectural grounding; rewrite fix_direction as an ordered list of sequenced steps; flip depth: "canonical", source: "deep:synthesis"; link to parent pitfall via pitfall_id if structural |
| Deep-scan Wave 2 Opus discovers a NEW problem | Mint a new f_NNNN after running a novelty check against the existing store |
Deterministic Python code enforces id-stable upsert: entries whose id matches an existing one are replaced in place (with preserved first_seen and bumped confirmed_in_scan), unmatched entries in the new set get minted, and existing entries not referenced by the current run are left untouched (scan is responsible for marking resolution).
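A minimal sketch of such an id-stable upsert follows. It illustrates the three cases named above under assumptions and is not the code in `finalize.py`: the function name `upsert_findings` is hypothetical, findings are plain dicts, and ids are assumed to follow the `f_NNNN` shape.

```python
def upsert_findings(store, incoming, today):
    """Replace id matches in place, mint ids for the rest, leave others alone."""
    index = {f["id"]: pos for pos, f in enumerate(store)}
    next_num = max((int(f["id"].split("_")[1]) for f in store), default=0) + 1
    for new in incoming:
        pos = index.get(new.get("id"))
        if pos is not None:
            old = store[pos]
            # Matched: new body wins, history fields are carried forward.
            store[pos] = {
                **new,
                "first_seen": old["first_seen"],
                "confirmed_in_scan": old.get("confirmed_in_scan", 1) + 1,
            }
        else:
            # Unmatched: mint the next free id and start the counters.
            minted = {
                "depth": "draft",
                **new,
                "id": f"f_{next_num:04d}",
                "first_seen": today,
                "confirmed_in_scan": 1,
            }
            index[minted["id"]] = len(store)
            store.append(minted)
            next_num += 1
    return store
```

Entries that the current run does not mention are never visited by the loop, so they stay byte-for-byte identical, which matches the rule that resolution is marked elsewhere.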
Every scan agent (A / B / C) receives the findings store scoped to its source slice and is explicitly told:
> Those problems are already known — your job is not to rediscover them. Emit them with the same `problem_statement` wording (so 4b can id-match them) without re-deriving evidence. Spend your cognitive budget on problems NOT in the store.
Same rule for Wave 2 Opus: the primary goal is NEW findings visible only from the overall-picture pass; upgrading drafts is secondary housekeeping.
Pitfalls are classes of problem — architectural traps rooted in decisions or patterns. They share the same 4-field core as findings but live in blueprint.pitfalls (blueprint-durable, not per-run).
| Aspect | Finding | Pitfall |
|---|---|---|
| Altitude | Instance (specific files) | Class-of-problem |
| `applies_to` | File-level | Component/folder-level |
| `source` | scan:{structure,health,patterns} or deep:synthesis | deep:synthesis only |
| `depth` | draft or canonical | canonical only |
| Owner | Both commands write to findings.json | Only deep-scan Wave 2 Opus writes to blueprint.pitfalls |
| Persistence | Compounding store, status flips on resolve | Durable blueprint entries |
| ID | f_NNNN | pf_NNNN |
A single pitfall may have multiple confirming findings via pitfall_id.
The installer (npx @bitraptors/archie) generates six hooks and registers them in .claude/settings.local.json:
pre-validate.sh (PreToolUse, matcher: Write|Edit|MultiEdit)
- Rule injection (Tier 4) — before the violation check, prints every rule that applies to the file's path (rules with `applies_to` prefix-matching `rel_path`) plus every rule tagged `always_inject: true` (critical globals). Each rule carries inline semantic content the agent reads at edit time — `description`, the `WHY:` block, and an `EXAMPLE:` block — so no blueprint read is needed. Deduped per-turn via a `/tmp/.archie_turn_<cksum-of-project-root>` marker so the same rule doesn't re-surface on every Edit within a turn.
- Violation check — runs the mechanical `check` rules (`forbidden_import`, `required_pattern`, `forbidden_content`, `architectural_constraint`, `file_naming`).
- `severity_class` gating — the hook gates its response on the rule's `severity_class`: `decision_violation` / `pitfall_triggered` / `mechanical_violation` block (exit 2), `tradeoff_undermined` warns (exit 0, prominent), `pattern_divergence` informs (exit 0, quiet). Old-shape rules without `severity_class` fall back to `severity` + `rationale`.
- Reads the content being written for deeper validation (not just file path); uses `printf %s` + tempfile + JSON parse to pass tool input to Python without shell-escaping bugs.
- Loads `rules.json` + `platform_rules.json`; consults the pre-computed `.archie/rule_index.json` (keyword / path / always-inject buckets, rebuilt by `rule_index.py`) for hot-path lookup.
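The injection and gating behaviour of pre-validate.sh can be sketched in a few lines of Python. This is an illustration only, not the hook's code: `rules_for_path`, `exit_code_for`, and `hook_exit` are hypothetical names, `applies_to` is assumed to be a list of path prefixes, and the old-shape fallback (treating `severity: error` as blocking) is an assumption.

```python
BLOCKING = {"decision_violation", "pitfall_triggered", "mechanical_violation"}


def rules_for_path(rules, rel_path):
    """Rules whose applies_to prefix-matches rel_path, plus always_inject rules."""
    picked = []
    for rule in rules:
        prefixes = rule.get("applies_to", [])
        if rule.get("always_inject") or any(rel_path.startswith(p) for p in prefixes):
            picked.append(rule)
    return picked


def exit_code_for(rule):
    """Exit code for one violated rule: 2 blocks, 0 warns or informs."""
    cls = rule.get("severity_class")
    if cls is None:
        # Assumption: old-shape rules block only at severity "error".
        return 2 if rule.get("severity") == "error" else 0
    return 2 if cls in BLOCKING else 0


def hook_exit(violated_rules):
    """The hook blocks if any violated rule blocks."""
    return max((exit_code_for(r) for r in violated_rules), default=0)
```

An empty list of violations yields exit 0, which is consistent with the fail-open behaviour of the hooks.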
pre-turn.sh (UserPromptSubmit)
- Clears the per-turn rule-injection marker at the start of each new user turn so applicable rules re-surface on the first Write/Edit of that turn.
pre-commit-review.sh (PreToolUse, matcher: Bash)
- Filters to fire only on `git commit` commands
- Triggers an architectural review of the staged diff via `arch_review.py`
post-plan-review.sh (PostToolUse, matcher: ExitPlanMode)
- Triggers an architectural review of the plan via `arch_review.py`
blueprint-nudge.sh (PreToolUse, matcher: Glob|Grep)
- Always-on architectural reminder — fires before code exploration to remind the agent about project architecture
post-lint.sh (PostToolUse, matcher: Write|Edit|MultiEdit)
- Opt-in external linter gate — reads `.archie/enforcement.json`; no-op unless `{"enabled": true}`
- Auto-detects the right linter per file extension + project config: Python (`.py` + `pyproject.toml [tool.ruff]` → `ruff check --quiet`), JS/TS (`.js`/`.ts`/... + `.eslintrc` + eslint on PATH → `eslint --quiet`), Go (`.go` + `.golangci.yaml` → `golangci-lint run --fast` on the parent dir since golangci-lint is package-aware), Semgrep (any file type + `.semgrep.yml` → `semgrep --error --quiet`)
- Config overrides let users pin custom commands per kind
- `severity: error` → exit 2 blocks; `severity: warn` → exit 0 with message
The installer writes a 29-entry allow list into .claude/settings.local.json so the workflow runs without interactive prompts:
- `Bash(python3 .archie/*.py *)`, `Bash(python3 -c *)`
- `Bash(git *)`, `Bash(sort *)`, `Bash(head *)`, `Bash(test *)`, `Bash(cp *)`, `Bash(ls *)`, `Bash(wc *)`, `Bash(cat *)`, `Bash(echo *)`, `Bash(for *)`, `Bash(mkdir *)`, `Bash(date *)`
- `Bash(rm -f /tmp/archie_*)`, `Bash(rm -f .archie/health.json)`
- `Write(//tmp/archie_*)`, `Read(//tmp/archie_*)`
- `Read(.archie/*)`, `Read(.archie/**)`, `Write(.archie/*)`, `Write(.archie/**)`, `Edit(.archie/*)`, `Edit(.archie/**)`
- `Read(**)`
- `Write(**/CLAUDE.md)`, `Edit(**/CLAUDE.md)`
- `Agent(*)` for subagent spawning
All hooks fail open: missing rules/config/marker files → hooks exit 0 silently.
Every Sonnet/Opus subagent spawned during a scan receives a mandatory instruction to Write its own output directly to /tmp/archie_*.json using the Write tool (permissioned via Write(//tmp/archie_*)). The orchestrator never copies subagent transcripts. This avoids Claude Code's sensitive-file guardrail on ~/.claude/projects/.../subagents/*.jsonl (which used to fire a permission prompt on every batch), keeps subagent output out of the orchestrator's context (less compaction pressure), and isolates failures (missing confirmation line or missing file → clear signal, no silent fallback to transcript scraping).
The contract is enforced in 6 spawn sites across the slash commands: Wave 1 structural agents (3–4 Sonnets), Wave 2 reasoning agent (full + incremental paths), rule-proposer agent, deep-drift reviewer, and Intent Layer enrichment subagents.
Rules come from two AI-synthesized sources plus a universal platform set. They all share one inline rich rule shape — each rule carries the semantic content the agent needs at edit time, so the pre-edit hook never has to read the blueprint or follow a pointer.
Retired: the deterministic blueprint extractor (archie/rules/extractor.py::extract_rules) was removed from the pipeline in v2.5.0. Its allowed_dirs lookup went stale, and AI rule synthesis (source 1 below) covers placement + naming with full semantic content the extractor couldn't express. The file still exists for test coverage but nothing calls it. A fresh archie init writes an empty rules.json; the user populates it by running /archie-deep-scan or /archie-scan.
The Step 6 agent reads the synthesized blueprint and proposes the architectural rule baseline. Each rule carries inline semantic content:
- severity_class: one of decision_violation / pitfall_triggered / tradeoff_undermined / pattern_divergence / mechanical_violation (drives how the pre-edit hook responds)
- description: what the rule enforces
- why: the reasoning, copy-pasted from the motivating blueprint section
- example: canonical code shape, from implementation_guidelines.usage_example when present
- forced_by / enables / alternative: links back to the motivating decision
- source: "deep_scan"
Mechanical rules (regex-checkable housekeeping) additionally carry check + forbidden_patterns / required_in_content fields, and may emit a code_shape entry consumed by code_shape.py + the rule index.
/archie-scan proposes new rules in the same inline shape (source: "scan"), or amends existing ones (source: "scan-amended"), by spotting drift in recently-changed files. Severity is inferred from the blueprint anchor, never invented — and scan never promotes a rule to blocking on its own; that happens at the next deep-scan with full Wave 2 reasoning. The mechanical check types remain: forbidden_import, required_pattern, forbidden_content, architectural_constraint, file_naming.
Old-shape rules (no severity_class / why / example) still work — the hook and renderer fall back to severity + rationale.
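The fallback for old-shape rules can be pictured as a two-step lookup: use severity_class when present, otherwise derive the response from severity. This is a sketch under assumptions, not the hook's code: `hook_response` and `display_reason` are hypothetical helpers, the class-to-response mapping is the one listed in the delivery ladder table, and treating any non-error legacy severity as a warning is a guess.

```python
# severity_class -> hook response, per the Tier 1 row of the delivery ladder.
CLASS_RESPONSE = {
    "decision_violation": "block",    # exit 2
    "pitfall_triggered": "block",     # exit 2
    "mechanical_violation": "block",  # exit 2
    "tradeoff_undermined": "warn",
    "pattern_divergence": "inform",
}


def hook_response(rule):
    """Pick the response for a rule, falling back to the legacy severity field."""
    severity_class = rule.get("severity_class")
    if severity_class in CLASS_RESPONSE:
        return CLASS_RESPONSE[severity_class]
    # Old-shape rule: only severity is available.
    # Assumption: anything other than "error" is treated as a warning.
    return "block" if rule.get("severity") == "error" else "warn"


def display_reason(rule):
    """Prefer the new 'why' field; fall back to the legacy 'rationale'."""
    return rule.get("why") or rule.get("rationale", "")
```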
Installed with every project. Categories:
| Category | Count | Examples |
|---|---|---|
| architecture | 12 | Android ViewModel/Context separation, Fragment/network, Swift view-layer networking, React components fetching data, TypeScript any, array index keys |
| safety | 6 | Swift force unwraps / force try, Python TYPE_CHECKING guards, React DOM manipulation |
| erosion | 5 | God-functions, growing complexity, monster files |
| decay | 4 | Empty catches, disabled tests, TODO/HACK markers, debug breakpoints |
| security | 3 | Hardcoded secrets, eval/exec, plaintext API keys in logs |
Severity can be changed from /archie-viewer or by Claude Code during scans. Rules are stored in .archie/rules.json as {"rules": [...]}.
Rules don't do anything unless the agent sees them at the right moment. Archie uses a layered delivery model:
| Tier | Mechanism | What it covers | Effect |
|---|---|---|---|
| Session start | CLAUDE.md, AGENTS.md, .claude/rules/*.md auto-loaded by Claude Code | All rules (prose + mechanical) | Agent-aware |
| Prompt time | UserPromptSubmit hook keyword-matches rules.json entries against the user's prompt | Rules with keywords array OR severity: error (always-inject) | Surfaces relevant rules before the agent starts writing |
| Edit time: injection (Tier 4) | PreToolUse pre-validate.sh prints matching rules (description + WHY: + EXAMPLE:) before any violation check | Rules with applies_to prefix-matching the file path, and rules with always_inject: true | Re-surfaces rule + reasoning + example at the point of edit, deduped per-turn |
| Edit time: mechanical block (Tier 1) | PreToolUse pre-validate.sh runs regex / glob checks, then gates the response on severity_class | Rules with a check field (forbidden_content, forbidden_import, required_pattern, architectural_constraint, file_naming) | decision_violation / pitfall_triggered / mechanical_violation → exit 2 (blocked); tradeoff_undermined → warn; pattern_divergence → quiet inform |
| Post-edit: external linter (Tier 3, opt-in) | PostToolUse post-lint.sh runs project's native linter on changed file | Standard language-level issues (ruff / eslint / golangci-lint / semgrep) | Blocks if severity: error, warns otherwise |
At plan time and commit time, align_check.py adds a semantic layer: it compares the agent's intent (ExitPlanMode plan text or staged diff) against each rule's description + why + example via a single Claude CLI call, catching divergence that the regex check rules can't express.
Rule schema fields that drive this ladder:
- severity_class: decision_violation / pitfall_triggered / tradeoff_undermined / pattern_divergence / mechanical_violation; gates the Tier 1 hook response (old-shape rules fall back to severity)
- description / why / example: inline semantic content surfaced at edit time (Tier 4), no blueprint read needed
- forced_by / enables / alternative: links back to the motivating architectural decision (rendered in rule cards + topic files)
- keywords: [...] (2–5 terms for prompt-time matching, Tier 2)
- applies_to: "path/prefix" (scopes the rule to edits under that path: Tier 4 injection + content-check scoping)
- always_inject: true (critical globals that should re-surface at every first-edit-of-turn regardless of path, Tier 4)
- check: "..." + forbidden_patterns / required_in_content / file_pattern (+ optional code_shape): mechanical enforcement (Tier 1)
- source: provenance, one of deep_scan (baseline), scan / scan-amended (senior-architect pass), scan-adopted (curated in), platform
The AI rule synthesizer in /archie-deep-scan Step 6 and the /archie-scan proposer emit keywords for every rule and pick the narrowest meaningful applies_to scope, promoting broad-but-critical globals to always_inject rather than leaving them scopeless. rule_index.py pre-computes .archie/rule_index.json from rules.json + platform_rules.json so the hot-path hook does bucket lookups instead of scanning every rule.
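To make the bucket idea concrete, here is a rough sketch of how keyword, path, and always-inject buckets could be precomputed and then queried at edit time. The layout of rule_index.json is not described here, so the dictionary shape, `build_rule_index`, and `rules_for_path` are all hypothetical; only the three bucket kinds and the prefix-matching behaviour of applies_to come from the text.

```python
from collections import defaultdict


def build_rule_index(rules):
    """Group rule ids into keyword, path-prefix, and always-inject buckets."""
    index = {
        "keywords": defaultdict(list),
        "paths": defaultdict(list),
        "always_inject": [],
    }
    for rule in rules:
        rule_id = rule["id"]
        for keyword in rule.get("keywords", []):
            index["keywords"][keyword.lower()].append(rule_id)
        if rule.get("applies_to"):
            index["paths"][rule["applies_to"]].append(rule_id)
        if rule.get("always_inject"):
            index["always_inject"].append(rule_id)
    return index


def rules_for_path(index, file_path):
    """Edit-time lookup: always-inject rules plus rules whose prefix matches."""
    hits = list(index["always_inject"])
    for prefix, rule_ids in index["paths"].items():
        if file_path.startswith(prefix):
            hits.extend(rid for rid in rule_ids if rid not in hits)
    return hits
```

The hook then touches one bucket per distinct prefix instead of evaluating every rule on every edit.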
Tier 4 injection uses /tmp/.archie_turn_<cksum-of-project-root> as a marker file: pre-validate.sh appends each injected rule id after printing it; pre-turn.sh (UserPromptSubmit) clears the file at the start of each user turn. Within a single turn, editing N files doesn't re-inject the same rule N times — the agent sees it once, keeps it in context, and moves on.
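The per-turn dedup described above reduces to an append-only id list that is reset on each user turn. The sketch below mirrors that behaviour in Python for illustration only; the real hooks are shell scripts, and `marker_path`, `inject_once`, and `clear_turn` are hypothetical names. Using CRC32 as the checksum is an assumption standing in for whatever cksum the hooks compute.

```python
import os
import tempfile
import zlib


def marker_path(project_root, tmp_dir=None):
    """Build a per-project marker path under the temp directory."""
    # Assumption: CRC32 of the absolute root path stands in for the hook's cksum.
    digest = zlib.crc32(os.path.abspath(project_root).encode("utf-8"))
    return os.path.join(tmp_dir or tempfile.gettempdir(), f".archie_turn_{digest}")


def inject_once(marker, rule_id):
    """Return True if the rule should be printed now, and record it (pre-validate role)."""
    seen = set()
    if os.path.exists(marker):
        with open(marker, encoding="utf-8") as handle:
            seen = {line.strip() for line in handle if line.strip()}
    if rule_id in seen:
        return False
    with open(marker, "a", encoding="utf-8") as handle:
        handle.write(rule_id + "\n")
    return True


def clear_turn(marker):
    """Empty the marker at the start of a user turn (pre-turn role)."""
    open(marker, "w", encoding="utf-8").close()
```

Within one turn, a second call to `inject_once` with the same id returns False, so the rule text is printed only on the first matching edit.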
Deterministic JSON-to-Markdown, runs independently with just a blueprint.json:
python3 .archie/renderer.py /path/to/project

Produces:
- AGENTS.md: the canonical agent context (architecture summary, decision chains, and deep-links into the topic rule files)
- CLAUDE.md: a thin pointer to AGENTS.md, so Claude Code and agent-agnostic tools load the same context (the renderer can still preserve hand-authored content in either file via bracketed markers)
- .claude/rules/*.md: topic-split rule files (architecture, patterns, frontend, guidelines, pitfalls, dev-rules, infrastructure, technology; the lean Commands catalog moved here), plus a browsable enforcement/ directory (index.md + per-topic + universal.md) indexing every rule the pre-edit hook and the plan/commit classifier consult, grouped by severity and path glob
Pitfalls and findings use a shared _render_pitfall_lines helper that handles the 4-field shape (problem_statement / evidence / root_cause / fix_direction as list or string) and falls back to the legacy {area, description, recommendation} shape for blueprints written before 2.3.0.
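As an illustration of the dual-shape handling, the sketch below renders either shape to Markdown lines. It is not the body of _render_pitfall_lines: `render_pitfall` and `_as_list` are hypothetical, the output layout is invented for the example, and accepting a list for evidence is an assumption (the text only states that fix_direction may be a list or a string).

```python
def _as_list(value):
    """Accept a string, a list of strings, or None."""
    if not value:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [str(value)]


def render_pitfall(pitfall):
    """Render one pitfall or finding as Markdown lines, whichever shape it has."""
    if "problem_statement" in pitfall:
        # Current 4-field shape.
        lines = [f"- **{pitfall['problem_statement']}**"]
        lines += [f"  - Evidence: {item}" for item in _as_list(pitfall.get("evidence"))]
        if pitfall.get("root_cause"):
            lines.append(f"  - Root cause: {pitfall['root_cause']}")
        lines += [f"  - Fix: {step}" for step in _as_list(pitfall.get("fix_direction"))]
        return lines
    # Legacy {area, description, recommendation} shape.
    lines = [f"- **{pitfall.get('area', 'General')}**: {pitfall.get('description', '')}"]
    if pitfall.get("recommendation"):
        lines.append(f"  - Fix: {pitfall['recommendation']}")
    return lines
```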
Per-folder CLAUDE.md generation via bottom-up DAG scheduling:
- Leaf folders processed first, parents inherit child summaries
- Incremental re-generation: only folders containing changed files + their parent chain
- State tracked automatically in .archie/enrich_state.json
- Batches of folders processed in parallel waves (spawned Sonnet subagents)
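The bottom-up scheduling can be sketched as repeated selection of folders whose children are all finished. This is a simplified model rather than the logic in intent_layer.py: `folder_waves` and `affected_folders` are hypothetical names, and linking each folder to its nearest tracked ancestor is an assumption.

```python
from pathlib import PurePosixPath


def folder_waves(folders):
    """Group folders into waves: leaves first, a parent only after all its children."""
    folders = set(folders)
    children = {folder: set() for folder in folders}
    for folder in folders:
        # Attach each folder to its nearest ancestor that is also tracked.
        for ancestor in PurePosixPath(folder).parents:
            if str(ancestor) in folders:
                children[str(ancestor)].add(folder)
                break
    done, waves = set(), []
    while len(done) < len(folders):
        ready = sorted(f for f in folders if f not in done and children[f] <= done)
        waves.append(ready)  # every folder in a wave can be processed in parallel
        done.update(ready)
    return waves


def affected_folders(changed_files, folders):
    """Incremental mode: folders containing changed files plus their parent chain."""
    folders = set(folders)
    hit = set()
    for path in changed_files:
        for ancestor in PurePosixPath(path).parents:
            if str(ancestor) in folders:
                hit.add(str(ancestor))
    return hit
```

For an incremental run, the output of `affected_folders` would be fed back into `folder_waves` so that only the touched subtree is rescheduled.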
Intent Layer is opt-in at deep-scan Step E. When skipped, a one-line note is printed and telemetry records "skipped": true.
30 zero-dependency Python scripts in archie/standalone/. These are exported to target projects via npx @bitraptors/archie.
| Script | Purpose |
|---|---|
| _common.py | IgnoreMatcher, BulkMatcher, _glob_to_regex, DECISION_RE, normalize_blueprint(), JSON helpers |
| scanner.py | File tree, import graph, framework detection, skeleton extraction, bulk classification, frontend_ratio |
| renderer.py | Blueprint JSON → AGENTS.md (canonical) + CLAUDE.md pointer + .claude/rules/ topic files + enforcement/ directory |
| intent_layer.py | Per-folder CLAUDE.md via DAG scheduling + AI enrichment. Subcommands: prepare, next-ready, suggest-batches, prompt, save-enrichment, merge, inspect [--query] [--list], scan-config, deep-scan-state (incl. save-run-context for shell-friendly run-context writes) |
| viewer.py | Local viewer: stdlib http.server serving the React dist/ + /api/bundle (reuses upload.py::build_bundle), /api/generated-files, /api/folder-claude-mds, /api/intent-layer-status, /api/ignored-rules, POST /api/rules (5 atomic rule actions); auto-reloads when its own source changes |
| drift.py | Mechanical drift detection |
| validate.py | Cross-reference blueprint against actual codebase |
| check_rules.py | Check files against rules (CI path) |
| measure_health.py | Erosion, gini, verbosity, top-20%, waste scores + --append-history + --compare-history (trend deltas) |
| code_shape.py | Code-shape matching primitives: a code_shape entry on a rule, consumed by pre-validate.sh + rule_index.py |
| detect_cycles.py | Tarjan's SCC on the import graph |
| install_hooks.py | 6 hooks + 29 permissions in .claude/settings.local.json |
| merge.py | Merge blueprint sections; extract_json_from_text handles conversation envelopes / code fences |
| finalize.py | Normalise blueprint + deep-merge Opus output + id-stable findings upsert + pitfalls into blueprint |
| verify_findings.py | Parallel Haiku verifier: per finding, reads the cited triggering_call_site + surrounding files and returns keep / demote / drop |
| apply_verdicts.py | Applies the verifier's verdicts to findings.json with cross-run hysteresis (single-scan flips don't propagate; a git-diff anchor lets real transitions land immediately) |
| rule_index.py | Builds .archie/rule_index.json (keyword / path / always-inject buckets) from rules.json + platform_rules.json for hot-path edit-time enforcement |
| align_check.py | Phase 3 semantic alignment classifier: compares plan text / staged diff against each rule's description + why + example via one Claude CLI call |
| migrate_blueprint_rules.py | Migrates legacy blueprint-derived rule sections (pre-3.0) into proposed_rules.json |
| arch_review.py | Architectural review checklist for plans and diffs |
| refresh.py | File change detection (hash comparison) |
| extract_output.py | Subcommands: rules, deep-drift, recent-files, save-duplications |
| telemetry.py | Per-run step-level wall-clock timing → .archie/telemetry/<command>_<ts>.json. Subcommands: mark, finish, extra, read, write, clear, steps-count |
| telemetry_sync.py | Anonymous opt-in usage telemetry: records events to ~/.archie/analytics/runs.jsonl and pushes to the Supabase telemetry-ingest function. Subcommands incl. record-event, record-install, post-run, status, purge |
| update_check.py | Anonymous opt-in npm-registry update check (cached, snooze ladder 24h→48h→7d). Prints UPGRADE_AVAILABLE / JUST_UPGRADED markers for slash-command preambles |
| config.py | Machine-level config at ~/.archie/config.json: telemetry consent tier, update-check prefs, stable random install id |
| analytics.py | Local analytics dashboard over ~/.archie/analytics/runs.jsonl (7d / 30d / all windows) |
| upload.py | Build share bundle (build_bundle, also reused by viewer.py). Default mode POSTs raw bundle to Supabase edge function. Enterprise modes wrap bundle in {bundle, created_at} envelope and either (a) sigv4-PUT directly to customer S3 bucket + generate presigned GET URL (--mode enterprise-creds), or (b) do plain HTTP PUT to a customer-provided presigned URL (--mode enterprise-paste --put-url ... --get-url ...). All modes produce a URL; enterprise modes encode the GET URL in the viewer URL's fragment. |
| share_setup.py | One-time setup wizard for enterprise share Mode 2A. Accepts --bucket --region --access-key-id --secret-access-key [--key-prefix] [--presign-expires-seconds] and writes ~/.archie/share-profile.json with chmod 600. Per-user (not per-project). |
| lint_gate.py | Opt-in external linter gate (Tier 3). Invoked by post-lint.sh; reads .archie/enforcement.json; auto-detects ruff / eslint / golangci-lint / semgrep based on project config files + binary on PATH; per-kind config overrides; target: "parent" dispatch for package-aware linters (golangci-lint) |
npx @bitraptors/archie /path/to/project [--commands-dir dir] performs:
- Preflight: require Node 18+; reject unknown flags and print --help (a past source of bogus installs was silently-swallowed typos)
- Clean install: removes old .py scripts from .archie/, old archie-*.md commands, the legacy archie-deep-scan/ command subtree, _shared/scope_resolution.md, the .claude/skills/archie-deep-scan/ subtree, .claude/hooks/, the hook section of .claude/settings.local.json, and the old platform_rules.json, so re-installs are clean and upgrades are safe in-place. User data (blueprint.json, findings.json, rules.json, …) is preserved.
- Create .claude/commands/, .claude/skills/, and .archie/ in the target project
- Copy the 5 slash commands + _shared/scope_resolution.md
- Copy the archie-deep-scan skill subtree to .claude/skills/archie-deep-scan/ (SKILL.md, steps/, fragments/, templates/)
- Copy 30 standalone Python scripts to .archie/
- Copy platform_rules.json
- Copy the React viewer source into .archie/viewer/ and build it once (npm ci + vite build, then drop node_modules); cached by a .archie-version marker so unchanged versions skip the ~45s build
- Copy .archieignore + .archiebulk defaults (only if not already present, which preserves user customisations)
- Append .gitignore entries for installed tooling (idempotent, handles upgrade from older versions)
- Run python3 install_hooks.py to set up 6 hooks + 29 permissions in .claude/settings.local.json
- Write the machine-level version marker (~/.archie/version); on a version change, mark JUST_UPGRADED so the next slash command can acknowledge it
- Print installation summary + next steps
The installer does not prompt for telemetry consent — an npx install can be non-interactive (CI, pipe, agent shell), so a prompt there is unreliable. Consent is asked in-session by the first slash command instead (see Telemetry).
If the viewer build fails (no internet, old Node), the installer keeps going — scripts and hooks still install; only /archie-viewer is affected until the user re-runs.
Exact copies of every canonical file the installer ships — the 30 scripts, the 5 commands, _shared/, the skills/archie-deep-scan/ subtree, the viewer/ source, and the .archieignore / .archiebulk / platform_rules.json defaults. See File Sync Protocol.
| Command | File | Purpose |
|---|---|---|
| /archie-scan | archie-scan.md | Architecture health check: 3 parallel Sonnets + single AI synthesis. Writes to findings.json and semantic_duplications.json |
| /archie-deep-scan | archie-deep-scan.md (router) → skills/archie-deep-scan/ | Full 2-wave analysis. The command file is a thin router; the pipeline is the archie-deep-scan skill. Supports --incremental, --continue, --from N, --reconfigure. Intent Layer is opt-in via Step E. Step 7 delegates to /archie-intent-layer Phases 1–4 as a single source of truth |
| /archie-intent-layer | archie-intent-layer.md | Standalone per-folder CLAUDE.md regen. Phase 0.5 asks Full/Incremental/Auto; Auto uses deep-scan-state detect-changes against last_deep_scan.json. Hard-requires blueprint.json. Same Sonnet subagent DAG used by deep-scan Step 7 |
| /archie-share | archie-share.md | Upload bundle to hosted viewer via upload.py + return URL |
| /archie-viewer | archie-viewer.md | Launches viewer.py, the local React sidecar at localhost:5847/local (Blueprint + Files tabs) |
All choice prompts in scan/deep-scan use AskUserQuestion (monorepo scope picker, parallel/sequential, Intent Layer opt-in) — no free-text answers to parse. The scope (Step C) and Intent Layer (Step E) prompts are mandatory decision gates: the skill instructs the agent to ask them even under a "no clarifying questions" harness mode, since they change which trees get scanned and what files get written.
archie-deep-scan/ — the deep-scan pipeline itself, packaged as a skill. SKILL.md is the orchestrator/router; steps/ holds one file per pipeline step (Step 3 fans out into step-3-wave1/ with a prompt file per Wave 1 agent); fragments/ holds the cross-step telemetry and /compact-resume contracts; templates/ holds the scan-report template. Moving the substeps under skills/ instead of commands/ keeps Claude Code from listing every substep file as its own slash command.
Upload a blueprint + findings + scan report to a hosted React viewer. Three modes — picked interactively at share time.
| Mode | Where blueprint lives | Credentials on dev disk | URL shape | BitRaptors stores |
|---|---|---|---|---|
| Default | BitRaptors Supabase | none | archie-viewer.vercel.app/r/{token} | Full bundle JSON |
| Enterprise, stored credentials (Mode 2A) | Customer's S3 bucket | AWS access key/secret in ~/.archie/share-profile.json (chmod 600) | archie-viewer.vercel.app/r/ext#{base64url(presigned-GET)} | Nothing |
| Enterprise, paste URL (Mode 2B) | Customer's bucket (any S3-compatible / Azure / GCS) | None; InfoSec mints presigned PUT per share | archie-viewer.vercel.app/r/ext#{base64url(GET)} | Nothing |
Default is the out-of-the-box behavior. Enterprise modes are for teams whose InfoSec blocks uploading architecture data to third-party infrastructure.
The URL fragment (#...) is a browser-only construct — never transmitted to any server in HTTP requests. In enterprise modes:
- Archie uploads the bundle directly to the customer's bucket.
- Archie base64url-encodes the GET URL (presigned in Mode 2A, customer-supplied in Mode 2B) into the share URL's fragment.
- When a teammate opens the URL, their browser loads the static viewer JS from Vercel but the fragment never goes over the network.
- The viewer reads window.location.hash client-side, decodes the GET URL, and fetch()s the blueprint directly from the customer bucket.
Net effect: BitRaptors' only role in enterprise mode is serving static JS from Vercel. No data at rest, no pointer storage, no metadata captured.
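A small round-trip shows why the fragment carries everything the viewer needs. This is a sketch, not the code in upload.py or the viewer: `build_share_url` and `decode_share_url` are hypothetical names, and stripping base64 padding is an assumption about the encoding details.

```python
import base64
from urllib.parse import urlsplit

VIEWER_EXT_URL = "https://archie-viewer.vercel.app/r/ext"


def build_share_url(get_url):
    """Encode the bucket GET URL into the share URL's fragment."""
    token = base64.urlsafe_b64encode(get_url.encode("utf-8")).decode("ascii")
    # Assumption: padding is stripped to keep the fragment compact.
    return f"{VIEWER_EXT_URL}#{token.rstrip('=')}"


def decode_share_url(share_url):
    """Recover the GET URL from the fragment, as the viewer would do client-side."""
    fragment = urlsplit(share_url).fragment
    padded = fragment + "=" * (-len(fragment) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```

Everything before the # is identical for every enterprise share, which is why the request to Vercel reveals nothing about the bucket.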
Local project Supabase (upload function) Supabase DB
| | |
archie-share | |
| | |
python3 .archie/upload.py -----> POST /functions/v1/upload --> insert report
| build_bundle() | token, bundle
| - blueprint.json | |
| - findings.json (if exists) | |
| - health.json (stripped) | |
| - scan_meta, rules_adopted | |
| - rules_proposed | |
| - scan_report.md | |
| - semantic_duplications | |
| v |
|<---------------------------- {"token": "…"} |
| |
Print: archie-viewer.vercel.app/r/{token} |
|
User shares URL --> React viewer --> GET /functions/v1/blueprint?token=… -|
(CoverPage preview + ReportPage details)
Setup (one-time):
python3 .archie/share_setup.py \
--bucket acme-archie-shares \
--region us-east-1 \
--access-key-id AKIA… \
--secret-access-key … \
[--key-prefix archie-shares/] \
  [--presign-expires-seconds 604800]

share_setup.py writes ~/.archie/share-profile.json with chmod 600 (owner read/write only). The file never lives in the project directory: it's per-user, not per-project.
Flow:
Local project Customer S3 bucket
| |
archie-share (mode=enterprise-creds) |
| |
python3 .archie/upload.py --mode=... |
| _read_share_profile() |
| _build_enterprise_bundle() |
| _sigv4_sign_put() --------> PUT /archie-shares/<uuid>.json
| |
| _sigv4_presign_get() <-- returns presigned GET URL (7-day expiry)
| |
| _build_enterprise_share_url() |
Print: archie-viewer.vercel.app/r/ext#<base64url(get_url)>
Teammate opens URL:
Browser --> Vercel (static JS only) --> fetches blueprint directly
from customer bucket via fragment-decoded URL
SigV4 signing is pure stdlib (hashlib, hmac, urllib.parse, datetime) — no boto3. Implementation validated against AWS's canonical presigned-GET test vector (signature aeeed9bbccd4d02ee5c0109b86d86835f995330da4c265957d157751f604d404).
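For readers unfamiliar with SigV4, the core of a stdlib-only implementation is the chained-HMAC signing key. The sketch below shows that step only, following the publicly documented SigV4 key derivation; `derive_signing_key` and `sign` are hypothetical names, not the _sigv4_* helpers in upload.py, and building the canonical request and string-to-sign is left out.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """HMAC-SHA256 of a text message under a bytes key, returned as raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="s3"):
    """Chain HMACs over date (YYYYMMDD), region, service, and the fixed terminator."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign, signing_key):
    """Hex signature that goes into X-Amz-Signature."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the key is scoped to a date, region, and service, a leaked signature cannot be reused to sign requests for another day or another AWS service.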
Scope: AWS virtual-hosted-style ({bucket}.s3.{region}.amazonaws.com). S3-compatible services (R2, B2, Minio, Wasabi) use different DNS shapes and need a future endpoint_url profile override. Non-AWS users should use Mode 2B.
No setup, no credentials persisted. Per-share workflow:
Dev: InfoSec:
/archie-share acme-archie-url-mint (Lambda/script/ticket)
picks enterprise-paste generates presigned PUT + GET pair
prompts for URLs — URLs expire (typically 1h)
pastes URLs <------------------- hands over
|
python3 .archie/upload.py \
--mode=enterprise-paste \
--put-url "<PRESIGNED_PUT>" \
--get-url "<PRESIGNED_GET>"
|
HTTP PUT --------> customer bucket (via presigned PUT URL)
|
_build_enterprise_share_url(GET)
Print: archie-viewer.vercel.app/r/ext#<base64url(get_url)>
Cloud-agnostic by design — Mode 2B is the recommended path for Azure Blob, GCS, Artifactory, or any HTTP-PUT-accepting endpoint. InfoSec keeps full control over URL lifetime and audit trail via their own minter.
Shared across all three modes:
{
"blueprint": {...}, # required
"health": {...}, # stripped (top-20 high-CC functions, top-10 duplicates)
"scan_meta": {...}, # frameworks, subproject count, frontend_ratio
"rules_adopted": {...},
"rules_proposed": {...},
"scan_report": "...",
"semantic_duplications": [...], # structured, from semantic_duplications.json
"findings": [...] # from findings.json (shared store)
}

In enterprise modes, the bundle is wrapped in an envelope {bundle, created_at} before upload so the viewer's ReportResponse shape matches both flows. created_at is the scan timestamp (blueprint.meta.scanned_at / last_scan), not the upload time, so re-shares show a stable date.
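The envelope step can be sketched in a few lines. This is an illustration rather than _build_enterprise_bundle itself: `wrap_enterprise_bundle` is a hypothetical name, and reading both scanned_at and last_scan from blueprint.meta, in that order of preference, is an assumption about where those fields live.

```python
def wrap_enterprise_bundle(bundle):
    """Wrap a share bundle in the {bundle, created_at} envelope."""
    if "blueprint" not in bundle:
        raise ValueError("bundle must contain a blueprint")
    meta = bundle["blueprint"].get("meta", {})
    # Assumption: both timestamp fields sit under blueprint.meta.
    # The scan time is used, not the upload time, so re-shares keep a stable date.
    created_at = meta.get("scanned_at") or meta.get("last_scan")
    return {"bundle": bundle, "created_at": created_at}
```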
Used only in default share mode — not touched by enterprise modes.
- upload/index.ts: accepts JSON up to 5 MB, validates blueprint field, generates 24-char token, inserts {token, bundle, size_bytes} into the reports table.
- blueprint/index.ts: fetches bundle by token, returns {bundle, created_at}.
- telemetry-ingest/index.ts: receives anonymous opt-in usage events (see Telemetry). Uses the Supabase anon key plus an RLS-restricted insert-only policy on telemetry_events (never the service role); RLS is hardened to block direct PostgREST bypass.
The same React/Vite app powers both the hosted share viewer (archie-viewer.vercel.app) and the local /archie-viewer sidecar. It does not assemble the bundle itself — upload.py::build_bundle is the single bundle assembler, and viewer.py imports it directly, so local and cloud render byte-identical data. The unifying contract is the ReportResponse envelope { bundle, created_at } (src/lib/api.ts).
Routes (src/main.tsx):
- /local (LocalPage.tsx): fetches /api/bundle from viewer.py and hands the bundle to ReportPage. Adds a Files tab (per-folder CLAUDE.md browser + generated-files tree) and inline rule actions that POST /api/rules.
- /r/{token} (CoverPage.tsx): executive summary, top findings, headline metrics, hero CTA.
- /r/{token}/details (ReportPage.tsx): the full report with sidebar navigation. Sections: Executive Summary, System Health, Architecture Diagram, Workspace Topology, Architecture Rules, Development Rules, Key Decisions, Trade-offs, Implementation Guidelines, Communications, Components, Integrations, Technology Stack, Deployment, Architectural Problems, Pitfalls. A "Fix this" button on each finding/pitfall generates an agent-agnostic fix prompt (src/lib/fixPrompt.ts).
fetchReport(token) in src/lib/api.ts routes on the token value:
- Any token OTHER than the sentinel ext → GET ${SUPABASE}/blueprint?token=X (default flow, unchanged)
- Token ext → read window.location.hash, base64url-decode to the GET URL, fetch directly from customer bucket with CORS + 403-expired error messaging
Local mode bypasses fetchReport entirely — it talks to viewer.py's localhost API. The sentinel routing means zero behavioral change for existing share URLs — anything without the ext token falls through to the legacy Supabase path.
The customer's bucket needs CORS allowing https://archie-viewer.vercel.app (so the browser can fetch() cross-origin) plus an IAM user with narrow s3:PutObject + s3:GetObject scoped to the prefix. See docs/enterprise-share-setup.md for the CORS policy template, IAM policy template, and step-by-step walkthrough for an InfoSec team.
The /archie-share slash command also offers a "Help me ask InfoSec for a bucket" option — the agent asks three context questions (company name, preferred bucket name, cloud provider) and composes a ready-to-paste request with the CORS + IAM JSON inlined, adapted to AWS / Azure / GCS.
Old bundles on Supabase (uploaded before the 4-field schema migration) are rendered correctly:
- normalizePitfall bridges the legacy {area, description, recommendation, stems_from} shape into the current {problem_statement, evidence, root_cause, fix_direction} shape.
- RichFindingBody renders description as a lead paragraph so legacy prose bodies are never dropped.
- scanReportAssertsZeroSemanticDup() detects explicit "no semantic duplication detected" verdicts in the scan report prose and treats them as a structured 0, so older bundles that never wrote semantic_duplications.json still render a trustworthy count.
- Decision precedence in the viewer: structured bundle.findings → scan report zero detector → heuristic markdown parse → unknown.
The blueprint is the single source of truth for synthesised architecture. All rendered outputs derive from it. Concrete findings live in the separate findings.json shared store.
blueprint.json
meta # Executive summary, platforms, schema version, scan_count, confidence
architecture_rules
file_placement_rules[] # Where each file type belongs
naming_conventions[] # How files and classes should be named
decisions
architectural_style # Title, chosen, rationale, alternatives_rejected
decision_chain # Rooted constraint tree with violation_keywords per node
key_decisions[] # Each with forced_by / enables / alternatives_rejected
trade_offs[] # Accept, benefit, caused_by, violation_signals
out_of_scope[] # Explicit boundary markers with linking decisions
components
structure_type # layered, modular, monolith, feature-based, flat, etc.
components[] # Name, location, responsibility, depends_on, confidence
communication
patterns[] # Design and communication patterns (name, when_to_use, how_it_works,
# + preconditions: applicable_when / do_not_apply_when / scope)
integrations[] # service, purpose, integration_point (file path)
pattern_selection_guide[]
quick_reference
where_to_put_code # File type -> directory mapping
pattern_selection # Scenario -> pattern mapping
error_mapping[]
technology
stack[] # category, name, version, purpose
project_structure
run_commands # build, test, lint, serve, etc.
frontend # Only if frontend_ratio >= 0.20
framework, rendering_strategy, styling, state_management
ui_components[]
workspace_topology # Only in monorepo `whole` scope
type, members[], edges[], cycles[], dependency_magnets[]
pitfalls[] # 4-field shape + id + source + depth + confirmed_in_scan
implementation_guidelines[]
development_rules[]
deployment
architecture_diagram # Mermaid graph TD, 8-12 nodes
Schema version: 2.0.0
Claude calls Write/Edit/MultiEdit
|
v
PreToolUse hook fires
|
v
pre-validate.sh
|-- Read tool call from stdin (tool_name, file_path, content)
|-- Load rules.json + platform_rules.json
|-- Check forbidden_import, required_pattern, forbidden_content,
| architectural_constraint, file_naming rules
|-- severity=error -> exit 2 (BLOCKED)
|-- severity=warn -> print warning, exit 0
v
Edit proceeds (or is blocked)
Phase 1 Parallel data gathering (scripts, seconds)
scanner.py, measure_health.py, detect_cycles.py, git log
Phase 2 Read accumulated knowledge
skeletons, scan, health, blueprint, findings, rules, proposed_rules
Phase 3 3 Sonnet agents in parallel (Architecture, Health, Patterns)
Each receives findings.json scoped to its source slice.
Novelty priority: focus on problems NOT in store.
Phase 4 Synthesis
4a Evolve blueprint (components, decisions, health, architecture_rules, development_rules, meta)
DO NOT write blueprint.pitfalls (deep-scan-Opus-only).
4b Write findings.json
- ID-stable match against existing entries (reused id + confirmed_in_scan += 1)
- Unmatched -> next-free f_NNNN + first_seen = today + depth: draft
- Gone -> status: resolved + resolved_at
4c Write scan_history/scan_NNN_*.md and scan_report.md
4d python3 .archie/extract_output.py save-duplications \
/tmp/archie_agent_c_rules.json "$PWD"
Phase 5 Proposed rules -> .archie/proposed_rules.json
Phase 6 Telemetry write
Step 0 Scope resolution (AskUserQuestion)
Step 1 Scanner -> scan.json (with bulk_content_manifest, frontend_ratio)
Step 2 Read accumulated knowledge
Step 3 Wave 1 — 3-4 parallel Sonnets
Structure: components, layers, workspace_topology (if whole-mode),
cross-workspace cycles as DRAFT findings (not pitfalls)
Patterns: communication patterns, design patterns, integrations
Tech: stack, deployment, dev_rules
[UI Layer: if frontend_ratio >= 0.20]
Step 4 Merge Wave 1 outputs via merge.py
-> blueprint_raw.json
Step 5 Wave 2 — single Opus
Reads blueprint_raw.json + findings.json
Three probes: A complexity-budget, B invariants & gates, C seams
Emits: decision_chain, architectural_style, key_decisions,
trade_offs, out_of_scope, findings (upgrade drafts + new),
pitfalls, architecture_diagram, implementation_guidelines
finalize.py:
- deep-merge blueprint sections
- pop findings, id-stable upsert into findings.json
- merge pitfalls into blueprint.pitfalls
Step 6 Rule synthesis — single Sonnet
Reads evolved blueprint, proposes architecturally-grounded rules
Step 7 Intent Layer — per-folder CLAUDE.md (OPT-IN, Step E decides)
If INTENT_LAYER=no: skip, telemetry records "skipped": true
Step 8 Cleanup /tmp/archie_*
Step 9 Drift Detection & Architectural Assessment
drift.py (mechanical) + single Sonnet (deep AI drift)
Writes final scan_report.md + scan_history/scan_NNN_*.md
Step 10 Telemetry write to .archie/telemetry/deep-scan_<ts>.json
Step 1 Scanner (same)
Step 2 Read accumulated knowledge + detect changed files
Step 3' One scoped Reasoning agent (Sonnet)
Input: blueprint, blueprint_raw, findings, changed files
Output: ONLY the sections that need updating
Step 4 finalize.py --patch /tmp/archie_sub_x_*.json
Deep-merges the partial output into existing blueprint + findings store
(skip to Step 6)
Every run feeds the next. Both commands read and write the same findings.json shared store:
- Id-stable upsert. Recurring findings reuse id and bump confirmed_in_scan; new ones get a fresh id + first_seen; gone ones flip status: "resolved" (preserved as history).
- Novelty priority. Agents are told to spend cognitive budget on NEW problems, not rediscover known ones under different wording.
- Depth escalation. Scan emits depth: "draft" quickly (single-line fix_direction). Deep-scan Wave 2 upgrades the same id to depth: "canonical" with sequenced fix_direction and architectural root_cause.
- Blueprint confidence also grows per-section with repeated confirmation across scans.
- Health scores appended to health_history.json for trend detection.
- Source provenance is tracked on every rule and finding: deep-baseline, scan-observed, scan-adopted, scan-inferred, scan-proposed, deep:synthesis, scan:{structure,health,patterns}.
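The id-stable upsert can be illustrated with a short function. This is a sketch under assumptions, not the logic in finalize.py: `upsert_findings` is a hypothetical name, matching on an identical problem_statement is a stand-in for whatever matching the pipeline actually performs, new ids are taken as highest existing number plus one, and a new finding starting at confirmed_in_scan = 1 is a guess.

```python
import re


def upsert_findings(store, observed, today):
    """Merge this run's findings into the store without changing existing ids."""
    # Assumption: two findings are "the same" when problem_statement is identical.
    by_key = {finding["problem_statement"]: finding for finding in store}
    numbers = [
        int(match.group(1))
        for finding in store
        if (match := re.fullmatch(r"f_(\d+)", finding["id"]))
    ]
    next_number = max(numbers, default=0) + 1
    seen = set()
    for item in observed:
        key = item["problem_statement"]
        seen.add(key)
        if key in by_key:
            # Recurring: keep the id, bump the confirmation counter.
            entry = by_key[key]
            entry["confirmed_in_scan"] = entry.get("confirmed_in_scan", 1) + 1
        else:
            # New: allocate an f_NNNN id and start as a draft.
            entry = dict(item, id=f"f_{next_number:04d}", first_seen=today,
                         depth="draft", confirmed_in_scan=1)
            store.append(entry)
            by_key[key] = entry
            next_number += 1
    for finding in store:
        # Gone: keep the entry as history, but mark it resolved.
        if finding["problem_statement"] not in seen and finding.get("status") != "resolved":
            finding["status"] = "resolved"
            finding["resolved_at"] = today
    return store
```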
Deep scans include two-phase drift detection:
Deterministic analysis:
- Pattern outliers (files that don't match established patterns)
- File size and complexity violations
- Dependency-direction breaches
- Structural anomalies
An agent reads blueprint + drift report + CLAUDE.md files + findings.json to identify drift categories:
| Category | What it detects |
|---|---|
| `decision_violation` | Code that contradicts a recorded architectural decision |
| `pattern_erosion` | Gradual drift away from established patterns |
| `trade_off_undermined` | Changes that undermine accepted trade-offs |
| `pitfall_triggered` | Known pitfalls that have materialised in code |
| `responsibility_leak` | Logic placed in the wrong component/layer |
| `abstraction_bypass` | Direct access that skips established abstractions |
| `semantic_duplication` | Reimplementation of existing functionality |
Violations are grounded with violation_signals from the blueprint's trade-off and decision chain data.
Every scan runs Tarjan's strongly connected components algorithm (detect_cycles.py) on the import graph. Output includes each cycle with the participating directories, file-level evidence showing which imports create the cycle, and dependency magnets (high-in-degree nodes). Results stored in .archie/dependency_graph.json.
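For readers unfamiliar with the algorithm, here is a compact sketch of Tarjan's strongly connected components over a directory-level import graph. It is not the code in `detect_cycles.py`: the graph shape (a dict mapping each directory to the directories it imports) and the function names are assumptions made for illustration.

```python
from collections import Counter


def strongly_connected_components(graph: dict[str, list[str]]) -> list[list[str]]:
    """Tarjan's algorithm: return every SCC of a directed graph."""
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    stack: list[str] = []
    on_stack: set[str] = set()
    components: list[list[str]] = []

    def visit(node: str) -> None:
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for target in graph.get(node, []):
            if target not in index:
                visit(target)
                lowlink[node] = min(lowlink[node], lowlink[target])
            elif target in on_stack:
                lowlink[node] = min(lowlink[node], index[target])
        if lowlink[node] == index[node]:
            # node is the root of an SCC: pop the stack down to it.
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Keep only SCCs that are real cycles: more than one node, or a self-import."""
    return [
        sorted(c)
        for c in strongly_connected_components(graph)
        if len(c) > 1 or c[0] in graph.get(c[0], [])
    ]


def in_degrees(graph: dict[str, list[str]]) -> Counter:
    """Count incoming edges per node; high counts suggest dependency magnets."""
    return Counter(t for targets in graph.values() for t in targets)


graph = {"api": ["services"], "services": ["models", "api"], "models": [], "cli": ["api", "models"]}
print(find_cycles(graph))
print(in_degrees(graph).most_common(2))
```

The recursive form is the easiest to read; a very deep import graph would need an iterative variant to stay under Python's recursion limit.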
There are two distinct telemetry surfaces. Per-run step timing is always-on and stays local. Anonymous usage telemetry is opt-in and, when enabled, uploads a minimal payload to BitRaptors' Supabase.
Every run writes a per-step wall-clock timing file:
- `.archie/telemetry/scan_<timestamp>.json` for `/archie-scan`
- `.archie/telemetry/deep-scan_<timestamp>.json` for `/archie-deep-scan`
{
"command": "deep-scan",
"started_at": "YYYY-MM-DDTHHMMSSZ",
"completed_at": "YYYY-MM-DDTHHMMSSZ",
"total_seconds": 22147,
"steps": [
{"name": "scan", "seconds": 52, "started_at": "...", "completed_at": "..."},
{"name": "read", "seconds": 34},
{"name": "wave1", "seconds": 830, "model": "sonnet"},
{"name": "merge", "seconds": 14},
{"name": "wave2_synthesis", "seconds": 863, "model": "opus"},
{"name": "rule_synthesis", "seconds": 316, "model": "sonnet"},
{"name": "intent_layer", "seconds": 19676, "model": "sonnet", "skipped": false},
{"name": "cleanup", "seconds": 22},
{"name": "drift", "seconds": 340}
]
}

Used to measure the wall-clock impact of prompt/code changes over time. Individual steps can set `skipped: true` (e.g. Intent Layer when opted out, or earlier steps skipped via `--from N`) with both timestamps identical. This file never leaves the project.
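A timing document in this shape is easy to summarise when comparing runs. The sketch below is a hypothetical helper, not part of Archie; it assumes only the fields shown in the example (`command`, `total_seconds`, and `steps` with `name`, `seconds`, and an optional `skipped`).

```python
import json
from typing import Any


def summarize_timing(raw: str) -> dict[str, Any]:
    """Summarise one per-run timing document (hypothetical helper)."""
    run = json.loads(raw)
    # Skipped steps carry no meaningful duration, so leave them out of the ranking.
    ran = [s for s in run["steps"] if not s.get("skipped", False)]
    slowest = max(ran, key=lambda s: s["seconds"], default=None)
    return {
        "command": run["command"],
        "steps_total": sum(s["seconds"] for s in ran),
        "reported_total": run["total_seconds"],
        "slowest_step": slowest["name"] if slowest else None,
        "slowest_share": round(slowest["seconds"] / run["total_seconds"], 3) if slowest else None,
    }


sample = json.dumps({
    "command": "scan",
    "total_seconds": 100,
    "steps": [
        {"name": "scan", "seconds": 25},
        {"name": "wave1", "seconds": 75, "model": "sonnet"},
        {"name": "intent_layer", "seconds": 0, "skipped": True},
    ],
})
print(summarize_timing(sample))
```

Comparing `steps_total` with `reported_total` is a cheap sanity check that no step went unrecorded.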
Consent is asked once per machine, in-session, by the first Archie slash command the user runs — not by the npx installer. Every entry point (/archie-scan, /archie-deep-scan, /archie-viewer, /archie-share, /archie-intent-layer) loads the shared _shared/telemetry-consent.md fragment in its preamble; the fragment runs config.py should-prompt and, if this machine hasn't answered, presents an AskUserQuestion opt-in. This deliberately lives in the slash commands — which always run inside Claude Code, where a picker is available — instead of the npx install, which can be non-interactive (CI, pipe, agent shell). Consent, update-check preferences, and a stable random installation id live in machine-level ~/.archie/config.json (managed by config.py) — not per-project. Three tiers:
| Tier | Payload | Installation id |
|---|---|---|
| `community` (recommended) | usage event | included (stable random id) |
| `anonymous` | usage event | stripped before upload; each event is unlinkable |
| `off` | nothing leaves the machine | n/a |
What an event contains: command name (scan / deep-scan / viewer / share / intent-layer / install), Archie version, OS/arch, per-step durations, outcome, detected stack categories (e.g. kotlin / gradle / android). Never collected: source code, file paths, repo names, blueprint contents, errors from the user's code. Local-only fields (e.g. _project_basename) are stripped before upload.
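The tier rules amount to a small filter applied just before upload. The following is a sketch under assumptions rather than the real `telemetry_sync.py` logic: the function name, the `installation_id` key, and the convention that every local-only field starts with an underscore (as `_project_basename` does) are hypothetical.

```python
from typing import Any, Optional


def prepare_upload(event: dict[str, Any], tier: str) -> Optional[dict[str, Any]]:
    """Return the payload allowed to leave the machine, or None for no upload."""
    if tier == "off":
        return None
    if tier not in ("community", "anonymous"):
        raise ValueError(f"unknown telemetry tier: {tier!r}")
    # Assumption: local-only fields are exactly those prefixed with an underscore.
    payload = {k: v for k, v in event.items() if not k.startswith("_")}
    if tier == "anonymous":
        # Hypothetical key name; dropping it makes each event unlinkable.
        payload.pop("installation_id", None)
    return payload


event = {"command": "scan", "installation_id": "abc123", "_project_basename": "my-repo"}
for tier in ("community", "anonymous", "off"):
    print(tier, prepare_upload(event, tier))
```

Building a new dict instead of mutating the event keeps the local record complete while the uploaded copy stays minimal.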
Flow: every scan/share/viewer command ends by calling telemetry_sync.py record-event (silent no-op when opted out). Events are appended to ~/.archie/analytics/runs.jsonl (the local source of truth, readable via analytics.py 7d|30d|all) and pushed to the Supabase telemetry-ingest edge function — anon key + insert-only RLS, never the service role.
Update checks are a separate opt-in (default on). Each slash command preamble runs update_check.py check — a cached hit against the public npm registry (registry.npmjs.org, no Archie infrastructure) that prints an UPGRADE_AVAILABLE / JUST_UPGRADED marker. The command surfaces a one-line hint and continues; the user can snooze (24h → 48h → 7d ladder) or disable entirely.
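The snooze ladder can be read as an escalating back-off. Here is a minimal sketch, assuming the ladder stays at its last rung once the user has snoozed three or more times; that behaviour and the names below are assumptions, not taken from `update_check.py`.

```python
from datetime import datetime, timedelta, timezone

# 24h -> 48h -> 7d, as described above.
SNOOZE_LADDER = (timedelta(hours=24), timedelta(hours=48), timedelta(days=7))


def snooze_until(now: datetime, times_snoozed: int) -> datetime:
    """Return when the hint may reappear; times_snoozed counts earlier snoozes (0-based)."""
    step = min(times_snoozed, len(SNOOZE_LADDER) - 1)  # assumed: clamp at the last rung
    return now + SNOOZE_LADDER[step]


now = datetime.now(timezone.utc)
for count in range(4):
    print(count, snooze_until(now, count) - now)
```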
Scan templates include a "CRITICAL CONSTRAINT: Never write inline Python" block that forbids python3 -c "..." during scans. Every operation has a dedicated CLI command:
| Command | Replaces |
|---|---|
| `measure_health.py --append-history --scan-type fast\|deep` | Inline health history append |
| `finalize.py --normalize-only` | Inline blueprint normalisation |
| `finalize.py --patch <file>` | Incremental deep-scan merge |
| `intent_layer.py inspect <file> [--query .key] [--list]` | Ad-hoc JSON inspection; `--list` prints array elements one per line (replaces inline Python for array-to-newline conversion) |
| `intent_layer.py scan-config <dir> read\|write\|validate` | Monorepo scope config management |
| `intent_layer.py deep-scan-state <dir> init\|read\|complete-step N\|detect-changes\|save-baseline\|save-context\|save-run-context` | Deep-scan resume state; `save-run-context` takes flags + newline-separated workspaces via stdin (replaces heredoc-JSON-with-inline-python that used to round-trip workspaces through `python3 -c`) |
| `extract_output.py rules <in> <out>` | Parse rule synthesis output |
| `extract_output.py deep-drift <in> <out>` | Merge drift findings into report |
| `extract_output.py recent-files <scan.json>` | Print source file paths |
| `extract_output.py save-duplications <agent_c_file> <project_root>` | Deterministic `semantic_duplications.json` writer |
| `telemetry.py <project_root> --command <name> --timing-file <file>` | Write per-run telemetry |
| `telemetry.py steps-count <project_root>` | Count completed step marks (replaces inline Python `len(json.load(...).get('steps'))` in deep-scan Resume Prelude) |
| `upload.py <project_root>` | Build + POST share bundle |
This prevents crashes from Claude guessing wrong field names (e.g. `batch_id` vs `id`, `path` vs `id`) and keeps the pipeline deterministic where it matters. The constraint is enforced by the scan templates themselves: the block sits at the top of each scan command, and every data operation has a dedicated subcommand. Even the deep-scan Resume Prelude, which used to contain 8 inline `python3 -c` blocks, now reads every field via dedicated CLIs.
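To illustrate why a dedicated inspection command is safer than ad-hoc `python3 -c`, the sketch below resolves a dotted key and fails with a precise message when a field name is wrong. It is a hypothetical stand-in, not the implementation behind `intent_layer.py inspect`; the dotted-path syntax beyond a single `.key` is an assumption.

```python
import json
from typing import Any


def query(document: Any, dotted_key: str) -> Any:
    """Walk a '.a.b' style path through nested dicts; name the missing part on failure."""
    value = document
    for part in filter(None, dotted_key.split(".")):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"{part!r} not found while resolving {dotted_key!r}")
        value = value[part]
    return value


def render(value: Any, as_list: bool = False) -> str:
    """Format a value; with as_list, emit array elements one per line."""
    if as_list:
        if not isinstance(value, list):
            raise TypeError("list output requires an array")
        return "\n".join(v if isinstance(v, str) else json.dumps(v) for v in value)
    return value if isinstance(value, str) else json.dumps(value, indent=2)


doc = {"scan": {"workspaces": ["apps/web", "apps/api"]}, "id": 7}
print(render(query(doc, ".scan.workspaces"), as_list=True))
```

A wrong guess such as `.batch_id` then surfaces as one clear error from a known command instead of a traceback in the middle of a scan.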
| Scenario | Behaviour |
|---|---|
| `.archie/rules.json` missing | Hooks exit 0 silently (fail open) |
| Blueprint missing | Scan proceeds without prior knowledge and starts fresh |
| `.archiebulk` missing | `BulkMatcher.classify()` returns `None`; no bulk tagging; `frontend_ratio` falls back to extension counts only |
| `findings.json` missing | Agents produce fresh entries; first scan builds the store from scratch |
| File I/O errors during scan | Individual files skipped with `try/except`; scan continues |
| Deep-scan subagent fails | Skipped; remaining agents continue. `merge.extract_json_from_text` tries three strategies: direct parse, code-block extraction, brace-matching |
| Deep-scan interrupted | `.archie/deep_scan_state.json` tracks completed steps; `--continue` resumes |
| Telemetry write fails | Non-fatal; the scan already completed |
| `/archie-share` upload fails | Blueprint remains local at `.archie/blueprint.json`; exit 1 with stderr explanation |
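The three parsing strategies named in the table (direct parse, code-block extraction, brace-matching) can be sketched as a simple fallback chain. This is an illustration only, not the body of `merge.extract_json_from_text`; the name `extract_json` and every detail of the regex and brace scanner are assumptions.

```python
import json
import re
from typing import Any, Iterator

# Matches a fenced block, optionally tagged json; `{3} avoids a literal fence here.
CODE_BLOCK = re.compile(r"`{3}(?:json)?\s*\n(.*?)`{3}", re.DOTALL)


def _balanced_spans(text: str) -> Iterator[str]:
    """Yield top-level {...} spans, ignoring braces inside JSON strings."""
    depth, start, in_string, escaped = 0, 0, False, False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]


def extract_json(text: str) -> Any:
    """Try the three strategies in order; raise ValueError if none yields JSON."""
    candidates = [text.strip()]                                  # 1. direct parse
    candidates += [m.strip() for m in CODE_BLOCK.findall(text)]  # 2. fenced code block
    candidates += list(_balanced_spans(text))                    # 3. brace matching
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no JSON found in agent output")


print(extract_json('Sure! Here is the result: {"layers": ["api", "core"]} Hope that helps.'))
```

Ordering the strategies from strictest to loosest means well-formed output is never reinterpreted by the more forgiving scanners.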
# Run all tests
python -m pytest tests/ -v
# Run specific module tests
python -m pytest tests/test_scanner.py -v
python -m pytest tests/test_planner.py -v
python -m pytest tests/test_telemetry.py -v
# Run with coverage
python -m pytest --cov=archie tests/

50 test files. Tests mirror the package structure:
- Engine — scanner (+ monorepo variant), dependencies, frameworks, hasher, imports, scan (+ scan_config), engine_models
- Coordinator — planner, runner, merger, prompts
- Hooks — hook_generator, hook_enforcement
- Rules — rule_extractor (legacy), rule_shape, rule_index, code_shape, align_check, migrate_blueprint_rules
- Renderer — renderer (+ field_coverage, merge), intent_layer (+ bulk_filter, inject_scoped, resume, run_context, suggest_batches), normalize, inspect
- Findings / scan pipeline — finalize_findings_merge, verify_findings, apply_verdicts
- Deep-scan skill — deep_scan_state_shell_api
- CLI — init, refresh, status, serve, check
- E2E — refresh_e2e
- Standalone helpers — ignore_patterns, health_append, telemetry, upload, share_setup, lint_gate, viewer
Tests use fixtures (temp directories with known file structures), subprocess mocking for runner/agent tests, and Pydantic model validation for schema compliance.
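A temp-directory fixture test of the kind described might look like the sketch below. It relies on pytest's built-in `tmp_path` fixture; `make_project` and `count_by_extension` are hypothetical stand-ins written for this example and are not Archie's test helpers or scanner.

```python
import tempfile
from pathlib import Path


def make_project(root: Path, files: dict[str, str]) -> Path:
    """Materialise a known file structure under root and return it."""
    for relative, content in files.items():
        target = root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return root


def count_by_extension(root: Path) -> dict[str, int]:
    """Stand-in for the code under test (hypothetical, not Archie's scanner)."""
    counts: dict[str, int] = {}
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix] = counts.get(path.suffix, 0) + 1
    return counts


def test_counts_known_structure(tmp_path: Path) -> None:
    # pytest injects tmp_path: a fresh, empty directory per test.
    project = make_project(
        tmp_path,
        {"src/app.py": "import os\n", "src/util.py": "", "README.md": "# demo\n"},
    )
    assert count_by_extension(project) == {".py": 2, ".md": 1}


if __name__ == "__main__":
    # Runs the same check without pytest by supplying the directory by hand.
    with tempfile.TemporaryDirectory() as tmp:
        test_counts_known_structure(Path(tmp))
```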
Standalone scripts, slash commands, the deep-scan skill subtree, the shared scope fragment, and the viewer source all exist in two places (canonical → copy):
archie/standalone/*.py -> npm-package/assets/*.py
archie/standalone/platform_rules.json -> npm-package/assets/platform_rules.json
.claude/commands/archie-*.md -> npm-package/assets/archie-*.md
.claude/commands/_shared/*.md -> npm-package/assets/_shared/*.md
.claude/skills/archie-deep-scan/** -> npm-package/assets/skills/archie-deep-scan/**
share/viewer/ (source, minus build) -> npm-package/assets/viewer/
archiebulk.default and archieignore.default live only in npm-package/assets/ (installer-only resources).
Workflow:
- Always edit the canonical file first (`archie/standalone/`, `.claude/commands/`, `.claude/skills/`, or `share/viewer/`)
- Copy to `npm-package/assets/`
- Before committing, run the sync checker: `python3 scripts/verify_sync.py`

This verifies all canonical files, asset copies, and `archie.mjs` references are consistent. It catches missing copies, orphan assets, and dead installer references, then reports:
SYNC CHECK PASSED — 30 scripts, 6 commands, all in sync.
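The core of such a check, comparing one canonical directory with its copy, can be sketched as below. This is not the contents of `scripts/verify_sync.py`: comparing by SHA-256 digest, the function names, and the report keys are assumptions, and the `archie.mjs` reference check is left out.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """Content hash used to detect copies that have drifted from the canonical file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def check_sync(canonical_dir: Path, copy_dir: Path, pattern: str = "*.py") -> dict[str, list[str]]:
    """Compare a canonical directory with its asset copy (hypothetical helper)."""
    canonical = {p.name: p for p in canonical_dir.glob(pattern)}
    copies = {p.name: p for p in copy_dir.glob(pattern)}
    return {
        "missing_copies": sorted(canonical.keys() - copies.keys()),
        "orphan_assets": sorted(copies.keys() - canonical.keys()),
        "out_of_date": sorted(
            name
            for name in canonical.keys() & copies.keys()
            if digest(canonical[name]) != digest(copies[name])
        ),
    }


def is_in_sync(report: dict[str, list[str]]) -> bool:
    """True when no category of the report lists any file."""
    return not any(report.values())
```

For example, `check_sync(Path("archie/standalone"), Path("npm-package/assets"))` would cover the first mapping in the list above; the other mappings would need their own patterns.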