The active codebase is the v4 rewrite under src/linuxagent/.
Core layers:
- config/: validated application configuration
- providers/: LangChain-backed LLM providers
- executors/: safe local command execution
- policy/: capability-based command policy engine
- plans/: strict JSON CommandPlan models and parsing
- cluster/: SSH execution and host policy
- graph/: LangGraph orchestration split into intent parsing, safety checks, routing, and node factories
- services/: application services
- telemetry.py: local JSONL spans and trace correlation
- usage_insights/: learner, semantic helpers, recommendations
- ui/: terminal UI
The ts/ workspace is the TypeScript v5 experimental rewrite track. It is not
the production runtime and does not replace the default linuxagent command.
Python v4 remains the behavior oracle while TS subsystems land behind parity
fixtures and red-line checks.
The TypeScript runtime is experimental. Python v4 remains the default release runtime until parity gates pass.
Current TS packages cover contracts, policy parity, audit hash chains, sandbox runner contracts, argv-based local execution, output redaction, tool gate integration, session permissions, approval defaults, and prompt loading. See TypeScript v5 Experimental Kernel for the current progress tracker and migration boundaries. The broader rewrite plan is documented in TypeScript v5 Progressive Rewrite Design.
Useful TS commands:

    make ts-install
    make ts-lint
    make ts-type
    make ts-test
    make ts-security
    make ts-check
    make ts-parity

When changing TS behavior, update the TS status document and any relevant
README entry in the same change. Keep Python production gates authoritative
until an explicit cutover checklist is satisfied. CI runs the TS checks in a
separate ts-experimental job; Python CI remains the release authority.
- tests/unit/: default CI test suite
- tests/integration/: optional graph/runtime/SSH integration coverage, gated by --integration
- tests/harness/: YAML scenarios for graph and HITL behavior
Run locally:

    pytest tests/unit/ --cov=linuxagent --cov-fail-under=80
    make sandbox
    make integration
    make optional-anthropic
    make harness
    make eval
    make verify-build

make eval runs the recorded-replay prompt eval suite under tests/eval/.
Replays are deterministic and require no network access. make eval is part of
the release-preflight target and must pass before a release is cut.
make eval-record re-records the eval fixtures by calling the provider
configured in config.yaml. It is opt-in, makes real network calls, and is
not run in CI. Run it after editing prompts/intent_router.md or the eval's
ROUTER_CONTEXT_FIXTURE; if you skip this step the staleness guard will cause
make eval to fail.
make integration is intentionally optional and runs only tests marked
integration with the explicit --integration flag. Keep external-resource
coverage behind that gate so the default unit suite stays deterministic.
make optional-anthropic is also optional. Run pip install -e '.[anthropic,dev]'
first when validating Claude provider compatibility.
make build expects the dev build backend to be importable in the active
Python environment. Run make install first, or activate the project virtualenv
before building.
make verify-build installs the wheel in an isolated virtualenv with runtime
dependencies. It uses PyPI by default; set LINUXAGENT_PIP_INDEX_URL to test
against a private mirror.
Runtime i18n is intentionally limited to LinuxAgent-owned fixed user-facing
text. Do not localize prompt templates, planner guidance, tool descriptions
that enter model context, MCP protocol metadata, audit JSON keys, policy ids,
or other machine-readable fields. src/linuxagent/i18n/locales/*.yaml is for
CLI/TUI labels, slash help, confirmation/block messages, diagnostics, and
display-only metadata. New user-visible fixed strings should use a locale key;
new model-visible instructions belong in prompts/, policy YAML, Skill
manifests, or the relevant structured data source.
LinuxAgent runtime UX work uses these terms consistently:
| Term | Meaning | Current owner |
|---|---|---|
| turn | One user request handled against one graph thread/checkpoint | src/linuxagent/app/agent.py, src/linuxagent/graph/runtime.py |
| runtime event | A structured, non-audit status signal emitted while a turn or tool is running | src/linuxagent/graph/events.py, src/linuxagent/runtime_events.py |
| tool event | A tool-runtime event emitted by LLM-visible tools and the provider tool loop | src/linuxagent/providers/base.py, src/linuxagent/tools/sandbox.py |
| work item | One visible unit of runtime work such as command execution, a tool call, a worker, or a background job | current dict events; typed model pending |
| pending request | A resumable human decision or input request, currently represented by LangGraph interrupts | src/linuxagent/graph/*confirm*.py, src/linuxagent/ui/interrupt_dispatcher.py |
| active view | Transient in-terminal state shown while a turn is running | src/linuxagent/ui/working_status.py |
| history | Durable conversation output after the active view is cleared or consolidated | chat history and graph messages |
| steer input | User input entered while a turn is still running | not yet first-class |
| cancellation token | Shared cancellation state for a running turn and its child work | not yet first-class |
Current runtime events are legacy dictionaries. Graph nodes emit high-level
activity events through notify_event(). Read-only batches and direct-answer
workers emit worker_group events through src/linuxagent/runtime_events.py.
Command batches, background jobs, and streaming command output use related dict
events consumed by the app runtime observer.
Runtime events have three separate consumers:
- telemetry: src/linuxagent/app/runtime_telemetry.py records selected event types as local telemetry spans.
- UI activity: src/linuxagent/app/runtime_messages.py, src/linuxagent/container.py, and src/linuxagent/ui/working_status.py turn events into transient terminal status.
- harness: tests/harness/runner.py collects runtime_events and tool_events for scenario assertions.
Tool events are separate from runtime events today. The container records tool
audit metadata through AuditLog.record_tool_event() and also renders a
transient UI activity message. Tool-event arguments and output previews must
stay redacted before they reach telemetry, UI, or model context.
Audit records are not runtime events. HITL decisions, command execution audit, file patch audit, and tool audit entries remain durable security records with their own schema and retention behavior. Runtime events are UI/telemetry/replay signals and must not replace audit records.
Known gaps before the typed lifecycle work:
- there is no typed turn_started/turn_completed/turn_aborted envelope.
- active terminal state is rendered directly from messages rather than a pure active-view reducer.
- cancellation exists at graph-invocation/UI edges but is not a shared runtime token.
- busy user input and pending human requests are not represented by one queue or request protocol.
- harness event assertions still observe legacy dict events rather than a stable typed event contract.
Phase 1 lifecycle acceptance should use this vocabulary when naming harness scenarios and event assertions, so tests check protocol states rather than English or Chinese UI prose.
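As a rough illustration of asserting on protocol states instead of UI prose, the sketch below models a turn lifecycle with a small typed event and checks its shape. TurnEvent, TERMINAL_KINDS, and lifecycle_is_well_formed are hypothetical names; only the three turn_* kinds come from the gap list above, and the real typed contract is still pending.

```python
from dataclasses import dataclass
from typing import Sequence

# The three lifecycle kinds come from the gap list; everything else in this
# sketch (class name, fields, helper) is hypothetical.
TERMINAL_KINDS = frozenset({"turn_completed", "turn_aborted"})


@dataclass(frozen=True)
class TurnEvent:
    """Hypothetical typed envelope for one event observed during a turn."""

    kind: str
    turn_id: str


def lifecycle_is_well_formed(events: Sequence[TurnEvent]) -> bool:
    """Check that a turn starts once and ends with exactly one terminal event."""
    kinds = [event.kind for event in events]
    if not kinds or kinds[0] != "turn_started":
        return False
    if kinds.count("turn_started") != 1:
        return False
    terminal_count = sum(1 for kind in kinds if kind in TERMINAL_KINDS)
    # The single terminal event must also be the last event of the turn.
    return terminal_count == 1 and kinds[-1] in TERMINAL_KINDS
```

A harness scenario could then assert lifecycle_is_well_formed(collected_events) without matching any English or Chinese status text.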
The current stabilization track is focused on reducing orchestration complexity before adding more product surface. During this track, feature work should be deferred unless a maintainer explicitly interrupts the sequence.
Each change must stay scoped to one subplan: do not mix graph boundary work, node splits, file patch engine movement, sandbox wording, and container wiring in the same change. Behavior-preserving refactors must not change prompt templates, planner schemas, policy decisions, HITL semantics, audit JSON fields, or CLI UX.
Baseline hotspots being reduced:
| Module | Current responsibility | Intended owner after stabilization |
|---|---|---|
| src/linuxagent/graph/intent.py | Intent routing, direct answer, planner gate, tool planning, parse repair, wizard gates | Facade plus focused router, direct-answer, planner, tool-loop, no-change, and repair modules |
| src/linuxagent/graph/nodes.py | Confirmation, permissions, execution, batching, plan advancement, analysis | Facade plus focused confirm, permission, execution, batch, plan-step, and analysis modules |
| src/linuxagent/graph/file_patch_nodes.py | File patch confirmation, apply, verification, repair | Facade plus focused file patch graph-node modules |
| src/linuxagent/plans/file_patch.py | File patch models, parsing, safety, diff apply, transactions, summaries | Public facade plus focused implementation modules under plans/ |
| src/linuxagent/graph/state.py | Broad graph state contract | Documented section contracts with producer/consumer ownership |
| src/linuxagent/container.py | Provider, service, tool, UI, graph, and telemetry wiring | Public composition root delegating to focused wiring helpers |
Execution order for the first phase is:
- stabilization inventory and baseline gates
- GraphRuntime adapter boundary
- architecture boundary checks for raw LangGraph leakage
- AgentState section contracts
- intent and planner splits
- command confirmation/execution splits
- file patch graph and engine splits
- sandbox product-contract visibility
The app layer must consume graph execution through GraphRuntime; raw LangGraph
resume commands, interrupt extraction, checkpoint snapshots, and snapshot task
inspection belong under src/linuxagent/graph/. Service and tool modules must
also stay LangGraph-free. make security and CI run
scripts/check_arch_boundaries.py to guard these boundaries.
make security and CI also run scripts/check_architecture_budget.py. The
budget turns the stabilization track into a regression gate:
- src/linuxagent/app/agent.py remains capped at 300 physical lines.
- Graph modules default to 430 physical lines. Existing larger modules have narrow per-file caps so they cannot grow without an explicit follow-up split.
- Safety-sensitive plan modules default to 260 physical lines, with a narrow cap for the existing public plan model facade.
- All Python functions remain capped at 50 physical lines.
- Any new AgentState field must be listed in graph/state_contracts.py with an owner section.
- Any new graph node factory must be added to the budget coverage manifest with a real unit test and a harness or boundary scenario.
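To make the function-length cap concrete, here is a minimal sketch of how such a budget could be measured with the standard ast module. It is not the contents of scripts/check_architecture_budget.py; oversized_functions is a hypothetical helper, and counting from the def line to the last body line is an assumption about what "physical lines" means.

```python
import ast

MAX_FUNCTION_LINES = 50  # cap quoted in the budget list above


def oversized_functions(source: str, limit: int = MAX_FUNCTION_LINES) -> list[tuple[str, int]]:
    """Return (name, physical_line_count) for functions longer than `limit`."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is the `def` line and end_lineno is inclusive,
            # so add 1 to count physical lines.
            length = node.end_lineno - node.lineno + 1
            if length > limit:
                offenders.append((node.name, length))
    return offenders
```

A gate script could run this over every tracked .py file and exit non-zero when the returned list is not empty.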
Tool sandbox metadata and subprocess ownership are enforced by
scripts/check_sandbox_rules.py, which is part of the same security gate.
These are enforced both locally and in CI:
- no shell=True
- no AutoAddPolicy
- no bare except:
- no input() calls inside src/linuxagent/graph/
- no raw LangGraph runtime access from src/linuxagent/app/, services/, or tools/
- no direct subprocess creation outside src/linuxagent/sandbox/
- no unwrapped LangChain tools exposed to the LLM
make security runs scripts/check_code_rules.py,
scripts/check_arch_boundaries.py, scripts/check_sandbox_rules.py,
scripts/i18n_audit.py, grep red-lines, and Bandit. The i18n audit fails on
unregistered Chinese runtime string literals in production source. English
phrase detection remains report-only because many English literals are protocol
strings, exception messages, model-facing instructions, or test fixtures.
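Two of the red lines above (shell=True and bare except) can be detected structurally. The sketch below uses the ast module as an illustration only; the project's own gate combines several scripts with grep checks, and find_red_lines is a hypothetical helper rather than anything taken from scripts/check_code_rules.py.

```python
import ast


def find_red_lines(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, rule) pairs for shell=True calls and bare excepts."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            for keyword in node.keywords:
                is_true = isinstance(keyword.value, ast.Constant) and keyword.value.value is True
                if keyword.arg == "shell" and is_true:
                    findings.append((node.lineno, "shell=True"))
        elif isinstance(node, ast.ExceptHandler) and node.type is None:
            # A handler with no exception type is a bare `except:`.
            findings.append((node.lineno, "bare except"))
    return sorted(findings)
```

Compared with a plain grep, the AST walk ignores matches inside comments and string literals, at the cost of only seeing files that parse.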
Before a release or security-sensitive merge, review:
- sandbox bypass: local execution still reaches commands only through SandboxRunner.
- tool completeness: every LLM-facing tool has ToolSandboxSpec metadata, timeout/output budgets, and redacted output.
- audit completeness: command, file patch, local sandbox, and SSH remote metadata are recorded where applicable.
- fallback behavior: disabled/no-op paths report enforced=false; unavailable safe profiles fail closed.
- packaging: make verify-build confirms config, policy, prompts, locale catalogs, and sandbox config sections are present in the wheel; the isolated wheel install also checks zh-CN/en-US locale key parity.
Local commands run through argv-based subprocess execution. Remote SSH is
stricter because Paramiko exec_command() talks to the remote user's shell.
src/linuxagent/cluster/remote_command.py therefore rejects shell syntax
before SSH fan-out: command sequencing, pipes, redirects, command
substitution, variable expansion, and related metacharacters are not allowed
on the cluster path.
The graph applies this check after host selection and returns BLOCK before
HITL. SSHManager repeats the same validation before connecting so direct
service calls cannot bypass the boundary.
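The rejection step described above can be sketched as a single scan for shell metacharacters. The character set here is an assumption that covers the named categories (sequencing, pipes, redirects, command substitution, variable expansion); the actual list in src/linuxagent/cluster/remote_command.py may be broader, and find_shell_syntax is a hypothetical name.

```python
import re
from typing import Optional

# Assumed metacharacter set: ; & (sequencing), | (pipes), < > (redirects),
# backtick and $ (command substitution and variable expansion), newline.
_SHELL_SYNTAX = re.compile(r"[;&|<>`$\n]")


def find_shell_syntax(command: str) -> Optional[str]:
    """Return the first disallowed metacharacter, or None if the command is plain."""
    match = _SHELL_SYNTAX.search(command)
    return match.group() if match else None


def validate_remote_command(command: str) -> None:
    """Raise before any SSH connection is attempted."""
    offending = find_shell_syntax(command)
    if offending is not None:
        raise ValueError(f"shell syntax {offending!r} is not allowed on the cluster path")
```

Calling the same validator from both the graph check and the SSH layer mirrors the double validation described above, so a direct service call hits the same boundary.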
The top-level network config is reserved for application-level LLM/web tools.
Plan 10 only defines configuration, deterministic domain evaluation, audit
event shape, and linuxagent check visibility; it does not add fetch/search
tools or perform DNS resolution. Domain rules are normalized to lowercase,
strip one trailing dot, and may use .example.com / *.example.com for
subdomains only. Deny entries take priority over allow entries.
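The domain rules above are deterministic and need no DNS lookup, which a short sketch can show. The function names are hypothetical, and two behaviors are assumptions not stated in the text: a rule without a leading dot matches only that exact host, and a host matching no rule is treated as not allowed.

```python
def normalize_domain(value: str) -> str:
    """Lowercase and strip exactly one trailing dot."""
    value = value.strip().lower()
    return value[:-1] if value.endswith(".") else value


def rule_matches(rule: str, host: str) -> bool:
    rule, host = normalize_domain(rule), normalize_domain(host)
    if rule.startswith("*."):
        rule = rule[1:]  # "*.example.com" becomes ".example.com"
    if rule.startswith("."):
        # Subdomain-only rule: the bare apex "example.com" does not match.
        return host.endswith(rule) and len(host) > len(rule)
    return host == rule  # assumption: plain rules match the exact host only


def is_allowed(host: str, allow: list[str], deny: list[str]) -> bool:
    if any(rule_matches(rule, host) for rule in deny):
        return False  # deny entries take priority over allow entries
    # Assumption: hosts that match no allow rule are rejected.
    return any(rule_matches(rule, host) for rule in allow)
```

For example, with allow=["*.example.com"] and deny=["api.example.com"], docs.example.com is allowed while api.example.com and example.com are not.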
Command safety is evaluated by src/linuxagent/policy/ and exposed through
the compatibility API in src/linuxagent/executors/safety.py.
Each decision includes:
- level: SAFE, CONFIRM, or BLOCK
- risk_score: 0-100
- capabilities: e.g. filesystem.delete, service.mutate, privilege.sudo
- matched_rules: legacy-compatible rule names used by audit and HITL
configs/policy.default.yaml documents the default YAML shape:

    rules:
      - id: service.mutate
        legacy_rule: DESTRUCTIVE
        level: CONFIRM
        risk_score: 70
        capabilities: [service.mutate]
        reason: service state mutation
        match:
          command: [systemctl, service]
          subcommand_any: [stop, restart, reload, disable]

For argument-sensitive rules, prefer match.argv so the policy can express
fixed prefixes, exact arity, token positions, and flags that take values without
falling back to substring checks:

    match:
      argv:
        - prefix: [git, status]
          exact: true
        - prefix: [journalctl]
          flag_values:
            - flag: --unit
              values: [nginx]

Policy YAML is validated fail-fast with Pydantic and can be enabled at runtime:

    policy:
      path: ~/.config/linuxagent/policy.yaml
      include_builtin: true

With include_builtin: true, user rule IDs replace matching built-in IDs and
new IDs are appended. Set include_builtin: false only when intentionally
replacing the full policy set. Invalid configured policy YAML fails before the
runtime services are built.
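The merge behavior for include_builtin can be sketched in a few lines. This is an illustration under assumptions: rules are plain dicts keyed by id, a replaced rule keeps the position of the built-in it overrides, and merge_rules is a hypothetical name rather than the policy loader's real API.

```python
def merge_rules(builtin: list[dict], user: list[dict], include_builtin: bool = True) -> list[dict]:
    """Combine built-in and user rules following the include_builtin semantics."""
    if not include_builtin:
        # The user policy replaces the full built-in set.
        return list(user)
    user_by_id = {rule["id"]: rule for rule in user}
    # Matching IDs are replaced (assumed: in the built-in's position).
    merged = [user_by_id.pop(rule["id"], rule) for rule in builtin]
    # Whatever is left has a new ID and is appended in user order.
    merged.extend(user_by_id.values())
    return merged
```

Because dicts preserve insertion order, appended user rules keep the order in which they appear in the user file.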
The graph no longer accepts a raw shell string from the LLM. Provider output is
parsed as strict JSON CommandPlan; invalid JSON or schema errors are treated
as BLOCK and no command is executed.
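The fail-closed parsing rule can be illustrated with the standard library alone. The real CommandPlan is a stricter model; the field names used here (a top-level commands list whose entries carry an argv list) and the ParseResult type are assumptions made for the sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParseResult:
    level: str  # "BLOCK" when parsing fails, "PARSED" otherwise (hypothetical labels)
    reason: str
    plan: Optional[dict] = None


def parse_command_plan(raw: str) -> ParseResult:
    """Parse provider output; any JSON or schema problem yields BLOCK."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return ParseResult("BLOCK", f"invalid JSON: {exc.msg}")
    if not isinstance(data, dict) or not isinstance(data.get("commands"), list) or not data["commands"]:
        return ParseResult("BLOCK", "schema error: expected a non-empty 'commands' list")
    for command in data["commands"]:
        argv = command.get("argv") if isinstance(command, dict) else None
        if not (isinstance(argv, list) and argv and all(isinstance(tok, str) for tok in argv)):
            # A raw shell string such as "ls -la" is rejected here.
            return ParseResult("BLOCK", "schema error: argv must be a non-empty list of strings")
    return ParseResult("PARSED", "ok", data)
```

The key property is that there is no fallback path: output that is not a valid plan never reaches policy evaluation or execution.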
Multi-step operations are represented as normal validated CommandPlan
entries. The graph advances through those entries generically; each command
still goes through policy, HITL, execution or patch confirmation, audit, and
analysis through the same path as any other LLM-generated plan.
Remote scope is structured data, not Python natural-language matching:
commands[].target_hosts is empty for local execution, contains exact configured
host names or hostnames for selected SSH targets, and uses ["*"] for every
configured cluster host.
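A small sketch of how target_hosts could be resolved without any natural-language matching follows. resolve_targets is a hypothetical helper, and treating the configured hosts as a flat list of known identifiers is a simplification of the "host names or hostnames" wording above.

```python
def resolve_targets(target_hosts: list[str], configured: list[str]) -> list[str]:
    """Return the SSH hosts to run on; an empty list means local execution."""
    if not target_hosts:
        return []  # local execution
    if target_hosts == ["*"]:
        return list(configured)  # every configured cluster host
    unknown = [host for host in target_hosts if host not in configured]
    if unknown:
        # Exact matching only: no prefix, glob, or fuzzy lookup.
        raise ValueError(f"unknown target hosts: {unknown}")
    return list(target_hosts)
```

With configured hosts ["web-1", "db-1"], the value ["db-1"] selects one host, ["*"] selects both, and ["web"] is rejected rather than being expanded to web-1.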
Artifact and mutation requests are represented as FilePatchPlan, not shell
redirection. FilePatchPlan.request_intent carries create, update, or
unknown so safety checks do not infer intent from user-language keywords. The
planner can inspect real state before producing a patch through bounded
read-only tools:
- read_file(path, offset, limit)
- list_dir(path)
- search_files(pattern, root) for literal text search
- search_logs(pattern, log_file, max_matches) for literal text search
- get_system_info()
All workspace file reads reuse file_patch.allow_roots; the default roots are
the current workspace and /tmp. Patch application dry-runs unified diffs,
checks allow/high-risk roots before reading targets, validates optional
permission changes, and can relocate hunks when the line number is stale but the
old context matches exactly. The apply path is transactional: symlink path
components, hardlinks, directories, device files, FIFOs, sockets, oversized
targets, and non-UTF-8 text are rejected; writes use temporary files and atomic
replace; existing targets are backed up and rolled back if a later write or
permission change fails. Confirmation rendering shows compact per-file diffs,
+N / -M stats, large-diff pagination, high-risk path warnings, permission
changes, and per-file acceptance for multi-file patches.
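The temporary-file, atomic-replace, and rollback steps can be sketched as below. This is a simplified illustration, not the project's patch engine: backups are kept in memory, file modes are not preserved, the safety checks listed above are omitted, and both function names are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_text(target: Path, new_text: str) -> None:
    """Write to a temp file in the same directory, then atomically replace."""
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # the original target is still untouched
        raise


def apply_all(changes: dict[Path, str]) -> None:
    """Apply several writes; restore earlier files if a later write fails."""
    backups: dict[Path, Optional[str]] = {}
    try:
        for target, text in changes.items():
            backups[target] = target.read_text(encoding="utf-8") if target.exists() else None
            atomic_write_text(target, text)
    except Exception:
        for target, old in backups.items():
            if old is None:
                target.unlink(missing_ok=True)  # file did not exist before
            else:
                atomic_write_text(target, old)
        raise
```

Writing the temp file next to the target keeps os.replace on a single filesystem, which is what makes the swap atomic.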
The planner prompt should preserve existing file style and behavior. If a requested feature already exists, it should return a no-change answer. If a request says "create" but the intended target path already exists, the planner should avoid silently overwriting it by choosing a new filename, returning no-change, or asking for an explicit conflict decision.
Project-specific code rules are enforced by make security and CI through
scripts/check_code_rules.py. Module-top TYPE_CHECKING imports are allowed;
imports inside functions or methods are not. Optional dependency handling should
stay at module/provider boundaries and raise explicit provider errors when the
extra is not installed.
Every graph run receives a trace_id that is attached to HITL audit records
and local telemetry spans. The default telemetry backend writes JSONL to
~/.linuxagent/telemetry.jsonl; it does not require an external OTel service.
For development you can set telemetry.exporter: console to print redacted span
JSON to stdout. For collector integration set telemetry.exporter: otlp and
telemetry.otlp_endpoint to an HTTP traces endpoint. Network telemetry export is
never enabled by default.
HITL "allow all" decisions are recorded as decision: yes_all with a
Claude-style permissions.allow list such as Bash(cat /etc/os-release).
Those permissions live in LangGraph state for the current conversation thread
and the same thread after /resume. They match exact argv token shapes rather
than substrings, are not global executor permissions, and are still blocked by
never_whitelist, destructive capabilities, SSH batch confirmation, policy,
and sandbox gates.
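To show why token matching differs from substring matching, the sketch below parses a Bash(...) permission entry into argv tokens and compares it for equality. Strict token-for-token equality is an assumption about what "exact argv token shapes" means, and the helper names are hypothetical; the later gates (never_whitelist, policy, sandbox) are not modeled.

```python
import shlex


def parse_permission(entry: str) -> list[str]:
    """Turn 'Bash(cat /etc/os-release)' into ['cat', '/etc/os-release']."""
    if not (entry.startswith("Bash(") and entry.endswith(")")):
        raise ValueError(f"unsupported permission entry: {entry!r}")
    return shlex.split(entry[len("Bash("):-1])


def is_preapproved(argv: list[str], allow: list[str]) -> bool:
    """True only when argv equals one allowed token list exactly."""
    return any(parse_permission(entry) == argv for entry in allow)
```

Under this rule, an approval for cat /etc/os-release does not cover cat /etc/os-release /etc/shadow, which a substring check would have accepted.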
Audit records are hash-chained with prev_hash and hash. Use
linuxagent audit verify to validate the current audit log and locate the
first tampered line.
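The idea behind the hash chain can be shown with a minimal model. SHA-256 over the previous hash plus a canonical JSON payload is an assumption, as are the helper names; only the prev_hash and hash field names come from the text, and the real audit schema carries more fields.

```python
import hashlib
import json
from typing import Optional


def entry_hash(record: dict, prev_hash: str) -> str:
    """Hash the previous link together with a canonical JSON payload."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_record(log: list[dict], record: dict) -> None:
    prev_hash = log[-1]["hash"] if log else ""
    log.append({**record, "prev_hash": prev_hash, "hash": entry_hash(record, prev_hash)})


def first_tampered_line(log: list[dict]) -> Optional[int]:
    """Return the 1-based line of the first broken entry, or None if intact."""
    prev_hash = ""
    for line_no, entry in enumerate(log, start=1):
        record = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry.get("prev_hash") != prev_hash or entry.get("hash") != entry_hash(record, prev_hash):
            return line_no
        prev_hash = entry["hash"]
    return None
```

Because each hash covers the previous one, editing or deleting any line invalidates that line or its successor, which is what lets a verifier point at the first tampered entry.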
The old v3 source has been removed. All active work belongs in src/linuxagent/.