ruff docgen is Ruff's universal documentation generator.
It is model-driven and adapter-based:
- Universal core pipeline
- Language adapters
- Shared symbol model
- Gap analyzer
- Renderers (HTML, Markdown, JSON)
- CI quality gates
- Optional AI task emission
Supported languages and file extensions:
- Ruff (`.ruff`)
- PHP (`.php`)
- Python (`.py`)
- TypeScript (`.ts`, `.tsx`)
- JavaScript (`.js`, `.jsx`, `.mjs`, `.cjs`)
- Ruby (`.rb`)
- Go (`.go`)
- Haskell (`.hs`, `.lhs`)
- Zig (`.zig`)
The core lives under src/docgen/ and is language-agnostic.
- `core.rs`: orchestration + output + gates
- `discovery.rs`: safe deterministic file discovery
- `model.rs`: shared project/module/symbol/gap model
- `gaps.rs`: missing-doc and link-gap analysis
- `render/*`: HTML/Markdown/JSON rendering
- `adapters/*`: language-specific symbol/doc extraction
For Ruff symbols, DocGen visibility is explicit and gate-oriented:
- Top-level symbols are `Public` only when declared with `pub`:
  - `pub func ...`
  - `pub struct ...`
  - `pub enum ...`
  - `pub const ...` / `pub let ...`
- Non-`pub` top-level symbols are `Private`.
- Struct method visibility requires both:
  - method declaration is `pub`
  - containing struct is `Public`

  Methods under private structs remain `Private` even if the method itself is declared `pub`.
- Enum variants inherit visibility from the containing enum:
  - variants under `pub enum` are `Public`
  - variants under non-`pub enum` are `Private`

`--public-only` with `--include-private` disabled filters to `Public` symbols only and is the intended strict CI gate surface.
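The visibility rules above can be sketched as three small functions. This is an illustrative model only: the `Visibility` enum and the function names are hypothetical and are not taken from the DocGen source.

```rust
/// Visibility levels used in this sketch (hypothetical type, named after the rules above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Visibility {
    Public,
    Private,
}

/// Top-level symbols are Public only when declared with `pub`.
pub fn top_level_visibility(declared_pub: bool) -> Visibility {
    if declared_pub {
        Visibility::Public
    } else {
        Visibility::Private
    }
}

/// A struct method is Public only if the method is `pub` AND its struct is Public.
pub fn method_visibility(method_pub: bool, struct_visibility: Visibility) -> Visibility {
    if method_pub && struct_visibility == Visibility::Public {
        Visibility::Public
    } else {
        Visibility::Private
    }
}

/// Enum variants inherit the visibility of the containing enum.
pub fn variant_visibility(enum_visibility: Visibility) -> Visibility {
    enum_visibility
}
```

Under this model, a `pub` method on a non-`pub` struct resolves to `Private`, which is why it would be filtered out by `--public-only`.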
Visibility classification is implemented through shared adapter helpers that centralize explicit modifier mapping, naming-convention mapping, and container/member effective visibility rules. Ruff and TypeScript adapter semantics are regression-locked and unchanged by this refactor.
Ruff DocGen currently attaches inline documentation from these Ruff comment forms:
- `///` line doc comments
- `//!` line doc comments
- `/** ... */` block doc comments
Non-doc block comments (/* ... */) are not treated as API documentation.
Attachment matching is decorator-aware: DocGen will skip Ruff decorator/attribute lines (for example @... and #[...]) between a doc block and its symbol target.
Proximity behavior is stable and test-locked:
- Blank lines between a doc block and symbol are allowed.
- Regular non-doc comment lines break attachment.
- The nearest eligible doc block is attached when multiple blocks appear before a symbol.
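A minimal sketch of the proximity rules above, restricted to `///` and `//!` line comments (block doc comments are omitted for brevity). The function name `attached_doc` is hypothetical, and treating a doc block as a contiguous run of doc lines is an assumption of this sketch, not a statement about the real adapter.

```rust
/// Returns the doc lines attached to the symbol declared at `symbol_idx`.
/// Walks upward from the symbol: blank lines and decorator/attribute lines
/// are skipped, a regular comment or code line breaks attachment, and only
/// the nearest contiguous doc block is collected.
pub fn attached_doc(lines: &[&str], symbol_idx: usize) -> Vec<String> {
    let mut docs: Vec<String> = Vec::new();
    let mut i = symbol_idx.min(lines.len());
    while i > 0 {
        i -= 1;
        let line = lines[i].trim();
        if line.starts_with("///") || line.starts_with("//!") {
            // Both markers are three ASCII bytes, so slicing at 3 is safe.
            docs.push(line[3..].trim().to_string());
            continue;
        }
        if !docs.is_empty() {
            // The nearest doc block has ended; earlier blocks are ignored.
            break;
        }
        let is_decorator = line.starts_with('@') || line.starts_with("#[");
        if line.is_empty() || is_decorator {
            continue;
        }
        // Anything else (including a regular `//` comment) breaks attachment.
        break;
    }
    docs.reverse();
    docs
}
```

Collecting upward and reversing at the end keeps the doc lines in source order while still stopping at the first block encountered.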
Decision: keep a hybrid Ruff extraction strategy, with regex-based symbol discovery as the default production path and an opt-in parser-assisted prototype (--ruff-parser-assisted) that gracefully falls back to regex on lexer/parser diagnostics.
Rationale:
- `ruff docgen` is expected to be best-effort and resilient even when repositories contain partially invalid Ruff sources; a hard parser-only path would reject symbol extraction for files that fail full parse.
- Current parser surfaces are optimized for program execution semantics, so parser-assisted extraction is bounded and guarded by deterministic fallback to avoid CI gate instability.
- Fixture-backed regressions (`tests/fixtures/docgen/*`) cover both parser-success and parser-fallback paths to keep ordering/strict-gate behavior stable.
Follow-through policy:
- Continue expanding fixture-backed Ruff extraction edge cases as they are discovered.
- Keep parser-assisted extraction opt-in while broadening fixture coverage before any default-path promotion.
- Preserve deterministic output and strict-gate stability as non-negotiable acceptance criteria for any parser-assisted rollout.
DocGen is scan-only.
- No source code execution
- No imports/build steps
- No external AI calls by default
- Symlink traversal is skipped during discovery
- File size, depth, and file count limits are enforced
- Deterministic ordering for CI stability
- HTML output escapes documentation content by default
When discovery limits skip input, DocGen emits warning diagnostics in docgen.json:
- `DOCGEN_DISCOVERY_MAX_FILE_SIZE`
- `DOCGEN_DISCOVERY_MAX_DEPTH`
- `DOCGEN_DISCOVERY_MAX_FILES`
- `DOCGEN_DISCOVERY_INVALID_ENCODING`
`ruff docgen --json` also reports:
- Per-reason skip counters under `discovery_skip_counts`:
  - `max_file_size`
  - `max_depth`
  - `max_files`
  - `invalid_encoding`
- Link-validation budget truncation counters under `link_validation_skip_counts`:
  - `max_link_checks`
  - `max_external_checks`
  - `max_total_time`
- Symbol volume counters:
  - `item_count` (total symbols in output scope)
  - `project_symbol_count` (non-builtin symbols)
  - `builtin_symbol_count` (builtin symbols)
- Per-kind symbol counters:
  - `symbol_kind_counts` with deterministic keys such as `function`, `method`, `struct`, `enum`, `enum_variant`, and `builtin`
- Stable dashboard summary block:
  - `summary.schema_version` (`docgen-summary/v1`)
  - `summary` mirrors key totals/gate counters for machine consumers while preserving existing top-level contract fields
- Effective discovery limits block:
  - `discovery_limits.max_file_size_bytes`
  - `discovery_limits.max_depth`
  - `discovery_limits.max_files`
  - mirrored under `summary.discovery_limits`
Discovery limits can be overridden per run through:
- CLI flags (`--max-discovery-file-size-bytes`, `--max-discovery-files`, `--max-discovery-depth`)
- Environment (`RUFF_DOCGEN_MAX_FILE_SIZE_BYTES`, `RUFF_DOCGEN_MAX_FILES`, `RUFF_DOCGEN_MAX_DEPTH`)
- Built-in defaults when neither a CLI flag nor an environment variable is set (CLI values take precedence over env values).
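The precedence order (CLI, then environment, then built-in default) can be sketched as below. The function names and the default value are hypothetical, and ignoring an unparsable environment value is an assumption; the real tool may report it as an error instead.

```rust
/// Resolves one discovery limit: CLI value first, then environment, then default.
/// Assumption: an unparsable environment value is ignored rather than rejected.
pub fn resolve_limit(cli: Option<u64>, env: Option<&str>, default: u64) -> u64 {
    cli.or_else(|| env.and_then(|raw| raw.trim().parse::<u64>().ok()))
        .unwrap_or(default)
}

/// Example wiring for the max-files limit.
pub fn resolve_max_files(cli: Option<u64>) -> u64 {
    // Hypothetical default; the document does not state the built-in value.
    const HYPOTHETICAL_DEFAULT_MAX_FILES: u64 = 10_000;
    let env = std::env::var("RUFF_DOCGEN_MAX_FILES").ok();
    resolve_limit(cli, env.as_deref(), HYPOTHETICAL_DEFAULT_MAX_FILES)
}
```

Keeping the resolution in a pure function that takes the environment value as an argument makes the precedence easy to test without mutating process state.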
ruff docgen --json emits per-language extraction counters under adapter_health (and mirrored under summary.adapter_health):
- `files_scanned`
- `symbols_extracted`
- `doc_blocks_attached`
- `placeholders_emitted`
DocGen emits DOCGEN_ADAPTER_LOW_YIELD warnings when extraction yield is suspiciously low for scanned language inputs:
- Three or more files scanned with zero extracted symbols.
- Ten or more files scanned with fewer than one extracted symbol per five files.
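The two low-yield conditions translate directly into integer arithmetic. The sketch below uses a hypothetical function name; only the thresholds come from the text above.

```rust
/// Returns true when extraction yield looks suspiciously low for one language.
pub fn is_low_yield(files_scanned: u64, symbols_extracted: u64) -> bool {
    // Rule 1: three or more files scanned with zero extracted symbols.
    let zero_yield = files_scanned >= 3 && symbols_extracted == 0;
    // Rule 2: ten or more files with fewer than one symbol per five files,
    // i.e. symbols / files < 1 / 5, rewritten without division.
    let sparse_yield =
        files_scanned >= 10 && symbols_extracted.saturating_mul(5) < files_scanned;
    zero_yield || sparse_yield
}
```

For example, 10 files with 1 symbol is flagged (5 < 10), while 10 files with 2 symbols is not (10 is not less than 10).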
DocGen supports optional incremental extraction reuse for CI through --cache-dir:
- Per-file extraction artifacts are cached using a key that includes source content hash, language, and adapter cache version.
- Cache hits reuse extracted symbol payloads and still preserve deterministic module/symbol output ordering.
- Cache misses recompute extraction and refresh cache entries.
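A rough sketch of how such a cache key could be composed from the three inputs named above. The key layout and function name are hypothetical. `DefaultHasher` is used only to keep the example dependency-free: its output is not guaranteed to be stable across Rust releases, so a persistent cache would more plausibly use a stable content hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Builds a per-file cache key from source content, language, and adapter cache version.
/// Any change to one of the three inputs yields a different key, which forces a cache miss.
pub fn cache_key(source: &str, language: &str, adapter_cache_version: u32) -> String {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    let content_hash = hasher.finish();
    format!("{}-v{}-{:016x}", language, adapter_cache_version, content_hash)
}
```

Including the adapter cache version in the key means that a change to extraction logic invalidates old entries without any explicit cleanup step.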
Machine-readable JSON includes deterministic cache counters in both top-level and summary blocks:
- `cache_stats.hits`
- `cache_stats.misses`
Discovery and project diagnostics are emitted in deterministic sorted order for CI-stable JSON comparisons.
Example invocations:
- `ruff docgen src/ --language ruff --out-dir docs/generated`
- `ruff docgen . --out-dir docs/generated`
- `ruff docgen . --languages ruff,php,python,typescript,javascript,ruby,go,haskell,zig --out-dir docs/generated`
- `ruff docgen . --public-only --fail-on-undocumented --fail-on-broken-links`
- `ruff docgen . --emit-ai-tasks --out-dir docs/generated`

The output directory includes:
- `index.html`
- `docgen.md` (when format includes markdown)
- `docgen.json`
- `docgen-gaps.json`
- `docgen-capabilities.json`
- `docgen-ai-tasks.md` (with `--emit-ai-tasks`)
- `builtins.html` (unless `--no-builtins`)
- `search-index.json` + `symbol-index.json` (with `--search-index`)
Public symbols are documented even when no inline docs exist.
Missing docs are rendered as:
Documentation needed. This symbol was discovered from the source code, but no human-authored documentation was found.
docgen-gaps.json and docgen-ai-tasks.md include bounded source context and constrained prompts:
- Use only provided context
- Do not invent behavior
- Mark uncertainty
- Keep docs concise
- Add examples only when source supports them
Default DocGen link validation is local-file existence only:
- Local links are checked by filesystem existence.
- Local link fragments (`#anchor`) and query segments (`?query`) are ignored in default mode.
- External links (`http://`, `https://`, `mailto:`) are not validated in default mode.
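The default mode described above can be sketched as: classify the link, strip any fragment or query, then test filesystem existence. The function names are hypothetical, and the case-insensitive scheme match is an assumption of this sketch.

```rust
use std::path::Path;

/// Returns the filesystem portion of a local link, or `None` for external links.
pub fn local_link_path(link: &str) -> Option<&str> {
    let lower = link.to_ascii_lowercase();
    let is_external = ["http://", "https://", "mailto:"]
        .iter()
        .any(|scheme| lower.starts_with(*scheme));
    if is_external {
        return None;
    }
    // Cut at the first `#` or `?`, whichever comes first.
    let end = link
        .find(|c: char| c == '#' || c == '?')
        .unwrap_or(link.len());
    Some(&link[..end])
}

/// `None` means "external, not validated in default mode";
/// `Some(false)` would be reported as a broken local link.
pub fn local_link_exists(base_dir: &Path, link: &str) -> Option<bool> {
    local_link_path(link).map(|relative| base_dir.join(relative).exists())
}
```

A link such as `guide.md?x=1#setup` is therefore checked as `guide.md` only; the anchor is not inspected unless anchor validation is enabled.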
Optional local-anchor validation mode is available with --validate-local-anchors:
- Local links that include a fragment (`#...`) require the target anchor to exist in the referenced local file.
- Markdown heading slugs and basic HTML `id="..."`/`name="..."` anchors are supported.
Optional external-link validation mode is available with --validate-external-links:
- Validation only runs for hosts in `--external-link-allowlist`.
- Private/loopback/link-local/multicast targets are blocked by default (including DNS-resolved hostnames) to reduce SSRF risk.
- Use `--allow-private-network-links` to opt in when private-network link validation is intentionally required.
- Allowlist confinement is enforced on every redirect hop; if a redirect leaves the allowlist, the link is reported as broken with mode `external-redirect-allowlist`.
- Validation requests use `--external-link-timeout-ms`.
- Links that fail allowlisted external validation are reported as broken links.
- If external validation is enabled with an empty allowlist, DocGen emits `DOCGEN_LINK_EXTERNAL_ALLOWLIST_EMPTY`.
- If an allowlist is provided without `--validate-external-links`, DocGen emits `DOCGEN_LINK_EXTERNAL_ALLOWLIST_IGNORED`.
- Broken-link diagnostics and gate failures include mode-specific categories (`local_file`, `local_anchor`, `external`, `external_redirect_allowlist`, `external_private_address`) for clearer CI triage.
- Link validation resource budgets are available for bounded CI/runtime behavior:
  - `--max-link-checks`
  - `--max-external-link-checks`
  - `--max-total-validation-time-ms`
- When a budget truncates checks, DocGen emits deterministic warnings:
  - `DOCGEN_LINK_VALIDATION_BUDGET_MAX_LINK_CHECKS`
  - `DOCGEN_LINK_VALIDATION_BUDGET_MAX_EXTERNAL_CHECKS`
  - `DOCGEN_LINK_VALIDATION_BUDGET_TOTAL_TIME`

  and reports skip counts in `link_validation_skip_counts`.
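Two of the guardrails above, blocking private-range targets and re-checking the allowlist on every redirect hop, can be sketched with the standard library alone. The function names are hypothetical, exact case-insensitive host matching is an assumption, and the address check is not exhaustive (for example, IPv4-mapped IPv6 addresses are not unwrapped here).

```rust
use std::net::IpAddr;

/// Returns true for private, loopback, link-local, and multicast addresses.
pub fn is_blocked_target(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_multicast()
        }
        IpAddr::V6(v6) => {
            let first = v6.segments()[0];
            v6.is_loopback()
                || v6.is_multicast()
                || (first & 0xfe00) == 0xfc00 // unique local, fc00::/7
                || (first & 0xffc0) == 0xfe80 // link-local, fe80::/10
        }
    }
}

/// Returns the index of the first redirect hop whose host is not allowlisted.
/// `hop_hosts` lists the host of the initial URL followed by each redirect target.
pub fn first_hop_outside_allowlist(hop_hosts: &[&str], allowlist: &[&str]) -> Option<usize> {
    hop_hosts.iter().position(|&host| {
        !allowlist
            .iter()
            .any(|&allowed| allowed.eq_ignore_ascii_case(host))
    })
}
```

In a real validator the address check would run on every resolved address of every hop, since a hostname on the allowlist can still resolve to a blocked range.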
DocGen source-link rendering supports pluggable template providers:
- Default behavior (no template configured) keeps source rendering unchanged (plain source location text).
- `--source-link-template` enables URL template expansion when `--source-links` is enabled.
- Supported template placeholders:
  - `{path}` (normalized, percent-encoded relative source path)
  - `{line}` (1-based source line)
- Path normalization safety:
  - absolute paths are rejected
  - parent-traversal paths (`..`) are rejected
  - rejected paths do not emit template links and fall back to plain source-location rendering
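Template expansion with the safety rules above might look like the following sketch. All function names are hypothetical. Encoding everything outside the RFC 3986 unreserved set, collapsing `.` and empty segments, and treating a drive prefix as absolute are assumptions about what "normalized" means here.

```rust
/// Normalizes a relative source path and percent-encodes each segment.
/// Returns `None` for absolute paths and for any path containing `..`.
pub fn normalize_source_path(path: &str) -> Option<String> {
    let unified = path.replace('\\', "/");
    let has_drive_prefix = unified.len() >= 2 && unified.as_bytes()[1] == b':';
    if unified.starts_with('/')
        || has_drive_prefix
        || unified.split('/').any(|segment| segment == "..")
    {
        return None;
    }
    let encoded: Vec<String> = unified
        .split('/')
        .filter(|segment| !segment.is_empty() && *segment != ".")
        .map(percent_encode_segment)
        .collect();
    Some(encoded.join("/"))
}

/// Percent-encodes every byte outside the RFC 3986 unreserved set.
fn percent_encode_segment(segment: &str) -> String {
    let mut out = String::new();
    for byte in segment.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            other => out.push_str(&format!("%{:02X}", other)),
        }
    }
    out
}

/// Expands `{path}` and `{line}`; `None` means "fall back to plain source location".
pub fn expand_source_link(template: &str, path: &str, line: u32) -> Option<String> {
    let normalized = normalize_source_path(path)?;
    Some(
        template
            .replace("{path}", &normalized)
            .replace("{line}", &line.to_string()),
    )
}
```

Because braces in the path are percent-encoded before substitution, a file name containing `{line}` cannot inject a second placeholder into the expanded URL.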
The following roadmap is a focused QA/pass-two backlog for tightening DocGen implementation quality after the initial feature-completion tracks.
- `DG-QA-001` External link redirect confinement and allowlist re-validation. (Completed 2026-05-18) Acceptance criteria:
  - Re-validate host allowlist on every redirect hop, not only on the initial URL.
  - Emit deterministic diagnostics when redirects leave the allowlist.
  - Add regression tests for same-host redirect, cross-host allowed redirect, and blocked redirect.
- `DG-QA-002` SSRF guardrails for external link mode. (Completed 2026-05-18) Acceptance criteria:
  - Resolve and block private/loopback/link-local/multicast IP targets by default in external-link mode.
  - Add explicit opt-in for private network validation where needed.
  - Add tests for DNS names resolving to blocked ranges and direct-IP URLs.
- `DG-QA-003` Link validation resource budgets. (Completed 2026-05-19) Acceptance criteria:
  - Add max link checks, max external checks, and total validation time budget controls.
  - Surface budget truncation in diagnostics and JSON summary counts.
  - Keep deterministic behavior under budget exhaustion.
- `DG-QA-004` Encoding-safe file ingestion. (Completed 2026-05-19) Acceptance criteria:
  - Replace hard failure on non-UTF-8 source reads with deterministic skip diagnostics.
  - Preserve strict-gate stability while reporting skipped file count by encoding reason.
  - Add fixtures for invalid UTF-8 and mixed-encoding repositories.
- `DG-QA-005` Static adapter registry/lookups. (Completed 2026-05-19) Acceptance criteria:
  - Replace per-call boxed adapter registry construction with static/lazy lookup maps.
  - Preserve adapter ordering determinism and capability-index output stability.
  - Benchmark and document adapter lookup overhead reduction.
- `DG-QA-006` Regex compilation caching across adapters. (Completed 2026-05-19) Acceptance criteria:
  - Move regex compilation from per-file extraction paths to static/lazy compiled regexes.
  - Ensure no behavior drift in existing adapter extraction fixtures.
  - Add micro-benchmark evidence for extraction throughput improvement.
- `DG-QA-007` Link/anchor validation caching. (Completed 2026-05-19) Acceptance criteria:
  - Reuse one HTTP client per run and cache parsed local anchors per file path.
  - Avoid repeated file reads for multiple anchors targeting the same file.
  - Add regression tests covering repeated-anchor checks and repeated external hosts.
- `DG-QA-008` Gap call-site indexing optimization. (Completed 2026-05-19) Acceptance criteria:
  - Replace per-symbol full-source scans with a one-pass call-site index.
  - Preserve deterministic known-call-site ordering and limit semantics.
  - Add large-repo performance regression coverage.
- `DG-QA-009` Shared extraction helpers for C-style languages. (Completed 2026-05-19) Acceptance criteria:
  - Extract shared symbol/doc-block parsing utilities for TypeScript/JavaScript (and optionally Go/Zig where applicable).
  - Reduce duplicated regex/loop logic without changing symbol contracts.
  - Add adapter conformance tests to prove no language-specific regression.
- `DG-QA-010` Shared visibility-policy helper layer. (Completed 2026-05-19) Acceptance criteria:
  - Centralize effective visibility calculation patterns used by adapters (top-level, container/member inheritance, explicit modifiers).
  - Keep Ruff/TypeScript visibility semantics unchanged unless explicitly versioned.
  - Add matrix tests for adapter-specific visibility edge cases.
- `DG-QA-011` Single-source docgen JSON contract serialization. (Completed 2026-05-19) Acceptance criteria:
  - Move CLI JSON contract assembly from ad hoc `main.rs` maps into a typed summary payload builder.
  - Ensure backward compatibility for existing top-level keys.
  - Lock output contract with dedicated snapshot tests.
- `DG-QA-012` Renderer deduplication cleanup. (Completed 2026-05-19) Acceptance criteria:
  - Remove no-op duplicated branches (for example source-link conditionals that currently emit identical output).
  - Centralize shared symbol card rendering helpers across HTML/Markdown renderers where safe.
  - Preserve deterministic render output ordering.
- `DG-QA-013` Configurable discovery limits from CLI. (Completed 2026-05-19) Acceptance criteria:
  - Add CLI/env overrides for max file size, max depth, and max files.
  - Emit effective limits in JSON summary for reproducible CI runs.
  - Add contract tests for default and overridden values.
- `DG-QA-014` Adapter health and extraction-confidence diagnostics. (Completed 2026-05-19) Acceptance criteria:
  - Emit per-language extraction counters (files scanned, symbols extracted, doc blocks attached, placeholders emitted).
  - Add warnings when extraction yield is suspiciously low for a language.
  - Expose these counters in the machine-readable summary block.
- `DG-QA-015` Incremental/cached docgen mode for CI. (Completed 2026-05-19) Acceptance criteria:
  - Add optional cache keyed by file content hash and adapter version.
  - Recompute only changed modules while preserving deterministic aggregate output.
  - Provide cache-hit/miss counters in JSON summary.
- `DG-QA-016` Source-link provider abstraction. (Completed 2026-05-19) Acceptance criteria:
  - Add pluggable source-link templates (local path, GitHub/GitLab URL patterns).
  - Keep default behavior unchanged when no provider is configured.
  - Add tests for URL rendering and path normalization safety.
The next universal DocGen maturation slice is tracked in `ROADMAP.md` under `V1-DOCGEN-001`.
- `DG-NEXT-001` Parser-assisted Ruff extraction fallback prototype. (Completed 2026-05-20) Acceptance criteria:
  - Add an opt-in parser-assisted extraction path for Ruff symbols with graceful fallback to regex extraction when parser diagnostics occur.
  - Preserve deterministic output ordering and strict-gate stability.
  - Add fixture-backed coverage for both parser-success and parser-fallback paths.
- `DG-NEXT-002` Cross-language adapter conformance expansion. (Completed 2026-05-20) Acceptance criteria:
  - Expand fixture coverage for multi-language edge patterns (nested containers, visibility inheritance, and async/doc-attachment variants).
  - Add contract checks that keep adapter output shape stable across all supported languages.
  - Document any intentional extraction gaps per language.
- `DG-NEXT-003` External-repo strict-gate baseline refresh cadence. (Completed 2026-05-20) Acceptance criteria:
  - Define a repeatable external-repo validation cadence and evidence format in `notes/`.
  - Track strict/public-only undocumented-count deltas across representative repositories.
  - Document mitigation playbooks for regressions detected during baseline refresh runs.
These gaps are intentional in the current adapter contracts and must stay documented when conformance tests are expanded:
- Ruff:
  - Regex mode is declaration-focused (`func`/`struct`/`enum`/`const`/enum variants); parser-assisted mode remains opt-in and falls back to regex on diagnostics.
  - Visibility inheritance is container-aware for Ruff structs/enums only; it is not a cross-language global rule.
- TypeScript:
  - Extraction currently targets classes, interfaces, type aliases, functions, and class methods; enum/namespace/re-export graphs are not fully modeled.
  - Method visibility follows explicit modifiers (`public`/`private`/`protected`) without class-level visibility inheritance.
  - `export async function ...` declarations and `async` class methods are currently outside the TypeScript adapter's declaration-pattern coverage.
- JavaScript:
  - Extraction targets `export class`, `function`, and class method declarations; dynamic patterns (for example object-literal methods and assigned arrow functions) are intentionally out of scope.
  - `export async function ...` declarations are currently outside the JavaScript adapter's declaration-pattern coverage.
- Python:
  - Extraction focuses on `class`/`def` declarations and naming-convention visibility (`_private`); decorator semantics and runtime metaprogramming are not resolved.
- PHP:
  - Extraction targets `class`, `function`, and `const` declarations with method modifiers; traits/interfaces/namespaced alias graphs are not fully modeled.
- Ruby:
  - Extraction targets `module`, `class`, and `def`; dynamic method-definition patterns and metaprogrammed visibility rules are intentionally not inferred.
- Go:
  - Extraction targets `type ... struct|interface`, top-level functions, and receiver methods; embedded-interface composition and build-tag-aware symbol filtering are not modeled.
- Haskell:
  - Extraction targets module/data/newtype/typeclass/function signatures using declaration patterns; export lists and advanced type-level constructs are not fully resolved.
- Zig:
  - Extraction targets `fn`, `const`, `struct`, and `enum` declarations; container member traversal/inference is intentionally out of scope (`supports_methods = false`).
Cadence:
- Run baseline refreshes weekly during active DocGen feature work.
- Run baseline refreshes before RC/final release gate passes.
- Run an additional refresh after any adapter extraction/visibility/link-validation contract change.
Validation set (current):
- `/Users/robertdevore/2026/ruff-ai-sdk`
- `/Users/robertdevore/2026/ruff-mcp`
- `/Users/robertdevore/2026/ruff-scout`
Required command modes per repo:
- Strict include-private mode (`--include-private` + strict gate flags).
- Strict public-only mode (`--public-only` + strict gate flags).
Evidence format (required):
- Capture one dated `notes/YYYY-MM-DD_HH-mm_*.md` entry with:
  - exact command forms used
  - strict/include-private and strict/public-only counts (`undocumented_count`, `broken_link_count`, `warning_count`, `gate_failures`)
  - per-repo deltas versus the prior baseline note
  - explicit pass/fail interpretation and follow-up owner if counts regress
Mitigation playbook for regressions:
- Re-run the affected repo with `--format json --json` and inspect `gate_failures` + `diagnostics`.
- Isolate whether drift is extraction, visibility, docs attachment, or link-validation policy.
- Add/adjust fixture-backed coverage in `tests/docgen_universal.rs` before changing adapter logic.
- If regression is expected/intentional, document the boundary in this file and update the latest evidence note with rationale.
- Re-run `cargo test --test docgen_universal` before closing the refresh loop.