Merged
38 commits
c0be446
Move clone analysis under evidence
dannote May 21, 2026
810f1e7
Add standard library bypass evidence
dannote May 21, 2026
00b04af
Move JSON bypass smell into Jason plugin
dannote May 21, 2026
8debbf7
Add stdlib and map contract evidence
dannote May 21, 2026
9aadd90
Expand evidence-backed candidates
dannote May 21, 2026
be2d132
Tune evidence candidate scans
dannote May 21, 2026
e624985
Preserve advanced stdlib heuristics
dannote May 21, 2026
c5776bf
Add evidence metadata conventions
dannote May 21, 2026
e6cfd59
Discover evidence providers in scanner
dannote May 21, 2026
b4aceb0
Document evidence versus smells
dannote May 21, 2026
8b23e0f
Dogfood evidence heuristics
dannote May 21, 2026
cab9fac
Add order-safe flat map reduce evidence
dannote May 21, 2026
af76b72
Split standard library bypass evidence
dannote May 21, 2026
7f020f4
Use ExAST for simple enum evidence
dannote May 21, 2026
a327f8c
Add evidence pattern runner
dannote May 21, 2026
08d8b43
Test and reuse evidence pattern runner
dannote May 21, 2026
4ddce4f
Document evidence provider conventions
dannote May 21, 2026
179526f
Introduce shared evidence fact
dannote May 21, 2026
50ec26b
Extract evidence AST helpers
dannote May 21, 2026
bc8282e
Clarify map contract evidence phases
dannote May 21, 2026
f5961d8
Fix root reach help flag
dannote May 21, 2026
77b14ff
Tune evidence heuristics from Hex scan
dannote May 21, 2026
4c58e5b
Refine evidence scan signal
dannote May 21, 2026
66c9a89
Align evidence docs with tuned map update heuristic
dannote May 21, 2026
d1b1b00
Add map contract roles and coverage
dannote May 21, 2026
1dd234c
Track map contract aliases and escapes
dannote May 21, 2026
fd349c8
Collect project-level map contracts
dannote May 21, 2026
d78a6d1
Detect returned map bindings
dannote May 21, 2026
247d017
Group similar map contract shapes
dannote May 21, 2026
fa384bc
Split map contract candidate guidance
dannote May 21, 2026
a1f81ea
Add plugin evidence refinement hook
dannote May 21, 2026
76376de
Refine map contracts from Jason escapes
dannote May 21, 2026
a2a8953
Document plugin evidence refinement
dannote May 21, 2026
bf5175a
Tune evidence scan from corpus review
dannote May 21, 2026
9474cf0
Tune flat map reduce evidence
dannote May 21, 2026
30a4095
Fix evidence branch credo issues
dannote May 21, 2026
0b41845
Deduplicate evidence helpers
dannote May 21, 2026
88f80bc
Merge master into evidence branch
dannote May 21, 2026
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -81,7 +81,7 @@ Use this responsibility split when refactoring or adding features:

`Reach.CLI.Analyses.*` must not exist; add command orchestration under `Reach.CLI.Commands.*` and domain logic under the appropriate `Reach.*` subsystem.

-Framework-specific semantics must stay in plugins. Generic modules such as `Reach.Smell.*`, `Reach.CloneAnalysis.*`, `Reach.Trace.*`, `Reach.Map.*`, and `Reach.Visualize` must not hardcode framework/library names such as Ecto/Repo/Phoenix/Oban/Ash/Jido or framework-specific CRUD/validation calls. Add plugin callbacks instead.
+Framework-specific semantics must stay in plugins. Generic modules such as `Reach.Smell.*`, `Reach.Evidence.*`, `Reach.Trace.*`, `Reach.Map.*`, and `Reach.Visualize` must not hardcode framework/library names such as Ecto/Repo/Phoenix/Oban/Ash/Jido or framework-specific CRUD/validation calls. Add plugin callbacks instead.

## Constants and Limits

@@ -95,7 +95,7 @@ Framework-specific semantics must stay in plugins. Generic modules such as `Reac
- `Reach.Check.*` is for release/CI safety: architecture policy, changed-code risk, refactoring candidates, and adapters that run checks.
- `Reach.Smell.*` is the local code-shape finding engine: loose map contracts, repeated fixed-shape maps, pipeline waste, reverse append, eager patterns, string building, redundant computation, and clone-backed structural consistency.
- `mix reach.check --smells` may call the smell engine, but smell rules themselves must live under `Reach.Smell.*`, not `Reach.CLI.*`.
-- `Reach.CloneAnalysis.*` is an evidence provider, not a smell namespace. ExDNA integration must emit Reach-owned clone evidence consumed by semantic checks; ExDNA must not appear as a user-facing smell kind.
+- `Reach.Evidence.*` contains reusable evidence providers consumed by smells, checks, and refactoring candidates. Evidence is an observed fact; smells/checks/candidates are user-facing policy decisions. Evidence modules must not emit user-facing findings directly. ExDNA integration must emit Reach-owned clone evidence consumed by semantic checks; ExDNA must not appear as a user-facing smell kind. See `docs/evidence-heuristics.md` for the evidence-first promotion path.

## Release and Docs

13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -6,8 +6,21 @@

- **Architecture layer ergonomics** — `.reach.exs` now validates unknown layer references, supports allowlist-style dependency policy, dependency exceptions, optional layer coverage checks, and layer-cycle violations with concrete call-edge witnesses.

### Changed

- **Evidence provider namespace** — moved clone analysis under `Reach.Evidence.CloneAnalysis` and introduced `Reach.Evidence` as the namespace for reusable facts consumed by smells, checks, and refactoring candidates.
- **Jason plugin smell** — added an evidence-backed smell check for hand-rolled JSON sanitizers, encoders, and simple `Jason.Encoder` implementations that should use Jason protocol support directly.
- **Standard library bypass smell** — added conservative Path/URI checks for hand-written basename, extension, URL, and query-string splitting, plus higher-context `Enum.map`→flatten, order-safe reduce/reverse `Enum.flat_map`, paired `Map.has_key?`/`Map.put` update, `Map.update!`, and reduce-based `Enum.frequencies` heuristics.
- **Map contract evidence** — added `Reach.Evidence.MapContract` as a reusable evidence provider for maps that are created with a fixed shape, returned from local functions, and then read/updated as implicit contracts.
- **Map contract candidates** — `mix reach.check --candidates` now reports advisory struct, boundary, or typed-map contract candidates when repeated implicit map contracts appear in project source.
- **Poison plugin rename** — split the Poison effect classifier into `Reach.Plugins.Poison` now that Jason has its own plugin.
- **Evidence corpus scanner** — added `scripts/evidence_corpus_scan.exs` for focused Jason, standard-library bypass, and map-contract evidence scans across repositories, backed by lightweight `family/0` and `kinds/0` evidence provider metadata.
- **Plugin evidence refinement** — added a generic `refine_evidence/2` plugin hook so dependency plugins can annotate reusable evidence without owning smell or candidate policy; Jason now classifies maps passed to `Jason.encode/1,2` or `Jason.encode!/1,2` as external payload contracts.
- **Corpus-tuned evidence** — tightened evidence scanning after reviewing Hex corpus hits, including safe handling for dynamic aliases and avoiding `Enum.flat_map/2` suggestions for reduce callbacks shaped like `acc ++ [expr]`.

### Fixed

- **Root task help** — `mix reach --help` now prints usage information instead of generating the default HTML report.
- **Protocol and macro-heavy visualizations** — Elixir frontend now attaches `defimpl`/`defprotocol` functions and nested modules to their real module names, including multi-part nested `defmodule` names, and treats quoted macro-generated definitions as data while preserving `unquote` references. This removes bogus `(top-level)` graph buckets and repeated garbage nodes in reports for macro-heavy projects.
- **Redundant computation false positives** — stateful IR counter calls are no longer reported as duplicate pure computations during Reach's own strict smell checks.

2 changes: 2 additions & 0 deletions README.md
@@ -90,6 +90,8 @@ Reach 2.x uses five canonical analysis tasks plus the HTML report task.

Use `--format json` for automation. Canonical commands emit pure JSON envelopes with stable command names.

Reach separates reusable evidence from user-facing output. `Reach.Evidence.*` providers collect facts that can be consumed by smells, checks, and advisory candidates; plugin-specific evidence and smells live under `Reach.Plugins.*` and are auto-enabled only when the dependency is present. Plugins can also refine generic evidence with dependency-specific context, such as marking maps passed to `Jason.encode!/1` as external payload contracts. For provider and refinement conventions, see `docs/evidence-providers.md`. For tuning evidence providers across real projects, use `scripts/evidence_corpus_scan.exs`; see `docs/evidence-heuristics.md` for the evidence-first backlog and promotion rules.

Older task names were removed in Reach 2.0 and fail fast with migration guidance. See the [Canonical CLI guide](guides/cli.md).

## Configuration
82 changes: 82 additions & 0 deletions docs/evidence-heuristics.md
@@ -0,0 +1,82 @@
# Evidence Heuristics Backlog

Reach keeps promising maintainability ideas as evidence providers first. Do not discard a good idea just because a naive smell would be noisy; add stronger context, mine real history, and only promote it to a smell or candidate when the evidence is useful. Provider API and boundary conventions are documented in `docs/evidence-providers.md`.

## Evidence vs smells

Evidence is an observed fact; a smell is a user-facing judgment.

Evidence providers answer: "what facts did we observe in source, IR, or a project graph?" They return reusable facts with kind, location, confidence, and domain-specific fields. Evidence modules must not decide whether something should fail CI or be shown as a warning.

Policy consumers answer: "what should Reach do with those facts?"

- `Reach.Smell.*` turns evidence into code-quality findings shown by `mix reach.check --smells`.
- `Reach.Check.*` turns evidence into CI/release policy output or advisory refactoring candidates.
- Plugins expose dependency-specific evidence and smells only when the dependency is present.
- Corpus scripts can scan evidence directly before a heuristic is promoted to a smell or candidate.

This separation lets Reach keep promising patterns without shipping noisy warnings. The promotion path is:

```text
idea → evidence provider → corpus scan → stronger heuristic → smell/check/candidate
```

Use evidence when a signal may be useful in multiple contexts or still needs corpus tuning. Use a smell only when the message is ready to be user-facing and appropriate for strict smell gates.
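
The split between facts and policy can be sketched as follows. This is a minimal illustration, not Reach's implementation: `EvidenceSketch.Fact` and `EvidenceSketch.Policy` are hypothetical stand-ins, and the rule "only high-confidence facts become findings" is an assumed policy chosen for the example.

```elixir
# Hypothetical stand-ins; Reach's real structs and modules may differ.
defmodule EvidenceSketch.Fact do
  defstruct [:family, :kind, :message, :confidence, meta: %{}]
end

defmodule EvidenceSketch.Policy do
  alias EvidenceSketch.Fact

  # Assumed policy: only high-confidence facts become user-facing findings.
  # Facts that do not match the generator pattern are skipped, not dropped
  # from the evidence list itself.
  def to_findings(facts) do
    for %Fact{confidence: :high} = fact <- facts do
      %{kind: fact.kind, line: fact.meta[:line], message: fact.message}
    end
  end
end

facts = [
  %EvidenceSketch.Fact{
    family: :stdlib,
    kind: :manual_flat_map,
    message: "Enum.map followed by Enum.concat",
    confidence: :high,
    meta: %{line: 10}
  },
  %EvidenceSketch.Fact{
    family: :stdlib,
    kind: :manual_flat_map,
    message: "Enum.map followed by List.flatten",
    confidence: :medium,
    meta: %{line: 22}
  }
]

# Both facts stay available to corpus scans; only one becomes a finding.
findings = EvidenceSketch.Policy.to_findings(facts)
IO.inspect(findings)
```

The provider side never inspects the policy; a different consumer could apply a different threshold to the same list of facts.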

## Standard library bypass

Implemented high-confidence families live in focused modules under `Reach.Evidence.StandardLibraryBypass.*` and are aggregated by `Reach.Evidence.StandardLibraryBypass`. Simple syntactic shapes use `Reach.Evidence.PatternRunner`/ExAST pattern matching where practical; flow-sensitive or multi-statement shapes may use custom AST callbacks:

- `Path.basename/1` and `Path.extname/1` for path-like `String.split` pipelines.
- `URI.parse/1` and `URI.decode_query/1` for URI/query-like splits.
- `Enum.flat_map/2` for direct `Enum.map` followed by `List.flatten/1` or `Enum.concat/1`.
- `Map.update/4` for paired `Map.has_key?`/`Map.put` branches that update the same map/key without relying on a `nil` sentinel.
- `Enum.frequencies/1` and `Enum.frequencies_by/2` for reduce-based count maps with `%{}` initial accumulator, exact increment-by-one logic, and no extra payload work.
- `Enum.flat_map/2` for reduce-based `acc ++ mapped_list` callbacks with an empty list accumulator.
- `Enum.flat_map/2` for order-safe prepend/reverse reducers shaped as `Enum.reverse(chunk, acc)` followed by a final `Enum.reverse/1`.
- `Map.update!/3` when code fetches a required existing key and immediately puts the transformed value back.
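
For orientation, a few of the shapes listed above are shown next to their standard-library replacements. The module and function names (`BypassSketch`, `manual_*`, `stdlib_*`) are hypothetical and exist only for this sketch; the `Enum` and `Map` calls are the real standard-library functions named in the list.

```elixir
defmodule BypassSketch do
  # Enum.map followed by Enum.concat builds an intermediate nested list.
  def manual_flat_map(lists), do: lists |> Enum.map(&Enum.sort/1) |> Enum.concat()
  def stdlib_flat_map(lists), do: Enum.flat_map(lists, &Enum.sort/1)

  # Paired Map.has_key?/Map.put branches that update the same map and key.
  def manual_update(counts, key) do
    if Map.has_key?(counts, key) do
      Map.put(counts, key, Map.fetch!(counts, key) + 1)
    else
      Map.put(counts, key, 1)
    end
  end

  def stdlib_update(counts, key), do: Map.update(counts, key, 1, &(&1 + 1))

  # Reduce-based count map: %{} accumulator, exact increment-by-one logic.
  def manual_frequencies(items) do
    Enum.reduce(items, %{}, fn item, acc ->
      Map.put(acc, item, Map.get(acc, item, 0) + 1)
    end)
  end

  def stdlib_frequencies(items), do: Enum.frequencies(items)

  # Fetch a required key, transform it, and put the result straight back.
  def manual_update!(map, key), do: Map.put(map, key, Map.fetch!(map, key) * 2)
  def stdlib_update!(map, key), do: Map.update!(map, key, &(&1 * 2))
end
```

Each `manual_*`/`stdlib_*` pair returns the same result for the same input, which is the property a replacement suggestion depends on.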

Corpus review notes:

- A Hex corpus pass over 6,882 packages produced 540 standard-library evidence hits after tuning, with no scanner stderr.
- `Enum.map(...) |> Enum.concat()` samples were direct `Enum.flat_map/2` opportunities and remain high confidence.
- `Enum.map(...) |> List.flatten()` is intentionally medium confidence: sampled uses often flatten mapper output, but recursive flattening may be semantically required.
- Reduce-based append evidence now ignores `acc ++ [expr]` because sampled hits were `Enum.map/2` shapes, not `Enum.flat_map/2` shapes. It still flags `acc ++ expand(item)` where the appended expression is a list-producing transformation.
- `Map.update/4`, `Map.update!/3`, `Enum.frequencies/1`, `Enum.frequencies_by/2`, Path, and URI samples matched the intended replacement families.
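
Two of the notes above are easier to see with concrete values. In this sketch the module `AppendShapes` and the helpers `expand/1` and `wrap/1` are hypothetical; they only stand in for "a list-producing transformation" and "a mapper that returns nested lists".

```elixir
defmodule AppendShapes do
  # acc ++ [expr] appends exactly one element per item: an Enum.map/2 shape.
  def append_singleton(items) do
    Enum.reduce(items, [], fn x, acc -> acc ++ [x * 2] end)
  end

  # acc ++ expand(item) appends a whole list per item: an Enum.flat_map/2 shape.
  def append_list(items) do
    Enum.reduce(items, [], fn x, acc -> acc ++ expand(x) end)
  end

  # List.flatten/1 flattens recursively; Enum.flat_map/2 removes one level only.
  def flatten_all(items), do: items |> Enum.map(&wrap/1) |> List.flatten()
  def flatten_one_level(items), do: Enum.flat_map(items, &wrap/1)

  # Hypothetical list-producing transformation.
  def expand(x), do: [x, x * 10]

  # Hypothetical mapper whose output is itself nested.
  def wrap(x), do: [x, [x]]
end

# Same result as Enum.map(items, &(&1 * 2)).
IO.inspect(AppendShapes.append_singleton([1, 2, 3]))

# Same result as Enum.flat_map(items, &AppendShapes.expand/1).
IO.inspect(AppendShapes.append_list([1, 2]))

# These two differ because the mapper output is nested.
IO.inspect(AppendShapes.flatten_all([1, 2]))
IO.inspect(AppendShapes.flatten_one_level([1, 2]))
```

When the mapper returns flat lists the last two functions agree, which is why the `List.flatten/1` family is kept at medium rather than high confidence.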

Promising mined families that need stronger constraints before implementation:

- Other `Enum.flat_map/2` prepend/reverse variants; avoid `chunk ++ acc |> Enum.reverse` because it reverses each chunk's internal order.
- `URI.parse/1` for authority parsing such as `String.split(str, ":", parts: 2)`, but only for URI/host/endpoint variable names or surrounding URI semantics.
- `Path.basename/1` / `Path.extname/1` for filename construction, but avoid generic labels/slugs.
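
The ordering hazard in the first bullet can be checked directly. This is a sketch with a hypothetical `expand/1` helper; `Enum.reverse/2` is the standard-library function that reverses its first argument and prepends it to the second.

```elixir
defmodule ReverseShapes do
  # Hypothetical list-producing transformation.
  def expand(x), do: [x, x + 1]

  # Order-safe: each chunk is prepended reversed, so the final reverse
  # restores both the chunk order and the order inside each chunk.
  def order_safe(items) do
    items
    |> Enum.reduce([], fn x, acc -> Enum.reverse(expand(x), acc) end)
    |> Enum.reverse()
  end

  # Not equivalent to Enum.flat_map/2: chunks are prepended in their
  # original order, so the final reverse flips each chunk internally.
  def chunk_prepend(items) do
    items
    |> Enum.reduce([], fn x, acc -> expand(x) ++ acc end)
    |> Enum.reverse()
  end
end

items = [1, 10]
IO.inspect(Enum.flat_map(items, &ReverseShapes.expand/1))
IO.inspect(ReverseShapes.order_safe(items))
IO.inspect(ReverseShapes.chunk_prepend(items))
```

For `[1, 10]` the first two calls produce `[1, 2, 10, 11]`, while `chunk_prepend/1` produces `[2, 1, 11, 10]`, so only the `Enum.reverse(chunk, acc)` shape is a safe `Enum.flat_map/2` candidate.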

## Map contracts

Implemented evidence:

- local fixed-shape map creation followed by key reads/updates;
- local function return shape followed by callsite reads;
- project-level remote return-shape contracts for maps returned by one module and read in another;
- shallow alias tracking for map bindings and returned map variables;
- escape target metadata for maps passed wholesale into calls;
- role metadata such as `:domain`, `:assigns`, `:accumulator`, `:external_payload`, `:options`, and `:unknown`;
- plugin evidence refinement, e.g. Jason marks maps passed to `Jason.encode/1,2` or `Jason.encode!/1,2` as external payloads;
- advisory struct, boundary, or typed-map contract candidates when evidence is repeated, return-shape based, or grouped into a similar-shape family.
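
As an illustration of the kind of code this evidence describes, the following sketch shows a fixed-shape map returned from a local function and then read and updated at call sites. The module and function names are hypothetical, and the sketch says nothing about how `Reach.Evidence.MapContract` actually records these facts.

```elixir
defmodule MapContractSketch do
  # Fixed-shape map created in, and returned from, a local function.
  def build_user(name, email), do: %{name: name, email: email, role: :member}

  # Call site that reads keys of the returned shape.
  def label(name, email) do
    user = build_user(name, email)
    "#{user.name} <#{user.email}>"
  end

  # Call site that updates a key through a shallow alias of the binding.
  def promote(name, email) do
    user = build_user(name, email)
    same_user = user
    %{same_user | role: :admin}
  end
end

IO.inspect(MapContractSketch.label("Ada", "ada@example.com"))
IO.inspect(MapContractSketch.promote("Ada", "ada@example.com"))
```

Both call sites depend on the `:name`, `:email`, and `:role` keys without any declaration of that shape, which is the implicit contract an advisory struct or typed-map candidate would make explicit.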

Promising upgrades:

- richer project-level return-shape evidence through `Reach.Project.Query`/IR instead of source-only AST matching;
- confidence boosts when the same shape crosses module boundaries;
- plugin refinements for Phoenix/LiveView assigns, request params, component attrs, and other framework-owned map roles;
- key-source and drift evidence that explains where each observed key came from and how similar shapes diverge across files.

## Mined examples

- Hologram has direct `Enum.map(...) |> Enum.concat()` and `Enum.map(...) |> List.flatten()` examples in recursive file and template expansion helpers; these validate the direct `Enum.flat_map/2` heuristic.
- Xamal replaced `String.split(str, ":", parts: 2)` authority parsing with `URI.parse("//#{str}")`; this remains a backlog URI heuristic until variable/context constraints are strong enough.
- Jido history contains `Enum.frequencies/1` and `Map.update` replacements in dependency and telemetry code; these validate count-map and paired-update families but also show why payload aggregation must be excluded.
- Reach's own history has append-in-reduce cleanups; reduce-based `Enum.flat_map/2` should stay limited to obvious `acc ++ mapped_list` shapes unless order proof is explicit.
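
The Xamal-style replacement can be sketched as follows, assuming the input is a `host:port` string. The module and function names are hypothetical; `URI.parse/1` and `String.split/3` are the real standard-library calls.

```elixir
defmodule AuthoritySketch do
  # Hand-written authority parsing.
  def split_authority(str) do
    [host, port] = String.split(str, ":", parts: 2)
    {host, String.to_integer(port)}
  end

  # URI-based parsing; the "//" prefix makes the string a network-path
  # reference so that URI.parse/1 treats it as an authority.
  def parse_authority(str) do
    %URI{host: host, port: port} = URI.parse("//#{str}")
    {host, port}
  end
end

IO.inspect(AuthoritySketch.split_authority("example.com:4000"))
IO.inspect(AuthoritySketch.parse_authority("example.com:4000"))
```

The same `String.split(str, ":", parts: 2)` shape is also used for non-URI data such as `"key:value"` pairs, which is why the heuristic stays in the backlog until variable names or surrounding context indicate URI semantics.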

## JSON/Jason

Jason-specific hand-roll detection belongs in `Reach.Plugins.Jason`, not generic standard-library heuristics. Future JSON work should stay plugin-owned and dependency-gated.
120 changes: 120 additions & 0 deletions docs/evidence-providers.md
@@ -0,0 +1,120 @@
# Evidence providers

Reach keeps reusable analysis facts in evidence providers. Smells, checks, and refactoring candidates decide which facts become user-facing policy.

## Provider shape

An AST evidence provider exposes lightweight metadata:

```elixir
def family, do: :stdlib

def kinds, do: [:manual_flat_map]

def collect_ast(ast), do: [%Reach.Evidence.Fact{}]
```

Providers are discovered through `Reach.Evidence.ast_providers/1` and dependency-specific plugin callbacks. Keep the API small until several providers need a stronger behaviour.

Most providers should emit `Reach.Evidence.Fact` values. Domain-specific providers may use richer structs temporarily when downstream checks need specialized fields, but scanner-facing facts should converge on this common shape.

Evidence facts should carry at least:

- `:family` — provider family such as `:stdlib`, `:jason`, or `:map_contract`;
- `:kind` — stable atom for the observed fact;
- `:message` — short maintainer-facing explanation;
- `:replacement` — suggested abstraction or API when one is known;
- `:meta` — source metadata, usually including `:line` and optionally `:column`;
- `:confidence` — coarse confidence such as `:high` or `:medium`.

## Boundaries

Evidence providers must not emit `Reach.Smell.Finding` and must not depend on CLI rendering or command modules. User-facing policy belongs in:

- `Reach.Smell.*` for local code-shape findings shown by `mix reach.check --smells`;
- `Reach.Check.*` for CI/release policy and advisory candidates;
- plugin smell/check modules for dependency-specific user-facing output.

Plugin-gated evidence belongs under `Reach.Plugins.*.Evidence`, not in generic evidence modules. Generic providers must not hardcode framework policy such as Phoenix, Ecto, Oban, Ash, Jido, or JSON-library-specific semantics.

## Plugin refinement

Plugins may refine evidence facts after generic providers collect them. Use this when the generic evidence is framework-neutral but a dependency can add semantic context:

```elixir
def refine_evidence(%Reach.Evidence.MapContract.Contract{escapes: escapes}, _context) do
  if Enum.any?(escapes, &jason_encode?/1) do
    %{role: :external_payload}
  else
    :unchanged
  end
end

def refine_evidence(_evidence, _context), do: :unchanged
```

Reach applies refinements through:

```elixir
Reach.Plugin.refine_evidence(plugins, evidence, context)
```

A refinement may return:

- `:unchanged` — keep the evidence as-is;
- a map of updates — merge annotations such as `role: :external_payload` or `confidence: :medium`;
- a replacement evidence struct of the same type.

Refinement must stay evidence-level. Plugins should annotate facts, confidence, roles, or metadata; they must not emit `Reach.Smell.Finding` or decide candidate policy directly. Smells/checks/candidates consume the refined evidence later.
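
One way to picture the three return values is a small dispatcher that applies a refinement result to a piece of evidence. This is an assumption-laden sketch, not the code behind `Reach.Plugin.refine_evidence/3`: `RefinementSketch`, its nested `Contract` struct, and `apply_refinement/2` are hypothetical names.

```elixir
defmodule RefinementSketch do
  defmodule Contract do
    # Hypothetical stand-in for a map-contract evidence struct.
    defstruct [:keys, role: :unknown, confidence: :high, escapes: []]
  end

  # :unchanged keeps the evidence as-is.
  def apply_refinement(evidence, :unchanged), do: evidence

  # A replacement struct of the same type is used directly.
  def apply_refinement(%mod{}, %mod{} = replacement), do: replacement

  # A plain map of updates is merged into the existing struct;
  # struct/2 discards keys the struct does not define.
  def apply_refinement(evidence, updates) when is_map(updates) do
    struct(evidence, updates)
  end
end

contract = %RefinementSketch.Contract{keys: [:id, :name]}

contract
|> RefinementSketch.apply_refinement(%{role: :external_payload})
|> IO.inspect()
```

The struct clause is placed before the map clause because structs are also maps; swapping them would route replacement structs through the merge path.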

Current example: `Reach.Evidence.MapContract` records generic escape targets such as `Jason.encode!(data)`. `Reach.Plugins.Jason` refines those contracts to `role: :external_payload`, which lets candidate generation suggest a boundary contract instead of a domain struct.

## Pattern matching

Prefer `Reach.Evidence.PatternRunner` for simple syntactic shapes:

```elixir
import ExAST.Sigil

PatternRunner.run(
  ast,
  [
    manual_flat_map:
      {~p[Enum.map(_, _) |> List.flatten()],
       fn _match ->
         %{
           kind: :manual_flat_map,
           message: "Enum.map followed by flatten allocates an intermediate nested list; use Enum.flat_map/2",
           replacement: "Enum.flat_map/2",
           confidence: :high
         }
       end}
  ],
  family: :stdlib
)
```

Use the pattern as the seed and keep context checks in the builder callback. For example, `StandardLibraryBypass.PathURI` uses ExAST to find `String.split` shapes, then verifies that the subject variable looks path- or URI-like.

Use custom AST traversal, project queries, or data-flow logic when evidence requires proof beyond a single syntactic shape, such as:

- reduce-based `Enum.frequencies/1` or `Enum.flat_map/2` reimplementations;
- multi-statement `Map.fetch!/2` then `Map.put/3` updates;
- implicit map contracts that depend on construction, reads, updates, and callsite return usage.
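
For comparison, a hand-written traversal using only the standard library might look like the sketch below. It is not how Reach's providers are implemented: `AstTraversalSketch` and the returned map shape are hypothetical, and the sketch covers just the `Enum.map ... |> List.flatten()` pipe shape to show where extra context checks would go.

```elixir
defmodule AstTraversalSketch do
  # Walks the quoted AST and collects a fact for every pipe into
  # List.flatten() whose left-hand side ends in an Enum.map call.
  def collect_ast(ast) do
    {_ast, facts} =
      Macro.prewalk(ast, [], fn
        {:|>, meta, [lhs, {{:., _, [{:__aliases__, _, [:List]}, :flatten]}, _, []}]} = node, acc ->
          if enum_map_call?(lhs), do: {node, [fact(meta) | acc]}, else: {node, acc}

        node, acc ->
          {node, acc}
      end)

    Enum.reverse(facts)
  end

  # Direct call: Enum.map(xs, fun)
  defp enum_map_call?({{:., _, [{:__aliases__, _, [:Enum]}, :map]}, _, _args}), do: true
  # Piped call: xs |> Enum.map(fun)
  defp enum_map_call?({:|>, _, [_lhs, rhs]}), do: enum_map_call?(rhs)
  defp enum_map_call?(_other), do: false

  # Hypothetical fact shape, using the fields listed earlier in this document.
  defp fact(meta) do
    %{
      family: :stdlib,
      kind: :manual_flat_map,
      replacement: "Enum.flat_map/2",
      confidence: :medium,
      meta: %{line: meta[:line]}
    }
  end
end

"xs |> Enum.map(&expand/1) |> List.flatten()"
|> Code.string_to_quoted!()
|> AstTraversalSketch.collect_ast()
|> IO.inspect()
```

A callback-based traversal like this can carry an accumulator and inspect neighbouring nodes, which is what multi-statement or flow-sensitive evidence needs and a single pattern cannot express.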

## Promotion workflow

Use this path for new maintainability ideas:

```text
idea → evidence provider → corpus scan → stronger heuristic → smell/check/candidate
```

Run corpus scans before promoting noisy facts:

```bash
MIX_ENV=test mix run scripts/evidence_corpus_scan.exs -- --kind all /path/to/project
```

The scanner should use provider discovery and plugin refinement, producing facts even when they are not yet exposed as smells. This keeps promising heuristics available for tuning without turning early signals into noisy user-facing warnings.
30 changes: 27 additions & 3 deletions lib/mix/tasks/reach.ex
@@ -11,20 +11,44 @@ defmodule Mix.Tasks.Reach do

   @shortdoc "Generate interactive HTML report"

+  @help """
+  Generates an interactive HTML report for Elixir/Erlang/Gleam/JavaScript source files.
+
+  mix reach
+  mix reach lib/my_app/server.ex
+  mix reach --dead-code
+  mix reach --format dot
+
+  Options:
+
+  --format Output format: html (default), dot, json
+  --output Output directory (default: reach_report)
+  --open Open browser after generating
+  --no-open Do not open browser after generating
+  --dead-code Highlight dead code
+  --help Show this help
+  """
+
   @switches [
     output: :string,
     format: :string,
     open: :boolean,
-    dead_code: :boolean
+    dead_code: :boolean,
+    help: :boolean
   ]

-  @aliases [o: :output, f: :format]
+  @aliases [o: :output, f: :format, h: :help]

   @impl Mix.Task
   def run(args) do
     Pipe.safely(fn ->
       {opts, files} = Options.parse(args, @switches, @aliases)
-      Report.run(opts, files)
+
+      if opts[:help] do
+        Mix.shell().info(@help)
+      else
+        Report.run(opts, files)
+      end
     end)
   end
 end
end
4 changes: 4 additions & 0 deletions lib/reach/analysis.ex
@@ -32,8 +32,12 @@ defmodule Reach.Analysis do

   defp mix_task_module?(module) when is_atom(module) do
     match?(["Mix", "Tasks" | _], Module.split(module))
+  rescue
+    ArgumentError -> false
   end

+  defp mix_task_module?(_module), do: false
+
   defp mix_task_file?(nil), do: false
   defp mix_task_file?(%{file: file}), do: String.starts_with?(file || "", "lib/mix/tasks/")

5 changes: 4 additions & 1 deletion lib/reach/check/candidate.ex
@@ -20,7 +20,10 @@ defmodule Reach.Check.Candidate do
     :representative_calls,
     :call,
     :branches,
-    :direct_caller_count
+    :direct_caller_count,
+    :keys,
+    :occurrences,
+    :sources
   ]

   def new(attrs) when is_list(attrs) or is_map(attrs), do: struct!(__MODULE__, attrs)