diff --git a/AGENTS.md b/AGENTS.md index e2ff85e..c560d76 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -81,7 +81,7 @@ Use this responsibility split when refactoring or adding features: `Reach.CLI.Analyses.*` must not exist; add command orchestration under `Reach.CLI.Commands.*` and domain logic under the appropriate `Reach.*` subsystem. -Framework-specific semantics must stay in plugins. Generic modules such as `Reach.Smell.*`, `Reach.CloneAnalysis.*`, `Reach.Trace.*`, `Reach.Map.*`, and `Reach.Visualize` must not hardcode framework/library names such as Ecto/Repo/Phoenix/Oban/Ash/Jido or framework-specific CRUD/validation calls. Add plugin callbacks instead. +Framework-specific semantics must stay in plugins. Generic modules such as `Reach.Smell.*`, `Reach.Evidence.*`, `Reach.Trace.*`, `Reach.Map.*`, and `Reach.Visualize` must not hardcode framework/library names such as Ecto/Repo/Phoenix/Oban/Ash/Jido or framework-specific CRUD/validation calls. Add plugin callbacks instead. ## Constants and Limits @@ -95,7 +95,7 @@ Framework-specific semantics must stay in plugins. Generic modules such as `Reac - `Reach.Check.*` is for release/CI safety: architecture policy, changed-code risk, refactoring candidates, and adapters that run checks. - `Reach.Smell.*` is the local code-shape finding engine: loose map contracts, repeated fixed-shape maps, pipeline waste, reverse append, eager patterns, string building, redundant computation, and clone-backed structural consistency. - `mix reach.check --smells` may call the smell engine, but smell rules themselves must live under `Reach.Smell.*`, not `Reach.CLI.*`. -- `Reach.CloneAnalysis.*` is an evidence provider, not a smell namespace. ExDNA integration must emit Reach-owned clone evidence consumed by semantic checks; ExDNA must not appear as a user-facing smell kind. +- `Reach.Evidence.*` contains reusable evidence providers consumed by smells, checks, and refactoring candidates. Evidence is an observed fact; smells/checks/candidates are user-facing policy decisions. Evidence modules must not emit user-facing findings directly. ExDNA integration must emit Reach-owned clone evidence consumed by semantic checks; ExDNA must not appear as a user-facing smell kind. See `docs/evidence-heuristics.md` for the evidence-first promotion path. ## Release and Docs diff --git a/CHANGELOG.md b/CHANGELOG.md index 5df1794..839cff8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -6,8 +6,21 @@ - **Architecture layer ergonomics** — `.reach.exs` now validates unknown layer references, supports allowlist-style dependency policy, dependency exceptions, optional layer coverage checks, and layer-cycle violations with concrete call-edge witnesses. +### Changed + +- **Evidence provider namespace** — moved clone analysis under `Reach.Evidence.CloneAnalysis` and introduced `Reach.Evidence` as the namespace for reusable facts consumed by smells, checks, and refactoring candidates. +- **Jason plugin smell** — added an evidence-backed smell check for hand-rolled JSON sanitizers, encoders, and simple `Jason.Encoder` implementations that should use Jason protocol support directly. +- **Standard library bypass smell** — added conservative Path/URI checks for hand-written basename, extension, URL, and query-string splitting, plus higher-context `Enum.map`→flatten, order-safe reduce/reverse `Enum.flat_map`, paired `Map.has_key?`/`Map.put` update, `Map.update!`, and reduce-based `Enum.frequencies` heuristics. +- **Map contract evidence** — added `Reach.Evidence.MapContract` as a reusable evidence provider for maps that are created with a fixed shape, returned from local functions, and then read/updated as implicit contracts. +- **Map contract candidates** — `mix reach.check --candidates` now reports advisory struct, boundary, or typed-map contract candidates when repeated implicit map contracts appear in project source. +- **Poison plugin rename** — split the Poison effect classifier into `Reach.Plugins.Poison` now that Jason has its own plugin. +- **Evidence corpus scanner** — added `scripts/evidence_corpus_scan.exs` for focused Jason, standard-library bypass, and map-contract evidence scans across repositories, backed by lightweight `family/0` and `kinds/0` evidence provider metadata. +- **Plugin evidence refinement** — added a generic `refine_evidence/2` plugin hook so dependency plugins can annotate reusable evidence without owning smell or candidate policy; Jason now classifies maps passed to `Jason.encode/1,2` or `Jason.encode!/1,2` as external payload contracts. +- **Corpus-tuned evidence** — tightened evidence scanning after reviewing Hex corpus hits, including safe handling for dynamic aliases and avoiding `Enum.flat_map/2` suggestions for reduce callbacks shaped like `acc ++ [expr]`. + ### Fixed +- **Root task help** — `mix reach --help` now prints usage information instead of generating the default HTML report. - **Protocol and macro-heavy visualizations** — Elixir frontend now attaches `defimpl`/`defprotocol` functions and nested modules to their real module names, including multi-part nested `defmodule` names, and treats quoted macro-generated definitions as data while preserving `unquote` references. This removes bogus `(top-level)` graph buckets and repeated garbage nodes in reports for macro-heavy projects. - **Redundant computation false positives** — stateful IR counter calls are no longer reported as duplicate pure computations during Reach's own strict smell checks. diff --git a/README.md b/README.md index 59f8782..6467672 100644 --- a/README.md +++ b/README.md @@ -90,6 +90,8 @@ Reach 2.x uses five canonical analysis tasks plus the HTML report task. Use `--format json` for automation. Canonical commands emit pure JSON envelopes with stable command names. +Reach separates reusable evidence from user-facing output. `Reach.Evidence.*` providers collect facts that can be consumed by smells, checks, and advisory candidates; plugin-specific evidence and smells live under `Reach.Plugins.*` and are auto-enabled only when the dependency is present. Plugins can also refine generic evidence with dependency-specific context, such as marking maps passed to `Jason.encode!/1` as external payload contracts. For provider and refinement conventions, see `docs/evidence-providers.md`. For tuning evidence providers across real projects, use `scripts/evidence_corpus_scan.exs`; see `docs/evidence-heuristics.md` for the evidence-first backlog and promotion rules. + Older task names were removed in Reach 2.0 and fail fast with migration guidance. See the [Canonical CLI guide](guides/cli.md). ## Configuration diff --git a/docs/evidence-heuristics.md b/docs/evidence-heuristics.md new file mode 100644 index 0000000..e6c718f --- /dev/null +++ b/docs/evidence-heuristics.md @@ -0,0 +1,82 @@ +# Evidence Heuristics Backlog + +Reach keeps promising maintainability ideas as evidence providers first. Do not discard a good idea just because a naive smell would be noisy; add stronger context, mine real history, and only promote it to a smell or candidate when the evidence is useful. Provider API and boundary conventions are documented in `docs/evidence-providers.md`. + +## Evidence vs smells + +Evidence is an observed fact; a smell is a user-facing judgment. + +Evidence providers answer: "what facts did we observe in source, IR, or a project graph?" They return reusable facts with kind, location, confidence, and domain-specific fields. Evidence modules must not decide whether something should fail CI or be shown as a warning. + +Policy consumers answer: "what should Reach do with those facts?" + +- `Reach.Smell.*` turns evidence into code-quality findings shown by `mix reach.check --smells`. +- `Reach.Check.*` turns evidence into CI/release policy output or advisory refactoring candidates. +- Plugins expose dependency-specific evidence and smells only when the dependency is present. +- Corpus scripts can scan evidence directly before a heuristic is promoted to a smell or candidate. + +This separation lets Reach keep promising patterns without shipping noisy warnings. The promotion path is: + +```text +idea → evidence provider → corpus scan → stronger heuristic → smell/check/candidate +``` + +Use evidence when a signal may be useful in multiple contexts or still needs corpus tuning. Use a smell only when the message is ready to be user-facing and appropriate for strict smell gates. + +## Standard library bypass + +Implemented high-confidence families live in focused modules under `Reach.Evidence.StandardLibraryBypass.*` and are aggregated by `Reach.Evidence.StandardLibraryBypass`. Simple syntactic shapes use `Reach.Evidence.PatternRunner`/ExAST pattern matching where practical; flow-sensitive or multi-statement shapes may use custom AST callbacks: + +- `Path.basename/1` and `Path.extname/1` for path-like `String.split` pipelines. +- `URI.parse/1` and `URI.decode_query/1` for URI/query-like splits. +- `Enum.flat_map/2` for direct `Enum.map` followed by `List.flatten/1` or `Enum.concat/1`. +- `Map.update/4` for paired `Map.has_key?`/`Map.put` branches that update the same map/key without relying on a `nil` sentinel. +- `Enum.frequencies/1` and `Enum.frequencies_by/2` for reduce-based count maps with `%{}` initial accumulator, exact increment-by-one logic, and no extra payload work. +- `Enum.flat_map/2` for reduce-based `acc ++ mapped_list` callbacks with an empty list accumulator. +- `Enum.flat_map/2` for order-safe prepend/reverse reducers shaped as `Enum.reverse(chunk, acc)` followed by a final `Enum.reverse/1`. +- `Map.update!/3` when code fetches a required existing key and immediately puts the transformed value back. + +Corpus review notes: + +- A Hex corpus pass over 6,882 packages produced 540 standard-library evidence hits after tuning, with no scanner stderr. +- `Enum.map(...) |> Enum.concat()` samples were direct `Enum.flat_map/2` opportunities and remain high confidence. +- `Enum.map(...) |> List.flatten()` is intentionally medium confidence: sampled uses often flatten mapper output, but recursive flattening may be semantically required. +- Reduce-based append evidence now ignores `acc ++ [expr]` because sampled hits were `Enum.map/2` shapes, not `Enum.flat_map/2` shapes. It still flags `acc ++ expand(item)` where the appended expression is a list-producing transformation. +- `Map.update/4`, `Map.update!/3`, `Enum.frequencies/1`, `Enum.frequencies_by/2`, Path, and URI samples matched the intended replacement families. + +Promising mined families that need stronger constraints before implementation: + +- Other `Enum.flat_map/2` prepend/reverse variants; avoid `chunk ++ acc |> Enum.reverse` because it reverses each chunk's internal order. +- `URI.parse/1` for authority parsing such as `String.split(str, ":", parts: 2)`, but only for URI/host/endpoint variable names or surrounding URI semantics. +- `Path.basename/1` / `Path.extname/1` for filename construction, but avoid generic labels/slugs. + +## Map contracts + +Implemented evidence: + +- local fixed-shape map creation followed by key reads/updates; +- local function return shape followed by callsite reads; +- project-level remote return-shape contracts for maps returned by one module and read in another; +- shallow alias tracking for map bindings and returned map variables; +- escape target metadata for maps passed wholesale into calls; +- role metadata such as `:domain`, `:assigns`, `:accumulator`, `:external_payload`, `:options`, and `:unknown`; +- plugin evidence refinement, e.g. Jason marks maps passed to `Jason.encode/1,2` or `Jason.encode!/1,2` as external payloads; +- advisory struct, boundary, or typed-map contract candidates when evidence is repeated, return-shape based, or grouped into a similar-shape family. + +Promising upgrades: + +- richer project-level return-shape evidence through `Reach.Project.Query`/IR instead of source-only AST matching; +- confidence boosts when the same shape crosses module boundaries; +- plugin refinements for Phoenix/LiveView assigns, request params, component attrs, and other framework-owned map roles; +- key-source and drift evidence that explains where each observed key came from and how similar shapes diverge across files. + +## Mined examples + +- Hologram has direct `Enum.map(... ) |> Enum.concat/List.flatten` examples in recursive file and template expansion helpers; these validate the direct `Enum.flat_map/2` heuristic. +- Xamal replaced `String.split(str, ":", parts: 2)` authority parsing with `URI.parse("//#{str}")`; this remains a backlog URI heuristic until variable/context constraints are strong enough. +- Jido history contains `Enum.frequencies/1` and `Map.update` replacements in dependency and telemetry code; these validate count-map and paired-update families but also show why payload aggregation must be excluded. +- Reach's own history has append-in-reduce cleanups; reduce-based `Enum.flat_map/2` should stay limited to obvious `acc ++ mapped_list` shapes unless order proof is explicit. + +## JSON/Jason + +Jason-specific hand-roll detection belongs in `Reach.Plugins.Jason`, not generic standard-library heuristics. Future JSON work should stay plugin-owned and dependency-gated. diff --git a/docs/evidence-providers.md b/docs/evidence-providers.md new file mode 100644 index 0000000..5ee4552 --- /dev/null +++ b/docs/evidence-providers.md @@ -0,0 +1,120 @@ +# Evidence providers + +Reach keeps reusable analysis facts in evidence providers. Smells, checks, and refactoring candidates decide which facts become user-facing policy. + +## Provider shape + +An AST evidence provider exposes lightweight metadata: + +```elixir +def family, do: :stdlib + +def kinds, do: [:manual_flat_map] + +def collect_ast(ast), do: [%Reach.Evidence.Fact{}] +``` + +Providers are discovered through `Reach.Evidence.ast_providers/1` and dependency-specific plugin callbacks. Keep the API small until several providers need a stronger behaviour. + +Most providers should emit `Reach.Evidence.Fact` values. Domain-specific providers may use richer structs temporarily when downstream checks need specialized fields, but scanner-facing facts should converge on this common shape. + +Evidence facts should carry at least: + +- `:family` — provider family such as `:stdlib`, `:jason`, or `:map_contract`; +- `:kind` — stable atom for the observed fact; +- `:message` — short maintainer-facing explanation; +- `:replacement` — suggested abstraction or API when one is known; +- `:meta` — source metadata, usually including `:line` and optionally `:column`; +- `:confidence` — coarse confidence such as `:high` or `:medium`. + +## Boundaries + +Evidence providers must not emit `Reach.Smell.Finding` and must not depend on CLI rendering or command modules. User-facing policy belongs in: + +- `Reach.Smell.*` for local code-shape findings shown by `mix reach.check --smells`; +- `Reach.Check.*` for CI/release policy and advisory candidates; +- plugin smell/check modules for dependency-specific user-facing output. + +Plugin-gated evidence belongs under `Reach.Plugins.*.Evidence`, not in generic evidence modules. Generic providers must not hardcode framework policy such as Phoenix, Ecto, Oban, Ash, Jido, or JSON-library-specific semantics. + +## Plugin refinement + +Plugins may refine evidence facts after generic providers collect them. Use this when the generic evidence is framework-neutral but a dependency can add semantic context: + +```elixir +def refine_evidence(%Reach.Evidence.MapContract.Contract{escapes: escapes}, _context) do + if Enum.any?(escapes, &jason_encode?/1) do + %{role: :external_payload} + else + :unchanged + end +end + + +def refine_evidence(_evidence, _context), do: :unchanged +``` + +Reach applies refinements through: + +```elixir +Reach.Plugin.refine_evidence(plugins, evidence, context) +``` + +A refinement may return: + +- `:unchanged` — keep the evidence as-is; +- a map of updates — merge annotations such as `role: :external_payload` or `confidence: :medium`; +- a replacement evidence struct of the same type. + +Refinement must stay evidence-level. Plugins should annotate facts, confidence, roles, or metadata; they must not emit `Reach.Smell.Finding` or decide candidate policy directly. Smells/checks/candidates consume the refined evidence later. + +Current example: `Reach.Evidence.MapContract` records generic escape targets such as `Jason.encode!(data)`. `Reach.Plugins.Jason` refines those contracts to `role: :external_payload`, which lets candidate generation suggest a boundary contract instead of a domain struct. + +## Pattern matching + +Prefer `Reach.Evidence.PatternRunner` for simple syntactic shapes: + +```elixir +import ExAST.Sigil + +PatternRunner.run( + ast, + [ + manual_flat_map: + {~p[Enum.map(_, _) |> List.flatten()], + fn _match -> + %{ + kind: :manual_flat_map, + message: "Enum.map followed by flatten allocates an intermediate nested list; use Enum.flat_map/2", + replacement: "Enum.flat_map/2", + confidence: :high + } + end} + ], + family: :stdlib +) +``` + +Use the pattern as the seed and keep context checks in the builder callback. For example, `StandardLibraryBypass.PathURI` uses ExAST to find `String.split` shapes, then verifies that the subject variable looks path- or URI-like. + +Use custom AST traversal, project queries, or data-flow logic when evidence requires proof beyond a single syntactic shape, such as: + +- reduce-based `Enum.frequencies/1` or `Enum.flat_map/2` reimplementations; +- multi-statement `Map.fetch!/2` then `Map.put/3` updates; +- implicit map contracts that depend on construction, reads, updates, and callsite return usage. + +## Promotion workflow + +Use this path for new maintainability ideas: + +```text +idea → evidence provider → corpus scan → stronger heuristic → smell/check/candidate +``` + +Run corpus scans before promoting noisy facts: + +```bash +MIX_ENV=test mix run scripts/evidence_corpus_scan.exs -- --kind all /path/to/project +``` + +The scanner should use provider discovery and plugin refinement, producing facts even when they are not yet exposed as smells. This keeps promising heuristics available for tuning without turning early signals into noisy user-facing warnings. diff --git a/lib/mix/tasks/reach.ex b/lib/mix/tasks/reach.ex index 5b31825..fa7c0d0 100644 --- a/lib/mix/tasks/reach.ex +++ b/lib/mix/tasks/reach.ex @@ -11,20 +11,44 @@ defmodule Mix.Tasks.Reach do @shortdoc "Generate interactive HTML report" + @help """ + Generates an interactive HTML report for Elixir/Erlang/Gleam/JavaScript source files. + + mix reach + mix reach lib/my_app/server.ex + mix reach --dead-code + mix reach --format dot + + Options: + + --format Output format: html (default), dot, json + --output Output directory (default: reach_report) + --open Open browser after generating + --no-open Do not open browser after generating + --dead-code Highlight dead code + --help Show this help + """ + @switches [ output: :string, format: :string, open: :boolean, - dead_code: :boolean + dead_code: :boolean, + help: :boolean ] - @aliases [o: :output, f: :format] + @aliases [o: :output, f: :format, h: :help] @impl Mix.Task def run(args) do Pipe.safely(fn -> {opts, files} = Options.parse(args, @switches, @aliases) - Report.run(opts, files) + + if opts[:help] do + Mix.shell().info(@help) + else + Report.run(opts, files) + end end) end end diff --git a/lib/reach/analysis.ex b/lib/reach/analysis.ex index 3d5c42a..18599f1 100644 --- a/lib/reach/analysis.ex +++ b/lib/reach/analysis.ex @@ -32,8 +32,12 @@ defmodule Reach.Analysis do defp mix_task_module?(module) when is_atom(module) do match?(["Mix", "Tasks" | _], Module.split(module)) + rescue + ArgumentError -> false end + defp mix_task_module?(_module), do: false + defp mix_task_file?(nil), do: false defp mix_task_file?(%{file: file}), do: String.starts_with?(file || "", "lib/mix/tasks/") diff --git a/lib/reach/check/candidate.ex b/lib/reach/check/candidate.ex index f888699..82fecba 100644 --- a/lib/reach/check/candidate.ex +++ b/lib/reach/check/candidate.ex @@ -20,7 +20,10 @@ defmodule Reach.Check.Candidate do :representative_calls, :call, :branches, - :direct_caller_count + :direct_caller_count, + :keys, + :occurrences, + :sources ] def new(attrs) when is_list(attrs) or is_map(attrs), do: struct!(__MODULE__, attrs) diff --git a/lib/reach/check/candidates.ex b/lib/reach/check/candidates.ex index 07232c6..f03fa7e 100644 --- a/lib/reach/check/candidates.ex +++ b/lib/reach/check/candidates.ex @@ -4,10 +4,14 @@ defmodule Reach.Check.Candidates do alias Reach.Analysis alias Reach.Check.{Architecture, Candidate, Changed} alias Reach.Config + alias Reach.Evidence.MapContract alias Reach.IR.Helpers, as: IRHelpers alias Reach.Project.Query @note "Candidates are advisory. Reach reports graph/effect/architecture evidence; prove behavior preservation before editing." + @suppressed_map_contract_roles [:accumulator, :assigns] + @shape_family_similarity 0.75 + @shape_family_min_shared_keys 3 def run(project, config, opts \\ []) do config = Config.normalize(config) @@ -17,7 +21,8 @@ defmodule Reach.Check.Candidates do (mixed_effect_candidates(project, config.candidates) ++ extract_region_candidates(project, config.candidates) ++ boundary_candidates(project, config) ++ - cycle_candidates(project, config.candidates)) + cycle_candidates(project, config.candidates) ++ + map_contract_candidates(project, config.candidates)) |> Enum.uniq_by(& &1.id) |> Enum.sort_by(&candidate_rank/1) |> Enum.take(top) @@ -29,8 +34,11 @@ defmodule Reach.Check.Candidates do kind_rank = %{ introduce_boundary: 0, isolate_effects: 1, - extract_pure_region: 2, - break_cycle: 3 + introduce_struct_contract: 2, + introduce_boundary_contract: 3, + introduce_typed_map_contract: 4, + extract_pure_region: 5, + break_cycle: 6 } risk_rank = %{high: 0, medium: 1, low: 2} @@ -245,6 +253,195 @@ defmodule Reach.Check.Candidates do defp function_id(func), do: {func.meta[:module], func.meta[:name], func.meta[:arity]} + defp map_contract_candidates(project, candidate_config) do + project + |> MapContract.collect_project() + |> group_map_contract_shapes() + |> Enum.flat_map(&map_contract_candidate_group/1) + |> Enum.sort_by(fn group -> + {-length(group.contracts), length(group.keys), inspect(group.keys)} + end) + |> Enum.take(candidate_config.limits.per_kind) + |> Enum.with_index(1) + |> Enum.map(fn {group, index} -> + keys = group.keys + contracts = group.contracts + first = List.first(contracts) + sources = contracts |> Enum.map(& &1.source) |> Enum.uniq() + + profile = map_contract_candidate_profile(group) + + Candidate.new( + id: candidate_id("R6", index), + kind: profile.kind, + target: map_contract_target(group), + file: first.file, + line: first.location.line, + benefit: profile.benefit, + risk: profile.risk, + confidence: map_contract_confidence(contracts), + actionability: profile.actionability, + evidence: map_contract_evidence(group, sources, candidate_config), + keys: Enum.map(keys, &to_string/1), + occurrences: length(contracts), + sources: Enum.map(sources, &to_string/1), + proof: profile.proof, + suggestion: profile.suggestion + ) + end) + end + + defp group_map_contract_shapes(contracts) do + contracts + |> Enum.sort_by(&{-length(&1.keys), inspect(&1.keys)}) + |> Enum.reduce([], &put_shape_group/2) + |> Enum.map(&finalize_shape_group/1) + end + + defp put_shape_group(contract, groups) do + case Enum.split_while(groups, &(not similar_shape?(&1.representative_keys, contract.keys))) do + {before, [group | after_groups]} -> + before ++ [%{group | contracts: [contract | group.contracts]} | after_groups] + + {_before, []} -> + groups ++ [%{representative_keys: contract.keys, contracts: [contract]}] + end + end + + defp similar_shape?(left_keys, right_keys) do + intersection = MapSet.intersection(MapSet.new(left_keys), MapSet.new(right_keys)) + union = MapSet.union(MapSet.new(left_keys), MapSet.new(right_keys)) + + MapSet.size(intersection) >= @shape_family_min_shared_keys and + MapSet.size(intersection) / max(MapSet.size(union), 1) >= @shape_family_similarity + end + + defp finalize_shape_group(group) do + key_sets = Enum.map(group.contracts, &MapSet.new(&1.keys)) + core = key_sets |> Enum.reduce(&MapSet.intersection/2) |> MapSet.to_list() |> Enum.sort() + union = key_sets |> Enum.reduce(&MapSet.union/2) |> MapSet.to_list() |> Enum.sort() + + %{ + keys: core, + contracts: Enum.reverse(group.contracts), + representative_keys: group.representative_keys, + variant_keys: union -- core + } + end + + defp map_contract_candidate_group(group) do + contracts = group.contracts + + cond do + length(contracts) >= 2 and not all_non_contract_roles?(contracts) -> + [group] + + Enum.any?(contracts, &return_contract_candidate?/1) -> + [group] + + true -> + [] + end + end + + defp return_contract_candidate?(contract) do + contract.source in [:return, :cross_file_return] and contract.confidence == :high and + length(contract.keys) >= 4 and contract.role not in @suppressed_map_contract_roles and + not low_signal_escape?(contract) + end + + defp all_non_contract_roles?(contracts) do + Enum.all?(contracts, &(&1.role in @suppressed_map_contract_roles or low_signal_escape?(&1))) + end + + defp low_signal_escape?(contract) do + contract.escaped? and contract.read_count < 2 and contract.mutation_count == 0 + end + + defp map_contract_target(%{variant_keys: []} = group), do: "map shape #{inspect(group.keys)}" + + defp map_contract_target(group) do + "map shape family #{inspect(group.keys)} (+#{length(group.variant_keys)} variant keys)" + end + + defp map_contract_candidate_profile(group) do + roles = Enum.map(group.contracts, & &1.role) + + cond do + :external_payload in roles -> + %{ + kind: :introduce_boundary_contract, + benefit: :medium, + risk: :medium, + actionability: :review_boundary_contract, + proof: [ + "Confirm this map is an external payload or boundary DTO before changing runtime shape.", + "Prefer a decoder, validator, or schema at the boundary instead of passing an untyped map deeper.", + "Keep flexible maps when the upstream contract is intentionally dynamic." + ], + suggestion: + "Consider introducing an explicit boundary contract, decoder, or validator for this repeated payload shape." + } + + :options in roles -> + %{ + kind: :introduce_typed_map_contract, + benefit: :medium, + risk: :low, + actionability: :review_options_contract, + proof: [ + "Confirm these options are internal and stable enough to document as a contract.", + "Prefer typed options or a validation schema when callers construct the same option shape repeatedly.", + "Keep plain keyword/options data when keys are intentionally open-ended." + ], + suggestion: + "Consider documenting this repeated options shape with a typed map or options validation schema." + } + + true -> + %{ + kind: :introduce_struct_contract, + benefit: :medium, + risk: :low, + actionability: :review_domain_contract, + proof: [ + "Confirm this repeated map shape represents internal domain data, not an external JSON/API boundary.", + "Prefer a struct when the shape is internal and stable across functions.", + "Keep plain maps for dynamic, user-provided, or third-party payloads." + ], + suggestion: + "Consider replacing this repeated implicit map contract with a struct or explicit domain contract." + } + end + end + + defp map_contract_confidence(contracts) do + cond do + Enum.any?(contracts, &(&1.confidence == :high)) -> :high + Enum.any?(contracts, &(&1.source == :return)) -> :medium + true -> :low + end + end + + defp map_contract_evidence(group, sources, candidate_config) do + source_labels = Enum.map(sources, &"#{&1}_evidence") + + variant_labels = + if group.variant_keys == [], + do: [], + else: ["shape_family_variants #{inspect(group.variant_keys)}"] + + examples = + group.contracts + |> Enum.take(candidate_config.limits.representative_calls) + |> Enum.map(fn contract -> + producer = if contract.producer, do: " from #{inspect(contract.producer)}", else: "" + "#{contract.file}:#{contract.location.line} #{contract.variable}#{producer}" + end) + + source_labels ++ variant_labels ++ examples + end + defp boundary_candidates(_project, %{layers: []}), do: [] defp boundary_candidates(project, config) do diff --git a/lib/reach/check/changed.ex b/lib/reach/check/changed.ex index 5432732..3112524 100644 --- a/lib/reach/check/changed.ex +++ b/lib/reach/check/changed.ex @@ -4,8 +4,8 @@ defmodule Reach.Check.Changed do alias Reach.Check.Architecture alias Reach.Check.Changed.Function, as: ChangedFunction alias Reach.Check.Changed.Result - alias Reach.CloneAnalysis alias Reach.Config + alias Reach.Evidence.CloneAnalysis alias Reach.IR alias Reach.IR.Helpers, as: IRHelpers alias Reach.Project.Query diff --git a/lib/reach/evidence.ex b/lib/reach/evidence.ex new file mode 100644 index 0000000..39df3a2 --- /dev/null +++ b/lib/reach/evidence.ex @@ -0,0 +1,34 @@ +defmodule Reach.Evidence do + @moduledoc """ + Reusable evidence providers consumed by smells, checks, and refactoring candidates. + + Evidence modules collect facts and signals. They do not decide whether + something is a user-facing finding; smell and check modules own that policy. + """ + + @ast_providers [ + Reach.Evidence.StandardLibraryBypass, + Reach.Evidence.MapContract + ] + + @doc "Returns AST evidence providers available for the configured plugins." + def ast_providers(plugins \\ []) do + (@ast_providers ++ Reach.Plugin.evidence_providers(plugins)) + |> Enum.filter(&ast_provider?/1) + |> Enum.uniq() + end + + @doc "Returns AST evidence providers matching a family or all providers." + def ast_providers_for(:all, plugins), do: ast_providers(plugins) + + def ast_providers_for(family, plugins) when is_atom(family) do + Enum.filter(ast_providers(plugins), &(&1.family() == family)) + end + + defp ast_provider?(module) do + Code.ensure_loaded?(module) and + function_exported?(module, :family, 0) and + function_exported?(module, :kinds, 0) and + function_exported?(module, :collect_ast, 1) + end +end diff --git a/lib/reach/evidence/ast.ex b/lib/reach/evidence/ast.ex new file mode 100644 index 0000000..dbccc45 --- /dev/null +++ b/lib/reach/evidence/ast.ex @@ -0,0 +1,115 @@ +defmodule Reach.Evidence.AST do + @moduledoc "Shared AST helpers for evidence providers." + + def collect(ast, collector) when is_function(collector, 2) do + ast + |> reduce([], collector) + |> Enum.reverse() + end + + def reduce(ast, initial, reducer) when is_function(reducer, 2) do + {_ast, acc} = Macro.prewalk(ast, initial, fn node, acc -> {node, reducer.(node, acc)} end) + acc + end + + def contains?(ast, predicate) when is_function(predicate, 1) do + {_ast, found?} = + Macro.prewalk(ast, false, fn node, found? -> + {node, found? or predicate.(node)} + end) + + found? + end + + def count(ast, predicate) when is_function(predicate, 1) do + {_ast, count} = + Macro.prewalk(ast, 0, fn node, count -> + {node, if(predicate.(node), do: count + 1, else: count)} + end) + + count + end + + def references?(ast, expected) do + contains?(ast, &same_ast?(&1, expected)) + end + + def same_ast?(left, right), do: Macro.to_string(left) == Macro.to_string(right) + + def call?(node, {:__local__, function}), do: local_call?(node, function) + def call?(node, {:erlang, module, function}), do: erlang_call?(node, module, function) + def call?(node, {module, function}), do: remote_call?(node, module, function) + + def local_call?({function, _meta, args}, function) when is_atom(function) and is_list(args), + do: true + + def local_call?(_node, _function), do: false + + def remote_call?( + {{:., _, [{:__aliases__, _, module_parts}, function]}, _, args}, + module, + function + ) + when is_list(module_parts) and is_atom(function) and is_atom(module) and is_list(args) do + safe_module_concat(module_parts) == module + end + + def remote_call?(_node, _module, _function), do: false + + def erlang_call?({{:., _, [module, function]}, _, args}, module, function) + when is_atom(module) and is_atom(function) and is_list(args), + do: true + + def erlang_call?(_node, _module, _function), do: false + + def contains_call?(ast, target), do: contains?(ast, &call?(&1, target)) + + def call_descriptor({function, meta, args}) when is_atom(function) and is_list(args) do + {:ok, + %{ + module: nil, + function: function, + arity: length(args), + line: meta[:line], + column: meta[:column] + }} + end + + def call_descriptor({{:., meta, [{:__aliases__, _, module_parts}, function]}, _call_meta, args}) + when is_list(module_parts) and is_atom(function) and is_list(args) do + {:ok, + %{ + module: safe_module_concat(module_parts), + function: function, + arity: length(args), + line: meta[:line], + column: meta[:column] + }} + end + + def call_descriptor({{:., meta, [module, function]}, _call_meta, args}) + when is_atom(module) and is_atom(function) and is_list(args) do + {:ok, + %{ + module: module, + function: function, + arity: length(args), + line: meta[:line], + column: meta[:column] + }} + end + + def call_descriptor(_node), do: :error + + def count_calls(ast, targets) when is_list(targets) do + count(ast, fn node -> Enum.any?(targets, &call?(node, &1)) end) + end + + defp safe_module_concat(module_parts) do + if Enum.all?(module_parts, &is_atom/1) do + Module.concat(module_parts) + end + rescue + ArgumentError -> nil + end +end diff --git a/lib/reach/clone_analysis.ex b/lib/reach/evidence/clone_analysis.ex similarity index 89% rename from lib/reach/clone_analysis.ex rename to lib/reach/evidence/clone_analysis.ex index e59fdcf..9d992fc 100644 --- a/lib/reach/clone_analysis.ex +++ b/lib/reach/evidence/clone_analysis.ex @@ -1,8 +1,8 @@ -defmodule Reach.CloneAnalysis do +defmodule Reach.Evidence.CloneAnalysis do @moduledoc "Process-dictionary-cached clone detection dispatcher." - alias Reach.CloneAnalysis.ExDNA alias Reach.Config + alias Reach.Evidence.CloneAnalysis.ExDNA @cache_key {__MODULE__, :clones} diff --git a/lib/reach/clone_analysis/clone.ex b/lib/reach/evidence/clone_analysis/clone.ex similarity index 82% rename from lib/reach/clone_analysis/clone.ex rename to lib/reach/evidence/clone_analysis/clone.ex index cb17d43..02c1d1d 100644 --- a/lib/reach/clone_analysis/clone.ex +++ b/lib/reach/evidence/clone_analysis/clone.ex @@ -1,4 +1,4 @@ -defmodule Reach.CloneAnalysis.Clone do +defmodule Reach.Evidence.CloneAnalysis.Clone do @moduledoc "Struct for a clone family (a group of similar code fragments)." @derive Jason.Encoder diff --git a/lib/reach/clone_analysis/ex_dna.ex b/lib/reach/evidence/clone_analysis/ex_dna.ex similarity index 97% rename from lib/reach/clone_analysis/ex_dna.ex rename to lib/reach/evidence/clone_analysis/ex_dna.ex index f1babe5..043a816 100644 --- a/lib/reach/clone_analysis/ex_dna.ex +++ b/lib/reach/evidence/clone_analysis/ex_dna.ex @@ -1,8 +1,9 @@ -defmodule Reach.CloneAnalysis.ExDNA do +defmodule Reach.Evidence.CloneAnalysis.ExDNA do @moduledoc "ExDNA-backed clone detection provider." - alias Reach.CloneAnalysis.{Clone, Fragment} alias Reach.Effects + alias Reach.Evidence.CloneAnalysis.Clone + alias Reach.Evidence.CloneAnalysis.Fragment alias Reach.IR alias Reach.Project.Query diff --git a/lib/reach/clone_analysis/fragment.ex b/lib/reach/evidence/clone_analysis/fragment.ex similarity index 87% rename from lib/reach/clone_analysis/fragment.ex rename to lib/reach/evidence/clone_analysis/fragment.ex index c26a52d..b64d4b6 100644 --- a/lib/reach/clone_analysis/fragment.ex +++ b/lib/reach/evidence/clone_analysis/fragment.ex @@ -1,4 +1,4 @@ -defmodule Reach.CloneAnalysis.Fragment do +defmodule Reach.Evidence.CloneAnalysis.Fragment do @moduledoc "Struct for a single code fragment within a clone family." @derive Jason.Encoder diff --git a/lib/reach/evidence/fact.ex b/lib/reach/evidence/fact.ex new file mode 100644 index 0000000..0c14173 --- /dev/null +++ b/lib/reach/evidence/fact.ex @@ -0,0 +1,14 @@ +defmodule Reach.Evidence.Fact do + @moduledoc "A reusable evidence fact emitted by evidence providers." + + defstruct [ + :family, + :kind, + :message, + :replacement, + :meta, + :confidence, + :source, + :data + ] +end diff --git a/lib/reach/evidence/map_contract.ex b/lib/reach/evidence/map_contract.ex new file mode 100644 index 0000000..d37f530 --- /dev/null +++ b/lib/reach/evidence/map_contract.ex @@ -0,0 +1,577 @@ +defmodule Reach.Evidence.MapContract do + @moduledoc "Collects evidence for maps that behave like implicit contracts." + + alias Reach.Evidence.AST + + defmodule Contract do + @moduledoc false + defstruct [ + :variable, + :keys, + :location, + :reads, + :updates, + :confidence, + :source, + :producer, + :role, + :key_coverage, + :observed_keys, + :unused_keys, + :read_count, + :mutation_count, + :escaped?, + :escapes, + :consumer + ] + end + + @min_keys 3 + @min_observations 2 + @assigns_names [:assigns] + @accumulator_names [:acc, :cat, :count, :counts, :stats] + @external_payload_names [:body, :json, :metadata, :payload, :request, :response] + @options_names [:config, :opts, :options] + @non_call_forms [:., :%, :{}, :__aliases__, :__block__, :=, :->, :def, :defp, :fn, :|>] + + def family, do: :map_contract + def kinds, do: [:implicit_map_contract] + + def collect_ast(ast, opts \\ []) do + plugins = Keyword.get(opts, :plugins, []) + context = Keyword.get(opts, :context, %{}) + + ast + |> collect_function_definitions() + |> collect_ast_contracts() + |> refine_contracts(plugins, context) + end + + def collect_project(project, opts \\ []) do + plugins = Keyword.get(opts, :plugins, project_plugins(project)) + + files = + project + |> project_source_files() + |> Enum.map(&source_file_ast/1) + |> Enum.filter(&match?({:ok, _file, _ast}, &1)) + + definitions_by_file = + Map.new(files, fn {:ok, file, ast} -> {file, collect_module_definitions(ast)} end) + + return_shapes = + definitions_by_file + |> Map.values() + |> List.flatten() + |> collect_module_return_shapes() + + Enum.flat_map(files, fn {:ok, file, ast} -> + contracts = + collect_file_project_contracts( + file, + ast, + Map.fetch!(definitions_by_file, file), + return_shapes + ) + + refine_contracts(contracts, plugins, %{file: file, project: project}) + end) + end + + defp collect_ast_contracts(definitions) do + collect_local_contracts(definitions) ++ collect_return_contracts(definitions) + end + + defp refine_contracts(contracts, [], _context), do: contracts + + defp refine_contracts(contracts, plugins, context) do + Enum.map(contracts, &Reach.Plugin.refine_evidence(plugins, &1, context)) + end + + defp project_plugins(project) when is_map(project), do: Map.get(project, :plugins, []) + defp project_plugins(_project), do: [] + + defp project_source_files(project) do + project.nodes + |> Map.values() + |> Enum.flat_map(fn node -> + case node.source_span do + %{file: file} when is_binary(file) -> [file] + _span -> [] + end + end) + |> Enum.uniq() + |> Enum.reject(&dependency_source_file?/1) + |> Enum.sort() + end + + defp dependency_source_file?(file) do + String.contains?(file, "/deps/") or String.contains?(file, "/_build/") + end + + defp source_file_ast(file) do + with {:ok, source} <- File.read(file), + {:ok, ast} <- Code.string_to_quoted(source, emit_warnings: false) do + {:ok, file, ast} + end + end + + defp collect_file_project_contracts(file, ast, module_definitions, return_shapes) do + local_contracts = ast |> collect_function_definitions() |> collect_ast_contracts() + + local_contracts + |> Enum.map(&Map.put(&1, :file, file)) + |> Kernel.++(collect_cross_function_contracts(module_definitions, return_shapes, file)) + end + + defp collect_function_definitions(ast) do + AST.collect(ast, fn + {def_kind, _meta, [head, block]}, definitions when def_kind in [:def, :defp] -> + add_function_definition(head, block, definitions) + + _node, definitions -> + definitions + end) + end + + defp add_function_definition({:when, _meta, [head | _guards]}, block, definitions), + do: add_function_definition(head, block, definitions) + + defp add_function_definition({name, meta, args}, block, definitions) + when is_atom(name) and is_list(args) do + case function_body(block) do + nil -> definitions + body -> [%{name: name, arity: length(args), meta: meta, body: body} | definitions] + end + end + + defp add_function_definition(_head, _block, definitions), do: definitions + + defp collect_module_definitions(ast) do + AST.collect(ast, fn + {:defmodule, _meta, [module_ast, block]}, modules -> + case module_name(module_ast) do + {:ok, module} -> collect_module_functions(module, block) ++ modules + :error -> modules + end + + _node, modules -> + modules + end) + end + + defp collect_module_functions(module, block) do + block + |> function_body() + |> case do + nil -> [] + body -> Enum.map(collect_function_definitions(body), &Map.put(&1, :module, module)) + end + end + + defp module_name({:__aliases__, _meta, parts}) when is_list(parts), + do: {:ok, Module.concat(parts)} + + defp module_name(_node), do: :error + + defp collect_module_return_shapes(definitions) do + definitions + |> Map.new(fn definition -> + {{definition.module, definition.name, definition.arity}, returned_shape(definition)} + end) + |> Enum.reject(fn {_mfa, shape} -> is_nil(shape) end) + |> Map.new() + end + + defp function_body(do: body), do: body + defp function_body([{{:__block__, _meta, [:do]}, body}]), do: body + defp function_body(_block), do: nil + + defp collect_local_contracts(definitions) do + Enum.flat_map(definitions, fn definition -> + definition.body + |> collect_literal_map_bindings() + |> build_contracts(definition.body, :local) + end) + end + + defp collect_return_contracts(definitions) do + return_shapes = collect_return_shapes(definitions) + + Enum.flat_map(definitions, fn definition -> + definition.body + |> collect_return_value_bindings(return_shapes) + |> build_contracts(definition.body, :return) + end) + end + + defp collect_return_shapes(definitions) do + definitions + |> Map.new(fn definition -> + {{definition.name, definition.arity}, returned_shape(definition)} + end) + |> Enum.reject(fn {_mfa, shape} -> is_nil(shape) end) + |> Map.new() + end + + defp returned_shape(%{body: {:__block__, _meta, statements}, meta: meta}) + when is_list(statements) do + bindings = collect_literal_map_bindings({:__block__, [], statements}) + returned_expression_shape(List.last(statements), bindings, meta) + end + + defp returned_shape(%{body: body, meta: meta}) do + returned_expression_shape(body, %{}, meta) + end + + defp returned_expression_shape(expression, bindings, fallback_meta) do + keys = map_literal_keys(expression) + + cond do + length(keys) >= @min_keys -> + %{keys: keys, meta: fallback_meta} + + match?({:ok, _variable}, variable_name(expression)) -> + returned_variable_shape(expression, bindings) + + true -> + nil + end + end + + defp returned_variable_shape(expression, bindings) do + {:ok, variable} = variable_name(expression) + Map.get(bindings, variable) + end + + defp collect_literal_map_bindings({:__block__, _meta, statements}) do + Enum.reduce(statements, %{}, &put_literal_map_binding/2) + end + + defp collect_literal_map_bindings(statement), do: put_literal_map_binding(statement, %{}) + + defp put_alias_binding(bindings, var, rhs, meta) do + {:ok, source_var} = variable_name(rhs) + + if Map.has_key?(bindings, source_var), + do: Map.put(bindings, var, %{bindings[source_var] | meta: meta}), + else: bindings + end + + defp put_literal_map_binding({:=, meta, [{var, _, context}, rhs]}, bindings) + when is_atom(var) and is_atom(context) do + keys = map_literal_keys(rhs) + + cond do + length(keys) >= @min_keys -> + Map.put(bindings, var, %{keys: keys, meta: meta}) + + match?({:ok, _source_var}, variable_name(rhs)) -> + put_alias_binding(bindings, var, rhs, meta) + + true -> + bindings + end + end + + defp put_literal_map_binding(_statement, bindings), do: bindings + + defp collect_return_value_bindings({:__block__, _meta, statements}, return_shapes) do + Enum.reduce(statements, %{}, &put_return_value_binding(&1, &2, return_shapes)) + end + + defp collect_return_value_bindings(statement, return_shapes), + do: put_return_value_binding(statement, %{}, return_shapes) + + defp collect_cross_function_contracts(definitions, return_shapes, file) do + Enum.flat_map(definitions, fn definition -> + definition.body + |> collect_cross_function_bindings(return_shapes) + |> build_contracts(definition.body, :cross_file_return) + |> Enum.map(fn contract -> + contract + |> Map.put(:file, file) + |> Map.put(:consumer, {definition.module, definition.name, definition.arity}) + end) + end) + end + + defp collect_cross_function_bindings({:__block__, _meta, statements}, return_shapes) do + Enum.reduce(statements, %{}, &put_cross_function_binding(&1, &2, return_shapes)) + end + + defp collect_cross_function_bindings(statement, return_shapes), + do: put_cross_function_binding(statement, %{}, return_shapes) + + defp put_cross_function_binding({:=, meta, [{var, _, context}, rhs]}, bindings, return_shapes) + when is_atom(var) and is_atom(context) do + cond do + match?({:ok, _producer}, remote_call(rhs)) -> + with {:ok, producer} <- remote_call(rhs), + %{keys: keys} <- Map.get(return_shapes, producer) do + Map.put(bindings, var, %{keys: keys, meta: meta, producer: producer}) + else + _other -> bindings + end + + match?({:ok, _source_var}, variable_name(rhs)) -> + put_existing_alias_binding(bindings, var, rhs, meta) + + true -> + bindings + end + end + + defp put_cross_function_binding(_statement, bindings, _return_shapes), do: bindings + + defp put_return_value_binding({:=, meta, [{var, _, context}, rhs]}, bindings, return_shapes) + when is_atom(var) and is_atom(context) do + cond do + match?({:ok, _producer}, local_call(rhs)) -> + with {:ok, producer} <- local_call(rhs), + %{keys: keys} <- Map.get(return_shapes, producer) do + Map.put(bindings, var, %{keys: keys, meta: meta, producer: producer}) + else + _other -> bindings + end + + match?({:ok, _source_var}, variable_name(rhs)) -> + put_existing_alias_binding(bindings, var, rhs, meta) + + true -> + bindings + end + end + + defp put_return_value_binding(_statement, bindings, _return_shapes), do: bindings + + defp put_existing_alias_binding(bindings, var, rhs, meta) do + {:ok, source_var} = variable_name(rhs) + + if Map.has_key?(bindings, source_var), + do: Map.put(bindings, var, %{bindings[source_var] | meta: meta}), + else: bindings + end + + defp local_call({name, _meta, args}) when is_atom(name) and is_list(args), + do: {:ok, {name, length(args)}} + + defp local_call(_node), do: :error + + defp remote_call({{:., _meta, [module_ast, function]}, _call_meta, args}) + when is_atom(function) and is_list(args) do + case module_name(module_ast) do + {:ok, module} -> {:ok, {module, function, length(args)}} + :error -> :error + end + end + + defp remote_call(_node), do: :error + + defp variable_name({var, _meta, context}) when is_atom(var) and is_atom(context), do: {:ok, var} + defp variable_name(_node), do: :error + + defp map_literal_keys({:%{}, _meta, fields}) do + fields + |> Enum.flat_map(fn + {key, _value} when is_atom(key) -> [key] + {key, _value} when is_binary(key) -> [key] + _field -> [] + end) + |> Enum.sort() + end + + defp map_literal_keys(_node), do: [] + + defp build_contracts(bindings, ast, source) do + if bindings == %{} do + [] + else + observations = collect_map_observations(ast, bindings) + + bindings + |> Enum.flat_map(fn {variable, binding} -> + build_contract(variable, binding, Map.get(observations, variable, []), source) + end) + end + end + + defp build_contract(variable, binding, observations, source) do + reads = Enum.filter(observations, &(&1.kind == :read)) + updates = Enum.filter(observations, &(&1.kind == :update)) + escapes = Enum.filter(observations, &(&1.kind == :escape)) + + observed_keys = + observations + |> Enum.flat_map(fn + %{key: nil} -> [] + %{key: key} -> [key] + end) + |> Enum.uniq() + |> Enum.sort() + + unused_keys = binding.keys -- observed_keys + key_coverage = length(observed_keys) / max(length(binding.keys), 1) + role = classify_role(variable, source) + + if length(observed_keys) >= @min_observations do + [ + %Contract{ + variable: variable, + keys: binding.keys, + location: location(binding.meta), + reads: Enum.map(reads, &observation_location/1), + updates: Enum.map(updates, &observation_location/1), + confidence: confidence(key_coverage, updates), + source: source, + producer: Map.get(binding, :producer), + role: role, + key_coverage: key_coverage, + observed_keys: observed_keys, + unused_keys: unused_keys, + read_count: length(reads), + mutation_count: length(updates), + escaped?: escapes != [], + escapes: Enum.map(escapes, & &1.call) + } + ] + else + [] + end + end + + defp collect_map_observations(ast, bindings) do + AST.reduce(ast, %{}, &record_observation(&1, bindings, &2)) + end + + defp record_observation(node, bindings, observations) do + case map_observation(node) do + {:ok, variable, key, kind, meta} when is_map_key(bindings, variable) -> + record_known_key_observation(observations, variable, bindings[variable], key, kind, meta) + + _other -> + record_escape_observation(node, bindings, observations) + end + end + + defp record_known_key_observation(observations, variable, binding, key, kind, meta) do + if key in binding.keys do + record_observation_for_variable(observations, variable, %{key: key, kind: kind, meta: meta}) + else + observations + end + end + + defp record_escape_observation(node, bindings, observations) do + case call_args(node) do + {:ok, meta, args} -> + call = call_descriptor(node, meta) + + args + |> Enum.flat_map(&escaped_variables(&1, bindings)) + |> Enum.reduce(observations, fn variable, observations -> + record_observation_for_variable(observations, variable, %{ + key: nil, + kind: :escape, + meta: meta, + call: call + }) + end) + + :error -> + observations + end + end + + defp record_observation_for_variable(observations, variable, observation) do + Map.update(observations, variable, [observation], &[observation | &1]) + end + + defp call_args({name, meta, args}) + when is_atom(name) and is_list(args) and name not in @non_call_forms, + do: {:ok, meta, args} + + defp call_args({{:., meta, [_target, _function]}, _call_meta, args}) when is_list(args), + do: {:ok, meta, args} + + defp call_args(_node), do: :error + + defp call_descriptor(node, fallback_meta) do + case AST.call_descriptor(node) do + {:ok, descriptor} -> + descriptor + + :error -> + %{ + module: nil, + function: nil, + arity: nil, + line: fallback_meta[:line], + column: fallback_meta[:column] + } + end + end + + defp escaped_variables(arg, bindings) do + case variable_name(arg) do + {:ok, variable} when is_map_key(bindings, variable) -> [variable] + _other -> [] + end + end + + defp map_observation({{:., meta, [{var, _, context}, key]}, _, []}) + when is_atom(var) and is_atom(context) and is_atom(key), + do: {:ok, var, key, :read, meta} + + defp map_observation( + {{:., meta, [{:__aliases__, _, [:Map]}, :get]}, _, [{var, _, context}, key | _]} + ) + when is_atom(var) and is_atom(context) and (is_atom(key) or is_binary(key)), + do: {:ok, var, key, :read, meta} + + defp map_observation( + {{:., meta, [{:__aliases__, _, [:Map]}, :fetch]}, _, [{var, _, context}, key | _]} + ) + when is_atom(var) and is_atom(context) and (is_atom(key) or is_binary(key)), + do: {:ok, var, key, :read, meta} + + defp map_observation( + {{:., meta, [{:__aliases__, _, [:Map]}, :put]}, _, [{var, _, context}, key | _]} + ) + when is_atom(var) and is_atom(context) and (is_atom(key) or is_binary(key)), + do: {:ok, var, key, :update, meta} + + defp map_observation({:%{}, meta, [{:|, _, [{var, _, context}, fields]}]}) + when is_atom(var) and is_atom(context) and is_list(fields) do + case fields do + [{key, _value} | _] when is_atom(key) or is_binary(key) -> {:ok, var, key, :update, meta} + _fields -> :error + end + end + + defp map_observation(_node), do: :error + + defp classify_role(variable, _source) when variable in @assigns_names, do: :assigns + defp classify_role(variable, _source) when variable in @accumulator_names, do: :accumulator + + defp classify_role(variable, _source) when variable in @external_payload_names, + do: :external_payload + + defp classify_role(variable, _source) when variable in @options_names, do: :options + defp classify_role(_variable, source) when source in [:return, :cross_file_return], do: :domain + defp classify_role(_variable, _source), do: :unknown + + defp confidence(coverage, updates) do + cond do + updates != [] and coverage >= 0.75 -> :high + coverage >= 0.5 -> :medium + true -> :low + end + end + + defp observation_location(%{key: key, kind: kind, meta: meta}) do + meta |> location() |> Map.merge(%{key: key, kind: kind}) + end + + defp location(meta), do: %{line: meta[:line], column: meta[:column]} +end diff --git a/lib/reach/evidence/pattern_runner.ex b/lib/reach/evidence/pattern_runner.ex new file mode 100644 index 0000000..c87403f --- /dev/null +++ b/lib/reach/evidence/pattern_runner.ex @@ -0,0 +1,64 @@ +defmodule Reach.Evidence.PatternRunner do + @moduledoc "Runs ExAST patterns for evidence providers." + + alias ExAST.Patcher + alias Reach.Evidence.Fact + + def run(ast, specs, opts \\ []) do + evidence_module = Keyword.get(opts, :evidence_module, Fact) + family = Keyword.get(opts, :family) + metadata = Map.new(specs, fn {name, spec} -> {name, spec} end) + patterns = Map.new(specs, fn {name, {pattern, _builder}} -> {name, pattern} end) + + ast + |> find_many(patterns) + |> Enum.flat_map(fn match -> + {_pattern, builder} = Map.fetch!(metadata, match.pattern) + + match + |> builder.() + |> List.wrap() + |> Enum.map(&struct_evidence(evidence_module, family, &1, match)) + |> Enum.reject(&is_nil/1) + end) + end + + defp find_many(ast, patterns) do + Patcher.find_many(ast, patterns) + rescue + FunctionClauseError -> [] + ArgumentError -> [] + end + + def match_meta(%{range: %{start: start}}) when is_list(start) do + [line: start[:line], column: start[:column]] + end + + def match_meta(%{node: {_form, meta, _args}}) when is_list(meta), do: meta + def match_meta(_match), do: [] + + defp struct_evidence(_evidence_module, _family, nil, _match), do: nil + defp struct_evidence(_evidence_module, _family, false, _match), do: nil + defp struct_evidence(_evidence_module, _family, [], _match), do: nil + + defp struct_evidence(evidence_module, family, attrs, match) + when is_map(attrs) or is_list(attrs) do + attrs = Map.new(attrs) + meta = Map.get(attrs, :meta) || match_meta(match) + + evidence_module + |> struct() + |> Map.from_struct() + |> maybe_put_family(attrs, family) + |> Map.put(:meta, meta) + |> then(&struct!(evidence_module, &1)) + end + + defp maybe_put_family(struct_fields, attrs, family) do + if Map.has_key?(struct_fields, :family) do + Map.put_new(attrs, :family, family) + else + attrs + end + end +end diff --git a/lib/reach/evidence/standard_library_bypass.ex b/lib/reach/evidence/standard_library_bypass.ex new file mode 100644 index 0000000..85b5cb3 --- /dev/null +++ b/lib/reach/evidence/standard_library_bypass.ex @@ -0,0 +1,36 @@ +defmodule Reach.Evidence.StandardLibraryBypass do + @moduledoc "Collects evidence of ad-hoc code that bypasses standard library helpers." + + alias Reach.Evidence.Fact + + @families [ + Reach.Evidence.StandardLibraryBypass.PathURI, + Reach.Evidence.StandardLibraryBypass.Enum, + Reach.Evidence.StandardLibraryBypass.Map + ] + + def family, do: :stdlib + + def kinds do + @families + |> Enum.flat_map(& &1.kinds()) + |> Enum.uniq() + end + + def fact(kind, message, replacement, meta, confidence \\ :high) do + %Fact{ + family: :stdlib, + kind: kind, + message: message, + replacement: replacement, + meta: meta, + confidence: confidence + } + end + + def collect_ast(ast) do + @families + |> Enum.flat_map(& &1.collect_ast(ast)) + |> Enum.sort_by(&{&1.meta[:line] || 0, &1.meta[:column] || 0, &1.kind}) + end +end diff --git a/lib/reach/evidence/standard_library_bypass/enum.ex b/lib/reach/evidence/standard_library_bypass/enum.ex new file mode 100644 index 0000000..f90f369 --- /dev/null +++ b/lib/reach/evidence/standard_library_bypass/enum.ex @@ -0,0 +1,279 @@ +defmodule Reach.Evidence.StandardLibraryBypass.Enum do + @moduledoc "Collects Enum standard-library bypass evidence." + + import ExAST.Sigil + + alias Reach.Evidence.AST + alias Reach.Evidence.PatternRunner + alias Reach.Evidence.StandardLibraryBypass + + @manual_flat_map_concat_message "Enum.map followed by concat allocates an intermediate nested list; use Enum.flat_map/2" + @manual_flat_map_flatten_message "Enum.map followed by List.flatten/1 may be Enum.flat_map/2 when the mapper returns a flat list; preserve List.flatten/1 if recursive flattening is required" + + def collect_ast(ast) do + pattern_evidence(ast) ++ callback_evidence(ast) + end + + def kinds do + [ + :manual_flat_map, + :manual_frequencies, + :manual_frequencies_by, + :manual_flat_map_reduce, + :manual_flat_map_prepend_reverse + ] + end + + defp pattern_evidence(ast) do + PatternRunner.run(ast, pattern_specs(), family: :stdlib) + end + + defp pattern_specs do + [ + manual_flat_map_list: {~p[Enum.map(_, _) |> List.flatten()], &manual_flatten_evidence/1}, + manual_flat_map_concat: {~p[Enum.map(_, _) |> Enum.concat()], &manual_concat_evidence/1} + ] + end + + defp manual_flatten_evidence(_match) do + %{ + kind: :manual_flat_map, + message: @manual_flat_map_flatten_message, + replacement: "Enum.flat_map/2", + confidence: :medium + } + end + + defp manual_concat_evidence(_match) do + %{ + kind: :manual_flat_map, + message: @manual_flat_map_concat_message, + replacement: "Enum.flat_map/2", + confidence: :high + } + end + + defp callback_evidence(ast) do + AST.collect(ast, &collect_callback_node/2) + end + + defp collect_callback_node({:|>, meta, [left, right]} = node, acc) do + acc = collect_pipe_node(left, right, meta, acc) + collect_direct(node, acc) + end + + defp collect_callback_node(node, acc), do: collect_direct(node, acc) + + defp collect_pipe_node(left, right, meta, acc) do + case {enum_map_call(left) || flat_map_prepend_reverse_call(left), pipe_reader(right)} do + {{:enum_map, _enumerable, _mapper, _map_meta}, {:flatten, _flatten_meta}} -> + acc + + {{:flat_map_prepend_reverse, _reduce_meta}, {:reverse, _reverse_meta}} -> + evidence( + acc, + :manual_flat_map_prepend_reverse, + "Enum.reduce with Enum.reverse(chunk, acc) followed by reverse reimplements Enum.flat_map/2", + "Enum.flat_map/2", + meta + ) + + _ -> + acc + end + end + + defp collect_direct(node, acc) do + case {frequencies_shape(node), flat_map_reduce_shape(node)} do + {{:ok, kind, replacement, meta}, _flat_map_reduce} -> + evidence( + acc, + kind, + "Enum.reduce builds a count map; use #{replacement}", + replacement, + meta + ) + + {_frequencies, {:ok, meta}} -> + evidence( + acc, + :manual_flat_map_reduce, + "Enum.reduce appends mapped lists into an accumulator; use Enum.flat_map/2", + "Enum.flat_map/2", + meta + ) + + _ -> + acc + end + end + + defp enum_map_call( + {{:., meta, [{:__aliases__, _, [:Enum]}, :map]}, _call_meta, [enumerable, mapper]} + ), + do: {:enum_map, enumerable, mapper, meta} + + defp enum_map_call( + {:|>, _pipe_meta, + [enumerable, {{:., meta, [{:__aliases__, _, [:Enum]}, :map]}, _call_meta, [mapper]}]} + ), + do: {:enum_map, enumerable, mapper, meta} + + defp enum_map_call(_node), do: nil + + defp pipe_reader({{:., meta, [{:__aliases__, _, [:List]}, :flatten]}, _call_meta, []}), + do: {:flatten, meta} + + defp pipe_reader({{:., meta, [{:__aliases__, _, [:Enum]}, :concat]}, _call_meta, []}), + do: {:flatten, meta} + + defp pipe_reader({{:., meta, [{:__aliases__, _, [:Enum]}, :reverse]}, _call_meta, []}), + do: {:reverse, meta} + + defp pipe_reader(_), do: nil + + defp frequencies_shape(node) do + with {:ok, meta, item, acc, body} <- reduce_empty_map_callback(node), + {:ok, key} <- count_map_body(body, acc) do + replacement = + if AST.same_ast?(key, item), do: "Enum.frequencies/1", else: "Enum.frequencies_by/2" + + kind = + if replacement == "Enum.frequencies/1", + do: :manual_frequencies, + else: :manual_frequencies_by + + {:ok, kind, replacement, meta} + else + _other -> :error + end + end + + defp reduce_empty_map_callback( + {{:., meta, [{:__aliases__, _, [:Enum]}, :reduce]}, _call_meta, + [_enumerable, {:%{}, _, []}, {:fn, _, [{:->, _, [[item, acc], body]}]}]} + ), + do: {:ok, meta, item, acc, body} + + defp reduce_empty_map_callback( + {:|>, _pipe_meta, + [ + _enumerable, + {{:., meta, [{:__aliases__, _, [:Enum]}, :reduce]}, _call_meta, + [{:%{}, _, []}, {:fn, _, [{:->, _, [[item, acc], body]}]}]} + ]} + ), + do: {:ok, meta, item, acc, body} + + defp reduce_empty_map_callback(_node), do: :error + + defp count_map_body( + {{:., _, [{:__aliases__, _, [:Map]}, :update]}, _, [acc, key, 1, increment_fun]}, + expected_acc + ) do + if AST.same_ast?(acc, expected_acc) and increment_by_one_fun?(increment_fun), + do: {:ok, key}, + else: :error + end + + defp count_map_body({:__block__, _, [assignment, put_call]}, expected_acc) do + with {:ok, count_var, key} <- count_assignment(assignment, expected_acc), + true <- count_put_call?(put_call, expected_acc, key, count_var) do + {:ok, key} + else + _other -> :error + end + end + + defp count_map_body(_body, _expected_acc), do: :error + + defp count_assignment( + {:=, _, [count_var, {{:., _, [{:__aliases__, _, [:Map]}, :get]}, _, [acc, key, 0]}]}, + expected_acc + ) do + if AST.same_ast?(acc, expected_acc), do: {:ok, count_var, key}, else: :error + end + + defp count_assignment(_node, _expected_acc), do: :error + + defp count_put_call?( + {{:., _, [{:__aliases__, _, [:Map]}, :put]}, _, [acc, key, increment]}, + expected_acc, + expected_key, + count_var + ) do + AST.same_ast?(acc, expected_acc) and AST.same_ast?(key, expected_key) and + increment_by_one?(increment, count_var) + end + + defp count_put_call?(_node, _expected_acc, _expected_key, _count_var), do: false + + defp increment_by_one_fun?({:&, _, [{:+, _, [{:&, _, [1]}, 1]}]}), do: true + defp increment_by_one_fun?({:&, _, [{:+, _, [1, {:&, _, [1]}]}]}), do: true + defp increment_by_one_fun?(_node), do: false + + defp increment_by_one?({:+, _, [var, 1]}, expected_var), do: AST.same_ast?(var, expected_var) + defp increment_by_one?({:+, _, [1, var]}, expected_var), do: AST.same_ast?(var, expected_var) + defp increment_by_one?(_node, _expected_var), do: false + + defp flat_map_reduce_shape(node) do + with {:ok, meta, item, acc, body} <- reduce_empty_list_callback(node), + true <- append_mapped_list?(body, item, acc) do + {:ok, meta} + else + _other -> :error + end + end + + defp flat_map_prepend_reverse_call(node) do + with {:ok, meta, item, acc, body} <- reduce_empty_list_callback(node), + true <- reverse_chunk_into_acc?(body, item, acc) do + {:flat_map_prepend_reverse, meta} + else + _other -> nil + end + end + + defp reduce_empty_list_callback( + {{:., meta, [{:__aliases__, _, [:Enum]}, :reduce]}, _call_meta, + [_enumerable, [], {:fn, _, [{:->, _, [[item, acc], body]}]}]} + ), + do: {:ok, meta, item, acc, body} + + defp reduce_empty_list_callback( + {:|>, _pipe_meta, + [ + _enumerable, + {{:., meta, [{:__aliases__, _, [:Enum]}, :reduce]}, _call_meta, + [[], {:fn, _, [{:->, _, [[item, acc], body]}]}]} + ]} + ), + do: {:ok, meta, item, acc, body} + + defp reduce_empty_list_callback(_node), do: :error + + defp append_mapped_list?({:++, _, [left, right]}, item, acc) do + AST.same_ast?(left, acc) and AST.references?(right, item) and not AST.references?(right, acc) and + not single_element_literal_list?(right) + end + + defp append_mapped_list?(_body, _item, _acc), do: false + + defp single_element_literal_list?([_item]), do: true + defp single_element_literal_list?(_node), do: false + + defp reverse_chunk_into_acc?( + {{:., _, [{:__aliases__, _, [:Enum]}, :reverse]}, _, [chunk, acc]}, + item, + expected_acc + ) do + AST.same_ast?(acc, expected_acc) and AST.references?(chunk, item) and + not AST.references?(chunk, acc) + end + + defp reverse_chunk_into_acc?(_body, _item, _acc), do: false + + defp evidence(acc, kind, message, replacement, meta) do + [StandardLibraryBypass.fact(kind, message, replacement, meta) | acc] + end +end diff --git a/lib/reach/evidence/standard_library_bypass/map.ex b/lib/reach/evidence/standard_library_bypass/map.ex new file mode 100644 index 0000000..a90ff28 --- /dev/null +++ b/lib/reach/evidence/standard_library_bypass/map.ex @@ -0,0 +1,117 @@ +defmodule Reach.Evidence.StandardLibraryBypass.Map do + @moduledoc "Collects Map standard-library bypass evidence." + + alias Reach.Evidence.AST + alias Reach.Evidence.StandardLibraryBypass + + def collect_ast(ast) do + AST.collect(ast, &collect_node/2) + end + + def kinds, do: [:manual_map_update, :manual_map_update_bang] + + defp collect_node(node, acc) do + case {map_update_shape(node), map_update_bang_shape(node)} do + {{:ok, meta}, _update_bang} -> + evidence( + acc, + :manual_map_update, + "Map.has_key? plus paired Map.put branches reimplements Map.update/4", + "Map.update/4", + meta + ) + + {_update, {:ok, meta}} -> + evidence( + acc, + :manual_map_update_bang, + "Map.fetch! followed by Map.put on the same key reimplements Map.update!/3", + "Map.update!/3", + meta + ) + + _ -> + acc + end + end + + defp map_update_shape({:if, meta, [condition, [do: do_branch, else: else_branch]]}) do + with {:ok, map, key} <- map_has_key_call(condition), + true <- map_put_call?(do_branch, map, key), + true <- map_put_call?(else_branch, map, key) do + {:ok, meta} + else + _other -> :error + end + end + + defp map_update_shape(_node), do: :error + + defp map_has_key_call({{:., _, [{:__aliases__, _, [:Map]}, :has_key?]}, _, [map, key]}), + do: {:ok, map, key} + + defp map_has_key_call(_node), do: :error + + defp map_put_call?( + {{:., _, [{:__aliases__, _, [:Map]}, :put]}, _, [map, key, _value]}, + expected_map, + expected_key + ), + do: AST.same_ast?(map, expected_map) and AST.same_ast?(key, expected_key) + + defp map_put_call?(_node, _expected_map, _expected_key), do: false + + defp map_update_bang_shape({:__block__, meta, [assignment, put_call]}) do + with {:ok, value_var, map, key, assignment_meta} <- fetch_bang_assignment(assignment), + true <- update_bang_put_call?(put_call, map, key, value_var) do + {:ok, line_meta(assignment_meta, meta)} + else + _other -> :error + end + end + + defp map_update_bang_shape( + {{:., meta, [{:__aliases__, _, [:Map]}, :put]}, _, [map, key, value]} + ) do + if fetch_bang_value_for?(value, map, key), do: {:ok, meta}, else: :error + end + + defp map_update_bang_shape(_node), do: :error + + defp fetch_bang_assignment( + {:=, meta, [value_var, {{:., _, [{:__aliases__, _, [:Map]}, :fetch!]}, _, [map, key]}]} + ), + do: {:ok, value_var, map, key, meta} + + defp fetch_bang_assignment(_node), do: :error + + defp update_bang_put_call?( + {{:., _, [{:__aliases__, _, [:Map]}, :put]}, _, [map, key, value]}, + expected_map, + expected_key, + value_var + ) do + AST.same_ast?(map, expected_map) and AST.same_ast?(key, expected_key) and + AST.references?(value, value_var) + end + + defp update_bang_put_call?(_node, _expected_map, _expected_key, _value_var), do: false + + defp fetch_bang_value_for?(node, expected_map, expected_key) do + AST.contains?(node, fn + {{:., _, [{:__aliases__, _, [:Map]}, :fetch!]}, _, [map, key]} -> + AST.same_ast?(map, expected_map) and AST.same_ast?(key, expected_key) + + _child -> + false + end) + end + + defp line_meta(preferred, fallback) do + if Keyword.get(preferred, :line), do: preferred, else: fallback + end + + defp evidence(acc, kind, message, replacement, meta) do + [StandardLibraryBypass.fact(kind, message, replacement, meta) | acc] + end +end diff --git a/lib/reach/evidence/standard_library_bypass/path_uri.ex b/lib/reach/evidence/standard_library_bypass/path_uri.ex new file mode 100644 index 0000000..9264547 --- /dev/null +++ b/lib/reach/evidence/standard_library_bypass/path_uri.ex @@ -0,0 +1,102 @@ +defmodule Reach.Evidence.StandardLibraryBypass.PathURI do + @moduledoc "Collects Path/URI standard-library bypass evidence." + + import ExAST.Sigil + + alias Reach.Evidence.PatternRunner + + @path_names ~w(path filepath file_path filename file dir directory source dest destination)a + @uri_names ~w(url uri href endpoint query query_string qs)a + + @pattern_evidence [ + path_basename: + {~p[_ |> String.split("/") |> List.last()], :path_pipe, :manual_path_basename, + "manual path basename extraction; use Path.basename/1", "Path.basename/1"}, + path_extension: + {~p[_ |> String.split(".") |> List.last()], :path_pipe, :manual_path_extension, + "manual path extension extraction; use Path.extname/1 and normalize the leading dot when needed", + "Path.extname/1"}, + uri_path_split: + {~p[_ |> String.split("?") |> List.first()], :uri_pipe, :manual_uri_path_split, + "manual URL splitting; use URI.parse/1", "URI.parse/1"}, + query_parsing: + {~p[String.split(_, "&")], :uri_direct, :manual_query_parsing, + "manual query-string splitting; use URI.decode_query/1", "URI.decode_query/1"}, + uri_scheme_split: + {~p[String.split(_, "://")], :uri_direct, :manual_uri_scheme_split, + "manual URL scheme splitting; use URI.parse/1", "URI.parse/1"} + ] + + def collect_ast(ast), do: PatternRunner.run(ast, pattern_specs(), family: :stdlib) + + def kinds do + [ + :manual_path_basename, + :manual_path_extension, + :manual_query_parsing, + :manual_uri_path_split, + :manual_uri_scheme_split + ] + end + + defp pattern_specs do + Enum.map(@pattern_evidence, fn {name, {pattern, mode, kind, message, replacement}} -> + {name, {pattern, &build_evidence(&1, mode, kind, message, replacement)}} + end) + end + + defp build_evidence(match, :path_pipe, kind, message, replacement) do + with {:ok, subject} <- pipe_split_subject(match.node), + true <- path_subject?(subject) do + evidence(kind, message, replacement) + else + _other -> nil + end + end + + defp build_evidence(match, :uri_pipe, kind, message, replacement) do + with {:ok, subject} <- pipe_split_subject(match.node), + true <- uri_subject?(subject) do + evidence(kind, message, replacement) + else + _other -> nil + end + end + + defp build_evidence(match, :uri_direct, kind, message, replacement) do + with {:ok, subject} <- direct_split_subject(match.node), + true <- uri_subject?(subject) do + evidence(kind, message, replacement) + else + _other -> nil + end + end + + defp pipe_split_subject({:|>, _, [{:|>, _, [subject, _split_call]}, _reader]}), + do: {:ok, subject} + + defp pipe_split_subject( + {:|>, _, + [{{:., _, [{:__aliases__, _, [:String]}, :split]}, _, [subject, _delimiter]}, _reader]} + ), + do: {:ok, subject} + + defp pipe_split_subject(_node), do: :error + + defp direct_split_subject( + {{:., _, [{:__aliases__, _, [:String]}, :split]}, _, [subject, _delimiter]} + ), + do: {:ok, subject} + + defp direct_split_subject(_node), do: :error + + defp evidence(kind, message, replacement) do + %{kind: kind, message: message, replacement: replacement, confidence: :high} + end + + defp path_subject?(subject), do: subject_name(subject) in @path_names + defp uri_subject?(subject), do: subject_name(subject) in @uri_names + + defp subject_name({name, _meta, context}) when is_atom(name) and is_atom(context), do: name + defp subject_name(_), do: nil +end diff --git a/lib/reach/frontend/elixir.ex b/lib/reach/frontend/elixir.ex index 00b1b0b..5c38640 100644 --- a/lib/reach/frontend/elixir.ex +++ b/lib/reach/frontend/elixir.ex @@ -1097,13 +1097,21 @@ defmodule Reach.Frontend.Elixir do defp module_name(atom) when is_atom(atom), do: atom defp module_name(other), do: other + defp protocol_impl_module(protocol, _target) when not is_atom(protocol), do: protocol defp protocol_impl_module(protocol, nil), do: protocol - defp protocol_impl_module(protocol, {:__aliases__, _, _} = target), - do: Module.concat(protocol, module_name(target)) + defp protocol_impl_module(protocol, {:__aliases__, _, _} = target) do + case module_name(target) do + module when is_atom(module) -> Module.concat(protocol, module) + _dynamic -> protocol + end + end - defp protocol_impl_module(protocol, target) when is_atom(target), - do: Module.concat(protocol, target) + defp protocol_impl_module(protocol, target) when is_atom(target) do + Module.concat(protocol, target) + rescue + ArgumentError -> protocol + end defp protocol_impl_module(protocol, _target), do: protocol diff --git a/lib/reach/plugin.ex b/lib/reach/plugin.ex index 172ba71..7cf7e3d 100644 --- a/lib/reach/plugin.ex +++ b/lib/reach/plugin.ex @@ -90,6 +90,9 @@ defmodule Reach.Plugin do {:ok, [Node.t()]} | {:error, term()} @callback smell_checks() :: [module()] + @callback evidence_providers() :: [module()] + @callback refine_evidence(evidence :: struct() | map(), context :: map()) :: + struct() | map() | :unchanged @callback trace_pattern(pattern :: String.t()) :: (Node.t() -> boolean()) | nil @callback behaviour_label(callbacks :: [atom()]) :: String.t() | nil @callback ignore_call_edge?(Graph.Edge.t()) :: boolean() @@ -101,6 +104,8 @@ defmodule Reach.Plugin do source_language: 1, parse_file: 2, smell_checks: 0, + evidence_providers: 0, + refine_evidence: 2, trace_pattern: 1, behaviour_label: 1, ignore_call_edge?: 1 @@ -113,8 +118,8 @@ defmodule Reach.Plugin do {GenStage, Reach.Plugins.GenStage}, {Jido.Action, Reach.Plugins.Jido}, {OpenTelemetry.Tracer, Reach.Plugins.OpenTelemetry}, - {Jason, Reach.Plugins.JSON}, - {Poison, Reach.Plugins.JSON}, + {Jason, Reach.Plugins.Jason}, + {Poison, Reach.Plugins.Poison}, {QuickBEAM, Reach.Plugins.QuickBEAM} ] @@ -213,6 +218,41 @@ defmodule Reach.Plugin do |> Enum.uniq() end + @doc "Returns evidence providers contributed by plugins." + def evidence_providers(plugins) do + plugins + |> Enum.flat_map(fn plugin -> + if exports?(plugin, :evidence_providers, 0), do: plugin.evidence_providers(), else: [] + end) + |> Enum.uniq() + end + + @doc "Lets plugins annotate or adjust evidence facts without owning user-facing policy." + def refine_evidence(plugins, evidence, context \\ %{}) do + Enum.reduce(plugins, evidence, fn plugin, evidence -> + if exports?(plugin, :refine_evidence, 2) do + apply_evidence_refinement(evidence, plugin.refine_evidence(evidence, context)) + else + evidence + end + end) + end + + defp apply_evidence_refinement(evidence, :unchanged), do: evidence + + defp apply_evidence_refinement(evidence, updates) when is_map(updates) do + if same_struct?(evidence, updates) do + updates + else + Map.merge(evidence, updates) + end + end + + defp apply_evidence_refinement(evidence, _other), do: evidence + + defp same_struct?(%module{}, %module{}), do: true + defp same_struct?(_evidence, _updates), do: false + @doc "Compiles a framework-specific trace pattern, if a plugin recognizes it." def trace_pattern(plugins, pattern) do Enum.find_value(plugins, fn plugin -> diff --git a/lib/reach/plugins/jason.ex b/lib/reach/plugins/jason.ex new file mode 100644 index 0000000..eef0e82 --- /dev/null +++ b/lib/reach/plugins/jason.ex @@ -0,0 +1,41 @@ +defmodule Reach.Plugins.Jason do + @moduledoc "Plugin for Jason effect classification and JSON encoder smells." + @behaviour Reach.Plugin + + alias Reach.Evidence.MapContract + alias Reach.IR.Node + + @impl true + def smell_checks do + [Reach.Plugins.Jason.Smells.HandRolledEncoder] + end + + @impl true + def evidence_providers do + [Reach.Plugins.Jason.Evidence.HandRolledEncoder] + end + + @impl true + def classify_effect(%Node{type: :call, meta: %{module: Jason}}), do: :pure + def classify_effect(_), do: nil + + @impl true + def refine_evidence(%MapContract.Contract{escapes: escapes}, _context) do + if Enum.any?(escapes || [], &jason_encode_escape?/1) do + %{role: :external_payload} + else + :unchanged + end + end + + def refine_evidence(_evidence, _context), do: :unchanged + + @impl true + def analyze(_all_nodes, _opts), do: [] + + defp jason_encode_escape?(%{module: Jason, function: function, arity: arity}) do + function in [:encode, :encode!] and arity in [1, 2] + end + + defp jason_encode_escape?(_escape), do: false +end diff --git a/lib/reach/plugins/jason/evidence/hand_rolled_encoder.ex b/lib/reach/plugins/jason/evidence/hand_rolled_encoder.ex new file mode 100644 index 0000000..81c61ed --- /dev/null +++ b/lib/reach/plugins/jason/evidence/hand_rolled_encoder.ex @@ -0,0 +1,160 @@ +defmodule Reach.Plugins.Jason.Evidence.HandRolledEncoder do + @moduledoc "Collects evidence of manual JSON encoding that Jason can own." + + alias Reach.Evidence.AST + alias Reach.Evidence.Fact + + @json_sanitizer_names [:json_safe, :normalize_json, :json_key, :json_safe_key] + @json_encoder_names [:encode_json, :do_encode, :encode_scalar, :indent_json, :indent_lines] + + def family, do: :jason + + def kinds do + [ + :hand_rolled_json_sanitizer, + :hand_rolled_json_encoder, + :manual_jason_encoder_map + ] + end + + def collect_ast(ast) do + ast + |> callback_evidence() + |> Kernel.++(manual_jason_encoder_evidence(ast)) + |> Enum.uniq_by(fn evidence -> + {evidence.kind, canonical_line(evidence.meta[:line]), evidence.meta[:column]} + end) + end + + defp callback_evidence(ast), do: AST.collect(ast, &collect_node/2) + + defp collect_node({def_kind, _meta, [{name, meta, _args} | _]} = node, acc) + when def_kind in [:def, :defp] and name in @json_sanitizer_names do + if json_boundary_body?(node) do + [ + evidence( + :hand_rolled_json_sanitizer, + "hand-rolled JSON sanitizer; prefer Jason.Encoder implementations at the domain boundary", + "Jason.Encoder", + meta, + :high + ) + | acc + ] + else + acc + end + end + + defp collect_node({def_kind, _meta, [{name, meta, _args} | _]} = node, acc) + when def_kind in [:def, :defp] and name in @json_encoder_names do + if manual_json_writer_body?(node) do + [ + evidence( + :hand_rolled_json_encoder, + "hand-rolled JSON encoder or pretty-printer; use Jason.encode/2", + "Jason.encode/2", + meta, + :high + ) + | acc + ] + else + acc + end + end + + defp collect_node(_node, acc), do: acc + + defp manual_jason_encoder_evidence(ast) do + if direct_to_map_projection?(ast), + do: AST.collect(ast, &collect_manual_jason_encoder_node/2), + else: [] + end + + defp collect_manual_jason_encoder_node(node, acc) do + if direct_jason_encoder_map?(node) do + [ + evidence( + :manual_jason_encoder_map, + "Jason encoder delegates through a direct to_map/1 projection; use @derive Jason.Encoder", + "@derive Jason.Encoder", + jason_encoder_meta(node), + :high + ) + | acc + ] + else + acc + end + end + + defp json_boundary_body?(node) do + AST.count_calls(node, [ + {:__local__, :json_safe}, + {:__local__, :normalize_json}, + {Map, :from_struct}, + {DateTime, :to_iso8601}, + {NaiveDateTime, :to_iso8601}, + {Atom, :to_string}, + {Tuple, :to_list} + ]) >= 2 + end + + defp manual_json_writer_body?(node) do + AST.contains_call?(node, {String, :duplicate}) or + AST.contains_call?(node, {String, :replace}) + end + + defp direct_jason_encoder_map?( + {{:., _, [{:__aliases__, _, [:Jason, :Encode]}, :map]}, _, [to_map_call, _opts]} + ) do + to_map_call?(to_map_call) + end + + defp direct_jason_encoder_map?(_node), do: false + + defp to_map_call?({:to_map, _meta, args}) when is_list(args), do: true + defp to_map_call?({{:., _, [_module, :to_map]}, _, args}) when is_list(args), do: true + defp to_map_call?(_node), do: false + + defp direct_to_map_projection?(ast) do + AST.contains?(ast, fn + {def_kind, _meta, [{:to_map, _to_map_meta, [_arg]}, [do: body]]} + when def_kind in [:def, :defp] -> + direct_map_from_struct?(body) + + _node -> + false + end) + end + + defp direct_map_from_struct?({{:., _, [{:__aliases__, _, [:Map]}, :from_struct]}, _, [_value]}), + do: true + + defp direct_map_from_struct?( + {:|>, _, [_value, {{:., _, [{:__aliases__, _, [:Map]}, :from_struct]}, _, []}]} + ), + do: true + + defp direct_map_from_struct?(_body), do: false + + defp jason_encoder_meta({{:., meta, [{:__aliases__, _, [:Jason, :Encode]}, :map]}, _, _args}), + do: meta + + defp jason_encoder_meta(_node), do: [line: :jason_encoder_map] + + defp canonical_line(line) when is_atom(line), do: line + defp canonical_line(_line), do: :json_writer + + defp evidence(kind, message, replacement, meta, confidence) do + %Fact{ + family: :jason, + kind: kind, + message: message, + replacement: replacement, + meta: meta, + confidence: confidence + } + end +end diff --git a/lib/reach/plugins/jason/smells/hand_rolled_encoder.ex b/lib/reach/plugins/jason/smells/hand_rolled_encoder.ex new file mode 100644 index 0000000..2d818c5 --- /dev/null +++ b/lib/reach/plugins/jason/smells/hand_rolled_encoder.ex @@ -0,0 +1,31 @@ +defmodule Reach.Plugins.Jason.Smells.HandRolledEncoder do + @moduledoc "Detects hand-rolled JSON encoding that should live behind Jason encoders." + + use Reach.Smell.Check.AST + + alias Reach.Plugins.Jason.Evidence.HandRolledEncoder + alias Reach.Smell.Finding + + @impl true + def kinds, do: HandRolledEncoder.kinds() + + defp scan_ast(ast, file) do + ast + |> HandRolledEncoder.collect_ast() + |> Enum.map(&finding(&1, file)) + end + + defp finding(evidence, file) do + Finding.new( + kind: evidence.kind, + message: evidence.message, + location: location(file, evidence.meta), + evidence: evidence.replacement, + confidence: evidence.confidence + ) + end + + defp location(file, meta) do + %{file: file, line: meta[:line], column: meta[:column]} + end +end diff --git a/lib/reach/plugins/json.ex b/lib/reach/plugins/json.ex deleted file mode 100644 index 66f7e35..0000000 --- a/lib/reach/plugins/json.ex +++ /dev/null @@ -1,18 +0,0 @@ -defmodule Reach.Plugins.JSON do - @moduledoc "Plugin for Jason encoding effect classification." - @behaviour Reach.Plugin - - alias Reach.IR.Node - - @json_modules [Jason, Poison] - - @impl true - def classify_effect(%Node{type: :call, meta: %{module: mod}}) - when mod in @json_modules, - do: :pure - - def classify_effect(_), do: nil - - @impl true - def analyze(_all_nodes, _opts), do: [] -end diff --git a/lib/reach/plugins/poison.ex b/lib/reach/plugins/poison.ex new file mode 100644 index 0000000..28ac38d --- /dev/null +++ b/lib/reach/plugins/poison.ex @@ -0,0 +1,14 @@ +defmodule Reach.Plugins.Poison do + @moduledoc "Plugin for Poison effect classification." + @behaviour Reach.Plugin + + alias Reach.IR.Node + + @impl true + def classify_effect(%Node{type: :call, meta: %{module: Poison}}), do: :pure + + def classify_effect(_), do: nil + + @impl true + def analyze(_all_nodes, _opts), do: [] +end diff --git a/lib/reach/smell/checks/behaviour_candidate.ex b/lib/reach/smell/checks/behaviour_candidate.ex index 01d49c3..dabc68d 100644 --- a/lib/reach/smell/checks/behaviour_candidate.ex +++ b/lib/reach/smell/checks/behaviour_candidate.ex @@ -3,8 +3,8 @@ defmodule Reach.Smell.Checks.BehaviourCandidate do @behaviour Reach.Smell.Check - alias Reach.CloneAnalysis alias Reach.Config + alias Reach.Evidence.CloneAnalysis alias Reach.IR alias Reach.Smell.Finding alias Reach.Smell.Helpers diff --git a/lib/reach/smell/checks/clone_consistency.ex b/lib/reach/smell/checks/clone_consistency.ex index 25dc712..2b9749d 100644 --- a/lib/reach/smell/checks/clone_consistency.ex +++ b/lib/reach/smell/checks/clone_consistency.ex @@ -3,8 +3,8 @@ defmodule Reach.Smell.Checks.CloneConsistency do @behaviour Reach.Smell.Check - alias Reach.CloneAnalysis alias Reach.Config + alias Reach.Evidence.CloneAnalysis alias Reach.Smell.Finding @impl true diff --git a/lib/reach/smell/checks/standard_library_bypass.ex b/lib/reach/smell/checks/standard_library_bypass.ex new file mode 100644 index 0000000..2fb3295 --- /dev/null +++ b/lib/reach/smell/checks/standard_library_bypass.ex @@ -0,0 +1,33 @@ +defmodule Reach.Smell.Checks.StandardLibraryBypass do + @moduledoc "Detects ad-hoc implementations that bypass standard library helpers." + + use Reach.Smell.Check.AST + + alias Reach.Evidence.StandardLibraryBypass + alias Reach.Smell.Finding + + @impl true + def kinds, do: StandardLibraryBypass.kinds() + + defp scan_ast(ast, file) do + ast + |> StandardLibraryBypass.collect_ast() + |> Enum.map(&finding(&1, file)) + end + + defp finding(evidence, file) do + Finding.new( + kind: evidence.kind, + message: evidence.message, + location: location(file, evidence.meta), + evidence: evidence.replacement, + confidence: evidence.confidence + ) + end + + defp location(file, meta) do + line = if is_list(meta), do: meta[:line], else: nil + column = if is_list(meta), do: meta[:column], else: nil + %{file: file, line: line, column: column} + end +end diff --git a/mix.exs b/mix.exs index 1258e99..5f4026f 100644 --- a/mix.exs +++ b/mix.exs @@ -129,6 +129,7 @@ defmodule Reach.MixProject do Check: [~r/Reach\.Check/], Smells: [~r/Reach\.Smell/], OTP: [~r/Reach\.OTP/], + Evidence: [~r/Reach\.Evidence/], IR: [Reach.IR, Reach.IR.Node, Reach.IR.Helpers], Analysis: [ Reach.ControlFlow, diff --git a/scripts/evidence_corpus_scan.exs b/scripts/evidence_corpus_scan.exs new file mode 100644 index 0000000..c7b2e21 --- /dev/null +++ b/scripts/evidence_corpus_scan.exs @@ -0,0 +1,176 @@ +defmodule Reach.EvidenceCorpusScan do + @moduledoc false + + @kinds ~w(jason stdlib map-contract all) + + def run(["--" | argv]), do: run(argv) + + def run(argv) do + {opts, args, invalid} = + OptionParser.parse(argv, + strict: [kind: :string, limit: :integer, format: :string], + aliases: [k: :kind, n: :limit, f: :format] + ) + + if invalid != [], do: usage("invalid option(s): #{inspect(invalid)}") + + kind = Keyword.get(opts, :kind, "all") + limit = Keyword.get(opts, :limit, 20) + format = Keyword.get(opts, :format, "text") + + unless kind in @kinds, do: usage("unknown kind #{inspect(kind)}") + unless format in ["text", "json"], do: usage("unknown format #{inspect(format)}") + + case args do + [] -> usage("expected at least one repository or source directory") + paths -> scan(paths, kind, limit, format) + end + end + + defp scan(paths, kind, limit, format) do + {:ok, _apps} = Application.ensure_all_started(:ex_unit) + plugins = Reach.Plugin.detect() + providers = Reach.Evidence.ast_providers_for(kind_family(kind), plugins) + + paths + |> Enum.flat_map(&Path.wildcard(Path.join(&1, "{lib,test}/**/*.{ex,exs}"))) + |> Enum.uniq() + |> Enum.sort() + |> Enum.flat_map(&scan_file(&1, providers, plugins)) + |> print_results(limit, format) + end + + defp scan_file(path, providers, plugins) do + with {:ok, source} <- File.read(path), + {:ok, ast} <- parse_source(source) do + providers + |> Enum.flat_map(&provider_evidence_silently(&1, ast, plugins)) + |> Enum.map(&Map.put(&1, :file, path)) + else + _error -> [] + end + end + + defp parse_source(source) do + capture_stderr(fn -> + {result, _diagnostics} = + Code.with_diagnostics(fn -> + Code.string_to_quoted(source, emit_warnings: false) + end) + + result + end) + end + + defp capture_stderr(fun) do + parent = self() + + ExUnit.CaptureIO.capture_io(:stderr, fn -> + send(parent, {:captured_result, fun.()}) + end) + + receive do + {:captured_result, result} -> result + end + end + + defp provider_evidence_silently(provider, ast, plugins) do + {result, _diagnostics} = + Code.with_diagnostics(fn -> provider_evidence(provider, ast, plugins) end) + + result + end + + defp provider_evidence(Reach.Evidence.MapContract, ast, plugins) do + ast + |> Reach.Evidence.MapContract.collect_ast(plugins: plugins) + |> Enum.map(fn contract -> + message = + "map #{inspect(contract.variable)} uses keys #{inspect(contract.keys)} as an implicit contract" + + Reach.Evidence.MapContract.family() + |> evidence(:implicit_map_contract, message, contract.location, contract.confidence) + |> Map.merge(%{ + variable: contract.variable, + keys: contract.keys, + source: contract.source, + producer: contract.producer, + role: contract.role, + key_coverage: contract.key_coverage, + observed_keys: contract.observed_keys, + unused_keys: contract.unused_keys, + read_count: contract.read_count, + mutation_count: contract.mutation_count, + escaped?: contract.escaped?, + escapes: contract.escapes, + consumer: contract.consumer + }) + end) + end + + defp provider_evidence(provider, ast, _plugins) do + ast + |> provider.collect_ast() + |> Enum.map(&evidence(provider.family(), &1.kind, &1.message, &1.meta, &1.confidence)) + end + + defp kind_family("all"), do: :all + defp kind_family("map-contract"), do: :map_contract + defp kind_family(kind), do: String.to_existing_atom(kind) + + defp print_results(results, _limit, "json") do + results + |> Enum.map(&json_result/1) + |> Jason.encode!(pretty: true) + |> IO.puts() + end + + defp print_results(results, limit, "text") do + grouped = Enum.group_by(results, & &1.family) + + IO.puts("Evidence corpus scan") + IO.puts("total=#{length(results)}") + + for {family, family_results} <- Enum.sort_by(grouped, fn {family, _} -> family end) do + IO.puts("\n## #{family} #{length(family_results)}") + + family_results + |> Enum.frequencies_by(& &1.kind) + |> Enum.sort_by(fn {kind, count} -> {-count, kind} end) + |> Enum.each(fn {kind, count} -> IO.puts("#{kind}=#{count}") end) + + family_results + |> Enum.take(limit) + |> Enum.each(fn result -> + IO.puts("- #{result.kind} #{result.file}:#{result.line} #{result.message}") + end) + end + end + + defp json_result(result) do + Map.new(result, fn {key, value} -> {to_string(key), json_value(value)} end) + end + + defp json_value(tuple) when is_tuple(tuple), do: Tuple.to_list(tuple) + defp json_value(value), do: value + + defp evidence(family, kind, message, meta, confidence) do + %{family: family, kind: kind, message: message, line: meta[:line], confidence: confidence} + end + + defp usage(message) do + Mix.raise(""" + #{message} + + Usage: + mix run scripts/evidence_corpus_scan.exs -- --kind jason PATH [PATH...] + mix run scripts/evidence_corpus_scan.exs -- --kind stdlib PATH [PATH...] + mix run scripts/evidence_corpus_scan.exs -- --kind map-contract PATH [PATH...] + mix run scripts/evidence_corpus_scan.exs -- --kind all --format json PATH [PATH...] + + Kinds: #{Enum.join(@kinds, ", ")} + """) + end +end + +Reach.EvidenceCorpusScan.run(System.argv()) diff --git a/test/reach/analysis_test.exs b/test/reach/analysis_test.exs new file mode 100644 index 0000000..a027a39 --- /dev/null +++ b/test/reach/analysis_test.exs @@ -0,0 +1,18 @@ +defmodule Reach.AnalysisTest do + use ExUnit.Case, async: true + + alias Reach.Analysis + alias Reach.IR.Node + + test "expected effect boundary ignores Erlang module atoms" do + node = %Node{ + id: "fun", + type: :function_def, + meta: %{module: :zigler, name: :run, arity: 0}, + source_span: nil, + children: [] + } + + refute Analysis.expected_effect_boundary?(node) + end +end diff --git a/test/reach/check/architecture/policy_test.exs b/test/reach/check/architecture/policy_test.exs index 0e5de77..c29062e 100644 --- a/test/reach/check/architecture/policy_test.exs +++ b/test/reach/check/architecture/policy_test.exs @@ -267,6 +267,31 @@ defmodule Reach.Check.ArchitecturePolicyTest do end end + test "evidence providers stay reusable and do not emit smell findings" do + evidence_sources = Path.wildcard("lib/reach/evidence/**/*.ex") + + refute evidence_sources == [] + + for source <- evidence_sources do + content = File.read!(source) + refute content =~ "Reach.Smell.Finding" + refute content =~ "Finding.new" + refute content =~ "Reach.CLI." + end + end + + test "legacy clone analysis namespace is not reintroduced" do + modules = :reach |> Application.spec(:modules) |> List.wrap() + + refute Enum.any?(modules, &(Module.split(&1) |> Enum.take(2) == ["Reach", "CloneAnalysis"])) + end + + test "Poison has its own plugin instead of using generic JSON plugin" do + assert Code.ensure_loaded?(Reach.Plugins.Poison) + refute Code.ensure_loaded?(Reach.Plugins.JSON) + assert Reach.Plugin.classify_effect([Reach.Plugins.Poison], poison_call_node()) == :pure + end + defp architecture_project do dir = Path.join(System.tmp_dir!(), "reach-arch-fixture-#{System.unique_integer()}") File.mkdir_p!(dir) @@ -345,6 +370,15 @@ defmodule Reach.Check.ArchitecturePolicyTest do path end + defp poison_call_node do + %Reach.IR.Node{ + id: "poison", + type: :call, + meta: %{module: Poison, function: :encode!, arity: 1}, + children: [] + } + end + defp with_reach_config(contents) do previous = if File.exists?(".reach.exs"), do: File.read!(".reach.exs") File.write!(".reach.exs", contents) diff --git a/test/reach/check/candidates_test.exs b/test/reach/check/candidates_test.exs new file mode 100644 index 0000000..cb7d176 --- /dev/null +++ b/test/reach/check/candidates_test.exs @@ -0,0 +1,256 @@ +defmodule Reach.Check.CandidatesTest do + use ExUnit.Case, async: true + + alias Reach.Check.Candidates + + test "reports repeated implicit map contracts as advisory struct candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "profiles.ex") + + File.write!(path, """ + defmodule Profiles do + def build(user) do + profile = %{id: user.id, name: user.name, email: user.email} + profile.id + profile.email + end + + def profile(user) do + %{id: user.id, name: user.name, email: user.email} + end + + def render(user) do + data = profile(user) + data.id + Map.get(data, :email) + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + candidate = + Enum.find(result.candidates, fn candidate -> + candidate.kind == :introduce_struct_contract and + candidate.target == "map shape [:email, :id, :name]" + end) + + assert candidate + assert candidate.keys == ["email", "id", "name"] + assert candidate.occurrences == 2 + assert "local" in candidate.sources + assert "return" in candidate.sources + + File.rm_rf(dir) + end + + test "groups similar map shapes as one contract family candidate" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "profiles.ex") + + File.write!(path, """ + defmodule Profiles do + def card(user) do + %{id: user.id, name: user.name, email: user.email} + end + + def export(user) do + %{id: user.id, name: user.name, email: user.email, inserted_at: user.inserted_at} + end + + def render(user) do + card = card(user) + card.id + card.email + + export = export(user) + export.id + export.email + export.inserted_at + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + candidate = + Enum.find(result.candidates, fn candidate -> + candidate.kind == :introduce_struct_contract and + String.starts_with?(candidate.target, "map shape family") + end) + + assert candidate + assert candidate.keys == ["email", "id", "name"] + assert candidate.occurrences == 2 + assert Enum.any?(candidate.evidence, &String.contains?(&1, "inserted_at")) + + File.rm_rf(dir) + end + + test "promotes repeated maps encoded by Jason as boundary contract candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "json_payloads.ex") + + File.write!(path, """ + defmodule JsonPayloads do + def first(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + Jason.encode!(data) + end + + def second(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.name + Jason.encode(data) + end + end + """) + + project = Reach.Project.from_sources([path], plugins: [Reach.Plugins.Jason]) + result = Candidates.run(project, [], top: 10) + + candidate = Enum.find(result.candidates, &(&1.kind == :introduce_boundary_contract)) + + assert candidate + assert candidate.keys == ["email", "id", "name"] + + File.rm_rf(dir) + end + + test "promotes repeated payload maps as boundary contract candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "payloads.ex") + + File.write!(path, """ + defmodule Payloads do + def first(user) do + payload = %{id: user.id, name: user.name, email: user.email} + payload.id + payload.email + end + + def second(user) do + payload = %{id: user.id, name: user.name, email: user.email} + payload.id + payload.name + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + candidate = Enum.find(result.candidates, &(&1.kind == :introduce_boundary_contract)) + + assert candidate + assert candidate.actionability == :review_boundary_contract + assert candidate.keys == ["email", "id", "name"] + assert candidate.suggestion =~ "boundary contract" + + File.rm_rf(dir) + end + + test "promotes repeated options maps as typed map contract candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "options.ex") + + File.write!(path, """ + defmodule Options do + def first(input) do + opts = %{timeout: input.timeout, retries: input.retries, mode: input.mode} + opts.timeout + opts.retries + end + + def second(input) do + opts = %{timeout: input.timeout, retries: input.retries, mode: input.mode} + opts.mode + opts.timeout + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + candidate = Enum.find(result.candidates, &(&1.kind == :introduce_typed_map_contract)) + + assert candidate + assert candidate.actionability == :review_options_contract + assert candidate.keys == ["mode", "retries", "timeout"] + assert candidate.suggestion =~ "typed map" + + File.rm_rf(dir) + end + + test "does not promote low-signal escaped maps as struct candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "payloads.ex") + + File.write!(path, """ + defmodule Payloads do + def build(user) do + %{id: user.id, name: user.name, email: user.email, role: user.role} + end + + def send(user) do + payload = build(user) + payload.id + HTTP.post(payload) + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + refute Enum.any?(result.candidates, &(&1.kind == :introduce_struct_contract)) + + File.rm_rf(dir) + end + + test "does not promote template assigns maps as struct candidates" do + dir = Path.join(System.tmp_dir!(), "reach-candidates-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + path = Path.join(dir, "emails.ex") + + File.write!(path, """ + defmodule Emails do + def template_assigns(input) do + %{ + branding: input.branding, + customer: input.customer, + subject: input.subject, + to: input.to + } + end + + def render(input) do + assigns = template_assigns(input) + assigns.branding + assigns.customer + assigns.subject + assigns.to + end + end + """) + + project = Reach.Project.from_sources([path], plugins: []) + result = Candidates.run(project, [], top: 10) + + refute Enum.any?(result.candidates, &(&1.kind == :introduce_struct_contract)) + + File.rm_rf(dir) + end +end diff --git a/test/reach/cli/legacy/mix_task_reach_test.exs b/test/reach/cli/legacy/mix_task_reach_test.exs index 12b2dfd..64d0180 100644 --- a/test/reach/cli/legacy/mix_task_reach_test.exs +++ b/test/reach/cli/legacy/mix_task_reach_test.exs @@ -1,6 +1,8 @@ defmodule Mix.Tasks.ReachTest do use ExUnit.Case + import ExUnit.CaptureIO + alias Mix.Tasks.Reach, as: ReachTask @output_dir Path.join(System.tmp_dir!(), "reach_task_test") @@ -11,6 +13,15 @@ defmodule Mix.Tasks.ReachTest do :ok end + test "prints help without analyzing" do + output = capture_io(fn -> ReachTask.run(["--help"]) end) + + assert output =~ "mix reach" + assert output =~ "--format" + refute output =~ "Analyzing" + refute File.exists?(Path.join("reach_report", "index.html")) + end + test "generates HTML report for a file" do path = write_tmp_file("test_mod.ex", """ diff --git a/test/reach/evidence/ast_test.exs b/test/reach/evidence/ast_test.exs new file mode 100644 index 0000000..1ce7d5d --- /dev/null +++ b/test/reach/evidence/ast_test.exs @@ -0,0 +1,65 @@ +defmodule Reach.Evidence.ASTTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.AST + + test "collect walks AST and returns reversed evidence" do + ast = + Code.string_to_quoted!(""" + IO.inspect(:one) + IO.inspect(:two) + """) + + assert [line: 1] = + ast + |> AST.collect(fn + {{:., meta, [{:__aliases__, _, [:IO]}, :inspect]}, _, _args}, acc -> [meta | acc] + _node, acc -> acc + end) + |> hd() + end + + test "reduce walks AST with a custom accumulator" do + ast = + Code.string_to_quoted!(""" + IO.inspect(:one) + IO.inspect(:two) + """) + + assert 2 == + AST.reduce(ast, 0, fn node, count -> + if AST.remote_call?(node, IO, :inspect), do: count + 1, else: count + end) + end + + test "contains and count use predicates" do + ast = Code.string_to_quoted!(~S[String.replace(name, "-", "_")]) + + assert AST.contains?(ast, &AST.remote_call?(&1, String, :replace)) + assert AST.count(ast, &AST.remote_call?(&1, String, :replace)) == 1 + end + + test "matches local, remote, and Erlang calls" do + local = Code.string_to_quoted!("json_safe(value)") + remote = Code.string_to_quoted!("Map.from_struct(value)") + erlang = Code.string_to_quoted!(":json.encode(value)") + + assert AST.call?(local, {:__local__, :json_safe}) + assert AST.call?(remote, {Map, :from_struct}) + assert AST.call?(erlang, {:erlang, :json, :encode}) + end + + test "call descriptor tolerates dynamic aliases" do + ast = Code.string_to_quoted!("__MODULE__.Engine.render(value)") + + assert {:ok, %{module: nil, function: :render, arity: 1}} = AST.call_descriptor(ast) + end + + test "compares and finds AST references" do + ast = Code.string_to_quoted!("value + 1") + expected = Code.string_to_quoted!("value") + + assert AST.references?(ast, expected) + assert AST.same_ast?(expected, Code.string_to_quoted!("value")) + end +end diff --git a/test/reach/evidence/map_contract_test.exs b/test/reach/evidence/map_contract_test.exs new file mode 100644 index 0000000..085a0f4 --- /dev/null +++ b/test/reach/evidence/map_contract_test.exs @@ -0,0 +1,211 @@ +defmodule Reach.Evidence.MapContractTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.MapContract + + test "exposes evidence metadata" do + assert MapContract.family() == :map_contract + assert MapContract.kinds() == [:implicit_map_contract] + end + + test "collects map contracts with creation and later key flow evidence" do + ast = + Code.string_to_quoted!(""" + def build(user) do + profile = %{id: user.id, name: user.name, email: user.email} + Map.get(profile, :id) + profile.name + end + """) + + assert [contract] = MapContract.collect_ast(ast) + assert contract.variable == :profile + assert contract.keys == [:email, :id, :name] + assert Enum.map(contract.reads, & &1.key) |> Enum.sort() == [:id, :name] + assert contract.confidence == :medium + assert contract.role == :unknown + assert contract.key_coverage == 2 / 3 + assert contract.observed_keys == [:id, :name] + assert contract.unused_keys == [:email] + assert contract.read_count == 2 + assert contract.mutation_count == 0 + refute contract.escaped? + end + + test "accounts for updates as stronger contract evidence" do + ast = + Code.string_to_quoted!(""" + def build(user) do + profile = %{id: user.id, name: user.name, email: user.email} + profile = Map.put(profile, :email, String.downcase(profile.email)) + profile.name + end + """) + + assert [contract] = MapContract.collect_ast(ast) + assert [%{key: :email, kind: :update}] = contract.updates + assert contract.confidence == :medium + end + + test "connects fixed-shape return maps to local callsite reads" do + ast = + Code.string_to_quoted!(""" + def profile(user) do + %{id: user.id, name: user.name, email: user.email} + end + + def render(user) do + data = profile(user) + data.id + Map.get(data, :email) + end + """) + + assert [contract] = MapContract.collect_ast(ast) + assert contract.source == :return + assert contract.producer == {:profile, 1} + assert contract.variable == :data + assert Enum.map(contract.reads, & &1.key) |> Enum.sort() == [:email, :id] + assert contract.role == :domain + end + + test "classifies common non-domain map roles" do + ast = + Code.string_to_quoted!(""" + def render(input) do + assigns = %{title: input.title, body: input.body, user: input.user} + assigns.title + assigns.body + end + + def reduce(items) do + acc = %{seen: [], count: 0, errors: []} + acc.seen + acc.count + end + + def send_payload(input) do + payload = %{id: input.id, name: input.name, email: input.email} + payload.id + payload.email + end + """) + + contracts = MapContract.collect_ast(ast) + + assert Enum.find(contracts, &(&1.variable == :assigns)).role == :assigns + assert Enum.find(contracts, &(&1.variable == :acc)).role == :accumulator + assert Enum.find(contracts, &(&1.variable == :payload)).role == :external_payload + end + + test "tracks shallow aliases and escapes" do + ast = + Code.string_to_quoted!(""" + def render(user) do + profile = %{id: user.id, name: user.name, email: user.email} + data = profile + data.id + data.email + send_profile(data) + end + """) + + assert [data] = MapContract.collect_ast(ast) + assert data.variable == :data + assert data.observed_keys == [:email, :id] + assert data.escaped? + assert [%{module: nil, function: :send_profile, arity: 1}] = data.escapes + end + + test "records remote escape targets" do + ast = + Code.string_to_quoted!(""" + def render(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + Jason.encode!(data) + end + """) + + assert [contract] = MapContract.collect_ast(ast) + assert contract.escaped? + assert [%{module: Jason, function: :encode!, arity: 1}] = contract.escapes + end + + test "connects fixed-shape return map bindings to local callsite reads" do + ast = + Code.string_to_quoted!(""" + def profile(user) do + profile = %{id: user.id, name: user.name, email: user.email} + result = profile + result + end + + def render(user) do + data = profile(user) + data.id + Map.get(data, :email) + end + """) + + assert [contract] = MapContract.collect_ast(ast) + assert contract.source == :return + assert contract.producer == {:profile, 1} + assert contract.keys == [:email, :id, :name] + assert contract.observed_keys == [:email, :id] + end + + test "collects project-level remote return shape contracts" do + dir = Path.join(System.tmp_dir!(), "reach-map-project-#{System.unique_integer([:positive])}") + File.mkdir_p!(dir) + producer = Path.join(dir, "producer.ex") + consumer = Path.join(dir, "consumer.ex") + + File.write!(producer, """ + defmodule Accounts.Profile do + def build(user) do + profile = %{id: user.id, name: user.name, email: user.email} + result = profile + result + end + end + """) + + File.write!(consumer, """ + defmodule Web.ProfileView do + def render(user) do + data = Accounts.Profile.build(user) + data.id + data.email + end + end + """) + + project = Reach.Project.from_sources([producer, consumer], plugins: []) + + assert [contract] = + project + |> MapContract.collect_project() + |> Enum.filter(&(&1.source == :cross_file_return)) + + assert contract.file == consumer + assert contract.producer == {Accounts.Profile, :build, 1} + assert contract.consumer == {Web.ProfileView, :render, 1} + assert contract.role == :domain + assert contract.observed_keys == [:email, :id] + + File.rm_rf(dir) + end + + test "ignores map literals without later flow evidence" do + ast = + Code.string_to_quoted!(""" + def build(user) do + %{id: user.id, name: user.name, email: user.email} + end + """) + + assert [] = MapContract.collect_ast(ast) + end +end diff --git a/test/reach/evidence/pattern_runner_test.exs b/test/reach/evidence/pattern_runner_test.exs new file mode 100644 index 0000000..3956f72 --- /dev/null +++ b/test/reach/evidence/pattern_runner_test.exs @@ -0,0 +1,154 @@ +defmodule Reach.Evidence.PatternRunnerTest do + use ExUnit.Case, async: true + + import ExAST.Sigil + + alias Reach.Evidence.Fact + alias Reach.Evidence.PatternRunner + + defmodule CustomEvidence do + @moduledoc false + defstruct [:kind, :message, :replacement, :meta, :confidence] + end + + test "turns pattern matches into evidence structs" do + ast = Code.string_to_quoted!("Enum.map(items, &expand/1) |> List.flatten()") + + assert [%Fact{family: :stdlib, kind: :manual_flat_map, meta: meta}] = + PatternRunner.run( + ast, + [ + flat_map: + {~p[Enum.map(_, _) |> List.flatten()], + fn _match -> + %{ + kind: :manual_flat_map, + message: "use flat_map", + replacement: "Enum.flat_map/2", + confidence: :high + } + end} + ], + family: :stdlib + ) + + assert meta[:line] == 1 + end + + test "builder can skip matches" do + ast = Code.string_to_quoted!("Enum.map(items, &expand/1) |> List.flatten()") + + assert [] = + PatternRunner.run( + ast, + [flat_map: {~p[Enum.map(_, _) |> List.flatten()], fn _match -> nil end}], + family: :stdlib + ) + + assert [] = + PatternRunner.run( + ast, + [flat_map: {~p[Enum.map(_, _) |> List.flatten()], fn _match -> false end}], + family: :stdlib + ) + end + + test "builder-provided metadata overrides match metadata" do + ast = Code.string_to_quoted!("Enum.map(items, &expand/1) |> List.flatten()") + + assert [%Fact{meta: [line: 42]}] = + PatternRunner.run( + ast, + [ + flat_map: + {~p[Enum.map(_, _) |> List.flatten()], + fn _match -> + %{ + kind: :manual_flat_map, + message: "use flat_map", + replacement: "Enum.flat_map/2", + confidence: :high, + meta: [line: 42] + } + end} + ], + family: :stdlib + ) + end + + test "skips ExAST alias collection failures for dynamic import options" do + ast = + Code.string_to_quoted!(""" + defmodule DynamicImport do + import Some.Module, unquote(opts) + end + """) + + assert [] = + PatternRunner.run( + ast, + [ + query: + {~p[String.split(_, "&")], fn _match -> %{kind: :manual_query_parsing} end} + ], + family: :stdlib + ) + end + + test "supports custom evidence structs without family fields" do + ast = Code.string_to_quoted!("Enum.map(items, &expand/1) |> List.flatten()") + + assert [%CustomEvidence{kind: :manual_flat_map}] = + PatternRunner.run( + ast, + [ + flat_map: + {~p[Enum.map(_, _) |> List.flatten()], + fn _match -> + %{ + kind: :manual_flat_map, + message: "use flat_map", + replacement: "Enum.flat_map/2", + confidence: :high + } + end} + ], + evidence_module: CustomEvidence, + family: :stdlib + ) + end + + test "runs multiple patterns in one pass" do + ast = + Code.string_to_quoted!(""" + Enum.map(items, &expand/1) |> List.flatten() + String.split(query, "&") + """) + + specs = [ + flat_map: + {~p[Enum.map(_, _) |> List.flatten()], + fn _match -> + %{ + kind: :manual_flat_map, + message: "use flat_map", + replacement: "Enum.flat_map/2", + confidence: :high + } + end}, + query: + {~p[String.split(_, "&")], + fn _match -> + %{ + kind: :manual_query_parsing, + message: "use URI.decode_query", + replacement: "URI.decode_query/1", + confidence: :high + } + end} + ] + + assert [%{kind: :manual_flat_map}, %{kind: :manual_query_parsing}] = + PatternRunner.run(ast, specs, family: :stdlib) + end +end diff --git a/test/reach/evidence/standard_library_bypass/enum_test.exs b/test/reach/evidence/standard_library_bypass/enum_test.exs new file mode 100644 index 0000000..b014570 --- /dev/null +++ b/test/reach/evidence/standard_library_bypass/enum_test.exs @@ -0,0 +1,53 @@ +defmodule Reach.Evidence.StandardLibraryBypass.EnumTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.StandardLibraryBypass.Enum, as: EnumEvidence + + defp collect(ast), do: EnumEvidence.collect_ast(ast) + + test "collects direct and reduce-based flat_map evidence" do + ast = Code.string_to_quoted!("items |> Enum.map(&expand/1) |> List.flatten()") + + assert [%{kind: :manual_flat_map, confidence: :medium} = evidence] = + EnumEvidence.collect_ast(ast) + + assert evidence.message =~ "recursive flattening" + + ast = Code.string_to_quoted!("items |> Enum.map(&expand/1) |> Enum.concat()") + assert [%{kind: :manual_flat_map, confidence: :high}] = EnumEvidence.collect_ast(ast) + + ast = + Code.string_to_quoted!("Enum.reduce(items, [], fn item, acc -> acc ++ expand(item) end)") + + assert [%{kind: :manual_flat_map_reduce}] = collect(ast) + end + + test "does not flag reduce append of one-element literal lists as flat_map" do + ast = + Code.string_to_quoted!( + "Enum.reduce(items, [], fn item, acc -> acc ++ [transform(item)] end)" + ) + + assert [] = collect(ast) + end + + test "collects order-safe prepend reverse flat_map evidence" do + ast = + Code.string_to_quoted!(""" + items + |> Enum.reduce([], fn item, acc -> Enum.reverse(expand(item), acc) end) + |> Enum.reverse() + """) + + assert [%{kind: :manual_flat_map_prepend_reverse}] = collect(ast) + end + + test "collects frequencies evidence" do + ast = + Code.string_to_quoted!(""" + Enum.reduce(items, %{}, fn item, acc -> Map.update(acc, item, 1, &(&1 + 1)) end) + """) + + assert [%{kind: :manual_frequencies}] = collect(ast) + end +end diff --git a/test/reach/evidence/standard_library_bypass/map_test.exs b/test/reach/evidence/standard_library_bypass/map_test.exs new file mode 100644 index 0000000..9bd12a4 --- /dev/null +++ b/test/reach/evidence/standard_library_bypass/map_test.exs @@ -0,0 +1,42 @@ +defmodule Reach.Evidence.StandardLibraryBypass.MapTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.StandardLibraryBypass.Map, as: MapEvidence + + defp collect(ast), do: MapEvidence.collect_ast(ast) + + test "collects paired Map.has_key? and Map.put update evidence" do + ast = + Code.string_to_quoted!(""" + if Map.has_key?(groups, key) do + Map.put(groups, key, [value | values]) + else + Map.put(groups, key, [value]) + end + """) + + assert [%{kind: :manual_map_update}] = collect(ast) + end + + test "ignores Map.get nil sentinel updates because nil can be a stored value" do + ast = + Code.string_to_quoted!(""" + case Map.get(groups, key) do + nil -> Map.put(groups, key, [value]) + values -> Map.put(groups, key, [value | values]) + end + """) + + assert [] = collect(ast) + end + + test "collects fetch bang followed by put evidence" do + ast = + Code.string_to_quoted!(""" + current = Map.fetch!(state, :count) + Map.put(state, :count, current + 1) + """) + + assert [%{kind: :manual_map_update_bang}] = collect(ast) + end +end diff --git a/test/reach/evidence/standard_library_bypass/path_uri_test.exs b/test/reach/evidence/standard_library_bypass/path_uri_test.exs new file mode 100644 index 0000000..a371b94 --- /dev/null +++ b/test/reach/evidence/standard_library_bypass/path_uri_test.exs @@ -0,0 +1,20 @@ +defmodule Reach.Evidence.StandardLibraryBypass.PathURITest do + use ExUnit.Case, async: true + + alias Reach.Evidence.StandardLibraryBypass.PathURI + + defp collect(ast), do: PathURI.collect_ast(ast) + + test "collects path and URI split evidence" do + ast = Code.string_to_quoted!("path |> String.split(\"/\") |> List.last()") + assert [%{kind: :manual_path_basename}] = collect(ast) + + ast = Code.string_to_quoted!("String.split(query, \"&\")") + assert [%{kind: :manual_query_parsing}] = collect(ast) + end + + test "ignores non-path names" do + ast = Code.string_to_quoted!("slug |> String.split(\"/\") |> List.last()") + assert [] = collect(ast) + end +end diff --git a/test/reach/evidence/standard_library_bypass_test.exs b/test/reach/evidence/standard_library_bypass_test.exs new file mode 100644 index 0000000..49bc811 --- /dev/null +++ b/test/reach/evidence/standard_library_bypass_test.exs @@ -0,0 +1,156 @@ +defmodule Reach.Evidence.StandardLibraryBypassTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.StandardLibraryBypass + + test "exposes evidence metadata" do + assert StandardLibraryBypass.family() == :stdlib + assert :manual_frequencies in StandardLibraryBypass.kinds() + end + + test "collects manual path basename evidence" do + ast = Code.string_to_quoted!("path |> String.split(\"/\") |> List.last()") + + assert [%{kind: :manual_path_basename, replacement: "Path.basename/1"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects manual URL query splitting evidence" do + ast = Code.string_to_quoted!("String.split(query, \"&\")") + + assert [%{kind: :manual_query_parsing, replacement: "URI.decode_query/1"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects map followed by flatten evidence" do + ast = Code.string_to_quoted!("items |> Enum.map(&expand/1) |> List.flatten()") + + assert [%{kind: :manual_flat_map, replacement: "Enum.flat_map/2"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects Map.has_key? conditional update evidence" do + ast = + Code.string_to_quoted!(""" + if Map.has_key?(counts, key) do + Map.put(counts, key, count + 1) + else + Map.put(counts, key, 1) + end + """) + + assert [%{kind: :manual_map_update, replacement: "Map.update/4"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects reduce based frequencies evidence" do + ast = + Code.string_to_quoted!(""" + Enum.reduce(items, %{}, fn item, acc -> + Map.update(acc, item, 1, &(&1 + 1)) + end) + """) + + assert [%{kind: :manual_frequencies, replacement: "Enum.frequencies/1"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects reduce based frequencies_by evidence" do + ast = + Code.string_to_quoted!(""" + Enum.reduce(users, %{}, fn user, acc -> + count = Map.get(acc, user.role, 0) + Map.put(acc, user.role, count + 1) + end) + """) + + assert [%{kind: :manual_frequencies_by, replacement: "Enum.frequencies_by/2"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects reduce based flat_map evidence" do + ast = + Code.string_to_quoted!(""" + Enum.reduce(items, [], fn item, acc -> + acc ++ expand(item) + end) + """) + + assert [%{kind: :manual_flat_map_reduce, replacement: "Enum.flat_map/2"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects reduce reverse chunk followed by reverse evidence" do + ast = + Code.string_to_quoted!(""" + items + |> Enum.reduce([], fn item, acc -> + Enum.reverse(expand(item), acc) + end) + |> Enum.reverse() + """) + + assert [%{kind: :manual_flat_map_prepend_reverse, replacement: "Enum.flat_map/2"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "does not flag unsafe chunk prepend followed by reverse" do + ast = + Code.string_to_quoted!(""" + items + |> Enum.reduce([], fn item, acc -> + expand(item) ++ acc + end) + |> Enum.reverse() + """) + + assert [] = StandardLibraryBypass.collect_ast(ast) + end + + test "collects fetch bang followed by put evidence" do + ast = + Code.string_to_quoted!(""" + current = Map.fetch!(state, :count) + Map.put(state, :count, current + 1) + """) + + assert [%{kind: :manual_map_update_bang, replacement: "Map.update!/3"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "collects inline fetch bang followed by put evidence" do + ast = Code.string_to_quoted!("Map.put(state, :count, Map.fetch!(state, :count) + 1)") + + assert [%{kind: :manual_map_update_bang, replacement: "Map.update!/3"}] = + StandardLibraryBypass.collect_ast(ast) + end + + test "does not flag reduce when callback has extra payload logic" do + ast = + Code.string_to_quoted!(""" + Enum.reduce(items, %{}, fn item, acc -> + Map.update(acc, item, [item], &[item | &1]) + end) + """) + + assert [] = StandardLibraryBypass.collect_ast(ast) + end + + test "does not flag unrelated Map.put branches" do + ast = + Code.string_to_quoted!(""" + case Map.get(groups, key) do + nil -> Map.put(other, key, [value]) + values -> Map.put(groups, other_key, [value | values]) + end + """) + + assert [] = StandardLibraryBypass.collect_ast(ast) + end + + test "ignores slash splits for non-path variables" do + ast = Code.string_to_quoted!("slug |> String.split(\"/\") |> List.last()") + + assert [] = StandardLibraryBypass.collect_ast(ast) + end +end diff --git a/test/reach/evidence_test.exs b/test/reach/evidence_test.exs new file mode 100644 index 0000000..acbe949 --- /dev/null +++ b/test/reach/evidence_test.exs @@ -0,0 +1,20 @@ +defmodule Reach.EvidenceTest do + use ExUnit.Case, async: true + + test "discovers built-in AST evidence providers" do + providers = Reach.Evidence.ast_providers([]) + + assert Reach.Evidence.StandardLibraryBypass in providers + assert Reach.Evidence.MapContract in providers + end + + test "discovers plugin AST evidence providers" do + providers = Reach.Evidence.ast_providers([Reach.Plugins.Jason]) + + assert Reach.Plugins.Jason.Evidence.HandRolledEncoder in providers + end + + test "filters providers by family" do + assert Reach.Evidence.ast_providers_for(:stdlib, []) == [Reach.Evidence.StandardLibraryBypass] + end +end diff --git a/test/reach/frontend/elixir_dynamic_defimpl_test.exs b/test/reach/frontend/elixir_dynamic_defimpl_test.exs new file mode 100644 index 0000000..d45861d --- /dev/null +++ b/test/reach/frontend/elixir_dynamic_defimpl_test.exs @@ -0,0 +1,20 @@ +defmodule Reach.Frontend.ElixirDynamicDefimplTest do + use ExUnit.Case, async: true + + test "does not crash on dynamic defimpl protocol names" do + graph = + Reach.string_to_graph!(""" + defmodule Example do + defmacro derive_encoder(encoder) do + quote do + defimpl unquote(encoder), for: Ecto.Association.NotLoaded do + def encode(value, opts), do: Jason.Encode.map(%{}, opts) + end + end + end + end + """) + + assert is_list(Reach.nodes(graph)) + end +end diff --git a/test/reach/plugin/evidence_refinement_test.exs b/test/reach/plugin/evidence_refinement_test.exs new file mode 100644 index 0000000..7c2cde9 --- /dev/null +++ b/test/reach/plugin/evidence_refinement_test.exs @@ -0,0 +1,62 @@ +defmodule Reach.Plugin.EvidenceRefinementTest do + use ExUnit.Case, async: true + + alias Reach.Evidence.Fact + alias Reach.Evidence.MapContract + alias Reach.Plugin + + defmodule RolePlugin do + @behaviour Reach.Plugin + + @impl true + def analyze(_nodes, _opts), do: [] + + @impl true + def classify_effect(_node), do: nil + + @impl true + def refine_evidence(%MapContract.Contract{variable: :data}, _context) do + %{role: :external_payload} + end + + def refine_evidence(%Fact{} = fact, _context) do + %{fact | confidence: :medium} + end + + def refine_evidence(_evidence, _context), do: :unchanged + end + + test "plugins can refine evidence maps or structs" do + fact = %Fact{family: :stdlib, kind: :manual_flat_map, confidence: :high} + + assert %Fact{confidence: :medium} = Plugin.refine_evidence([RolePlugin], fact, %{}) + end + + test "map contract collection applies plugin role refinements" do + ast = + Code.string_to_quoted!(""" + def build(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + end + """) + + assert [%{role: :external_payload}] = MapContract.collect_ast(ast, plugins: [RolePlugin]) + end + + test "Jason plugin classifies encoded maps as external payloads" do + ast = + Code.string_to_quoted!(""" + def build(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + Jason.encode!(data) + end + """) + + assert [%{role: :external_payload}] = + MapContract.collect_ast(ast, plugins: [Reach.Plugins.Jason]) + end +end diff --git a/test/reach/plugins/jason/evidence/hand_rolled_encoder_test.exs b/test/reach/plugins/jason/evidence/hand_rolled_encoder_test.exs new file mode 100644 index 0000000..fc171c9 --- /dev/null +++ b/test/reach/plugins/jason/evidence/hand_rolled_encoder_test.exs @@ -0,0 +1,88 @@ +defmodule Reach.Plugins.Jason.Evidence.HandRolledEncoderTest do + use ExUnit.Case, async: true + + alias Reach.Plugins.Jason.Evidence.HandRolledEncoder + + test "exposes evidence metadata" do + assert HandRolledEncoder.family() == :jason + assert :hand_rolled_json_sanitizer in HandRolledEncoder.kinds() + end + + test "collects recursive json_safe evidence" do + ast = + Code.string_to_quoted!(""" + def json_safe(%_{} = value), do: value |> Map.from_struct() |> json_safe() + def json_safe(value) when is_list(value), do: Enum.map(value, &json_safe/1) + """) + + assert [%{kind: :hand_rolled_json_sanitizer, replacement: "Jason.Encoder"} | _] = + HandRolledEncoder.collect_ast(ast) + end + + test "collects manual JSON pretty-printer evidence" do + ast = + Code.string_to_quoted!(""" + defp indent_json(value) do + value + |> to_string() + |> String.replace("\\\"", "\\\\\\\"") + end + """) + + assert [%{kind: :hand_rolled_json_encoder, replacement: "Jason.encode/2"}] = + HandRolledEncoder.collect_ast(ast) + end + + test "ignores stdlib JSON wrappers" do + ast = + Code.string_to_quoted!(""" + def encode!(data) do + data + |> :json.encode() + |> IO.iodata_to_binary() + end + """) + + assert [] = HandRolledEncoder.collect_ast(ast) + end + + test "collects Jason encoders that delegate to direct to_map projections" do + ast = + Code.string_to_quoted!(""" + defmodule Example do + defimpl Jason.Encoder, for: Example do + def encode(value, opts), do: Jason.Encode.map(Example.to_map(value), opts) + end + + def to_map(value), do: Map.from_struct(value) + end + """) + + assert [%{kind: :manual_jason_encoder_map, replacement: "@derive Jason.Encoder"}] = + HandRolledEncoder.collect_ast(ast) + end + + test "ignores Jason encoders backed by non-trivial to_map helpers" do + ast = + Code.string_to_quoted!(""" + defmodule Example do + defimpl Jason.Encoder, for: Example do + def encode(value, opts), do: Jason.Encode.map(Example.to_map(value), opts) + end + + def to_map(value), do: %{name: value.name} + end + """) + + assert [] = HandRolledEncoder.collect_ast(ast) + end + + test "ignores unrelated to_map helpers" do + ast = + Code.string_to_quoted!(""" + def to_map(value), do: %{value: value} + """) + + assert [] = HandRolledEncoder.collect_ast(ast) + end +end diff --git a/test/reach/plugins/jason/smells/hand_rolled_encoder_test.exs b/test/reach/plugins/jason/smells/hand_rolled_encoder_test.exs new file mode 100644 index 0000000..e5655bb --- /dev/null +++ b/test/reach/plugins/jason/smells/hand_rolled_encoder_test.exs @@ -0,0 +1,32 @@ +defmodule Reach.Plugins.Jason.Smells.HandRolledEncoderTest do + use ExUnit.Case, async: true + + alias Reach.Check.Smells + alias Reach.Plugins.Jason + alias Reach.Plugins.Jason.Smells.HandRolledEncoder + + test "Jason plugin contributes the hand-rolled encoder smell" do + assert HandRolledEncoder in Reach.Plugin.smell_checks([Jason]) + end + + test "detects hand-rolled JSON sanitizers only when the Jason plugin is enabled" do + path = Path.join(System.tmp_dir!(), "jason_smell_#{System.unique_integer([:positive])}.ex") + + File.write!(path, """ + defmodule Example do + def json_safe(%_{} = value), do: value |> Map.from_struct() |> json_safe() + def json_safe(value) when is_list(value), do: Enum.map(value, &json_safe/1) + def json_safe(value), do: value + end + """) + + project = Reach.Project.from_sources([path], plugins: [Jason]) + + assert [%{kind: :hand_rolled_json_sanitizer}] = Smells.run(project, []) + + project_without_plugins = Reach.Project.from_sources([path], plugins: []) + assert [] = Smells.run(project_without_plugins, []) + + File.rm(path) + end +end diff --git a/test/reach/scripts/evidence_corpus_scan_test.exs b/test/reach/scripts/evidence_corpus_scan_test.exs new file mode 100644 index 0000000..f27ec0f --- /dev/null +++ b/test/reach/scripts/evidence_corpus_scan_test.exs @@ -0,0 +1,117 @@ +defmodule Reach.Scripts.EvidenceCorpusScanTest do + use ExUnit.Case, async: true + + test "evidence corpus scanner supports text and json output" do + dir = + Path.join(System.tmp_dir!(), "reach-evidence-scan-#{System.unique_integer([:positive])}") + + lib = Path.join(dir, "lib") + File.mkdir_p!(lib) + + File.write!(Path.join(lib, "sample.ex"), """ + defmodule Sample do + def run(items), do: items |> Enum.map(&List.wrap/1) |> List.flatten() + end + """) + + assert {text, 0} = scan(["--kind", "stdlib", dir]) + assert text =~ "manual_flat_map=1" + + assert {json, 0} = scan(["--kind", "stdlib", "--format", "json", dir]) + assert [result] = Jason.decode!(json) + assert result["kind"] == "manual_flat_map" + assert result["family"] == "stdlib" + + File.rm_rf(dir) + end + + test "evidence corpus scanner suppresses parser warnings" do + dir = + Path.join( + System.tmp_dir!(), + "reach-evidence-warning-scan-#{System.unique_integer([:positive])}" + ) + + lib = Path.join(dir, "lib") + File.mkdir_p!(lib) + + File.write!(Path.join(lib, "sample.ex"), """ + defmodule Sample do + def deprecated_escape, do: "\\x{FF}" + def charlist, do: 'abc' + def run(items), do: items |> Enum.map(&List.wrap/1) |> List.flatten() + end + """) + + assert {json, 0} = scan(["--kind", "stdlib", "--format", "json", dir]) + assert [_result] = Jason.decode!(json) + refute json =~ "warning:" + + File.rm_rf(dir) + end + + test "evidence corpus scanner includes map-contract structured fields" do + dir = Path.join(System.tmp_dir!(), "reach-map-scan-#{System.unique_integer([:positive])}") + lib = Path.join(dir, "lib") + File.mkdir_p!(lib) + + File.write!(Path.join(lib, "sample.ex"), """ + defmodule Sample do + def build(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + end + end + """) + + assert {json, 0} = scan(["--kind", "map-contract", "--format", "json", dir]) + assert [result] = Jason.decode!(json) + assert result["family"] == "map_contract" + assert result["keys"] == ["email", "id", "name"] + assert result["variable"] == "data" + assert result["role"] == "unknown" + assert result["observed_keys"] == ["email", "id"] + assert result["unused_keys"] == ["name"] + assert result["read_count"] == 2 + assert result["mutation_count"] == 0 + assert result["escaped?"] == false + assert_in_delta result["key_coverage"], 2 / 3, 0.001 + + File.rm_rf(dir) + end + + test "evidence corpus scanner applies plugin refinements" do + dir = + Path.join(System.tmp_dir!(), "reach-map-refine-scan-#{System.unique_integer([:positive])}") + + lib = Path.join(dir, "lib") + File.mkdir_p!(lib) + + File.write!(Path.join(lib, "sample.ex"), """ + defmodule Sample do + def build(user) do + data = %{id: user.id, name: user.name, email: user.email} + data.id + data.email + Jason.encode!(data) + end + end + """) + + assert {json, 0} = scan(["--kind", "map-contract", "--format", "json", dir]) + assert [result] = Jason.decode!(json) + assert result["role"] == "external_payload" + + assert [%{"module" => "Elixir.Jason", "function" => "encode!", "arity" => 1}] = + result["escapes"] + + File.rm_rf(dir) + end + + defp scan(args) do + System.cmd("mix", ["run", "scripts/evidence_corpus_scan.exs", "--" | args], + stderr_to_stdout: true + ) + end +end diff --git a/test/reach/smell/checks/smell_test.exs b/test/reach/smell/checks/smell_test.exs index 73ebc1f..58e9694 100644 --- a/test/reach/smell/checks/smell_test.exs +++ b/test/reach/smell/checks/smell_test.exs @@ -22,6 +22,7 @@ defmodule Reach.SmellTest do assert Reach.Smell.Checks.ConfigPhase in checks assert Reach.Smell.Checks.PipelineWaste in checks assert Reach.Smell.Checks.TrivialDelegate in checks + assert Reach.Smell.Checks.StandardLibraryBypass in checks end end