Add template linting and Word artifact sanitization by houfu · Pull Request #19 · houfu/coquill

houfu · 2026-05-06T14:57:16Z

Summary

This PR adds linting and sanitization to the template analyzer. It normalizes Word-introduced artifacts at analysis time and flags undefined Jinja callables that would otherwise fail only at render time.

Key Changes

Sanitization pipeline: Added sanitize_for_analysis() to normalize Word autocorrect artifacts (smart quotes, docxtpl prefixes like {%p) inside Jinja tag bodies before analysis. Original template files remain unchanged; sanitization is analysis-time only.
Callable detection: Implemented detect_undefined_callables() to flag Jinja function calls in templates that aren't registered with the default Jinja2 + docxtpl environment. Maintains a KNOWN_CALLABLES set of built-in globals and docxtpl helpers (range, dict, RichText, Subdoc, InlineImage, etc.).
Lint mode: Added --lint CLI flag that runs extract → sanitize → callable-detect, prints warnings to stdout, and exits non-zero if any are found. No manifest is written in this mode, allowing the Orchestrator to check for issues without side effects.
Warning collection: Modified build_manifest() to accept and include warnings in the output manifest under a warnings key. Each warning is formatted as "<code>: <detail>" with codes: smart_quote, docxtpl_prefix, undefined_callable.
Test coverage: Added test_word_artifacts() to verify sanitization and callable detection work correctly, and test_lint_flag() to verify --lint mode behavior.
Documentation: Updated SKILL.md with sanitization/lint warning details and lint-only invocation instructions.
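
To illustrate the lint mode and warning collection described above, here is a minimal sketch, not the code in this PR. The `run()` driver, the `analyze` and `write_manifest` callables, and the `build_manifest()` signature are hypothetical stand-ins. Only the `--lint` flag, the `warnings` key, and the "<code>: <detail>" warning shape come from the description.

```python
import argparse
from typing import Callable, Dict, List, Tuple

Manifest = Dict[str, List[str]]


def build_manifest(variables: List[str], warnings: List[str]) -> Manifest:
    """Hypothetical signature: carry warnings in the manifest under a `warnings` key."""
    return {"variables": list(variables), "warnings": list(warnings)}


def run(
    argv: List[str],
    analyze: Callable[[str], Tuple[List[str], List[str]]],
    write_manifest: Callable[[Manifest], None],
) -> int:
    """Return a process exit code; `analyze` yields (variables, warnings)."""
    parser = argparse.ArgumentParser(prog="analyzer")
    parser.add_argument("template")
    parser.add_argument("--lint", action="store_true")
    args = parser.parse_args(argv)

    variables, warnings = analyze(args.template)
    if args.lint:
        # Lint mode: report to stdout, write nothing, fail if anything was found.
        for warning in warnings:
            print(warning)
        return 1 if warnings else 0

    # Normal mode: warnings travel with the manifest instead of failing the run.
    write_manifest(build_manifest(variables, warnings))
    return 0
```

Keeping the manifest write behind the `--lint` branch is what lets a caller such as the Orchestrator rely on the exit code alone, with no files touched.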

Implementation Details

Smart quote replacement is scoped to tag bodies only (via _iter_tag_bodies()), preserving legitimate curly quotes in prose.
Docxtpl prefix stripping ({%p, {%tr, {%tc) is regex-based and generates warnings for each occurrence.
Callable detection uses regex to find name( patterns in tag bodies, filtering out Jinja keywords and known globals.
Cache check is skipped in --lint mode to ensure fresh analysis on every invocation.
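
The sanitization and callable-detection steps above could look roughly like the following. This is a simplified sketch under stated assumptions, not the implementation in this PR. The regexes, the warning detail text, and the contents of `JINJA_KEYWORDS` and `KNOWN_CALLABLES` are illustrative: only the function names, the three warning codes, and the three prefixes come from the description. It also ignores corner cases such as parentheses inside string literals.

```python
import re
from typing import List, Tuple

# Assumed tag matcher: {{ ... }} and {% ... %}, possibly spanning lines.
TAG_RE = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)
PREFIX_RE = re.compile(r"^\{%(p|tr|tc)\s")
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
SMART_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

# Illustrative subsets only; the real sets in the PR may differ.
JINJA_KEYWORDS = {"if", "elif", "else", "for", "in", "and", "or", "not", "is", "set"}
KNOWN_CALLABLES = {"range", "dict", "lipsum", "cycler", "joiner", "namespace",
                   "RichText", "Subdoc", "InlineImage"}


def sanitize_for_analysis(text: str) -> Tuple[str, List[str]]:
    """Return a cleaned copy of `text` plus warnings; the input is not mutated."""
    warnings: List[str] = []

    def fix(match: "re.Match[str]") -> str:
        tag = match.group(0)
        prefix = PREFIX_RE.match(tag)
        if prefix:
            warnings.append("docxtpl_prefix: {%" + prefix.group(1))
            tag = "{%" + tag[prefix.end(1):]
        if any(quote in tag for quote in SMART_QUOTES):
            warnings.append("smart_quote: " + tag)
            for quote, plain in SMART_QUOTES.items():
                tag = tag.replace(quote, plain)
        return tag

    # Only text matched by TAG_RE is rewritten, so curly quotes in prose survive.
    return TAG_RE.sub(fix, text), warnings


def detect_undefined_callables(text: str) -> List[str]:
    """Flag name( patterns in tag bodies that are neither keywords nor known globals."""
    warnings: List[str] = []
    for tag in TAG_RE.finditer(text):
        body = tag.group(0)[2:-2]
        for call in CALL_RE.finditer(body):
            # Skip filters (x|default(...)) and method calls (obj.method(...)).
            if body[:call.start()].rstrip().endswith(("|", ".")):
                continue
            name = call.group(1)
            if name not in JINJA_KEYWORDS and name not in KNOWN_CALLABLES:
                warnings.append("undefined_callable: " + name)
    return warnings
```

For example, given `{%p if tier == “gold” %}{{ discount(price) }}{%p endif %}`, this sketch would emit two `docxtpl_prefix` warnings and one `smart_quote` warning during sanitization, then one `undefined_callable` warning for `discount`.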

https://claude.ai/code/session_01JZSs2LFRxiQDXQbM2ErRys

Templates authored in Microsoft Word silently introduce three classes of artifacts that broke the analyzer and renderer: smart quotes inside Jinja tag bodies (Word autocorrect), docxtpl block prefixes ({%p / {%tr / {%tc) that the regex walker did not recognize, and undefined Jinja callables that hard-failed only at render time. The analyzer now runs a non-mutating sanitize-for-analysis pass before the two-pass walker, normalizes these artifacts in the analysis-time string, and records each correction (plus undefined callables) as a warning string in manifest.yaml. A new --lint flag reports warnings without writing the manifest and exits non-zero when any are found.

houfu linked an issue May 6, 2026 that may be closed by this pull request

Analyzer should sanitize smart quotes, {%p ...%} tags, and undefined filters before parsing #16

Open

houfu wants to merge 1 commit into 16-analyzer-should-sanitize-smart-quotes-p-tags-and-undefined-filters-before-parsing from claude/review-issue-16-A5SgZ
