Add template linting and Word artifact sanitization #19
Open
houfu wants to merge 1 commit into
Templates authored in Microsoft Word silently introduce three classes of
artifacts that broke the analyzer and renderer: smart quotes inside
Jinja tag bodies (Word autocorrect), docxtpl block prefixes
({%p / {%tr / {%tc) that the regex walker did not recognize, and
undefined Jinja callables that hard-failed only at render time. The
analyzer now runs a non-mutating sanitize-for-analysis pass before the
two-pass walker, normalizes these artifacts in the analysis-time string,
and records each correction (plus undefined callables) as a warning
string in manifest.yaml. A new --lint flag reports warnings without
writing the manifest and exits non-zero when any are found.
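A minimal sketch of what such a non-mutating pass could look like, not the PR's actual code. The function name `sanitize_for_analysis_sketch`, the regexes, and the choice to rewrite a prefixed tag to a plain `{%` tag are all assumptions made for illustration. Only the warning codes and the `"<code>: <detail>"` shape come from the PR description.

```python
import re

# Matches Jinja statement and expression tags: {% ... %} and {{ ... }}.
TAG_RE = re.compile(r"\{%.*?%\}|\{\{.*?\}\}", re.DOTALL)

# Word autocorrect quotes mapped to their ASCII equivalents.
SMART_QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

# docxtpl block prefixes {%p, {%tr, {%tc, each followed by whitespace.
PREFIX_RE = re.compile(r"^\{%(p|tr|tc)(?=\s)")


def sanitize_for_analysis_sketch(text):
    """Return (sanitized_text, warnings); the input string is not modified."""
    warnings = []

    def fix_tag(match):
        tag = match.group(0)
        prefix = PREFIX_RE.match(tag)
        if prefix:
            warnings.append(f"docxtpl_prefix: {prefix.group(0)}")
            # Assumption: analysis treats a prefixed tag like a plain {% tag.
            tag = "{%" + tag[prefix.end():]
        if any(quote in tag for quote in SMART_QUOTES):
            warnings.append(f"smart_quote: {tag}")
            for smart, plain in SMART_QUOTES.items():
                tag = tag.replace(smart, plain)
        return tag

    # Only text inside tags is rewritten; prose outside tags is left alone.
    return TAG_RE.sub(fix_tag, text), warnings
```

Because the substitution runs only on tag matches, curly quotes in surrounding prose pass through unchanged, and the returned string exists only for analysis.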
Summary
This PR adds linting and sanitization to the template analyzer. It detects and corrects Word-introduced artifacts at analysis time and flags undefined Jinja callables that would otherwise fail only at render time.
Key Changes
- Sanitization pipeline: Added `sanitize_for_analysis()` to normalize Word autocorrect artifacts (smart quotes, docxtpl prefixes like `{%p`) inside Jinja tag bodies before analysis. Original template files remain unchanged; sanitization is analysis-time only.
- Callable detection: Implemented `detect_undefined_callables()` to flag Jinja function calls in templates that aren't registered with the default Jinja2 + docxtpl environment. Maintains a `KNOWN_CALLABLES` set of built-in globals and docxtpl helpers (`range`, `dict`, `RichText`, `Subdoc`, `InlineImage`, etc.).
- Lint mode: Added `--lint` CLI flag that runs extract → sanitize → callable-detect, prints warnings to stdout, and exits non-zero if any are found. No manifest is written in this mode, allowing the Orchestrator to check for issues without side effects.
- Warning collection: Modified `build_manifest()` to accept and include warnings in the output manifest under a `warnings` key. Each warning is formatted as `"<code>: <detail>"` with codes: `smart_quote`, `docxtpl_prefix`, `undefined_callable`.
- Test coverage: Added `test_word_artifacts()` to verify sanitization and callable detection work correctly, and `test_lint_flag()` to verify `--lint` mode behavior.
- Documentation: Updated SKILL.md with sanitization/lint warning details and lint-only invocation instructions.
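The lint-mode behavior described above (print warnings, write nothing, exit non-zero when any exist) could be wired up roughly as follows. This is a sketch under assumptions: `analyze_template_stub` is a hypothetical stand-in for the extract → sanitize → callable-detect pipeline, and the positional `template` argument is invented for the example. Only the `--lint` flag and the exit-code contract come from the PR description.

```python
import argparse


def analyze_template_stub(path):
    """Hypothetical stand-in for extract -> sanitize -> callable-detect."""
    # A real analyzer would read `path`; this stub returns fixed warnings.
    return ["smart_quote: {% if name == 'x' %}", "undefined_callable: fmt_date"]


def main(argv=None, analyze=analyze_template_stub):
    parser = argparse.ArgumentParser(description="Template analyzer (sketch)")
    parser.add_argument("template")
    parser.add_argument("--lint", action="store_true",
                        help="report warnings only; do not write the manifest")
    args = parser.parse_args(argv)

    warnings = analyze(args.template)
    if args.lint:
        # Lint mode: print each warning and signal failure via the exit code.
        for warning in warnings:
            print(warning)
        return 1 if warnings else 0

    # Non-lint mode would write manifest.yaml with a `warnings` key here.
    return 0
```

Returning the exit code from `main()` instead of calling `sys.exit()` inside it keeps the function easy to test; a script entry point would call `sys.exit(main())`.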
Implementation Details
- Smart-quote normalization is scoped to Jinja tag bodies (via `_iter_tag_bodies()`), preserving legitimate curly quotes in prose.
- Detection of docxtpl prefixes (`{%p`, `{%tr`, `{%tc`) is regex-based and generates warnings for each occurrence.
- Callable detection looks for `name(` patterns in tag bodies, filtering out Jinja keywords and known globals.
- [...] `--lint` mode to ensure fresh analysis on every invocation.

https://claude.ai/code/session_01JZSs2LFRxiQDXQbM2ErRys
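The callable-detection approach (find `name(` patterns in tag bodies, then filter out Jinja keywords and known globals) might look roughly like the sketch below. The regexes, the keyword list, and the `_sketch` function name are assumptions. `KNOWN_CALLABLES` here holds only the five names the description lists, while the real set is said to contain more. The sketch skips filter calls and attribute access but does not handle Jinja tests or `name(` inside string literals.

```python
import re

# Assumed subset: only the names mentioned in the PR description.
KNOWN_CALLABLES = {"range", "dict", "RichText", "Subdoc", "InlineImage"}

# Jinja keywords that may be followed by "(" without being a call (assumed).
JINJA_KEYWORDS = {"if", "elif", "in", "not", "and", "or", "is", "for", "set"}

TAG_RE = re.compile(r"\{%.*?%\}|\{\{.*?\}\}", re.DOTALL)
FILTER_RE = re.compile(r"\|\s*[A-Za-z_]\w*")
# A bare identifier followed by "(", not preceded by a word char, "." or "|".
CALL_RE = re.compile(r"(?<![\w.|])([A-Za-z_]\w*)\s*\(")


def detect_undefined_callables_sketch(text):
    """Return warnings for names called in Jinja tags that are not known."""
    undefined = set()
    for tag in TAG_RE.findall(text):
        # Drop filter names so `x | default("a")` is not read as a call.
        body = FILTER_RE.sub("|", tag)
        for name in CALL_RE.findall(body):
            if name not in KNOWN_CALLABLES and name not in JINJA_KEYWORDS:
                undefined.add(name)
    return [f"undefined_callable: {name}" for name in sorted(undefined)]
```

A regex pass like this is cheap and needs no Jinja environment, at the cost of some false positives and negatives; parsing with Jinja's own AST would be stricter but would fail outright on templates that still contain unsanitized artifacts.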