Problem
When users author templates in Microsoft Word, three things commonly slip through that silently break both the analyzer and the renderer:
1. Smart quotes around choice literals. Word's autocorrect converts "siac" into “siac” (U+201C / U+201D). Inside a Jinja tag this is no longer valid syntax — e.g. {% if jurisdiction_type == “siac” %}. The current analyzer regex in scripts/analyze.py (RE_EQUALITY = re.compile(r"(\w+)\s*==\s*['\"](.+?)['\"]")) only recognises straight quotes, so the conditional is dropped from the manifest entirely, and docxtpl then fails or mis-renders at runtime.
2. {%p ... %} paragraph-removing tags. This is docxtpl-native syntax for "remove the containing paragraph after rendering." The analyzer regex matches \{%[-\s]*if / \{%[-\s]*for, which does not recognise the p prefix. Result: every {%p if %} / {%p for %} block is invisible to the analyzer, so the manifest shows zero conditionals/loops even though the template is full of them.
3. Undefined Jinja filters/functions. Templates occasionally reference helpers like {{ country_name(governing_law) }} that aren't registered with the environment. The analyzer doesn't warn about these, and rendering hard-fails later.
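A quick way to see the first two misses in isolation. RE_EQUALITY is the pattern quoted above; RE_IF_OPEN is my own name for the if-opener pattern described in item 2, since I haven't checked what analyze.py actually calls it.

```python
import re

# Pattern quoted in this issue from scripts/analyze.py.
RE_EQUALITY = re.compile(r"(\w+)\s*==\s*['\"](.+?)['\"]")
# If-opener pattern as described in item 2 (the name RE_IF_OPEN is hypothetical).
RE_IF_OPEN = re.compile(r"\{%[-\s]*if")

straight = '{% if jurisdiction_type == "siac" %}'
smart = "{% if jurisdiction_type == \u201csiac\u201d %}"  # Word's curly quotes
para = '{%p if jurisdiction_type == "siac" %}'            # docxtpl paragraph tag

print(RE_EQUALITY.search(straight) is not None)  # True
print(RE_EQUALITY.search(smart) is not None)     # False: curly quotes not in ['\"]
print(RE_IF_OPEN.search(straight) is not None)   # True
print(RE_IF_OPEN.search(para) is not None)       # False: the "p" sits between {% and if
```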
Repro
Given a Word template with:
{%p if jurisdiction_type == “siac” %}
... SIAC clause ...
{%p endif %}
{%p for party in parties_list %}
{{ party.name }}, incorporated in {{ party.place_of_incorporation }}
{%p endfor %}
This Agreement is governed by {{ country_name(governing_law) }}.
Run:
python scripts/analyze.py path/to/dir
Expected: 1 conditional (equality-gated on jurisdiction_type), 1 loop with sub-variables, and a warning about the undefined country_name callable.
Actual: 0 conditionals, 0 loops, no warning about country_name, and party.name / party.place_of_incorporation end up as flat top-level dotted variables instead of loop sub-variables.
Suggested fix — add a lint/sanitize pass in Phase 2
Before the two-pass analyzer runs, sanitize the extracted text within Jinja tag boundaries and emit warnings:
- Smart-quote normaliser: within every {% ... %} / {{ ... }} region, map U+201C U+201D U+2018 U+2019 → straight " / '. Warn with "replaced smart quotes in tag at offset N".
- {%p support: either extend the regexes to \{%p?[-\s]*if\s... (and the same for for / else / endif / endfor), or strip the p in a pre-pass and restore it before rendering. Handle the {%tr ...%} (table-row) and {%tc ...%} (table-cell) docxtpl tags the same way while you're in there.
- Undefined-callable warning: scan {{ ... }} bodies for \w+\( patterns and cross-reference against the environment's globals/filters. Flag unknowns so the author can either register the helper or rewrite the expression.
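Here is a rough sketch of what the three checks could look like, operating on already-extracted template text. All function and constant names are hypothetical and nothing here is taken from analyze.py. The `known` set is assumed to be built by the caller, for example from the keys of a Jinja environment's globals and filters. The call scan is a plain regex heuristic, so it will also flag parenthesised text inside string literals.

```python
import re

# Any {% ... %} or {{ ... }} region, including docxtpl's {%p / {%tr / {%tc forms.
RE_TAG = re.compile(r"\{%.*?%\}|\{\{.*?\}\}", re.DOTALL)
SMART_QUOTES = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)
# Block opener that tolerates docxtpl prefixes and Jinja whitespace control.
RE_BLOCK_OPEN = re.compile(r"\{%(?:p|tr|tc)?[-\s]*(if|for)\b")
# A bare name followed by "(", skipping method calls such as party.get(...).
RE_CALL = re.compile(r"(?<![\w.])(\w+)\(")


def normalise_smart_quotes(text):
    """Return (sanitized_text, warnings); only tag interiors are changed."""
    warnings = []

    def fix(match):
        tag = match.group(0)
        fixed = tag.translate(SMART_QUOTES)
        if fixed != tag:
            warnings.append(
                f"replaced smart quotes in tag at offset {match.start()}"
            )
        return fixed

    return RE_TAG.sub(fix, text), warnings


def count_block_openers(text):
    """Count if/for openers, with or without a p/tr/tc prefix."""
    kinds = [m.group(1) for m in RE_BLOCK_OPEN.finditer(text)]
    return {"if": kinds.count("if"), "for": kinds.count("for")}


def undefined_callables(text, known):
    """Names called inside {{ ... }} bodies that are not in `known`."""
    unknown = []
    for tag in re.finditer(r"\{\{(.*?)\}\}", text, re.DOTALL):
        for call in RE_CALL.finditer(tag.group(1)):
            name = call.group(1)
            if name not in known and name not in unknown:
                unknown.append(name)
    return unknown


template = (
    "{%p if jurisdiction_type == \u201csiac\u201d %}\n"
    "\u201cquoted prose stays curly\u201d\n"
    "{%p endif %}\n"
    "{%p for party in parties_list %}{{ party.name }}{%p endfor %}\n"
    "Governed by {{ country_name(governing_law) }}."
)
clean, warnings = normalise_smart_quotes(template)
print(warnings)
print(count_block_openers(clean))
print(undefined_callables(clean, known={"range"}))
```

Because each curly quote maps to a single straight quote, the sanitized text has the same length as the input, so the reported offsets are valid for both. Quotes in ordinary prose outside tags are left alone on purpose.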
Ideally the fixer can run in two modes: --lint (report only, non-zero exit on issues) and --fix (write a sanitized copy next to the template). The analyzer would consume the sanitized copy and cache the diff in the manifest, so the original template-of-record is never mutated.
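The two modes might be wired up roughly like this. It is only a sketch under several assumptions: the input is a plain-text file of extracted template text rather than the .docx itself, the ".sanitized" file naming is made up, and sanitize() is a stand-in that only handles smart quotes. Caching the diff in the manifest is left out.

```python
import argparse
import re
import sys
from pathlib import Path

RE_TAG = re.compile(r"\{%.*?%\}|\{\{.*?\}\}", re.DOTALL)
SMART_QUOTES = str.maketrans(
    {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}
)


def sanitize(text):
    """Stand-in for the full pass: straighten quotes inside tags only."""
    issues = []

    def fix(match):
        fixed = match.group(0).translate(SMART_QUOTES)
        if fixed != match.group(0):
            issues.append(
                f"replaced smart quotes in tag at offset {match.start()}"
            )
        return fixed

    return RE_TAG.sub(fix, text), issues


def main(argv=None):
    parser = argparse.ArgumentParser(description="Lint or fix template text.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--lint", action="store_true",
                      help="report only; exit status 1 if issues are found")
    mode.add_argument("--fix", action="store_true",
                      help="write a sanitized copy next to the input")
    parser.add_argument("path", type=Path,
                        help="extracted template text (assumed plain text)")
    args = parser.parse_args(argv)

    original = args.path.read_text(encoding="utf-8")
    cleaned, issues = sanitize(original)
    for issue in issues:
        print(f"{args.path}: {issue}", file=sys.stderr)

    if args.lint:
        return 1 if issues else 0

    # --fix: the original is never modified; a sibling file is written instead.
    out = args.path.with_name(args.path.stem + ".sanitized" + args.path.suffix)
    out.write_text(cleaned, encoding="utf-8")
    return 0
```

A real entry point would call sys.exit(main()). Returning the status from main() instead of exiting inside it keeps the function easy to call from tests or from the analyzer itself.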
Why this matters
All three issues are authoring accidents that Word introduces silently. In a real session I hit all three in a single uploaded template before the analyzer produced a usable manifest. A lint/fix pass in the analyzer would have caught them upfront with readable warnings instead of leaving an empty conditionals: [] / loops: [] manifest and failing at render time.