Analyzer should sanitize smart quotes, {%p ...%} tags, and undefined filters before parsing

## Problem

When users author templates in Microsoft Word, three things commonly slip through that silently break both the analyzer and the renderer:

**1. Smart quotes around choice literals.** Word's autocorrect converts `"siac"` into `“siac”` (U+201C / U+201D). Inside a Jinja tag this is no longer valid syntax — e.g. `{% if jurisdiction_type == “siac” %}`. The current analyzer regex in `scripts/analyze.py` (`RE_EQUALITY = re.compile(r"(\w+)\s*==\s*['\"](.+?)['\"]")`) only recognises straight quotes, so the conditional is dropped from the manifest entirely, and docxtpl then fails or mis-renders at runtime.

**2. `{%p ... %}` paragraph-removing tags.** This is docxtpl-native syntax for "remove the containing paragraph after rendering." The analyzer regex matches `\{%[-\s]*if` / `\{%[-\s]*for`, which does not recognise the `p` prefix. Result: every `{%p if %}` / `{%p for %}` block is invisible to the analyzer, so the manifest shows zero conditionals/loops even though the template is full of them.

**3. Undefined Jinja filters/functions.** Templates occasionally reference helpers like `{{ country_name(governing_law) }}` that aren't registered with the environment. The analyzer doesn't warn about these, and rendering hard-fails later.
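The first two failures can be reproduced without a Word file. This is a minimal sketch that uses only the regex fragments quoted above; the sample strings are illustrative, and `RE_IF` is a hypothetical name for the partial `if` pattern.

```python
import re

# Regexes as quoted in this issue; the `if` pattern is shown only in part there.
RE_EQUALITY = re.compile(r"(\w+)\s*==\s*['\"](.+?)['\"]")
RE_IF = re.compile(r"\{%[-\s]*if")

straight = '{% if jurisdiction_type == "siac" %}'
smart = "{% if jurisdiction_type == \u201csiac\u201d %}"  # Word-style quotes
p_tag = '{%p if jurisdiction_type == "siac" %}'

print(RE_EQUALITY.search(straight) is not None)  # True: straight quotes match
print(RE_EQUALITY.search(smart) is not None)     # False: smart quotes are skipped
print(RE_IF.search(straight) is not None)        # True: plain tag matches
print(RE_IF.search(p_tag) is not None)           # False: "p" prefix blocks the match
```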

## Repro

Given a Word template with:

```jinja
{%p if jurisdiction_type == “siac” %}
... SIAC clause ...
{%p endif %}
{%p for party in parties_list %}
{{ party.name }}, incorporated in {{ party.place_of_incorporation }}
{%p endfor %}
This Agreement is governed by {{ country_name(governing_law) }}.
```

Run:

```bash
python scripts/analyze.py path/to/dir
```

**Expected:** 1 conditional (equality-gated on `jurisdiction_type`), 1 loop with sub-variables, and a warning about the undefined `country_name` callable.

**Actual:** 0 conditionals, 0 loops, no warning about `country_name`, and `party.name` / `party.place_of_incorporation` end up as flat top-level dotted variables.

## Suggested fix — add a lint/sanitize pass in Phase 2

Before the two-pass analyzer runs, sanitize the extracted text within Jinja tag boundaries and emit warnings:

- **Smart-quote normaliser:** within every `{% ... %}` / `{{ ... }}` region, map `U+201C U+201D U+2018 U+2019` → straight `"` / `'`. Warn with "replaced smart quotes in tag at offset N".
- **`{%p` support:** either extend the regexes to `\{%p?[-\s]*if\s...` (and the same for `for` / `else` / `endif` / `endfor`), or strip the `p` in a pre-pass and restore it before rendering. The `{%tr ...%}` (table-row) and `{%tc ...%}` (table-cell) docxtpl tags need equivalent handling.
- **Undefined-callable warning:** scan `{{ ... }}` bodies for call patterns (`\w+\(`) and filter references (`| name`), and cross-reference them against the environment's globals and filters. Flag unknowns so the author can either register the helper or rewrite the expression.
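The three checks above could look roughly like the sketch below. Every name here is hypothetical and nothing is taken from `scripts/analyze.py`. The call pattern is a slightly stricter variant of `\w+\(` that skips method calls such as `x.upper()`, and the known globals and filters are passed in as plain sets (in practice they could come from the Jinja `Environment.globals` and `Environment.filters` mappings).

```python
import re

# Hypothetical names throughout; a sketch, not the analyzer's actual code.
SMART_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
RE_TAG = re.compile(r"\{%.*?%\}|\{\{.*?\}\}", re.DOTALL)
# Optional docxtpl prefix (p, tr, tc) directly after "{%".
RE_BLOCK = re.compile(r"\{%(?:p|tr|tc)?[-\s]*(if|for|else|endif|endfor)\b")
RE_CALL = re.compile(r"(?<![\w.])([A-Za-z_]\w*)\s*\(")  # bare calls, not methods
RE_FILTER = re.compile(r"\|\s*([A-Za-z_]\w*)")


def normalise_quotes(text):
    """Replace smart quotes inside Jinja tags only; return (text, warnings)."""
    warnings = []

    def fix(match):
        tag = match.group(0)
        fixed = tag.translate(SMART_QUOTES)
        if fixed != tag:
            warnings.append(f"replaced smart quotes in tag at offset {match.start()}")
        return fixed

    return RE_TAG.sub(fix, text), warnings


def undefined_names(text, known_globals, known_filters):
    """Return sorted callables/filters used in {{ ... }} that are not registered."""
    unknown = set()
    for body in re.findall(r"\{\{(.*?)\}\}", text, re.DOTALL):
        unknown.update(n for n in RE_CALL.findall(body) if n not in known_globals)
        unknown.update(n for n in RE_FILTER.findall(body) if n not in known_filters)
    return sorted(unknown)


sample = (
    "{%p if jurisdiction_type == \u201csiac\u201d %}SIAC clause{%p endif %} "
    "{{ country_name(governing_law) }}"
)
clean, warnings = normalise_quotes(sample)
print(warnings)                 # ['replaced smart quotes in tag at offset 0']
print(RE_BLOCK.findall(clean))  # ['if', 'endif']
print(undefined_names(clean, known_globals={"range"}, known_filters={"upper"}))
# ['country_name']
```

Restricting the quote replacement to tag regions leaves typographic quotes in the body text of the contract untouched, which is presumably what the author wants.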

Ideally the fixer can run in two modes: `--lint` (report only, non-zero exit on issues) and `--fix` (write a sanitized copy next to the template). The analyzer would consume the sanitized copy and cache the diff in the manifest, so the original template-of-record is never mutated.
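A rough sketch of the two modes, assuming the fixer operates on already-extracted template text rather than on the `.docx` itself. The `sanitize` function is a stand-in that replaces smart quotes everywhere (a real pass would confine itself to tag regions), and the `.sanitized` file suffix is an invented convention.

```python
import argparse
import sys
from pathlib import Path

SMART_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def sanitize(text):
    """Stand-in sanitizer (hypothetical): returns (clean_text, issues)."""
    clean = text.translate(SMART_QUOTES)
    issues = ["smart quotes found"] if clean != text else []
    return clean, issues


def main(argv=None):
    parser = argparse.ArgumentParser(description="Lint or fix extracted template text.")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--lint", action="store_true", help="report only; exit 1 on issues")
    mode.add_argument("--fix", action="store_true", help="write a sanitized copy")
    parser.add_argument("path", type=Path)
    args = parser.parse_args(argv)

    clean, issues = sanitize(args.path.read_text(encoding="utf-8"))
    for issue in issues:
        print(f"{args.path}: {issue}", file=sys.stderr)
    if args.lint:
        return 1 if issues else 0
    # --fix: write next to the original, which is left untouched.
    out = args.path.with_name(args.path.stem + ".sanitized" + args.path.suffix)
    out.write_text(clean, encoding="utf-8")
    return 0
```

A script entry point would call `sys.exit(main())`, so that `--lint` can gate a CI step on its exit code.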

## Why this matters

All three issues are authoring accidents that Word introduces silently. In a real session I hit all three in a single uploaded template before the analyzer produced a usable manifest. A lint/fix pass in the analyzer would have caught them upfront with readable warnings instead of leaving an empty `conditionals: []` / `loops: []` manifest and failing at render time.
