[Parser] Missing OCR correction for dropped characters like 'p na' -> 'pidana' #18

## Parser Improvement: Missing OCR correction for dropped characters like 'p na' -> 'pidana'

**Severity:** MEDIUM
**File:** `ocr_correct.py`
**Function:** `correct_ocr_errors`
**Estimated errors fixed:** 1

### Current Behavior

Common Indonesian legal terms in which OCR has dropped characters are not corrected (e.g., 'p na' for 'pidana', or truncated words such as 'ket'). 'pidana' ("criminal"/"penal") is extremely common in legal text, so its OCR corruption 'p na', where the middle letters 'ida' are lost and replaced by a space, should be caught.
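A minimal reproduction, assuming `correct_ocr_errors` applies a list of `(compiled_pattern, replacement)` pairs in order with `Pattern.sub`. The actual loop in `ocr_correct.py` is not shown in this issue, so `CORRECTIONS_BEFORE` and `apply_corrections` are hypothetical stand-ins, and the sample sentence is invented:

```python
import re

# Hypothetical stand-in for the current table: only the entry shown in this issue
CORRECTIONS_BEFORE = [
    (re.compile(r'\bFRESIDEN\b', re.IGNORECASE), 'PRESIDEN'),  # P->F OCR confusion
]

def apply_corrections(text, corrections):
    """Apply each (pattern, replacement) pair in order (hypothetical stand-in for correct_ocr_errors)."""
    for pattern, replacement in corrections:
        text = pattern.sub(replacement, text)
    return text

sample = "FRESIDEN menetapkan sanksi p na bagi pelanggar."
print(apply_corrections(sample, CORRECTIONS_BEFORE))
# PRESIDEN menetapkan sanksi p na bagi pelanggar.
```

'FRESIDEN' is repaired, but 'p na' passes through unchanged because no entry targets it.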

### Proposed Fix

Add patterns for common legal terms in which OCR drops middle characters. 'pidana' is the highest-priority term, since it appears hundreds of times in criminal law documents. Anchor each pattern with word boundaries (`\b`) so that it only matches whole tokens and does not fire inside longer words (false positives).
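To illustrate why the word boundaries matter, the sketch below runs the proposed 'p na' pattern with and without `\b` anchors over an invented sentence. Without the anchors, the trailing 'p' of an ordinary word followed by a word starting with 'na' (here 'cap nanas') is rewritten as well:

```python
import re

# The proposed pattern, with and without word-boundary anchors (illustrative only)
bounded = re.compile(r'\bp\s+na\b', re.IGNORECASE)
unbounded = re.compile(r'p\s+na', re.IGNORECASE)

text = "cap nanas, sanksi p na"

print(bounded.sub('pidana', text))    # cap nanas, sanksi pidana
print(unbounded.sub('pidana', text))  # capidananas, sanksi pidana
```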

### Code Before

```python
    # Common word-level OCR errors in Indonesian legal text
    (re.compile(r'\bFRESIDEN\b', re.IGNORECASE), 'PRESIDEN'),     # P->F OCR confusion
```

### Code After

```python
    # Common word-level OCR errors in Indonesian legal text
    # Dropped characters in common legal terms
    (re.compile(r'\bp\s+na\b', re.IGNORECASE), 'pidana'),  # p na -> pidana (dropped 'ida')
    (re.compile(r'\bpida\s+na\b', re.IGNORECASE), 'pidana'),  # pida na -> pidana
    (re.compile(r'\bFRESIDEN\b', re.IGNORECASE), 'PRESIDEN'),     # P->F OCR confusion
```
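Because both new entries combine `re.IGNORECASE` with a fixed lowercase replacement, an all-caps heading such as 'KETENTUAN P NA' would come out as 'KETENTUAN pidana'. If that matters for the parser's output, one option is a callable replacement that copies the case of the matched text. This is a sketch only: `match_case`, `CORRECTIONS_AFTER` and `apply_corrections` are hypothetical names, not part of `ocr_correct.py`.

```python
import re

def match_case(word):
    """Return a re.sub callback that emits `word` in the case style of the match (hypothetical helper)."""
    def _replace(m):
        found = m.group(0)
        if found.isupper():          # e.g. 'P NA' -> 'PIDANA'
            return word.upper()
        if found[:1].isupper():      # e.g. 'P na' -> 'Pidana'
            return word.capitalize()
        return word.lower()          # e.g. 'p na' -> 'pidana'
    return _replace

# The two proposed entries, with case-preserving replacements
CORRECTIONS_AFTER = [
    (re.compile(r'\bp\s+na\b', re.IGNORECASE), match_case('pidana')),     # p na -> pidana
    (re.compile(r'\bpida\s+na\b', re.IGNORECASE), match_case('pidana')),  # pida na -> pidana
]

def apply_corrections(text, corrections):
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in corrections:
        text = pattern.sub(replacement, text)
    return text

print(apply_corrections("KETENTUAN P NA", CORRECTIONS_AFTER))          # KETENTUAN PIDANA
print(apply_corrections("Hukum p na dan pida na", CORRECTIONS_AFTER))  # Hukum pidana dan pidana
print(apply_corrections("P na umum", CORRECTIONS_AFTER))               # Pidana umum
```

With plain string replacements, as in the proposed entries, the first example would instead yield 'KETENTUAN pidana'.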

---

_Generated by the Pasal.id Correction Agent (Opus 4.6) after analyzing 5 parser feedback entries._

