Parser Improvement: Double periods in list items ('i..') not cleaned
Severity: LOW
File: ocr_correct.py
Function: correct_ocr_errors
Estimated errors fixed: 1
Current Behavior
List markers followed by a double period, such as 'i..', are not corrected. This occurs when the OCR reads the list marker's period and a sentence-ending period as two consecutive periods, or when a period from the next line is merged into the current one.
Proposed Fix
Add a pattern to normalize double periods that appear at the end of list item markers or at the end of lines. A single-letter list marker followed by '..' should become the marker with a single period.
Code Before
# Common scanner artifacts
(re.compile(r'^[;,.]$', re.MULTILINE), ''), # Lone punctuation on a line
(re.compile(r'^\s*[-_]{3,}\s*$', re.MULTILINE), ''), # Horizontal rules from scan lines
Code After
# Common scanner artifacts
(re.compile(r'^[;,.]$', re.MULTILINE), ''), # Lone punctuation on a line
(re.compile(r'^\s*[-_]{3,}\s*$', re.MULTILINE), ''), # Horizontal rules from scan lines
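# Double period after list markers: 'i..' -> 'i.', 'a..' -> 'a.'
(re.compile(r'^([a-z])\.\.(\s)', re.MULTILINE), r'\1.\2'),
# Double period at end of line; the lookbehind leaves ellipses ('...') alone,
# and [ \t]* avoids swallowing the newline before a following blank line
(re.compile(r'(?<!\.)\.\.[ \t]*$', re.MULTILINE), '.'),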
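The effect of the two rules can be checked in isolation with a minimal sketch like the one below. It assumes the correction table is a list of (compiled pattern, replacement) pairs applied in order with `sub`; `RULES`, `apply_rules`, and the sample text are hypothetical names made up for illustration and are not taken from ocr_correct.py. The end-of-line rule is written in a conservative form that skips runs of three or more periods and trims only spaces and tabs.

```python
import re

# Hypothetical rule table holding only the two double-period rules.
RULES = [
    # Double period after a single-letter list marker: 'i..' -> 'i.'
    (re.compile(r'^([a-z])\.\.(\s)', re.MULTILINE), r'\1.\2'),
    # Double period at end of line; ellipses ('...') are left untouched
    (re.compile(r'(?<!\.)\.\.[ \t]*$', re.MULTILINE), '.'),
]


def apply_rules(text: str) -> str:
    """Apply each (pattern, replacement) pair in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


# Made-up sample: two list markers, an ellipsis, and a trailing double period
sample = "i.. the first item\nj..\nto be continued...\nend of clause..  \n\nnext"
print(apply_rules(sample))
```

On this sample the markers become 'i.' and 'j.', the ellipsis is preserved, and the blank line after 'end of clause.' survives.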
Generated by the Pasal.id Correction Agent (Opus 4.6) after analyzing 5 parser feedback entries.