New features:
- Boundary repair: "ats th." → "at sth." (shifted OCR word boundaries)
Tries shifting 1-2 chars between adjacent words, accepts if result
includes a known abbreviation or produces better dictionary matches
- Context split: "anew book" → "a new book" (ambiguous word merges)
Explicit allow/deny list for article+word patterns (alive, alone, etc.)
- Abbreviation awareness: 120+ known abbreviations (sth, sb, adj, etc.)
are now recognized as valid words, preventing false corrections
- Quality gate: boundary repairs only accepted when result scores
higher than original (known words + abbreviations)
40 tests passing, all edge cases covered.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>