Words like "probieren)" or "Englisch)" were incorrectly flagged as
gutter OCR errors because the closing parenthesis was not stripped
before the dictionary lookup. The spellchecker then suggested "probierend",
replacing ")" with "d" at edit distance 1.
Two fixes:
1. Strip leading/trailing parentheses in _try_spell_fix and check whether
the bare word is valid; if it is, skip the correction
2. Add ")(" to the rstrip characters in the analysis phase so that
"probieren)" becomes "probieren" for the known-word check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Words ending with "-" whose stem is a known word (e.g. "wunder-", where
"wunder" is known) are valid line-break hyphenations, not gutter
errors. Gutter damage causes the hyphen to be LOST ("ve" instead of
"ver-"), so a visible hyphen plus a known stem indicates an intentional word wrap.
Two bugs fixed:
- The apply step no longer removes the continuation word from the next row:
"künden" stays in row 31, and only the current row is repaired
("ve" → "ver-"), so the original line-break layout is preserved.
- Analysis now skips words that already end with "-" when the direct
join with the next-row word is a known word (a valid hyphenation, not an error).
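A sketch of both behaviours, assuming the grid is a list of rows, each a list of OCR words. The helpers apply_hyphen_join and skip_existing_hyphen, the grid contents, and KNOWN_WORDS are hypothetical.

```python
from typing import List

# Hypothetical toy dictionary standing in for the real word lists.
KNOWN_WORDS = {"verkünden"}


def apply_hyphen_join(rows: List[List[str]], row: int, col: int, repaired: str) -> None:
    """Repair only the fragment in the current row; the next row is not modified."""
    rows[row][col] = repaired


def skip_existing_hyphen(word: str, next_word: str) -> bool:
    """Skip words already ending in '-' whose direct join with the next word is known."""
    return word.endswith("-") and (word[:-1] + next_word).lower() in KNOWN_WORDS


grid = [["zu", "ve"], ["künden", "und"]]
apply_hyphen_join(grid, 0, 1, "ver-")
# grid is now [["zu", "ver-"], ["künden", "und"]]: "künden" stays in its row
```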
The next-row word "künden," had a trailing comma, so the dictionary
lookup for the joined "verkünden," failed. The characters .,;:!? are now stripped before joining.
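A minimal sketch of the join with punctuation stripping. The helper join_with_next is hypothetical, and stripping from both ends of the next-row word is an assumption; the fix only describes a trailing comma.

```python
# Hypothetical toy dictionary standing in for the real word lists.
KNOWN_WORDS = {"verkünden"}
JOIN_STRIP_CHARS = ".,;:!?"


def join_with_next(fragment: str, next_word: str) -> str:
    """Join a row-end fragment with the next-row word, dropping punctuation first."""
    return fragment.rstrip("-") + next_word.strip(JOIN_STRIP_CHARS)


candidate = join_with_next("ver-", "künden,")
print(candidate, candidate.lower() in KNOWN_WORDS)  # prints: verkünden True
```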
- Lower min word length from 3→2 for hyphen-join candidates so fragments
like "ve" (from "ver-künden") are no longer skipped
- Return all spellchecker candidates instead of only the top one, so the user can
pick the correct form (e.g. "stammeln" vs "stammelt")
- Frontend shows clickable alternative buttons for spell_fix suggestions
- Backend accepts text_overrides in the apply endpoint for user-selected alternatives
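One way the pieces could fit together, sketched under assumptions: the Suggestion record, its field names, resolve_text, and falling back to the first candidate when no override is given are all hypothetical. Only the length threshold of 2 and the idea of a text_overrides mapping come from the change itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MIN_FRAGMENT_LEN = 2  # lowered from 3 so two-letter fragments such as "ve" qualify


@dataclass
class Suggestion:
    """Hypothetical suggestion record sent to the frontend."""
    suggestion_id: str
    original: str
    candidates: List[str]  # every spellchecker candidate, not only the top one


def is_join_candidate(fragment: str) -> bool:
    """Fragments at or above the minimum length are considered for hyphen joins."""
    return len(fragment) >= MIN_FRAGMENT_LEN


def resolve_text(s: Suggestion, text_overrides: Optional[Dict[str, str]] = None) -> str:
    """Prefer the user's selected alternative; otherwise fall back to the first candidate."""
    if text_overrides and s.suggestion_id in text_overrides:
        return text_overrides[s.suggestion_id]
    return s.candidates[0]


s = Suggestion("s1", "stammeli", ["stammeln", "stammelt"])
chosen = resolve_text(s, {"s1": "stammelt"})  # user clicked the "stammelt" button
```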
New step "Wortkorrektur" (word correction) between Grid-Review and Ground Truth that detects
and fixes words truncated or blurred at the book gutter (binding area) of
double-page scans. It uses pyspellchecker (DE+EN) for validation.
Two repair strategies:
- hyphen_join: words split across rows with missing chars (ve + künden → verkünden)
- spell_fix: garbled trailing chars from gutter blur (stammeli → stammeln)
Interactive frontend with per-suggestion accept/reject and batch controls.
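Both strategies can be illustrated with an edit-distance-1 candidate search in the style of Peter Norvig's spelling corrector, which pyspellchecker builds on. This is a self-contained sketch, not the step's actual code: KNOWN_WORDS replaces the DE+EN dictionaries, ALPHABET is an assumption, and hyphen_join/spell_fix here are hypothetical functions named after the strategies. Results are lowercased.

```python
from typing import List, Set

# Toy dictionary standing in for pyspellchecker's DE+EN word lists (assumption).
KNOWN_WORDS = {"verkünden", "stammeln", "stammelt"}
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"


def edits1(word: str) -> Set[str]:
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def known_candidates(word: str) -> List[str]:
    """Known words within edit distance 1, sorted for stable output."""
    return sorted(w for w in edits1(word.lower()) if w in KNOWN_WORDS)


def hyphen_join(fragment: str, next_word: str) -> List[str]:
    """Join a row-end fragment with the next-row word and restore missing chars."""
    joined = (fragment.rstrip("-") + next_word.strip(".,;:!?")).lower()
    if joined in KNOWN_WORDS:
        return [joined]
    return known_candidates(joined)  # "vekünden" -> ["verkünden"]


def spell_fix(word: str) -> List[str]:
    """Return every known correction for a word with garbled characters."""
    return known_candidates(word)  # "stammeli" -> ["stammeln", "stammelt"]
```

Returning the full candidate list from spell_fix, rather than a single best guess, is what lets the interactive frontend offer the user a choice between forms.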