Real dictionary pages have only ~3% OCR-detected pipes because the thin
syllable divider lines are hard for OCR to read. The primary false-positive
guard (article_col_index check) already blocks synonym dictionaries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rewrite cv_syllable_detect.py with pyphen-first approach:
- Remove unreliable CV gate (morphological pipe detection)
- Strip existing pipes and re-syllabify via pyphen (DE then EN)
- Merge pipe-gap spaces where OCR split words at divider positions
- Guard merges with function word blacklist and punctuation checks
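
A minimal sketch of the strip-and-re-syllabify step and the merge guard listed above. The names `resyllabify`, `can_merge`, and `FUNCTION_WORDS` are illustrative assumptions, not the actual contents of cv_syllable_detect.py, and the two lookup-table hyphenators are stand-ins for real pyphen dictionaries:

```python
from typing import Callable, Iterable

# A hyphenator takes a bare word and returns it with "|" between syllables.
Hyphenator = Callable[[str], str]

# Hypothetical blacklist; the real list used by the commit is not shown.
FUNCTION_WORDS = {"der", "die", "das", "und", "in", "zu", "the", "a", "of", "to"}


def resyllabify(word: str, hyphenators: Iterable[Hyphenator]) -> str:
    """Drop OCR pipes, then return the first hyphenation that splits the word."""
    bare = word.replace("|", "")
    for hyphenate in hyphenators:  # e.g. German first, then English
        candidate = hyphenate(bare)
        if "|" in candidate:
            return candidate
    return bare  # no dictionary produced a split; keep the word unmarked


def can_merge(left: str, right: str) -> bool:
    """Guard for joining two tokens that OCR may have split at a divider."""
    left, right = left.replace("|", ""), right.replace("|", "")
    if not left or not right:
        return False
    if not (left.isalpha() and right.isalpha()):
        return False  # punctuation or digits: treat as separate tokens
    if left.lower() in FUNCTION_WORDS or right.lower() in FUNCTION_WORDS:
        return False  # "der Hund" must never become "derHund"
    return True


# Lookup-table stand-ins so the sketch runs without pyphen installed.
def demo_de(word: str) -> str:
    return {"Wasser": "Was|ser"}.get(word, word)


def demo_en(word: str) -> str:
    return {"water": "wa|ter"}.get(word, word)
```

With pyphen installed, the stand-ins could be replaced by `functools.partial(pyphen.Pyphen(lang="de_DE").inserted, hyphen="|")` and the `en_US` equivalent, passed in that order.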
Add false-positive prevention:
- Pre-check: skip if <5% of cells have existing | from OCR
- Call-site check: require article_col_index (der/die/das column)
- Prevents syllabification of synonym dictionaries and word lists
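
The two guards above can be sketched as follows. The function names and the choice to ignore empty cells are assumptions made for illustration; only the 5% threshold and the `article_col_index` requirement come from this message:

```python
from typing import Optional, Sequence

MIN_PIPE_RATIO = 0.05  # skip pages where fewer than 5% of cells contain "|"


def pipe_ratio(cells: Sequence[str]) -> float:
    """Fraction of non-empty cells that already contain an OCR-detected pipe."""
    non_empty = [cell for cell in cells if cell.strip()]
    if not non_empty:
        return 0.0
    return sum("|" in cell for cell in non_empty) / len(non_empty)


def should_syllabify(cells: Sequence[str], article_col_index: Optional[int]) -> bool:
    """Apply the call-site check first, then the pipe-ratio pre-check."""
    if article_col_index is None:
        return False  # no der/die/das column: likely a synonym list or word list
    return pipe_ratio(cells) >= MIN_PIPE_RATIO
```

Keeping the threshold in one named constant makes it easy to lower or remove if real pages turn out to carry fewer detected pipes than expected.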
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1. Extracted 1367 lines of helper functions from grid_editor_api.py
   (3051→1620 lines) into grid_editor_helpers.py (filters, detectors,
   zone grid building).
2. Created cv_syllable_detect.py with generic CV+pyphen logic:
   - Checks EVERY word_box for vertical pipe lines (not just first word)
   - No article-column dependency: works with any dictionary layout
   - CV morphological detection gates pyphen insertion
3. Grid editor scroll: calc(100vh - 200px) for reliable scrolling.
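
The per-word_box pipe check from item 2 can be illustrated with a plain-Python stand-in. In OpenCV this kind of test is commonly done with a morphological opening using a tall, one-pixel-wide kernel; the version below only looks for a long vertical run of ink pixels in a binarized box. The function name, the list-of-lists image format, and the 0.7 height fraction are assumptions, not the values used in cv_syllable_detect.py:

```python
import math
from typing import Sequence


def has_vertical_line(box: Sequence[Sequence[int]], min_fraction: float = 0.7) -> bool:
    """Return True if any column holds an unbroken ink run spanning most of the box.

    `box` is a binarized crop of one word_box: rows of 0 (background) / 1 (ink).
    """
    height = len(box)
    if height == 0 or len(box[0]) == 0:
        return False
    needed = math.ceil(height * min_fraction)
    for x in range(len(box[0])):
        run = 0
        for y in range(height):
            if box[y][x]:
                run += 1
                if run >= needed:
                    return True  # tall enough to be a divider candidate
            else:
                run = 0  # the run was broken; start counting again
    return False
```

A gate like this would run on every word_box and allow pyphen insertion only when enough boxes show a divider. Tall letter strokes such as "l" can also pass this simplified test, so a real detector would need additional width or position checks.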
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>