feat: auto-insert syllable dividers via pyphen on dictionary pages
OCR engines don't detect | pipe chars used as syllable dividers in dictionaries. After dictionary detection (is_dict=True), use pyphen (MIT) to insert syllable breaks into headword cells. Tries DE first, then EN. Skips IPA content, short words, and cells already containing |. Also adds pyphen>=0.16.0 to requirements.txt. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -38,6 +38,9 @@ eng-to-ipa
|
||||
# Spell-checker for rule-based OCR correction (MIT license)
|
||||
pyspellchecker>=0.8.1
|
||||
|
||||
# Syllable hyphenation for dictionary pipe-divider insertion (MIT license)
|
||||
pyphen>=0.16.0
|
||||
|
||||
# PostgreSQL (for metrics storage)
|
||||
psycopg2-binary>=2.9.0
|
||||
asyncpg>=0.29.0
|
||||
|
||||
Reference in New Issue
Block a user