feat: auto-insert syllable dividers via pyphen on dictionary pages

OCR engines don't detect | pipe chars used as syllable dividers in
dictionaries. After dictionary detection (is_dict=True), use pyphen
(MIT) to insert syllable breaks into headword cells. Tries DE first,
then EN. Skips IPA content, short words, and cells already containing |.

Also adds pyphen>=0.16.0 to requirements.txt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-03-24 14:17:26 +01:00
parent fe754398c0
commit 364086b86e
2 changed files with 68 additions and 0 deletions

View File

@@ -38,6 +38,9 @@ eng-to-ipa
# Spell-checker for rule-based OCR correction (MIT license)
pyspellchecker>=0.8.1
# Syllable hyphenation for dictionary pipe-divider insertion (MIT license)
pyphen>=0.16.0
# PostgreSQL (for metrics storage)
psycopg2-binary>=2.9.0
asyncpg>=0.29.0