Fix 4 Grid Editor bugs: syllable modes, heading detection, word gaps
1. Syllable "Original" (auto) mode: only normalize cells that already
   have | from OCR; don't add new syllable marks via pyphen to words
   without printed dividers on the original scan.
2. Syllable "Aus" (none) mode: strip residual | chars from OCR text so
   cells display clean (e.g. "Zel|le" → "Zelle").
3. Heading detection: add a text length guard in the single-cell
   heuristic: words > 4 alpha chars starting lowercase (like "zentral")
   are regular vocabulary, not section headings.
4. Word-gap merge: new merge_word_gaps_in_zones() step with relaxed
   threshold (6 chars) fixes OCR splits like "zerknit tert" →
   "zerknittert".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
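
The length guard from item 3 can be illustrated with a small predicate. This is a hedged sketch, not the code from the commit: the helper name `looks_like_vocabulary` is hypothetical, and counting only alphabetic characters and checking the first alphabetic character for lowercase are assumptions about how the guard is applied. Only the rule itself (more than 4 alpha chars, lowercase start) comes from the commit message.

```python
def looks_like_vocabulary(text: str) -> bool:
    """Hypothetical helper: True if a single-cell text should be treated
    as regular vocabulary rather than as a section heading."""
    # Assumption: only alphabetic characters count toward the length.
    alpha = [ch for ch in text if ch.isalpha()]
    if not alpha:
        return False
    # Rule from the commit message: more than 4 alpha chars and a
    # lowercase start means regular vocabulary, not a heading.
    return len(alpha) > 4 and alpha[0].islower()


print(looks_like_vocabulary("zentral"))  # True
print(looks_like_vocabulary("Unit"))     # False
```

A single-cell heading heuristic could call such a predicate first and skip heading classification whenever it returns True.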
@@ -1593,6 +1593,13 @@ async def _build_grid_core(
     except Exception as e:
         logger.warning("Dictionary detection failed: %s", e)
 
+    # --- Word-gap merge: fix OCR splits like "zerknit tert" → "zerknittert" ---
+    try:
+        from cv_syllable_detect import merge_word_gaps_in_zones
+        merge_word_gaps_in_zones(zones_data, session_id)
+    except Exception as e:
+        logger.warning("Word-gap merge failed: %s", e)
+
     # --- Syllable divider insertion for dictionary pages ---
     # syllable_mode: "auto" = only when original has pipe dividers (1% threshold),
     # "all" = force on all content words, "en" = English column only,
@@ -1626,6 +1633,15 @@ async def _build_grid_core(
     except Exception as e:
         logger.warning("Syllable insertion failed: %s", e)
 
+    # When syllable mode is "none", strip any residual | from OCR so
+    # that the displayed text is clean (e.g. "Zel|le" → "Zelle").
+    if syllable_mode == "none":
+        for z in zones_data:
+            for cell in z.get("cells", []):
+                t = cell.get("text", "")
+                if "|" in t:
+                    cell["text"] = t.replace("|", "")
+
     # Clean up internal flags before returning
     for z in zones_data:
         for cell in z.get("cells", []):
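
The "none"-mode cleanup added in the second hunk can be exercised in isolation. The sketch below wraps the same loop in a hypothetical helper, `strip_syllable_marks`. The function name and the sample `zones` data are assumptions for illustration. The zone/cell dict layout simply mirrors what the diff reads via `z.get("cells", [])` and `cell.get("text", "")`.

```python
from typing import Any


def strip_syllable_marks(zones_data: list[dict[str, Any]]) -> None:
    """Hypothetical wrapper around the loop from the diff: remove
    residual '|' dividers from every cell's text, in place."""
    for zone in zones_data:
        # Zones without a "cells" key are skipped, as in the diff.
        for cell in zone.get("cells", []):
            text = cell.get("text", "")
            if "|" in text:
                cell["text"] = text.replace("|", "")


# Sample data (made up): one zone with two cells, one zone without cells.
zones = [{"cells": [{"text": "Zel|le"}, {"text": "Haus"}]}, {}]
strip_syllable_marks(zones)
print(zones[0]["cells"][0]["text"])  # Zelle
```

Cells without a `|` are left untouched, so applying the helper twice gives the same result as applying it once.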