Fix IPA continuation: don't replace normal text with IPA
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 19s

Text like "Betonung auf der 1. Silbe: profit ['profit]" was
incorrectly detected as garbled IPA and replaced with generated
IPA transcription of the previous row's example sentence.

Added guard: if the cell text contains >=3 recognizable words
(3+ letter alpha tokens), it's normal text, not garbled IPA.
Garbled IPA is typically short and has no real dictionary words.

Fixes: Row 13 C3 showing IPA instead of pronunciation hint text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-04-15 22:28:58 +02:00
parent 5fad2d420d
commit 0599c72cc1

View File

@@ -1112,6 +1112,13 @@ async def _build_grid_core(
# Has real IPA symbols → already fixed or valid
if any(c in _REAL_IPA_CHARS for c in cell_text):
continue
# Guard: if text contains multiple real words, it's
# normal text (e.g. "Betonung auf der 1. Silbe:
# profit"), not garbled IPA. Garbled IPA is
# typically short and has no recognizable words.
_words_in_text = re.findall(r'[A-Za-zÄÖÜäöüß]{3,}', cell_text)
if len(_words_in_text) >= 3:
continue
# Find headword in previous row, same column
prev_ri = rows_sorted[idx - 1]["index"]