Fix IPA continuation: don't replace normal text with IPA
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 41s
CI / test-go-edu-search (push) Successful in 43s
CI / test-python-klausur (push) Failing after 2m39s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 19s
Text like "Betonung auf der 1. Silbe: profit ['profit]" was incorrectly detected as garbled IPA and replaced with a generated IPA transcription of the previous row's example sentence.

Added a guard: if the cell text contains >=3 recognizable words (alphabetic tokens of 3+ letters), it is normal text, not garbled IPA. Garbled IPA is typically short and has no real dictionary words.

Fixes: Row 13 C3 showing IPA instead of pronunciation hint text.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -1112,6 +1112,13 @@ async def _build_grid_core(
 # Has real IPA symbols → already fixed or valid
 if any(c in _REAL_IPA_CHARS for c in cell_text):
     continue
+# Guard: if text contains multiple real words, it's
+# normal text (e.g. "Betonung auf der 1. Silbe:
+# profit"), not garbled IPA. Garbled IPA is
+# typically short and has no recognizable words.
+_words_in_text = re.findall(r'[A-Za-zÄÖÜäöüß]{3,}', cell_text)
+if len(_words_in_text) >= 3:
+    continue
 
 # Find headword in previous row, same column
 prev_ri = rows_sorted[idx - 1]["index"]
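The guard added in this hunk can be tried in isolation. The following is a minimal sketch, not the project's code: `REAL_IPA_CHARS` is a small hypothetical stand-in for the module's `_REAL_IPA_CHARS` (whose actual contents are not shown in the diff), and the helper names `looks_like_normal_text` and `should_skip_ipa_fix` are invented for illustration. Only the regular expression and the threshold of 3 are taken from the commit.

```python
import re

# Assumption: a tiny stand-in for _REAL_IPA_CHARS; the real set is not
# visible in this diff and is likely larger.
REAL_IPA_CHARS = frozenset("əɪʊɛɔæʌɑɒʃʒθðŋˈˌː")

# Same pattern as in the commit: runs of 3+ letters, including German umlauts.
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]{3,}")


def looks_like_normal_text(cell_text: str, min_words: int = 3) -> bool:
    """Return True if the cell holds at least `min_words` word-like tokens."""
    return len(WORD_RE.findall(cell_text)) >= min_words


def should_skip_ipa_fix(cell_text: str) -> bool:
    """Hypothetical wrapper mirroring the two `continue` checks in the hunk."""
    # Real IPA symbols present: already fixed or valid.
    if any(c in REAL_IPA_CHARS for c in cell_text):
        return True
    # Several recognizable words: a pronunciation hint, not garbled IPA.
    return looks_like_normal_text(cell_text)


if __name__ == "__main__":
    hint = "Betonung auf der 1. Silbe: profit ['profit]"
    print(should_skip_ipa_fix(hint))         # hint text is left untouched
    print(should_skip_ipa_fix("[pr0f1t]"))   # short garbled cell is still processed
```

The word count is only a heuristic: a cell with just one or two long tokens, such as `['profit]` alone, still falls through to the IPA replacement path, which matches the commit's premise that garbled IPA is short.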