fix: no longer strip parenthesized words like (probieren), (Profit) as garbled IPA

_strip_orphan_bracket removed German meaning glosses in parentheses
because they were recognized neither as grammar particles nor as IPA.
Fix: parenthesized content that looks like a real word (>=4 letters) is now kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-11 22:47:01 +01:00
parent 7d19145edb
commit 4afd5bd8e8


@@ -876,6 +876,12 @@ def _replace_phonetics_in_text(text: str, pronunciation: str = 'british') -> str
# Keep correct IPA (contains Unicode IPA characters)
if any(ch in _IPA_CHARS for ch in content):
    return m.group(0)
# Keep real-word parentheticals like (probieren), (Profit), (Geld).
# Heuristic: content with at least 4 letters is treated as a real word.
# Casing is not checked, so 4-letter garbled IPA fragments such as
# (kros) or (mais) are kept as well; only shorter ones like (cy) are stripped.
content_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßéèêëàâîïôûùç]', '', content)
if len(content_alpha) >= 4:
    return m.group(0)
logger.debug(f"phonetic: stripping orphan bracket '{m.group(0)}'")
return ''
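
To see which parentheticals the new check keeps, the decision can be pulled out into a standalone function. This is only a sketch: `keep_bracket` is a hypothetical helper name, the `_IPA_CHARS` set below is a small stand-in for the module's real constant (not shown in the diff), and the grammar-particle check that runs earlier in the real function is left out.

```python
import re

# Assumption: a small stand-in for the module's _IPA_CHARS constant.
_IPA_CHARS = set("ɑɒæɐəɛɜɪɔʊʌθðʃʒŋːˈˌ")

# Same character class as in the diff: ASCII letters plus German and
# French accented letters. Everything else is dropped before counting.
_NON_LETTERS = re.compile(r'[^a-zA-ZäöüÄÖÜßéèêëàâîïôûùç]')


def keep_bracket(content: str) -> bool:
    """Return True if a parenthetical with this inner text would be kept.

    Hypothetical helper mirroring the two checks visible in the hunk.
    """
    # Check 1: any IPA character means the bracket holds real IPA.
    if any(ch in _IPA_CHARS for ch in content):
        return True
    # Check 2 (added by this commit): at least 4 letters counts as a word.
    return len(_NON_LETTERS.sub('', content)) >= 4


if __name__ == "__main__":
    for sample in ["probieren", "Profit", "Geld", "cy", "kros", "ˈprɒfɪt"]:
        print(f"({sample}) kept: {keep_bracket(sample)}")
```

Because the threshold is purely a letter count, (Geld) and (kros) are treated the same way: both have four letters and both are kept. Telling them apart would need an extra signal, such as a dictionary lookup or a casing check.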