fix: Klammerwörter wie (probieren), (Profit) nicht mehr als garbled IPA entfernen
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 27s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 50s
CI / test-go-edu-search (push) Successful in 45s
CI / test-python-klausur (push) Failing after 2m12s
CI / test-python-agent-core (push) Successful in 23s
CI / test-nodejs-website (push) Successful in 27s
_strip_orphan_bracket entfernte deutsche Bedeutungsangaben in Klammern, weil sie weder als Grammar-Partikel noch als IPA erkannt wurden. Fix: Klammerinhalte mit echten Wörtern (>=4 Buchstaben) werden behalten. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -876,6 +876,12 @@ def _replace_phonetics_in_text(text: str, pronunciation: str = 'british') -> str
|
||||
# Keep correct IPA (contains Unicode IPA characters)
|
||||
if any(ch in _IPA_CHARS for ch in content):
|
||||
return m.group(0)
|
||||
# Keep real-word parentheticals like (probieren), (Profit), (Geld).
|
||||
# Garbled IPA fragments are short nonsense like (kros), (cy), (mais)
|
||||
# — they never contain a real word ≥4 letters with proper casing.
|
||||
content_alpha = re.sub(r'[^a-zA-ZäöüÄÖÜßéèêëàâîïôûùç]', '', content)
|
||||
if len(content_alpha) >= 4:
|
||||
return m.group(0)
|
||||
logger.debug(f"phonetic: stripping orphan bracket '{m.group(0)}'")
|
||||
return ''
|
||||
|
||||
|
||||
Reference in New Issue
Block a user