Strip English IPA when mode excludes EN (nur DE / Aus)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-agent-core (push) Has been cancelled
CI / test-nodejs-website (push) Has been cancelled
CI / test-python-klausur (push) Has been cancelled
English IPA from the original OCR scan (e.g. [ˈgrænˌdæd]) was always shown because fix_cell_phonetics only adds or corrects IPA and never removes it. This commit strips bracketed groups containing Unicode IPA characters from the EN column when ipa_mode is "de", and from all content columns when ipa_mode is "none".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
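The bracket-stripping rule described above can be tried in isolation. The following is a minimal sketch that reuses the regular expression from this commit; the helper name `strip_ipa_brackets` and the sample strings are assumptions made for illustration and do not appear in the codebase.

```python
import re

# Same pattern as in the commit: a bracket group is treated as IPA only if
# it contains at least one character from this IPA-specific set.
IPA_BRACKET_STRIP_RE = re.compile(
    r'\s*\[[^\]]*[ˈˌːɑɒæɛəɜɪɔʊʌðŋθʃʒɹɡɾʔɐ][^\]]*\]'
)


def strip_ipa_brackets(text: str) -> str:
    """Hypothetical helper: remove IPA bracket groups, then trim whitespace."""
    return IPA_BRACKET_STRIP_RE.sub("", text).strip()


samples = [
    "granddad [ˈgrænˌdæd]",  # stress marks and 'æ' mark this as IPA
    "to go [irr.]",          # plain ASCII note, left untouched
    "cat [kæt] (animal)",    # only the bracket group is removed
]
results = [strip_ipa_brackets(s) for s in samples]
```

Because the pattern requires at least one IPA-specific character, a transcription written entirely in ASCII letters (for example `[pen]`) is not matched and stays in place. That keeps ordinary bracketed notes safe, at the cost of missing such transcriptions.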
@@ -1002,6 +1002,24 @@ async def _build_grid_core(
         en_ipa_target_cols.add(en_col_type)
     de_ipa_target_cols = all_content_cols - en_ipa_target_cols

+    # --- Strip IPA from columns NOT in the target set ---
+    # When user selects "nur DE", English IPA from the OCR scan must
+    # be removed. When "none", all IPA is removed.
+    _IPA_BRACKET_STRIP_RE = re.compile(r'\s*\[[^\]]*[ˈˌːɑɒæɛəɜɪɔʊʌðŋθʃʒɹɡɾʔɐ][^\]]*\]')
+    strip_en_ipa = en_col_type and en_col_type not in en_ipa_target_cols
+    if strip_en_ipa or ipa_mode == "none":
+        strip_cols = {en_col_type} if strip_en_ipa and ipa_mode != "none" else all_content_cols
+        for cell in all_cells:
+            ct = cell.get("col_type", "")
+            if ct not in strip_cols:
+                continue
+            text = cell.get("text", "")
+            if "[" in text:
+                stripped = _IPA_BRACKET_STRIP_RE.sub("", text)
+                if stripped != text:
+                    cell["text"] = stripped.strip()
+                    cell["_ipa_corrected"] = True
+
     # --- English IPA (Britfone + eng_to_ipa) ---
     if en_ipa_target_cols:
         for cell in all_cells:
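To see how the column selection in this hunk behaves for each mode, the stripping step can be lifted into a standalone function. This is a sketch under stated assumptions, not the project's code: the function name `strip_unwanted_ipa`, the column type names `column_en` and `column_de`, and the sample cells are all hypothetical, and the empty `en_ipa_target_cols` set stands in for whatever the surrounding function computes when English IPA is not requested.

```python
import copy
import re

_IPA_RE = re.compile(r'\s*\[[^\]]*[ˈˌːɑɒæɛəɜɪɔʊʌðŋθʃʒɹɡɾʔɐ][^\]]*\]')


def strip_unwanted_ipa(cells, all_content_cols, en_col_type,
                       en_ipa_target_cols, ipa_mode):
    """Hypothetical standalone version of the stripping step in the hunk."""
    strip_en = bool(en_col_type) and en_col_type not in en_ipa_target_cols
    if not (strip_en or ipa_mode == "none"):
        return cells  # nothing to strip in this mode

    # "none" clears every content column; otherwise only the EN column.
    strip_cols = all_content_cols if ipa_mode == "none" else {en_col_type}

    for cell in cells:
        if cell.get("col_type", "") not in strip_cols:
            continue
        text = cell.get("text", "")
        stripped = _IPA_RE.sub("", text)
        if stripped != text:
            cell["text"] = stripped.strip()
            cell["_ipa_corrected"] = True
    return cells


# Hypothetical sample data; the column type names are assumptions.
cells = [
    {"col_type": "column_en", "text": "granddad [ˈgrænˌdæd]"},
    {"col_type": "column_de", "text": "Opa [ˈoːpa]"},
]
all_cols = {"column_en", "column_de"}

de_result = strip_unwanted_ipa(
    copy.deepcopy(cells), all_cols, "column_en", set(), "de")
none_result = strip_unwanted_ipa(
    copy.deepcopy(cells), all_cols, "column_en", set(), "none")
```

In the "de" run only the English cell loses its bracket group and gains the `_ipa_corrected` flag, while the "none" run clears both cells. The sketch omits the `"[" in text` pre-check from the hunk, which is only a shortcut that skips the regex for cells without brackets.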