Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m53s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 16s

Two fixes:
1. Step 5d now only treats cells as continuation when text is entirely
   inside brackets (e.g. "[n, nn]"). Cells with headwords outside brackets
   (e.g. "employee [im'ploi:]") are no longer overwritten.
2. fix_ipa_continuation_cell no longer skips grammar words like "down" —
   they are part of the headword in phrasal verbs like "close sth. down".
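
The two fixes above can be sketched as follows. This is a minimal illustration, not the code from this commit: `is_fully_bracketed`, `words_needing_ipa`, and the `PLACEHOLDERS` set are hypothetical names introduced here, and the bracket check simply mirrors the condition asserted in the new test.

```python
# Hypothetical sketch; none of these names come from the actual code base.

# Assumed placeholder abbreviations that carry no pronunciation of their own.
PLACEHOLDERS = {"sth.", "sb.", "sth", "sb"}


def is_fully_bracketed(text: str) -> bool:
    """Fix 1: True only when the whole cell is wrapped in brackets."""
    stripped = text.strip()
    return stripped.startswith("[") and stripped.endswith("]")


def words_needing_ipa(headword: str) -> list[str]:
    """Fix 2: keep particles such as 'down'; drop only placeholders."""
    return [w for w in headword.split() if w.lower() not in PLACEHOLDERS]


if __name__ == "__main__":
    for cell in ("[n, nn]", "employee [im'ploi:]"):
        print(repr(cell), is_fully_bracketed(cell))
    print(words_needing_ipa("close sth. down"))
```

The guard inspects only the first and last non-whitespace characters, so a cell with a headword in front of the bracket is left untouched, which is the behaviour fix 1 describes.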

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-20 00:43:51 +01:00
parent 5f89913a9a
commit 92a7b85c2d
3 changed files with 32 additions and 5 deletions

@@ -499,3 +499,24 @@ class TestGarbledIpaDetection:
        )
        assert fixed != "[1uedtX,1]"
        assert "ɪkwˈɪpmənt" in fixed  # equipment IPA

    def test_fix_continuation_close_down(self):
        """IPA continuation for 'close sth. down' → IPA for both words."""
        fixed = fix_ipa_continuation_cell(
            "[klaoz 'daun]", "close sth. down", pronunciation="british",
        )
        assert fixed != "[klaoz 'daun]"
        assert "klˈəʊs" in fixed  # close IPA
        assert "dˈaʊn" in fixed  # down IPA: must NOT be skipped

    def test_headword_with_brackets_not_continuation(self):
        """'employee [im'ploi:]' has a headword outside brackets → not a continuation cell.

        _text_has_garbled_ipa returns True (has ':'), but Step 5d should
        skip this cell because the text is not entirely inside brackets.
        """
        # The garbled check still triggers (has IPA-like ':')
        assert _text_has_garbled_ipa("employee [im'ploi:]") is True
        # But the text does NOT start with '[', so the Step 5d bracket guard blocks it
        text = "employee [im'ploi:]"
        assert not (text.strip().startswith('[') and text.strip().endswith(']'))