Fix word-split: handle IPA brackets, contractions, and tiebreaker

1. Strip IPA brackets [ipa] before attempting word split, so
   "makeadecision[dɪsˈɪʒən]" is processed as "makeadecision"
2. Handle contractions: for "solet's", set aside "'s", split "solet" →
   "so let", then re-append "'s"
3. DP tiebreaker: prefer longer first word when scores are equal
   ("task is" over "ta skis")
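
The tiebreaker in item 3 can be sketched in isolation. This is a minimal illustration, not the code from this commit: the body of `_try_split_merged_word` is not shown here, so the function name `split_merged_word`, the word list `WORDS`, and the scoring rule (one point per word, fewer words win) are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical mini-dictionary; the real word list is not part of this diff.
WORDS = {"so", "let", "task", "is", "ta", "skis", "make", "a", "decision"}


def split_merged_word(text: str, words: set[str] = WORDS) -> Optional[list[str]]:
    """Segment `text` into dictionary words, or return None if no split exists."""
    n = len(text)
    # best[i] holds (score, words) for text[i:], or None if text[i:] cannot be segmented.
    best: list[Optional[tuple[int, list[str]]]] = [None] * (n + 1)
    best[n] = (0, [])
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            tail = best[j]
            word = text[i:j]
            if tail is None or word not in words:
                continue
            # Assumed scoring: every word costs one point, so fewer words score higher.
            score = tail[0] - 1
            current = best[i]
            # j only grows, so ">=" lets a later candidate with an equal score
            # (that is, a longer first word) replace the earlier one.
            if current is None or score >= current[0]:
                best[i] = (score, [word] + tail[1])
    result = best[0]
    # A single-word result means nothing was merged, so report "no split".
    if result is None or len(result[1]) < 2:
        return None
    return result[1]


print(split_merged_word("taskis"))  # ['task', 'is']
```

Under this scoring both "ta skis" and "task is" use two words, so only the tiebreaker decides between them. Replacing `>=` with `>` in the sketch would keep the first candidate found and return "ta skis" instead.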

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-12 09:13:02 +02:00
parent 4f4e6c31fa
commit ad78e26143
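
Items 1 and 2 of the commit message (IPA brackets and contractions) amount to peeling a token into a splittable core and a suffix that is re-attached afterwards. The following standalone sketch mirrors that logic so it can be tried outside `_build_grid_core`; the helper name `peel_token` is hypothetical and does not exist in the repository.

```python
def peel_token(token: str) -> tuple[str, str]:
    """Return (core, suffix) such that core + suffix == token."""
    core = token

    # 1. Cut off a trailing IPA transcription such as "[dɪsˈɪʒən]".
    #    A bracket at position 0 is left alone, since there would be no core.
    suffix_ipa = ""
    bracket_pos = core.find("[")
    if bracket_pos > 0:
        suffix_ipa = core[bracket_pos:]
        core = core[:bracket_pos]

    # 2. Cut off trailing punctuation.
    stripped = core.rstrip(".,!?;:'\")")
    suffix_punct = core[len(stripped):]
    core = stripped
    suffix = suffix_punct + suffix_ipa

    # 3. Cut off a contraction tail such as "'s", but only if at least
    #    two characters remain in front of the apostrophe.
    apos_pos = core.find("'")
    if apos_pos >= 2:
        suffix = core[apos_pos:] + suffix
        core = core[:apos_pos]

    return core, suffix


print(peel_token("makeadecision[dɪsˈɪʒən]"))  # ('makeadecision', '[dɪsˈɪʒən]')
print(peel_token("solet's"))                  # ('solet', "'s")
```

Because the pieces are only ever moved from the end of the core to the front of the suffix, `core + suffix` always reproduces the original token, which is what allows the split result to be re-joined with the suffix unchanged.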

@@ -1752,8 +1752,27 @@ async def _build_grid_core(
 changed = False
 for token in text.split():
     # Try splitting pure-alpha tokens >= 4 chars
-    clean = token.rstrip(".,!?;:'\")")
-    suffix = token[len(clean):]
+    # Strip trailing punctuation AND IPA brackets
+    clean = token
+    # Remove trailing IPA like [dɪsˈɪʒən] first
+    bracket_pos = clean.find('[')
+    suffix_ipa = ""
+    if bracket_pos > 0:
+        suffix_ipa = clean[bracket_pos:]
+        clean = clean[:bracket_pos]
+    suffix_punct = ""
+    stripped = clean.rstrip(".,!?;:'\")")
+    if stripped != clean:
+        suffix_punct = clean[len(stripped):]
+        clean = stripped
+    suffix = suffix_punct + suffix_ipa
+    # Handle contractions: "solet's" → try "solet" + "'s"
+    contraction = ""
+    if "'" in clean and clean.index("'") >= 2:
+        apos_pos = clean.index("'")
+        contraction = clean[apos_pos:]
+        clean = clean[:apos_pos]
+        suffix = contraction + suffix
     if len(clean) >= 4 and clean.isalpha():
         split = _try_split_merged_word(clean)
         if split: