Fix word-split: handle IPA brackets, contractions, and tiebreaker
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m57s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 41s
1. Strip IPA brackets [ipa] before attempting word split, so
"makeadecision[dɪsˈɪʒən]" is processed as "makeadecision"
2. Handle contractions: detach the apostrophe suffix before splitting, so "solet's" → "solet" + "'s" → "so let" + "'s"
3. DP tiebreaker: prefer longer first word when scores are equal
("task is" over "ta skis")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
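Item 3 refers to the dynamic-programming segmentation inside `_try_split_merged_word`, whose body is not part of this diff. A minimal sketch of such a tiebreaker follows. It assumes "fewest words wins" as the score and a plain Python set as the dictionary; `split_merged_word` and `VOCAB` are hypothetical names, not the project's code. It shows how breaking ties on first-word length turns "taskis" into "task is" rather than "ta skis".

```python
from typing import Optional


def split_merged_word(word: str, vocab: set[str]) -> Optional[str]:
    """Hypothetical stand-in for a merged-word splitter.

    Assumed scoring: fewer words is better; on a tie, prefer the
    segmentation whose first word is longer ("task is" over "ta skis").
    """
    n = len(word)
    # best[i] holds the best segmentation of word[i:], or None if impossible.
    best: list[Optional[list[str]]] = [None] * (n + 1)
    best[n] = []
    for i in range(n - 1, -1, -1):
        best_key = None
        for j in range(i + 1, n + 1):
            piece = word[i:j]
            rest = best[j]
            if piece not in vocab or rest is None:
                continue
            candidate = [piece] + rest
            # Smaller key wins: word count first, then longer first word.
            key = (len(candidate), -len(piece))
            if best_key is None or key < best_key:
                best_key = key
                best[i] = candidate
    words = best[0]
    if words is None or len(words) < 2:
        return None  # no valid segmentation, or nothing to split
    return " ".join(words)


# Hypothetical toy dictionary for the examples in the commit message.
VOCAB = {"task", "is", "ta", "skis", "so", "let", "make", "a", "decision"}
print(split_merged_word("taskis", VOCAB))  # task is
```

The tiebreaker is just the second element of the comparison key. With `key = len(candidate)` alone, ascending `j` and a strict `<` would keep the first tie found at position 0, which here is "ta skis".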
@@ -1752,8 +1752,27 @@ async def _build_grid_core(
 changed = False
 for token in text.split():
     # Try splitting pure-alpha tokens >= 4 chars
-    clean = token.rstrip(".,!?;:'\")")
-    suffix = token[len(clean):]
+    # Strip trailing punctuation AND IPA brackets
+    clean = token
+    # Remove trailing IPA like [dɪsˈɪʒən] first
+    bracket_pos = clean.find('[')
+    suffix_ipa = ""
+    if bracket_pos > 0:
+        suffix_ipa = clean[bracket_pos:]
+        clean = clean[:bracket_pos]
+    suffix_punct = ""
+    stripped = clean.rstrip(".,!?;:'\")")
+    if stripped != clean:
+        suffix_punct = clean[len(stripped):]
+        clean = stripped
+    suffix = suffix_punct + suffix_ipa
+    # Handle contractions: "solet's" → try "solet" + "'s"
+    contraction = ""
+    if "'" in clean and clean.index("'") >= 2:
+        apos_pos = clean.index("'")
+        contraction = clean[apos_pos:]
+        clean = clean[:apos_pos]
+        suffix = contraction + suffix
     if len(clean) >= 4 and clean.isalpha():
         split = _try_split_merged_word(clean)
         if split:
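The added lines peel each token into a core word and a suffix before the split is attempted. The same sequence can be sketched as a standalone function; `peel_token` is a hypothetical name, and because the hunk stops at `if split:`, how `suffix` is re-attached to the split result is not shown and is left out here.

```python
def peel_token(token: str) -> tuple[str, str]:
    """Hypothetical helper mirroring the hunk: return (clean, suffix)."""
    clean = token
    suffix_ipa = ""
    # Detach a trailing IPA block such as "[dɪsˈɪʒən]". A token that
    # *starts* with "[" (bracket_pos == 0) is left alone, as in the hunk.
    bracket_pos = clean.find("[")
    if bracket_pos > 0:
        suffix_ipa = clean[bracket_pos:]
        clean = clean[:bracket_pos]
    # Strip trailing punctuation; the slice is "" when nothing was stripped.
    stripped = clean.rstrip(".,!?;:'\")")
    suffix_punct = clean[len(stripped):]
    clean = stripped
    suffix = suffix_punct + suffix_ipa
    # Contraction: same test as `"'" in clean and clean.index("'") >= 2`,
    # so short prefixes like "'tis" or "o'clock" are not cut.
    apos_pos = clean.find("'")
    if apos_pos >= 2:
        suffix = clean[apos_pos:] + suffix
        clean = clean[:apos_pos]
    return clean, suffix


for tok in ["makeadecision[dɪsˈɪʒən]", "solet's,", "taskis."]:
    print(peel_token(tok))
# ('makeadecision', '[dɪsˈɪʒən]')
# ('solet', "'s,")
# ('taskis', '.')
```

Collecting everything after the core word into one `suffix` string keeps the original punctuation, contraction, and IPA text available for reassembly once the core has been split.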