Fix word-split: handle IPA brackets, contractions, and tiebreaker
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 46s
CI / test-python-klausur (push) Failing after 2m57s
CI / test-python-agent-core (push) Successful in 36s
CI / test-nodejs-website (push) Successful in 41s
1. Strip IPA brackets [ipa] before attempting the word split, so
   "makeadecision[dɪsˈɪʒən]" is processed as "makeadecision".
2. Handle contractions: "solet's" → split "solet" → "so let" + "'s".
3. DP tiebreaker: prefer the longer first word when scores are equal
   ("task is" over "ta skis").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
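
The tiebreaker in point 3 is not visible in the hunk shown here, so the following is only a minimal sketch of the idea, not the code in `_try_split_merged_word`. The function name `split_merged`, the toy vocabulary, and the scoring rule (sum of squared word lengths) are all assumptions chosen so that "task is" and "ta skis" tie on score.

```python
def split_merged(text, vocab):
    """Return the best split of `text` into vocab words, or None.

    Hypothetical scoring: sum of squared word lengths (higher is better).
    On equal scores, the candidate with the longer first word wins.
    """
    n = len(text)
    # best[i] = (score, length of first word, words) for the suffix text[i:]
    best = [None] * (n + 1)
    best[n] = (0, 0, [])
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n + 1):
            word = text[i:j]
            if word not in vocab or best[j] is None:
                continue
            score = len(word) ** 2 + best[j][0]
            cand = (score, len(word), [word] + best[j][2])
            # Compare score first; on a tie, the longer first word wins.
            if best[i] is None or cand[:2] > best[i][:2]:
                best[i] = cand
    return best[0][2] if best[0] else None


# Toy vocabulary (assumption): both splits of "taskis" score 16 + 4 = 20.
VOCAB = {"ta", "task", "skis", "is"}
print(split_merged("taskis", VOCAB))
```

Comparing on score alone with a strict `>` would keep whichever candidate is seen first, which here is "ta skis"; adding the first-word length as a second comparison key is what makes the result deterministic.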
@@ -1752,8 +1752,27 @@ async def _build_grid_core(
 changed = False
 for token in text.split():
     # Try splitting pure-alpha tokens >= 4 chars
-    clean = token.rstrip(".,!?;:'\")")
-    suffix = token[len(clean):]
+    # Strip trailing punctuation AND IPA brackets
+    clean = token
+    # Remove trailing IPA like [dɪsˈɪʒən] first
+    bracket_pos = clean.find('[')
+    suffix_ipa = ""
+    if bracket_pos > 0:
+        suffix_ipa = clean[bracket_pos:]
+        clean = clean[:bracket_pos]
+    suffix_punct = ""
+    stripped = clean.rstrip(".,!?;:'\")")
+    if stripped != clean:
+        suffix_punct = clean[len(stripped):]
+        clean = stripped
+    suffix = suffix_punct + suffix_ipa
+    # Handle contractions: "solet's" → try "solet" + "'s"
+    contraction = ""
+    if "'" in clean and clean.index("'") >= 2:
+        apos_pos = clean.index("'")
+        contraction = clean[apos_pos:]
+        clean = clean[:apos_pos]
+        suffix = contraction + suffix
     if len(clean) >= 4 and clean.isalpha():
         split = _try_split_merged_word(clean)
         if split:
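
To make the new token handling easier to try in isolation, here is a small standalone sketch of the same peeling steps. The helper name `peel_token` is hypothetical and does not exist in the commit; the logic follows the added lines above (IPA bracket first, then trailing punctuation, then contraction), with `str.find` used in place of the `in` / `index` pair.

```python
def peel_token(token):
    """Split a token into (clean, suffix) so that clean + suffix == token.

    `clean` is the part handed to the word splitter; `suffix` collects the
    contraction, trailing punctuation, and IPA bracket, in that order.
    """
    clean = token
    # 1. Cut off a trailing IPA bracket such as [dɪsˈɪʒən].
    suffix_ipa = ""
    bracket_pos = clean.find("[")
    if bracket_pos > 0:
        suffix_ipa = clean[bracket_pos:]
        clean = clean[:bracket_pos]
    # 2. Strip trailing punctuation from what is left.
    stripped = clean.rstrip(".,!?;:'\")")
    suffix = clean[len(stripped):] + suffix_ipa
    clean = stripped
    # 3. Peel a contraction, but only if at least two letters precede it.
    apos_pos = clean.find("'")
    if apos_pos >= 2:
        suffix = clean[apos_pos:] + suffix
        clean = clean[:apos_pos]
    return clean, suffix


for tok in ["makeadecision[dɪsˈɪʒən]", "solet's", "taskis."]:
    print(tok, "->", peel_token(tok))
```

Because the pieces are only ever moved from the end of `clean` to the front of `suffix`, the original token can always be rebuilt by concatenating the two parts.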