Fix word-split tiebreaker: prefer longer first word
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 39s
CI / test-python-klausur (push) Failing after 2m44s
CI / test-python-agent-core (push) Successful in 31s
CI / test-nodejs-website (push) Successful in 35s
"taskis" was split as "ta skis" instead of "task is" because both have the same DP score. Changed comparison from > to >= so that later candidates (with longer first words) win ties. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -758,7 +758,8 @@ def _try_split_merged_word(token: str) -> Optional[str]:
|
||||
dp[i] = (new_words, new_sq)
|
||||
else:
|
||||
old_key = (-len(dp[i][0]), dp[i][1])
|
||||
if new_key > old_key:
|
||||
if new_key >= old_key:
|
||||
# >= so that later splits (longer first word) win ties
|
||||
dp[i] = (new_words, new_sq)
|
||||
|
||||
if dp[n] is None or len(dp[n][0]) < 2:
|
||||
|
||||
Reference in New Issue
Block a user