Fix SmartSpellChecker: preserve leading non-alpha text like (=
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 33s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 33s
The tokenizer regex only matches alphabetic characters, so text before the first word match (like "(= " in "(= I won...") was silently dropped when reassembling the corrected text. Now preserves text[:first_match_start] as a leading prefix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -534,6 +534,13 @@ class SmartSpellChecker:
|
||||
|
||||
# --- Pass 3: Per-word correction ---
|
||||
parts: List[str] = []
|
||||
|
||||
# Preserve any leading text before the first token match
|
||||
# (e.g., "(= " before "I won and he lost.")
|
||||
first_start = tokens[0].start() if tokens else 0
|
||||
if first_start > 0:
|
||||
parts.append(text[:first_start])
|
||||
|
||||
for i, (word, sep) in enumerate(token_list):
|
||||
# Skip words inside IPA brackets (brackets land in separators)
|
||||
prev_sep = token_list[i - 1][1] if i > 0 else ""
|
||||
|
||||
Reference in New Issue
Block a user