Fix SmartSpellChecker: preserve leading non-alpha text like (=
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 33s
The tokenizer regex only matches alphabetic characters, so text before the first word match (like "(= " in "(= I won...") was silently dropped when reassembling the corrected text. Now preserves text[:first_match_start] as a leading prefix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -534,6 +534,13 @@ class SmartSpellChecker:
 
         # --- Pass 3: Per-word correction ---
         parts: List[str] = []
+
+        # Preserve any leading text before the first token match
+        # (e.g., "(= " before "I won and he lost.")
+        first_start = tokens[0].start() if tokens else 0
+        if first_start > 0:
+            parts.append(text[:first_start])
+
         for i, (word, sep) in enumerate(token_list):
             # Skip words inside IPA brackets (brackets land in separators)
             prev_sep = token_list[i - 1][1] if i > 0 else ""
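The behaviour described in the commit message can be reproduced in isolation with a minimal sketch. The regex `TOKEN_RE`, the function `reassemble`, and the `keep_prefix` flag are hypothetical stand-ins for illustration, not the actual SmartSpellChecker code. The only assumption carried over from the commit is that each token is an alphabetic word followed by its separator.

```python
import re
from typing import List

# Hypothetical stand-in for the checker's tokenizer: a run of letters
# followed by a run of non-letters (the separator).
TOKEN_RE = re.compile(r"([A-Za-z]+)([^A-Za-z]*)")


def reassemble(text: str, keep_prefix: bool = True) -> str:
    """Rebuild text from (word, separator) tokens, optionally keeping the prefix."""
    tokens = list(TOKEN_RE.finditer(text))
    parts: List[str] = []

    # Text before the first alphabetic match belongs to no token, so it must
    # be copied over explicitly or it is lost on reassembly.
    first_start = tokens[0].start() if tokens else 0
    if keep_prefix and first_start > 0:
        parts.append(text[:first_start])

    for match in tokens:
        word, sep = match.group(1), match.group(2)
        parts.append(word)  # a real checker would correct `word` here
        parts.append(sep)
    return "".join(parts)


print(reassemble("(= I won and he lost.", keep_prefix=False))  # I won and he lost.
print(reassemble("(= I won and he lost."))  # (= I won and he lost.
```

With `keep_prefix=False` the sketch mirrors the old behaviour and drops "(= ". With the prefix step enabled, the input round-trips unchanged. In this sketch, input with no alphabetic match at all still yields an empty string. Whether the real method handles that case elsewhere is not visible in this hunk.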