Fix SmartSpellChecker: preserve leading non-alpha text like (=
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m36s
CI / test-python-agent-core (push) Successful in 35s
CI / test-nodejs-website (push) Successful in 33s

The tokenizer regex only matches alphabetic characters, so text
before the first word match (like "(= " in "(= I won...") was
silently dropped when reassembling the corrected text.

Now preserves text[:first_start] (everything before the first token match) as a leading prefix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-15 23:41:33 +02:00
parent 596864431b
commit 4561320e0d


@@ -534,6 +534,13 @@ class SmartSpellChecker:
         # --- Pass 3: Per-word correction ---
         parts: List[str] = []
+        # Preserve any leading text before the first token match
+        # (e.g., "(= " before "I won and he lost.")
+        first_start = tokens[0].start() if tokens else 0
+        if first_start > 0:
+            parts.append(text[:first_start])
         for i, (word, sep) in enumerate(token_list):
             # Skip words inside IPA brackets (brackets land in separators)
             prev_sep = token_list[i - 1][1] if i > 0 else ""
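The failure mode described in the commit message can be reproduced with a minimal, self-contained sketch. It assumes a hypothetical `[A-Za-z]+` tokenizer and hypothetical `tokenize`/`reassemble` helpers, not the actual SmartSpellChecker internals. Only the `first_start` prefix logic mirrors the diff. Passing `preserve_prefix=False` approximates the pre-fix behavior, where the leading "(= " is lost.

```python
import re
from typing import List, Tuple

# Hypothetical tokenizer: the commit only says the real regex matches
# alphabetic characters, so this exact pattern is an assumption.
WORD_RE = re.compile(r"[A-Za-z]+")


def tokenize(text: str) -> Tuple[List[re.Match], List[Tuple[str, str]]]:
    """Split text into (word, separator) pairs (hypothetical helper).

    Each separator is the text between the end of one word match and the
    start of the next match (or the end of the string for the last word).
    Text before the first match is not part of any pair.
    """
    tokens = list(WORD_RE.finditer(text))
    token_list: List[Tuple[str, str]] = []
    for i, m in enumerate(tokens):
        end = tokens[i + 1].start() if i + 1 < len(tokens) else len(text)
        token_list.append((m.group(), text[m.end():end]))
    return tokens, token_list


def reassemble(text: str, preserve_prefix: bool = True) -> str:
    """Rebuild text from its tokens, optionally keeping the leading prefix."""
    tokens, token_list = tokenize(text)
    parts: List[str] = []
    # The fix: keep whatever precedes the first word match, e.g. "(= ".
    first_start = tokens[0].start() if tokens else 0
    if preserve_prefix and first_start > 0:
        parts.append(text[:first_start])
    for word, sep in token_list:
        # A real checker would correct `word` here; this sketch keeps it as-is.
        parts.append(word)
        parts.append(sep)
    return "".join(parts)


if __name__ == "__main__":
    print(reassemble("(= I won and he lost."))                         # (= I won and he lost.
    print(reassemble("(= I won and he lost.", preserve_prefix=False))  # I won and he lost.
```

Because separators only cover text *after* each match, the prefix is the one span that no (word, sep) pair owns. That is why the fix appends it to `parts` explicitly before the per-word loop.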