SmartSpellChecker: frequency-based boundary repair for valid word pairs
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 43s
CI / test-go-edu-search (push) Successful in 40s
CI / test-python-klausur (push) Failing after 2m42s
CI / test-python-agent-core (push) Successful in 37s
CI / test-nodejs-website (push) Successful in 35s
Previously, boundary repair was skipped when both words were valid dictionary words (e.g., "Pound sand", "wit hit", "done euro"). Now uses word-frequency scoring (product of bigram frequencies) to decide if the repair produces a more common word pair.

Threshold: repair accepted when new pair is >5x more frequent, or when repair produces a known abbreviation.

New fixes:
- Pound sand → Pounds and (2000x)
- wit hit → with it (100000x)
- done euro → one euro (7x)

43 tests passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
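The decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the SmartSpellChecker implementation: the names `TOY_FREQ`, `KNOWN_ABBREVIATIONS`, `boundary_candidates`, and `repair_pair` are hypothetical, the frequency values are invented for the example (they are not real corpus counts and will not reproduce the ratios quoted in the message), and the pair score is assumed to be the product of the two words' frequencies. Only single-character shifts across the word boundary are considered.

```python
from typing import Dict, List, Optional, Tuple

# Invented frequencies for illustration only; NOT real corpus data.
TOY_FREQ: Dict[str, float] = {
    "pound": 20.0, "sand": 15.0, "pounds": 30.0, "and": 25000.0,
    "wit": 3.0, "hit": 100.0, "with": 7000.0, "it": 10000.0,
    "at": 5000.0, "the": 50000.0,
}
KNOWN_ABBREVIATIONS = {"sth.", "sb."}  # hypothetical abbreviation list
RATIO_THRESHOLD = 5.0  # ">5x more frequent" from the commit message


def pair_score(a: str, b: str, freq: Dict[str, float]) -> float:
    """Score a word pair as the product of both word frequencies (assumption)."""
    return freq.get(a.lower(), 0.0) * freq.get(b.lower(), 0.0)


def boundary_candidates(a: str, b: str) -> List[Tuple[str, str]]:
    """Shift one character across the boundary, in either direction."""
    out: List[Tuple[str, str]] = []
    if len(b) > 1:
        out.append((a + b[0], b[1:]))    # "Pound sand" -> "Pounds and"
    if len(a) > 1:
        out.append((a[:-1], a[-1] + b))  # "ats th."    -> "at sth."
    return out


def repair_pair(
    a: str, b: str, freq: Dict[str, float] = TOY_FREQ
) -> Optional[Tuple[str, str]]:
    """Return the repaired pair, or None if the original pair should stay."""
    base = pair_score(a, b, freq)
    best: Optional[Tuple[str, str]] = None
    best_score = 0.0
    for left, right in boundary_candidates(a, b):
        # Abbreviation rule: accept if the repair yields a known abbreviation
        # next to a known word.
        if right.lower() in KNOWN_ABBREVIATIONS and freq.get(left.lower(), 0.0) > 0:
            return (left, right)
        # Frequency rule: the new pair must beat the original by the threshold.
        score = pair_score(left, right, freq)
        if score > RATIO_THRESHOLD * base and score > best_score:
            best, best_score = (left, right), score
    return best


print(repair_pair("Pound", "sand"))
print(repair_pair("at", "the"))
```

With these toy values, a common pair such as "at the" has no candidate that clears the threshold and is left unchanged, which is the behaviour the renamed `test_no_repair_common_pair` test checks for.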
@@ -166,8 +166,8 @@ class TestBoundaryRepair:
         result = sc.correct_text("be good ats th.", "en")
         assert "at sth." in result.corrected, f"Expected 'at sth.' in '{result.corrected}'"
 
-    def test_no_repair_if_both_known(self, sc):
-        """Don't repair if both words are already valid."""
+    def test_no_repair_common_pair(self, sc):
+        """Don't repair if both words form a common pair."""
         result = sc.correct_text("at the", "en")
         assert result.corrected == "at the"
         assert not result.changed
@@ -184,6 +184,21 @@ class TestBoundaryRepair:
         assert repair[0] == "at"
         assert repair[1] == "sth."
 
+    def test_pound_sand_to_pounds_and(self, sc):
+        """'Pound sand' → 'Pounds and' — both valid but repair is much more frequent."""
+        result = sc.correct_text("Pound sand euros", "en")
+        assert "Pounds and" in result.corrected, f"Expected 'Pounds and' in '{result.corrected}'"
+
+    def test_wit_hit_to_with_it(self, sc):
+        """'wit hit' → 'with it' — frequency-based repair."""
+        result = sc.correct_text("be careful wit hit", "en")
+        assert "with it" in result.corrected, f"Expected 'with it' in '{result.corrected}'"
+
+    def test_done_euro_to_one_euro(self, sc):
+        """'done euro' → 'one euro' in context."""
+        result = sc.correct_text("done euro", "en")
+        assert "one euro" in result.corrected, f"Expected 'one euro' in '{result.corrected}'"
+
 
 # ─── Context Split ──────────────────────────────────────────────────────────
 