Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 32s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m5s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 19s
Even after multi-criteria matching, near-duplicate words can slip through (same text, centers within 30px horizontal / 15px vertical). The new _deduplicate_words() removes these, keeping the higher-confidence copy. Regression test with real session data (row 2 with 145 near-dupes) confirms no duplicates remain after merge + deduplication. Tests: 37 → 45 (added TestDeduplicateWords, TestMergeRealWorldRegression). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>