4f2fb0e94c42c6f90fd11b5ce6f549c1c6166327
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 42s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m13s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 22s
The merge algorithm now uses 3 criteria instead of just IoU > 0.3:
1. IoU > 0.15 (relaxed threshold)
2. Center proximity < word height AND same row
3. Text similarity > 0.7 AND same row

This prevents doubled overlapping words when both PaddleOCR and Tesseract find the same word at similar positions. Unique words from either engine (e.g. bullets from Tesseract) are still added.

Tests expanded: 19 → 37 (added _box_center_dist, _text_similarity, _words_match tests + deduplication regression test).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%