8e861e5a4d3086a3bf6b4246ed8115e91d5e262b
The y_tolerance for word-center clustering was derived from the median word height (21px → 12px tolerance), which was too small: words on the same line can have centers 15-20px apart because the words differ in height.

It now uses 40% of the gap-based median row height as the tolerance (e.g. 40px row → 16px tolerance) and 30% of that row height as the merge threshold. This yields cluster counts that match the actual number of text lines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
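The change can be illustrated with a minimal sketch. It assumes a greedy pass over word centers sorted top to bottom, followed by a merge pass over neighbouring clusters. The names `Word` and `cluster_rows`, the running-mean comparison, and the merge rule are hypothetical and are not taken from the repository. The median row height is passed in as a given, because the commit does not show how the "gap-based" estimate is computed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    text: str
    y_center: float  # vertical center of the word's bounding box, in px


def _center(cluster):
    # Mean vertical center of a cluster of words.
    return mean(w.y_center for w in cluster)


def cluster_rows(words, median_row_height, tol_ratio=0.40, merge_ratio=0.30):
    """Hypothetical sketch: group words into text lines by vertical center.

    median_row_height is assumed to be computed elsewhere (the commit calls
    it "gap-based"; that step is not reproduced here).
    """
    y_tolerance = tol_ratio * median_row_height        # e.g. 40px row -> 16px
    merge_threshold = merge_ratio * median_row_height  # e.g. 40px row -> 12px

    # Pass 1: greedy clustering; a word joins the last cluster if its center
    # lies within y_tolerance of that cluster's mean center.
    clusters = []
    for word in sorted(words, key=lambda w: w.y_center):
        if clusters and abs(word.y_center - _center(clusters[-1])) <= y_tolerance:
            clusters[-1].append(word)
        else:
            clusters.append([word])

    # Pass 2: merge neighbouring clusters whose mean centers are still close.
    merged = []
    for cluster in clusters:
        if merged and abs(_center(cluster) - _center(merged[-1])) <= merge_threshold:
            merged[-1].extend(cluster)
        else:
            merged.append(cluster)
    return merged


words = [Word("a", 100), Word("b", 115), Word("c", 120),
         Word("d", 150), Word("e", 160)]
rows = cluster_rows(words, median_row_height=40)
print([[w.text for w in row] for row in rows])
```

With a 40px row height, the first line's centers spread over 20px. They still form one cluster under the 16px tolerance. A fixed 12px tolerance (`tol_ratio=0.3, merge_ratio=0.0` here) splits that line into two clusters, which reproduces the kind of over-counting the commit describes.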
Languages: TypeScript 60.2%, Python 32.9%, Go 5.5%, C# 0.8%, CSS 0.2%, Other 0.3%