97d4355aa9ecf985a4fe874825138c4684c0fa38
Fix half-height rows caused by tall special characters (brackets, IPA symbols) being split into separate line clusters:

- Group words by vertical CENTER instead of TOP position, so tall characters on the same line stay in one cluster
- Filter outlier-height words (>2× median) when computing letter_h so brackets/IPA don't skew the row height
- Merge clusters closer than 0.4× median word height (definitely same text line despite slight center differences)
- Increased y_tolerance from 0.5× to 0.6× median word height
- Enhanced logging with cluster merge count and row height range

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
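The row-clustering steps named in the commit message can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the `Word` dataclass, the `cluster_rows` function, its parameter names, and the choice to compare each word against the first word of the current cluster are all assumptions made for the example. Only the constants (2× outlier cutoff, 0.6× tolerance, 0.4× merge distance) and the name `letter_h` come from the commit message.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Word:
    """Hypothetical OCR word box; the real data structure is not shown in the commit."""
    text: str
    top: float
    height: float

    @property
    def center(self) -> float:
        return self.top + self.height / 2.0


def _mean_center(cluster: List[Word]) -> float:
    return sum(w.center for w in cluster) / len(cluster)


def cluster_rows(words: List[Word],
                 y_tolerance_factor: float = 0.6,
                 merge_factor: float = 0.4,
                 outlier_factor: float = 2.0) -> List[List[Word]]:
    """Group words into text rows by vertical center (assumed sketch)."""
    if not words:
        return []

    # Estimate letter height, ignoring outlier-height words (> 2x median)
    # so tall brackets or IPA symbols do not skew it.
    raw_median = median(w.height for w in words)
    normal = [w.height for w in words if w.height <= outlier_factor * raw_median]
    letter_h = median(normal) if normal else raw_median

    y_tol = y_tolerance_factor * letter_h

    # Pass 1: walk words in order of vertical center and start a new cluster
    # whenever a word's center is farther than y_tol from the center of the
    # first word in the current cluster.
    clusters: List[List[Word]] = []
    for w in sorted(words, key=lambda x: x.center):
        if clusters and abs(w.center - clusters[-1][0].center) <= y_tol:
            clusters[-1].append(w)
        else:
            clusters.append([w])

    # Pass 2: merge neighbouring clusters whose mean centers are closer
    # than merge_factor * letter_h; these are treated as the same line.
    merged: List[List[Word]] = [clusters[0]]
    for cluster in clusters[1:]:
        if _mean_center(cluster) - _mean_center(merged[-1]) < merge_factor * letter_h:
            merged[-1].extend(cluster)
        else:
            merged.append(cluster)
    return merged
```

In this sketch, a bracket with `top=0, height=30` and an ordinary word with `top=10, height=10` share the same vertical center, so they land in one row even though their top edges are a full letter height apart.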
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%