Benjamin Admin ce0815007e feat(ocr-pipeline): replace clustering column detection with whitespace-gap analysis
Column detection now uses vertical projection profiles to find whitespace
gaps between columns, then validates gaps against word bounding boxes to
prevent splitting through words. Old clustering algorithm extracted as
fallback (_detect_columns_by_clustering) for pages with < 2 detected gaps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 00:36:28 +01:00
Description
No description provided
42 MiB
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%