Benjamin Admin 1a246eb059 feat(ocr-pipeline): generic sub-column detection via left-edge clustering
Detects hidden sub-columns (e.g. page references like "p.59") within
already-recognized columns by clustering word left-edge positions and
splitting when a clear minority cluster exists. The sub-column is then
classified as page_ref and mapped to VocabRow.source_page.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-02 18:18:02 +01:00
S
Description
No description provided
15 MiB
Languages
TypeScript 57.9%
Python 33.8%
Go 6.8%
C# 0.8%
Shell 0.2%
Other 0.3%