Benjamin Admin cf27a95308 feat(ocr-pipeline): word-based 5-column detection for vocabulary pages
Replace projection-profile layout analysis with Tesseract word bounding
box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE,
markers, examples). Falls back to projection profiles when < 3 clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:08:14 +01:00
S
Description
No description provided
15 MiB
Languages
TypeScript 57.9%
Python 33.8%
Go 6.8%
C# 0.8%
Shell 0.2%
Other 0.3%