Benjamin Admin 203b3c0e2d fix(ocr-pipeline): mask out images in row detection horizontal projection
Build a word-coverage mask so only pixels near Tesseract word bounding
boxes contribute to the horizontal projection. Image regions (high ink
but no words) are treated as white, preventing illustrations from
merging multiple vocabulary rows into one.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:39:20 +01:00
Description
No description provided
42 MiB
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%