a079ffe8e9d684b9120d1f68396b5e4eeec2f5a7
The 25x25 dilation kernel merges nearby green words into large regions, so
pixel overlap with OCR word boxes drops below 50%. Previous density checks
alone weren't sufficient.

New multi-layered approach:
- Count OCR word CENTROIDS inside each colored region
- ≥2 centroids → definitely text (images don't produce multiple words)
- 1 centroid + 10%+ pixel overlap → likely text
- Lower pixel overlap threshold from 50% to 40%
- Raise density+height thresholds for text-line detection
- Use INFO logging to diagnose remaining false positives

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
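The centroid and overlap rules in the commit message above could be sketched roughly as follows. This is a minimal illustration, not the repository's code: `Box`, `classify_region`, and the returned labels are hypothetical names, regions and OCR words are simplified to axis-aligned rectangles, word boxes are assumed not to overlap each other, and the density+height check is left out because the message does not state its thresholds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical helper, not from the repository)."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    @property
    def centroid(self) -> Tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1

    def intersection_area(self, other: "Box") -> float:
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


# Thresholds taken from the commit message.
MIN_CENTROIDS_FOR_TEXT = 2
SINGLE_CENTROID_OVERLAP = 0.10
OVERLAP_THRESHOLD = 0.40


def classify_region(region: Box, words: List[Box]) -> str:
    """Label a colored region using word centroids first, pixel overlap second."""
    # Count OCR words whose centroid falls inside the region.
    centroids = sum(1 for w in words if region.contains(*w.centroid))

    # Fraction of the region covered by word boxes (assumes words do not
    # overlap each other, so intersection areas can simply be summed).
    overlap = 0.0
    if region.area > 0:
        covered = sum(region.intersection_area(w) for w in words)
        overlap = min(1.0, covered / region.area)

    if centroids >= MIN_CENTROIDS_FOR_TEXT:
        return "text"
    if centroids == 1 and overlap >= SINGLE_CENTROID_OVERLAP:
        return "likely_text"
    if overlap >= OVERLAP_THRESHOLD:
        return "text"
    return "not_text"
```

Checking centroids before overlap is what makes the rule robust to dilation: a merged region can be much larger than the words it contains, which lowers the overlap ratio, but the word centroids still land inside it.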
Languages
TypeScript 60.2%
Python 32.9%
Go 5.5%
C# 0.8%
CSS 0.2%
Other 0.3%