fix: increase color recovery occupancy padding to prevent gap artifacts

Colored-pixel fragments in narrow inter-word gaps were being recovered
as false characters (e.g., "!" between "lend" and "sb."), disrupting
word order. Use adaptive padding (35% of the median word height, with
an 8px minimum) instead of a fixed 4px.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-17 10:28:56 +01:00
parent 324f39a9cc
commit 2b73d9beec

@@ -210,9 +210,14 @@ def recover_colored_text(
     ih, iw = img_bgr.shape[:2]
     max_area = int(ih * iw * 0.005)
-    # --- Build occupancy mask from existing words (with 4px padding) ---
+    # --- Build occupancy mask from existing words (adaptive padding) ---
+    # Pad word boxes generously to prevent colored-pixel artifacts in
+    # narrow inter-word gaps from being recovered as false characters.
+    heights = [wb["height"] for wb in existing_words if wb.get("height", 0) > 0]
+    median_h = int(np.median(heights)) if heights else 20
+    pad = max(8, int(median_h * 0.35))
     occupied = np.zeros((ih, iw), dtype=np.uint8)
-    pad = 4
     for wb in existing_words:
         x1 = max(0, int(wb["left"]) - pad)
         y1 = max(0, int(wb["top"]) - pad)
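
The padding rule in this hunk can be tried in isolation with the sketch below. It is an illustration and not the repository's code: `adaptive_pad` and `padded_box` are hypothetical helper names, `statistics.median` stands in for `np.median` so the snippet runs without NumPy, and the `x2`/`y2` clamping and the `"width"` key are assumptions, since the hunk is cut off after `y1`.

```python
from statistics import median


def adaptive_pad(existing_words, min_pad=8, ratio=0.35, fallback_height=20):
    """Return the occupancy padding in pixels for a list of word boxes.

    Mirrors the rule in the diff: 35% of the median word height, never
    less than 8px, with a 20px fallback height when no usable heights exist.
    """
    heights = [wb["height"] for wb in existing_words if wb.get("height", 0) > 0]
    median_h = int(median(heights)) if heights else fallback_height
    return max(min_pad, int(median_h * ratio))


def padded_box(wb, pad, iw, ih):
    """Expand a word box by `pad` pixels and clamp it to the image bounds.

    The x1/y1 lines match the diff; x2/y2 and the "width" key are assumed.
    """
    x1 = max(0, int(wb["left"]) - pad)
    y1 = max(0, int(wb["top"]) - pad)
    x2 = min(iw, int(wb["left"] + wb["width"]) + pad)
    y2 = min(ih, int(wb["top"] + wb["height"]) + pad)
    return x1, y1, x2, y2


if __name__ == "__main__":
    words = [
        {"left": 5, "top": 3, "width": 50, "height": 30},
        {"left": 70, "top": 3, "width": 20, "height": 30},
    ]
    pad = adaptive_pad(words)
    print(pad, padded_box(words[0], pad, iw=100, ih=40))
```

With 30px-tall words the padding is larger than the old fixed 4px, so a gap of a few pixels between two neighbouring boxes is fully covered by the occupancy mask. For small text the 8px floor applies instead.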