fix: increase color recovery occupancy padding to prevent gap artifacts
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m58s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s
Colored-pixel fragments in narrow inter-word gaps were being recovered as false characters (e.g., "!" between "lend" and "sb."), disrupting word order. Use adaptive padding based on the median word height (35% of it, with a floor of 8px) instead of a fixed 4px.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
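The padding rule can be sketched in isolation to see which values it produces. This is a minimal sketch: the helper name `adaptive_pad` and its keyword parameters are hypothetical, while the constants (35%, floor of 8px, fallback height of 20) are taken from the diff in this commit.

```python
import numpy as np


def adaptive_pad(word_boxes, min_pad=8, ratio=0.35, fallback_height=20):
    """Return a padding in pixels derived from the median word height.

    Hypothetical helper; it mirrors the three lines added in this commit.
    """
    # Ignore boxes with a missing or non-positive height.
    heights = [wb["height"] for wb in word_boxes if wb.get("height", 0) > 0]
    median_h = int(np.median(heights)) if heights else fallback_height
    return max(min_pad, int(median_h * ratio))


# Large text: median height 42 -> int(42 * 0.35) = 14
print(adaptive_pad([{"height": 40}, {"height": 44}, {"height": 42}]))
# Small text: int(12 * 0.35) = 4, so the 8px floor applies
print(adaptive_pad([{"height": 12}]))
# No usable heights: fallback height 20 -> 7, so the 8px floor applies
print(adaptive_pad([]))
```

With these constants the padding never drops below 8px, so it is always at least double the previous fixed value of 4px.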
@@ -210,9 +210,14 @@ def recover_colored_text(
     ih, iw = img_bgr.shape[:2]
     max_area = int(ih * iw * 0.005)
 
-    # --- Build occupancy mask from existing words (with 4px padding) ---
+    # --- Build occupancy mask from existing words (adaptive padding) ---
+    # Pad word boxes generously to prevent colored-pixel artifacts in
+    # narrow inter-word gaps from being recovered as false characters.
+    heights = [wb["height"] for wb in existing_words if wb.get("height", 0) > 0]
+    median_h = int(np.median(heights)) if heights else 20
+    pad = max(8, int(median_h * 0.35))
+
     occupied = np.zeros((ih, iw), dtype=np.uint8)
-    pad = 4
     for wb in existing_words:
         x1 = max(0, int(wb["left"]) - pad)
         y1 = max(0, int(wb["top"]) - pad)
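To illustrate why the larger padding closes narrow gaps, here is a minimal, self-contained sketch of an occupancy mask. The hunk only shows the `x1`/`y1` computation, so the `"width"` key, the lower-right clipping, the helper name `build_occupancy`, and the overlap check at the end are all assumptions made for illustration, not the code of `recover_colored_text`.

```python
import numpy as np


def build_occupancy(shape, word_boxes, pad):
    """Mark every pixel within `pad` px of an existing word box as occupied."""
    ih, iw = shape
    occupied = np.zeros((ih, iw), dtype=np.uint8)
    for wb in word_boxes:
        x1 = max(0, int(wb["left"]) - pad)
        y1 = max(0, int(wb["top"]) - pad)
        # Assumption: boxes carry a "width" key and are clipped to the image.
        x2 = min(iw, int(wb["left"] + wb["width"]) + pad)
        y2 = min(ih, int(wb["top"] + wb["height"]) + pad)
        occupied[y1:y2, x1:x2] = 1
    return occupied


# Two words, 30 px tall, separated by a 12 px gap (x = 70..82).
words = [
    {"left": 10, "top": 20, "width": 60, "height": 30},
    {"left": 82, "top": 20, "width": 40, "height": 30},
]
# A 3 px wide colored fragment sitting in the middle of that gap.
fragment = (slice(25, 45), slice(75, 78))  # (rows, columns)

for pad in (4, 10):  # 10 == max(8, int(30 * 0.35))
    occ = build_occupancy((80, 160), words, pad)
    covered = occ[fragment].mean()  # fraction of the fragment already occupied
    print(f"pad={pad}: fragment covered {covered:.0%}")
```

In this toy layout the 4px padding leaves the fragment entirely outside the mask, while the adaptive 10px padding covers it completely, so a recovery step that skips occupied regions would no longer treat it as a new character.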