fix: mask full-width rows before column detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 25s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m56s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 17s
Colored full-width sub-headers (e.g. "Unit 4: Bonnie Scotland") filled in the column gaps in the vertical projection profile, so 11 columns were detected instead of 5. Rows with >40% ink density that form bands of at least 3 contiguous rows are now masked before the projection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
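To illustrate the failure mode described above, here is a minimal sketch on a synthetic page. Everything in it is an assumption made for illustration: the page layout (five 80 px columns separated by 25 px gaps), the helper names `make_page` and `count_gaps`, and the 2% gap threshold are not taken from the repository. It also skips the contiguity check and masks every row above the density threshold.

```python
import numpy as np

H, W = 200, 500          # hypothetical content area size
COL_W, GAP_W = 80, 25    # hypothetical column and gap widths (5 columns)

def make_page():
    """Build an inverted page (ink = 255) with sparse text and one full-width band."""
    page = np.zeros((H, W), dtype=np.uint8)
    ys, xs = np.mgrid[0:H, 0:W]
    in_column = (xs % (COL_W + GAP_W)) < COL_W
    in_text_line = (ys % 10) < 4
    sparse_ink = ((xs + ys) % 4) == 0      # keeps text rows at 20% ink density
    page[in_column & in_text_line & sparse_ink] = 255
    page[90:110, :] = 255                  # colored sub-header spanning the full width
    return page

def count_gaps(strip, n_rows, gap_threshold=0.02):
    """Count runs of near-empty columns in the normalized vertical projection."""
    v_proj_norm = strip.sum(axis=0).astype(float) / (n_rows * 255)
    is_gap = v_proj_norm < gap_threshold
    prev = np.concatenate(([False], is_gap[:-1]))
    return int(np.sum(is_gap & ~prev))     # a gap starts where is_gap flips to True

page = make_page()

# Horizontal density per row; rows above 40% are treated as full-width.
row_density = page.sum(axis=1).astype(float) / (W * 255)
fullwidth = row_density > 0.40

masked = page.copy()
masked[fullwidth, :] = 0
n_masked = int(fullwidth.sum())

gaps_before = count_gaps(page, H)
gaps_after = count_gaps(masked, H - n_masked)
print(f"gaps before masking: {gaps_before}, after masking: {gaps_after}")
```

On this toy page the band lifts every gap column above the threshold, so no gaps are found before masking and the four interior gaps reappear afterwards. The real pages in the commit failed differently (11 columns instead of 5), so treat this only as a picture of the mechanism.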
@@ -2131,10 +2131,46 @@ def detect_column_geometry(ocr_img: np.ndarray, dewarped_bgr: np.ndarray) -> Opt
 
     logger.info(f"ColumnGeometry: {len(left_edges)} words detected in content area")
 
-    # --- Step 3: Vertical projection profile ---
+    # --- Step 2b: Mask out full-width rows (sub-headers, colored bands) ---
+    # Rows where ink spans nearly the full content width distort the vertical
+    # projection by filling in column gaps. Detect them via horizontal density
+    # and zero them out before computing v_proj.
     content_strip = inv[top_y:bottom_y, left_x:right_x]
-    v_proj = np.sum(content_strip, axis=0).astype(float)
-    v_proj_norm = v_proj / (content_h * 255) if content_h > 0 else v_proj
+    h_proj_row = np.sum(content_strip, axis=1).astype(float)
+    h_proj_row_norm = h_proj_row / (content_w * 255) if content_w > 0 else h_proj_row
+
+    FULLWIDTH_THRESHOLD = 0.40  # normal text ~10-25%; full-width bands 40%+
+    fullwidth_mask = h_proj_row_norm > FULLWIDTH_THRESHOLD
+
+    # Only mask contiguous bands (>=3 rows), not isolated noisy rows
+    masked_strip = content_strip.copy()
+    n_masked = 0
+    band_start = None
+    for y_idx in range(len(fullwidth_mask)):
+        if fullwidth_mask[y_idx]:
+            if band_start is None:
+                band_start = y_idx
+        else:
+            if band_start is not None:
+                band_height = y_idx - band_start
+                if band_height >= 3:
+                    masked_strip[band_start:y_idx, :] = 0
+                    n_masked += band_height
+                band_start = None
+    if band_start is not None:
+        band_height = len(fullwidth_mask) - band_start
+        if band_height >= 3:
+            masked_strip[band_start:len(fullwidth_mask), :] = 0
+            n_masked += band_height
+
+    if n_masked > 0:
+        logger.info(f"ColumnGeometry: masked {n_masked} full-width rows "
+                    f"({n_masked * 100 / content_h:.1f}% of content height)")
+
+    # --- Step 3: Vertical projection profile ---
+    effective_h = content_h - n_masked
+    v_proj = np.sum(masked_strip, axis=0).astype(float)
+    v_proj_norm = v_proj / (effective_h * 255) if effective_h > 0 else v_proj
 
     # Smooth the projection to avoid noise-induced micro-gaps
     kernel_size = max(5, content_w // 80)
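The band-grouping loop in the hunk above could also be written as a small run-length helper, which removes the separate branch for a band that reaches the last row. This is only a sketch of an equivalent formulation: `find_bands` is a hypothetical name that does not exist in the repository, and the commit itself uses the explicit loop.

```python
from itertools import groupby

def find_bands(mask, min_height=3):
    """Return (start, end) pairs, end exclusive, for runs of truthy
    values in `mask` that are at least `min_height` long."""
    bands = []
    idx = 0
    for is_set, run in groupby(mask, key=bool):
        length = sum(1 for _ in run)
        if is_set and length >= min_height:
            bands.append((idx, idx + length))
        idx += length
    return bands

# Hypothetical row mask: one 2-row run (ignored as noise) and two real bands,
# the second of which touches the final row.
example_mask = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1]
bands = find_bands(example_mask)
n_masked = sum(end - start for start, end in bands)
print(bands, n_masked)
```

With such a helper, the masking step would reduce to zeroing `masked_strip[start:end, :]` for each returned pair. Since `groupby` iterates element by element, it works on a NumPy boolean array such as `fullwidth_mask` as well as on a plain list.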