fix: run _split_broad_columns only when at most 1 broad column exists

If 2+ broad content columns already exist, the layout is probably
already correctly separated into EN/DE. The split is performed only
when a single broad column contains EN and DE combined.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-07 22:51:14 +01:00
parent e1ae5d5fa9
commit d98359fceb

@@ -2097,6 +2097,14 @@ def _split_broad_columns(
     logger.info(f"SplitBroadCols: input {len(geometries)} cols: "
                 f"{[(g.index, g.x, g.width, g.word_count, round(g.width_ratio, 3)) for g in geometries]}")

+    # Count how many broad content columns exist. If there are already 2+,
+    # the layout is likely already correctly split into EN / DE — skip.
+    broad_count = sum(1 for g in geometries
+                      if g.width_ratio > _broad_threshold and len(g.words) >= 10)
+    if broad_count >= 2:
+        logger.info(f"SplitBroadCols: {broad_count} broad cols already → skip")
+        return geometries
+
     for geo in geometries:
         if geo.width_ratio <= _broad_threshold or len(geo.words) < 10:
             result.append(geo)
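
The guard added in this hunk can be tried in isolation. The following is a minimal sketch, not the project's code: `ColumnGeometry` is a hypothetical stand-in for the real geometry type, `BROAD_THRESHOLD = 0.35` is an assumed value (the actual `_broad_threshold` is not visible in this diff), and `should_attempt_split` is an illustrative helper name. Only the `width_ratio` comparison, the 10-word minimum, and the `>= 2` cutoff come from the commit.

```python
from dataclasses import dataclass, field
from typing import List

BROAD_THRESHOLD = 0.35  # assumption: the real _broad_threshold is not shown in the diff
MIN_WORDS = 10          # taken from the diff: len(g.words) >= 10


@dataclass
class ColumnGeometry:
    """Hypothetical stand-in for the project's column geometry type."""
    index: int
    width_ratio: float                       # column width / page width
    words: List[str] = field(default_factory=list)


def count_broad_columns(geometries: List[ColumnGeometry],
                        broad_threshold: float = BROAD_THRESHOLD,
                        min_words: int = MIN_WORDS) -> int:
    """Count columns that are both wide and carry enough words to be content."""
    return sum(1 for g in geometries
               if g.width_ratio > broad_threshold and len(g.words) >= min_words)


def should_attempt_split(geometries: List[ColumnGeometry]) -> bool:
    """Mirror the guard: split only if fewer than 2 broad content columns exist."""
    return count_broad_columns(geometries) < 2


# One narrow marker column plus one wide column that mixes EN and DE: try the split.
one_broad = [ColumnGeometry(0, 0.08, ["1"] * 12),
             ColumnGeometry(1, 0.80, ["word"] * 40)]

# Two wide columns: the layout is probably already EN | DE, so skip the split.
two_broad = [ColumnGeometry(0, 0.45, ["word"] * 20),
             ColumnGeometry(1, 0.45, ["wort"] * 20)]

print(should_attempt_split(one_broad))
print(should_attempt_split(two_broad))
```

A wide column with fewer than 10 words does not count as broad content, so a page with one real content column and one sparse wide column would still be passed on to the split logic.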