fix: lower tertiary gap threshold for narrow margin column detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m54s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 16s
Reduce the tertiary gap threshold from max(40, 5% of content_span) to max(30, 2% of content_span) so that page_ref columns (e.g. p.55/p.57) sitting at a gap of ~56px are detected as tertiary columns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -188,7 +188,7 @@ def _cluster_columns_by_alignment(
         # Must have significant gap to nearest significant cluster
         if sig_xs:
             min_dist = min(abs(c["mean_x"] - sx) for sx in sig_xs)
-            if min_dist < max(40, content_span * 0.05):
+            if min_dist < max(30, content_span * 0.02):
                 continue
         tertiary.append(c)
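To illustrate why the old threshold dropped a ~56px page_ref column, here is a minimal sketch of the gap check on its own. The helper names (`gap_threshold`, `is_tertiary_candidate`) and the `content_span` of 1500px are assumptions made for illustration; the commit states neither the page width nor how the surrounding function is structured.

```python
def gap_threshold(content_span, floor_px, fraction):
    """Minimum distance a cluster needs from the nearest significant cluster.

    Hypothetical helper mirroring the expression max(floor, content_span * fraction).
    """
    return max(floor_px, content_span * fraction)


def is_tertiary_candidate(min_dist, content_span, floor_px, fraction):
    """Return True when the cluster is far enough away to be kept."""
    # The diff skips the cluster (continue) when min_dist < threshold.
    return min_dist >= gap_threshold(content_span, floor_px, fraction)


# Assumed values: 56px gap from the commit message, 1500px span is illustrative.
gap_px = 56
content_span = 1500

old_kept = is_tertiary_candidate(gap_px, content_span, floor_px=40, fraction=0.05)
new_kept = is_tertiary_candidate(gap_px, content_span, floor_px=30, fraction=0.02)

print("old rule keeps column:", old_kept)
print("new rule keeps column:", new_kept)
```

Under the old rule the percentage term exceeds 56px once content_span is above 1120px (56 / 0.05); under the new rule that only happens above 2800px (56 / 0.02), so the narrow margin column survives on much wider pages.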