feat: box-aware column detection — exclude box content from global columns
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m4s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
- Enrich column geometries with original full-page words (box-filtered) so _detect_sub_columns() finds narrow sub-columns across box boundaries - Add inline marker guard: bullet points (1., 2., •) are not split into sub-columns (minimum gap check: 1.2× word height or 20px) - Add box_rects parameter to build_grid_from_words() — words inside boxes are excluded from X-gap column clustering - Pass box rects from zones to words_first grid builder - Add 9 tests for box-aware column detection Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -2543,7 +2543,15 @@ async def detect_words(
|
||||
})
|
||||
wf_word_dicts = abs_words
|
||||
|
||||
cells, columns_meta = build_grid_from_words(wf_word_dicts, img_w, img_h)
|
||||
# Extract box rects for box-aware column clustering
|
||||
box_rects = []
|
||||
for zone in zones:
|
||||
if zone.get("zone_type") == "box" and zone.get("box"):
|
||||
box_rects.append(zone["box"])
|
||||
|
||||
cells, columns_meta = build_grid_from_words(
|
||||
wf_word_dicts, img_w, img_h, box_rects=box_rects or None,
|
||||
)
|
||||
duration = time.time() - t0
|
||||
|
||||
# Apply IPA phonetic fixes
|
||||
|
||||
Reference in New Issue
Block a user