Commit Graph

3 Commits

Author SHA1 Message Date
Benjamin Admin
729ebff63c feat: add border ghost filter + graphic detection tests + structure overlay
- Add _filter_border_ghost_words() to remove OCR artefacts from box borders
  (vertical + horizontal edge detection, column cleanup, re-indexing)
- Add 20 tests for border ghost filter (basic filtering + column cleanup)
- Add 24 tests for cv_graphic_detect (color detection, word overlap, boxes)
- Clean up cv_graphic_detect.py logging (per-candidate → DEBUG)
- Add structure overlay layer to StepReconstruction (boxes + graphics toggle)
- Show border_ghosts_removed badge in StepStructureDetection
- Update MkDocs with structure detection documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 18:28:53 +01:00
Benjamin Admin
6b9b280ba3 feat: integrate graphic element detection into structure step
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 28s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m58s
CI / test-python-agent-core (push) Successful in 18s
CI / test-nodejs-website (push) Successful in 19s
Add cv_graphic_detect.py for detecting non-text visual elements (arrows,
circles, lines, exclamation marks, icons, illustrations). Draw detected
graphics on structure overlay image and display them in the frontend
StepStructureDetection component with shape counts and individual listings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 13:21:55 +01:00
Benjamin Admin
5b5213c2b9 feat: add Structure Detection step to OCR pipeline
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 27s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 1m58s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 16s
New pipeline step between Crop and Columns that visualizes detected
document structure: boxes (line-based + shading), page zones, and
color regions. Shows original image on the left, annotated overlay
on the right.

Backend: POST /detect-structure endpoint + /image/structure-overlay
Frontend: StepStructureDetection component with zone/box/color details

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 12:31:09 +01:00