breakpilot-lehrer

Benjamin_Boenisch/breakpilot-lehrer

Fork 0

Commit Graph

Author	SHA1	Message	Date
Benjamin Admin	2f8270f77b	Add Vision-LLM OCR Fusion (Step 4) for degraded scans Some checks failed CI / go-lint (push) Has been skipped Details CI / python-lint (push) Has been skipped Details CI / nodejs-lint (push) Has been skipped Details CI / test-go-school (push) Successful in 42s Details CI / test-go-edu-search (push) Successful in 29s Details CI / test-python-klausur (push) Failing after 2m43s Details CI / test-python-agent-core (push) Successful in 20s Details CI / test-nodejs-website (push) Successful in 27s Details New module vision_ocr_fusion.py: Sends scan image + OCR word coordinates + document type to Qwen2.5-VL 32B. The LLM reads the image visually while using OCR positions as structural hints. Key features: - Document-type-aware prompts (Vokabelseite, Woerterbuch, etc.) - OCR words grouped into lines with x/y coordinates in prompt - Low-confidence words marked with (?) for LLM attention - Continuation row merging instructions in prompt - JSON response parsing with markdown code block handling - Fallback to original OCR on any error Frontend (admin-lehrer Grid Review): - "Vision-LLM" checkbox toggle - "Typ" dropdown (Vokabelseite, Woerterbuch, etc.) - Steps 1-3 defaults set to inactive Activate: Check "Vision-LLM", select document type, click "OCR neu + Grid". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-24 00:24:22 +02:00

Author

SHA1

Message

Date

Benjamin Admin

2f8270f77b

Add Vision-LLM OCR Fusion (Step 4) for degraded scans

CI / go-lint (push) Has been skipped

Details

CI / python-lint (push) Has been skipped

Details

CI / nodejs-lint (push) Has been skipped

Details

CI / test-go-school (push) Successful in 42s

Details

CI / test-go-edu-search (push) Successful in 29s

Details

CI / test-python-klausur (push) Failing after 2m43s

Details

CI / test-python-agent-core (push) Successful in 20s

Details

CI / test-nodejs-website (push) Successful in 27s

Details

New module vision_ocr_fusion.py: Sends scan image + OCR word
coordinates + document type to Qwen2.5-VL 32B. The LLM reads
the image visually while using OCR positions as structural hints.

Key features:
- Document-type-aware prompts (Vokabelseite, Woerterbuch, etc.)
- OCR words grouped into lines with x/y coordinates in prompt
- Low-confidence words marked with (?) for LLM attention
- Continuation row merging instructions in prompt
- JSON response parsing with markdown code block handling
- Fallback to original OCR on any error

Frontend (admin-lehrer Grid Review):
- "Vision-LLM" checkbox toggle
- "Typ" dropdown (Vokabelseite, Woerterbuch, etc.)
- Steps 1-3 defaults set to inactive

Activate: Check "Vision-LLM", select document type, click "OCR neu + Grid".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-24 00:24:22 +02:00

1 Commits