feat: automatic orientation detection for upside-down scans
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 23s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m50s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 15s

Tesseract OSD detects a page rotation of 0/90/180/270° and corrects it
automatically before the deskew step. This fixes the problem with book
scanners that deliver every second page upside down.
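
The detection step described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `parse_osd`, `rotate_clockwise`, `fix_orientation` and the `min_confidence` threshold are hypothetical, the image is a toy nested list rather than an OpenCV array, and the OSD report is passed in as text (in a real pipeline it would presumably come from something like `pytesseract.image_to_osd`, with `cv2.rotate` doing the rotation). It also assumes that the `Rotate` field in the report is the clockwise correction to apply.

```python
import re
from typing import List, Tuple

Image = List[List[int]]  # toy stand-in for a pixel array: a list of rows


def parse_osd(osd_text: str) -> Tuple[int, float]:
    """Extract (rotate_degrees, confidence) from a Tesseract OSD text report."""
    rotate = re.search(r"^Rotate:\s*(\d+)", osd_text, re.MULTILINE)
    conf = re.search(r"^Orientation confidence:\s*([\d.]+)", osd_text, re.MULTILINE)
    if rotate is None:
        return 0, 0.0  # no usable report: treat the page as upright
    return int(rotate.group(1)) % 360, float(conf.group(1)) if conf else 0.0


def rotate_clockwise(img: Image, degrees: int) -> Image:
    """Rotate a row-major image clockwise by a multiple of 90 degrees."""
    if degrees not in (0, 90, 180, 270):
        raise ValueError(f"unsupported rotation: {degrees}")
    for _ in range(degrees // 90):
        # One quarter turn clockwise: reverse the row order, then transpose.
        img = [list(row) for row in zip(*img[::-1])]
    return img


def fix_orientation(
    img: Image, osd_text: str, min_confidence: float = 2.0
) -> Tuple[Image, int]:
    """Return (image, applied_rotation); rotation is 0 if nothing was changed."""
    rotate, conf = parse_osd(osd_text)
    # min_confidence is an assumed guard against rotating on a weak OSD guess.
    if rotate == 0 or conf < min_confidence:
        return img, 0
    return rotate_clockwise(img, rotate), rotate
```

Returning the applied rotation alongside the image mirrors the `img_bgr, rotation = ...` call in the diff, so the caller can refresh the image dimensions and log whether a correction took place.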

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-07 17:26:21 +01:00
parent 7a1bd5e82d
commit a5635e0c43
2 changed files with 58 additions and 0 deletions


@@ -71,6 +71,7 @@ try:
         detect_row_geometry, build_cell_grid_v2,
         _cells_to_vocab_entries, _detect_sub_columns, _detect_header_footer_gaps,
         expand_narrow_columns, positional_column_regions, llm_review_entries,
+        detect_and_fix_orientation,
         _fix_phonetic_brackets,
         render_pdf_high_res,
         PageRegion, RowGeometry,
@@ -1360,6 +1361,15 @@ async def _run_ocr_pipeline_for_page(
     img_h, img_w = img_bgr.shape[:2]
     logger.info(f"OCR Pipeline page {page_number + 1}: image {img_w}x{img_h}")
 
+    # 1b. Orientation detection (fix upside-down scans)
+    t0 = _time.time()
+    img_bgr, rotation = detect_and_fix_orientation(img_bgr)
+    if rotation:
+        img_h, img_w = img_bgr.shape[:2]
+        logger.info(f" orientation: rotated {rotation}° ({_time.time() - t0:.1f}s)")
+    else:
+        logger.info(f" orientation: OK ({_time.time() - t0:.1f}s)")
+
     # 2. Create pipeline session in DB (for debugging in admin UI)
     pipeline_session_id = str(uuid.uuid4())
     try: