breakpilot-lehrer/klausur-service/backend/ocr_pipeline_api.py
Benjamin Admin f655db30e4
Add Ground Truth regression test system for OCR pipeline
Extract _build_grid_core() from build_grid() endpoint for reuse.
New ocr_pipeline_regression.py with endpoints to mark sessions as
ground truth, list them, and run regression comparisons after code
changes. Frontend button in StepGroundTruth.tsx to mark/update GT.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 13:46:48 +01:00

"""
OCR Pipeline API - step-by-step page reconstruction.

Thin wrapper that assembles all sub-module routers into a single
composite router. Backward-compatible: main.py and tests can still
import ``router``, ``_cache``, and helper functions from here.

Sub-modules (each < 1 000 lines):
    ocr_pipeline_common       - shared state, cache, Pydantic models, helpers
    ocr_pipeline_sessions     - session CRUD, image serving, doc-type
    ocr_pipeline_geometry     - deskew, dewarp, structure, columns
    ocr_pipeline_rows         - row detection, box-overlay helper
    ocr_pipeline_words        - word detection (SSE), paddle-direct, word GT
    ocr_pipeline_ocr_merge    - paddle/tesseract merge helpers, kombi endpoints
    ocr_pipeline_postprocess  - LLM review, reconstruction, export, validation
    ocr_pipeline_auto         - auto-mode orchestrator, reprocess

License: Apache 2.0
PRIVACY: All processing is performed locally.
"""
from fastapi import APIRouter
# ---------------------------------------------------------------------------
# Shared state (imported by main.py and orientation_crop_api.py)
# ---------------------------------------------------------------------------
from ocr_pipeline_common import (  # noqa: F401 re-exported
    _cache,
    _BORDER_GHOST_CHARS,
    _filter_border_ghost_words,
)
# ---------------------------------------------------------------------------
# Sub-module routers
# ---------------------------------------------------------------------------
from ocr_pipeline_sessions import router as _sessions_router
from ocr_pipeline_geometry import router as _geometry_router
from ocr_pipeline_rows import router as _rows_router
from ocr_pipeline_words import router as _words_router
from ocr_pipeline_ocr_merge import (
    router as _ocr_merge_router,
    # Re-export for test backward compatibility
    _split_paddle_multi_words,  # noqa: F401
    _group_words_into_rows,  # noqa: F401
    _merge_row_sequences,  # noqa: F401
    _merge_paddle_tesseract,  # noqa: F401
)
from ocr_pipeline_postprocess import router as _postprocess_router
from ocr_pipeline_auto import router as _auto_router
from ocr_pipeline_regression import router as _regression_router
# ---------------------------------------------------------------------------
# Composite router (used by main.py)
# ---------------------------------------------------------------------------
router = APIRouter()
router.include_router(_sessions_router)
router.include_router(_geometry_router)
router.include_router(_rows_router)
router.include_router(_words_router)
router.include_router(_ocr_merge_router)
router.include_router(_postprocess_router)
router.include_router(_auto_router)
router.include_router(_regression_router)