Each module is under 1050 lines:
- ocr_pipeline_common.py (354) - shared state, cache, models, helpers
- ocr_pipeline_sessions.py (483) - session CRUD, image serving, doc-type
- ocr_pipeline_geometry.py (1025) - deskew, dewarp, structure, columns
- ocr_pipeline_rows.py (348) - row detection, box-overlay helper
- ocr_pipeline_words.py (876) - word detection (SSE), paddle-direct
- ocr_pipeline_ocr_merge.py (615) - merge helpers, kombi endpoints
- ocr_pipeline_postprocess.py (929) - LLM review, reconstruction, export
- ocr_pipeline_auto.py (705) - auto-mode orchestrator, reprocess

ocr_pipeline_api.py is now a 61-line thin wrapper that re-exports router, _cache, and test-imported symbols for backward compatibility. No changes needed in main.py or tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
62 lines
2.6 KiB
Python
"""
OCR Pipeline API - step-by-step page reconstruction.

Thin wrapper that assembles all sub-module routers into a single
composite router. Backward-compatible: main.py and tests can still
import ``router``, ``_cache``, and helper functions from here.

Sub-modules (each < 1 050 lines):
    ocr_pipeline_common      – shared state, cache, Pydantic models, helpers
    ocr_pipeline_sessions    – session CRUD, image serving, doc-type
    ocr_pipeline_geometry    – deskew, dewarp, structure, columns
    ocr_pipeline_rows        – row detection, box-overlay helper
    ocr_pipeline_words       – word detection (SSE), paddle-direct, word GT
    ocr_pipeline_ocr_merge   – paddle/tesseract merge helpers, kombi endpoints
    ocr_pipeline_postprocess – LLM review, reconstruction, export, validation
    ocr_pipeline_auto        – auto-mode orchestrator, reprocess

License: Apache 2.0
PRIVACY: All processing is performed locally.
"""

from fastapi import APIRouter

# ---------------------------------------------------------------------------
# Shared state (imported by main.py and orientation_crop_api.py)
# ---------------------------------------------------------------------------
from ocr_pipeline_common import (  # noqa: F401 – re-exported
    _cache,
    _BORDER_GHOST_CHARS,
    _filter_border_ghost_words,
)

# ---------------------------------------------------------------------------
# Sub-module routers
# ---------------------------------------------------------------------------
from ocr_pipeline_sessions import router as _sessions_router
from ocr_pipeline_geometry import router as _geometry_router
from ocr_pipeline_rows import router as _rows_router
from ocr_pipeline_words import router as _words_router
from ocr_pipeline_ocr_merge import (
    router as _ocr_merge_router,
    # Re-export for test backward compatibility
    _split_paddle_multi_words,  # noqa: F401
    _group_words_into_rows,  # noqa: F401
    _merge_row_sequences,  # noqa: F401
    _merge_paddle_tesseract,  # noqa: F401
)
from ocr_pipeline_postprocess import router as _postprocess_router
from ocr_pipeline_auto import router as _auto_router

# ---------------------------------------------------------------------------
# Composite router (used by main.py)
# ---------------------------------------------------------------------------
router = APIRouter()
router.include_router(_sessions_router)
router.include_router(_geometry_router)
router.include_router(_rows_router)
router.include_router(_words_router)
router.include_router(_ocr_merge_router)
router.include_router(_postprocess_router)
router.include_router(_auto_router)
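The wrapper stays backward compatible because a re-exported name points at the same object as the name in the sub-module, so legacy callers and new callers share one cache. The sketch below illustrates that with the standard library only. The module names `demo_common` and `demo_api` and the cache entry are hypothetical stand-ins, not the project's real modules or data.

```python
import sys
import types

# Hypothetical stand-in for ocr_pipeline_common: it owns the shared cache.
common = types.ModuleType("demo_common")
common._cache = {}
sys.modules["demo_common"] = common

# Hypothetical stand-in for the thin wrapper: it only re-exports the name.
wrapper_source = "from demo_common import _cache  # noqa: F401 (re-exported)\n"
wrapper = types.ModuleType("demo_api")
exec(wrapper_source, wrapper.__dict__)
sys.modules["demo_api"] = wrapper

# Legacy callers keep importing from the wrapper ...
from demo_api import _cache as cache_via_wrapper

# ... while new code can import from the sub-module directly.
from demo_common import _cache as cache_via_common

# A write through one import path is visible through the other,
# because both names are bound to the same dict object.
cache_via_wrapper["session-1"] = {"status": "deskewed"}
same_object = cache_via_wrapper is cache_via_common
```

This only holds while the shared object is mutated in place. If a sub-module rebinds the name (for example `_cache = {}`), the wrapper keeps its reference to the old object, so shared state of this kind should be cleared or updated rather than reassigned.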