Restructure: Move ocr_pipeline + labeling + crop into ocr/ package
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Failing after 2m25s
CI / test-python-agent-core (push) Successful in 19s
CI / test-nodejs-website (push) Successful in 20s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-04-25 21:51:43 +02:00
parent 59c400b9aa
commit 0504d22b8e
98 changed files with 10351 additions and 10152 deletions


@@ -1,63 +1,4 @@
"""
OCR Pipeline API - Step-by-step page reconstruction.
Thin wrapper that assembles all sub-module routers into a single
composite router. Backward-compatible: main.py and tests can still
import ``router``, ``_cache``, and helper functions from here.
Sub-modules (each < 1 000 lines):
    ocr_pipeline_common       shared state, cache, Pydantic models, helpers
    ocr_pipeline_sessions     session CRUD, image serving, doc-type
    ocr_pipeline_geometry     deskew, dewarp, structure, columns
    ocr_pipeline_rows         row detection, box-overlay helper
    ocr_pipeline_words        word detection (SSE), paddle-direct, word GT
    ocr_pipeline_ocr_merge    paddle/tesseract merge helpers, kombi endpoints
    ocr_pipeline_postprocess  LLM review, reconstruction, export, validation
    ocr_pipeline_auto         auto-mode orchestrator, reprocess
License: Apache 2.0
PRIVACY: All processing happens locally.
"""
from fastapi import APIRouter
# ---------------------------------------------------------------------------
# Shared state (imported by main.py and orientation_crop_api.py)
# ---------------------------------------------------------------------------
from ocr_pipeline_common import (  # noqa: F401 re-exported
    _cache,
    _BORDER_GHOST_CHARS,
    _filter_border_ghost_words,
)
# ---------------------------------------------------------------------------
# Sub-module routers
# ---------------------------------------------------------------------------
from ocr_pipeline_sessions import router as _sessions_router
from ocr_pipeline_geometry import router as _geometry_router
from ocr_pipeline_rows import router as _rows_router
from ocr_pipeline_words import router as _words_router
from ocr_pipeline_ocr_merge import (
    router as _ocr_merge_router,
    # Re-export for test backward compatibility
    _split_paddle_multi_words,  # noqa: F401
    _group_words_into_rows,  # noqa: F401
    _merge_row_sequences,  # noqa: F401
    _merge_paddle_tesseract,  # noqa: F401
)
from ocr_pipeline_postprocess import router as _postprocess_router
from ocr_pipeline_auto import router as _auto_router
from ocr_pipeline_regression import router as _regression_router
# ---------------------------------------------------------------------------
# Composite router (used by main.py)
# ---------------------------------------------------------------------------
router = APIRouter()
router.include_router(_sessions_router)
router.include_router(_geometry_router)
router.include_router(_rows_router)
router.include_router(_words_router)
router.include_router(_ocr_merge_router)
router.include_router(_postprocess_router)
router.include_router(_auto_router)
router.include_router(_regression_router)
# Backward-compat shim -- module moved to ocr/pipeline/api.py
import importlib as _importlib
import sys as _sys
_sys.modules[__name__] = _importlib.import_module("ocr.pipeline.api")
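
The shim above replaces the old module's entry in sys.modules with the relocated module, so existing imports of the old name resolve to the new location. A minimal, self-contained sketch of that aliasing technique follows. The module names demo_new_api and demo_legacy_api and the helper alias_module are hypothetical and exist only for illustration; the in-memory module stands in for ocr.pipeline.api so the sketch runs without the repository.

```python
import importlib
import sys
import types


def alias_module(old_name: str, new_name: str) -> types.ModuleType:
    """Make `import old_name` return the module registered as `new_name`."""
    target = importlib.import_module(new_name)
    sys.modules[old_name] = target
    return target


# Hypothetical stand-in for the relocated module, built in memory so the
# sketch needs no files on disk.
new_mod = types.ModuleType("demo_new_api")
new_mod.router = object()
new_mod._cache = {}
sys.modules["demo_new_api"] = new_mod

# Equivalent of the shim line, run from outside the legacy module.
alias_module("demo_legacy_api", "demo_new_api")

# Callers that still use the old name get the very same module object.
import demo_legacy_api
from demo_legacy_api import _cache

_cache["session-1"] = "cached"  # mutates the state held by the new module
```

Because both names point at one module object, shared state such as the cache is not duplicated. A plain `from ocr.pipeline.api import *` in the old file would skip underscore-prefixed names like `_cache` (unless the new module lists them in `__all__`), which is presumably why the commit aliases the whole module instead.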