New algorithm in cv_words_first.py: clusters Tesseract word_boxes directly into columns (X-gap) and rows (Y-proximity), then builds cells at the intersections. No separate column/row detection is needed.

- cv_words_first.py: _cluster_columns, _cluster_rows, _build_cells, build_grid_from_words
- ocr_pipeline_api.py: grid_method parameter (v2|words_first) in the /words endpoint
- StepWordRecognition.tsx: dropdown toggle for the grid method
- OCR-Pipeline.md: docs v4.3.0 covering the Words-First algorithm
- 15 unit tests for cv_words_first

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
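The words-first idea described above can be sketched roughly as follows. This is a minimal illustration, not the code in cv_words_first.py: the `WordBox` type, the function names `cluster_columns`, `cluster_rows`, and `build_grid`, and the `min_gap` and `tolerance` thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WordBox:
    """Hypothetical word box: recognized text plus a pixel bounding box."""
    text: str
    left: int
    top: int
    width: int
    height: int


def cluster_columns(words: List[WordBox], min_gap: int = 30) -> List[Tuple[int, int]]:
    """Merge horizontal word spans; start a new column where the X-gap exceeds min_gap."""
    if not words:
        return []
    spans = sorted((w.left, w.left + w.width) for w in words)
    columns = []
    start, end = spans[0]
    for left, right in spans[1:]:
        if left - end > min_gap:
            columns.append((start, end))
            start, end = left, right
        else:
            end = max(end, right)
    columns.append((start, end))
    return columns


def cluster_rows(words: List[WordBox], tolerance: int = 10) -> List[List[WordBox]]:
    """Group words whose vertical centers lie within tolerance of the current row's mean center."""
    rows: List[dict] = []
    for w in sorted(words, key=lambda w: w.top + w.height / 2):
        center = w.top + w.height / 2
        if rows and abs(center - rows[-1]["center"]) <= tolerance:
            members = rows[-1]["words"]
            members.append(w)
            rows[-1]["center"] = sum(m.top + m.height / 2 for m in members) / len(members)
        else:
            rows.append({"center": center, "words": [w]})
    return [r["words"] for r in rows]


def build_grid(words: List[WordBox], min_gap: int = 30, tolerance: int = 10) -> Dict[Tuple[int, int], str]:
    """Place each word in the cell at the intersection of its row and column."""
    columns = cluster_columns(words, min_gap)
    cells: Dict[Tuple[int, int], List[str]] = {}
    for r, row in enumerate(cluster_rows(words, tolerance)):
        for w in sorted(row, key=lambda w: w.left):
            mid = w.left + w.width / 2
            # Every word span was merged into some column, so a match always exists.
            c = next(i for i, (x0, x1) in enumerate(columns) if x0 <= mid <= x1)
            cells.setdefault((r, c), []).append(w.text)
    return {key: " ".join(texts) for key, texts in cells.items()}


sample = [
    WordBox("apple", 10, 10, 50, 20),
    WordBox("Apfel", 200, 12, 50, 20),
    WordBox("to", 10, 50, 20, 20),
    WordBox("run", 35, 51, 30, 20),
    WordBox("laufen", 200, 50, 60, 20),
]
grid = build_grid(sample)
```

With these sample boxes, the two left-hand words on the second line fall into the same cell, because their spans merge into one column and their vertical centers sit within the tolerance. Real thresholds would presumably depend on image resolution and font size.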
38 lines
1.4 KiB
Python
"""
CV-based Document Reconstruction Pipeline for Vocabulary Extraction.

Re-export facade; all logic lives in the sub-modules:

  cv_vocab_types    Dataclasses, constants, IPA, feature flags
  cv_preprocessing  Image I/O, orientation, deskew, dewarp
  cv_layout         Document type, columns, rows, classification
  cv_ocr_engines    OCR engines, vocab post-processing, text cleaning
  cv_cell_grid      Cell grid (v2 + legacy), vocab conversion
  cv_review         LLM/spell review, pipeline orchestration

License: Apache 2.0 (commercial use permitted)
PRIVACY: All processing is done locally.
"""

from cv_vocab_types import *  # noqa: F401,F403
from cv_preprocessing import *  # noqa: F401,F403
from cv_layout import *  # noqa: F401,F403
from cv_ocr_engines import *  # noqa: F401,F403
from cv_cell_grid import *  # noqa: F401,F403
from cv_box_detect import *  # noqa: F401,F403
from cv_review import *  # noqa: F401,F403

# Private names used by consumers (not covered by wildcard re-exports).
from cv_preprocessing import _apply_shear  # noqa: F401
from cv_layout import (  # noqa: F401
    _detect_header_footer_gaps,
    _detect_sub_columns,
    _split_broad_columns,
)
from cv_ocr_engines import (  # noqa: F401
    _fix_character_confusion,
    _fix_phonetic_brackets,
)
from cv_cell_grid import _cells_to_vocab_entries  # noqa: F401
from cv_words_first import build_grid_from_words  # noqa: F401
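As a side note on why the facade lists private names explicitly: when a module defines no `__all__`, `from module import *` skips every name that starts with an underscore. The sketch below demonstrates this with a throwaway in-memory module; the name `demo_submodule` and its two helpers are hypothetical and stand in for the real sub-modules, whose `__all__` settings are not shown here.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name, no __all__ defined).
mod = types.ModuleType("demo_submodule")
exec(
    "def public_helper():\n"
    "    return 'public'\n"
    "def _private_helper():\n"
    "    return 'private'\n",
    mod.__dict__,
)
sys.modules["demo_submodule"] = mod

# Facade variant 1: wildcard import only.
wildcard_ns = {}
exec("from demo_submodule import *", wildcard_ns)

# Facade variant 2: wildcard import plus an explicit import of the private name.
explicit_ns = {}
exec(
    "from demo_submodule import *\n"
    "from demo_submodule import _private_helper\n",
    explicit_ns,
)

wildcard_has_private = "_private_helper" in wildcard_ns
explicit_has_private = "_private_helper" in explicit_ns
```

Here `wildcard_has_private` is `False` and `explicit_has_private` is `True`, which mirrors the pattern in the facade above: consumers that need an underscore-prefixed helper can only get it through the facade if it is imported by name.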