fix: update PaddleOCR init for v3.4+ API (lang=en, ocr_version=PP-OCRv5)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 29s
CI / test-go-edu-search (push) Successful in 28s
CI / test-python-klausur (push) Failing after 2m5s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 18s

PaddleOCR 3.4.0 removed 'latin' language support. Use 'en' with
explicit ocr_version='PP-OCRv5' instead, with fallback for older API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-03-13 09:39:33 +01:00
parent 8e42e36ee4
commit 41ff7671cd


@@ -18,11 +18,20 @@ def get_engine():
     if _engine is None:
         from paddleocr import PaddleOCR
-        _engine = PaddleOCR(
-            lang="latin",
-            use_angle_cls=True,
-            show_log=False,
-        )
+        # PaddleOCR >= 3.x: use ocr_version param; fallback for older API
+        try:
+            _engine = PaddleOCR(
+                lang="en",
+                ocr_version="PP-OCRv5",
+                use_angle_cls=True,
+                show_log=False,
+            )
+        except (ValueError, TypeError):
+            _engine = PaddleOCR(
+                lang="latin",
+                use_angle_cls=True,
+                show_log=False,
+            )
     return _engine
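
The try/except fallback in this change can be sketched in isolation as a small helper that tries several keyword-argument sets in order. This is only an illustration, not code from the commit: `build_with_fallback`, `FakeEngine`, and `CANDIDATES` are hypothetical names, and `FakeEngine` is a stand-in that merely imitates a constructor rejecting `lang="latin"`. It does not call PaddleOCR.

```python
from typing import Any, Callable, Iterable, Mapping, Tuple, Type


def build_with_fallback(
    factory: Callable[..., Any],
    candidates: Iterable[Mapping[str, Any]],
    errors: Tuple[Type[BaseException], ...] = (ValueError, TypeError),
) -> Any:
    """Return factory(**kwargs) for the first kwargs set the factory accepts."""
    last_error = None
    for kwargs in candidates:
        try:
            return factory(**kwargs)
        except errors as exc:
            # Remember the failure and move on to the next candidate.
            last_error = exc
    raise RuntimeError("no candidate configuration was accepted") from last_error


class FakeEngine:
    """Hypothetical stand-in for a newer API that rejects lang='latin'."""

    def __init__(self, lang, ocr_version=None):
        if lang == "latin":
            raise ValueError("unsupported lang: latin")
        self.lang = lang
        self.ocr_version = ocr_version


# Ordered from the newest calling convention to the oldest (assumed values).
CANDIDATES = [
    {"lang": "en", "ocr_version": "PP-OCRv5"},
    {"lang": "latin"},
]

engine = build_with_fallback(FakeEngine, CANDIDATES)
```

Keeping the candidates in a list makes the order of attempts explicit, and chaining the last exception into the final `RuntimeError` preserves the reason the final attempt failed instead of hiding it.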