- Add pyspellchecker (MIT) to requirements for EN+DE dictionary lookup
- New spell_review_entries_sync() + spell_review_entries_streaming():
  - Dictionary-backed substitution: a correction is applied only if the
    corrected word is found in the dictionary
  - Structural rule: a digit at position 0 followed by lowercase letters is
    replaced by the most likely letter
    (e.g. "8en"→"Ben", "8uch"→"Buch", "5ee"→"See", "6eld"→"Geld")
  - Pattern rule: "|." → "1." for numbered list prefixes
  - Standalone "|" → "I" (capital I)
  - IPA entries are still protected via the existing _entry_needs_review filter
  - Headings/untranslated words (e.g. "Story") are left untouched because they
    contain no suspicious characters
- llm_review_entries + llm_review_entries_streaming: route via the REVIEW_ENGINE
  env var ("spell" by default, "llm" to restore the previous behaviour)
- docker-compose.yml: REVIEW_ENGINE=${REVIEW_ENGINE:-spell}
- LLM code preserved for fallback (set REVIEW_ENGINE=llm in .env)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
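
The rules listed in the commit message can be sketched roughly as follows. This is a minimal illustration and not the code from this commit: DIGIT_CANDIDATES, KNOWN_WORDS, correct_token, correct_text and select_engine are hypothetical names, the digit-to-letter table only covers the examples quoted above, and the tiny word set stands in for the pyspellchecker-backed EN+DE lookup.

```python
import os

# Hypothetical digit -> letter table, limited to the examples in the commit message.
DIGIT_CANDIDATES = {"8": "B", "5": "S", "6": "G"}

# Stand-in for the EN+DE dictionary lookup (assumption: the real code asks
# pyspellchecker whether the corrected word is known).
KNOWN_WORDS = {"ben", "buch", "see", "geld"}


def is_known(word):
    """Return True if the word is in the stand-in dictionary."""
    return word.lower() in KNOWN_WORDS


def correct_token(token):
    """Apply the three rules to a single whitespace-delimited token."""
    # Standalone "|" is read as a capital I.
    if token == "|":
        return "I"
    # "|." is read as the numbered list prefix "1.".
    if token == "|.":
        return "1."
    # Structural rule: digit at position 0, lowercase letters after it.
    rest = token[1:]
    if token[:1] in DIGIT_CANDIDATES and rest.isalpha() and rest.islower():
        candidate = DIGIT_CANDIDATES[token[0]] + rest
        # Dictionary-backed substitution: accept only known words.
        if is_known(candidate):
            return candidate
    # Anything else (e.g. "Story") has no suspicious characters and is kept.
    return token


def correct_text(text):
    """Correct every token in a line of OCR output."""
    return " ".join(correct_token(t) for t in text.split())


def select_engine(env=None):
    """Pick the review engine; unset or empty falls back to "spell"."""
    env = os.environ if env is None else env
    return env.get("REVIEW_ENGINE") or "spell"
```

Tokens whose corrected form is not in the dictionary are returned unchanged, so an unrecognised string such as "8xyz" is not rewritten. The empty-string fallback in select_engine mirrors the `${REVIEW_ENGINE:-spell}` default in docker-compose.yml.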
51 lines
1.1 KiB
Plaintext
fastapi>=0.109.0
uvicorn[standard]>=0.27.0
python-multipart>=0.0.6
pyjwt>=2.8.0
httpx>=0.26.0
python-dotenv>=1.0.0

# BYOEH Dependencies
qdrant-client>=1.7.0
cryptography>=41.0.0
PyPDF2>=3.0.0
PyMuPDF>=1.24.0

# PyTorch CPU-only (smaller, no CUDA needed for Docker on Mac)
--extra-index-url https://download.pytorch.org/whl/cpu
torch>=2.0.0

# Local Embeddings (no API key needed)
sentence-transformers>=2.2.0

# MinIO Object Storage
minio>=7.2.0

# OpenCV for handwriting detection (headless = no GUI, smaller for CI)
opencv-python-headless>=4.8.0

# Tesseract OCR Python binding (requires system tesseract-ocr package)
pytesseract>=0.3.10
Pillow>=10.0.0

# RapidOCR (PaddleOCR models on ONNX Runtime - works on ARM64 natively)
rapidocr
onnxruntime

# IPA pronunciation dictionary lookup (MIT license, bundled CMU dict ~134k words)
eng-to-ipa

# Spell-checker for rule-based OCR correction (MIT license)
pyspellchecker>=0.8.1

# PostgreSQL (for metrics storage)
psycopg2-binary>=2.9.0
asyncpg>=0.29.0

# Email validation for Pydantic
email-validator>=2.0.0

# Testing
pytest>=8.0.0
pytest-asyncio>=0.23.0