1784b43d72
Instead of fragile text regexes plus LLM-cascade workarounds: a deterministic
pipeline. consent-tester takes a full-page screenshot of the cookie policy
(accepts the banner, expands accordions, burns in a timestamp). The backend
runs Tesseract OCR (deu, PSM 4) over it, and an anchor-based parser
extracts {name, category, purpose, duration, type} per cookie.
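
The OCR-plus-parser step could look roughly like the sketch below. Only the Tesseract settings (deu, PSM 4) and the five output fields come from this commit. The German anchor labels, the "Label: value" layout, and the function names `ocr_image` and `parse_cookies` are assumptions for illustration; the actual parser in cookie_screenshot_ocr.py is not shown here and may work differently.

```python
import re

# Hypothetical anchor labels mapped to the output fields named in the commit.
# The real policy layout and the real anchors are not shown in this commit.
ANCHORS = {
    "name": "name",
    "kategorie": "category",
    "zweck": "purpose",
    "dauer": "duration",
    "typ": "type",
}
ANCHOR_RE = re.compile(r"^\s*(%s)\s*:\s*(.*)$" % "|".join(ANCHORS), re.IGNORECASE)
FIELDS = ("name", "category", "purpose", "duration", "type")


def ocr_image(path):
    """Run Tesseract with the settings named in the commit (deu, PSM 4)."""
    # Imported lazily so parse_cookies() works without Tesseract installed.
    import pytesseract
    from PIL import Image

    return pytesseract.image_to_string(Image.open(path), lang="deu", config="--psm 4")


def parse_cookies(text):
    """Split OCR text into one record per cookie, keyed on anchor labels."""
    records = []
    current = None  # record being filled
    field = None    # field that continuation lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = ANCHOR_RE.match(line)
        if match:
            field = ANCHORS[match.group(1).lower()]
            if field == "name":
                # A "Name" anchor opens a new cookie record.
                current = dict.fromkeys(FIELDS, "")
                records.append(current)
            if current is not None:
                current[field] = match.group(2).strip()
        elif current is not None and field is not None and field != "name":
            # Wrapped line: append to the current field. Names are never
            # extended or corrected, so they are excluded here.
            current[field] = (current[field] + " " + line).strip()
    return records
```

Usage would be along the lines of `parse_cookies(ocr_image("policy.png"))`, where the file name is a placeholder.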
VW smoke test:
- Before (parse_flat): 60 cookies / 16 vendors
- Now (Tesseract): 79 cookies / 14 vendor records (~79% ground-truth (GT) coverage)
Architecture:
- consent-tester: page_screenshot.py + /capture-evidence endpoint
- backend: cookie_screenshot_ocr.py with the Tesseract pipeline
- pipeline: runs after parse_flat as complementary stage C
- Dockerfile: tesseract-ocr + German language pack
- requirements: pytesseract
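
One way a complementary stage after parse_flat might combine the two result sets is sketched below. The merge rule (keep parse_flat records, fill their empty fields from OCR, append OCR-only cookies) and the names `merge_stage_c` and `source` are assumptions for illustration, not the pipeline's confirmed behavior. Names are compared verbatim, so no correction or case folding is applied.

```python
def merge_stage_c(flat_cookies, ocr_cookies):
    """Hypothetical stage C: complement parse_flat records with OCR records."""
    # Copy the inputs so the caller's lists and dicts stay untouched.
    merged = [dict(cookie) for cookie in flat_cookies]
    by_name = {cookie["name"]: cookie for cookie in merged}

    for cookie in ocr_cookies:
        existing = by_name.get(cookie["name"])  # exact match, no normalization
        if existing is None:
            # Cookie seen only in the screenshot: add it and mark its origin.
            added = dict(cookie, source="ocr")
            merged.append(added)
            by_name[added["name"]] = added
            continue
        # Cookie known to both stages: only fill fields parse_flat left empty.
        for key, value in cookie.items():
            if value and not existing.get(key):
                existing[key] = value
    return merged
```

Because matching is exact, `awsalb` and `AWSALB` would be kept as two separate records under this sketch.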
NO text correction is applied to cookie names (awsalb stays awsalb).
The timestamp in the screenshot serves as legal evidence of what we
actually saw on the site at scan time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
56 lines
1.0 KiB
Plaintext
# BreakPilot Compliance Backend Dependencies

# Web Framework
fastapi==0.123.9
uvicorn==0.38.0
starlette==0.49.3

# HTTP Client (consent-service proxy, DSR proxy)
httpx==0.28.1
requests==2.32.5

# Validation & Types
pydantic==2.12.5
pydantic_core==2.41.5
email-validator==2.3.0
annotated-types==0.7.0

# Authentication
PyJWT==2.10.1
python-multipart>=0.0.22

# AI / Anthropic (compliance AI assistant)
anthropic==0.75.0

# Re-Ranking: see requirements-reranker.txt (optional, CPU-only PyTorch)

# PDF Generation (GDPR export, audit reports)
weasyprint>=68.0
reportlab==4.2.5
Jinja2==3.1.6

# Document Processing (Word import for consent admin)
mammoth==1.11.0
Markdown==3.9

# PDF Text Extraction (document import analysis)
PyMuPDF==1.25.3

# Utilities
python-dateutil==2.9.0.post0

# Database
asyncpg==0.30.0
SQLAlchemy==2.0.36
psycopg2-binary==2.9.10

# Cache (Valkey/Redis - rate limiter middleware)
redis==5.2.1

# Security: Pin transitive dependencies to patched versions
idna>=3.7
cryptography>=42.0.0
pillow>=12.1.1
python-docx==1.2.0
pytesseract>=0.3.13