e1dadc8027
Stage 1.a Browser Matrix (Task #15): multi-engine scaffolding

- consent-tester/Dockerfile: Firefox + WebKit + Xvfb deps; playwright install chromium firefox webkit
- services/browser_profiles.py: registry with DEFAULT_PROFILES (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) + EXTRA_PROFILES (Chrome-Channel, Edge, Brave)
- services/multi_browser_scanner.py: run_matrix() orchestrates N parallel scans + worst-of aggregation + 3 sub-scores (Pre-Consent 50%, Reject-Respect 30%, Banner-Design 20%) + hard-fail cap to <60% on a Pre-Consent/Reject violation
- routes_matrix.py: POST /scan-matrix endpoint (separate module so that main.py stays under 500 LOC)

KNOWN: the Stage 1.a shim runs all profiles on the same Chromium; real engine diversity follows in Stage 1.b (consent_scanner.py param).

Coverage Gap 3 (Task #17): 2 of the 3 remaining GT gaps closed:
- B9 impressum_multi_entity_check (IMPRESSUM-001): detects a missing USt-IdNr (VAT ID), HR (commercial register entry), or GF (managing director) per entity on multi-entity Impressum pages (Elli: USt-IdNr given only for Elli Mobility, missing for VW Group Charging)
- B10 transfer_mechanism_check (TRANSFER-001): for each non-EU vendor in cmp_vendors, checks the DSE (privacy policy) for DPF/SCCs/BCRs/consent within a ±400-char window. Finds vendors without a named mechanism.
- TH-RETENTION-002 (AI data-category differentiation) remains semantically deep; planned for the specialist agents in Task #18.

Plausibility-LLM empty-response hardening (Task #16):
- BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s
- Single retry with a halved batch when the LLM returns empty content; qwen3:30b-a3b sometimes rejects ≥6-item prompts under format='json'. If the half batch is also empty: log + skip.
- The pipeline no longer spends 10 min in timeouts.

GT coverage jump: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
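The scoring rule named for run_matrix() (worst-of aggregation, three weighted sub-scores, hard-fail cap) could look roughly like the sketch below. It is an illustration only, not the code in services/multi_browser_scanner.py: ProfileResult, aggregate(), the field names, the 0-100 scale, the reading of "worst-of" as the per-sub-score minimum across profiles, and the choice of 59 to represent "<60%" are all assumptions.

```python
from dataclasses import dataclass

# Weights taken from the commit message; the key names are hypothetical.
WEIGHTS = {"pre_consent": 0.5, "reject_respect": 0.3, "banner_design": 0.2}
HARD_FAIL_CAP = 59.0  # assumption: "<60%" modelled as a cap at 59


@dataclass(frozen=True)
class ProfileResult:
    """Result of one browser profile scan (hypothetical shape, scores 0-100)."""
    profile: str
    pre_consent: float
    reject_respect: float
    banner_design: float
    pre_consent_violation: bool = False
    reject_violation: bool = False


def aggregate(results):
    """Combine per-profile results into one matrix score."""
    if not results:
        raise ValueError("at least one profile result is required")
    # Worst-of: each sub-score is the minimum seen across all profiles.
    worst = {key: min(getattr(r, key) for r in results) for key in WEIGHTS}
    score = sum(worst[key] * weight for key, weight in WEIGHTS.items())
    # Any pre-consent or reject violation caps the total below 60.
    hard_fail = any(r.pre_consent_violation or r.reject_violation for r in results)
    if hard_fail:
        score = min(score, HARD_FAIL_CAP)
    return {"score": round(score, 1), "sub_scores": worst, "hard_fail": hard_fail}
```

Taking the minimum per sub-score means a single engine that leaks trackers before consent drags the whole matrix down, which matches the "worst-of" wording; the real implementation may aggregate differently.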
44 lines
1.4 KiB
Docker
FROM python:3.12-slim-bookworm

WORKDIR /app

# Install system dependencies for Playwright/Chromium
RUN apt-get update && apt-get install -y --no-install-recommends \
    libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 \
    libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 \
    libxrandr2 libgbm1 libpango-1.0-0 libcairo2 libasound2 \
    curl \
    # Browser-matrix stage 1: Firefox + WebKit deps + Xvfb (headed runs)
    xvfb \
    libdbus-glib-1-2 libxt6 \
    libwoff1 libvpx7 libevent-2.1-7 libopus0 libgstreamer-plugins-base1.0-0 \
    libgstreamer-gl1.0-0 libgstreamer1.0-0 libwebpdemux2 libharfbuzz-icu0 \
    libenchant-2-2 libsecret-1-0 libhyphen0 libmanette-0.2-0 libflite1 \
    libgles2 libx264-164 \
    && rm -rf /var/lib/apt/lists/*

# Create user BEFORE installing Playwright (so browsers are in user's cache)
RUN useradd --create-home appuser

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install Playwright browsers AS appuser (so they land in /home/appuser/.cache/)
# Stage 1: chromium + firefox + webkit (Mobile-Safari = WebKit + devices preset)
USER appuser
RUN playwright install chromium firefox webkit
USER root

COPY . .
RUN chown -R appuser:appuser /app

USER appuser

EXPOSE 8094

# P83: build SHA for check-rebuild-needed.sh
ARG BUILD_SHA="unknown"
ENV BUILD_SHA=${BUILD_SHA}

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8094"]
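
The Dockerfile installs the browsers as appuser so that they land in that user's cache. A small sanity check along the following lines could confirm that all three engines are actually there. This is a sketch and not part of the repository: missing_engines() and EXPECTED_ENGINES are hypothetical names, and the check assumes Playwright's usual Linux layout of one "<engine>-<revision>" directory per browser under ~/.cache/ms-playwright, which is an implementation detail that may change between Playwright versions.

```python
from pathlib import Path

# Engines installed by "playwright install chromium firefox webkit".
EXPECTED_ENGINES = ("chromium", "firefox", "webkit")


def missing_engines(cache_dir):
    """Return the expected engines that have no build directory in cache_dir."""
    cache_dir = Path(cache_dir)
    found = set()
    if cache_dir.is_dir():
        for entry in cache_dir.iterdir():
            # Assumed naming scheme: "<engine>-<numeric revision>".
            engine, sep, revision = entry.name.rpartition("-")
            if entry.is_dir() and sep and revision.isdigit():
                found.add(engine)
    return [engine for engine in EXPECTED_ENGINES if engine not in found]


if __name__ == "__main__":
    # Default Playwright cache location on Linux for the current user.
    missing = missing_engines(Path.home() / ".cache" / "ms-playwright")
    if missing:
        raise SystemExit(f"missing Playwright engines: {', '.join(missing)}")
    print("all engines present")
```

Run as appuser inside the container, the script exits non-zero if any engine is missing, for example when the browsers were installed as root and ended up in a different home directory.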