Commit Graph

10 Commits

Author SHA1 Message Date
Benjamin Admin 4183379dc5 feat(audit): P33 3-Spalten-Vendor-Konsistenz (DSE/Cookie-Doc/Banner)
CI / detect-changes (push) Successful in 11s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
check_three_source_vendor_consistency: scannt DSE-, Cookie-Doc- und
Banner-Vendor-Liste auf 15 typische Vendor-Signaturen (Google Analytics,
Meta Pixel, Hotjar, HubSpot, LinkedIn Insight, ...). Listet Vendors die
in mind. einer Quelle stehen, aber nicht in allen sources_with_data.

Liefert MEDIUM-Finding mit konkreter 'fehlt in: DSE, Banner-Liste'-
Liste pro Vendor. Empfehlung: zentrale Vendor-Liste pflegen + in alle
drei Dokumenttypen propagieren. (Art. 13(1)(c)+(e) DSGVO)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:11:47 +02:00
Benjamin Admin c93c88577c feat(audit): P88 PDF-Export via WeasyPrint
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
GET /api/compliance/agent/snapshots/{id}/pdf liefert application/pdf
mit dem vollen Audit-Mail-Inhalt im A4-Print-Layout (Header mit
Site/Timestamp/Snapshot-ID, Seitenzahlen unten rechts).

check_replay.py liefert jetzt zusaetzlich 'full_html' (nicht nur
500-char-preview), damit der PDF-Renderer das komplette HTML hat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:06:48 +02:00
Benjamin Admin 3207acea3e fix(audit): Replay-Pipeline um P35/P77/P78/P36 Signals-Block ergaenzen
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
check_replay.py rendert jetzt auch die Textsignal-Findings (Save-Label-
Ambiguitaet, Cookies-in-DSE-Akzeptanz, JC-Klausel positiv, Social-Embeds).
Damit hat der Replay-Test parity mit der echten Mail-Pipeline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 17:04:02 +02:00
Benjamin Admin df8832c521 feat(audit): P75 Banner-vs-CMP + P84 Diff-Mode + P74/P96/P97 Doc-Types
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P75 — check_banner_vs_cmp_partner_count: wenn Banner-Text 'N Partner'
nennt und N < cmp_vendors * 0.6, HIGH-Finding (Art. 13(1)(e) DSGVO).
Erkennt Verharmlosung der tatsaechlichen Vendor-Anzahl.

P84 — run_diff.py: vergleicht aktuellen Lauf mit letztem Snapshot
derselben Site (set-Diff auf normalisierten Finding-Labels). Block
ueber dem GF-1-Pager: 'Seit letztem Lauf: X Findings weg, Y neue'.
USP — keiner der grossen Anbieter hat das.

P74/P96/P97 — Labels fuer legal_notice (Rechtliche Hinweise / IP /
Forward-Looking), dsa (Art. 12+17 Digital Services Act), lizenzhinweise
(OSS-Compliance) in _DOC_TYPE_LABELS registriert. Echte Pflichtangaben-
Checks kommen separat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:38:25 +02:00
Benjamin Admin 7842c95532 feat(audit): P92 CMP-Tool-Verfuegbarkeit + P94 Banner-vs-Cookie-Doc-Konsistenz
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P92 — Wenn der Nutzer 'Anpassen'/'Einstellungen' klickt und der
CMP-Settings-Bereich kein Fehlerfreies Laden zeigt (Error, Timeout,
<80 Zeichen ohne Kategorien, keine Toggles), ist das ein HIGH-
Finding. Granulare Wahl formal vorhanden, faktisch nicht
funktionsfaehig (Art. 7 (3) DSGVO + EDPB 03/2022).

P94 — Cookie-Liste im Banner-Settings vs Cookie-Richtlinie. Heuristik
extrahiert Cookie-Namen aus dem Cookie-Doc-Text (regex auf typische
camelCase/_underscored Patterns + Vendor-Prefixes _ga/_gid/ot_/uc_).
Wenn |only_in_doc| >= 5 ODER |only_in_banner| >= 3 → MEDIUM-Finding.
|only_in_doc| >= 15 UND |only_in_banner| >= 5 → HIGH.

Beide Findings landen im neuen Mail-Block 'Banner-Konsistenz-Pruefung'
(amber-yellow) zwischen Mismatch-Block und VVT. Auch in
check_replay.py eingehaengt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:31:19 +02:00
Benjamin Admin 08671adfdf feat(audit): P82 GF-1-Pager + P87 Konfidenz-Score pro Finding
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 18s
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
P82 — gf_one_pager.py: kompakte 5-Bullet-Kurzfassung ganz oben in der
Mail. Score (gross + Farbe), Delta-zu-Vorlauf, Top-Findings nach
HIGH/MEDIUM sortiert mit zustaendiger Rolle (DSB / Marketing / IT /
Legal / Web-Team) und Klassifizierungsbits aus dem Wizard.
Sachlicher Ton — keine 4%-Drohung, '4-8 Wochen' als realistischer
Zeitrahmen. Eingehaengt vor Critical-Findings-Block in Mail-Composition
und Replay-Pipeline.

P87 — finding_confidence.py: 13 Regex-Regeln liefern (confidence_pct,
reason) pro Finding-Label. Direkt im DOM beobachtbar = 95-98%,
Library-Mismatch = 82%, Textmuster-Match auf Pflichtangaben = 75-88%.
Im 1-Pager als kleines '(NN% Konfidenz)'-Tag mit Reason-Tooltip
hinter jedem Finding gerendert.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 16:20:19 +02:00
Benjamin Admin 50fc0ecc59 feat(audit): P79 Pre-Scan-Wizard (8 Pflichtfelder) + P99 erweitert + P102 Replay-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / nodejs-lint (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m56s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P79: PreScanWizard.tsx mit 8 Pflichtfeldern (Branche, B2B/B2C,
Direkt-Vertrieb, Rechtsform, Konzern-Struktur, MA-Zahl, Besondere
Daten, Drittland). Scan-Button disabled bis alle 8 ausgefuellt. Werte
landen in scan_context und ueber Backend in compliance_check_snapshots.

P99: DOC_TYPES um dsa + legal_notice + lizenzhinweise + nutzungsbedingungen
erweitert. URL-hinzufuegen-Button war schon da.

P102 (Replay-Bug): check_replay.py liest jetzt e.get('text') statt
nur full_text — Snapshot-Schema verwendet 'text'. Library-Mismatch-
Block wird damit auch im Replay angezeigt.

Backend: ComplianceCheckRequest.scan_context optional; save_snapshot
persistiert ihn in compliance_check_snapshots.scan_context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 15:59:01 +02:00
Benjamin Admin 94057b1536 feat(audit): VW-Cookie-Bug-Fix + P101/P102 Cookie-Library-Mismatch-Findings
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
VW-Bug B1: extract_vendors_via_llm hatte max_text_chars=12000 -> bei
VW-Cookie-Doc (60k chars, 100 Cookies in Tabelle) wurden 80% abgeschnitten,
LLM extrahierte nur 1 Vendor. Fix: max_text_chars=50000, num_predict
6000->16000 fuer mehr Vendor-Output, Ollama-Timeout 120s->420s.

P101 Aggregator-Script (backend-compliance/scripts/cookie_library_enrich.py)
geht alle compliance_check_snapshots durch und extrahiert (cookie_name,
declared_category, observed_sites). Erste Auswertung ueber 8 Snapshots:
101 unique Cookies, 47 in Library, 54 unbekannt, 18 Mismatches.

P102 Cookie-Klassifikations-Pruefung als Mail-Block. Vergleicht
Site-deklarierte Kategorie vs Library + Vendor-Doku. HIGH wenn Library
sagt 'marketing' aber Site als 'essential'/'statistics' deklariert
(faktische Drittland-/Werbe-Verarbeitung versteckt). MEDIUM sonst.
In agent_compliance_check_routes Mail-Komposition + Replay-Pipeline
eingebaut.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 15:47:11 +02:00
Benjamin Admin 50ed0f45af fix(replay): P80 — DocCheckResult-Import entfernt (gibt es nicht in runner)
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 36s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
Vorher hatte ich den Container hotfixed aber den Fix nicht committed.
Beim naechsten Rebuild kam der Bug aus dem Image zurueck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 12:25:04 +02:00
Benjamin Admin 4946571863 feat(audit-pipeline): P72-v2 Heuristik nachgeschaerft + P80 Mini-Replay-Endpoint
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 13s
CI / loc-budget (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 36s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / nodejs-build (push) Has been skipped
P72-v2  MC-Scope-Classifier Heuristik v2 — v1 hatte 79% 'other'-Bucket
        (Patterns zu strict). v2 deckt deutlich breiter ab:
          - DSE: Art. 13/14 + Betroffenenrechte (Art. 15-22) + DSB +
            Aufsichtsbehoerde + Speicherdauer + besondere Kategorien
          - TOM: Art. 32 + Verschluesselung/Backup/Pseudonymisierung +
            Zugriffskontrolle + ISO 27001 + BSI-Grundschutz + Audit-Log
          - cookie_richtlinie: Tracking-Pixel + Webstorage + GA/Matomo/
            Hotjar/Pixel/GTM
          - process: VVT (Art. 30) + DSFA (Art. 35) + Datenpannen
            (Art. 33/34) + HinSchG + Schulungen + Loeschkonzept
        Script `backfill_mc_scope_v2.py` re-classifiziert NUR den
        'other'-Bucket (spezifische v1-Buckets bleiben unangetastet).

P80    Mini-Replay-Endpoint (v1):
          POST /compliance-check/snapshots/{id}/replay
          ?recipient=foo@bar.com & dry_run=false
        Laedt Snapshot, rendert Mail mit AKTUELLEM Render-Code (P63-P67,
        P59b/P61/P62). Sendet [REPLAY]-prefixed Mail oder gibt nur
        HTML-Stats zurueck (dry_run).
        Effekt: 7min Re-Scan -> 2-5sec fuer Mail-Layout-Iterationen.
        v2 (spaeter): MC-Scorecard mit aktuellem scope_doc_type-Filter
        ueber Snapshot — erfordert _run_compliance_check Refactoring.

Plus Bugfix: GET /snapshots/{id} raised jetzt HTTPException statt
Tuple-Return (FastAPI hat Tuple als JSON-Array zurueckgegeben).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 10:21:56 +02:00