c4d9b1426f94452af4f2efedca79feeaa1e5756a
314 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
08c08fcba2 |
feat(crawl): Vollstaendigkeit — Shadow-DOM/versteckte Links + Interaktions-Fixpunkt + Wayback-CDX-Orphans
CI / test-python-backend (push) Successful in 30s
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Damit die Specialist-Agents auf vollstaendigem Website-Content arbeiten:
A — _find_dsi_links pierct jetzt Shadow-DOM (Web-Components wie Usercentrics/
Mercedes) rekursiv; versteckte (display:none) Links werden erfasst + als
Coverage-Metadatum geflaggt.
B — _expand_to_fixpoint klappt Akkordeons/Tabs/Hover-Menues in einer Schleife
auf, bis das DOM stabil ist (statt 1 Pass); erweiterte Selektoren;
Coverage-Telemetrie (Runden, expandierte Elemente, DOM-Wachstum, Shadow-/
versteckte Links) → Response + Backend-Log.
C — legacy_url_cdx.cdx_enumerate listet via Wayback-CDX-API ALLE je
archivierten URLs der Domain → findet Orphan-/Legacy-Seiten, die nie im
Slug-Raster standen (z.B. nicht mehr verlinktes /datenschutz, per Direkt-
URL noch erreichbar). Fliesst durch das bestehende Legacy-URL-Inventar.
Tests: test_legacy_url_cdx.py (6) + consent-tester/tests/test_dsi_discovery.py
(Pure-Helper + Real-Browser-Integration). Alle gruen, LOC-Gate gruen.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|
|
593baace7c |
fix(agents): HTML-Entity-Decode vor Agent + Pattern duldet '('
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
Bug bei BMW: dsi-discovery liefert HTML-Entities ( ) als
Literal-Strings ohne Decode. Beispiel im BMW-Impressum:
'wird gesetzlich durch den Vorstand (Milan Nedeljkovic, …)'
Mein Pattern erwartet ':' / '.' / Whitespace nach Vorstand →
matched nicht das '&' → false-positive HIGH-Finding.
Fix 1 (Hauptfix): Test-Harness ruft html.unescape() vor agent.evaluate()
auf, so dass jeder Agent sauberen Text bekommt — entkoppelt von
dsi-discovery-Eigenarten.
Fix 2 (Belt-and-suspenders): Pattern duldet jetzt auch '(' direkt
nach Vorstand/Geschaeftsfuehrer (falls Decode mal fehlschlaegt).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
361a5e7605 |
feat(agents): Test-Harness nutzt volle Compliance-Pipeline für Fetch
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / loc-budget (push) Successful in 12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Statt der simplen dsi-discovery-Wrapper-Funktion ruft der Test-Harness
jetzt _fetch_text() aus agent_check/_fetch.py — die VOLLE Pipeline
die auch der produktive Compliance-Check verwendet:
- consent-tester dsi-discovery mit 240s Timeout (statt 120s)
- doc_type-aware max_documents (1 für cookie/dse, 3 für impressum)
- CMP-Payload-Capture (ePaaS, OneTrust …)
- HTTP-Fallback mit Browser-User-Agent + DomainRateLimiter
- HTML-Tag-Strip wenn Playwright fail
Damit funktionieren Cloudflare-/Anti-Bot-geschützte Sites wie BMW
und Elli auch im Test-Harness — vorher Timeout nach 90s.
Plus: bei leerem Fetch klare Fehlermeldung im Slot
('Cloudflare-/Anti-Bot-geschützt — Tipp: Text manuell einfügen')
statt silent-fail. cmp_payloads landen jetzt auch im Vault.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3ae4e60c9d |
feat(agents): SSE-Endpoint + Agent-Test-Tab (5-URL parallel)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m24s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Backend:
- specialist_agent_routes.py: GET /agents, POST /test/start (run_id),
GET /test/stream/{run_id} (SSE), GET /run/{run_id}/result,
GET /run/{run_id}/artifacts, GET /run/{run_id}/artifact/{path},
DELETE /run/{run_id}, GET /runs.
- Per-URL async orchestrator: text fetch via consent-tester
dsi-discovery → agent.evaluate() → vault.put_json + stream events.
- Tests: 7/7 grün.
Frontend:
- /api/sdk/v1/specialist-agent proxy mit SSE-passthrough.
- AgentTestTab.tsx: Agent-Wähler + 5 URL-Slots + Live-Events +
Speedometer (OK/N-A/HIGH/MEDIUM/LOW) + Findings + Recommendations +
Eskalations-Log + Artefakt-Link pro Slot.
- Neuer Tab "Agent-Test" in /sdk/agent.
User-Wunsch 2026-06-08: pro Agent isoliert testen, 5 URLs gleichzeitig,
Live-Updates statt Polling-Wartespiel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d6b8bf87c2 |
fix: 4 Bugs gemeinsam — B22 PDF + B17 Walk-Fallback + company_name + Plausibility-Fallback
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
(1) B22 Cross-Domain (fix #59):
Elli-Test fand AGB auf logpay.de NICHT obwohl URL in doc_entries
korrekt. Vermutete Ursache: Discovery-Phase A drops/überschreibt
Original-URL bei PDF-Fetch-Fail (word_count=0).
Fix: _collect_audit_urls() iteriert über state.doc_entries +
rejected_url + req.documents — Cross-Domain-Hosting ist
unabhängig vom Text-Inhalt. Plus Trace-Logging für künftige
Diagnose. Dedup per (doc_type, host_sld).
(2) B17 Audit-Walk-Fail-Fallback (fix #60):
BMW v5 hatte audit_walk=None ohne Mail-Hinweis. Vermutlich
180s-Timeout bei OneTrust-CMP-Banner-Tour.
Fix: Timeout 180s → 300s. Plus: Bei Fail wird ein Hinweis-
Stub mit error-Grund in state["audit_walk"] + HTML-Block
geschrieben — Reviewer sieht den Fail statt silent-skip.
(3) company_name + origin_domain im Backend (fix #61):
Frontend sendet seit
|
||
|
|
d208a2bde2 |
feat: Mail-Restrukturierung + B22 Cross-Domain-Doc-Detector
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
User-Feedback BMW v5: "740 Cookies verschwunden auf 31, Übersicht
verloren". Drei Anpassungen:
Mail-Restrukturierung (_executive_summary.py + _compose.py):
- render_executive_summary(): Top-of-mail TL;DR mit
Compliance-Score (gross + farbig), Top-3-Findings nach
Severity, Cookie-Statistik (deklariert/Browser/Drittland),
Severity-Verteilungs-Chips.
- collapsible(): wrapt jeden Block in <details>/<summary>.
Mailpit + alle modernen Mail-Clients rendern das nativ.
- _compose.py: alle 18+ B-Blöcke + per_doc + per_theme +
legacy_html in Akkordeons. NUR Critical-Findings + Sofort-
massnahmen sind immer offen — Reviewer sieht ~15 Zeilen
Übersicht und klappt selektiv auf.
- Cookie-Inventar (742) hat jetzt eigene Sektion ganz oben
(Akkordeon "🍪 Cookie-Inventar"), Vendor-Karten parallel.
B22 Cross-Domain-Legal-Doc-Detector (cross_domain_doc_check.py):
Real-Beispiel User-Feedback: Elli's AGB liegt auf docs.logpay.de
statt elli.eco. Detektor erkennt SLD-Mismatch:
- HIGH bei agb / widerruf (vertragsrelevant)
- MEDIUM bei dse / nutzungsbedingungen
- INFO bei cookie / impressum (Best-Practice)
Norm: DSGVO Art. 28 (AVV-Pflicht für Hosting) + Art. 13 Abs. 1
lit. e (Empfänger) + § 312i BGB (Cool-URLs).
9/9 Tests grün inkl. Elli/LogPay Pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
5c5d676f01 |
feat: Plan B + A + C — DSE-Versions-MCs + Legacy-URL + Multi-Version
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / loc-budget (push) Failing after 11s
CI / python-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 28s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 10s
CI / go-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
Drei verwandte Mechanismen für DSE-Beweisbarkeit + URL-Hygiene.
Plan B + PDF — Versions-Beweisbarkeit-MCs (dse_checks.py):
- mc-dse_version_date (HIGH) — sichtbares Stand/Versionsdatum
Pflicht. 12 Regex-Pattern: "Stand: April 2024", ISO-Datum,
"Letzte Aktualisierung", "Version 3.2", englische
Varianten ("Last updated", "Effective date as of …").
Norm: Art. 7 Abs. 1 DSGVO (Nachweisbarkeit Einwilligung).
- mc-dse_version_proof (MED) — PDF-Download oder
versionierte Archiv-URL. Reine HTML-DSE ohne Snapshot ist
juristisch fragil. 8 Pattern: .pdf, Download-Hinweis,
web.archive.org, /dse-vNNN.html.
Norm: DSK-Orientierungshilfe 2024.
Plan A — Legacy-URL-Discovery (legacy_url_discovery.py + B20):
Vier komplementäre Quellen:
A.1 /sitemap.xml + Sub-Sitemaps parsen, auf compliance-
relevante Slugs filtern
A.2 archive.org/wayback/available pro Slug — wenn Wayback
zeigt ≥18 Monate alten Snapshot UND Seite heute noch
200 liefert UND nicht im Footer → Legacy-Verdacht
A.3 Slug-Permutations: 6 doc_types × 6 Slug-Varianten ×
5 Lang-Prefixe × 4 Brand-Parameter
A.4 Banner-Modal-Links (über consent-tester Stufe 4 Tour)
Mail-Block "🗂️ Legacy-URL-Inventar" mit Tabelle: URL · HTTP ·
Wayback-Alter · Footer · Empfehlung (301/Offline/Behalten).
Engine entscheidet NICHT was Legacy ist — präsentiert das
Inventar, Kunde wählt.
Real-World-Smoke Elli:
/en/cookies → HTTP 200, Wayback 69 Mo alt, nicht im Footer
→ "Legacy-Verdacht, 301 setzen"
/en/impressum → HTTP 302, redirected → "behalten"
Plan C — Multi-Version-DSE-Analyse (multi_version_dse.py):
Wenn ≥2 DSE-URLs reachable: pro Variante DSB-Name + Datum +
Wortzahl + SHA-256 extrahieren, Inkonsistenzen flaggen
(date_divergent, dsb_divergent, no_date_count).
Mail-Block "📑 Mehrere DSE-Versionen erkannt" mit
Vergleichstabelle + rotem Hinweis "Nur eine Version kann
gültig sein". Beispiel Elli: /de/datenschutz (Mollstr-DSB,
2022) vs /de/datenschutzerklaerung?brand=elli (Proliance,
ohne Datum).
API-Response erweitert um legacy_url_inventory +
html_blocks.legacy_urls + multi_version_dse_html im V2-Layout.
ENV-Override: LEGACY_URL_DISABLED=1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
663a1c3e38 |
feat(document-library): zentrale Doc-Übersicht + Workflow-Auto-Select (Phase 3)
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Failing after 12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m16s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Neue Compliance-Admin-Seite /sdk/document-library: zeigt alle compliance_
legal_documents mit aktueller Version, gruppiert nach Empfehlungs-Klassi-
fikation, filterbar nach Status + Volltextsuche.
Backend (Service + Routes):
- LegalDocumentService.list_documents_with_versions() — JOIN über docs +
latest/published version in einem Roundtrip statt N+1
- GET /api/v1/compliance/legal-documents/documents-with-versions
liefert {documents:[{...doc, latest_version, published_version}]}
Admin-Frontend:
- app/sdk/document-library/page.tsx (350 LOC)
- Lädt Docs + Recommend parallel
- Mapped jedes Doc per .type → Recommend-Item (klassifiziert in
required/recommended/optional/uncategorized)
- 4 Sektionen mit Klassifikations-Chip + Anzahl-Badge
- Tabelle pro Sektion: Titel · Type · Status · Version · Geändert · Override
- Status-Filter (alle / draft / review_internal / review_client /
approved / published / archived / rejected)
- Klick auf Zeile → /sdk/workflow?doc=<uuid>
- Empty state mit Link zum Generator (Bulk-Modus)
- workflow/page.tsx: auto-select bei ?doc=<uuid> URL-Param
- lib/sdk/types/sdk-steps.ts: 'document-library' bei seq=2500 im Paket
'dokumentation' registriert (sichtbar in der SDK-Sidebar)
Workflow-Hookup vervollständigt: Library → click → Workflow öffnet
direkt das gewünschte Dokument im SplitViewEditor, keine manuelle
Selektion über DocumentSelectorBar mehr nötig.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e34f7cb507 |
feat(legal-docs): 5-stage lifecycle (draft → review_internal → review_client → approved → published)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Phase 1 of the workspace-cutover initiative: compliance becomes the
single source of truth for documents. Step one is making the existing
compliance_legal_documents workflow rich enough to express the DSB→
Mandant approval pattern that the workspace's 5-stage UI needed.
Migration 148:
- Adds CHECK constraint on status (was free-form VARCHAR20)
- Allows: draft, review, review_internal, review_client, approved,
published, archived, rejected (legacy "review" kept for backward
compat — 0 existing rows so no backfill needed)
- Adds CHECK on approvals.action with extended values:
submitted_internal, submitted_client, approved_internal,
approved_client, rejected_internal, rejected_client
- Adds 6 new columns for the richer audit trail: submitted_by/at,
approved_internal_by/at, approved_client_by/at
Service:
- New methods submit_internal_review, approve_internal, approve_client
- submit_review / approve kept as backwards-compat aliases that map to
the new methods
- reject() now reads current status to log specific rejected_internal
or rejected_client action
- _version_to_response includes all new audit fields
Routes:
- POST /versions/{id}/submit-internal-review
- POST /versions/{id}/approve-internal (DSB sagt OK → Mandant ist dran)
- POST /versions/{id}/approve-client (Mandant sagt OK → approved)
- Existing submit-review / approve endpoints stay but map through aliases
Schema:
- VersionResponse extended with optional submitted_by/at,
approved_internal_by/at, approved_client_by/at fields
This unlocks Phase 2 (Generate-All in compliance generator), Phase 3
(Document-Library tab in admin), Phase 4 (workspace cutover — drop its
own document storage and route everything through this lifecycle).
[migration-approved]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
327e6a8984 |
fix(b19): UNK-Noise drastisch reduzieren
BMW4 zeigte 1037 UNK-Findings — die Mail wurde damit unleserlich. Drei pragmatische Anpassungen: 1. UNK severity: LOW → INFO. Mail-Renderer zeigt jetzt nur HIGH/MEDIUM/LOW; INFO bleibt im API-Payload + CSV. 2. UNK wird NICHT emittiert wenn Vendor=First-Party-Owner (z.B. "BMW AG" auf bmw.de). Heuristik _is_first_party_owner vergleicht Vendor-Name gegen Domain-SLD. 3. auto_learning threshold ≥3 Sites → ≥1 Site. Second-time-Audit einer Site hat ihre eigenen Cookies bereits gelernt → kein UNK mehr. Single-site Auto-Learning ist absichtlich konservativ (Annotation, kein Truth). Effekt: erwartete Reduktion bei BMW von 1037 UNK → ~50-100 (nur unbekannte 3rd-party-Vendoren). Mail wird lesbar, MAE- Findings (Salesforce-as-essential) bleiben prominent sichtbar. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
eecbd8fc69 |
fix(phase_e+f): mail-send unreachable + cookie_coherence im html_blocks
KRITISCH: Mein vorheriger B19-Edit hatte send_email() versehentlich
in den _build_cookie_csv_extra-Helper geschoben (NACH dem return {}).
Mail wurde nie versendet (email_status=skipped war Folge — state[
"email_result"] nie gesetzt).
Fix:
- send_email + state["email_result"]/site_name/domain/doc_count
zurück in run_phase_e (BMW4 hat 1520 findings produziert aber
keine Mail verschickt).
- _build_cookie_csv_extra ist jetzt eine echte Modul-Funktion
NACH run_phase_e.
Plus: phase_f_persist.response.html_blocks um "cookie_coherence"
ergänzt (B19-HTML-Block fehlte im API-Schema).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c908fcd5eb |
feat(b19): Cookie-Coherence — 3-Layer-Lookup + Vendor-Karten + CSV
Adressiert das BMW-Beispiel (740 Cookies, Salesforce als "essential"
mit 1-Jahres-Lifetime, Pseudo-Zwecke wie "Siehe dazugehörige
Datenverarbeitung"). User-Konzept "Regulation als Code".
Step 1 — cookie_library_lookup.py (3 Layer):
1. Override = cookie_knowledge_db.py + extended (74) für
Schrems-II / EUGH / EU-Alternative — BreakPilot-juristische-IP.
2. Truth-Base = compliance.cookie_library (2287 aus Open Cookie
Database, CC0). actual_category als Wahrheit.
3. Auto-Learning = cookie_behavior_audits — Cross-Site-Konsens
wenn ≥3 Sites denselben Cookie melden.
Match: exact > prefix (mit Separator-Check) > wildcard. Kurze
Library-Namen ("c", "ID") brauchen exact-match — verhindert
False-Positive auf "completely_unknown". Trailing-Underscore
in OCD ("guest_uuid_essential_") wird als implicit-wildcard
interpretiert.
Step 2 — cookie_coherence_check.py (B19, 6 Finding-Typen):
- MARKETING_AS_ESSENTIAL (HIGH): KB sagt actual=marketing, Site
deklariert essential/erforderlich → Einwilligung wird umgangen
- LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
- PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
/ <4 Wörter (suppressed wenn Vendor-Purpose substantial ist)
- MISSING_COUNTRY (LOW): vendor_country leer trotz KB-Hit
- UNKNOWN_VENDOR (LOW): nicht in KB → Auto-Learning-Kandidat
- DUPLICATE_VENDOR (MED): selber Vendor in N Kategorien =
Stack-Aufspaltung um Marketing unter "essential" zu schmuggeln
Jedes Finding mit recommended_action ("Cookie X aus 'erforderlich'
raus und in 'Marketing' setzen").
Step 3 — cookie_observation_logger.py:
Loggt nach jedem Audit alle (cookie, site, declared_purpose) in
compliance.cookie_behavior_audits → Basis für Cross-Site-Konsens
in Layer 3.
Step 4 — cookie_csv_exporter.py:
cookies-full-{check_id}.csv mit 21 Spalten (Name, Vendor decl/KB,
Cat decl/KB, Lifetime decl/KB, Country, Opt-Out, 8x FIND_* flags,
recommended_action). UTF-8 BOM für Excel.
ZIP-Attachment: erweitert audit_walk_zip_builder um extra_files=
parameter; phase_e ruft mit cookies-full-...csv auf.
Step 5 — mail_render_v2/_vendor_cards.py:
Statt 740 Cookie-Rows: Aggregation pro Vendor mit Cookie-Count +
Issue-Count + 1-2 Beispiel-Cookies + Issue-Type-Tags. Top 30
Vendoren in der Mail, Rest nur in CSV. Sortiert nach Issue-Score.
Step 6 — render_info_box_rechtsrahmen():
Generic Header-Info-Box mit Art. 13 DSGVO + § 25 TDDDG + Art. 5
+ § 5 UWG + § 30/130 OWiG. Immer angezeigt, kein explicit-
finding-mapping (User-mündigkeit).
Orchestrator + _compose: run_b19 + render_vendor_cards +
render_info_box_rechtsrahmen ins V2-Layout.
Tests: 28/28 grün (15 lookup + 13 coherence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b16130369a |
feat(b17): Stufe 4 banner-tour + Stufe 5 annotierte Screenshots + V2-default
Stufe 4 — Cookie-Banner-Tour vor dem Accept-Klick:
- audit_walk_banner_tour.tour_cookie_banner(): öffnet Settings
(16 Phrase-Varianten), scrollt vertikal, aktiviert jedes
[role=tab], expandet jedes [aria-expanded=false] / details /
summary + 14 CMP-spezifische Selektoren. Max 35 Klicks,
Best-Effort.
- audit_walk_recorder ruft tour_cookie_banner() VOR
_try_accept_banner auf — Reviewer sieht den vollen Consent-
Katalog im Video (Vendor-Liste, Kategorien, Zwecke).
- Recorder unter 500 LOC (412+155 split).
Stufe 5 — Annotierte Screenshots pro Finding:
- finding_annotator.annotate_url(): WebKit headless, JS-Inject
eines rot-banner-Labels oben + roter Outline um das Element
(Selector oder Text-Match).
- finding_annotator.annotate_findings(): dispatched 3 Cases —
B1 Tap-Target (Anchor markiert mit "Tap-Target X×Y px"),
B16 URL-Slug-Drift (404-Seite mit "/<slug> 404"),
B13 Widerruf (Footer markiert "Widerruf-Link fehlt").
- routes_audit_walk.POST /annotate-findings (consent-tester).
- _b17_wiring ruft annotate-findings nach record_audit_walk und
speichert annotations in walk.annotations.
- audit_walk_zip_builder packt PNGs nach findings/<name>.png ins
ZIP — Reviewer hat Beweis-Bilder im Postfach.
Plausibility Circuit-Breaker:
- Nach 6 consecutive empty batches (PLAUSIBILITY_EMPTY_BUDGET=6)
bricht die ganze Phase ab statt 200 Calls zu warten. Fix für
qwen3-down + große DSE-Sites (BMW: ohne Breaker 21min, mit
Breaker ~3min).
audit_walk_zip_builder fängt walk.annotations ab und legt sie unter
findings/<fname>.png im ZIP-Anhang ab.
V2-Default:
- docker-compose.yml backend-compliance.environment.MAIL_RENDER_V2:
default 'true'. Ohne diesen Override liefert die Engine
weiterhin das alte Legacy-Mail-Layout, in dem die B-Wiring-
Blöcke nicht sichtbar sind.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e8ff75cbfe |
feat: Backlog 1-5 — soft-hints, chatbot-discovery, API-payload, LLM-Agent
5 Backlog-Items aus dem Multi-Site-Briefing in einem Sprint:
1. B13 B2C-Soft-Hints — Versicherungs/Tarif/Buchungs-Marker
_B2C_WEAK erweitert um "Reiseversicherung", "Tarifrechner",
"Online-Antrag", "Flug buchen", "Stromtarif" etc.
Fängt Allianz-Reise-Chatbot (vorher False-Negative).
2. Chatbot-Policy-Discovery (chatbot_policy_discovery.py)
Probt 14 Standard-Slugs (privacypolicychatbot, chatbot-datenschutz,
ai-policy, ki-datenschutz, ...) × 5 Lang-Prefixe auf jeder
submitted Origin. Successful >300-Wort-Findings werden in
doc_texts['dse'] gemerged. Audit-Trail über
doc_entries[dse].chatbot_policy_sources.
Hebt Westfield-iAdvize-Lücke.
3. API-Response-Payload erweitert
phase_f_persist.response um extra_findings, audit_walk und
html_blocks erweitert. B-Wiring-Output (B1, B3-B18) ist nicht
mehr nur im Mail-HTML versteckt — externe Aufrufer sehen jeden
Finding. Schema additiv, legacy clients ignorieren neue Felder.
4. Plausibility-LLM Empty-Response-Fix
Resilienz-Strategie A→B→C→D:
A) format='json' (strict, default)
B) format='' (loose, _try_extract_json mit ```json-fence + prose-
wrap-Unterstützung)
C) Split-Batch-Recursion (vorhanden)
D) Give up, leeres dict (callers behandeln als skipped)
Plus _post_llm() als isolierter LLM-Call-Helper, catched
Network-Errors.
5. Specialist-Agents Phase 2 LLM (MVP) — Impressum-Agent
impressum_agent_llm.py: qwen3:30b-a3b mit § 5 TMG System-Prompt,
business_scope-hints aus profile_dict. Output identisches Schema
wie pattern-agent für ein Merge ohne API-Bruch.
_b18_wiring.py orchestriert beide Agents + deduplet nach
field_id, rendert lila V2-Block mit KB/LLM-Tags pro Finding.
Pattern-first im Dedup (deterministisch + stable).
Tests: 107/107 grün (7 Test-Suites + chatbot-discovery + b18).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c7d2038ad9 |
feat(b17): DSMS-CID-Anchor für Audit-Walk-Video (Stufe 3, #7)
Video + walk.json werden nach Aufnahme zu DSMS-IPFS hochgeladen.
Die zurückgegebenen CIDs sind manipulationssichere Audit-Anker —
Reviewer können das Walk-Video Monate später noch verifizieren und
auf Unverändertheit prüfen.
consent-tester:
- _upload_to_dsms(): Best-Effort-Upload zu /api/v1/documents
(Bearer-Token, document_type=audit_walk_video|meta). DSMS-Down
bricht den Walk nicht ab — CID fehlt einfach im result.
- record_audit_walk(): nach video.webm + walk.json erzeugt, beide
hochladen. walk.json wird re-written sodass es BEIDE CIDs
selbstreferenziell enthält.
- ENV: DSMS_GATEWAY_URL + DSMS_BEARER konfigurierbar.
backend:
- _b17_wiring._publicize_gateway_url(): DSMS gibt intern
http://dsms-node:8080/ipfs/{cid} zurück. Für die Audit-Mail
wird das via env DSMS_PUBLIC_GATEWAY (default
https://dsms-dev.breakpilot.ai) durch eine extern erreichbare
URL ersetzt.
- Render-Block: gelber DSMS-Anchor-Hinweis mit Video-CID +
walk.json-CID, beide als klickbare Links zur public Gateway.
Real-World-Smoke gegen Elli:
- Video-CID: QmbdFwtSymPuWGYYdC6eNZ1eEvVLsTYmoRRxEo5L6BXgwt
- walk.json-CID: QmWaTqwZq4KVd5wYFVAKB12uZtAosPqoG1X4m1azysXYJi
- DSMS-Upload erfolgreich, gateway_url im response
Tests: 12/12 grün (+2 für DSMS-Anchor-Render-Pfade inkl.
Internal-Host → Public-Gateway-Rewrite).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
80c4778017 |
feat(b17): Akkordeon-Expansion im Audit-Walk (Stufe 2, #7)
Nach jedem Compliance-Doc-Aufruf werden alle Akkordeons /
<details> / [aria-expanded=false] / Trigger-Patterns geklickt
und im Video aufgenommen.
- _expand_accordions(): 7 Selektor-Patterns, max 25 Expansionen
pro Seite, Dedup nach inner_text (verhindert Endlos-Loops bei
nesteten Strukturen). Scroll-into-view + click + 400ms warten
sicher dass das Klick-Result im Video erfasst wird.
- _visit_link(): Returns (nav_event, expand_event) Tuple. Expand
läuft nur bei HTTP 2xx + ohne nav-error.
- 1500ms post-expand wait gibt der Kamera Zeit, den finalen
Zustand mitzuschneiden.
Backend B17 render: "expand_accordions" Action wird als "5
Akkordeon/Details-Sektion(en) entfaltet" gerendert. Bei 0:
"Keine Akkordeons gefunden" (neutraler Hinweis, kein Fehler).
Real-World-Smoke gegen Elli:
Impressum: 0 Akkordeons (keine)
Datenschutzerkl: 5 Akkordeons aufgeklappt
Nutzungsbeding: 0 Akkordeons
Video-Größe verdoppelt sich (581 KB → 1.14 MB) — Reviewer sieht
jetzt den vollen DSE-Vendor-Tabellen-Inhalt im Video.
Tests: 10/10 grün (+2 für Akkordeon-Render-Pfade).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cb4b352846 |
feat(b17): Playwright Audit-Walk-Video (Stufe 1, #7)
Nimmt einen kompletten Site-Walk als WebKit-Browser-Session
inkl. Video auf. Reviewer kann nachträglich exakt nachvollziehen,
wie die Engine zum Befund kam.
consent-tester:
- services/audit_walk_recorder.py: Playwright record_video_dir,
iPhone-Viewport-free 1280×800. Goto homepage → Banner-Accept
(Best-Effort: 12 Text-Phrasen + 5 CMP-Fallback-Selektoren) →
Footer-Links sammeln (compliance-relevant gefiltert) →
pro Link navigate + Dwell-Time → JSON-Action-Index mit
UTC-Timestamps + SHA-256 vom Video als Manipulation-Schutz.
- routes_audit_walk.py: POST /scan-audit-walk; statische
Serves für /audit-walks/{walk_id}/video.webm + walk.json.
- main.py: Router registriert.
backend:
- _b17_wiring.py: Triggert /scan-audit-walk, speichert
Walk-Metadata in state["audit_walk"]. Render-Block mit
HTML-Tabelle aller Actions (HH:MM:SS + Aktion + Detail) +
Links zu Video und walk.json.
- _orchestrator.py: run_b17 nach run_b16, async-aufgerufen.
- mail_render_v2/_compose.py: audit_walk_html im V2-Layout.
- test_b17_audit_walk.py: 8 Tests (Render-Pfade + Wiring).
Stufe-2 (Akkordeon-Expansion) und Stufe-3 (DSMS-CID-Anchor)
folgen separat.
Real-World-Smoke gegen Elli:
- 581 KB Video, SHA-256 verifizierbar
- 3 Footer-Links besucht (Impressum, Datenschutzerkl., Nutzungs-)
- 6 Actions im JSON-Index
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
65e8bb9d42 |
feat(b16): Footer-Label-vs-URL-Slug-Drift-Check (GT URL-STRUCTURE-001)
Erkennt: gängige Footer-Labels / Bookmark- + SEO-Erwartungs-Slugs
(z.B. "Cookie-Richtlinie", "AGB", "Datenschutzerklärung") liefern
404, während das Doc tatsächlich unter einem abweichenden Slug
ausgeliefert wird.
GT-Anker (Elli URL-STRUCTURE-001):
Footer-Label "Cookie-Richtlinie" → /cookie-richtlinie 404
Real: /de/cookies
→ externe Bookmarks und Google-Treffer brechen.
Heuristik:
- Aus auto-discovered URLs Origin + Sprach-Prefix extrahieren
(z.B. /de, /de-de)
- Pro doc_type 2-4 kanonische Standard-Slugs probieren (parallel
via ThreadPoolExecutor, 2s Timeout, HEAD → GET fallback bei 405)
- Wenn alternative Slug 404/410 → LOW Finding pro doc_type
- Probe-Cap auf 18 Requests gesamt (Network-Noise-Schutz)
- Abschaltbar via URL_SLUG_PROBE_DISABLED=1
Severity: LOW (Best-Practice, kein juristisches Hardfail).
Tests: 13/13 grün (Strip-Helper 4 + Origin-Helper 3 + Check-Pfade 6
inkl. mocked _head_status).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b0b7f80914 |
feat(b15): AI-Act Rechtsgrundlage-Check (GT AI-ACT-RISK-001)
Erkennt: LLM/GPAI-System (Vertex AI, OpenAI/GPT, Claude) wird in
DSE oder Cookie-Doc auf Art. 6 Abs. 1 lit. f (berechtigtes Interesse)
gestützt — statt auf lit. a (Einwilligung).
GT-Anker (Elli AI-ACT-RISK-001): Vertex-AI-Chatbot mit lit. f
deklariert. Bei LLM-Prompt/Output-Logging + US-Transfer +
Profiling-Ähnlichkeit ist Interessenabwägung fragwürdig.
Heuristik:
- KB-basiert (chat_providers.json filter: ai_capable + LLM-Type-Hint)
- LLM-Vendor-Aliases inkl. Marken-Familien (PaLM, Gemini, GPT-4,
ChatGPT, Claude 3, Azure OpenAI)
- Absatz-Boundary-Scope: Provider + lit. f im selben Absatz
- Negativ-Filter: wenn lit. a / Einwilligung ebenfalls im Absatz →
kein Finding (Side-Purpose-Erwähnung)
- Dedup pro (doc_type, provider_id)
Severity: MEDIUM.
Norm: DSGVO Art. 6 Abs. 1 lit. a vs lit. f + AI Act Art. 50 + 51.
Tests: 17/17 grün.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6aad774fc1 |
feat(b14): widersprüchliche Speicherdauer im selben Doc (GT TH-RETENTION-001)
Erkennt: in derselben DSE / Cookie-Richtlinie nennt der Anbieter für
DIESELBE Datenkategorie mehrere unterschiedliche Speicherdauern.
GT-Anker (Elli): Logfiles "7 Tage" + "30 Tage" im selben DSE → eine
Angabe ist falsch oder veraltet.
Heuristik:
- Satz-Boundary-Scope (kein ±N-Zeichen-Fenster) verhindert
Cross-Category-Leakage
- Pro Satz: Kategorie-Anchor + Retention-Werte beide drin
- Tag-Cluster mit ±20 %-Toleranz: "30 Tage" und "1 Monat" =
1 Cluster; "7 Tage" und "30 Tage" = 2 Cluster → Finding
Kategorien (Phase 1):
- logfile, contact_form, application, newsletter, invoice,
session_cookie
Severity: MEDIUM (DSGVO Art. 5 Abs. 1 lit. a + Art. 13 Abs. 2 lit. a).
Tests: 11/11 grün (Cluster-Logik 5, Check-Pfade 6, inkl. Cross-
Category-Leakage-Regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b9baa8c603 |
feat(b13): Widerrufsbelehrung-Reachability-Check (GT WIDERRUFSBELEHRUNG-001)
Erkennt B2C-Shop ohne öffentlich erreichbare Widerrufsbelehrung.
Schließt eine der offenen GT-Lücken aus dem Elli-Audit.
Signale:
- doc_entries[widerruf]: discovery_attempted=True + Text leer
- kein Footer-Link auf Widerruf/cancellation/rückgabe
- B2C-Scope: Warenkorb/Kasse/Bestellung/MwSt/Wallbox/Tarif (strong)
vs Shop/Produkt/Rechnung (weak, ≥2 = likely)
- B2B-only-Override: "ausschließlich an Unternehmer" etc.
Severity:
- HIGH bei b2c_strong
- MEDIUM bei b2c_likely
- kein Finding bei b2b_only / unknown (False-Positive-Schutz)
Norm: Art. 246a § 1 Abs. 2 Nr. 1 EGBGB i.V.m. § 312d BGB.
Wiring:
- widerrufsbelehrung_reachability_check.py — Check + Scope-Detection
- _b13_wiring.py — Render + state-Anschluss
- _orchestrator.py — run_b13 nach run_b12
- mail_render_v2/_compose.py — widerruf_reach_html-Block
Tests: 13/13 grün (Scope-Detection 5 + Check-Logik 8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
11c7e14871 |
fix(orchestrator): add missing run_b12 + run_phase_c2 imports
Beide Funktionen wurden im run_compliance_check() aufgerufen aber nicht oben importiert — NameError landete im except-Catch-all, jeder Compliance-Check schlug auf "failed" um. Bug stammt aus den letzten 2 Sprints (B12 + browser-matrix Stage 1.c) wo die Aufruf-Stelle ergänzt, der Import vergessen wurde. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
ff796fb480 |
feat: B12 Chatbot-Cookie-Klassifikation (#19) + Cookie-Matrix scan + safetykon test
#19 Chatbot-Cookie-Klassifikation: - chat_providers.json KB mit 11 Providern (iAdvize, Intercom, Tidio, Drift, Userlike, Zendesk, LivePerson, HubSpot, Vertex AI, OpenAI, Anthropic Claude). Pro Provider: Cookie-Pattern-Regex, typical_retention_days, tn_functions vs cp_functions, ai_capable. - chatbot_cookie_classification_check.py mit 4 KORRIGIERTEN Checks: CHAT-COOKIE-CLASS-001 (MED) — TN deklariert + Vendor-Purpose erwähnt Targeting/Analytics/A-B-Tests CHAT-COOKIE-CLASS-002 (MED) — Provider hat tn+cp Funktionen, Tabelle nennt nur eine Seite → keine Einwilligungs-Differenzierung CHAT-COOKIE-PURPOSE-001 (LOW) — Zweck zu generisch (Art. 13 DSGVO konkret) CHAT-COOKIE-RETENTION-001 (HIGH) — deklariert <90d, KB-typisch >365d → vermutlich unterdeklariert NEU vs vorigem Plan: kein "eigene Banner-Kategorie Chat/AI"-Check — gesetzlich nicht vorgeschrieben (Vermischung Zweck-Transparenz vs Kategorie-Name). Anwender-Frage berechtigt, Konzept geschärft. - _b12_wiring.py + Orchestrator-Wire + V2-Compose-Slot - Cookie-Inventar mit [Chat]/[Chat+AI]-Tag pro Cookie-Name (KB-Lookup) - Smoke (3 Vendors / 5 Cookies): 9 findings korrekt (3 HIGH RETENTION, 3 MEDIUM CLASS-001, 4 LOW PURPOSE) Cookie-Matrix Scan (Browser-Vergleich gegen safetykon.de): - consent-tester/services/cookie_behavior_per_browser.py: eigener fokussierter Scanner. Pro Browser-Profile: cookies before / after reject / after accept in separaten Kontexten. Sequenzielle Runs statt parallel (Race-Conditions). - routes_cookie_matrix.py POST /scan-cookie-matrix - Live-Test safetykon.de: chromium=1, firefox=0, webkit=1, mobile- safari=1 nach reject — Firefox setzt KEIN Cookie nach Reject! (consent-tester Rebuild brachte playwright install-deps für system-libs) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bb183b0e75 |
feat(template-rules): backend foundation for profile-based document recommendations
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / test-python-backend (push) Successful in 33s
CI / test-python-document-crawler (push) Successful in 23s
CI / test-python-dsms-gateway (push) Successful in 19s
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 7s
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m27s
CI / test-go (push) Failing after 46s
CI / iace-gt-coverage (push) Successful in 25s
Introduces the sustainable backend replacement for the hardcoded inline rules in
admin-compliance/app/sdk/document-generator/templateRecommendations.ts.
What's in this commit (Phase 1.1 - 1.5 of the rustling-yawning-boot plan):
- Migration 147: 4 new tables
- compliance_template_rules (rule shell, document_type, current_version_id)
- compliance_template_rule_versions (lifecycle, JSONB conditions,
source_citation, change_summary, approval timestamps)
- compliance_template_rule_approvals (audit trail)
- compliance_tenant_rule_overrides (per-tenant classification overrides)
Plus partial unique index for "only one is_live=1 version per rule".
- SQLAlchemy models: TemplateRuleDB, TemplateRuleVersionDB,
TemplateRuleApprovalDB, TenantRuleOverrideDB (compliance/db/).
- Pydantic schemas (compliance/schemas/template_rule.py): full request/response
set including RecommendationRequest/Result with reasons and override tracking.
- TemplateRuleService (compliance/services/): CRUD + Lifecycle transitions
(submit_for_review/approve/publish/reject) following legal_document_service.py
pattern with _transition() helper and approval audit trail. Plus tenant
override upsert.
- RecommendationService: condition evaluator (eq, neq, in, not_in, gte/lte/gt/lt,
exists, truthy) over JSONB conditions, override application, reason generation
for human-readable explanations in workspace UI.
- 18 FastAPI routes in compliance/api/template_rule_routes.py covering rule CRUD,
version lifecycle, override management and POST /recommend evaluation endpoint.
- Seed data: 33 initial rules ported from templateRecommendations.ts in
compliance/data/template_rule_seed_data.py, written as published versions
on first seed run. Idempotent via rule_key.
Phase 1.6 (pytest suite) and Phase 2 (editorial UI in admin-compliance) follow
in separate commits.
[migration-approved]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
37093ff9e3 |
feat: Browser-Matrix C2 + B11 AI-Retention + Impressum-Specialist-Agent + B1 Mobile Playwright
Task #15 Stage 1.c-e — Browser-Matrix Backend-Integration: - _phase_c2_browser_matrix.py: ruft consent-tester /scan-matrix wenn env BROWSER_MATRIX=true, fuellt state["browser_matrix"] + state["browser_aggregate"] + state["browser_matrix_html"] - V2-Mail-Block: 🌐 Browser-Matrix Tabelle (Profile · Score · Sub-Scores PC/RR/BD · Bewertung) mit Worst-of-Header - Orchestrator ruft run_phase_c2 nach run_phase_c KNOWN: Stage 1.b (consent_scanner browser_profile-Param) bleibt zurueckgestellt (Datei in loc-exception, Hook-Patch verweigert). Stage 1.a-Shim laeuft im consent-tester — alle Profile aktuell auf Chromium, echte Engine-Diversitaet kommt mit 1.b. Task #17 TH-RETENTION-002 als B11 ai_retention_granularity_check: - Erkennt AI-Provider-Kontext (vertex/openai/anthropic/etc) - In +-800-char-Window: prueft ≥2 Datenkategorien aus Standard-Liste (Texteingaben/IP/Geraet/Session/Fehlerprotokoll/Zeitstempel) - Wenn 1 pauschale Speicherdauer + ≥2 Kategorien aber kein per-Kategorie-Differential → LOW - Smoke: Elli-Mock-DSE trifft LOW "AI-Speicherdauer pauschal" Task #18 Specialist-Agents Phase-1-Prototyp: - compliance/services/specialist_agents/__init__.py mit Architektur-Doku - impressum_agent.py: 9 Pflichtangaben § 5 TMG + § 1 DL-InfoV als Pattern-Registry (Name, Email, Telefon, HR, USt-IdNr, Vertretungsberechtigt, Aufsichtsbehoerde, Berufsangaben, OS-Link) - business_scope-aware (OS-Link nur fuer ecommerce, Aufsichtsbehoerde nur fuer regulated_profession/financial/insurance) - Phase-1 ist Pattern-Match-only (kein LLM), demonstriert die Schnittstelle. Phase 2 ersetzt Pattern durch System-Prompt + KB. - Smoke: minimal-Impressum triggert 4 Findings korrekt Task #7 B1 Playwright Mobile-Verifikation: - consent-tester/services/mobile_reachability_scanner.py: echte WebKit-launch + p.devices['iPhone 15'] preset + de-DE locale + Europe/Berlin timezone - Footer-Anchor-Suche via locator("footer >> text=/.../i") fuer 13 Reopen-Phrasen - Tap-Target-Boundingbox-Messung (Apple HIG / WCAG ≥44x44) - Click-Behavior: DOM-Modal-Snapshot vor/nach, erkennt CMP-Open - Output: has_anchor, anchor_text, tap_target_px, click_opens_cmp, engine_meta, screenshot_b64 (Footer-Crop wenn kein Anchor) - consent-tester/routes_mobile.py POST /scan-mobile-reachability - Backend _b1_wiring erweitert: ruft Mobile-Endpoint zuerst, Fallback auf statischen HTTP-Fetch. Mobile-Daten enrichen finding.mobile_playwright + Severity-Bump bei tap-target<44 / click-doesnt-open-CMP. KNOWN: WebKit-System-Libs sind im Dockerfile ergaenzt (Stage 1.a- Commit), greifen aber erst nach CI/CD-Rebuild des consent-tester. Bis dahin faellt B1 sauber auf statischen Fetch zurueck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e1dadc8027 |
feat: Browser-Matrix Stufe 1.a + 2 weitere GT-Findings + Plausibility-LLM-Härtung
Stage 1.a Browser-Matrix (Task #15) — Multi-Engine Scaffolding: - consent-tester/Dockerfile: firefox + webkit + Xvfb deps - playwright install chromium firefox webkit - services/browser_profiles.py: Registry mit DEFAULT_PROFILES (Chromium-Headed/Firefox-Headed/WebKit-Headed/Mobile-Safari) + EXTRA_PROFILES (Chrome-Channel, Edge, Brave) - services/multi_browser_scanner.py: run_matrix() orchestriert N parallele Scans + worst-of-Aggregation + 3 Sub-Scores (Pre-Consent 50%, Reject-Respekt 30%, Banner-Design 20%) + Hard-Fail-Cap auf <60% bei Pre-Consent/Reject-Verstoß - routes_matrix.py: POST /scan-matrix Endpoint (eigenes Modul, damit main.py unter 500 LOC bleibt) KNOWN: Stage 1.a-Shim ruft alle Profile auf demselben Chromium, echte Engine-Diversität in Stage 1.b (consent_scanner.py Param) Coverage-Gap 3 (Task #17): 2/3 verbleibende GT-Lücken geschlossen: - B9 impressum_multi_entity_check (IMPRESSUM-001): erkennt USt-IdNr/HR/GF-Fehlen pro Entity bei multi-entity Impressen (Elli: USt-IdNr nur bei Elli Mobility, fehlt bei VW Group Charging) - B10 transfer_mechanism_check (TRANSFER-001): pro Non-EU-Vendor in cmp_vendors prüft DSE auf DPF/SCCs/BCRs/Einwilligung im ±400-char-Window. Findet Vendors ohne benannten Mechanismus. - TH-RETENTION-002 (AI-Datenkategorie-Differenzierung) bleibt semantisch-tief, vorgesehen für Specialist-Agents Task #18. Plausibility-LLM Empty-Response-Härtung (Task #16): - BATCH_SIZE 8 → 4, EXCERPT 4000 → 1500 chars, TIMEOUT 60 → 45s - Single-retry mit halbierter Batch wenn LLM empty content zurückgibt — qwen3:30b-a3b rejektiert manchmal ≥6-Item-Prompts unter format='json'. Falls auch Half-Batch empty: log + skip. - Pipeline läuft jetzt nicht mehr 10min in Timeouts. GT-Coverage Sprung: 10/13 → 11/13 (85%). 4/4 HIGH ✓, 5/6 MEDIUM ✓, 2/3 LOW ✓. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d0e3621192 |
feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM-Plausibility-Phase
Mail Render V2 (compliance/services/mail_render_v2/) — 11-Modul-Subpackage
das einen einheitlichen Audit-Mail-Output erzeugt mit:
- Header + KPI-Kacheln (Score / Findings / Docs / Vendors)
- TOC + Sprung-Links
- 3-Bucket-Trennung: Kritische Befunde / Manuelle Prüfung / Interne Reminder
- Cookie-Inventar (Name·Vendor·Kategorie·Speicherdauer·Löschfrist·Sitzland·Quelle·Status)
- Sofortmaßnahmen-Aggregator ("Sitzland ergänzen für 11 Cookies")
- 24 Legacy-Wrappers — alle alten build_*_html in V2-Sections
- Scope-Filter: FIN/GOV/MED/INS/EDU/LEG aus Berichten wenn nicht relevant
- Hint/Action-Dedup: keine doppelten Sätze pro Card mehr
Aktiviert via env MAIL_RENDER_V2=true (Default: legacy renderer).
5 neue deterministische Findings als Phase D-2b/B4/B5/B6/B7/B8:
B4 vendor_consistency_check — Cross-Doc-Provider-Widerspruch
(Elli: DSE nennt Vertex AI für Chatbot, /de/cookies nennt Iadvize → HIGH).
6 Service-Types: chatbot/analytics/tag_manager/pixel/cdn/cmp.
B5 ai_act_transparency_check — AI Act Art. 50 Transparenzpflicht
(Elli: Vertex AI vorhanden ohne Pre-Chat-Disclosure → HIGH).
Plus B5-Erweiterung: Rechtsgrundlage Art-6-Abs-1-lit-f bei AI → MED
(Einwilligung empfehlen).
B6 cross_doc_dpo_check — DPO in DSE genannt, nicht im Impressum (LOW).
B7 doc_staleness_check — Datum-Extraktion aus DSE/AGB/Nutzungsbedingungen.
Cap: AGB/NB 3y, DSE 2y. Älter → MEDIUM (Elli NB Stand 2018 → HIGH).
B8 cmp_fingerprint_check — Banner detected, aber CMP-Provider generic
(kein Usercentrics/OneTrust/Cookiebot/etc → MED).
B3-Erweiterung detect_intra_doc_contradictions — Widersprüchliche
Speicherdauer im SELBEN Doc (Elli: Logfile 7d vs 30d → HIGH).
LLM-Plausibility-Phase (Phase D-2b, finding_plausibility_check.py):
- Läuft AFTER MC pipeline, BEFORE D3 render
- Prompt mit Beispiel-IDs + 3-Phase-Mapping: exact-ID / position-fallback /
fuzzy-tail-match
- Stempelt llm_title / llm_severity / llm_recommendation / llm_drop auf
jeden FAIL CheckItem
- V2-Render zeigt "🤖 LLM-Plausibility:" Box pro Finding wenn gestempelt
- KNOWN ISSUE: qwen3:30b-a3b liefert oft empty content auf format='json' +
8000-char-excerpt prompts. Pipeline läuft mit stamped=0 weiter. Task #16.
Coverage gegen Elli Ground Truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch-Agent-Crawl):
- 4/4 HIGH-Findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
- 4/6 MEDIUM ✓
- 2/3 LOW ✓
- Total: 10/13 = 77% (Sprung von 4/13 = 31%)
Restliche 3 Gaps als Task #17: IMPRESSUM-001 (multi-entity USt-IdNr),
TRANSFER-001 (Vendor-Mechanismus DPF/SCC), TH-RETENTION-002 (AI-Retention
pro Datenkategorie).
V2-Mail-Preview in Mailpit: 'v2all@local.test' Subject '[V2 ALL] ELLI'.
Backend healthy, B1+B3+B4+B5+B6+B7+B8 alle live im Orchestrator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c2c8783fee |
refactor(agent-check): split routes file (2692→347 LOC) + wire B1/B3/A1 [guardrail-change]
Phase-5 split of agent_compliance_check_routes.py — the 2700-line
monolith was decomposed into 19 modules in compliance/api/agent_check/:
- Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize /
HTML blocks top+mid+bot / email / persist
- Helpers: _constants, _helpers, _fetch, _discovery, _single_check
- Schemas + State + thin _orchestrator
A1 ZIP-Anhang nativ in _phase_e_email: evidence_zip_builder.py bundles
slices + manifest.json + audit_metadata.json (SHA256 per slice +
build_sha + source_url). smtp_sender.py erweitert um attachments-Parameter.
B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py
parses footer anchors, classifies intent (reopen_cmp / info_only /
browser_deflect) + target (same_page_cmp / new_tab / external).
_b1_wiring.py fetches homepage with iPhone-UA + renders Art-7-Abs-3
severity-coloured block.
B3 TH-RETENTION (Cross-Doc Speicherdauer): retention_comparator.py
compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires
with 5% tolerance + severity hierarchy (dsi_under_actual HIGH,
table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW
Safari-ITP-Hint). _b3_wiring.py + Top-10 mismatches table in mail.
Side-effects:
- Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used
audit_quality_findings before declaration, caught by surrounding
except → block never rendered). New _phase_d3_blocks_bot.py runs
audit-quality FIRST.
- agent_compliance_check_routes.py removed from loc-exceptions.txt
("Phase 5 split target" — done).
Tests: 55/55 grün (B1 22 + B3 27 + saving_scan 6).
E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding,
TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip
with 2 slices + manifest + audit_metadata (12089B, SHA256-chained,
source verified), email sent (attachments=1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d2f26e70c6 |
perf(audit): parallel Tesseract OCR + Pipeline-Wire-In für Slicing
ocr_slices_extract_cookies nutzt jetzt ThreadPoolExecutor (4 workers). Tesseract released die GIL, daher echtes parallelisieren möglich. Sequenziell 32 slices ≈ 60s, parallel ~15s. Pipeline in agent_compliance_check_routes.py: Step C ruft jetzt capture_cookie_evidence_slices + ocr_slices_extract_cookies. Source 'tesseract_ocr' wird zu existing Vendors gemergt; neue Vendors als eigenständige Records. Final VW-Scan-Resultat: - Cookies: 60 (parse_flat) → 128 (mit Tesseract) = +113% - Vendors: 18 unique - Adobe Analytics: 9 → 33 Cookies (Tesseract fand +24) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1784b43d72 |
feat(audit): Screenshot+Tesseract-OCR Cookie-Extract als Vendor-Quelle C
Statt fragiler text-Regex + LLM-Cascade-Workarounds: deterministische
Pipeline. consent-tester macht Full-Page-Screenshot der Cookie-Richtlinie
(akzeptiert Banner, klappt Accordions, brennt Timestamp ein). Backend
laesst Tesseract OCR (deu, PSM 4) drueber + anchor-basierter Parser
extrahiert {name, category, purpose, duration, type} pro Cookie.
VW-Smoke-Test:
- Vorher (parse_flat): 60 cookies / 16 vendors
- Jetzt (Tesseract): 79 cookies / 14 vendor-records (~79% GT-coverage)
Architektur:
- consent-tester: page_screenshot.py + /capture-evidence Endpoint
- backend: cookie_screenshot_ocr.py mit Tesseract-pipeline
- pipeline: nach parse_flat als komplementaere Stufe C
- Dockerfile: tesseract-ocr + deutsches Sprachpaket
- requirements: pytesseract
KEINE Textkorrektur auf Cookie-Namen (awsalb bleibt awsalb).
Timestamp im Screenshot = juristischer Beweis was wir zum Scan-Zeitpunkt
wirklich auf der Site gesehen haben.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b663e2508f |
feat(audit): P107 Branchen-Benchmark-Cockpit fuer Big-4-Demos
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m5s
CI / test-go (push) Failing after 54s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 47s
CI / detect-changes (push) Successful in 13s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
benchmark_extractor.py — extract_kpis() liefert 18 KPIs pro Snapshot: * vendors_total, vendors_us, vendors_non_eu (mit % je Vendor-Land) * source_breakdown (llm/library/flat_pattern/table_paste/html_table_dom) * max/avg cookies_per_vendor (Konzentrations-Mass) * cookies_in_browser, cookies_detailed_count, cookie_doc_chars * banner_detected, banner_provider, banner_violations * compliance_score, data_quality_pct (wie viele unserer Datenquellen haben Inhalt) * saving_low/high_eur (Heuristik: (vendors - 10) × 1k-5k) anonymize_kpis() ersetzt site_label durch 'OEM 1/2/3' (Industry-Prefix Map: automotive→OEM, banking→Bank, chemistry→Chem, luftfahrt→Airline). GET /api/compliance/agent/admin/benchmark?industry=automotive&sites= VW,BMW,Mercedes&anonymized=true — liefert kpis + summary (n_sites, avg_vendors, total_saving_high). Admin-Page /sdk/benchmark: * Filter-Leiste: Industry-Dropdown, Sites-Input + 5 Preset-Gruppen (Automotive OEMs / Zulieferer, Chemie DAX, Luftfahrt, Banking DAX) * Anonymize-Toggle prominent * 5 Summary-KPI-Karten oben * Vergleichstabelle 13 Spalten (Score, Vendors, US%, Drittland%, Cookies-Browser, Cookie-Doc-kB, Banner ✓/✗, Provider, Verstoesse, Saving €/Jahr, Daten-Qualitaet, Captured-Time) * Red-/Amber-/Green-Indikatoren bei US%/Score/Drittland * Big-4-Hinweis-Footer Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e2be51b0aa |
feat(audit): P106 MC-Audit-Type + P83 BUILD_SHA in Dockerfiles + P80 v2 full
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m42s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P106 — mc_audit_type.py: zentrales Quality-Thema. Klassifiziert pro MC: verifiable / process_internal / doc_internal / ambiguous. Pattern-Match auf check_question + title + fail_criteria (Schulung, AVV abgeschlossen, TOM umgesetzt, DSFA durchgefuehrt, Ausnahmen dokumentieren, kostenfrei zur Verfuegung, opt-out intern ermoeglichen, …). Interne MCs werden in der MC-Auswertung NICHT mehr als FAIL gewertet, sondern als CHECK markiert (audit_status='check'). Sie zaehlen im build_scorecard als skipped (nicht failed) damit der Score realistisch ist. build_internal_checks_block_html() rendert sie als separaten blauen Block 'Pruefungen die wir von aussen NICHT durchfuehren koennen' nach dem MC-Scorecard. Erwartete Wirkung: bei VW 95 FAILs → wahrscheinlich 30-40 echte verifiable_fails + 50-60 internal_checks. GF-Mail wird drastisch realistischer (statt 'Sie haben 95 Verstoesse' → 'Sie haben 35 extern sichtbare Themen + 60 interne Checks, bitte mit DSB klaeren'). P83 — BUILD_SHA in backend/admin/consent-tester Dockerfiles als ARG + ENV. check-rebuild-needed.sh kann jetzt deployed vs local SHA vergleichen + REBUILD REQUIRED melden. P80 v2 — check_replay.py macht jetzt vollstaendigen Replay aller post-fetch Quality-Generatoren: vendor_normalizer (Dedup), audit_quality_checks, cookie_compliance_audit, tcf_vendor_authority, cookie_value_entropy, cookie_network_tracer. Snapshots aus alter Zeit zeigen jetzt im Replay den aktuellen Audit-Stand. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
bd65b6f318 |
feat(audit): Phase 2+3 — P54 + P68 + P69 + P6/P53/P55 + P31 + P80v2
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Failing after 59s
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 19s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P54 — consent_diff_for_user.py: USP-Feature fuer wiederkehrende Besucher. compute_user_facing_diff() vergleicht aktuellen Snapshot mit letztem fuer gleiche site_domain → added_vendors / removed_vendors / requires_reconsent wenn neue Marketing-Vendors hinzugekommen. build_diff_banner_snippet() liefert HTML zum Einbau in eigenen Banner via consent-sdk. P68 — reverse_audit.py: Self-Audit unserer Template-Bibliothek. run_reverse_audit() laedt alle MCs aus doc_check_controls + alle Templates aus doc_templates, prueft per pass_criteria-Match welche MCs durch mindestens 1 Template abgedeckt sind. Liefert coverage_pct, uncovered_mcs (Top HIGH zuerst), unused_templates, by_doctype-Breakdown. P69 — data/ecall_regulation.json: eCall-VO (EU) 2015/758 als 7 Chunks fuer RAG-Ingest (Art. 3/6/7 + compliance_implications fuer Automotive-OEMs). Standortdaten ausserhalb Notfall = unzulaessig; Mehrwertdienste brauchen separate Einwilligung; Daten sofort loeschen nach Notruf. P6+P53+P55 — industry_library.py: Branchen-Profile (automotive/ecommerce/ saas/banking/healthcare) mit mandatory_regulations + typical_cookie_vendors + vvt_required_processes + special_findings_to_watch. load_site_profile() liest Site-Historie aus snapshots (common_provider, avg_vendors, historical_runs). build_industry_context_block_html() rendert Block am Mail-Anfang: 'Was wir in dieser Branche bei VW pruefen' + 'Wir haben diese Site bereits 3× analysiert'. P31 — llm_cascade.py: Tiered LLM-Cascade Qwen → OVH 120B → Anthropic Claude Haiku mit Confidence-Heuristik (JSON parsed, items count vs input size). Valkey-Cache (redis://) mit 7-Tage-TTL plus In-Process- Fallback. Wenn Tier-1 unter Confidence-Threshold → Tier-2, dann Tier-3. Reduziert Lauf-Zeit drastisch bei Re-Runs. P80 v2 — check_replay.py: replay nutzt jetzt audit_quality_checks mit den Snapshot-Daten. Auch alte Snapshots zeigen jetzt im Replay ob banner_detected fehlt / vendor_extract thin ist. Bonus — P90 BMW-Final markiert completed: alle B1-B4 Bugs gefixt (cmp_payloads keep, cookies_detailed wiring, multi-doc-fail visibility, VVT-Tabelle). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
8cbb513e2c |
feat(audit): Phase 1 Quick-Wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT-Fix
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / branch-name (push) Has been skipped
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 38s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / test-go (push) Has been skipped
P81 — tests/fixtures/golden_truth/vw_de.json: GT-Fixture mit must_find_cookies (47 VW-Cookies) + expected_vendors (Google, Adobe, Trade Desk, ...). Basis fuer kuenftige Regression-Tests. P85 — banner_screenshot_block.py + consent_scanner.py + main.py: consent-tester macht beim Banner-Detect einen base64-PNG-Screenshot (< 1.5MB). Backend rendert ihn als <img src="data:..."> direkt nach dem GF-1-Pager. Visueller Beweis 'so sah das Banner aus' fuer Dispute mit Marketing/DSB. P70 — rag_provenance.py: classify_finding_provenance() klassifiziert ein Finding als 'rag' (Norm + Quelle), 'mixed' (Norm ohne Quelle) oder 'heuristic' (eigene Interpretation). provenance_badge_html() rendert kleine Badges (✓ RAG / NORM / ⚠ HEURISTIK). Modul ist generisch, kann bei jedem Finding-Renderer einklinkt werden. P83 — scripts/check-rebuild-needed.sh: Prueft ob die im Container deployten BUILD_SHA mit local HEAD uebereinstimmen. Bei Mismatch exit 1 mit 'REBUILD REQUIRED'-Hinweis. Verhindert das 'alter Code im Container'-Problem das uns mehrfach erwischt hat (Frontend-Tabs sichtbar, Backend ohne neuen Service). TCF-Fix — tcf_vendor_authority.py: cookie_library hat keinen UNIQUE-Index auf cookie_name → ON CONFLICT war unmoeglich. Loesung: vor Insert DELETE WHERE source_name='iab_tcf_v2'. Idempotent. + per-Vendor-Commit damit ein Fail die naechsten nicht blockt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
2e87b74749 |
feat(audit): P103+P104+P105 Defeat-Device-Heuristik fuer Cookies
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / test-go (push) Failing after 51s
CI / iace-gt-coverage (push) Successful in 27s
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Drei zusammenhaengende Stufen 'Cookie-Verhalten ist anders als deklariert' — analog zum VW-Diesel-Skandal-Pattern (Pruefstand vs Realbetrieb). P103 (Stufe 3) — cookie_value_entropy.py: Klassifiziert Cookie-Werte als flag/short_id/long_token/uuid/hash/json_blob via Shannon-Entropy + Regex-Patterns. Wenn ein als 'essential' deklarierter Cookie einen 64-char-Base64-Wert hat → MEDIUM-Finding 'Defeat-Device-Heuristik'. P104 (Stufe 4) — cookie_network_tracer.py: Vergleicht Cookie-Domain mit Site-Hauptdomain + bekannten Tracker-Vendoren (50 Domains gemapped: doubleclick.net, facebook.com, demdex.net, omtrdc.net, adsrvr.org, hotjar.com, ...). Wenn ein als 'essential' deklariertes Cookie von externer Tracker-Domain gesetzt wird → HIGH. Drittland-Cookies werden als 'DRITTLAND US/CN/...' markiert (Schrems-II-Folge). P105 (Stufe 5) — tcf_vendor_authority.py: Ingest-Endpoint POST /api/compliance/agent/admin/tcf-ingest holt die IAB TCF v2 Global Vendor List (vendor-list.consensu.org/v3) und upserted sie in cookie_library mit source='iab_tcf_v2'. cross_reference_with_tcf fuzzy-matched cmp_vendors gegen die TCF-Liste — wenn Vendor in TCF als Marketing gefuehrt aber Site sagt 'Funktional' → HIGH (externe Authority widerspricht der Deklaration). Alle drei rendern eigene Mail-Bloecke im Bereich Cookies (nach cookie_audit_html, vor library_mismatch_html). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6263462ba3 |
feat(frontend): Tab-Layout für Audit-Ergebnisse + cookie_audit in API
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / iace-gt-coverage (push) Successful in 28s
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m40s
CI / test-go (push) Failing after 45s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
ResultsTabsView.tsx — neue Komponente mit 7 Tabs:
1. Übersicht (KPIs: Docs, Findings, Vendors, Score)
2. Cookies & VVT (3-Quellen-Compliance-Vergleich +
undokumentiert/compliant/nicht-geladen + deduplizierte Vendor-Tabelle)
3. Datenschutzerklärung (DSE-Findings via ChecklistView)
4. Impressum
5. AGB / Widerruf (zwei Sections in einem Tab)
6. Cookie-Banner (Verstoesse + Phasen-KPIs)
7. Mail-Vorschau (PDF-Download-Link)
Sticky Tab-Header oben, Content scrollt darunter. Lange Scroll-Mail
ist damit verschwunden.
DocCheckTab nutzt ResultsTabsView statt der alten Inline-ChecklistView.
Backend liefert jetzt cookie_audit-dict in der Response (zusaetzlich
zu cmp_vendors + banner_result) damit das Cookie-Tab die 3 Listen
(undokumentiert / compliant / nicht-geladen) rendern kann.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
081e4f057a |
feat(audit): Cookie-Compliance-Audit (3-Quellen-Vergleich) + Vendor-Dedup + Block-Parser
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 55s
CI / iace-gt-coverage (push) Successful in 25s
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m43s
ZENTRALER USP: cookie_compliance_audit.py vergleicht 3 Quellen * DEKLARIERT in Cookie-Richtlinie (parse_cookie_table + parse_flat) * TATSAECHLICH im Browser geladen (banner_result.phases.after_accept) * LIBRARY-Metadaten (cookie_library lookup) Liefert 3 Listen mit Compliance-Verdict: * compliant (deklariert UND geladen) — gruener Block * undeclared_in_browser (geladen NICHT deklariert) — ROTER HIGH-Block → Art. 13(1)(c) DSGVO + § 25 TDDDG Verstoss * declared_not_loaded (deklariert NICHT geladen) — gelber Hinweis → Tabelle moeglicherweise veraltet parse_cookie_table erweitert um Block-Format (5 Zeilen pro Cookie wie beim User-Copy aus VW). Findet 35+ Cookies aus Copy-Paste statt 0. vendor_normalizer.py: 50+ Aliases (Google-Familie, Adobe-Familie, Trade Desk, AdForm, ...) + Garbage-Filter (URLs, leere Strings, 'click to select', 'Mehrere OEMs'). Mergt cookies-Listen beim Dedup. _guess_vendor erweitert: Adobe-Familie (s_ecid/AMCV/demdex/mbox/...), Trade Desk (TDID/TDCPM/TTDOptOut), AdForm (uid/cid/otsid), Salesforce LiveAgent, etracker, Akamai, EDAA. audit_quality_checks: vendor-thin-Threshold jetzt dynamisch nach Cookie-Doc-Wörter (3k→10 / 6k→20 / 10k→30 / 15k+→40). VW-Test-Fixture: tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt (36-Cookie-Sample fuer Regression-Tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
1451873194 |
fix(audit): parse_flat_cookie_text fuer VW-Style Flat-Tabellen
CI / loc-budget (push) Failing after 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m4s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
VW Cookie-Doc liefert die Tabelle als FLACHEN Text ohne Spalten-Trenner: 'IDE Tracking Cookies (Marketing) Beschreibung 13 Monate Permanent TAID Tracking Cookies (Marketing) ...' parse_flat_cookie_text matched mit Regex: NAME [Tracking|Session|Funktional|...] Cookies ... [13 Monate|Session|Permanent] Backend faellt bei parse_cookie_table=[] auf parse_flat zurueck. Damit holen wir aus dem 65k VW Cookie-Doc ~30-50 Cookies + Vendors deterministisch, auch wenn der HTML-Table-DOM-Extract leer ist (was passiert wenn die Tabelle aus mehreren append-Code-Pfaden geladen wird). Bonus: _extract_dom_tables Helper in dsi_discovery.py vorbereitet fuer spaeteres Einhaengen an allen 7 DiscoveredDSI.append-Stellen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
dfac940272 |
feat(licenses): attribution renderer — Stufe 1 (overview) + Stufe 3 (SourceBadge)
Backend
- backend-compliance/compliance/api/licenses_routes.py: three endpoints
built on the now-complete license_rule classification
- GET /api/compliance/licenses/overview
global aggregation by rule + per-source breakdown (Stufe 1)
- POST /api/compliance/licenses/aggregate
per-control-set aggregation for PDF footer (Stufe 2) and
tech-file appendix (Stufe 4) — consumed later
- GET /api/compliance/licenses/source-info/{control_uuid}
single-control lookup for the inline source badge (Stufe 3)
- registered in api/__init__.py via the existing safe-import loader
Frontend
- app/sdk/licenses/page.tsx (Stufe 1): the /sdk/licenses overview page.
Renders rule legend cards + per-rule source tables. Drives the
/licenses footer link and gives auditors a one-page view of what
licence classes the platform is operating under.
- components/sdk/SourceBadge.tsx (Stufe 3): reusable React component.
Small R1/R2/R3 pill with click-expand tooltip showing source
regulation + attribution string + render-full-text policy. Will be
embedded into IACE hazards/mitigations, VVT items, DSFA controls in
follow-up commits.
Two stages of the four-stage renderer are now ready. Stufe 2 (PDF
auto-footer) + Stufe 4 (tech-file appendix) follow once the existing
PDF generators are extended to call /licenses/aggregate.
|
||
|
|
cb5dad1a2f |
feat(audit): A Audit-Transparenz + B Tabellen-Parse + D HTML-Tables aus DOM
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-python-backend (push) Successful in 45s
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 20s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Drei zusammenhaengende Fixes fuer den VW-Befund (6 Vendors statt 100+): A — audit_quality_checks.py: drei systemische Vorbehalte die IMMER prominent gezeigt werden: * banner_detected=False trotz Cookie-Doc → HIGH 'CMP-Tool ungeladen' * cookie_doc >= 30k chars aber cmp_vendors < 15 → HIGH/MEDIUM 'Vendor-Liste auffaellig kurz fuer Doc-Groesse' * submitted URL aber 0/Mini-Text → MEDIUM 'URL nicht ladbar' Rote Audit-Vorbehalt-Box ueber dem GF-1-Pager. GF-Summary sagt 'Audit unvollstaendig' statt faelschlich 'Keine kritischen Themen'. gf_one_pager nimmt audit_quality_findings in top_findings auf (BEVOR andere Findings). B — cookies_table_parser laeuft jetzt auch auf gecrawltem Cookie-Doc- Text (nicht nur bei User-Paste). Wenn der dsi-discovery-Response Tab/ Pipe-getrennte Tabellen-Reihen liefert, parsen wir sie deterministisch. D — consent-tester/dsi-discovery extrahiert jetzt zusaetzlich zum Text die <table>-Elemente aus dem DOM als list[str] (Tab-getrennt pro Zeile, mind. 2 Zellen, mind. 3 Zeilen, max 10 Tabellen pro Doc). Backend schleust diese als 'html_table'-cmp_payload ein und jagt sie zuerst durch cookies_table_parser → 100% deterministische Vendor-Extraktion ohne LLM. VW-Erwartung: aus der 65k-Cookie-Tabelle werden jetzt 30-50 Vendors deterministisch geparst statt 6 vom LLM-Cascade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
e411c4f0d3 |
feat(audit): Text-Paste-Mode pro Row — Crawler optional umgehen
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m27s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 47s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Hintergrund: VW liefert ueber URL-Crawler nur 6 Vendors statt der 100+
die in der echten Cookie-Tabelle stehen. Wenn der User die Tabelle aber
direkt von der Site kopieren kann (was bei den meisten OEM-Sites moeglich
ist), umgehen wir den Crawler komplett und parsen den Text deterministisch.
Backend:
* doc_type_classifier.py — 7 Pattern-Gruppen (§5 TMG, Art.13 DSGVO,
AGB-Klauseln, Widerrufs-Frist, Cookie-Tabellen-Header, etc). Wenn der
User Text ins falsche Doc-Type-Feld kopiert (Impressum->DSE),
detect_mismatch liefert detected + action ('reclassify' bei sehr hoher
Konfidenz, 'warn' bei medium).
* cookies_table_parser.py — Tab/Pipe/Komma/Semicolon-Separator-Auto-
Detection, Spalten-Mapping per Header-Keyword. Aggregiert Cookie-
Eintraege zu Vendor-Records (mit _guess_vendor-Fallback). Voll
deterministisch, kein LLM.
* doc_input_warnings.py — Mail-Block ueber dem Audit, der Mismatches +
Auto-Reclassifies dem User transparent macht.
* Pipeline: text gewinnt ueber url (war schon im Schema vermerkt), neue
Felder declared_doc_type / input_source / reclassify_hint in doc_entries.
Pasted-Tabellen-Vendors haben Vorrang vor Library-Fallback + LLM-Cascade
(sind 100% genau).
Frontend (DocCheckTab):
* Pro Row Mode-Toggle 'URL' / 'Text einfuegen' (lila wenn aktiv).
* Textarea (h-32, monospace) im text-mode mit kontext-spezifischem
Placeholder (Cookie-Hinweis ggue. anderen Doc-Types) und Live-
Zeichen-/Wort-Counter.
* Submit-Button accepted entries mit URL ODER text.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
7335f64f4f |
feat(founding-wizard): Per-Person IP-Assignment + Prefill + E2E-Tests
CI / loc-budget (push) Failing after 20s
CI / detect-changes (push) Successful in 12s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 19s
CI / nodejs-build (push) Successful in 3m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Wizard unterstuetzt jetzt 2-4 Gesellschafter mit individuellem IP-Bereich: - Pro Gruender ein IP-Assignment-Vertrag (z.B. Benjamin: Compliance+RAG; Sharang: Security+Infrastruktur). Pro GF ein eigener Dienstvertrag. - Step 1: Prefill-Button aus Unternehmensprofil + Felder Registergericht und HRB-Nr. - Step 2: Rollen-Dropdown (CEO/CTO/CFO/COO/CPO/GF/Sonstige) statt freie Texteingabe, IP-Bereiche-Textarea pro Person. Backend: - generate_documents() iteriert pro Person fuer PER_PERSON_DOCS. - _build_person_context() injiziert ASSIGNOR_*, GF_*, IP_LIST_DETAILS aus person.ip_areas. - base_context() propagiert basics.register_court und basics.hrb_number. Tests: - 30/30 Pytest gruen (6 neue: Per-Person-Context, Slug-Helper, Registergericht-Propagation). - 4 neue Playwright-E2E-Specs (hermetisch via route.fulfill, mit Console-/Page-Error-Traps): kompletter 8-Step-Flow, Prefill-Fehlerpfad, Step-Navigation/Reset, Rollen-Dropdown + IP-Areas. - Spec setzt 'bp-sdk-cookie-consent' im addInitScript damit der CookieBannerOverlay nicht die Wizard-Buttons ueberlagert. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
138d9068c4 |
fix(audit): VW-Cookie-Tabelle — Library-Fallback + Pattern-Extract verstaerkt
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / detect-changes (push) Successful in 11s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 16s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
VW-Lehre: cmp_vendors=6 (alle LLM-grob) wurde als ausreichend gewertet, obwohl die echte Cookie-Tabelle 30+ Eintraege hat. 3 Fixes: 1. fallback_vendors_for_run skip-Schwelle: existing_vendor_count >= 3 war zu niedrig. Jetzt nur skip wenn < 5 Cookies UND >= 5 Vendors schon vorhanden. 2. Library-Fallback wird jetzt aufgerufen bei < 20 cmp_vendors (statt < 3). VW-typische Setups (6 LLM-grob + 30 aus Library) bekommen damit eine vollstaendige Vendor-Liste. 3. _extract_cookie_names_from_doc: regex-Pattern-Extract aus dem Cookie-Doc-Text selbst — sucht nach 'NAME Tracking Cookies (Marketing)' etc. Findet Cookie-Namen die NICHT im Browser-Jar landen (z.B. nur nach Consent geladen werden). Diese werden zusaetzlich durch die Library matched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c281464071 |
feat(audit): P71 JC-vs-AVV Entscheidungsbaum
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / test-python-backend (push) Successful in 39s
CI / test-python-document-crawler (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
jc_avv_decision.py: detect_ambiguous_jc_avv prueft ob DSE-Text sowohl JC-Signale (gemeinsame Auswertung, Schwesterunternehmen, Konzern...) als auch AVV-Signale (Auftragsverarbeiter, weisungsgebunden...) enthaelt. Bei Treffer rendert build_jc_avv_decision_html einen Block mit 4 EDPB- basierten Leitfragen + jeweiliger Empfehlung. Quellen: EDPB Guidelines 7/2020, EuGH C-25/17, C-40/17. In Mail-Render zwischen Solutions-Block und VVT eingehaengt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
6dc427a754 |
fix(audit): VW-404-Recovery + P52 LLM-Merge + P51 Banner-UX-Checks
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
VW-404-Fix: submitted_types zaehlt jetzt nur Doc-Types mit >= 200 Zeichen echtem Text. Eine eingegebene URL die 404/Mini-Text liefert (VW cookie- richtlinie.html) wird als 'missing' behandelt, sodass Auto-Discovery alternative URLs auf der Homepage probiert. In-place-Update statt Duplicate-Entry, rejected_url wird fuer Audit-Transparenz aufgehoben. P52 LLM-Cascade Merge: vendor_llm_extractor laeuft jetzt bei < 5 Vendors (nicht nur bei 0), und die Ergebnisse werden MIT existing cmp_vendors gemerged statt zu ueberschreiben. VW-typische Setups (Generic CMP + 0 cmp_payloads) bekommen damit den Text-basierten Vendor-Layer dazu. P51 — banner_consistency_checks erweitert: * check_banner_copyability: scannt banner_html nach user-select:none / oncopy=return false / onselectstart. MEDIUM Finding wenn Banner-Text nicht kopierbar (Art. 7 (2) DSGVO). * check_consent_history: prueft auf 'Meine Einwilligungen' / Consent- Historie / Datenschutz-Cockpit. MEDIUM wenn keine sichtbare Historie (Art. 7 (3) — Widerruf muss so einfach wie Erteilung sein). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
309c10c203 |
feat(audit): P72 MC-Scope-Filter + P73 MC-Solution-Generator
CI / detect-changes (push) Successful in 12s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P72 — rag_document_checker LEFT JOINs canonical_controls.scope_doc_type.
_filter_by_canonical_scope wirft MCs raus deren scope explizit auf
einen inkompatiblen Doc-Type zeigt (Mapping in _SCOPE_COMPATIBLE).
Konservativ: 'other'/NULL/'process' bleiben drin — Heuristik v1 ist
noch nicht stark genug fuer hartes Filtern.
Erwartete Wirkung: ~10-15% weniger irrelevante MCs pro Doc, weil z.B.
ein TOM-MC nicht mehr als DSE-Finding auftaucht.
P73 — mc_solution_generator.py: Qwen->OVH Cascade generiert pro HIGH/
CRITICAL-Fail eine konkrete Einfuege-Empfehlung mit Anchor (wo + was)
und Aufwand-Schaetzung. JSON-Schema {solution_text, anchor_hint,
effort_min}. In-process LRU-Cache (500 entries) per (mc_id, doc_md5).
Max 3 Solutions pro Doc-Type, global Cap 8 — haelt Latenz < 60s. Bloecke
werden im Mail-Render unter VVT als 'Loesungs-Vorschlaege (KI-generiert)'
eingehaengt. Disclaimer: kein Rechts-Beratung, mit DSB pruefen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
4183379dc5 |
feat(audit): P33 3-Spalten-Vendor-Konsistenz (DSE/Cookie-Doc/Banner)
CI / detect-changes (push) Successful in 11s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 44s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
check_three_source_vendor_consistency: scannt DSE-, Cookie-Doc- und Banner-Vendor-Liste auf 15 typische Vendor-Signaturen (Google Analytics, Meta Pixel, Hotjar, HubSpot, LinkedIn Insight, ...). Listet Vendors die in mind. einer Quelle stehen, aber nicht in allen sources_with_data. Liefert MEDIUM-Finding mit konkreter 'fehlt in: DSE, Banner-Liste'- Liste pro Vendor. Empfehlung: zentrale Vendor-Liste pflegen + in alle drei Dokumenttypen propagieren. (Art. 13(1)(c)+(e) DSGVO) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
c93c88577c |
feat(audit): P88 PDF-Export via WeasyPrint
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
GET /api/compliance/agent/snapshots/{id}/pdf liefert application/pdf
mit dem vollen Audit-Mail-Inhalt im A4-Print-Layout (Header mit
Site/Timestamp/Snapshot-ID, Seitenzahlen unten rechts).
check_replay.py liefert jetzt zusaetzlich 'full_html' (nicht nur
500-char-preview), damit der PDF-Renderer das komplette HTML hat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
9f06911ff9 |
feat(audit): Cookie-Library-Fallback fuer VW-Pattern (kein bekanntes CMP)
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 17s
CI / loc-budget (push) Failing after 17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
Wenn nach Standard-Extract + Phase-G + LLM-Cascade weiterhin < 3 cmp_vendors aber >= 5 Cookies im after_accept stehen (typisch: Custom-CMP wie VW 'cookiemgmt'), matcht der Fallback die Cookie-Namen gegen die compliance.cookie_library und rekonstruiert Vendor-Records aus den Library-Eintraegen. Hintergrund: VW Run de2a029e zeigt 4 Vendors trotz 28 after_accept-Cookies. cmp_payloads ist 0 (kein bekanntes IAB-Tool erkannt) und die hinterlegte Cookie-URL liefert 404. Die DSE ist mit 34k zwar substanziell, listet aber keine Vendor-Tabelle. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
338e03d3b0 |
feat(audit): P34 Exec-Summary Score-Einordnung — 'wo Sie stehen sollten'
CI / detect-changes (push) Successful in 10s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 14s
CI / loc-budget (push) Failing after 16s
CI / go-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m46s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Has been skipped
_score_band_explanation: vier Baender (Sehr gut/Akzeptabel/Handlungs- bedarf/Erhoehtes Risiko) liefern Label + erwartete Handlung. Wird als neue Zeile unter den KPIs in der Exec-Summary gerendert (mit score-farbiger Linkmark). Sachlicher Ton — kein 'Vorstand muss sofort handeln', sondern realistische Empfehlung (z.B. '70-84: Branchen-Median, einmaliges Aufraeumen + Halbjahres-Check'). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |