6a6c2dbcaa6406f6850cd282ebc80699fb3f407a
152 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
6a6c2dbcaa |
feat(controls): atom-grain liefert source_article + Registry-Tests im CI-Pfad
CI / detect-changes (push) Successful in 17s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 8s
CI / loc-budget (push) Successful in 21s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m6s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
- _ATOM_LIST_SQL via LATERAL: zusaetzlich cpl.source_article (Gesetzes-Artikel) im atom-grain Response. Spalte control_parent_links.source_article verifiziert (macmini + prod). - Registry-Mapper-Test (neue Domaenen) nach compliance/tests/ verschoben — CI faehrt compliance/tests/, nicht tests/; schliesst die CI-Luecke der 6-neue-Use-Cases-Erweiterung. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
828230746e |
feat(cra): Befund-Detail aufgeräumt + Rechts-Anker (source_article) sichtbar
Frontend (CRA/Cyber-Tab): - Erklär-Zwischensätze je Ebene (Befund -> CRA-Anforderung -> Best-Practice- Standard -> Maßnahmen) + "So liest du einen Befund"-Legende. - Kuratierte M-Maßnahmen und atom-grain "Regulatorische Breite" in EINE Sektion "Maßnahmen (wählbar)" zusammengeführt (statt zwei konkurrierender Listen). - Standalone "Empfohlene Maßnahmen (Sollzustand)" entfernt (jetzt je Befund). Backend: - Atom-Controls-Query liefert jetzt cpl.source_article (Artikel/Anhang/Erwägungs- grund-Anker) zusätzlich zu source_regulation; via LATERAL-Join. - enrich_findings_with_breadth trägt source_article in regulatory_breadth. - Daten waren schon ingestiert (682/691 CRA-Atome haben source_article) — wurden nur nicht selektiert/angezeigt. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
00f304fed9 |
feat(controls): 5 neue Use Cases + Machinery-Fix + Korpus-/Lizenz-Übersicht
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 11s
CI / validate-canonical-controls (push) Failing after 5s
CI / loc-budget (push) Successful in 22s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Successful in 1m11s
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m41s
CI / iace-gt-coverage (push) Failing after 5s
CI / test-python-backend (push) Failing after 5s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
- Registry: arbeitsrecht, gesellschaftsrecht, insolvenzrecht, csrd, bafin_it + Mapper-Regeln für zuvor ungemappte Quell-Gesetze, Machinery-Guide 2006/42 -> maschinen. Jetzt 43 Use Cases (Achse 1 / license 1+2 vollständig). - corpus_overview Service + GET /v1/controls/corpus: Quell-Dokumente mit Lizenz-Tier + atom-Count + Use-Case + kuratiertem Lizenz-Katalog. - list_use_cases trägt atom_classification-Counts (atom_total/atom_relevant). - Frontend /sdk/coverage: Use-Case-Übersicht + Korpus-Dokumente + Lizenz-Katalog. - Tests: registry-Mappings (neue Domänen), corpus tier-labels, coverage-helpers. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
b2392fb680 |
refactor(cra): readiness fetches Machinery-Reg obligations from use_case=maschinen
Follow-up to the machinery_reg_cyber.py removal: the readiness endpoint now pulls Machinery Regulation 2023/1230 cyber-with-safety obligations from the shared Controls-API (use_case=maschinen), tagged "Maschinen-VO", best-effort. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
b0f78ae9a3 |
feat(cra): readiness derives obligations from Machinery Reg 2023/1230 too
Machine/plant builders are hit by BOTH the CRA and the new Machinery Regulation. New machinery_reg_cyber.py models its two well-corroborated Annex III cyber-with- safety essential requirements (1.1.9 protection against corruption, 1.2.1 control- system safety incl. foreseeable manipulation) in our own words; EU legal text is freely reusable (Commission Decision 2011/833/EU, source acknowledged), harmonised standards referenced by identifier only. The readiness check asks "is it machinery?" and, if so, adds these obligations tagged "Maschinen-VO" alongside the CRA ones — the combination is visible (regulations list + per-item source badge). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
9660724a2c |
feat(cra): CRA Readiness Check lead-magnet on /sdk/cra (Track A)
Low-friction, stateless readiness check (no project/DB): business-scope answers (internet / parameter app / remote maintenance / updates / firmware / personal data / critical infra) -> Annex III/IV classification (reuses _classify) + a high-level guideline grouped Code / Prozess / Dokumentation (via Annex I evidence_type) + conformity path + deadlines + rough effort + the "we implement" hook and a CTA into the existing project workflow. Endpoint POST /api/v1/cra/ readiness. Reuse + reframe of the existing CRA module — no duplicate questionnaire. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
437c2c8fa1 |
feat(cra): hardware path — derive cyber findings from networked components
For hardware CE projects (no repo) each networked component (controller/hmi/ gateway/drive/remote_access/sensor) yields typical ICS vulnerability CLASSES (real CWE + "CISA-ICS — product-specific check" framing, NO fabricated CVEs); they flow through the same CRA engine. /assess accepts components[]. MappedFinding now echoes title/location/cwe so the response is self-contained for any finding source. Live CISA-ICS/NVD per-product CVE lookup is the later enrichment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
c7845f67d6 |
feat(cra): attach network_security regulatory breadth (shared Controls-API)
Semantic breadth (2): each finding's CRA-AI is mapped to a network_security sub_topic and enriched with atom-grain, framework-traceable obligations from the shared Controls-API (compliance.atom_classification) — at the endpoint/view layer (SessionLocal), NOT in the pure mapper. CRA-AI anchor + curated measure + NIST/OWASP crosswalk stay the lead; this is breadth + source evidence. Only network_security is queried (atom-grain), scoped by sub_topic + limit. Frontend renders it under the collapsible best-practice depth (control_id · title · source). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
ee1632cd52 |
feat(cra): snapshot/history UI + measure-class (code-fix vs process) UI
Snapshot/history: "Snapshot speichern" + a version list (status, date, coverage)
you can click through — makes the CRA Art. 13 running system visible (backend
endpoints already live). Measure-class: each finding shows a remediation-class
badge from its CRA evidence_type ("Code-nah" = scan-locatable, code-fix in the
ticket possible; otherwise Prozess/Doku), and the measures section is relabelled
as the Sollzustand (process/build) — no auto-fix buttons on process measures.
Backend: MappedFinding now carries evidence_type.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
cf917ab733 |
feat(cra): versioned assessment snapshots — CRA Art. 13 running system (step 3)
Persist each CRA assessment as a versioned, auditable snapshot over the product
lifecycle. Reuses the existing compliance_cra_documents table (NO new schema,
frozen DB respected): doc_type='doc_risk_assessment', full assessment in
generation_context, requirements_coverage summary, auto-incrementing version,
prior version superseded. New endpoints: POST /projects/{id}/assess-snapshot,
GET /projects/{id}/assess-snapshots (history), GET /assess-snapshots/{id}.
Additive (no contract baseline change).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
10c32d7f7c |
feat(cra): cyber-meets-safety bridge as real logic (step 2)
Deterministic bridge (cra_safety_bridge.py): a cyber finding's attack capability (remote_actuation / code_tampering / integrity_loss / auth_bypass, derived from its CRA category) is matched against what each CE safety function is vulnerable to. A match re-opens the mitigated hazard, flags the finding safety_impact (which floors it to P0), and produces the cross-link. Endpoint accepts safety_functions; frontend passes the project's safety functions and renders the LIVE cross-links (no more hardcode). Safety functions are demo input now; come from the CE risk assessment in production. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
12fa179bfd |
feat(cra): coarse priority engine — P0 floor + customer weights + quick wins
Deterministic prioritisation on top of the mapper (cra_prioritizer.py): a non-negotiable P0 floor (safety-function compromise / actively exploited / CRITICAL — customer weights cannot demote) plus a discretionary tier ranked by severity x the customer's weight (high/medium/low) for the 5 business objectives (access/data/network_api/supply_updates/monitoring). Quick-win flag (high impact, low effort) for a second view; each finding carries a short priority reason. Endpoint accepts weights + per-finding safety_impact/exploited. Rough pre-sort only (devs re-sort in Jira). No DB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
34a678caef |
feat(cra): standalone POST /api/v1/cra/assess endpoint
Live HTTP entry for the deterministic CRA assessment — repo-scanner findings in, CRA Annex I mapping + risk + curated measures + NIST/OWASP golden-set crosswalk out. Project-less (works for any customer, no CE-RA/FMEA required); reuses the tested mapper, same logic the MCP server exposes. Additive endpoint (no contract baseline change); no DB. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
43ae33975d |
feat(cra): NIST/OWASP security golden-set crosswalk + full measure texts in CRA tab
Crosswalk (cra_security_crosswalk.py): deterministic, hand-curated CRA Annex I -> NIST 800-53 Rev5 + OWASP Top 10:2021 mapping, the authoritative Security Golden Set (no RAG; semantic breadth comes later via the shared Controls-API). Mapper attaches NIST/OWASP refs per finding; golden-set completeness pinned by test (every requirement has >=1 NIST ref). CRA tab now shows the NIST/OWASP best- practice refs per finding and the full curated measure texts + norm references (from measures_library_cra.go). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
a73b996381 |
feat(cra): standalone CRA finding->Annex I risk mapper + MCP interface
Deterministic mapper (no DB/LLM): repo-scanner findings -> the CRA Annex I essential requirement(s) they violate -> risk level -> remediation measures + coverage. Reuses the existing Annex I spine (cra_annex_i_data). The MCP server (compliance/mcp/server.py, stdio) is the thin transport the external scanner queries; all logic lives in the fully-tested mapper. Works standalone (no project/FMEA required). No DB migrations. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
d92dd3b5fc |
feat(banner): Consent-Historie/Widerruf live erkennen (Borlabs-Stil, #62)
consent_history.detect_consent_history: erkennt CMP-Anbieter (Borlabs/ Usercentrics/OneTrust/Cookiebot/…) aus Storage+Cookies, versionierten Consent (historie-fähig) + dauerhaftes Widerruf-/Einstellungs-Widget. consent_scanner ruft es in Phase A; scan_matrix_summary surft summary.consent_history; browser_cross_finding: positiver Befund wenn vorhanden, sonst Best-Practice-LOW („Nutzer sehen, wann sie welcher Version zugestimmt haben"); BrowserBehaviorView zeigt es im Engine-Detail. Tests: 7 (classify/versioned) + 2 Cross-Finding. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
3f90e40807 |
fix(browser-matrix): Tracking-Signal statt Cookie-Rohzahl + Matrix-Schnellpfad
Korrektheit (§ 25 TDDDG): "Cookies vor Consent" ist KEIN Verstoss per se — technisch notwendige Cookies inkl. des Consent-Cookies (speichert die Ablehnung) sind nach Abs. 2 erlaubt. Verstoss ist nur nicht-essentielles TRACKING vor Consent. - browser_cross_finding: Befund haengt jetzt an violations.before_consent (Tracking), nicht an der Cookie-Rohzahl; § 25 Abs. 2-Hinweis im Detail. Regressionstest: Cookies-ohne-Tracking → KEIN Befund. - multi_browser_scanner._extract_dimensions: Score nutzt Tracking-Violations + reject_respected-Verdikt statt Rohzahl (Fallback erhalten). - BrowserBehaviorView: "Cookies vor Consent" nur rot/⚠ bei Tracking, "nach Ablehnen" neutral (Verdikt = reject-Spalte); erklaerende Zeile. Speed: run_consent_test ueberspringt im Matrix-Modus (browser_profile gesetzt) die teuren Phasen C/D-F/G — nur A+B noetig. Verhindert das 504 beim Multi-Engine-Scan (BMW 4 Engines lief sonst in den 338s-Gateway-Timeout). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
85a8a1d545 |
feat(browser-matrix): Cross-Browser-Befunde + Browser-Default-Einordnung (Phase 4)
- browser_cross_finding: deterministische Sicht ueber die Matrix (keine 2. Engine, kein LLM). Findet Inkonsistenzen ZWISCHEN Browsern (Cookies vor Consent / Ablehnen nicht universell respektiert / Banner-Links fehlend) und ordnet ein: Safari-ITP / Brave-Shields / Firefox-ETP maskieren Verstoesse clientseitig → strenge Engine "sauber" ist KEIN Compliance-Beleg, massgeblich sind die nachgiebigen (Chrome/Edge). Coverage-Hinweis fuer nicht verfuegbare Browser. Je Befund Titel/Detail/Severity/affected/Massnahme. - snapshot_check_routes: cross_findings frisch in run + GET (nicht persistiert). - BrowserBehaviorView: "Cross-Browser-Befunde"-Block ueber der Tabelle. - Tests: test_browser_cross_finding (6). Offen (Folge-Task): Borlabs-Consent-Historie-Live-Erkennung (braucht consent-tester-Storage-Scan). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> |
||
|
|
c7fde93061 |
feat(backend): On-demand Browser-Verhaltens-Matrix + Snapshot-Persistenz (Phase 2)
- check_snapshot: update_browser_matrix/load_browser_matrix — migrationsfrei
in banner_result.browser_matrix (JSONB jsonb_set, eigener scanned_at)
- snapshot_check_routes: POST /snapshots/{id}/browser-behavior/run laeuft
/scan-matrix LIVE (Re-Crawl je Engine, nur live messbar), persistiert das
Ergebnis; GET /snapshots/{id}/browser-behavior liefert die gespeicherte
Matrix ohne Re-Crawl. Profil-Set = 4 Default-Engines + Brave/Chrome/Edge.
- consent-tester multi_browser_scanner: Semaphore(2) gegen OOM (7 Browser
parallel sprengten das 2g-mem_limit)
- Pydantic-Modell mit Optional[List[...]] (nicht `| None`) → Py3.9-sicher
- Tests: _snapshot_scan_url + Request-Defaults (5)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|
|
372e1fe9e9 |
Use-Case-Mapping-Filter für Master Controls + Mapper-Präzisionsfix
CI / detect-changes (push) Successful in 14s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 7s
CI / validate-canonical-controls (push) Successful in 13s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m23s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 34s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Phase 2: Live-Filter an /sdk/master-controls (Use Case, Quell-Regulierung, Verifikations-Methode, Coverage, Primärzweck-Toggle, category via Member-EXISTS). API mit EXISTS-Filtern + gecachten Meta-Counts in master-controls/route.ts. Phase A: neue UseCase telekommunikation + Fix der Impressum-Fehlrouten im Register (TKG/AT-TKG->telekommunikation, telemedien->dse, GewO->handelsrecht); echte Impressum-Quellen (TMG/Mediengesetz) bleiben impressum. Deterministischer Seed aus source_regulation; Tests grün. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
6ca4dcde3e |
feat(use-cases): deterministisches source_regulation-Mapping + Primaerzweck [migration-approved]
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 31s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Use-Case-Zuordnung jetzt DETERMINISTISCH aus der Quell-Regulierung (statt LLM/scope-category): control_parent_links.source_regulation (79% der 13.588 MCs) -> Keyword-Mapper -> ~30 Domaenen-Use-Cases. 117/117 Regulierungen gemappt (dse 44 Leitlinien, code_security 10, network_security 9, ...). - use_case_registry.py: 37 Use Cases (Doku + Security + Produkt/Sektor: cra/ai_act/mica/mdr/maschinen/batterie/ehds/dsa/dma/psd2/aml/lksg/...) + use_case_for_regulation() Keyword-Mapper (117 Regulierungen abgedeckt). - migration 150: is_primary auf mc_use_case_mappings + neue mc_regulations (MC->source_regulation, n:m, is_primary) als feine Filter-Dimension. - classify_mc_use_cases.py: source_regulation-getriebener Seed; Primaerzweck = dominante Regulierung, Mehrfachzwecke = weitere. PYTHONPATH-Bootstrap. - 18 Registry-Tests gruen (Mapper-Abdeckung + Konsistenz-Invariante). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
dca7740d8c |
feat(use-cases): Fundament — Use-Case-Register + n:m-Mapping-Migration + Seed [migration-approved]
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Layer 1+2 (Fundament) des Use-Case-Mapping-Systems (Plan genehmigt): - compliance/data/use_case_registry.py: Single Source of Truth fuer 14 Use Cases x Verifikations-Methoden (Doku/Source-Code/Netzwerk/IT-Prozess). Erweiterbar (neuer UC = 1 Eintrag). code_security/network_security als Uebergabe-Punkte fuers Security-Team (SBOM/SAST/DAST/Pentest). - migrations/149_mc_use_case_mappings.sql: add-only n:m mc_use_case_mappings + mc_verification (1/MC) + sync_state. use_case ohne SQL-CHECK (erweiterbar). - scripts/classify_mc_use_cases.py: Seed-Stufe (deterministisch, kein LLM). LLM-Stufe (Phase 3) folgt. - Tests: test_use_case_registry.py (14 gruen). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
bc78ddd3e5 |
fix(impressum): Findings aus 12 §5-TMG-Pattern-MCs statt verunreinigtem DB-Set
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 5s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Der Agent lieferte "alles gruen": _load_controls gab auf macmini nur 3 von 75 doc_type='impressum'-MCs zurueck (Sidecar mc_classification.db hat nur 4/75 als text-matchbar klassifiziert). Tiefere Ursache: die 75 doc_type='impressum'-MCs sind fehl-klassifiziert (60/75 canonical_scope='other'; Prefixes TRD/SEC/GOV = Geschaeftsbriefe/Marktplatz/Bestellung, NICHT §5 TMG Website-Impressum). Fix: Der Impressum-Agent erzeugt Findings jetzt aus seinen 12 autoritativen §5-TMG/DDG-Pattern-MCs (mcs.py) statt aus dem verunreinigten DB-Set — deterministisch, scope-aware, field_id = semantisches Feld. Semantic-Validator- Demote + Massnahmen + Rollup bleiben. Die 5-Impressum-GT-Tests laufen jetzt echt durch: 0 Falsch-Positive. DB-Master-Controls fuer Impressum deaktiviert bis zum MC-Re-Filtering (separate Aufgabe: die doc_type-Klassifizierung der Vorgaenger-Session muss bereinigt werden). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
08c08fcba2 |
feat(crawl): Vollstaendigkeit — Shadow-DOM/versteckte Links + Interaktions-Fixpunkt + Wayback-CDX-Orphans
CI / test-python-backend (push) Successful in 30s
CI / detect-changes (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Damit die Specialist-Agents auf vollstaendigem Website-Content arbeiten:
A — _find_dsi_links pierct jetzt Shadow-DOM (Web-Components wie Usercentrics/
Mercedes) rekursiv; versteckte (display:none) Links werden erfasst + als
Coverage-Metadatum geflaggt.
B — _expand_to_fixpoint klappt Akkordeons/Tabs/Hover-Menues in einer Schleife
auf, bis das DOM stabil ist (statt 1 Pass); erweiterte Selektoren;
Coverage-Telemetrie (Runden, expandierte Elemente, DOM-Wachstum, Shadow-/
versteckte Links) → Response + Backend-Log.
C — legacy_url_cdx.cdx_enumerate listet via Wayback-CDX-API ALLE je
archivierten URLs der Domain → findet Orphan-/Legacy-Seiten, die nie im
Slug-Raster standen (z.B. nicht mehr verlinktes /datenschutz, per Direkt-
URL noch erreichbar). Fliesst durch das bestehende Legacy-URL-Inventar.
Tests: test_legacy_url_cdx.py (6) + consent-tester/tests/test_dsi_discovery.py
(Pure-Helper + Real-Browser-Integration). Alle gruen, LOC-Gate gruen.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|
|
389e6de0c7 |
fix(agents): Impressum+Cookie delegieren MC-Laden ans Main Tool — Scope-Filter + Maßnahmen
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
Regression: Der v3-Agent-Pfad baute eine parallele MC-Pipeline (_load_impressum_mcs / _load_cookie_mcs, Roh-SELECT) und lief damit an allen Schutzmechanismen der Engine vorbei → GOV/Branchen-MCs als HIGH bei OEM/Zulieferer, fremde MCs (Bestellbestätigung), und action=check_question (Fragen statt Maßnahmen im Frontend). - Agent delegiert MC-Laden an rag_document_checker._load_controls (P72-Scope, check_type='text', fits_doc_type/scope_requires). - Subtraktives Sektor-Gate (SECTOR_PREFIXES) + Themen-Gate am Agent-Rand. - action = konkrete Maßnahme (Imperativ) statt check_question. - rag_document_checker: from __future__ import annotations (3.9-Import). - mcs: Name-Pattern erkennt "Aktiengesellschaft" (OEM-Impressums). - Tote GT-/Semantic-/Routes-Tests wiederbelebt (v3-Mismatch + agent.cascade-Patch-Target). Alle 72 Specialist-Tests grün. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
bd4882e143 |
feat(agents): Sprint 1.12 Phase 2 — Cookie-Policy v3 + ImpressumAgent v3 finetune
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / sbom-scan (push) Has been skipped
ImpressumAgent v3 (Refactor):
- v3_engine: laedt direkt alle 75 doc_check_controls['impressum'] ohne
Sidecar-Filter (Sidecar war zu streng, lieferte nur 3 von 75 MCs).
- Layer 0 Boost prueft pass+fail_criteria gegen meine 12 Patterns mit
erweiterten Initial-Seeds (User-Vorgabe 2026-06-09:
manuelle Initial-Seeds OK, Auto-Learning erweitert zur Laufzeit).
- ETO-Smoke: 75 DB-MCs · 7 Pattern-Boosts · 24 Boost-Overrides
(versus 3 DB-MCs vorher).
CookiePolicyAgent v3 (Refactor):
- cookie_policy/v3_engine.py + cookie_policy/regex_boost.py
- Laedt direkt alle 381 Cookie-MCs aus doc_check_controls
- Layer 0 mit 12 eigenen Patterns als Initial-Seed
- KB-Layer (CMP-Vendor-Cross-Check) bleibt erhalten
- agent_version='3.0'
Tests: 27/27 gruen (12 v3-impressum, 6 cookie-policy, 9 cross-placement).
Alte v2-cookie-tests umgeschrieben auf v3-Pipeline-Mock.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d3ac33d53a |
feat(impressum): v3 — Layer-Architektur auf doc_check_controls (75 DB-MCs)
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 31s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Sprint 1.12 Phase 1 (User-Vorgabe 2026-06-09):
Statt eigener 12 hartgepatchter Patterns nutzt der Impressum-Agent jetzt
die 75 echten Master-Controls aus compliance.doc_check_controls. Pipeline:
Layer 0 — Regex-Boost (meine 12 Patterns aus mcs.py / regex_boost.py)
→ wenn Pattern hits, MC wird zu PASS überschrieben
Layer 1 — Keyword-Match aus pass_criteria der 75 DB-MCs
(rag_document_checker.check_document_with_controls)
Layer 2 — BGE-M3 Embedding-Match (in rag_document_checker integriert)
Layer 3 — Semantic-Validator (LLM) für übriggebliebene HIGH/MEDIUM
+ Auto-Learning-Pattern-Library
Output-Layer bleibt unverändert: Disclaimer-Linter + Rollup-Dedup +
Methodik-First-UI.
Neue Dateien:
- impressum/v3_engine.py — Pipeline-Orchestrator
- impressum/regex_boost.py — meine 12 Patterns + Boost-Mapping
Refactored:
- impressum/agent.py — komplett umgeschrieben, agent_version=3.0
255 LOC (unter 500-Cap)
Tests: test_impressum_v3.py mit 10 neuen Tests, alle gruen. Mockt
run_v3_pipeline für offline-Lauf. Bestaetigt:
- Layer-0 erkennt Tesla-typische Felder
- Boost matched DB-MC nur bei ≥2 Keyword-Treffern in pass_criteria
- 12 Pattern-Boost-Slots + N DB-MCs in coverage
- Notes enthalten Telemetrie (v3-pipeline, Boost-Overrides)
Telemetrie wird in AgentOutput.notes ausgegeben, damit Frontend
sehen kann: 75 DB-MCs geprueft · 5 Pattern-Boosts · 3 Boost-Overrides.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
154e8c293b |
feat(agents): Cross-Placement-Agent (deplatzierter Content)
CI / detect-changes (push) Successful in 6s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Sprint 1.9 (User-Vorgabe 2026-06-09):
Erkennt im Impressum Inhalts-Sektionen die thematisch besser in
einen Footer-Reiter 'Legal' gehoeren:
- Urheberrecht / Copyright -> LOW (Footer 'Legal')
- Bilder & Lizenzen -> LOW (Seite 'Bildquellen')
- Haftungsausschluss / Disclaimer -> LOW (Seite 'Disclaimer')
- Nutzungsbedingungen -> LOW (Seite 'AGB')
- Aenderungsvorbehalt -> LOW
- ElektroG / WEEE-Reg -> MEDIUM (Produktinfo)
- VerpackG / LUCID -> MEDIUM
- BattG -> MEDIUM
Each Finding empfiehlt konkret den 'Legal'-Footer-Reiter
einzufuehren als Best Practice ('Impressum bleibt schlank
und enthaelt ausschliesslich die Pflichtangaben nach § 5
TMG/DDG').
Tests gegen die 5 GT-Impressums:
- Safetykon: 3 Findings (Urheberrecht, Bilder/Lizenzen,
Haftungsausschluss)
- Hectronic: 3 Findings (WEEE-MEDIUM, Copyright, Haftung)
- ETO/BMW/Elli: 0 Findings (sauber)
- 9/9 Tests gruen.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
ca8c388f37 |
feat(agents): Semantic-Validator + Auto-Learning-Pattern-Library
CI / detect-changes (push) Successful in 5s
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
Sprint 1.10 — Semantic-Validator (User-Vorgabe 2026-06-09):
- Statt unendlich Regex-Pattern fuer jede Schreibweise zu pflegen
(Tel/Telefon/Telefonnr/Phone/Fon/Funkanschluss/…), nutzen wir
bei MC-MISS einen LLM-Call: 'Ist die Pflichtangabe semantisch
doch da, nur unter abweichendem Label?'
- Bei LLM-Treffer: HIGH/MEDIUM-Finding wird zu LOW demoted,
Empfehlung wird zu 'Best-Practice Umbenennung: Management ->
Geschaeftsfuehrer' (mit STANDARD_LABELS-Mapping).
- 1 LLM-Call pro Slot statt N: cost-effizient.
Sprint 1.11 — Auto-Learning-Pattern-Library:
- Jedes Label das SVL findet wird in JSON persistiert:
/tmp/breakpilot/agent_learned_patterns.json
- Beim naechsten Run prueft der Agent zuerst gelernte Patterns
BEVOR er das HIGH-Finding emittiert -> kein LLM-Call mehr.
- Asymptotisch 0 LLM-Calls fuer haeufige Edge-Cases.
- Halluzinations-Schutz: prune_low_confidence() loescht Patterns
mit <0.5 Avg-Confidence nach 100 Beobachtungen.
- Idempotent: gleicher (field_id, label, agent) -> Counter +1.
Tests: 40/40 gruen (10 Pattern-Library + 7 SVL + 13 GT + 11 v2).
STANDARD_LABELS-Map deckt Impressum + Cookie-Policy. Spaeter
erweiterbar fuer DSE, AGB, Widerrufs-Agenten.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
882e4f9798 |
test(impressum): GT-Fixtures + Fix 'Telefonnummer' Pattern
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Ground-Truth-Fixtures fuer 5 echte Impressums (ETO, Safetykon, BMW,
Elli, Hectronic). Pro Impressum:
- text (User-eingegeben)
- expected_clean (Felder die da sind → keine Findings)
- business_scope
- placement_concerns (Texte die deplatziert sind — fuer kommenden
Cross-Placement-Agent)
13 GT-Tests + 11 Specialist-Tests = 24/24 gruen.
Bug-Fix: Elli schreibt 'Telefonnummer:' (kein 'Telefon:'),
mein Pattern matched nur Tel/Telefon. Erweitert:
'Tel(?:efon(?:nummer)?)?|Phone|Fon'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
702e7a6333 |
fix(impressum): Pattern fasst Geschäftsführung/Vorstand/Inhaber
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m21s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Safetykon-Bug: 'Geschäftsführung:' (Sammelbegriff für GF einer GmbH)
matched das alte Pattern 'Geschäftsführer' nicht — False-Positive
IMPRESSUM-AGENT-VERTRETUNGSBERECHTIGTE_LABEL_KORREKT.
Pattern erweitert: Geschäftsführer|Geschäftsführung|Geschäftsführerin
+ Vorstand|Vorstandsvorsitzender + Inhaber|persönlich haftend.
Test test_safetykon_geschaeftsfuehrung_passes ergänzt (11/11 grün).
frontend: SlotCard zeigt jetzt Badge bei 0/0/0-Slots
('Dokument konnte nicht geladen werden') statt silent-fail, +
bei 0 Findings ein 'alle MCs OK'-Badge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
3ae4e60c9d |
feat(agents): SSE-Endpoint + Agent-Test-Tab (5-URL parallel)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m24s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Backend:
- specialist_agent_routes.py: GET /agents, POST /test/start (run_id),
GET /test/stream/{run_id} (SSE), GET /run/{run_id}/result,
GET /run/{run_id}/artifacts, GET /run/{run_id}/artifact/{path},
DELETE /run/{run_id}, GET /runs.
- Per-URL async orchestrator: text fetch via consent-tester
dsi-discovery → agent.evaluate() → vault.put_json + stream events.
- Tests: 7/7 grün.
Frontend:
- /api/sdk/v1/specialist-agent proxy mit SSE-passthrough.
- AgentTestTab.tsx: Agent-Wähler + 5 URL-Slots + Live-Events +
Speedometer (OK/N-A/HIGH/MEDIUM/LOW) + Findings + Recommendations +
Eskalations-Log + Artefakt-Link pro Slot.
- Neuer Tab "Agent-Test" in /sdk/agent.
User-Wunsch 2026-06-08: pro Agent isoliert testen, 5 URLs gleichzeitig,
Live-Updates statt Polling-Wartespiel.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
f4357a2e9b |
feat(agents): Specialist-Agents Phase 2 Foundation + Cookie-Policy-Agent
Sprint 1 — Foundation (User-Vorgabe 2026-06-08): Foundation: - _base.py: BaseSpecialistAgent ABC + Pydantic Contract (AgentInput/AgentOutput/Finding/Recommendation/McCoverage/EscalationLog). - _base.lint_output(): Disclaimer-Linter verbietet "rechtssicher" / "garantiert" / "gesetzeskonform" — scrubbed inline + Log in notes. - _registry.py: AgentRegistry mit MC-Owner-Mapping (verhindert Doppel-Ownership). - _escalation.py: cascade(local → ovh). qwen2.5:7b default, OVH 120b als Stage-2 (deaktiviert wenn OVH_URL leer). - _rollup.py: deterministisches Dedup ähnlicher actions zu Recommendations mit related_finding_ids[]. - _evidence_vault.py: Pro-Run File-Vault für Playwright-Videos, Screenshots, CSV. SHA256 + manifest.json. DSR-tauglich (delete_run). Agenten: - ImpressumAgent v2 (impressum/agent.py + mcs.py) — konsolidiert v1-Pattern-Match + v2-LLM-MVP unter dem neuen Contract. 12 MCs. - CookiePolicyAgent v1 (cookie_policy/agent.py + mcs.py) — 12 MCs zu Cookie-Richtlinie-Vollständigkeit + KB-Layer für CMP-Vendor-Cross-Check. Tests: 25/25 grün (10 Impressum + 9 Vault + 6 Cookie-Policy). Roadmap: SSE-Test-Endpoint + Frontend-Tab → DSE/AGB-Agents → Cookie-Banner-Themen-Agent → Cross-Doc-Konsistenz-Agent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d208a2bde2 |
feat: Mail-Restrukturierung + B22 Cross-Domain-Doc-Detector
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / python-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
User-Feedback BMW v5: "740 Cookies verschwunden auf 31, Übersicht
verloren". Drei Anpassungen:
Mail-Restrukturierung (_executive_summary.py + _compose.py):
- render_executive_summary(): Top-of-mail TL;DR mit
Compliance-Score (gross + farbig), Top-3-Findings nach
Severity, Cookie-Statistik (deklariert/Browser/Drittland),
Severity-Verteilungs-Chips.
- collapsible(): wrapt jeden Block in <details>/<summary>.
Mailpit + alle modernen Mail-Clients rendern das nativ.
- _compose.py: alle 18+ B-Blöcke + per_doc + per_theme +
legacy_html in Akkordeons. NUR Critical-Findings + Sofort-
massnahmen sind immer offen — Reviewer sieht ~15 Zeilen
Übersicht und klappt selektiv auf.
- Cookie-Inventar (742) hat jetzt eigene Sektion ganz oben
(Akkordeon "🍪 Cookie-Inventar"), Vendor-Karten parallel.
B22 Cross-Domain-Legal-Doc-Detector (cross_domain_doc_check.py):
Real-Beispiel User-Feedback: Elli's AGB liegt auf docs.logpay.de
statt elli.eco. Detektor erkennt SLD-Mismatch:
- HIGH bei agb / widerruf (vertragsrelevant)
- MEDIUM bei dse / nutzungsbedingungen
- INFO bei cookie / impressum (Best-Practice)
Norm: DSGVO Art. 28 (AVV-Pflicht für Hosting) + Art. 13 Abs. 1
lit. e (Empfänger) + § 312i BGB (Cool-URLs).
9/9 Tests grün inkl. Elli/LogPay Pattern.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c908fcd5eb |
feat(b19): Cookie-Coherence — 3-Layer-Lookup + Vendor-Karten + CSV
Adressiert das BMW-Beispiel (740 Cookies, Salesforce als "essential"
mit 1-Jahres-Lifetime, Pseudo-Zwecke wie "Siehe dazugehörige
Datenverarbeitung"). User-Konzept "Regulation als Code".
Step 1 — cookie_library_lookup.py (3 Layer):
1. Override = cookie_knowledge_db.py + extended (74) für
Schrems-II / EUGH / EU-Alternative — BreakPilot-juristische-IP.
2. Truth-Base = compliance.cookie_library (2287 aus Open Cookie
Database, CC0). actual_category als Wahrheit.
3. Auto-Learning = cookie_behavior_audits — Cross-Site-Konsens
wenn ≥3 Sites denselben Cookie melden.
Match: exact > prefix (mit Separator-Check) > wildcard. Kurze
Library-Namen ("c", "ID") brauchen exact-match — verhindert
False-Positive auf "completely_unknown". Trailing-Underscore
in OCD ("guest_uuid_essential_") wird als implicit-wildcard
interpretiert.
Step 2 — cookie_coherence_check.py (B19, 6 Finding-Typen):
- MARKETING_AS_ESSENTIAL (HIGH): KB sagt actual=marketing, Site
deklariert essential/erforderlich → Einwilligung wird umgangen
- LIFETIME_TOO_LONG_FOR_ESSENTIAL (MED): essential + >90d
- PSEUDO_PURPOSE (LOW): "Siehe dazugehörige Datenverarbeitung"
/ <4 Wörter (suppressed wenn Vendor-Purpose substantial ist)
- MISSING_COUNTRY (LOW): vendor_country leer trotz KB-Hit
- UNKNOWN_VENDOR (LOW): nicht in KB → Auto-Learning-Kandidat
- DUPLICATE_VENDOR (MED): selber Vendor in N Kategorien =
Stack-Aufspaltung um Marketing unter "essential" zu schmuggeln
Jedes Finding mit recommended_action ("Cookie X aus 'erforderlich'
raus und in 'Marketing' setzen").
Step 3 — cookie_observation_logger.py:
Loggt nach jedem Audit alle (cookie, site, declared_purpose) in
compliance.cookie_behavior_audits → Basis für Cross-Site-Konsens
in Layer 3.
Step 4 — cookie_csv_exporter.py:
cookies-full-{check_id}.csv mit 21 Spalten (Name, Vendor decl/KB,
Cat decl/KB, Lifetime decl/KB, Country, Opt-Out, 8x FIND_* flags,
recommended_action). UTF-8 BOM für Excel.
ZIP-Attachment: erweitert audit_walk_zip_builder um extra_files=
parameter; phase_e ruft mit cookies-full-...csv auf.
Step 5 — mail_render_v2/_vendor_cards.py:
Statt 740 Cookie-Rows: Aggregation pro Vendor mit Cookie-Count +
Issue-Count + 1-2 Beispiel-Cookies + Issue-Type-Tags. Top 30
Vendoren in der Mail, Rest nur in CSV. Sortiert nach Issue-Score.
Step 6 — render_info_box_rechtsrahmen():
Generic Header-Info-Box mit Art. 13 DSGVO + § 25 TDDDG + Art. 5
+ § 5 UWG + § 30/130 OWiG. Immer angezeigt, kein explicit-
finding-mapping (User-mündigkeit).
Orchestrator + _compose: run_b19 + render_vendor_cards +
render_info_box_rechtsrahmen ins V2-Layout.
Tests: 28/28 grün (15 lookup + 13 coherence).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
0b29d1fada |
fix(cookie-inventory): fuzzy prefix-match + BMW-GT-File
BMW-Mail zeigte 738 deklariert / 31 Browser / **0 OK** — alle
Browser-Cookies landeten als UNDOC, alle deklarierten als ORPH.
Ursache: exact-string-match scheitert bei Suffix-Cookies.
_norm_for_match() + _matches() Helper:
- Strippt Wildcards (`*`, `.*`, `<id>`, `{var}`) + Lower-Case
- Erhält führende Underscores (`__cf_bm`, `_ga` sind meaningful)
- Prefix-Match in BEIDE Richtungen, min 3 Chars (kein "_"-Garbage)
build_cookie_inventory():
- Für jeden Browser-Cookie: längster Prefix-Match in declared wählen
- browser-to-decl Index + decl-match-Index für O(N×M) → O(N+M)
- matched browser-keys werden aus all_keys entfernt → kein
Double-Count (vorher: ORPH + UNDOC parallel)
Realistischer BMW-Match-Test:
declared=[_ga, _gid, __cf_bm, AMP_TOKEN, _fbp, intercom-session,
_pk_id.*, OptanonConsent]
browser= [_ga_K8YL3M9T, _gid_xyz, __cf_bm_actual_hash,
AMP_TOKEN_runtime, _fbp_123, intercom-session-2026,
_pk_id.5.7d8, OptanonConsent]
→ 8 OK (vorher 0)
BMW-GT-File (zeroclaw/docs/ground-truth/bmw_de_2026-06-07.json):
- OneTrust CMP + 14 erwartete Vendoren
- Cookie-Count-Ranges (browser 80-250, deklariert 300-800)
- 7 expected findings inkl. neuem COOKIE-INVENTORY-MATCH-001 als
Benchmark gegen den Fuzzy-Match-Bug
Tests: 14/14 grün (4 _norm_for_match + 5 _matches + 5
build_cookie_inventory inkl. realistic_bmw_pattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e8ff75cbfe |
feat: Backlog 1-5 — soft-hints, chatbot-discovery, API-payload, LLM-Agent
5 Backlog-Items aus dem Multi-Site-Briefing in einem Sprint:
1. B13 B2C-Soft-Hints — Versicherungs/Tarif/Buchungs-Marker
_B2C_WEAK erweitert um "Reiseversicherung", "Tarifrechner",
"Online-Antrag", "Flug buchen", "Stromtarif" etc.
Fängt Allianz-Reise-Chatbot (vorher False-Negative).
2. Chatbot-Policy-Discovery (chatbot_policy_discovery.py)
Probt 14 Standard-Slugs (privacypolicychatbot, chatbot-datenschutz,
ai-policy, ki-datenschutz, ...) × 5 Lang-Prefixe auf jeder
submitted Origin. Successful >300-Wort-Findings werden in
doc_texts['dse'] gemerged. Audit-Trail über
doc_entries[dse].chatbot_policy_sources.
Hebt Westfield-iAdvize-Lücke.
3. API-Response-Payload erweitert
phase_f_persist.response um extra_findings, audit_walk und
html_blocks erweitert. B-Wiring-Output (B1, B3-B18) ist nicht
mehr nur im Mail-HTML versteckt — externe Aufrufer sehen jeden
Finding. Schema additiv, legacy clients ignorieren neue Felder.
4. Plausibility-LLM Empty-Response-Fix
Resilienz-Strategie A→B→C→D:
A) format='json' (strict, default)
B) format='' (loose, _try_extract_json mit ```json-fence + prose-
wrap-Unterstützung)
C) Split-Batch-Recursion (vorhanden)
D) Give up, leeres dict (callers behandeln als skipped)
Plus _post_llm() als isolierter LLM-Call-Helper, catched
Network-Errors.
5. Specialist-Agents Phase 2 LLM (MVP) — Impressum-Agent
impressum_agent_llm.py: qwen3:30b-a3b mit § 5 TMG System-Prompt,
business_scope-hints aus profile_dict. Output identisches Schema
wie pattern-agent für ein Merge ohne API-Bruch.
_b18_wiring.py orchestriert beide Agents + deduplet nach
field_id, rendert lila V2-Block mit KB/LLM-Tags pro Finding.
Pattern-first im Dedup (deterministisch + stable).
Tests: 107/107 grün (7 Test-Suites + chatbot-discovery + b18).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c7d2038ad9 |
feat(b17): DSMS-CID-Anchor für Audit-Walk-Video (Stufe 3, #7)
Video + walk.json werden nach Aufnahme zu DSMS-IPFS hochgeladen.
Die zurückgegebenen CIDs sind manipulationssichere Audit-Anker —
Reviewer können das Walk-Video Monate später noch verifizieren und
auf Unverändertheit prüfen.
consent-tester:
- _upload_to_dsms(): Best-Effort-Upload zu /api/v1/documents
(Bearer-Token, document_type=audit_walk_video|meta). DSMS-Down
bricht den Walk nicht ab — CID fehlt einfach im result.
- record_audit_walk(): nach video.webm + walk.json erzeugt, beide
hochladen. walk.json wird re-written sodass es BEIDE CIDs
selbstreferenziell enthält.
- ENV: DSMS_GATEWAY_URL + DSMS_BEARER konfigurierbar.
backend:
- _b17_wiring._publicize_gateway_url(): DSMS gibt intern
http://dsms-node:8080/ipfs/{cid} zurück. Für die Audit-Mail
wird das via env DSMS_PUBLIC_GATEWAY (default
https://dsms-dev.breakpilot.ai) durch eine extern erreichbare
URL ersetzt.
- Render-Block: gelber DSMS-Anchor-Hinweis mit Video-CID +
walk.json-CID, beide als klickbare Links zur public Gateway.
Real-World-Smoke gegen Elli:
- Video-CID: QmbdFwtSymPuWGYYdC6eNZ1eEvVLsTYmoRRxEo5L6BXgwt
- walk.json-CID: QmWaTqwZq4KVd5wYFVAKB12uZtAosPqoG1X4m1azysXYJi
- DSMS-Upload erfolgreich, gateway_url im response
Tests: 12/12 grün (+2 für DSMS-Anchor-Render-Pfade inkl.
Internal-Host → Public-Gateway-Rewrite).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
80c4778017 |
feat(b17): Akkordeon-Expansion im Audit-Walk (Stufe 2, #7)
Nach jedem Compliance-Doc-Aufruf werden alle Akkordeons /
<details> / [aria-expanded=false] / Trigger-Patterns geklickt
und im Video aufgenommen.
- _expand_accordions(): 7 Selektor-Patterns, max 25 Expansionen
pro Seite, Dedup nach inner_text (verhindert Endlos-Loops bei
nesteten Strukturen). Scroll-into-view + click + 400ms warten
sicher dass das Klick-Result im Video erfasst wird.
- _visit_link(): Returns (nav_event, expand_event) Tuple. Expand
läuft nur bei HTTP 2xx + ohne nav-error.
- 1500ms post-expand wait gibt der Kamera Zeit, den finalen
Zustand mitzuschneiden.
Backend B17 render: "expand_accordions" Action wird als "5
Akkordeon/Details-Sektion(en) entfaltet" gerendert. Bei 0:
"Keine Akkordeons gefunden" (neutraler Hinweis, kein Fehler).
Real-World-Smoke gegen Elli:
Impressum: 0 Akkordeons (keine)
Datenschutzerkl: 5 Akkordeons aufgeklappt
Nutzungsbeding: 0 Akkordeons
Video-Größe verdoppelt sich (581 KB → 1.14 MB) — Reviewer sieht
jetzt den vollen DSE-Vendor-Tabellen-Inhalt im Video.
Tests: 10/10 grün (+2 für Akkordeon-Render-Pfade).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
cb4b352846 |
feat(b17): Playwright Audit-Walk-Video (Stufe 1, #7)
Nimmt einen kompletten Site-Walk als WebKit-Browser-Session
inkl. Video auf. Reviewer kann nachträglich exakt nachvollziehen,
wie die Engine zum Befund kam.
consent-tester:
- services/audit_walk_recorder.py: Playwright record_video_dir,
iPhone-Viewport-free 1280×800. Goto homepage → Banner-Accept
(Best-Effort: 12 Text-Phrasen + 5 CMP-Fallback-Selektoren) →
Footer-Links sammeln (compliance-relevant gefiltert) →
pro Link navigate + Dwell-Time → JSON-Action-Index mit
UTC-Timestamps + SHA-256 vom Video als Manipulation-Schutz.
- routes_audit_walk.py: POST /scan-audit-walk; statische
Serves für /audit-walks/{walk_id}/video.webm + walk.json.
- main.py: Router registriert.
backend:
- _b17_wiring.py: Triggert /scan-audit-walk, speichert
Walk-Metadata in state["audit_walk"]. Render-Block mit
HTML-Tabelle aller Actions (HH:MM:SS + Aktion + Detail) +
Links zu Video und walk.json.
- _orchestrator.py: run_b17 nach run_b16, async-aufgerufen.
- mail_render_v2/_compose.py: audit_walk_html im V2-Layout.
- test_b17_audit_walk.py: 8 Tests (Render-Pfade + Wiring).
Stufe-2 (Akkordeon-Expansion) und Stufe-3 (DSMS-CID-Anchor)
folgen separat.
Real-World-Smoke gegen Elli:
- 581 KB Video, SHA-256 verifizierbar
- 3 Footer-Links besucht (Impressum, Datenschutzerkl., Nutzungs-)
- 6 Actions im JSON-Index
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
529c032641 |
fix(b9+b14): Real-World-Smoke-Befunde aus Elli-Audit (2026-06-07)
Smoke gegen www.elli.eco hat 3 Bugs offengelegt, die in den
synthetischen Tests nicht greifbar waren — Real-Texte haben
Abkürzungen, HTML-Stripping-Artefakte, andere Formulierungen.
B9 Multi-Entity-Impressum — vorher: 13 "Entities" statt 2.
- Block-Boundary jetzt HRB-Anker-basiert (jeder HRB-Eintrag
markiert eine Entity). Robuster als Legal-Form-Anker, der bei
"Programmierung der Webseite Acme GmbH" über-matchte.
- _NAME_BLOCKLIST gegen 11 typische False-Positives
(programmierung, webseite, umsatzsteueridentifik, ...).
- _LEADING_NOISE_RE strippt Email-TLD-Artefakte ("eco "),
deutsche Artikel ("Die "), URL-Fragmente.
- _USTID_PAT fängt jetzt auch die Vollform
("Umsatzsteueridentifikationsnummer der … ist DE…") über eine
zweite Pattern-Alternative mit [\s\S]{0,80}? Bridge.
- Dedup gleicher Entity-Namen — Mehrfacherwähnung in einem Doc
zählt als EINE Entity.
- Fallback auf alten Legal-Form-Anker wenn keine HRBs vorhanden
(z.B. e.V. ohne HR-Pflicht).
B14 Retention-Conflict — Anchor-Liste erweitert:
- "protokolldat" / "protokollierung der zugriffe" /
"zugriffsdat" / "zugriffsprotokoll" als zusätzliche
Logfile-Anchors (Elli's reale DSE-Wortwahl statt "Logfile").
B15 AI-Legal-Basis — kein Code-Fix. Elli's aktuelle DSE enthält
keine LLM-Provider-Erwähnung mehr; der GT-Anker (2026-06-06) ist
seither veraltet. 0 Findings ist korrekt für den aktuellen Stand.
Tests: 3 neue Real-World-Regression-Tests in
test_impressum_multi_entity_check.py::TestRealWorldElliPattern.
Combined: 75/75 grün.
Real-World-Smoke gegen Elli (HTTP→Text via crude strip):
B9: Entities 13→2 ✓, IMPRESSUM-MULTI-UST_ID → VW ✓
B13: 1 Finding (b2c_strong) ✓
B14: 0 (Elli hat aktuell nur EINEN Retention-Wert für Logs)
B15: 0 (LLM nicht erwähnt, korrekt)
B16: 3 Findings (impressum/dse/cookie Standard-Slug-Brüche) ✓
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8e3d05f172 |
test(elli-gt): GT-Coverage-Integration-Test + Sprint-Briefing
- tests/test_elli_gt_coverage.py: 7 Charakterisierungstests die
einen synthetischen Elli-State konstruieren und sicherstellen,
dass die 5 neuen Detektoren (B13-B16 + B9-Cleanup) genau die
erwarteten GT-IDs fangen. Regressionsschutz.
- zeroclaw/docs/audits/2026-06-06-elli-gt-coverage-sprint.md:
Sprint-Zusammenfassung mit GT-Bilanz (12/13 voll, 1/13 wartet
auf #7), Commit-Liste und Morgen-Agenda-Kandidaten.
Combined Sprint-Test-Run: 72/72 grün.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
65e8bb9d42 |
feat(b16): Footer-Label-vs-URL-Slug-Drift-Check (GT URL-STRUCTURE-001)
Erkennt: gängige Footer-Labels / Bookmark- + SEO-Erwartungs-Slugs
(z.B. "Cookie-Richtlinie", "AGB", "Datenschutzerklärung") liefern
404, während das Doc tatsächlich unter einem abweichenden Slug
ausgeliefert wird.
GT-Anker (Elli URL-STRUCTURE-001):
Footer-Label "Cookie-Richtlinie" → /cookie-richtlinie 404
Real: /de/cookies
→ externe Bookmarks und Google-Treffer brechen.
Heuristik:
- Aus auto-discovered URLs Origin + Sprach-Prefix extrahieren
(z.B. /de, /de-de)
- Pro doc_type 2-4 kanonische Standard-Slugs probieren (parallel
via ThreadPoolExecutor, 2s Timeout, HEAD → GET fallback bei 405)
- Wenn alternative Slug 404/410 → LOW Finding pro doc_type
- Probe-Cap auf 18 Requests gesamt (Network-Noise-Schutz)
- Abschaltbar via URL_SLUG_PROBE_DISABLED=1
Severity: LOW (Best-Practice, kein juristisches Hardfail).
Tests: 13/13 grün (Strip-Helper 4 + Origin-Helper 3 + Check-Pfade 6
inkl. mocked _head_status).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b0b7f80914 |
feat(b15): AI-Act Rechtsgrundlage-Check (GT AI-ACT-RISK-001)
Erkennt: LLM/GPAI-System (Vertex AI, OpenAI/GPT, Claude) wird in
DSE oder Cookie-Doc auf Art. 6 Abs. 1 lit. f (berechtigtes Interesse)
gestützt — statt auf lit. a (Einwilligung).
GT-Anker (Elli AI-ACT-RISK-001): Vertex-AI-Chatbot mit lit. f
deklariert. Bei LLM-Prompt/Output-Logging + US-Transfer +
Profiling-Ähnlichkeit ist Interessenabwägung fragwürdig.
Heuristik:
- KB-basiert (chat_providers.json filter: ai_capable + LLM-Type-Hint)
- LLM-Vendor-Aliases inkl. Marken-Familien (PaLM, Gemini, GPT-4,
ChatGPT, Claude 3, Azure OpenAI)
- Absatz-Boundary-Scope: Provider + lit. f im selben Absatz
- Negativ-Filter: wenn lit. a / Einwilligung ebenfalls im Absatz →
kein Finding (Side-Purpose-Erwähnung)
- Dedup pro (doc_type, provider_id)
Severity: MEDIUM.
Norm: DSGVO Art. 6 Abs. 1 lit. a vs lit. f + AI Act Art. 50 + 51.
Tests: 17/17 grün.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
6aad774fc1 |
feat(b14): widersprüchliche Speicherdauer im selben Doc (GT TH-RETENTION-001)
Erkennt: in derselben DSE / Cookie-Richtlinie nennt der Anbieter für
DIESELBE Datenkategorie mehrere unterschiedliche Speicherdauern.
GT-Anker (Elli): Logfiles "7 Tage" + "30 Tage" im selben DSE → eine
Angabe ist falsch oder veraltet.
Heuristik:
- Satz-Boundary-Scope (kein ±N-Zeichen-Fenster) verhindert
Cross-Category-Leakage
- Pro Satz: Kategorie-Anchor + Retention-Werte beide drin
- Tag-Cluster mit ±20 %-Toleranz: "30 Tage" und "1 Monat" =
1 Cluster; "7 Tage" und "30 Tage" = 2 Cluster → Finding
Kategorien (Phase 1):
- logfile, contact_form, application, newsletter, invoice,
session_cookie
Severity: MEDIUM (DSGVO Art. 5 Abs. 1 lit. a + Art. 13 Abs. 2 lit. a).
Tests: 11/11 grün (Cluster-Logik 5, Check-Pfade 6, inkl. Cross-
Category-Leakage-Regression).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
8b9cad88ae |
fix(b9): clean entity names in multi-entity-impressum (GT IMPRESSUM-001)
Der Multi-Entity-Check fängt Elli's USt-IdNr-Lücke (VW Group Charging
GmbH hat keine, Elli Mobility GmbH hat eine), aber Entity-Namen waren
mit Header-Noise verunreinigt:
'Impressum\n\nVolkswagen Group Charging GmbH'
'eco\n\nElli Mobility GmbH'
Behoben:
- _ENTITY_PAT lässt nur Space im Namen zu (kein \s/\n mehr)
- _clean_entity_name() trimmt Header-Worte (Impressum, Anbieter, ...)
und nimmt nur die letzte Zeile vor Legal-Form-Suffix
- 11 neue Tests, davon einer mit Elli-like Impressum als
Charakterisierungs-Test
Damit ist die finale Finding-Ausgabe für Audit-Reports lesbar
('Fehlt bei: Volkswagen Group Charging GmbH') statt verunreinigt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
b9baa8c603 |
feat(b13): Widerrufsbelehrung-Reachability-Check (GT WIDERRUFSBELEHRUNG-001)
Erkennt B2C-Shop ohne öffentlich erreichbare Widerrufsbelehrung.
Schließt eine der offenen GT-Lücken aus dem Elli-Audit.
Signale:
- doc_entries[widerruf]: discovery_attempted=True + Text leer
- kein Footer-Link auf Widerruf/cancellation/rückgabe
- B2C-Scope: Warenkorb/Kasse/Bestellung/MwSt/Wallbox/Tarif (strong)
vs Shop/Produkt/Rechnung (weak, ≥2 = likely)
- B2B-only-Override: "ausschließlich an Unternehmer" etc.
Severity:
- HIGH bei b2c_strong
- MEDIUM bei b2c_likely
- kein Finding bei b2b_only / unknown (False-Positive-Schutz)
Norm: Art. 246a § 1 Abs. 2 Nr. 1 EGBGB i.V.m. § 312d BGB.
Wiring:
- widerrufsbelehrung_reachability_check.py — Check + Scope-Detection
- _b13_wiring.py — Render + state-Anschluss
- _orchestrator.py — run_b13 nach run_b12
- mail_render_v2/_compose.py — widerruf_reach_html-Block
Tests: 13/13 grün (Scope-Detection 5 + Check-Logik 8).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
bcf1bfa038 |
test(template-rules): pytest suite for backend foundation (Phase 1.6)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Failing after 4s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Failing after 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 29s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Adds tests/test_template_rule_routes.py with: - Schema tests (Pydantic validation: condition, clause, version create, submit-for-review change_summary, override create, recommendation request) - Clause evaluator (eq, neq, in, not_in, gte with string buckets, exists, truthy) - Condition evaluator (all/any kinds, empty clauses always pass) - Recommendation profile tests (table-driven): * AI-Startup with 2 employees gets ai_usage_policy but not whistleblower * 1000+ employee corporate gets whistleblower * Always-rules (impressum) apply to anyone * Third-country transfer triggers TIA unless DPF/adequate - Tenant override tests: * Override changes classification (required → optional with override_applied flag) * NULL override disables rule completely Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> |
||
|
|
d0e3621192 |
feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM-Plausibility-Phase
Mail Render V2 (compliance/services/mail_render_v2/) — 11-Modul-Subpackage
das einen einheitlichen Audit-Mail-Output erzeugt mit:
- Header + KPI-Kacheln (Score / Findings / Docs / Vendors)
- TOC + Sprung-Links
- 3-Bucket-Trennung: Kritische Befunde / Manuelle Prüfung / Interne Reminder
- Cookie-Inventar (Name·Vendor·Kategorie·Speicherdauer·Löschfrist·Sitzland·Quelle·Status)
- Sofortmaßnahmen-Aggregator ("Sitzland ergänzen für 11 Cookies")
- 24 Legacy-Wrappers — alle alten build_*_html in V2-Sections
- Scope-Filter: FIN/GOV/MED/INS/EDU/LEG aus Berichten wenn nicht relevant
- Hint/Action-Dedup: keine doppelten Sätze pro Card mehr
Aktiviert via env MAIL_RENDER_V2=true (Default: legacy renderer).
5 neue deterministische Findings als Phase D-2b/B4/B5/B6/B7/B8:
B4 vendor_consistency_check — Cross-Doc-Provider-Widerspruch
(Elli: DSE nennt Vertex AI für Chatbot, /de/cookies nennt Iadvize → HIGH).
6 Service-Types: chatbot/analytics/tag_manager/pixel/cdn/cmp.
B5 ai_act_transparency_check — AI Act Art. 50 Transparenzpflicht
(Elli: Vertex AI vorhanden ohne Pre-Chat-Disclosure → HIGH).
Plus B5-Erweiterung: Rechtsgrundlage Art-6-Abs-1-lit-f bei AI → MED
(Einwilligung empfehlen).
B6 cross_doc_dpo_check — DPO in DSE genannt, nicht im Impressum (LOW).
B7 doc_staleness_check — Datum-Extraktion aus DSE/AGB/Nutzungsbedingungen.
Cap: AGB/NB 3y, DSE 2y. Älter → MEDIUM (Elli NB Stand 2018 → HIGH).
B8 cmp_fingerprint_check — Banner detected, aber CMP-Provider generic
(kein Usercentrics/OneTrust/Cookiebot/etc → MED).
B3-Erweiterung detect_intra_doc_contradictions — Widersprüchliche
Speicherdauer im SELBEN Doc (Elli: Logfile 7d vs 30d → HIGH).
LLM-Plausibility-Phase (Phase D-2b, finding_plausibility_check.py):
- Läuft AFTER MC pipeline, BEFORE D3 render
- Prompt mit Beispiel-IDs + 3-Phase-Mapping: exact-ID / position-fallback /
fuzzy-tail-match
- Stempelt llm_title / llm_severity / llm_recommendation / llm_drop auf
jeden FAIL CheckItem
- V2-Render zeigt "🤖 LLM-Plausibility:" Box pro Finding wenn gestempelt
- KNOWN ISSUE: qwen3:30b-a3b liefert oft empty content auf format='json' +
8000-char-excerpt prompts. Pipeline läuft mit stamped=0 weiter. Task #16.
Coverage gegen Elli Ground Truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json,
13 expected findings via WebFetch-Agent-Crawl):
- 4/4 HIGH-Findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 +
VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001)
- 4/6 MEDIUM ✓
- 2/3 LOW ✓
- Total: 10/13 = 77% (Sprung von 4/13 = 31%)
Restliche 3 Gaps als Task #17: IMPRESSUM-001 (multi-entity USt-IdNr),
TRANSFER-001 (Vendor-Mechanismus DPF/SCC), TH-RETENTION-002 (AI-Retention
pro Datenkategorie).
V2-Mail-Preview in Mailpit: 'v2all@local.test' Subject '[V2 ALL] ELLI'.
Backend healthy, B1+B3+B4+B5+B6+B7+B8 alle live im Orchestrator.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
c2c8783fee |
refactor(agent-check): split routes file (2692→347 LOC) + wire B1/B3/A1 [guardrail-change]
Phase-5 split of agent_compliance_check_routes.py — the 2700-line
monolith was decomposed into 19 modules in compliance/api/agent_check/:
- Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize /
HTML blocks top+mid+bot / email / persist
- Helpers: _constants, _helpers, _fetch, _discovery, _single_check
- Schemas + State + thin _orchestrator
A1 ZIP-Anhang nativ in _phase_e_email: evidence_zip_builder.py bundles
slices + manifest.json + audit_metadata.json (SHA256 per slice +
build_sha + source_url). smtp_sender.py erweitert um attachments-Parameter.
B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py
parses footer anchors, classifies intent (reopen_cmp / info_only /
browser_deflect) + target (same_page_cmp / new_tab / external).
_b1_wiring.py fetches homepage with iPhone-UA + renders Art-7-Abs-3
severity-coloured block.
B3 TH-RETENTION (Cross-Doc Speicherdauer): retention_comparator.py
compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires
with 5% tolerance + severity hierarchy (dsi_under_actual HIGH,
table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW
Safari-ITP-Hint). _b3_wiring.py + Top-10 mismatches table in mail.
Side-effects:
- Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used
audit_quality_findings before declaration, caught by surrounding
except → block never rendered). New _phase_d3_blocks_bot.py runs
audit-quality FIRST.
- agent_compliance_check_routes.py removed from loc-exceptions.txt
("Phase 5 split target" — done).
Tests: 55/55 grün (B1 22 + B3 27 + saving_scan 6).
E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding,
TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip
with 2 slices + manifest + audit_metadata (12089B, SHA256-chained,
source verified), email sent (attachments=1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|