breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	6f16507c5f	feat(banner): P19 + P20 - Per-Category-Click-Test + Frontend-Drilldown	2026-05-19 14:31:13 +02:00
    CI (push): successful: nodejs-build (2m54s), detect-changes (10s), validate-canonical-controls (17s), loc-budget (17s), test-python-backend (43s); skipped: secret-scan, dep-audit, nodejs-lint, test-go, branch-name, guardrail-integrity, sbom-scan, go-lint, python-lint, iace-gt-coverage, test-python-document-crawler, test-python-dsms-gateway

    P19 (consent-tester):
    - Added dp-cookieconsent (TYPO3, Safetykon pattern) as a CMP profile: selectors #dp--cookie-statistics/marketing plus the a.cc-allow save button
    - New signal provider_details_visible: after a category toggle, Playwright checks whether visible provider/cookie detail elements appear in the banner. For dp-cookieconsent (a banner without a listing) this is always False -> HIGH violation "Category shows no provider/cookie details; the user cannot give informed consent (Art. 7(1) GDPR)"
    - main.py serializes provider_details_visible + cookies_set per category

    P20 (Frontend-Drilldown):
    - Backend: the check_payloads table gains a 'banner' column (JSON); the full banner_result is now persisted (previously in-memory only). The ALTER TABLE migration is idempotent.
    - New endpoint GET /api/compliance/agent/banner/<check_id>: returns quality score, phases, category tests, banner checks, and all 46 structured_checks.
    - Frontend: BannerTab in /sdk/agent/audit/<id> with quality cards, a 3-phase cookie table, a per-category listing (with the P19 signal in red/green), banner violations plus legal bases, and a 46-check drill-down filterable by severity.
    - Tab switcher in page.tsx extended with "Cookie-Banner-Analyse".
    - Bonus: 2 old route.ts files migrated to Next.js 15 Promise params (build fix).

    Plus: the Critical-Findings block now uses provider_details_visible as its primary signal instead of only the tracking_services count.

    Smoke test Safetykon: 4 critical findings in the mail; the banner endpoint returns 46 checks + 3 phases + 2 categories with provider_details_visible=False.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
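The idempotent ALTER TABLE migration mentioned under P20 could look roughly like the sketch below. It assumes check_payloads lives in a SQLite database and is keyed by a check_id column; the function names ensure_banner_column and save_banner_result are hypothetical and not taken from the commit.

```python
import json
import sqlite3


def ensure_banner_column(conn: sqlite3.Connection) -> bool:
    """Add the 'banner' column to check_payloads if it is missing.

    Returns True when the column was added, False when it already existed,
    so the migration can safely run on every startup.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(check_payloads)")}
    if "banner" in columns:
        return False
    # SQLite has no native JSON type; the JSON document is stored as TEXT.
    conn.execute("ALTER TABLE check_payloads ADD COLUMN banner TEXT")
    conn.commit()
    return True


def save_banner_result(conn: sqlite3.Connection, check_id: str, banner_result: dict) -> None:
    """Persist the full banner_result as JSON (hypothetical helper)."""
    conn.execute(
        "UPDATE check_payloads SET banner = ? WHERE check_id = ?",
        (json.dumps(banner_result), check_id),
    )
    conn.commit()
```

Checking the column list first avoids relying on the "duplicate column name" error that a second ALTER TABLE would raise.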
Benjamin Admin	575644c9c5	feat(audit): P8 - MC severity removed, email shows hard findings only, MC audit as a checklist	2026-05-19 00:30:04 +02:00
    CI (push): successful: detect-changes (10s), validate-canonical-controls (17s), nodejs-build (2m48s), test-python-backend (40s); failing: loc-budget (after 17s); skipped: branch-name, guardrail-integrity, secret-scan, dep-audit, sbom-scan, go-lint, python-lint, nodejs-lint, test-go, iace-gt-coverage, test-python-document-crawler, test-python-dsms-gateway

    Email hardening (mc_scorecard.top_fails):
    The new _is_hard_finding heuristic filters conditional MCs without negative evidence out of the top findings. If matched_text is empty and the label contains "falls/sofern/wenn/soweit/ggf.", the MC is dropped from the email and appears only in the MC audit as "selbst pruefen" (check yourself). DATA-2066-A05 (free-of-charge deactivation of location data) is the prototypical example.

    MC audit frontend (audit/[checkId]/page.tsx):
    The severity column (CRITICAL/HIGH/MEDIUM/LOW) is removed; the MC audit is a checklist, not a severity threat. Instead:
    - Column "Prioritaet" (priority) with 3 tiers derived from the regulation mapping: law (DSGVO/ePrivacy/TDDDG/...) / authority guideline (EDPB/DSK/EuGH/...) / best practice (ISO/NIST/BSI)
    - Status: fulfilled (✓) / not fulfilled (✗) / check yourself (?) / not applicable (-). rowReviewStatus() derives "check yourself" from an empty matched_text plus a conditional label.
    - Filter reworked to 5 statuses instead of 4
    - Default filter "Nicht erfuellt" (not fulfilled; previously "Nur Fail")

    Bonus: cleaned up the f.payload.risk_label TS cast in FindingsTab (unknown -> string).

    Effect:
    - The email to management shows only real evidence ("DSB fehlt" = data protection officer missing, "Gebuehr fuer Widerruf" = fee for withdrawal)
    - The MC audit is a factual checklist for the compliance officer

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
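The email filter described in this commit (empty matched_text plus a conditional label means the MC is not a hard finding) can be sketched as follows. This is an illustration only: the commit names the helper _is_hard_finding but does not show its signature, so the parameters and the regular expression here are assumptions.

```python
import re

# German conditional markers named in the commit message:
# falls / sofern / wenn / soweit / ggf.
CONDITIONAL_MARKERS = re.compile(r"\b(?:falls|sofern|wenn|soweit)\b|\bggf\.", re.IGNORECASE)


def is_hard_finding(label: str, matched_text: str) -> bool:
    """Return False only for conditional MCs that lack negative evidence.

    A failed MC stays in the email when it either carries evidence
    (non-empty matched_text) or its label is unconditional.
    """
    has_evidence = bool(matched_text and matched_text.strip())
    is_conditional = CONDITIONAL_MARKERS.search(label) is not None
    return has_evidence or not is_conditional
```

Failed MCs for which this returns False would then be listed only in the MC audit with the "selbst pruefen" status.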
Benjamin Admin	6c223c7c9b	feat(compliance-check): exec-summary + voll-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel	2026-05-18 23:48:34 +02:00
    CI (push): successful: detect-changes (10s), validate-canonical-controls (14s), nodejs-build (2m43s), test-python-backend (37s); failing: loc-budget (after 15s); skipped: branch-name, guardrail-integrity, secret-scan, dep-audit, sbom-scan, go-lint, python-lint, nodejs-lint, test-go, iace-gt-coverage, test-python-document-crawler, test-python-dsms-gateway

    P1: Exec summary at the top of the email report (4 KPIs + 2 CTAs, dark gradient)
    P3: no_direct_sales flag for OEM configurator sites; terms and conditions (AGB) and withdrawal (Widerruf) are shown as "NICHT ANWENDBAR" (not applicable, grey) instead of "NICHT GEFUNDEN" (not found, red)
    P5: Full-audit unification: all findings (MC + mandatory disclosures + vendor + redundancy) go into /data/compliance_audits.db.unified_findings; new /api/compliance/agent/findings/<id> endpoint + FindingsTab in the audit UI with filter + CSV export
    P7: Crawl hardening: TDM reservation check (robots.txt / ai.txt / header / meta) before every run with a 24h cache; HeadlessChrome UA (company not yet founded; switch via the BREAKPILOT_BRANDED_UA env); per-domain rate limit of 1 req/s + max 2 concurrent
    P2: Cookie knowledge DB extended additively (35 -> 74 cookies): Adobe, Meta, Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar, FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe, OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat, X/Twitter, YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude, Optimizely, Datadog; wired into cookie_function_classifier, which yields a compliance_risk label (kritisch/hoch/mittel/gering, i.e. critical/high/medium/low) per vendor

    A: k-anonymity helper (benchmark_k_anonymity) in preparation for P6
    B: Cross-tenant domain assertion in the /findings endpoint (expected_domain query param -> 403 on mismatch)
    C: Saving-scan funnel: /api/compliance/agent/saving-scan/start with validation + 24h rate limit per domain + lead persistence in saving_scan_leads + auto-discovery via _run_compliance_check; 6 tests
    D: Risk badge in the email vendor row

    Legal guardrails (Memory feedback_oem_data_legal.md): only our own brief assessments + source pointers, no 1:1 copies of third-party CMP texts. TDM opt-out is respected per § 44b UrhG. NO schema changes; everything lives in the sidecar SQLite.

    Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
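The per-domain crawl limit from P7 (1 req/s, max 2 concurrent) could be enforced along these lines. This is a minimal asyncio sketch, not the crawler's actual code: the class DomainLimiter and its run method are hypothetical, and the commit does not say which concurrency model the crawler uses.

```python
import asyncio
import time
from collections import defaultdict
from typing import Awaitable, Callable, Dict, TypeVar

T = TypeVar("T")


class DomainLimiter:
    """Per-domain limiter: spaced request starts plus a concurrency cap."""

    def __init__(self, min_interval: float = 1.0, max_concurrent: int = 2) -> None:
        self._min_interval = min_interval
        self._max_concurrent = max_concurrent
        self._semaphores: Dict[str, asyncio.Semaphore] = {}
        self._locks: Dict[str, asyncio.Lock] = {}
        # Earliest monotonic time at which the next request may start.
        self._next_start: Dict[str, float] = defaultdict(float)

    async def run(self, domain: str, fetch: Callable[[], Awaitable[T]]) -> T:
        if domain not in self._semaphores:
            self._semaphores[domain] = asyncio.Semaphore(self._max_concurrent)
            self._locks[domain] = asyncio.Lock()
        async with self._semaphores[domain]:      # cap on in-flight requests
            async with self._locks[domain]:       # serialize the pacing decision
                wait = self._next_start[domain] - time.monotonic()
                if wait > 0:
                    await asyncio.sleep(wait)
                self._next_start[domain] = time.monotonic() + self._min_interval
            return await fetch()
```

A caller would wrap each request, for example `await limiter.run("example.org", lambda: fetch_page(url))`, where fetch_page stands in for whatever coroutine performs the HTTP request.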
Benjamin Admin	6ed30dae5b	feat(agent): MC scorecard + audit drill-down + tenant trend (A1-A6)	2026-05-17 13:45:58 +02:00
    Now that all 1874 MCs run per check (Task #30 cap removal), the report was about to drown in noise. This commit adds the full aggregation / persistence / drill-down stack so each MC is actionable, not just counted.

    A1 mc_scorecard.py (new):
      build_scorecard(checks) -> per-regulation PASS/FAIL/SKIP + severity
      top_fails(checks, n) -> N most severe failed MCs
      full_audit_records(...) -> flat rows ready for sidecar SQLite

    A2 Email rendering: agent_doc_check_scorecard.py (new) builds an HTML scorecard table (regulation × passed/failed/HIGH/MEDIUM/score) shown at the top of the email. agent_doc_check_report._render_document now collapses the 500-MC L2 forest into an 'X/Y bestanden (Z Fail)' summary plus a top-10 fails block per doc; the old verbose render is gone.

    A3 compliance_audit_log.py (new): sidecar SQLite at /data/compliance_audits.db (separate from the compliance Postgres schema to comply with the no-new-migrations rule in CLAUDE.md):
      check_runs(check_id, ts, tenant_id, site_name, base_domain, doc_count, scorecard json, vvt_summary json)
      mc_results(check_id, doc_type, mc_id, label, passed, skipped, severity, regulation, matched_text, hint)
    The route persists every run after the email is sent. docker-compose.yml adds the compliance-audit volume + env.

    A4 backfill_mc_regulation_llm.py (new): Qwen-tagged backfill for the 1636 MCs the regex pass couldn't classify. Batches of 25, format=json, output constrained to the canonical regulation list. Run manually:
        docker exec bp-compliance-backend python3 \
          /app/scripts/backfill_mc_regulation_llm.py [--dry-run]

    A5 Admin audit tab: GET /api/compliance/agent/audit/<check_id>, proxied via /api/sdk/v1/agent/audit/<id>. The new page /sdk/agent/audit/[checkId] renders the scorecard + a filterable MC table (status / doc_type / regulation, expandable rows with matched_text + hint). ComplianceCheckTab now shows a 'Voll-Audit oeffnen' link.

    A6 Trend per tenant: GET /api/compliance/agent/audit/tenant/<id> returns recent runs. The email scorecard shows per-regulation delta badges ('(+12%)', '(-3%)') compared with the previous run for the same tenant + base_domain. The lookup is one SQLite query.

    Plumbing:
      rag_document_checker.py: SELECT now includes 'article'; MC results carry 'regulation' + 'article' through to CheckItem.
      agent_doc_check_routes.CheckItem schema gains regulation + article fields (defaults '') so old clients still parse.
      agent_compliance_check_routes: response gains 'check_id' so the frontend can build the audit link.
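To make the A1 aggregation and the A6 delta badges concrete, here is a minimal sketch. It assumes each check is a dict with the regulation, passed and skipped fields listed for mc_results; the score definition (passed share of evaluated MCs, in percent) and the helper delta_badge are assumptions, and the real build_scorecard also tracks severity, which is left out here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


def build_scorecard(checks: Iterable[dict]) -> Dict[str, dict]:
    """Count PASS/FAIL/SKIP per regulation and derive a percentage score."""
    card: Dict[str, dict] = defaultdict(lambda: {"passed": 0, "failed": 0, "skipped": 0})
    for check in checks:
        row = card[check.get("regulation") or "unclassified"]
        if check.get("skipped"):
            row["skipped"] += 1
        elif check.get("passed"):
            row["passed"] += 1
        else:
            row["failed"] += 1
    for row in card.values():
        evaluated = row["passed"] + row["failed"]
        # Skipped MCs do not count against the score (assumption).
        row["score"] = round(100 * row["passed"] / evaluated) if evaluated else None
    return dict(card)


def delta_badge(current: Optional[int], previous: Optional[int]) -> str:
    """Render a trend badge such as '(+12%)' against the previous run."""
    if current is None or previous is None:
        return ""  # no earlier run for this tenant + base_domain
    return f"({current - previous:+d}%)"
```

With this shape, the previous run's scorecard JSON from check_runs is enough to compute every badge without touching mc_results.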

4 Commits