breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	0b29d1fada	fix(cookie-inventory): fuzzy prefix-match + BMW GT file The BMW mail showed 738 declared / 31 browser / 0 OK: all browser cookies ended up as UNDOC, all declared ones as ORPH. Cause: exact string matching fails for suffixed cookies. _norm_for_match() + _matches() helpers: - Strip wildcards (``, `.`, `<id>`, `{var}`) + lowercase - Preserve leading underscores (`__cf_bm`, `_ga` are meaningful) - Prefix match in BOTH directions, min 3 chars (no "_" garbage) build_cookie_inventory(): - For each browser cookie: pick the longest prefix match among the declared cookies - browser-to-decl index + decl-match index for O(N×M) → O(N+M) - Matched browser keys are removed from all_keys → no double count (previously: ORPH + UNDOC in parallel) Realistic BMW match test: declared=[_ga, _gid, __cf_bm, AMP_TOKEN, _fbp, intercom-session, _pk_id.*, OptanonConsent] browser= [_ga_K8YL3M9T, _gid_xyz, __cf_bm_actual_hash, AMP_TOKEN_runtime, _fbp_123, intercom-session-2026, _pk_id.5.7d8, OptanonConsent] → 8 OK (previously 0) BMW GT file (zeroclaw/docs/ground-truth/bmw_de_2026-06-07.json): - OneTrust CMP + 14 expected vendors - Cookie count ranges (browser 80-250, declared 300-800) - 7 expected findings incl. the new COOKIE-INVENTORY-MATCH-001 as a benchmark against the fuzzy-match bug Tests: 14/14 green (4 _norm_for_match + 5 _matches + 5 build_cookie_inventory incl. realistic_bmw_pattern). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 21:29:21 +02:00
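The matching rules in the commit above can be sketched as follows. This is a minimal illustration, not the repository's code: the function names drop the leading underscore, the wildcard handling is an assumption based on the `_pk_id.*` example, and the lookup is the naive O(N×M) loop rather than the indexed variant the commit describes.

```python
import re

MIN_PREFIX = 3  # from the commit message: prefixes shorter than 3 chars are ignored


def norm_for_match(name: str) -> str:
    """Lowercase and strip wildcard/placeholder tokens; leading underscores stay."""
    s = name.strip().lower()
    s = re.sub(r"<[^>]*>|\{[^}]*\}", "", s)  # placeholders such as <id> or {var}
    s = s.replace("*", "")                   # assumed wildcard character
    return s.rstrip(".")                     # dot left over from patterns like "_pk_id.*"


def matches(declared: str, browser: str) -> bool:
    """Prefix match in both directions on the normalized names."""
    a, b = norm_for_match(declared), norm_for_match(browser)
    if len(a) < MIN_PREFIX or len(b) < MIN_PREFIX:
        return False
    return a.startswith(b) or b.startswith(a)


def match_inventory(declared, browser):
    """Map each browser cookie to its longest matching declared cookie."""
    result = {}
    for cookie in browser:
        candidates = [d for d in declared if matches(d, cookie)]
        if candidates:
            result[cookie] = max(candidates, key=lambda d: len(norm_for_match(d)))
    return result
```

Browser cookies missing from the returned mapping would be the UNDOC candidates, and declared cookies that never appear as a value would be the ORPH candidates.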
Benjamin Admin	e8ff75cbfe	feat: Backlog 1-5: soft hints, chatbot discovery, API payload, LLM agent 5 backlog items from the multi-site briefing in one sprint: 1. B13 B2C soft hints: insurance/tariff/booking markers. _B2C_WEAK extended with "Reiseversicherung", "Tarifrechner", "Online-Antrag", "Flug buchen", "Stromtarif" etc. Catches the Allianz travel chatbot (previously a false negative). 2. Chatbot policy discovery (chatbot_policy_discovery.py). Probes 14 standard slugs (privacypolicychatbot, chatbot-datenschutz, ai-policy, ki-datenschutz, ...) × 5 language prefixes on every submitted origin. Successful findings of >300 words are merged into doc_texts['dse']. Audit trail via doc_entries[dse].chatbot_policy_sources. Closes the Westfield iAdvize gap. 3. API response payload extended. phase_f_persist.response extended with extra_findings, audit_walk and html_blocks. B-wiring output (B1, B3-B18) is no longer hidden only in the mail HTML; external callers see every finding. The schema is additive, legacy clients ignore the new fields. 4. Plausibility-LLM empty-response fix. Resilience strategy A→B→C→D: A) format='json' (strict, default) B) format='' (loose, _try_extract_json with ```json-fence + prose-wrap support) C) split-batch recursion (already present) D) give up, empty dict (callers treat it as skipped). Plus _post_llm() as an isolated LLM call helper that catches network errors. 5. Specialist agents phase 2 LLM (MVP): Impressum agent. impressum_agent_llm.py: qwen3:30b-a3b with a § 5 TMG system prompt, business_scope hints from profile_dict. The output uses the same schema as the pattern agent, so results merge without an API break. _b18_wiring.py orchestrates both agents + dedupes by field_id, renders a purple V2 block with KB/LLM tags per finding. Pattern-first in dedup (deterministic + stable). Tests: 107/107 green (7 test suites + chatbot-discovery + b18). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 18:41:54 +02:00
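Step B of the resilience strategy in the commit above (loose parsing of a reply that wraps JSON in a code fence or in prose) might look roughly like this. It is a sketch under assumptions: the real `_try_extract_json` is not shown in the log, so the candidate order and the "return an empty dict on failure" behaviour here simply mirror steps B and D as described.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built at runtime so this listing stays readable
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def try_extract_json(text: str) -> dict:
    """Best-effort extraction of one JSON object from an LLM reply."""
    if not text or not text.strip():
        return {}
    candidates = [text.strip()]                # 1) the reply is already plain JSON
    fenced = _FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())  # 2) JSON inside a code fence
    start, end = text.find("{"), text.rfind("}")
    if 0 <= start < end:
        candidates.append(text[start:end + 1])      # 3) JSON wrapped in prose
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return {}  # step D: give up, callers treat this as "skipped"
```

Returning an empty dict instead of raising keeps the caller's control flow identical for "model said nothing" and "model said something unparseable".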
Benjamin Admin	c7d2038ad9	feat(b17): DSMS CID anchor for the audit-walk video (stage 3, #7) After recording, the video + walk.json are uploaded to DSMS-IPFS. The returned CIDs are tamper-evident audit anchors: reviewers can still verify the walk video months later and check that it is unchanged. consent-tester: - _upload_to_dsms(): best-effort upload to /api/v1/documents (bearer token, document_type=audit_walk_video|meta). DSMS being down does not abort the walk; the CID is simply missing from the result. - record_audit_walk(): once video.webm + walk.json have been created, both are uploaded. walk.json is rewritten so that it contains BOTH CIDs self-referentially. - ENV: DSMS_GATEWAY_URL + DSMS_BEARER configurable. backend: - _b17_wiring._publicize_gateway_url(): DSMS internally returns http://dsms-node:8080/ipfs/{cid}. For the audit mail this is replaced with an externally reachable URL via env DSMS_PUBLIC_GATEWAY (default https://dsms-dev.breakpilot.ai). - Render block: yellow DSMS anchor notice with video CID + walk.json CID, both as clickable links to the public gateway. Real-world smoke against Elli: - Video CID: QmbdFwtSymPuWGYYdC6eNZ1eEvVLsTYmoRRxEo5L6BXgwt - walk.json CID: QmWaTqwZq4KVd5wYFVAKB12uZtAosPqoG1X4m1azysXYJi - DSMS upload successful, gateway_url in the response Tests: 12/12 green (+2 for DSMS anchor render paths incl. internal-host → public-gateway rewrite). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 17:32:34 +02:00
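The internal-host to public-gateway rewrite described in the commit above could be sketched like this. The host name `dsms-node`, the env variable and its default come from the commit message; the function signature and the choice to leave non-internal URLs untouched are assumptions.

```python
import os
from typing import Optional
from urllib.parse import urlsplit

INTERNAL_HOST = "dsms-node"  # internal host name quoted in the commit message
DEFAULT_PUBLIC_GATEWAY = "https://dsms-dev.breakpilot.ai"


def publicize_gateway_url(url: str, public_gateway: Optional[str] = None) -> str:
    """Swap the internal DSMS host for the public gateway, keeping the /ipfs/<cid> path."""
    base = public_gateway or os.environ.get("DSMS_PUBLIC_GATEWAY", DEFAULT_PUBLIC_GATEWAY)
    parts = urlsplit(url)
    if parts.hostname != INTERNAL_HOST:
        return url  # already external (or unknown): leave untouched
    return base.rstrip("/") + parts.path
```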
Benjamin Admin	80c4778017	feat(b17): accordion expansion in the audit walk (stage 2, #7) After each compliance doc is opened, all accordions / <details> / [aria-expanded=false] / trigger patterns are clicked and captured in the video. - _expand_accordions(): 7 selector patterns, max 25 expansions per page, dedup by inner_text (prevents endless loops in nested structures). Scroll-into-view + click + a 400ms wait ensure that the click result is captured in the video. - _visit_link(): returns a (nav_event, expand_event) tuple. Expansion only runs on HTTP 2xx and without a nav error. - A 1500ms post-expand wait gives the camera time to record the final state. Backend B17 render: the "expand_accordions" action is rendered as "5 Akkordeon/Details-Sektion(en) entfaltet". At 0: "Keine Akkordeons gefunden" (neutral notice, not an error). Real-world smoke against Elli: Impressum: 0 accordions (none) Datenschutzerkl: 5 accordions expanded Nutzungsbeding: 0 accordions The video size doubles (581 KB → 1.14 MB); the reviewer now sees the full DSE vendor-table content in the video. Tests: 10/10 green (+2 for accordion render paths). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 17:23:55 +02:00
Benjamin Admin	cb4b352846	feat(b17): Playwright audit-walk video (stage 1, #7) Records a complete site walk as a WebKit browser session incl. video. The reviewer can later retrace exactly how the engine arrived at a finding. consent-tester: - services/audit_walk_recorder.py: Playwright record_video_dir, 1280×800 viewport (not the iPhone viewport). Goto homepage → banner accept (best effort: 12 text phrases + 5 CMP fallback selectors) → collect footer links (filtered for compliance relevance) → per link navigate + dwell time → JSON action index with UTC timestamps + SHA-256 of the video as tamper protection. - routes_audit_walk.py: POST /scan-audit-walk; static serving for /audit-walks/{walk_id}/video.webm + walk.json. - main.py: router registered. backend: - _b17_wiring.py: triggers /scan-audit-walk, stores walk metadata in state["audit_walk"]. Render block with an HTML table of all actions (HH:MM:SS + action + detail) + links to the video and walk.json. - _orchestrator.py: run_b17 after run_b16, called async. - mail_render_v2/_compose.py: audit_walk_html in the V2 layout. - test_b17_audit_walk.py: 8 tests (render paths + wiring). Stage 2 (accordion expansion) and stage 3 (DSMS CID anchor) follow separately. Real-world smoke against Elli: - 581 KB video, SHA-256 verifiable - 3 footer links visited (Impressum, Datenschutzerkl., Nutzungs-) - 6 actions in the JSON index Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 17:20:13 +02:00
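A reviewer who wants to check the SHA-256 recorded for a walk video could recompute it along these lines. The helper name and the chunk size are illustrative; only the use of SHA-256 over the video file is taken from the commit above.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so that large videos need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result with the hash stored in the walk's JSON action index would show whether the video was altered after recording.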
Benjamin Admin	529c032641	fix(b9+b14): real-world smoke findings from the Elli audit (2026-06-07) A smoke run against www.elli.eco exposed 3 bugs that the synthetic tests could not surface: real texts contain abbreviations, HTML-stripping artifacts and different wording. B9 multi-entity Impressum, previously 13 "entities" instead of 2. - The block boundary is now HRB-anchor-based (each HRB entry marks one entity). More robust than the legal-form anchor, which over-matched on "Programmierung der Webseite Acme GmbH". - _NAME_BLOCKLIST against 11 typical false positives (programmierung, webseite, umsatzsteueridentifik, ...). - _LEADING_NOISE_RE strips email-TLD artifacts ("eco "), German articles ("Die "), URL fragments. - _USTID_PAT now also catches the long form ("Umsatzsteueridentifikationsnummer der … ist DE…") via a second pattern alternative with a [\s\S]{0,80}? bridge. - Dedup of identical entity names: multiple mentions in one doc count as ONE entity. - Fallback to the old legal-form anchor when no HRBs are present (e.g. an e.V. without a commercial-register obligation). B14 retention conflict, anchor list extended: - "protokolldat" / "protokollierung der zugriffe" / "zugriffsdat" / "zugriffsprotokoll" as additional logfile anchors (Elli's actual DSE wording instead of "Logfile"). B15 AI legal basis: no code fix. Elli's current DSE no longer mentions any LLM provider; the GT anchor (2026-06-06) has been outdated since then. 0 findings is correct for the current state. Tests: 3 new real-world regression tests in test_impressum_multi_entity_check.py::TestRealWorldElliPattern. Combined: 75/75 green. Real-world smoke against Elli (HTTP→text via crude strip): B9: entities 13→2 ✓, IMPRESSUM-MULTI-UST_ID → VW ✓ B13: 1 finding (b2c_strong) ✓ B14: 0 (Elli currently has only ONE retention value for logs) B15: 0 (LLM not mentioned, correct) B16: 3 findings (impressum/dse/cookie standard-slug breaks) ✓ Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 08:50:46 +02:00
Benjamin Admin	8e3d05f172	test(elli-gt): GT coverage integration test + sprint briefing - tests/test_elli_gt_coverage.py: 7 characterization tests that construct a synthetic Elli state and ensure that the 5 new detectors (B13-B16 + B9 cleanup) catch exactly the expected GT IDs. Regression protection. - zeroclaw/docs/audits/2026-06-06-elli-gt-coverage-sprint.md: sprint summary with the GT tally (12/13 complete, 1/13 waiting on #7), commit list and candidates for tomorrow's agenda. Combined sprint test run: 72/72 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:28:29 +02:00
Benjamin Admin	65e8bb9d42	feat(b16): footer-label vs URL-slug drift check (GT URL-STRUCTURE-001) Detects: slugs that common footer labels, bookmarks and SEO would lead users to expect (e.g. "Cookie-Richtlinie", "AGB", "Datenschutzerklärung") return 404, while the doc is actually served under a different slug. GT anchor (Elli URL-STRUCTURE-001): footer label "Cookie-Richtlinie" → /cookie-richtlinie 404 Real: /de/cookies → external bookmarks and Google hits break. Heuristic: - Extract origin + language prefix from auto-discovered URLs (e.g. /de, /de-de) - Per doc_type, probe 2-4 canonical standard slugs (in parallel via ThreadPoolExecutor, 2s timeout, HEAD → GET fallback on 405) - If the alternative slug returns 404/410 → LOW finding per doc_type - Probe cap of 18 requests in total (network-noise protection) - Can be disabled via URL_SLUG_PROBE_DISABLED=1 Severity: LOW (best practice, not a legal hard fail). Tests: 13/13 green (strip helper 4 + origin helper 3 + check paths 6 incl. mocked _head_status). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:23:25 +02:00
Benjamin Admin	b0b7f80914	feat(b15): AI Act legal-basis check (GT AI-ACT-RISK-001) Detects: an LLM/GPAI system (Vertex AI, OpenAI/GPT, Claude) is based, in the DSE or cookie doc, on Art. 6 Abs. 1 lit. f (legitimate interest) instead of lit. a (consent). GT anchor (Elli AI-ACT-RISK-001): Vertex AI chatbot declared with lit. f. With LLM prompt/output logging + US transfer + profiling-like processing, the balancing of interests is questionable. Heuristic: - KB-based (chat_providers.json filter: ai_capable + LLM type hint) - LLM vendor aliases incl. brand families (PaLM, Gemini, GPT-4, ChatGPT, Claude 3, Azure OpenAI) - Paragraph-boundary scope: provider + lit. f in the same paragraph - Negative filter: if lit. a / consent also appears in the paragraph → no finding (side-purpose mention) - Dedup per (doc_type, provider_id) Severity: MEDIUM. Norm: DSGVO Art. 6 Abs. 1 lit. a vs lit. f + AI Act Art. 50 + 51. Tests: 17/17 green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:15:08 +02:00
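The paragraph-scoped heuristic from the commit above might be sketched as follows. The alias tuple is a hard-coded stand-in for the KB lookup in chat_providers.json, and both regexes are illustrative guesses at how "lit. f" and "lit. a / consent" could be recognised in German policy text.

```python
import re

# Hypothetical stand-in for the KB-derived alias list
LLM_ALIASES = ("vertex ai", "openai", "chatgpt", "gpt-4", "gemini", "claude", "azure openai")

_LIT_F = re.compile(r"lit\.\s*f\b|berechtigte[sn]?\s+interesse", re.IGNORECASE)
_LIT_A = re.compile(r"lit\.\s*a\b|einwilligung", re.IGNORECASE)


def llm_on_legitimate_interest(doc_text: str) -> list:
    """Return LLM aliases that share a paragraph with lit. f but not with lit. a."""
    hits = set()
    for paragraph in re.split(r"\n\s*\n", doc_text):
        lowered = paragraph.lower()
        providers = [alias for alias in LLM_ALIASES if alias in lowered]
        if not providers:
            continue
        # Negative filter: consent mentioned in the same paragraph means no finding
        if _LIT_F.search(paragraph) and not _LIT_A.search(paragraph):
            hits.update(providers)
    return sorted(hits)  # set + sort gives one entry per provider (dedup)
```

Scoping to the paragraph rather than to a character window keeps a "lit. f" that belongs to an unrelated processing purpose from leaking into the LLM finding.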
Benjamin Admin	6aad774fc1	feat(b14): contradictory retention periods in the same doc (GT TH-RETENTION-001) Detects: within the same DSE / cookie policy, the provider states several different retention periods for THE SAME data category. GT anchor (Elli): logfiles "7 Tage" + "30 Tage" in the same DSE → one of the statements is wrong or outdated. Heuristic: - Sentence-boundary scope (no ±N-character window) prevents cross-category leakage - Per sentence: category anchor + retention values both present - Day clusters with ±20 % tolerance: "30 Tage" and "1 Monat" = 1 cluster; "7 Tage" and "30 Tage" = 2 clusters → finding Categories (phase 1): - logfile, contact_form, application, newsletter, invoice, session_cookie Severity: MEDIUM (DSGVO Art. 5 Abs. 1 lit. a + Art. 13 Abs. 2 lit. a). Tests: 11/11 green (cluster logic 5, check paths 6, incl. cross-category-leakage regression). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:12:00 +02:00
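The day-cluster idea from the commit above can be illustrated with a short sketch. The unit table (1 month = 30 days, 1 year = 365 days) and the rule "a value joins a cluster if it is within 20 % of the cluster's smallest member" are assumptions; the commit only states the ±20 % tolerance and the two example outcomes.

```python
import re

# Assumed conversion of German duration units to days
UNIT_DAYS = {
    "tag": 1, "tage": 1, "tagen": 1,
    "woche": 7, "wochen": 7,
    "monat": 30, "monate": 30, "monaten": 30,
    "jahr": 365, "jahre": 365, "jahren": 365,
}
_RETENTION_RE = re.compile(
    r"(\d+)\s*(tagen|tage|tag|wochen|woche|monaten|monate|monat|jahren|jahre|jahr)\b",
    re.IGNORECASE,
)


def retention_days(sentence: str) -> list:
    """Extract all retention values of one sentence, normalized to days."""
    return [int(n) * UNIT_DAYS[unit.lower()] for n, unit in _RETENTION_RE.findall(sentence)]


def cluster(values, tolerance: float = 0.20) -> list:
    """Group sorted values; a value joins the current cluster if within tolerance."""
    clusters = []
    for value in sorted(values):
        if clusters and value <= clusters[-1][0] * (1 + tolerance):
            clusters[-1].append(value)
        else:
            clusters.append([value])
    return clusters


def has_conflict(sentence: str) -> bool:
    """More than one cluster in a single sentence signals contradictory durations."""
    return len(cluster(retention_days(sentence))) > 1
```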
Benjamin Admin	8b9cad88ae	fix(b9): clean entity names in multi-entity-impressum (GT IMPRESSUM-001) The multi-entity check catches Elli's USt-IdNr gap (VW Group Charging GmbH has none, Elli Mobility GmbH has one), but entity names were polluted with header noise: 'Impressum\n\nVolkswagen Group Charging GmbH' 'eco\n\nElli Mobility GmbH' Fixed: - _ENTITY_PAT only allows spaces in the name (no more \s/\n) - _clean_entity_name() trims header words (Impressum, Anbieter, ...) and takes only the last line before the legal-form suffix - 11 new tests, one of them a characterization test with an Elli-like Impressum. The final finding output for audit reports is now readable ('Fehlt bei: Volkswagen Group Charging GmbH') instead of polluted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:08:18 +02:00
Benjamin Admin	b9baa8c603	feat(b13): Widerrufsbelehrung reachability check (GT WIDERRUFSBELEHRUNG-001) Detects a B2C shop without a publicly reachable Widerrufsbelehrung (withdrawal notice). Closes one of the open GT gaps from the Elli audit. Signals: - doc_entries[widerruf]: discovery_attempted=True + empty text - no footer link to Widerruf/cancellation/rückgabe - B2C scope: Warenkorb/Kasse/Bestellung/MwSt/Wallbox/Tarif (strong) vs Shop/Produkt/Rechnung (weak, ≥2 = likely) - B2B-only override: "ausschließlich an Unternehmer" etc. Severity: - HIGH for b2c_strong - MEDIUM for b2c_likely - no finding for b2b_only / unknown (false-positive protection) Norm: Art. 246a § 1 Abs. 2 Nr. 1 EGBGB i.V.m. § 312d BGB. Wiring: - widerrufsbelehrung_reachability_check.py: check + scope detection - _b13_wiring.py: render + state hookup - _orchestrator.py: run_b13 after run_b12 - mail_render_v2/_compose.py: widerruf_reach_html block Tests: 13/13 green (scope detection 5 + check logic 8). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-07 00:04:41 +02:00
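The scope detection and severity mapping listed in the commit above could be sketched as below. The marker tuples are taken from the commit message; the plain substring matching, the function names and the shape of the returned finding are assumptions made for illustration.

```python
from typing import Optional

STRONG_MARKERS = ("warenkorb", "kasse", "bestellung", "mwst", "wallbox", "tarif")
WEAK_MARKERS = ("shop", "produkt", "rechnung")
B2B_ONLY_PHRASES = ("ausschließlich an unternehmer",)
SEVERITY = {"b2c_strong": "HIGH", "b2c_likely": "MEDIUM"}


def detect_scope(page_text: str) -> str:
    """Classify a site as b2b_only, b2c_strong, b2c_likely or unknown."""
    lowered = page_text.lower()
    if any(phrase in lowered for phrase in B2B_ONLY_PHRASES):
        return "b2b_only"  # the override wins over any B2C marker
    if any(marker in lowered for marker in STRONG_MARKERS):
        return "b2c_strong"
    if sum(marker in lowered for marker in WEAK_MARKERS) >= 2:
        return "b2c_likely"
    return "unknown"


def widerruf_finding(page_text: str, widerruf_text: str, has_footer_link: bool) -> Optional[dict]:
    """Emit a finding only when the notice is unreachable and the site looks B2C."""
    if widerruf_text.strip() or has_footer_link:
        return None
    scope = detect_scope(page_text)
    severity = SEVERITY.get(scope)
    if severity is None:
        return None  # b2b_only / unknown: stay silent to avoid false positives
    return {"id": "WIDERRUFSBELEHRUNG-001", "scope": scope, "severity": severity}
```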
Benjamin Admin	bcf1bfa038	test(template-rules): pytest suite for backend foundation (Phase 1.6) Adds tests/test_template_rule_routes.py with: - Schema tests (Pydantic validation: condition, clause, version create, submit-for-review change_summary, override create, recommendation request) - Clause evaluator (eq, neq, in, not_in, gte with string buckets, exists, truthy) - Condition evaluator (all/any kinds, empty clauses always pass) - Recommendation profile tests (table-driven): * AI-Startup with 2 employees gets ai_usage_policy but not whistleblower * 1000+ employee corporate gets whistleblower * Always-rules (impressum) apply to anyone * Third-country transfer triggers TIA unless DPF/adequate - Tenant override tests: * Override changes classification (required → optional with override_applied flag) * NULL override disables rule completely Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 23:19:22 +02:00
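A clause and condition evaluator with the operators named in the commit above might look like this. The dictionary layout (`field`, `op`, `value`, `kind`, `clauses`) and the bucket ordering are hypothetical; the commit only names the operators, the all/any kinds and the rule that empty clauses always pass.

```python
from typing import Any, Dict

# Hypothetical ordering of employee-count buckets for "gte with string buckets"
BUCKETS = ["1-9", "10-49", "50-249", "250-999", "1000+"]


def eval_clause(clause: Dict[str, Any], profile: Dict[str, Any]) -> bool:
    """Evaluate one clause of the form {"field", "op", "value"} against a profile."""
    op, expected = clause["op"], clause.get("value")
    actual = profile.get(clause["field"])
    if op == "exists":
        return actual is not None
    if op == "truthy":
        return bool(actual)
    if op == "eq":
        return actual == expected
    if op == "neq":
        return actual != expected
    if op == "in":
        return actual in expected
    if op == "not_in":
        return actual not in expected
    if op == "gte":
        if isinstance(actual, str) and isinstance(expected, str):
            # String buckets compare by their position in the ordered list
            if actual not in BUCKETS or expected not in BUCKETS:
                return False
            return BUCKETS.index(actual) >= BUCKETS.index(expected)
        return actual is not None and actual >= expected
    raise ValueError(f"unknown operator: {op}")


def eval_condition(condition: Dict[str, Any], profile: Dict[str, Any]) -> bool:
    """Combine clauses with kind 'all' or 'any'; empty clauses always pass."""
    clauses = condition.get("clauses", [])
    if not clauses:
        return True
    results = [eval_clause(clause, profile) for clause in clauses]
    return all(results) if condition.get("kind", "all") == "all" else any(results)
```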
Benjamin Admin	d0e3621192	feat(audit): V2 mail render + 5 new findings (B4/B5/B6/B7/B8) + LLM plausibility phase Mail Render V2 (compliance/services/mail_render_v2/): 11-module subpackage that produces a unified audit-mail output with: - Header + KPI tiles (Score / Findings / Docs / Vendors) - TOC + jump links - 3-bucket separation: critical findings / manual review / internal reminders - Cookie inventory (name·vendor·category·retention period·deletion deadline·country of domicile·source·status) - Immediate-action aggregator ("Sitzland ergänzen für 11 Cookies") - 24 legacy wrappers: all old build_*_html in V2 sections - Scope filter: FIN/GOV/MED/INS/EDU/LEG removed from reports when not relevant - Hint/action dedup: no more duplicate sentences per card Enabled via env MAIL_RENDER_V2=true (default: legacy renderer). 5 new deterministic findings as Phase D-2b/B4/B5/B6/B7/B8: B4 vendor_consistency_check: cross-doc provider contradiction (Elli: the DSE names Vertex AI for the chatbot, /de/cookies names Iadvize → HIGH). 6 service types: chatbot/analytics/tag_manager/pixel/cdn/cmp. B5 ai_act_transparency_check: AI Act Art. 50 transparency obligation (Elli: Vertex AI present without pre-chat disclosure → HIGH). Plus B5 extension: legal basis Art-6-Abs-1-lit-f for AI → MED (recommend consent). B6 cross_doc_dpo_check: DPO named in the DSE, not in the Impressum (LOW). B7 doc_staleness_check: date extraction from DSE/AGB/Nutzungsbedingungen. Cap: AGB/NB 3y, DSE 2y. Older → MEDIUM (Elli NB dated 2018 → HIGH). B8 cmp_fingerprint_check: banner detected, but the CMP provider is generic (no Usercentrics/OneTrust/Cookiebot/etc → MED). B3 extension detect_intra_doc_contradictions: contradictory retention periods in the SAME doc (Elli: logfile 7d vs 30d → HIGH). 
LLM plausibility phase (Phase D-2b, finding_plausibility_check.py): - Runs AFTER the MC pipeline, BEFORE the D3 render - Prompt with example IDs + 3-phase mapping: exact-ID / position-fallback / fuzzy-tail-match - Stamps llm_title / llm_severity / llm_recommendation / llm_drop onto every FAIL CheckItem - The V2 render shows a "🤖 LLM-Plausibility:" box per finding when stamped - KNOWN ISSUE: qwen3:30b-a3b often returns empty content for format='json' + 8000-char-excerpt prompts. The pipeline continues with stamped=0. Task #16. Coverage against the Elli ground truth (zeroclaw/docs/ground-truth/elli_eco_2026-06-06.json, 13 expected findings via WebFetch agent crawl): - 4/4 HIGH findings ✓ (COOKIE-CONSENT-UX-001 + WIDERRUFSBELEHRUNG-001 + VENDOR-CONSISTENCY-001 + AI-ACT-TRANSPARENCY-001) - 4/6 MEDIUM ✓ - 2/3 LOW ✓ - Total: 10/13 = 77% (up from 4/13 = 31%) Remaining 3 gaps as Task #17: IMPRESSUM-001 (multi-entity USt-IdNr), TRANSFER-001 (vendor mechanism DPF/SCC), TH-RETENTION-002 (AI retention per data category). V2 mail preview in Mailpit: 'v2all@local.test' subject '[V2 ALL] ELLI'. Backend healthy, B1+B3+B4+B5+B6+B7+B8 all live in the orchestrator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 21:19:49 +02:00
Benjamin Admin	c2c8783fee	refactor(agent-check): split routes file (2692→347 LOC) + wire B1/B3/A1 [guardrail-change] Phase-5 split of agent_compliance_check_routes.py: the 2700-line monolith was decomposed into 19 modules in compliance/api/agent_check/: - Phase A-F: resolve / profile+check / banner+TCF / vendors raw+finalize / HTML blocks top+mid+bot / email / persist - Helpers: _constants, _helpers, _fetch, _discovery, _single_check - Schemas + State + thin _orchestrator A1 native ZIP attachment in _phase_e_email: evidence_zip_builder.py bundles slices + manifest.json + audit_metadata.json (SHA256 per slice + build_sha + source_url). smtp_sender.py extended with an attachments parameter. B1 COOKIE-CONSENT-UX-001 (Mobile Reachability): consent_reachability_check.py parses footer anchors, classifies intent (reopen_cmp / info_only / browser_deflect) + target (same_page_cmp / new_tab / external). _b1_wiring.py fetches homepage with iPhone-UA + renders Art-7-Abs-3 severity-coloured block. B3 TH-RETENTION (cross-doc retention period): retention_comparator.py compares DSI claim ↔ cookie-table duration ↔ actual Max-Age/expires with 5% tolerance + severity hierarchy (dsi_under_actual HIGH, table_under_actual HIGH, dsi_vs_table MEDIUM, actual_under_table LOW Safari-ITP-Hint). _b3_wiring.py + Top-10 mismatches table in mail. Side-effects: - Fixed silent UnboundLocalError in original Step 5 (gf_one_pager used audit_quality_findings before declaration, caught by surrounding except → block never rendered). New _phase_d3_blocks_bot.py runs audit-quality FIRST. - agent_compliance_check_routes.py removed from loc-exceptions.txt ("Phase 5 split target", done). Tests: 55/55 green (B1 22 + B3 27 + saving_scan 6). E2E: smoke against Elli DSE+Cookie produced HIGH/missing B1 finding, TH-RETENTION table (17 cookies / 3 ✓ / 3 ✗ / 11 ?), evidence-zip with 2 slices + manifest + audit_metadata (12089B, SHA256-chained, source verified), email sent (attachments=1). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-06-06 14:47:25 +02:00
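One way to read the B3 severity hierarchy from the commit above is sketched below. The mismatch kinds, severities and the 5 % tolerance come from the commit message; how each kind is computed (for example "under" meaning "more than 5 % shorter") is an interpretation, not the behaviour of retention_comparator.py.

```python
from typing import Optional, Tuple

TOLERANCE = 0.05  # 5 % tolerance, as stated in the commit message


def _under(claimed: Optional[float], reference: Optional[float]) -> bool:
    """True if the claimed duration is more than 5 % shorter than the reference."""
    return claimed is not None and reference is not None and claimed < reference * (1 - TOLERANCE)


def _differs(a: Optional[float], b: Optional[float]) -> bool:
    """True if two durations differ by more than 5 % of the larger one."""
    return a is not None and b is not None and abs(a - b) > TOLERANCE * max(a, b)


def classify_retention(dsi_days, table_days, actual_days) -> Optional[Tuple[str, str]]:
    """Return (mismatch_kind, severity) for the most severe mismatch, or None."""
    if _under(dsi_days, actual_days):
        return ("dsi_under_actual", "HIGH")
    if _under(table_days, actual_days):
        return ("table_under_actual", "HIGH")
    if _differs(dsi_days, table_days):
        return ("dsi_vs_table", "MEDIUM")
    if _under(actual_days, table_days):
        return ("actual_under_table", "LOW")
    return None
```

Checking the HIGH cases first means a cookie that outlives its documented lifetime is never downgraded by a milder mismatch found on the same row.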
Benjamin Admin	6baf44ac84	fix(mc-audit): TOM/AVV case mismatch + word distance in the Ausnahmen pattern - _PROCESS_INTERNAL_PATTERNS: the patterns were checked against the lowercased blob but written case-sensitively (TOM/AVV/SCC), so they never matched. Normalized to lowercase. - "Ausnahmen ... dokumentieren": the pattern was too narrow and required direct adjacency. It now allows a word distance of up to 60 characters. - Test suite with 22 curated DSGVO/AI-Act/eCall MC labels. All green (previously 2/22 FAIL, both explicitly named as examples by the user: TOM, Ausnahmen). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 11:51:03 +02:00
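Both fixes from the commit above can be shown in a few lines. The two regexes are illustrative reconstructions: the real _PROCESS_INTERNAL_PATTERNS list is not part of the log, and the word boundaries are an added assumption to keep "tom" from matching inside longer words.

```python
import re

# Patterns are written in lowercase because they run against a lowercased blob.
PROCESS_INTERNAL_PATTERNS = [
    re.compile(r"\b(?:tom|avv|scc)\b"),
    # Up to 60 characters may sit between the two words (previously: adjacency)
    re.compile(r"\bausnahmen\b.{0,60}?\bdokumentier"),
]


def is_process_internal(label: str) -> bool:
    """True if a control label describes an internal process, not website content."""
    blob = label.lower()
    return any(pattern.search(blob) for pattern in PROCESS_INTERNAL_PATTERNS)
```

With the uppercase spelling `TOM` in the pattern and a lowercased blob, the first pattern could never match, which is the bug the commit describes.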
Benjamin Admin	8cbb513e2c	feat(audit): Phase 1 quick wins (P81 + P85 + P70 + P83) + TCF DELETE/INSERT fix P81, tests/fixtures/golden_truth/vw_de.json: GT fixture with must_find_cookies (47 VW cookies) + expected_vendors (Google, Adobe, Trade Desk, ...). Basis for future regression tests. P85, banner_screenshot_block.py + consent_scanner.py + main.py: on banner detection, consent-tester takes a base64 PNG screenshot (< 1.5MB). The backend renders it as <img src="data:..."> directly after the GF 1-pager. Visual evidence of 'this is what the banner looked like' for disputes with marketing/DPO. P70, rag_provenance.py: classify_finding_provenance() classifies a finding as 'rag' (norm + source), 'mixed' (norm without source) or 'heuristic' (own interpretation). provenance_badge_html() renders small badges (✓ RAG / NORM / ⚠ HEURISTIK). The module is generic and can be hooked into any finding renderer. P83, scripts/check-rebuild-needed.sh: checks whether the BUILD_SHA deployed in the container matches local HEAD. On mismatch, exit 1 with a 'REBUILD REQUIRED' notice. 
Prevents the 'old code in the container' problem that has caught us several times (frontend tabs visible, backend without the new service). TCF fix, tcf_vendor_authority.py: cookie_library has no UNIQUE index on cookie_name → ON CONFLICT was impossible. Solution: before the insert, DELETE WHERE source_name='iab_tcf_v2'. Idempotent. + per-vendor commit so that one failure does not block the following ones. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-22 08:24:46 +02:00
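The delete-then-insert pattern from the TCF fix above can be sketched with the standard-library sqlite3 module. Using SQLite, the `vendor` column and the function name are assumptions made to keep the example self-contained; only cookie_library, cookie_name, source_name and the value 'iab_tcf_v2' appear in the commit.

```python
import sqlite3

SOURCE = "iab_tcf_v2"


def sync_tcf_cookies(conn: sqlite3.Connection, vendors: dict) -> list:
    """Replace all rows of one source; returns the vendors whose insert failed."""
    # Without a UNIQUE index there is no conflict target for an upsert,
    # so rows from a previous run are removed first (makes reruns idempotent).
    conn.execute("DELETE FROM cookie_library WHERE source_name = ?", (SOURCE,))
    conn.commit()
    failed = []
    for vendor, cookie_names in vendors.items():
        try:
            with conn:  # one transaction per vendor: commit on success, roll back on error
                conn.executemany(
                    "INSERT INTO cookie_library (cookie_name, vendor, source_name) "
                    "VALUES (?, ?, ?)",
                    [(name, vendor, SOURCE) for name in cookie_names],
                )
        except sqlite3.Error:
            failed.append(vendor)  # a bad vendor does not block the following ones
    return failed
```

Rows that belong to other sources are untouched by the delete, so manually curated entries survive every sync.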
Benjamin Admin	081e4f057a	feat(audit): cookie compliance audit (3-source comparison) + vendor dedup + block parser CENTRAL USP: cookie_compliance_audit.py compares 3 sources * DECLARED in the cookie policy (parse_cookie_table + parse_flat) * ACTUALLY loaded in the browser (banner_result.phases.after_accept) * LIBRARY metadata (cookie_library lookup) Returns 3 lists with a compliance verdict: * compliant (declared AND loaded): green block * undeclared_in_browser (loaded, NOT declared): RED HIGH block → violation of Art. 13(1)(c) DSGVO + § 25 TDDDG * declared_not_loaded (declared, NOT loaded): yellow notice → table possibly outdated parse_cookie_table extended with a block format (5 lines per cookie, as in the user's copy from VW). Finds 35+ cookies from copy-paste instead of 0. vendor_normalizer.py: 50+ aliases (Google family, Adobe family, Trade Desk, AdForm, ...) + garbage filter (URLs, empty strings, 'click to select', 'Mehrere OEMs'). Merges cookie lists during dedup. 
_guess_vendor extended: Adobe family (s_ecid/AMCV/demdex/mbox/...), Trade Desk (TDID/TDCPM/TTDOptOut), AdForm (uid/cid/otsid), Salesforce LiveAgent, etracker, Akamai, EDAA. audit_quality_checks: the vendor-thin threshold is now dynamic, based on the cookie doc's word count (3k→10 / 6k→20 / 10k→30 / 15k+→40). VW test fixture: tests/fixtures/cookie_gt/vw_cookie_richtlinie.txt (36-cookie sample for regression tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 23:36:45 +02:00
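Reduced to its core, the declared-versus-loaded comparison from the commit above is a pair of set operations. This sketch assumes exact cookie names and leaves out the library metadata lookup; the three result keys are the list names from the commit message.

```python
def compare_cookie_sources(declared: set, loaded: set) -> dict:
    """Split cookies into the three verdict lists named in the commit message."""
    return {
        # declared in the policy AND seen in the browser
        "compliant": sorted(declared & loaded),
        # seen in the browser but missing from the policy (the HIGH case)
        "undeclared_in_browser": sorted(loaded - declared),
        # listed in the policy but never set (table possibly outdated)
        "declared_not_loaded": sorted(declared - loaded),
    }
```

For example, `compare_cookie_sources({"_ga", "_fbp"}, {"_ga", "TDID"})` puts `_ga` in the compliant list, `TDID` in undeclared_in_browser and `_fbp` in declared_not_loaded.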
Benjamin Admin	7335f64f4f	feat(founding-wizard): per-person IP assignment + prefill + E2E tests The wizard now supports 2-4 shareholders, each with an individual IP area: - One IP assignment contract per founder (e.g. Benjamin: Compliance+RAG; Sharang: Security+Infrastruktur). One separate service contract per managing director (GF). - Step 1: prefill button from the company profile + fields for Registergericht (register court) and HRB no. - Step 2: role dropdown (CEO/CTO/CFO/COO/CPO/GF/Sonstige) instead of free-text input, IP-areas textarea per person. Backend: - generate_documents() iterates per person for PER_PERSON_DOCS. - _build_person_context() injects ASSIGNOR_, GF_, IP_LIST_DETAILS from person.ip_areas. - base_context() propagates basics.register_court and basics.hrb_number. Tests: - 30/30 pytest green (6 new: per-person context, slug helper, register-court propagation). - 4 new Playwright E2E specs (hermetic via route.fulfill, with console/page-error traps): complete 8-step flow, prefill error path, step navigation/reset, role dropdown + IP areas. 
- The spec sets 'bp-sdk-cookie-consent' in addInitScript so that the CookieBannerOverlay does not cover the wizard buttons. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 18:49:10 +02:00
Benjamin Admin	badb356740	fix(founding-wizard): resolve nested IF blocks correctly (innermost-first)	2026-05-20 19:21:08 +02:00
Benjamin Admin	7a5f1e48dd	feat(founding-wizard): founding wizard for a two-person GmbH + 14 notary templates [migration-approved] Templates (migrations 123-136): - 123 GO-GF (Geschäftsordnung Geschäftsführung) - 124 SHA (Shareholders' Agreement, 56 placeholders) - 125 Satzung (Articles of Association with UG variant) - 126 GF-Dienstvertrag (separation principle: corporate office vs. employment) - 127 Arbeitsvertrag (AGG-neutral, NachwG, eAU) - 128 Gesellschafterliste (§ 40 GmbHG) - 129 GF-Bestellungsbeschluss (with the § 6 Abs. 2 declaration) - 130 HRB-Anmeldung (§§ 7, 8, 39 GmbHG, § 12 HGB) - 131 IP-Assignment Agreement (founder→GmbH) - 132 Term Sheet (pre-seed/seed VC standard) - 133 Wandeldarlehensvertrag (Convertible Loan) - 134 Beteiligungsvertrag (Subscription Agreement) - 135 ESOP/VSOP plan (3 variants) - 136 Cap Table Categorization (migrations 137-138): - ALTER TABLE compliance_legal_templates ADD lifecycle_stage TEXT[], functional_category TEXT (with CHECK constraints + GIN index) - Backfill of all 105 templates: lifecycle_stage (pre_founding|founding|startup|kmu|konzern) + functional_category (founding_legal|employment|investor_funding|...) Backend founding-wizard service: - template_renderer.py: Handlebars-light ({{VAR}}, {{#IF FLAG}}...{{/IF}}) - wizard_to_context.py: mapping wizard state → SCREAMING_SNAKE_CASE vars - markdown_to_docx.py: Markdown → DOCX via python-docx - founding_wizard_routes.py: POST /v1/founding-wizard/generate → returns base64 DOCX files for the selected templates Frontend founding wizard (/sdk/founding-wizard): - 8-step wizard (Basics, Gesellschafter, GF, Kapital, Notar, SHA, GF-Verträge, Generate) - useFoundingWizardForm hook with localStorage persistence - TypeScript code registry (template-categories.ts) as a backup to the DB - Word download via data: URLs (base64) Tests: - 20 unit tests green (renderer, context mapping, DOCX conversion) - Playwright E2E test with two-person GmbH (Benjamin + Sharang) test data	2026-05-20 09:30:51 +02:00
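A Handlebars-light renderer for the `{{VAR}}` and `{{#IF FLAG}}...{{/IF}}` syntax named in the commit above might be sketched like this. It resolves nested IF blocks from the inside out by only ever matching a block whose body contains no further `{{#IF`. Rendering unknown variables as an empty string is an assumption, and this is not the code of template_renderer.py.

```python
import re

# An IF block whose body contains no further "{{#IF ", i.e. an innermost block
_INNERMOST_IF = re.compile(
    r"\{\{#IF (\w+)\}\}((?:(?!\{\{#IF ).)*?)\{\{/IF\}\}",
    re.DOTALL,
)
_VAR = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, context: dict) -> str:
    """Resolve IF blocks innermost-first, then substitute {{VAR}} placeholders."""
    previous = None
    while previous != template:  # repeat until no IF block is left
        previous = template
        template = _INNERMOST_IF.sub(
            lambda m: m.group(2) if context.get(m.group(1)) else "",
            template,
        )
    return _VAR.sub(lambda m: str(context.get(m.group(1), "")), template)
```

Matching only innermost blocks avoids the classic failure where a lazy regex pairs an outer `{{#IF}}` with the first inner `{{/IF}}` and leaves a dangling closing tag behind.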
Benjamin Admin	6c223c7c9b	feat(compliance-check): exec-summary + full-audit + TDM-respect + cookie-KB-extended + saving-scan-funnel P1: executive summary at the top of the email report (4 KPIs + 2 CTAs, dark gradient) P3: no_direct_sales flag for OEM configurator sites; AGB/Widerruf/AGB shown as "NICHT ANWENDBAR" (grey) instead of "NICHT GEFUNDEN" (red) P5: full-audit unification: all findings (MC + mandatory disclosures + vendor + redundancy) in /data/compliance_audits.db.unified_findings; new /api/compliance/agent/findings/<id> endpoint + FindingsTab in the audit UI with filter + CSV export P7: crawl hardening: TDM reservation check (robots.txt / ai.txt / header / meta) before every run with a 24h cache; HeadlessChrome UA (company not yet founded; switch via the BREAKPILOT_BRANDED_UA env variable); per-domain rate limit of 1 req/s + max 2 concurrent P2: cookie knowledge DB extended additively (35 -> 74 cookies): Adobe, Meta, Microsoft, LinkedIn, TikTok, HubSpot, Marketo, Salesforce, Hotjar, FullStory, Mouseflow, Intercom, Drift, Zendesk, Cloudflare, Stripe, OneTrust/Cookiebot/Usercentrics, Matomo, Pinterest, Snapchat, X/Twitter, YouTube, Vimeo, Klaviyo, Mailchimp, Mixpanel, Segment, Amplitude, Optimizely, Datadog; wiring into cookie_function_classifier yields a compliance_risk label (kritisch/hoch/mittel/gering) per vendor A: k-anonymity helper (benchmark_k_anonymity) in preparation for P6 B: cross-tenant domain assertion in the /findings endpoint (expected_domain query param -> 403 on mismatch) C: saving-scan funnel: /api/compliance/agent/saving-scan/start with validation + 24h rate limit per domain + lead persistence in saving_scan_leads + auto-discovery via _run_compliance_check; 6 tests D: risk badge in the email vendor row Legal guardrails (memory feedback_oem_data_legal.md): only our own brief assessments + source pointers, no verbatim copies of third-party CMP texts. TDM opt-out respected per § 44b UrhG. NO schema changes; everything lives in sidecar SQLite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 23:48:34 +02:00
Benjamin Admin	df7d83134b	feat(agent): migrate compliance-check results to banner + documents (M1-M5) After a compliance-check run finishes, the user can now apply the extracted vendor inventory directly to their own: - CookieBanner config (admin /sdk/einwilligungen) - Cookie-Policy / VVT-Register / Privacy-Policy templates (admin /sdk/document-generator) Backend: - migration_to_banner.py: vendor list -> CookieBannerConfig with ESSENTIAL/PERFORMANCE/PERSONALIZATION/EXTERNAL_MEDIA buckets + review flags (broken opt-out URLs, missing expiry, no cookies listed) - migration_to_document.py: vendor list -> pre-fills for 3 doc templates, recipient-type aware (INTERNAL/GROUP/PROCESSOR/CONTROLLER) - agent_migration_routes.py: GET /banner-preview, /document-preview, /summary keyed on check_id - compliance_audit_log: new check_payloads table persists cmp_vendors + extracted_profile so the preview survives an app restart - tests: 9 mapper units + 4 endpoint integration tests Frontend: - MigrationPanel.tsx: modal showing banner-config diff + document pre-fills, plus links into the existing editors - ComplianceCheckTab.tsx: replaces standalone audit link with the panel; net -3 lines, stays at the 500-cap Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-17 14:06:28 +02:00
Benjamin Admin	17c67b4f25	feat: Cookie-Banner ↔ Backend Integration (DSR, Retention, Consent Proof) Phase 1: Vendor sync from service registry (82+ services → banner vendors) Phase 2: Category-based retention (marketing=90d, statistics=790d, not hardcoded 365d) Phase 3: DSR ↔ Banner email linking (link-email, by-email, Art.17 erasure, Art.15/20 export) Phase 4: Consent sync (Banner → Einwilligungen bridge) Phase 6: Consent proof (SHA256 config hash + config_version in audit log, Art. 7(1) DSGVO) New files: - banner_dsr_service.py — email linking + DSR integration - vendor_banner_sync.py — service registry → vendor configs - migration 106 — linked_email, banner_config_hash, consent_version columns Tests: 20+ new backend tests + 2 Playwright E2E test suites (API + UI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 19:41:22 +02:00
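The consent proof from Phase 6 of commit 17c67b4f25 (SHA256 config hash + config_version in the audit log) might look roughly like this. It is a sketch under assumptions: the function names and the canonical-JSON serialization are hypothetical, and only the column names `banner_config_hash` and `consent_version` are taken from the migration 106 description.

```python
import hashlib
import json


def banner_config_hash(config: dict) -> str:
    """SHA-256 hex digest over a canonical JSON form of the banner config.

    Assumption: sorting keys and fixing separators makes the hash independent
    of dict ordering and whitespace, so equal configs hash identically.
    """
    canonical = json.dumps(
        config, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def consent_proof(config: dict, config_version: str) -> dict:
    """Hypothetical audit-log fragment tying a consent to the config shown."""
    return {
        "banner_config_hash": banner_config_hash(config),
        "consent_version": config_version,
    }
```

Hashing a canonical form rather than the raw stored string is one way to keep the proof stable when the config is re-serialized; whether the repository does this is not stated in the log.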
Benjamin Admin	0f3ba9c207	test: Lit-Mapping validation — Dict vs Control Library comparison 8 test cases with deliberately wrong legal basis assignments: - Cookie tracking on lit. f (should be lit. a) - Analytics on lit. b (should be lit. a) - Newsletter on lit. f (should be lit. a) - Klarna without Art. 22 - Session recording on lit. f - 2 correct cases (should NOT trigger findings) Runs both hardcoded dict AND Control Library query, compares results. If Control Library passes all → dict can be removed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-29 16:56:38 +02:00
Benjamin Admin	0c0dd4e3a6	feat: ZeroClaw compliance agent — document analysis + role assignment + email Add autonomous compliance agent that fetches web documents (cookie banners, privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance, assigns to the responsible role, and sends notification emails. Components: - ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify) - Backend: /api/compliance/agent/analyze (combined endpoint) - Backend: /api/compliance/agent/notify (standalone email) - Frontend: /sdk/agent page (Manager UI with URL input + results) - Helper scripts + E2E test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-27 23:28:21 +02:00
Sharang Parnerkar	c43d9da6d0	merge: sync with origin/main, take upstream on conflicts # Conflicts: # admin-compliance/lib/sdk/types.ts # admin-compliance/lib/sdk/vendor-compliance/types.ts	2026-04-16 16:26:48 +02:00
Sharang Parnerkar	7344e5806e	refactor(backend/isms): split isms_assessment_service.py to stay under 500 LOC The previous commit (`32e121f`) left isms_assessment_service.py at 639 LOC, exceeding the 500-line hard cap. This follow-up extracts ReadinessCheckService and OverviewService into a new isms_readiness_service.py (400 LOC), leaving isms_assessment_service.py at 257 LOC (Management Reviews, Internal Audits, Audit Trail only). Updated isms_routes.py imports to reference the new service file. File sizes after split: - isms_routes.py: 446 LOC (thin handlers) - isms_governance_service.py: 416 LOC (scope, context, policy, objectives, SoA) - isms_findings_service.py: 276 LOC (findings, CAPA) - isms_assessment_service.py: 257 LOC (mgmt reviews, internal audits, audit trail) - isms_readiness_service.py: 400 LOC (readiness check, ISO 27001 overview) All 58 integration tests + 173 unit/contract tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 20:50:30 +02:00
Sharang Parnerkar	ae008d7d25	refactor(backend/api): extract DSFA schemas + services (Step 4 — file 14 of 18) - Create compliance/schemas/dsfa.py (161 LOC) — extract DSFACreate, DSFAUpdate, DSFAStatusUpdate, DSFASectionUpdate, DSFAApproveRequest - Create compliance/services/dsfa_service.py (386 LOC) — CRUD + helpers + stats + audit-log + CSV export; uses domain errors - Create compliance/services/dsfa_workflow_service.py (347 LOC) — status update, section update, submit-for-review, approve, export JSON, versions - Rewrite compliance/api/dsfa_routes.py (339 LOC) as thin handlers with Depends + translate_domain_errors(); re-export legacy symbols via __all__ - Add [mypy-compliance.api.dsfa_routes] ignore_errors = False to mypy.ini - Update tests: 422 -> 400 for domain ValidationError (6 assertions) - Regenerate OpenAPI baseline (360 paths / 484 operations — unchanged) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:20:48 +02:00
Sharang Parnerkar	d2c94619d8	refactor(backend/api): extract LegalDocumentConsentService (Step 4 — file 12 of 18) Extract consent, audit log, cookie category, and consent stats endpoints from legal_document_routes into LegalDocumentConsentService. The route file is now a thin handler layer delegating to LegalDocumentService and LegalDocumentConsentService with translate_domain_errors(). Legacy helpers (_doc_to_response, _version_to_response, _transition, _log_approval) and schemas are re-exported for existing tests. Two transition tests updated to expect domain errors instead of HTTPException. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 08:47:56 +02:00
Sharang Parnerkar	cc1c61947d	refactor(backend/api): extract Incident services (Step 4 — file 11 of 18) compliance/api/incident_routes.py (916 LOC) -> 280 LOC thin routes + two services + 95-line schemas file. Two-service split for DSGVO Art. 33/34 Datenpannen-Management: incident_service.py (460 LOC): - CRUD (create, list, get, update, delete) - Stats, status update, timeline append, close - Module-level helpers: _calculate_risk_level, _is_notification_required, _calculate_72h_deadline, _incident_to_response, _measure_to_response, _parse_jsonb, _append_timeline, DEFAULT_TENANT_ID incident_workflow_service.py (329 LOC): - Risk assessment (likelihood x impact -> risk_level) - Art. 33 authority notification (with 72h deadline tracking) - Art. 34 data subject notification - Corrective measures CRUD Both services use raw SQL via sqlalchemy.text() — no ORM models for incident_incidents / incident_measures tables. Migrated from the Go ai-compliance-sdk; Python backend is Source of Truth. Legacy test compat: tests/test_incident_routes.py imports _calculate_risk_level, _is_notification_required, _calculate_72h_deadline, _incident_to_response, _measure_to_response, _parse_jsonb, DEFAULT_TENANT_ID directly from compliance.api.incident_routes — all re-exported via __all__. Verified: - 223/223 pytest pass (173 core + 50 incident) - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 141 source files - incident_routes.py 916 -> 280 LOC - Hard-cap violations: 8 -> 7 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 08:35:57 +02:00
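The 72h deadline tracking mentioned in commit cc1c61947d for Art. 33 notifications can be illustrated as follows. The helper below is a hypothetical stand-in for `_calculate_72h_deadline`, whose real signature is not shown in the log; treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def calculate_72h_deadline(detected_at: datetime) -> datetime:
    """Return the point in time 72 hours after the incident was detected."""
    if detected_at.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        detected_at = detected_at.replace(tzinfo=timezone.utc)
    return detected_at + timedelta(hours=72)


def hours_remaining(detected_at: datetime, now: datetime) -> float:
    """Hours left until the deadline; negative once the deadline has passed."""
    return (calculate_72h_deadline(detected_at) - now).total_seconds() / 3600
```

Passing `now` explicitly instead of reading the clock inside the helper keeps the calculation deterministic in tests.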
Sharang Parnerkar	0c2e03f294	refactor(backend/api): extract Email Template services (Step 4 — file 10 of 18) compliance/api/email_template_routes.py (823 LOC) -> 295 LOC thin routes + 402-line EmailTemplateService + 241-line EmailTemplateVersionService + 61-line schemas file. Two-service split along natural responsibility seam: email_template_service.py (402 LOC): - Template type catalog (TEMPLATE_TYPES constant) - Template CRUD (list, create, get) - Stats, settings, send logs, initialization, default content - Shared _template_to_dict / _version_to_dict / _render_template helpers email_template_version_service.py (241 LOC): - Version CRUD (create, list, get, update) - Workflow transitions (submit, approve, reject, publish) - Preview and test-send TEMPLATE_TYPES, VALID_CATEGORIES, VALID_STATUSES re-exported from the route module for any legacy consumers. State-transition errors use ValidationError (-> HTTPException 400) to preserve the original handler's 400 status for "Only draft/review versions can be ..." checks, since the existing TestClient integration tests (47 tests) assert status_code == 400. Verified: - 47/47 tests/test_email_template_routes.py pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 138 source files - email_template_routes.py 823 -> 295 LOC - Hard-cap violations: 9 -> 8 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 22:39:19 +02:00
Sharang Parnerkar	a638d0e527	refactor(backend/api): extract EvidenceService (Step 4 — file 9 of 18) compliance/api/evidence_routes.py (641 LOC) -> 240 LOC thin routes + 460-line EvidenceService. Manages evidence CRUD, file upload, CI/CD evidence collection (SAST/dependency/SBOM/container scans), and CI status dashboard. Service injection pattern: EvidenceService takes the EvidenceRepository, ControlRepository, and AutoRiskUpdater classes as constructor parameters. The route's get_evidence_service factory reads these class references from its own module namespace so tests that ``patch("compliance.api.evidence_routes.EvidenceRepository", ...)`` still take effect through the factory. The `_store_evidence` and `_update_risks` helpers stay as module-level callables in evidence_service and are re-exported from the route module. The collect_ci_evidence handler remains inline (not delegated to a service method) so tests can patch `compliance.api.evidence_routes._store_evidence` and have the patch take effect at the handler's call site. Legacy re-exports via __all__: SOURCE_CONTROL_MAP, EvidenceRepository, ControlRepository, AutoRiskUpdater, _parse_ci_evidence, _extract_findings_detail, _store_evidence, _update_risks. Verified: - 208/208 pytest (core + 35 evidence tests) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 135 source files - evidence_routes.py 641 -> 240 LOC - Hard-cap violations: 10 -> 9 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 21:59:03 +02:00
Sharang Parnerkar	7107a31496	refactor(backend/api): extract SourcePolicyService (Step 4 — file 7 of 18) compliance/api/source_policy_router.py (580 LOC) -> 253 LOC thin routes + 453-line SourcePolicyService + 83-line schemas file. Manages allowed data sources, operations matrix, PII rules, blocked-content log, audit trail, and dashboard stats/report. Single-service split. ORM-based (uses compliance.db.source_policy_models). Date-string parsing extracted to a module-level _parse_iso_optional helper so the audit + blocked-content list endpoints share it instead of duplicating try/except blocks. Legacy test compat: SourceCreate, SourceUpdate, SourceResponse, PIIRuleCreate, PIIRuleUpdate, OperationUpdate, _log_audit re-exported from compliance.api.source_policy_router via __all__. Verified: - 208/208 pytest pass (173 core + 35 source policy) - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 132 source files - source_policy_router.py 580 -> 253 LOC - Hard-cap violations: 12 -> 11 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:58:02 +02:00
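A shared date-parsing helper in the spirit of `_parse_iso_optional` from commit 7107a31496 could be as small as the sketch below. Returning `None` for unparseable input is an assumption; the real helper may instead raise a validation error.

```python
from datetime import datetime
from typing import Optional


def parse_iso_optional(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 date or datetime string; None or '' yields None."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        # Assumption: malformed input is treated like a missing filter.
        return None
```

With a helper like this, each list endpoint can call it once per query parameter instead of repeating its own try/except block.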
Sharang Parnerkar	b850368ec9	refactor(backend/api): extract CanonicalControlService (Step 4 — file 6 of 18) compliance/api/canonical_control_routes.py (514 LOC) -> 192 LOC thin routes + 316-line CanonicalControlService + 105-line schemas file. Canonical Control Library manages OWASP/NIST/ENISA-anchored security control frameworks and controls. Like company_profile_routes, this file uses raw SQL via sqlalchemy.text() because there are no SQLAlchemy models for canonical_control_frameworks or canonical_controls. Single-service split. Session management moved from bespoke `with SessionLocal() as db:` blocks to Depends(get_db) for consistency. Legacy test imports preserved via re-export (FrameworkResponse, ControlResponse, SimilarityCheckRequest, SimilarityCheckResponse, _control_row). Validation extracted to a module-level `_validate_control_input` helper so both create and update share the same checks. ValidationError (from compliance.domain) replaces raw HTTPException(400) raises. Verified: - 187/187 pytest (173 core + 14 canonical) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 130 source files - canonical_control_routes.py 514 -> 192 LOC - Hard-cap violations: 13 -> 12 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:53:55 +02:00
Sharang Parnerkar	4fa0dd6f6d	refactor(backend/api): extract VVTService (Step 4 — file 5 of 18) compliance/api/vvt_routes.py (550 LOC) -> 225 LOC thin routes + 475-line VVTService. Covers the organization header, processing activities CRUD, audit log, JSON/CSV export, stats, and version lookups for the Art. 30 DSGVO Verzeichnis. Single-service split: organization + activities + audit + stats all revolve around the same tenant's VVT document, and the existing test suite (tests/test_vvt_routes.py — 768 LOC, tests/test_vvt_tenant_isolation.py — 205 LOC) exercises them together. Module-level helpers (_activity_to_response, _log_audit, _export_csv) stay module-level in compliance.services.vvt_service and are re-exported from compliance.api.vvt_routes so the two test files keep importing from the old path. Pydantic schemas already live in compliance.schemas.vvt from Step 3 — no new schema file needed this round. mypy.ini flips compliance.api.vvt_routes from ignore_errors=True to False. Two SQLAlchemy Column[str] vs str dict-index errors fixed with explicit str() casts on status/business_function in the stats loop. Verified: - 242/242 pytest (173 core + 69 VVT integration) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 128 source files - vvt_routes.py 550 -> 225 LOC - vvt_service.py 475 LOC (under 500 hard cap) - Hard-cap violations: 14 -> 13 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:50:40 +02:00
Sharang Parnerkar	f39c7ca40c	refactor(backend/api): extract CompanyProfileService (Step 4 — file 4 of 18) compliance/api/company_profile_routes.py (640 LOC) -> 154 LOC thin routes. Unusual for this repo: persistence uses raw SQL via sqlalchemy.text() because the underlying compliance_company_profiles table has ~45 columns with complex jsonb coercion and there is no SQLAlchemy model for it. New files: compliance/schemas/company_profile.py (127) — 4 request/response models compliance/services/company_profile_service.py (340) — Service class + row_to_response + log_audit compliance/services/_company_profile_sql.py (139) — 70-line INSERT/UPDATE statements separated for readability Minor behavioral improvement: the handlers now use Depends(get_db) for session management instead of the bespoke `db = SessionLocal(); try: ... finally: db.close()` pattern. This makes the routes consistent with every other refactored service, fixes the broken-ness under test dependency_overrides, and removes 6 duplicate try/finally blocks. Legacy exports preserved: CompanyProfileRequest, CompanyProfileResponse, AuditEntryResponse, AuditListResponse, row_to_response, and log_audit are re-exported from compliance.api.company_profile_routes so that the two existing test files (tests/test_company_profile_routes.py, tests/test_company_profile_extend.py) keep importing from the same path. Pre-existing broken tests noted: 6 tests in those files feed a 40-tuple row into row_to_response, but _BASE_COLUMNS_LIST has 46 columns (has had since the Phase 2 Stammdaten extension). These tests fail on main too (verified via `git stash` round-trip). Not fixed in this commit — they require a rewrite of the test's _make_row helper, which is out of scope for a pure structural refactor. Flagged for follow-up. 
Verified: - 173/173 pytest compliance/tests/ tests/contracts/ pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 127 source files - company_profile_routes.py 640 -> 154 LOC - All new files under soft 300 target except service (340, under hard 500) - Hard-cap violations: 15 -> 14 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:47:29 +02:00
Sharang Parnerkar	d571412657	refactor(backend/api): extract TOMService (Step 4 — file 3 of 18) compliance/api/tom_routes.py (609 LOC) -> 215 LOC thin routes + 434-line TOMService. Request bodies (TOMStateBody, TOMMeasureCreate, TOMMeasureUpdate, TOMMeasureBulkItem, TOMMeasureBulkBody) moved to compliance/schemas/tom.py (joining the existing response models from the Step 3 split). Single-service split (not two like banner): state, measures CRUD + bulk upsert, stats, export, and version lookups are all tightly coupled around the TOMMeasureDB aggregate, so splitting would create artificial boundaries. TOMService is 434 LOC — comfortably under the 500 hard cap. Domain error mapping: - ConflictError -> 409 (version conflict on state save; duplicate control_id on create) - NotFoundError -> 404 (missing measure on update; missing version) - ValidationError -> 400 (missing tenant_id on DELETE /state) Legacy test compat: the existing tests/test_tom_routes.py imports TOMMeasureBulkItem, _parse_dt, _measure_to_dict, and DEFAULT_TENANT_ID directly from compliance.api.tom_routes. All re-exported via __all__ so the 44-test file runs unchanged. mypy.ini flips compliance.api.tom_routes from ignore_errors=True to False. TOMService carries the scoped Column[T] header. Verified: - 217/217 pytest (173 baseline + 44 TOM) pass - OpenAPI 360/484 unchanged - mypy compliance/ -> Success on 124 source files - tom_routes.py 609 -> 215 LOC - Hard-cap violations: 16 -> 15 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 19:42:17 +02:00
Sharang Parnerkar	10073f3ef0	refactor(backend/api): extract BannerConsent + BannerAdmin services (Step 4) Phase 1 Step 4, file 2 of 18. Same cookbook as audit_routes (`4a91814` + `883ef70`) applied to banner_routes.py. compliance/api/banner_routes.py (653 LOC) is decomposed into: compliance/api/banner_routes.py (255) — thin handlers compliance/services/banner_consent_service.py (298) — public SDK surface compliance/services/banner_admin_service.py (238) — site/category/vendor CRUD compliance/services/_banner_serializers.py ( 81) — ORM-to-dict helpers shared between the two services compliance/schemas/banner.py ( 85) — Pydantic request models Split rationale: the SDK-facing endpoints (consent CRUD, config retrieval, export, stats) and the admin CRUD endpoints (sites + categories + vendors) have distinct audiences and different auth stories, and combined they would push the service file over the 500 hard cap. Two focused services is cleaner than one ~540-line god class. The shared ORM-to-dict helpers live in a private sibling module (_banner_serializers) rather than a static method on either service, so both services can import without a cycle. Handlers follow the established pattern: - Depends(get_consent_service) or Depends(get_admin_service) - `with translate_domain_errors():` wrapping the service call - Explicit return type annotations - ~3-5 lines per handler Services raise NotFoundError / ConflictError / ValidationError from compliance.domain; no HTTPException in the service layer. mypy.ini flips compliance.api.banner_routes from ignore_errors=True to False, joining audit_routes in the strict scope. The services carry the same scoped `# mypy: disable-error-code="arg-type,assignment"` header used by the audit services for the ORM Column[T] issue. Pydantic schemas moved to compliance.schemas.banner (mirroring the Step 3 schemas split). 
They were previously defined inline in banner_routes.py and not referenced by anything outside it, so no backwards-compat shim is needed. Verified: - 224/224 pytest (173 baseline + 26 audit integration + 25 banner integration) pass - tests/contracts/test_openapi_baseline.py green (360/484 unchanged) - mypy compliance/ -> Success: no issues found in 123 source files - All new files under the 300 soft target (largest: 298) - banner_routes.py drops from 653 -> 255 LOC (below hard cap) Hard-cap violations remaining: 16 (was 17). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:52:31 +02:00
Sharang Parnerkar	883ef702ac	tech-debt: mypy --strict config + integration tests for audit routes Phase 1 Step 4 follow-up addressing the debt flagged in the worked-example commit (`4a91814`). ## mypy --strict policy Adds backend-compliance/mypy.ini declaring the strict-mode scope: Fully strict (enforced today): - compliance/domain/ - compliance/schemas/ - compliance/api/_http_errors.py - compliance/api/audit_routes.py (refactored in Step 4) - compliance/services/audit_session_service.py - compliance/services/audit_signoff_service.py Loose (ignore_errors=True) with a migration path: - compliance/db/* — SQLAlchemy 1.x Column[] vs runtime T; unblocks Phase 1 until a Mapped[T] migration. - compliance/api/<route>.py — each route file flips to strict as its own Step 4 refactor lands. - compliance/services/<legacy util> — 14 utility services (llm_provider, pdf_extractor, seeder, ...) that predate the clean-arch refactor. - compliance/tests/ — excluded (legacy placeholder style). The new TestClient- based integration suite is type-annotated. The two new service files carry a scoped `# mypy: disable-error-code="arg-type,assignment"` header for the ORM Column[T] issue — same underlying SQLAlchemy limitation, narrowly scoped rather than wholesale ignore_errors. Flow: `cd backend-compliance && mypy compliance/` -> clean on 119 files. CI yaml updated to use the config instead of ad-hoc package lists. ## Bugs fixed while enabling strict mypy --strict surfaced two latent bugs in the pre-refactor code. Both were invisible because the old `compliance/tests/test_audit_routes.py` is a placeholder suite that asserts on request-data shape and never calls the handlers: - AuditSessionResponse.updated_at is a required field in the schema, but the original handler didn't pass it. Fixed in AuditSessionService._to_response. - PaginationMeta requires has_next + has_prev. The original audit checklist handler didn't compute them. Fixed in AuditSignOffService.get_checklist. 
Both are behavior-preserving at the HTTP level because the old code would have raised Pydantic ValidationError at response serialization had the endpoint actually been exercised. ## Integration test suite Adds backend-compliance/tests/test_audit_routes_integration.py — 26 real TestClient tests against an in-memory sqlite backend (StaticPool). Replaces the coverage gap left by the placeholder suite. Covers: - Session CRUD + lifecycle transitions (draft -> in_progress -> completed -> archived), including the 409 paths for illegal transitions - Checklist pagination, filtering, search - Sign-off create / update / auto-start-session / count-flipping - Sign-off 400 (invalid result), 404 (missing requirement), 409 (completed session) - Get-signoff 404 / 200 round-trip Uses a module-scoped schema fixture + per-test DELETE-sweep so the suite runs in ~2.3s despite the ~50-table ORM surface. Verified: - 199/199 pytest (173 original + 26 new audit integration) pass - tests/contracts/test_openapi_baseline.py green, OpenAPI 360/484 unchanged - mypy compliance/ -> Success: no issues found in 119 source files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:39:40 +02:00
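The `has_next` / `has_prev` fix described in commit 883ef702ac amounts to deriving both flags from the page number and the total count. The sketch below uses a plain dataclass as a stand-in for the Pydantic `PaginationMeta`; every field name other than `has_next` and `has_prev` is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class PaginationMeta:
    # Stand-in for the real schema; only has_next/has_prev are named in the log.
    page: int
    page_size: int
    total: int
    total_pages: int
    has_next: bool
    has_prev: bool


def build_pagination(page: int, page_size: int, total: int) -> PaginationMeta:
    """Compute pagination metadata for a 1-based page number."""
    total_pages = math.ceil(total / page_size) if total else 0
    return PaginationMeta(
        page=page,
        page_size=page_size,
        total=total,
        total_pages=total_pages,
        has_next=page < total_pages,
        has_prev=page > 1,
    )
```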
Sharang Parnerkar	4a91814bfc	refactor(backend/api): extract AuditSession service layer (Step 4 worked example) Phase 1 Step 4 of PHASE1_RUNBOOK.md, first worked example. Demonstrates the router -> service delegation pattern for all 18 oversized route files still above the 500 LOC hard cap. compliance/api/audit_routes.py (637 LOC) is decomposed into: compliance/api/audit_routes.py (198) — thin handlers compliance/services/audit_session_service.py (259) — session lifecycle compliance/services/audit_signoff_service.py (319) — checklist + sign-off compliance/api/_http_errors.py ( 43) — reusable error translator Handlers shrink to 3-6 lines each: @router.post("/sessions", response_model=AuditSessionResponse) async def create_audit_session( request: CreateAuditSessionRequest, service: AuditSessionService = Depends(get_audit_session_service), ): with translate_domain_errors(): return service.create(request) Services are HTTP-agnostic: they raise NotFoundError / ConflictError / ValidationError from compliance.domain, and the route layer translates those to HTTPException(404/409/400) via the translate_domain_errors() context manager in compliance.api._http_errors. The error translator is reusable by every future Step 4 refactor. Services take a sqlalchemy Session in the constructor and are wired via Depends factories (get_audit_session_service / get_audit_signoff_service). No globals, no module-level state. 
Behavior is byte-identical at the HTTP boundary: - Same paths, methods, status codes, response models - Same error messages (domain error __str__ preserved) - Same auto-start-on-first-signoff, same statistics calculation, same signature hash format, same PDF streaming response Verified: - 173/173 pytest compliance/tests/ tests/contracts/ pass - OpenAPI 360 paths / 484 operations unchanged - audit_routes.py under soft 300 target - Both new service files under soft 300 / hard 500 Note: compliance/tests/test_audit_routes.py contains placeholder tests that do not actually import or call the handler functions — they only assert on request-data shape. Real behavioral coverage relies on the contract test. A follow-up commit should add TestClient-based integration tests for the audit endpoints. Flagged in PHASE1_RUNBOOK. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 18:16:50 +02:00
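The error translation described in commit 4a91814bfc (services raise domain errors, the route layer maps them to 404/409/400) can be sketched as a context manager. To stay runnable without FastAPI, the `HTTPException` class here is a local stand-in, and the error classes are redefined locally; the real ones live in compliance.domain and compliance.api._http_errors, whose exact code is not shown in the log.

```python
from contextlib import contextmanager


class DomainError(Exception):
    """Base class for errors raised by the HTTP-agnostic service layer."""


class NotFoundError(DomainError):
    pass


class ConflictError(DomainError):
    pass


class ValidationError(DomainError):
    pass


class HTTPException(Exception):
    """Local stand-in for fastapi.HTTPException (assumption, for illustration)."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


# Mapping taken from the commit message: 404 / 409 / 400.
_STATUS_BY_ERROR = (
    (NotFoundError, 404),
    (ConflictError, 409),
    (ValidationError, 400),
)


@contextmanager
def translate_domain_errors():
    """Re-raise known domain errors as HTTP errors, keeping the message."""
    try:
        yield
    except DomainError as exc:
        for error_type, status in _STATUS_BY_ERROR:
            if isinstance(exc, error_type):
                raise HTTPException(status_code=status, detail=str(exc)) from exc
        raise  # unmapped domain errors propagate unchanged
```

Keeping the mapping in one place is what lets each handler stay a few lines long: the handler wraps the service call in `with translate_domain_errors():` and returns the result.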
Sharang Parnerkar	3320ef94fc	refactor: phase 0 guardrails + phase 1 step 2 (models.py split) Squash of branch refactor/phase0-guardrails-and-models-split — 4 commits, 81 files, 173/173 pytest green, OpenAPI contract preserved (360 paths / 484 operations). ## Phase 0 — Architecture guardrails Three defense-in-depth layers to keep the architecture rules enforced regardless of who opens Claude Code in this repo: 1. .claude/settings.json PreToolUse hook on Write/Edit blocks any file that would exceed the 500-line hard cap. Auto-loads in every Claude session in this repo. 2. scripts/githooks/pre-commit (install via scripts/install-hooks.sh) enforces the LOC cap locally, freezes migrations/ without [migration-approved], and protects guardrail files without [guardrail-change]. 3. .gitea/workflows/ci.yaml gains loc-budget + guardrail-integrity + sbom-scan (syft+grype) jobs, adds mypy --strict for the new Python packages (compliance/{services,repositories,domain,schemas}), and tsc --noEmit for admin-compliance + developer-portal. Per-language conventions documented in AGENTS.python.md, AGENTS.go.md, AGENTS.typescript.md at the repo root — layering, tooling, and explicit "what you may NOT do" lists. Root CLAUDE.md is prepended with the six non-negotiable rules. Each of the 10 services gets a README.md. scripts/check-loc.sh enforces soft 300 / hard 500 and surfaces the current baseline of 205 hard + 161 soft violations so Phases 1-4 can drain it incrementally. CI gates only CHANGED files in PRs so the legacy baseline does not block unrelated work. ## Deprecation sweep 47 files. Pydantic V1 regex= -> pattern= (2 sites), class Config -> ConfigDict in source_policy_router.py (schemas.py intentionally skipped; it is the Phase 1 Step 3 split target). datetime.utcnow() -> datetime.now(timezone.utc) everywhere including SQLAlchemy default= callables. All DB columns already declare timezone=True, so this is a latent-bug fix at the Python side, not a schema change. 
DeprecationWarning count dropped from 158 to 35. ## Phase 1 Step 1 — Contract test harness tests/contracts/test_openapi_baseline.py diffs the live FastAPI /openapi.json against tests/contracts/openapi.baseline.json on every test run. Fails on removed paths, removed status codes, or new required request body fields. Regenerate only via tests/contracts/regenerate_baseline.py after a consumer-updated contract change. This is the safety harness for all subsequent refactor commits. ## Phase 1 Step 2 — models.py split (1466 -> 85 LOC shim) compliance/db/models.py is decomposed into seven sibling aggregate modules following the existing repo pattern (dsr_models.py, vvt_models.py, ...): regulation_models.py (134) — Regulation, Requirement control_models.py (279) — Control, Mapping, Evidence, Risk ai_system_models.py (141) — AISystem, AuditExport service_module_models.py (176) — ServiceModule, ModuleRegulation, ModuleRisk audit_session_models.py (177) — AuditSession, AuditSignOff isms_governance_models.py (323) — ISMSScope, Context, Policy, Objective, SoA isms_audit_models.py (468) — Finding, CAPA, MgmtReview, InternalAudit, AuditTrail, Readiness models.py becomes an 85-line re-export shim in dependency order so existing imports continue to work unchanged. Schema is byte-identical: __tablename__, column definitions, relationship strings, back_populates, cascade directives all preserved. All new sibling files are under the 500-line hard cap; largest is isms_audit_models.py at 468. No file in compliance/db/ now exceeds the hard cap. ## Phase 1 Step 3 — infrastructure only backend-compliance/compliance/{schemas,domain,repositories}/ packages are created as landing zones with docstrings. compliance/domain/ exports DomainError / NotFoundError / ConflictError / ValidationError / PermissionError — the base classes services will use to raise domain-level errors instead of HTTPException. 
PHASE1_RUNBOOK.md at backend-compliance/PHASE1_RUNBOOK.md documents the nine-step execution plan for Phase 1: snapshot baseline, characterization tests, split models.py (this commit), split schemas.py (next), extract services, extract repositories, mypy --strict, coverage. ## Verification backend-compliance/.venv-phase1: uv python install 3.12 + pip -r requirements.txt PYTHONPATH=. pytest compliance/tests/ tests/contracts/ -> 173 passed, 0 failed, 35 warnings, OpenAPI 360/484 unchanged Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 13:18:29 +02:00
Benjamin Admin	712fa8cb74	feat: Pass 0b quality - negative actions, container detection, session object classes 4 error class fixes from AUTH-1052 quality review: 1. Prohibitive action types (prevent/exclude/forbid) for "dürfen keine", "verboten" etc. 2. Container object detection (Sitzungsverwaltung, Token-Schutz → _requires_decomposition) 3. Session-specific object classes (session, cookie, jwt, federated_assertion) 4. Session lifecycle actions (invalidate, issue, rotate, enforce) with templates + severity caps 76 new tests (303 total), all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 17:24:19 +01:00
Benjamin Admin	f8d9919b97	Improve object normalization: shorter keys, synonym expansion, qualifier stripping - Truncate object keys to 40 chars (was 80) at underscore boundary - Strip German qualifying prepositional phrases (bei/für/gemäß/von/zur/...) - Add 65 new synonym mappings for near-duplicate patterns found in analysis - Strip trailing noise tokens (articles/prepositions) - Add _truncate_at_boundary() helper and _QUALIFYING_PHRASE_RE regex - 11 new tests for normalization improvements (227 total pass) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 08:55:48 +01:00
Benjamin Admin	fb2cf29b34	fix: Pass 0b - duplicate guard, severity calibration, title truncation 1. Duplicate guard: a merge_hint lookup before the INSERT in _write_atomic_control() prevents semantically identical controls under the same parent. 2. Severity calibration: derived from action_type instead of being inherited blindly from the parent. define/review/test → max medium, implement/monitor → max high. 3. Title truncation: cut at a word boundary instead of in the middle of a word. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-27 08:38:33 +01:00
Benjamin Admin	f39e5a71af	feat: Obligation deduplication - 34,617 duplicates marked as 'duplicate' New endpoints POST /obligations/dedup and GET /obligations/dedup-stats. For each candidate_id the oldest entry is kept; all others get release_state='duplicate' with merged_into_id + quality_flags for traceability. The detail view filters out duplicates. MkDocs documentation updated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 20:13:00 +01:00
Benjamin Admin	52e463a7c8	feat: Faceted search - dropdown counts adapt to the active filters Backend: controls-meta accepts all filter parameters and computes faceted counts (each dimension is counted with all OTHER filters applied). New facets: severity, verification_method, category, evidence_type, release_state, in addition to domains, sources, type_counts. Frontend: loadMeta reloads on every filter change, and all dropdowns show context-sensitive counts. The proxy forwards the filters to controls-meta. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 15:00:40 +01:00
Benjamin Admin	81c9ce5de3	fix: V1 enrichment - Qdrant collection + parent resolution for regulatory matches The atomic_controls_dedup collection (51k points) contains only atomic controls without source_citation. The parent control, which carries the legal basis, is now resolved. Deduplication by parent UUID prevents multiple entries for the same regulation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:52:41 +01:00
Benjamin Admin	db7c207464	feat: V1 control enrichment - "Eigenentwicklung" label, regulatory matching & comparison view 863 v1 controls (written manually, without a legal basis) are labeled "Eigenentwicklung" (in-house development) and automatically matched against regulatory controls (DSGVO/GDPR, NIS2, OWASP etc.) via embedding similarity. Backend: - Migration 080: v1_control_matches table (cross-reference) - v1_enrichment.py: batch matching via BGE-M3 + Qdrant (threshold 0.75) - 3 new API endpoints: enrich-v1-matches, v1-matches, v1-enrichment-stats - 6 tests (dry-run, execution, matches, pagination, detection) Frontend: - Orange "Eigenentwicklung" badge instead of the gray "v1" (when there is no source) - "Regulatorische Abdeckung" (regulatory coverage) section in ControlDetail with match cards - Side-by-side V1CompareView (in-house development vs. covered by regulation) - Prev/Next navigation through all matches Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 10:32:08 +01:00
Benjamin Admin	23dd5116b3	feat: LLM-based rationale backfill for atomic controls POST /controls/backfill-rationale - replaces the placeholder "Aus Obligation abgeleitet." ("Derived from obligation.") with LLM-generated rationales (Ollama/qwen3.5). Optimization: groups ~86k controls by ~7k parents, one LLM call per parent. Pagination via batch_size/offset for controlled execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-25 23:01:49 +01:00
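
The grouping optimization in commit 23dd5116b3 (one LLM call per parent, paged with batch_size/offset) could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's code: the function name `backfill_rationale`, the dict-based control records, the `parent_id` and `rationale` keys, and the `generate` callback that stands in for the Ollama call are all hypothetical. Only the placeholder string and the batch_size/offset parameters come from the commit message.

```python
from collections import defaultdict

# Placeholder text named in the commit message.
PLACEHOLDER = "Aus Obligation abgeleitet."


def backfill_rationale(controls, generate, batch_size=50, offset=0):
    """Replace placeholder rationales, calling `generate` once per parent.

    `controls` is a list of dicts with hypothetical keys "parent_id" and
    "rationale"; `generate(parent_id, group)` stands in for the LLM call.
    """
    # Group every control under its parent.
    by_parent = defaultdict(list)
    for control in controls:
        by_parent[control["parent_id"]].append(control)

    # Page over ALL parents in a stable order, so offsets stay valid
    # even after earlier batches have already replaced placeholders.
    page = sorted(by_parent)[offset:offset + batch_size]

    llm_calls = 0
    updated = 0
    for parent_id in page:
        pending = [c for c in by_parent[parent_id]
                   if c["rationale"] == PLACEHOLDER]
        if not pending:
            continue  # nothing to backfill, so skip the LLM call
        rationale = generate(parent_id, pending)
        llm_calls += 1
        for control in pending:
            control["rationale"] = rationale
            updated += 1

    return {
        "parents_in_batch": len(page),
        "llm_calls": llm_calls,
        "controls_updated": updated,
    }
```

Paging over all parents, not only those that still carry the placeholder, is a design choice of this sketch: it keeps a given offset pointing at the same parents across repeated runs.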

117 Commits