breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	ea4dbb223f	feat(vvt): per-vendor extraction + opt-out check + VVT table in email (V1) When a known CMP (ePaaS, OneTrust) renders the cookie policy, we now extract structured vendor records, probe their opt-out + privacy URLs, score each vendor (0-100), and append a 'VVT-Vorschlag' table to the compliance email, one row per vendor, sortable by compliance score. consent-tester: - DSIDiscoveryResult.cmp_payloads: surfaces raw CMP JSON to callers - DSIDiscoveryResponse: new cmp_payloads field - discover_dsi_documents sets cmp_payloads from cmp_capture - cmp_library/{epaas,onetrust}.py: new extract_vendors(d) returning list[VendorRecord] backend: - _fetch_text() now returns (text, cmp_payloads) tuple - doc_entries store cmp_payloads per doc (mostly cookie) - _autodiscover_missing forwards homepage payloads to the cookie entry - New module vendor_extractor.py: dispatches ePaaS/OneTrust/generic schemas; dedupes vendors across multiple payloads - cookie_link_validator.py extended with validate_vendor_urls(vendors) and score_vendors(vendors): 0-100 score per vendor based on name, purpose, country, opt-out reachable, privacy URL reachable, cookies with names + expiry - agent_doc_check_extras.build_vvt_table_html: renders the table - Route appends VVT HTML after the provider list, before the document-by-document report - Response JSON gains cmp_vendors for future frontend rendering Example for BMW: ~30 ePaaS providers → table with Name | Kategorie | Sitz | Cookies | Opt-Out (✓/✗) | Privacy (✓/✗) | Score. Sorted by score ascending so the worst-compliant vendors are at the top.	2026-05-17 09:50:11 +02:00
Benjamin Admin	c9c0fb5965	feat(cookie-check): enhanced patterns + active opt-out link validator cookie_checks.py: - cookie_names_listed: now also matches CMP placeholder notation (BMW: 'Adfpc###', 'CT###') and 'Diese Datenverarbeitung verwendet die folgenden Cookies oder ähnliche Technologien' as list-shape signal. Cryptic vendor names like 'audience', 'adformfrpid' are accepted via the surrounding markup, not by hard-coding each one. - cookie_providers_named: new pattern 'Gesetzt von: <Firma>' (BMW/ePaaS per-cookie vendor naming) + recognition of full legal-form names (Adform A/S, BMW AG, Adobe Systems Software Ireland Limited). - cookie_duration_values: now matches 'Ablauf: 1 Jahr' / 'Speicherdauer: 30 Tage' (BMW format) in addition to the legacy '<n> <unit>'. New L1 + L2 checks for controller in cookie-policy: - cookie_controller (L1): the cookie policy must name Verantwortlich(er) - cookie_controller_address (L2): PLZ + Ort or address keywords - cookie_controller_contact_or_link (L2): email/phone OR link back to Datenschutzerklärung (the practical equivalent; BMW does this) New L2 checks (parented under opt_out): - cookie_optout_links: detects per-provider opt-out URLs in the text - cookie_privacy_policy_links: per-provider privacy-policy URLs New service: cookie_link_validator.py - extract_links(text): pulls all https?://… URLs that follow 'Opt-Out Link:' / 'Link zur Privacy Policy:' (deduped) - validate_links(links): probes every URL concurrently (HEAD first, GET fallback for 405/403). 10 parallel, 8s per request, 60s batch cap. Returns reachable=True/False + status + final_url. - build_check_items(): renders 2 CheckItems (opt-out + privacy-policy), each pass if ALL links 2xx/3xx, fail with up-to-5 broken-link examples. Hook in _check_single: doc_type=='cookie' triggers the validator after regex+MC checks. Recomputes correctness with the new L2 items. This addresses two concrete BMW observations: 1. BMW's per-cookie structure (Name + Zweck + Ablauf, Gesetzt von: …, Opt-Out Link: …) now recognised → 'Konkrete Cookie-Namen aufgelistet' and 'Konkrete Speicherdauern' should pass. 2. Defective opt-out URLs surface as compliance findings rather than silently passing; Art. 7(3) DSGVO requires a working withdrawal path per provider.	2026-05-17 09:38:32 +02:00
Benjamin Admin	b090662524	fix(compliance-check): respect auto-discovery 'not found' verdict; DSB not canonical Two related bugs in the BMW test result: 1. AGB rendered as 'MANGELHAFT 0/13' even though BMW has no public AGB: - Auto-discovery correctly returned 'not found' for AGB (no link on bmw.de matches AGB keywords). - But auto_fill_from_dsi then found the substring 'AGB' in a section of the DSI and pseudo-filled the AGB entry with a 264-word DSI fragment. - cross_search_documents would have done the same. - Both now skip entries where discovery_attempted=True AND auto_discovered=False — the 'not found' verdict stands. 2. DSB-Kontakt rendered as a separate 100% OK document with 7566 words = the entire DSI text: - GDPR practice: the DSB is named inside the DSI as an email or contact block (Art. 13(1)(b)), not as a stand-alone page. - cross_search_documents had been assigning the full DSI to the DSB row because it matched 'datenschutzbeauftragte' keywords. - DSB removed from _ALL_DOC_TYPES — no longer canonical, no longer padded as missing, no longer auto-discovered. The frontend row remains so a tenant with a separate DSB page can still submit one. After this fix BMW should render: - DSE: OK - Impressum: LUECKENHAFT (unchanged — regex gaps to fix separately) - Cookie-Richtlinie: OK - Social Media: NICHT GEFUNDEN (bmw.de does not link to it) - AGB: NICHT GEFUNDEN (correct — BMW has no public AGB) - Nutzungsbedingungen: NICHT GEFUNDEN - Widerruf: NICHT GEFUNDEN	2026-05-17 01:53:09 +02:00
Benjamin Admin	b2b4d77877	fix(auto-discovery): compute missing against canonical 8 types, not submitted Frontend filters out empty doc rows -> req.documents only contains the N submitted entries (3 in BMW case). The old auto-discovery loop computed 'missing' as 'entries in doc_entries with empty text', which was always empty for those N entries -> discovery never fired. Fix: - missing = _ALL_DOC_TYPES - {canonical doc_types in doc_entries} - For each missing type, APPEND a new entry to doc_entries with discovery_attempted=True. If a discovered doc matched, fill text/url and set auto_discovered=True. - Check loop: skip entries with no URL and no text (let padding label them). Entries with URL but no text keep the 'Kein Text' error so the user sees fetch failures explicitly.	2026-05-17 01:28:51 +02:00
Benjamin Admin	525038359a	feat(compliance-check): auto-discover missing doc types from homepage When the user leaves some doc-type rows empty, the tool now actively searches the website for them — only marks 'not found' as last resort. Flow: 1. User submits N URLs (e.g. just DSI) 2. For each canonical doc_type with no submitted URL/text, the route identifies the most-common base (scheme://netloc) from submitted URLs 3. Calls consent-tester /dsi-discovery on the homepage with max_documents=15 (180s timeout) 4. Classifies every discovered doc into a canonical doc_type via title/URL keyword rules (_DISCOVERY_RULES — covers cookie/widerruf/ social_media/agb/nutzungsbedingungen/dsb/impressum/dse) 5. Fills matching empty entries with the discovered text, marks auto_discovered=True and discovery_attempted=True Padding now differentiates: - 'Auf der Website nicht gefunden' — discovery was attempted, no doc matched. Amber badge, friendly hint to add URL manually. - 'Nicht eingereicht — Quelle nicht angegeben' — user gave NO URLs at all, nothing to crawl from. Grey badge. Email + frontend: - Status labels: NICHT GEFUNDEN (amber) vs NICHT EINGEREICHT (grey) - 'Gepruefte Quellen' table tags auto-discovered URLs with a small blue 'auto-entdeckt' badge so GF sees what tool found vs user submitted. Implementation only runs when ≥1 URL was submitted (no base to crawl from otherwise). Adds 30-90s for unsubmitted types but avoids the 'just say nicht gefunden' anti-pattern.	2026-05-17 01:14:05 +02:00
Benjamin Admin	bc21480a2a	fix(compliance-check): always render 8 doc types + 4 BMW GT-gap fixes Always-show-8 (user-requested): - agent_compliance_check_routes.py: _pad_results_with_missing pads the results list to always include all 8 canonical doc_types in canonical order. Missing types get a placeholder DocCheckResult with error= 'Nicht eingereicht' + scenario='missing'. - agent_doc_check_report.py: NICHT EINGEREICHT status label (neutral), friendly grey body block instead of red error. - ChecklistView.tsx: 'Nicht eingereicht' chip (neutral grey, not red 'Fehler'); SCENARIO_LABELS adds missing entry + header chip counter. Impressum-Regression fix (#18): - _fetch_text(url, doc_type): cookie/dse/social_media -> max_documents=1 (CMP capture authoritative, sub-pages dilute). Other types -> =3 (Impressum needs Versicherungsvermittler, Aufsicht, Berufsrecht sub- pages). 15s networkidle bail keeps timing safe. ODR/Verbraucherstreitbeilegung filter (#19): - _apply_profile_filter: when profile.needs_odr=True (B2C), override the check's default B2B-oriented hint with action-oriented B2C guidance pointing at Art. 14 EU-VO 524/2013 + §36 VSBG. Previously the check contradicted itself: 'profile says B2C' + hint 'only relevant for B2C online vendors'. Registergericht regex (#20): - impressum_checks.py: accept colon/dot/dash between keyword and city (BMW writes 'registergericht: münchen hrb 42243'). Add 'sitz und registergericht: X' as separate pattern. Industry detection (#21): - business_profiler.py: 'automotive' keywords broadened (antriebs, motor, leasing, werkstatt, probefahrt, plus brand names BMW/Mercedes/ Audi/VW/Porsche/Opel). 'it_services' keywords narrowed — software/ cloud/hosting are mentioned in every privacy policy and were biasing the result toward IT for any tech-aware company.	2026-05-17 01:03:58 +02:00
Benjamin Admin	9814b56f2f	fix(cookie-extract): max_documents=1 + faster networkidle bail (Phase 0 fix) Root cause of the recurring 603-word BMW result: - DSI discovery for cookie-policy URL was hitting 4x networkidle timeouts (60s each = ~240s total). - Backend httpx timeout (180s after the previous fix) gave up before the consent-tester finished, falling through to the raw HTTP fetch which returned BMW's SSR navigation chrome (603 words) as the 'cookie policy'. Two orthogonal fixes: 1. _fetch_text now passes max_documents=1 for user-specified URLs. We only want self-extraction of THAT page; link-following is unnecessary noise. 2. networkidle wait_until window dropped 60s -> 15s. SPAs like BMW/Daimler never reach networkidle anyway; the 60s wait was pure latency. Falls through to domcontentloaded+5s render-wait, same as before.	2026-05-16 22:53:23 +02:00
Benjamin Admin	6689b37f95	fix(agent): bump _fetch_text timeout 60s->180s The dsi-discovery in consent-tester does self-extraction + follows up to 3 sub-links + waits for CMP JSON payloads. On big SPAs (BMW, Daimler) this routinely exceeds 60s. When it timed out, the HTTP fallback returned the SSR shell as text — for the BMW cookie page that's 603 words of site navigation, which then registered as 'Cookie-Richtlinie nicht im eingereichten Text' (33%). With 180s the consent-tester finishes cleanly and we get the CMP-captured 1824 words of real policy.	2026-05-16 22:00:42 +02:00
Benjamin Admin	e61e9d9e2a	feat(agent): progress_pct + 6 BMW-run improvements Backend (agent_compliance_check_routes.py): - progress_pct (0-100%) in the job state, distributed across all phases (loading 0-30, profile 35-40, checking 40-80, banner 80-92, report 95-100) - Status texts unified ("Texte laden X/N", "Pruefen X/N") - Company name for the email subject is now derived from the URL (bmw.de -> "BMW", mercedes-benz.de -> "Mercedes-Benz") instead of the unreliable extracted_profile.companyName (which often matched juris.de) - Email report now includes the banner + TCF vendor list (build_provider_list_html) Backend (agent_doc_check_extras.py, new): - build_scanned_urls_html: checked URLs as a table at the top of the report (shows the GF which sources were actually fetched) - Cross-domain notice when >1 netloc (BMW: bmw.de / bmwgroup.com / bmwgroup.jobs; findability under Art. 12 DSGVO) - build_provider_list_html: banner box + TCF vendor table with columns Name | Kategorie | Zweck | Drittland | Rechtsgrundlage Backend (business_profiler.py): - §34d GewO insurance-intermediary (Versicherungsvermittler) notices no longer count toward the "finance" industry (they had caused BMW to be misclassified as B2B/finance) - New industry "automotive" (Fahrzeug/KFZ/Konfigurator/Modellpalette) - B2B keywords: generic terms such as "unternehmen", "beratung", "consulting" removed (they matched in every corporate text) - B2C fallback: on consumer signals ("widerruf", "kunde", editorial content) the result leans toward b2c instead of b2b Frontend (ComplianceCheckTab.tsx): - Progress bar with width in % and an XX% readout on the right - reads data.progress_pct from the polling response Consent-Tester (dsi_discovery.py): - Critical fix for cookie-policy extraction: wait_for_function until body.innerText > 500 chars (BMW's SPA rendering needed more time) - _extract_text_robust: three-strategy extraction (selectors -> body cleanup -> P/LI/TD tags) - _extract_text_from_iframes: reads OneTrust/Sourcepoint/Usercentrics iframe contents (some cookie policies live there) Addresses all findings from the BMW ground-truth comparison.	2026-05-16 17:53:14 +02:00
Benjamin Admin	d45e08e25f	fix: reduce Playwright timeout 180s→60s, increase poll limit 15→25min	2026-05-16 00:47:28 +02:00
Benjamin Admin	3dbf3aa34a	feat: HTTP fallback for text extraction when Playwright times out BMW Impressum/Cookie pages time out in Playwright (>180s) because the SPA has many sub-links to follow. But the HTML source already contains the text (SSR). New fallback: direct HTTP GET + HTML tag stripping. Order: 1. Consent-tester (Playwright, 180s) → 2. HTTP GET (30s) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 23:16:10 +02:00
Benjamin Admin	f34305c0a1	fix: increase dsi-discovery timeout 90s→300s, reduce max_documents 10→5	2026-05-15 14:21:13 +02:00
Benjamin Admin	fca67c1f43	fix: accordion close bug + merge multi-page DSIs (BMW fix) 1. _expand_all_interactive(): Only click aria-expanded="false" buttons. Before: clicked ALL accordion buttons including open ones → BMW's pre-expanded accordions got CLOSED, reducing text from 1151 to 361w. 2. _fetch_text() + /extract-text: merge ALL documents found on a page (max_documents=10 instead of 1). BMW splits DSI across 5 sub-pages that the discovery finds as separate documents — now merged. 3. Tab panels: unhide hidden tabpanels instead of clicking tabs (clicking tabs can hide the currently visible panel). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 13:32:04 +02:00
Benjamin Admin	9f87bc5a2c	fix: include website/company name in compliance-check email subject	2026-05-15 10:15:34 +02:00
Benjamin Admin	d72aa10691	feat: management summary for GF + batch GT test script 1. Management Summary (agent_doc_check_report.py): - Plain-language action items for Geschaeftsfuehrer - Maps technical checks to business actions ("Ihren DSB erwaehnen", "Beschwerderecht ergaenzen", "Loeschfristen dokumentieren") - Shows at top of compliance check email before detail report - Max 10 actions, max 3 per document 2. Batch GT Test (zeroclaw/scripts/batch_gt_test.py): - Runs all 10 GT websites through compliance-check API - Prints comparison table with L1 scores, word counts, services - Saves raw JSON results for analysis - Usage: python3 batch_gt_test.py --sites 1,6 --backend-url URL Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 09:39:19 +02:00
Benjamin Admin	826ce2a1b8	fix(cross-doc): suppress false positives when regex checks already pass Cross-search "not in text" findings are only shown when regex L1 completeness < 50%. This prevents false positives where the text IS the right doc_type but doesn't contain the specific cross-search keywords (e.g. Impressum passes 9/13 checks but lacks "§5 TMG"). Also: cross-search now checks entries with wrong text, not just empty. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 00:54:33 +02:00
Benjamin Admin	bd2d6976d6	fix(cross-doc): also check entries with wrong text, not just empty ones Cross-search now validates if existing text matches the expected doc_type using keyword scoring. If text is present but doesn't match (e.g. Nutzungsbedingungen in Widerruf row), searches other texts and creates a finding explaining the mismatch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-15 00:19:40 +02:00
Benjamin Admin	4e9043f26d	feat(cross-doc): search all texts for all doc_types + misplacement finding Cross-Document Intelligence: When a doc_type row is empty, searches ALL other loaded documents for that content. If found (e.g. Widerruf in AGB), extracts the section, runs the check, AND creates a finding: "Widerrufsbelehrung in falschem Dokument gefunden — schwer auffindbar" Keywords for: widerruf, cookie, social_media, impressum, agb, dsb. Integrated as Step 1c in compliance check pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-14 23:19:39 +02:00
Benjamin Admin	33bf2b7c5a	feat(service-detector): detect 118 services in legal texts (was 20) New service_detector.py uses service_registry (88 entries) plus 30+ extra text patterns to detect services mentioned in DSI/legal texts. Results on Spiegel: 31/32 services detected (97%, was 5/32 = 16%). Includes metadata: name, category, country, EU adequacy status. - Profiler now uses detect_services_in_text() instead of 20-entry list - Profile extractor adds detected_services with full metadata - Auto-generates scope hint for non-EU services (Drittlandtransfer) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 16:00:15 +02:00
Benjamin Admin	5e317d2f0f	fix: text extraction 50k char limit was root cause of all Spiegel FNs ROOT CAUSE: main.py line 338 truncated full_text at 50,000 chars. Spiegel DSI has 107,720 chars (13,705 words); only 47% was extracted. DSB, Art. 77, Betroffenenrechte were all in the truncated portion. Fixes: 1. Raise text limit from 50k to 200k chars in API response + discovery 2. click_button(): add iframe fallback for Sourcepoint/Quantcast 3. dsi_helpers: iterate ALL page.frames for consent buttons 4. Profiler: only check impressum (not full text) for regulated professions, and "rechtsanwalt" must be in first 500 chars (company description) 5. GT: save full Spiegel DSI text (13,705 words) as reference Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 15:22:38 +02:00
Benjamin Admin	c702260ec1	fix: 5 regex bugs + text extraction scroll + GT update Root cause: Spiegel DSI text was truncated (lazy-loading), so the rights/DSB/complaints sections at the bottom were never extracted. Fixes: 1. Text extraction: scroll to bottom before innerText (dsi_discovery.py) 2. V.i.S.d.P.: add "verantwortlicher i.s.v." + "§18 Abs. N MStV" pattern 3. USt-IdNr: add "umsatzsteuer-id" + "DE 212 442 423" (with spaces) 4. Profiler: remove generic "anwalt"/"praxis" (false positive on Spiegel "Redaktionsanwalt"), keep only "rechtsanwalt", "kanzlei" etc. 5. Section splitter: auto_fill_from_dsi() fills empty Cookie/Social-Media rows from sections found in the DSI text Ground Truth 06-spiegel.md fully rewritten with verified data from live website: 3 L1 False Negatives identified (DSB, Beschwerderecht, Betroffenenrechte all present on website but not in extracted text). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-13 01:20:55 +02:00
Benjamin Admin	0b9150f16f	feat(vendor-assessment): Pruefprotokoll + Frontend + Sidebar Phase 4-5: Professional Pruefprotokoll report builder with styled HTML output (Kopfdaten, Kategorie-Scores, L1/L2 Check-Hierarchie, Findings, Freigabe-Block). Frontend at /sdk/vendor-assessment with 3-step flow: DocumentUploader → AssessmentProgress → PruefprotokollView. Sidebar: "Use-Case Audits" → "Vertragspruefung" renamed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 23:24:12 +02:00
Benjamin Admin	0326d5baab	feat(vendor-assessment): AVV/SCC/TOM/Sub-Processor checklists + assessment service Phase 1-3 of the Vendor Contract Assessment: Backend checklists (Doc-Check L1/L2 engine compatible): - avv_checks.py: 28 checks (11 L1 + 17 L2) for Art. 28(3) DSGVO - scc_checks.py: 7 checks for EU SCC 2021 (modules, annexes, TIA) - tom_annex_checks.py: 12 checks for Art. 32 (8 control objectives) - sub_processor_checks.py: 7 checks for sub-processor list completeness Assessment service: - POST /vendor-compliance/assessments — async contract analysis - GET /vendor-compliance/assessments/{id} — poll status - Cross-check engine: detects missing SCC when AVV mentions third-country, missing TOM annex, missing sub-processor list All checklists registered in runner.py CHECKLIST_MAP (27 doc_types total). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 23:14:54 +02:00
Benjamin Admin	c867478791	feat(tcf-vendors): GVL cache + vendor extraction + VVT mapping Phase 1-2 of the closed quality loop: - GVL cache (consent-tester/services/gvl_cache.py): downloads and caches IAB Global Vendor List with 24h TTL, resolves vendor IDs to names, purposes, policy URLs, retention, country - Vendor extraction (consent_interceptor.py): extract_tcf_vendors() reads __tcfapi after accept phase, resolves via GVL - Scan response: tcf_vendors field added to /scan endpoint - VVT mapper (vendor_vvt_mapper.py): maps TCF vendors to VVT format with purpose labels, Rechtsgrundlage, Drittland detection - Vendor cross-check (banner_cookie_cross_check.py): checks all TCF vendors against DSI text (missing vendors, undocumented transfers) - Compliance check integrates Step 3d: TCF vendors vs DSI Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 18:18:50 +02:00
Benjamin Admin	7be34552bb	feat(compliance-check): profile extraction + scenario classification - New profile_extractor.py: extracts Company Profile fields (name, legal form, address, DPO, USt-IdNr) and Compliance Scope hints (Art. 9 data, third country, profiling) from document texts - Scenario per document: regenerate (<30%), fix (30-95%), import (>95%) - Widerruf for B2B: no longer skipped, instead all checks flagged as INFO with "not needed for B2B" hint - Move _build_profile_html to report builder module - DocCheckResult gets scenario field Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 17:34:33 +02:00
Benjamin Admin	be9cfdc2d4	feat(compliance-check): skip Widerruf for B2B, limit MCs, fix industry - Skip Widerrufsbelehrung check entirely for B2B/B2G businesses - Limit MC checks to top 20 per doc_type (by severity) to reduce noise (e.g. 75 impressum MCs → 20, avoiding 55 irrelevant FAILs) - Add consulting/manufacturing industry keywords (arbeitssicherheit, brandschutz, werkzeugbau, etc.) - Lower industry detection threshold from 2 to 1 keyword hit Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 17:03:57 +02:00
Benjamin Admin	b42e1cd091	feat(cmp): timezone→geo_country mapping + timezone parameter Add _resolve_geo_from_timezone() with 35-country IANA timezone map. Accept timezone field in ConsentCreate schema and pass through to service. Populate geo_country automatically from browser timezone. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 14:43:13 +02:00
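A minimal sketch of a timezone-to-country lookup of the kind this commit describes. The five map entries are placeholders (the commit mentions 35 countries but does not list them), and the public name `resolve_geo_from_timezone` is a stand-in for the repository's `_resolve_geo_from_timezone()`, whose real signature is not shown.

```python
from typing import Optional

# Hypothetical excerpt; the real map reportedly covers 35 countries.
_TZ_TO_COUNTRY = {
    "Europe/Berlin": "DE",
    "Europe/Vienna": "AT",
    "Europe/Zurich": "CH",
    "Europe/Paris": "FR",
    "America/New_York": "US",
}


def resolve_geo_from_timezone(timezone: Optional[str]) -> Optional[str]:
    """Return an ISO country code for an IANA timezone name, or None."""
    if not timezone:
        return None
    return _TZ_TO_COUNTRY.get(timezone.strip())
```

Returning `None` for unknown zones keeps `geo_country` nullable, so a missing or unusual browser timezone would not block consent creation.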
Benjamin Admin	4a7e09bbb0	fix(impressum): regex [A-Z] never matches on lowercased text All patterns matched against text_lower but used [A-Z] character class. Changed to [a-zA-Z] so patterns like "geschäftsführung: dr. oliver" are found. Also added "Pflicht"/"Detail" labels to the two progress bars to clarify what 100% vs 8% means. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 14:02:25 +02:00
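The bug class fixed here can be reproduced in a few lines. The two patterns below are simplified stand-ins, not the repository's actual Impressum patterns, and the sample name is invented.

```python
import re

text = "Geschäftsführung: Dr. Oliver Beispiel"  # invented sample
text_lower = text.lower()

# [A-Z] can never match once the haystack has been lowercased.
broken = re.compile(r"geschäftsführung:\s*(?:dr\.\s*)?[A-Z][a-z]+")

# Widening the class to [a-zA-Z] lets the pattern match lowercase text.
fixed = re.compile(r"geschäftsführung:\s*(?:dr\.\s*)?[a-zA-Z]+")

broken_hit = broken.search(text_lower)
fixed_hit = fixed.search(text_lower)
```

An alternative would be to match the original text with `re.IGNORECASE` instead of lowercasing first; the commit chose to widen the character class.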
Benjamin Admin	edbf6d2be5	feat(dsms): Stage 2+3: Evidence/TechFile → DSMS + Version Chains + Audit Timeline Stage 2A: Evidence upload → automatic DSMS archiving - After the SHA-256 hash → archive_to_dsms(), CID in the audit trail - Evidence with a CID is automatically upgraded to E2 (hash-verified) Stage 2B: IACE Tech-File export → DSMS - PDF/Excel/DOCX/Markdown exports are archived to DSMS - archiveTechFile() helper for all 4 formats Stage 3A: DSMS Gateway: parent_cid + history endpoint - parent_cid + tenant_id fields in DocumentMetadata - GET /documents/{cid}/history follows the parent_cid chain (max 50 deep) Stage 3C: Audit Timeline UI - New page /sdk/audit-timeline - Vertical timeline with colored action dots - Filter: Alle, Nachweis, DSMS-Archiv, Control, Dokument, DSFA, VVT, TOM - CID badges for DSMS-archived entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 13:55:07 +02:00
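The history endpoint's chain walk (follow parent_cid, at most 50 levels) might look roughly like this. The in-memory `store` dict mapping each CID to its parent, the function name, and the cycle guard are all assumptions for illustration; the gateway's real storage and response shape are not shown in the log.

```python
from typing import Dict, List, Optional

MAX_DEPTH = 50  # depth limit stated in the commit message


def follow_history(
    store: Dict[str, Optional[str]],
    cid: str,
    max_depth: int = MAX_DEPTH,
) -> List[str]:
    """Walk parent_cid links from newest to oldest version.

    Stops at a root (parent is None), an unknown CID, a repeated
    CID (cycle guard, an addition of this sketch), or max_depth.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = cid
    while (
        current is not None
        and current in store
        and current not in seen
        and len(chain) < max_depth
    ):
        chain.append(current)
        seen.add(current)
        current = store[current]
    return chain
```

The depth cap bounds the work per request even if a chain is very long or malformed.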
Benjamin Admin	74f00bbb0f	feat(compliance-check): split shared URLs into sections per doc_type When the same URL is used for multiple document types (e.g. /datenschutz for DSI + Cookie + DSB), the section splitter now: - Detects duplicate URLs and fetches text only once - Splits text at classified headings (Cookie, Google Analytics, etc.) - Assigns matching sections to each doc_type - DSI always keeps the full text Extracted to section_splitter.py (170 LOC) to keep routes under 500. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 12:49:57 +02:00
Benjamin Admin	407a9503e4	fix(profiler): fix B2G false positive + add consulting/manufacturing - Remove generic B2G keywords (behörde, amt, öffentlich) that match in every DSI due to "Aufsichtsbehörde", "Amtsgericht", "veröffentlichen" - Remove "server" from it_services (too generic, appears in every DSI) - Add consulting, manufacturing, media industries - Add B2B fallback for GmbH/AG without B2C signals - Add 10 ground truth files for unified compliance check Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 12:20:44 +02:00
Benjamin Admin	ce77cde309	fix(compliance-check): batch LLM verification + increase poll timeout - LLM verify now sends ALL failed checks in one batched call instead of one Ollama call per check (80+ calls → 1 per document) - Increase frontend poll timeout from 6 min to 15 min Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 11:49:30 +02:00
Benjamin Admin	b6ad958b69	feat(compliance-check): integrate banner cross-check + extract to module Add automatic banner check (Step 3b) and banner-vs-cookie cross-check (Step 3c) to unified compliance check. Extract cross-check logic to banner_cookie_cross_check.py to keep routes under 500 LOC. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-12 00:08:47 +02:00
Benjamin Admin	66d30568e2	feat(dsms): Stage 1: gap-analysis report is archived to DSMS - Go DSMS Client (internal/dsms/client.go): Archive() + Verify() - Python DSMS Client (compliance/services/dsms_client.py): archive_to_dsms() + verify_dsms() - Gap analysis AnalyzeProject() archives the report JSON to DSMS - Response includes dsms_cid when archiving succeeds - Frontend: green "Revisionssicher archiviert" (audit-proof archived) badge with CID in the GapDashboard - DSMS Proxy Route (/api/sdk/v1/dsms/[...path]) for verify requests Stage 2 (Evidence Upload → DSMS) and Stage 3 (Version Chains) will follow. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 23:39:26 +02:00
Benjamin Admin	397de741c1	feat(cmp): Phase 2 — script blocking + cookie tracking Migration 108: scripts_blocked, scripts_released, cookies_set JSONB columns. Backend models/schema/service/serializer/routes extended. Admin detail modal shows released scripts and set cookies with categories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 22:52:26 +02:00
Benjamin Admin	051890c370	feat(cmp): restore vendor-agnostic fields + module wiring Re-add 13 vendor-agnostic columns to banner models/serializers/service (consent_method, banner_version, device_type, browser, os, etc.) that were lost when another session overwrote the code. Keep vendor_consents dict from the other session. Add list_consents method back to BannerConsentService. Wire CookieBanner, Loeschfristen and UseCases into Document Generator contextBridge (CMP_NAME, analytics tools, retention months, feature flags). 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 21:57:54 +02:00
Benjamin Admin	0d0e705117	feat: Unified Compliance-Check — 8 document types in one form New 3-tab structure: Website-Scan, Compliance-Check, Banner-Check. Compliance-Check Tab (replaces Dokumenten-Pruefung + Impressum-Check): - 8 document rows: DSI, Impressum, Social Media, Cookie, AGB, Nutzungsbedingungen, Widerruf, DSB-Kontakt - Each row: URL input + "Text laden" + file upload + manual text - "Text laden" extracts via consent-tester, shows in editable textarea - User verifies/corrects text before checking - Empty fields = "not present" → own finding Business Profiler (business_profiler.py): - Detects B2B/B2C/B2G from all documents together - Recognizes regulated professions, online shops, editorial content - Context-aware: INFO checks become PASS/FAIL based on profile Backend: /compliance-check + /extract-text endpoints Frontend: ComplianceCheckTab.tsx + DocumentRow.tsx API proxies: compliance-check/route.ts + extract-text/route.ts Also: Impressum regex fixes (Telefon, AG, Geschaeftsfuehrung) and INFO severity for context-dependent checks. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 20:56:10 +02:00
Benjamin Admin	0c25832b5c	fix: Context-aware Impressum checks + 3 regex fixes 3 Regex fixes: - Telefon: matches '0761 / 48 98 09 01' format (spaces around /) - Registergericht: matches 'AG Freiburg' (not just 'Amtsgericht') - Vertretung: matches 'Geschaeftsfuehrung:' (not just 'Geschaeftsfuehrer:') 6 checks changed from FAIL to INFO severity: - V.i.S.d.P.: only relevant if website has editorial content - Streitbeilegung: only relevant for B2C online shops - Berufsrecht: only relevant for regulated professions - Stammkapital: legally required but rarely enforced - Aufsichtsbehoerde: only for licensed activities - Berufshaftpflicht: only for mandatory insurance INFO checks don't count towards completeness percentage. They appear as hints, not findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 15:23:19 +02:00
Benjamin Admin	02ff96f74e	fix: resolve all merge conflict markers from feat/zeroclaw-compliance-agent 9 files had conflict markers from the branch merge. All resolved keeping the feature branch version. Also split agent_scan_routes.py (534→367 LOC) by extracting Pydantic models to agent_scan_models.py. [guardrail-change] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 12:15:07 +02:00
Benjamin Admin	36c6101b91	Merge feat/zeroclaw-compliance-agent into main Brings all compliance doc-check features: - 162 regex checks + 1874 Master Controls - LLM-agnostic agent with tool calling - Banner check (46 checks, 30 CMPs, stealth, Shadow DOM) - Impressum check (24 checks) - Deep consent verification (DataLayer, GCM, TCF) - CMP E2E tests (39 tests) - HTML email reports, FAQ, persistent history Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-11 11:44:20 +02:00
Benjamin Admin	91d6d8b1a7	feat: KI-Agent toggle button in Dokumenten-Pruefung Green pill button: 'KI-Agent aus' / 'KI-Agent aktiv (1.874 MCs)' Toggles use_agent flag which is passed through the full chain: Frontend → DocCheckRequest → _run_doc_check → _check_single_document → check_document_with_controls(use_agent=True) → ComplianceAgent with tool calling Default: OFF (deterministic regex). User can enable per scan. Also works via env var COMPLIANCE_USE_AGENT=true for always-on. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 23:26:21 +02:00
Benjamin Admin	289ec5f396	feat(cmp): vendor-agnostic consent data model: 13 new fields Extend banner consent records with consent_method, banner_version, banner_config_hash, geo, page_url, referrer, device info, session_id and consent_scope for full Art. 7 DSGVO proof with any tracking vendor. Migration 107, backward-compatible (all fields nullable). Admin detail modal shows tracking context, device info and technical data. Fix pre-existing str\|None → Optional[str] for Python 3.9 compat. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 23:12:20 +02:00
Benjamin Admin	58f370f4ff	feat: LLM-agnostic Compliance Agent with tool calling New agent architecture for intelligent MC evaluation: agent_tools.py (367 LOC): - 5 tools in OpenAI function-calling format - query_controls: async DB query for MCs by doc_type - evaluate_controls_batch: deterministic keyword matching - search_document: text search with context - get_document_stats: word count, sections, language - submit_results: finalize check results compliance_agent.py (398 LOC): - ComplianceAgent class with agent loop - 3 LLM providers: Ollama, OpenAI-compatible (OVH), Anthropic - Tool call dispatch + result collection - System prompt for systematic compliance analysis - run_compliance_check() convenience function Hybrid mode: - COMPLIANCE_USE_AGENT=false (default): deterministic regex - COMPLIANCE_USE_AGENT=true: LLM agent with tool calling - Agent fallback to regex if LLM unavailable Works with Qwen 35B (Ollama), Qwen 120B (OVH vLLM), Claude. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 22:56:09 +02:00
Benjamin Admin	bdbc30e47b	feat(cmp): unified consent view — Website-Besucher + Login-Nutzer tabs Merges two separate consent views into one unified page at /sdk/einwilligungen: - Tab "Website-Besucher": device-based banner consents with site selector - Tab "Login-Nutzer": user-based DSGVO consents (existing, unchanged) Backend: - New endpoint GET /admin/consents for paginated banner consent records - Fix: categories JSON string parsing (was iterating chars instead of array) CMP Dashboard: - Dynamic site selector replacing hardcoded "preview-test-site" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 22:41:56 +02:00
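The "iterating chars instead of array" bug mentioned in this commit typically arises when a JSON-encoded string is looped over directly. A defensive parser could look like the sketch below; `parse_categories` and its fallback behavior are hypothetical, not the repository's code.

```python
import json
from typing import Any, List


def parse_categories(raw: Any) -> List[str]:
    """Return categories as a list, decoding a JSON string if needed.

    Iterating over the raw string '["a","b"]' would yield single
    characters, so the string is decoded before use.
    """
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return []  # assumption: unparsable input means no categories
    if not isinstance(raw, list):
        return []
    return [str(item) for item in raw]
```

Accepting both an already-decoded list and a JSON string lets the same helper handle rows written before and after a serialization change.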
Benjamin Admin	9cbbc6ee2f	feat: LLM interpretation layer for failed MC checks Deterministic pass/fail stays unchanged. After keyword checking, ONE batched LLM call enriches the top 10 severity FAILs with context-specific recommendations based on the actual document. Example: If document uses Google Analytics but lacks transfer mechanism → LLM generates: "Sie nutzen Google Analytics (USA). Ergaenzen Sie einen Verweis auf das EU-US Data Privacy Framework und pruefen Sie die DPF-Zertifizierung unter dataprivacyframework.gov." - Pass/fail: deterministic (keyword matching, reproducible) - Hint enrichment: LLM (contextual, one call for all fails) - Temperature 0.3 for consistency - Graceful fallback if Ollama unavailable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 22:08:07 +02:00
Benjamin Admin	5ea83e9b33	feat: Deterministic MC checking — ALL controls, no LLM, reproducible Replaced LLM-based MC verification with deterministic keyword matching: - Extracts keywords from pass_criteria/fail_criteria - Matches against document text via regex (case-insensitive) - PASS if >= 60% of criteria keywords found AND no fail_criteria triggered - Same text + same MCs = same result every time Checks ALL MCs for the doc_type (max_controls=0): - DSE: all 571 controls checked in <1 second - Impressum: all 75 controls - Cookie: all 381 controls No LLM calls needed — purely deterministic keyword matching. Bigram extraction for compound terms (e.g. "standardvertragsklauseln"). Stop word filtering for German legal text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 21:51:58 +02:00
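The pass rule stated in this commit (at least 60% of criteria keywords found and no fail criterion triggered) can be sketched as follows. Keyword extraction, bigram handling, and stop-word filtering are omitted; the function names and the "any fail keyword present means triggered" rule are assumptions.

```python
import re
from typing import List


def _found(keyword: str, text: str) -> bool:
    # Case-insensitive literal match; re.escape guards special characters.
    return re.search(re.escape(keyword), text, flags=re.IGNORECASE) is not None


def evaluate_control(
    text: str,
    pass_keywords: List[str],
    fail_keywords: List[str],
    threshold: float = 0.6,
) -> bool:
    """Return True (PASS) when enough pass keywords occur and no fail keyword does."""
    if any(_found(k, text) for k in fail_keywords):
        return False
    if not pass_keywords:
        return False  # assumption: nothing to match means no PASS
    hits = sum(1 for k in pass_keywords if _found(k, text))
    return hits / len(pass_keywords) >= threshold
```

Because the function has no randomness or external calls, the same text and keyword lists always yield the same verdict, which is the reproducibility property the commit emphasizes.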
Benjamin Admin	26b222d53d	feat: Integrate 1.874 Master Controls into document checking Rewritten rag_document_checker.py to use doc_check_controls table instead of generic canonical_controls. Each MC has: - check_question: binary YES/NO for LLM - pass_criteria: JSONB list of concrete requirements - fail_criteria: JSONB list of common mistakes Flow: Regex checks (fast) → LLM verify FAILs → MC deep check (15 per doc) MC results appear as additional L2 checks in the report. Coverage: 571 DSE, 381 Cookie, 309 Loeschkonzept, 153 Widerruf, 147 DSFA, 125 AVV, 113 AGB, 75 Impressum = 1.874 total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 21:06:03 +02:00
Benjamin Admin	a14e5ad97d	fix: Non-DSE doc checks prefer self-extracted text from actual URL When checking impressum/agb/widerruf, the DSI discovery would follow links away from the page and return the wrong document (e.g. /impressum → finds link to /datenschutz → returns datenschutz text). Now: for non-DSE doc_types, prefer the html_full_page document (self-extracted from the actual URL the user provided) over linked pages found by the crawler. Fixes safetykon.de/impressum returning datenschutz text. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 10:24:37 +02:00
Benjamin Admin	82951785ec	feat: Impressum checks expanded from 16 to 24 (GAP analysis) 8 new checks: Reglementierte Berufe, Grundkapital, Aufsichtsbehoerde, Berufshaftpflicht, rechtswidrige Disclaimer, Kammer, Berufsbezeichnung, berufsrechtliche Regelungen. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 09:29:49 +02:00
Benjamin Admin	1b8e9881bb	feat: Banner-Check: history, persistent result, email report 1. localStorage persistence: URL, last result, history (30 entries) 2. History: shows URL, date, provider, violations, percentage 3. Last result stays visible after tab switch/reload 4. Email report: HTML-formatted with violations + hints, sent to mailpit 5. Email status display in the frontend Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 07:55:12 +02:00


343 Commits