Commit Graph

968 Commits

Author SHA1 Message Date
Benjamin Admin 8d6959e8b2 fix: Expand Art. 13 patterns for generic matching across all websites
Complaint (Art. 13(2)(d)):
+ 'recht auf beschwerde', 'art. 77', 'beschwerde...wenden/einlegen',
  'zuständige behörde' — IHK uses 'Recht auf Beschwerde gem. Art. 77'

Legal basis (Art. 13(1)(c)):
+ 'gemäß Art.', '§ X IHKG/BDSG/LDSG/BBiG/TDDDG', 'einwilligung gem',
  'verarbeitung auf grundlage' — catches statutory references

Third country (Art. 13(1)(f)):
+ 'Übermittlung ausserhalb', 'EWR/EEA', 'Data Privacy Framework'

Retention (Art. 13(2)(a)):
+ 'Dauer der Speicherung', 'Aufbewahrungsdauer/-pflicht/-zeit',
  'gesetzliche Aufbewahrung' — common German DSE headings

All patterns are generic, not IHK-specific.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:45:02 +02:00
Benjamin Admin 85e82d0dfa feat: IACE 28 operational hazard patterns (HP066-HP093)
Fault Clearing (HP066-HP072): Jammed parts releasing, hose bursts,
unexpected restart, stored energy, intervention in running machine,
material jam, falling parts during fault clearing

Maintenance (HP073-HP079): Missing LOTO, falls from platforms,
hot parts contact, hazardous substances, electric shock, ergonomic
access, uncontrolled hydraulic lowering

Setup/Changeover (HP080-HP085): Crushing during tool change, burns
from hot tools, heavy tool drops, unintended stroke in setup mode,
wrong parameters, test cycle hits personnel

Transport/Install/Decommission (HP086-HP090): Machine tipping,
crushing during installation, uncontrolled commissioning movement,
residual media, sharp edges

Cleaning (HP091-HP093): Slipping, chemical exposure, draw-in

Lifecycle keywords expanded: werkzeugwechsel, stoerung, fehlersuche,
klemm, blockier, stau → trigger fault_clearing phase patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:42:38 +02:00
Benjamin Admin a349111a01 fix: Raise full_text limit 10K→50K + combine all DSI texts for checks
Two fixes:
1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars
   (IHK Internetangebot has ~50K chars, Beschwerderecht was after 10K cutoff)
2. Backend: dse_text now combines Playwright HTML + ALL DSI discovery texts
   for mandatory content checking. Previously only used first 8K chars from
   one source, missing Verantwortlicher/DSB that were in DSI documents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 16:03:56 +02:00
Benjamin Admin 3ac8d0cba8 fix: IACE mitigations page — remove broken 'm.' prefix + accept 'protective' type
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:52:10 +02:00
Benjamin Admin e3ae35891f fix: 0% completeness bug — SCORE finding was not generated at 100%
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created (line: 'if pct < 100'). The backend then defaulted to
completeness=0 because it looked for the SCORE finding to extract the %.

Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.

This was the cause of 8 documents showing '0% MANGELHAFT' despite
containing all required information.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:34:04 +02:00
Benjamin Admin 72761d6066 debug: Log DSI text lengths to diagnose 0% completeness bug
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 14:08:04 +02:00
Benjamin Admin e494cf62bb fix: Increase page load timeouts — IHK site needs >30s for networkidle
- Initial page.goto timeout: 30s → 60s (IHK loads many JS resources)
- Per-page navigation timeout: 20s → 45s (heavy JS sites)
- Reduced extra wait from 3s+1s back to 2s+0.5s (goto timeout handles slow loads)
- Playwright scanner page timeout: 20s → 45s

Root cause: IHK website has heavy JavaScript that takes >30s to reach
'networkidle' state, causing DSI discovery to fail immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 13:10:59 +02:00
Benjamin Admin d547e63663 fix: DSI dedup prefers 'Datenschutzinformation*' titles + better JS content extraction
Bug 1 fix: When merging documents with identical word_count, prefer
titles starting with 'Datenschutzinformation' over generic section
headings like 'Zweck und Rechtsgrundlage'. This restores the main
'Datenschutzinformationen zum Internetangebot' document.

Bug 2 fix: After navigating to a document page, wait 3s (was 2s) for
JS content loading, then try 10+ content selectors before falling back
to body text (with nav/header/footer removed). Handles IHK-style JS
navigation where content loads after page.goto() completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 12:26:42 +02:00
Benjamin Admin b4f90ed113 fix: IACE components page — remove broken 'c.' prefix from refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 12:20:09 +02:00
Benjamin Admin daa47bb7ab feat: Scan history — shows last 20 scans with URL, date, findings count
- localStorage-based scan history (persists across sessions)
- Each completed scan adds entry: URL, timestamp, findings count, docs count
- 'Letzte Scans' section below results shows clickable history entries
- Click loads URL into form (and shows cached result if same URL)
- Max 20 entries, deduplicates by URL (latest scan wins)
- History visible in 'Website-Scan' tab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:52:35 +02:00
Benjamin Admin 6c5e086356 fix: DSI dedup — skip anchor links, filter noise, merge duplicates + fix false positives
Dedup fixes:
- Anchor links (#cookies, #betroffenenrechte) on same page are skipped entirely
- Noise titles filtered: 'drucken', 'nach oben', 'Datenschutz' (too generic)
- Documents with < 50 words filtered (navigation snippets)
- Documents with identical word_count merged (same page, different title)
- URL-only titles filtered

False positive fixes (dsi_document_checker.py):
- 'Kontaktdaten des Verantwortlichen' pattern for controller check
- 'Zweck und Rechtsgrundlage' combined heading pattern
- 'Welche Daten werden verarbeitet' question-style headings
- 'Betroffenenrechte' as standalone heading
- 'Welche Rechte hat der Betroffene' question pattern
- 'Daten werden geloescht' retention pattern
- 'Auftragsverarbeiter' as recipient indicator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:41:07 +02:00
Benjamin Admin 8e40155459 feat: Scan state persists across navigation — resume polling on return
- URL, mode, tab, scan result persisted in localStorage
- Active scan_id stored — polling resumes when returning to page
- Scan results survive navigation to other SDK modules
- 'Scan laeuft noch...' shown when returning to in-progress scan
- Cleans up localStorage when scan completes or fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 10:47:39 +02:00
Benjamin Admin b5cf25f6ab fix: IACE overview null-check for risk_summary (empty projects)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 10:44:16 +02:00
Benjamin Admin 7c7513525e feat: Document-centric scan results + DSI deduplication
DSI Dedup (consent-tester):
- Only H1/H2 headings count as documents (not H3/H4 sub-sections)
- Sub-sections (Cookies, Betroffenenrechte, Social Media) are part of
  parent document's full text, not separate documents
- Reduces IHK result from 30 to ~11 real documents

Backend (agent_scan_routes):
- ScanFinding gets doc_title field linking each finding to its document
- doc_title set when creating DSI findings for document attribution

Frontend (ScanResult.tsx):
- 3 sections: Services table, Document cards, General findings
- Documents: expandable cards with completeness bar (green/yellow/red)
- Findings grouped under their parent document
- Each card shows: title, word count, findings count, % completeness
- Findings without doc_title go to "Allgemeine Findings" section

Email Summary (agent_scan_helpers):
- Findings listed under their parent document
- General findings in separate section
- No more flat mixed list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:56:29 +02:00
Benjamin Admin d816cf8d3a fix: missing closing brace in GetBuiltinHazardPatterns()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:36:23 +02:00
Benjamin Admin 8dd1581fae feat: IACE SIL/PL calculator + Cobot patterns + library extensions
SIL/PL Calculator: Deterministic S×E×P → PL (a-e) → SIL (1-3) mapping
Cobot Patterns (HP059-HP065): Human-robot collision, afterrun, misprogramming
Press Patterns split into separate file (500-line guardrail)
5 new components (C136-C140), 5 new tags, 18 keyword entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:29:03 +02:00
Benjamin Admin ea8353f1a0 fix: Scan progress display — separate progress state, guard ScanResult render
- scanProgress state tracks live progress (not mixed into scanData)
- ScanResult only renders when scanData.services exists (prevents crash)
- Purple progress bar with spinner shows current step during scan
- Fixes: TypeError 's.services.filter' when progress data set as scanData

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:03 +02:00
Benjamin Admin d80cb9c8e4 feat: IACE Interview Frontend — 3 Modi (Interview/Wizard/Formular)
CE-Risikobeurteilung Datenerfassung mit 3 wählbaren Eingabe-Modi:

1. Interview-Modus (Chat-artig): Fragen werden nacheinander gestellt
   wie im Kundengespräch. Antwort-Historie sichtbar.
2. Wizard-Modus: Schritt-für-Schritt durch 8 Sektionen.
3. Formular-Modus: Alle Sektionen als Accordion auf einer Seite.

20 strukturierte Fragen in 8 Abschnitten:
- Maschinenbeschreibung (Name, Typ, Baugruppen)
- Lebensphasen (Betrieb, Einrichten, Wartung)
- Bestimmungsgemäße Verwendung
- Vorhersehbare Fehlanwendung
- Qualifikation der Benutzer
- Räumliche/Zeitliche Grenzen
- Technische Daten (Kräfte, Spannungen, Temperaturen, Drehzahlen)
- Umgebungsbedingungen

answersToNarrativeText() konvertiert alle Antworten in den Freitext
der an POST /parse-narrative gesendet wird.
Ergebnis-Panel zeigt: Komponenten, Gefahren, Patterns, Energiequellen.

URL: /sdk/iace/[projectId]/interview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 08:22:59 +02:00
Benjamin Admin cb607bf228 feat: Async scan with polling — no more timeout issues
Fundamental fix: scans now run asynchronously with progress polling.

Backend:
- POST /scan starts background task, returns scan_id immediately
- GET /scan/{scan_id} returns status + progress + result when done
- 7 progress steps shown: Website scan, DSI discovery, DSE analysis,
  SOLL/IST comparison, corrections, report, email
- In-memory job store (dict with scan_id → status/result)
- No timeout limits on scan duration

Frontend:
- POST starts scan, receives scan_id
- Polls GET every 5 seconds (max 120 attempts = 10 min)
- Shows live progress message during scan
- Displays result when completed, error when failed

Proxy:
- POST timeout reduced to 30s (just starts the job)
- GET timeout 10s (just status check)
- No more 504/connection-dropped errors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 07:30:09 +02:00
Benjamin Admin d7b287889e fix: IACE parser handler — use MatchOutput.SuggestedHazards instead of MatchedPatterns fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 07:18:55 +02:00
Benjamin Admin d4b7943d54 feat: IACE deterministic narrative parser + library extensions
Library Extensions:
- 15 new components (C121-C135): knee lever, hydraulic ram, lubrication
  system, extraction system, vibrating plate, die tooling, transfer system,
  hoist, chute, oil drip tray, pressure relief valve, die space, flywheel,
  bin changeover station, inspection scale
- 8 new tags: person_under_load, two_hand_control_required,
  thermal_accumulation, mechanical_transmission, oil_mist_risk,
  rapid_energy_release, gravity_suspended_load, bypass_risk
- 14 new patterns (HP045-HP058): ram drop, die space crushing, oil mist
  inhalation, hot workpiece burns, suspended load, transfer draw-in,
  ejection fall, accumulator pressure release, impact noise, flywheel
  residual energy, guard bypass, two-hand misoperation, oil leakage,
  ergonomic bin changeover

Deterministic Parser (NO LLM):
- keyword_dictionary.go: ~100 entries mapping DE/EN keywords to
  component IDs, energy source IDs, and tags
- narrative_parser.go: ParseNarrative() extracts components, energy
  sources, lifecycle phases, roles, tech specs, and context tags from
  free-text machine descriptions via keyword matching + regex
- Tech spec regex: extracts kN, V, °C, bar, kW, rpm values and
  derives energy sources + severity tags automatically
- iace_handler_parser.go: POST /projects/:id/parse-narrative endpoint
  chains parser → pattern engine → hazard suggestions

Test: Paste Kniehebelpresse description → should detect 10+ components,
15+ hazards, all deterministically without LLM.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 00:29:18 +02:00
Benjamin Admin 47ec792acf fix: raise scan proxy timeout from 3 to 10 min (50 pages + 20 DSI docs + LLM)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 00:25:33 +02:00
Benjamin Admin f3e44cf59f fix: restore all missing consent-tester service modules
banner_detector.py, script_analyzer.py, category_tester.py, authenticated_scanner.py
were only on the feature branch — needed for consent-tester to start.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 00:14:26 +02:00
Benjamin Admin 3fade26d89 fix: restore consent-tester requirements.txt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 00:06:50 +02:00
Benjamin Admin 797ed667a2 fix: restore consent-tester Dockerfile (was lost from main)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 00:05:19 +02:00
Benjamin Admin a3f7fb93f4 fix: Scan quality — raise page limit, use full DSI text for checks
Bug 1: max_pages was hardcoded to 15 in backend call — raised to 50
Bug 2: DSI documents checked against text_preview (500 chars) — now uses
       full_text (10,000 chars) for Art. 13 mandatory field checks
Bug 3: DSE text not found when Playwright misses DSE page — now falls
       back to DSI Discovery full_text as second source
Bug 4: Backend timeout 120s too short for 50 pages — raised to 300s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:51:03 +02:00
Benjamin Admin f967480cd9 fix: Add missing service_registry.py to main
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:34:00 +02:00
Benjamin Admin 275bdf9848 fix: Add missing service modules required by agent_scan_routes
Build + Deploy / build-admin-compliance (push) Successful in 1m49s
Build + Deploy / build-backend-compliance (push) Successful in 2m57s
Build + Deploy / build-ai-sdk (push) Successful in 50s
Build + Deploy / build-developer-portal (push) Successful in 1m2s
Build + Deploy / build-tts (push) Successful in 1m23s
Build + Deploy / build-document-crawler (push) Successful in 39s
Build + Deploy / build-dsms-gateway (push) Successful in 23s
Build + Deploy / build-dsms-node (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 21s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m31s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 41s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Successful in 25s
CI / test-python-dsms-gateway (push) Successful in 20s
CI / validate-canonical-controls (push) Successful in 13s
Build + Deploy / trigger-orca (push) Successful in 2m46s
These files existed on the feature branch but were never cherry-picked
to main, causing ModuleNotFoundError on import:
- dse_parser.py — parses DSE HTML into structured sections
- dse_matcher.py — matches detected services against DSE sections
- mandatory_content_checker.py — checks Art. 13 DSGVO mandatory fields
- legal_basis_validator.py — validates legal basis (lit. a-f)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:23:02 +02:00
Benjamin Admin a18ef16378 fix: Add missing service modules required by agent_scan_routes
These files existed on the feature branch but were never cherry-picked
to main, causing ModuleNotFoundError on import:
- dse_parser.py — parses DSE HTML into structured sections
- dse_matcher.py — matches detected services against DSE sections
- mandatory_content_checker.py — checks Art. 13 DSGVO mandatory fields
- legal_basis_validator.py — validates legal basis (lit. a-f)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 23:22:30 +02:00
Benjamin Admin 5c0ca803b0 fix: Add missing 'import re' to agent_scan_routes.py
Build + Deploy / build-admin-compliance (push) Successful in 11s
Build + Deploy / build-backend-compliance (push) Successful in 9s
Build + Deploy / build-ai-sdk (push) Successful in 7s
Build + Deploy / build-developer-portal (push) Successful in 6s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-dsms-gateway (push) Successful in 7s
Build + Deploy / build-dsms-node (push) Successful in 9s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 15s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 46s
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Successful in 31s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 16s
Build + Deploy / trigger-orca (push) Successful in 2m20s
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:59:55 +02:00
Benjamin Admin 2f0f76e365 fix: Add missing 'import re' to agent_scan_routes.py
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:59:53 +02:00
Benjamin Admin f960bd052a fix: Add missing 'import re' to agent_scan_routes.py
NameError: name 're' is not defined at line 146 — the import was
accidentally removed when extracting helper functions to agent_scan_helpers.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:59:53 +02:00
Benjamin Admin 4f92e5056c docs: Complete agent architecture reference for reuse in other agents
Full documentation of the ZeroClaw compliance agent architecture:
- System overview diagram (Frontend → Backend → LLM → Playwright)
- Detailed request flow for Website-Scan mode (7 steps)
- All 5 components: Frontend, Backend, Consent-Tester, Ollama, Soul Files
- 20 banner checks across 3 files
- LLM call patterns (/api/generate + /api/chat + think-mode stripping)
- Blueprint for creating new agents (5 steps: Soul, Route, Page, Proxy, Docker)
- Timeouts, environment variables, file reference with LOC counts

Designed as reusable blueprint for Sales, HR, Finance, or other agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:26:56 +02:00
Benjamin Admin b22351fc6e fix: Exhaustive crawl — no arbitrary page/document limits
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 14s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m37s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 38s
CI / test-python-backend (push) Successful in 36s
CI / test-python-document-crawler (push) Successful in 24s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 15s
Both scanners now search until done, not until a counter runs out:

playwright_scanner.py:
- Default max_pages raised from 15 to 50
- Added 3-minute timeout as safety net
- Recursive link discovery on EVERY visited page (not just DSE pages)
- Stops when: all links visited OR max_pages OR timeout

dsi_discovery.py:
- Default max_documents raised from 30 to 100
- Added 5-minute timeout as safety net
- Recursive: on each visited page, searches for MORE DSI links
- Processes ALL discovered links exhaustively
- Stops when: no more pending links OR max_documents OR timeout

The scanners now behave like a real user: they follow every relevant
link they find, and on each new page they look for more links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:22:00 +02:00
Benjamin Admin a846bd8910 fix: Exhaustive crawl — no arbitrary page/document limits
Both scanners now search until done, not until a counter runs out:

playwright_scanner.py:
- Default max_pages raised from 15 to 50
- Added 3-minute timeout as safety net
- Recursive link discovery on EVERY visited page (not just DSE pages)
- Stops when: all links visited OR max_pages OR timeout

dsi_discovery.py:
- Default max_documents raised from 30 to 100
- Added 5-minute timeout as safety net
- Recursive: on each visited page, searches for MORE DSI links
- Processes ALL discovered links exhaustively
- Stops when: no more pending links OR max_documents OR timeout

The scanners now behave like a real user: they follow every relevant
link they find, and on each new page they look for more links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:21:57 +02:00
Benjamin Admin 6da9972ef4 fix: Exhaustive crawl — no arbitrary page/document limits
Both scanners now search until done, not until a counter runs out:

playwright_scanner.py:
- Default max_pages raised from 15 to 50
- Added 3-minute timeout as safety net
- Recursive link discovery on EVERY visited page (not just DSE pages)
- Stops when: all links visited OR max_pages OR timeout

dsi_discovery.py:
- Default max_documents raised from 30 to 100
- Added 5-minute timeout as safety net
- Recursive: on each visited page, searches for MORE DSI links
- Processes ALL discovered links exhaustively
- Stops when: no more pending links OR max_documents OR timeout

The scanners now behave like a real user: they follow every relevant
link they find, and on each new page they look for more links.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:21:16 +02:00
Benjamin Admin c284cefada refactor: Remove Modules step, add Regulations card to Dashboard
- Modules step deleted from sdk-steps.ts and SDK Flow
  (regulations are now shown in Scope-Decision tab with toggles)
- Dashboard: "Erkannte Regulierungen" card shows which regulations
  apply based on Scope-Profiling (DSGVO, AI Act, NIS2, HinSchG)
- Dashboard: Amber warning if Scope-Profiling not yet completed
- Link to Scope-Decision tab for details & customization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:21:12 +02:00
Benjamin Admin a970c28168 feat: DSI document discovery + completeness check in agent scan workflow
Build + Deploy / build-admin-compliance (push) Successful in 1m49s
Build + Deploy / build-backend-compliance (push) Successful in 2m52s
Build + Deploy / build-ai-sdk (push) Successful in 38s
Build + Deploy / build-developer-portal (push) Successful in 1m3s
Build + Deploy / build-tts (push) Successful in 1m27s
Build + Deploy / build-document-crawler (push) Successful in 33s
Build + Deploy / build-dsms-gateway (push) Successful in 22s
Build + Deploy / build-dsms-node (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 13s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m33s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 44s
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Successful in 1m3s
CI / test-python-dsms-gateway (push) Successful in 29s
CI / validate-canonical-controls (push) Successful in 19s
Build + Deploy / trigger-orca (push) Successful in 2m58s
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:10:15 +02:00
Benjamin Admin 48146cddaf feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:10:13 +02:00
Benjamin Admin 53f6f30cf0 feat: DSI document discovery + completeness check in agent scan workflow
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response

New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections

Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:09:45 +02:00
Benjamin Admin 298c95731a feat: Generic legal document discovery (DSI, AGB, Widerruf, Cookie-Richtlinie)
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 22s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m35s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Successful in 52s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 21s
CI / validate-canonical-controls (push) Successful in 14s
New service: dsi_discovery.py — finds ALL legal documents on any website:
- Technology-agnostic: HTML, SPA, WordPress, Typo3, custom CMS
- Structure-agnostic: accordions, sidebars, footers, inline links, tabs
- Format-agnostic: HTML pages, anchor sections, PDFs, cross-domain links
- Language-agnostic: 26 EU/EEA languages with document-type keywords

Document types discovered:
- Datenschutzinformationen / Privacy Policies (Art. 13/14 DSGVO)
- AGB / Terms of Service / Nutzungsbedingungen
- Widerrufsbelehrung / Right of Withdrawal (§355 BGB)
- Cookie-Richtlinie / Cookie Policy
- All cross-domain variants (e.g. help.instagram.com from instagram.com)

API: POST /dsi-discovery { url, max_documents }
Returns: list of documents with title, url, language, type, word_count, text_preview

Features:
- Expands all accordions, details, tabs, dropdowns before scanning
- Follows cross-domain links (same registrable domain)
- Re-expands after navigation back to source page
- Handles anchor links (#sections) separately from full pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 21:57:37 +02:00
Benjamin Admin 4e63a6050d feat: Generic legal document discovery (DSI, AGB, Widerruf, Cookie-Richtlinie)
New service: dsi_discovery.py — finds ALL legal documents on any website:
- Technology-agnostic: HTML, SPA, WordPress, Typo3, custom CMS
- Structure-agnostic: accordions, sidebars, footers, inline links, tabs
- Format-agnostic: HTML pages, anchor sections, PDFs, cross-domain links
- Language-agnostic: 26 EU/EEA languages with document-type keywords

Document types discovered:
- Datenschutzinformationen / Privacy Policies (Art. 13/14 DSGVO)
- AGB / Terms of Service / Nutzungsbedingungen
- Widerrufsbelehrung / Right of Withdrawal (§355 BGB)
- Cookie-Richtlinie / Cookie Policy
- All cross-domain variants (e.g. help.instagram.com from instagram.com)

API: POST /dsi-discovery { url, max_documents }
Returns: list of documents with title, url, language, type, word_count, text_preview

Features:
- Expands all accordions, details, tabs, dropdowns before scanning
- Follows cross-domain links (same registrable domain)
- Re-expands after navigation back to source page
- Handles anchor links (#sections) separately from full pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 21:56:55 +02:00
Benjamin Admin a6618af5ed feat: Generic legal document discovery (DSI, AGB, Widerruf, Cookie-Richtlinie)
New service: dsi_discovery.py — finds ALL legal documents on any website:
- Technology-agnostic: HTML, SPA, WordPress, Typo3, custom CMS
- Structure-agnostic: accordions, sidebars, footers, inline links, tabs
- Format-agnostic: HTML pages, anchor sections, PDFs, cross-domain links
- Language-agnostic: 26 EU/EEA languages with document-type keywords

Document types discovered:
- Datenschutzinformationen / Privacy Policies (Art. 13/14 DSGVO)
- AGB / Terms of Service / Nutzungsbedingungen
- Widerrufsbelehrung / Right of Withdrawal (§355 BGB)
- Cookie-Richtlinie / Cookie Policy
- All cross-domain variants (e.g. help.instagram.com from instagram.com)

API: POST /dsi-discovery { url, max_documents }
Returns: list of documents with title, url, language, type, word_count, text_preview

Features:
- Expands all accordions, details, tabs, dropdowns before scanning
- Follows cross-domain links (same registrable domain)
- Re-expands after navigation back to source page
- Handles anchor links (#sections) separately from full pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 21:56:29 +02:00
Benjamin Admin 2b4ff9f422 feat: DSFA — VVT-Verknüpfung + Residual Risk + Bundesland-Blacklists
1. VVT-Verknüpfung: Dropdown "Verknüpfte VVT-Aktivität" in Step 1,
   lädt Aktivitäten via API, auto-fills Verarbeitungstätigkeit bei Auswahl

2. Residual Risk: Neuer Step 5 im Wizard — Bewertung des Restrisikos
   nach Maßnahmen. Bei hoch/kritisch → Art. 36 Vorabkonsultation Warnung

3. Bundesland-Blacklists (Art. 35 Abs. 4): 16 Landesbehörden mit
   DSK-Muss-Liste (10 gemeinsame Kriterien) + länderspezifische
   Ergänzungen (Bayern: Whistleblower/Drohnen, NRW: Social-Media-
   Monitoring, Berlin: Mieterbonitätsprüfung). Automatische Prüfung
   gegen Scope-Antworten. Blacklist-Matches im DSFA-Banner angezeigt.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 21:48:59 +02:00
Benjamin Admin 84b21cad08 feat: DSFA pre-fill from Company Profile + Scope answers
- New prefill-from-scope.ts utility:
  - headquartersState → federal_state (Bundesland for authority lookup)
  - data_art9 → special data categories (Gesundheit, Biometrie, etc.)
  - data_minors → adds "Minderjährige" to data subjects + raises risk
  - proc_adm_scoring → Art. 22 affected rights + measures
  - proc_ai_usage → involves_ai flag + AI measures
  - proc_video_surveillance → video data categories
  - industry/businessModel → processing purpose + legal basis

- isDSFARequired() check: shows red banner when Art. 35 triggers detected
- GeneratorWizard accepts prefill prop, initializes all fields
- Passes federal_state, involves_ai, legal_basis to backend POST

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 19:36:13 +02:00
Benjamin Admin 95baf60da3 refactor: Paket 2 Analyse umstrukturiert + AI Act/Evidence verschoben
Paket 2 Analyse (vorher 7 Steps → jetzt 5):
  1. Requirements — Pruefaspekte aus Regulierungen
  2. Controls — Technische & organisatorische Massnahmen
  3. Risk Matrix — Risikobewertung (vorher #4, jetzt #3)
  4. Audit Checklist — Pruefbare Checkliste (vorher #6)
  5. Audit Report — Zusammenfassender Report (vorher #7)

Verschoben:
- AI Act → Paket 1 Vorbereitung (optional, nur bei KI-Einsatz)
- Evidence → Paket 5 Betrieb (Nachweise laufend sammeln, nicht einmalig)

SDK Flow (steps-*.ts) synchronisiert mit neuer Reihenfolge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 16:40:10 +02:00
Benjamin Admin 9fe7759973 refactor: ISO 27001 aus Regulierungen entfernen → ISMS Readiness
ISO 27001 ist kein Gesetz — freiwilliger Standard, kein Normtext ingested.

- Modules: ISO 27001 Fallback-Modul entfernt, Filter entfernt
- ISMS: Umbenannt zu "ISMS — ISO 27001 Readiness"
- ISMS: Hinweis "Basierend auf eigenen Pruefaspekten, kein Normtext"
- Sidebar: "ISMS (ISO 27001)" → "ISMS Readiness"
- Verbleibende Regulierungen: DSGVO, AI Act, NIS2 (gesetzlich)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 14:38:22 +02:00
Benjamin Admin f737bfc4db refactor: Integrate Modules into Scope-Decision (Option C)
- RegulationsPanel: added enable/disable toggles per regulation
- ScopeDecisionTab: passes enabledModules + onToggleModule
- Scope page: auto-enables all applicable regulations when loaded
- Modules step: isOptional=true, moved to Zusatzmodule
- Requirements: now depends on compliance-scope, not modules
- Source-policy: now depends on use-case-assessment, not modules

Flow: Profile → Scope → Scope-Decision shows applicable regulations
with toggles → Requirements derived from enabled regulations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 14:29:53 +02:00
Benjamin Admin 7ab1476d8f refactor: Move Screening to Zusatzmodule (optional)
- Screening step: isOptional=true
- Compliance Modules no longer depends on Screening
- Description updated to "SBOM + Vulnerability Scan (OSV.dev)"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 13:55:09 +02:00
Benjamin Admin 225456ec14 refactor: Source Policy — strip PII/Audit/Blocked, move to Zusatzmodule
- Removed: PII-Regeln tab (→ Core Service, other repo)
- Removed: Audit tab (→ redundant with Document Workflow + RBAC)
- Removed: Blockierte Inhalte tab (→ belongs to PII)
- Kept: Quellen-Whitelist + Berechtigungen (Operations Matrix)
- Renamed: "Source Policy" → "Quellen-Verwaltung"
- Moved: From Paket 1 (Pflicht) to Zusatzmodule (optional)
- sdk-steps.ts: isOptional=true, requirements no longer depends on it
- Sidebar: Added under Zusatzmodule section
- Page reduced from 365 → 130 lines

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 11:36:20 +02:00