feat: ZeroClaw compliance agent — document analysis + role assignment + email

Add autonomous compliance agent that fetches web documents (cookie banners, privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance, assigns to the responsible role, and sends notification emails.

Components:
- ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify)
- Backend: /api/compliance/agent/analyze (combined endpoint)
- Backend: /api/compliance/agent/notify (standalone email)
- Frontend: /sdk/agent page (Manager UI with URL input + results)
- Helper scripts + E2E test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 23:27:25 +02:00
parent f528b8e7a9
commit 0c0dd4e3a6
16 changed files with 1095 additions and 0 deletions
@@ -0,0 +1,56 @@
+# ZeroClaw Compliance Agent Demo
+
+Autonomous compliance agent that analyzes web documents (cookie banners, privacy policies) and forwards the results to the responsible role.
+
+## Architecture
+
+```
+ZeroClaw Agent (Rust, Mac Mini)
+  │
+  ├── LLM: Qwen 3.5:35b-a3b (Ollama, localhost:11434)
+  │
+  ├── Compliance SDK (Go/Gin, localhost:8093)
+  │   ├── /sdk/v1/llm/chat         → Document classification
+  │   ├── /sdk/v1/ucca/assess      → Risk assessment
+  │   └── /sdk/v1/ucca/escalations → Escalation + role assignment
+  │
+  ├── Backend (Python/FastAPI, localhost:8002)
+  │   └── /api/compliance/agent/notify → Email notification
+  │
+  └── Mailpit (SMTP localhost:1025, Web localhost:8025)
+      └── Simulated email delivery
+```
+
+## Prerequisites
+
+- ZeroClaw v0.7.3+ (`brew install zeroclaw`)
+- Ollama with the `qwen3.5:35b-a3b` model
+- All compliance services are running (SDK, backend, Mailpit)
+
+## Running the demo
+
+```bash
+# 1. Connect ZeroClaw to Ollama (one-time)
+zeroclaw onboard --quick --provider ollama --model qwen3.5:35b-a3b
+
+# 2. Run the SOP
+zeroclaw agent -m "Analysiere die Datenschutzerklaerung von https://www.google.com/intl/de/policies/privacy/"
+
+# 3. Check the result
+open http://localhost:8025  # Mailpit Web-UI
+```
+
+## E2E Test
+
+```bash
+bash zeroclaw/tests/test_sop_workflow.sh
+```
+
+## SOP workflow (6 steps)
+
+1. **Fetch**: fetch the URL, strip the HTML
+2. **Classify**: determine the document type (privacy_policy, cookie_banner, etc.)
+3. **Assess**: DSGVO (GDPR) risk assessment via UCCA
+4. **Summarize**: manager report in German
+5. **Assign**: determine the responsible role (E0-E3 mapping)
+6. **Notify**: send an email to the DSB / team lead
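
Review note on step 2 (Classify): the model is asked to answer with a bare category name, but LLM replies often carry extra whitespace, casing, or punctuation. The following is a minimal Python sketch of normalizing such a reply before it is used further. The helper name `normalize_category` and the fallback to `other` are assumptions for illustration; the category list is the one named in the SOP classification prompt of this commit.

```python
import re

# Categories named in the SOP classification prompt of this commit.
CATEGORIES = ("privacy_policy", "cookie_banner", "terms_of_service",
              "imprint", "dpa", "other")


def normalize_category(reply: str) -> str:
    """Map a free-form model reply to one known category (hypothetical helper)."""
    # Lowercase and turn spaces/hyphens into underscores so "Cookie Banner" matches.
    text = re.sub(r"[\s\-]+", "_", reply.strip().lower())
    for category in CATEGORIES:
        # Require non-letter boundaries so "dpa" does not fire inside another word.
        if re.search(rf"(?<![a-z]){category}(?![a-z])", text):
            return category
    # Assumption: anything unrecognized is treated as "other".
    return "other"
```

A stricter variant could reject unrecognized replies instead of falling back, which may be preferable if a wrong `domain` value would distort the later assessment.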
@@ -0,0 +1,34 @@
+#!/usr/bin/env bash
+#
+# fetch-and-analyze.sh — Fetch a URL and extract clean text for compliance analysis.
+#
+# Usage: bash fetch-and-analyze.sh <url> [max_chars]
+#
+# Outputs clean text to stdout, truncated to max_chars (default: 4000).
+
+set -euo pipefail
+
+URL="${1:?Usage: fetch-and-analyze.sh <url> [max_chars]}"
+MAX_CHARS="${2:-4000}"
+
+# Fetch page with reasonable timeout and user agent
+HTML=$(curl -sL --max-time 30 \
+  -H "User-Agent: Mozilla/5.0 (compatible; BreakPilot-Compliance-Agent/1.0)" \
+  "$URL" 2>/dev/null || echo "")
+
+if [ -z "$HTML" ]; then
+  echo "ERROR: Could not fetch $URL" >&2
+  exit 1
+fi
+
+# Strip HTML: remove style/script blocks (including multi-line ones), then all tags,
+# decode common entities (&amp; last, to avoid double decoding), normalize whitespace.
+# perl -0777 reads the whole input at once, so blocks spanning several lines match too.
+CLEAN=$(printf '%s\n' "$HTML" \
+  | perl -0777 -pe 's/<(style|script)\b[^>]*>.*?<\/\1\s*>//gis; s/<[^>]*>//g' \
+  | sed 's/&nbsp;/ /g; s/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g; s/&amp;/\&/g' \
+  | tr -s '[:space:]' ' ' \
+  | sed 's/^ //; s/ $//')
+
+# Truncate to max chars (by character, without a pipe that could fail under pipefail)
+printf '%s\n' "${CLEAN:0:MAX_CHARS}"
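
Review note on the HTML stripping above: regex-based tag removal is fragile for real pages. The following is a hedged sketch of the same idea in Python using only the standard library's `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical and not part of this commit; the 4000-character default mirrors the script's `max_chars`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; in text data.
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent elements do not run together,
        # then collapse every whitespace run to a single space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html, max_chars=4000):
    """Hypothetical helper: strip markup and truncate by characters, not bytes."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

Because the parser tracks script and style elements itself, blocks that span several lines or contain `<` characters are skipped without special handling.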
@@ -0,0 +1,35 @@
+#!/usr/bin/env bash
+#
+# send-notification.sh — Send a notification email via Mailpit SMTP.
+#
+# Usage: bash send-notification.sh <recipient> <subject> <body_text>
+#
+# Uses Mailpit's SMTP on localhost:1025 via an inline Python smtplib script.
+
+set -euo pipefail
+
+RECIPIENT="${1:?Usage: send-notification.sh <recipient> <subject> <body_text>}"
+SUBJECT="${2:?Missing subject}"
+BODY="${3:?Missing body text}"
+
+SMTP_HOST="${SMTP_HOST:-localhost}"
+SMTP_PORT="${SMTP_PORT:-1025}"
+FROM_ADDR="${SMTP_FROM_ADDR:-compliance-agent@breakpilot.local}"
+FROM_NAME="${SMTP_FROM_NAME:-BreakPilot Compliance Agent}"
+
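+python3 - "$RECIPIENT" "$SUBJECT" "$BODY" "$SMTP_HOST" "$SMTP_PORT" "$FROM_ADDR" "$FROM_NAME" <<'PY'
+import smtplib, sys
+from email.mime.text import MIMEText
+from email.mime.multipart import MIMEMultipart
+from email.utils import formataddr
+# argv instead of string interpolation: quotes in subject/body cannot break the Python source
+recipient, subject, body, host, port, from_addr, from_name = sys.argv[1:8]
+msg = MIMEMultipart('alternative')
+msg['From'] = formataddr((from_name, from_addr))
+msg['To'] = recipient
+msg['Subject'] = subject
+msg.attach(MIMEText(body, 'html', 'utf-8'))
+with smtplib.SMTP(host, int(port)) as server:
+    server.sendmail(from_addr, [recipient], msg.as_string())
+print(f'Email sent to {recipient}')
+PY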
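
Review note on the notification script: building the message can be separated from sending it, which makes the message testable without an SMTP server. The following is a minimal sketch using the standard library's `EmailMessage`. The function names `build_notification` and `send_notification` and the plain-text fallback line are assumptions; the default sender and the Mailpit host and port are taken from the script.

```python
import smtplib
from email.message import EmailMessage
from email.utils import formataddr


def build_notification(recipient, subject, body_html,
                       from_addr="compliance-agent@breakpilot.local",
                       from_name="BreakPilot Compliance Agent"):
    """Build the HTML notification as an EmailMessage (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = formataddr((from_name, from_addr))
    msg["To"] = recipient
    msg["Subject"] = subject
    # Plain-text fallback first (assumed wording), then the HTML alternative.
    msg.set_content("This message contains an HTML report.")
    msg.add_alternative(body_html, subtype="html")
    return msg


def send_notification(msg, host="localhost", port=1025):
    """Send via SMTP; the defaults match the Mailpit settings used in the script."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)
```

Since subject and body are passed as ordinary Python values, quotes or triple quotes in the report text need no escaping.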
@@ -0,0 +1,98 @@
+## Context
+
+You are a compliance analyst agent. You analyze web documents (cookie banners, privacy policies) for DSGVO (GDPR) conformity using the BreakPilot Compliance SDK.
+
+### Endpoints
+
+- **Compliance SDK:** http://localhost:8093
+- **Backend:** http://localhost:8002
+- **Mailpit SMTP:** localhost:1025
+- **Mailpit Web:** http://localhost:8025
+
+### Authentication
+
+All SDK requests require these headers:
+- `X-Tenant-ID: 9282a473-5c95-4b3a-bf78-0ecc0ec71d3e`
+- `X-User-ID: 00000000-0000-0000-0000-000000000001`
+
+## Steps
+
+### 1. Fetch Document
+
+Fetch the target URL and extract the text:
+
+```bash
+curl -sL "$URL" | perl -0777 -pe 's/<(style|script)\b[^>]*>.*?<\/\1\s*>//gis; s/<[^>]*>//g; s/&nbsp;/ /g; s/&amp;/&/g; s/\s+/ /g' | head -c 4000
+```
+
+Store the result as `$DOCUMENT_TEXT`.
+
+### 2. Classify Document
+
+Send the text to the SDK for classification:
+
+```bash
+curl -s -X POST http://localhost:8093/sdk/v1/llm/chat \
+  -H "Content-Type: application/json" \
+  -H "X-Tenant-ID: 9282a473-5c95-4b3a-bf78-0ecc0ec71d3e" \
+  -H "X-User-ID: 00000000-0000-0000-0000-000000000001" \
+  -d '{
+    "messages": [
+      {"role": "system", "content": "Klassifiziere das folgende Dokument in GENAU EINE Kategorie: privacy_policy, cookie_banner, terms_of_service, imprint, dpa, other. Antworte NUR mit dem Kategorienamen."},
+      {"role": "user", "content": '"$(printf '%s' "$DOCUMENT_TEXT" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')"'}
+    ]
+  }'
+```
+
+### 3. Analyze Compliance
+
+Run a UCCA assessment:
+
+```bash
+curl -s -X POST http://localhost:8093/sdk/v1/ucca/assess \
+  -H "Content-Type: application/json" \
+  -H "X-Tenant-ID: 9282a473-5c95-4b3a-bf78-0ecc0ec71d3e" \
+  -H "X-User-ID: 00000000-0000-0000-0000-000000000001" \
+  -d '{
+    "use_case_text": '"$(printf '%s' "$DOCUMENT_TEXT" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')"',
+    "domain": "'"$CLASSIFICATION"'",
+    "data_categories": ["personal_data", "tracking", "cookies", "third_party_sharing"]
+  }'
+```
+
+Record: `risk_score`, `risk_level`, `escalation_level`, `triggered_rules`, `required_controls`.
+
+### 4. Prepare Summary
+
+Create a manager report in German with:
+- **Document type:** (from step 2)
+- **Source:** (URL)
+- **Risk assessment:** (risk_level + risk_score from step 3)
+- **Key findings:** (triggered_rules, summarized)
+- **Required measures:** (required_controls, summarized)
+- **Recommendation:** (recommended action based on escalation_level)
+
+### 5. Determine Responsible Role
+
+Based on the `escalation_level` from step 3:
+- **E0** → No action needed, automatically compliant
+- **E1** → Data protection team lead (Teamleitung Datenschutz)
+- **E2** → Data protection officer (Datenschutzbeauftragter, DSB)
+- **E3** → DSB + legal department (joint decision)
+
+### 6. Send Notification Email
+
+Send a notification to the responsible role:
+
+```bash
+curl -s -X POST http://localhost:8002/api/compliance/agent/notify \
+  -H "Content-Type: application/json" \
+  -d '{
+    "recipient": "dsb@breakpilot.local",
+    "subject": "Compliance-Finding: '"$CLASSIFICATION"' — '"$URL"'",
+    "body_html": '"$(printf '%s' "$MANAGER_SUMMARY_HTML" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')"',
+    "role": "'"$RESPONSIBLE_ROLE"'"
+  }'
+```
+
+Check the result in Mailpit: http://localhost:8025
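
Review note on steps 5 and 6 of the SOP: the E0-E3 mapping and the notify request body can be expressed as plain data plus JSON serialization. The following is a minimal Python sketch. The names `ROLE_BY_LEVEL`, `responsible_role`, and `notify_payload` are hypothetical; the role labels and payload fields follow the SOP text, and the subject separator is simplified to a hyphen.

```python
import json

# Escalation level -> responsible role, as listed in step 5 of the SOP.
ROLE_BY_LEVEL = {
    "E0": None,                             # no action needed
    "E1": "Teamleitung Datenschutz",        # data protection team lead
    "E2": "Datenschutzbeauftragter (DSB)",  # data protection officer
    "E3": "DSB + Rechtsabteilung",          # DSB + legal, joint decision
}


def responsible_role(escalation_level):
    """Return the role for a level, or None when no notification is needed."""
    level = escalation_level.strip().upper()
    if level not in ROLE_BY_LEVEL:
        raise ValueError(f"unknown escalation level: {escalation_level!r}")
    return ROLE_BY_LEVEL[level]


def notify_payload(recipient, classification, url, summary_html, role):
    """Serialize the notify request body; json.dumps escapes quotes and newlines."""
    return json.dumps({
        "recipient": recipient,
        "subject": f"Compliance-Finding: {classification} - {url}",
        "body_html": summary_html,
        "role": role,
    }, ensure_ascii=False)
```

Raising on an unknown level is a design choice in this sketch: a silent default could route a high-risk finding to nobody.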
@@ -0,0 +1,15 @@
+[sop]
+name = "compliance-analyst"
+description = "Fetch a web document (cookie banner, privacy policy), analyze for DSGVO compliance via BreakPilot SDK, assign to responsible role, notify via email"
+version = "1.0.0"
+priority = "normal"
+execution_mode = "supervised"
+max_concurrent = 1
+cooldown_secs = 60
+
+[[triggers]]
+type = "manual"
+
+[[triggers]]
+type = "webhook"
+path = "/sop/compliance-analyst"
@@ -0,0 +1,96 @@
+#!/usr/bin/env bash
+#
+# test_sop_workflow.sh — End-to-end test for the compliance-analyst SOP.
+#
+# Prerequisites:
+#   - Compliance SDK running on localhost:8093
+#   - Backend running on localhost:8002
+#   - Ollama running on localhost:11434 with qwen model
+#   - Mailpit running (SMTP on 1025, Web on 8025)
+#   - ZeroClaw installed
+
+set -euo pipefail
+
+SDK="http://localhost:8093"
+BACKEND="http://localhost:8002"
+OLLAMA="http://localhost:11434"
+MAILPIT="http://localhost:8025"
+TENANT="9282a473-5c95-4b3a-bf78-0ecc0ec71d3e"
+USER_ID="00000000-0000-0000-0000-000000000001"
+
+red()   { printf '\033[31m✗ %s\033[0m\n' "$*"; }
+green() { printf '\033[32m✓ %s\033[0m\n' "$*"; }
+
+echo "═══ Compliance Agent SOP — E2E Test ═══"
+echo ""
+
+# Step 1: Health checks
+echo "── Step 1: Service Health ──"
+curl -sf "$SDK/health" >/dev/null && green "SDK healthy" || red "SDK unreachable"
+curl -sf "$BACKEND/health" >/dev/null && green "Backend healthy" || red "Backend unreachable"
+curl -sf "$OLLAMA/api/tags" >/dev/null && green "Ollama running" || red "Ollama unreachable"
+
+# Step 2: Test document fetch
+echo ""
+echo "── Step 2: Document Fetch ──"
+TEXT=$(bash "$(dirname "$0")/../scripts/fetch-and-analyze.sh" "https://www.google.com/intl/de/policies/privacy/" 2000)
+CHARS=${#TEXT}
+if [ "$CHARS" -gt 100 ]; then
+  green "Fetched $CHARS chars from Google Privacy Policy"
+else
+  red "Fetch returned too little text ($CHARS chars)"
+  exit 1
+fi
+
+# Step 3: Test LLM classification
+echo ""
+echo "── Step 3: LLM Classification ──"
+CLASSIFY_RESULT=$(curl -sf -X POST "$SDK/sdk/v1/llm/chat" \
+  -H "Content-Type: application/json" \
+  -H "X-Tenant-ID: $TENANT" \
+  -H "X-User-ID: $USER_ID" \
+  -d "{
+    \"messages\": [
+      {\"role\": \"system\", \"content\": \"Klassifiziere: privacy_policy, cookie_banner, terms_of_service, imprint, dpa, other. Antworte NUR mit dem Kategorienamen.\"},
+      {\"role\": \"user\", \"content\": $(printf '%s' "${TEXT:0:1000}" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')}
+    ]
+  }" 2>&1) || true
+
+if echo "$CLASSIFY_RESULT" | grep -qi "privacy_policy\|cookie\|terms\|imprint\|dpa"; then
+  green "Classification: $(echo "$CLASSIFY_RESULT" | python3 -c 'import json,sys; d=json.load(sys.stdin); print(d.get("response","").strip()[:50])' 2>/dev/null || echo "$CLASSIFY_RESULT" | head -c 50)"
+else
+  echo "  Classification result: $(echo "$CLASSIFY_RESULT" | head -c 100)"
+  red "Classification did not return expected category (may still be valid)"
+fi
+
+# Step 4: Test notification endpoint
+echo ""
+echo "── Step 4: Agent Notification ──"
+NOTIFY_RESULT=$(curl -sf -X POST "$BACKEND/api/compliance/agent/notify" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "recipient": "dsb@breakpilot.local",
+    "subject": "E2E Test: Compliance-Finding",
+    "body_html": "<h2>Test-Benachrichtigung</h2><p>Automatischer E2E-Test des Compliance-Agent SOP.</p>",
+    "role": "Datenschutzbeauftragter"
+  }' 2>&1) || true
+
+if echo "$NOTIFY_RESULT" | grep -qi "sent\|success\|ok"; then
+  green "Notification sent"
+else
+  echo "  Notify result: $(echo "$NOTIFY_RESULT" | head -c 100)"
+  red "Notification endpoint returned unexpected result"
+fi
+
+# Step 5: Check Mailpit
+echo ""
+echo "── Step 5: Mailpit Check ──"
+MAIL_COUNT=$(curl -sf "$MAILPIT/api/v1/messages" 2>/dev/null | python3 -c 'import json,sys; d=json.load(sys.stdin); print(d.get("total",0))' 2>/dev/null || echo "0")
+if [ "$MAIL_COUNT" -gt 0 ]; then
+  green "Mailpit has $MAIL_COUNT message(s)"
+else
+  red "No messages in Mailpit (check SMTP connectivity)"
+fi
+
+echo ""
+echo "═══ E2E Test Complete ═══"
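
Review note on step 5 of the test: a message count above zero also passes when Mailpit still holds mail from an earlier run. The following is a hedged Python sketch of checking for the specific test message instead. The helper names are hypothetical, and the field names `messages`, `Subject`, `To`, and `Address` are assumptions about the Mailpit listing format that should be verified against the running instance.

```python
import json
from urllib.request import urlopen


def has_message(listing, subject, recipient):
    """True if the listing contains a mail with this subject and recipient."""
    for message in listing.get("messages", []):
        # Assumed shape: "To" is a list of {"Address": ..., "Name": ...} objects.
        to_addresses = {entry.get("Address") for entry in message.get("To") or []}
        if message.get("Subject") == subject and recipient in to_addresses:
            return True
    return False


def fetch_listing(base_url="http://localhost:8025"):
    """Fetch the listing from the endpoint the test script already queries."""
    with urlopen(f"{base_url}/api/v1/messages", timeout=10) as response:
        return json.load(response)
```

In the test this could be used as `has_message(fetch_listing(), "E2E Test: Compliance-Finding", "dsb@breakpilot.local")`, with subject and recipient taken from step 4.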