refactor: split agent_analyze_routes (420→309 LOC) + agent docs + migration
- Extracted website compliance checks + helpers to website_compliance_checks.py
- Created agent documentation (zeroclaw/docs/compliance-agent.md)
- DB migration 086 executed (compliance_agent_scans table)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,114 @@
# Compliance Agent: Documentation

## Overview

The Compliance Agent automatically analyzes websites and documents for GDPR compliance.
It combines website scanning, LLM analysis, the Control Library, and Playwright browser tests
into a comprehensive compliance audit.

## 5 Analysis Modes

### 1. Quick Analysis
Classify and assess a single URL.
- Qwen classifies the document type (privacy policy, cookie banner, terms and conditions, imprint)
- The LLM extracts intake flags (14 categories)
- The UCCA assessment rates the risk
- The relevance filter removes false-positive controls
- An email notification is sent to the responsible role

### 2. Website Scan
Multi-page crawl with service-provider matching.
- A Playwright browser scans 5-15 pages (JS rendering, menu clicks)
- Detects 82+ services (tracking, CDN, chatbots, payment, marketing)
- Target/actual comparison: privacy policy text vs. services actually embedded
- Mandatory content check: Art. 13 GDPR (9 fields) + §5 TMG (5 fields)
- Text block referencing: original text, position, suggested correction
- Lit. mapping: checks whether the correct legal basis (Art. 6(1) lit. a-f GDPR) is used

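The target/actual comparison can be pictured as two set differences. The following is a minimal sketch, not the project's implementation: `ServiceGap` and `compare_services` are hypothetical names, and it assumes both sides have already been reduced to plain service names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceGap:
    """Outcome of comparing declared services with detected services."""
    undeclared: frozenset[str]  # embedded on the site, missing from the policy
    unused: frozenset[str]      # named in the policy, not found on the site


def compare_services(declared: set[str], detected: set[str]) -> ServiceGap:
    """Compare service names case-insensitively using set differences."""
    declared_norm = {name.strip().lower() for name in declared}
    detected_norm = {name.strip().lower() for name in detected}
    return ServiceGap(
        undeclared=frozenset(detected_norm - declared_norm),
        unused=frozenset(declared_norm - detected_norm),
    )


# Hypothetical input: the policy names two services, the crawl finds two.
gap = compare_services({"Google Analytics", "Stripe"}, {"google analytics", "Hotjar"})
```

Here `gap.undeclared` holds services that would need to be added to the privacy policy, while `gap.unused` holds entries that may be stale.
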
### 3. Cookie Test
Three-phase consent test with a real Chromium browser.
- Phase A: What loads BEFORE consent? (violation of §25 TDDDG)
- Phase B: What loads AFTER rejection? (CRITICAL if tracking continues)
- Phase C: What loads AFTER acceptance? (checked against the cookie policy)
- Phases D-F: Test individual categories (statistics, marketing, functional)
- 10 CMP-specific selectors (Cookiebot, OneTrust, Didomi, etc.)

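How phases A to C translate into findings can be sketched as a pure function. This is an illustration under assumptions: `evaluate_consent_phases` is a hypothetical name, and each argument is assumed to be the set of non-essential service names observed in that phase (the real scanner's data model may differ).

```python
def evaluate_consent_phases(
    before_consent: set[str],
    after_reject: set[str],
    after_accept: set[str],
    cookie_policy: set[str],
) -> list[str]:
    """Turn the services seen in phases A-C into human-readable findings."""
    findings = []
    # Phase A: anything non-essential that loads before consent is a finding.
    for service in sorted(before_consent):
        findings.append(f"A: {service} loads before consent")
    # Phase B: tracking that survives a rejection is the critical case.
    for service in sorted(after_reject):
        findings.append(f"B (critical): {service} still loads after rejection")
    # Phase C: accepted services must also be listed in the cookie policy.
    for service in sorted(after_accept - cookie_policy):
        findings.append(f"C: {service} is not listed in the cookie policy")
    return findings


# Hypothetical observations for one site.
report = evaluate_consent_phases({"ga"}, set(), {"ga", "hotjar"}, {"ga"})
```
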
### 4. Comparison
Scan 2-5 websites in parallel and compare their compliance.
- Comparison table: risk, findings, services, imprint, cookie banner

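One way to run the scans in parallel is a thread pool. The sketch below assumes a caller-supplied `scan` function that returns a dict per URL; `compare_sites` and the `risk` key are hypothetical, and the actual route may well use async I/O instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def compare_sites(urls: list[str], scan: Callable[[str], dict]) -> list[dict]:
    """Scan 2-5 URLs concurrently and return one comparison row per URL."""
    if not 2 <= len(urls) <= 5:
        raise ValueError("expected between 2 and 5 URLs")
    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        # pool.map yields results in input order, so rows line up with urls.
        results = list(pool.map(scan, urls))
    return [{"url": url, **result} for url, result in zip(urls, results)]


def fake_scan(url: str) -> dict:
    """Stand-in for a real scan; the risk value is made up for the demo."""
    return {"risk": len(url)}


rows = compare_sites(["a.de", "bb.de"], fake_scan)
```
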
### 5. Login Test
Check the customer area after login.
- §312k BGB: cancellation button (2 clicks)
- Art. 17 GDPR: delete account
- Art. 20 GDPR: export data
- Art. 7(3) GDPR: withdraw consent
- Art. 15 GDPR: view profile data

## API Endpoints

### Backend (Port 8002)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/compliance/agent/analyze` | Quick analysis |
| POST | `/api/compliance/agent/scan` | Website scan |
| POST | `/api/compliance/agent/notify` | Send email |
| POST | `/api/compliance/agent/scans` | Save scan |
| GET | `/api/compliance/agent/scans` | Scan history |
| POST | `/api/compliance/agent/scans/pdf` | PDF export |
| POST | `/api/compliance/agent/compare` | Multi-website comparison |
| POST | `/api/compliance/agent/monitored-urls` | Add URL to monitoring |
| POST | `/api/compliance/agent/run-scheduled` | Trigger scheduled scans |

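As a rough illustration of calling the backend, the sketch below builds (but does not send) a POST request for the `analyze` endpoint. The host `localhost` and the `url` payload field are assumptions; the request schema is not documented here, so check the route's Pydantic model before relying on it.

```python
import json
from urllib.request import Request

# Assumption: the backend is reachable locally on the documented port.
BASE_URL = "http://localhost:8002"


def build_analyze_request(target_url: str) -> Request:
    """Build a JSON POST request for the quick-analysis endpoint."""
    # Assumption: the endpoint accepts a JSON body with a "url" field.
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/compliance/agent/analyze",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_analyze_request("https://example.com")
```

Passing `request` to `urllib.request.urlopen` would send it once the backend is running.
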
### Consent-Tester (Port 8094)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/scan` | Three-phase cookie test |
| POST | `/website-scan` | Playwright website scan |
| POST | `/authenticated-scan` | Login test |
| GET | `/health` | Health check |

## Service Registry

82+ services in 15 categories:
Tracking, Marketing, Newsletter, CDN, Chatbots, Payment, Heatmaps,
A/B Testing, Tag Manager, Push, Video, Social, Error Tracking, CRM, Accessibility.

File: `backend-compliance/compliance/services/service_registry.py`

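A registry of this kind is often a mapping from script domains to a service name and category. The sketch below is illustrative only: `REGISTRY` and `detect_services` are hypothetical, the three entries are examples, and the real structure in `service_registry.py` may look different.

```python
from urllib.parse import urlparse

# Hypothetical excerpt: script domain -> (service name, category).
REGISTRY = {
    "google-analytics.com": ("Google Analytics", "Tracking"),
    "hotjar.com": ("Hotjar", "Heatmaps"),
    "js.stripe.com": ("Stripe", "Payment"),
}


def detect_services(script_urls: list[str]) -> dict[str, str]:
    """Map each recognized script URL to its service name and category."""
    found = {}
    for url in script_urls:
        host = urlparse(url).hostname or ""
        for domain, (name, category) in REGISTRY.items():
            # Match the registered domain itself or any of its subdomains.
            if host == domain or host.endswith("." + domain):
                found[name] = category
    return found


detected = detect_services([
    "https://www.google-analytics.com/analytics.js",
    "https://js.stripe.com/v3/",
    "https://example.com/app.js",
])
```
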
## Pre-Launch vs. Post-Launch

| Mode | Tone | Recommendation |
|------|------|----------------|
| Pre-Launch | "Fix before publication" | Ready-to-use privacy policy text blocks |
| Post-Launch | "WARNING: publicly visible!" | Immediate remediation |

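The two modes amount to a small lookup that changes how a finding is worded. A minimal sketch, assuming hypothetical names (`LaunchMode`, `MESSAGES`, `format_finding`) and English renderings of the tone and recommendation texts from the table:

```python
from enum import Enum


class LaunchMode(Enum):
    PRE_LAUNCH = "pre-launch"
    POST_LAUNCH = "post-launch"


# Tone and recommendation per mode, taken from the table above.
MESSAGES = {
    LaunchMode.PRE_LAUNCH: (
        "Fix before publication",
        "Ready-to-use privacy policy text blocks",
    ),
    LaunchMode.POST_LAUNCH: (
        "WARNING: publicly visible!",
        "Immediate remediation",
    ),
}


def format_finding(mode: LaunchMode, finding: str) -> str:
    """Prefix a finding with the mode's tone and append its recommendation."""
    tone, recommendation = MESSAGES[mode]
    return f"[{tone}] {finding} -> {recommendation}"


message = format_finding(LaunchMode.POST_LAUNCH, "Hotjar missing from policy")
```
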
## Architecture

```
Browser (Frontend)
  |
  ├── /sdk/agent (Next.js, 5 Tabs)
  |
  ├── Next.js API Proxies (/api/sdk/v1/agent/*)
  |     |
  |     ├── Backend (FastAPI, Port 8002)
  |     |     ├── agent_analyze_routes.py
  |     |     ├── agent_scan_routes.py (+ Playwright integration)
  |     |     ├── agent_history_routes.py
  |     |     ├── agent_recurring_routes.py
  |     |     └── agent_compare_routes.py
  |     |
  |     └── Consent-Tester (FastAPI + Playwright, Port 8094)
  |           ├── consent_scanner.py (3 phases + categories)
  |           ├── playwright_scanner.py (website scan)
  |           ├── authenticated_scanner.py (login test)
  |           ├── banner_detector.py (10 CMPs)
  |           ├── category_tester.py (category toggles)
  |           └── script_analyzer.py (service detection)
  |
  ├── Qwen 3.5:35b-a3b (Ollama, Port 11434)
  └── Mailpit (SMTP 1025, Web 8025)
```