docs: Complete agent architecture reference for reuse in other agents
Full documentation of the ZeroClaw compliance agent architecture:
- System overview diagram (Frontend → Backend → LLM → Playwright)
- Detailed request flow for Website-Scan mode (7 steps)
- All 5 components: Frontend, Backend, Consent-Tester, Ollama, Soul Files
- 20 banner checks across 3 files
- LLM call patterns (/api/generate + /api/chat + think-mode stripping)
- Blueprint for creating new agents (5 steps: Soul, Route, Page, Proxy, Docker)
- Timeouts, environment variables, file reference with LOC counts

Designed as a reusable blueprint for Sales, HR, Finance, or other agents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# ZeroClaw Agent Architecture: Reference Documentation

This document describes the complete architecture of the compliance agent
so that it can serve as a blueprint for further agents (e.g. Sales, HR, Finance).

---

## 1. Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ FRONTEND (Next.js)                                              │
│ admin-compliance/app/sdk/agent/page.tsx                         │
│ 5 tabs: Quick | Scan | Cookie-Test | Compare | Login-Test       │
│                                                                 │
│ Proxy routes: /api/sdk/v1/agent/*                               │
│ → forward to the backend (avoids SSL certificate issues)        │
└──────────────────────┬──────────────────────────────────────────┘
                       │ POST /api/sdk/v1/agent/{mode}
                       ▼
┌─────────────────────────────────────────────────────────────────┐
│ BACKEND (Python/FastAPI)                                        │
│ backend-compliance:8002                                         │
│                                                                 │
│ Orchestrates the entire flow:                                   │
│ 1. Calls the Playwright service for website crawling            │
│ 2. Detects services via regex (82+ patterns)                    │
│ 3. Calls the LLM for classification/corrections                 │
│ 4. Checks mandatory fields (Art. 13 GDPR, §355, §305ff BGB)     │
│ 5. Sends the e-mail report                                      │
│                                                                 │
│ Routes:                                                         │
│ - agent_analyze_routes.py (quick analysis)                      │
│ - agent_scan_routes.py (website scan + DSI discovery)           │
│ - agent_compare_routes.py (multi-website comparison)            │
│ - agent_notification_routes.py (e-mail)                         │
│ - agent_history_routes.py (scan history + PDF)                  │
└──────────┬──────────────────────┬───────────────────────────────┘
           │                      │
           ▼                      ▼
┌─────────────────────┐    ┌──────────────────────────────────────┐
│ OLLAMA (LLM)        │    │ CONSENT-TESTER (Playwright)          │
│ Mac Mini (local)    │    │ consent-tester:8094                  │
│                     │    │                                      │
│ qwen3.5:35b-a3b     │    │ Headless Chromium browser:           │
│ Port: 11434         │    │ - /scan (3-phase cookie test)        │
│                     │    │ - /website-scan (JS rendering)       │
│ Endpoints:          │    │ - /dsi-discovery (document search)   │
│ /api/chat           │    │ - /authenticated-scan (login test)   │
│ /api/generate       │    │                                      │
│ /api/embeddings     │    │ 20 banner checks (legal)             │
└─────────────────────┘    └──────────────────────────────────────┘
```

---

## 2. Request Flow (Example: Website Scan)
```
User clicks "Start scan" in the frontend
    │
    ▼
POST /api/sdk/v1/agent/scan { url, mode, recipient }
    │
    ├── Next.js API route (proxy)
    │     admin-compliance/app/api/sdk/v1/agent/scan/route.ts
    │     → forwards the request to backend-compliance:8002
    │     → timeout: 5 minutes
    │
    ▼
Backend: agent_scan_routes.py :: scan_website_endpoint()
    │
    ├── Step 1: Playwright website scan
    │     POST http://consent-tester:8094/website-scan
    │     → Chromium opens the URL and renders JavaScript
    │     → clicks through navigation menus, discovers subpages
    │     → returns the HTML of all pages (max. 50 pages, 3 min timeout)
    │
    ├── Step 1b: DSI document discovery
    │     POST http://consent-tester:8094/dsi-discovery
    │     → finds all legal documents (privacy information, terms and
    │       conditions, withdrawal policy, cookie policy)
    │     → opens accordions, tabs, sidebars
    │     → follows cross-domain links (e.g. help.instagram.com)
    │     → extracts the text of every document
    │     → the backend checks completeness (Art. 13 GDPR checklist)
    │
    ├── Step 2: Service detection (deterministic, no LLM)
    │     service_registry.py: 82+ regex patterns matched against the HTML
    │     → Google Analytics, Facebook Pixel, Stripe, Hotjar, etc.
    │     → each service has: category, provider, country, legal basis
    │
    ├── Step 3: Privacy-policy (DSE) extraction (LLM)
    │     POST {OLLAMA_URL}/api/generate
    │     Prompt (in German): "Extract all services mentioned in this privacy policy..."
    │     → Qwen 3.5 reads the privacy-policy text and returns JSON
    │     → fallback: regex search if the LLM call fails
    │
    ├── Step 4: Target/actual (SOLL/IST) comparison
    │     SOLL = services listed in the privacy policy (what is documented)
    │     IST  = services found on the website (what actually runs)
    │     → "undocumented": on the website, not in the privacy policy (violation!)
    │     → "documented": in both, OK
    │     → "outdated": in the privacy policy, no longer on the website
    │
    ├── Step 5: Mandatory-content check
    │     mandatory_content_checker.py: 9 Art. 13 GDPR fields
    │     legal_basis_validator.py: is the stated legal basis correct?
    │     dsi_document_checker.py: checks every discovered document
    │
    ├── Step 6: Generate corrections (LLM, only if there are findings)
    │     POST {OLLAMA_URL}/api/generate
    │     Prompt (in German): "Create a privacy-policy text block for Google Analytics..."
    │     → Qwen 3.5 generates ready-to-insert text
    │
    ├── Step 7: Send the e-mail
    │     smtp_sender.py → Mailpit (SMTP :1025)
    │     HTML report with all findings + corrections
    │
    └── Response back to the frontend
          ScanResponse { pages_scanned, services[], findings[],
                         discovered_documents[], summary }
```
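Steps 2 and 4 come down to regex matching followed by set arithmetic. The sketch below illustrates this under some assumptions. `Service`, `REGISTRY`, `detect_services` and `compare` are hypothetical names. The two patterns are illustrative and not taken from `service_registry.py`, which holds 82+ patterns plus provider, country and legal basis for each service. Only the three status labels come from the flow above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    # Hypothetical subset of the metadata service_registry.py keeps per service
    name: str
    category: str
    pattern: re.Pattern


# Two illustrative patterns (the real registry has 82+)
REGISTRY = [
    Service("Google Analytics", "analytics",
            re.compile(r"googletagmanager\.com/gtag/js|google-analytics\.com")),
    Service("Hotjar", "analytics", re.compile(r"static\.hotjar\.com")),
]


def detect_services(html: str) -> set[str]:
    """Step 2: deterministic detection by regex, no LLM involved."""
    return {s.name for s in REGISTRY if s.pattern.search(html)}


def compare(soll: set[str], ist: set[str]) -> dict[str, list[str]]:
    """Step 4: SOLL (listed in the privacy policy) vs. IST (found on the site)."""
    return {
        "undocumented": sorted(ist - soll),  # runs on the site, not disclosed
        "documented": sorted(ist & soll),    # disclosed and running
        "outdated": sorted(soll - ist),      # disclosed, no longer running
    }


html = '<script src="https://static.hotjar.com/c/hotjar-1.js"></script>'
print(compare(soll={"Google Analytics"}, ist=detect_services(html)))
# prints: {'undocumented': ['Hotjar'], 'documented': [], 'outdated': ['Google Analytics']}
```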
---

## 3. Components in Detail

### 3.1 Frontend (Next.js)

**Page:** `admin-compliance/app/sdk/agent/page.tsx`

**Hooks:** `admin-compliance/app/sdk/agent/_hooks/`

**Components:** `admin-compliance/app/sdk/agent/_components/`

| Component | Purpose |
|---|---|
| ScanResult.tsx | Service table, findings, PDF download |
| ConsentTestResult.tsx | 3-phase display + banner checks |
| CompareResult.tsx | Side-by-side comparison |
| AuthTestResult.tsx | 5 login checks |
| TextReference.tsx | Original text + suggested correction |
| FollowUpQuestions.tsx | Yes/no questions |

**Proxy pattern:** Every API call goes through a Next.js API route
(`app/api/sdk/v1/agent/*/route.ts`), which forwards it to the backend on the server side.
This avoids CORS and SSL certificate problems: the browser only ever talks to its own
origin, and the backend is reached server-to-server.
### 3.2 Backend (Python/FastAPI)

**Service:** `backend-compliance:8002`

**Prefix:** `/api/compliance/agent/`

| Route file | Endpoint | Purpose |
|---|---|---|
| agent_analyze_routes.py | POST /analyze | Quick analysis (1 URL) |
| agent_scan_routes.py | POST /scan | Deep scan + DSI discovery |
| agent_compare_routes.py | POST /compare | Multi-URL comparison |
| agent_notification_routes.py | POST /notify | Send e-mail |
| agent_history_routes.py | GET/POST /scans | Scan history |
| agent_recurring_routes.py | */monitored-urls | Recurring scans |

**Key design principle:** The backend orchestrates but performs no browser operations itself.
Anything that needs a browser is delegated to the consent-tester.
### 3.3 Consent-Tester (Playwright/FastAPI)

**Service:** `consent-tester:8094`

**Dockerfile:** Chromium is installed as `appuser` (permission fix)

| Endpoint | Purpose | Files |
|---|---|---|
| POST /scan | 3-phase cookie test | consent_scanner.py, banner_text_checker.py, banner_advanced_checks.py |
| POST /website-scan | JS-rendered crawling | playwright_scanner.py |
| POST /dsi-discovery | Document search | dsi_discovery.py |
| POST /authenticated-scan | Login + rights check | authenticated_scanner.py |
| GET /health | Health check | main.py |

**20 banner checks**, part of the three `/scan` files listed above; the checks are numbered in the two checker modules:

- banner_text_checker.py (checks 1-11)
- banner_advanced_checks.py (checks 12-20)
### 3.4 LLM (Ollama)

**Model:** Qwen 3.5 (`qwen3.5:35b-a3b`, 23.9 GB, running locally on a Mac Mini)

**Port:** 11434 (reachable from the containers as `host.docker.internal:11434`)

**Two call patterns:**

```python
import os

import httpx  # assumption: the async client used here is httpx

OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://host.docker.internal:11434")
MODEL = os.environ.get("OLLAMA_MODEL", "qwen3.5:35b-a3b")


# Pattern 1: /api/generate (plain completion)
async def generate(prompt: str) -> str:
    async with httpx.AsyncClient(timeout=180.0) as client:
        resp = await client.post(f"{OLLAMA_URL}/api/generate", json={
            "model": MODEL,
            "prompt": prompt,  # German prompt, e.g. "Erstelle einen DSE-Textbaustein..."
            "stream": False,
        })
        resp.raise_for_status()
        return resp.json()["response"]


# Pattern 2: /api/chat (chat with a system prompt)
async def chat(prompt: str) -> str:
    async with httpx.AsyncClient(timeout=180.0) as client:
        resp = await client.post(f"{OLLAMA_URL}/api/chat", json={
            "model": MODEL,
            "messages": [
                # German system prompt: "You are a GDPR expert..."
                {"role": "system", "content": "Du bist ein DSGVO-Experte..."},
                {"role": "user", "content": prompt},
            ],
            "stream": False,
            "format": "json",  # forces JSON output
        })
        resp.raise_for_status()
        return resp.json()["message"]["content"]
```
**Mind the think mode:** Qwen 3.5 wraps its reasoning in `<think>...</think>` tags.
These must be removed from the response before it is used:

```python
import re

# Remove the reasoning block; DOTALL lets "." match across line breaks
raw = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
```
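The think-tag stripping above and the regex fallback from step 3 of the request flow can be combined into a single parsing helper. This is a sketch under assumptions: the helper names, the expected JSON shape (`{"services": [...]}`) and the list of known service names are hypothetical. It also drops an unclosed `<think>` block, which can be left over if generation stops before the closing tag.

```python
import json
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
# Opening tag without a closing tag, e.g. when generation stopped mid-reasoning
UNCLOSED_THINK = re.compile(r"<think>.*\Z", re.DOTALL)


def strip_think(raw: str) -> str:
    """Remove <think> reasoning blocks (closed or unclosed) from a model response."""
    return UNCLOSED_THINK.sub("", THINK_BLOCK.sub("", raw)).strip()


def extract_services(raw: str, policy_text: str, known: list[str]) -> list[str]:
    """Parse the LLM's JSON answer; fall back to a regex search of the policy text."""
    text = strip_think(raw)
    # Tolerate extra prose or a Markdown fence around the JSON object
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            services = json.loads(text[start:end + 1]).get("services")
            if isinstance(services, list):
                return [str(s) for s in services]
        except json.JSONDecodeError:
            pass
    # Fallback: case-insensitive search for known service names
    return [name for name in known
            if re.search(re.escape(name), policy_text, re.IGNORECASE)]


print(extract_services('<think>...</think>{"services": ["Stripe"]}', "", []))
# prints: ['Stripe']
```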
### 3.5 Soul Files (Agent Personalities)

**Path:** `admin-compliance/agent-core/soul/`

| File | Agent | Purpose |
|---|---|---|
| compliance-advisor.soul.md | Compliance Advisor | Answers legal questions (RAG) |
| drafting-agent.soul.md | Drafting Agent | Generates GDPR documents |

**Structure of a soul file:**

1. Identity (Who am I?)
2. Scope of competence (What can I do?)
3. RAG usage (Which sources?)
4. Communication style (How do I answer?)
5. Restrictions (What must I not do?)
6. Product knowledge (features of the tool)
7. FAQ (frequent questions + answers)
8. Escalation (When do I refer the user elsewhere?)

---
## 4. Blueprint: Creating a New Agent

### Step 1: Create the soul file

```markdown
# My Agent

## Identity
You are the [Name] agent. You help with [task].

## Scope of Competence
- [Topic 1]
- [Topic 2]

## Communication Style
- Factual, easy to understand
- Cite sources

## Restrictions
- No [X] advice
```
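This document does not specify how a soul file is handed to the model. One plausible approach, assumed here, is to send the file as the system message of an Ollama `/api/chat` request (call pattern 2 in section 3.4). In the sketch, `split_sections` and `build_chat_payload` are hypothetical helpers, and `SOUL_TEXT` is a shortened English example. The payload is only built, not sent.

```python
import re

SOUL_TEXT = """# My Agent

## Identity
You are the [Name] agent. You help with [task].

## Restrictions
- No [X] advice
"""


def split_sections(soul: str) -> dict[str, str]:
    """Map each '## ' heading of a soul file to the text below it."""
    sections: dict[str, str] = {}
    current = None
    for line in soul.splitlines():
        match = re.match(r"##\s+(.+)", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def build_chat_payload(soul: str, user_message: str,
                       model: str = "qwen3.5:35b-a3b") -> dict:
    """Use the whole soul file as the system prompt of an /api/chat request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": soul},
            {"role": "user", "content": user_message},
        ],
        "stream": False,
    }


print(sorted(split_sections(SOUL_TEXT)))
# prints: ['Identity', 'Restrictions']
```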
### Step 2: Create the backend route

```python
# backend-compliance/compliance/api/mein_agent_routes.py
from fastapi import APIRouter
from pydantic import BaseModel

# Assumption: the main app mounts routers under /api (as the proxy's
# /api/compliance/... URLs suggest) -> POST /api/compliance/mein-agent/analyze
router = APIRouter(prefix="/compliance/mein-agent", tags=["mein-agent"])


class AnalyzeRequest(BaseModel):
    url: str  # hypothetical input field; adapt to the agent's needs


@router.post("/analyze")
async def analyze(req: AnalyzeRequest) -> dict:
    # 1. Fetch data (Playwright, API, DB)
    # 2. Call the LLM (Ollama)
    # 3. Prepare the result
    # 4. Send the e-mail
    result = {"url": req.url, "findings": []}  # placeholder result
    return result
```
### Step 3: Create the frontend page

```
admin-compliance/app/sdk/mein-agent/
├── page.tsx          # main page with tabs
├── _hooks/           # state + API calls
└── _components/      # result display
```
### Step 4: Next.js proxy route

```typescript
// admin-compliance/app/api/sdk/v1/mein-agent/[[...path]]/route.ts
import { NextRequest, NextResponse } from 'next/server'

const BACKEND_URL = process.env.BACKEND_URL ?? 'http://backend-compliance:8002'

export async function POST(
  req: NextRequest,
  { params }: { params: Promise<{ path?: string[] }> },
) {
  const path = (await params).path?.join('/') ?? ''
  const resp = await fetch(`${BACKEND_URL}/api/compliance/mein-agent/${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: await req.text(),
    signal: AbortSignal.timeout(300_000), // 5 minutes, like the agent proxy
  })
  // Pass the backend's status code through instead of always returning 200
  return NextResponse.json(await resp.json(), { status: resp.status })
}
```
### Step 5: Register in docker-compose.yml

The agent runs as part of the backend service and needs no container of its own,
unless it needs Playwright. In that case it gets its own service, like the consent-tester.

---

## 5. Timeouts

| Component | Timeout | Reason |
|---|---|---|
| Frontend → backend proxy | 300s (5 min) | A scan can take a long time |
| Backend → Playwright | 180s (3 min) | Browser rendering + crawling |
| Backend → Ollama LLM | 180s (3 min) | Large models are slow |
| Playwright → website | 20s per page | Loading a single page |
| Exhaustive crawl | 180-300s | Safety limit for recursive crawling |
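
These limits interact. During a website scan, the backend makes several outbound calls, each with its own timeout, while the proxy in front of it gives up after 300s. If the worst-case sum exceeds that limit, the frontend can report a timeout while the backend is still working. The sketch below does this arithmetic with the values from the table. It simplifies by assuming that the four calls run strictly one after another, with one LLM call each for extraction and corrections.

```python
# Per-call timeouts from the table above (seconds)
PROXY_TIMEOUT = 300       # Frontend -> backend proxy
PLAYWRIGHT_TIMEOUT = 180  # Backend -> Playwright
LLM_TIMEOUT = 180         # Backend -> Ollama

# Outbound calls during a website scan, assumed to run sequentially
SCAN_STEPS = {
    "website-scan": PLAYWRIGHT_TIMEOUT,
    "dsi-discovery": PLAYWRIGHT_TIMEOUT,
    "dse-extraction": LLM_TIMEOUT,
    "corrections": LLM_TIMEOUT,
}


def worst_case(steps: dict[str, int]) -> int:
    """Upper bound if every sequential call runs into its own timeout."""
    return sum(steps.values())


total = worst_case(SCAN_STEPS)
print(f"worst case {total}s vs. proxy {PROXY_TIMEOUT}s -> "
      f"{'OK' if total <= PROXY_TIMEOUT else 'proxy may time out first'}")
# prints: worst case 720s vs. proxy 300s -> proxy may time out first
```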
---

## 6. Environment Variables

| Variable | Default | Description |
|---|---|---|
| OLLAMA_URL | http://host.docker.internal:11434 | Ollama LLM endpoint |
| OLLAMA_MODEL | qwen3.5:35b-a3b | Default model |
| BACKEND_URL | http://backend-compliance:8002 | Backend API |
| CONSENT_SERVICE_URL | http://consent-tester:8094 | Playwright service |
| RAG_SERVICE_URL | http://rag-service:8097 | RAG/embedding service |
| QDRANT_URL | https://qdrant-dev.breakpilot.ai | Vector database |
| SMTP_HOST | mailpit | E-mail server |
| SMTP_PORT | 1025 | SMTP port |
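
The following is a minimal sketch, using only the standard library, of how these variables could be read with the defaults from the table. `Settings` and `load_settings` are hypothetical names, and the services may read their configuration differently.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    ollama_model: str
    backend_url: str
    consent_service_url: str
    rag_service_url: str
    qdrant_url: str
    smtp_host: str
    smtp_port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each variable from the environment, falling back to the table's default."""
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://host.docker.internal:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "qwen3.5:35b-a3b"),
        backend_url=env.get("BACKEND_URL", "http://backend-compliance:8002"),
        consent_service_url=env.get("CONSENT_SERVICE_URL", "http://consent-tester:8094"),
        rag_service_url=env.get("RAG_SERVICE_URL", "http://rag-service:8097"),
        qdrant_url=env.get("QDRANT_URL", "https://qdrant-dev.breakpilot.ai"),
        smtp_host=env.get("SMTP_HOST", "mailpit"),
        smtp_port=int(env.get("SMTP_PORT", "1025")),  # env values are strings
    )


print(load_settings({"SMTP_PORT": "2525"}).smtp_port)
# prints: 2525
```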
---

## 7. File Reference

### Backend
| File | LOC | Purpose |
|---|---|---|
| agent_scan_routes.py | 448 | Scan orchestration |
| agent_scan_helpers.py | 109 | Summary + corrections |
| agent_analyze_routes.py | 309 | Quick analysis |
| agent_compare_routes.py | 94 | Multi-site comparison |
| agent_notification_routes.py | 111 | E-mail |
| service_registry.py | 508 | 82+ service patterns |
| dsi_document_checker.py | 229 | Document checklists |
| mandatory_content_checker.py | 274 | Art. 13 mandatory fields |
| legal_basis_validator.py | 155 | Legal bases |

### Consent-Tester
| File | LOC | Purpose |
|---|---|---|
| main.py | 321 | FastAPI app + endpoints |
| consent_scanner.py | 213 | 3-phase test |
| banner_text_checker.py | 407 | Banner checks 1-11 |
| banner_advanced_checks.py | 396 | Banner checks 12-20 |
| dsi_discovery.py | 488 | Document search |
| playwright_scanner.py | 266 | Website crawling |
| authenticated_scanner.py | 230 | Login tests |

### Frontend
| File | LOC | Purpose |
|---|---|---|
| agent/page.tsx | 207 | 5-tab agent UI |
| ScanResult.tsx | 241 | Scan result |
| ConsentTestResult.tsx | 207 | Cookie test result |
| TextReference.tsx | 108 | Correction suggestions |