breakpilot-compliance/zeroclaw/scripts/fetch-and-analyze.sh
Benjamin Admin 0c0dd4e3a6 feat: ZeroClaw compliance agent — document analysis + role assignment + email
Add autonomous compliance agent that fetches web documents (cookie banners,
privacy policies), classifies them via Qwen/Ollama, assesses DSGVO compliance,
assigns to the responsible role, and sends notification emails.

Components:
- ZeroClaw SOP (6-step workflow: fetch, classify, assess, summarize, assign, notify)
- Backend: /api/compliance/agent/analyze (combined endpoint)
- Backend: /api/compliance/agent/notify (standalone email)
- Frontend: /sdk/agent page (Manager UI with URL input + results)
- Helper scripts + E2E test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 23:28:21 +02:00


#!/usr/bin/env bash
#
# fetch-and-analyze.sh — Fetch a URL and extract clean text for compliance analysis.
#
# Usage: bash fetch-and-analyze.sh <url> [max_chars]
#
# Outputs clean text to stdout, truncated to max_chars (default: 4000).
set -euo pipefail
URL="${1:?Usage: fetch-and-analyze.sh <url> [max_chars]}"
MAX_CHARS="${2:-4000}"
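# Fetch the page with a 30-second timeout and a descriptive user agent.
# --fail makes curl return non-zero on HTTP errors (4xx/5xx) instead of
# passing the error page on as if it were the document.
HTML=$(curl -sL --fail --max-time 30 \
    -H "User-Agent: Mozilla/5.0 (compatible; BreakPilot-Compliance-Agent/1.0)" \
    "$URL" 2>/dev/null || echo "")

if [ -z "$HTML" ]; then
    echo "ERROR: Could not fetch $URL" >&2
    exit 1
fi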
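# Strip HTML: remove script/style blocks, replace remaining tags with spaces,
# decode common entities, and normalize whitespace.
# Perl is used here instead of sed: sed works line by line, so it cannot remove
# script/style blocks that span several lines or whose body contains '<'.
CLEAN=$(printf '%s' "$HTML" | perl -0777 -pe '
    s/<(script|style)\b[^>]*>.*?<\/\1\s*>//gis;           # drop script/style blocks
    s/<[^>]*>/ /g;                                        # tags become spaces so words stay apart
    s/&nbsp;/ /g; s/&lt;/</g; s/&gt;/>/g; s/&quot;/"/g;   # decode common entities
    s/&amp;/&/g;                                          # decode &amp; last to avoid double-decoding
    s/\s+/ /g; s/^ //; s/ $//;                            # collapse and trim whitespace
')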
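# Truncate to MAX_CHARS characters. Bash substring expansion counts characters
# (in a UTF-8 locale) rather than bytes, so multi-byte characters are not cut
# in half. It also avoids the SIGPIPE failure that `echo ... | head -c` can
# trigger on large pages under `set -o pipefail`.
printf '%s\n' "${CLEAN:0:$MAX_CHARS}"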