Compare commits
167 commits: 19ee99a3bc ... feature/do
@@ -2,23 +2,55 @@
 ## Development environment (IMPORTANT - ALWAYS READ FIRST)

-### Two-machine setup
+### Two-machine setup + Coolify

 | Device | Role | Tasks |
 |--------|------|------|
-| **MacBook** | Client | Claude terminal, browser (frontend tests) |
-| **Mac Mini** | Server | Docker, all services, code execution, tests, Git |
+| **MacBook** | Development | Claude terminal, code development, browser (frontend tests) |
+| **Mac Mini** | Local server | Docker for local dev/tests (NOT for production!) |
+| **Coolify** | Production | Automatic build + deploy on push to gitea |

-**IMPORTANT:** Development takes place entirely on the **Mac Mini**!
+**IMPORTANT:** Code is edited directly on the MacBook in this repo. Production deployment runs automatically via Coolify.

-### SSH connection
+### Development workflow (CI/CD via Coolify)

 ```bash
-ssh macmini
-# Project directory:
-cd /Users/benjaminadmin/Projekte/breakpilot-core
+# 1. Edit code on the MacBook (this directory)
+# 2. Commit and push to BOTH remotes:
+git push origin main && git push gitea main

-# Individual commands (PREFERRED):
+# 3. DONE! A push to gitea automatically triggers:
+#    - Gitea Actions: tests
+#    - Coolify: build -> deploy
 ```

+**NEVER** click "Redeploy" manually in Coolify; Gitea Actions triggers Coolify automatically.
+**ALWAYS push to `main`**, to both origin and gitea.

+### Post-push deploy monitoring (MANDATORY after every push to gitea)

+**Whenever Claude pushes to gitea, deploy monitoring MUST run automatically afterwards:**

+1. Tell the user immediately: "Deploy started, I am monitoring the status..."
+2. Poll the health checks in the background (every 20 seconds, max 5 minutes):
+   ```bash
+   curl -sf https://api-dev.breakpilot.ai/health  # Compliance backend
+   curl -sf https://sdk-dev.breakpilot.ai/health  # AI SDK
+   ```
+3. As soon as ALL endpoints are healthy, report to the user in chat:
+   **"Deploy complete! You can test now."**
+4. If still not healthy after 5 minutes: error message pointing to the Coolify logs.

+### Local development (Mac Mini; optional, dev/tests only)

+```bash
+ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && git pull --no-rebase origin main"
+ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && /usr/local/bin/docker compose build --no-cache <service> && /usr/local/bin/docker compose up -d <service>"
+```

+### SSH connection (for local Docker/tests)

+```bash
+ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && <cmd>"
+```
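The post-push monitoring described in the hunk above (poll every 20 seconds, give up after 5 minutes, then report) can be sketched as a small POSIX shell helper. The two health URLs are taken from the doc; the `poll_until` function and its name are illustrative, not part of the repo:

```shell
#!/bin/sh
# Generic retry loop: run a probe command until it succeeds
# or the attempt budget is exhausted.
poll_until() {
  # poll_until <max_attempts> <sleep_seconds> <cmd...>
  max=$1; delay=$2; shift 2
  i=0
  while [ "$i" -lt "$max" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe both endpoints from the doc; -sf = silent, fail on HTTP errors.
check_all() {
  curl -sf https://api-dev.breakpilot.ai/health >/dev/null &&
  curl -sf https://sdk-dev.breakpilot.ai/health >/dev/null
}

# 5 minutes at 20 s per attempt = 15 attempts:
# poll_until 15 20 check_all \
#   && echo "Deploy complete! You can test now." \
#   || echo "Not healthy after 5 minutes - check the Coolify logs."
```

The network call stays commented out here; `poll_until` itself is the part worth reusing for other services.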
@@ -44,6 +76,14 @@ networks:
   name: breakpilot-network  # Fixed name, no auto-prefix!
 ```

+### Deployment model

+| Repo | Deployment | Trigger |
+|------|-----------|---------|
+| **breakpilot-core** | Coolify (automatic) | Push to gitea main |
+| **breakpilot-compliance** | Coolify (automatic) | Push to gitea main |
+| **breakpilot-lehrer** | Mac Mini (local) | Manual docker compose |

 ---

 ## Main URLs (via nginx reverse proxy)

@@ -154,7 +194,7 @@ networks:
 | `compliance` | Compliance | compliance_*, dsr, gdpr, sdk_tenants, consent_admin |

 ```bash
-# DB access
+# DB access (local)
 ssh macmini "docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db"
 ```
@@ -178,15 +218,45 @@ breakpilot-core/
 ├── gitea/                 # Gitea config
+├── docs-src/              # MkDocs sources
+├── mkdocs.yml             # MkDocs config
+├── control-pipeline/      # RAG/control pipeline (port 8098)
 ├── scripts/               # Helper scripts
 └── docker-compose.yml     # Main compose (28+ services)
 ```

 ---

+## Control Pipeline (IMPORTANT)

+**Since 2026-04-09 the entire RAG/control pipeline lives in the core repo** (`control-pipeline/`), NOT in the compliance repo anymore. All work on the pipeline (pass 0a/0b, BatchDedup, control generator, enrichment) happens exclusively here.

+- **Port:** 8098
+- **Container:** bp-core-control-pipeline
+- **DB:** writes into the `compliance` schema of the shared PostgreSQL
+- **The compliance repo is NOT used for pipeline changes**

+```bash
+# Container on the Mac Mini
+ssh macmini "cd ~/Projekte/breakpilot-core && /usr/local/bin/docker compose build --no-cache control-pipeline && /usr/local/bin/docker compose up -d --no-deps control-pipeline"

+# Health
+ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/health"

+# Logs
+ssh macmini "/usr/local/bin/docker logs -f bp-core-control-pipeline"
+```

 ---

 ## Common commands

-### Docker
+### Deployment (CI/CD, the standard path)

+```bash
+# Commit and push; Coolify deploys automatically:
+git push origin main && git push gitea main
+```

+### Local Docker commands (Mac Mini; dev/tests only)

 ```bash
 # Start all core services

@@ -204,31 +274,15 @@ ssh macmini "/usr/local/bin/docker ps --filter name=bp-core"

 **IMPORTANT:** The Docker path on the Mac Mini is `/usr/local/bin/docker` (it is not in the default SSH PATH).

-### Start all 3 projects

-```bash
-# 1. Core (MUST come first!)
-ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && /usr/local/bin/docker compose up -d"
-# Wait for health:
-ssh macmini "curl -sf http://127.0.0.1:8099/health"

-# 2. Lehrer
-ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && /usr/local/bin/docker compose up -d"

-# 3. Compliance
-ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-compliance && /usr/local/bin/docker compose up -d"
-```

 ### Git

 ```bash
 # Push to BOTH remotes (MANDATORY!):
-ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && git push all main"
+git push origin main && git push gitea main

 # Remotes:
 # origin: local Gitea (macmini:3003)
 # gitea:  gitea.meghsakha.com
 # all:    both at once
 ```

 ---
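The Git section above mentions an `all` remote that pushes to both origin and gitea in one command. A minimal sketch of how such a remote could be configured, assuming a remote named `all` as in the doc; the URLs here are placeholders, not the real remote addresses:

```shell
#!/bin/sh
# Configure a remote "all" with two push URLs, so that
# `git push all main` updates both origin and gitea at once.
# Intended to be run once; re-running would duplicate push URLs.
setup_all_remote() {
  # setup_all_remote <origin-url> <gitea-url>
  git remote add all "$1" 2>/dev/null || git remote set-url all "$1"
  git remote set-url --add --push all "$1"
  git remote set-url --add --push all "$2"
}
```

`git remote set-url --add --push` appends to the remote's push-URL list, which is what makes a single `git push all` fan out to both servers.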
.env.coolify.example (new file, 65 lines)
@@ -0,0 +1,65 @@
+# =========================================================
+# BreakPilot Core — Coolify Environment Variables
+# =========================================================
+# Copy these into Coolify's environment variable UI
+# for the breakpilot-core Docker Compose resource.
+# =========================================================

+# --- External PostgreSQL (Coolify-managed) ---
+POSTGRES_HOST=<coolify-postgres-hostname>
+POSTGRES_PORT=5432
+POSTGRES_USER=breakpilot
+POSTGRES_PASSWORD=CHANGE_ME_STRONG_PASSWORD
+POSTGRES_DB=breakpilot_db

+# --- Security ---
+JWT_SECRET=CHANGE_ME_RANDOM_64_CHARS
+JWT_REFRESH_SECRET=CHANGE_ME_ANOTHER_RANDOM_64_CHARS
+INTERNAL_API_KEY=CHANGE_ME_INTERNAL_KEY

+# --- External S3 Storage ---
+S3_ENDPOINT=<s3-endpoint-host:port>
+S3_ACCESS_KEY=CHANGE_ME_S3_ACCESS_KEY
+S3_SECRET_KEY=CHANGE_ME_S3_SECRET_KEY
+S3_BUCKET=breakpilot-rag
+S3_SECURE=true

+# --- External Qdrant (Coolify-managed) ---
+QDRANT_URL=http://<coolify-qdrant-hostname>:6333
+QDRANT_API_KEY=

+# --- SMTP (real mail server) ---
+SMTP_HOST=smtp.example.com
+SMTP_PORT=587
+SMTP_USERNAME=noreply@breakpilot.ai
+SMTP_PASSWORD=CHANGE_ME_SMTP_PASSWORD
+SMTP_FROM_NAME=BreakPilot
+SMTP_FROM_ADDR=noreply@breakpilot.ai

+# --- Session ---
+SESSION_TTL_HOURS=24

+# --- Frontend URLs (build args) ---
+NEXT_PUBLIC_CORE_API_URL=https://api-core.breakpilot.ai
+FRONTEND_URL=https://www.breakpilot.ai

+# --- Stripe (billing) ---
+STRIPE_SECRET_KEY=
+STRIPE_WEBHOOK_SECRET=
+STRIPE_PUBLISHABLE_KEY=
+BILLING_SUCCESS_URL=https://www.breakpilot.ai/billing/success
+BILLING_CANCEL_URL=https://www.breakpilot.ai/billing/cancel
+TRIAL_PERIOD_DAYS=14

+# --- Embedding service ---
+EMBEDDING_BACKEND=local
+LOCAL_EMBEDDING_MODEL=BAAI/bge-m3
+LOCAL_RERANKER_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
+PDF_EXTRACTION_BACKEND=pymupdf
+OPENAI_API_KEY=
+COHERE_API_KEY=
+LOG_LEVEL=INFO

+# --- Ollama (optional, for RAG embeddings) ---
+OLLAMA_URL=
+OLLAMA_EMBED_MODEL=bge-m3
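Before pasting the variables above into Coolify, it can help to catch leftover `CHANGE_ME` placeholders. A hypothetical pre-flight check; the `check_env` function name and behavior are illustrative, not part of the repo:

```shell
#!/bin/sh
# Fail if any listed variable is unset, empty, or still a CHANGE_ME placeholder.
check_env() {
  # check_env VAR_NAME...
  rc=0
  for v in "$@"; do
    eval "val=\${$v:-}"            # indirect lookup of the variable named in $v
    case "$val" in
      ""|CHANGE_ME*)
        echo "ERROR: $v is unset or still a placeholder" >&2
        rc=1
        ;;
    esac
  done
  return $rc
}

# Example (names taken from the env file above):
# check_env POSTGRES_PASSWORD JWT_SECRET S3_SECRET_KEY SMTP_PASSWORD
```

Running this in a shell where the env file has been sourced gives a non-zero exit as long as any secret is still at its placeholder value.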
@@ -46,11 +46,6 @@ ERPNEXT_DB_ROOT_PASSWORD=erpnext_root
 ERPNEXT_DB_PASSWORD=erpnext_secret
 ERPNEXT_ADMIN_PASSWORD=admin

-# Woodpecker CI
-WOODPECKER_HOST=http://macmini:8090
-WOODPECKER_ADMIN=pilotadmin
-WOODPECKER_AGENT_SECRET=woodpecker-secret

 # Gitea Runner
 GITEA_RUNNER_TOKEN=
@@ -20,11 +20,14 @@ jobs:
   # ========================================

   go-lint:
-    runs-on: ubuntu-latest
+    runs-on: docker
     if: github.event_name == 'pull_request'
     container: golangci/golangci-lint:v1.55-alpine
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apk add --no-cache git
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Lint consent-service
         run: |
           if [ -d "consent-service" ]; then
@@ -32,11 +35,14 @@ jobs:
           fi

   python-lint:
-    runs-on: ubuntu-latest
+    runs-on: docker
     if: github.event_name == 'pull_request'
     container: python:3.12-slim
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apt-get update -qq && apt-get install -y -qq git > /dev/null 2>&1
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Lint Python services
         run: |
           pip install --quiet ruff
@@ -48,11 +54,14 @@ jobs:
           done

   nodejs-lint:
-    runs-on: ubuntu-latest
+    runs-on: docker
     if: github.event_name == 'pull_request'
     container: node:20-alpine
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apk add --no-cache git
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Lint admin-core
         run: |
           if [ -d "admin-core" ]; then
@@ -66,16 +75,18 @@ jobs:
   # ========================================

   test-go-consent:
-    runs-on: ubuntu-latest
+    runs-on: docker
     container: golang:1.23-alpine
     env:
       CGO_ENABLED: "0"
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apk add --no-cache git
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Test consent-service
         run: |
           apk add --no-cache jq bash
-          if [ \! -d "consent-service" ]; then
+          if [ ! -d "consent-service" ]; then
             echo "WARNUNG: consent-service nicht gefunden"
             exit 0
           fi
@@ -85,15 +96,18 @@ jobs:
           echo "Coverage: $COVERAGE"

   test-python-voice:
-    runs-on: ubuntu-latest
+    runs-on: docker
     container: python:3.12-slim
     env:
       CI: "true"
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apt-get update -qq && apt-get install -y -qq git > /dev/null 2>&1
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Test voice-service
         run: |
-          if [ \! -d "voice-service" ]; then
+          if [ ! -d "voice-service" ]; then
             echo "WARNUNG: voice-service nicht gefunden"
             exit 0
           fi
@@ -104,15 +118,18 @@ jobs:
           python -m pytest tests/ -v --tb=short --ignore=tests/bqas

   test-bqas:
-    runs-on: ubuntu-latest
+    runs-on: docker
     container: python:3.12-slim
     env:
       CI: "true"
     steps:
-      - uses: actions/checkout@v4
+      - name: Checkout
+        run: |
+          apt-get update -qq && apt-get install -y -qq git > /dev/null 2>&1
+          git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
       - name: Test BQAS
         run: |
-          if [ \! -d "voice-service/tests/bqas" ]; then
+          if [ ! -d "voice-service/tests/bqas" ]; then
             echo "WARNUNG: BQAS Tests nicht gefunden"
             exit 0
           fi
@@ -121,3 +138,22 @@ jobs:
           pip install --quiet --no-cache-dir -r requirements.txt 2>/dev/null || true
           pip install --quiet --no-cache-dir fastapi uvicorn pydantic pytest pytest-asyncio
           python -m pytest tests/bqas/ -v --tb=short || true

+  # ========================================
+  # Deploy via Coolify (main only, no PRs)
+  # ========================================

+  deploy-coolify:
+    name: Deploy
+    runs-on: docker
+    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+    needs:
+      - test-go-consent
+    container:
+      image: alpine:latest
+    steps:
+      - name: Trigger Coolify deploy
+        run: |
+          apk add --no-cache curl
+          curl -sf "${{ secrets.COOLIFY_WEBHOOK }}" \
+            -H "Authorization: Bearer ${{ secrets.COOLIFY_TOKEN }}"
.gitea/workflows/deploy-coolify.yml (new file, 27 lines)
@@ -0,0 +1,27 @@
+name: Deploy to Coolify

+on:
+  push:
+    branches:
+      - coolify

+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Deploy via Coolify API
+        run: |
+          echo "Deploying breakpilot-core to Coolify..."
+          HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
+            -X POST \
+            -H "Authorization: Bearer ${{ secrets.COOLIFY_API_TOKEN }}" \
+            -H "Content-Type: application/json" \
+            -d '{"uuid": "${{ secrets.COOLIFY_RESOURCE_UUID }}", "force_rebuild": true}' \
+            "${{ secrets.COOLIFY_BASE_URL }}/api/v1/deploy")

+          echo "HTTP Status: $HTTP_STATUS"
+          if [ "$HTTP_STATUS" -ne 200 ] && [ "$HTTP_STATUS" -ne 201 ]; then
+            echo "Deployment failed with status $HTTP_STATUS"
+            exit 1
+          fi
+          echo "Deployment triggered successfully!"
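The workflow step above can also be exercised by hand. A sketch of an equivalent local helper, assuming the same three values the workflow reads from secrets (`COOLIFY_BASE_URL`, `COOLIFY_API_TOKEN`, `COOLIFY_RESOURCE_UUID`) are exported in the environment; the `coolify_deploy` wrapper itself is illustrative, not part of the repo:

```shell
#!/bin/sh
# Trigger a Coolify deploy via its API, mirroring the workflow step above.
# Succeeds only on HTTP 200/201, like the workflow's status check.
coolify_deploy() {
  status=$(curl -s -o /dev/null -w "%{http_code}" \
    -X POST \
    -H "Authorization: Bearer ${COOLIFY_API_TOKEN:-}" \
    -H "Content-Type: application/json" \
    -d "{\"uuid\": \"${COOLIFY_RESOURCE_UUID:-}\", \"force_rebuild\": true}" \
    "${COOLIFY_BASE_URL:-}/api/v1/deploy")
  echo "HTTP Status: $status"
  [ "$status" -eq 200 ] || [ "$status" -eq 201 ]
}
```

Keeping the status check in a function makes it easy to chain, e.g. `coolify_deploy && echo "triggered"`.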
@@ -1,422 +0,0 @@
-# Woodpecker CI main pipeline
-# BreakPilot Core - CI/CD pipeline
-#
-# Platform: ARM64 (Apple Silicon Mac Mini)
-#
-# Services:
-#   Go:      consent-service
-#   Python:  backend-core, voice-service (+ BQAS), embedding-service, night-scheduler
-#   Node.js: admin-core
-#
-# Strategy:
-# - Lint on PRs
-# - Tests run on EVERY push/PR
-# - Test results are sent to the dashboard
-# - Builds/scans run only on tags or manually
-# - Deployment only manual (safety)
-
-when:
-  - event: [push, pull_request, manual, tag]
-    branch: [main, develop]
-
-clone:
-  git:
-    image: woodpeckerci/plugin-git
-    settings:
-      depth: 1
-    extra_hosts:
-      - macmini:192.168.178.100
-
-variables:
-  - &golang_image golang:1.23-alpine
-  - &python_image python:3.12-slim
-  - &nodejs_image node:20-alpine
-  - &docker_image docker:27-cli
-steps:
-  # ========================================
-  # STAGE 1: Lint (PRs only)
-  # ========================================
-
-  go-lint:
-    image: golangci/golangci-lint:v1.55-alpine
-    commands:
-      - cd consent-service && golangci-lint run --timeout 5m ./...
-    when:
-      event: pull_request
-
-  python-lint:
-    image: *python_image
-    commands:
-      - pip install --quiet ruff
-      - |
-        for svc in backend-core voice-service night-scheduler embedding-service; do
-          if [ -d "$svc" ]; then
-            echo "=== Linting $svc ==="
-            ruff check "$svc/" --output-format=github || true
-          fi
-        done
-    when:
-      event: pull_request
-
-  nodejs-lint:
-    image: *nodejs_image
-    commands:
-      - |
-        if [ -d "admin-core" ]; then
-          cd admin-core
-          npm ci --silent 2>/dev/null || npm install --silent
-          npx next lint || true
-        fi
-    when:
-      event: pull_request
-
-  # ========================================
-  # STAGE 2: Unit tests with JSON output
-  # Results are stored in the workspace (.ci-results/)
-  # ========================================
-  test-go-consent:
-    image: *golang_image
-    environment:
-      CGO_ENABLED: "0"
-    commands:
-      - |
-        set -euo pipefail
-        apk add --no-cache jq bash
-        mkdir -p .ci-results
-
-        if [ ! -d "consent-service" ]; then
-          echo '{"service":"consent-service","framework":"go","total":0,"passed":0,"failed":0,"skipped":0,"coverage":0}' > .ci-results/results-consent.json
-          echo "WARNUNG: consent-service Verzeichnis nicht gefunden"
-          exit 0
-        fi
-
-        cd consent-service
-        set +e
-        go test -v -json -coverprofile=coverage.out ./... 2>&1 | tee ../.ci-results/test-consent.json
-        TEST_EXIT=$?
-        set -e
-
-        JSON_FILE="../.ci-results/test-consent.json"
-        if grep -q '^{' "$JSON_FILE" 2>/dev/null; then
-          TOTAL=$(grep '^{' "$JSON_FILE" | jq -s '[.[] | select(.Action=="run" and .Test != null)] | length')
-          PASSED=$(grep '^{' "$JSON_FILE" | jq -s '[.[] | select(.Action=="pass" and .Test != null)] | length')
-          FAILED=$(grep '^{' "$JSON_FILE" | jq -s '[.[] | select(.Action=="fail" and .Test != null)] | length')
-          SKIPPED=$(grep '^{' "$JSON_FILE" | jq -s '[.[] | select(.Action=="skip" and .Test != null)] | length')
-        else
-          echo "WARNUNG: Keine JSON-Zeilen in $JSON_FILE gefunden (Build-Fehler?)"
-          TOTAL=0; PASSED=0; FAILED=0; SKIPPED=0
-        fi
-
-        COVERAGE=$(go tool cover -func=coverage.out 2>/dev/null | tail -1 | awk '{print $3}' | tr -d '%' || echo "0")
-        [ -z "$COVERAGE" ] && COVERAGE=0
-
-        echo "{\"service\":\"consent-service\",\"framework\":\"go\",\"total\":$TOTAL,\"passed\":$PASSED,\"failed\":$FAILED,\"skipped\":$SKIPPED,\"coverage\":$COVERAGE}" > ../.ci-results/results-consent.json
-        cat ../.ci-results/results-consent.json
-
-        # Backlog strategy: failures are reported but the pipeline keeps running
-        if [ "$FAILED" -gt "0" ]; then
-          echo "WARNUNG: $FAILED Tests fehlgeschlagen - werden ins Backlog geschrieben"
-        fi
-  test-python-voice:
-    image: *python_image
-    environment:
-      CI: "true"
-    commands:
-      - |
-        set -uo pipefail
-        mkdir -p .ci-results
-
-        if [ ! -d "voice-service" ]; then
-          echo '{"service":"voice-service","framework":"pytest","total":0,"passed":0,"failed":0,"skipped":0,"coverage":0}' > .ci-results/results-voice.json
-          echo "WARNUNG: voice-service Verzeichnis nicht gefunden"
-          exit 0
-        fi
-
-        cd voice-service
-        export PYTHONPATH="$(pwd):${PYTHONPATH:-}"
-        pip install --quiet --no-cache-dir -r requirements.txt 2>/dev/null || true
-        pip install --quiet --no-cache-dir fastapi uvicorn pydantic pytest pytest-json-report
-
-        set +e
-        python -m pytest tests/ -v --tb=short --ignore=tests/bqas --json-report --json-report-file=../.ci-results/test-voice.json
-        TEST_EXIT=$?
-        set -e
-
-        if [ -f ../.ci-results/test-voice.json ]; then
-          TOTAL=$(python3 -c "import json; d=json.load(open('../.ci-results/test-voice.json')); print(d.get('summary',{}).get('total',0))" 2>/dev/null || echo "0")
-          PASSED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-voice.json')); print(d.get('summary',{}).get('passed',0))" 2>/dev/null || echo "0")
-          FAILED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-voice.json')); print(d.get('summary',{}).get('failed',0))" 2>/dev/null || echo "0")
-          SKIPPED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-voice.json')); print(d.get('summary',{}).get('skipped',0))" 2>/dev/null || echo "0")
-        else
-          TOTAL=0; PASSED=0; FAILED=0; SKIPPED=0
-        fi
-
-        echo "{\"service\":\"voice-service\",\"framework\":\"pytest\",\"total\":$TOTAL,\"passed\":$PASSED,\"failed\":$FAILED,\"skipped\":$SKIPPED,\"coverage\":0}" > ../.ci-results/results-voice.json
-        cat ../.ci-results/results-voice.json
-
-        if [ "$TEST_EXIT" -ne "0" ]; then exit 1; fi
-  test-bqas-golden:
-    image: *python_image
-    commands:
-      - |
-        set -uo pipefail
-        mkdir -p .ci-results
-
-        if [ ! -d "voice-service/tests/bqas" ]; then
-          echo '{"service":"bqas-golden","framework":"pytest","total":0,"passed":0,"failed":0,"skipped":0,"coverage":0}' > .ci-results/results-bqas-golden.json
-          echo "WARNUNG: voice-service/tests/bqas Verzeichnis nicht gefunden"
-          exit 0
-        fi
-
-        cd voice-service
-        export PYTHONPATH="$(pwd):${PYTHONPATH:-}"
-        pip install --quiet --no-cache-dir -r requirements.txt 2>/dev/null || true
-        pip install --quiet --no-cache-dir fastapi uvicorn pydantic pytest pytest-json-report pytest-asyncio
-
-        set +e
-        python -m pytest tests/bqas/test_golden.py tests/bqas/test_regression.py tests/bqas/test_synthetic.py -v --tb=short --json-report --json-report-file=../.ci-results/test-bqas-golden.json
-        TEST_EXIT=$?
-        set -e
-
-        if [ -f ../.ci-results/test-bqas-golden.json ]; then
-          TOTAL=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-golden.json')); print(d.get('summary',{}).get('total',0))" 2>/dev/null || echo "0")
-          PASSED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-golden.json')); print(d.get('summary',{}).get('passed',0))" 2>/dev/null || echo "0")
-          FAILED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-golden.json')); print(d.get('summary',{}).get('failed',0))" 2>/dev/null || echo "0")
-          SKIPPED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-golden.json')); print(d.get('summary',{}).get('skipped',0))" 2>/dev/null || echo "0")
-        else
-          TOTAL=0; PASSED=0; FAILED=0; SKIPPED=0
-        fi
-
-        echo "{\"service\":\"bqas-golden\",\"framework\":\"pytest\",\"total\":$TOTAL,\"passed\":$PASSED,\"failed\":$FAILED,\"skipped\":$SKIPPED,\"coverage\":0}" > ../.ci-results/results-bqas-golden.json
-        cat ../.ci-results/results-bqas-golden.json
-
-        # BQAS tests may skip if Ollama not available - don't fail pipeline
-        if [ "$FAILED" -gt "0" ]; then exit 1; fi
-  test-bqas-rag:
-    image: *python_image
-    commands:
-      - |
-        set -uo pipefail
-        mkdir -p .ci-results
-
-        if [ ! -d "voice-service/tests/bqas" ]; then
-          echo '{"service":"bqas-rag","framework":"pytest","total":0,"passed":0,"failed":0,"skipped":0,"coverage":0}' > .ci-results/results-bqas-rag.json
-          echo "WARNUNG: voice-service/tests/bqas Verzeichnis nicht gefunden"
-          exit 0
-        fi
-
-        cd voice-service
-        export PYTHONPATH="$(pwd):${PYTHONPATH:-}"
-        pip install --quiet --no-cache-dir -r requirements.txt 2>/dev/null || true
-        pip install --quiet --no-cache-dir fastapi uvicorn pydantic pytest pytest-json-report pytest-asyncio
-
-        set +e
-        python -m pytest tests/bqas/test_rag.py tests/bqas/test_notifier.py -v --tb=short --json-report --json-report-file=../.ci-results/test-bqas-rag.json
-        TEST_EXIT=$?
-        set -e
-
-        if [ -f ../.ci-results/test-bqas-rag.json ]; then
-          TOTAL=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-rag.json')); print(d.get('summary',{}).get('total',0))" 2>/dev/null || echo "0")
-          PASSED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-rag.json')); print(d.get('summary',{}).get('passed',0))" 2>/dev/null || echo "0")
-          FAILED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-rag.json')); print(d.get('summary',{}).get('failed',0))" 2>/dev/null || echo "0")
-          SKIPPED=$(python3 -c "import json; d=json.load(open('../.ci-results/test-bqas-rag.json')); print(d.get('summary',{}).get('skipped',0))" 2>/dev/null || echo "0")
-        else
-          TOTAL=0; PASSED=0; FAILED=0; SKIPPED=0
-        fi
-
-        echo "{\"service\":\"bqas-rag\",\"framework\":\"pytest\",\"total\":$TOTAL,\"passed\":$PASSED,\"failed\":$FAILED,\"skipped\":$SKIPPED,\"coverage\":0}" > ../.ci-results/results-bqas-rag.json
-        cat ../.ci-results/results-bqas-rag.json
-
-        # BQAS tests may skip if Ollama not available - don't fail pipeline
-        if [ "$FAILED" -gt "0" ]; then exit 1; fi
-  # ========================================
-  # STAGE 3: Send test results to the dashboard
-  # ========================================
-
-  report-test-results:
-    image: curlimages/curl:8.10.1
-    commands:
-      - |
-        set -uo pipefail
-        echo "=== Sende Test-Ergebnisse an Dashboard ==="
-        echo "Pipeline Status: ${CI_PIPELINE_STATUS:-unknown}"
-        ls -la .ci-results/ || echo "Verzeichnis nicht gefunden"
-
-        PIPELINE_STATUS="${CI_PIPELINE_STATUS:-unknown}"
-
-        for f in .ci-results/results-*.json; do
-          [ -f "$f" ] || continue
-          echo "Sending: $f"
-          curl -f -sS -X POST "http://backend:8000/api/tests/ci-result" \
-            -H "Content-Type: application/json" \
-            -d "{
-              \"pipeline_id\": \"${CI_PIPELINE_NUMBER}\",
-              \"commit\": \"${CI_COMMIT_SHA}\",
-              \"branch\": \"${CI_COMMIT_BRANCH}\",
-              \"repo\": \"breakpilot-core\",
-              \"status\": \"${PIPELINE_STATUS}\",
-              \"test_results\": $(cat "$f")
-            }" || echo "WARNUNG: Konnte $f nicht senden"
-        done
-
-        echo "=== Test-Ergebnisse gesendet ==="
-    when:
-      status: [success, failure]
-    depends_on:
-      - test-go-consent
-      - test-python-voice
-      - test-bqas-golden
-      - test-bqas-rag
-  # ========================================
-  # STAGE 4: Build & security (tags/manual only)
-  # ========================================
-
-  build-consent-service:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./consent-service ]; then
-          docker build -t breakpilot/consent-service:${CI_COMMIT_SHA:0:8} ./consent-service
-          docker tag breakpilot/consent-service:${CI_COMMIT_SHA:0:8} breakpilot/consent-service:latest
-          echo "Built breakpilot/consent-service:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "consent-service Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  build-backend-core:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./backend-core ]; then
-          docker build -t breakpilot/backend-core:${CI_COMMIT_SHA:0:8} ./backend-core
-          docker tag breakpilot/backend-core:${CI_COMMIT_SHA:0:8} breakpilot/backend-core:latest
-          echo "Built breakpilot/backend-core:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "backend-core Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  build-admin-core:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./admin-core ]; then
-          docker build -t breakpilot/admin-core:${CI_COMMIT_SHA:0:8} ./admin-core
-          docker tag breakpilot/admin-core:${CI_COMMIT_SHA:0:8} breakpilot/admin-core:latest
-          echo "Built breakpilot/admin-core:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "admin-core Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  build-voice-service:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./voice-service ]; then
-          docker build -t breakpilot/voice-service:${CI_COMMIT_SHA:0:8} ./voice-service
-          docker tag breakpilot/voice-service:${CI_COMMIT_SHA:0:8} breakpilot/voice-service:latest
-          echo "Built breakpilot/voice-service:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "voice-service Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  build-embedding-service:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./embedding-service ]; then
-          docker build -t breakpilot/embedding-service:${CI_COMMIT_SHA:0:8} ./embedding-service
-          docker tag breakpilot/embedding-service:${CI_COMMIT_SHA:0:8} breakpilot/embedding-service:latest
-          echo "Built breakpilot/embedding-service:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "embedding-service Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  build-night-scheduler:
-    image: *docker_image
-    commands:
-      - |
-        if [ -d ./night-scheduler ]; then
-          docker build -t breakpilot/night-scheduler:${CI_COMMIT_SHA:0:8} ./night-scheduler
-          docker tag breakpilot/night-scheduler:${CI_COMMIT_SHA:0:8} breakpilot/night-scheduler:latest
-          echo "Built breakpilot/night-scheduler:${CI_COMMIT_SHA:0:8}"
-        else
-          echo "night-scheduler Verzeichnis nicht gefunden - ueberspringe"
-        fi
-    when:
-      - event: tag
-      - event: manual
-
-  generate-sbom:
-    image: *golang_image
-    commands:
-      - |
-        echo "Installing syft for ARM64..."
-        wget -qO- https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
|
||||
for svc in consent-service backend-core voice-service embedding-service night-scheduler; do
|
||||
if [ -d "./$svc" ]; then
|
||||
syft dir:./$svc -o cyclonedx-json > sbom-$svc.json
|
||||
echo "SBOM generated for $svc"
|
||||
fi
|
||||
done
|
||||
when:
|
||||
- event: tag
|
||||
- event: manual
|
||||
|
||||
vulnerability-scan:
|
||||
image: *golang_image
|
||||
commands:
|
||||
- |
|
||||
echo "Installing grype for ARM64..."
|
||||
wget -qO- https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin
|
||||
for f in sbom-*.json; do
|
||||
[ -f "$f" ] || continue
|
||||
echo "=== Scanning $f ==="
|
||||
grype sbom:"$f" -o table --fail-on critical || true
|
||||
done
|
||||
when:
|
||||
- event: tag
|
||||
- event: manual
|
||||
depends_on:
|
||||
- generate-sbom
|
||||
|
||||
# ========================================
|
||||
# STAGE 5: Deploy (nur manuell)
|
||||
# ========================================
|
||||
|
||||
deploy-production:
|
||||
image: *docker_image
|
||||
commands:
|
||||
- echo "Deploying breakpilot-core to production..."
|
||||
- docker compose -f docker-compose.yml pull || true
|
||||
- docker compose -f docker-compose.yml up -d --remove-orphans || true
|
||||
when:
|
||||
event: manual
|
||||
depends_on:
|
||||
- build-consent-service
|
||||
- build-backend-core
|
||||
- build-admin-core
|
||||
- build-voice-service
|
||||
- build-embedding-service
|
||||
- build-night-scheduler
|
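The build steps above tag each image with the first 8 characters of the commit SHA via shell substring expansion. A minimal sketch of that tag computation (the SHA value is hypothetical, standing in for `CI_COMMIT_SHA` as Woodpecker would inject it):

```shell
#!/usr/bin/env bash
# Hypothetical commit SHA for illustration; in CI this is provided as CI_COMMIT_SHA.
CI_COMMIT_SHA="f89ce46631a0b1c2d3e4f5a6b7c8d9e0f1a2b3c4"

# ${VAR:0:8} is bash substring expansion: offset 0, length 8.
IMAGE_TAG="breakpilot/consent-service:${CI_COMMIT_SHA:0:8}"
echo "$IMAGE_TAG"   # breakpilot/consent-service:f89ce466
```

Note that `${VAR:0:8}` requires bash (or another shell with substring expansion); under a strict POSIX `sh` an equivalent would be `$(printf '%s' "$CI_COMMIT_SHA" | cut -c1-8)`.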
@@ -18,6 +18,9 @@ ARG NEXT_PUBLIC_API_URL
 
 # Set environment variables for build
 ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
 
+# Ensure public directory exists
+RUN mkdir -p public
+
 # Build the application
 RUN npm run build
 
@@ -30,8 +33,8 @@ WORKDIR /app
 ENV NODE_ENV=production
 
 # Create non-root user
-RUN addgroup --system --gid 1001 nodejs
-RUN adduser --system --uid 1001 nextjs
+RUN addgroup -S -g 1001 nodejs
+RUN adduser -S -u 1001 -G nodejs nextjs
 
 # Copy built assets
 COPY --from=builder /app/public ./public
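The user-creation change above swaps Debian shadow-utils long flags for their BusyBox short-flag equivalents, which is necessary when the runtime stage is an Alpine-based image (an assumption here; the base image is outside this hunk). A sketch of the mapping:

```dockerfile
# BusyBox (Alpine) addgroup/adduser flag equivalents:
#   Debian:  addgroup --system --gid 1001 nodejs
#   BusyBox: addgroup -S -g 1001 nodejs
#   Debian:  adduser --system --uid 1001 --ingroup nodejs nextjs
#   BusyBox: adduser -S -u 1001 -G nodejs nextjs
RUN addgroup -S -g 1001 nodejs \
 && adduser -S -u 1001 -G nodejs nextjs
```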
@@ -1,912 +0,0 @@
'use client'

/**
 * Alerts Monitoring Admin Page (migrated from website/admin/alerts)
 *
 * Google Alerts & feed monitoring dashboard.
 * Provides inbox management, topic configuration, a rule builder, and relevance profiles.
 */

import { useEffect, useState, useCallback } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'

// Types
interface AlertItem {
  id: string
  title: string
  url: string
  snippet: string
  topic_name: string
  relevance_score: number | null
  relevance_decision: string | null
  status: string
  fetched_at: string
  published_at: string | null
  matched_rule: string | null
  tags: string[]
}

interface Topic {
  id: string
  name: string
  feed_url: string
  feed_type: string
  is_active: boolean
  fetch_interval_minutes: number
  last_fetched_at: string | null
  alert_count: number
}

interface Rule {
  id: string
  name: string
  topic_id: string | null
  conditions: Array<{
    field: string
    operator: string
    value: string | number
  }>
  action_type: string
  action_config: Record<string, unknown>
  priority: number
  is_active: boolean
}

interface Profile {
  priorities: string[]
  exclusions: string[]
  positive_examples: Array<{ title: string; url: string }>
  negative_examples: Array<{ title: string; url: string }>
  policies: {
    keep_threshold: number
    drop_threshold: number
  }
}

interface Stats {
  total_alerts: number
  new_alerts: number
  kept_alerts: number
  review_alerts: number
  dropped_alerts: number
  total_topics: number
  active_topics: number
  total_rules: number
}

// Tab type
type TabId = 'dashboard' | 'inbox' | 'topics' | 'rules' | 'profile' | 'audit' | 'documentation'

export default function AlertsPage() {
  const [activeTab, setActiveTab] = useState<TabId>('dashboard')
  const [stats, setStats] = useState<Stats | null>(null)
  const [alerts, setAlerts] = useState<AlertItem[]>([])
  const [topics, setTopics] = useState<Topic[]>([])
  const [rules, setRules] = useState<Rule[]>([])
  const [profile, setProfile] = useState<Profile | null>(null)
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)
  const [inboxFilter, setInboxFilter] = useState<string>('all')

  const API_BASE = '/api/alerts'

  const fetchData = useCallback(async () => {
    try {
      const [statsRes, alertsRes, topicsRes, rulesRes, profileRes] = await Promise.all([
        fetch(`${API_BASE}/stats`),
        fetch(`${API_BASE}/inbox?limit=50`),
        fetch(`${API_BASE}/topics`),
        fetch(`${API_BASE}/rules`),
        fetch(`${API_BASE}/profile`),
      ])

      if (statsRes.ok) setStats(await statsRes.json())
      if (alertsRes.ok) {
        const data = await alertsRes.json()
        setAlerts(data.items || [])
      }
      if (topicsRes.ok) {
        const data = await topicsRes.json()
        setTopics(data.topics || data.items || [])
      }
      if (rulesRes.ok) {
        const data = await rulesRes.json()
        setRules(data.rules || data.items || [])
      }
      if (profileRes.ok) setProfile(await profileRes.json())

      setError(null)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Verbindungsfehler')
      // Set demo data
      setStats({
        total_alerts: 147,
        new_alerts: 23,
        kept_alerts: 89,
        review_alerts: 12,
        dropped_alerts: 23,
        total_topics: 5,
        active_topics: 4,
        total_rules: 8,
      })
      setAlerts([
        {
          id: 'demo_1',
          title: 'Neue Studie zur digitalen Bildung an Schulen',
          url: 'https://example.com/artikel1',
          snippet: 'Eine aktuelle Studie zeigt, dass digitale Lernmittel den Lernerfolg steigern koennen...',
          topic_name: 'Digitale Bildung',
          relevance_score: 0.85,
          relevance_decision: 'KEEP',
          status: 'new',
          fetched_at: new Date().toISOString(),
          published_at: null,
          matched_rule: null,
          tags: ['bildung', 'digital'],
        },
        {
          id: 'demo_2',
          title: 'Inklusion: Fortbildungen fuer Lehrkraefte',
          url: 'https://example.com/artikel2',
          snippet: 'Das Kultusministerium bietet neue Fortbildungsangebote zum Thema Inklusion an...',
          topic_name: 'Inklusion',
          relevance_score: 0.72,
          relevance_decision: 'KEEP',
          status: 'new',
          fetched_at: new Date(Date.now() - 3600000).toISOString(),
          published_at: null,
          matched_rule: null,
          tags: ['inklusion'],
        },
      ])
      setTopics([
        {
          id: 'topic_1',
          name: 'Digitale Bildung',
          feed_url: 'https://google.com/alerts/feeds/123',
          feed_type: 'rss',
          is_active: true,
          fetch_interval_minutes: 60,
          last_fetched_at: new Date().toISOString(),
          alert_count: 47,
        },
        {
          id: 'topic_2',
          name: 'Inklusion',
          feed_url: 'https://google.com/alerts/feeds/456',
          feed_type: 'rss',
          is_active: true,
          fetch_interval_minutes: 60,
          last_fetched_at: new Date(Date.now() - 1800000).toISOString(),
          alert_count: 32,
        },
      ])
      setRules([
        {
          id: 'rule_1',
          name: 'Stellenanzeigen ausschliessen',
          topic_id: null,
          conditions: [{ field: 'title', operator: 'contains', value: 'Stellenangebot' }],
          action_type: 'drop',
          action_config: {},
          priority: 10,
          is_active: true,
        },
      ])
      setProfile({
        priorities: ['Inklusion', 'digitale Bildung'],
        exclusions: ['Stellenanzeigen', 'Werbung'],
        positive_examples: [],
        negative_examples: [],
        policies: { keep_threshold: 0.7, drop_threshold: 0.3 },
      })
    } finally {
      setLoading(false)
    }
  }, [])

  useEffect(() => {
    fetchData()
  }, [fetchData])

  const formatTimeAgo = (dateStr: string | null) => {
    if (!dateStr) return '-'
    const date = new Date(dateStr)
    const now = new Date()
    const diffMs = now.getTime() - date.getTime()
    const diffMins = Math.floor(diffMs / 60000)

    if (diffMins < 1) return 'gerade eben'
    if (diffMins < 60) return `vor ${diffMins} Min.`
    if (diffMins < 1440) return `vor ${Math.floor(diffMins / 60)} Std.`
    return `vor ${Math.floor(diffMins / 1440)} Tagen`
  }

  const getScoreBadge = (score: number | null) => {
    if (score === null) return null
    const pct = Math.round(score * 100)
    let cls = 'bg-slate-100 text-slate-600'
    if (pct >= 70) cls = 'bg-green-100 text-green-800'
    else if (pct >= 40) cls = 'bg-amber-100 text-amber-800'
    else cls = 'bg-red-100 text-red-800'
    return <span className={`px-2 py-0.5 rounded text-xs font-semibold ${cls}`}>{pct}%</span>
  }

  const getDecisionBadge = (decision: string | null) => {
    if (!decision) return null
    const styles: Record<string, string> = {
      KEEP: 'bg-green-100 text-green-800',
      REVIEW: 'bg-amber-100 text-amber-800',
      DROP: 'bg-red-100 text-red-800',
    }
    return (
      <span className={`px-2 py-0.5 rounded text-xs font-semibold uppercase ${styles[decision] || 'bg-slate-100'}`}>
        {decision}
      </span>
    )
  }

  const filteredAlerts = alerts.filter((alert) => {
    if (inboxFilter === 'all') return true
    if (inboxFilter === 'new') return alert.status === 'new'
    if (inboxFilter === 'keep') return alert.relevance_decision === 'KEEP'
    if (inboxFilter === 'review') return alert.relevance_decision === 'REVIEW'
    return true
  })

  const tabs: { id: TabId; label: string; badge?: number }[] = [
    { id: 'dashboard', label: 'Dashboard' },
    { id: 'inbox', label: 'Inbox', badge: stats?.new_alerts || 0 },
    { id: 'topics', label: 'Topics' },
    { id: 'rules', label: 'Regeln' },
    { id: 'profile', label: 'Profil' },
    { id: 'audit', label: 'Audit' },
    { id: 'documentation', label: 'Dokumentation' },
  ]

  if (loading) {
    return (
      <div className="flex items-center justify-center h-64">
        <div className="animate-spin rounded-full h-8 w-8 border-b-2 border-green-600" />
      </div>
    )
  }

  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title="Alerts Monitoring"
        purpose="Google Alerts & Feed-Ueberwachung mit KI-gestuetzter Relevanzpruefung. Verwalten Sie Topics, konfigurieren Sie Filterregeln und nutzen Sie LLM-basiertes Scoring fuer automatische Kategorisierung."
        audience={['Marketing', 'Admins', 'DSB']}
        architecture={{
          services: ['backend (FastAPI)', 'APScheduler', 'LLM Gateway'],
          databases: ['PostgreSQL', 'Valkey Cache'],
        }}
        relatedPages={[
          { name: 'Unified Inbox', href: '/communication/mail', description: 'E-Mail-Konten verwalten' },
          { name: 'Voice Service', href: '/communication/matrix', description: 'Voice-First Interface' },
        ]}
        collapsible={true}
        defaultCollapsed={false}
      />

      {/* Stats Overview */}
      <div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-6">
        <div className="bg-white rounded-xl border border-slate-200 p-4 shadow-sm">
          <div className="text-3xl font-bold text-slate-900">{stats?.total_alerts || 0}</div>
          <div className="text-sm text-slate-500">Alerts gesamt</div>
        </div>
        <div className="bg-white rounded-xl border border-slate-200 p-4 shadow-sm">
          <div className="text-3xl font-bold text-blue-600">{stats?.new_alerts || 0}</div>
          <div className="text-sm text-slate-500">Neue Alerts</div>
        </div>
        <div className="bg-white rounded-xl border border-slate-200 p-4 shadow-sm">
          <div className="text-3xl font-bold text-green-600">{stats?.kept_alerts || 0}</div>
          <div className="text-sm text-slate-500">Relevant</div>
        </div>
        <div className="bg-white rounded-xl border border-slate-200 p-4 shadow-sm">
          <div className="text-3xl font-bold text-amber-600">{stats?.review_alerts || 0}</div>
          <div className="text-sm text-slate-500">Zur Pruefung</div>
        </div>
      </div>

      {/* Tab Navigation */}
      <div className="bg-white rounded-lg shadow mb-6">
        <div className="border-b border-slate-200 px-4">
          <nav className="flex gap-4 overflow-x-auto">
            {tabs.map((tab) => (
              <button
                key={tab.id}
                onClick={() => setActiveTab(tab.id)}
                className={`pb-3 pt-4 px-1 text-sm font-medium border-b-2 transition-colors flex items-center gap-2 whitespace-nowrap ${
                  activeTab === tab.id
                    ? 'border-green-600 text-green-600'
                    : 'border-transparent text-slate-500 hover:text-slate-700'
                }`}
              >
                {tab.label}
                {tab.badge !== undefined && tab.badge > 0 && (
                  <span className="px-2 py-0.5 rounded-full text-xs font-semibold bg-red-500 text-white">
                    {tab.badge}
                  </span>
                )}
              </button>
            ))}
          </nav>
        </div>

        <div className="p-6">
          {/* Dashboard Tab */}
          {activeTab === 'dashboard' && (
            <div className="space-y-6">
              {/* Quick Actions */}
              <div className="grid grid-cols-1 md:grid-cols-2 gap-6">
                <div className="bg-slate-50 rounded-xl p-6">
                  <h3 className="font-semibold text-slate-900 mb-4">Aktive Topics</h3>
                  <div className="space-y-3">
                    {topics.slice(0, 5).map((topic) => (
                      <div key={topic.id} className="flex items-center justify-between p-3 bg-white rounded-lg border border-slate-200">
                        <div>
                          <div className="font-medium text-slate-900">{topic.name}</div>
                          <div className="text-xs text-slate-500">{topic.alert_count} Alerts</div>
                        </div>
                        <span className={`px-2 py-1 rounded text-xs font-semibold ${topic.is_active ? 'bg-green-100 text-green-800' : 'bg-slate-100 text-slate-600'}`}>
                          {topic.is_active ? 'Aktiv' : 'Pausiert'}
                        </span>
                      </div>
                    ))}
                    {topics.length === 0 && (
                      <div className="text-sm text-slate-500 text-center py-4">Keine Topics konfiguriert</div>
                    )}
                  </div>
                </div>

                <div className="bg-slate-50 rounded-xl p-6">
                  <h3 className="font-semibold text-slate-900 mb-4">Letzte Alerts</h3>
                  <div className="space-y-3">
                    {alerts.slice(0, 5).map((alert) => (
                      <div key={alert.id} className="p-3 bg-white rounded-lg border border-slate-200">
                        <div className="font-medium text-slate-900 text-sm truncate">{alert.title}</div>
                        <div className="flex items-center gap-2 mt-1">
                          <span className="text-xs text-slate-500">{alert.topic_name}</span>
                          {getScoreBadge(alert.relevance_score)}
                        </div>
                      </div>
                    ))}
                    {alerts.length === 0 && (
                      <div className="text-sm text-slate-500 text-center py-4">Keine Alerts vorhanden</div>
                    )}
                  </div>
                </div>
              </div>

              {error && (
                <div className="bg-amber-50 border border-amber-200 rounded-lg p-4">
                  <p className="text-sm text-amber-800">
                    <strong>Hinweis:</strong> API nicht erreichbar. Demo-Daten werden angezeigt.
                  </p>
                </div>
              )}
            </div>
          )}

          {/* Inbox Tab */}
          {activeTab === 'inbox' && (
            <div className="space-y-4">
              {/* Filters */}
              <div className="flex gap-2 flex-wrap">
                {['all', 'new', 'keep', 'review'].map((filter) => (
                  <button
                    key={filter}
                    onClick={() => setInboxFilter(filter)}
                    className={`px-4 py-2 rounded-full text-sm font-medium transition-colors ${
                      inboxFilter === filter
                        ? 'bg-green-600 text-white'
                        : 'bg-slate-100 text-slate-600 hover:bg-slate-200'
                    }`}
                  >
                    {filter === 'all' && 'Alle'}
                    {filter === 'new' && 'Neu'}
                    {filter === 'keep' && 'Relevant'}
                    {filter === 'review' && 'Pruefung'}
                  </button>
                ))}
              </div>

              {/* Alerts Table */}
              <div className="bg-white rounded-xl border border-slate-200 overflow-hidden">
                <table className="w-full">
                  <thead className="bg-slate-50 border-b border-slate-200">
                    <tr>
                      <th className="text-left p-4 text-xs font-semibold text-slate-500 uppercase">Alert</th>
                      <th className="text-left p-4 text-xs font-semibold text-slate-500 uppercase">Topic</th>
                      <th className="text-left p-4 text-xs font-semibold text-slate-500 uppercase">Score</th>
                      <th className="text-left p-4 text-xs font-semibold text-slate-500 uppercase">Decision</th>
                      <th className="text-left p-4 text-xs font-semibold text-slate-500 uppercase">Zeit</th>
                    </tr>
                  </thead>
                  <tbody className="divide-y divide-slate-100">
                    {filteredAlerts.map((alert) => (
                      <tr key={alert.id} className="hover:bg-slate-50">
                        <td className="p-4">
                          <a href={alert.url} target="_blank" rel="noopener noreferrer" className="font-medium text-slate-900 hover:text-green-600">
                            {alert.title}
                          </a>
                          <p className="text-sm text-slate-500 truncate max-w-md">{alert.snippet}</p>
                        </td>
                        <td className="p-4 text-sm text-slate-600">{alert.topic_name}</td>
                        <td className="p-4">{getScoreBadge(alert.relevance_score)}</td>
                        <td className="p-4">{getDecisionBadge(alert.relevance_decision)}</td>
                        <td className="p-4 text-sm text-slate-500">{formatTimeAgo(alert.fetched_at)}</td>
                      </tr>
                    ))}
                    {filteredAlerts.length === 0 && (
                      <tr>
                        <td colSpan={5} className="p-8 text-center text-slate-500">
                          Keine Alerts gefunden
                        </td>
                      </tr>
                    )}
                  </tbody>
                </table>
              </div>
            </div>
          )}

          {/* Topics Tab */}
          {activeTab === 'topics' && (
            <div className="space-y-4">
              <div className="flex justify-between items-center">
                <h3 className="font-semibold text-slate-900">Feed Topics</h3>
                <button className="px-4 py-2 bg-green-600 text-white rounded-lg text-sm font-medium hover:bg-green-700">
                  + Topic hinzufuegen
                </button>
              </div>

              <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
                {topics.map((topic) => (
                  <div key={topic.id} className="bg-white rounded-xl border border-slate-200 p-4">
                    <div className="flex justify-between items-start mb-3">
                      <div className="w-10 h-10 bg-green-100 rounded-lg flex items-center justify-center">
                        <svg className="w-5 h-5 text-green-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                          <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 5c7.18 0 13 5.82 13 13M6 11a7 7 0 017 7m-6 0a1 1 0 11-2 0 1 1 0 012 0z" />
                        </svg>
                      </div>
                      <span className={`px-2 py-1 rounded text-xs font-semibold ${topic.is_active ? 'bg-green-100 text-green-800' : 'bg-slate-100 text-slate-600'}`}>
                        {topic.is_active ? 'Aktiv' : 'Pausiert'}
                      </span>
                    </div>
                    <h4 className="font-semibold text-slate-900">{topic.name}</h4>
                    <p className="text-sm text-slate-500 truncate">{topic.feed_url}</p>
                    <div className="flex justify-between items-center mt-4 pt-4 border-t border-slate-100">
                      <div className="text-sm">
                        <span className="font-semibold text-slate-900">{topic.alert_count}</span>
                        <span className="text-slate-500"> Alerts</span>
                      </div>
                      <div className="text-xs text-slate-500">
                        {formatTimeAgo(topic.last_fetched_at)}
                      </div>
                    </div>
                  </div>
                ))}
                {topics.length === 0 && (
                  <div className="col-span-full text-center py-8 text-slate-500">
                    Keine Topics konfiguriert
                  </div>
                )}
              </div>
            </div>
          )}

          {/* Rules Tab */}
          {activeTab === 'rules' && (
            <div className="space-y-4">
              <div className="flex justify-between items-center">
                <h3 className="font-semibold text-slate-900">Filterregeln</h3>
                <button className="px-4 py-2 bg-green-600 text-white rounded-lg text-sm font-medium hover:bg-green-700">
                  + Regel erstellen
                </button>
              </div>

              <div className="bg-white rounded-xl border border-slate-200 divide-y divide-slate-100">
                {rules.map((rule) => (
                  <div key={rule.id} className="p-4 flex items-center gap-4">
                    <div className="text-slate-400 cursor-grab">
                      <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 6h16M4 12h16M4 18h16" />
                      </svg>
                    </div>
                    <div className="flex-1">
                      <div className="font-medium text-slate-900">{rule.name}</div>
                      <div className="text-sm text-slate-500">
                        Wenn: {rule.conditions[0]?.field} {rule.conditions[0]?.operator} "{rule.conditions[0]?.value}"
                      </div>
                    </div>
                    <span className={`px-3 py-1 rounded text-xs font-semibold uppercase ${
                      rule.action_type === 'keep' ? 'bg-green-100 text-green-800' :
                      rule.action_type === 'drop' ? 'bg-red-100 text-red-800' :
                      rule.action_type === 'email' ? 'bg-blue-100 text-blue-800' :
                      'bg-purple-100 text-purple-800'
                    }`}>
                      {rule.action_type}
                    </span>
                    <div
                      className={`w-12 h-6 rounded-full relative cursor-pointer transition-colors ${
                        rule.is_active ? 'bg-green-500' : 'bg-slate-300'
                      }`}
                    >
                      <div
                        className={`absolute w-5 h-5 bg-white rounded-full top-0.5 transition-all shadow ${
                          rule.is_active ? 'left-6' : 'left-0.5'
                        }`}
                      />
                    </div>
                  </div>
                ))}
                {rules.length === 0 && (
                  <div className="p-8 text-center text-slate-500">
                    Keine Regeln konfiguriert
                  </div>
                )}
              </div>
            </div>
          )}

          {/* Profile Tab */}
          {activeTab === 'profile' && (
            <div className="max-w-2xl space-y-6">
              <div className="bg-white rounded-xl border border-slate-200 p-6">
                <h3 className="font-semibold text-slate-900 mb-4">Relevanzprofil</h3>

                <div className="space-y-4">
                  <div>
                    <label className="block text-sm font-medium text-slate-700 mb-2">
                      Prioritaeten (wichtige Themen)
                    </label>
                    <textarea
                      className="w-full p-3 border border-slate-200 rounded-lg text-sm focus:ring-2 focus:ring-green-500 focus:border-green-500"
                      rows={4}
                      defaultValue={profile?.priorities?.join('\n') || ''}
                      placeholder="Ein Thema pro Zeile..."
                    />
                    <p className="text-xs text-slate-500 mt-1">Alerts zu diesen Themen werden hoeher bewertet.</p>
                  </div>

                  <div>
                    <label className="block text-sm font-medium text-slate-700 mb-2">
                      Ausschluesse (unerwuenschte Themen)
                    </label>
                    <textarea
                      className="w-full p-3 border border-slate-200 rounded-lg text-sm focus:ring-2 focus:ring-green-500 focus:border-green-500"
                      rows={4}
                      defaultValue={profile?.exclusions?.join('\n') || ''}
                      placeholder="Ein Thema pro Zeile..."
                    />
                    <p className="text-xs text-slate-500 mt-1">Alerts zu diesen Themen werden niedriger bewertet.</p>
                  </div>

                  <div className="grid grid-cols-2 gap-4">
                    <div>
                      <label className="block text-sm font-medium text-slate-700 mb-2">
                        Schwellenwert KEEP
                      </label>
                      <select
                        className="w-full p-3 border border-slate-200 rounded-lg text-sm focus:ring-2 focus:ring-green-500 focus:border-green-500"
                        defaultValue={profile?.policies?.keep_threshold || 0.7}
                      >
                        <option value={0.8}>80% (sehr streng)</option>
                        <option value={0.7}>70% (empfohlen)</option>
                        <option value={0.6}>60% (weniger streng)</option>
                      </select>
                    </div>
                    <div>
                      <label className="block text-sm font-medium text-slate-700 mb-2">
                        Schwellenwert DROP
                      </label>
                      <select
                        className="w-full p-3 border border-slate-200 rounded-lg text-sm focus:ring-2 focus:ring-green-500 focus:border-green-500"
                        defaultValue={profile?.policies?.drop_threshold || 0.3}
                      >
                        <option value={0.4}>40% (strenger)</option>
                        <option value={0.3}>30% (empfohlen)</option>
                        <option value={0.2}>20% (lockerer)</option>
                      </select>
                    </div>
                  </div>

                  <button className="px-4 py-2 bg-green-600 text-white rounded-lg text-sm font-medium hover:bg-green-700">
                    Profil speichern
                  </button>
                </div>
              </div>
            </div>
          )}

          {/* Audit Tab */}
          {activeTab === 'audit' && (
            <div className="space-y-6">
              <h3 className="text-sm font-semibold text-slate-700 uppercase tracking-wide">Audit-relevante Informationen</h3>

              <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
                {/* Database Info */}
                <div className="bg-white rounded-xl border border-slate-200 p-4">
                  <h4 className="font-semibold text-slate-900 mb-3 flex items-center gap-2">
                    <svg className="w-5 h-5 text-green-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 7v10c0 2.21 3.582 4 8 4s8-1.79 8-4V7M4 7c0 2.21 3.582 4 8 4s8-1.79 8-4M4 7c0-2.21 3.582-4 8-4s8 1.79 8 4" />
                    </svg>
                    Datenbank
                  </h4>
                  <div className="space-y-2">
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Tabellen</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">4 (topics, items, rules, profiles)</span>
                    </div>
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Indizes</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">URL-Hash, Topic-ID, Status</span>
                    </div>
                    <div className="flex items-center justify-between py-2">
                      <span className="text-sm text-slate-600">Backups</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">PostgreSQL pg_dump</span>
                    </div>
                  </div>
                </div>

                {/* API Security */}
                <div className="bg-white rounded-xl border border-slate-200 p-4">
                  <h4 className="font-semibold text-slate-900 mb-3 flex items-center gap-2">
                    <svg className="w-5 h-5 text-green-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 15v2m-6 4h12a2 2 0 002-2v-6a2 2 0 00-2-2H6a2 2 0 00-2 2v6a2 2 0 002 2zm10-10V7a4 4 0 00-8 0v4h8z" />
                    </svg>
                    API Sicherheit
                  </h4>
                  <div className="space-y-2">
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Authentifizierung</span>
                      <span className="text-sm font-medium bg-amber-100 text-amber-800 px-2 py-0.5 rounded">Bearer Token (geplant)</span>
                    </div>
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Rate Limiting</span>
                      <span className="text-sm font-medium bg-amber-100 text-amber-800 px-2 py-0.5 rounded">Nicht implementiert</span>
                    </div>
                    <div className="flex items-center justify-between py-2">
                      <span className="text-sm text-slate-600">Input Validation</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">Pydantic Models</span>
                    </div>
                  </div>
                </div>

                {/* Logging */}
                <div className="bg-white rounded-xl border border-slate-200 p-4">
                  <h4 className="font-semibold text-slate-900 mb-3 flex items-center gap-2">
                    <svg className="w-5 h-5 text-green-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 17v-2m3 2v-4m3 4v-6m2 10H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
                    </svg>
                    Logging & Monitoring
                  </h4>
                  <div className="space-y-2">
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Structured Logging</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">Python logging</span>
                    </div>
                    <div className="flex items-center justify-between py-2 border-b border-slate-100">
                      <span className="text-sm text-slate-600">Metriken</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">Stats Endpoint</span>
                    </div>
                    <div className="flex items-center justify-between py-2">
                      <span className="text-sm text-slate-600">Health Checks</span>
                      <span className="text-sm font-medium bg-green-100 text-green-800 px-2 py-0.5 rounded">/api/alerts/health</span>
                    </div>
                  </div>
                </div>
              </div>

              {/* Privacy Notes */}
              <div className="bg-blue-50 border border-blue-200 rounded-lg p-4">
                <h4 className="text-sm font-semibold text-blue-800 mb-2">Datenschutz-Hinweise</h4>
                <ul className="space-y-1">
                  <li className="text-sm text-blue-700 flex items-start gap-2">
                    <svg className="w-4 h-4 mt-0.5 text-blue-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                    Alle Daten werden in Deutschland gespeichert (PostgreSQL)
                  </li>
                  <li className="text-sm text-blue-700 flex items-start gap-2">
                    <svg className="w-4 h-4 mt-0.5 text-blue-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                    Keine personenbezogenen Daten in Alerts (nur URLs und Snippets)
                  </li>
                  <li className="text-sm text-blue-700 flex items-start gap-2">
                    <svg className="w-4 h-4 mt-0.5 text-blue-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                    LLM-Verarbeitung kann on-premise mit Ollama/vLLM erfolgen
                  </li>
                  <li className="text-sm text-blue-700 flex items-start gap-2">
                    <svg className="w-4 h-4 mt-0.5 text-blue-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                      <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
                    </svg>
                    DSGVO-konforme Datenverarbeitung
                  </li>
                </ul>
              </div>
            </div>
          )}

          {/* Documentation Tab */}
          {activeTab === 'documentation' && (
            <div className="bg-white rounded-xl border border-slate-200 p-6 overflow-auto max-h-[calc(100vh-350px)]">
|
||||
<div className="prose prose-slate max-w-none prose-headings:font-semibold prose-h1:text-2xl prose-h2:text-xl prose-h3:text-lg">
|
||||
{/* Header */}
|
||||
<div className="not-prose mb-8 pb-6 border-b border-slate-200">
|
||||
<h1 className="text-2xl font-bold text-slate-900">BreakPilot Alerts Agent</h1>
|
||||
<p className="text-sm text-slate-500 mt-1">Version: 1.0.0 | Stand: Januar 2026 | Autor: BreakPilot Development Team</p>
|
||||
</div>
|
||||
|
||||
{/* Audit Box */}
|
||||
<div className="not-prose bg-blue-50 border border-blue-200 rounded-lg p-4 mb-6">
|
||||
<h3 className="font-semibold text-blue-900 mb-2">Audit-Relevante Informationen</h3>
|
||||
<p className="text-sm text-blue-800">
|
||||
Dieses Dokument dient als technische Dokumentation fuer das Alert-Monitoring-System der BreakPilot Plattform.
|
||||
Es ist fuer Audits durch Bildungstraeger und Datenschutzbeauftragte konzipiert.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
{/* Ziel des Systems */}
|
||||
<h2>Ziel des Alert-Systems</h2>
|
||||
<p>Das System ermoeglicht automatisierte Ueberwachung von Bildungsthemen mit:</p>
|
||||
<ul>
|
||||
<li><strong>Google Alerts Integration</strong>: RSS-Feeds von Google Alerts automatisch abrufen</li>
|
||||
<li><strong>RSS/Atom Feeds</strong>: Beliebige Nachrichtenquellen einbinden</li>
|
||||
<li><strong>KI-Relevanzpruefung</strong>: Automatische Bewertung der Relevanz durch LLM</li>
|
||||
<li><strong>Regelbasierte Filterung</strong>: Flexible Regeln fuer automatische Sortierung</li>
|
||||
<li><strong>Multi-Channel Actions</strong>: E-Mail, Webhook, Slack Benachrichtigungen</li>
|
||||
<li><strong>Few-Shot Learning</strong>: Profil verbessert sich durch Nutzerfeedback</li>
|
||||
</ul>
|
||||
|
||||
{/* Architecture Diagram */}
|
||||
<h2>Systemarchitektur</h2>
|
||||
<div className="not-prose bg-slate-900 rounded-lg p-4 overflow-x-auto">
|
||||
<pre className="text-green-400 text-xs">{`
┌─────────────────────────────────────────────────────────────────────┐
│                     BreakPilot Alerts Frontend                      │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────┐   │
│  │  Dashboard   │ │    Inbox     │ │    Topics    │ │ Profile  │   │
│  └──────────────┘ └──────────────┘ └──────────────┘ └──────────┘   │
└───────────────────────────────┬─────────────────────────────────────┘
                                │
                                v
┌─────────────────────────────────────────────────────────────────────┐
│                          Ingestion Layer                            │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐           │
│  │  RSS Fetcher   │ │  Email Parser  │ │  APScheduler   │           │
│  └───────┬────────┘ └───────┬────────┘ └───────┬────────┘           │
│          └──────────────────┼──────────────────┘                    │
│  ┌──────────────────────────────────────────────────────┐           │
│  │          Deduplication (URL-Hash + SimHash)          │           │
│  └──────────────────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                v
┌─────────────────────────────────────────────────────────────────────┐
│                         Processing Layer                            │
│  ┌──────────────────────────────────────────────────────┐           │
│  │                     Rule Engine                      │           │
│  └──────────────────────────────────────────────────────┘           │
│  ┌──────────────────────────────────────────────────────┐           │
│  │                 LLM Relevance Scorer                 │           │
│  │     Output: { score, decision: KEEP/DROP/REVIEW }    │           │
│  └──────────────────────────────────────────────────────┘           │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                v
┌─────────────────────────────────────────────────────────────────────┐
│                           Action Layer                              │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐           │
│  │  Email Action  │ │ Webhook Action │ │  Slack Action  │           │
│  └────────────────┘ └────────────────┘ └────────────────┘           │
└─────────────────────────────────────────────────────────────────────┘
                                │
                                v
┌─────────────────────────────────────────────────────────────────────┐
│                          Storage Layer                              │
│  ┌────────────────┐ ┌────────────────┐ ┌────────────────┐           │
│  │   PostgreSQL   │ │     Valkey     │ │  LLM Gateway   │           │
│  └────────────────┘ └────────────────┘ └────────────────┘           │
└─────────────────────────────────────────────────────────────────────┘`}</pre>
</div>

{/* API Endpoints */}
<h2>API Endpoints</h2>
<div className="not-prose overflow-x-auto">
<table className="min-w-full text-sm border border-slate-200 rounded-lg">
<thead className="bg-slate-50">
<tr>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Endpoint</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Methode</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Beschreibung</th>
</tr>
</thead>
<tbody className="divide-y divide-slate-200">
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/inbox</td><td className="px-4 py-2">GET</td><td className="px-4 py-2 text-slate-600">Inbox Items abrufen</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/ingest</td><td className="px-4 py-2">POST</td><td className="px-4 py-2 text-slate-600">Manuell Alert importieren</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/topics</td><td className="px-4 py-2">GET/POST</td><td className="px-4 py-2 text-slate-600">Topics verwalten</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/rules</td><td className="px-4 py-2">GET/POST</td><td className="px-4 py-2 text-slate-600">Regeln verwalten</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/profile</td><td className="px-4 py-2">GET/PUT</td><td className="px-4 py-2 text-slate-600">Profil abrufen/aktualisieren</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">/api/alerts/stats</td><td className="px-4 py-2">GET</td><td className="px-4 py-2 text-slate-600">Statistiken abrufen</td></tr>
</tbody>
</table>
</div>

{/* Rule Engine */}
<h2>Rule Engine - Operatoren</h2>
<div className="not-prose overflow-x-auto">
<table className="min-w-full text-sm border border-slate-200 rounded-lg">
<thead className="bg-slate-50">
<tr>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Operator</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Beschreibung</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Beispiel</th>
</tr>
</thead>
<tbody className="divide-y divide-slate-200">
<tr><td className="px-4 py-2 font-mono text-xs">contains</td><td className="px-4 py-2">Text enthaelt</td><td className="px-4 py-2 text-slate-600">title contains "Inklusion"</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">not_contains</td><td className="px-4 py-2">Text enthaelt nicht</td><td className="px-4 py-2 text-slate-600">title not_contains "Werbung"</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">equals</td><td className="px-4 py-2">Exakte Uebereinstimmung</td><td className="px-4 py-2 text-slate-600">status equals "new"</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">regex</td><td className="px-4 py-2">Regulaerer Ausdruck</td><td className="px-4 py-2 text-slate-600">{'title regex "\\d{4}"'}</td></tr>
<tr><td className="px-4 py-2 font-mono text-xs">gt / lt</td><td className="px-4 py-2">Groesser/Kleiner</td><td className="px-4 py-2 text-slate-600">relevance_score gt 0.8</td></tr>
</tbody>
</table>
</div>

{/* Scoring */}
<h2>LLM Relevanz-Scoring</h2>
<div className="not-prose overflow-x-auto">
<table className="min-w-full text-sm border border-slate-200 rounded-lg">
<thead className="bg-slate-50">
<tr>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Entscheidung</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Score-Bereich</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Bedeutung</th>
</tr>
</thead>
<tbody className="divide-y divide-slate-200">
<tr className="bg-green-50"><td className="px-4 py-2 font-semibold text-green-800">KEEP</td><td className="px-4 py-2">0.7 - 1.0</td><td className="px-4 py-2">Klar relevant, in Inbox anzeigen</td></tr>
<tr className="bg-amber-50"><td className="px-4 py-2 font-semibold text-amber-800">REVIEW</td><td className="px-4 py-2">0.4 - 0.7</td><td className="px-4 py-2">Unsicher, Nutzer entscheidet</td></tr>
<tr className="bg-red-50"><td className="px-4 py-2 font-semibold text-red-800">DROP</td><td className="px-4 py-2">0.0 - 0.4</td><td className="px-4 py-2">Irrelevant, automatisch archivieren</td></tr>
</tbody>
</table>
</div>

{/* Contact */}
<h2>Kontakt & Support</h2>
<div className="not-prose overflow-x-auto">
<table className="min-w-full text-sm border border-slate-200 rounded-lg">
<thead className="bg-slate-50">
<tr>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Kontakt</th>
<th className="px-4 py-2 text-left font-semibold text-slate-700 border-b">Adresse</th>
</tr>
</thead>
<tbody className="divide-y divide-slate-200">
<tr><td className="px-4 py-2">Technischer Support</td><td className="px-4 py-2">support@breakpilot.de</td></tr>
<tr><td className="px-4 py-2">Datenschutzbeauftragter</td><td className="px-4 py-2">dsb@breakpilot.de</td></tr>
<tr><td className="px-4 py-2">Dokumentation</td><td className="px-4 py-2">docs.breakpilot.de</td></tr>
</tbody>
</table>
</div>

{/* Footer */}
<div className="not-prose mt-8 pt-6 border-t border-slate-200 text-sm text-slate-500">
<p>Dokumentation erstellt: Januar 2026 | Version: 1.0.0</p>
</div>
</div>
</div>
)}
</div>
</div>
</div>
)
}
@@ -1,946 +0,0 @@
'use client'

/**
 * Unified Inbox Mail Admin Page
 * Migrated from website/admin/mail to admin-v2/communication/mail
 *
 * Admin interface for managing email accounts, viewing system status,
 * and configuring AI analysis settings.
 */

import { useState, useEffect, useCallback } from 'react'
import Link from 'next/link'
import { PagePurpose } from '@/components/common/PagePurpose'

// API Base URL for backend operations (accounts, sync, etc.)
const API_BASE = process.env.NEXT_PUBLIC_KLAUSUR_SERVICE_URL || 'http://macmini:8086'

// Types
interface EmailAccount {
id: string
email: string
displayName: string
imapHost: string
imapPort: number
smtpHost: string
smtpPort: number
status: 'active' | 'inactive' | 'error' | 'syncing'
lastSync: string | null
emailCount: number
unreadCount: number
createdAt: string
}

interface MailStats {
totalAccounts: number
activeAccounts: number
totalEmails: number
unreadEmails: number
totalTasks: number
pendingTasks: number
overdueTasks: number
aiAnalyzedCount: number
lastSyncTime: string | null
}

interface SyncStatus {
running: boolean
accountsInProgress: string[]
lastCompleted: string | null
errors: string[]
}

// Tab definitions
type TabId = 'overview' | 'accounts' | 'ai-settings' | 'templates' | 'logs'

const tabs: { id: TabId; name: string }[] = [
{ id: 'overview', name: 'Uebersicht' },
{ id: 'accounts', name: 'Konten' },
{ id: 'ai-settings', name: 'KI-Einstellungen' },
{ id: 'templates', name: 'Vorlagen' },
{ id: 'logs', name: 'Audit-Log' },
]

// Main Component
export default function MailAdminPage() {
const [activeTab, setActiveTab] = useState<TabId>('overview')
const [stats, setStats] = useState<MailStats | null>(null)
const [accounts, setAccounts] = useState<EmailAccount[]>([])
const [syncStatus, setSyncStatus] = useState<SyncStatus | null>(null)
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)

const fetchData = useCallback(async () => {
try {
setLoading(true)

// Fetch stats via our proxy API (avoids CORS/mixed-content issues)
const response = await fetch('/api/admin/mail')

if (response.ok) {
const data = await response.json()
setStats(data.stats)
setAccounts(data.accounts)
setSyncStatus(data.syncStatus)
setError(null)
} else {
const errorData = await response.json().catch(() => ({}))
throw new Error(errorData.details || `API returned ${response.status}`)
}
} catch (err) {
console.error('Failed to fetch mail data:', err)
setError('Verbindung zum Mail-Service (Mailpit) fehlgeschlagen. Laeuft Mailpit auf Port 8025?')
} finally {
setLoading(false)
}
}, [])

useEffect(() => {
fetchData()

// Refresh every 10 seconds if syncing
const interval = setInterval(() => {
if (syncStatus?.running) {
fetchData()
}
}, 10000)

return () => clearInterval(interval)
}, [fetchData, syncStatus?.running])

return (
<div>
{/* Page Purpose */}
<PagePurpose
title="Unified Inbox"
purpose="Verwalten Sie E-Mail-Konten, synchronisieren Sie Postfaecher und konfigurieren Sie die KI-gestuetzte E-Mail-Analyse fuer automatische Kategorisierung und Aufgabenerkennung."
audience={['Admins', 'Schulleitung']}
architecture={{
services: ['Mailpit (Dev Mail Catcher)', 'IMAP/SMTP Server (Prod)'],
databases: ['PostgreSQL', 'Vault (Credentials)'],
}}
relatedPages={[
{ name: 'Mail Wizard', href: '/communication/mail/wizard', description: 'Interaktives Setup und Testing' },
{ name: 'Voice Service', href: '/communication/matrix', description: 'Voice-First Interface' },
]}
collapsible={true}
defaultCollapsed={false}
/>

{/* Quick Link to Wizard */}
<div className="mb-6">
<Link
href="/communication/mail/wizard"
className="inline-flex items-center gap-2 px-4 py-2 bg-blue-600 text-white rounded-lg hover:bg-blue-700 transition-colors"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 10V3L4 14h7v7l9-11h-7z" />
</svg>
Mail Wizard starten
</Link>
</div>

{/* Error Banner */}
{error && (
<div className="mb-6 bg-red-50 border border-red-200 rounded-lg p-4 flex items-center gap-3">
<svg className="w-5 h-5 text-red-500 flex-shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4m0 4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
<span className="text-red-700">{error}</span>
<button onClick={fetchData} className="ml-auto text-red-600 hover:text-red-800 text-sm font-medium">
Erneut versuchen
</button>
</div>
)}

{/* Tab Navigation */}
<div className="border-b border-slate-200 mb-6">
<nav className="-mb-px flex space-x-8">
{tabs.map((tab) => (
<button
key={tab.id}
onClick={() => setActiveTab(tab.id)}
className={`
flex items-center gap-2 py-4 px-1 border-b-2 font-medium text-sm transition-colors
${activeTab === tab.id
? 'border-blue-500 text-blue-600'
: 'border-transparent text-slate-500 hover:text-slate-700 hover:border-slate-300'
}
`}
>
{tab.name}
</button>
))}
</nav>
</div>

{/* Tab Content */}
{activeTab === 'overview' && (
<OverviewTab
stats={stats}
syncStatus={syncStatus}
loading={loading}
onRefresh={fetchData}
/>
)}
{activeTab === 'accounts' && (
<AccountsTab
accounts={accounts}
loading={loading}
onRefresh={fetchData}
/>
)}
{activeTab === 'ai-settings' && (
<AISettingsTab />
)}
{activeTab === 'templates' && (
<TemplatesTab />
)}
{activeTab === 'logs' && (
<AuditLogTab />
)}
</div>
)
}

// ============================================================================
// Overview Tab
// ============================================================================

function OverviewTab({
stats,
syncStatus,
loading,
onRefresh
}: {
stats: MailStats | null
syncStatus: SyncStatus | null
loading: boolean
onRefresh: () => void
}) {
const triggerSync = async () => {
try {
await fetch(`${API_BASE}/api/v1/mail/sync/all`, {
method: 'POST',
})
onRefresh()
} catch (err) {
console.error('Failed to trigger sync:', err)
}
}

return (
<div className="space-y-6">
{/* Header */}
<div className="flex items-center justify-between">
<div>
<h2 className="text-lg font-semibold text-slate-900">System-Uebersicht</h2>
<p className="text-sm text-slate-500">Status aller E-Mail-Konten und Aufgaben</p>
</div>
<div className="flex gap-3">
<button
onClick={onRefresh}
className="px-4 py-2 text-sm font-medium text-slate-700 bg-white border border-slate-300 rounded-lg hover:bg-slate-50"
>
Aktualisieren
</button>
<button
onClick={triggerSync}
disabled={syncStatus?.running}
className="px-4 py-2 text-sm font-medium text-white bg-blue-600 rounded-lg hover:bg-blue-700 disabled:opacity-50"
>
{syncStatus?.running ? 'Synchronisiert...' : 'Alle synchronisieren'}
</button>
</div>
</div>

{/* Loading State */}
{loading && (
<div className="flex items-center justify-center py-12">
<div className="animate-spin rounded-full h-8 w-8 border-b-2 border-blue-600"></div>
</div>
)}

{/* Stats Grid */}
{!loading && stats && (
<>
<div className="grid grid-cols-2 md:grid-cols-4 gap-4">
<StatCard
title="E-Mail-Konten"
value={stats.totalAccounts}
subtitle={`${stats.activeAccounts} aktiv`}
color="blue"
/>
<StatCard
title="E-Mails gesamt"
value={stats.totalEmails}
subtitle={`${stats.unreadEmails} ungelesen`}
color="green"
/>
<StatCard
title="Aufgaben"
value={stats.totalTasks}
subtitle={`${stats.pendingTasks} offen`}
color="yellow"
/>
<StatCard
title="Ueberfaellig"
value={stats.overdueTasks}
color={stats.overdueTasks > 0 ? 'red' : 'green'}
/>
</div>

{/* Sync Status */}
<div className="bg-white rounded-lg border border-slate-200 p-6">
<h3 className="text-sm font-medium text-slate-700 mb-4">Synchronisierung</h3>
<div className="flex items-center gap-4">
{syncStatus?.running ? (
<>
<div className="w-3 h-3 bg-yellow-500 rounded-full animate-pulse"></div>
<span className="text-slate-600">
Synchronisiere {syncStatus.accountsInProgress.length} Konto(en)...
</span>
</>
) : (
<>
<div className="w-3 h-3 bg-green-500 rounded-full"></div>
<span className="text-slate-600">Bereit</span>
</>
)}
{stats.lastSyncTime && (
<span className="text-sm text-slate-500 ml-auto">
Letzte Sync: {new Date(stats.lastSyncTime).toLocaleString('de-DE')}
</span>
)}
</div>

{syncStatus?.errors && syncStatus.errors.length > 0 && (
<div className="mt-4 p-4 bg-red-50 rounded-lg">
<h4 className="text-sm font-medium text-red-800 mb-2">Fehler</h4>
<ul className="text-sm text-red-700 space-y-1">
{syncStatus.errors.slice(0, 3).map((error, i) => (
<li key={i}>{error}</li>
))}
</ul>
</div>
)}
</div>

{/* AI Stats */}
<div className="bg-white rounded-lg border border-slate-200 p-6">
<h3 className="text-sm font-medium text-slate-700 mb-4">KI-Analyse</h3>
<div className="grid grid-cols-2 md:grid-cols-4 gap-6">
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">Analysiert</p>
<p className="text-2xl font-bold text-slate-900">{stats.aiAnalyzedCount}</p>
</div>
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">Analyse-Rate</p>
<p className="text-2xl font-bold text-slate-900">
{stats.totalEmails > 0
? `${Math.round((stats.aiAnalyzedCount / stats.totalEmails) * 100)}%`
: '0%'}
</p>
</div>
</div>
</div>
</>
)}
</div>
)
}

function StatCard({
title,
value,
subtitle,
color = 'blue'
}: {
title: string
value: number
subtitle?: string
color?: 'blue' | 'green' | 'yellow' | 'red'
}) {
const colorClasses = {
blue: 'text-blue-600',
green: 'text-green-600',
yellow: 'text-yellow-600',
red: 'text-red-600',
}

return (
<div className="bg-white rounded-lg border border-slate-200 p-6">
<p className="text-xs text-slate-500 uppercase tracking-wider mb-1">{title}</p>
<p className={`text-3xl font-bold ${colorClasses[color]}`}>{value.toLocaleString()}</p>
{subtitle && <p className="text-sm text-slate-500 mt-1">{subtitle}</p>}
</div>
)
}

// ============================================================================
// Accounts Tab
// ============================================================================

function AccountsTab({
accounts,
loading,
onRefresh
}: {
accounts: EmailAccount[]
loading: boolean
onRefresh: () => void
}) {
const [showAddModal, setShowAddModal] = useState(false)

const testConnection = async (accountId: string) => {
try {
const res = await fetch(`${API_BASE}/api/v1/mail/accounts/${accountId}/test`, {
method: 'POST',
})
if (res.ok) {
alert('Verbindung erfolgreich!')
} else {
alert('Verbindungsfehler')
}
} catch (err) {
alert('Verbindungsfehler')
}
}

const statusColors = {
active: 'bg-green-100 text-green-800',
inactive: 'bg-gray-100 text-gray-800',
error: 'bg-red-100 text-red-800',
syncing: 'bg-yellow-100 text-yellow-800',
}

const statusLabels = {
active: 'Aktiv',
inactive: 'Inaktiv',
error: 'Fehler',
syncing: 'Synchronisiert...',
}

return (
<div className="space-y-6">
{/* Header */}
<div className="flex items-center justify-between">
<div>
<h2 className="text-lg font-semibold text-slate-900">E-Mail-Konten</h2>
<p className="text-sm text-slate-500">Verwalten Sie die verbundenen E-Mail-Konten</p>
</div>
<button
onClick={() => setShowAddModal(true)}
className="px-4 py-2 text-sm font-medium text-white bg-blue-600 rounded-lg hover:bg-blue-700 flex items-center gap-2"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" />
</svg>
Konto hinzufuegen
</button>
</div>

{/* Loading State */}
{loading && (
<div className="flex items-center justify-center py-12">
<div className="animate-spin rounded-full h-8 w-8 border-b-2 border-blue-600"></div>
</div>
)}

{/* Accounts Grid */}
{!loading && (
<div className="grid gap-4">
{accounts.length === 0 ? (
<div className="bg-slate-50 rounded-lg p-8 text-center">
<svg className="w-12 h-12 text-slate-400 mx-auto mb-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M3 8l7.89 5.26a2 2 0 002.22 0L21 8M5 19h14a2 2 0 002-2V7a2 2 0 00-2-2H5a2 2 0 00-2 2v10a2 2 0 002 2z" />
</svg>
<h3 className="text-lg font-medium text-slate-900 mb-2">Keine E-Mail-Konten</h3>
<p className="text-slate-500 mb-4">Fuegen Sie Ihr erstes E-Mail-Konto hinzu.</p>
</div>
) : (
accounts.map((account) => (
<div
key={account.id}
className="bg-white rounded-lg border border-slate-200 p-6 hover:shadow-md transition-shadow"
>
<div className="flex items-start justify-between">
<div className="flex items-center gap-4">
<div className="w-12 h-12 bg-blue-100 rounded-lg flex items-center justify-center">
<svg className="w-6 h-6 text-blue-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M3 8l7.89 5.26a2 2 0 002.22 0L21 8M5 19h14a2 2 0 002-2V7a2 2 0 00-2-2H5a2 2 0 00-2 2v10a2 2 0 002 2z" />
</svg>
</div>
<div>
<h3 className="text-lg font-semibold text-slate-900">
{account.displayName || account.email}
</h3>
<p className="text-sm text-slate-500">{account.email}</p>
</div>
</div>
<div className="flex items-center gap-3">
<span className={`px-3 py-1 rounded-full text-xs font-medium ${statusColors[account.status]}`}>
{statusLabels[account.status]}
</span>
<button
onClick={() => testConnection(account.id)}
className="p-2 text-slate-400 hover:text-slate-600"
title="Verbindung testen"
>
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 10V3L4 14h7v7l9-11h-7z" />
</svg>
</button>
</div>
</div>

<div className="mt-4 grid grid-cols-2 md:grid-cols-4 gap-4">
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">E-Mails</p>
<p className="text-lg font-semibold text-slate-900">{account.emailCount}</p>
</div>
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">Ungelesen</p>
<p className="text-lg font-semibold text-slate-900">{account.unreadCount}</p>
</div>
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">IMAP</p>
<p className="text-sm font-mono text-slate-700">{account.imapHost}:{account.imapPort}</p>
</div>
<div>
<p className="text-xs text-slate-500 uppercase tracking-wider">Letzte Sync</p>
<p className="text-sm text-slate-700">
{account.lastSync
? new Date(account.lastSync).toLocaleString('de-DE')
: 'Nie'}
</p>
</div>
</div>
</div>
))
)}
</div>
)}

{/* Add Account Modal */}
{showAddModal && (
<AddAccountModal onClose={() => setShowAddModal(false)} onSuccess={() => { setShowAddModal(false); onRefresh(); }} />
)}
</div>
)
}

function AddAccountModal({
|
||||
onClose,
|
||||
onSuccess
|
||||
}: {
|
||||
onClose: () => void
|
||||
onSuccess: () => void
|
||||
}) {
|
||||
const [formData, setFormData] = useState({
|
||||
email: '',
|
||||
displayName: '',
|
||||
imapHost: '',
|
||||
imapPort: 993,
|
||||
smtpHost: '',
|
||||
smtpPort: 587,
|
||||
username: '',
|
||||
password: '',
|
||||
})
|
||||
const [submitting, setSubmitting] = useState(false)
|
||||
const [error, setError] = useState<string | null>(null)
|
||||
|
||||
const handleSubmit = async (e: React.FormEvent) => {
|
||||
e.preventDefault()
|
||||
setSubmitting(true)
|
||||
setError(null)
|
||||
|
||||
try {
|
||||
const res = await fetch(`${API_BASE}/api/v1/mail/accounts`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
email: formData.email,
|
||||
display_name: formData.displayName,
|
||||
imap_host: formData.imapHost,
|
||||
imap_port: formData.imapPort,
|
||||
smtp_host: formData.smtpHost,
|
||||
smtp_port: formData.smtpPort,
|
||||
username: formData.username,
|
||||
password: formData.password,
|
||||
}),
|
||||
})
|
||||
|
||||
if (res.ok) {
|
||||
onSuccess()
|
||||
} else {
|
||||
const data = await res.json()
|
||||
setError(data.detail || 'Fehler beim Hinzufuegen des Kontos')
|
||||
}
|
||||
} catch (err) {
|
||||
setError('Netzwerkfehler')
|
||||
} finally {
|
||||
setSubmitting(false)
|
||||
}
|
||||
}
|
||||
|
||||
  return (
    <div className="fixed inset-0 bg-black bg-opacity-50 flex items-center justify-center z-50">
      <div className="bg-white rounded-lg shadow-xl max-w-lg w-full mx-4 max-h-[90vh] overflow-y-auto">
        <div className="p-6 border-b border-slate-200">
          <h2 className="text-lg font-semibold text-slate-900">E-Mail-Konto hinzufuegen</h2>
        </div>

        <form onSubmit={handleSubmit} className="p-6 space-y-4">
          {error && (
            <div className="p-3 bg-red-50 text-red-700 rounded-lg text-sm">{error}</div>
          )}

          <div className="grid grid-cols-2 gap-4">
            <div className="col-span-2">
              <label className="block text-sm font-medium text-slate-700 mb-1">E-Mail-Adresse</label>
              <input
                type="email"
                required
                value={formData.email}
                onChange={(e) => setFormData({ ...formData, email: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
                placeholder="schulleitung@grundschule-xy.de"
              />
            </div>
            <div className="col-span-2">
              <label className="block text-sm font-medium text-slate-700 mb-1">Anzeigename</label>
              <input
                type="text"
                value={formData.displayName}
                onChange={(e) => setFormData({ ...formData, displayName: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
                placeholder="Schulleitung"
              />
            </div>

            <div>
              <label className="block text-sm font-medium text-slate-700 mb-1">IMAP Server</label>
              <input
                type="text"
                required
                value={formData.imapHost}
                onChange={(e) => setFormData({ ...formData, imapHost: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
                placeholder="imap.example.com"
              />
            </div>
            <div>
              <label className="block text-sm font-medium text-slate-700 mb-1">IMAP Port</label>
              <input
                type="number"
                required
                value={formData.imapPort}
                onChange={(e) => setFormData({ ...formData, imapPort: parseInt(e.target.value) })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
              />
            </div>

            <div>
              <label className="block text-sm font-medium text-slate-700 mb-1">SMTP Server</label>
              <input
                type="text"
                required
                value={formData.smtpHost}
                onChange={(e) => setFormData({ ...formData, smtpHost: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
                placeholder="smtp.example.com"
              />
            </div>
            <div>
              <label className="block text-sm font-medium text-slate-700 mb-1">SMTP Port</label>
              <input
                type="number"
                required
                value={formData.smtpPort}
                onChange={(e) => setFormData({ ...formData, smtpPort: parseInt(e.target.value) })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
              />
            </div>

            <div className="col-span-2">
              <label className="block text-sm font-medium text-slate-700 mb-1">Benutzername</label>
              <input
                type="text"
                required
                value={formData.username}
                onChange={(e) => setFormData({ ...formData, username: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
              />
            </div>

            <div className="col-span-2">
              <label className="block text-sm font-medium text-slate-700 mb-1">Passwort</label>
              <input
                type="password"
                required
                value={formData.password}
                onChange={(e) => setFormData({ ...formData, password: e.target.value })}
                className="w-full px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
              />
              <p className="text-xs text-slate-500 mt-1">
                Das Passwort wird verschluesselt in Vault gespeichert.
              </p>
            </div>
          </div>

          <div className="flex justify-end gap-3 pt-4 border-t border-slate-200">
            <button
              type="button"
              onClick={onClose}
              className="px-4 py-2 text-sm font-medium text-slate-700 hover:bg-slate-100 rounded-lg"
            >
              Abbrechen
            </button>
            <button
              type="submit"
              disabled={submitting}
              className="px-4 py-2 text-sm font-medium text-white bg-blue-600 rounded-lg hover:bg-blue-700 disabled:opacity-50"
            >
              {submitting ? 'Speichern...' : 'Konto hinzufuegen'}
            </button>
          </div>
        </form>
      </div>
    </div>
  )
}

// ============================================================================
// AI Settings Tab
// ============================================================================

function AISettingsTab() {
  const [settings, setSettings] = useState({
    autoAnalyze: true,
    autoCreateTasks: true,
    analysisModel: 'breakpilot-teacher-8b',
    confidenceThreshold: 0.7,
  })

  return (
    <div className="space-y-6">
      <div>
        <h2 className="text-lg font-semibold text-slate-900">KI-Einstellungen</h2>
        <p className="text-sm text-slate-500">Konfigurieren Sie die automatische E-Mail-Analyse</p>
      </div>

      <div className="bg-white rounded-lg border border-slate-200 p-6 space-y-6">
        {/* Auto-Analyze */}
        <div className="flex items-center justify-between">
          <div>
            <h3 className="text-sm font-medium text-slate-900">Automatische Analyse</h3>
            <p className="text-sm text-slate-500">E-Mails automatisch beim Empfang analysieren</p>
          </div>
          <button
            onClick={() => setSettings({ ...settings, autoAnalyze: !settings.autoAnalyze })}
            className={`relative inline-flex h-6 w-11 items-center rounded-full transition-colors ${
              settings.autoAnalyze ? 'bg-blue-600' : 'bg-slate-200'
            }`}
          >
            <span
              className={`inline-block h-4 w-4 transform rounded-full bg-white transition-transform ${
                settings.autoAnalyze ? 'translate-x-6' : 'translate-x-1'
              }`}
            />
          </button>
        </div>

        {/* Auto-Create Tasks */}
        <div className="flex items-center justify-between">
          <div>
            <h3 className="text-sm font-medium text-slate-900">Aufgaben automatisch erstellen</h3>
            <p className="text-sm text-slate-500">Erkannte Fristen als Aufgaben anlegen</p>
          </div>
          <button
            onClick={() => setSettings({ ...settings, autoCreateTasks: !settings.autoCreateTasks })}
            className={`relative inline-flex h-6 w-11 items-center rounded-full transition-colors ${
              settings.autoCreateTasks ? 'bg-blue-600' : 'bg-slate-200'
            }`}
          >
            <span
              className={`inline-block h-4 w-4 transform rounded-full bg-white transition-transform ${
                settings.autoCreateTasks ? 'translate-x-6' : 'translate-x-1'
              }`}
            />
          </button>
        </div>

        {/* Model Selection */}
        <div>
          <label className="block text-sm font-medium text-slate-700 mb-2">Analyse-Modell</label>
          <select
            value={settings.analysisModel}
            onChange={(e) => setSettings({ ...settings, analysisModel: e.target.value })}
            className="w-full md:w-64 px-3 py-2 border border-slate-300 rounded-lg focus:ring-2 focus:ring-blue-500"
          >
            <option value="breakpilot-teacher-8b">BreakPilot Teacher 8B (schnell)</option>
            <option value="breakpilot-teacher-70b">BreakPilot Teacher 70B (genau)</option>
            <option value="llama-3.1-8b-instruct">Llama 3.1 8B Instruct</option>
          </select>
        </div>

        {/* Confidence Threshold */}
        <div>
          <label className="block text-sm font-medium text-slate-700 mb-2">
            Konfidenz-Schwelle: {Math.round(settings.confidenceThreshold * 100)}%
          </label>
          <input
            type="range"
            min="0.5"
            max="0.95"
            step="0.05"
            value={settings.confidenceThreshold}
            onChange={(e) => setSettings({ ...settings, confidenceThreshold: parseFloat(e.target.value) })}
            className="w-full md:w-64"
          />
          <p className="text-xs text-slate-500 mt-1">
            Mindest-Konfidenz fuer automatische Aufgabenerstellung
          </p>
        </div>
      </div>

      {/* Sender Classification */}
      <div className="bg-white rounded-lg border border-slate-200 p-6">
        <h3 className="text-sm font-medium text-slate-700 mb-4">Bekannte Absender (Niedersachsen)</h3>
        <div className="grid grid-cols-2 md:grid-cols-3 gap-3">
          {[
            { domain: '@mk.niedersachsen.de', type: 'Kultusministerium', priority: 'Hoch' },
            { domain: '@rlsb.de', type: 'RLSB', priority: 'Hoch' },
            { domain: '@landesschulbehoerde-nds.de', type: 'Landesschulbehoerde', priority: 'Hoch' },
            { domain: '@nibis.de', type: 'NiBiS', priority: 'Mittel' },
            { domain: '@schultraeger.de', type: 'Schultraeger', priority: 'Mittel' },
          ].map((sender) => (
            <div key={sender.domain} className="p-3 bg-slate-50 rounded-lg">
              <p className="text-sm font-mono text-slate-700">{sender.domain}</p>
              <p className="text-xs text-slate-500">{sender.type}</p>
              <span className={`text-xs px-2 py-0.5 rounded ${
                sender.priority === 'Hoch' ? 'bg-red-100 text-red-700' : 'bg-yellow-100 text-yellow-700'
              }`}>
                {sender.priority}
              </span>
            </div>
          ))}
        </div>
      </div>
    </div>
  )
}

// ============================================================================
// Templates Tab
// ============================================================================

function TemplatesTab() {
  const [templates] = useState([
    { id: '1', name: 'Eingangsbestaetigung', category: 'Standard', usageCount: 45 },
    { id: '2', name: 'Terminbestaetigung', category: 'Termine', usageCount: 23 },
    { id: '3', name: 'Elternbrief-Vorlage', category: 'Eltern', usageCount: 67 },
  ])

  return (
    <div className="space-y-6">
      <div className="flex items-center justify-between">
        <div>
          <h2 className="text-lg font-semibold text-slate-900">E-Mail-Vorlagen</h2>
          <p className="text-sm text-slate-500">Verwalten Sie Antwort-Templates</p>
        </div>
        <button className="px-4 py-2 text-sm font-medium text-white bg-blue-600 rounded-lg hover:bg-blue-700 flex items-center gap-2">
          <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" />
          </svg>
          Vorlage erstellen
        </button>
      </div>

      <div className="bg-white rounded-lg border border-slate-200 overflow-hidden">
        <table className="min-w-full divide-y divide-slate-200">
          <thead className="bg-slate-50">
            <tr>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Name</th>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Kategorie</th>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Verwendet</th>
              <th className="px-6 py-3 text-right text-xs font-medium text-slate-500 uppercase">Aktionen</th>
            </tr>
          </thead>
          <tbody className="divide-y divide-slate-200">
            {templates.map((template) => (
              <tr key={template.id} className="hover:bg-slate-50">
                <td className="px-6 py-4 text-sm font-medium text-slate-900">{template.name}</td>
                <td className="px-6 py-4">
                  <span className="px-2 py-1 bg-slate-100 text-slate-700 text-xs rounded">{template.category}</span>
                </td>
                <td className="px-6 py-4 text-sm text-slate-500">{template.usageCount}x</td>
                <td className="px-6 py-4 text-right">
                  <button className="text-blue-600 hover:text-blue-800 text-sm font-medium">Bearbeiten</button>
                </td>
              </tr>
            ))}
          </tbody>
        </table>
      </div>
    </div>
  )
}

// ============================================================================
// Audit Log Tab
// ============================================================================

function AuditLogTab() {
  const [logs] = useState([
    { id: '1', action: 'account_created', user: 'admin@breakpilot.de', timestamp: new Date().toISOString(), details: 'Konto schulleitung@example.de hinzugefuegt' },
    { id: '2', action: 'email_analyzed', user: 'system', timestamp: new Date(Date.now() - 3600000).toISOString(), details: '5 E-Mails analysiert' },
    { id: '3', action: 'task_created', user: 'system', timestamp: new Date(Date.now() - 7200000).toISOString(), details: 'Aufgabe aus Fristenerkennung erstellt' },
  ])

  const actionLabels: Record<string, string> = {
    account_created: 'Konto erstellt',
    email_analyzed: 'E-Mail analysiert',
    task_created: 'Aufgabe erstellt',
    sync_completed: 'Sync abgeschlossen',
  }

  return (
    <div className="space-y-6">
      <div>
        <h2 className="text-lg font-semibold text-slate-900">Audit-Log</h2>
        <p className="text-sm text-slate-500">Alle Aktionen im Mail-System</p>
      </div>

      <div className="bg-white rounded-lg border border-slate-200 overflow-hidden">
        <table className="min-w-full divide-y divide-slate-200">
          <thead className="bg-slate-50">
            <tr>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Zeit</th>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Aktion</th>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Benutzer</th>
              <th className="px-6 py-3 text-left text-xs font-medium text-slate-500 uppercase">Details</th>
            </tr>
          </thead>
          <tbody className="divide-y divide-slate-200">
            {logs.map((log) => (
              <tr key={log.id} className="hover:bg-slate-50">
                <td className="px-6 py-4 text-sm text-slate-500">
                  {new Date(log.timestamp).toLocaleString('de-DE')}
                </td>
                <td className="px-6 py-4">
                  <span className="px-2 py-1 bg-blue-100 text-blue-700 text-xs rounded font-medium">
                    {actionLabels[log.action] || log.action}
                  </span>
                </td>
                <td className="px-6 py-4 text-sm text-slate-700">{log.user}</td>
                <td className="px-6 py-4 text-sm text-slate-500">{log.details}</td>
              </tr>
            ))}
          </tbody>
        </table>
      </div>
    </div>
  )
}
@@ -1,594 +0,0 @@
'use client'

/**
 * Voice Service Admin Page (migrated from website/admin/voice)
 *
 * Displays:
 * - Voice-First Architecture Overview
 * - Developer Guide Content
 * - Live Voice Demo (embedded from studio-v2)
 * - Task State Machine Documentation
 * - DSGVO Compliance Information
 */

import { useState } from 'react'
import Link from 'next/link'
import { PagePurpose } from '@/components/common/PagePurpose'

type TabType = 'overview' | 'demo' | 'tasks' | 'intents' | 'dsgvo' | 'api'

// Task State Machine data
const TASK_STATES = [
  { state: 'DRAFT', description: 'Task erstellt, noch nicht verarbeitet', color: 'bg-gray-100 text-gray-800', next: ['QUEUED', 'PAUSED'] },
  { state: 'QUEUED', description: 'In Warteschlange fuer Verarbeitung', color: 'bg-blue-100 text-blue-800', next: ['RUNNING', 'PAUSED'] },
  { state: 'RUNNING', description: 'Wird aktuell verarbeitet', color: 'bg-yellow-100 text-yellow-800', next: ['READY', 'PAUSED'] },
  { state: 'READY', description: 'Fertig, wartet auf User-Bestaetigung', color: 'bg-green-100 text-green-800', next: ['APPROVED', 'REJECTED', 'PAUSED'] },
  { state: 'APPROVED', description: 'Vom User bestaetigt', color: 'bg-emerald-100 text-emerald-800', next: ['COMPLETED'] },
  { state: 'REJECTED', description: 'Vom User abgelehnt', color: 'bg-red-100 text-red-800', next: ['DRAFT'] },
  { state: 'COMPLETED', description: 'Erfolgreich abgeschlossen', color: 'bg-teal-100 text-teal-800', next: [] },
  { state: 'EXPIRED', description: 'TTL ueberschritten', color: 'bg-orange-100 text-orange-800', next: [] },
  { state: 'PAUSED', description: 'Vom User pausiert', color: 'bg-purple-100 text-purple-800', next: ['DRAFT', 'QUEUED', 'RUNNING', 'READY'] },
]

// Intent Types (22 types organized by group)
const INTENT_GROUPS = [
  {
    group: 'Notizen',
    color: 'bg-blue-50 border-blue-200',
    intents: [
      { type: 'student_observation', example: 'Notiz zu Max: heute wiederholt gestoert', description: 'Schuelerbeobachtungen' },
      { type: 'reminder', example: 'Erinner mich morgen an Konferenz', description: 'Erinnerungen setzen' },
      { type: 'homework_check', example: '7b Mathe Hausaufgabe kontrollieren', description: 'Hausaufgaben pruefen' },
      { type: 'conference_topic', example: 'Thema Lehrerkonferenz: iPad-Regeln', description: 'Konferenzthemen' },
      { type: 'correction_thought', example: 'Aufgabe 3: haeufiger Fehler erklaeren', description: 'Korrekturgedanken' },
    ]
  },
  {
    group: 'Content-Generierung',
    color: 'bg-green-50 border-green-200',
    intents: [
      { type: 'worksheet_generate', example: 'Erstelle 3 Lueckentexte zu Vokabeln', description: 'Arbeitsblaetter erstellen' },
      { type: 'quiz_generate', example: '10-Minuten Vokabeltest mit Loesungen', description: 'Quiz/Tests erstellen' },
      { type: 'quick_activity', example: '10 Minuten Einstieg, 5 Aufgaben', description: 'Schnelle Aktivitaeten' },
      { type: 'differentiation', example: 'Zwei Schwierigkeitsstufen: Basis und Plus', description: 'Differenzierung' },
    ]
  },
  {
    group: 'Kommunikation',
    color: 'bg-yellow-50 border-yellow-200',
    intents: [
      { type: 'parent_letter', example: 'Neutraler Elternbrief wegen Stoerungen', description: 'Elternbriefe erstellen' },
      { type: 'class_message', example: 'Nachricht an 8a: Hausaufgaben bis Mittwoch', description: 'Klassennachrichten' },
    ]
  },
  {
    group: 'Canvas-Editor',
    color: 'bg-purple-50 border-purple-200',
    intents: [
      { type: 'canvas_edit', example: 'Ueberschriften groesser, Zeilenabstand kleiner', description: 'Formatierung aendern' },
      { type: 'canvas_layout', example: 'Alles auf eine Seite, Drucklayout A4', description: 'Layout anpassen' },
      { type: 'canvas_element', example: 'Kasten fuer Merke hinzufuegen', description: 'Elemente hinzufuegen' },
      { type: 'canvas_image', example: 'Bild 2 nach links, Pfeil auf Aufgabe 3', description: 'Bilder positionieren' },
    ]
  },
  {
    group: 'RAG & Korrektur',
    color: 'bg-pink-50 border-pink-200',
    intents: [
      { type: 'operator_checklist', example: 'Operatoren-Checkliste fuer diese Aufgabe', description: 'Operatoren abrufen' },
      { type: 'eh_passage', example: 'Erwartungshorizont-Passage zu diesem Thema', description: 'EH-Passagen suchen' },
      { type: 'feedback_suggestion', example: 'Kurze Feedbackformulierung vorschlagen', description: 'Feedback vorschlagen' },
    ]
  },
  {
    group: 'Follow-up (TaskOrchestrator)',
    color: 'bg-teal-50 border-teal-200',
    intents: [
      { type: 'task_summary', example: 'Fasse alle offenen Tasks zusammen', description: 'Task-Uebersicht' },
      { type: 'convert_note', example: 'Mach aus der Notiz von gestern einen Elternbrief', description: 'Notizen konvertieren' },
      { type: 'schedule_reminder', example: 'Erinner mich morgen an das Gespraech mit Max', description: 'Erinnerungen planen' },
    ]
  },
]

// DSGVO Data Categories
const DSGVO_CATEGORIES = [
  { category: 'Audio', processing: 'NUR transient im RAM, NIEMALS persistiert', storage: 'Keine', ttl: '-', icon: '🎤', risk: 'low' },
  { category: 'PII (Schuelernamen)', processing: 'NUR auf Lehrergeraet', storage: 'Client-side', ttl: '-', icon: '👤', risk: 'high' },
  { category: 'Pseudonyme', processing: 'Server erlaubt (student_ref, class_ref)', storage: 'Valkey Cache', ttl: '24h', icon: '🔢', risk: 'low' },
  { category: 'Transkripte', processing: 'NUR verschluesselt (AES-256-GCM)', storage: 'PostgreSQL', ttl: '7 Tage', icon: '📝', risk: 'medium' },
  { category: 'Task States', processing: 'TaskOrchestrator', storage: 'Valkey', ttl: '30 Tage', icon: '📋', risk: 'low' },
  { category: 'Audit Logs', processing: 'Nur truncated IDs, keine PII', storage: 'PostgreSQL', ttl: '90 Tage', icon: '📊', risk: 'low' },
]

// API Endpoints
const API_ENDPOINTS = [
  { method: 'POST', path: '/api/v1/sessions', description: 'Voice Session erstellen' },
  { method: 'GET', path: '/api/v1/sessions/{id}', description: 'Session Status abrufen' },
  { method: 'DELETE', path: '/api/v1/sessions/{id}', description: 'Session beenden' },
  { method: 'GET', path: '/api/v1/sessions/{id}/tasks', description: 'Pending Tasks abrufen' },
  { method: 'POST', path: '/api/v1/tasks', description: 'Task erstellen' },
  { method: 'GET', path: '/api/v1/tasks/{id}', description: 'Task Status abrufen' },
  { method: 'PUT', path: '/api/v1/tasks/{id}/transition', description: 'Task State aendern' },
  { method: 'DELETE', path: '/api/v1/tasks/{id}', description: 'Task loeschen' },
  { method: 'WS', path: '/ws/voice', description: 'Voice Streaming (WebSocket)' },
  { method: 'GET', path: '/health', description: 'Health Check' },
]

export default function VoiceMatrixPage() {
  const [activeTab, setActiveTab] = useState<TabType>('overview')
  const [demoLoaded, setDemoLoaded] = useState(false)

  const tabs = [
    { id: 'overview', name: 'Architektur', icon: '🏗️' },
    { id: 'demo', name: 'Live Demo', icon: '🎤' },
    { id: 'tasks', name: 'Task States', icon: '📋' },
    { id: 'intents', name: 'Intents (22)', icon: '🎯' },
    { id: 'dsgvo', name: 'DSGVO', icon: '🔒' },
    { id: 'api', name: 'API', icon: '🔌' },
  ]

  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title="Voice Service"
        purpose="Voice-First Interface mit PersonaPlex-7B & TaskOrchestrator. Konfigurieren und testen Sie den Voice-Service fuer Lehrer-Interaktionen per Sprache."
        audience={['Entwickler', 'Admins']}
        architecture={{
          services: ['voice-service (Python, Port 8091)', 'studio-v2 (Next.js)', 'valkey (Cache)'],
          databases: ['PostgreSQL', 'Valkey Cache'],
        }}
        relatedPages={[
          { name: 'Matrix & Jitsi', href: '/communication/matrix', description: 'Kommunikation Monitoring' },
          { name: 'LLM Vergleich', href: '/ai/llm-compare', description: 'KI-Provider vergleichen' },
          { name: 'GPU Infrastruktur', href: '/infrastructure/gpu', description: 'GPU fuer Voice-Service' },
        ]}
        collapsible={true}
        defaultCollapsed={false}
      />

      {/* Quick Links */}
      <div className="mb-6 flex flex-wrap gap-3">
        <a
          href="https://macmini:3001/voice-test"
          target="_blank"
          rel="noopener noreferrer"
          className="flex items-center gap-2 px-4 py-2 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M19 11a7 7 0 01-7 7m0 0a7 7 0 01-7-7m7 7v4m0 0H8m4 0h4m-4-8a3 3 0 01-3-3V5a3 3 0 116 0v6a3 3 0 01-3 3z" />
          </svg>
          Voice Test (Studio)
        </a>
        <a
          href="https://macmini:8091/health"
          target="_blank"
          rel="noopener noreferrer"
          className="flex items-center gap-2 px-4 py-2 bg-green-100 text-green-700 rounded-lg hover:bg-green-200 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
          </svg>
          Health Check
        </a>
        <Link
          href="/development/docs"
          className="flex items-center gap-2 px-4 py-2 bg-slate-100 text-slate-700 rounded-lg hover:bg-slate-200 transition-colors"
        >
          <svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
            <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
          </svg>
          Developer Docs
        </Link>
      </div>

      {/* Stats Overview */}
      <div className="grid grid-cols-2 md:grid-cols-6 gap-4 mb-6">
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-teal-600">8091</div>
          <div className="text-sm text-slate-500">Port</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-blue-600">22</div>
          <div className="text-sm text-slate-500">Task Types</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-purple-600">9</div>
          <div className="text-sm text-slate-500">Task States</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-green-600">24kHz</div>
          <div className="text-sm text-slate-500">Audio Rate</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-orange-600">80ms</div>
          <div className="text-sm text-slate-500">Frame Size</div>
        </div>
        <div className="bg-white rounded-lg shadow p-4">
          <div className="text-3xl font-bold text-red-600">0</div>
          <div className="text-sm text-slate-500">Audio Persist</div>
        </div>
      </div>

      {/* Tabs */}
      <div className="bg-white rounded-lg shadow mb-6">
        <div className="border-b border-slate-200 px-4">
          <div className="flex gap-1 overflow-x-auto">
            {tabs.map((tab) => (
              <button
                key={tab.id}
                onClick={() => setActiveTab(tab.id as TabType)}
                className={`px-4 py-3 text-sm font-medium whitespace-nowrap transition-colors border-b-2 ${
                  activeTab === tab.id
                    ? 'border-teal-600 text-teal-600'
                    : 'border-transparent text-slate-500 hover:text-slate-700'
                }`}
              >
                <span className="mr-2">{tab.icon}</span>
                {tab.name}
              </button>
            ))}
          </div>
        </div>

        <div className="p-6">
          {/* Overview Tab */}
          {activeTab === 'overview' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Voice-First Architektur</h3>

              {/* Architecture Diagram */}
              <div className="bg-slate-50 rounded-lg p-6 font-mono text-sm overflow-x-auto">
                <pre className="text-slate-700">{`
┌──────────────────────────────────────────────────────────────────┐
│                     LEHRERGERAET (PWA / App)                     │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ VoiceCapture.tsx │ voice-encryption.ts │ voice-api.ts      │  │
│  │ Mikrofon         │ AES-256-GCM         │ WebSocket Client  │  │
│  └────────────────────────────────────────────────────────────┘  │
└─────────────────────────────┬────────────────────────────────────┘
                              │ WebSocket (wss://)
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    VOICE SERVICE (Port 8091)                     │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ main.py │ streaming.py │ sessions.py │ tasks.py            │  │
│  └────────────────────────────────────────────────────────────┘  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ task_orchestrator.py │ intent_router.py │ encryption       │  │
│  └────────────────────────────────────────────────────────────┘  │
└─────────────────────────────┬────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│ PersonaPlex-7B  │  │ Ollama Fallback │  │ Valkey Cache    │
│ (A100 GPU)      │  │ (Mac Mini)      │  │ (Sessions)      │
└─────────────────┘  └─────────────────┘  └─────────────────┘
`}</pre>
              </div>

              {/* Technology Stack */}
              <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
                <div className="bg-blue-50 border border-blue-200 rounded-lg p-4">
                  <h4 className="font-semibold text-blue-800 mb-2">Voice Model (Produktion)</h4>
                  <p className="text-sm text-blue-700">PersonaPlex-7B (NVIDIA)</p>
                  <p className="text-xs text-blue-600 mt-1">Full-Duplex Speech-to-Speech</p>
                  <p className="text-xs text-blue-500">Lizenz: MIT + NVIDIA Open Model</p>
                </div>
                <div className="bg-green-50 border border-green-200 rounded-lg p-4">
                  <h4 className="font-semibold text-green-800 mb-2">Agent Orchestration</h4>
                  <p className="text-sm text-green-700">TaskOrchestrator</p>
                  <p className="text-xs text-green-600 mt-1">Task State Machine</p>
                  <p className="text-xs text-green-500">Lizenz: Proprietary</p>
                </div>
                <div className="bg-purple-50 border border-purple-200 rounded-lg p-4">
                  <h4 className="font-semibold text-purple-800 mb-2">Audio Codec</h4>
                  <p className="text-sm text-purple-700">Mimi (24kHz, 80ms)</p>
                  <p className="text-xs text-purple-600 mt-1">Low-Latency Streaming</p>
                  <p className="text-xs text-purple-500">Lizenz: MIT</p>
                </div>
              </div>

              {/* Key Files */}
              <div>
                <h4 className="font-semibold text-slate-800 mb-3">Wichtige Dateien</h4>
                <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                  <table className="min-w-full divide-y divide-slate-200">
                    <thead className="bg-slate-50">
                      <tr>
                        <th className="px-4 py-2 text-left text-xs font-medium text-slate-500 uppercase">Datei</th>
                        <th className="px-4 py-2 text-left text-xs font-medium text-slate-500 uppercase">Beschreibung</th>
                      </tr>
                    </thead>
                    <tbody className="divide-y divide-slate-200">
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/main.py</td><td className="px-4 py-2 text-sm text-slate-600">FastAPI Entry, WebSocket Handler</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/task_orchestrator.py</td><td className="px-4 py-2 text-sm text-slate-600">Task State Machine</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/intent_router.py</td><td className="px-4 py-2 text-sm text-slate-600">Intent Detection (22 Types)</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">voice-service/services/encryption_service.py</td><td className="px-4 py-2 text-sm text-slate-600">Namespace Key Management</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">studio-v2/components/voice/VoiceCapture.tsx</td><td className="px-4 py-2 text-sm text-slate-600">Frontend Mikrofon + Crypto</td></tr>
                      <tr><td className="px-4 py-2 font-mono text-sm">studio-v2/lib/voice/voice-encryption.ts</td><td className="px-4 py-2 text-sm text-slate-600">AES-256-GCM Client-side</td></tr>
                    </tbody>
                  </table>
                </div>
              </div>
            </div>
          )}

{/* Demo Tab */}
|
||||
{activeTab === 'demo' && (
|
||||
<div className="space-y-4">
|
||||
<div className="flex items-center justify-between">
|
||||
<h3 className="text-lg font-semibold text-slate-900">Live Voice Demo</h3>
|
||||
<a
|
||||
href="https://macmini:3001/voice-test"
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
className="text-sm text-teal-600 hover:text-teal-700 flex items-center gap-1"
|
||||
>
|
||||
In neuem Tab oeffnen
|
||||
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M10 6H6a2 2 0 00-2 2v10a2 2 0 002 2h10a2 2 0 002-2v-4M14 4h6m0 0v6m0-6L10 14" />
|
||||
</svg>
|
||||
</a>
|
||||
</div>
|
||||
|
||||
<div className="bg-slate-100 rounded-lg p-4 text-sm text-slate-600 mb-4">
|
||||
<p><strong>Hinweis:</strong> Die Demo erfordert, dass der Voice Service (Port 8091) und das Studio-v2 Frontend (Port 3001) laufen.</p>
|
||||
<code className="block mt-2 bg-slate-200 p-2 rounded">docker compose up -d voice-service && cd studio-v2 && npm run dev</code>
|
||||
</div>
|
||||
|
||||
{/* Embedded Demo */}
|
||||
<div className="relative bg-slate-900 rounded-lg overflow-hidden" style={{ height: '600px' }}>
|
||||
{!demoLoaded && (
|
||||
<div className="absolute inset-0 flex items-center justify-center">
|
||||
<button
|
||||
onClick={() => setDemoLoaded(true)}
|
||||
className="px-6 py-3 bg-teal-600 text-white rounded-lg hover:bg-teal-700 transition-colors flex items-center gap-2"
|
||||
>
|
||||
<svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
|
||||
</svg>
|
||||
Voice Demo laden
|
||||
</button>
|
||||
</div>
|
||||
)}
|
||||
{demoLoaded && (
|
||||
<iframe
|
||||
src="https://macmini:3001/voice-test?embed=true"
|
||||
className="w-full h-full border-0"
|
||||
title="Voice Demo"
|
||||
allow="microphone"
|
||||
/>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
)}

          {/* Task States Tab */}
          {activeTab === 'tasks' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Task State Machine (TaskOrchestrator)</h3>

              {/* State Diagram */}
              <div className="bg-slate-50 rounded-lg p-6 font-mono text-sm overflow-x-auto">
                <pre className="text-slate-700">{`
  DRAFT → QUEUED → RUNNING → READY
                               │
                   ┌───────────┴───────────┐
                   │                       │
               APPROVED                REJECTED
                   │                       │
              COMPLETED            DRAFT (revision)

  Any State → EXPIRED (TTL)
  Any State → PAUSED (User Interrupt)
`}</pre>
              </div>

              {/* States Table */}
              <div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
                {TASK_STATES.map((state) => (
                  <div key={state.state} className={`${state.color} rounded-lg p-4`}>
                    <div className="font-semibold text-lg">{state.state}</div>
                    <p className="text-sm mt-1">{state.description}</p>
                    {state.next.length > 0 && (
                      <div className="mt-2 text-xs">
                        <span className="opacity-75">Naechste:</span>{' '}
                        {state.next.join(', ')}
                      </div>
                    )}
                  </div>
                ))}
              </div>
            </div>
          )}

          {/* Intents Tab */}
          {activeTab === 'intents' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Intent Types (22 unterstuetzte Typen)</h3>

              {INTENT_GROUPS.map((group) => (
                <div key={group.group} className={`${group.color} border rounded-lg p-4`}>
                  <h4 className="font-semibold text-slate-800 mb-3">{group.group}</h4>
                  <div className="space-y-2">
                    {group.intents.map((intent) => (
                      <div key={intent.type} className="bg-white rounded-lg p-3 shadow-sm">
                        <div className="flex items-start justify-between">
                          <div>
                            <code className="text-sm font-mono text-teal-700 bg-teal-50 px-2 py-0.5 rounded">
                              {intent.type}
                            </code>
                            <p className="text-sm text-slate-600 mt-1">{intent.description}</p>
                          </div>
                        </div>
                        <div className="mt-2 text-xs text-slate-500 italic">
                          Beispiel: "{intent.example}"
                        </div>
                      </div>
                    ))}
                  </div>
                </div>
              ))}
            </div>
          )}

          {/* DSGVO Tab */}
          {activeTab === 'dsgvo' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">DSGVO-Compliance</h3>

              {/* Key Principles */}
              <div className="bg-green-50 border border-green-200 rounded-lg p-4">
                <h4 className="font-semibold text-green-800 mb-2">Kernprinzipien</h4>
                <ul className="list-disc list-inside text-sm text-green-700 space-y-1">
                  <li><strong>Audio NIEMALS persistiert</strong> - Nur transient im RAM</li>
                  <li><strong>Namespace-Verschluesselung</strong> - Key nur auf Lehrergeraet</li>
                  <li><strong>Keine Klartext-PII serverseitig</strong> - Nur verschluesselt oder pseudonymisiert</li>
                  <li><strong>TTL-basierte Auto-Loeschung</strong> - 7/30/90 Tage je nach Kategorie</li>
                </ul>
              </div>

              {/* Data Categories Table */}
              <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                <table className="min-w-full divide-y divide-slate-200">
                  <thead className="bg-slate-50">
                    <tr>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Kategorie</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Verarbeitung</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Speicherort</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">TTL</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Risiko</th>
                    </tr>
                  </thead>
                  <tbody className="divide-y divide-slate-200">
                    {DSGVO_CATEGORIES.map((cat) => (
                      <tr key={cat.category}>
                        <td className="px-4 py-3">
                          <span className="mr-2">{cat.icon}</span>
                          <span className="font-medium">{cat.category}</span>
                        </td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.processing}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.storage}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{cat.ttl}</td>
                        <td className="px-4 py-3">
                          <span className={`px-2 py-1 rounded text-xs font-medium ${
                            cat.risk === 'low' ? 'bg-green-100 text-green-700' :
                            cat.risk === 'medium' ? 'bg-yellow-100 text-yellow-700' :
                            'bg-red-100 text-red-700'
                          }`}>
                            {cat.risk.toUpperCase()}
                          </span>
                        </td>
                      </tr>
                    ))}
                  </tbody>
                </table>
              </div>

              {/* Audit Log Info */}
              <div className="bg-slate-50 border border-slate-200 rounded-lg p-4">
                <h4 className="font-semibold text-slate-800 mb-2">Audit Logs (ohne PII)</h4>
                <div className="grid grid-cols-2 gap-4 text-sm">
                  <div>
                    <span className="text-green-600 font-medium">Erlaubt:</span>
                    <ul className="list-disc list-inside text-slate-600 mt-1">
                      <li>ref_id (truncated)</li>
                      <li>content_type</li>
                      <li>size_bytes</li>
                      <li>ttl_hours</li>
                    </ul>
                  </div>
                  <div>
                    <span className="text-red-600 font-medium">Verboten:</span>
                    <ul className="list-disc list-inside text-slate-600 mt-1">
                      <li>user_name</li>
                      <li>content / transcript</li>
                      <li>email</li>
                      <li>student_name</li>
                    </ul>
                  </div>
                </div>
              </div>
            </div>
          )}

          {/* API Tab */}
          {activeTab === 'api' && (
            <div className="space-y-6">
              <h3 className="text-lg font-semibold text-slate-900">Voice Service API (Port 8091)</h3>

              {/* REST Endpoints */}
              <div className="bg-white border border-slate-200 rounded-lg overflow-hidden">
                <table className="min-w-full divide-y divide-slate-200">
                  <thead className="bg-slate-50">
                    <tr>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Methode</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Endpoint</th>
                      <th className="px-4 py-3 text-left text-xs font-medium text-slate-500 uppercase">Beschreibung</th>
                    </tr>
                  </thead>
                  <tbody className="divide-y divide-slate-200">
                    {API_ENDPOINTS.map((ep, idx) => (
                      <tr key={idx}>
                        <td className="px-4 py-3">
                          <span className={`px-2 py-1 rounded text-xs font-medium ${
                            ep.method === 'GET' ? 'bg-green-100 text-green-700' :
                            ep.method === 'POST' ? 'bg-blue-100 text-blue-700' :
                            ep.method === 'PUT' ? 'bg-yellow-100 text-yellow-700' :
                            ep.method === 'DELETE' ? 'bg-red-100 text-red-700' :
                            'bg-purple-100 text-purple-700'
                          }`}>
                            {ep.method}
                          </span>
                        </td>
                        <td className="px-4 py-3 font-mono text-sm">{ep.path}</td>
                        <td className="px-4 py-3 text-sm text-slate-600">{ep.description}</td>
                      </tr>
                    ))}
                  </tbody>
                </table>
              </div>

              {/* WebSocket Protocol */}
              <div className="bg-slate-50 rounded-lg p-4">
                <h4 className="font-semibold text-slate-800 mb-3">WebSocket Protocol</h4>
                <div className="grid grid-cols-1 md:grid-cols-2 gap-4 text-sm">
                  <div className="bg-white rounded-lg p-3 border border-slate-200">
                    <div className="font-medium text-slate-700 mb-2">Client → Server</div>
                    <ul className="list-disc list-inside text-slate-600 space-y-1">
                      <li><code className="bg-slate-100 px-1 rounded">Binary</code>: Int16 PCM Audio (24kHz, 80ms)</li>
                      <li><code className="bg-slate-100 px-1 rounded">JSON</code>: {`{type: "config|end_turn|interrupt"}`}</li>
                    </ul>
                  </div>
                  <div className="bg-white rounded-lg p-3 border border-slate-200">
                    <div className="font-medium text-slate-700 mb-2">Server → Client</div>
                    <ul className="list-disc list-inside text-slate-600 space-y-1">
                      <li><code className="bg-slate-100 px-1 rounded">Binary</code>: Audio Response (base64)</li>
                      <li><code className="bg-slate-100 px-1 rounded">JSON</code>: {`{type: "transcript|intent|status|error"}`}</li>
                    </ul>
                  </div>
                </div>
              </div>

              {/* Example curl commands */}
              <div className="bg-slate-900 rounded-lg p-4 text-sm">
                <h4 className="font-semibold text-slate-300 mb-3">Beispiel: Session erstellen</h4>
                <pre className="text-green-400 overflow-x-auto">{`curl -X POST https://macmini:8091/api/v1/sessions \\
  -H "Content-Type: application/json" \\
  -d '{
    "namespace_id": "ns-12345678abcdef12345678abcdef12",
    "key_hash": "sha256:dGVzdGtleWhhc2h0ZXN0a2V5aGFzaHRlc3Q=",
    "device_type": "pwa"
  }'`}</pre>
              </div>
            </div>
          )}
        </div>
      </div>
    </div>
  )
}
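The Client → Server framing summarized in the API tab above (binary Int16 PCM audio at 24 kHz in 80 ms chunks, plus JSON control messages) can be sketched as follows. This is a minimal illustration; the helper names are assumptions, not identifiers from VoiceCapture.tsx or voice-service/main.py:

```typescript
// Sketch of the client-side WebSocket framing described in the API tab.
// Binary frames carry Int16 PCM audio (24 kHz, 80 ms per chunk); JSON text
// frames carry control messages. Helper names are illustrative only.

type ControlType = 'config' | 'end_turn' | 'interrupt'

// 24 kHz * 0.08 s = 1920 samples per 80 ms chunk
const SAMPLES_PER_CHUNK = (24000 * 80) / 1000

// Convert one chunk of Float32 samples in [-1, 1] to an Int16 PCM frame,
// ready to send as a binary WebSocket message (ArrayBufferView).
function encodeAudioFrame(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]))  // clamp before scaling
    pcm[i] = Math.round(s * 32767)
  }
  return pcm
}

// Serialize a control message as a JSON text frame.
function encodeControlFrame(type: ControlType): string {
  return JSON.stringify({ type })
}
```

Sending would then be `ws.send(encodeAudioFrame(chunk))` per 80 ms chunk, followed by `ws.send(encodeControlFrame('end_turn'))` when the user stops speaking.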
@@ -1,635 +0,0 @@
'use client'

/**
 * Video & Chat Admin Page
 *
 * Matrix & Jitsi Monitoring Dashboard
 * Provides system statistics, active calls, user metrics, and service health
 * Migrated from website/app/admin/communication
 */

import { useEffect, useState, useCallback } from 'react'
import Link from 'next/link'
import { PagePurpose } from '@/components/common/PagePurpose'
import { getModuleByHref } from '@/lib/navigation'

interface MatrixStats {
  total_users: number
  active_users: number
  total_rooms: number
  active_rooms: number
  messages_today: number
  messages_this_week: number
  status: 'online' | 'offline' | 'degraded'
}

interface JitsiStats {
  active_meetings: number
  total_participants: number
  meetings_today: number
  average_duration_minutes: number
  peak_concurrent_users: number
  total_minutes_today: number
  status: 'online' | 'offline' | 'degraded'
}

interface TrafficStats {
  matrix: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    messages_per_minute: number
    media_uploads_today: number
    media_size_mb: number
  }
  jitsi: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    video_streams_active: number
    audio_streams_active: number
    estimated_hourly_gb: number
  }
  total: {
    bandwidth_in_mb: number
    bandwidth_out_mb: number
    estimated_monthly_gb: number
  }
}

interface CommunicationStats {
  matrix: MatrixStats
  jitsi: JitsiStats
  traffic?: TrafficStats
  last_updated: string
}

interface ActiveMeeting {
  room_name: string
  display_name: string
  participants: number
  started_at: string
  duration_minutes: number
}

interface RecentRoom {
  room_id: string
  name: string
  member_count: number
  last_activity: string
  room_type: 'class' | 'parent' | 'staff' | 'general'
}

export default function VideoChatPage() {
  const [stats, setStats] = useState<CommunicationStats | null>(null)
  const [activeMeetings, setActiveMeetings] = useState<ActiveMeeting[]>([])
  const [recentRooms, setRecentRooms] = useState<RecentRoom[]>([])
  const [loading, setLoading] = useState(true)
  const [error, setError] = useState<string | null>(null)

  const moduleInfo = getModuleByHref('/communication/video-chat')

  // Use local API proxy
  const fetchStats = useCallback(async () => {
    try {
      const response = await fetch('/api/admin/communication/stats')
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}`)
      }
      const data = await response.json()
      setStats(data)
      setActiveMeetings(data.active_meetings || [])
      setRecentRooms(data.recent_rooms || [])
      setError(null)
    } catch (err) {
      setError(err instanceof Error ? err.message : 'Verbindungsfehler')
      // Set mock data for display purposes when API unavailable
      setStats({
        matrix: {
          total_users: 0,
          active_users: 0,
          total_rooms: 0,
          active_rooms: 0,
          messages_today: 0,
          messages_this_week: 0,
          status: 'offline'
        },
        jitsi: {
          active_meetings: 0,
          total_participants: 0,
          meetings_today: 0,
          average_duration_minutes: 0,
          peak_concurrent_users: 0,
          total_minutes_today: 0,
          status: 'offline'
        },
        last_updated: new Date().toISOString()
      })
    } finally {
      setLoading(false)
    }
  }, [])

  useEffect(() => {
    fetchStats()
  }, [fetchStats])

  // Auto-refresh every 15 seconds
  useEffect(() => {
    const interval = setInterval(fetchStats, 15000)
    return () => clearInterval(interval)
  }, [fetchStats])

  const getStatusBadge = (status: string) => {
    const baseClasses = 'px-3 py-1 rounded-full text-xs font-semibold uppercase'
    switch (status) {
      case 'online':
        return `${baseClasses} bg-green-100 text-green-800`
      case 'degraded':
        return `${baseClasses} bg-yellow-100 text-yellow-800`
      case 'offline':
        return `${baseClasses} bg-red-100 text-red-800`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }

  const getRoomTypeBadge = (type: string) => {
    const baseClasses = 'px-2 py-0.5 rounded text-xs font-medium'
    switch (type) {
      case 'class':
        return `${baseClasses} bg-blue-100 text-blue-700`
      case 'parent':
        return `${baseClasses} bg-purple-100 text-purple-700`
      case 'staff':
        return `${baseClasses} bg-orange-100 text-orange-700`
      default:
        return `${baseClasses} bg-slate-100 text-slate-600`
    }
  }

  const formatDuration = (minutes: number) => {
    if (minutes < 60) return `${Math.round(minutes)} Min.`
    const hours = Math.floor(minutes / 60)
    const mins = Math.round(minutes % 60)
    return `${hours}h ${mins}m`
  }

  const formatTimeAgo = (dateStr: string) => {
    const date = new Date(dateStr)
    const now = new Date()
    const diffMs = now.getTime() - date.getTime()
    const diffMins = Math.floor(diffMs / 60000)

    if (diffMins < 1) return 'gerade eben'
    if (diffMins < 60) return `vor ${diffMins} Min.`
    if (diffMins < 1440) return `vor ${Math.floor(diffMins / 60)} Std.`
    return `vor ${Math.floor(diffMins / 1440)} Tagen`
  }

  // Traffic estimation helpers for SysEleven planning
  const calculateEstimatedTraffic = (direction: 'in' | 'out'): number => {
    const messages = stats?.matrix?.messages_today || 0
    const callMinutes = stats?.jitsi?.total_minutes_today || 0
    const participants = stats?.jitsi?.total_participants || 0

    const messageTrafficMB = messages * 0.002
    const videoTrafficMB = callMinutes * participants * 0.011

    if (direction === 'in') {
      return messageTrafficMB * 0.3 + videoTrafficMB * 0.4
    }
    return messageTrafficMB * 0.7 + videoTrafficMB * 0.6
  }

  const calculateHourlyEstimate = (): number => {
    const activeParticipants = stats?.jitsi?.total_participants || 0
    return activeParticipants * 0.675
  }

  const calculateMonthlyEstimate = (): number => {
    const dailyCallMinutes = stats?.jitsi?.total_minutes_today || 0
    const avgParticipants = stats?.jitsi?.peak_concurrent_users || 1
    const monthlyMinutes = dailyCallMinutes * 22
    return (monthlyMinutes * avgParticipants * 11) / 1024
  }
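  // Worked example of the monthly estimate (assumed inputs, for illustration only):
  // with total_minutes_today = 120 and peak_concurrent_users = 4,
  // monthlyMinutes = 120 * 22 working days = 2640, and
  // (2640 * 4 * 11 MB/min) / 1024 ≈ 113.4 GB per month.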

  const getResourceRecommendation = (): string => {
    const peakUsers = stats?.jitsi?.peak_concurrent_users || 0
    const monthlyGB = calculateMonthlyEstimate()

    if (monthlyGB < 10 || peakUsers < 5) {
      return 'Starter (1 vCPU, 2GB RAM, 100GB Traffic)'
    } else if (monthlyGB < 50 || peakUsers < 20) {
      return 'Standard (2 vCPU, 4GB RAM, 500GB Traffic)'
    } else if (monthlyGB < 200 || peakUsers < 50) {
      return 'Professional (4 vCPU, 8GB RAM, 2TB Traffic)'
    } else {
      return 'Enterprise (8+ vCPU, 16GB+ RAM, Unlimited Traffic)'
    }
  }

  return (
    <div>
      {/* Page Purpose */}
      <PagePurpose
        title={moduleInfo?.module.name || 'Video & Chat'}
        purpose={moduleInfo?.module.purpose || 'Matrix & Jitsi Monitoring Dashboard'}
        audience={moduleInfo?.module.audience || ['Admins', 'DevOps']}
        architecture={{
          services: ['synapse (Matrix)', 'jitsi-meet', 'prosody', 'jvb'],
          databases: ['PostgreSQL', 'synapse-db'],
        }}
        collapsible={true}
        defaultCollapsed={true}
      />

      {/* Quick Actions */}
      <div className="flex gap-3 mb-6">
        <Link
          href="/communication/video-chat/wizard"
          className="px-4 py-2 bg-green-600 text-white rounded-lg hover:bg-green-700 transition-colors text-sm font-medium"
        >
          Test Wizard starten
        </Link>
        <button
          onClick={fetchStats}
          disabled={loading}
          className="px-4 py-2 border border-slate-300 rounded-lg hover:bg-slate-50 disabled:opacity-50 text-sm"
        >
          {loading ? 'Lade...' : 'Aktualisieren'}
        </button>
      </div>

      {/* Service Status Overview */}
      <div className="grid grid-cols-1 md:grid-cols-2 gap-6 mb-6">
        {/* Matrix Status Card */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <div className="flex items-center justify-between mb-4">
            <div className="flex items-center gap-3">
              <div className="w-10 h-10 bg-purple-100 rounded-lg flex items-center justify-center">
                <svg className="w-6 h-6 text-purple-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 12h.01M12 12h.01M16 12h.01M21 12c0 4.418-4.03 8-9 8a9.863 9.863 0 01-4.255-.949L3 20l1.395-3.72C3.512 15.042 3 13.574 3 12c0-4.418 4.03-8 9-8s9 3.582 9 8z" />
                </svg>
              </div>
              <div>
                <h3 className="font-semibold text-slate-900">Matrix (Synapse)</h3>
                <p className="text-sm text-slate-500">E2EE Messaging</p>
              </div>
            </div>
            <span className={getStatusBadge(stats?.matrix.status || 'offline')}>
              {stats?.matrix.status || 'offline'}
            </span>
          </div>
          <div className="grid grid-cols-3 gap-4">
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.total_users || 0}</div>
              <div className="text-xs text-slate-500">Benutzer</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.active_users || 0}</div>
              <div className="text-xs text-slate-500">Aktiv</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.matrix.total_rooms || 0}</div>
              <div className="text-xs text-slate-500">Raeume</div>
            </div>
          </div>
          <div className="mt-4 pt-4 border-t border-slate-100">
            <div className="flex justify-between text-sm">
              <span className="text-slate-500">Nachrichten heute</span>
              <span className="font-medium">{stats?.matrix.messages_today || 0}</span>
            </div>
            <div className="flex justify-between text-sm mt-1">
              <span className="text-slate-500">Diese Woche</span>
              <span className="font-medium">{stats?.matrix.messages_this_week || 0}</span>
            </div>
          </div>
        </div>

        {/* Jitsi Status Card */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <div className="flex items-center justify-between mb-4">
            <div className="flex items-center gap-3">
              <div className="w-10 h-10 bg-blue-100 rounded-lg flex items-center justify-center">
                <svg className="w-6 h-6 text-blue-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                  <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 10l4.553-2.276A1 1 0 0121 8.618v6.764a1 1 0 01-1.447.894L15 14M5 18h8a2 2 0 002-2V8a2 2 0 00-2-2H5a2 2 0 00-2 2v8a2 2 0 002 2z" />
                </svg>
              </div>
              <div>
                <h3 className="font-semibold text-slate-900">Jitsi Meet</h3>
                <p className="text-sm text-slate-500">Videokonferenzen</p>
              </div>
            </div>
            <span className={getStatusBadge(stats?.jitsi.status || 'offline')}>
              {stats?.jitsi.status || 'offline'}
            </span>
          </div>
          <div className="grid grid-cols-3 gap-4">
            <div>
              <div className="text-2xl font-bold text-green-600">{stats?.jitsi.active_meetings || 0}</div>
              <div className="text-xs text-slate-500">Live Calls</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.jitsi.total_participants || 0}</div>
              <div className="text-xs text-slate-500">Teilnehmer</div>
            </div>
            <div>
              <div className="text-2xl font-bold text-slate-900">{stats?.jitsi.meetings_today || 0}</div>
              <div className="text-xs text-slate-500">Calls heute</div>
            </div>
          </div>
          <div className="mt-4 pt-4 border-t border-slate-100">
            <div className="flex justify-between text-sm">
              <span className="text-slate-500">Durchschnittliche Dauer</span>
              <span className="font-medium">{formatDuration(stats?.jitsi.average_duration_minutes || 0)}</span>
            </div>
            <div className="flex justify-between text-sm mt-1">
              <span className="text-slate-500">Peak gleichzeitig</span>
              <span className="font-medium">{stats?.jitsi.peak_concurrent_users || 0} Nutzer</span>
            </div>
          </div>
        </div>
      </div>

      {/* Traffic & Bandwidth Statistics */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="flex items-center justify-between mb-4">
          <div className="flex items-center gap-3">
            <div className="w-10 h-10 bg-emerald-100 rounded-lg flex items-center justify-center">
              <svg className="w-6 h-6 text-emerald-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 7h8m0 0v8m0-8l-8 8-4-4-6 6" />
              </svg>
            </div>
            <div>
              <h3 className="font-semibold text-slate-900">Traffic & Bandbreite</h3>
              <p className="text-sm text-slate-500">SysEleven Ressourcenplanung</p>
            </div>
          </div>
          <span className="px-3 py-1 rounded-full text-xs font-semibold uppercase bg-emerald-100 text-emerald-800">
            Live
          </span>
        </div>

        <div className="grid grid-cols-2 md:grid-cols-4 gap-4 mb-4">
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Eingehend (heute)</div>
            <div className="text-2xl font-bold text-slate-900">
              {stats?.traffic?.total?.bandwidth_in_mb?.toFixed(1) || calculateEstimatedTraffic('in').toFixed(1)} MB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Ausgehend (heute)</div>
            <div className="text-2xl font-bold text-slate-900">
              {stats?.traffic?.total?.bandwidth_out_mb?.toFixed(1) || calculateEstimatedTraffic('out').toFixed(1)} MB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Geschaetzt/Stunde</div>
            <div className="text-2xl font-bold text-blue-600">
              {stats?.traffic?.jitsi?.estimated_hourly_gb?.toFixed(2) || calculateHourlyEstimate().toFixed(2)} GB
            </div>
          </div>
          <div className="bg-slate-50 rounded-lg p-4">
            <div className="text-xs text-slate-500 mb-1">Geschaetzt/Monat</div>
            <div className="text-2xl font-bold text-emerald-600">
              {stats?.traffic?.total?.estimated_monthly_gb?.toFixed(1) || calculateMonthlyEstimate().toFixed(1)} GB
            </div>
          </div>
        </div>

        <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
          {/* Matrix Traffic */}
          <div className="border border-slate-200 rounded-lg p-4">
            <div className="flex items-center gap-2 mb-3">
              <div className="w-3 h-3 bg-purple-500 rounded-full"></div>
              <span className="text-sm font-medium text-slate-700">Matrix Messaging</span>
            </div>
            <div className="space-y-2 text-sm">
              <div className="flex justify-between">
                <span className="text-slate-500">Nachrichten/Min</span>
                <span className="font-medium">{stats?.traffic?.matrix?.messages_per_minute || Math.round((stats?.matrix?.messages_today || 0) / (new Date().getHours() || 1) / 60)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Media Uploads heute</span>
                <span className="font-medium">{stats?.traffic?.matrix?.media_uploads_today || 0}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Media Groesse</span>
                <span className="font-medium">{stats?.traffic?.matrix?.media_size_mb?.toFixed(1) || '0.0'} MB</span>
              </div>
            </div>
          </div>

          {/* Jitsi Traffic */}
          <div className="border border-slate-200 rounded-lg p-4">
            <div className="flex items-center gap-2 mb-3">
              <div className="w-3 h-3 bg-blue-500 rounded-full"></div>
              <span className="text-sm font-medium text-slate-700">Jitsi Video</span>
            </div>
            <div className="space-y-2 text-sm">
              <div className="flex justify-between">
                <span className="text-slate-500">Video Streams aktiv</span>
                <span className="font-medium">{stats?.traffic?.jitsi?.video_streams_active || (stats?.jitsi?.total_participants || 0)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Audio Streams aktiv</span>
                <span className="font-medium">{stats?.traffic?.jitsi?.audio_streams_active || (stats?.jitsi?.total_participants || 0)}</span>
              </div>
              <div className="flex justify-between">
                <span className="text-slate-500">Bitrate geschaetzt</span>
                <span className="font-medium">{((stats?.jitsi?.total_participants || 0) * 1.5).toFixed(1)} Mbps</span>
              </div>
            </div>
          </div>
        </div>

        {/* SysEleven Recommendation */}
        <div className="mt-4 p-4 bg-emerald-50 border border-emerald-200 rounded-lg">
          <h4 className="text-sm font-semibold text-emerald-800 mb-2">SysEleven Empfehlung</h4>
          <div className="text-sm text-emerald-700">
            <p>Basierend auf aktuellem Traffic: <strong>{getResourceRecommendation()}</strong></p>
            <p className="mt-1 text-xs text-emerald-600">
              Peak Teilnehmer: {stats?.jitsi?.peak_concurrent_users || 0} |
              Durchschnittliche Call-Dauer: {stats?.jitsi?.average_duration_minutes?.toFixed(0) || 0} Min. |
              Calls heute: {stats?.jitsi?.meetings_today || 0}
            </p>
          </div>
        </div>
      </div>

      {/* Active Meetings */}
      <div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
        <div className="flex items-center justify-between mb-4">
          <h3 className="font-semibold text-slate-900">Aktive Meetings</h3>
        </div>

        {activeMeetings.length === 0 ? (
          <div className="text-center py-8 text-slate-500">
            <svg className="w-12 h-12 mx-auto mb-3 text-slate-300" fill="none" stroke="currentColor" viewBox="0 0 24 24">
              <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M15 10l4.553-2.276A1 1 0 0121 8.618v6.764a1 1 0 01-1.447.894L15 14M5 18h8a2 2 0 002-2V8a2 2 0 00-2-2H5a2 2 0 00-2 2v8a2 2 0 002 2z" />
            </svg>
            <p>Keine aktiven Meetings</p>
          </div>
        ) : (
          <div className="overflow-x-auto">
            <table className="w-full">
              <thead>
                <tr className="text-left text-xs text-slate-500 uppercase border-b border-slate-200">
                  <th className="pb-3 pr-4">Meeting</th>
                  <th className="pb-3 pr-4">Teilnehmer</th>
                  <th className="pb-3 pr-4">Gestartet</th>
                  <th className="pb-3">Dauer</th>
                </tr>
              </thead>
              <tbody className="divide-y divide-slate-100">
                {activeMeetings.map((meeting, idx) => (
                  <tr key={idx} className="text-sm">
                    <td className="py-3 pr-4">
                      <div className="font-medium text-slate-900">{meeting.display_name}</div>
                      <div className="text-xs text-slate-500">{meeting.room_name}</div>
                    </td>
                    <td className="py-3 pr-4">
                      <span className="inline-flex items-center gap-1">
                        <span className="w-2 h-2 bg-green-500 rounded-full animate-pulse" />
                        {meeting.participants}
                      </span>
                    </td>
                    <td className="py-3 pr-4 text-slate-500">{formatTimeAgo(meeting.started_at)}</td>
                    <td className="py-3 font-medium">{formatDuration(meeting.duration_minutes)}</td>
                  </tr>
                ))}
              </tbody>
            </table>
          </div>
        )}
      </div>

      {/* Recent Chat Rooms & Usage Stats */}
      <div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Aktive Chat-Raeume</h3>

          {recentRooms.length === 0 ? (
            <div className="text-center py-6 text-slate-500">
              <p>Keine aktiven Raeume</p>
            </div>
          ) : (
            <div className="space-y-3">
              {recentRooms.slice(0, 5).map((room, idx) => (
                <div key={idx} className="flex items-center justify-between p-3 bg-slate-50 rounded-lg">
                  <div className="flex items-center gap-3">
                    <div className="w-8 h-8 bg-slate-200 rounded-lg flex items-center justify-center">
                      <svg className="w-4 h-4 text-slate-600" fill="none" stroke="currentColor" viewBox="0 0 24 24">
                        <path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M17 20h5v-2a3 3 0 00-5.356-1.857M17 20H7m10 0v-2c0-.656-.126-1.283-.356-1.857M7 20H2v-2a3 3 0 015.356-1.857M7 20v-2c0-.656.126-1.283.356-1.857m0 0a5.002 5.002 0 019.288 0M15 7a3 3 0 11-6 0 3 3 0 016 0z" />
                      </svg>
                    </div>
                    <div>
                      <div className="font-medium text-slate-900 text-sm">{room.name}</div>
                      <div className="text-xs text-slate-500">{room.member_count} Mitglieder</div>
                    </div>
                  </div>
                  <div className="flex items-center gap-2">
                    <span className={getRoomTypeBadge(room.room_type)}>{room.room_type}</span>
                    <span className="text-xs text-slate-400">{formatTimeAgo(room.last_activity)}</span>
                  </div>
                </div>
              ))}
            </div>
          )}
        </div>

        {/* Usage Statistics */}
        <div className="bg-white rounded-xl border border-slate-200 p-6">
          <h3 className="font-semibold text-slate-900 mb-4">Nutzungsstatistiken</h3>
|
||||
<div className="space-y-4">
|
||||
<div>
|
||||
<div className="flex justify-between text-sm mb-1">
|
||||
<span className="text-slate-600">Call-Minuten heute</span>
|
||||
<span className="font-semibold">{stats?.jitsi.total_minutes_today || 0} Min.</span>
|
||||
</div>
|
||||
<div className="w-full bg-slate-100 rounded-full h-2">
|
||||
<div
|
||||
className="bg-blue-600 h-2 rounded-full transition-all"
|
||||
style={{ width: `${Math.min((stats?.jitsi.total_minutes_today || 0) / 500 * 100, 100)}%` }}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div className="flex justify-between text-sm mb-1">
|
||||
<span className="text-slate-600">Aktive Chat-Raeume</span>
|
||||
<span className="font-semibold">{stats?.matrix.active_rooms || 0} / {stats?.matrix.total_rooms || 0}</span>
|
||||
</div>
|
||||
<div className="w-full bg-slate-100 rounded-full h-2">
|
||||
<div
|
||||
className="bg-purple-600 h-2 rounded-full transition-all"
|
||||
style={{ width: `${stats?.matrix.total_rooms ? ((stats.matrix.active_rooms / stats.matrix.total_rooms) * 100) : 0}%` }}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
<div>
|
||||
<div className="flex justify-between text-sm mb-1">
|
||||
<span className="text-slate-600">Aktive Nutzer</span>
|
||||
<span className="font-semibold">{stats?.matrix.active_users || 0} / {stats?.matrix.total_users || 0}</span>
|
||||
</div>
|
||||
<div className="w-full bg-slate-100 rounded-full h-2">
|
||||
<div
|
||||
className="bg-green-600 h-2 rounded-full transition-all"
|
||||
style={{ width: `${stats?.matrix.total_users ? ((stats.matrix.active_users / stats.matrix.total_users) * 100) : 0}%` }}
|
||||
/>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Quick Actions */}
|
||||
<div className="mt-6 pt-4 border-t border-slate-100">
|
||||
<h4 className="text-sm font-medium text-slate-700 mb-3">Schnellaktionen</h4>
|
||||
<div className="flex flex-wrap gap-2">
|
||||
<a
|
||||
href="http://localhost:8448/_synapse/admin"
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
className="px-3 py-1.5 text-sm bg-purple-100 text-purple-700 rounded-lg hover:bg-purple-200 transition-colors"
|
||||
>
|
||||
Synapse Admin
|
||||
</a>
|
||||
<a
|
||||
href="http://localhost:8443"
|
||||
target="_blank"
|
||||
rel="noopener noreferrer"
|
||||
className="px-3 py-1.5 text-sm bg-blue-100 text-blue-700 rounded-lg hover:bg-blue-200 transition-colors"
|
||||
>
|
||||
Jitsi Meet
|
||||
</a>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Connection Info */}
|
||||
<div className="bg-blue-50 border border-blue-200 rounded-xl p-4">
|
||||
<div className="flex gap-3">
|
||||
<svg className="w-5 h-5 text-blue-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
|
||||
</svg>
|
||||
<div>
|
||||
<h4 className="font-semibold text-blue-900">Service Konfiguration</h4>
|
||||
<p className="text-sm text-blue-800 mt-1">
|
||||
<strong>Matrix Homeserver:</strong> http://localhost:8448 (Synapse)<br />
|
||||
<strong>Jitsi Meet:</strong> http://localhost:8443<br />
|
||||
<strong>Auto-Refresh:</strong> Alle 15 Sekunden
|
||||
</p>
|
||||
{error && (
|
||||
<p className="text-sm text-red-600 mt-2">
|
||||
<strong>Fehler:</strong> {error} - Backend nicht erreichbar
|
||||
</p>
|
||||
)}
|
||||
{stats?.last_updated && (
|
||||
<p className="text-xs text-blue-600 mt-2">
|
||||
Letzte Aktualisierung: {new Date(stats.last_updated).toLocaleString('de-DE')}
|
||||
</p>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
|
||||
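The hunk above calls `formatTimeAgo` and `formatDuration` without showing them, and the info box promises a 15-second auto-refresh. A minimal, framework-free sketch of what such helpers might look like (the names come from the hunk, but the bodies and the exact output strings here are assumptions, not the actual implementation):

```typescript
// Hypothetical helpers matching the names used in the JSX above.
// The German output format ("Min.", "vor ... Min.") is an assumption
// based on the surrounding UI strings.
function formatDuration(minutes: number): string {
  const h = Math.floor(minutes / 60)
  const m = Math.round(minutes % 60)
  return h > 0 ? `${h} h ${m} Min.` : `${m} Min.`
}

function formatTimeAgo(iso: string): string {
  const diffMin = Math.floor((Date.now() - new Date(iso).getTime()) / 60_000)
  if (diffMin < 1) return 'gerade eben'
  if (diffMin < 60) return `vor ${diffMin} Min.`
  return `vor ${Math.floor(diffMin / 60)} Std.`
}

// The advertised "Auto-Refresh: alle 15 Sekunden" would typically be a
// polling effect in the component (sketch only, not from the diff):
//
//   useEffect(() => {
//     fetchStats()
//     const id = setInterval(fetchStats, 15_000)
//     return () => clearInterval(id)
//   }, [])
```

The interval cleanup in the commented `useEffect` matters: without `clearInterval` each re-mount of the dashboard would leave an extra 15-second poller running.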
@@ -27,7 +27,6 @@ export default function DashboardPage() {
    { name: 'Jitsi Meet', status: 'unknown' },
    { name: 'Mailpit', status: 'unknown' },
    { name: 'Gitea', status: 'unknown' },
    { name: 'Woodpecker CI', status: 'unknown' },
    { name: 'Backend Core', status: 'unknown' },
  ]

@@ -1,318 +0,0 @@
'use client'

import { useState } from 'react'

type Tab = 'colors' | 'typography' | 'components' | 'logos' | 'voice'

const tabs: { id: Tab; label: string }[] = [
  { id: 'colors', label: 'Farben' },
  { id: 'typography', label: 'Typografie' },
  { id: 'components', label: 'Komponenten' },
  { id: 'logos', label: 'Logos' },
  { id: 'voice', label: 'Voice & Tone' },
]

const primaryColors = [
  { name: 'Primary 50', hex: '#f0f9ff', class: 'bg-primary-50' },
  { name: 'Primary 100', hex: '#e0f2fe', class: 'bg-primary-100' },
  { name: 'Primary 200', hex: '#bae6fd', class: 'bg-primary-200' },
  { name: 'Primary 300', hex: '#7dd3fc', class: 'bg-primary-300' },
  { name: 'Primary 400', hex: '#38bdf8', class: 'bg-primary-400' },
  { name: 'Primary 500', hex: '#0ea5e9', class: 'bg-primary-500' },
  { name: 'Primary 600', hex: '#0284c7', class: 'bg-primary-600' },
  { name: 'Primary 700', hex: '#0369a1', class: 'bg-primary-700' },
  { name: 'Primary 800', hex: '#075985', class: 'bg-primary-800' },
  { name: 'Primary 900', hex: '#0c4a6e', class: 'bg-primary-900' },
]

const categoryColorSets = [
  {
    name: 'Kommunikation',
    baseHex: '#22c55e',
    swatches: [
      { name: '100', hex: '#dcfce7' },
      { name: '300', hex: '#86efac' },
      { name: '500', hex: '#22c55e' },
      { name: '700', hex: '#15803d' },
    ],
  },
  {
    name: 'Infrastruktur',
    baseHex: '#f97316',
    swatches: [
      { name: '100', hex: '#ffedd5' },
      { name: '300', hex: '#fdba74' },
      { name: '500', hex: '#f97316' },
      { name: '700', hex: '#c2410c' },
    ],
  },
  {
    name: 'Entwicklung',
    baseHex: '#64748b',
    swatches: [
      { name: '100', hex: '#f1f5f9' },
      { name: '300', hex: '#cbd5e1' },
      { name: '500', hex: '#64748b' },
      { name: '700', hex: '#334155' },
    ],
  },
]

export default function BrandbookPage() {
  const [activeTab, setActiveTab] = useState<Tab>('colors')

  return (
    <div>
      {/* Tabs */}
      <div className="flex gap-1 mb-6 bg-white rounded-xl border border-slate-200 p-1">
        {tabs.map((tab) => (
          <button
            key={tab.id}
            onClick={() => setActiveTab(tab.id)}
            className={`flex-1 px-4 py-2 rounded-lg text-sm font-medium transition-colors ${
              activeTab === tab.id
                ? 'bg-primary-600 text-white'
                : 'text-slate-600 hover:bg-slate-100'
            }`}
          >
            {tab.label}
          </button>
        ))}
      </div>

      {/* Colors Tab */}
      {activeTab === 'colors' && (
        <div className="space-y-8">
          {/* Primary */}
          <div>
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Primary: Sky Blue</h2>
            <div className="grid grid-cols-5 md:grid-cols-10 gap-2">
              {primaryColors.map((color) => (
                <div key={color.hex} className="text-center">
                  <div
                    className="w-full aspect-square rounded-lg border border-slate-200 mb-1"
                    style={{ backgroundColor: color.hex }}
                  />
                  <div className="text-xs text-slate-500">{color.name.split(' ')[1]}</div>
                  <div className="text-xs text-slate-400 font-mono">{color.hex}</div>
                </div>
              ))}
            </div>
          </div>

          {/* Category Colors */}
          <div>
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Kategorie-Farben</h2>
            <div className="grid grid-cols-1 md:grid-cols-3 gap-6">
              {categoryColorSets.map((set) => (
                <div key={set.name} className="bg-white rounded-xl border border-slate-200 p-4">
                  <div className="flex items-center gap-2 mb-3">
                    <div
                      className="w-4 h-4 rounded-full"
                      style={{ backgroundColor: set.baseHex }}
                    />
                    <h3 className="font-medium text-slate-900">{set.name}</h3>
                    <span className="text-xs text-slate-400 font-mono">{set.baseHex}</span>
                  </div>
                  <div className="grid grid-cols-4 gap-2">
                    {set.swatches.map((swatch) => (
                      <div key={swatch.hex} className="text-center">
                        <div
                          className="w-full aspect-square rounded-lg border border-slate-200 mb-1"
                          style={{ backgroundColor: swatch.hex }}
                        />
                        <div className="text-xs text-slate-400 font-mono">{swatch.name}</div>
                      </div>
                    ))}
                  </div>
                </div>
              ))}
            </div>
          </div>

          {/* Semantic Colors */}
          <div>
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Semantische Farben</h2>
            <div className="grid grid-cols-2 md:grid-cols-4 gap-4">
              {[
                { name: 'Success', hex: '#22c55e', bg: '#dcfce7' },
                { name: 'Warning', hex: '#f59e0b', bg: '#fef3c7' },
                { name: 'Error', hex: '#ef4444', bg: '#fee2e2' },
                { name: 'Info', hex: '#3b82f6', bg: '#dbeafe' },
              ].map((color) => (
                <div key={color.name} className="p-4 rounded-xl border border-slate-200" style={{ backgroundColor: color.bg }}>
                  <div className="w-8 h-8 rounded-lg mb-2" style={{ backgroundColor: color.hex }} />
                  <div className="font-medium" style={{ color: color.hex }}>{color.name}</div>
                  <div className="text-xs text-slate-500 font-mono">{color.hex}</div>
                </div>
              ))}
            </div>
          </div>
        </div>
      )}

      {/* Typography Tab */}
      {activeTab === 'typography' && (
        <div className="space-y-8">
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Schriftart: Inter</h2>
            <p className="text-slate-500 mb-6">
              Inter ist eine Open-Source-Schriftart (OFL), optimiert für Bildschirme.
            </p>

            <div className="space-y-6">
              {[
                { name: 'Heading 1', class: 'text-4xl font-bold', size: '36px / 2.25rem' },
                { name: 'Heading 2', class: 'text-2xl font-semibold', size: '24px / 1.5rem' },
                { name: 'Heading 3', class: 'text-xl font-semibold', size: '20px / 1.25rem' },
                { name: 'Body Large', class: 'text-lg', size: '18px / 1.125rem' },
                { name: 'Body', class: 'text-base', size: '16px / 1rem' },
                { name: 'Body Small', class: 'text-sm', size: '14px / 0.875rem' },
                { name: 'Caption', class: 'text-xs', size: '12px / 0.75rem' },
              ].map((item) => (
                <div key={item.name} className="flex items-baseline gap-4 border-b border-slate-100 pb-4">
                  <div className="w-32 text-sm text-slate-500">{item.name}</div>
                  <div className={`flex-1 text-slate-900 ${item.class}`}>
                    BreakPilot Core Admin
                  </div>
                  <div className="text-xs text-slate-400 font-mono">{item.size}</div>
                </div>
              ))}
            </div>
          </div>
        </div>
      )}

      {/* Components Tab */}
      {activeTab === 'components' && (
        <div className="space-y-8">
          {/* Buttons */}
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Buttons</h2>
            <div className="flex flex-wrap gap-4">
              <button className="px-4 py-2 bg-primary-600 text-white rounded-lg hover:bg-primary-700">Primary</button>
              <button className="px-4 py-2 bg-white border border-slate-200 text-slate-700 rounded-lg hover:bg-slate-50">Secondary</button>
              <button className="px-4 py-2 bg-red-600 text-white rounded-lg hover:bg-red-700">Danger</button>
              <button className="px-4 py-2 bg-green-600 text-white rounded-lg hover:bg-green-700">Success</button>
              <button className="px-4 py-2 text-primary-600 hover:text-primary-700 hover:bg-primary-50 rounded-lg">Ghost</button>
            </div>
          </div>

          {/* Cards */}
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Cards</h2>
            <div className="grid grid-cols-3 gap-4">
              <div className="p-4 bg-white rounded-xl border border-slate-200 shadow-sm">
                <h3 className="font-medium text-slate-900">Default Card</h3>
                <p className="text-sm text-slate-500 mt-1">Standard-Karte mit Rand</p>
              </div>
              <div className="p-4 bg-primary-50 rounded-xl border border-primary-200">
                <h3 className="font-medium text-primary-900">Active Card</h3>
                <p className="text-sm text-primary-600 mt-1">Hervorgehobene Karte</p>
              </div>
              <div className="p-4 bg-white rounded-xl border border-slate-200 shadow-md hover:shadow-lg transition-shadow">
                <h3 className="font-medium text-slate-900">Hover Card</h3>
                <p className="text-sm text-slate-500 mt-1">Karte mit Hover-Effekt</p>
              </div>
            </div>
          </div>

          {/* Badges */}
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Badges / Status</h2>
            <div className="flex flex-wrap gap-3">
              <span className="px-2 py-1 bg-green-100 text-green-700 rounded-full text-xs font-medium">Healthy</span>
              <span className="px-2 py-1 bg-red-100 text-red-700 rounded-full text-xs font-medium">Error</span>
              <span className="px-2 py-1 bg-yellow-100 text-yellow-700 rounded-full text-xs font-medium">Warning</span>
              <span className="px-2 py-1 bg-blue-100 text-blue-700 rounded-full text-xs font-medium">Info</span>
              <span className="px-2 py-1 bg-slate-100 text-slate-700 rounded-full text-xs font-medium">Default</span>
            </div>
          </div>
        </div>
      )}

      {/* Logos Tab */}
      {activeTab === 'logos' && (
        <div className="space-y-8">
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Logo-Varianten</h2>
            <div className="grid grid-cols-2 gap-6">
              <div className="p-8 bg-white rounded-xl border border-slate-200 flex items-center justify-center">
                <div className="text-center">
                  <div className="text-3xl font-bold text-primary-600 mb-1">BreakPilot</div>
                  <div className="text-sm text-slate-500">Core Admin</div>
                </div>
              </div>
              <div className="p-8 bg-slate-900 rounded-xl flex items-center justify-center">
                <div className="text-center">
                  <div className="text-3xl font-bold text-white mb-1">BreakPilot</div>
                  <div className="text-sm text-slate-400">Core Admin</div>
                </div>
              </div>
            </div>
          </div>

          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Schutzzone</h2>
            <p className="text-sm text-slate-500">
              Um das Logo herum muss mindestens die Höhe des Buchstabens "B" als Freiraum gelassen werden.
            </p>
          </div>
        </div>
      )}

      {/* Voice & Tone Tab */}
      {activeTab === 'voice' && (
        <div className="space-y-8">
          <div className="bg-white rounded-xl border border-slate-200 p-6">
            <h2 className="text-lg font-semibold text-slate-900 mb-4">Sprachstil</h2>
            <div className="grid grid-cols-2 gap-6">
              <div>
                <h3 className="font-medium text-green-600 mb-2">So schreiben wir</h3>
                <ul className="space-y-2 text-sm text-slate-600">
                  <li className="flex items-start gap-2">
                    <span className="text-green-500 mt-0.5">+</span>
                    <span>Klar und direkt</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-green-500 mt-0.5">+</span>
                    <span>Technisch präzise, aber verständlich</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-green-500 mt-0.5">+</span>
                    <span>Handlungsorientiert</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-green-500 mt-0.5">+</span>
                    <span>Deutsch als Hauptsprache</span>
                  </li>
                </ul>
              </div>
              <div>
                <h3 className="font-medium text-red-600 mb-2">Das vermeiden wir</h3>
                <ul className="space-y-2 text-sm text-slate-600">
                  <li className="flex items-start gap-2">
                    <span className="text-red-500 mt-0.5">-</span>
                    <span>Unnötige Anglizismen</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-red-500 mt-0.5">-</span>
                    <span>Marketing-Sprache</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-red-500 mt-0.5">-</span>
                    <span>Passive Formulierungen</span>
                  </li>
                  <li className="flex items-start gap-2">
                    <span className="text-red-500 mt-0.5">-</span>
                    <span>Abkürzungen ohne Erklärung</span>
                  </li>
                </ul>
              </div>
            </div>
          </div>
        </div>
      )}
    </div>
  )
}
@@ -1,77 +0,0 @@
'use client'

import { useState } from 'react'

const quickLinks = [
  { name: 'Backend Core API', url: 'https://macmini:8000/docs', description: 'FastAPI Swagger Docs' },
  { name: 'Gitea', url: 'http://macmini:3003', description: 'Git Server' },
  { name: 'Woodpecker CI', url: 'http://macmini:8090', description: 'CI/CD Pipelines' },
  { name: 'MkDocs', url: 'http://macmini:8009', description: 'Projekt-Dokumentation' },
]

export default function DocsPage() {
  const [iframeUrl, setIframeUrl] = useState('http://macmini:8009')
  const [isLoading, setIsLoading] = useState(true)

  return (
    <div>
      {/* Quick Links */}
      <div className="grid grid-cols-1 md:grid-cols-4 gap-4 mb-6">
        {quickLinks.map((link) => (
          <button
            key={link.name}
            onClick={() => {
              setIframeUrl(link.url)
              setIsLoading(true)
            }}
            className={`p-4 rounded-xl border text-left transition-all hover:shadow-md ${
              iframeUrl === link.url
                ? 'bg-primary-50 border-primary-300'
                : 'bg-white border-slate-200 hover:border-primary-300'
            }`}
          >
            <h3 className="font-medium text-slate-900">{link.name}</h3>
            <p className="text-sm text-slate-500">{link.description}</p>
          </button>
        ))}
      </div>

      {/* Iframe Viewer */}
      <div className="bg-white rounded-xl border border-slate-200 shadow-sm overflow-hidden">
        <div className="flex items-center justify-between px-4 py-2 bg-slate-50 border-b border-slate-200">
          <span className="text-sm text-slate-600 truncate">{iframeUrl}</span>
          <a
            href={iframeUrl}
            target="_blank"
            rel="noopener noreferrer"
            className="text-sm text-primary-600 hover:text-primary-700"
          >
            In neuem Tab öffnen
          </a>
        </div>
        <div className="relative" style={{ height: '70vh' }}>
          {isLoading && (
            <div className="absolute inset-0 flex items-center justify-center bg-slate-50">
              <div className="animate-spin rounded-full h-8 w-8 border-b-2 border-primary-600"></div>
            </div>
          )}
          <iframe
            src={iframeUrl}
            className="w-full h-full border-0"
            onLoad={() => setIsLoading(false)}
            title="Documentation Viewer"
          />
        </div>
      </div>

      {/* Info */}
      <div className="mt-4 p-4 bg-blue-50 border border-blue-200 rounded-xl">
        <h3 className="font-medium text-blue-900 mb-1">Dokumentation bearbeiten</h3>
        <p className="text-sm text-blue-700">
          Die MkDocs-Dokumentation liegt unter <code className="px-1 py-0.5 bg-blue-100 rounded">/docs-src/</code>.
          Änderungen werden automatisch beim nächsten Build sichtbar.
        </p>
      </div>
    </div>
  )
}
@@ -1,178 +0,0 @@
|
||||
'use client'
|
||||
|
||||
import { useState, useCallback, useMemo } from 'react'
|
||||
import ReactFlow, {
|
||||
Node,
|
||||
Edge,
|
||||
Controls,
|
||||
Background,
|
||||
MiniMap,
|
||||
useNodesState,
|
||||
useEdgesState,
|
||||
MarkerType,
|
||||
} from 'reactflow'
|
||||
import 'reactflow/dist/style.css'
|
||||
|
||||
type CategoryFilter = 'all' | 'communication' | 'infrastructure' | 'development' | 'meta'
|
||||
|
||||
const categoryColors: Record<string, string> = {
|
||||
communication: '#22c55e',
|
||||
infrastructure: '#f97316',
|
||||
development: '#64748b',
|
||||
meta: '#0ea5e9',
|
||||
}
|
||||
|
||||
const initialNodes: Node[] = [
|
||||
// Meta
|
||||
{ id: 'role-select', position: { x: 400, y: 0 }, data: { label: 'Rollenauswahl', category: 'meta' }, style: { background: '#e0f2fe', border: '2px solid #0ea5e9', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'dashboard', position: { x: 400, y: 100 }, data: { label: 'Dashboard', category: 'meta' }, style: { background: '#e0f2fe', border: '2px solid #0ea5e9', borderRadius: '12px', padding: '10px 16px' } },
|
||||
|
||||
// Communication (Green)
|
||||
{ id: 'video-chat', position: { x: 50, y: 250 }, data: { label: 'Video & Chat', category: 'communication' }, style: { background: '#dcfce7', border: '2px solid #22c55e', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'voice-service', position: { x: 50, y: 350 }, data: { label: 'Voice Service', category: 'communication' }, style: { background: '#dcfce7', border: '2px solid #22c55e', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'mail', position: { x: 50, y: 450 }, data: { label: 'Unified Inbox', category: 'communication' }, style: { background: '#dcfce7', border: '2px solid #22c55e', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'alerts', position: { x: 50, y: 550 }, data: { label: 'Alerts Monitoring', category: 'communication' }, style: { background: '#dcfce7', border: '2px solid #22c55e', borderRadius: '12px', padding: '10px 16px' } },
|
||||
|
||||
// Infrastructure (Orange)
|
||||
{ id: 'gpu', position: { x: 300, y: 250 }, data: { label: 'GPU Infrastruktur', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'middleware', position: { x: 300, y: 350 }, data: { label: 'Middleware', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'security', position: { x: 300, y: 450 }, data: { label: 'Security Dashboard', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'sbom', position: { x: 300, y: 550 }, data: { label: 'SBOM', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'ci-cd', position: { x: 500, y: 250 }, data: { label: 'CI/CD Dashboard', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'tests', position: { x: 500, y: 350 }, data: { label: 'Test Dashboard', category: 'infrastructure' }, style: { background: '#ffedd5', border: '2px solid #f97316', borderRadius: '12px', padding: '10px 16px' } },
|
||||
|
||||
// Development (Slate)
|
||||
{ id: 'docs', position: { x: 700, y: 250 }, data: { label: 'Developer Docs', category: 'development' }, style: { background: '#f1f5f9', border: '2px solid #64748b', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'screen-flow', position: { x: 700, y: 350 }, data: { label: 'Screen Flow', category: 'development' }, style: { background: '#f1f5f9', border: '2px solid #64748b', borderRadius: '12px', padding: '10px 16px' } },
|
||||
{ id: 'brandbook', position: { x: 700, y: 450 }, data: { label: 'Brandbook', category: 'development' }, style: { background: '#f1f5f9', border: '2px solid #64748b', borderRadius: '12px', padding: '10px 16px' } },
|
||||
]
|
||||
|
||||
const initialEdges: Edge[] = [
|
||||
// Meta flow
|
||||
{ id: 'e-role-dash', source: 'role-select', target: 'dashboard', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#0ea5e9' } },
|
||||
|
||||
// Dashboard to categories
|
||||
{ id: 'e-dash-vc', source: 'dashboard', target: 'video-chat', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#22c55e' } },
|
||||
{ id: 'e-dash-gpu', source: 'dashboard', target: 'gpu', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
{ id: 'e-dash-cicd', source: 'dashboard', target: 'ci-cd', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
{ id: 'e-dash-docs', source: 'dashboard', target: 'docs', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#64748b' } },
|
||||
|
||||
// Communication internal
|
||||
{ id: 'e-vc-voice', source: 'video-chat', target: 'voice-service', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#22c55e' } },
|
||||
{ id: 'e-voice-mail', source: 'voice-service', target: 'mail', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#22c55e' } },
|
||||
{ id: 'e-mail-alerts', source: 'mail', target: 'alerts', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#22c55e' } },
|
||||
|
||||
// Infrastructure internal
|
||||
{ id: 'e-gpu-mw', source: 'gpu', target: 'middleware', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
{ id: 'e-mw-sec', source: 'middleware', target: 'security', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
{ id: 'e-sec-sbom', source: 'security', target: 'sbom', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
{ id: 'e-cicd-tests', source: 'ci-cd', target: 'tests', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#f97316' } },
|
||||
|
||||
// Cross-category
|
||||
{ id: 'e-sec-cicd', source: 'security', target: 'ci-cd', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#94a3b8', strokeDasharray: '5,5' } },
|
||||
{ id: 'e-tests-docs', source: 'tests', target: 'docs', markerEnd: { type: MarkerType.ArrowClosed }, style: { stroke: '#94a3b8', strokeDasharray: '5,5' } },
|
||||
]
|
||||
|
||||
export default function ScreenFlowPage() {
|
||||
const [filter, setFilter] = useState<CategoryFilter>('all')
|
||||
const [nodes, setNodes, onNodesChange] = useNodesState(initialNodes)
|
||||
const [edges, setEdges, onEdgesChange] = useEdgesState(initialEdges)
|
||||
|
||||
const filteredNodes = useMemo(() => {
|
||||
if (filter === 'all') return nodes
|
||||
return nodes.filter(n => n.data.category === filter || n.data.category === 'meta')
|
||||
}, [nodes, filter])
|
||||
|
||||
const filteredEdges = useMemo(() => {
|
||||
const nodeIds = new Set(filteredNodes.map(n => n.id))
|
||||
return edges.filter(e => nodeIds.has(e.source) && nodeIds.has(e.target))
|
||||
}, [edges, filteredNodes])
|
||||
|
||||
const filters: { id: CategoryFilter; label: string; color: string }[] = [
|
||||
{ id: 'all', label: 'Alle', color: '#0ea5e9' },
|
||||
{ id: 'communication', label: 'Kommunikation', color: '#22c55e' },
|
||||
{ id: 'infrastructure', label: 'Infrastruktur', color: '#f97316' },
|
||||
{ id: 'development', label: 'Entwicklung', color: '#64748b' },
|
||||
]
|
||||
|
||||
return (
|
||||
<div>
|
||||
{/* Filter */}
|
||||
<div className="flex items-center gap-2 mb-4">
|
||||
{filters.map((f) => (
|
||||
<button
|
||||
key={f.id}
|
||||
onClick={() => setFilter(f.id)}
|
||||
className={`px-4 py-2 rounded-lg text-sm font-medium transition-colors ${
|
||||
filter === f.id
|
||||
? 'text-white'
|
||||
: 'bg-white border border-slate-200 text-slate-600 hover:border-slate-300'
|
}`}
style={filter === f.id ? { backgroundColor: f.color } : undefined}
>
{f.label}
</button>
))}
</div>

{/* Stats */}
<div className="grid grid-cols-4 gap-4 mb-4">
<div className="bg-white rounded-xl border border-slate-200 p-3 text-center">
<div className="text-2xl font-bold text-slate-900">{filteredNodes.length}</div>
<div className="text-xs text-slate-500">Screens</div>
</div>
<div className="bg-white rounded-xl border border-slate-200 p-3 text-center">
<div className="text-2xl font-bold text-slate-900">{filteredEdges.length}</div>
<div className="text-xs text-slate-500">Verbindungen</div>
</div>
<div className="bg-white rounded-xl border border-slate-200 p-3 text-center">
<div className="text-2xl font-bold text-slate-900">3</div>
<div className="text-xs text-slate-500">Kategorien</div>
</div>
<div className="bg-white rounded-xl border border-slate-200 p-3 text-center">
<div className="text-2xl font-bold text-slate-900">13</div>
<div className="text-xs text-slate-500">Module</div>
</div>
</div>

{/* Flow */}
<div className="bg-white rounded-xl border border-slate-200 shadow-sm" style={{ height: '65vh' }}>
<ReactFlow
nodes={filteredNodes}
edges={filteredEdges}
onNodesChange={onNodesChange}
onEdgesChange={onEdgesChange}
fitView
attributionPosition="bottom-left"
>
<Controls />
<Background />
<MiniMap
nodeColor={(node) => categoryColors[node.data?.category] || '#94a3b8'}
maskColor="rgba(0,0,0,0.1)"
/>
</ReactFlow>
</div>

{/* Legend */}
<div className="mt-4 flex items-center gap-6 text-sm text-slate-500">
<div className="flex items-center gap-2">
<div className="w-4 h-4 rounded bg-green-100 border-2 border-green-500" />
<span>Kommunikation (4)</span>
</div>
<div className="flex items-center gap-2">
<div className="w-4 h-4 rounded bg-orange-100 border-2 border-orange-500" />
<span>Infrastruktur (6)</span>
</div>
<div className="flex items-center gap-2">
<div className="w-4 h-4 rounded bg-slate-100 border-2 border-slate-500" />
<span>Entwicklung (3)</span>
</div>
<div className="flex items-center gap-2">
<div className="w-4 h-4 rounded bg-sky-100 border-2 border-sky-500" />
<span>Meta</span>
</div>
</div>
</div>
)
}
@@ -85,38 +85,7 @@ interface DockerStats {
stopped_containers: number
}

type TabType = 'overview' | 'woodpecker' | 'pipelines' | 'deployments' | 'setup' | 'scheduler'

// Woodpecker Types
interface WoodpeckerStep {
name: string
state: 'pending' | 'running' | 'success' | 'failure' | 'skipped'
exit_code: number
error?: string
}

interface WoodpeckerPipeline {
id: number
number: number
status: 'pending' | 'running' | 'success' | 'failure' | 'error'
event: string
branch: string
commit: string
message: string
author: string
created: number
started: number
finished: number
steps: WoodpeckerStep[]
errors?: string[]
}

interface WoodpeckerStatus {
status: 'online' | 'offline'
pipelines: WoodpeckerPipeline[]
lastUpdate: string
error?: string
}
type TabType = 'overview' | 'pipelines' | 'deployments' | 'setup' | 'scheduler'

// ============================================================================
// Helper Components
@@ -168,10 +137,6 @@ export default function CICDPage() {
const [containerFilter, setContainerFilter] = useState<'all' | 'running' | 'stopped'>('all')
const [actionLoading, setActionLoading] = useState<string | null>(null)

// Woodpecker State
const [woodpeckerStatus, setWoodpeckerStatus] = useState<WoodpeckerStatus | null>(null)
const [triggeringWoodpecker, setTriggeringWoodpecker] = useState(false)

// General State
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)
@@ -214,54 +179,12 @@ export default function CICDPage() {
}
}, [])

const loadWoodpeckerData = useCallback(async () => {
try {
const response = await fetch('/api/admin/infrastructure/woodpecker?limit=10')
if (response.ok) {
const data = await response.json()
setWoodpeckerStatus(data)
}
} catch (err) {
console.error('Failed to load Woodpecker data:', err)
setWoodpeckerStatus({
status: 'offline',
pipelines: [],
lastUpdate: new Date().toISOString(),
error: 'Verbindung fehlgeschlagen'
})
}
}, [])

const triggerWoodpeckerPipeline = async () => {
setTriggeringWoodpecker(true)
setMessage(null)
try {
const response = await fetch('/api/admin/infrastructure/woodpecker', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ branch: 'main' })
})
if (response.ok) {
const result = await response.json()
setMessage(`Woodpecker Pipeline #${result.pipeline?.number || '?'} gestartet!`)
setTimeout(loadWoodpeckerData, 2000)
setTimeout(loadWoodpeckerData, 5000)
} else {
setError('Pipeline-Start fehlgeschlagen')
}
} catch (err) {
setError('Pipeline konnte nicht gestartet werden')
} finally {
setTriggeringWoodpecker(false)
}
}

const loadAllData = useCallback(async () => {
setLoading(true)
setError(null)
await Promise.all([loadPipelineData(), loadContainerData(), loadWoodpeckerData()])
await Promise.all([loadPipelineData(), loadContainerData()])
setLoading(false)
}, [loadPipelineData, loadContainerData, loadWoodpeckerData])
}, [loadPipelineData, loadContainerData])

useEffect(() => {
loadAllData()
@@ -402,11 +325,6 @@ export default function CICDPage() {
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 19v-6a2 2 0 00-2-2H5a2 2 0 00-2 2v6a2 2 0 002 2h2a2 2 0 002-2zm0 0V9a2 2 0 012-2h2a2 2 0 012 2v10m-6 0a2 2 0 002 2h2a2 2 0 002-2m0 0V5a2 2 0 012-2h2a2 2 0 012 2v14a2 2 0 01-2 2h-2a2 2 0 01-2-2z" />
</svg>
)},
{ id: 'woodpecker', name: 'Woodpecker CI', icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 9l3 3-3 3m5 0h3M5 20h14a2 2 0 002-2V6a2 2 0 00-2-2H5a2 2 0 00-2 2v12a2 2 0 002 2z" />
</svg>
)},
{ id: 'pipelines', name: 'Gitea Pipelines', icon: (
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" />
@@ -458,95 +376,6 @@ export default function CICDPage() {
{/* ================================================================ */}
{activeTab === 'overview' && (
<div className="space-y-6">
{/* Woodpecker CI Status - Prominent */}
<div className={`p-4 rounded-xl border-2 ${
woodpeckerStatus?.status === 'online'
? woodpeckerStatus.pipelines?.[0]?.status === 'success'
? 'border-green-300 bg-green-50'
: woodpeckerStatus.pipelines?.[0]?.status === 'failure' || woodpeckerStatus.pipelines?.[0]?.status === 'error'
? 'border-red-300 bg-red-50'
: woodpeckerStatus.pipelines?.[0]?.status === 'running'
? 'border-blue-300 bg-blue-50'
: 'border-slate-300 bg-slate-50'
: 'border-red-300 bg-red-50'
}`}>
<div className="flex items-center justify-between">
<div className="flex items-center gap-4">
<div className={`p-3 rounded-lg ${
woodpeckerStatus?.status === 'online'
? woodpeckerStatus.pipelines?.[0]?.status === 'success'
? 'bg-green-100'
: woodpeckerStatus.pipelines?.[0]?.status === 'failure' || woodpeckerStatus.pipelines?.[0]?.status === 'error'
? 'bg-red-100'
: 'bg-blue-100'
: 'bg-red-100'
}`}>
<svg className="w-6 h-6" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M8 9l3 3-3 3m5 0h3M5 20h14a2 2 0 002-2V6a2 2 0 00-2-2H5a2 2 0 00-2 2v12a2 2 0 002 2z" />
</svg>
</div>
<div>
<div className="flex items-center gap-2">
<h3 className="font-semibold text-slate-900">Woodpecker CI</h3>
<span className={`px-2 py-0.5 text-xs font-medium rounded-full ${
woodpeckerStatus?.status === 'online' ? 'bg-green-100 text-green-800' : 'bg-red-100 text-red-800'
}`}>
{woodpeckerStatus?.status === 'online' ? 'Online' : 'Offline'}
</span>
</div>
{woodpeckerStatus?.pipelines?.[0] && (
<p className="text-sm text-slate-600 mt-1">
Pipeline #{woodpeckerStatus.pipelines[0].number}: {' '}
<span className={`font-medium ${
woodpeckerStatus.pipelines[0].status === 'success' ? 'text-green-600' :
woodpeckerStatus.pipelines[0].status === 'failure' || woodpeckerStatus.pipelines[0].status === 'error' ? 'text-red-600' :
woodpeckerStatus.pipelines[0].status === 'running' ? 'text-blue-600' : 'text-slate-600'
}`}>
{woodpeckerStatus.pipelines[0].status}
</span>
{' '}auf {woodpeckerStatus.pipelines[0].branch}
</p>
)}
</div>
</div>
<div className="flex items-center gap-2">
<button
onClick={() => setActiveTab('woodpecker')}
className="px-3 py-1.5 text-sm border border-slate-300 text-slate-700 rounded-lg hover:bg-white"
>
Details
</button>
<button
onClick={triggerWoodpeckerPipeline}
disabled={triggeringWoodpecker}
className="px-3 py-1.5 text-sm bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:opacity-50 flex items-center gap-1"
>
{triggeringWoodpecker ? (
<div className="animate-spin rounded-full h-3 w-3 border-b-2 border-white" />
) : (
<svg className="w-3 h-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
</svg>
)}
Starten
</button>
</div>
</div>
{/* Failed steps preview */}
{woodpeckerStatus?.pipelines?.[0]?.steps?.some(s => s.state === 'failure') && (
<div className="mt-3 pt-3 border-t border-red-200">
<p className="text-xs font-medium text-red-700 mb-2">Fehlgeschlagene Steps:</p>
<div className="flex flex-wrap gap-2">
{woodpeckerStatus.pipelines[0].steps.filter(s => s.state === 'failure').map((step, i) => (
<span key={i} className="px-2 py-1 bg-red-100 text-red-700 text-xs rounded">
{step.name}
</span>
))}
</div>
</div>
)}
</div>

{/* Status Cards */}
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
<div className={`p-4 rounded-lg ${pipelineStatus?.gitea_connected ? 'bg-green-50' : 'bg-yellow-50'}`}>
@@ -679,299 +508,6 @@
</div>
)}

{/* ================================================================ */}
{/* Woodpecker Tab */}
{/* ================================================================ */}
{activeTab === 'woodpecker' && (
<div className="space-y-6">
{/* Woodpecker Status Header */}
<div className="flex items-center justify-between">
<div className="flex items-center gap-3">
<h3 className="text-lg font-semibold text-slate-800">Woodpecker CI Pipeline</h3>
<span className={`flex items-center gap-1.5 px-2 py-1 rounded-full text-xs font-medium ${
woodpeckerStatus?.status === 'online'
? 'bg-green-100 text-green-800'
: 'bg-red-100 text-red-800'
}`}>
<span className={`w-2 h-2 rounded-full ${
woodpeckerStatus?.status === 'online' ? 'bg-green-500' : 'bg-red-500'
}`} />
{woodpeckerStatus?.status === 'online' ? 'Online' : 'Offline'}
</span>
</div>
<div className="flex items-center gap-2">
<a
href="http://macmini:8090"
target="_blank"
rel="noopener noreferrer"
className="px-3 py-2 text-sm border border-slate-300 text-slate-700 rounded-lg hover:bg-slate-50 flex items-center gap-2"
>
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M10 6H6a2 2 0 00-2 2v10a2 2 0 002 2h10a2 2 0 002-2v-4M14 4h6m0 0v6m0-6L10 14" />
</svg>
Woodpecker UI
</a>
<button
onClick={triggerWoodpeckerPipeline}
disabled={triggeringWoodpecker}
className="px-4 py-2 bg-blue-600 text-white rounded-lg font-medium hover:bg-blue-700 disabled:opacity-50 transition-colors flex items-center gap-2"
>
{triggeringWoodpecker ? (
<>
<div className="animate-spin rounded-full h-4 w-4 border-b-2 border-white" />
Startet...
</>
) : (
<>
<svg className="w-4 h-4" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M14.752 11.168l-3.197-2.132A1 1 0 0010 9.87v4.263a1 1 0 001.555.832l3.197-2.132a1 1 0 000-1.664z" />
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
Pipeline starten
</>
)}
</button>
</div>
</div>

{/* Pipeline Stats */}
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
<div className="bg-blue-50 p-4 rounded-lg">
<div className="flex items-center gap-2 mb-2">
<svg className="w-4 h-4 text-blue-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2" />
</svg>
<span className="text-sm font-medium">Gesamt</span>
</div>
<p className="text-2xl font-bold text-blue-700">{woodpeckerStatus?.pipelines?.length || 0}</p>
</div>
<div className="bg-green-50 p-4 rounded-lg">
<div className="flex items-center gap-2 mb-2">
<svg className="w-4 h-4 text-green-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M5 13l4 4L19 7" />
</svg>
<span className="text-sm font-medium">Erfolgreich</span>
</div>
<p className="text-2xl font-bold text-green-700">
{woodpeckerStatus?.pipelines?.filter(p => p.status === 'success').length || 0}
</p>
</div>
<div className="bg-red-50 p-4 rounded-lg">
<div className="flex items-center gap-2 mb-2">
<svg className="w-4 h-4 text-red-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M6 18L18 6M6 6l12 12" />
</svg>
<span className="text-sm font-medium">Fehlgeschlagen</span>
</div>
<p className="text-2xl font-bold text-red-700">
{woodpeckerStatus?.pipelines?.filter(p => p.status === 'failure' || p.status === 'error').length || 0}
</p>
</div>
<div className="bg-yellow-50 p-4 rounded-lg">
<div className="flex items-center gap-2 mb-2">
<svg className="w-4 h-4 text-yellow-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4l3 3m6-3a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
<span className="text-sm font-medium">Laufend</span>
</div>
<p className="text-2xl font-bold text-yellow-700">
{woodpeckerStatus?.pipelines?.filter(p => p.status === 'running' || p.status === 'pending').length || 0}
</p>
</div>
</div>

{/* Pipeline List */}
{woodpeckerStatus?.pipelines && woodpeckerStatus.pipelines.length > 0 ? (
<div className="bg-slate-50 rounded-lg p-4">
<h4 className="font-medium text-slate-800 mb-4">Pipeline Historie</h4>
<div className="space-y-3">
{woodpeckerStatus.pipelines.map((pipeline) => (
<div
key={pipeline.id}
className={`border rounded-xl p-4 transition-colors ${
pipeline.status === 'success'
? 'border-green-200 bg-green-50/30'
: pipeline.status === 'failure' || pipeline.status === 'error'
? 'border-red-200 bg-red-50/30'
: pipeline.status === 'running'
? 'border-blue-200 bg-blue-50/30'
: 'border-slate-200 bg-white'
}`}
>
<div className="flex items-start justify-between gap-4">
<div className="flex-1">
<div className="flex items-center gap-2 mb-2">
<span className={`w-3 h-3 rounded-full ${
pipeline.status === 'success' ? 'bg-green-500' :
pipeline.status === 'failure' || pipeline.status === 'error' ? 'bg-red-500' :
pipeline.status === 'running' ? 'bg-blue-500 animate-pulse' : 'bg-slate-400'
}`} />
<span className="font-semibold text-slate-900">Pipeline #{pipeline.number}</span>
<span className={`px-2 py-0.5 text-xs font-medium rounded-full ${
pipeline.status === 'success' ? 'bg-green-100 text-green-800' :
pipeline.status === 'failure' || pipeline.status === 'error' ? 'bg-red-100 text-red-800' :
pipeline.status === 'running' ? 'bg-blue-100 text-blue-800' :
'bg-slate-100 text-slate-600'
}`}>
{pipeline.status}
</span>
</div>
<div className="text-sm text-slate-600 mb-2">
<span className="font-mono">{pipeline.branch}</span>
<span className="mx-2 text-slate-400">•</span>
<span className="font-mono text-slate-500">{pipeline.commit}</span>
<span className="mx-2 text-slate-400">•</span>
<span>{pipeline.event}</span>
</div>
{pipeline.message && (
<p className="text-sm text-slate-500 mb-2 truncate max-w-xl">{pipeline.message}</p>
)}

{/* Steps Progress */}
{pipeline.steps && pipeline.steps.length > 0 && (
<div className="mt-3">
<div className="flex gap-1 mb-2">
{pipeline.steps.map((step, i) => (
<div
key={i}
className={`h-2 flex-1 rounded-full ${
step.state === 'success' ? 'bg-green-500' :
step.state === 'failure' ? 'bg-red-500' :
step.state === 'running' ? 'bg-blue-500 animate-pulse' :
step.state === 'skipped' ? 'bg-slate-200' : 'bg-slate-300'
}`}
title={`${step.name}: ${step.state}`}
/>
))}
</div>
<div className="flex flex-wrap gap-2 text-xs">
{pipeline.steps.map((step, i) => (
<span
key={i}
className={`px-2 py-1 rounded ${
step.state === 'success' ? 'bg-green-100 text-green-700' :
step.state === 'failure' ? 'bg-red-100 text-red-700' :
step.state === 'running' ? 'bg-blue-100 text-blue-700' :
'bg-slate-100 text-slate-600'
}`}
>
{step.name}
</span>
))}
</div>
</div>
)}

{/* Errors */}
{pipeline.errors && pipeline.errors.length > 0 && (
<div className="mt-3 p-3 bg-red-50 border border-red-200 rounded-lg">
<h5 className="text-sm font-medium text-red-800 mb-1">Fehler:</h5>
<ul className="text-xs text-red-700 space-y-1">
{pipeline.errors.map((err, i) => (
<li key={i} className="font-mono">{err}</li>
))}
</ul>
</div>
)}
</div>

<div className="text-right text-sm text-slate-500">
<p>{new Date(pipeline.created * 1000).toLocaleDateString('de-DE')}</p>
<p className="text-xs">{new Date(pipeline.created * 1000).toLocaleTimeString('de-DE')}</p>
{pipeline.started && pipeline.finished && (
<p className="text-xs mt-1">
Dauer: {Math.round((pipeline.finished - pipeline.started) / 60)}m
</p>
)}
</div>
</div>
</div>
))}
</div>
</div>
) : (
<div className="bg-slate-50 rounded-lg p-8 text-center">
<svg className="w-12 h-12 text-slate-300 mx-auto mb-3" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2" />
</svg>
<p className="text-slate-500">Keine Pipelines gefunden</p>
<p className="text-sm text-slate-400 mt-1">Starte eine neue Pipeline oder pruefe die Woodpecker-Konfiguration</p>
</div>
)}

{/* Pipeline Configuration Info */}
<div className="bg-slate-50 rounded-lg p-4">
<h4 className="font-medium text-slate-800 mb-3">Pipeline Konfiguration</h4>
<pre className="bg-slate-800 text-slate-100 p-4 rounded-lg overflow-x-auto text-sm">
{`Woodpecker CI Pipeline (.woodpecker/main.yml)
│
├── 1. go-lint → Go Linting (PR only)
├── 2. python-lint → Python Linting (PR only)
├── 3. secrets-scan → GitLeaks Secrets Scan
│
├── 4. test-go-consent → Go Unit Tests
├── 5. test-go-billing → Billing Service Tests
├── 6. test-go-school → School Service Tests
├── 7. test-python → Python Backend Tests
│
├── 8. build-images → Docker Image Build
├── 9. generate-sbom → SBOM Generation (Syft)
├── 10. vuln-scan → Vulnerability Scan (Grype)
├── 11. container-scan → Container Scan (Trivy)
│
├── 12. sign-images → Cosign Image Signing
├── 13. attest-sbom → SBOM Attestation
├── 14. provenance → SLSA Provenance
│
└── 15. deploy-prod → Production Deployment`}
</pre>
</div>

{/* Workflow Anleitung */}
<div className="bg-blue-50 border border-blue-200 rounded-lg p-4">
<h4 className="font-medium text-blue-800 mb-3 flex items-center gap-2">
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
Workflow-Anleitung
</h4>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4 text-sm">
<div>
<h5 className="font-medium text-blue-700 mb-2">🤖 Automatisch (bei jedem Push/PR):</h5>
<ul className="space-y-1 text-blue-600">
<li>• <strong>Linting</strong> - Code-Qualitaet pruefen (nur PRs)</li>
<li>• <strong>Unit Tests</strong> - Go & Python Tests</li>
<li>• <strong>Test-Dashboard</strong> - Ergebnisse werden gesendet</li>
<li>• <strong>Backlog</strong> - Fehlgeschlagene Tests werden erfasst</li>
</ul>
</div>
<div>
<h5 className="font-medium text-blue-700 mb-2">👆 Manuell (Button oder Tag):</h5>
<ul className="space-y-1 text-blue-600">
<li>• <strong>Docker Builds</strong> - Container erstellen</li>
<li>• <strong>SBOM/Scans</strong> - Sicherheitsanalyse</li>
<li>• <strong>Deployment</strong> - In Produktion deployen</li>
<li>• <strong>Pipeline starten</strong> - Diesen Button verwenden</li>
</ul>
</div>
</div>
<div className="mt-4 pt-3 border-t border-blue-200">
<h5 className="font-medium text-blue-700 mb-2">⚙️ Setup: API Token konfigurieren</h5>
<p className="text-blue-600 text-sm">
Um Pipelines ueber das Dashboard zu starten, muss ein <strong>WOODPECKER_TOKEN</strong> konfiguriert werden:
</p>
<ol className="mt-2 space-y-1 text-blue-600 text-sm list-decimal list-inside">
<li>Woodpecker UI oeffnen: <a href="http://macmini:8090" target="_blank" rel="noopener noreferrer" className="underline hover:text-blue-800">http://macmini:8090</a></li>
<li>Mit Gitea-Account einloggen</li>
<li>Klick auf Profil → <strong>User Settings</strong> → <strong>Personal Access Tokens</strong></li>
<li>Neues Token erstellen und in <code className="bg-blue-100 px-1 rounded">.env</code> eintragen: <code className="bg-blue-100 px-1 rounded">WOODPECKER_TOKEN=...</code></li>
<li>Container neu starten: <code className="bg-blue-100 px-1 rounded">docker compose up -d admin-v2</code></li>
</ol>
</div>
</div>
</div>
)}

{/* ================================================================ */}
{/* Pipelines Tab */}
{/* ================================================================ */}

@@ -1,391 +0,0 @@
'use client'

/**
* GPU Infrastructure Admin Page
*
* vast.ai GPU Management for LLM Processing
*/

import { useEffect, useState, useCallback } from 'react'
import { PagePurpose } from '@/components/common/PagePurpose'

interface VastStatus {
instance_id: number | null
status: string
gpu_name: string | null
dph_total: number | null
endpoint_base_url: string | null
last_activity: string | null
auto_shutdown_in_minutes: number | null
total_runtime_hours: number | null
total_cost_usd: number | null
account_credit: number | null
account_total_spend: number | null
session_runtime_minutes: number | null
session_cost_usd: number | null
message: string | null
error?: string
}

export default function GPUInfrastructurePage() {
const [status, setStatus] = useState<VastStatus | null>(null)
const [loading, setLoading] = useState(true)
const [actionLoading, setActionLoading] = useState<string | null>(null)
const [error, setError] = useState<string | null>(null)
const [message, setMessage] = useState<string | null>(null)

const API_PROXY = '/api/admin/gpu'

const fetchStatus = useCallback(async () => {
setLoading(true)
setError(null)

try {
const response = await fetch(API_PROXY)
const data = await response.json()

if (!response.ok) {
throw new Error(data.error || `HTTP ${response.status}`)
}

setStatus(data)
} catch (err) {
setError(err instanceof Error ? err.message : 'Verbindungsfehler')
setStatus({
instance_id: null,
status: 'error',
gpu_name: null,
dph_total: null,
endpoint_base_url: null,
last_activity: null,
auto_shutdown_in_minutes: null,
total_runtime_hours: null,
total_cost_usd: null,
account_credit: null,
account_total_spend: null,
session_runtime_minutes: null,
session_cost_usd: null,
message: 'Verbindung fehlgeschlagen'
})
} finally {
setLoading(false)
}
}, [])

useEffect(() => {
fetchStatus()
}, [fetchStatus])

useEffect(() => {
const interval = setInterval(fetchStatus, 30000)
return () => clearInterval(interval)
}, [fetchStatus])

const powerOn = async () => {
setActionLoading('on')
setError(null)
setMessage(null)

try {
const response = await fetch(API_PROXY, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ action: 'on' }),
})

const data = await response.json()

if (!response.ok) {
throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
}

setMessage('Start angefordert')
setTimeout(fetchStatus, 3000)
setTimeout(fetchStatus, 10000)
} catch (err) {
setError(err instanceof Error ? err.message : 'Fehler beim Starten')
fetchStatus()
} finally {
setActionLoading(null)
}
}

const powerOff = async () => {
setActionLoading('off')
setError(null)
setMessage(null)

try {
const response = await fetch(API_PROXY, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ action: 'off' }),
})

const data = await response.json()

if (!response.ok) {
throw new Error(data.error || data.detail || 'Aktion fehlgeschlagen')
}

setMessage('Stop angefordert')
setTimeout(fetchStatus, 3000)
setTimeout(fetchStatus, 10000)
} catch (err) {
setError(err instanceof Error ? err.message : 'Fehler beim Stoppen')
fetchStatus()
} finally {
setActionLoading(null)
}
}

const getStatusBadge = (s: string) => {
const baseClasses = 'px-3 py-1 rounded-full text-sm font-semibold uppercase'
switch (s) {
case 'running':
return `${baseClasses} bg-green-100 text-green-800`
case 'stopped':
case 'exited':
return `${baseClasses} bg-red-100 text-red-800`
case 'loading':
case 'scheduling':
case 'creating':
case 'starting...':
case 'stopping...':
return `${baseClasses} bg-yellow-100 text-yellow-800`
default:
return `${baseClasses} bg-slate-100 text-slate-600`
}
}

const getCreditColor = (credit: number | null) => {
if (credit === null) return 'text-slate-500'
if (credit < 5) return 'text-red-600'
if (credit < 15) return 'text-yellow-600'
return 'text-green-600'
}

return (
<div>
{/* Page Purpose */}
<PagePurpose
title="GPU Infrastruktur"
purpose="Verwalten Sie die vast.ai GPU-Instanzen fuer LLM-Verarbeitung und OCR. Starten/Stoppen Sie GPUs bei Bedarf und ueberwachen Sie Kosten in Echtzeit."
audience={['DevOps', 'Entwickler', 'System-Admins']}
architecture={{
services: ['vast.ai API', 'Ollama', 'VLLM'],
databases: ['PostgreSQL (Logs)'],
}}
relatedPages={[
{ name: 'LLM Vergleich', href: '/ai/llm-compare', description: 'KI-Provider testen' },
{ name: 'Security', href: '/infrastructure/security', description: 'DevSecOps Dashboard' },
{ name: 'Builds', href: '/infrastructure/builds', description: 'CI/CD Pipeline' },
]}
collapsible={true}
defaultCollapsed={true}
/>

{/* Status Cards */}
<div className="bg-white rounded-xl border border-slate-200 p-6 mb-6">
<div className="grid grid-cols-2 md:grid-cols-3 lg:grid-cols-6 gap-6">
<div>
<div className="text-sm text-slate-500 mb-2">Status</div>
{loading ? (
<span className="px-3 py-1 rounded-full text-sm font-semibold bg-slate-100 text-slate-600">
Laden...
</span>
) : (
<span className={getStatusBadge(
actionLoading === 'on' ? 'starting...' :
actionLoading === 'off' ? 'stopping...' :
status?.status || 'unknown'
)}>
{actionLoading === 'on' ? 'starting...' :
actionLoading === 'off' ? 'stopping...' :
status?.status || 'unbekannt'}
</span>
)}
</div>

<div>
<div className="text-sm text-slate-500 mb-2">GPU</div>
<div className="font-semibold text-slate-900">
{status?.gpu_name || '-'}
</div>
</div>

<div>
<div className="text-sm text-slate-500 mb-2">Kosten/h</div>
<div className="font-semibold text-slate-900">
{status?.dph_total ? `$${status.dph_total.toFixed(3)}` : '-'}
</div>
</div>

<div>
<div className="text-sm text-slate-500 mb-2">Auto-Stop</div>
<div className="font-semibold text-slate-900">
{status && status.auto_shutdown_in_minutes !== null
? `${status.auto_shutdown_in_minutes} min`
: '-'}
</div>
</div>

<div>
<div className="text-sm text-slate-500 mb-2">Budget</div>
<div className={`font-bold text-lg ${getCreditColor(status?.account_credit ?? null)}`}>
{status && status.account_credit !== null
? `$${status.account_credit.toFixed(2)}`
: '-'}
</div>
</div>

<div>
<div className="text-sm text-slate-500 mb-2">Session</div>
<div className="font-semibold text-slate-900">
{status && status.session_runtime_minutes !== null && status.session_cost_usd !== null
? `${Math.round(status.session_runtime_minutes)} min / $${status.session_cost_usd.toFixed(3)}`
: '-'}
</div>
</div>
</div>

{/* Buttons */}
<div className="flex items-center gap-4 mt-6 pt-6 border-t border-slate-200">
<button
onClick={powerOn}
disabled={actionLoading !== null || status?.status === 'running'}
className="px-6 py-2 bg-orange-600 text-white rounded-lg font-medium hover:bg-orange-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Starten
</button>
<button
onClick={powerOff}
disabled={actionLoading !== null || status?.status !== 'running'}
className="px-6 py-2 bg-red-600 text-white rounded-lg font-medium hover:bg-red-700 disabled:opacity-50 disabled:cursor-not-allowed transition-colors"
>
Stoppen
</button>
<button
onClick={fetchStatus}
disabled={loading}
className="px-4 py-2 border border-slate-300 text-slate-700 rounded-lg font-medium hover:bg-slate-50 disabled:opacity-50 transition-colors"
>
{loading ? 'Aktualisiere...' : 'Aktualisieren'}
</button>

{message && (
<span className="ml-4 text-sm text-green-600 font-medium">{message}</span>
)}
{error && (
<span className="ml-4 text-sm text-red-600 font-medium">{error}</span>
)}
</div>
</div>

{/* Extended Stats */}
<div className="grid grid-cols-1 lg:grid-cols-2 gap-6 mb-6">
<div className="bg-white rounded-xl border border-slate-200 p-6">
<h3 className="font-semibold text-slate-900 mb-4">Kosten-Uebersicht</h3>
<div className="space-y-4">
<div className="flex justify-between items-center">
<span className="text-slate-600">Session Laufzeit</span>
<span className="font-semibold">
{status && status.session_runtime_minutes !== null
? `${Math.round(status.session_runtime_minutes)} Minuten`
: '-'}
</span>
</div>
<div className="flex justify-between items-center">
<span className="text-slate-600">Session Kosten</span>
<span className="font-semibold">
{status && status.session_cost_usd !== null
|
||||
? `$${status.session_cost_usd.toFixed(4)}`
|
||||
: '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center pt-4 border-t border-slate-100">
|
||||
<span className="text-slate-600">Gesamtlaufzeit</span>
|
||||
<span className="font-semibold">
|
||||
{status && status.total_runtime_hours !== null
|
||||
? `${status.total_runtime_hours.toFixed(1)} Stunden`
|
||||
: '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">Gesamtkosten</span>
|
||||
<span className="font-semibold">
|
||||
{status && status.total_cost_usd !== null
|
||||
? `$${status.total_cost_usd.toFixed(2)}`
|
||||
: '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">vast.ai Ausgaben</span>
|
||||
<span className="font-semibold">
|
||||
{status && status.account_total_spend !== null
|
||||
? `$${status.account_total_spend.toFixed(2)}`
|
||||
: '-'}
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<div className="bg-white rounded-xl border border-slate-200 p-6">
|
||||
<h3 className="font-semibold text-slate-900 mb-4">Instanz-Details</h3>
|
||||
<div className="space-y-4">
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">Instanz ID</span>
|
||||
<span className="font-mono text-sm">
|
||||
{status?.instance_id || '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">GPU</span>
|
||||
<span className="font-semibold">
|
||||
{status?.gpu_name || '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">Stundensatz</span>
|
||||
<span className="font-semibold">
|
||||
{status?.dph_total ? `$${status.dph_total.toFixed(4)}/h` : '-'}
|
||||
</span>
|
||||
</div>
|
||||
<div className="flex justify-between items-center">
|
||||
<span className="text-slate-600">Letzte Aktivitaet</span>
|
||||
<span className="text-sm">
|
||||
{status?.last_activity
|
||||
? new Date(status.last_activity).toLocaleString('de-DE')
|
||||
: '-'}
|
||||
</span>
|
||||
</div>
|
||||
{status?.endpoint_base_url && status.status === 'running' && (
|
||||
<div className="pt-4 border-t border-slate-100">
|
||||
<div className="text-slate-600 text-sm mb-1">Endpoint</div>
|
||||
<code className="text-xs bg-slate-100 px-2 py-1 rounded block overflow-x-auto">
|
||||
{status.endpoint_base_url}
|
||||
</code>
|
||||
</div>
|
||||
)}
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
{/* Info */}
|
||||
<div className="bg-orange-50 border border-orange-200 rounded-xl p-4">
|
||||
<div className="flex gap-3">
|
||||
<svg className="w-5 h-5 text-orange-600 flex-shrink-0 mt-0.5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
|
||||
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M13 16h-1v-4h-1m1-4h.01M21 12a9 9 0 11-18 0 9 9 0 0118 0z" />
|
||||
</svg>
|
||||
<div>
|
||||
<h4 className="font-semibold text-orange-900">Auto-Shutdown</h4>
|
||||
<p className="text-sm text-orange-800 mt-1">
|
||||
Die GPU-Instanz wird automatisch gestoppt, wenn sie laengere Zeit inaktiv ist.
|
||||
Der Status wird alle 30 Sekunden automatisch aktualisiert.
|
||||
</p>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
)
|
||||
}
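The Session cell above formats runtime and cost inline. As a minimal sketch (the helper name and its extraction are not in the component), the same formatting with the `'-'` fallback can be isolated as a pure function:

```typescript
// Hypothetical helper, not part of the component above: mirrors the inline
// `${Math.round(m)} min / $${c.toFixed(3)}` template and the '-' fallback
// used when either value is missing.
function formatSession(runtimeMinutes: number | null, costUsd: number | null): string {
  if (runtimeMinutes === null || costUsd === null) return '-'
  return `${Math.round(runtimeMinutes)} min / $${costUsd.toFixed(3)}`
}
```

Pulling the template out this way would make the fallback behavior unit-testable without rendering the component.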
@@ -110,8 +110,7 @@ const INFRASTRUCTURE_COMPONENTS: Component[] = [
  { type: 'service', name: 'ERPNext', version: 'v15', category: 'erp', port: '8090', description: 'Open Source ERP System', license: 'GPL-3.0', sourceUrl: 'https://github.com/frappe/erpnext' },

  // ===== CI/CD & VERSION CONTROL =====
  { type: 'service', name: 'Woodpecker CI', version: '2.x', category: 'cicd', port: '8082', description: 'Self-hosted CI/CD Pipeline (Drone Fork)', license: 'Apache-2.0', sourceUrl: 'https://github.com/woodpecker-ci/woodpecker' },
  { type: 'service', name: 'Gitea', version: '1.21', category: 'cicd', port: '3003', description: 'Self-hosted Git Service', license: 'MIT', sourceUrl: 'https://github.com/go-gitea/gitea' },
  { type: 'service', name: 'Gitea', version: '1.21', category: 'cicd', port: '3003', description: 'Self-hosted Git Service with Actions CI/CD', license: 'MIT', sourceUrl: 'https://github.com/go-gitea/gitea' },
  { type: 'service', name: 'Dokploy', version: '0.26.7', category: 'cicd', port: '3000', description: 'Self-hosted PaaS (Vercel/Heroku Alternative)', license: 'Apache-2.0', sourceUrl: 'https://github.com/Dokploy/dokploy' },

  // ===== DEVELOPMENT =====
@@ -120,11 +119,6 @@ const INFRASTRUCTURE_COMPONENTS: Component[] = [
  // ===== GAME (Breakpilot Drive) =====
  { type: 'service', name: 'Breakpilot Drive (Unity WebGL)', version: '6000.0', category: 'game', port: '3001', description: 'Lernspiel fuer Schueler (Klasse 2-6)', license: 'Proprietary', sourceUrl: '-' },

  // ===== VOICE SERVICE =====
  { type: 'service', name: 'Voice Service (FastAPI)', version: '1.0', category: 'voice', port: '8091', description: 'Voice-First Interface mit PersonaPlex-7B & TaskOrchestrator', license: 'Proprietary', sourceUrl: '-' },
  { type: 'service', name: 'PersonaPlex-7B (NVIDIA)', version: '7B', category: 'voice', port: '8998', description: 'Full-Duplex Speech-to-Speech (Produktion)', license: 'MIT/NVIDIA Open Model', sourceUrl: 'https://developer.nvidia.com' },
  { type: 'service', name: 'TaskOrchestrator', version: '1.0', category: 'voice', port: '-', description: 'Agent-Orchestrierung mit Task State Machine', license: 'Proprietary', sourceUrl: '-' },
  { type: 'service', name: 'Mimi Audio Codec', version: '1.0', category: 'voice', port: '-', description: 'Audio Streaming (24kHz, 80ms Frames)', license: 'MIT', sourceUrl: '-' },

  // ===== BQAS (Quality Assurance) =====
  { type: 'service', name: 'BQAS Local Scheduler', version: '1.0', category: 'qa', port: '-', description: 'Lokale GitHub Actions Alternative (launchd)', license: 'Proprietary', sourceUrl: '-' },
@@ -193,6 +187,15 @@ const PYTHON_PACKAGES: Component[] = [
  { type: 'library', name: 'scipy', version: '1.14+', category: 'python', description: 'Signal Processing (Audio)', license: 'BSD-3-Clause', sourceUrl: 'https://github.com/scipy/scipy' },
  { type: 'library', name: 'redis', version: '5.x', category: 'python', description: 'Valkey/Redis Client (Voice Sessions)', license: 'MIT', sourceUrl: 'https://github.com/redis/redis-py' },
  { type: 'library', name: 'pydantic-settings', version: '2.x', category: 'python', description: 'Settings Management (Voice Config)', license: 'MIT', sourceUrl: 'https://github.com/pydantic/pydantic-settings' },
  { type: 'library', name: 'pyspellchecker', version: '0.8.1+', category: 'python', description: 'Regel-basierte OCR-Korrektur (klausur-service Schritt 6)', license: 'MIT', sourceUrl: 'https://github.com/barrust/pyspellchecker' },
  { type: 'library', name: 'pytesseract', version: '0.3.10+', category: 'python', description: 'Tesseract OCR Engine Wrapper (klausur-service)', license: 'Apache-2.0', sourceUrl: 'https://github.com/madmaze/pytesseract' },
  { type: 'library', name: 'opencv-python-headless', version: '4.8+', category: 'python', description: 'Bildverarbeitung, Projektionsprofile, Inpainting (klausur-service)', license: 'Apache-2.0', sourceUrl: 'https://github.com/opencv/opencv-python' },
  { type: 'library', name: 'rapidocr-onnxruntime', version: 'latest', category: 'python', description: 'Schnelles OCR ARM64 via ONNX (klausur-service)', license: 'Apache-2.0', sourceUrl: 'https://github.com/RapidAI/RapidOCR' },
  { type: 'library', name: 'onnxruntime', version: 'latest', category: 'python', description: 'ONNX-Inferenz für RapidOCR (klausur-service)', license: 'MIT', sourceUrl: 'https://github.com/microsoft/onnxruntime' },
  { type: 'library', name: 'eng-to-ipa', version: 'latest', category: 'python', description: 'IPA-Lautschrift-Lookup (klausur-service Vokabel-Pipeline)', license: 'MIT', sourceUrl: 'https://github.com/mphilli/English-to-IPA' },
  { type: 'library', name: 'sentence-transformers', version: '2.2+', category: 'python', description: 'Lokale Embeddings (klausur-service, rag-service)', license: 'Apache-2.0', sourceUrl: 'https://github.com/UKPLab/sentence-transformers' },
  { type: 'library', name: 'torch', version: '2.0+', category: 'python', description: 'ML-Framework CPU/MPS (TrOCR, klausur-service)', license: 'BSD-3-Clause', sourceUrl: 'https://github.com/pytorch/pytorch' },
  { type: 'library', name: 'transformers', version: '4.x', category: 'python', description: 'HuggingFace Transformers (TrOCR, Handschrift-HTR)', license: 'Apache-2.0', sourceUrl: 'https://github.com/huggingface/transformers' },
]

// Key Go modules (from go.mod files)

@@ -639,7 +639,7 @@ Tests bleiben wo sie sind:

<div className="mt-4 pt-4 border-t border-blue-200">
  <p className="text-sm text-blue-600">
    <strong>Daten-Fluss:</strong> Woodpecker CI → POST /api/tests/ci-result → PostgreSQL → Test Dashboard
    <strong>Daten-Fluss:</strong> Gitea Actions → POST /api/tests/ci-result → PostgreSQL → Test Dashboard
  </p>
</div>
</div>
@@ -1185,9 +1185,6 @@ export default function TestDashboardPage() {
const DEMO_SERVICES: ServiceTestInfo[] = [
  { service: 'consent-service', display_name: 'Consent Service', port: 8081, language: 'go', total_tests: 22, passed_tests: 20, failed_tests: 2, skipped_tests: 0, pass_rate: 90.9, coverage_percent: 82.3, last_run: new Date().toISOString(), status: 'failed' },
  { service: 'backend', display_name: 'Python Backend', port: 8000, language: 'python', total_tests: 40, passed_tests: 38, failed_tests: 2, skipped_tests: 0, pass_rate: 95.0, coverage_percent: 75.1, last_run: new Date().toISOString(), status: 'failed' },
  { service: 'voice-service', display_name: 'Voice Service', port: 8091, language: 'python', total_tests: 5, passed_tests: 5, failed_tests: 0, skipped_tests: 0, pass_rate: 100, coverage_percent: 68.9, last_run: new Date().toISOString(), status: 'passed' },
  { service: 'bqas-golden', display_name: 'BQAS Golden Suite', port: 8091, language: 'python', total_tests: 97, passed_tests: 89, failed_tests: 8, skipped_tests: 0, pass_rate: 91.7, coverage_percent: undefined, last_run: new Date().toISOString(), status: 'failed' },
  { service: 'bqas-rag', display_name: 'BQAS RAG Tests', port: 8091, language: 'python', total_tests: 20, passed_tests: 18, failed_tests: 2, skipped_tests: 0, pass_rate: 90.0, coverage_percent: undefined, last_run: new Date().toISOString(), status: 'failed' },
  { service: 'klausur-service', display_name: 'Klausur Service', port: 8086, language: 'python', total_tests: 8, passed_tests: 8, failed_tests: 0, skipped_tests: 0, pass_rate: 100, coverage_percent: 71.2, last_run: new Date().toISOString(), status: 'passed' },
  { service: 'billing-service', display_name: 'Billing Service', port: 8082, language: 'go', total_tests: 5, passed_tests: 5, failed_tests: 0, skipped_tests: 0, pass_rate: 100, coverage_percent: 78.5, last_run: new Date().toISOString(), status: 'passed' },
  { service: 'school-service', display_name: 'School Service', port: 8084, language: 'go', total_tests: 6, passed_tests: 6, failed_tests: 0, skipped_tests: 0, pass_rate: 100, coverage_percent: 81.4, last_run: new Date().toISOString(), status: 'passed' },
@@ -1,210 +0,0 @@
/**
 * Communication Admin API Route - Stats Proxy
 *
 * Proxies requests to Matrix/Jitsi admin endpoints via backend
 * Aggregates statistics from both services
 */

import { NextRequest, NextResponse } from 'next/server'

// Service URLs
const BACKEND_URL = process.env.NEXT_PUBLIC_BACKEND_URL || 'http://localhost:8000'
const CONSENT_SERVICE_URL = process.env.CONSENT_SERVICE_URL || 'http://localhost:8081'
const MATRIX_ADMIN_URL = process.env.MATRIX_ADMIN_URL || 'http://localhost:8448'
const JITSI_URL = process.env.JITSI_URL || 'http://localhost:8443'

// Matrix Admin Token (for Synapse Admin API)
const MATRIX_ADMIN_TOKEN = process.env.MATRIX_ADMIN_TOKEN || ''

interface MatrixStats {
  total_users: number
  active_users: number
  total_rooms: number
  active_rooms: number
  messages_today: number
  messages_this_week: number
  status: 'online' | 'offline' | 'degraded'
}

interface JitsiStats {
  active_meetings: number
  total_participants: number
  meetings_today: number
  average_duration_minutes: number
  peak_concurrent_users: number
  total_minutes_today: number
  status: 'online' | 'offline' | 'degraded'
}

async function fetchFromBackend(): Promise<{
  matrix: MatrixStats
  jitsi: JitsiStats
  active_meetings: unknown[]
  recent_rooms: unknown[]
} | null> {
  try {
    const response = await fetch(`${BACKEND_URL}/api/v1/communication/admin/stats`, {
      headers: { 'Content-Type': 'application/json' },
      signal: AbortSignal.timeout(5000),
    })
    if (response.ok) {
      return await response.json()
    }
  } catch (error) {
    console.log('Backend not reachable, trying consent service:', error)
  }
  return null
}

async function fetchFromConsentService(): Promise<{
  matrix: MatrixStats
  jitsi: JitsiStats
  active_meetings: unknown[]
  recent_rooms: unknown[]
} | null> {
  try {
    const response = await fetch(`${CONSENT_SERVICE_URL}/api/v1/communication/admin/stats`, {
      headers: { 'Content-Type': 'application/json' },
      signal: AbortSignal.timeout(5000),
    })
    if (response.ok) {
      return await response.json()
    }
  } catch (error) {
    console.log('Consent service not reachable:', error)
  }
  return null
}

async function fetchMatrixStats(): Promise<MatrixStats> {
  try {
    // Check if Matrix is reachable
    const healthCheck = await fetch(`${MATRIX_ADMIN_URL}/_matrix/client/versions`, {
      signal: AbortSignal.timeout(5000)
    })

    if (healthCheck.ok) {
      // Try to get user count from admin API
      if (MATRIX_ADMIN_TOKEN) {
        try {
          const usersResponse = await fetch(`${MATRIX_ADMIN_URL}/_synapse/admin/v2/users?limit=1`, {
            headers: { 'Authorization': `Bearer ${MATRIX_ADMIN_TOKEN}` },
            signal: AbortSignal.timeout(5000),
          })
          if (usersResponse.ok) {
            const data = await usersResponse.json()
            return {
              total_users: data.total || 0,
              active_users: 0,
              total_rooms: 0,
              active_rooms: 0,
              messages_today: 0,
              messages_this_week: 0,
              status: 'online'
            }
          }
        } catch {
          // Admin API not available
        }
      }

      return {
        total_users: 0,
        active_users: 0,
        total_rooms: 0,
        active_rooms: 0,
        messages_today: 0,
        messages_this_week: 0,
        status: 'degraded' // Server reachable but no admin access
      }
    }
  } catch (error) {
    console.error('Matrix stats fetch error:', error)
  }

  return {
    total_users: 0,
    active_users: 0,
    total_rooms: 0,
    active_rooms: 0,
    messages_today: 0,
    messages_this_week: 0,
    status: 'offline'
  }
}

async function fetchJitsiStats(): Promise<JitsiStats> {
  try {
    // Check if Jitsi is reachable
    const healthCheck = await fetch(`${JITSI_URL}/http-bind`, {
      method: 'HEAD',
      signal: AbortSignal.timeout(5000)
    })

    return {
      active_meetings: 0,
      total_participants: 0,
      meetings_today: 0,
      average_duration_minutes: 0,
      peak_concurrent_users: 0,
      total_minutes_today: 0,
      status: healthCheck.ok ? 'online' : 'offline'
    }
  } catch (error) {
    console.error('Jitsi stats fetch error:', error)
    return {
      active_meetings: 0,
      total_participants: 0,
      meetings_today: 0,
      average_duration_minutes: 0,
      peak_concurrent_users: 0,
      total_minutes_today: 0,
      status: 'offline'
    }
  }
}

export async function GET(request: NextRequest) {
  try {
    // Try backend first
    let data = await fetchFromBackend()

    // Fallback to consent service
    if (!data) {
      data = await fetchFromConsentService()
    }

    // If both fail, try direct service checks
    if (!data) {
      const [matrixStats, jitsiStats] = await Promise.all([
        fetchMatrixStats(),
        fetchJitsiStats()
      ])

      data = {
        matrix: matrixStats,
        jitsi: jitsiStats,
        active_meetings: [],
        recent_rooms: []
      }
    }

    return NextResponse.json({
      ...data,
      last_updated: new Date().toISOString()
    })
  } catch (error) {
    console.error('Communication stats error:', error)
    return NextResponse.json(
      {
        error: 'Fehler beim Abrufen der Statistiken',
        matrix: { status: 'offline', total_users: 0, active_users: 0, total_rooms: 0, active_rooms: 0, messages_today: 0, messages_this_week: 0 },
        jitsi: { status: 'offline', active_meetings: 0, total_participants: 0, meetings_today: 0, average_duration_minutes: 0, peak_concurrent_users: 0, total_minutes_today: 0 },
        active_meetings: [],
        recent_rooms: [],
        last_updated: new Date().toISOString()
      },
      { status: 503 }
    )
  }
}
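The GET handler above walks a fallback chain: backend first, then the consent service, then direct health checks. As a minimal sketch under the assumption that all sources share a return shape (the helper `firstNonNull` is hypothetical, not in the route), the chain generalizes to "first source that yields a non-null result":

```typescript
// Hypothetical generalization of the backend → consent-service fallback above:
// try each async source in order and return the first non-null result.
async function firstNonNull<T>(sources: Array<() => Promise<T | null>>): Promise<T | null> {
  for (const source of sources) {
    const result = await source()
    if (result !== null) return result
  }
  return null
}

async function main() {
  const result = await firstNonNull<number>([
    async () => null, // e.g. backend unreachable
    async () => 42,   // e.g. consent service answers
    async () => 7,    // never reached
  ])
  if (result !== 42) throw new Error('expected first non-null source to win')
}
main()
```

Sources are tried strictly in order, so the cheaper/authoritative endpoint should come first, mirroring the route's priority.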
@@ -16,7 +16,6 @@ const SERVICES: ServiceConfig[] = [
  // Core Services
  { name: 'Backend API', port: 8000, endpoint: '/health', category: 'core' },
  { name: 'Consent Service', port: 8081, endpoint: '/api/v1/health', category: 'core' },
  { name: 'Voice Service', port: 8091, endpoint: '/health', category: 'core' },
  { name: 'Klausur Service', port: 8086, endpoint: '/health', category: 'core' },
  { name: 'Mail Service (Mailpit)', port: 8025, endpoint: '/api/v1/info', category: 'core' },
  { name: 'Edu Search', port: 8088, endpoint: '/health', category: 'core' },
@@ -41,7 +40,6 @@ const getInternalHost = (port: number): string => {
  const serviceMap: Record<number, string> = {
    8000: 'backend',
    8081: 'consent-service',
    8091: 'voice-service',
    8086: 'klausur-service',
    8025: 'mailpit',
    8088: 'edu-search-service',
@@ -1,208 +0,0 @@
import { NextRequest, NextResponse } from 'next/server'

// Woodpecker API configuration
const WOODPECKER_URL = process.env.WOODPECKER_URL || 'http://woodpecker-server:8000'
const WOODPECKER_TOKEN = process.env.WOODPECKER_TOKEN || ''

export interface PipelineStep {
  name: string
  state: 'pending' | 'running' | 'success' | 'failure' | 'skipped'
  exit_code: number
  error?: string
}

export interface Pipeline {
  id: number
  number: number
  status: 'pending' | 'running' | 'success' | 'failure' | 'error'
  event: string
  branch: string
  commit: string
  message: string
  author: string
  created: number
  started: number
  finished: number
  steps: PipelineStep[]
  errors?: string[]
}

export interface WoodpeckerStatusResponse {
  status: 'online' | 'offline'
  pipelines: Pipeline[]
  lastUpdate: string
  error?: string
}

export async function GET(request: NextRequest) {
  const searchParams = request.nextUrl.searchParams
  const repoId = searchParams.get('repo') || '1'
  const limit = parseInt(searchParams.get('limit') || '10')

  try {
    // Fetch pipelines from Woodpecker API
    const response = await fetch(
      `${WOODPECKER_URL}/api/repos/${repoId}/pipelines?per_page=${limit}`,
      {
        headers: {
          'Authorization': `Bearer ${WOODPECKER_TOKEN}`,
          'Content-Type': 'application/json',
        },
        cache: 'no-store',
      }
    )

    if (!response.ok) {
      return NextResponse.json({
        status: 'offline',
        pipelines: [],
        lastUpdate: new Date().toISOString(),
        error: `Woodpecker API nicht erreichbar (${response.status})`
      } as WoodpeckerStatusResponse)
    }

    const rawPipelines = await response.json()

    // Transform pipelines to our format
    const pipelines: Pipeline[] = rawPipelines.map((p: any) => {
      // Extract errors from workflows/steps
      const errors: string[] = []
      const steps: PipelineStep[] = []

      if (p.workflows) {
        for (const workflow of p.workflows) {
          if (workflow.children) {
            for (const child of workflow.children) {
              steps.push({
                name: child.name,
                state: child.state,
                exit_code: child.exit_code,
                error: child.error
              })
              if (child.state === 'failure' && child.error) {
                errors.push(`${child.name}: ${child.error}`)
              }
            }
          }
        }
      }

      return {
        id: p.id,
        number: p.number,
        status: p.status,
        event: p.event,
        branch: p.branch,
        commit: p.commit?.substring(0, 7) || '',
        message: p.message || '',
        author: p.author,
        created: p.created,
        started: p.started,
        finished: p.finished,
        steps,
        errors: errors.length > 0 ? errors : undefined
      }
    })

    return NextResponse.json({
      status: 'online',
      pipelines,
      lastUpdate: new Date().toISOString()
    } as WoodpeckerStatusResponse)

  } catch (error) {
    console.error('Woodpecker API error:', error)
    return NextResponse.json({
      status: 'offline',
      pipelines: [],
      lastUpdate: new Date().toISOString(),
      error: 'Fehler beim Abrufen des Woodpecker Status'
    } as WoodpeckerStatusResponse)
  }
}

// Trigger a new pipeline
export async function POST(request: NextRequest) {
  try {
    const body = await request.json()
    const { repoId = '1', branch = 'main' } = body

    const response = await fetch(
      `${WOODPECKER_URL}/api/repos/${repoId}/pipelines`,
      {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${WOODPECKER_TOKEN}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ branch }),
      }
    )

    if (!response.ok) {
      return NextResponse.json(
        { error: 'Pipeline konnte nicht gestartet werden' },
        { status: 500 }
      )
    }

    const pipeline = await response.json()
    return NextResponse.json({
      success: true,
      pipeline: {
        id: pipeline.id,
        number: pipeline.number,
        status: pipeline.status
      }
    })

  } catch (error) {
    console.error('Pipeline trigger error:', error)
    return NextResponse.json(
      { error: 'Fehler beim Starten der Pipeline' },
      { status: 500 }
    )
  }
}

// Get pipeline logs
export async function PUT(request: NextRequest) {
  try {
    const body = await request.json()
    const { repoId = '1', pipelineNumber, stepId } = body

    if (!pipelineNumber || !stepId) {
      return NextResponse.json(
        { error: 'pipelineNumber und stepId erforderlich' },
        { status: 400 }
      )
    }

    const response = await fetch(
      `${WOODPECKER_URL}/api/repos/${repoId}/pipelines/${pipelineNumber}/logs/${stepId}`,
      {
        headers: {
          'Authorization': `Bearer ${WOODPECKER_TOKEN}`,
          'Content-Type': 'application/json',
        },
      }
    )

    if (!response.ok) {
      return NextResponse.json(
        { error: 'Logs nicht verfuegbar' },
        { status: response.status }
      )
    }

    const logs = await response.json()
    return NextResponse.json({ logs })

  } catch (error) {
    console.error('Pipeline logs error:', error)
    return NextResponse.json(
      { error: 'Fehler beim Abrufen der Logs' },
      { status: 500 }
    )
  }
}
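The GET transform above flattens nested `workflows[].children[]` into a step list and collects failure messages. The error-collection part can be sketched as a pure function (hypothetical helper, not in the route, using the same `state === 'failure' && error` condition and `name: error` label format):

```typescript
// Hypothetical extraction of the error-collection loop from the GET handler:
// walk workflows[].children[] and collect "name: error" for failed steps.
interface WorkflowStep {
  name: string
  state: string
  error?: string
}

function collectErrors(workflows: Array<{ children?: WorkflowStep[] }>): string[] {
  const errors: string[] = []
  for (const workflow of workflows) {
    for (const child of workflow.children ?? []) {
      if (child.state === 'failure' && child.error) {
        errors.push(`${child.name}: ${child.error}`)
      }
    }
  }
  return errors
}
```

Steps that failed without an `error` string are skipped, matching the route's behavior of only surfacing messages it can display.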
@@ -1,81 +0,0 @@
import { NextResponse } from 'next/server'

/**
 * Server-side proxy for Mailpit API
 * Avoids CORS and mixed-content issues by fetching from server
 */

// Use internal Docker hostname when running in container
const getMailpitHost = (): string => {
  return process.env.BACKEND_URL ? 'mailpit' : 'localhost'
}

export async function GET() {
  const host = getMailpitHost()
  const mailpitUrl = `http://${host}:8025/api/v1/info`

  try {
    const response = await fetch(mailpitUrl, {
      method: 'GET',
      signal: AbortSignal.timeout(5000),
    })

    if (!response.ok) {
      return NextResponse.json(
        { error: 'Mailpit API error', status: response.status },
        { status: response.status }
      )
    }

    const data = await response.json()

    // Transform Mailpit response to our expected format
    return NextResponse.json({
      stats: {
        totalAccounts: 1,
        activeAccounts: 1,
        totalEmails: data.Messages || 0,
        unreadEmails: data.Unread || 0,
        totalTasks: 0,
        pendingTasks: 0,
        overdueTasks: 0,
        aiAnalyzedCount: 0,
        lastSyncTime: new Date().toISOString(),
      },
      accounts: [{
        id: 'mailpit-dev',
        email: 'dev@mailpit.local',
        displayName: 'Mailpit (Development)',
        imapHost: 'mailpit',
        imapPort: 1143,
        smtpHost: 'mailpit',
        smtpPort: 1025,
        status: 'active' as const,
        lastSync: new Date().toISOString(),
        emailCount: data.Messages || 0,
        unreadCount: data.Unread || 0,
        createdAt: new Date().toISOString(),
      }],
      syncStatus: {
        running: false,
        accountsInProgress: [],
        lastCompleted: new Date().toISOString(),
        errors: [],
      },
      mailpitInfo: {
        version: data.Version,
        databaseSize: data.DatabaseSize,
        uptime: data.RuntimeStats?.Uptime,
      }
    })
  } catch (error) {
    console.error('Failed to fetch from Mailpit:', error)
    return NextResponse.json(
      {
        error: 'Failed to connect to Mailpit',
        details: error instanceof Error ? error.message : 'Unknown error'
      },
      { status: 503 }
    )
  }
}
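The proxy above maps Mailpit's `Messages`/`Unread` fields into the dashboard's stats shape with `|| 0` fallbacks. As a minimal sketch (the function name and its extraction are hypothetical), the core mapping looks like this:

```typescript
// Hypothetical isolation of the Mailpit count mapping used in the proxy above,
// with the same `|| 0` fallbacks for absent fields.
function toEmailStats(info: { Messages?: number; Unread?: number }) {
  return {
    totalEmails: info.Messages || 0,
    unreadEmails: info.Unread || 0,
  }
}
```

Note that `|| 0` also coerces an explicit `0` to `0`, so the fallback is harmless here because the fields are counts.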
@@ -1,172 +0,0 @@
/**
 * Alerts API Proxy - Catch-all route
 * Proxies all /api/alerts/* requests to backend
 * Supports: inbox, topics, rules, profile, stats, etc.
 */

import { NextRequest, NextResponse } from 'next/server'

const BACKEND_URL = process.env.BACKEND_URL || 'http://localhost:8000'

function getForwardHeaders(request: NextRequest): HeadersInit {
  const headers: HeadersInit = {
    'Content-Type': 'application/json',
  }

  // Forward cookie for session auth
  const cookie = request.headers.get('cookie')
  if (cookie) {
    headers['Cookie'] = cookie
  }

  // Forward authorization header if present
  const auth = request.headers.get('authorization')
  if (auth) {
    headers['Authorization'] = auth
  }

  return headers
}

export async function GET(
  request: NextRequest,
  { params }: { params: Promise<{ path: string[] }> }
) {
  const { path } = await params
  const pathStr = path.join('/')
  const searchParams = request.nextUrl.searchParams.toString()
  const url = `${BACKEND_URL}/api/alerts/${pathStr}${searchParams ? `?${searchParams}` : ''}`

  try {
    const response = await fetch(url, {
      method: 'GET',
      headers: getForwardHeaders(request),
      signal: AbortSignal.timeout(30000)
    })

    if (!response.ok) {
      const errorText = await response.text()
      return NextResponse.json(
        { error: `Backend Error: ${response.status}`, details: errorText },
        { status: response.status }
      )
    }

    const data = await response.json()
    return NextResponse.json(data)
  } catch (error) {
    console.error('Alerts API proxy error:', error)
    return NextResponse.json(
      { error: 'Verbindung zum Backend fehlgeschlagen' },
      { status: 503 }
    )
  }
}

export async function POST(
  request: NextRequest,
  { params }: { params: Promise<{ path: string[] }> }
) {
  const { path } = await params
  const pathStr = path.join('/')
  const url = `${BACKEND_URL}/api/alerts/${pathStr}`

  try {
    const body = await request.json()

    const response = await fetch(url, {
      method: 'POST',
      headers: getForwardHeaders(request),
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(30000)
    })

    if (!response.ok) {
      const errorText = await response.text()
      return NextResponse.json(
        { error: `Backend Error: ${response.status}`, details: errorText },
        { status: response.status }
      )
    }

    const data = await response.json()
    return NextResponse.json(data)
  } catch (error) {
    console.error('Alerts API proxy error:', error)
    return NextResponse.json(
      { error: 'Verbindung zum Backend fehlgeschlagen' },
      { status: 503 }
    )
  }
}

export async function PUT(
  request: NextRequest,
  { params }: { params: Promise<{ path: string[] }> }
) {
  const { path } = await params
  const pathStr = path.join('/')
  const url = `${BACKEND_URL}/api/alerts/${pathStr}`

  try {
    const body = await request.json()

    const response = await fetch(url, {
      method: 'PUT',
      headers: getForwardHeaders(request),
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(30000)
    })

    if (!response.ok) {
      const errorText = await response.text()
      return NextResponse.json(
        { error: `Backend Error: ${response.status}`, details: errorText },
        { status: response.status }
      )
    }

    const data = await response.json()
    return NextResponse.json(data)
  } catch (error) {
    console.error('Alerts API proxy error:', error)
    return NextResponse.json(
      { error: 'Verbindung zum Backend fehlgeschlagen' },
      { status: 503 }
    )
  }
}

export async function DELETE(
  request: NextRequest,
  { params }: { params: Promise<{ path: string[] }> }
) {
  const { path } = await params
  const pathStr = path.join('/')
  const url = `${BACKEND_URL}/api/alerts/${pathStr}`

  try {
    const response = await fetch(url, {
      method: 'DELETE',
      headers: getForwardHeaders(request),
      signal: AbortSignal.timeout(30000)
    })

    if (!response.ok) {
      const errorText = await response.text()
      return NextResponse.json(
|
||||
{ error: `Backend Error: ${response.status}`, details: errorText },
|
||||
{ status: response.status }
|
||||
)
|
||||
}
|
||||
|
||||
const data = await response.json()
|
||||
return NextResponse.json(data)
|
||||
} catch (error) {
|
||||
console.error('Alerts API proxy error:', error)
|
||||
return NextResponse.json(
|
||||
{ error: 'Verbindung zum Backend fehlgeschlagen' },
|
||||
{ status: 503 }
|
||||
)
|
||||
}
|
||||
}
|
||||
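All four handlers above funnel the caller's `Authorization` header through the shared `getForwardHeaders` helper. A minimal sketch of that forwarding rule, shown here in Python for illustration (the dict-based `incoming` argument is a stand-in for the `NextRequest` headers object, not part of the actual route):

```python
def forward_headers(incoming: dict) -> dict:
    """Build outgoing headers, passing through Authorization only when present."""
    headers = {"Content-Type": "application/json"}
    auth = incoming.get("authorization")
    if auth:
        headers["Authorization"] = auth
    return headers

print(forward_headers({"authorization": "Bearer abc"}))
```

The point of the pattern is that unauthenticated requests simply carry no `Authorization` key, rather than an empty one, so the backend can distinguish the two cases.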
@@ -1,273 +0,0 @@
import { NextRequest, NextResponse } from 'next/server'
import type { WoodpeckerWebhookPayload, ExtractedError, BacklogSource } from '@/types/infrastructure-modules'

// =============================================================================
// Configuration
// =============================================================================

// Webhook secret for verification (optional but recommended)
const WEBHOOK_SECRET = process.env.WOODPECKER_WEBHOOK_SECRET || ''

// Internal API URL for log extraction
const LOG_EXTRACT_URL = process.env.NEXT_PUBLIC_APP_URL
  ? `${process.env.NEXT_PUBLIC_APP_URL}/api/infrastructure/log-extract/extract`
  : 'http://localhost:3002/api/infrastructure/log-extract/extract'

// Test service API URL for backlog insertion
const TEST_SERVICE_URL = process.env.TEST_SERVICE_URL || 'http://localhost:8086'

// =============================================================================
// Helper Functions
// =============================================================================

/**
 * Verify webhook signature (if secret is configured)
 */
function verifySignature(request: NextRequest, body: string): boolean {
  if (!WEBHOOK_SECRET) return true // Skip verification if no secret is configured

  const signature = request.headers.get('X-Woodpecker-Signature')
  if (!signature) return false

  // Simple HMAC verification (Woodpecker uses SHA256)
  const crypto = require('crypto')
  const expectedSignature = crypto
    .createHmac('sha256', WEBHOOK_SECRET)
    .update(body)
    .digest('hex')

  return signature === `sha256=${expectedSignature}`
}

/**
 * Map error category to backlog priority
 */
function categoryToPriority(category: string): 'critical' | 'high' | 'medium' | 'low' {
  switch (category) {
    case 'security_warning':
      return 'critical'
    case 'build_error':
      return 'high'
    case 'license_violation':
      return 'high'
    case 'test_failure':
      return 'medium'
    case 'dependency_issue':
      return 'low'
    default:
      return 'medium'
  }
}

/**
 * Map error category to error_type for the backlog
 */
function categoryToErrorType(category: string): string {
  switch (category) {
    case 'security_warning':
      return 'security'
    case 'build_error':
      return 'build'
    case 'license_violation':
      return 'license'
    case 'test_failure':
      return 'test'
    case 'dependency_issue':
      return 'dependency'
    default:
      return 'unknown'
  }
}

/**
 * Insert extracted errors into the backlog
 */
async function insertIntoBacklog(
  errors: ExtractedError[],
  pipelineNumber: number,
  source: BacklogSource
): Promise<{ inserted: number; failed: number }> {
  let inserted = 0
  let failed = 0

  for (const error of errors) {
    try {
      // Create backlog item
      const backlogItem = {
        test_name: error.message.substring(0, 200), // Truncate long messages
        test_file: error.file_path || null,
        service: error.service || 'unknown',
        framework: `ci_cd_pipeline_${pipelineNumber}`,
        error_message: error.message,
        error_type: categoryToErrorType(error.category),
        status: 'open',
        priority: categoryToPriority(error.category),
        fix_suggestion: error.suggested_fix || null,
        notes: `Auto-generated from pipeline #${pipelineNumber}, step: ${error.step}, line: ${error.line}`,
        source, // Custom field to track origin
      }

      // Try to insert into the test service backlog
      const response = await fetch(`${TEST_SERVICE_URL}/api/v1/backlog`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(backlogItem),
      })

      if (response.ok) {
        inserted++
      } else {
        console.warn(`Failed to insert backlog item: ${response.status}`)
        failed++
      }
    } catch (insertError) {
      console.error('Backlog insertion error:', insertError)
      failed++
    }
  }

  return { inserted, failed }
}

// =============================================================================
// API Handler
// =============================================================================

/**
 * POST /api/webhooks/woodpecker
 *
 * Webhook endpoint for Woodpecker CI/CD events.
 *
 * On pipeline failure:
 * 1. Extracts logs via /api/infrastructure/log-extract/extract
 * 2. Parses errors by category
 * 3. Files them into the backlog automatically
 *
 * Request body (Woodpecker webhook format):
 * - event: 'pipeline_success' | 'pipeline_failure' | 'pipeline_started'
 * - repo_id: number
 * - pipeline_number: number
 * - branch?: string
 * - commit?: string
 * - author?: string
 * - message?: string
 */
export async function POST(request: NextRequest) {
  try {
    const bodyText = await request.text()

    // Verify webhook signature
    if (!verifySignature(request, bodyText)) {
      return NextResponse.json(
        { error: 'Invalid webhook signature' },
        { status: 401 }
      )
    }

    const payload: WoodpeckerWebhookPayload = JSON.parse(bodyText)

    // Log all events for debugging
    console.log(`Woodpecker webhook: ${payload.event} for pipeline #${payload.pipeline_number}`)

    // Only process pipeline_failure events
    if (payload.event !== 'pipeline_failure') {
      return NextResponse.json({
        status: 'ignored',
        message: `Event ${payload.event} wird nicht verarbeitet`,
        pipeline_number: payload.pipeline_number,
      })
    }

    // 1. Extract logs from the failed pipeline
    console.log(`Extracting logs for failed pipeline #${payload.pipeline_number}`)

    const extractResponse = await fetch(LOG_EXTRACT_URL, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        pipeline_number: payload.pipeline_number,
        repo_id: String(payload.repo_id),
      }),
    })

    if (!extractResponse.ok) {
      const errorText = await extractResponse.text()
      console.error('Log extraction failed:', errorText)
      return NextResponse.json({
        status: 'error',
        message: 'Log-Extraktion fehlgeschlagen',
        pipeline_number: payload.pipeline_number,
      }, { status: 500 })
    }

    const extractionResult = await extractResponse.json()
    const errors: ExtractedError[] = extractionResult.errors || []

    console.log(`Extracted ${errors.length} errors from pipeline #${payload.pipeline_number}`)

    // 2. Insert errors into the backlog
    if (errors.length > 0) {
      const backlogResult = await insertIntoBacklog(
        errors,
        payload.pipeline_number,
        'ci_cd'
      )

      console.log(`Backlog: ${backlogResult.inserted} inserted, ${backlogResult.failed} failed`)

      return NextResponse.json({
        status: 'processed',
        pipeline_number: payload.pipeline_number,
        branch: payload.branch,
        commit: payload.commit,
        errors_found: errors.length,
        backlog_inserted: backlogResult.inserted,
        backlog_failed: backlogResult.failed,
        categories: {
          test_failure: errors.filter(e => e.category === 'test_failure').length,
          build_error: errors.filter(e => e.category === 'build_error').length,
          security_warning: errors.filter(e => e.category === 'security_warning').length,
          license_violation: errors.filter(e => e.category === 'license_violation').length,
          dependency_issue: errors.filter(e => e.category === 'dependency_issue').length,
        },
      })
    }

    return NextResponse.json({
      status: 'processed',
      pipeline_number: payload.pipeline_number,
      message: 'Keine Fehler extrahiert',
      errors_found: 0,
    })

  } catch (error) {
    console.error('Webhook processing error:', error)
    return NextResponse.json(
      { error: 'Webhook-Verarbeitung fehlgeschlagen' },
      { status: 500 }
    )
  }
}

/**
 * GET /api/webhooks/woodpecker
 *
 * Health check endpoint
 */
export async function GET() {
  return NextResponse.json({
    status: 'ready',
    endpoint: '/api/webhooks/woodpecker',
    events: ['pipeline_failure'],
    description: 'Woodpecker CI/CD Webhook Handler',
    configured: {
      webhook_secret: WEBHOOK_SECRET ? 'yes' : 'no',
      log_extract_url: LOG_EXTRACT_URL,
      test_service_url: TEST_SERVICE_URL,
    },
  })
}
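The signature check in the handler above hinges on recomputing an HMAC-SHA256 digest of the raw request body and comparing it to the `sha256=<hex>` value sent in the header. A minimal sketch of the same scheme in Python (assumed equivalent, not the handler's actual code; `hmac.compare_digest` is used here because a constant-time comparison avoids timing leaks, which the plain `===` in the route does not):

```python
import hashlib
import hmac

def sign(secret: str, body: bytes) -> str:
    """Compute the sha256=<hex> signature a sender would attach."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"

def verify(secret: str, body: bytes, signature: str) -> bool:
    """Constant-time check of a received signature against the expected one."""
    return hmac.compare_digest(sign(secret, body), signature)

sig = sign("s3cret", b'{"event": "pipeline_failure"}')
print(verify("s3cret", b'{"event": "pipeline_failure"}', sig))
```

Note that the digest must be computed over the exact raw bytes of the body; re-serializing parsed JSON would break verification, which is why the handler reads `request.text()` before `JSON.parse`.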
@@ -92,25 +92,7 @@ function usePipelineLiveStatus(): PipelineLiveStatus | null {
  const [status, setStatus] = useState<PipelineLiveStatus | null>(null)

  useEffect(() => {
    // Optional: Fetch live status from API
    // For now, return null and display static content
    // Uncomment below to enable live status fetching
    /*
    const fetchStatus = async () => {
      try {
        const response = await fetch('/api/admin/infrastructure/woodpecker/status')
        if (response.ok) {
          const data = await response.json()
          setStatus(data)
        }
      } catch (error) {
        console.error('Failed to fetch pipeline status:', error)
      }
    }
    fetchStatus()
    const interval = setInterval(fetchStatus, 30000) // Poll every 30s
    return () => clearInterval(interval)
    */
    // Live status fetching not yet implemented
  }, [])

  return status
@@ -246,7 +228,7 @@ export function DevOpsPipelineSidebar({
        <div className="pt-2 border-t border-slate-200 dark:border-gray-700">
          <div className="text-xs text-slate-500 dark:text-slate-400 px-1">
            {currentTool === 'ci-cd' && (
-             <span>Verwalten Sie Woodpecker Pipelines und Deployments</span>
+             <span>Verwalten Sie Gitea Actions Pipelines und Deployments</span>
            )}
            {currentTool === 'tests' && (
              <span>Ueberwachen Sie 280+ Tests ueber alle Services</span>
@@ -458,7 +440,7 @@ export function DevOpsPipelineSidebarResponsive({
        <div className="text-sm text-slate-600 dark:text-slate-400 p-3 bg-slate-50 dark:bg-gray-800 rounded-xl">
          {currentTool === 'ci-cd' && (
            <>
-             <strong className="text-slate-700 dark:text-slate-300">Aktuell:</strong> Woodpecker Pipelines und Deployments verwalten
+             <strong className="text-slate-700 dark:text-slate-300">Aktuell:</strong> Gitea Actions Pipelines und Deployments verwalten
            </>
          )}
          {currentTool === 'tests' && (
@@ -4,7 +4,7 @@
 * 3 Categories: Communication, Infrastructure, Development
 */

-export type CategoryId = 'communication' | 'infrastructure' | 'development'
+export type CategoryId = 'infrastructure'

export interface NavModule {
  id: string
@@ -27,51 +27,6 @@ export interface NavCategory {
}

export const navigation: NavCategory[] = [
  // =========================================================================
  // Communication & Alerts (Green)
  // =========================================================================
  {
    id: 'communication',
    name: 'Kommunikation',
    icon: 'message-circle',
    color: '#22c55e',
    colorClass: 'communication',
    description: 'Matrix, Jitsi, E-Mail & Alerts',
    modules: [
      {
        id: 'video-chat',
        name: 'Video & Chat',
        href: '/communication/video-chat',
        description: 'Matrix & Jitsi Monitoring',
        purpose: 'Dashboard fuer Matrix Synapse und Jitsi Meet. Service-Status, aktive Meetings, Traffic.',
        audience: ['Admins', 'DevOps'],
      },
      {
        id: 'matrix',
        name: 'Voice Service',
        href: '/communication/matrix',
        description: 'PersonaPlex-7B & TaskOrchestrator',
        purpose: 'Voice-First Interface Konfiguration und Architektur-Dokumentation.',
        audience: ['Entwickler', 'Admins'],
      },
      {
        id: 'mail',
        name: 'Unified Inbox',
        href: '/communication/mail',
        description: 'E-Mail-Konten & KI-Analyse',
        purpose: 'E-Mail-Konten verwalten und KI-Kategorisierung nutzen.',
        audience: ['Support', 'Admins'],
      },
      {
        id: 'alerts',
        name: 'Alerts Monitoring',
        href: '/communication/alerts',
        description: 'Google Alerts & Feed-Ueberwachung',
        purpose: 'Google Alerts und RSS-Feeds fuer relevante Neuigkeiten ueberwachen.',
        audience: ['Marketing', 'Admins'],
      },
    ],
  },
  // =========================================================================
  // Infrastructure & DevOps (Orange)
  // =========================================================================
@@ -83,15 +38,6 @@ export const navigation: NavCategory[] = [
    colorClass: 'infrastructure',
    description: 'GPU, Security, CI/CD & Monitoring',
    modules: [
      {
        id: 'gpu',
        name: 'GPU Infrastruktur',
        href: '/infrastructure/gpu',
        description: 'vast.ai GPU Management',
        purpose: 'GPU-Instanzen auf vast.ai fuer ML-Training und Inferenz verwalten.',
        audience: ['DevOps', 'Entwickler'],
        subgroup: 'Compute',
      },
      {
        id: 'middleware',
        name: 'Middleware',
@@ -123,7 +69,7 @@ export const navigation: NavCategory[] = [
        id: 'ci-cd',
        name: 'CI/CD Dashboard',
        href: '/infrastructure/ci-cd',
-       description: 'Gitea & Woodpecker Pipelines',
+       description: 'Gitea Actions Pipelines',
        purpose: 'CI/CD Dashboard mit Pipelines, Deployment-Status und Container-Management.',
        audience: ['DevOps', 'Entwickler'],
        subgroup: 'DevOps Pipeline',
@@ -139,43 +85,6 @@ export const navigation: NavCategory[] = [
      },
    ],
  },
  // =========================================================================
  // Development (Slate)
  // =========================================================================
  {
    id: 'development',
    name: 'Entwicklung',
    icon: 'code',
    color: '#64748b',
    colorClass: 'development',
    description: 'Docs, Screen Flow & Brandbook',
    modules: [
      {
        id: 'docs',
        name: 'Developer Docs',
        href: '/development/docs',
        description: 'MkDocs Dokumentation',
        purpose: 'API-Dokumentation und Architektur-Diagramme durchsuchen.',
        audience: ['Entwickler'],
      },
      {
        id: 'screen-flow',
        name: 'Screen Flow',
        href: '/development/screen-flow',
        description: 'UI Screen-Verbindungen',
        purpose: 'Navigation und Screen-Verbindungen der Core-App visualisieren.',
        audience: ['Designer', 'Entwickler'],
      },
      {
        id: 'brandbook',
        name: 'Brandbook',
        href: '/development/brandbook',
        description: 'Corporate Design',
        purpose: 'Referenz fuer Logos, Farben, Typografie und Design-Richtlinien.',
        audience: ['Designer', 'Marketing'],
      },
    ],
  },
]

// Meta modules (always visible)
@@ -2,7 +2,7 @@
 * Shared Types & Constants for Infrastructure/DevOps Modules
 *
 * This file contains shared types and constants for the DevOps pipeline:
- * - CI/CD: Woodpecker Pipelines & Deployments
+ * - CI/CD: Gitea Actions Pipelines & Deployments
 * - Tests: Test Dashboard & Backlog
 * - SBOM: Software Bill of Materials & License Checks
 * - Security: DevSecOps Scans & Vulnerabilities
@@ -230,24 +230,6 @@ export interface LogExtractionResponse {
// Webhook Types
// =============================================================================

/**
 * Woodpecker Webhook Event Types
 */
export type WoodpeckerEventType = 'pipeline_success' | 'pipeline_failure' | 'pipeline_started'

/**
 * Woodpecker Webhook Payload
 */
export interface WoodpeckerWebhookPayload {
  event: WoodpeckerEventType
  repo_id: number
  pipeline_number: number
  branch?: string
  commit?: string
  author?: string
  message?: string
}

// =============================================================================
// LLM Integration Types
// =============================================================================
@@ -346,18 +328,14 @@ export interface PipelineLiveStatus {
export const INFRASTRUCTURE_API_ENDPOINTS = {
  /** CI/CD Endpoints */
  CI_CD: {
-   PIPELINES: '/api/admin/infrastructure/woodpecker',
-   TRIGGER: '/api/admin/infrastructure/woodpecker/trigger',
-   LOGS: '/api/admin/infrastructure/woodpecker/logs',
+   PIPELINES: '/api/v1/security/sbom/pipeline/history',
+   STATUS: '/api/v1/security/sbom/pipeline/status',
+   TRIGGER: '/api/v1/security/sbom/pipeline/trigger',
  },
  /** Log Extraction Endpoints */
  LOG_EXTRACT: {
    EXTRACT: '/api/infrastructure/log-extract/extract',
  },
  /** Webhook Endpoints */
  WEBHOOKS: {
    WOODPECKER: '/api/webhooks/woodpecker',
  },
  /** LLM Endpoints */
  LLM: {
    ANALYZE: '/api/ai/analyze',
@@ -375,7 +353,6 @@ export const INFRASTRUCTURE_API_ENDPOINTS = {
 */
export const DEVOPS_ARCHITECTURE = {
  services: [
    { name: 'Woodpecker CI', port: 8000, description: 'CI/CD Pipeline Server' },
    { name: 'Gitea', port: 3003, description: 'Git Repository Server' },
    { name: 'Syft', type: 'CLI', description: 'SBOM Generator' },
    { name: 'Grype', type: 'CLI', description: 'Vulnerability Scanner' },
@@ -18,7 +18,8 @@ COPY requirements.txt .
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
RUN pip install --no-cache-dir --upgrade pip && \
-    pip install --no-cache-dir -r requirements.txt
+    pip install --no-cache-dir -r requirements.txt && \
+    pip install --no-cache-dir semgrep bandit

# ---------- Runtime stage ----------
FROM python:3.12-slim-bookworm
@@ -38,8 +39,27 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 \
    libglib2.0-0 \
    curl \
    git \
    && rm -rf /var/lib/apt/lists/*

# Install DevSecOps tools (gitleaks, trivy, grype, syft)
ARG TARGETARCH
RUN set -eux; \
    ARCH="${TARGETARCH:-$(dpkg --print-architecture)}"; \
    # Gitleaks
    GITLEAKS_VERSION=8.21.2; \
    if [ "$ARCH" = "arm64" ]; then GITLEAKS_ARCH=arm64; else GITLEAKS_ARCH=x64; fi; \
    curl -sSfL "https://github.com/gitleaks/gitleaks/releases/download/v${GITLEAKS_VERSION}/gitleaks_${GITLEAKS_VERSION}_linux_${GITLEAKS_ARCH}.tar.gz" \
      | tar xz -C /usr/local/bin gitleaks; \
    # Trivy
    curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin; \
    # Grype
    curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin; \
    # Syft
    curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin; \
    # Verify
    gitleaks version && trivy --version && grype version && syft version

# Copy virtualenv from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
@@ -25,7 +25,6 @@ from email_template_api import (
)
from system_api import router as system_router
from security_api import router as security_router

# ---------------------------------------------------------------------------
# Middleware imports
# ---------------------------------------------------------------------------

@@ -34,6 +34,9 @@ BACKEND_DIR = Path(__file__).parent
REPORTS_DIR = BACKEND_DIR / "security-reports"
SCRIPTS_DIR = BACKEND_DIR / "scripts"

# Project root for security scans
PROJECT_ROOT = BACKEND_DIR

# Ensure the reports directory exists
try:
    REPORTS_DIR.mkdir(exist_ok=True)
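The bootstrap shown above derives all directories from the module's own location and creates the reports directory idempotently. A small sketch of the same pattern (using a temporary directory as a stand-in for `Path(__file__).parent`):

```python
from pathlib import Path
import tempfile

# Stand-in for BACKEND_DIR = Path(__file__).parent in the snippet above.
backend_dir = Path(tempfile.mkdtemp())
reports_dir = backend_dir / "security-reports"

# exist_ok=True makes repeated startup runs safe: no error if it already exists.
reports_dir.mkdir(exist_ok=True)
reports_dir.mkdir(exist_ok=True)
print(reports_dir.is_dir())
```

The `try:` around `mkdir` in the original additionally guards against environments where the directory cannot be created at all (for example a read-only filesystem).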
@@ -8,6 +8,7 @@ require (
	github.com/google/uuid v1.6.0
	github.com/jackc/pgx/v5 v5.7.6
	github.com/joho/godotenv v1.5.1
+	github.com/redis/go-redis/v9 v9.17.3
	github.com/skip2/go-qrcode v0.0.0-20200617195104-da1b6568686e
	golang.org/x/crypto v0.40.0
)
@@ -15,7 +16,9 @@ require (
require (
	github.com/bytedance/sonic v1.14.0 // indirect
	github.com/bytedance/sonic/loader v0.3.0 // indirect
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
	github.com/cloudwego/base64x v0.1.6 // indirect
+	github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
	github.com/gabriel-vasile/mimetype v1.4.8 // indirect
	github.com/gin-contrib/sse v1.1.0 // indirect
	github.com/go-playground/locales v0.14.1 // indirect
@@ -1,12 +1,20 @@
+github.com/bsm/ginkgo/v2 v2.12.0 h1:Ny8MWAHyOepLGlLKYmXG4IEkioBysk6GpaRTLC8zwWs=
+github.com/bsm/ginkgo/v2 v2.12.0/go.mod h1:SwYbGRRDovPVboqFv0tPTcG1sN61LM1Z4ARdbAV9g4c=
+github.com/bsm/gomega v1.27.10 h1:yeMWxP2pV2fG3FgAODIY8EiRE3dy0aeFYt4l7wh6yKA=
+github.com/bsm/gomega v1.27.10/go.mod h1:JyEr/xRbxbtgWNi8tIEVPUYZ5Dzef52k01W3YH0H+O0=
github.com/bytedance/sonic v1.14.0 h1:/OfKt8HFw0kh2rj8N0F6C/qPGRESq0BbaNZgcNXXzQQ=
github.com/bytedance/sonic v1.14.0/go.mod h1:WoEbx8WTcFJfzCe0hbmyTGrfjt8PzNEBdxlNUO24NhA=
github.com/bytedance/sonic/loader v0.3.0 h1:dskwH8edlzNMctoruo8FPTJDF3vLtDT0sXZwvZJyqeA=
github.com/bytedance/sonic/loader v0.3.0/go.mod h1:N8A3vUdtUebEY2/VQC0MyhYeKUFosQU6FxH2JmUe6VI=
+github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
+github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
github.com/cloudwego/base64x v0.1.6 h1:t11wG9AECkCDk5fMSoxmufanudBtJ+/HemLstXDLI2M=
github.com/cloudwego/base64x v0.1.6/go.mod h1:OFcloc187FXDaYHvrNIjxSe8ncn0OOM8gEHfghB2IPU=
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f h1:lO4WD4F/rVNCu3HqELle0jiPLLBs70cWOduZpkS1E78=
+github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f/go.mod h1:cuUVRXasLTGF7a8hSLbxyZXjz+1KgoB3wDUb6vlszIc=
github.com/gabriel-vasile/mimetype v1.4.8 h1:FfZ3gj38NjllZIeJAmMhr+qKL8Wu+nOoI3GqacKw1NM=
github.com/gabriel-vasile/mimetype v1.4.8/go.mod h1:ByKUIKGjh1ODkGM1asKUbQZOLGrPjydw3hYPU2YU9t8=
github.com/gin-contrib/sse v1.1.0 h1:n0w2GMuUpWDVp7qSpvze6fAu9iRxJY4Hmj6AmBOU05w=
@@ -62,6 +70,8 @@ github.com/quic-go/qpack v0.5.1 h1:giqksBPnT/HDtZ6VhtFKgoLOWmlyo9Ei6u9PqzIMbhI=
github.com/quic-go/qpack v0.5.1/go.mod h1:+PC4XFrEskIVkcLzpEkbLqq1uCoxPhQuvK5rH1ZgaEg=
github.com/quic-go/quic-go v0.54.0 h1:6s1YB9QotYI6Ospeiguknbp2Znb/jZYjZLRXn9kMQBg=
github.com/quic-go/quic-go v0.54.0/go.mod h1:e68ZEaCdyviluZmy44P6Iey98v/Wfz6HCjQEm+l8zTY=
+github.com/redis/go-redis/v9 v9.17.3 h1:fN29NdNrE17KttK5Ndf20buqfDZwGNgoUr9qjl1DQx4=
+github.com/redis/go-redis/v9 v9.17.3/go.mod h1:u410H11HMLoB+TP67dz8rL9s6QW2j76l0//kSOd3370=
github.com/skip2/go-qrcode v0.0.0-20200617195104-da1b6568686e h1:MRM5ITcdelLK2j1vwZ3Je0FKVCfqOLp5zO6trqMLYs0=
github.com/skip2/go-qrcode v0.0.0-20200617195104-da1b6568686e/go.mod h1:XV66xRDqSt+GTGFMVlhk3ULuV0y9ZmzeVGR4mloJI3M=
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME=
19
control-pipeline/Dockerfile
Normal file
@@ -0,0 +1,19 @@
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8098

HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
    CMD curl -f http://127.0.0.1:8098/health || exit 1

CMD ["python", "main.py"]
8
control-pipeline/api/__init__.py
Normal file
@@ -0,0 +1,8 @@
from fastapi import APIRouter

from api.control_generator_routes import router as generator_router
from api.canonical_control_routes import router as canonical_router

router = APIRouter()
router.include_router(generator_router)
router.include_router(canonical_router)
2132
control-pipeline/api/canonical_control_routes.py
Normal file
File diff suppressed because it is too large
1102
control-pipeline/api/control_generator_routes.py
Normal file
File diff suppressed because it is too large
67
control-pipeline/config.py
Normal file
@@ -0,0 +1,67 @@
import os


class Settings:
    """Environment-based configuration for control-pipeline."""

    # Database (compliance schema)
    DATABASE_URL: str = os.getenv(
        "DATABASE_URL",
        "postgresql://breakpilot:breakpilot123@localhost:5432/breakpilot_db",
    )
    SCHEMA_SEARCH_PATH: str = os.getenv(
        "SCHEMA_SEARCH_PATH", "compliance,core,public"
    )

    # Qdrant (vector search for dedup)
    QDRANT_URL: str = os.getenv("QDRANT_URL", "http://localhost:6333")
    QDRANT_API_KEY: str = os.getenv("QDRANT_API_KEY", "")

    # Embedding Service
    EMBEDDING_SERVICE_URL: str = os.getenv(
        "EMBEDDING_SERVICE_URL", "http://embedding-service:8087"
    )

    # LLM - Anthropic
    ANTHROPIC_API_KEY: str = os.getenv("ANTHROPIC_API_KEY", "")
    CONTROL_GEN_ANTHROPIC_MODEL: str = os.getenv(
        "CONTROL_GEN_ANTHROPIC_MODEL", "claude-sonnet-4-6"
    )
    DECOMPOSITION_LLM_MODEL: str = os.getenv(
        "DECOMPOSITION_LLM_MODEL", "claude-haiku-4-5-20251001"
    )
    CONTROL_GEN_LLM_TIMEOUT: int = int(
        os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180")
    )

    # LLM - Ollama (fallback)
    OLLAMA_URL: str = os.getenv(
        "OLLAMA_URL", "http://host.docker.internal:11434"
    )
    CONTROL_GEN_OLLAMA_MODEL: str = os.getenv(
        "CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b"
    )

    # SDK Service (for RAG search proxy)
    SDK_URL: str = os.getenv(
        "SDK_URL", "http://ai-compliance-sdk:8090"
    )

    # Auth
    JWT_SECRET: str = os.getenv("JWT_SECRET", "")

    # Server
    PORT: int = int(os.getenv("PORT", "8098"))
    LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
    ENVIRONMENT: str = os.getenv("ENVIRONMENT", "development")

    # Pipeline
    DECOMPOSITION_BATCH_SIZE: int = int(
        os.getenv("DECOMPOSITION_BATCH_SIZE", "5")
    )
    DECOMPOSITION_LLM_TIMEOUT: int = int(
        os.getenv("DECOMPOSITION_LLM_TIMEOUT", "120")
    )


settings = Settings()
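One property of the `Settings` pattern above is worth noting: the `os.getenv` defaults are evaluated once, when the class body runs at import time, so environment variables must be set before the module is imported. A minimal sketch (`DemoSettings` is a hypothetical stand-in for the real class, with the same `PORT` default):

```python
import os

os.environ["PORT"] = "9000"  # must happen before the class body runs

class DemoSettings:
    """Mirrors the control-pipeline Settings pattern: env lookup at class-definition time."""
    PORT: int = int(os.getenv("PORT", "8098"))

settings = DemoSettings()
print(settings.PORT)  # 9000; later changes to os.environ are not picked up
```

This is why the module exposes a single `settings = Settings()` instance: every importer sees the same snapshot of the environment.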
0
control-pipeline/data/__init__.py
Normal file
205
control-pipeline/data/source_type_classification.py
Normal file
@@ -0,0 +1,205 @@
"""
Source-type classification for regulations and frameworks.

Three-tier model of normative bindingness:

Tier 1 — LAW (law):
    Legally binding. Fines on violation.
    Examples: GDPR, NIS2, AI Act, CRA

Tier 2 — GUIDELINE (guideline):
    Official interpretive guidance from supervisory authorities.
    Reversed burden of proof: whoever deviates must justify why.
    Examples: EDPB guidelines, BSI standards, WP29 documents

Tier 3 — FRAMEWORK (framework):
    Voluntary best practices, not legally binding.
    But: can be cited as "state of the art".
    Examples: ENISA, NIST, OWASP, OECD, CISA

Mapping: source_regulation (from control_parent_links) -> source_type
"""

# --- Type definitions ---
SOURCE_TYPE_LAW = "law"  # law/regulation/directive — normative_strength is kept
SOURCE_TYPE_GUIDELINE = "guideline"  # guideline/standard — capped at "should"
SOURCE_TYPE_FRAMEWORK = "framework"  # framework/best practice — capped at "may"

# Maximum allowed normative_strength per source_type
# The DB constraint allows: must, should, may (NOT "can")
NORMATIVE_STRENGTH_CAP: dict[str, str] = {
    SOURCE_TYPE_LAW: "must",  # no cap
    SOURCE_TYPE_GUIDELINE: "should",  # capped at "should"
    SOURCE_TYPE_FRAMEWORK: "may",  # capped at "may" (= "kann")
}

# Ordering for comparisons (higher = stronger)
STRENGTH_ORDER: dict[str, int] = {
    "may": 1,  # KANN (DB value)
    "can": 1,  # alias — normalized to "may" in cap_normative_strength
    "should": 2,
    "must": 3,
}


def cap_normative_strength(original: str, source_type: str) -> str:
    """
    Caps the normative_strength based on the source_type.

    Example:
        cap_normative_strength("must", "framework") -> "may"
        cap_normative_strength("should", "law") -> "should"
        cap_normative_strength("must", "guideline") -> "should"
    """
    cap = NORMATIVE_STRENGTH_CAP.get(source_type, "must")
    cap_level = STRENGTH_ORDER.get(cap, 3)
    original_level = STRENGTH_ORDER.get(original, 3)
    if original_level > cap_level:
        return cap
    return original
||||
|
||||
|
||||
def get_highest_source_type(source_types: list[str]) -> str:
|
||||
"""
|
||||
Bestimmt den hoechsten source_type aus einer Liste.
|
||||
Ein Gesetz uebertrumpft alles.
|
||||
|
||||
Beispiel:
|
||||
get_highest_source_type(["framework", "law"]) -> "law"
|
||||
get_highest_source_type(["framework", "guideline"]) -> "guideline"
|
||||
"""
|
||||
type_order = {SOURCE_TYPE_FRAMEWORK: 1, SOURCE_TYPE_GUIDELINE: 2, SOURCE_TYPE_LAW: 3}
|
||||
if not source_types:
|
||||
return SOURCE_TYPE_FRAMEWORK
|
||||
return max(source_types, key=lambda t: type_order.get(t, 0))
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# Klassifikation: source_regulation -> source_type
|
||||
#
|
||||
# Diese Map wird fuer den Backfill und zukuenftige Pipeline-Runs verwendet.
|
||||
# Neue Regulierungen hier eintragen!
|
||||
# ============================================================================
|
||||
|
||||
SOURCE_REGULATION_CLASSIFICATION: dict[str, str] = {
|
||||
# --- EU-Verordnungen (unmittelbar bindend) ---
|
||||
"DSGVO (EU) 2016/679": SOURCE_TYPE_LAW,
|
||||
"KI-Verordnung (EU) 2024/1689": SOURCE_TYPE_LAW,
|
||||
"Cyber Resilience Act (CRA)": SOURCE_TYPE_LAW,
|
||||
"NIS2-Richtlinie (EU) 2022/2555": SOURCE_TYPE_LAW,
|
||||
"Data Act": SOURCE_TYPE_LAW,
|
||||
"Data Governance Act (DGA)": SOURCE_TYPE_LAW,
|
||||
"Markets in Crypto-Assets (MiCA)": SOURCE_TYPE_LAW,
|
||||
"Maschinenverordnung (EU) 2023/1230": SOURCE_TYPE_LAW,
|
||||
"Batterieverordnung (EU) 2023/1542": SOURCE_TYPE_LAW,
|
||||
"AML-Verordnung": SOURCE_TYPE_LAW,
|
||||
|
||||
# --- EU-Richtlinien (nach nationaler Umsetzung bindend) ---
|
||||
# Fuer Compliance-Zwecke wie Gesetze behandeln
|
||||
|
||||
# --- Nationale Gesetze ---
|
||||
"Bundesdatenschutzgesetz (BDSG)": SOURCE_TYPE_LAW,
|
||||
"Telekommunikationsgesetz": SOURCE_TYPE_LAW,
|
||||
"Telekommunikationsgesetz Oesterreich": SOURCE_TYPE_LAW,
|
||||
"Gewerbeordnung (GewO)": SOURCE_TYPE_LAW,
|
||||
"Handelsgesetzbuch (HGB)": SOURCE_TYPE_LAW,
|
||||
"Abgabenordnung (AO)": SOURCE_TYPE_LAW,
|
||||
"IFRS-Übernahmeverordnung": SOURCE_TYPE_LAW,
|
||||
"Österreichisches Datenschutzgesetz (DSG)": SOURCE_TYPE_LAW,
|
||||
"LOPDGDD - Ley Orgánica de Protección de Datos (Spanien)": SOURCE_TYPE_LAW,
|
||||
"Loi Informatique et Libertés (Frankreich)": SOURCE_TYPE_LAW,
|
||||
"Információs önrendelkezési jog törvény (Ungarn)": SOURCE_TYPE_LAW,
|
||||
"EU Blue Guide 2022": SOURCE_TYPE_LAW,
|
||||
|
||||
# --- EDPB/WP29 Leitlinien (offizielle Auslegungshilfe) ---
|
||||
"EDPB Leitlinien 01/2019 (Zertifizierung)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 01/2020 (Datentransfers)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 01/2020 (Vernetzte Fahrzeuge)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 01/2022 (BCR)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 01/2024 (Berechtigtes Interesse)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 04/2019 (Data Protection by Design)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 05/2020 - Einwilligung": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 07/2020 (Datentransfers)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 08/2020 (Social Media)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 09/2022 (Data Breach)": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien 09/2022 - Meldung von Datenschutzverletzungen": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Empfehlungen 01/2020 - Ergaenzende Massnahmen fuer Datentransfers": SOURCE_TYPE_GUIDELINE,
|
||||
"EDPB Leitlinien - Berechtigtes Interesse (Art. 6(1)(f))": SOURCE_TYPE_GUIDELINE,
|
||||
"WP244 Leitlinien (Profiling)": SOURCE_TYPE_GUIDELINE,
|
||||
"WP251 Leitlinien (Profiling)": SOURCE_TYPE_GUIDELINE,
|
||||
"WP260 Leitlinien (Transparenz)": SOURCE_TYPE_GUIDELINE,
|
||||
|
||||
# --- BSI Standards (behoerdliche technische Richtlinien) ---
|
||||
"BSI-TR-03161-1": SOURCE_TYPE_GUIDELINE,
|
||||
"BSI-TR-03161-2": SOURCE_TYPE_GUIDELINE,
|
||||
"BSI-TR-03161-3": SOURCE_TYPE_GUIDELINE,
|
||||
|
||||
# --- ENISA (EU-Agentur, aber Empfehlungen nicht rechtsverbindlich) ---
|
||||
"ENISA Cybersecurity State 2024": SOURCE_TYPE_FRAMEWORK,
|
||||
"ENISA ICS/SCADA Dependencies": SOURCE_TYPE_FRAMEWORK,
|
||||
"ENISA Supply Chain Good Practices": SOURCE_TYPE_FRAMEWORK,
|
||||
"ENISA Threat Landscape Supply Chain": SOURCE_TYPE_FRAMEWORK,
|
||||
|
||||
# --- NIST (US-Standards, international als Best Practice) ---
|
||||
"NIST AI Risk Management Framework": SOURCE_TYPE_FRAMEWORK,
|
||||
"NIST Cybersecurity Framework 2.0": SOURCE_TYPE_FRAMEWORK,
|
||||
"NIST SP 800-207 (Zero Trust)": SOURCE_TYPE_FRAMEWORK,
|
||||
"NIST SP 800-218 (SSDF)": SOURCE_TYPE_FRAMEWORK,
|
||||
"NIST SP 800-53 Rev. 5": SOURCE_TYPE_FRAMEWORK,
|
||||
"NIST SP 800-63-3": SOURCE_TYPE_FRAMEWORK,
|
||||
|
||||
# --- OWASP (Community-Standards) ---
|
||||
"OWASP API Security Top 10 (2023)": SOURCE_TYPE_FRAMEWORK,
|
||||
"OWASP ASVS 4.0": SOURCE_TYPE_FRAMEWORK,
|
||||
"OWASP MASVS 2.0": SOURCE_TYPE_FRAMEWORK,
|
||||
"OWASP SAMM 2.0": SOURCE_TYPE_FRAMEWORK,
|
||||
"OWASP Top 10 (2021)": SOURCE_TYPE_FRAMEWORK,
|
||||
|
||||
# --- Sonstige Frameworks ---
|
||||
"OECD KI-Empfehlung": SOURCE_TYPE_FRAMEWORK,
|
||||
"CISA Secure by Design": SOURCE_TYPE_FRAMEWORK,
|
||||
}
|
||||
|
||||
|
||||
def classify_source_regulation(source_regulation: str) -> str:
|
||||
"""
|
||||
Klassifiziert eine source_regulation als law, guideline oder framework.
|
||||
|
||||
Verwendet exaktes Matching gegen die Map. Bei unbekannten Quellen
|
||||
wird anhand von Schluesselwoertern geraten, Fallback ist 'framework'
|
||||
(konservativstes Ergebnis).
|
||||
"""
|
||||
if not source_regulation:
|
||||
return SOURCE_TYPE_FRAMEWORK
|
||||
|
||||
# Exaktes Match
|
||||
if source_regulation in SOURCE_REGULATION_CLASSIFICATION:
|
||||
return SOURCE_REGULATION_CLASSIFICATION[source_regulation]
|
||||
|
||||
# Heuristik fuer unbekannte Quellen
|
||||
lower = source_regulation.lower()
|
||||
|
||||
# Gesetze erkennen
|
||||
law_indicators = [
|
||||
"verordnung", "richtlinie", "gesetz", "directive", "regulation",
|
||||
"(eu)", "(eg)", "act", "ley", "loi", "törvény", "código",
|
||||
]
|
||||
if any(ind in lower for ind in law_indicators):
|
||||
return SOURCE_TYPE_LAW
|
||||
|
||||
# Leitlinien erkennen
|
||||
guideline_indicators = [
|
||||
"edpb", "leitlinie", "guideline", "wp2", "bsi", "empfehlung",
|
||||
]
|
||||
if any(ind in lower for ind in guideline_indicators):
|
||||
return SOURCE_TYPE_GUIDELINE
|
||||
|
||||
# Frameworks erkennen
|
||||
framework_indicators = [
|
||||
"enisa", "nist", "owasp", "oecd", "cisa", "framework", "iso",
|
||||
]
|
||||
if any(ind in lower for ind in framework_indicators):
|
||||
return SOURCE_TYPE_FRAMEWORK
|
||||
|
||||
# Konservativ: unbekannt = framework (geringste Verbindlichkeit)
|
||||
return SOURCE_TYPE_FRAMEWORK
|
||||
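The capping rule in source_type_classification.py can be exercised in isolation. The sketch below restates the two lookup tables under shortened stand-in names (`CAP` for `NORMATIVE_STRENGTH_CAP`, `ORDER` for `STRENGTH_ORDER`); it is an illustration of the rule, not the module itself:

```python
# Minimal standalone sketch of the normative-strength cap.
CAP = {"law": "must", "guideline": "should", "framework": "may"}
ORDER = {"may": 1, "can": 1, "should": 2, "must": 3}

def cap_strength(original: str, source_type: str) -> str:
    if original == "can":          # alias; the DB constraint only allows "may"
        original = "may"
    cap = CAP.get(source_type, "must")   # unknown source_type = no cap
    if ORDER.get(original, 3) > ORDER.get(cap, 3):
        return cap
    return original

print(cap_strength("must", "framework"))   # a framework "must" is capped to "may"
print(cap_strength("should", "law"))       # laws are never capped
```

The asymmetry is deliberate: capping only ever weakens a strength, so a guideline's "may" stays "may" rather than being raised to "should".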
0  control-pipeline/db/__init__.py   Normal file
37 control-pipeline/db/session.py   Normal file
@@ -0,0 +1,37 @@
"""Database session factory for control-pipeline.

Connects to the shared PostgreSQL with search_path set to the compliance schema.
"""

from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker

from config import settings

engine = create_engine(
    settings.DATABASE_URL,
    pool_pre_ping=True,
    pool_size=5,
    max_overflow=10,
    echo=False,
)


@event.listens_for(engine, "connect")
def set_search_path(dbapi_connection, connection_record):
    cursor = dbapi_connection.cursor()
    cursor.execute(f"SET search_path TO {settings.SCHEMA_SEARCH_PATH}")
    cursor.close()
    dbapi_connection.commit()


SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


def get_db():
    """FastAPI dependency for DB sessions."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()
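The `get_db` dependency above follows FastAPI's yield-dependency pattern: the framework advances the generator to the `yield`, hands the session to the endpoint, then resumes the generator so the `finally` block always closes the session, even when the endpoint raises. A framework-free sketch of the same mechanics, with a stub class standing in for `SessionLocal` (names here are illustrative, not from the codebase):

```python
class FakeSession:
    """Stub standing in for a SQLAlchemy session."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def get_db():
    db = FakeSession()
    try:
        yield db
    finally:
        db.close()  # always runs, even if the consumer raises


# What the framework does under the hood:
gen = get_db()
db = next(gen)      # run up to `yield`, obtain the session
assert not db.closed
gen.close()         # endpoint done (or failed): resume, run `finally`
assert db.closed
```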
88 control-pipeline/main.py   Normal file
@@ -0,0 +1,88 @@
import logging
from contextlib import asynccontextmanager

import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy import text

from config import settings
from db.session import engine

logging.basicConfig(
    level=getattr(logging, settings.LOG_LEVEL, logging.INFO),
    format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
)
logger = logging.getLogger("control-pipeline")


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Startup: verify DB connectivity."""
    logger.info("Control-Pipeline starting up ...")

    # Verify database connection
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        logger.info("Database connection OK")
    except Exception as exc:
        logger.error("Database connection failed: %s", exc)

    yield

    logger.info("Control-Pipeline shutting down ...")


app = FastAPI(
    title="BreakPilot Control Pipeline",
    description=(
        "Control generation, decomposition, and deduplication pipeline "
        "for the BreakPilot compliance platform."
    ),
    version="1.0.0",
    lifespan=lifespan,
)

# CORS
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Routers (imported late so config and logging are initialized first)
from api import router as api_router  # noqa: E402

app.include_router(api_router)


# Health
@app.get("/health")
async def health():
    """Liveness probe."""
    db_ok = False
    try:
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        db_ok = True
    except Exception:
        pass

    status = "healthy" if db_ok else "degraded"
    return {
        "status": status,
        "service": "control-pipeline",
        "version": "1.0.0",
        "dependencies": {
            "postgres": "ok" if db_ok else "unavailable",
        },
    }


if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=settings.PORT,
        reload=False,
        log_level="info",
    )
22 control-pipeline/requirements.txt   Normal file
@@ -0,0 +1,22 @@
# Web Framework
fastapi>=0.123.0
uvicorn[standard]>=0.27.0

# Database
SQLAlchemy>=2.0.36
psycopg2-binary>=2.9.10

# HTTP Client
httpx>=0.28.0

# Validation
pydantic>=2.5.0

# AI - Anthropic Claude
anthropic>=0.75.0

# Vector DB (dedup)
qdrant-client>=1.7.0

# Auth
python-jose[cryptography]>=3.3.0
219 control-pipeline/scripts/import_backup.py   Normal file
@@ -0,0 +1,219 @@
"""
Import compliance backup into local PostgreSQL.
Fixes Python-style lists/dicts in JSONB fields to valid JSON.
"""
import ast
import gzip
import json
import re
import sys

import psycopg2

DB_URL = "postgresql://breakpilot:breakpilot123@localhost:5432/breakpilot_db"
BACKUP_PATH = "/tmp/compliance-db-2026-03-28_16-25-19.sql.gz"

# Tables with JSONB columns that need Python -> JSON conversion
JSONB_TABLES = {
    "canonical_controls",
    "canonical_controls_pre_dedup",
    "obligation_candidates",
    "control_dedup_reviews",
    "canonical_generation_jobs",
    "canonical_processed_chunks",
}


def fix_python_value(val: str) -> str | None:
    """Convert a Python repr to a JSON string for JSONB fields."""
    if val == "NULL":
        return None
    # Strip outer SQL quotes
    if val.startswith("'") and val.endswith("'"):
        # Unescape SQL single quotes
        inner = val[1:-1].replace("''", "'")
    else:
        return val

    # Try to parse as a Python literal and convert to JSON
    try:
        obj = ast.literal_eval(inner)
        return json.dumps(obj, ensure_ascii=False)
    except (ValueError, SyntaxError):
        # Already valid JSON or a plain string
        return inner


def process_line(line: str, conn) -> bool:
    """Process a single SQL statement. Returns True if an INSERT succeeded."""
    line = line.strip()
    if not line.startswith("INSERT INTO"):
        return False

    table_match = re.match(r'INSERT INTO "(\w+)"', line)
    if not table_match:
        return False
    table = table_match.group(1)

    # Execute directly for non-JSONB tables
    if table not in JSONB_TABLES:
        try:
            with conn.cursor() as cur:
                cur.execute(line)
            return True
        except Exception:
            conn.rollback()
            return False

    # For JSONB tables: use a psycopg2 parameterized query.
    # Extract column names and values.
    cols_match = re.match(r'INSERT INTO "\w+" \(([^)]+)\) VALUES \(', line)
    if not cols_match:
        return False

    col_names = [c.strip().strip('"') for c in cols_match.group(1).split(",")]

    # Extract the VALUES portion
    vals_start = line.index("VALUES (") + 8
    vals_str = line[vals_start:-2]  # Remove trailing );

    # Parse SQL values (handling nested quotes and parentheses)
    values = []
    current = ""
    in_quote = False
    depth = 0
    i = 0
    while i < len(vals_str):
        c = vals_str[i]
        if in_quote:
            if c == "'" and i + 1 < len(vals_str) and vals_str[i + 1] == "'":
                current += "''"
                i += 2
                continue
            elif c == "'":
                current += "'"
                in_quote = False
            else:
                current += c
        else:
            if c == "'":
                current += "'"
                in_quote = True
            elif c == "(":
                depth += 1
                current += c
            elif c == ")":
                depth -= 1
                current += c
            elif c == "," and depth == 0:
                values.append(current.strip())
                current = ""
            else:
                current += c
        i += 1
    values.append(current.strip())

    if len(values) != len(col_names):
        # Fallback: try direct execution
        try:
            with conn.cursor() as cur:
                cur.execute(line)
            return True
        except Exception:
            conn.rollback()
            return False

    # Convert values
    params = []
    placeholders = []
    for val in values:
        if val == "NULL":
            params.append(None)
        elif val in ("TRUE", "true"):
            params.append(True)
        elif val in ("FALSE", "false"):
            params.append(False)
        elif val.startswith("'") and val.endswith("'"):
            inner = val[1:-1].replace("''", "'")
            # Check whether this looks like a Python literal (list/dict)
            stripped = inner.strip()
            if stripped and stripped[0] in ("[", "{") and stripped not in ("[]", "{}"):
                try:
                    obj = ast.literal_eval(inner)
                    params.append(json.dumps(obj, ensure_ascii=False))
                except (ValueError, SyntaxError):
                    params.append(inner)
            else:
                params.append(inner)
        else:
            # Numeric or other
            try:
                if "." in val:
                    params.append(float(val))
                else:
                    params.append(int(val))
            except ValueError:
                params.append(val)
        placeholders.append("%s")

    col_list = ", ".join(f'"{c}"' for c in col_names)
    ph_list = ", ".join(placeholders)
    sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({ph_list})'

    try:
        with conn.cursor() as cur:
            cur.execute(sql, params)
        return True
    except Exception as e:
        conn.rollback()
        if "duplicate key" not in str(e):
            print(f"  ERROR [{table}]: {str(e)[:120]}", file=sys.stderr)
        return False


def main():
    conn = psycopg2.connect(DB_URL)
    conn.autocommit = True

    with conn.cursor() as cur:
        cur.execute("SET search_path TO compliance, public")

    total = 0
    ok = 0
    errors = 0

    print(f"Reading {BACKUP_PATH}...")
    with gzip.open(BACKUP_PATH, "rt", encoding="utf-8") as f:
        buffer = ""
        for line in f:
            buffer += line
            if not buffer.rstrip().endswith(";"):
                continue
            # Complete SQL statement
            stmt = buffer.strip()
            buffer = ""

            if not stmt.startswith("INSERT"):
                continue

            total += 1
            if process_line(stmt, conn):
                ok += 1
            else:
                errors += 1

            if total % 10000 == 0:
                print(f"  {total:>8} processed, {ok} ok, {errors} errors")

    print(f"\nDONE: {total} total, {ok} ok, {errors} errors")
    conn.close()


if __name__ == "__main__":
    main()
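The trickiest part of import_backup.py is the character-by-character parser that splits a SQL `VALUES (...)` body on top-level commas while honoring `''` quote escapes and nested parentheses. The same logic extracted into a standalone sketch (the function name `split_sql_values` is ours, not the script's):

```python
def split_sql_values(vals_str: str) -> list[str]:
    """Split a SQL VALUES body on top-level commas, honoring '' escapes and parens."""
    values, current = [], ""
    in_quote, depth, i = False, 0, 0
    while i < len(vals_str):
        c = vals_str[i]
        if in_quote:
            # '' inside a quoted literal is an escaped single quote
            if c == "'" and i + 1 < len(vals_str) and vals_str[i + 1] == "'":
                current += "''"
                i += 2
                continue
            if c == "'":
                in_quote = False
            current += c
        else:
            if c == "'":
                in_quote = True
            elif c == "(":
                depth += 1
            elif c == ")":
                depth -= 1
            elif c == "," and depth == 0:
                # top-level comma: end of one value
                values.append(current.strip())
                current = ""
                i += 1
                continue
            current += c
        i += 1
    values.append(current.strip())
    return values

print(split_sql_values("1, 'a,b', NULL, (2,3)"))
```

Commas inside quotes (`'a,b'`) and inside parentheses (`(2,3)`) are preserved; only the three top-level commas split.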
0   control-pipeline/services/__init__.py   Normal file
187 control-pipeline/services/anchor_finder.py   Normal file
@@ -0,0 +1,187 @@
"""
Anchor Finder - finds open-source references (OWASP, NIST, ENISA) for controls.

Two-stage search:
    Stage A: RAG-internal search for open-source chunks matching the control topic
    Stage B: Web search via the DuckDuckGo Instant Answer API (no API key needed)

Only open-source references (Rule 1+2) are accepted as anchors.
"""

import logging
from dataclasses import dataclass
from typing import List, Optional

import httpx

from .rag_client import ComplianceRAGClient, get_rag_client
from .control_generator import (
    GeneratedControl,
    REGULATION_LICENSE_MAP,
    _RULE2_PREFIXES,
    _RULE3_PREFIXES,
    _classify_regulation,
)

logger = logging.getLogger(__name__)

# License rules that are safe to reference as open anchors (Rule 1+2)
_OPEN_SOURCE_RULES = {1, 2}


@dataclass
class OpenAnchor:
    framework: str
    ref: str
    url: str


class AnchorFinder:
    """Finds open-source references to anchor generated controls."""

    def __init__(self, rag_client: Optional[ComplianceRAGClient] = None):
        self.rag = rag_client or get_rag_client()

    async def find_anchors(
        self,
        control: GeneratedControl,
        skip_web: bool = False,
        min_anchors: int = 2,
    ) -> List[OpenAnchor]:
        """Find open-source anchors for a control."""
        # Stage A: RAG-internal search
        anchors = await self._search_rag_for_open_anchors(control)

        # Stage B: web search if there are not enough anchors yet
        if len(anchors) < min_anchors and not skip_web:
            web_anchors = await self._search_web(control)
            # Deduplicate by framework+ref
            existing_keys = {(a.framework, a.ref) for a in anchors}
            for wa in web_anchors:
                if (wa.framework, wa.ref) not in existing_keys:
                    anchors.append(wa)

        return anchors

    async def _search_rag_for_open_anchors(
        self, control: GeneratedControl
    ) -> List[OpenAnchor]:
        """Search the RAG for chunks from open sources matching the control topic."""
        # Build the search query from the control title + first 3 tags
        tags_str = " ".join(control.tags[:3]) if control.tags else ""
        query = f"{control.title} {tags_str}".strip()

        results = await self.rag.search_with_rerank(
            query=query,
            collection="bp_compliance_ce",
            top_k=15,
        )

        anchors: List[OpenAnchor] = []
        seen: set[str] = set()

        for r in results:
            if not r.regulation_code:
                continue

            # Only accept open-source references
            license_info = _classify_regulation(r.regulation_code)
            if license_info.get("rule") not in _OPEN_SOURCE_RULES:
                continue

            # Build a reference key for dedup
            ref = r.article or r.category or ""
            key = f"{r.regulation_code}:{ref}"
            if key in seen:
                continue
            seen.add(key)

            framework_name = license_info.get(
                "name", r.regulation_name or r.regulation_short or r.regulation_code
            )
            url = r.source_url or self._build_reference_url(r.regulation_code, ref)

            anchors.append(OpenAnchor(
                framework=framework_name,
                ref=ref,
                url=url,
            ))

            if len(anchors) >= 5:
                break

        return anchors

    async def _search_web(self, control: GeneratedControl) -> List[OpenAnchor]:
        """Search the DuckDuckGo Instant Answer API for open references."""
        keywords = f"{control.title} security control OWASP NIST"
        try:
            async with httpx.AsyncClient(timeout=10.0) as client:
                resp = await client.get(
                    "https://api.duckduckgo.com/",
                    params={
                        "q": keywords,
                        "format": "json",
                        "no_html": "1",
                        "skip_disambig": "1",
                    },
                )
                if resp.status_code != 200:
                    return []

                data = resp.json()
                anchors: List[OpenAnchor] = []

                # Parse RelatedTopics
                for topic in data.get("RelatedTopics", [])[:10]:
                    url = topic.get("FirstURL", "")
                    text = topic.get("Text", "")

                    if not url:
                        continue

                    # Only accept known open-source domains
                    framework = self._identify_framework_from_url(url)
                    if framework:
                        anchors.append(OpenAnchor(
                            framework=framework,
                            ref=text[:100] if text else url,
                            url=url,
                        ))

                    if len(anchors) >= 3:
                        break

                return anchors

        except Exception as e:
            logger.warning("Web anchor search failed: %s", e)
            return []

    @staticmethod
    def _identify_framework_from_url(url: str) -> Optional[str]:
        """Identify whether a URL belongs to a known open-source framework."""
        url_lower = url.lower()
        if "owasp.org" in url_lower:
            return "OWASP"
        if "nist.gov" in url_lower or "csrc.nist.gov" in url_lower:
            return "NIST"
        if "enisa.europa.eu" in url_lower:
            return "ENISA"
        if "cisa.gov" in url_lower:
            return "CISA"
        if "eur-lex.europa.eu" in url_lower:
            return "EU Law"
        return None

    @staticmethod
    def _build_reference_url(regulation_code: str, ref: str) -> str:
        """Build a reference URL for known frameworks."""
        code = regulation_code.lower()
        if code.startswith("owasp"):
            return "https://owasp.org/www-project-application-security-verification-standard/"
        if code.startswith("nist"):
            return "https://csrc.nist.gov/publications"
        if code.startswith("enisa"):
            return "https://www.enisa.europa.eu/publications"
        if code.startswith("eu_"):
            return "https://eur-lex.europa.eu/"
        if code == "cisa_secure_by_design":
            return "https://www.cisa.gov/securebydesign"
        return ""
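The web-search stage only keeps URLs from known open-source domains; `_identify_framework_from_url` is a plain substring check. A standalone, table-driven sketch of that filter (function and table names here are ours, chosen for the sketch):

```python
from typing import Optional

# Mirrors AnchorFinder._identify_framework_from_url: map known open-source
# domains to framework labels; any other URL is rejected (returns None).
_DOMAIN_FRAMEWORKS = [
    ("owasp.org", "OWASP"),
    ("nist.gov", "NIST"),
    ("enisa.europa.eu", "ENISA"),
    ("cisa.gov", "CISA"),
    ("eur-lex.europa.eu", "EU Law"),
]


def identify_framework(url: str) -> Optional[str]:
    url_lower = url.lower()
    for domain, name in _DOMAIN_FRAMEWORKS:
        if domain in url_lower:
            return name
    return None


print(identify_framework("https://owasp.org/www-project-top-ten/"))  # OWASP
print(identify_framework("https://example.com/blog"))                # None
```

Keeping the mapping in a list rather than an if-chain makes adding a new open-source domain a one-line change.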
245 control-pipeline/services/applicability_engine.py   Normal file
@@ -0,0 +1,245 @@
"""
Applicability Engine -- filters controls based on company profile + scope answers.

Deterministic, no LLM needed. Implements Scoped Control Applicability (Phase C2).

Filtering logic:
    - Controls with NULL applicability fields are INCLUDED (apply to everyone).
    - Controls with '["all"]' match all queries.
    - Industry: a control applies if its applicable_industries contains the
      requested industry OR contains "all" OR is NULL.
    - Company size: a control applies if its applicable_company_size contains
      the requested size OR contains "all" OR is NULL.
    - Scope signals: a control applies if it has NO scope_conditions, or the
      company has at least one of the required signals (requires_any logic).
"""

from __future__ import annotations

import json
import logging
from typing import Any, Optional

from sqlalchemy import text

from db.session import SessionLocal

logger = logging.getLogger(__name__)

# Valid company sizes (ordered smallest to largest)
VALID_SIZES = ("micro", "small", "medium", "large", "enterprise")


def _parse_json_text(value: Any) -> Any:
    """Parse a TEXT column that stores JSON. Returns None if unparseable."""
    if value is None:
        return None
    if isinstance(value, (list, dict)):
        return value
    if isinstance(value, str):
        try:
            return json.loads(value)
        except (json.JSONDecodeError, ValueError):
            return None
    return None


def _matches_industry(applicable_industries_raw: Any, industry: str) -> bool:
    """Check if a control's applicable_industries matches the requested industry."""
    industries = _parse_json_text(applicable_industries_raw)
    if industries is None:
        return True  # NULL = applies to everyone
    if not isinstance(industries, list):
        return True  # malformed = include
    if "all" in industries:
        return True
    return industry in industries


def _matches_company_size(applicable_company_size_raw: Any, company_size: str) -> bool:
    """Check if a control's applicable_company_size matches the requested size."""
    sizes = _parse_json_text(applicable_company_size_raw)
    if sizes is None:
        return True  # NULL = applies to everyone
    if not isinstance(sizes, list):
        return True  # malformed = include
    if "all" in sizes:
        return True
    return company_size in sizes


def _matches_scope_signals(
    scope_conditions_raw: Any, scope_signals: list[str]
) -> bool:
    """Check if a control's scope_conditions are satisfied by the given signals.

    A control with scope_conditions = {"requires_any": ["uses_ai", "processes_health_data"]}
    matches if the company has at least one of those signals.
    A control with NULL or empty scope_conditions always matches.
    """
    conditions = _parse_json_text(scope_conditions_raw)
    if conditions is None:
        return True  # no conditions = applies to everyone
    if not isinstance(conditions, dict):
        return True  # malformed = include
    requires_any = conditions.get("requires_any", [])
    if not requires_any:
        return True  # no required signals = applies to everyone

    # The company must have at least one of the required signals
    return bool(set(requires_any) & set(scope_signals))
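The `requires_any` check above reduces to a set intersection, with every failure mode (NULL, malformed JSON, empty condition list) deliberately falling back to "include the control". A self-contained sketch of that behavior, mirroring `_matches_scope_signals` (the standalone name is ours):

```python
import json


def matches_scope_signals(scope_conditions_raw, scope_signals):
    """NULL, malformed, or empty conditions match; otherwise require overlap."""
    if scope_conditions_raw is None:
        return True
    conditions = scope_conditions_raw
    if isinstance(conditions, str):
        try:
            conditions = json.loads(conditions)
        except ValueError:
            return True  # malformed JSON = include
    if not isinstance(conditions, dict):
        return True
    requires_any = conditions.get("requires_any", [])
    if not requires_any:
        return True
    # at least one required signal must be present
    return bool(set(requires_any) & set(scope_signals))


cond = '{"requires_any": ["uses_ai", "processes_health_data"]}'
print(matches_scope_signals(cond, ["uses_ai"]))       # True
print(matches_scope_signals(cond, ["on_prem_only"]))  # False
print(matches_scope_signals(None, []))                # True
```

Failing open like this trades precision for safety: a control is only ever excluded when its conditions are well-formed and demonstrably unmet.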


def get_applicable_controls(
    db,
    industry: Optional[str] = None,
    company_size: Optional[str] = None,
    scope_signals: Optional[list[str]] = None,
    limit: int = 100,
    offset: int = 0,
) -> dict[str, Any]:
    """
    Returns controls applicable to the given company profile.

    Uses SQL pre-filtering with LIKE for performance, then Python post-filtering
    for precise JSON matching (since the columns are TEXT, not JSONB).

    Args:
        db: SQLAlchemy session
        industry: e.g. "Telekommunikation", "Energie", "Gesundheitswesen"
        company_size: e.g. "medium", "large", "enterprise"
        scope_signals: e.g. ["uses_ai", "third_country_transfer"]
        limit: max results to return (applied after filtering)
        offset: pagination offset (applied after filtering)

    Returns:
        dict with total_applicable count, paginated controls, and breakdown stats
    """
    if scope_signals is None:
        scope_signals = []

    # SQL pre-filter: broad match to reduce Python-side filtering
    query = """
        SELECT id, framework_id, control_id, title, objective, rationale,
               scope, requirements, test_procedure, evidence,
               severity, risk_score, implementation_effort,
               evidence_confidence, open_anchors, release_state, tags,
               license_rule, source_original_text, source_citation,
               customer_visible, verification_method, category, evidence_type,
               target_audience, generation_metadata, generation_strategy,
               applicable_industries, applicable_company_size, scope_conditions,
               parent_control_uuid, decomposition_method, pipeline_version,
               created_at, updated_at
        FROM canonical_controls
        WHERE release_state NOT IN ('duplicate', 'deprecated', 'rejected')
    """
    params: dict[str, Any] = {}

    # SQL-level pre-filtering (broad, may include false positives)
    if industry:
        query += """ AND (applicable_industries IS NULL
            OR applicable_industries LIKE '%"all"%'
            OR applicable_industries LIKE '%' || :industry || '%')"""
        params["industry"] = industry

    if company_size:
        query += """ AND (applicable_company_size IS NULL
            OR applicable_company_size LIKE '%"all"%'
            OR applicable_company_size LIKE '%' || :company_size || '%')"""
        params["company_size"] = company_size

    # For scope_signals we cannot do precise SQL filtering on requires_any,
    # but we can at least exclude controls whose scope_conditions text
    # does not contain the requested signal (if there is only 1 signal).
    # With multiple signals we skip the SQL pre-filter and do it in Python.
    if scope_signals and len(scope_signals) == 1:
        query += """ AND (scope_conditions IS NULL
            OR scope_conditions LIKE '%' || :scope_sig || '%')"""
        params["scope_sig"] = scope_signals[0]

    query += " ORDER BY control_id"

    rows = db.execute(text(query), params).fetchall()

    # Python-level precise filtering
    applicable = []
    for r in rows:
        if industry and not _matches_industry(r.applicable_industries, industry):
            continue
        if company_size and not _matches_company_size(
            r.applicable_company_size, company_size
        ):
            continue
        if scope_signals and not _matches_scope_signals(
            r.scope_conditions, scope_signals
        ):
            continue
        applicable.append(r)

    total_applicable = len(applicable)

    # Apply pagination
    paginated = applicable[offset : offset + limit]

    # Build domain breakdown
    domain_counts: dict[str, int] = {}
    for r in applicable:
        domain = r.control_id.split("-")[0].upper() if r.control_id else "UNKNOWN"
        domain_counts[domain] = domain_counts.get(domain, 0) + 1

    # Build severity breakdown
    severity_counts: dict[str, int] = {}
    for r in applicable:
        sev = r.severity or "unknown"
        severity_counts[sev] = severity_counts.get(sev, 0) + 1

    # Build industry breakdown (from matched controls)
    industry_counts: dict[str, int] = {}
    for r in applicable:
        industries = _parse_json_text(r.applicable_industries)
        if isinstance(industries, list):
            for ind in industries:
                industry_counts[ind] = industry_counts.get(ind, 0) + 1
        else:
            industry_counts["unclassified"] = (
                industry_counts.get("unclassified", 0) + 1
            )

    return {
        "total_applicable": total_applicable,
        "limit": limit,
        "offset": offset,
        "controls": [_row_to_control(r) for r in paginated],
|
||||
"breakdown": {
|
||||
"by_domain": domain_counts,
|
||||
"by_severity": severity_counts,
|
||||
"by_industry": industry_counts,
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def _row_to_control(r) -> dict[str, Any]:
|
||||
"""Convert a DB row to a control dict for API response."""
|
||||
return {
|
||||
"id": str(r.id),
|
||||
"framework_id": str(r.framework_id),
|
||||
"control_id": r.control_id,
|
||||
"title": r.title,
|
||||
"objective": r.objective,
|
||||
"rationale": r.rationale,
|
||||
"severity": r.severity,
|
||||
"category": r.category,
|
||||
"verification_method": r.verification_method,
|
||||
"evidence_type": getattr(r, "evidence_type", None),
|
||||
"target_audience": r.target_audience,
|
||||
"applicable_industries": r.applicable_industries,
|
||||
"applicable_company_size": r.applicable_company_size,
|
||||
"scope_conditions": r.scope_conditions,
|
||||
"release_state": r.release_state,
|
||||
"control_id_domain": (
|
||||
r.control_id.split("-")[0].upper() if r.control_id else None
|
||||
),
|
||||
"created_at": r.created_at.isoformat() if r.created_at else None,
|
||||
"updated_at": r.updated_at.isoformat() if r.updated_at else None,
|
||||
}
|
||||
631
control-pipeline/services/batch_dedup_runner.py
Normal file
@@ -0,0 +1,631 @@
"""Batch Dedup Runner — Orchestrates deduplication of ~85k atomic controls.

Reduces Pass 0b controls from ~85k to ~18-25k unique Master Controls via:
  Phase 1: Intra-Group Dedup — same merge_group_hint → pick best, link rest
           (85k → ~52k, mostly title-identical short-circuit, no embeddings)
  Phase 2: Cross-Group Dedup — embed masters, search Qdrant for similar
           masters with different hints (52k → ~18-25k)

All Pass 0b controls have pattern_id=NULL. The primary grouping key is
merge_group_hint (format: "action_type:norm_obj:trigger_key"), which
encodes the normalized action, object, and trigger.

Usage:
    runner = BatchDedupRunner(db)
    stats = await runner.run(dry_run=True)   # preview
    stats = await runner.run(dry_run=False)  # execute
    stats = await runner.run(hint_filter="implement:multi_factor_auth:none")
"""

import asyncio
import json
import logging
import time
from collections import defaultdict

from sqlalchemy import text

from services.control_dedup import (
    canonicalize_text,
    ensure_qdrant_collection,
    get_embedding,
    normalize_action,
    normalize_object,
    qdrant_search_cross_regulation,
    qdrant_upsert,
    LINK_THRESHOLD,
    REVIEW_THRESHOLD,
)

logger = logging.getLogger(__name__)

DEDUP_COLLECTION = "atomic_controls_dedup"


# ── Quality Score ────────────────────────────────────────────────────────


def quality_score(control: dict) -> float:
    """Score a control by richness of requirements, tests, evidence, and objective.

    Higher score = better candidate for master control.
    """
    score = 0.0

    reqs = control.get("requirements") or "[]"
    if isinstance(reqs, str):
        try:
            reqs = json.loads(reqs)
        except (json.JSONDecodeError, TypeError):
            reqs = []
    score += len(reqs) * 2.0

    tests = control.get("test_procedure") or "[]"
    if isinstance(tests, str):
        try:
            tests = json.loads(tests)
        except (json.JSONDecodeError, TypeError):
            tests = []
    score += len(tests) * 1.5

    evidence = control.get("evidence") or "[]"
    if isinstance(evidence, str):
        try:
            evidence = json.loads(evidence)
        except (json.JSONDecodeError, TypeError):
            evidence = []
    score += len(evidence) * 1.0

    objective = control.get("objective") or ""
    score += min(len(objective) / 200, 3.0)

    return score


# ── Batch Dedup Runner ───────────────────────────────────────────────────


class BatchDedupRunner:
    """Batch dedup orchestrator for existing Pass 0b atomic controls."""

    def __init__(self, db, collection: str = DEDUP_COLLECTION):
        self.db = db
        self.collection = collection
        self.stats = {
            "total_controls": 0,
            "unique_hints": 0,
            "phase1_groups_processed": 0,
            "masters": 0,
            "linked": 0,
            "review": 0,
            "new_controls": 0,
            "parent_links_transferred": 0,
            "cross_group_linked": 0,
            "cross_group_review": 0,
            "errors": 0,
            "skipped_title_identical": 0,
        }
        self._progress_phase = ""
        self._progress_count = 0
        self._progress_total = 0

    async def run(
        self,
        dry_run: bool = False,
        hint_filter: str | None = None,
    ) -> dict:
        """Run the full batch dedup pipeline.

        Args:
            dry_run: If True, compute stats but don't modify DB/Qdrant.
            hint_filter: If set, only process groups matching this hint prefix.

        Returns:
            Stats dict with counts.
        """
        start = time.monotonic()
        logger.info("BatchDedup starting (dry_run=%s, hint_filter=%s)",
                    dry_run, hint_filter)

        if not dry_run:
            await ensure_qdrant_collection(collection=self.collection)

        # Phase 1: Intra-group dedup (same merge_group_hint)
        self._progress_phase = "phase1"
        groups = self._load_merge_groups(hint_filter)
        self._progress_total = self.stats["total_controls"]

        for hint, controls in groups:
            try:
                await self._process_hint_group(hint, controls, dry_run)
                self.stats["phase1_groups_processed"] += 1
            except Exception as e:
                logger.error("BatchDedup Phase 1 error on hint %s: %s", hint, e)
                self.stats["errors"] += 1
                try:
                    self.db.rollback()
                except Exception:
                    pass

        logger.info(
            "BatchDedup Phase 1 done: %d masters, %d linked, %d review",
            self.stats["masters"], self.stats["linked"], self.stats["review"],
        )

        # Phase 2: Cross-group dedup via embeddings
        if not dry_run:
            self._progress_phase = "phase2"
            await self._run_cross_group_pass()

        elapsed = time.monotonic() - start
        self.stats["elapsed_seconds"] = round(elapsed, 1)
        logger.info("BatchDedup completed in %.1fs: %s", elapsed, self.stats)
        return self.stats

    def _load_merge_groups(self, hint_filter: str | None = None) -> list:
        """Load all Pass 0b controls grouped by merge_group_hint, largest first."""
        conditions = [
            "decomposition_method = 'pass0b'",
            "release_state != 'deprecated'",
            "release_state != 'duplicate'",
        ]
        params = {}

        if hint_filter:
            conditions.append("generation_metadata->>'merge_group_hint' LIKE :hf")
            params["hf"] = f"{hint_filter}%"

        where = " AND ".join(conditions)
        rows = self.db.execute(text(f"""
            SELECT id::text, control_id, title, objective,
                   pattern_id, requirements::text, test_procedure::text,
                   evidence::text, release_state,
                   generation_metadata->>'merge_group_hint' as merge_group_hint,
                   generation_metadata->>'action_object_class' as action_object_class
            FROM canonical_controls
            WHERE {where}
            ORDER BY control_id
        """), params).fetchall()

        by_hint = defaultdict(list)
        for r in rows:
            by_hint[r[9] or ""].append({
                "uuid": r[0],
                "control_id": r[1],
                "title": r[2],
                "objective": r[3],
                "pattern_id": r[4],
                "requirements": r[5],
                "test_procedure": r[6],
                "evidence": r[7],
                "release_state": r[8],
                "merge_group_hint": r[9] or "",
                "action_object_class": r[10] or "",
            })

        self.stats["total_controls"] = len(rows)
        self.stats["unique_hints"] = len(by_hint)

        sorted_groups = sorted(by_hint.items(), key=lambda x: len(x[1]), reverse=True)
        logger.info("BatchDedup loaded %d controls in %d hint groups",
                    len(rows), len(sorted_groups))
        return sorted_groups

    def _sub_group_by_merge_hint(self, controls: list) -> dict:
        """Group controls by merge_group_hint composite key."""
        groups = defaultdict(list)
        for c in controls:
            hint = c["merge_group_hint"]
            if hint:
                groups[hint].append(c)
            else:
                groups[f"__no_hint_{c['uuid']}"].append(c)
        return dict(groups)

    async def _process_hint_group(
        self,
        hint: str,
        controls: list,
        dry_run: bool,
    ):
        """Process all controls sharing the same merge_group_hint.

        Within a hint group, all controls share action+object+trigger.
        The best-quality control becomes master, rest are linked as duplicates.
        """
        if len(controls) < 2:
            # Singleton → always master
            self.stats["masters"] += 1
            if not dry_run:
                await self._embed_and_index(controls[0])
            self._progress_count += 1
            self._log_progress(hint)
            return

        # Sort by quality score (best first)
        sorted_group = sorted(controls, key=quality_score, reverse=True)
        master = sorted_group[0]
        self.stats["masters"] += 1

        if not dry_run:
            await self._embed_and_index(master)

        for candidate in sorted_group[1:]:
            # All share the same hint → check title similarity
            if candidate["title"].strip().lower() == master["title"].strip().lower():
                # Identical title → direct link (no embedding needed)
                self.stats["linked"] += 1
                self.stats["skipped_title_identical"] += 1
                if not dry_run:
                    await self._mark_duplicate(master, candidate, confidence=1.0)
            else:
                # Different title within same hint → still likely duplicate
                # Use embedding to verify
                await self._check_and_link_within_group(master, candidate, dry_run)

        self._progress_count += 1
        self._log_progress(hint)

    async def _check_and_link_within_group(
        self,
        master: dict,
        candidate: dict,
        dry_run: bool,
    ):
        """Check if candidate (same hint group) is duplicate of master via embedding."""
        parts = candidate["merge_group_hint"].split(":", 2)
        action = parts[0] if len(parts) > 0 else ""
        obj = parts[1] if len(parts) > 1 else ""

        canonical = canonicalize_text(action, obj, candidate["title"])
        embedding = await get_embedding(canonical)

        if not embedding:
            # Can't embed → link anyway (same hint = same action+object)
            self.stats["linked"] += 1
            if not dry_run:
                await self._mark_duplicate(master, candidate, confidence=0.90)
            return

        # Search the dedup collection (unfiltered — pattern_id is NULL)
        results = await qdrant_search_cross_regulation(
            embedding, top_k=3, collection=self.collection,
        )

        if not results:
            # No Qdrant matches yet (master might not be indexed yet) → link to master
            self.stats["linked"] += 1
            if not dry_run:
                await self._mark_duplicate(master, candidate, confidence=0.90)
            return

        best = results[0]
        best_score = best.get("score", 0.0)
        best_payload = best.get("payload", {})
        best_uuid = best_payload.get("control_uuid", "")

        if best_score > LINK_THRESHOLD:
            self.stats["linked"] += 1
            if not dry_run:
                await self._mark_duplicate_to(best_uuid, candidate, confidence=best_score)
        elif best_score > REVIEW_THRESHOLD:
            self.stats["review"] += 1
            if not dry_run:
                self._write_review(candidate, best_payload, best_score)
        else:
            # Very different despite same hint → new master
            self.stats["new_controls"] += 1
            if not dry_run:
                await self._index_with_embedding(candidate, embedding)

    async def _run_cross_group_pass(self):
        """Phase 2: Find cross-group duplicates among surviving masters.

        After Phase 1, ~52k masters remain. Many have similar semantics
        despite different merge_group_hints (e.g. different German spellings).
        This pass embeds all masters and finds near-duplicates via Qdrant.
        """
        logger.info("BatchDedup Phase 2: Cross-group pass starting...")

        rows = self.db.execute(text("""
            SELECT id::text, control_id, title,
                   generation_metadata->>'merge_group_hint' as merge_group_hint
            FROM canonical_controls
            WHERE decomposition_method = 'pass0b'
              AND release_state != 'duplicate'
              AND release_state != 'deprecated'
            ORDER BY control_id
        """)).fetchall()

        self._progress_total = len(rows)
        self._progress_count = 0
        logger.info("BatchDedup Cross-group: %d masters to check", len(rows))
        cross_linked = 0
        cross_review = 0

        # Process in parallel batches for embedding + Qdrant search
        PARALLEL_BATCH = 10

        async def _embed_and_search(r):
            """Embed one control and search Qdrant — safe for asyncio.gather."""
            hint = r[3] or ""
            parts = hint.split(":", 2)
            action = parts[0] if len(parts) > 0 else ""
            obj = parts[1] if len(parts) > 1 else ""
            canonical = canonicalize_text(action, obj, r[2])
            embedding = await get_embedding(canonical)
            if not embedding:
                return None
            results = await qdrant_search_cross_regulation(
                embedding, top_k=5, collection=self.collection,
            )
            return (r, results)

        for batch_start in range(0, len(rows), PARALLEL_BATCH):
            batch = rows[batch_start:batch_start + PARALLEL_BATCH]
            tasks = [_embed_and_search(r) for r in batch]
            results_batch = await asyncio.gather(*tasks, return_exceptions=True)

            for res in results_batch:
                if res is None or isinstance(res, Exception):
                    if isinstance(res, Exception):
                        logger.error("BatchDedup embed/search error: %s", res)
                        self.stats["errors"] += 1
                    continue

                r, results = res
                ctrl_uuid = r[0]
                hint = r[3] or ""

                if not results:
                    continue

                for match in results:
                    match_score = match.get("score", 0.0)
                    match_payload = match.get("payload", {})
                    match_uuid = match_payload.get("control_uuid", "")

                    if match_uuid == ctrl_uuid:
                        continue

                    if match_score > LINK_THRESHOLD:
                        try:
                            self.db.execute(text("""
                                UPDATE canonical_controls
                                SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
                                WHERE id = CAST(:dup AS uuid)
                                  AND release_state != 'duplicate'
                            """), {"master": match_uuid, "dup": ctrl_uuid})

                            self.db.execute(text("""
                                INSERT INTO control_parent_links
                                    (control_uuid, parent_control_uuid, link_type, confidence)
                                VALUES (CAST(:cu AS uuid), CAST(:pu AS uuid), 'cross_regulation', :conf)
                                ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
                            """), {"cu": match_uuid, "pu": ctrl_uuid, "conf": match_score})

                            transferred = self._transfer_parent_links(match_uuid, ctrl_uuid)
                            self.stats["parent_links_transferred"] += transferred

                            self.db.commit()
                            cross_linked += 1
                        except Exception as e:
                            logger.error("BatchDedup cross-group link error %s→%s: %s",
                                         ctrl_uuid, match_uuid, e)
                            self.db.rollback()
                            self.stats["errors"] += 1
                        break
                    elif match_score > REVIEW_THRESHOLD:
                        self._write_review(
                            {"control_id": r[1], "title": r[2], "objective": "",
                             "merge_group_hint": hint, "pattern_id": None},
                            match_payload, match_score,
                        )
                        cross_review += 1
                        break

            processed = min(batch_start + PARALLEL_BATCH, len(rows))
            self._progress_count = processed
            if processed % 500 < PARALLEL_BATCH:
                logger.info("BatchDedup Cross-group: %d/%d checked, %d linked, %d review",
                            processed, len(rows), cross_linked, cross_review)

        self.stats["cross_group_linked"] = cross_linked
        self.stats["cross_group_review"] = cross_review
        logger.info("BatchDedup Cross-group complete: %d linked, %d review",
                    cross_linked, cross_review)

    # ── Qdrant Helpers ───────────────────────────────────────────────────

    async def _embed_and_index(self, control: dict):
        """Compute embedding and index a control in the dedup Qdrant collection."""
        parts = control["merge_group_hint"].split(":", 2)
        action = parts[0] if len(parts) > 0 else ""
        obj = parts[1] if len(parts) > 1 else ""

        norm_action = normalize_action(action)
        norm_object = normalize_object(obj)
        canonical = canonicalize_text(action, obj, control["title"])
        embedding = await get_embedding(canonical)

        if not embedding:
            return

        await qdrant_upsert(
            point_id=control["uuid"],
            embedding=embedding,
            payload={
                "control_uuid": control["uuid"],
                "control_id": control["control_id"],
                "title": control["title"],
                "pattern_id": control.get("pattern_id"),
                "action_normalized": norm_action,
                "object_normalized": norm_object,
                "canonical_text": canonical,
                "merge_group_hint": control["merge_group_hint"],
            },
            collection=self.collection,
        )

    async def _index_with_embedding(self, control: dict, embedding: list):
        """Index a control with a pre-computed embedding."""
        parts = control["merge_group_hint"].split(":", 2)
        action = parts[0] if len(parts) > 0 else ""
        obj = parts[1] if len(parts) > 1 else ""

        norm_action = normalize_action(action)
        norm_object = normalize_object(obj)
        canonical = canonicalize_text(action, obj, control["title"])

        await qdrant_upsert(
            point_id=control["uuid"],
            embedding=embedding,
            payload={
                "control_uuid": control["uuid"],
                "control_id": control["control_id"],
                "title": control["title"],
                "pattern_id": control.get("pattern_id"),
                "action_normalized": norm_action,
                "object_normalized": norm_object,
                "canonical_text": canonical,
                "merge_group_hint": control["merge_group_hint"],
            },
            collection=self.collection,
        )

    # ── DB Write Helpers ─────────────────────────────────────────────────

    async def _mark_duplicate(self, master: dict, candidate: dict, confidence: float):
        """Mark candidate as duplicate of master, transfer parent links."""
        try:
            self.db.execute(text("""
                UPDATE canonical_controls
                SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
                WHERE id = CAST(:cand AS uuid)
            """), {"master": master["uuid"], "cand": candidate["uuid"]})

            self.db.execute(text("""
                INSERT INTO control_parent_links
                    (control_uuid, parent_control_uuid, link_type, confidence)
                VALUES (CAST(:master AS uuid), CAST(:cand_parent AS uuid), 'dedup_merge', :conf)
                ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
            """), {"master": master["uuid"], "cand_parent": candidate["uuid"], "conf": confidence})

            transferred = self._transfer_parent_links(master["uuid"], candidate["uuid"])
            self.stats["parent_links_transferred"] += transferred

            self.db.commit()
        except Exception as e:
            logger.error("BatchDedup _mark_duplicate error %s→%s: %s",
                         candidate["uuid"], master["uuid"], e)
            self.db.rollback()
            raise

    async def _mark_duplicate_to(self, master_uuid: str, candidate: dict, confidence: float):
        """Mark candidate as duplicate of a Qdrant-matched master."""
        try:
            self.db.execute(text("""
                UPDATE canonical_controls
                SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
                WHERE id = CAST(:cand AS uuid)
            """), {"master": master_uuid, "cand": candidate["uuid"]})

            self.db.execute(text("""
                INSERT INTO control_parent_links
                    (control_uuid, parent_control_uuid, link_type, confidence)
                VALUES (CAST(:master AS uuid), CAST(:cand_parent AS uuid), 'dedup_merge', :conf)
                ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
            """), {"master": master_uuid, "cand_parent": candidate["uuid"], "conf": confidence})

            transferred = self._transfer_parent_links(master_uuid, candidate["uuid"])
            self.stats["parent_links_transferred"] += transferred

            self.db.commit()
        except Exception as e:
            logger.error("BatchDedup _mark_duplicate_to error %s→%s: %s",
                         candidate["uuid"], master_uuid, e)
            self.db.rollback()
            raise

    def _transfer_parent_links(self, master_uuid: str, duplicate_uuid: str) -> int:
        """Move existing parent links from duplicate to master."""
        rows = self.db.execute(text("""
            SELECT parent_control_uuid::text, link_type, confidence,
                   source_regulation, source_article, obligation_candidate_id::text
            FROM control_parent_links
            WHERE control_uuid = CAST(:dup AS uuid)
              AND link_type = 'decomposition'
        """), {"dup": duplicate_uuid}).fetchall()

        transferred = 0
        for r in rows:
            parent_uuid = r[0]
            if parent_uuid == master_uuid:
                continue
            self.db.execute(text("""
                INSERT INTO control_parent_links
                    (control_uuid, parent_control_uuid, link_type, confidence,
                     source_regulation, source_article, obligation_candidate_id)
                VALUES (CAST(:cu AS uuid), CAST(:pu AS uuid), :lt, :conf,
                        :sr, :sa, CAST(:oci AS uuid))
                ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
            """), {
                "cu": master_uuid,
                "pu": parent_uuid,
                "lt": r[1],
                # Use an explicit None check so a stored confidence of 0 survives
                "conf": float(r[2]) if r[2] is not None else 1.0,
                "sr": r[3],
                "sa": r[4],
                "oci": r[5],
            })
            transferred += 1

        return transferred

    def _write_review(self, candidate: dict, matched_payload: dict, score: float):
        """Write a dedup review entry for borderline matches."""
        try:
            self.db.execute(text("""
                INSERT INTO control_dedup_reviews
                    (candidate_control_id, candidate_title, candidate_objective,
                     matched_control_uuid, matched_control_id,
                     similarity_score, dedup_stage, dedup_details)
                VALUES (:ccid, :ct, :co, CAST(:mcu AS uuid), :mci,
                        :ss, 'batch_dedup', CAST(:dd AS jsonb))
            """), {
                "ccid": candidate["control_id"],
                "ct": candidate["title"],
                "co": candidate.get("objective", ""),
                "mcu": matched_payload.get("control_uuid"),
                "mci": matched_payload.get("control_id"),
                "ss": score,
                "dd": json.dumps({
                    "merge_group_hint": candidate.get("merge_group_hint", ""),
                    "pattern_id": candidate.get("pattern_id"),
                }),
            })
            self.db.commit()
        except Exception as e:
            logger.error("BatchDedup _write_review error: %s", e)
            self.db.rollback()
            raise

    # ── Progress ─────────────────────────────────────────────────────────

    def _log_progress(self, hint: str):
        """Log progress every 500 controls."""
        if self._progress_count > 0 and self._progress_count % 500 == 0:
            logger.info(
                "BatchDedup [%s] %d/%d — masters=%d, linked=%d, review=%d",
                self._progress_phase, self._progress_count, self._progress_total,
                self.stats["masters"], self.stats["linked"], self.stats["review"],
            )

    def get_status(self) -> dict:
        """Return current progress stats (for status endpoint)."""
        return {
            "phase": self._progress_phase,
            "progress": self._progress_count,
            "total": self._progress_total,
            **self.stats,
        }
438
control-pipeline/services/citation_backfill.py
Normal file
@@ -0,0 +1,438 @@
"""
|
||||
Citation Backfill Service — enrich existing controls with article/paragraph provenance.
|
||||
|
||||
3-tier matching strategy:
|
||||
Tier 1 — Hash match: sha256(source_original_text) → RAG chunk lookup
|
||||
Tier 2 — Regex parse: split concatenated "DSGVO Art. 35" → regulation + article
|
||||
Tier 3 — Ollama LLM: ask local LLM to identify article/paragraph from text
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime, timezone
|
||||
from typing import Optional
|
||||
|
||||
import httpx
|
||||
from sqlalchemy import text
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from .rag_client import ComplianceRAGClient, RAGSearchResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
|
||||
OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")
|
||||
LLM_TIMEOUT = float(os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180"))
|
||||
|
||||
ALL_COLLECTIONS = [
|
||||
"bp_compliance_ce",
|
||||
"bp_compliance_gesetze",
|
||||
"bp_compliance_datenschutz",
|
||||
"bp_dsfa_corpus",
|
||||
"bp_legal_templates",
|
||||
]
|
||||
|
||||
BACKFILL_SYSTEM_PROMPT = (
|
||||
"Du bist ein Rechtsexperte. Deine Aufgabe ist es, aus einem Gesetzestext "
|
||||
"den genauen Artikel und Absatz zu bestimmen. Antworte NUR mit validem JSON."
|
||||
)
|
||||
|
||||
# Regex to split concatenated source like "DSGVO Art. 35" or "NIS2 Artikel 21 Abs. 2"
|
||||
_SOURCE_ARTICLE_RE = re.compile(
|
||||
r"^(.+?)\s+(Art(?:ikel)?\.?\s*\d+.*)$", re.IGNORECASE
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class MatchResult:
|
||||
article: str
|
||||
paragraph: str
|
||||
method: str # "hash", "regex", "llm"
|
||||
|
||||
|
||||
@dataclass
|
||||
class BackfillResult:
|
||||
total_controls: int = 0
|
||||
matched_hash: int = 0
|
||||
matched_regex: int = 0
|
||||
matched_llm: int = 0
|
||||
unmatched: int = 0
|
||||
updated: int = 0
|
||||
errors: list = field(default_factory=list)
|
||||
|
||||
|
||||
class CitationBackfill:
|
||||
"""Backfill article/paragraph into existing control source_citations."""
|
||||
|
||||
def __init__(self, db: Session, rag_client: ComplianceRAGClient):
|
||||
self.db = db
|
||||
self.rag = rag_client
|
||||
self._rag_index: dict[str, RAGSearchResult] = {}
|
||||
|
||||
async def run(self, dry_run: bool = True, limit: int = 0) -> BackfillResult:
|
||||
"""Main entry: iterate controls missing article/paragraph, match to RAG, update."""
|
||||
result = BackfillResult()
|
||||
|
||||
# Load controls needing backfill
|
||||
controls = self._load_controls_needing_backfill(limit)
|
||||
result.total_controls = len(controls)
|
||||
logger.info("Backfill: %d controls need article/paragraph enrichment", len(controls))
|
||||
|
||||
if not controls:
|
||||
return result
|
||||
|
||||
# Collect hashes we need to find — only build index for controls with source text
|
||||
needed_hashes: set[str] = set()
|
||||
for ctrl in controls:
|
||||
src = ctrl.get("source_original_text")
|
||||
if src:
|
||||
needed_hashes.add(hashlib.sha256(src.encode()).hexdigest())
|
||||
|
||||
if needed_hashes:
|
||||
# Build targeted RAG index — only scroll collections that our controls reference
|
||||
logger.info("Building targeted RAG hash index for %d source texts...", len(needed_hashes))
|
||||
await self._build_rag_index_targeted(controls)
|
||||
logger.info("RAG index built: %d chunks indexed, %d hashes needed", len(self._rag_index), len(needed_hashes))
|
||||
else:
|
||||
logger.info("No source_original_text found — skipping RAG index build")
|
||||
|
||||
# Process each control
|
||||
for i, ctrl in enumerate(controls):
|
||||
if i > 0 and i % 100 == 0:
|
||||
logger.info("Backfill progress: %d/%d processed", i, result.total_controls)
|
||||
|
||||
try:
|
||||
match = await self._match_control(ctrl)
|
||||
if match:
|
||||
if match.method == "hash":
|
||||
result.matched_hash += 1
|
||||
elif match.method == "regex":
|
||||
result.matched_regex += 1
|
||||
elif match.method == "llm":
|
||||
result.matched_llm += 1
|
||||
|
||||
if not dry_run:
|
||||
self._update_control(ctrl, match)
|
||||
result.updated += 1
|
||||
else:
|
||||
logger.debug(
|
||||
"DRY RUN: Would update %s with article=%s paragraph=%s (method=%s)",
|
||||
ctrl["control_id"], match.article, match.paragraph, match.method,
|
||||
)
|
||||
else:
|
||||
result.unmatched += 1
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Error backfilling {ctrl.get('control_id', '?')}: {e}"
|
||||
logger.error(error_msg)
|
||||
result.errors.append(error_msg)
|
||||
|
||||
if not dry_run:
|
||||
try:
|
||||
self.db.commit()
|
||||
except Exception as e:
|
||||
logger.error("Backfill commit failed: %s", e)
|
||||
result.errors.append(f"Commit failed: {e}")
|
||||
|
||||
logger.info(
|
||||
"Backfill complete: %d total, hash=%d regex=%d llm=%d unmatched=%d updated=%d",
|
||||
result.total_controls, result.matched_hash, result.matched_regex,
|
||||
result.matched_llm, result.unmatched, result.updated,
|
||||
)
|
||||
return result
|
||||
|
    def _load_controls_needing_backfill(self, limit: int = 0) -> list[dict]:
        """Load controls where source_citation exists but lacks a separate 'article' key."""
        query = """
            SELECT id, control_id, source_citation, source_original_text,
                   generation_metadata, license_rule
            FROM canonical_controls
            WHERE license_rule IN (1, 2)
              AND source_citation IS NOT NULL
              AND (
                  source_citation->>'article' IS NULL
                  OR source_citation->>'article' = ''
              )
            ORDER BY control_id
        """
        if limit > 0:
            query += f" LIMIT {limit}"

        result = self.db.execute(text(query))
        cols = result.keys()
        controls = []
        for row in result:
            ctrl = dict(zip(cols, row))
            ctrl["id"] = str(ctrl["id"])
            # Parse JSON fields
            for jf in ("source_citation", "generation_metadata"):
                if isinstance(ctrl.get(jf), str):
                    try:
                        ctrl[jf] = json.loads(ctrl[jf])
                    except (json.JSONDecodeError, TypeError):
                        ctrl[jf] = {}
            controls.append(ctrl)
        return controls

    async def _build_rag_index_targeted(self, controls: list[dict]):
        """Build RAG index by scrolling only collections relevant to our controls.

        Uses regulation codes from generation_metadata to identify which collections
        to search, falling back to all collections only if needed.
        """
        # Determine which collections are relevant based on regulation codes
        regulation_to_collection = self._map_regulations_to_collections(controls)
        collections_to_search = set(regulation_to_collection.values()) or set(ALL_COLLECTIONS)

        logger.info("Targeted index: searching %d collections: %s",
                    len(collections_to_search), ", ".join(collections_to_search))

        for collection in collections_to_search:
            offset = None
            page = 0
            seen_offsets: set[str] = set()
            while True:
                chunks, next_offset = await self.rag.scroll(
                    collection=collection, offset=offset, limit=200,
                )
                if not chunks:
                    break
                for chunk in chunks:
                    if chunk.text and len(chunk.text.strip()) >= 50:
                        h = hashlib.sha256(chunk.text.encode()).hexdigest()
                        self._rag_index[h] = chunk
                page += 1
                if page % 50 == 0:
                    logger.info("Indexing %s: page %d (%d chunks so far)",
                                collection, page, len(self._rag_index))
                if not next_offset:
                    break
                if next_offset in seen_offsets:
                    logger.warning("Scroll loop in %s at page %d — stopping", collection, page)
                    break
                seen_offsets.add(next_offset)
                offset = next_offset

            logger.info("Indexed collection %s: %d pages", collection, page)

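The scroll loop above keeps a `seen_offsets` set so a misbehaving paginator that hands back a repeating cursor cannot spin forever. The guard can be sketched in isolation; all names below (`drain_pages`, the fake page table) are hypothetical stand-ins, not part of this codebase:

```python
def drain_pages(fetch, start=None):
    """Drain a cursor-paginated source, stopping on exhaustion or a cycle."""
    items, offset, seen = [], start, set()
    while True:
        batch, next_offset = fetch(offset)
        if not batch:
            break
        items.extend(batch)
        if not next_offset or next_offset in seen:
            break  # exhausted, or the paginator is looping
        seen.add(next_offset)
        offset = next_offset
    return items

# Fake paginator that cycles back to offset "a" after "b"
pages = {None: ([1, 2], "a"), "a": ([3], "b"), "b": ([4], "a")}
result = drain_pages(lambda off: pages[off])
```

Without the `seen` check, the fake paginator above would loop between "a" and "b" indefinitely; with it, the drain terminates after one full pass.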
    def _map_regulations_to_collections(self, controls: list[dict]) -> dict[str, str]:
        """Map regulation codes from controls to likely Qdrant collections."""
        # Heuristic: regulation code prefix → collection
        collection_map = {
            "eu_": "bp_compliance_gesetze",
            "dsgvo": "bp_compliance_datenschutz",
            "bdsg": "bp_compliance_gesetze",
            "ttdsg": "bp_compliance_gesetze",
            "nist_": "bp_compliance_ce",
            "owasp": "bp_compliance_ce",
            "bsi_": "bp_compliance_ce",
            "enisa": "bp_compliance_ce",
            "at_": "bp_compliance_recht",
            "fr_": "bp_compliance_recht",
            "es_": "bp_compliance_recht",
        }
        result: dict[str, str] = {}
        for ctrl in controls:
            meta = ctrl.get("generation_metadata") or {}
            reg = meta.get("source_regulation", "")
            if not reg:
                continue
            for prefix, coll in collection_map.items():
                if reg.startswith(prefix):
                    result[reg] = coll
                    break
            else:
                # Unknown regulation — search all collections
                for coll in ALL_COLLECTIONS:
                    result[f"_all_{coll}"] = coll
        return result

    async def _match_control(self, ctrl: dict) -> Optional[MatchResult]:
        """3-tier matching: hash → regex → LLM."""

        # Tier 1: hash match against the RAG index
        source_text = ctrl.get("source_original_text")
        if source_text:
            h = hashlib.sha256(source_text.encode()).hexdigest()
            chunk = self._rag_index.get(h)
            if chunk and (chunk.article or chunk.paragraph):
                return MatchResult(
                    article=chunk.article or "",
                    paragraph=chunk.paragraph or "",
                    method="hash",
                )

        # Tier 2: regex-parse the concatenated source string
        citation = ctrl.get("source_citation") or {}
        source_str = citation.get("source", "")
        parsed = _parse_concatenated_source(source_str)
        if parsed and parsed["article"]:
            return MatchResult(
                article=parsed["article"],
                paragraph="",  # Regex can't extract a paragraph from the concatenated format
                method="regex",
            )

        # Tier 3: Ollama LLM
        if source_text:
            return await self._llm_match(ctrl)

        return None

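Tier 1 relies on exact text identity: a control's source text only hits the index when its SHA-256 digest matches an indexed chunk byte-for-byte. A minimal standalone sketch of that lookup (the sample texts and variable names are illustrative only):

```python
import hashlib

# Build a digest → text index, as the targeted RAG indexer does
index = {}
for text in ["Die Verarbeitung ist rechtmäßig ...", "Der Verantwortliche führt ein Verzeichnis ..."]:
    index[hashlib.sha256(text.encode()).hexdigest()] = text

probe = "Der Verantwortliche führt ein Verzeichnis ..."
hit = index.get(hashlib.sha256(probe.encode()).hexdigest())

# Any whitespace or encoding difference produces a different digest and a miss
miss = index.get(hashlib.sha256((probe + " ").encode()).hexdigest())
```

This brittleness is exactly why the tiers exist: when normalization differences break the hash, the regex and LLM tiers pick up the slack.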
    async def _llm_match(self, ctrl: dict) -> Optional[MatchResult]:
        """Use Ollama to identify article/paragraph from source text."""
        citation = ctrl.get("source_citation") or {}
        regulation_name = citation.get("source", "")
        metadata = ctrl.get("generation_metadata") or {}
        regulation_code = metadata.get("source_regulation", "")
        source_text = ctrl.get("source_original_text", "")

        prompt = f"""Analysiere den folgenden Gesetzestext und bestimme den genauen Artikel und Absatz.

Gesetz: {regulation_name} (Code: {regulation_code})

Text:
---
{source_text[:2000]}
---

Antworte NUR mit JSON:
{{"article": "Art. XX", "paragraph": "Abs. Y"}}

Falls kein spezifischer Absatz erkennbar ist, setze paragraph auf "".
Falls kein Artikel erkennbar ist, setze article auf "".
Bei deutschen Gesetzen mit § verwende: "§ XX" statt "Art. XX"."""

        try:
            raw = await _llm_ollama(prompt, BACKFILL_SYSTEM_PROMPT)
            data = _parse_json(raw)
            if data and (data.get("article") or data.get("paragraph")):
                return MatchResult(
                    article=data.get("article", ""),
                    paragraph=data.get("paragraph", ""),
                    method="llm",
                )
        except Exception as e:
            logger.warning("LLM match failed for %s: %s", ctrl.get("control_id"), e)

        return None

    def _update_control(self, ctrl: dict, match: MatchResult):
        """Update source_citation and generation_metadata in DB."""
        citation = ctrl.get("source_citation") or {}

        # Clean the source name: remove concatenated article if present
        source_str = citation.get("source", "")
        parsed = _parse_concatenated_source(source_str)
        if parsed:
            citation["source"] = parsed["name"]

        # Add separate article/paragraph fields
        citation["article"] = match.article
        citation["paragraph"] = match.paragraph

        # Update generation_metadata
        metadata = ctrl.get("generation_metadata") or {}
        if match.article:
            metadata["source_article"] = match.article
            metadata["source_paragraph"] = match.paragraph
        metadata["backfill_method"] = match.method
        metadata["backfill_at"] = datetime.now(timezone.utc).isoformat()

        self.db.execute(
            text("""
                UPDATE canonical_controls
                SET source_citation = :citation,
                    generation_metadata = :metadata,
                    updated_at = NOW()
                WHERE id = CAST(:id AS uuid)
            """),
            {
                "id": ctrl["id"],
                "citation": json.dumps(citation),
                "metadata": json.dumps(metadata),
            },
        )


def _parse_concatenated_source(source: str) -> Optional[dict]:
    """Parse 'DSGVO Art. 35' → {name: 'DSGVO', article: 'Art. 35'}.

    Also handles '§' format: 'BDSG § 42' → {name: 'BDSG', article: '§ 42'}.
    """
    if not source:
        return None

    # Try Art./Artikel pattern
    m = _SOURCE_ARTICLE_RE.match(source)
    if m:
        return {"name": m.group(1).strip(), "article": m.group(2).strip()}

    # Try § pattern
    m2 = re.match(r"^(.+?)\s+(§\s*\d+.*)$", source)
    if m2:
        return {"name": m2.group(1).strip(), "article": m2.group(2).strip()}

    return None


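The splitting behavior can be exercised standalone. Since `_SOURCE_ARTICLE_RE` is defined elsewhere in the module, the sketch below substitutes a plausible local regex for the Art./Artikel case; treat that pattern as an assumption, not the module's actual one:

```python
import re

# Hypothetical stand-in for _SOURCE_ARTICLE_RE: "<name> Art. <n>..." / "<name> Artikel <n>..."
_ART_RE = re.compile(r"^(.+?)\s+((?:Art\.|Artikel)\s*\d+.*)$")

def parse_source(source):
    m = _ART_RE.match(source)
    if m:
        return {"name": m.group(1).strip(), "article": m.group(2).strip()}
    m = re.match(r"^(.+?)\s+(§\s*\d+.*)$", source)
    if m:
        return {"name": m.group(1).strip(), "article": m.group(2).strip()}
    return None

a = parse_source("DSGVO Art. 35")   # article-style citation
b = parse_source("BDSG § 42")       # paragraph-sign style
c = parse_source("NIS2-Richtlinie") # no embedded article → None
```

The lazy `(.+?)` keeps the regulation name as short as possible, so the article/paragraph suffix is captured whole even when it carries trailing detail like "Art. 35 Abs. 1".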
async def _llm_ollama(prompt: str, system_prompt: Optional[str] = None) -> str:
    """Call Ollama chat API for backfill matching."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    payload = {
        "model": OLLAMA_MODEL,
        "messages": messages,
        "stream": False,
        "format": "json",
        "options": {"num_predict": 256},
        "think": False,
    }

    try:
        async with httpx.AsyncClient(timeout=LLM_TIMEOUT) as client:
            resp = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
            if resp.status_code != 200:
                logger.error("Ollama backfill failed %d: %s", resp.status_code, resp.text[:300])
                return ""
            data = resp.json()
            msg = data.get("message", {})
            if isinstance(msg, dict):
                return msg.get("content", "")
            return data.get("response", str(msg))
    except Exception as e:
        logger.error("Ollama backfill request failed: %s", e)
        return ""


def _parse_json(raw: str) -> Optional[dict]:
    """Extract JSON object from LLM output."""
    if not raw:
        return None
    # Try direct parse
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Try extracting from markdown code block
    m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    # Try finding first { ... }
    m = re.search(r"\{[^{}]*\}", raw)
    if m:
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            pass
    return None

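The three-step cascade above (direct parse, fenced code block, first bare object) covers the usual ways an LLM wraps its JSON despite being told not to. A self-contained copy of the same cascade, exercised against the three cases:

```python
import json
import re

def extract_json(raw):
    """Same cascade as _parse_json: direct → fenced block → first bare object."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
    if m:
        try:
            return json.loads(m.group(1))
        except json.JSONDecodeError:
            pass
    m = re.search(r"\{[^{}]*\}", raw)
    if m:
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            pass
    return None

direct = extract_json('{"article": "Art. 5"}')
fenced = extract_json('Hier das Ergebnis:\n```json\n{"article": "Art. 5"}\n```')
chatty = extract_json('Ich denke {"article": "Art. 5"} passt.')
```

All three inputs yield the same dict. Note the last pattern (`\{[^{}]*\}`) only catches flat objects, which is sufficient for the `{"article": ..., "paragraph": ...}` shape the backfill prompt requests.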
546 control-pipeline/services/control_composer.py Normal file
@@ -0,0 +1,546 @@
"""Control Composer — Pattern + Obligation → Master Control.

Takes an obligation (from ObligationExtractor) and a matched control pattern
(from PatternMatcher), then uses an LLM to compose a structured, actionable
Master Control. Replaces the old Stage 3 (STRUCTURE/REFORM) with a
pattern-guided approach.

Three composition modes based on license rules:
    Rule 1: Obligation + Pattern + original text → full control
    Rule 2: Obligation + Pattern + original text + citation → control
    Rule 3: Obligation + Pattern (NO original text) → reformulated control

Fallback: No pattern match → basic generation (tagged needs_pattern_assignment)

Part of the Multi-Layer Control Architecture (Phase 6 of 8).
"""

import json
import logging
import os
from dataclasses import dataclass, field
from typing import Optional

from services.obligation_extractor import (
    ObligationMatch,
    _llm_ollama,
    _parse_json,
)
from services.pattern_matcher import (
    ControlPattern,
    PatternMatchResult,
)

logger = logging.getLogger(__name__)

OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")

# Valid values for generated control fields
VALID_SEVERITIES = {"low", "medium", "high", "critical"}
VALID_EFFORTS = {"s", "m", "l", "xl"}
VALID_VERIFICATION = {"code_review", "document", "tool", "hybrid"}


@dataclass
class ComposedControl:
    """A Master Control composed from an obligation + pattern."""

    # Core fields (match canonical_controls schema)
    control_id: str = ""
    title: str = ""
    objective: str = ""
    rationale: str = ""
    scope: dict = field(default_factory=dict)
    requirements: list = field(default_factory=list)
    test_procedure: list = field(default_factory=list)
    evidence: list = field(default_factory=list)
    severity: str = "medium"
    risk_score: float = 5.0
    implementation_effort: str = "m"
    open_anchors: list = field(default_factory=list)
    release_state: str = "draft"
    tags: list = field(default_factory=list)
    # 3-Rule License fields
    license_rule: Optional[int] = None
    source_original_text: Optional[str] = None
    source_citation: Optional[dict] = None
    customer_visible: bool = True
    # Classification
    verification_method: Optional[str] = None
    category: Optional[str] = None
    target_audience: Optional[list] = None
    # Pattern + Obligation linkage
    pattern_id: Optional[str] = None
    obligation_ids: list = field(default_factory=list)
    # Metadata
    generation_metadata: dict = field(default_factory=dict)
    composition_method: str = "pattern_guided"  # pattern_guided | fallback

    def to_dict(self) -> dict:
        """Serialize for DB storage or API response."""
        return {
            "control_id": self.control_id,
            "title": self.title,
            "objective": self.objective,
            "rationale": self.rationale,
            "scope": self.scope,
            "requirements": self.requirements,
            "test_procedure": self.test_procedure,
            "evidence": self.evidence,
            "severity": self.severity,
            "risk_score": self.risk_score,
            "implementation_effort": self.implementation_effort,
            "open_anchors": self.open_anchors,
            "release_state": self.release_state,
            "tags": self.tags,
            "license_rule": self.license_rule,
            "source_original_text": self.source_original_text,
            "source_citation": self.source_citation,
            "customer_visible": self.customer_visible,
            "verification_method": self.verification_method,
            "category": self.category,
            "target_audience": self.target_audience,
            "pattern_id": self.pattern_id,
            "obligation_ids": self.obligation_ids,
            "generation_metadata": self.generation_metadata,
            "composition_method": self.composition_method,
        }

class ControlComposer:
    """Composes Master Controls from obligations + patterns.

    Usage::

        composer = ControlComposer()

        control = await composer.compose(
            obligation=obligation_match,
            pattern_result=pattern_match_result,
            chunk_text="...",
            license_rule=1,
            source_citation={...},
        )
    """

    async def compose(
        self,
        obligation: ObligationMatch,
        pattern_result: PatternMatchResult,
        chunk_text: Optional[str] = None,
        license_rule: int = 3,
        source_citation: Optional[dict] = None,
        regulation_code: Optional[str] = None,
    ) -> ComposedControl:
        """Compose a Master Control from obligation + pattern.

        Args:
            obligation: The extracted obligation (from ObligationExtractor).
            pattern_result: The matched pattern (from PatternMatcher).
            chunk_text: Original RAG chunk text (only used for Rules 1-2).
            license_rule: 1=free, 2=citation, 3=restricted.
            source_citation: Citation metadata for Rule 2.
            regulation_code: Source regulation code.

        Returns:
            ComposedControl ready for storage.
        """
        pattern = pattern_result.pattern if pattern_result else None

        if pattern:
            control = await self._compose_with_pattern(
                obligation, pattern, chunk_text, license_rule, source_citation,
            )
        else:
            control = await self._compose_fallback(
                obligation, chunk_text, license_rule, source_citation,
            )

        # Set linkage fields
        control.pattern_id = pattern.id if pattern else None
        if obligation.obligation_id:
            control.obligation_ids = [obligation.obligation_id]

        # Set license fields
        control.license_rule = license_rule
        if license_rule in (1, 2) and chunk_text:
            control.source_original_text = chunk_text
        if license_rule == 2 and source_citation:
            control.source_citation = source_citation
        if license_rule == 3:
            control.customer_visible = False
            control.source_original_text = None
            control.source_citation = None

        # Build metadata
        control.generation_metadata = {
            "composition_method": control.composition_method,
            "pattern_id": control.pattern_id,
            "pattern_confidence": round(pattern_result.confidence, 3) if pattern_result else 0,
            "pattern_method": pattern_result.method if pattern_result else "none",
            "obligation_id": obligation.obligation_id,
            "obligation_method": obligation.method,
            "obligation_confidence": round(obligation.confidence, 3),
            "license_rule": license_rule,
            "regulation_code": regulation_code,
        }

        # Validate and fix fields
        _validate_control(control)

        return control

    async def compose_batch(
        self,
        items: list[dict],
    ) -> list[ComposedControl]:
        """Compose multiple controls.

        Args:
            items: List of dicts with keys: obligation, pattern_result,
                chunk_text, license_rule, source_citation, regulation_code.

        Returns:
            List of ComposedControl instances.
        """
        results = []
        for item in items:
            control = await self.compose(
                obligation=item["obligation"],
                pattern_result=item.get("pattern_result", PatternMatchResult()),
                chunk_text=item.get("chunk_text"),
                license_rule=item.get("license_rule", 3),
                source_citation=item.get("source_citation"),
                regulation_code=item.get("regulation_code"),
            )
            results.append(control)
        return results

    # -----------------------------------------------------------------------
    # Pattern-guided composition
    # -----------------------------------------------------------------------

    async def _compose_with_pattern(
        self,
        obligation: ObligationMatch,
        pattern: ControlPattern,
        chunk_text: Optional[str],
        license_rule: int,
        source_citation: Optional[dict],
    ) -> ComposedControl:
        """Use LLM to fill the pattern template with obligation-specific details."""
        prompt = _build_compose_prompt(obligation, pattern, chunk_text, license_rule)
        system_prompt = _compose_system_prompt(license_rule)

        llm_result = await _llm_ollama(prompt, system_prompt)
        if not llm_result:
            return self._compose_from_template(obligation, pattern)

        parsed = _parse_json(llm_result)
        if not parsed:
            return self._compose_from_template(obligation, pattern)

        control = ComposedControl(
            title=parsed.get("title", pattern.name_de)[:255],
            objective=parsed.get("objective", pattern.objective_template),
            rationale=parsed.get("rationale", pattern.rationale_template),
            requirements=_ensure_list(parsed.get("requirements", pattern.requirements_template)),
            test_procedure=_ensure_list(parsed.get("test_procedure", pattern.test_procedure_template)),
            evidence=_ensure_list(parsed.get("evidence", pattern.evidence_template)),
            severity=parsed.get("severity", pattern.severity_default),
            implementation_effort=parsed.get("implementation_effort", pattern.implementation_effort_default),
            category=parsed.get("category", pattern.category),
            tags=_ensure_list(parsed.get("tags", pattern.tags)),
            target_audience=_ensure_list(parsed.get("target_audience", [])),
            verification_method=parsed.get("verification_method"),
            open_anchors=_anchors_from_pattern(pattern),
            composition_method="pattern_guided",
        )

        return control

    def _compose_from_template(
        self,
        obligation: ObligationMatch,
        pattern: ControlPattern,
    ) -> ComposedControl:
        """Fallback: fill template directly without LLM (when LLM fails)."""
        obl_title = obligation.obligation_title or ""
        obl_text = obligation.obligation_text or ""

        title = pattern.name_de
        if obl_title:
            title = f"{pattern.name_de} — {obl_title}"

        objective = pattern.objective_template
        if obl_text and len(obl_text) > 20:
            objective = f"{pattern.objective_template} Bezug: {obl_text[:200]}"

        return ComposedControl(
            title=title[:255],
            objective=objective,
            rationale=pattern.rationale_template,
            requirements=list(pattern.requirements_template),
            test_procedure=list(pattern.test_procedure_template),
            evidence=list(pattern.evidence_template),
            severity=pattern.severity_default,
            implementation_effort=pattern.implementation_effort_default,
            category=pattern.category,
            tags=list(pattern.tags),
            open_anchors=_anchors_from_pattern(pattern),
            composition_method="template_only",
        )

    # -----------------------------------------------------------------------
    # Fallback (no pattern)
    # -----------------------------------------------------------------------

    async def _compose_fallback(
        self,
        obligation: ObligationMatch,
        chunk_text: Optional[str],
        license_rule: int,
        source_citation: Optional[dict],
    ) -> ComposedControl:
        """Generate a control without a pattern template (old-style)."""
        prompt = _build_fallback_prompt(obligation, chunk_text, license_rule)
        system_prompt = _compose_system_prompt(license_rule)

        llm_result = await _llm_ollama(prompt, system_prompt)
        # _parse_json may return None; fall back to {} so the .get() calls below are safe
        parsed = (_parse_json(llm_result) or {}) if llm_result else {}

        obl_text = obligation.obligation_text or ""

        control = ComposedControl(
            title=parsed.get("title", obl_text[:100] if obl_text else "Untitled Control")[:255],
            objective=parsed.get("objective", obl_text[:500]),
            rationale=parsed.get("rationale", "Aus gesetzlicher Pflicht abgeleitet."),
            requirements=_ensure_list(parsed.get("requirements", [])),
            test_procedure=_ensure_list(parsed.get("test_procedure", [])),
            evidence=_ensure_list(parsed.get("evidence", [])),
            severity=parsed.get("severity", "medium"),
            implementation_effort=parsed.get("implementation_effort", "m"),
            category=parsed.get("category"),
            tags=_ensure_list(parsed.get("tags", [])),
            target_audience=_ensure_list(parsed.get("target_audience", [])),
            verification_method=parsed.get("verification_method"),
            composition_method="fallback",
            release_state="needs_review",
        )

        return control


# ---------------------------------------------------------------------------
# Prompt builders
# ---------------------------------------------------------------------------


def _compose_system_prompt(license_rule: int) -> str:
    """Build the system prompt based on license rule."""
    if license_rule == 3:
        return (
            "Du bist ein Security-Compliance-Experte. Deine Aufgabe ist es, "
            "eigenstaendige Security Controls zu formulieren. "
            "Du formulierst IMMER in eigenen Worten. "
            "KOPIERE KEINE Saetze aus dem Quelltext. "
            "Verwende eigene Begriffe und Struktur. "
            "NENNE NICHT die Quelle. Keine proprietaeren Bezeichner. "
            "Antworte NUR mit validem JSON."
        )
    return (
        "Du bist ein Security-Compliance-Experte. "
        "Erstelle ein praxisorientiertes, umsetzbares Security Control. "
        "Antworte NUR mit validem JSON."
    )


def _build_compose_prompt(
    obligation: ObligationMatch,
    pattern: ControlPattern,
    chunk_text: Optional[str],
    license_rule: int,
) -> str:
    """Build the LLM prompt for pattern-guided composition."""
    obl_section = _obligation_section(obligation)
    pattern_section = _pattern_section(pattern)

    if license_rule == 3:
        context_section = "KONTEXT: Intern analysiert (keine Quellenangabe)."
    elif chunk_text:
        context_section = f"KONTEXT (Originaltext):\n{chunk_text[:2000]}"
    else:
        context_section = "KONTEXT: Kein Originaltext verfuegbar."

    return f"""Erstelle ein PRAXISORIENTIERTES Security Control.

{obl_section}

{pattern_section}

{context_section}

AUFGABE:
Fuelle das Muster mit pflicht-spezifischen Details.
Das Ergebnis muss UMSETZBAR sein — keine Gesetzesparaphrase.
Formuliere konkret und handlungsorientiert.

Antworte als JSON:
{{
  "title": "Kurzer praegnanter Titel (max 100 Zeichen, deutsch)",
  "objective": "Was soll erreicht werden? (1-3 Saetze)",
  "rationale": "Warum ist das wichtig? (1-2 Saetze)",
  "requirements": ["Konkrete Anforderung 1", "Anforderung 2", ...],
  "test_procedure": ["Pruefschritt 1", "Pruefschritt 2", ...],
  "evidence": ["Nachweis 1", "Nachweis 2", ...],
  "severity": "low|medium|high|critical",
  "implementation_effort": "s|m|l|xl",
  "category": "{pattern.category}",
  "tags": ["tag1", "tag2"],
  "target_audience": ["unternehmen", "behoerden", "entwickler"],
  "verification_method": "code_review|document|tool|hybrid"
}}"""


def _build_fallback_prompt(
    obligation: ObligationMatch,
    chunk_text: Optional[str],
    license_rule: int,
) -> str:
    """Build the LLM prompt for fallback composition (no pattern)."""
    obl_section = _obligation_section(obligation)

    if license_rule == 3:
        context_section = "KONTEXT: Intern analysiert (keine Quellenangabe)."
    elif chunk_text:
        context_section = f"KONTEXT (Originaltext):\n{chunk_text[:2000]}"
    else:
        context_section = "KONTEXT: Kein Originaltext verfuegbar."

    return f"""Erstelle ein Security Control aus der folgenden Pflicht.

{obl_section}

{context_section}

AUFGABE:
Formuliere ein umsetzbares Security Control.
Keine Gesetzesparaphrase — konkrete Massnahmen beschreiben.

Antworte als JSON:
{{
  "title": "Kurzer praegnanter Titel (max 100 Zeichen, deutsch)",
  "objective": "Was soll erreicht werden? (1-3 Saetze)",
  "rationale": "Warum ist das wichtig? (1-2 Saetze)",
  "requirements": ["Konkrete Anforderung 1", "Anforderung 2", ...],
  "test_procedure": ["Pruefschritt 1", "Pruefschritt 2", ...],
  "evidence": ["Nachweis 1", "Nachweis 2", ...],
  "severity": "low|medium|high|critical",
  "implementation_effort": "s|m|l|xl",
  "category": "one of: authentication, encryption, data_protection, etc.",
  "tags": ["tag1", "tag2"],
  "target_audience": ["unternehmen"],
  "verification_method": "code_review|document|tool|hybrid"
}}"""


def _obligation_section(obligation: ObligationMatch) -> str:
    """Format the obligation for the prompt."""
    parts = ["PFLICHT (was das Gesetz verlangt):"]
    if obligation.obligation_title:
        parts.append(f"  Titel: {obligation.obligation_title}")
    if obligation.obligation_text:
        parts.append(f"  Beschreibung: {obligation.obligation_text[:500]}")
    if obligation.obligation_id:
        parts.append(f"  ID: {obligation.obligation_id}")
    if obligation.regulation_id:
        parts.append(f"  Rechtsgrundlage: {obligation.regulation_id}")
    if not obligation.obligation_text and not obligation.obligation_title:
        parts.append("  (Keine spezifische Pflicht extrahiert)")
    return "\n".join(parts)


def _pattern_section(pattern: ControlPattern) -> str:
    """Format the pattern for the prompt."""
    reqs = "\n  ".join(f"- {r}" for r in pattern.requirements_template[:5])
    tests = "\n  ".join(f"- {t}" for t in pattern.test_procedure_template[:3])
    return f"""MUSTER (wie man es typischerweise umsetzt):
  Pattern: {pattern.name_de} ({pattern.id})
  Domain: {pattern.domain}
  Ziel-Template: {pattern.objective_template}
  Anforderungs-Template:
  {reqs}
  Pruefverfahren-Template:
  {tests}"""


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _ensure_list(value) -> list:
    """Ensure a value is a list of strings."""
    if isinstance(value, list):
        return [str(v) for v in value if v]
    if isinstance(value, str):
        return [value]
    return []


def _anchors_from_pattern(pattern: ControlPattern) -> list:
    """Convert pattern's open_anchor_refs to control anchor format."""
    anchors = []
    for ref in pattern.open_anchor_refs:
        anchors.append({
            "framework": ref.get("framework", ""),
            "control_id": ref.get("ref", ""),
            "title": "",
            "alignment_score": 0.8,
        })
    return anchors


def _validate_control(control: ComposedControl) -> None:
    """Validate and fix control field values."""
    # Severity
    if control.severity not in VALID_SEVERITIES:
        control.severity = "medium"

    # Implementation effort
    if control.implementation_effort not in VALID_EFFORTS:
        control.implementation_effort = "m"

    # Verification method
    if control.verification_method and control.verification_method not in VALID_VERIFICATION:
        control.verification_method = None

    # Risk score
    if not (0 <= control.risk_score <= 10):
        control.risk_score = _severity_to_risk(control.severity)

    # Title length
    if len(control.title) > 255:
        control.title = control.title[:252] + "..."

    # Ensure minimum content
    if not control.objective:
        control.objective = control.title
    if not control.rationale:
        control.rationale = "Aus regulatorischer Anforderung abgeleitet."
    if not control.requirements:
        control.requirements = ["Anforderung gemaess Pflichtbeschreibung umsetzen"]
    if not control.test_procedure:
        control.test_procedure = ["Umsetzung der Anforderungen pruefen"]
    if not control.evidence:
        control.evidence = ["Dokumentation der Umsetzung"]


def _severity_to_risk(severity: str) -> float:
    """Map severity to a default risk score."""
    return {
        "critical": 9.0,
        "high": 7.0,
        "medium": 5.0,
        "low": 3.0,
    }.get(severity, 5.0)

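The validator normalizes out-of-range LLM output instead of rejecting the whole control. A compact standalone sketch of that clamping behavior (mirroring `_validate_control`'s severity and risk-score rules; the `clamp` helper is illustrative, not part of the module):

```python
VALID_SEVERITIES = {"low", "medium", "high", "critical"}
SEVERITY_RISK = {"critical": 9.0, "high": 7.0, "medium": 5.0, "low": 3.0}

def clamp(severity, risk_score):
    if severity not in VALID_SEVERITIES:
        severity = "medium"  # unknown value → safe default
    if not (0 <= risk_score <= 10):
        risk_score = SEVERITY_RISK.get(severity, 5.0)  # re-derive from severity
    return severity, risk_score

ok = clamp("high", 7.5)        # valid input passes through unchanged
fixed = clamp("urgent", 42.0)  # both values normalized
```

Repair-over-reject is a deliberate choice here: a control with a defaulted severity still enters the pipeline as a draft, whereas a hard validation error would drop the obligation entirely.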
745 control-pipeline/services/control_dedup.py Normal file
@@ -0,0 +1,745 @@
|
||||
"""Control Deduplication Engine — 4-Stage Matching Pipeline.

Prevents duplicate atomic controls during Pass 0b by checking candidates
against existing controls before insertion.

Stages:
    1. Pattern-Gate: pattern_id must match (hard gate)
    2. Action-Check: normalized action verb must match (hard gate)
    3. Object-Norm: normalized object must match (soft gate with high threshold)
    4. Embedding: cosine similarity with tiered thresholds (Qdrant)

Verdicts:
    - NEW: create a new atomic control
    - LINK: add parent link to existing control (similarity > LINK_THRESHOLD)
    - REVIEW: queue for human review (REVIEW_THRESHOLD < sim < LINK_THRESHOLD)
"""

import logging
import os
import re
from dataclasses import dataclass, field
from typing import Optional, Callable, Awaitable

import httpx

logger = logging.getLogger(__name__)

# ── Configuration ────────────────────────────────────────────────────

DEDUP_ENABLED = os.getenv("DEDUP_ENABLED", "true").lower() == "true"
LINK_THRESHOLD = float(os.getenv("DEDUP_LINK_THRESHOLD", "0.92"))
REVIEW_THRESHOLD = float(os.getenv("DEDUP_REVIEW_THRESHOLD", "0.85"))
LINK_THRESHOLD_DIFF_OBJECT = float(os.getenv("DEDUP_LINK_THRESHOLD_DIFF_OBJ", "0.95"))
CROSS_REG_LINK_THRESHOLD = float(os.getenv("DEDUP_CROSS_REG_THRESHOLD", "0.95"))
QDRANT_COLLECTION = os.getenv("DEDUP_QDRANT_COLLECTION", "atomic_controls")
QDRANT_URL = os.getenv("QDRANT_URL", "http://host.docker.internal:6333")
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://embedding-service:8087")
# ── Result Dataclass ─────────────────────────────────────────────────

@dataclass
class DedupResult:
    """Outcome of the dedup check."""
    verdict: str  # "new" | "link" | "review"
    matched_control_uuid: Optional[str] = None
    matched_control_id: Optional[str] = None
    matched_title: Optional[str] = None
    stage: str = ""  # which stage decided
    similarity_score: float = 0.0
    link_type: str = "dedup_merge"  # "dedup_merge" | "cross_regulation"
    details: dict = field(default_factory=dict)
# ── Action Normalization ─────────────────────────────────────────────

_ACTION_SYNONYMS: dict[str, str] = {
    # German → canonical English
    "implementieren": "implement",
    "umsetzen": "implement",
    "einrichten": "implement",
    "einführen": "implement",
    "aufbauen": "implement",
    "bereitstellen": "implement",
    "aktivieren": "implement",
    "konfigurieren": "configure",
    "einstellen": "configure",
    "parametrieren": "configure",
    "testen": "test",
    "prüfen": "test",
    "überprüfen": "test",
    "verifizieren": "test",
    "validieren": "test",
    "kontrollieren": "test",
    "auditieren": "audit",
    "dokumentieren": "document",
    "protokollieren": "log",
    "aufzeichnen": "log",
    "loggen": "log",
    "überwachen": "monitor",
    "monitoring": "monitor",
    "beobachten": "monitor",
    "schulen": "train",
    "trainieren": "train",
    "sensibilisieren": "train",
    "löschen": "delete",
    "entfernen": "delete",
    "verschlüsseln": "encrypt",
    "sperren": "block",
    "beschränken": "restrict",
    "einschränken": "restrict",
    "begrenzen": "restrict",
    "autorisieren": "authorize",
    "genehmigen": "authorize",
    "freigeben": "authorize",
    "authentifizieren": "authenticate",
    "identifizieren": "identify",
    "melden": "report",
    "benachrichtigen": "notify",
    "informieren": "notify",
    "aktualisieren": "update",
    "erneuern": "update",
    "sichern": "backup",
    "wiederherstellen": "restore",
    # English passthrough
    "implement": "implement",
    "configure": "configure",
    "test": "test",
    "verify": "test",
    "validate": "test",
    "audit": "audit",
    "document": "document",
    "log": "log",
    "monitor": "monitor",
    "train": "train",
    "delete": "delete",
    "encrypt": "encrypt",
    "restrict": "restrict",
    "authorize": "authorize",
    "authenticate": "authenticate",
    "report": "report",
    "update": "update",
    "backup": "backup",
    "restore": "restore",
}


def normalize_action(action: str) -> str:
    """Normalize an action verb to a canonical English form."""
    if not action:
        return ""
    action = action.strip().lower()
    # Strip German infinitive/conjugation suffixes for lookup
    action_base = re.sub(r"(en|t|st|e|te|tet|end)$", "", action)
    # Try exact match first, then base form
    if action in _ACTION_SYNONYMS:
        return _ACTION_SYNONYMS[action]
    if action_base in _ACTION_SYNONYMS:
        return _ACTION_SYNONYMS[action_base]
    # Fuzzy: check if action starts with any known verb
    for verb, canonical in _ACTION_SYNONYMS.items():
        if action.startswith(verb) or verb.startswith(action):
            return canonical
    return action  # fallback: return as-is
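The lookup order in `normalize_action` (exact match, then suffix-stripped base form, then prefix overlap, then passthrough) can be sketched in isolation. This is a minimal, self-contained illustration using a tiny subset of the table above, not the full synonym list:

```python
import re

# Illustrative subset of _ACTION_SYNONYMS
SYNONYMS = {"prüfen": "test", "implementieren": "implement", "monitor": "monitor"}

def lookup(action: str) -> str:
    action = action.strip().lower()
    base = re.sub(r"(en|t|st|e|te|tet|end)$", "", action)
    if action in SYNONYMS:                      # 1. exact match
        return SYNONYMS[action]
    if base in SYNONYMS:                        # 2. suffix-stripped base form
        return SYNONYMS[base]
    for verb, canonical in SYNONYMS.items():    # 3. prefix overlap
        if action.startswith(verb) or verb.startswith(action):
            return canonical
    return action                               # 4. passthrough

print(lookup("prüfen"))         # test       (exact match)
print(lookup("implementiere"))  # implement  (prefix of "implementieren")
print(lookup("monitoring"))     # monitor    (starts with known verb)
print(lookup("unknown"))        # unknown    (passthrough)
```

Note that the prefix-overlap step is deliberately loose; very short inputs can over-match against the first compatible verb in dictionary order.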
# ── Object Normalization ─────────────────────────────────────────────

_OBJECT_SYNONYMS: dict[str, str] = {
    # Authentication / Access
    "mfa": "multi_factor_auth",
    "multi-faktor-authentifizierung": "multi_factor_auth",
    "mehrfaktorauthentifizierung": "multi_factor_auth",
    "multi-factor authentication": "multi_factor_auth",
    "two-factor": "multi_factor_auth",
    "2fa": "multi_factor_auth",
    "passwort": "password_policy",
    "kennwort": "password_policy",
    "password": "password_policy",
    "zugangsdaten": "credentials",
    "credentials": "credentials",
    "admin-konten": "privileged_access",
    "admin accounts": "privileged_access",
    "administratorkonten": "privileged_access",
    "privilegierte zugriffe": "privileged_access",
    "privileged accounts": "privileged_access",
    "remote-zugriff": "remote_access",
    "fernzugriff": "remote_access",
    "remote access": "remote_access",
    "session": "session_management",
    "sitzung": "session_management",
    "sitzungsverwaltung": "session_management",
    # Encryption
    "verschlüsselung": "encryption",
    "encryption": "encryption",
    "kryptografie": "encryption",
    "kryptografische verfahren": "encryption",
    "schlüssel": "key_management",
    "key management": "key_management",
    "schlüsselverwaltung": "key_management",
    "zertifikat": "certificate_management",
    "certificate": "certificate_management",
    "tls": "transport_encryption",
    "ssl": "transport_encryption",
    "https": "transport_encryption",
    # Network
    "firewall": "firewall",
    "netzwerk": "network_security",
    "network": "network_security",
    "vpn": "vpn",
    "segmentierung": "network_segmentation",
    "segmentation": "network_segmentation",
    # Logging / Monitoring
    "audit-log": "audit_logging",
    "audit log": "audit_logging",
    "protokoll": "audit_logging",
    "logging": "audit_logging",
    "monitoring": "monitoring",
    "überwachung": "monitoring",
    "alerting": "alerting",
    "alarmierung": "alerting",
    "siem": "siem",
    # Data
    "personenbezogene daten": "personal_data",
    "personal data": "personal_data",
    "sensible daten": "sensitive_data",
    "sensitive data": "sensitive_data",
    "datensicherung": "backup",
    "backup": "backup",
    "wiederherstellung": "disaster_recovery",
    "disaster recovery": "disaster_recovery",
    # Policy / Process
    "richtlinie": "policy",
    "policy": "policy",
    "verfahrensanweisung": "procedure",
    "procedure": "procedure",
    "prozess": "process",
    "schulung": "training",
    "training": "training",
    "awareness": "awareness",
    "sensibilisierung": "awareness",
    # Incident
    "vorfall": "incident",
    "incident": "incident",
    "sicherheitsvorfall": "security_incident",
    "security incident": "security_incident",
    # Vulnerability
    "schwachstelle": "vulnerability",
    "vulnerability": "vulnerability",
    "patch": "patch_management",
    "update": "patch_management",
    "patching": "patch_management",
}

# Keys sorted longest-first so substring matching prefers specific phrases
_OBJECT_KEYS_SORTED = sorted(_OBJECT_SYNONYMS.keys(), key=len, reverse=True)


def normalize_object(obj: str) -> str:
    """Normalize a compliance object to a canonical token."""
    if not obj:
        return ""
    obj_lower = obj.strip().lower()
    # Exact match
    if obj_lower in _OBJECT_SYNONYMS:
        return _OBJECT_SYNONYMS[obj_lower]
    # Substring match (longest first)
    for phrase in _OBJECT_KEYS_SORTED:
        if phrase in obj_lower:
            return _OBJECT_SYNONYMS[phrase]
    # Fallback: strip articles/prepositions, join with underscore
    cleaned = re.sub(r"\b(der|die|das|den|dem|des|ein|eine|eines|einem|einen"
                     r"|für|von|zu|auf|in|an|bei|mit|nach|über|unter|the|a|an"
                     r"|for|of|to|on|in|at|by|with)\b", "", obj_lower)
    tokens = [t for t in cleaned.split() if len(t) > 2]
    return "_".join(tokens[:4]) if tokens else obj_lower.replace(" ", "_")
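The fallback branch of `normalize_object` (stop-word stripping plus tokenization) can be exercised on its own. A self-contained sketch using the same stop-word pattern; illustrative only, since real inputs would usually be caught by the exact or substring match first:

```python
import re

STOPWORDS = (r"\b(der|die|das|den|dem|des|ein|eine|eines|einem|einen"
             r"|für|von|zu|auf|in|an|bei|mit|nach|über|unter|the|a|an"
             r"|for|of|to|on|in|at|by|with)\b")

def fallback_token(obj_lower: str) -> str:
    # Strip articles/prepositions, keep up to four tokens longer than 2 chars
    cleaned = re.sub(STOPWORDS, "", obj_lower)
    tokens = [t for t in cleaned.split() if len(t) > 2]
    return "_".join(tokens[:4]) if tokens else obj_lower.replace(" ", "_")

print(fallback_token("zugriff auf die produktionsumgebung"))
# zugriff_produktionsumgebung
print(fallback_token("management of keys"))
# management_keys
```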
# ── Canonicalization ─────────────────────────────────────────────────

def canonicalize_text(action: str, obj: str, title: str = "") -> str:
    """Build a canonical English text for embedding.

    Transforms German compliance text into normalized English tokens
    for more stable embedding comparisons.
    """
    norm_action = normalize_action(action)
    norm_object = normalize_object(obj)
    # Build canonical sentence
    parts = [norm_action, norm_object]
    if title:
        # Add title keywords (stripped of common filler)
        title_clean = re.sub(
            r"\b(und|oder|für|von|zu|der|die|das|den|dem|des|ein|eine"
            r"|bei|mit|nach|gemäß|gem\.|laut|entsprechend)\b",
            "", title.lower()
        )
        title_tokens = [t for t in title_clean.split() if len(t) > 3][:5]
        if title_tokens:
            parts.append("for")
            parts.extend(title_tokens)
    return " ".join(parts)
# ── Embedding Helper ─────────────────────────────────────────────────

async def get_embedding(text: str) -> list[float]:
    """Get embedding vector for a single text via embedding service."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                f"{EMBEDDING_URL}/embed",
                json={"texts": [text]},
            )
            embeddings = resp.json().get("embeddings", [])
            return embeddings[0] if embeddings else []
    except Exception as e:
        logger.warning("Embedding failed: %s", e)
        return []


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compute cosine similarity between two vectors."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
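A quick sanity check of the similarity metric. This standalone sketch mirrors `cosine_similarity` above (dot product divided by the product of the norms):

```python
def cosine(a: list[float], b: list[float]) -> float:
    # Dot product over the product of vector norms; 0.0 for degenerate input
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

print(cosine([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```

Qdrant returns this score directly when the collection is created with `"distance": "Cosine"`, so the pure-Python helper is mainly useful for tests and offline checks.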
# ── Qdrant Helpers ───────────────────────────────────────────────────

async def qdrant_search(
    embedding: list[float],
    pattern_id: str,
    top_k: int = 10,
    collection: Optional[str] = None,
) -> list[dict]:
    """Search Qdrant for similar atomic controls, filtered by pattern_id."""
    if not embedding:
        return []
    coll = collection or QDRANT_COLLECTION
    body: dict = {
        "vector": embedding,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            "must": [
                {"key": "pattern_id", "match": {"value": pattern_id}}
            ]
        },
    }
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                f"{QDRANT_URL}/collections/{coll}/points/search",
                json=body,
            )
            if resp.status_code != 200:
                logger.warning("Qdrant search failed: %d", resp.status_code)
                return []
            return resp.json().get("result", [])
    except Exception as e:
        logger.warning("Qdrant search error: %s", e)
        return []


async def qdrant_search_cross_regulation(
    embedding: list[float],
    top_k: int = 5,
    collection: Optional[str] = None,
) -> list[dict]:
    """Search Qdrant for similar controls across ALL regulations (no pattern_id filter).

    Used for cross-regulation linking (e.g. DSGVO Art. 25 ↔ NIS2 Art. 21).
    """
    if not embedding:
        return []
    coll = collection or QDRANT_COLLECTION
    body: dict = {
        "vector": embedding,
        "limit": top_k,
        "with_payload": True,
    }
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                f"{QDRANT_URL}/collections/{coll}/points/search",
                json=body,
            )
            if resp.status_code != 200:
                logger.warning("Qdrant cross-reg search failed: %d", resp.status_code)
                return []
            return resp.json().get("result", [])
    except Exception as e:
        logger.warning("Qdrant cross-reg search error: %s", e)
        return []


async def qdrant_upsert(
    point_id: str,
    embedding: list[float],
    payload: dict,
    collection: Optional[str] = None,
) -> bool:
    """Upsert a single point into a Qdrant collection."""
    if not embedding:
        return False
    coll = collection or QDRANT_COLLECTION
    body = {
        "points": [{
            "id": point_id,
            "vector": embedding,
            "payload": payload,
        }]
    }
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.put(
                f"{QDRANT_URL}/collections/{coll}/points",
                json=body,
            )
            return resp.status_code == 200
    except Exception as e:
        logger.warning("Qdrant upsert error: %s", e)
        return False
async def ensure_qdrant_collection(
    vector_size: int = 1024,
    collection: Optional[str] = None,
) -> bool:
    """Create a Qdrant collection if it doesn't exist (idempotent)."""
    coll = collection or QDRANT_COLLECTION
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            # Check if exists
            resp = await client.get(f"{QDRANT_URL}/collections/{coll}")
            if resp.status_code == 200:
                return True
            # Create
            resp = await client.put(
                f"{QDRANT_URL}/collections/{coll}",
                json={
                    "vectors": {"size": vector_size, "distance": "Cosine"},
                },
            )
            if resp.status_code == 200:
                logger.info("Created Qdrant collection: %s", coll)
                # Create payload indexes
                for field_name in ["pattern_id", "action_normalized", "object_normalized", "control_id"]:
                    await client.put(
                        f"{QDRANT_URL}/collections/{coll}/index",
                        json={"field_name": field_name, "field_schema": "keyword"},
                    )
                return True
            logger.error("Failed to create Qdrant collection: %d", resp.status_code)
            return False
    except Exception as e:
        logger.warning("Qdrant collection check error: %s", e)
        return False
# ── Main Dedup Checker ───────────────────────────────────────────────

class ControlDedupChecker:
    """4-stage dedup checker for atomic controls.

    Usage:
        checker = ControlDedupChecker(db_session)
        result = await checker.check_duplicate(candidate_action, candidate_object, candidate_title, pattern_id)
        if result.verdict == "link":
            checker.add_parent_link(result.matched_control_uuid, parent_uuid)
        elif result.verdict == "review":
            checker.write_review(candidate_id, candidate_title, candidate_objective, result)
        else:
            ...  # insert new control
    """

    def __init__(
        self,
        db,
        embed_fn: Optional[Callable[[str], Awaitable[list[float]]]] = None,
        search_fn: Optional[Callable] = None,
    ):
        self.db = db
        self._embed = embed_fn or get_embedding
        self._search = search_fn or qdrant_search
        self._cache: dict[str, list[dict]] = {}  # pattern_id → existing controls
    def _load_existing(self, pattern_id: str) -> list[dict]:
        """Load existing atomic controls with the same pattern_id from the DB."""
        if pattern_id in self._cache:
            return self._cache[pattern_id]
        from sqlalchemy import text
        rows = self.db.execute(text("""
            SELECT id::text, control_id, title, objective,
                   pattern_id,
                   generation_metadata->>'obligation_type' as obligation_type
            FROM canonical_controls
            WHERE parent_control_uuid IS NOT NULL
              AND release_state != 'deprecated'
              AND pattern_id = :pid
        """), {"pid": pattern_id}).fetchall()
        result = [
            {
                "uuid": r[0], "control_id": r[1], "title": r[2],
                "objective": r[3], "pattern_id": r[4],
                "obligation_type": r[5],
            }
            for r in rows
        ]
        self._cache[pattern_id] = result
        return result
    async def check_duplicate(
        self,
        action: str,
        obj: str,
        title: str,
        pattern_id: Optional[str],
    ) -> DedupResult:
        """Run the 4-stage dedup pipeline + cross-regulation linking.

        Returns DedupResult with verdict: new/link/review.
        """
        # No pattern_id → can't dedup meaningfully
        if not pattern_id:
            return DedupResult(verdict="new", stage="no_pattern")

        # Stage 1: Pattern-Gate
        existing = self._load_existing(pattern_id)
        if not existing:
            return DedupResult(
                verdict="new", stage="pattern_gate",
                details={"reason": "no existing controls with this pattern_id"},
            )

        # Stage 2: Action-Check. The DB rows do not carry the action verb
        # directly, so the normalized action is compared against the
        # action_normalized payload of the Qdrant matches below.
        norm_action = normalize_action(action)

        # Stage 3: Object-Normalization (compared against Qdrant payloads
        # the same way)
        norm_object = normalize_object(obj)

        # Stage 4: Embedding Similarity
        canonical = canonicalize_text(action, obj, title)
        embedding = await self._embed(canonical)
        if not embedding:
            # Can't compute embedding → default to new
            return DedupResult(
                verdict="new", stage="embedding_unavailable",
                details={"canonical_text": canonical},
            )

        # Search Qdrant
        results = await self._search(embedding, pattern_id, top_k=5)

        if not results:
            # No intra-pattern matches → try cross-regulation
            return await self._check_cross_regulation(embedding, DedupResult(
                verdict="new", stage="no_qdrant_matches",
                details={"canonical_text": canonical, "action": norm_action, "object": norm_object},
            ))

        # Evaluate best match
        best = results[0]
        best_score = best.get("score", 0.0)
        best_payload = best.get("payload", {})
        best_action = best_payload.get("action_normalized", "")
        best_object = best_payload.get("object_normalized", "")

        # Action differs → NEW (even if embedding similarity is high)
        if best_action and norm_action and best_action != norm_action:
            return await self._check_cross_regulation(embedding, DedupResult(
                verdict="new", stage="action_mismatch",
                similarity_score=best_score,
                matched_control_id=best_payload.get("control_id"),
                details={
                    "candidate_action": norm_action,
                    "existing_action": best_action,
                    "similarity": best_score,
                },
            ))

        # Object differs → use higher threshold
        if best_object and norm_object and best_object != norm_object:
            if best_score > LINK_THRESHOLD_DIFF_OBJECT:
                return DedupResult(
                    verdict="link", stage="embedding_diff_object",
                    matched_control_uuid=best_payload.get("control_uuid"),
                    matched_control_id=best_payload.get("control_id"),
                    matched_title=best_payload.get("title"),
                    similarity_score=best_score,
                    details={"candidate_object": norm_object, "existing_object": best_object},
                )
            return await self._check_cross_regulation(embedding, DedupResult(
                verdict="new", stage="object_mismatch_below_threshold",
                similarity_score=best_score,
                matched_control_id=best_payload.get("control_id"),
                details={
                    "candidate_object": norm_object,
                    "existing_object": best_object,
                    "threshold": LINK_THRESHOLD_DIFF_OBJECT,
                },
            ))

        # Same action + same object → tiered thresholds
        if best_score > LINK_THRESHOLD:
            return DedupResult(
                verdict="link", stage="embedding_match",
                matched_control_uuid=best_payload.get("control_uuid"),
                matched_control_id=best_payload.get("control_id"),
                matched_title=best_payload.get("title"),
                similarity_score=best_score,
            )
        if best_score > REVIEW_THRESHOLD:
            return DedupResult(
                verdict="review", stage="embedding_review",
                matched_control_uuid=best_payload.get("control_uuid"),
                matched_control_id=best_payload.get("control_id"),
                matched_title=best_payload.get("title"),
                similarity_score=best_score,
            )
        return await self._check_cross_regulation(embedding, DedupResult(
            verdict="new", stage="embedding_below_threshold",
            similarity_score=best_score,
            details={"threshold": REVIEW_THRESHOLD},
        ))
    async def _check_cross_regulation(
        self,
        embedding: list[float],
        intra_result: DedupResult,
    ) -> DedupResult:
        """Second pass: cross-regulation linking for controls deemed 'new'.

        Searches Qdrant WITHOUT the pattern_id filter. Uses a higher threshold
        (CROSS_REG_LINK_THRESHOLD, default 0.95) to avoid false positives
        across regulation boundaries.
        """
        if intra_result.verdict != "new" or not embedding:
            return intra_result

        cross_results = await qdrant_search_cross_regulation(embedding, top_k=5)
        if not cross_results:
            return intra_result

        best = cross_results[0]
        best_score = best.get("score", 0.0)
        if best_score > CROSS_REG_LINK_THRESHOLD:
            best_payload = best.get("payload", {})
            return DedupResult(
                verdict="link",
                stage="cross_regulation",
                matched_control_uuid=best_payload.get("control_uuid"),
                matched_control_id=best_payload.get("control_id"),
                matched_title=best_payload.get("title"),
                similarity_score=best_score,
                link_type="cross_regulation",
                details={
                    "cross_reg_score": best_score,
                    "cross_reg_threshold": CROSS_REG_LINK_THRESHOLD,
                },
            )

        return intra_result
    def add_parent_link(
        self,
        control_uuid: str,
        parent_control_uuid: str,
        link_type: str = "dedup_merge",
        confidence: float = 0.0,
        source_regulation: Optional[str] = None,
        source_article: Optional[str] = None,
        obligation_candidate_id: Optional[str] = None,
    ) -> None:
        """Add a parent link to an existing atomic control."""
        from sqlalchemy import text
        self.db.execute(text("""
            INSERT INTO control_parent_links
                (control_uuid, parent_control_uuid, link_type, confidence,
                 source_regulation, source_article, obligation_candidate_id)
            VALUES (:cu, :pu, :lt, :conf, :sr, :sa, :oci::uuid)
            ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
        """), {
            "cu": control_uuid,
            "pu": parent_control_uuid,
            "lt": link_type,
            "conf": confidence,
            "sr": source_regulation,
            "sa": source_article,
            "oci": obligation_candidate_id,
        })
        self.db.commit()
    def write_review(
        self,
        candidate_control_id: str,
        candidate_title: str,
        candidate_objective: str,
        result: DedupResult,
        parent_control_uuid: Optional[str] = None,
        obligation_candidate_id: Optional[str] = None,
    ) -> None:
        """Write a dedup review queue entry."""
        import json
        from sqlalchemy import text
        self.db.execute(text("""
            INSERT INTO control_dedup_reviews
                (candidate_control_id, candidate_title, candidate_objective,
                 matched_control_uuid, matched_control_id,
                 similarity_score, dedup_stage, dedup_details,
                 parent_control_uuid, obligation_candidate_id)
            VALUES (:ccid, :ct, :co, :mcu::uuid, :mci, :ss, :ds,
                    :dd::jsonb, :pcu::uuid, :oci)
        """), {
            "ccid": candidate_control_id,
            "ct": candidate_title,
            "co": candidate_objective,
            "mcu": result.matched_control_uuid,
            "mci": result.matched_control_id,
            "ss": result.similarity_score,
            "ds": result.stage,
            "dd": json.dumps(result.details),
            "pcu": parent_control_uuid,
            "oci": obligation_candidate_id,
        })
        self.db.commit()
    async def index_control(
        self,
        control_uuid: str,
        control_id: str,
        title: str,
        action: str,
        obj: str,
        pattern_id: str,
        collection: Optional[str] = None,
    ) -> bool:
        """Index a new atomic control in Qdrant for future dedup checks."""
        norm_action = normalize_action(action)
        norm_object = normalize_object(obj)
        canonical = canonicalize_text(action, obj, title)
        embedding = await self._embed(canonical)
        if not embedding:
            return False
        return await qdrant_upsert(
            point_id=control_uuid,
            embedding=embedding,
            payload={
                "control_uuid": control_uuid,
                "control_id": control_id,
                "title": title,
                "pattern_id": pattern_id,
                "action_normalized": norm_action,
                "object_normalized": norm_object,
                "canonical_text": canonical,
            },
            collection=collection,
        )
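The tiered verdict logic in `check_duplicate` reduces to a small decision function over the similarity score. This sketch uses the default thresholds (0.92 link, 0.85 review, 0.95 when the normalized object differs) and is illustrative only, ignoring the action-mismatch and cross-regulation branches:

```python
def verdict(score: float, same_object: bool = True,
            link: float = 0.92, review: float = 0.85,
            link_diff_obj: float = 0.95) -> str:
    # Mirrors the tiered thresholds used in check_duplicate
    if not same_object:
        # Different normalized object → only link above the stricter bar
        return "link" if score > link_diff_obj else "new"
    if score > link:
        return "link"
    if score > review:
        return "review"
    return "new"

print(verdict(0.96))                     # link
print(verdict(0.88))                     # review
print(verdict(0.80))                     # new
print(verdict(0.93, same_object=False))  # new (below the 0.95 bar)
```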
2321 control-pipeline/services/control_generator.py Normal file
File diff suppressed because it is too large
154 control-pipeline/services/control_status_machine.py Normal file
@@ -0,0 +1,154 @@
|
||||
"""
|
||||
Control Status Transition State Machine.
|
||||
|
||||
Enforces that controls cannot be set to "pass" without sufficient evidence.
|
||||
Prevents Compliance-Theater where controls claim compliance without real proof.
|
||||
|
||||
Transition rules:
|
||||
planned → in_progress : always allowed
|
||||
in_progress → pass : requires ≥1 evidence with confidence ≥ E2 and
|
||||
truth_status in (uploaded, observed, validated_internal)
|
||||
in_progress → partial : requires ≥1 evidence (any level)
|
||||
pass → fail : always allowed (degradation)
|
||||
any → n/a : requires status_justification
|
||||
any → planned : always allowed (reset)
|
||||
"""
|
||||
|
||||
from typing import Any, List, Optional, Tuple
|
||||
|
||||
# EvidenceDB is an ORM model from compliance — we only need duck-typed objects
|
||||
# with .confidence_level and .truth_status attributes.
|
||||
EvidenceDB = Any
|
||||
|
||||
|
||||
# Confidence level ordering for comparisons
|
||||
CONFIDENCE_ORDER = {"E0": 0, "E1": 1, "E2": 2, "E3": 3, "E4": 4}
|
||||
|
||||
# Truth statuses that qualify as "real" evidence for pass transitions
|
||||
VALID_TRUTH_STATUSES = {"uploaded", "observed", "validated_internal", "accepted_by_auditor", "provided_to_auditor"}
|
||||
|
||||
|
||||
def validate_transition(
|
||||
current_status: str,
|
||||
new_status: str,
|
||||
evidence_list: Optional[List[EvidenceDB]] = None,
|
||||
status_justification: Optional[str] = None,
|
||||
bypass_for_auto_updater: bool = False,
|
||||
) -> Tuple[bool, List[str]]:
|
||||
"""
|
||||
Validate whether a control status transition is allowed.
|
||||
|
||||
Args:
|
||||
current_status: Current control status value (e.g. "planned", "pass")
|
||||
new_status: Requested new status
|
||||
evidence_list: List of EvidenceDB objects linked to this control
|
||||
status_justification: Text justification (required for n/a transitions)
|
||||
bypass_for_auto_updater: If True, skip evidence checks (used by CI/CD auto-updater
|
||||
which creates evidence atomically with status change)
|
||||
|
||||
Returns:
|
||||
Tuple of (allowed: bool, violations: list[str])
|
||||
"""
|
||||
violations: List[str] = []
|
||||
evidence_list = evidence_list or []
|
||||
|
||||
# Same status → no-op, always allowed
|
||||
if current_status == new_status:
|
||||
return True, []
|
||||
|
||||
# Reset to planned is always allowed
|
||||
if new_status == "planned":
|
||||
return True, []
|
||||
|
||||
# n/a requires justification
|
||||
if new_status == "n/a":
|
||||
if not status_justification or not status_justification.strip():
|
||||
violations.append("Transition to 'n/a' requires a status_justification explaining why this control is not applicable.")
|
||||
return len(violations) == 0, violations
|
||||
|
||||
# Degradation: pass → fail is always allowed
|
||||
if current_status == "pass" and new_status == "fail":
|
||||
return True, []
|
||||
|
||||
# planned → in_progress: always allowed
|
||||
if current_status == "planned" and new_status == "in_progress":
|
||||
return True, []
|
||||
|
||||
# in_progress → partial: needs at least 1 evidence
|
||||
if new_status == "partial":
|
||||
if not bypass_for_auto_updater and len(evidence_list) == 0:
|
||||
violations.append("Transition to 'partial' requires at least 1 evidence record.")
|
||||
return len(violations) == 0, violations
|
||||
|
||||
# in_progress → pass: strict requirements
|
||||
if new_status == "pass":
|
||||
        if bypass_for_auto_updater:
            return True, []

        if len(evidence_list) == 0:
            violations.append("Transition to 'pass' requires at least 1 evidence record.")
            return False, violations

        # Check for at least one qualifying evidence
        has_qualifying = False
        for e in evidence_list:
            conf = getattr(e, "confidence_level", None)
            truth = getattr(e, "truth_status", None)

            # Get string values from enum or string
            conf_val = conf.value if hasattr(conf, "value") else str(conf) if conf else "E1"
            truth_val = truth.value if hasattr(truth, "value") else str(truth) if truth else "uploaded"

            if CONFIDENCE_ORDER.get(conf_val, 1) >= CONFIDENCE_ORDER["E2"] and truth_val in VALID_TRUTH_STATUSES:
                has_qualifying = True
                break

        if not has_qualifying:
            violations.append(
                "Transition to 'pass' requires at least 1 evidence with confidence >= E2 "
                "and truth_status in (uploaded, observed, validated_internal, accepted_by_auditor). "
                "Current evidence does not meet this threshold."
            )

        return len(violations) == 0, violations

    # in_progress → fail: always allowed
    if new_status == "fail":
        return True, []

    # Any other transition from planned/fail to pass requires going through in_progress
    if current_status in ("planned", "fail") and new_status == "pass":
        if bypass_for_auto_updater:
            return True, []
        violations.append(
            f"Direct transition from '{current_status}' to 'pass' is not allowed. "
            f"Move to 'in_progress' first, then to 'pass' with qualifying evidence."
        )
        return False, violations

    # Default: allow other transitions (e.g. fail → partial, partial → pass)
    # For partial → pass, apply the same evidence checks
    if current_status == "partial" and new_status == "pass":
        if bypass_for_auto_updater:
            return True, []

        has_qualifying = False
        for e in evidence_list:
            conf = getattr(e, "confidence_level", None)
            truth = getattr(e, "truth_status", None)
            conf_val = conf.value if hasattr(conf, "value") else str(conf) if conf else "E1"
            truth_val = truth.value if hasattr(truth, "value") else str(truth) if truth else "uploaded"

            if CONFIDENCE_ORDER.get(conf_val, 1) >= CONFIDENCE_ORDER["E2"] and truth_val in VALID_TRUTH_STATUSES:
                has_qualifying = True
                break

        if not has_qualifying:
            violations.append(
                "Transition from 'partial' to 'pass' requires at least 1 evidence with confidence >= E2 "
                "and truth_status in (uploaded, observed, validated_internal, accepted_by_auditor)."
            )
        return len(violations) == 0, violations

    # All other transitions allowed
    return True, []
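The qualifying-evidence check above can be exercised in isolation. A minimal sketch, assuming a CONFIDENCE_ORDER of E1 < E2 < E3 (the real mapping is defined elsewhere in this file, so this ordering is an assumption) and the truth statuses named in the violation message:

```python
from types import SimpleNamespace

# Assumed ordering for illustration; the module's actual CONFIDENCE_ORDER may differ.
CONFIDENCE_ORDER = {"E1": 1, "E2": 2, "E3": 3}
VALID_TRUTH_STATUSES = {"uploaded", "observed", "validated_internal", "accepted_by_auditor"}

def has_qualifying_evidence(evidence_list):
    """True if any record has confidence >= E2 and a valid truth_status."""
    for e in evidence_list:
        conf = getattr(e, "confidence_level", None)
        truth = getattr(e, "truth_status", None)
        conf_val = conf.value if hasattr(conf, "value") else str(conf) if conf else "E1"
        truth_val = truth.value if hasattr(truth, "value") else str(truth) if truth else "uploaded"
        if CONFIDENCE_ORDER.get(conf_val, 1) >= CONFIDENCE_ORDER["E2"] and truth_val in VALID_TRUTH_STATUSES:
            return True
    return False

strong = SimpleNamespace(confidence_level="E2", truth_status="observed")
weak = SimpleNamespace(confidence_level="E1", truth_status="observed")
print(has_qualifying_evidence([strong]))  # True
print(has_qualifying_evidence([weak]))    # False
```

Note that a record qualifies only when both conditions hold; an E3 record with an invalid truth status is still rejected.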
3932  control-pipeline/services/decomposition_pass.py  Normal file
File diff suppressed because it is too large
714  control-pipeline/services/framework_decomposition.py  Normal file
@@ -0,0 +1,714 @@
"""Framework Decomposition Engine — decomposes framework-container obligations.

Sits between Pass 0a (obligation extraction) and Pass 0b (atomic control
composition). Detects obligations that reference a framework domain (e.g.
"CCM-Praktiken fuer AIS") and decomposes them into concrete sub-obligations
using an internal framework registry.

Three routing types:
    atomic              → pass through to Pass 0b unchanged
    compound            → split compound verbs, then Pass 0b
    framework_container → decompose via registry, then Pass 0b

The registry is a set of JSON files under compliance/data/frameworks/.
"""

import json
import logging
import os
import re
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Registry loading
# ---------------------------------------------------------------------------

_REGISTRY_DIR = Path(__file__).resolve().parent.parent / "data" / "frameworks"
_REGISTRY: dict[str, dict] = {}  # framework_id → framework dict


def _load_registry() -> dict[str, dict]:
    """Load all framework JSON files from the registry directory."""
    registry: dict[str, dict] = {}
    if not _REGISTRY_DIR.is_dir():
        logger.warning("Framework registry dir not found: %s", _REGISTRY_DIR)
        return registry

    for fpath in sorted(_REGISTRY_DIR.glob("*.json")):
        try:
            with open(fpath, encoding="utf-8") as f:
                fw = json.load(f)
            fw_id = fw.get("framework_id", fpath.stem)
            registry[fw_id] = fw
            logger.info(
                "Loaded framework: %s (%d domains)",
                fw_id,
                len(fw.get("domains", [])),
            )
        except Exception:
            logger.exception("Failed to load framework file: %s", fpath)
    return registry


def get_registry() -> dict[str, dict]:
    """Return the global framework registry (lazy-loaded)."""
    global _REGISTRY
    if not _REGISTRY:
        _REGISTRY = _load_registry()
    return _REGISTRY


def reload_registry() -> dict[str, dict]:
    """Force-reload the framework registry from disk."""
    global _REGISTRY
    _REGISTRY = _load_registry()
    return _REGISTRY


# ---------------------------------------------------------------------------
# Framework alias index (built from registry)
# ---------------------------------------------------------------------------

def _build_alias_index(registry: dict[str, dict]) -> dict[str, str]:
    """Build a lowercase alias → framework_id lookup."""
    idx: dict[str, str] = {}
    for fw_id, fw in registry.items():
        # Framework-level aliases
        idx[fw_id.lower()] = fw_id
        name = fw.get("display_name", "")
        if name:
            idx[name.lower()] = fw_id
        # Common short forms
        for part in fw_id.lower().replace("_", " ").split():
            if len(part) >= 3:
                idx[part] = fw_id
    return idx
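To illustrate the index shape, the same logic applied to a toy registry entry (the framework_id and display_name below are hypothetical, for illustration only) expands the ID, the display name, and each ID part of three or more characters:

```python
def build_alias_index(registry):
    """Same logic as _build_alias_index: lowercase alias -> framework_id."""
    idx = {}
    for fw_id, fw in registry.items():
        idx[fw_id.lower()] = fw_id
        name = fw.get("display_name", "")
        if name:
            idx[name.lower()] = fw_id
        for part in fw_id.lower().replace("_", " ").split():
            if len(part) >= 3:
                idx[part] = fw_id
    return idx

# Hypothetical registry entry for illustration
registry = {"csa_ccm": {"display_name": "CSA Cloud Controls Matrix"}}
idx = build_alias_index(registry)
print(sorted(idx))  # ['ccm', 'csa', 'csa cloud controls matrix', 'csa_ccm']
```

All four aliases resolve to the same framework_id, which is what lets later lookups match both short forms ("ccm") and full names.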
# ---------------------------------------------------------------------------
# Routing — classify obligation type
# ---------------------------------------------------------------------------

# Extended patterns for framework detection (beyond the simple _COMPOSITE_RE
# in decomposition_pass.py — here we also capture the framework name)
_FRAMEWORK_PATTERN = re.compile(
    r"(?:praktiken|kontrollen|ma(?:ss|ß)nahmen|anforderungen|vorgaben|controls|practices|measures|requirements)"
    r"\s+(?:f(?:ue|ü)r|aus|gem(?:ae|ä)(?:ss|ß)|nach|from|of|for|per)\s+"
    r"(.+?)(?:\s+(?:m(?:ue|ü)ssen|sollen|sind|werden|implementieren|umsetzen|einf(?:ue|ü)hren)|\.|,|$)",
    re.IGNORECASE,
)

# Direct framework name references
_DIRECT_FRAMEWORK_RE = re.compile(
    r"\b(?:CSA\s*CCM|NIST\s*(?:SP\s*)?800-53|OWASP\s*(?:ASVS|SAMM|Top\s*10)"
    r"|CIS\s*Controls|BSI\s*(?:IT-)?Grundschutz|ENISA|ISO\s*2700[12]"
    r"|COBIT|SOX|PCI\s*DSS|HITRUST|SOC\s*2|KRITIS)\b",
    re.IGNORECASE,
)

# Compound verb patterns (multiple main verbs)
_COMPOUND_VERB_RE = re.compile(
    r"\b(?:und|sowie|als\s+auch|or|and)\b",
    re.IGNORECASE,
)

# No-split phrases that look compound but aren't
_NO_SPLIT_PHRASES = [
    "pflegen und aufrechterhalten",
    "dokumentieren und pflegen",
    "definieren und dokumentieren",
    "erstellen und freigeben",
    "pruefen und genehmigen",
    "identifizieren und bewerten",
    "erkennen und melden",
    "define and maintain",
    "create and maintain",
    "establish and maintain",
    "monitor and review",
    "detect and respond",
]


@dataclass
class RoutingResult:
    """Result of obligation routing classification."""
    routing_type: str  # atomic | compound | framework_container | unknown_review
    framework_ref: Optional[str] = None
    framework_domain: Optional[str] = None
    domain_title: Optional[str] = None
    confidence: float = 0.0
    reason: str = ""


def classify_routing(
    obligation_text: str,
    action_raw: str,
    object_raw: str,
    condition_raw: Optional[str] = None,
) -> RoutingResult:
    """Classify an obligation into atomic / compound / framework_container."""
    combined = f"{obligation_text} {object_raw}".lower()

    # --- Step 1: Framework container detection ---
    fw_result = _detect_framework(obligation_text, object_raw)
    if fw_result.routing_type == "framework_container":
        return fw_result

    # --- Step 2: Compound verb detection ---
    if _is_compound_obligation(action_raw, obligation_text):
        return RoutingResult(
            routing_type="compound",
            confidence=0.7,
            reason="multiple_main_verbs",
        )

    # --- Step 3: Default = atomic ---
    return RoutingResult(
        routing_type="atomic",
        confidence=0.9,
        reason="single_action_single_object",
    )


def _detect_framework(
    obligation_text: str, object_raw: str,
) -> RoutingResult:
    """Detect if obligation references a framework domain."""
    combined = f"{obligation_text} {object_raw}"
    registry = get_registry()
    alias_idx = _build_alias_index(registry)

    # Strategy 1: direct framework name match
    m = _DIRECT_FRAMEWORK_RE.search(combined)
    if m:
        fw_name = m.group(0).strip()
        fw_id = _resolve_framework_id(fw_name, alias_idx, registry)
        if fw_id:
            domain_id, domain_title = _match_domain(
                combined, registry[fw_id],
            )
            return RoutingResult(
                routing_type="framework_container",
                framework_ref=fw_id,
                framework_domain=domain_id,
                domain_title=domain_title,
                confidence=0.95 if domain_id else 0.75,
                reason=f"direct_framework_match:{fw_name}",
            )
        else:
            # Framework name recognized but not in registry
            return RoutingResult(
                routing_type="framework_container",
                framework_ref=None,
                framework_domain=None,
                confidence=0.6,
                reason=f"direct_framework_match_no_registry:{fw_name}",
            )

    # Strategy 2: pattern match ("Praktiken fuer X")
    m2 = _FRAMEWORK_PATTERN.search(combined)
    if m2:
        ref_text = m2.group(1).strip()
        fw_id, domain_id, domain_title = _resolve_from_ref_text(
            ref_text, registry, alias_idx,
        )
        if fw_id:
            return RoutingResult(
                routing_type="framework_container",
                framework_ref=fw_id,
                framework_domain=domain_id,
                domain_title=domain_title,
                confidence=0.85 if domain_id else 0.65,
                reason=f"pattern_match:{ref_text}",
            )

    # Strategy 3: keyword-heavy object
    if _has_framework_keywords(object_raw):
        return RoutingResult(
            routing_type="framework_container",
            framework_ref=None,
            framework_domain=None,
            confidence=0.5,
            reason="framework_keywords_in_object",
        )

    return RoutingResult(routing_type="atomic", confidence=0.0)


def _resolve_framework_id(
    name: str,
    alias_idx: dict[str, str],
    registry: dict[str, dict],
) -> Optional[str]:
    """Resolve a framework name to its registry ID."""
    normalized = re.sub(r"\s+", " ", name.strip().lower())
    # Direct alias match
    if normalized in alias_idx:
        return alias_idx[normalized]
    # Try compact form (strip spaces, hyphens, underscores)
    compact = re.sub(r"[\s_\-]+", "", normalized)
    for alias, fw_id in alias_idx.items():
        if re.sub(r"[\s_\-]+", "", alias) == compact:
            return fw_id
    # Substring match in display names
    for fw_id, fw in registry.items():
        display = fw.get("display_name", "").lower()
        if normalized in display or display in normalized:
            return fw_id
    # Partial match: check if normalized contains any alias (for multi-word refs)
    for alias, fw_id in alias_idx.items():
        if len(alias) >= 4 and alias in normalized:
            return fw_id
    return None


def _match_domain(
    text: str, framework: dict,
) -> tuple[Optional[str], Optional[str]]:
    """Match a domain within a framework from text references."""
    text_lower = text.lower()
    best_id: Optional[str] = None
    best_title: Optional[str] = None
    best_score = 0

    for domain in framework.get("domains", []):
        score = 0
        domain_id = domain["domain_id"]
        title = domain.get("title", "")

        # Exact domain ID match (e.g. "AIS")
        if re.search(rf"\b{re.escape(domain_id)}\b", text, re.IGNORECASE):
            score += 10

        # Full title match
        if title.lower() in text_lower:
            score += 8

        # Alias match
        for alias in domain.get("aliases", []):
            if alias.lower() in text_lower:
                score += 6
                break

        # Keyword overlap
        kw_hits = sum(
            1 for kw in domain.get("keywords", [])
            if kw.lower() in text_lower
        )
        score += kw_hits

        if score > best_score:
            best_score = score
            best_id = domain_id
            best_title = title

    if best_score >= 3:
        return best_id, best_title
    return None, None
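The domain scoring can be shown end to end with a toy domain entry. A condensed sketch of the same weights (ID hit +10, title +8, alias +6, +1 per keyword, accept at score >= 3); the AIS domain data below is hypothetical:

```python
import re

def match_domain(text, framework):
    """Condensed version of _match_domain with the same weights."""
    text_lower = text.lower()
    best = (None, None, 0)
    for domain in framework.get("domains", []):
        score = 0
        if re.search(rf"\b{re.escape(domain['domain_id'])}\b", text, re.IGNORECASE):
            score += 10  # exact domain ID reference
        title = domain.get("title", "")
        if title and title.lower() in text_lower:
            score += 8   # full title reference
        if any(a.lower() in text_lower for a in domain.get("aliases", [])):
            score += 6   # alias reference
        score += sum(1 for kw in domain.get("keywords", []) if kw.lower() in text_lower)
        if score > best[2]:
            best = (domain["domain_id"], title, score)
    return (best[0], best[1]) if best[2] >= 3 else (None, None)

# Hypothetical domain entry for illustration
fw = {"domains": [{"domain_id": "AIS", "title": "Application & Interface Security",
                   "aliases": ["Anwendungssicherheit"], "keywords": ["api", "schnittstellen"]}]}
print(match_domain("CCM-Praktiken fuer AIS", fw))  # ('AIS', 'Application & Interface Security')
```

The bare "AIS" token alone scores 10, comfortably above the acceptance threshold of 3.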
def _resolve_from_ref_text(
    ref_text: str,
    registry: dict[str, dict],
    alias_idx: dict[str, str],
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Resolve framework + domain from a reference text like 'AIS' or 'Application Security'."""
    ref_lower = ref_text.lower()

    for fw_id, fw in registry.items():
        for domain in fw.get("domains", []):
            # Check domain ID
            if domain["domain_id"].lower() in ref_lower:
                return fw_id, domain["domain_id"], domain.get("title")
            # Check title
            if domain.get("title", "").lower() in ref_lower:
                return fw_id, domain["domain_id"], domain.get("title")
            # Check aliases
            for alias in domain.get("aliases", []):
                if alias.lower() in ref_lower or ref_lower in alias.lower():
                    return fw_id, domain["domain_id"], domain.get("title")

    return None, None, None


_FRAMEWORK_KW_SET = {
    "praktiken", "kontrollen", "massnahmen", "maßnahmen",
    "anforderungen", "vorgaben", "framework", "standard",
    "baseline", "katalog", "domain", "family", "category",
    "practices", "controls", "measures", "requirements",
}


def _has_framework_keywords(text: str) -> bool:
    """Check if text contains framework-indicator keywords."""
    words = set(re.findall(r"[a-zäöüß]+", text.lower()))
    return len(words & _FRAMEWORK_KW_SET) >= 2


def _is_compound_obligation(action_raw: str, obligation_text: str) -> bool:
    """Detect if the obligation has multiple competing main verbs."""
    if not action_raw:
        return False

    action_lower = action_raw.lower().strip()

    # Check no-split phrases first
    for phrase in _NO_SPLIT_PHRASES:
        if phrase in action_lower:
            return False

    # Must have a conjunction
    if not _COMPOUND_VERB_RE.search(action_lower):
        return False

    # Split by conjunctions and check if we get 2+ meaningful verbs
    parts = re.split(r"\b(?:und|sowie|als\s+auch|or|and)\b", action_lower)
    meaningful = [p.strip() for p in parts if len(p.strip()) >= 3]
    return len(meaningful) >= 2
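The no-split list is what keeps idiomatic verb pairs from being split. A condensed sketch of the same check (using only a subset of the phrase list above):

```python
import re

NO_SPLIT = ["definieren und dokumentieren", "establish and maintain"]
CONJ = re.compile(r"\b(?:und|sowie|als\s+auch|or|and)\b", re.IGNORECASE)

def is_compound(action_raw):
    """Condensed version of _is_compound_obligation (subset of the phrase list)."""
    if not action_raw:
        return False
    action = action_raw.lower().strip()
    if any(p in action for p in NO_SPLIT):
        return False  # idiomatic verb pair, not a true compound
    if not CONJ.search(action):
        return False
    parts = re.split(r"\b(?:und|sowie|als\s+auch|or|and)\b", action)
    return len([p for p in parts if len(p.strip()) >= 3]) >= 2

print(is_compound("definieren und dokumentieren"))  # False (no-split phrase)
print(is_compound("dokumentieren und testen"))      # True
```

"definieren und dokumentieren" contains a conjunction but is exempted, while "dokumentieren und testen" splits into two meaningful verbs and routes as compound.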
# ---------------------------------------------------------------------------
# Framework Decomposition
# ---------------------------------------------------------------------------

@dataclass
class DecomposedObligation:
    """A concrete obligation derived from a framework container."""
    obligation_candidate_id: str
    parent_control_id: str
    parent_framework_container_id: str
    source_ref_law: str
    source_ref_article: str
    obligation_text: str
    actor: str
    action_raw: str
    object_raw: str
    condition_raw: Optional[str] = None
    trigger_raw: Optional[str] = None
    routing_type: str = "atomic"
    release_state: str = "decomposed"
    subcontrol_id: str = ""
    # Metadata
    action_hint: str = ""
    object_hint: str = ""
    object_class: str = ""
    keywords: list[str] = field(default_factory=list)


@dataclass
class FrameworkDecompositionResult:
    """Result of framework decomposition."""
    framework_container_id: str
    source_obligation_candidate_id: str
    framework_ref: Optional[str]
    framework_domain: Optional[str]
    domain_title: Optional[str]
    matched_subcontrols: list[str]
    decomposition_confidence: float
    release_state: str  # decomposed | unmatched | error
    decomposed_obligations: list[DecomposedObligation]
    issues: list[str]


def decompose_framework_container(
    obligation_candidate_id: str,
    parent_control_id: str,
    obligation_text: str,
    framework_ref: Optional[str],
    framework_domain: Optional[str],
    actor: str = "organization",
) -> FrameworkDecompositionResult:
    """Decompose a framework-container obligation into concrete sub-obligations.

    Steps:
        1. Resolve framework from registry
        2. Resolve domain within framework
        3. Select relevant subcontrols (keyword filter or full domain)
        4. Generate decomposed obligations
    """
    container_id = f"FWC-{uuid.uuid4().hex[:8]}"
    registry = get_registry()
    issues: list[str] = []

    # Step 1: Resolve framework
    fw = None
    if framework_ref and framework_ref in registry:
        fw = registry[framework_ref]
    else:
        # Try to find by name in text
        fw, framework_ref = _find_framework_in_text(obligation_text, registry)

    if not fw:
        issues.append("ERROR: framework_not_matched")
        return FrameworkDecompositionResult(
            framework_container_id=container_id,
            source_obligation_candidate_id=obligation_candidate_id,
            framework_ref=framework_ref,
            framework_domain=framework_domain,
            domain_title=None,
            matched_subcontrols=[],
            decomposition_confidence=0.0,
            release_state="unmatched",
            decomposed_obligations=[],
            issues=issues,
        )

    # Step 2: Resolve domain
    domain_data = None
    domain_title = None
    if framework_domain:
        for d in fw.get("domains", []):
            if d["domain_id"].lower() == framework_domain.lower():
                domain_data = d
                domain_title = d.get("title")
                break
    if not domain_data:
        # Try matching from text
        domain_id, domain_title = _match_domain(obligation_text, fw)
        if domain_id:
            for d in fw.get("domains", []):
                if d["domain_id"] == domain_id:
                    domain_data = d
                    framework_domain = domain_id
                    break

    if not domain_data:
        issues.append("WARN: domain_not_matched — using all domains")
        # Fall back to all subcontrols across all domains
        all_subcontrols = []
        for d in fw.get("domains", []):
            for sc in d.get("subcontrols", []):
                sc["_domain_id"] = d["domain_id"]
                all_subcontrols.append(sc)
        subcontrols = _select_subcontrols(obligation_text, all_subcontrols)
        if not subcontrols:
            issues.append("ERROR: no_subcontrols_matched")
            return FrameworkDecompositionResult(
                framework_container_id=container_id,
                source_obligation_candidate_id=obligation_candidate_id,
                framework_ref=framework_ref,
                framework_domain=framework_domain,
                domain_title=None,
                matched_subcontrols=[],
                decomposition_confidence=0.0,
                release_state="unmatched",
                decomposed_obligations=[],
                issues=issues,
            )
    else:
        # Step 3: Select subcontrols from domain
        raw_subcontrols = domain_data.get("subcontrols", [])
        subcontrols = _select_subcontrols(obligation_text, raw_subcontrols)
        if not subcontrols:
            # Full domain decomposition
            subcontrols = raw_subcontrols

    # Quality check: too many subcontrols
    if len(subcontrols) > 25:
        issues.append(f"WARN: {len(subcontrols)} subcontrols — may be too broad")

    # Step 4: Generate decomposed obligations
    display_name = fw.get("display_name", framework_ref or "Unknown")
    decomposed: list[DecomposedObligation] = []
    matched_ids: list[str] = []

    for sc in subcontrols:
        sc_id = sc.get("subcontrol_id", "")
        matched_ids.append(sc_id)

        action_hint = sc.get("action_hint", "")
        object_hint = sc.get("object_hint", "")

        # Quality warnings
        if not action_hint:
            issues.append(f"WARN: {sc_id} missing action_hint")
        if not object_hint:
            issues.append(f"WARN: {sc_id} missing object_hint")

        obl_id = f"{obligation_candidate_id}-{sc_id}"

        decomposed.append(DecomposedObligation(
            obligation_candidate_id=obl_id,
            parent_control_id=parent_control_id,
            parent_framework_container_id=container_id,
            source_ref_law=display_name,
            source_ref_article=sc_id,
            obligation_text=sc.get("statement", ""),
            actor=actor,
            action_raw=action_hint or _infer_action(sc.get("statement", "")),
            object_raw=object_hint or _infer_object(sc.get("statement", "")),
            routing_type="atomic",
            release_state="decomposed",
            subcontrol_id=sc_id,
            action_hint=action_hint,
            object_hint=object_hint,
            object_class=sc.get("object_class", ""),
            keywords=sc.get("keywords", []),
        ))

    # Check if decomposed are identical to container
    for d in decomposed:
        if d.obligation_text.strip() == obligation_text.strip():
            issues.append(f"WARN: {d.subcontrol_id} identical to container text")

    confidence = _compute_decomposition_confidence(
        framework_ref, framework_domain, domain_data, len(subcontrols), issues,
    )

    return FrameworkDecompositionResult(
        framework_container_id=container_id,
        source_obligation_candidate_id=obligation_candidate_id,
        framework_ref=framework_ref,
        framework_domain=framework_domain,
        domain_title=domain_title,
        matched_subcontrols=matched_ids,
        decomposition_confidence=confidence,
        release_state="decomposed",
        decomposed_obligations=decomposed,
        issues=issues,
    )


def _find_framework_in_text(
    text: str, registry: dict[str, dict],
) -> tuple[Optional[dict], Optional[str]]:
    """Try to find a framework by searching text for known names."""
    alias_idx = _build_alias_index(registry)
    m = _DIRECT_FRAMEWORK_RE.search(text)
    if m:
        fw_id = _resolve_framework_id(m.group(0), alias_idx, registry)
        if fw_id and fw_id in registry:
            return registry[fw_id], fw_id
    return None, None


def _select_subcontrols(
    obligation_text: str, subcontrols: list[dict],
) -> list[dict]:
    """Select relevant subcontrols based on keyword matching.

    Returns empty list if no targeted match found (caller falls back to
    full domain).
    """
    text_lower = obligation_text.lower()
    scored: list[tuple[int, dict]] = []

    for sc in subcontrols:
        score = 0
        for kw in sc.get("keywords", []):
            if kw.lower() in text_lower:
                score += 1
        # Title match
        title = sc.get("title", "").lower()
        if title and title in text_lower:
            score += 3
        # Object hint in text
        obj = sc.get("object_hint", "").lower()
        if obj and obj in text_lower:
            score += 2

        if score > 0:
            scored.append((score, sc))

    if not scored:
        return []

    # Only return those with meaningful overlap (score >= 2)
    scored.sort(key=lambda x: x[0], reverse=True)
    return [sc for score, sc in scored if score >= 2]
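A small demonstration of the selection threshold, using the same weights (keyword +1, title +3, object_hint +2, keep score >= 2); the subcontrol entries below are hypothetical:

```python
def select_subcontrols(obligation_text, subcontrols):
    """Same scoring as _select_subcontrols: keyword +1, title +3, object_hint +2."""
    text = obligation_text.lower()
    scored = []
    for sc in subcontrols:
        score = sum(1 for kw in sc.get("keywords", []) if kw.lower() in text)
        title = sc.get("title", "").lower()
        if title and title in text:
            score += 3
        obj = sc.get("object_hint", "").lower()
        if obj and obj in text:
            score += 2
        if score > 0:
            scored.append((score, sc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [sc for score, sc in scored if score >= 2]

# Hypothetical subcontrols for illustration
subs = [
    {"subcontrol_id": "AIS-01", "title": "Penetrationstests", "keywords": ["pentest"], "object_hint": "anwendungen"},
    {"subcontrol_id": "AIS-02", "title": "Code Review", "keywords": ["review"], "object_hint": "quellcode"},
]
hits = select_subcontrols("Regelmaessige Penetrationstests der Anwendungen", subs)
print([sc["subcontrol_id"] for sc in hits])  # ['AIS-01']
```

AIS-01 scores 5 (title + object hint) and is kept; AIS-02 scores 0 and is dropped, so the caller would fall back to the full domain only if every subcontrol scored below 2.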
def _infer_action(statement: str) -> str:
    """Infer a basic action verb from a statement."""
    s = statement.lower()
    if any(w in s for w in ["definiert", "definieren", "define"]):
        return "definieren"
    if any(w in s for w in ["implementiert", "implementieren", "implement"]):
        return "implementieren"
    if any(w in s for w in ["dokumentiert", "dokumentieren", "document"]):
        return "dokumentieren"
    if any(w in s for w in ["ueberwacht", "ueberwachen", "monitor"]):
        return "ueberwachen"
    if any(w in s for w in ["getestet", "testen", "test"]):
        return "testen"
    if any(w in s for w in ["geschuetzt", "schuetzen", "protect"]):
        return "implementieren"
    if any(w in s for w in ["verwaltet", "verwalten", "manage"]):
        return "pflegen"
    if any(w in s for w in ["gemeldet", "melden", "report"]):
        return "melden"
    return "implementieren"


def _infer_object(statement: str) -> str:
    """Infer the primary object from a statement (first noun phrase)."""
    # Simple heuristic: take the text after "muessen"/"muss" up to the verb
    m = re.search(
        r"(?:muessen|muss|m(?:ü|ue)ssen)\s+(.+?)(?:\s+werden|\s+sein|\.|,|$)",
        statement,
        re.IGNORECASE,
    )
    if m:
        return m.group(1).strip()[:80]
    # Fallback: first 80 chars
    return statement[:80] if statement else ""
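What the `_infer_object` regex actually captures can be seen directly; note that for German passive statements it yields the phrase between the modal verb and "werden"/"sein", not the grammatical subject. The example statements below are illustrative only:

```python
import re

# Same pattern as _infer_object: capture the phrase after "muessen"/"muss"
pattern = re.compile(
    r"(?:muessen|muss|m(?:ü|ue)ssen)\s+(.+?)(?:\s+werden|\s+sein|\.|,|$)",
    re.IGNORECASE,
)

m = pattern.search("Schwachstellen müssen zeitnah behoben werden")
print(m.group(1))  # zeitnah behoben

m2 = pattern.search("Backups muessen verschluesselt sein")
print(m2.group(1))  # verschluesselt
```

The lazy `(.+?)` stops at the first " werden", " sein", period, comma, or end of string, and the result is truncated to 80 characters in the real function.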
def _compute_decomposition_confidence(
    framework_ref: Optional[str],
    domain: Optional[str],
    domain_data: Optional[dict],
    num_subcontrols: int,
    issues: list[str],
) -> float:
    """Compute confidence score for the decomposition."""
    score = 0.3
    if framework_ref:
        score += 0.25
    if domain:
        score += 0.20
    if domain_data:
        score += 0.10
    if 1 <= num_subcontrols <= 15:
        score += 0.10
    elif num_subcontrols > 15:
        score += 0.05  # less confident with too many

    # Penalize errors
    errors = sum(1 for i in issues if i.startswith("ERROR:"))
    score -= errors * 0.15
    return round(max(min(score, 1.0), 0.0), 2)
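The arithmetic is additive from a 0.3 base, so the best case (framework, domain, and domain data resolved, 1 to 15 subcontrols, no errors) lands at 0.95. A standalone sketch of the same formula:

```python
def confidence(framework_ref, domain, domain_data, n, issues):
    """Same arithmetic as _compute_decomposition_confidence."""
    score = 0.3
    score += 0.25 if framework_ref else 0.0   # framework resolved
    score += 0.20 if domain else 0.0          # domain ID resolved
    score += 0.10 if domain_data else 0.0     # domain data found in registry
    if 1 <= n <= 15:
        score += 0.10                         # focused subcontrol count
    elif n > 15:
        score += 0.05
    score -= 0.15 * sum(1 for i in issues if i.startswith("ERROR:"))
    return round(max(min(score, 1.0), 0.0), 2)

print(confidence("csa_ccm", "AIS", {"domain_id": "AIS"}, 8, []))  # 0.95
```

An unmatched framework (nothing resolved, one ERROR issue) bottoms out at 0.15, and the clamp keeps the score within [0.0, 1.0].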
# ---------------------------------------------------------------------------
# Registry statistics (for admin/debugging)
# ---------------------------------------------------------------------------

def registry_stats() -> dict:
    """Return summary statistics about the loaded registry."""
    reg = get_registry()
    stats = {
        "frameworks": len(reg),
        "details": [],
    }
    total_domains = 0
    total_subcontrols = 0
    for fw_id, fw in reg.items():
        domains = fw.get("domains", [])
        n_sc = sum(len(d.get("subcontrols", [])) for d in domains)
        total_domains += len(domains)
        total_subcontrols += n_sc
        stats["details"].append({
            "framework_id": fw_id,
            "display_name": fw.get("display_name", ""),
            "domains": len(domains),
            "subcontrols": n_sc,
        })
    stats["total_domains"] = total_domains
    stats["total_subcontrols"] = total_subcontrols
    return stats
116  control-pipeline/services/license_gate.py  Normal file
@@ -0,0 +1,116 @@
"""
License Gate — checks whether a given source may be used for a specific purpose.

Usage types:
    - analysis: Read + analyse internally (TDM under UrhG 44b)
    - store_excerpt: Store verbatim excerpt in vault
    - ship_embeddings: Ship embeddings in product
    - ship_in_product: Ship text/content in product

Policy is driven by the canonical_control_sources table columns:
    allowed_analysis, allowed_store_excerpt, allowed_ship_embeddings, allowed_ship_in_product
"""

from __future__ import annotations

import logging
from typing import Any

from sqlalchemy import text
from sqlalchemy.orm import Session

logger = logging.getLogger(__name__)

USAGE_COLUMN_MAP = {
    "analysis": "allowed_analysis",
    "store_excerpt": "allowed_store_excerpt",
    "ship_embeddings": "allowed_ship_embeddings",
    "ship_in_product": "allowed_ship_in_product",
}


def check_source_allowed(db: Session, source_id: str, usage_type: str) -> bool:
    """Check whether *source_id* may be used for *usage_type*.

    Returns False if the source is unknown or the usage is not allowed.
    """
    col = USAGE_COLUMN_MAP.get(usage_type)
    if col is None:
        logger.warning("Unknown usage_type=%s", usage_type)
        return False

    # Interpolating *col* is safe here: it can only be one of the fixed
    # column names from USAGE_COLUMN_MAP, never caller-supplied text.
    row = db.execute(
        text(f"SELECT {col} FROM canonical_control_sources WHERE source_id = :sid"),
        {"sid": source_id},
    ).fetchone()

    if row is None:
        logger.warning("Source %s not found in registry", source_id)
        return False

    return bool(row[0])
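The gate's fail-closed behavior (unknown usage type or unknown source both yield False) can be sketched without a database, using an in-memory dict as a hypothetical stand-in for the canonical_control_sources table:

```python
USAGE_COLUMN_MAP = {
    "analysis": "allowed_analysis",
    "store_excerpt": "allowed_store_excerpt",
    "ship_embeddings": "allowed_ship_embeddings",
    "ship_in_product": "allowed_ship_in_product",
}

# Hypothetical in-memory stand-in for the canonical_control_sources table
SOURCES = {
    "src-example": {"allowed_analysis": True, "allowed_store_excerpt": True,
                    "allowed_ship_embeddings": False, "allowed_ship_in_product": False},
}

def check_source_allowed(source_id, usage_type):
    """Mirrors the gate logic: unknown usage or unknown source -> False."""
    col = USAGE_COLUMN_MAP.get(usage_type)
    if col is None:
        return False  # unmapped usage type: deny
    row = SOURCES.get(source_id)
    if row is None:
        return False  # unregistered source: deny
    return bool(row[col])

print(check_source_allowed("src-example", "analysis"))         # True
print(check_source_allowed("src-example", "ship_in_product"))  # False
print(check_source_allowed("src-example", "delete"))           # False
```

Denying by default on any lookup miss is the important property: a source must be explicitly registered and explicitly permitted before a usage passes the gate.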
def get_license_matrix(db: Session) -> list[dict[str, Any]]:
    """Return the full license matrix with allowed usages per license."""
    rows = db.execute(
        text("""
            SELECT license_id, name, terms_url, commercial_use,
                   ai_training_restriction, tdm_allowed_under_44b,
                   deletion_required, notes
            FROM canonical_control_licenses
            ORDER BY license_id
        """)
    ).fetchall()

    return [
        {
            "license_id": r.license_id,
            "name": r.name,
            "terms_url": r.terms_url,
            "commercial_use": r.commercial_use,
            "ai_training_restriction": r.ai_training_restriction,
            "tdm_allowed_under_44b": r.tdm_allowed_under_44b,
            "deletion_required": r.deletion_required,
            "notes": r.notes,
        }
        for r in rows
    ]


def get_source_permissions(db: Session) -> list[dict[str, Any]]:
    """Return all sources with their permission flags."""
    rows = db.execute(
        text("""
            SELECT s.source_id, s.title, s.publisher, s.url, s.version_label,
                   s.language, s.license_id,
                   s.allowed_analysis, s.allowed_store_excerpt,
                   s.allowed_ship_embeddings, s.allowed_ship_in_product,
                   s.vault_retention_days, s.vault_access_tier,
                   l.name AS license_name, l.commercial_use
            FROM canonical_control_sources s
            JOIN canonical_control_licenses l ON l.license_id = s.license_id
            ORDER BY s.source_id
        """)
    ).fetchall()

    return [
        {
            "source_id": r.source_id,
            "title": r.title,
            "publisher": r.publisher,
            "url": r.url,
            "version_label": r.version_label,
            "language": r.language,
            "license_id": r.license_id,
            "license_name": r.license_name,
            "commercial_use": r.commercial_use,
            "allowed_analysis": r.allowed_analysis,
            "allowed_store_excerpt": r.allowed_store_excerpt,
            "allowed_ship_embeddings": r.allowed_ship_embeddings,
            "allowed_ship_in_product": r.allowed_ship_in_product,
            "vault_retention_days": r.vault_retention_days,
            "vault_access_tier": r.vault_access_tier,
        }
        for r in rows
    ]
624  control-pipeline/services/llm_provider.py  Normal file
@@ -0,0 +1,624 @@
"""
LLM Provider Abstraction for Compliance AI Features.

Supports:
- Anthropic Claude API (default)
- Self-Hosted LLMs (Ollama, vLLM, LocalAI, etc.)
- HashiCorp Vault integration for secure API key storage

Configuration via environment variables:
- COMPLIANCE_LLM_PROVIDER: "anthropic" or "self_hosted"
- ANTHROPIC_API_KEY: API key for Claude (or loaded from Vault)
- ANTHROPIC_MODEL: Model name (default: claude-sonnet-4-20250514)
- SELF_HOSTED_LLM_URL: Base URL for self-hosted LLM
- SELF_HOSTED_LLM_MODEL: Model name for self-hosted
- SELF_HOSTED_LLM_KEY: Optional API key for self-hosted

Vault Configuration:
- VAULT_ADDR: Vault server address (e.g., http://vault:8200)
- VAULT_TOKEN: Vault authentication token
- USE_VAULT_SECRETS: Set to "true" to enable Vault integration
- VAULT_SECRET_PATH: Path to secrets (default: secret/breakpilot/api_keys)
"""
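How the documented variables might resolve into a provider configuration can be sketched as follows; `resolve_llm_config` is a hypothetical helper for illustration, not part of this module, and it uses only the variable names and defaults listed in the docstring:

```python
def resolve_llm_config(env):
    """Hypothetical resolver over the documented env vars (env is a plain dict)."""
    provider = env.get("COMPLIANCE_LLM_PROVIDER", "anthropic")  # documented default
    if provider == "anthropic":
        return {
            "provider": provider,
            "model": env.get("ANTHROPIC_MODEL", "claude-sonnet-4-20250514"),
            "api_key": env.get("ANTHROPIC_API_KEY"),  # may instead come from Vault
        }
    return {
        "provider": provider,
        "base_url": env.get("SELF_HOSTED_LLM_URL"),
        "model": env.get("SELF_HOSTED_LLM_MODEL"),
        "api_key": env.get("SELF_HOSTED_LLM_KEY"),  # optional for self-hosted
    }

print(resolve_llm_config({})["provider"])  # anthropic
cfg = resolve_llm_config({"COMPLIANCE_LLM_PROVIDER": "self_hosted",
                          "SELF_HOSTED_LLM_URL": "http://ollama:11434"})
print(cfg["base_url"])  # http://ollama:11434
```

Passing the environment in as a dict (rather than reading `os.environ` directly) keeps the resolution testable; the module's actual loader below may differ.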
import os
import asyncio
import logging
from abc import ABC, abstractmethod
from typing import List, Optional, Dict, Any
from dataclasses import dataclass
from enum import Enum

import httpx

logger = logging.getLogger(__name__)


# =============================================================================
# Vault Integration
# =============================================================================

class VaultClient:
    """
    HashiCorp Vault client for retrieving secrets.

    Supports KV v2 secrets engine.
    """

    def __init__(
        self,
        addr: Optional[str] = None,
        token: Optional[str] = None
    ):
        self.addr = addr or os.getenv("VAULT_ADDR", "http://localhost:8200")
        self.token = token or os.getenv("VAULT_TOKEN")
        self._cache: Dict[str, Any] = {}
        self._cache_ttl = 300  # 5 minutes cache

    def _get_headers(self) -> Dict[str, str]:
        """Get request headers with Vault token."""
        headers = {"Content-Type": "application/json"}
        if self.token:
            headers["X-Vault-Token"] = self.token
        return headers

    def get_secret(self, path: str, key: str = "value") -> Optional[str]:
        """
        Get a secret from Vault KV v2.

        Args:
            path: Secret path (e.g., "breakpilot/api_keys/anthropic")
            key: Key within the secret data (default: "value")

        Returns:
            Secret value or None if not found
        """
        cache_key = f"{path}:{key}"

        # Check cache first
        if cache_key in self._cache:
            return self._cache[cache_key]

        try:
            # KV v2 uses /data/ in the path
            full_path = f"{self.addr}/v1/secret/data/{path}"

            response = httpx.get(
                full_path,
                headers=self._get_headers(),
                timeout=10.0
            )

            if response.status_code == 200:
                data = response.json()
                secret_data = data.get("data", {}).get("data", {})
                secret_value = secret_data.get(key)

                if secret_value:
                    self._cache[cache_key] = secret_value
                    logger.info(f"Successfully loaded secret from Vault: {path}")
|
||||
return secret_value
|
||||
|
||||
elif response.status_code == 404:
|
||||
logger.warning(f"Secret not found in Vault: {path}")
|
||||
else:
|
||||
logger.error(f"Vault error {response.status_code}: {response.text}")
|
||||
|
||||
except httpx.RequestError as e:
|
||||
logger.error(f"Failed to connect to Vault at {self.addr}: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Error retrieving secret from Vault: {e}")
|
||||
|
||||
return None
|
||||
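The `/v1/secret/data/` URL built in `get_secret` follows the KV v2 convention of inserting `data/` between the mount name and the logical secret path. A minimal sketch of that mapping, assuming the default `secret` mount used above:

```python
# Sketch of how a KV v2 logical path maps to the HTTP API path, mirroring
# the URL construction in VaultClient.get_secret. The mount name "secret"
# and the example path are illustrative assumptions.
def kv2_url(addr: str, path: str, mount: str = "secret") -> str:
    """Build the KV v2 read URL: /v1/<mount>/data/<path>."""
    return f"{addr.rstrip('/')}/v1/{mount}/data/{path}"

print(kv2_url("http://vault:8200", "breakpilot/api_keys/anthropic"))
# http://vault:8200/v1/secret/data/breakpilot/api_keys/anthropic
```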
    def get_anthropic_key(self) -> Optional[str]:
        """Get the Anthropic API key from Vault."""
        path = os.getenv("VAULT_ANTHROPIC_PATH", "breakpilot/api_keys/anthropic")
        return self.get_secret(path, "value")

    def is_available(self) -> bool:
        """Check whether Vault is reachable and authenticated."""
        try:
            response = httpx.get(
                f"{self.addr}/v1/sys/health",
                headers=self._get_headers(),
                timeout=5.0
            )
            # Vault's health endpoint returns non-200 codes for standby,
            # DR-secondary, and performance-standby states that still count
            # as "available" here.
            return response.status_code in (200, 429, 472, 473, 501, 503)
        except Exception:
            return False


# Singleton Vault client
_vault_client: Optional[VaultClient] = None


def get_vault_client() -> VaultClient:
    """Get the shared Vault client instance."""
    global _vault_client
    if _vault_client is None:
        _vault_client = VaultClient()
    return _vault_client


def get_secret_from_vault_or_env(
    vault_path: str,
    env_var: str,
    vault_key: str = "value"
) -> Optional[str]:
    """
    Get a secret, trying Vault first and falling back to an environment variable.

    Args:
        vault_path: Path in Vault (e.g., "breakpilot/api_keys/anthropic")
        env_var: Environment variable name used as fallback
        vault_key: Key within the Vault secret data

    Returns:
        Secret value, or None.
    """
    use_vault = os.getenv("USE_VAULT_SECRETS", "").lower() in ("true", "1", "yes")

    if use_vault:
        vault = get_vault_client()
        secret = vault.get_secret(vault_path, vault_key)
        if secret:
            return secret
        logger.info(f"Vault secret not found, falling back to env: {env_var}")

    return os.getenv(env_var)
class LLMProviderType(str, Enum):
    """Supported LLM provider types."""
    ANTHROPIC = "anthropic"
    SELF_HOSTED = "self_hosted"
    OLLAMA = "ollama"  # Alias for self_hosted (Ollama-specific)
    MOCK = "mock"  # For testing


@dataclass
class LLMResponse:
    """Standard response from an LLM."""
    content: str
    model: str
    provider: str
    usage: Optional[Dict[str, int]] = None
    raw_response: Optional[Dict[str, Any]] = None


@dataclass
class LLMConfig:
    """Configuration for an LLM provider."""
    provider_type: LLMProviderType
    api_key: Optional[str] = None
    model: str = "claude-sonnet-4-20250514"
    base_url: Optional[str] = None
    max_tokens: int = 4096
    temperature: float = 0.3
    timeout: float = 60.0


class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    def __init__(self, config: LLMConfig):
        self.config = config

    @abstractmethod
    async def complete(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None
    ) -> LLMResponse:
        """Generate a completion for the given prompt."""

    @abstractmethod
    async def batch_complete(
        self,
        prompts: List[str],
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        rate_limit: float = 1.0
    ) -> List[LLMResponse]:
        """Generate completions for multiple prompts with rate limiting."""

    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Return the provider name."""
class AnthropicProvider(LLMProvider):
    """Claude API provider using Anthropic's official Messages API."""

    ANTHROPIC_API_URL = "https://api.anthropic.com/v1/messages"

    def __init__(self, config: LLMConfig):
        super().__init__(config)
        if not config.api_key:
            raise ValueError("Anthropic API key is required")
        self.api_key = config.api_key
        self.model = config.model or "claude-sonnet-4-20250514"

    @property
    def provider_name(self) -> str:
        return "anthropic"

    async def complete(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None
    ) -> LLMResponse:
        """Generate a completion using the Claude API."""

        headers = {
            "x-api-key": self.api_key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json"
        }

        messages = [{"role": "user", "content": prompt}]

        payload = {
            "model": self.model,
            "max_tokens": max_tokens or self.config.max_tokens,
            "messages": messages
        }

        if system_prompt:
            payload["system"] = system_prompt

        if temperature is not None:
            payload["temperature"] = temperature
        elif self.config.temperature is not None:
            payload["temperature"] = self.config.temperature

        async with httpx.AsyncClient(timeout=self.config.timeout) as client:
            try:
                response = await client.post(
                    self.ANTHROPIC_API_URL,
                    headers=headers,
                    json=payload
                )
                response.raise_for_status()
                data = response.json()

                content = ""
                if data.get("content"):
                    content = data["content"][0].get("text", "")

                return LLMResponse(
                    content=content,
                    model=self.model,
                    provider=self.provider_name,
                    usage=data.get("usage"),
                    raw_response=data
                )

            except httpx.HTTPStatusError as e:
                logger.error(f"Anthropic API error: {e.response.status_code} - {e.response.text}")
                raise
            except Exception as e:
                logger.error(f"Anthropic API request failed: {e}")
                raise

    async def batch_complete(
        self,
        prompts: List[str],
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        rate_limit: float = 1.0
    ) -> List[LLMResponse]:
        """Process multiple prompts sequentially with rate limiting."""
        results = []

        for i, prompt in enumerate(prompts):
            if i > 0:
                await asyncio.sleep(rate_limit)

            try:
                result = await self.complete(
                    prompt=prompt,
                    system_prompt=system_prompt,
                    max_tokens=max_tokens
                )
                results.append(result)
            except Exception as e:
                logger.error(f"Failed to process prompt {i}: {e}")
                # Append an error response so results stay aligned with prompts
                results.append(LLMResponse(
                    content=f"Error: {str(e)}",
                    model=self.model,
                    provider=self.provider_name
                ))

        return results
class SelfHostedProvider(LLMProvider):
    """Self-hosted LLM provider supporting Ollama, vLLM, LocalAI, etc."""

    def __init__(self, config: LLMConfig):
        super().__init__(config)
        if not config.base_url:
            raise ValueError("Base URL is required for self-hosted provider")
        self.base_url = config.base_url.rstrip("/")
        self.model = config.model
        self.api_key = config.api_key

    @property
    def provider_name(self) -> str:
        return "self_hosted"

    def _detect_api_format(self) -> str:
        """Heuristically detect the API format from the base URL."""
        if "11434" in self.base_url or "ollama" in self.base_url.lower():
            return "ollama"
        elif "openai" in self.base_url.lower() or "v1" in self.base_url:
            return "openai"
        else:
            return "ollama"  # Default to the Ollama format

    async def complete(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None
    ) -> LLMResponse:
        """Generate a completion using the self-hosted LLM."""

        api_format = self._detect_api_format()

        headers = {"content-type": "application/json"}
        if self.api_key:
            headers["Authorization"] = f"Bearer {self.api_key}"

        if api_format == "ollama":
            # Ollama API format
            endpoint = f"{self.base_url}/api/generate"
            full_prompt = prompt
            if system_prompt:
                full_prompt = f"{system_prompt}\n\n{prompt}"

            payload = {
                "model": self.model,
                "prompt": full_prompt,
                "stream": False,
                "think": False,  # Disable thinking mode (qwen3.5 etc.)
                "options": {}
            }

            if max_tokens:
                payload["options"]["num_predict"] = max_tokens
            if temperature is not None:
                payload["options"]["temperature"] = temperature

        else:
            # OpenAI-compatible format (vLLM, LocalAI, etc.)
            endpoint = f"{self.base_url}/v1/chat/completions"

            messages = []
            if system_prompt:
                messages.append({"role": "system", "content": system_prompt})
            messages.append({"role": "user", "content": prompt})

            payload = {
                "model": self.model,
                "messages": messages,
                "max_tokens": max_tokens or self.config.max_tokens,
                "temperature": temperature if temperature is not None else self.config.temperature
            }

        async with httpx.AsyncClient(timeout=self.config.timeout) as client:
            try:
                response = await client.post(endpoint, headers=headers, json=payload)
                response.raise_for_status()
                data = response.json()

                # Parse the response based on the detected format
                if api_format == "ollama":
                    content = data.get("response", "")
                else:
                    # OpenAI format
                    content = data.get("choices", [{}])[0].get("message", {}).get("content", "")

                return LLMResponse(
                    content=content,
                    model=self.model,
                    provider=self.provider_name,
                    usage=data.get("usage"),
                    raw_response=data
                )

            except httpx.HTTPStatusError as e:
                logger.error(f"Self-hosted LLM error: {e.response.status_code} - {e.response.text}")
                raise
            except Exception as e:
                logger.error(f"Self-hosted LLM request failed: {e}")
                raise

    async def batch_complete(
        self,
        prompts: List[str],
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        rate_limit: float = 0.5  # Self-hosted endpoints can usually handle a faster cadence
    ) -> List[LLMResponse]:
        """Process multiple prompts sequentially with rate limiting."""
        results = []

        for i, prompt in enumerate(prompts):
            if i > 0:
                await asyncio.sleep(rate_limit)

            try:
                result = await self.complete(
                    prompt=prompt,
                    system_prompt=system_prompt,
                    max_tokens=max_tokens
                )
                results.append(result)
            except Exception as e:
                logger.error(f"Failed to process prompt {i}: {e}")
                results.append(LLMResponse(
                    content=f"Error: {str(e)}",
                    model=self.model,
                    provider=self.provider_name
                ))

        return results
class MockProvider(LLMProvider):
    """Mock provider for testing without actual API calls."""

    def __init__(self, config: LLMConfig):
        super().__init__(config)
        self.responses: List[str] = []
        self.call_count = 0

    @property
    def provider_name(self) -> str:
        return "mock"

    def set_responses(self, responses: List[str]):
        """Set predetermined responses for testing."""
        self.responses = responses
        self.call_count = 0

    async def complete(
        self,
        prompt: str,
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        temperature: Optional[float] = None
    ) -> LLMResponse:
        """Return a mock response."""
        if self.responses:
            content = self.responses[self.call_count % len(self.responses)]
        else:
            content = f"Mock response for: {prompt[:50]}..."

        self.call_count += 1

        return LLMResponse(
            content=content,
            model="mock-model",
            provider=self.provider_name,
            usage={"input_tokens": len(prompt), "output_tokens": len(content)}
        )

    async def batch_complete(
        self,
        prompts: List[str],
        system_prompt: Optional[str] = None,
        max_tokens: Optional[int] = None,
        rate_limit: float = 0.0
    ) -> List[LLMResponse]:
        """Return mock responses for a batch of prompts."""
        return [await self.complete(p, system_prompt, max_tokens) for p in prompts]
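The mock flow can be exercised without the service package; a self-contained sketch using illustrative stand-in names (`TinyMock`, `TinyResponse`) that mirror `MockProvider.complete`, not the real classes:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class TinyResponse:
    """Stand-in for LLMResponse with just the fields the sketch needs."""
    content: str
    model: str

class TinyMock:
    """Stand-in mirroring MockProvider's default-response behavior."""
    def __init__(self):
        self.call_count = 0

    async def complete(self, prompt: str) -> TinyResponse:
        self.call_count += 1
        return TinyResponse(
            content=f"Mock response for: {prompt[:50]}...",
            model="mock-model",
        )

async def main() -> TinyResponse:
    mock = TinyMock()
    return await mock.complete("Analyze this requirement")

resp = asyncio.run(main())
print(resp.content)  # Mock response for: Analyze this requirement...
```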
def get_llm_config() -> LLMConfig:
    """
    Create an LLM config from environment variables or Vault.

    Priority for the API key:
    1. Vault (if USE_VAULT_SECRETS=true and Vault is available)
    2. Environment variable (ANTHROPIC_API_KEY)
    """
    provider_type_str = os.getenv("COMPLIANCE_LLM_PROVIDER", "anthropic")

    try:
        provider_type = LLMProviderType(provider_type_str)
    except ValueError:
        logger.warning(f"Unknown LLM provider: {provider_type_str}, falling back to mock")
        provider_type = LLMProviderType.MOCK

    # Get the API key from Vault or the environment
    api_key = None
    if provider_type == LLMProviderType.ANTHROPIC:
        api_key = get_secret_from_vault_or_env(
            vault_path="breakpilot/api_keys/anthropic",
            env_var="ANTHROPIC_API_KEY"
        )
    elif provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
        api_key = get_secret_from_vault_or_env(
            vault_path="breakpilot/api_keys/self_hosted_llm",
            env_var="SELF_HOSTED_LLM_KEY"
        )

    # Select the model based on the provider type
    if provider_type == LLMProviderType.ANTHROPIC:
        model = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
    elif provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
        model = os.getenv("SELF_HOSTED_LLM_MODEL", "qwen2.5:14b")
    else:
        model = "mock-model"

    return LLMConfig(
        provider_type=provider_type,
        api_key=api_key,
        model=model,
        base_url=os.getenv("SELF_HOSTED_LLM_URL"),
        max_tokens=int(os.getenv("COMPLIANCE_LLM_MAX_TOKENS", "4096")),
        temperature=float(os.getenv("COMPLIANCE_LLM_TEMPERATURE", "0.3")),
        timeout=float(os.getenv("COMPLIANCE_LLM_TIMEOUT", "60.0"))
    )


def get_llm_provider(config: Optional[LLMConfig] = None) -> LLMProvider:
    """
    Factory function returning the appropriate LLM provider for the configuration.

    Usage:
        provider = get_llm_provider()
        response = await provider.complete("Analyze this requirement...")
    """
    if config is None:
        config = get_llm_config()

    if config.provider_type == LLMProviderType.ANTHROPIC:
        if not config.api_key:
            logger.warning("No Anthropic API key found, using mock provider")
            return MockProvider(config)
        return AnthropicProvider(config)

    elif config.provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
        if not config.base_url:
            logger.warning("No self-hosted LLM URL found, using mock provider")
            return MockProvider(config)
        return SelfHostedProvider(config)

    elif config.provider_type == LLMProviderType.MOCK:
        return MockProvider(config)

    else:
        raise ValueError(f"Unsupported LLM provider type: {config.provider_type}")


# Singleton instance for reuse
_provider_instance: Optional[LLMProvider] = None


def get_shared_provider() -> LLMProvider:
    """Get a shared LLM provider instance."""
    global _provider_instance
    if _provider_instance is None:
        _provider_instance = get_llm_provider()
    return _provider_instance


def reset_shared_provider():
    """Reset the shared provider instance (useful for testing)."""
    global _provider_instance
    _provider_instance = None
59	control-pipeline/services/normative_patterns.py	Normal file
@@ -0,0 +1,59 @@
"""Shared normative language patterns for assertion classification.

Extracted from decomposition_pass.py for reuse in the assertion engine.
"""

import re

_PFLICHT_SIGNALS = [
    r"\bmüssen\b", r"\bmuss\b", r"\bhat\s+sicherzustellen\b",
    r"\bhaben\s+sicherzustellen\b", r"\bsind\s+verpflichtet\b",
    r"\bist\s+verpflichtet\b",
    r"\bist\s+zu\s+\w+en\b", r"\bsind\s+zu\s+\w+en\b",
    r"\bhat\s+zu\s+\w+en\b", r"\bhaben\s+zu\s+\w+en\b",
    r"\bist\s+\w+zu\w+en\b", r"\bsind\s+\w+zu\w+en\b",
    r"\bist\s+\w+\s+zu\s+\w+en\b", r"\bsind\s+\w+\s+zu\s+\w+en\b",
    r"\bhat\s+\w+\s+zu\s+\w+en\b", r"\bhaben\s+\w+\s+zu\s+\w+en\b",
    r"\bshall\b", r"\bmust\b", r"\brequired\b",
    r"\b\w+zuteilen\b", r"\b\w+zuwenden\b", r"\b\w+zustellen\b", r"\b\w+zulegen\b",
    r"\b\w+zunehmen\b", r"\b\w+zuführen\b", r"\b\w+zuhalten\b", r"\b\w+zusetzen\b",
    r"\b\w+zuweisen\b", r"\b\w+zuordnen\b", r"\b\w+zufügen\b", r"\b\w+zugeben\b",
    r"\bist\b.{1,80}\bzu\s+\w+en\b", r"\bsind\b.{1,80}\bzu\s+\w+en\b",
]
PFLICHT_RE = re.compile("|".join(_PFLICHT_SIGNALS), re.IGNORECASE)

_EMPFEHLUNG_SIGNALS = [
    r"\bsoll\b", r"\bsollen\b", r"\bsollte\b", r"\bsollten\b",
    r"\bgewährleisten\b", r"\bsicherstellen\b",
    r"\bshould\b", r"\bensure\b", r"\brecommend\w*\b",
    r"\bnachweisen\b", r"\beinhalten\b", r"\bunterlassen\b", r"\bwahren\b",
    r"\bdokumentieren\b", r"\bimplementieren\b", r"\büberprüfen\b", r"\büberwachen\b",
    r"\bprüfen,\s+ob\b", r"\bkontrollieren,\s+ob\b",
]
EMPFEHLUNG_RE = re.compile("|".join(_EMPFEHLUNG_SIGNALS), re.IGNORECASE)

_KANN_SIGNALS = [
    r"\bkann\b", r"\bkönnen\b", r"\bdarf\b", r"\bdürfen\b",
    r"\bmay\b", r"\boptional\b",
]
KANN_RE = re.compile("|".join(_KANN_SIGNALS), re.IGNORECASE)

NORMATIVE_RE = re.compile(
    "|".join(_PFLICHT_SIGNALS + _EMPFEHLUNG_SIGNALS + _KANN_SIGNALS),
    re.IGNORECASE,
)

_RATIONALE_SIGNALS = [
    r"\bda\s+", r"\bweil\b", r"\bgrund\b", r"\berwägung",
    r"\bbecause\b", r"\breason\b", r"\brationale\b",
    r"\bkönnen\s+.*\s+verursachen\b", r"\bführt\s+zu\b",
]
RATIONALE_RE = re.compile("|".join(_RATIONALE_SIGNALS), re.IGNORECASE)

# Evidence-related keywords (for fact detection)
_EVIDENCE_KEYWORDS = [
    r"\bnachweis\b", r"\bzertifikat\b", r"\baudit.report\b",
    r"\bprotokoll\b", r"\bdokumentation\b", r"\bbericht\b",
    r"\bcertificate\b", r"\bevidence\b", r"\bproof\b",
]
EVIDENCE_RE = re.compile("|".join(_EVIDENCE_KEYWORDS), re.IGNORECASE)
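The tiers of normative strength (Pflicht/obligation before Kann/permission) can be sanity-checked on a sentence. A self-contained sketch using a small subset of the signals above; the `strength` helper is illustrative, not part of the module:

```python
import re

# Self-contained subset of the signal lists above, enough for one example.
PFLICHT_RE = re.compile(r"\bmuss\b|\bmüssen\b|\bshall\b", re.IGNORECASE)
KANN_RE = re.compile(r"\bkann\b|\bdarf\b|\bmay\b", re.IGNORECASE)

def strength(sentence: str) -> str:
    """Classify normative strength: Pflicht (must) takes precedence over Kann (may)."""
    if PFLICHT_RE.search(sentence):
        return "pflicht"
    if KANN_RE.search(sentence):
        return "kann"
    return "none"

# "The controller must keep a record." / "The authority may request further details."
print(strength("Der Verantwortliche muss ein Verzeichnis führen."))  # pflicht
print(strength("Die Behörde kann weitere Angaben verlangen."))       # kann
```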
563	control-pipeline/services/obligation_extractor.py	Normal file
@@ -0,0 +1,563 @@
"""Obligation Extractor — 3-Tier Chunk-to-Obligation Linking.

Maps RAG chunks to obligations from the v2 obligation framework using
three tiers (fastest first):

Tier 1: EXACT MATCH — regulation_code + article → obligation_id (~40%)
Tier 2: EMBEDDING — chunk text vs. obligation descriptions (~30%)
Tier 3: LLM EXTRACT — local Ollama extracts obligation text (~25%)

Part of the Multi-Layer Control Architecture (Phase 4 of 8).
"""

import json
import logging
import os
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

import httpx

logger = logging.getLogger(__name__)

EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://embedding-service:8087")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")
LLM_TIMEOUT = float(os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180"))

# Embedding similarity thresholds for Tier 2
EMBEDDING_MATCH_THRESHOLD = 0.80
EMBEDDING_CANDIDATE_THRESHOLD = 0.60

# ---------------------------------------------------------------------------
# Regulation code mapping: RAG chunk codes → obligation file regulation IDs
# ---------------------------------------------------------------------------

_REGULATION_CODE_TO_ID = {
    # DSGVO
    "eu_2016_679": "dsgvo",
    "dsgvo": "dsgvo",
    "gdpr": "dsgvo",
    # AI Act
    "eu_2024_1689": "ai_act",
    "ai_act": "ai_act",
    "aiact": "ai_act",
    # NIS2
    "eu_2022_2555": "nis2",
    "nis2": "nis2",
    "bsig": "nis2",
    # BDSG
    "bdsg": "bdsg",
    # TTDSG
    "ttdsg": "ttdsg",
    # DSA
    "eu_2022_2065": "dsa",
    "dsa": "dsa",
    # Data Act
    "eu_2023_2854": "data_act",
    "data_act": "data_act",
    # EU Machinery Regulation
    "eu_2023_1230": "eu_machinery",
    "eu_machinery": "eu_machinery",
    # DORA
    "eu_2022_2554": "dora",
    "dora": "dora",
}
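The `_normalize_regulation` helper referenced later in this file is defined outside this hunk; a plausible minimal sketch of the lookup it performs over this table (the lower-casing and stripping are assumptions):

```python
from typing import Optional

# Illustrative subset of _REGULATION_CODE_TO_ID; the real table covers more codes.
_CODES = {"eu_2016_679": "dsgvo", "gdpr": "dsgvo", "eu_2024_1689": "ai_act"}

def normalize_regulation(code: Optional[str]) -> Optional[str]:
    """Map a raw chunk regulation code to a canonical regulation ID, or None."""
    if not code:
        return None
    return _CODES.get(code.strip().lower())

print(normalize_regulation("EU_2016_679"))  # dsgvo
print(normalize_regulation("unknown"))      # None
```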
@dataclass
class ObligationMatch:
    """Result of obligation extraction."""

    obligation_id: Optional[str] = None
    obligation_title: Optional[str] = None
    obligation_text: Optional[str] = None
    method: str = "none"  # exact_match | embedding_match | llm_extracted | inferred
    confidence: float = 0.0
    regulation_id: Optional[str] = None  # e.g. "dsgvo"

    def to_dict(self) -> dict:
        return {
            "obligation_id": self.obligation_id,
            "obligation_title": self.obligation_title,
            "obligation_text": self.obligation_text,
            "method": self.method,
            "confidence": self.confidence,
            "regulation_id": self.regulation_id,
        }


@dataclass
class _ObligationEntry:
    """Internal representation of a loaded obligation."""

    id: str
    title: str
    description: str
    regulation_id: str
    articles: list[str] = field(default_factory=list)  # normalized: ["art. 30", "§ 38"]
    embedding: list[float] = field(default_factory=list)


class ObligationExtractor:
    """3-tier obligation extraction from RAG chunks.

    Usage::

        extractor = ObligationExtractor()
        await extractor.initialize()  # loads obligations + embeddings

        match = await extractor.extract(
            chunk_text="...",
            regulation_code="eu_2016_679",
            article="Art. 30",
            paragraph="Abs. 1",
        )
    """

    def __init__(self):
        self._article_lookup: dict[str, list[str]] = {}  # "dsgvo/art. 30" → ["DSGVO-OBL-001"]
        self._obligations: dict[str, _ObligationEntry] = {}  # id → entry
        self._obligation_embeddings: list[list[float]] = []
        self._obligation_ids: list[str] = []
        self._initialized = False

    async def initialize(self) -> None:
        """Load all obligations from the v2 JSON files and compute embeddings."""
        if self._initialized:
            return

        self._load_obligations()
        await self._compute_embeddings()
        self._initialized = True
        logger.info(
            "ObligationExtractor initialized: %d obligations, %d article lookups, %d embeddings",
            len(self._obligations),
            len(self._article_lookup),
            sum(1 for e in self._obligation_embeddings if e),
        )

    async def extract(
        self,
        chunk_text: str,
        regulation_code: str,
        article: Optional[str] = None,
        paragraph: Optional[str] = None,
    ) -> ObligationMatch:
        """Extract an obligation from a chunk using the 3-tier strategy."""
        if not self._initialized:
            await self.initialize()

        reg_id = _normalize_regulation(regulation_code)

        # Tier 1: exact match via article lookup
        if article:
            match = self._tier1_exact(reg_id, article)
            if match:
                return match

        # Tier 2: embedding similarity
        match = await self._tier2_embedding(chunk_text, reg_id)
        if match:
            return match

        # Tier 3: LLM extraction
        return await self._tier3_llm(chunk_text, regulation_code, article)

    # -----------------------------------------------------------------------
    # Tier 1: Exact Match
    # -----------------------------------------------------------------------

    def _tier1_exact(self, reg_id: Optional[str], article: str) -> Optional[ObligationMatch]:
        """Look up an obligation by regulation + article."""
        if not reg_id:
            return None

        norm_article = _normalize_article(article)
        key = f"{reg_id}/{norm_article}"

        obl_ids = self._article_lookup.get(key)
        if not obl_ids:
            return None

        # Take the first match (highest priority)
        obl_id = obl_ids[0]
        entry = self._obligations.get(obl_id)
        if not entry:
            return None

        return ObligationMatch(
            obligation_id=entry.id,
            obligation_title=entry.title,
            obligation_text=entry.description,
            method="exact_match",
            confidence=1.0,
            regulation_id=reg_id,
        )

    # -----------------------------------------------------------------------
    # Tier 2: Embedding Match
    # -----------------------------------------------------------------------

    async def _tier2_embedding(
        self, chunk_text: str, reg_id: Optional[str]
    ) -> Optional[ObligationMatch]:
        """Find the nearest obligation by embedding similarity."""
        if not self._obligation_embeddings:
            return None

        chunk_embedding = await _get_embedding(chunk_text[:2000])
        if not chunk_embedding:
            return None

        best_idx = -1
        best_score = 0.0

        for i, obl_emb in enumerate(self._obligation_embeddings):
            if not obl_emb:
                continue
            # Prefer same-regulation matches
            obl_id = self._obligation_ids[i]
            entry = self._obligations.get(obl_id)
            score = _cosine_sim(chunk_embedding, obl_emb)

            # Domain bonus: +0.05 if the obligation is from the same regulation
            if entry and reg_id and entry.regulation_id == reg_id:
                score += 0.05

            if score > best_score:
                best_score = score
                best_idx = i

        if best_idx < 0:
            return None

        # Remove the domain bonus again before comparing against the threshold
        raw_score = best_score
        obl_id = self._obligation_ids[best_idx]
        entry = self._obligations.get(obl_id)
        if entry and reg_id and entry.regulation_id == reg_id:
            raw_score -= 0.05

        if raw_score >= EMBEDDING_MATCH_THRESHOLD:
            return ObligationMatch(
                obligation_id=entry.id if entry else obl_id,
                obligation_title=entry.title if entry else None,
                obligation_text=entry.description if entry else None,
                method="embedding_match",
                confidence=round(min(raw_score, 1.0), 3),
                regulation_id=entry.regulation_id if entry else reg_id,
            )

        return None
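`_cosine_sim` is referenced in Tier 2 but defined outside this hunk; a standard pure-Python sketch of what it computes (the zero-vector guard is an assumption about the real helper):

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_sim([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0
```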
    # -----------------------------------------------------------------------
    # Tier 3: LLM Extraction
    # -----------------------------------------------------------------------

    async def _tier3_llm(
        self, chunk_text: str, regulation_code: str, article: Optional[str]
    ) -> ObligationMatch:
        """Use the local LLM to extract the obligation from the chunk."""
        # Prompt (German): "Analyze the following legal text and extract the
        # central legal obligation. ... Answer ONLY as JSON."
        prompt = f"""Analysiere den folgenden Gesetzestext und extrahiere die zentrale rechtliche Pflicht.

Text:
{chunk_text[:3000]}

Quelle: {regulation_code} {article or ''}

Antworte NUR als JSON:
{{
    "obligation_text": "Die zentrale Pflicht in einem Satz",
    "actor": "Wer muss handeln (z.B. Verantwortlicher, Auftragsverarbeiter)",
    "action": "Was muss getan werden",
    "normative_strength": "muss|soll|kann"
}}"""

        # System prompt (German): "You are a legal expert for EU data protection
        # and digital law. Extract the central legal obligation from legal texts.
        # Answer exclusively as JSON."
        system_prompt = (
            "Du bist ein Rechtsexperte fuer EU-Datenschutz- und Digitalrecht. "
            "Extrahiere die zentrale rechtliche Pflicht aus Gesetzestexten. "
            "Antworte ausschliesslich als JSON."
        )

        result_text = await _llm_ollama(prompt, system_prompt)
        if not result_text:
            return ObligationMatch(
                method="llm_extracted",
                confidence=0.0,
                regulation_id=_normalize_regulation(regulation_code),
            )

        parsed = _parse_json(result_text)
        obligation_text = parsed.get("obligation_text", result_text[:500])

        return ObligationMatch(
            obligation_id=None,
            obligation_title=None,
            obligation_text=obligation_text,
            method="llm_extracted",
            confidence=0.60,
            regulation_id=_normalize_regulation(regulation_code),
        )

    # -----------------------------------------------------------------------
    # Initialization helpers
    # -----------------------------------------------------------------------

    def _load_obligations(self) -> None:
        """Load all obligation files from the v2 framework."""
        v2_dir = _find_obligations_dir()
        if not v2_dir:
            logger.warning("Obligations v2 directory not found — Tier 1 disabled")
            return

        manifest_path = v2_dir / "_manifest.json"
        if not manifest_path.exists():
            logger.warning("Manifest not found at %s", manifest_path)
            return

        with open(manifest_path) as f:
            manifest = json.load(f)

        for reg_info in manifest.get("regulations", []):
            reg_id = reg_info["id"]
            reg_file = v2_dir / reg_info["file"]
            if not reg_file.exists():
                logger.warning("Regulation file not found: %s", reg_file)
                continue

            with open(reg_file) as f:
                data = json.load(f)

            for obl in data.get("obligations", []):
                obl_id = obl["id"]
                entry = _ObligationEntry(
                    id=obl_id,
                    title=obl.get("title", ""),
                    description=obl.get("description", ""),
                    regulation_id=reg_id,
                )

                # Build the article lookup from legal_basis
                for basis in obl.get("legal_basis", []):
                    article_raw = basis.get("article", "")
                    if article_raw:
                        norm_art = _normalize_article(article_raw)
                        key = f"{reg_id}/{norm_art}"
                        if key not in self._article_lookup:
                            self._article_lookup[key] = []
                        self._article_lookup[key].append(obl_id)
                        entry.articles.append(norm_art)

                self._obligations[obl_id] = entry

        logger.info(
            "Loaded %d obligations from %d regulations",
            len(self._obligations),
            len(manifest.get("regulations", [])),
        )

    async def _compute_embeddings(self) -> None:
        """Compute embeddings for all obligation descriptions."""
        if not self._obligations:
            return
|
||||
|
||||
self._obligation_ids = list(self._obligations.keys())
|
||||
texts = [
|
||||
f"{self._obligations[oid].title}: {self._obligations[oid].description}"
|
||||
for oid in self._obligation_ids
|
||||
]
|
||||
|
||||
logger.info("Computing embeddings for %d obligations...", len(texts))
|
||||
self._obligation_embeddings = await _get_embeddings_batch(texts)
|
||||
valid = sum(1 for e in self._obligation_embeddings if e)
|
||||
logger.info("Got %d/%d valid embeddings", valid, len(texts))
|
||||
|
||||
# -----------------------------------------------------------------------
|
||||
# Stats
|
||||
# -----------------------------------------------------------------------
|
||||
|
||||
def stats(self) -> dict:
|
||||
"""Return initialization statistics."""
|
||||
return {
|
||||
"total_obligations": len(self._obligations),
|
||||
"article_lookups": len(self._article_lookup),
|
||||
"embeddings_valid": sum(1 for e in self._obligation_embeddings if e),
|
||||
"regulations": list(
|
||||
{e.regulation_id for e in self._obligations.values()}
|
||||
),
|
||||
"initialized": self._initialized,
|
||||
}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Module-level helpers (reusable by other modules)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _normalize_regulation(regulation_code: str) -> Optional[str]:
|
||||
"""Map a RAG regulation_code to obligation framework regulation ID."""
|
||||
if not regulation_code:
|
||||
return None
|
||||
code = regulation_code.lower().strip()
|
||||
|
||||
# Direct lookup
|
||||
if code in _REGULATION_CODE_TO_ID:
|
||||
return _REGULATION_CODE_TO_ID[code]
|
||||
|
||||
# Prefix matching for families
|
||||
for prefix, reg_id in [
|
||||
("eu_2016_679", "dsgvo"),
|
||||
("eu_2024_1689", "ai_act"),
|
||||
("eu_2022_2555", "nis2"),
|
||||
("eu_2022_2065", "dsa"),
|
||||
("eu_2023_2854", "data_act"),
|
||||
("eu_2023_1230", "eu_machinery"),
|
||||
("eu_2022_2554", "dora"),
|
||||
]:
|
||||
if code.startswith(prefix):
|
||||
return reg_id
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _normalize_article(article: str) -> str:
|
||||
"""Normalize article references for consistent lookup.
|
||||
|
||||
Examples:
|
||||
"Art. 30" → "art. 30"
|
||||
"§ 38 BDSG" → "§ 38"
|
||||
"Article 10" → "art. 10"
|
||||
"Art. 30 Abs. 1" → "art. 30"
|
||||
"Artikel 35" → "art. 35"
|
||||
"""
|
||||
if not article:
|
||||
return ""
|
||||
s = article.strip()
|
||||
|
||||
# Remove trailing law name: "§ 38 BDSG" → "§ 38"
|
||||
s = re.sub(r"\s+(DSGVO|BDSG|TTDSG|DSA|NIS2|DORA|AI.?Act)\s*$", "", s, flags=re.IGNORECASE)
|
||||
|
||||
# Remove paragraph references: "Art. 30 Abs. 1" → "Art. 30"
|
||||
s = re.sub(r"\s+(Abs|Absatz|para|paragraph|lit|Satz)\.?\s+.*$", "", s, flags=re.IGNORECASE)
|
||||
|
||||
# Normalize "Article" / "Artikel" → "Art."
|
||||
s = re.sub(r"^(Article|Artikel)\s+", "Art. ", s, flags=re.IGNORECASE)
|
||||
|
||||
return s.lower().strip()
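The docstring examples can be checked against a standalone copy of the same three-step regex chain (`normalize_article` below is a local re-implementation for illustration, not the module's function):

```python
import re

def normalize_article(article: str) -> str:
    # Same regex chain as _normalize_article: strip law name,
    # strip paragraph references, unify "Article"/"Artikel" to "Art.".
    if not article:
        return ""
    s = article.strip()
    s = re.sub(r"\s+(DSGVO|BDSG|TTDSG|DSA|NIS2|DORA|AI.?Act)\s*$", "", s, flags=re.IGNORECASE)
    s = re.sub(r"\s+(Abs|Absatz|para|paragraph|lit|Satz)\.?\s+.*$", "", s, flags=re.IGNORECASE)
    s = re.sub(r"^(Article|Artikel)\s+", "Art. ", s, flags=re.IGNORECASE)
    return s.lower().strip()

print(normalize_article("Art. 30 Abs. 1"))  # art. 30
print(normalize_article("§ 38 BDSG"))       # § 38
print(normalize_article("Artikel 35"))      # art. 35
```

Note the ordering matters: the law-name suffix must be stripped before the paragraph rule, since "Abs. 1 DSGVO" would otherwise leave the suffix untouched.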


def _cosine_sim(a: list[float], b: list[float]) -> float:
    """Compute cosine similarity between two vectors."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
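A quick sanity check of the cosine helper, as a standalone copy for illustration: orthogonal vectors score 0.0, parallel vectors score approximately 1.0, and mismatched lengths or zero vectors fall back to 0.0 rather than raising:

```python
def cosine_sim(a: list[float], b: list[float]) -> float:
    # Standalone copy of _cosine_sim for demonstration.
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
print(cosine_sim([1.0, 2.0], [2.0, 4.0]))  # ≈ 1.0 (parallel)
print(cosine_sim([1.0], [1.0, 2.0]))       # 0.0 (length mismatch)
```

The defensive zero return on mismatched lengths matters here because failed embedding calls yield empty lists, which would otherwise crash the scoring loop.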


def _find_obligations_dir() -> Optional[Path]:
    """Locate the obligations v2 directory."""
    candidates = [
        Path(__file__).resolve().parent.parent.parent.parent
        / "ai-compliance-sdk" / "policies" / "obligations" / "v2",
        Path("/app/ai-compliance-sdk/policies/obligations/v2"),
        Path("ai-compliance-sdk/policies/obligations/v2"),
    ]
    for p in candidates:
        if p.is_dir() and (p / "_manifest.json").exists():
            return p
    return None


async def _get_embedding(text: str) -> list[float]:
    """Get the embedding vector for a single text."""
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                f"{EMBEDDING_URL}/embed",
                json={"texts": [text]},
            )
            resp.raise_for_status()
            embeddings = resp.json().get("embeddings", [])
            return embeddings[0] if embeddings else []
    except Exception:
        return []


async def _get_embeddings_batch(
    texts: list[str], batch_size: int = 32
) -> list[list[float]]:
    """Get embeddings for multiple texts in batches."""
    all_embeddings: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        try:
            async with httpx.AsyncClient(timeout=30.0) as client:
                resp = await client.post(
                    f"{EMBEDDING_URL}/embed",
                    json={"texts": batch},
                )
                resp.raise_for_status()
                embeddings = resp.json().get("embeddings", [])
                all_embeddings.extend(embeddings)
        except Exception as e:
            logger.warning("Batch embedding failed for %d texts: %s", len(batch), e)
            all_embeddings.extend([[] for _ in batch])
    return all_embeddings


async def _llm_ollama(prompt: str, system_prompt: Optional[str] = None) -> str:
    """Call local Ollama for LLM extraction."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    payload = {
        "model": OLLAMA_MODEL,
        "messages": messages,
        "stream": False,
        "format": "json",
        "options": {"num_predict": 512},
        "think": False,
    }

    try:
        async with httpx.AsyncClient(timeout=LLM_TIMEOUT) as client:
            resp = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
            if resp.status_code != 200:
                logger.error(
                    "Ollama chat failed %d: %s", resp.status_code, resp.text[:300]
                )
                return ""
            data = resp.json()
            return data.get("message", {}).get("content", "")
    except Exception as e:
        logger.warning("Ollama call failed: %s", e)
        return ""


def _parse_json(text: str) -> dict:
    """Extract a JSON object from LLM response text."""
    # Try a direct parse first
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # Fall back to extracting the first flat (non-nested) JSON block
    match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass

    return {}
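`_parse_json` tolerates chatter around the JSON, but the fallback pattern `\{[^{}]*\}` only captures flat objects, which is worth keeping in mind if the prompt schema ever gains nesting. A standalone sketch of the same two-step parse:

```python
import json
import re

def parse_json(text: str) -> dict:
    # Same two-step strategy as _parse_json: direct parse first,
    # then grab the first flat {...} block from noisy LLM output.
    # The regex cannot match nested objects by design.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    return {}

noisy = 'Gerne, hier das Ergebnis: {"normative_strength": "muss"} -- Ende'
print(parse_json(noisy))        # {'normative_strength': 'muss'}
print(parse_json("kein JSON"))  # {}
```

Returning `{}` instead of raising keeps Tier 3 from failing hard; the caller then falls back to the raw `result_text[:500]`.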

532  control-pipeline/services/pattern_matcher.py  Normal file
@@ -0,0 +1,532 @@
"""Pattern Matcher — Obligation-to-Control-Pattern Linking.

Maps obligations (from the ObligationExtractor) to control patterns
using two tiers:

    Tier 1: KEYWORD MATCH — obligation_match_keywords from patterns (~70%)
    Tier 2: EMBEDDING — cosine similarity with domain bonus (~25%)

Part of the Multi-Layer Control Architecture (Phase 5 of 8).
"""

import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

import yaml

from services.obligation_extractor import (
    _cosine_sim,
    _get_embedding,
    _get_embeddings_batch,
)

logger = logging.getLogger(__name__)

# Minimum keyword score to accept a match (at least 2 keyword hits)
KEYWORD_MATCH_MIN_HITS = 2
# Embedding threshold for Tier 2
EMBEDDING_PATTERN_THRESHOLD = 0.75
# Domain bonus when the regulation maps to the pattern's domain
DOMAIN_BONUS = 0.10

# Map regulation IDs to pattern domains that are likely relevant
_REGULATION_DOMAIN_AFFINITY = {
    "dsgvo": ["DATA", "COMP", "GOV"],
    "bdsg": ["DATA", "COMP"],
    "ttdsg": ["DATA"],
    "ai_act": ["AI", "COMP", "DATA"],
    "nis2": ["SEC", "INC", "NET", "LOG", "CRYP"],
    "dsa": ["DATA", "COMP"],
    "data_act": ["DATA", "COMP"],
    "eu_machinery": ["SEC", "COMP"],
    "dora": ["SEC", "INC", "FIN", "COMP"],
}


@dataclass
class ControlPattern:
    """Python representation of a control pattern from YAML."""

    id: str
    name: str
    name_de: str
    domain: str
    category: str
    description: str
    objective_template: str
    rationale_template: str
    requirements_template: list[str] = field(default_factory=list)
    test_procedure_template: list[str] = field(default_factory=list)
    evidence_template: list[str] = field(default_factory=list)
    severity_default: str = "medium"
    implementation_effort_default: str = "m"
    obligation_match_keywords: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    composable_with: list[str] = field(default_factory=list)
    open_anchor_refs: list[dict] = field(default_factory=list)


@dataclass
class PatternMatchResult:
    """Result of pattern matching."""

    pattern: Optional[ControlPattern] = None
    pattern_id: Optional[str] = None
    method: str = "none"  # keyword | embedding | combined | none
    confidence: float = 0.0
    keyword_hits: int = 0
    total_keywords: int = 0
    embedding_score: float = 0.0
    domain_bonus_applied: bool = False
    composable_patterns: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "pattern_id": self.pattern_id,
            "method": self.method,
            "confidence": round(self.confidence, 3),
            "keyword_hits": self.keyword_hits,
            "total_keywords": self.total_keywords,
            "embedding_score": round(self.embedding_score, 3),
            "domain_bonus_applied": self.domain_bonus_applied,
            "composable_patterns": self.composable_patterns,
        }


class PatternMatcher:
    """Links obligations to control patterns using keyword + embedding matching.

    Usage::

        matcher = PatternMatcher()
        await matcher.initialize()

        result = await matcher.match(
            obligation_text="Fuehrung eines Verarbeitungsverzeichnisses...",
            regulation_id="dsgvo",
        )
        print(result.pattern_id)   # e.g. "CP-COMP-001"
        print(result.confidence)   # e.g. 0.85
    """

    def __init__(self):
        self._patterns: list[ControlPattern] = []
        self._by_id: dict[str, ControlPattern] = {}
        self._by_domain: dict[str, list[ControlPattern]] = {}
        self._keyword_index: dict[str, list[str]] = {}  # keyword → [pattern_ids]
        self._pattern_embeddings: list[list[float]] = []
        self._pattern_ids: list[str] = []
        self._initialized = False

    async def initialize(self) -> None:
        """Load patterns from YAML and compute embeddings."""
        if self._initialized:
            return

        self._load_patterns()
        self._build_keyword_index()
        await self._compute_embeddings()
        self._initialized = True
        logger.info(
            "PatternMatcher initialized: %d patterns, %d keywords, %d embeddings",
            len(self._patterns),
            len(self._keyword_index),
            sum(1 for e in self._pattern_embeddings if e),
        )

    async def match(
        self,
        obligation_text: str,
        regulation_id: Optional[str] = None,
        top_n: int = 1,
    ) -> PatternMatchResult:
        """Match obligation text to the best control pattern.

        Args:
            obligation_text: The obligation description to match against.
            regulation_id: Source regulation (for the domain bonus).
            top_n: Number of top results to consider for composability.

        Returns:
            PatternMatchResult with the best match.
        """
        if not self._initialized:
            await self.initialize()

        if not obligation_text or not self._patterns:
            return PatternMatchResult()

        # Tier 1: Keyword matching
        keyword_result = self._tier1_keyword(obligation_text, regulation_id)

        # Tier 2: Embedding matching
        embedding_result = await self._tier2_embedding(obligation_text, regulation_id)

        # Combine scores: prefer the keyword match, boost with embedding if available
        best = self._combine_results(keyword_result, embedding_result)

        # Attach composable patterns
        if best.pattern:
            best.composable_patterns = [
                pid for pid in best.pattern.composable_with
                if pid in self._by_id
            ]

        return best

    async def match_top_n(
        self,
        obligation_text: str,
        regulation_id: Optional[str] = None,
        n: int = 3,
    ) -> list[PatternMatchResult]:
        """Return the top-N pattern matches sorted by confidence, descending."""
        if not self._initialized:
            await self.initialize()

        if not obligation_text or not self._patterns:
            return []

        keyword_scores = self._keyword_scores(obligation_text, regulation_id)
        embedding_scores = await self._embedding_scores(obligation_text, regulation_id)

        # Merge scores
        all_pattern_ids = set(keyword_scores.keys()) | set(embedding_scores.keys())
        results: list[PatternMatchResult] = []

        for pid in all_pattern_ids:
            pattern = self._by_id.get(pid)
            if not pattern:
                continue

            kw_score = keyword_scores.get(pid, (0, 0, 0.0))  # (hits, total, score)
            emb_score = embedding_scores.get(pid, (0.0, False))  # (score, bonus_applied)

            kw_hits, kw_total, kw_confidence = kw_score
            emb_confidence, bonus_applied = emb_score

            # Combined confidence: max of keyword and embedding, with a boost if both
            if kw_confidence > 0 and emb_confidence > 0:
                combined = max(kw_confidence, emb_confidence) + 0.05
                method = "combined"
            elif kw_confidence > 0:
                combined = kw_confidence
                method = "keyword"
            else:
                combined = emb_confidence
                method = "embedding"

            results.append(PatternMatchResult(
                pattern=pattern,
                pattern_id=pid,
                method=method,
                confidence=min(combined, 1.0),
                keyword_hits=kw_hits,
                total_keywords=kw_total,
                embedding_score=emb_confidence,
                domain_bonus_applied=bonus_applied,
                composable_patterns=[
                    p for p in pattern.composable_with if p in self._by_id
                ],
            ))

        # Sort by confidence descending
        results.sort(key=lambda r: r.confidence, reverse=True)
        return results[:n]

    # -----------------------------------------------------------------------
    # Tier 1: Keyword Match
    # -----------------------------------------------------------------------

    def _tier1_keyword(
        self, obligation_text: str, regulation_id: Optional[str]
    ) -> Optional[PatternMatchResult]:
        """Match by counting keyword hits in the obligation text."""
        scores = self._keyword_scores(obligation_text, regulation_id)
        if not scores:
            return None

        # Find the best match
        best_pid = max(scores, key=lambda pid: scores[pid][2])
        hits, total, confidence = scores[best_pid]

        if hits < KEYWORD_MATCH_MIN_HITS:
            return None

        pattern = self._by_id.get(best_pid)
        if not pattern:
            return None

        # Check domain bonus
        bonus_applied = False
        if regulation_id and self._domain_matches(pattern.domain, regulation_id):
            confidence = min(confidence + DOMAIN_BONUS, 1.0)
            bonus_applied = True

        return PatternMatchResult(
            pattern=pattern,
            pattern_id=best_pid,
            method="keyword",
            confidence=confidence,
            keyword_hits=hits,
            total_keywords=total,
            domain_bonus_applied=bonus_applied,
        )

    def _keyword_scores(
        self, text: str, regulation_id: Optional[str]
    ) -> dict[str, tuple[int, int, float]]:
        """Compute keyword match scores for all patterns.

        Returns dict: pattern_id → (hits, total_keywords, confidence).
        """
        text_lower = text.lower()
        hits_by_pattern: dict[str, int] = {}

        for keyword, pattern_ids in self._keyword_index.items():
            if keyword in text_lower:
                for pid in pattern_ids:
                    hits_by_pattern[pid] = hits_by_pattern.get(pid, 0) + 1

        result: dict[str, tuple[int, int, float]] = {}
        for pid, hits in hits_by_pattern.items():
            pattern = self._by_id.get(pid)
            if not pattern:
                continue
            total = len(pattern.obligation_match_keywords)
            confidence = hits / total if total > 0 else 0.0
            result[pid] = (hits, total, confidence)

        return result
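The Tier-1 scoring reduces to a reverse index lookup plus `hits / total_keywords` per pattern. A minimal standalone sketch (the pattern IDs and keywords here are made up for illustration, not real entries from the YAML catalogue):

```python
# Hypothetical patterns with their obligation_match_keywords.
patterns = {
    "CP-COMP-001": ["verarbeitungsverzeichnis", "verzeichnis", "dokumentation"],
    "CP-SEC-004": ["verschluesselung", "kryptografie"],
}

# Reverse index: keyword -> [pattern_ids], as _build_keyword_index does.
keyword_index: dict[str, list[str]] = {}
for pid, keywords in patterns.items():
    for kw in keywords:
        keyword_index.setdefault(kw.lower(), []).append(pid)

def keyword_scores(text: str) -> dict[str, tuple[int, int, float]]:
    # Count substring hits per pattern; confidence = hits / total keywords.
    text_lower = text.lower()
    hits: dict[str, int] = {}
    for kw, pids in keyword_index.items():
        if kw in text_lower:
            for pid in pids:
                hits[pid] = hits.get(pid, 0) + 1
    return {
        pid: (h, len(patterns[pid]), h / len(patterns[pid]))
        for pid, h in hits.items()
    }

scores = keyword_scores("Fuehrung eines Verarbeitungsverzeichnisses als Dokumentation")
print(scores)  # {'CP-COMP-001': (3, 3, 1.0)}
```

With `KEYWORD_MATCH_MIN_HITS = 2`, this example would pass Tier 1; a single stray keyword hit would not, which is what keeps the keyword tier precise.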

    # -----------------------------------------------------------------------
    # Tier 2: Embedding Match
    # -----------------------------------------------------------------------

    async def _tier2_embedding(
        self, obligation_text: str, regulation_id: Optional[str]
    ) -> Optional[PatternMatchResult]:
        """Match by embedding similarity against pattern objective_templates."""
        scores = await self._embedding_scores(obligation_text, regulation_id)
        if not scores:
            return None

        best_pid = max(scores, key=lambda pid: scores[pid][0])
        emb_score, bonus_applied = scores[best_pid]

        if emb_score < EMBEDDING_PATTERN_THRESHOLD:
            return None

        pattern = self._by_id.get(best_pid)
        if not pattern:
            return None

        return PatternMatchResult(
            pattern=pattern,
            pattern_id=best_pid,
            method="embedding",
            confidence=min(emb_score, 1.0),
            embedding_score=emb_score,
            domain_bonus_applied=bonus_applied,
        )

    async def _embedding_scores(
        self, obligation_text: str, regulation_id: Optional[str]
    ) -> dict[str, tuple[float, bool]]:
        """Compute embedding similarity scores for all patterns.

        Returns dict: pattern_id → (score, domain_bonus_applied).
        """
        if not self._pattern_embeddings:
            return {}

        chunk_embedding = await _get_embedding(obligation_text[:2000])
        if not chunk_embedding:
            return {}

        result: dict[str, tuple[float, bool]] = {}
        for i, pat_emb in enumerate(self._pattern_embeddings):
            if not pat_emb:
                continue
            pid = self._pattern_ids[i]
            pattern = self._by_id.get(pid)
            if not pattern:
                continue

            score = _cosine_sim(chunk_embedding, pat_emb)

            # Domain bonus
            bonus_applied = False
            if regulation_id and self._domain_matches(pattern.domain, regulation_id):
                score += DOMAIN_BONUS
                bonus_applied = True

            result[pid] = (score, bonus_applied)

        return result

    # -----------------------------------------------------------------------
    # Score combination
    # -----------------------------------------------------------------------

    def _combine_results(
        self,
        keyword_result: Optional[PatternMatchResult],
        embedding_result: Optional[PatternMatchResult],
    ) -> PatternMatchResult:
        """Combine keyword and embedding results into the best match."""
        if not keyword_result and not embedding_result:
            return PatternMatchResult()

        if not keyword_result:
            return embedding_result
        if not embedding_result:
            return keyword_result

        # Both matched — check if they agree
        if keyword_result.pattern_id == embedding_result.pattern_id:
            # Same pattern: boost confidence
            combined_confidence = min(
                max(keyword_result.confidence, embedding_result.confidence) + 0.05,
                1.0,
            )
            return PatternMatchResult(
                pattern=keyword_result.pattern,
                pattern_id=keyword_result.pattern_id,
                method="combined",
                confidence=combined_confidence,
                keyword_hits=keyword_result.keyword_hits,
                total_keywords=keyword_result.total_keywords,
                embedding_score=embedding_result.embedding_score,
                domain_bonus_applied=(
                    keyword_result.domain_bonus_applied
                    or embedding_result.domain_bonus_applied
                ),
            )

        # Different patterns: pick the one with higher confidence
        if keyword_result.confidence >= embedding_result.confidence:
            return keyword_result
        return embedding_result
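The agreement boost in `_combine_results` amounts to `min(max(kw, emb) + 0.05, 1.0)` when both tiers name the same pattern; otherwise the higher-confidence tier wins. Sketched numerically with bare IDs and confidences standing in for the full result objects:

```python
def combine(kw_id, kw_conf, emb_id, emb_conf):
    # Mirrors the decision logic of _combine_results:
    # agreement -> boosted "combined", disagreement -> higher tier wins.
    if kw_id is None and emb_id is None:
        return (None, 0.0, "none")
    if kw_id is None:
        return (emb_id, emb_conf, "embedding")
    if emb_id is None:
        return (kw_id, kw_conf, "keyword")
    if kw_id == emb_id:
        return (kw_id, min(max(kw_conf, emb_conf) + 0.05, 1.0), "combined")
    if kw_conf >= emb_conf:
        return (kw_id, kw_conf, "keyword")
    return (emb_id, emb_conf, "embedding")

print(combine("CP-COMP-001", 0.75, "CP-COMP-001", 0.70))
print(combine("CP-COMP-001", 0.60, "CP-SEC-004", 0.82))
```

Capping at 1.0 matters because the domain bonus may already have pushed a tier's confidence close to the ceiling before the agreement boost is applied.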

    # -----------------------------------------------------------------------
    # Domain affinity
    # -----------------------------------------------------------------------

    @staticmethod
    def _domain_matches(pattern_domain: str, regulation_id: str) -> bool:
        """Check if a pattern's domain has affinity with a regulation."""
        affine_domains = _REGULATION_DOMAIN_AFFINITY.get(regulation_id, [])
        return pattern_domain in affine_domains

    # -----------------------------------------------------------------------
    # Initialization helpers
    # -----------------------------------------------------------------------

    def _load_patterns(self) -> None:
        """Load control patterns from YAML files."""
        patterns_dir = _find_patterns_dir()
        if not patterns_dir:
            logger.warning("Control patterns directory not found")
            return

        for yaml_file in sorted(patterns_dir.glob("*.yaml")):
            if yaml_file.name.startswith("_"):
                continue
            try:
                with open(yaml_file) as f:
                    data = yaml.safe_load(f)
                if not data or "patterns" not in data:
                    continue
                for p in data["patterns"]:
                    pattern = ControlPattern(
                        id=p["id"],
                        name=p["name"],
                        name_de=p["name_de"],
                        domain=p["domain"],
                        category=p["category"],
                        description=p["description"],
                        objective_template=p["objective_template"],
                        rationale_template=p["rationale_template"],
                        requirements_template=p.get("requirements_template", []),
                        test_procedure_template=p.get("test_procedure_template", []),
                        evidence_template=p.get("evidence_template", []),
                        severity_default=p.get("severity_default", "medium"),
                        implementation_effort_default=p.get("implementation_effort_default", "m"),
                        obligation_match_keywords=p.get("obligation_match_keywords", []),
                        tags=p.get("tags", []),
                        composable_with=p.get("composable_with", []),
                        open_anchor_refs=p.get("open_anchor_refs", []),
                    )
                    self._patterns.append(pattern)
                    self._by_id[pattern.id] = pattern
                    domain_list = self._by_domain.setdefault(pattern.domain, [])
                    domain_list.append(pattern)
            except Exception as e:
                logger.error("Failed to load %s: %s", yaml_file.name, e)

        logger.info("Loaded %d patterns from %s", len(self._patterns), patterns_dir)

    def _build_keyword_index(self) -> None:
        """Build reverse index: keyword → [pattern_ids]."""
        for pattern in self._patterns:
            for kw in pattern.obligation_match_keywords:
                lower_kw = kw.lower()
                if lower_kw not in self._keyword_index:
                    self._keyword_index[lower_kw] = []
                self._keyword_index[lower_kw].append(pattern.id)

    async def _compute_embeddings(self) -> None:
        """Compute embeddings for all pattern objective templates."""
        if not self._patterns:
            return

        self._pattern_ids = [p.id for p in self._patterns]
        texts = [
            f"{p.name_de}: {p.objective_template}"
            for p in self._patterns
        ]

        logger.info("Computing embeddings for %d patterns...", len(texts))
        self._pattern_embeddings = await _get_embeddings_batch(texts)
        valid = sum(1 for e in self._pattern_embeddings if e)
        logger.info("Got %d/%d valid pattern embeddings", valid, len(texts))

    # -----------------------------------------------------------------------
    # Public helpers
    # -----------------------------------------------------------------------

    def get_pattern(self, pattern_id: str) -> Optional[ControlPattern]:
        """Get a pattern by its ID."""
        return self._by_id.get(pattern_id.upper())

    def get_patterns_by_domain(self, domain: str) -> list[ControlPattern]:
        """Get all patterns for a domain."""
        return self._by_domain.get(domain.upper(), [])

    def stats(self) -> dict:
        """Return matcher statistics."""
        return {
            "total_patterns": len(self._patterns),
            "domains": list(self._by_domain.keys()),
            "keywords": len(self._keyword_index),
            "embeddings_valid": sum(1 for e in self._pattern_embeddings if e),
            "initialized": self._initialized,
        }


def _find_patterns_dir() -> Optional[Path]:
    """Locate the control_patterns directory."""
    candidates = [
        Path(__file__).resolve().parent.parent.parent.parent
        / "ai-compliance-sdk" / "policies" / "control_patterns",
        Path("/app/ai-compliance-sdk/policies/control_patterns"),
        Path("ai-compliance-sdk/policies/control_patterns"),
    ]
    for p in candidates:
        if p.is_dir():
            return p
    return None
670  control-pipeline/services/pipeline_adapter.py  Normal file
@@ -0,0 +1,670 @@
|
||||
"""Pipeline Adapter — New 10-Stage Pipeline Integration.
|
||||
|
||||
Bridges the existing 7-stage control_generator pipeline with the new
|
||||
multi-layer components (ObligationExtractor, PatternMatcher, ControlComposer).
|
||||
|
||||
New pipeline flow:
|
||||
chunk → license_classify
|
||||
→ obligation_extract (Stage 4 — NEW)
|
||||
→ pattern_match (Stage 5 — NEW)
|
||||
→ control_compose (Stage 6 — replaces old Stage 3)
|
||||
→ harmonize → anchor → store + crosswalk → mark processed
|
||||
|
||||
Can be used in two modes:
|
||||
1. INLINE: Called from _process_batch() to enrich the pipeline
|
||||
2. STANDALONE: Process chunks directly through new stages
|
||||
|
||||
Part of the Multi-Layer Control Architecture (Phase 7 of 8).
|
||||
"""
|
||||
|
||||
import hashlib
|
||||
import json
|
||||
import logging
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Optional
|
||||
|
||||
from sqlalchemy import text
|
||||
from sqlalchemy.orm import Session
|
||||
|
||||
from services.control_composer import ComposedControl, ControlComposer
|
||||
from services.obligation_extractor import ObligationExtractor, ObligationMatch
|
||||
from services.pattern_matcher import PatternMatcher, PatternMatchResult
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@dataclass
|
||||
class PipelineChunk:
|
||||
"""Input chunk for the new pipeline stages."""
|
||||
|
||||
text: str
|
||||
collection: str = ""
|
||||
regulation_code: str = ""
|
||||
article: Optional[str] = None
|
||||
paragraph: Optional[str] = None
|
||||
license_rule: int = 3
|
||||
license_info: dict = field(default_factory=dict)
|
||||
source_citation: Optional[dict] = None
|
||||
chunk_hash: str = ""
|
||||
|
||||
def compute_hash(self) -> str:
|
||||
if not self.chunk_hash:
|
||||
self.chunk_hash = hashlib.sha256(self.text.encode()).hexdigest()
|
||||
return self.chunk_hash
|
||||
|
||||
|
||||
@dataclass
|
||||
class PipelineResult:
|
||||
"""Result of processing a chunk through the new pipeline."""
|
||||
|
||||
chunk: PipelineChunk
|
||||
obligation: ObligationMatch = field(default_factory=ObligationMatch)
|
||||
pattern_result: PatternMatchResult = field(default_factory=PatternMatchResult)
|
||||
control: Optional[ComposedControl] = None
|
||||
crosswalk_written: bool = False
|
||||
error: Optional[str] = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"chunk_hash": self.chunk.chunk_hash,
|
||||
"obligation": self.obligation.to_dict() if self.obligation else None,
|
||||
"pattern": self.pattern_result.to_dict() if self.pattern_result else None,
|
||||
"control": self.control.to_dict() if self.control else None,
|
||||
"crosswalk_written": self.crosswalk_written,
|
||||
"error": self.error,
|
||||
}


class PipelineAdapter:
    """Integrates ObligationExtractor + PatternMatcher + ControlComposer.

    Usage::

        adapter = PipelineAdapter(db)
        await adapter.initialize()

        result = await adapter.process_chunk(PipelineChunk(
            text="...",
            regulation_code="eu_2016_679",
            article="Art. 30",
            license_rule=1,
        ))
    """

    def __init__(self, db: Optional[Session] = None):
        self.db = db
        self._extractor = ObligationExtractor()
        self._matcher = PatternMatcher()
        self._composer = ControlComposer()
        self._initialized = False

    async def initialize(self) -> None:
        """Initialize all sub-components."""
        if self._initialized:
            return
        await self._extractor.initialize()
        await self._matcher.initialize()
        self._initialized = True
        logger.info("PipelineAdapter initialized")

    async def process_chunk(self, chunk: PipelineChunk) -> PipelineResult:
        """Process a single chunk through the new 3-stage pipeline.

        Stage 4: Obligation Extract
        Stage 5: Pattern Match
        Stage 6: Control Compose
        """
        if not self._initialized:
            await self.initialize()

        chunk.compute_hash()
        result = PipelineResult(chunk=chunk)

        try:
            # Stage 4: Obligation Extract
            result.obligation = await self._extractor.extract(
                chunk_text=chunk.text,
                regulation_code=chunk.regulation_code,
                article=chunk.article,
                paragraph=chunk.paragraph,
            )

            # Stage 5: Pattern Match
            obligation_text = (
                result.obligation.obligation_text
                or result.obligation.obligation_title
                or chunk.text[:500]
            )
            result.pattern_result = await self._matcher.match(
                obligation_text=obligation_text,
                regulation_id=result.obligation.regulation_id,
            )

            # Stage 6: Control Compose
            result.control = await self._composer.compose(
                obligation=result.obligation,
                pattern_result=result.pattern_result,
                chunk_text=chunk.text if chunk.license_rule in (1, 2) else None,
                license_rule=chunk.license_rule,
                source_citation=chunk.source_citation,
                regulation_code=chunk.regulation_code,
            )

        except Exception as e:
            logger.error("Pipeline processing failed: %s", e)
            result.error = str(e)

        return result

    async def process_batch(self, chunks: list[PipelineChunk]) -> list[PipelineResult]:
        """Process multiple chunks through the pipeline."""
        results = []
        for chunk in chunks:
            result = await self.process_chunk(chunk)
            results.append(result)
        return results

    def write_crosswalk(self, result: PipelineResult, control_uuid: str) -> bool:
        """Write obligation_extraction + crosswalk_matrix rows for a processed chunk.

        Called AFTER the control is stored in canonical_controls.
        """
        if not self.db or not result.control:
            return False

        chunk = result.chunk
        obligation = result.obligation
        pattern = result.pattern_result

        try:
            # 1. Write obligation_extraction row
            self.db.execute(
                text("""
                    INSERT INTO obligation_extractions (
                        chunk_hash, collection, regulation_code,
                        article, paragraph, obligation_id,
                        obligation_text, confidence, extraction_method,
                        pattern_id, pattern_match_score, control_uuid
                    ) VALUES (
                        :chunk_hash, :collection, :regulation_code,
                        :article, :paragraph, :obligation_id,
                        :obligation_text, :confidence, :extraction_method,
                        :pattern_id, :pattern_match_score,
                        CAST(:control_uuid AS uuid)
                    )
                """),
                {
                    "chunk_hash": chunk.chunk_hash,
                    "collection": chunk.collection,
                    "regulation_code": chunk.regulation_code,
                    "article": chunk.article,
                    "paragraph": chunk.paragraph,
                    "obligation_id": obligation.obligation_id if obligation else None,
                    "obligation_text": (
                        obligation.obligation_text[:2000]
                        if obligation and obligation.obligation_text
                        else None
                    ),
                    "confidence": obligation.confidence if obligation else 0,
                    "extraction_method": obligation.method if obligation else "none",
                    "pattern_id": pattern.pattern_id if pattern else None,
                    "pattern_match_score": pattern.confidence if pattern else 0,
                    "control_uuid": control_uuid,
                },
            )

            # 2. Write crosswalk_matrix row
            self.db.execute(
                text("""
                    INSERT INTO crosswalk_matrix (
                        regulation_code, article, paragraph,
                        obligation_id, pattern_id,
                        master_control_id, master_control_uuid,
                        confidence, source
                    ) VALUES (
                        :regulation_code, :article, :paragraph,
                        :obligation_id, :pattern_id,
                        :master_control_id,
                        CAST(:master_control_uuid AS uuid),
                        :confidence, :source
                    )
                """),
                {
                    "regulation_code": chunk.regulation_code,
                    "article": chunk.article,
                    "paragraph": chunk.paragraph,
                    "obligation_id": obligation.obligation_id if obligation else None,
                    "pattern_id": pattern.pattern_id if pattern else None,
                    "master_control_id": result.control.control_id,
                    "master_control_uuid": control_uuid,
                    "confidence": min(
                        obligation.confidence if obligation else 0,
                        pattern.confidence if pattern else 0,
                    ),
                    "source": "auto",
                },
            )

            # 3. Update canonical_controls with pattern_id + obligation_ids
            if result.control.pattern_id or result.control.obligation_ids:
                self.db.execute(
                    text("""
                        UPDATE canonical_controls
                        SET pattern_id = COALESCE(:pattern_id, pattern_id),
                            obligation_ids = COALESCE(:obligation_ids, obligation_ids)
                        WHERE id = CAST(:control_uuid AS uuid)
                    """),
                    {
                        "pattern_id": result.control.pattern_id,
                        "obligation_ids": json.dumps(result.control.obligation_ids),
                        "control_uuid": control_uuid,
                    },
                )

            self.db.commit()
            result.crosswalk_written = True
            return True

        except Exception as e:
            logger.error("Failed to write crosswalk: %s", e)
            self.db.rollback()
            return False

    def stats(self) -> dict:
        """Return component statistics."""
        return {
            "extractor": self._extractor.stats(),
            "matcher": self._matcher.stats(),
            "initialized": self._initialized,
        }
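`process_batch` runs chunks strictly sequentially. If throughput matters, a bounded-concurrency variant is a natural extension; the sketch below is illustrative only (the `max_concurrency` limit and the `fake_process` stub are invented here, not part of the pipeline):

```python
import asyncio


async def process_batch_concurrent(chunks, process_chunk, max_concurrency: int = 8):
    """Run process_chunk over chunks with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def _bounded(chunk):
        async with sem:
            return await process_chunk(chunk)

    # gather preserves input order, so results line up with chunks
    return await asyncio.gather(*(_bounded(c) for c in chunks))


# Demo with a stub processor standing in for PipelineAdapter.process_chunk
async def _demo():
    async def fake_process(chunk):
        await asyncio.sleep(0)
        return {"chunk": chunk, "error": None}

    return await process_batch_concurrent([1, 2, 3], fake_process, max_concurrency=2)


results = asyncio.run(_demo())
# results keep input order: chunks 1, 2, 3
```

Because each chunk is wrapped in its own try/except inside `process_chunk`, a per-chunk failure would still surface as `result.error` rather than cancelling the whole gather.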


# ---------------------------------------------------------------------------
# Migration Passes — Backfill existing 4,800+ controls
# ---------------------------------------------------------------------------


class MigrationPasses:
    """Non-destructive migration passes for existing controls.

    Pass 1: Obligation Linkage (deterministic, article→obligation lookup)
    Pass 2: Pattern Classification (keyword-based matching)
    Pass 3: Quality Triage (categorize by linkage completeness)
    Pass 4: Crosswalk Backfill (write crosswalk rows for linked controls)
    Pass 5: Deduplication (mark duplicate controls)

    Usage::

        migration = MigrationPasses(db)
        await migration.initialize()

        result = await migration.run_pass1_obligation_linkage(limit=100)
        result = await migration.run_pass2_pattern_classification(limit=100)
        result = migration.run_pass3_quality_triage()
        result = migration.run_pass4_crosswalk_backfill()
        result = migration.run_pass5_deduplication()
    """

    def __init__(self, db: Session):
        self.db = db
        self._extractor = ObligationExtractor()
        self._matcher = PatternMatcher()
        self._initialized = False

    async def initialize(self) -> None:
        """Initialize extractors (loads obligations + patterns)."""
        if self._initialized:
            return
        self._extractor._load_obligations()
        self._matcher._load_patterns()
        self._matcher._build_keyword_index()
        self._initialized = True

    # -------------------------------------------------------------------
    # Pass 1: Obligation Linkage (deterministic)
    # -------------------------------------------------------------------

    async def run_pass1_obligation_linkage(self, limit: int = 0) -> dict:
        """Link existing controls to obligations via source_citation article.

        For each control with source_citation → extract regulation + article
        → look up in obligation framework → set obligation_ids.
        """
        if not self._initialized:
            await self.initialize()

        query = """
            SELECT id, control_id, source_citation, generation_metadata
            FROM canonical_controls
            WHERE release_state NOT IN ('deprecated')
              AND (obligation_ids IS NULL OR obligation_ids = '[]')
        """
        if limit > 0:
            query += f" LIMIT {limit}"

        rows = self.db.execute(text(query)).fetchall()

        stats = {"total": len(rows), "linked": 0, "no_match": 0, "no_citation": 0}

        for row in rows:
            control_uuid = str(row[0])
            control_id = row[1]
            citation = row[2]
            metadata = row[3]

            # Extract regulation + article from citation or metadata
            reg_code, article = _extract_regulation_article(citation, metadata)
            if not reg_code:
                stats["no_citation"] += 1
                continue

            # Tier 1: Exact match
            match = self._extractor._tier1_exact(reg_code, article or "")
            if match and match.obligation_id:
                self.db.execute(
                    text("""
                        UPDATE canonical_controls
                        SET obligation_ids = :obl_ids
                        WHERE id = CAST(:uuid AS uuid)
                    """),
                    {
                        "obl_ids": json.dumps([match.obligation_id]),
                        "uuid": control_uuid,
                    },
                )
                stats["linked"] += 1
            else:
                stats["no_match"] += 1

        self.db.commit()
        logger.info("Pass 1: %s", stats)
        return stats

    # -------------------------------------------------------------------
    # Pass 2: Pattern Classification (keyword-based)
    # -------------------------------------------------------------------

    async def run_pass2_pattern_classification(self, limit: int = 0) -> dict:
        """Classify existing controls into patterns via keyword matching.

        For each control without pattern_id → keyword-match title+objective
        against pattern library → assign best match.
        """
        if not self._initialized:
            await self.initialize()

        query = """
            SELECT id, control_id, title, objective
            FROM canonical_controls
            WHERE release_state NOT IN ('deprecated')
              AND (pattern_id IS NULL OR pattern_id = '')
        """
        if limit > 0:
            query += f" LIMIT {limit}"

        rows = self.db.execute(text(query)).fetchall()

        stats = {"total": len(rows), "classified": 0, "no_match": 0}

        for row in rows:
            control_uuid = str(row[0])
            title = row[2] or ""
            objective = row[3] or ""

            # Keyword match
            match_text = f"{title} {objective}"
            result = self._matcher._tier1_keyword(match_text, None)

            if result and result.pattern_id and result.keyword_hits >= 2:
                self.db.execute(
                    text("""
                        UPDATE canonical_controls
                        SET pattern_id = :pattern_id
                        WHERE id = CAST(:uuid AS uuid)
                    """),
                    {
                        "pattern_id": result.pattern_id,
                        "uuid": control_uuid,
                    },
                )
                stats["classified"] += 1
            else:
                stats["no_match"] += 1

        self.db.commit()
        logger.info("Pass 2: %s", stats)
        return stats

    # -------------------------------------------------------------------
    # Pass 3: Quality Triage
    # -------------------------------------------------------------------

    def run_pass3_quality_triage(self) -> dict:
        """Categorize controls by linkage completeness.

        Sets generation_metadata.triage_status:
        - "review": has both obligation_id + pattern_id
        - "needs_obligation": has pattern_id but no obligation_id
        - "needs_pattern": has obligation_id but no pattern_id
        - "legacy_unlinked": has neither
        """
        categories = {
            "review": """
                UPDATE canonical_controls
                SET generation_metadata = jsonb_set(
                    COALESCE(generation_metadata::jsonb, '{}'::jsonb),
                    '{triage_status}', '"review"'
                )
                WHERE release_state NOT IN ('deprecated')
                  AND obligation_ids IS NOT NULL AND obligation_ids != '[]'
                  AND pattern_id IS NOT NULL AND pattern_id != ''
            """,
            "needs_obligation": """
                UPDATE canonical_controls
                SET generation_metadata = jsonb_set(
                    COALESCE(generation_metadata::jsonb, '{}'::jsonb),
                    '{triage_status}', '"needs_obligation"'
                )
                WHERE release_state NOT IN ('deprecated')
                  AND (obligation_ids IS NULL OR obligation_ids = '[]')
                  AND pattern_id IS NOT NULL AND pattern_id != ''
            """,
            "needs_pattern": """
                UPDATE canonical_controls
                SET generation_metadata = jsonb_set(
                    COALESCE(generation_metadata::jsonb, '{}'::jsonb),
                    '{triage_status}', '"needs_pattern"'
                )
                WHERE release_state NOT IN ('deprecated')
                  AND obligation_ids IS NOT NULL AND obligation_ids != '[]'
                  AND (pattern_id IS NULL OR pattern_id = '')
            """,
            "legacy_unlinked": """
                UPDATE canonical_controls
                SET generation_metadata = jsonb_set(
                    COALESCE(generation_metadata::jsonb, '{}'::jsonb),
                    '{triage_status}', '"legacy_unlinked"'
                )
                WHERE release_state NOT IN ('deprecated')
                  AND (obligation_ids IS NULL OR obligation_ids = '[]')
                  AND (pattern_id IS NULL OR pattern_id = '')
            """,
        }

        stats = {}
        for category, sql in categories.items():
            result = self.db.execute(text(sql))
            stats[category] = result.rowcount

        self.db.commit()
        logger.info("Pass 3: %s", stats)
        return stats

    # -------------------------------------------------------------------
    # Pass 4: Crosswalk Backfill
    # -------------------------------------------------------------------

    def run_pass4_crosswalk_backfill(self) -> dict:
        """Create crosswalk_matrix rows for controls with obligation + pattern.

        Only creates rows that don't already exist.
        """
        result = self.db.execute(text("""
            INSERT INTO crosswalk_matrix (
                regulation_code, obligation_id, pattern_id,
                master_control_id, master_control_uuid,
                confidence, source
            )
            SELECT
                COALESCE(
                    (generation_metadata::jsonb->>'source_regulation'),
                    ''
                ) AS regulation_code,
                obl.value::text AS obligation_id,
                cc.pattern_id,
                cc.control_id,
                cc.id,
                0.80,
                'migrated'
            FROM canonical_controls cc,
                 jsonb_array_elements_text(
                     COALESCE(cc.obligation_ids::jsonb, '[]'::jsonb)
                 ) AS obl(value)
            WHERE cc.release_state NOT IN ('deprecated')
              AND cc.pattern_id IS NOT NULL AND cc.pattern_id != ''
              AND cc.obligation_ids IS NOT NULL AND cc.obligation_ids != '[]'
              AND NOT EXISTS (
                  SELECT 1 FROM crosswalk_matrix cw
                  WHERE cw.master_control_uuid = cc.id
                    AND cw.obligation_id = obl.value::text
              )
        """))

        rows_inserted = result.rowcount
        self.db.commit()
        logger.info("Pass 4: %d crosswalk rows inserted", rows_inserted)
        return {"rows_inserted": rows_inserted}

    # -------------------------------------------------------------------
    # Pass 5: Deduplication
    # -------------------------------------------------------------------

    def run_pass5_deduplication(self) -> dict:
        """Mark duplicate controls (same obligation + same pattern).

        Groups controls by (obligation_id, pattern_id), keeps the one with
        the highest evidence_confidence (or the newest), and marks the rest
        as deprecated.
        """
        # Find groups with duplicates
        groups = self.db.execute(text("""
            SELECT cc.pattern_id,
                   obl.value::text AS obligation_id,
                   array_agg(cc.id ORDER BY cc.evidence_confidence DESC NULLS LAST, cc.created_at DESC) AS ids,
                   count(*) AS cnt
            FROM canonical_controls cc,
                 jsonb_array_elements_text(
                     COALESCE(cc.obligation_ids::jsonb, '[]'::jsonb)
                 ) AS obl(value)
            WHERE cc.release_state NOT IN ('deprecated')
              AND cc.pattern_id IS NOT NULL AND cc.pattern_id != ''
            GROUP BY cc.pattern_id, obl.value::text
            HAVING count(*) > 1
        """)).fetchall()

        stats = {"groups_found": len(groups), "controls_deprecated": 0}

        for group in groups:
            ids = group[2]  # Array of UUIDs, first is the keeper
            if len(ids) <= 1:
                continue

            # Keep first (highest confidence), deprecate rest
            deprecate_ids = ids[1:]
            for dep_id in deprecate_ids:
                self.db.execute(
                    text("""
                        UPDATE canonical_controls
                        SET release_state = 'deprecated',
                            generation_metadata = jsonb_set(
                                COALESCE(generation_metadata::jsonb, '{}'::jsonb),
                                '{deprecated_reason}', '"duplicate_same_obligation_pattern"'
                            )
                        WHERE id = CAST(:uuid AS uuid)
                          AND release_state != 'deprecated'
                    """),
                    {"uuid": str(dep_id)},
                )
                stats["controls_deprecated"] += 1

        self.db.commit()
        logger.info("Pass 5: %s", stats)
        return stats

    def migration_status(self) -> dict:
        """Return overall migration progress."""
        row = self.db.execute(text("""
            SELECT
                count(*) AS total,
                count(*) FILTER (WHERE obligation_ids IS NOT NULL AND obligation_ids != '[]') AS has_obligation,
                count(*) FILTER (WHERE pattern_id IS NOT NULL AND pattern_id != '') AS has_pattern,
                count(*) FILTER (
                    WHERE obligation_ids IS NOT NULL AND obligation_ids != '[]'
                      AND pattern_id IS NOT NULL AND pattern_id != ''
                ) AS fully_linked,
                count(*) FILTER (WHERE release_state = 'deprecated') AS deprecated
            FROM canonical_controls
        """)).fetchone()

        return {
            "total_controls": row[0],
            "has_obligation": row[1],
            "has_pattern": row[2],
            "fully_linked": row[3],
            "deprecated": row[4],
            "coverage_obligation_pct": round(row[1] / max(row[0], 1) * 100, 1),
            "coverage_pattern_pct": round(row[2] / max(row[0], 1) * 100, 1),
            "coverage_full_pct": round(row[3] / max(row[0], 1) * 100, 1),
        }
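Pass 3 encodes its four-way decision table in SQL. The same table can be stated as a pure function, which makes the categories easy to check in isolation (this is an illustrative mirror of the SQL, not code from the migration itself):

```python
def triage_status(obligation_ids, pattern_id):
    """Mirror of the Pass 3 decision table over a control's linkage fields.

    obligation_ids is the JSON-encoded list column ('[]' or NULL means empty),
    pattern_id is the text column ('' or NULL means unset).
    """
    has_obl = bool(obligation_ids) and obligation_ids != "[]"
    has_pat = bool(pattern_id)
    if has_obl and has_pat:
        return "review"            # fully linked, ready for human review
    if has_pat:
        return "needs_obligation"  # pattern assigned, obligation missing
    if has_obl:
        return "needs_pattern"     # obligation linked, pattern missing
    return "legacy_unlinked"       # neither side linked


# → "review", "needs_obligation", "needs_pattern", "legacy_unlinked"
statuses = [
    triage_status('["OBL-1"]', "PTN-2"),
    triage_status(None, "PTN-2"),
    triage_status('["OBL-1"]', ""),
    triage_status("[]", None),
]
```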


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _extract_regulation_article(
    citation: Optional[str], metadata: Optional[str]
) -> tuple[Optional[str], Optional[str]]:
    """Extract regulation_code and article from a control's citation/metadata."""
    from services.obligation_extractor import _normalize_regulation

    reg_code = None
    article = None

    # Try citation first (JSON string or dict)
    if citation:
        try:
            c = json.loads(citation) if isinstance(citation, str) else citation
            if isinstance(c, dict):
                article = c.get("article") or c.get("source_article")
                # Try to get regulation from source field
                source = c.get("source", "")
                if source:
                    reg_code = _normalize_regulation(source)
        except (json.JSONDecodeError, TypeError):
            pass

    # Try metadata
    if metadata and not reg_code:
        try:
            m = json.loads(metadata) if isinstance(metadata, str) else metadata
            if isinstance(m, dict):
                src_reg = m.get("source_regulation", "")
                if src_reg:
                    reg_code = _normalize_regulation(src_reg)
                if not article:
                    article = m.get("source_article")
        except (json.JSONDecodeError, TypeError):
            pass

    return reg_code, article
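The citation-then-metadata fallback above can be exercised standalone by stubbing the normalizer (`normalize_stub` below is an invented stand-in for `services.obligation_extractor._normalize_regulation`, and the sketch drops the error handling for brevity):

```python
import json


def normalize_stub(s):
    # Stand-in for _normalize_regulation: lowercase, underscore-separated code
    return s.strip().lower().replace(" ", "_") or None


def extract(citation, metadata):
    """Same two-source fallback as _extract_regulation_article (no try/except)."""
    reg, art = None, None
    if citation:
        c = json.loads(citation) if isinstance(citation, str) else citation
        if isinstance(c, dict):
            art = c.get("article") or c.get("source_article")
            if c.get("source"):
                reg = normalize_stub(c["source"])
    if metadata and not reg:
        m = json.loads(metadata) if isinstance(metadata, str) else metadata
        if isinstance(m, dict):
            if m.get("source_regulation"):
                reg = normalize_stub(m["source_regulation"])
            art = art or m.get("source_article")
    return reg, art


# Citation lacks a source field, so the regulation comes from metadata
pair = extract('{"article": "Art. 30"}', '{"source_regulation": "EU 2016 679"}')
# pair == ("eu_2016_679", "Art. 30")
```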

213
control-pipeline/services/rag_client.py
Normal file
@@ -0,0 +1,213 @@
"""
Compliance RAG Client — Proxy to Go SDK RAG Search.

Lightweight HTTP client that queries the Go AI Compliance SDK's
POST /sdk/v1/rag/search endpoint. This avoids needing embedding
models or direct Qdrant access in Python.

Error-tolerant: RAG failures never break the calling function.
"""

import logging
import os
from dataclasses import dataclass
from typing import List, Optional

import httpx

logger = logging.getLogger(__name__)

SDK_URL = os.getenv("SDK_URL", "http://ai-compliance-sdk:8090")
RAG_SEARCH_TIMEOUT = 15.0  # seconds


@dataclass
class RAGSearchResult:
    """A single search result from the compliance corpus."""
    text: str
    regulation_code: str
    regulation_name: str
    regulation_short: str
    category: str
    article: str
    paragraph: str
    source_url: str
    score: float
    collection: str = ""


class ComplianceRAGClient:
    """
    RAG client that proxies search requests to the Go SDK.

    Usage:
        client = get_rag_client()
        results = await client.search("DSGVO Art. 35", collection="bp_compliance_recht")
        context_str = client.format_for_prompt(results)
    """

    def __init__(self, base_url: str = SDK_URL):
        self._search_url = f"{base_url}/sdk/v1/rag/search"

    async def search(
        self,
        query: str,
        collection: str = "bp_compliance_ce",
        regulations: Optional[List[str]] = None,
        top_k: int = 5,
    ) -> List[RAGSearchResult]:
        """
        Search the RAG corpus via the Go SDK.

        Returns an empty list on any error (never raises).
        """
        payload = {
            "query": query,
            "collection": collection,
            "top_k": top_k,
        }
        if regulations:
            payload["regulations"] = regulations

        try:
            async with httpx.AsyncClient(timeout=RAG_SEARCH_TIMEOUT) as client:
                resp = await client.post(self._search_url, json=payload)

                if resp.status_code != 200:
                    logger.warning(
                        "RAG search returned %d: %s", resp.status_code, resp.text[:200]
                    )
                    return []

                data = resp.json()
                results = []
                for r in data.get("results", []):
                    results.append(RAGSearchResult(
                        text=r.get("text", ""),
                        regulation_code=r.get("regulation_code", ""),
                        regulation_name=r.get("regulation_name", ""),
                        regulation_short=r.get("regulation_short", ""),
                        category=r.get("category", ""),
                        article=r.get("article", ""),
                        paragraph=r.get("paragraph", ""),
                        source_url=r.get("source_url", ""),
                        score=r.get("score", 0.0),
                        collection=collection,
                    ))
                return results

        except Exception as e:
            logger.warning("RAG search failed: %s", e)
            return []

    async def search_with_rerank(
        self,
        query: str,
        collection: str = "bp_compliance_ce",
        regulations: Optional[List[str]] = None,
        top_k: int = 5,
    ) -> List[RAGSearchResult]:
        """
        Search with optional cross-encoder re-ranking.

        Fetches top_k*4 results from RAG, then re-ranks with the cross-encoder
        and returns top_k. Falls back to regular search if the reranker is disabled.
        """
        from .reranker import get_reranker

        reranker = get_reranker()
        if reranker is None:
            return await self.search(query, collection, regulations, top_k)

        # Fetch more candidates for re-ranking
        candidates = await self.search(
            query, collection, regulations, top_k=max(top_k * 4, 20)
        )
        if not candidates:
            return []

        texts = [c.text for c in candidates]
        try:
            ranked_indices = reranker.rerank(query, texts, top_k=top_k)
            return [candidates[i] for i in ranked_indices]
        except Exception as e:
            logger.warning("Reranking failed, returning unranked: %s", e)
            return candidates[:top_k]

    async def scroll(
        self,
        collection: str,
        offset: Optional[str] = None,
        limit: int = 100,
    ) -> tuple[List[RAGSearchResult], Optional[str]]:
        """
        Scroll through ALL chunks in a collection (paginated).

        Returns (chunks, next_offset). next_offset is None when done.
        """
        scroll_url = self._search_url.replace("/search", "/scroll")
        params = {"collection": collection, "limit": str(limit)}
        if offset:
            params["offset"] = offset

        try:
            async with httpx.AsyncClient(timeout=30.0) as client:
                resp = await client.get(scroll_url, params=params)

                if resp.status_code != 200:
                    logger.warning(
                        "RAG scroll returned %d: %s", resp.status_code, resp.text[:200]
                    )
                    return [], None

                data = resp.json()
                results = []
                for r in data.get("chunks", []):
                    results.append(RAGSearchResult(
                        text=r.get("text", ""),
                        regulation_code=r.get("regulation_code", ""),
                        regulation_name=r.get("regulation_name", ""),
                        regulation_short=r.get("regulation_short", ""),
                        category=r.get("category", ""),
                        article=r.get("article", ""),
                        paragraph=r.get("paragraph", ""),
                        source_url=r.get("source_url", ""),
                        score=0.0,
                        collection=collection,
                    ))
                next_offset = data.get("next_offset") or None
                return results, next_offset

        except Exception as e:
            logger.warning("RAG scroll failed: %s", e)
            return [], None

    def format_for_prompt(
        self, results: List[RAGSearchResult], max_results: int = 5
    ) -> str:
        """Format search results as Markdown for inclusion in an LLM prompt."""
        if not results:
            return ""

        lines = ["## Relevanter Rechtskontext\n"]
        for i, r in enumerate(results[:max_results]):
            header = f"{i + 1}. **{r.regulation_short}** ({r.regulation_code})"
            if r.article:
                header += f" — {r.article}"
            lines.append(header)
            text = r.text[:400] + "..." if len(r.text) > 400 else r.text
            lines.append(f"   > {text}\n")

        return "\n".join(lines)


# Singleton
_rag_client: Optional[ComplianceRAGClient] = None


def get_rag_client() -> ComplianceRAGClient:
    """Get the shared RAG client instance."""
    global _rag_client
    if _rag_client is None:
        _rag_client = ComplianceRAGClient()
    return _rag_client
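`scroll` returns `(chunks, next_offset)`, so a caller drains a collection by looping until the offset comes back `None`. A minimal driver, exercised here against a stubbed two-page fetcher (the page contents are invented for illustration):

```python
import asyncio


async def scroll_all(fetch_page):
    """Drain a paginated scroll API: fetch_page(offset) -> (items, next_offset)."""
    items, offset = [], None
    while True:
        page, offset = await fetch_page(offset)
        items.extend(page)
        if offset is None:  # server signals end of collection
            break
    return items


# Stub standing in for ComplianceRAGClient.scroll bound to one collection
async def fake_scroll(offset):
    pages = {None: (["a", "b"], "p2"), "p2": (["c"], None)}
    return pages[offset]


chunks = asyncio.run(scroll_all(fake_scroll))
# chunks == ["a", "b", "c"]
```

One caveat when wiring this to the real client: `scroll` also returns `([], None)` on errors, so an interrupted scroll is indistinguishable from a completed one without extra logging.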

85
control-pipeline/services/reranker.py
Normal file
@@ -0,0 +1,85 @@
"""
Cross-Encoder Re-Ranking for RAG Search Results.

Uses BGE Reranker v2 (BAAI/bge-reranker-v2-m3, MIT license) to re-rank
search results from Qdrant for improved retrieval quality.

Lazy-loads the model on first use. Disabled by default (RERANK_ENABLED=false).
"""

import logging
import os
from typing import Optional

logger = logging.getLogger(__name__)

RERANK_ENABLED = os.getenv("RERANK_ENABLED", "false").lower() == "true"
RERANK_MODEL = os.getenv("RERANK_MODEL", "BAAI/bge-reranker-v2-m3")


class Reranker:
    """Cross-encoder reranker using sentence-transformers."""

    def __init__(self, model_name: str = RERANK_MODEL):
        self._model = None  # Lazy init
        self._model_name = model_name

    def _ensure_model(self) -> None:
        """Load model on first use."""
        if self._model is not None:
            return
        try:
            from sentence_transformers import CrossEncoder

            logger.info("Loading reranker model: %s", self._model_name)
            self._model = CrossEncoder(self._model_name)
            logger.info("Reranker model loaded successfully")
        except ImportError:
            logger.error(
                "sentence-transformers not installed. "
                "Install with: pip install sentence-transformers"
            )
            raise
        except Exception as e:
            logger.error("Failed to load reranker model: %s", e)
            raise

    def rerank(
        self, query: str, texts: list[str], top_k: int = 5
    ) -> list[int]:
        """
        Return indices of top_k texts sorted by relevance (highest first).

        Args:
            query: The search query.
            texts: List of candidate texts to re-rank.
            top_k: Number of top results to return.

        Returns:
            List of indices into the original texts list, sorted by relevance.
        """
        if not texts:
            return []

        self._ensure_model()

        pairs = [[query, text] for text in texts]
        scores = self._model.predict(pairs)

        # Sort by score descending, return indices
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return ranked[:top_k]


# Module-level singleton
_reranker: Optional[Reranker] = None


def get_reranker() -> Optional[Reranker]:
    """Get the shared reranker instance. Returns None if disabled."""
    global _reranker
    if not RERANK_ENABLED:
        return None
    if _reranker is None:
        _reranker = Reranker()
    return _reranker
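The core of `rerank` is the score-to-index sort; it can be checked without loading any model by substituting cross-encoder scores with fixed numbers (the scores below are invented):

```python
def rank_indices(scores, top_k):
    """Indices of the top_k scores, highest first (the same sort rerank uses)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]


# Stubbed cross-encoder scores for four candidate texts
scores = [0.12, 0.87, 0.45, 0.91]
top = rank_indices(scores, top_k=2)
# top == [3, 1]
```

Returning indices rather than texts is what lets `search_with_rerank` map the ranking back onto full `RAGSearchResult` objects with `[candidates[i] for i in ranked_indices]`.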

223
control-pipeline/services/similarity_detector.py
Normal file
@@ -0,0 +1,223 @@
"""
Too-Close Similarity Detector — checks whether a candidate text is too similar
to a protected source text (copyright / license compliance).

Five metrics:
1. Exact-phrase — longest identical token sequence
2. Token overlap — Jaccard similarity of token sets
3. 3-gram Jaccard — Jaccard similarity of character 3-grams
4. Embedding cosine — via bge-m3 (Ollama or embedding-service)
5. LCS ratio — Longest Common Subsequence / max(len_a, len_b)

Decision:
PASS — no fail + max 1 warn
WARN — max 2 warn, no fail → human review
FAIL — any fail threshold → block, rewrite required
"""

from __future__ import annotations

import logging
import re
from dataclasses import dataclass
from typing import Optional

import httpx

logger = logging.getLogger(__name__)

# ---------------------------------------------------------------------------
# Thresholds
# ---------------------------------------------------------------------------

THRESHOLDS = {
    "max_exact_run": {"warn": 8, "fail": 12},
    "token_overlap": {"warn": 0.20, "fail": 0.30},
    "ngram_jaccard": {"warn": 0.10, "fail": 0.18},
    "embedding_cosine": {"warn": 0.86, "fail": 0.92},
    "lcs_ratio": {"warn": 0.35, "fail": 0.50},
}
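The PASS/WARN/FAIL rule from the module docstring can be sketched as a pure function over per-metric values against these thresholds. This is illustrative only: the detector's actual decision function is outside this excerpt, and the docstring leaves the exact cap on warns slightly open, so the reading below (any fail blocks; otherwise two or more warns escalate to review) is an assumption:

```python
def decide(metrics, thresholds):
    """metrics: name -> value; thresholds: name -> {'warn': x, 'fail': y}."""
    warns = fails = 0
    for name, value in metrics.items():
        t = thresholds[name]
        if value >= t["fail"]:
            fails += 1
        elif value >= t["warn"]:
            warns += 1
    if fails > 0:
        return "FAIL"  # any fail threshold: block, rewrite required
    if warns >= 2:
        return "WARN"  # several warns: route to human review
    return "PASS"      # no fail, at most one warn


# Subset of the module's THRESHOLDS, for a self-contained demo
demo_thresholds = {
    "token_overlap": {"warn": 0.20, "fail": 0.30},
    "ngram_jaccard": {"warn": 0.10, "fail": 0.18},
}
verdict = decide({"token_overlap": 0.25, "ngram_jaccard": 0.05}, demo_thresholds)
# verdict == "PASS" (one warn, no fail)
```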
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Tokenisation helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
_WORD_RE = re.compile(r"\w+", re.UNICODE)
|
||||
|
||||
|
||||
def _tokenize(text: str) -> list[str]:
|
||||
return [t.lower() for t in _WORD_RE.findall(text)]
|
||||
|
||||
|
||||
def _char_ngrams(text: str, n: int = 3) -> set[str]:
|
||||
text = text.lower()
|
||||
return {text[i : i + n] for i in range(len(text) - n + 1)} if len(text) >= n else set()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Metric implementations
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def max_exact_run(tokens_a: list[str], tokens_b: list[str]) -> int:
    """Longest contiguous identical token sequence between a and b."""
    if not tokens_a or not tokens_b:
        return 0

    best = 0
    set_b = set(tokens_b)

    for i in range(len(tokens_a)):
        if tokens_a[i] not in set_b:
            continue
        for j in range(len(tokens_b)):
            if tokens_a[i] != tokens_b[j]:
                continue
            run = 0
            ii, jj = i, j
            while ii < len(tokens_a) and jj < len(tokens_b) and tokens_a[ii] == tokens_b[jj]:
                run += 1
                ii += 1
                jj += 1
            if run > best:
                best = run
    return best


def token_overlap_jaccard(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Jaccard similarity of token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def ngram_jaccard(text_a: str, text_b: str, n: int = 3) -> float:
    """Jaccard similarity of character n-grams."""
    grams_a = _char_ngrams(text_a, n)
    grams_b = _char_ngrams(text_b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def lcs_ratio(tokens_a: list[str], tokens_b: list[str]) -> float:
    """LCS length / max(len_a, len_b)."""
    m, n = len(tokens_a), len(tokens_b)
    if m == 0 or n == 0:
        return 0.0

    # Space-optimised LCS (two rows)
    prev = [0] * (n + 1)
    curr = [0] * (n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if tokens_a[i - 1] == tokens_b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, [0] * (n + 1)

    return prev[n] / max(m, n)


async def embedding_cosine(text_a: str, text_b: str, embedding_url: str | None = None) -> float:
    """Cosine similarity via embedding service (bge-m3).

    Falls back to 0.0 if the service is unreachable.
    """
    url = embedding_url or "http://embedding-service:8087"
    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.post(
                f"{url}/embed",
                json={"texts": [text_a, text_b]},
            )
            resp.raise_for_status()
            embeddings = resp.json().get("embeddings", [])
            if len(embeddings) < 2:
                return 0.0
            return _cosine(embeddings[0], embeddings[1])
    except Exception:
        logger.warning("Embedding service unreachable, skipping cosine check")
        return 0.0


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# ---------------------------------------------------------------------------
# Decision engine
# ---------------------------------------------------------------------------

@dataclass
class SimilarityReport:
    max_exact_run: int
    token_overlap: float
    ngram_jaccard: float
    embedding_cosine: float
    lcs_ratio: float
    status: str  # PASS, WARN, FAIL
    details: dict  # per-metric status


def _classify(value: float | int, metric: str) -> str:
    t = THRESHOLDS[metric]
    if value >= t["fail"]:
        return "FAIL"
    if value >= t["warn"]:
        return "WARN"
    return "PASS"


async def check_similarity(
    source_text: str,
    candidate_text: str,
    embedding_url: str | None = None,
) -> SimilarityReport:
    """Run all 5 metrics and return an aggregate report."""
    tok_src = _tokenize(source_text)
    tok_cand = _tokenize(candidate_text)

    m_exact = max_exact_run(tok_src, tok_cand)
    m_token = token_overlap_jaccard(tok_src, tok_cand)
    m_ngram = ngram_jaccard(source_text, candidate_text)
    m_embed = await embedding_cosine(source_text, candidate_text, embedding_url)
    m_lcs = lcs_ratio(tok_src, tok_cand)

    details = {
        "max_exact_run": _classify(m_exact, "max_exact_run"),
        "token_overlap": _classify(m_token, "token_overlap"),
        "ngram_jaccard": _classify(m_ngram, "ngram_jaccard"),
        "embedding_cosine": _classify(m_embed, "embedding_cosine"),
        "lcs_ratio": _classify(m_lcs, "lcs_ratio"),
    }

    fail_count = sum(1 for v in details.values() if v == "FAIL")
    warn_count = sum(1 for v in details.values() if v == "WARN")

    if fail_count > 0 or warn_count > 2:
        status = "FAIL"
    elif warn_count > 1:
        status = "WARN"
    else:
        status = "PASS"

    return SimilarityReport(
        max_exact_run=m_exact,
        token_overlap=round(m_token, 4),
        ngram_jaccard=round(m_ngram, 4),
        embedding_cosine=round(m_embed, 4),
        lcs_ratio=round(m_lcs, 4),
        status=status,
        details=details,
    )
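The threshold-and-classify design above can be exercised in isolation. The sketch below re-implements the token-overlap metric and the classifier on a toy pair of sentences; values and thresholds mirror the module, but the snippet is an illustrative standalone, not the service code.

```python
import re

# Thresholds copied from the module's THRESHOLDS table (token_overlap only).
THRESHOLDS = {"token_overlap": {"warn": 0.20, "fail": 0.30}}

def tokenize(text):
    return [t.lower() for t in re.findall(r"\w+", text)]

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def classify(value, metric):
    t = THRESHOLDS[metric]
    if value >= t["fail"]:
        return "FAIL"
    if value >= t["warn"]:
        return "WARN"
    return "PASS"

src = "Controllers shall rotate encryption keys annually"
cand = "Encryption keys must be rotated every year by controllers"
# Shared tokens: controllers, encryption, keys → 3 of 12 unique → 0.25
overlap = jaccard(tokenize(src), tokenize(cand))
print(overlap, classify(overlap, "token_overlap"))  # 0.25 → WARN
```

A 0.25 overlap sits between the warn (0.20) and fail (0.30) bounds, so this single metric would flag WARN; in the full report one WARN alone still yields an overall PASS.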
control-pipeline/services/v1_enrichment.py (new file, 331 lines)
@@ -0,0 +1,331 @@
"""V1 Control Enrichment Service — Match Eigenentwicklung controls to regulations.
|
||||
|
||||
Finds regulatory coverage for v1 controls (generation_strategy='ungrouped',
|
||||
pipeline_version=1, no source_citation) by embedding similarity search.
|
||||
|
||||
Reuses embedding + Qdrant helpers from control_dedup.py.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Optional
|
||||
|
||||
from sqlalchemy import text
|
||||
|
||||
from db.session import SessionLocal
|
||||
from services.control_dedup import (
|
||||
get_embedding,
|
||||
qdrant_search_cross_regulation,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Similarity threshold — lower than dedup (0.85) since we want informational matches
|
||||
# Typical top scores for v1 controls are 0.70-0.77
|
||||
V1_MATCH_THRESHOLD = 0.70
|
||||
V1_MAX_MATCHES = 5
|
||||
|
||||
|
||||
def _is_eigenentwicklung_query() -> str:
    """SQL WHERE clause identifying v1 Eigenentwicklung controls."""
    return """
        generation_strategy = 'ungrouped'
        AND (pipeline_version = '1' OR pipeline_version IS NULL)
        AND source_citation IS NULL
        AND parent_control_uuid IS NULL
        AND release_state NOT IN ('rejected', 'merged', 'deprecated')
    """


async def count_v1_controls() -> int:
    """Count how many v1 Eigenentwicklung controls exist."""
    with SessionLocal() as db:
        row = db.execute(text(f"""
            SELECT COUNT(*) AS cnt
            FROM canonical_controls
            WHERE {_is_eigenentwicklung_query()}
        """)).fetchone()
        return row.cnt if row else 0


async def enrich_v1_matches(
    dry_run: bool = True,
    batch_size: int = 100,
    offset: int = 0,
) -> dict:
    """Find regulatory matches for v1 Eigenentwicklung controls.

    Args:
        dry_run: If True, only count — don't write matches.
        batch_size: Number of v1 controls to process per call.
        offset: Pagination offset (v1 control index).

    Returns:
        Stats dict with counts, sample matches, and pagination info.
    """
    with SessionLocal() as db:
        # 1. Load v1 controls (paginated)
        v1_controls = db.execute(text(f"""
            SELECT id, control_id, title, objective, category
            FROM canonical_controls
            WHERE {_is_eigenentwicklung_query()}
            ORDER BY control_id
            LIMIT :limit OFFSET :offset
        """), {"limit": batch_size, "offset": offset}).fetchall()

        # Count total for pagination
        total_row = db.execute(text(f"""
            SELECT COUNT(*) AS cnt
            FROM canonical_controls
            WHERE {_is_eigenentwicklung_query()}
        """)).fetchone()
        total_v1 = total_row.cnt if total_row else 0

        if not v1_controls:
            return {
                "dry_run": dry_run,
                "processed": 0,
                "total_v1": total_v1,
                "message": "No further batch — all v1 controls processed.",
            }

        if dry_run:
            return {
                "dry_run": True,
                "total_v1": total_v1,
                "offset": offset,
                "batch_size": batch_size,
                "sample_controls": [
                    {
                        "control_id": r.control_id,
                        "title": r.title,
                        "category": r.category,
                    }
                    for r in v1_controls[:20]
                ],
            }

        # 2. Process each v1 control
        processed = 0
        matches_inserted = 0
        errors = []
        sample_matches = []

        for v1 in v1_controls:
            try:
                # Build search text
                search_text = f"{v1.title} — {v1.objective}"

                # Get embedding
                embedding = await get_embedding(search_text)
                if not embedding:
                    errors.append({
                        "control_id": v1.control_id,
                        "error": "Embedding failed",
                    })
                    continue

                # Search Qdrant (cross-regulation, no pattern filter).
                # Collection is atomic_controls_dedup (contains ~51k atomic controls).
                results = await qdrant_search_cross_regulation(
                    embedding, top_k=20,
                    collection="atomic_controls_dedup",
                )

                # For each hit: resolve to a regulatory parent with source_citation.
                # Atomic controls in Qdrant usually have parent_control_uuid → parent
                # has the source_citation. We deduplicate by parent to avoid
                # listing the same regulation multiple times.
                rank = 0
                seen_parents: set[str] = set()

                for hit in results:
                    score = hit.get("score", 0)
                    if score < V1_MATCH_THRESHOLD:
                        continue

                    payload = hit.get("payload", {})
                    matched_uuid = payload.get("control_uuid")
                    if not matched_uuid or matched_uuid == str(v1.id):
                        continue

                    # Try the matched control itself first, then its parent
                    matched_row = db.execute(text("""
                        SELECT c.id, c.control_id, c.title, c.source_citation,
                               c.severity, c.category, c.parent_control_uuid
                        FROM canonical_controls c
                        WHERE c.id = CAST(:uuid AS uuid)
                    """), {"uuid": matched_uuid}).fetchone()

                    if not matched_row:
                        continue

                    # Resolve to regulatory control (one with source_citation)
                    reg_row = matched_row
                    if not reg_row.source_citation and reg_row.parent_control_uuid:
                        # Look up parent — the parent has the source_citation
                        parent_row = db.execute(text("""
                            SELECT id, control_id, title, source_citation,
                                   severity, category, parent_control_uuid
                            FROM canonical_controls
                            WHERE id = CAST(:uuid AS uuid)
                              AND source_citation IS NOT NULL
                        """), {"uuid": str(reg_row.parent_control_uuid)}).fetchone()
                        if parent_row:
                            reg_row = parent_row

                    if not reg_row.source_citation:
                        continue

                    # Deduplicate by parent UUID
                    parent_key = str(reg_row.id)
                    if parent_key in seen_parents:
                        continue
                    seen_parents.add(parent_key)

                    rank += 1
                    if rank > V1_MAX_MATCHES:
                        break

                    # Extract source info
                    source_citation = reg_row.source_citation or {}
                    matched_source = source_citation.get("source") if isinstance(source_citation, dict) else None
                    matched_article = source_citation.get("article") if isinstance(source_citation, dict) else None

                    # Insert match — link to the regulatory parent (not the atomic child)
                    db.execute(text("""
                        INSERT INTO v1_control_matches
                            (v1_control_uuid, matched_control_uuid, similarity_score,
                             match_rank, matched_source, matched_article, match_method)
                        VALUES
                            (CAST(:v1_uuid AS uuid), CAST(:matched_uuid AS uuid), :score,
                             :rank, :source, :article, 'embedding')
                        ON CONFLICT (v1_control_uuid, matched_control_uuid) DO UPDATE
                        SET similarity_score = EXCLUDED.similarity_score,
                            match_rank = EXCLUDED.match_rank
                    """), {
                        "v1_uuid": str(v1.id),
                        "matched_uuid": str(reg_row.id),
                        "score": round(score, 3),
                        "rank": rank,
                        "source": matched_source,
                        "article": matched_article,
                    })
                    matches_inserted += 1

                    # Collect sample
                    if len(sample_matches) < 20:
                        sample_matches.append({
                            "v1_control_id": v1.control_id,
                            "v1_title": v1.title,
                            "matched_control_id": reg_row.control_id,
                            "matched_title": reg_row.title,
                            "matched_source": matched_source,
                            "matched_article": matched_article,
                            "similarity_score": round(score, 3),
                            "match_rank": rank,
                        })

                processed += 1

            except Exception as e:
                logger.warning("V1 enrichment error for %s: %s", v1.control_id, e)
                errors.append({
                    "control_id": v1.control_id,
                    "error": str(e),
                })

        db.commit()

        # Pagination
        next_offset = offset + batch_size if len(v1_controls) == batch_size else None

        return {
            "dry_run": False,
            "offset": offset,
            "batch_size": batch_size,
            "next_offset": next_offset,
            "total_v1": total_v1,
            "processed": processed,
            "matches_inserted": matches_inserted,
            "errors": errors[:10],
            "sample_matches": sample_matches,
        }


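The hit-filtering rules in the inner loop (score threshold, dedup by resolved parent, rank cap) can be isolated from the database plumbing. The sketch below applies the same three rules to mock Qdrant hits; `select_matches` and the hit dicts are illustrative, not part of the service.

```python
# Constants mirror the module's values.
V1_MATCH_THRESHOLD = 0.70
V1_MAX_MATCHES = 5

def select_matches(hits, threshold=V1_MATCH_THRESHOLD, max_matches=V1_MAX_MATCHES):
    """Keep hits above threshold, dedupe by parent, cap at max_matches."""
    seen, out = set(), []
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < threshold:
            continue          # below informational-match threshold
        if hit["parent"] in seen:
            continue          # same regulatory parent already listed
        seen.add(hit["parent"])
        out.append(hit)
        if len(out) == max_matches:
            break
    return out

hits = [
    {"parent": "dsgvo-art-32", "score": 0.77},
    {"parent": "dsgvo-art-32", "score": 0.74},  # duplicate parent — skipped
    {"parent": "nis2-art-21", "score": 0.72},
    {"parent": "iso-27001-a5", "score": 0.65},  # below threshold — skipped
]
print([h["parent"] for h in select_matches(hits)])
```

With the sample data, only one hit per regulatory parent survives and the sub-threshold hit is dropped, leaving `dsgvo-art-32` and `nis2-art-21`.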
async def get_v1_matches(control_uuid: str) -> list[dict]:
    """Get all regulatory matches for a specific v1 control.

    Args:
        control_uuid: The UUID of the v1 control.

    Returns:
        List of match dicts with control details.
    """
    with SessionLocal() as db:
        rows = db.execute(text("""
            SELECT
                m.similarity_score,
                m.match_rank,
                m.matched_source,
                m.matched_article,
                m.match_method,
                c.control_id AS matched_control_id,
                c.title AS matched_title,
                c.objective AS matched_objective,
                c.severity AS matched_severity,
                c.category AS matched_category,
                c.source_citation AS matched_source_citation
            FROM v1_control_matches m
            JOIN canonical_controls c ON c.id = m.matched_control_uuid
            WHERE m.v1_control_uuid = CAST(:uuid AS uuid)
            ORDER BY m.match_rank
        """), {"uuid": control_uuid}).fetchall()

        return [
            {
                "matched_control_id": r.matched_control_id,
                "matched_title": r.matched_title,
                "matched_objective": r.matched_objective,
                "matched_severity": r.matched_severity,
                "matched_category": r.matched_category,
                "matched_source": r.matched_source,
                "matched_article": r.matched_article,
                "matched_source_citation": r.matched_source_citation,
                "similarity_score": float(r.similarity_score),
                "match_rank": r.match_rank,
                "match_method": r.match_method,
            }
            for r in rows
        ]


async def get_v1_enrichment_stats() -> dict:
    """Get overview stats for v1 enrichment."""
    with SessionLocal() as db:
        total_v1 = db.execute(text(f"""
            SELECT COUNT(*) AS cnt FROM canonical_controls
            WHERE {_is_eigenentwicklung_query()}
        """)).fetchone()

        matched_v1 = db.execute(text(f"""
            SELECT COUNT(DISTINCT m.v1_control_uuid) AS cnt
            FROM v1_control_matches m
            JOIN canonical_controls c ON c.id = m.v1_control_uuid
            WHERE {_is_eigenentwicklung_query().replace('release_state', 'c.release_state').replace('generation_strategy', 'c.generation_strategy').replace('pipeline_version', 'c.pipeline_version').replace('source_citation', 'c.source_citation').replace('parent_control_uuid', 'c.parent_control_uuid')}
        """)).fetchone()

        total_matches = db.execute(text("""
            SELECT COUNT(*) AS cnt FROM v1_control_matches
        """)).fetchone()

        avg_score = db.execute(text("""
            SELECT AVG(similarity_score) AS avg_score FROM v1_control_matches
        """)).fetchone()

        return {
            "total_v1_controls": total_v1.cnt if total_v1 else 0,
            "v1_with_matches": matched_v1.cnt if matched_v1 else 0,
            "v1_without_matches": (total_v1.cnt if total_v1 else 0) - (matched_v1.cnt if matched_v1 else 0),
            "total_matches": total_matches.cnt if total_matches else 0,
            "avg_similarity_score": round(float(avg_score.avg_score), 3) if avg_score and avg_score.avg_score else None,
        }
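The chained `.replace()` aliasing in `get_v1_enrichment_stats` is fragile: it rewrites column names textually and breaks silently if a column is renamed. A parameterized variant of the WHERE helper (a sketch under the assumption that all call sites could be migrated, not the shipped code) avoids the string surgery:

```python
def eigenentwicklung_where(alias: str = "") -> str:
    """WHERE fragment for v1 Eigenentwicklung controls, optionally
    prefixing every column with a table alias (hypothetical helper)."""
    p = f"{alias}." if alias else ""
    return f"""
        {p}generation_strategy = 'ungrouped'
        AND ({p}pipeline_version = '1' OR {p}pipeline_version IS NULL)
        AND {p}source_citation IS NULL
        AND {p}parent_control_uuid IS NULL
        AND {p}release_state NOT IN ('rejected', 'merged', 'deprecated')
    """

# Aliased form for the JOIN query; bare form elsewhere.
print("c.release_state" in eigenentwicklung_where("c"))  # True
```

The bare call (`eigenentwicklung_where()`) produces the same fragment as the current `_is_eigenentwicklung_query()`, so the two could coexist during a migration.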
control-pipeline/tests/__init__.py (new file, empty)

control-pipeline/tests/test_applicability_engine.py (new file, 229 lines)
@@ -0,0 +1,229 @@
"""
|
||||
Tests for the Applicability Engine (Phase C2).
|
||||
|
||||
Tests the deterministic filtering logic for industry, company size,
|
||||
and scope signals without requiring a database connection.
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
from services.applicability_engine import (
|
||||
_matches_company_size,
|
||||
_matches_industry,
|
||||
_matches_scope_signals,
|
||||
_parse_json_text,
|
||||
)
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# _parse_json_text
|
||||
# =============================================================================
|
||||
|
||||
|
||||
class TestParseJsonText:
    def test_none_returns_none(self):
        assert _parse_json_text(None) is None

    def test_valid_json_list(self):
        assert _parse_json_text('["all"]') == ["all"]

    def test_valid_json_list_multiple(self):
        result = _parse_json_text('["Telekommunikation", "Energie"]')
        assert result == ["Telekommunikation", "Energie"]

    def test_valid_json_dict(self):
        result = _parse_json_text('{"requires_any": ["uses_ai"]}')
        assert result == {"requires_any": ["uses_ai"]}

    def test_invalid_json_returns_none(self):
        assert _parse_json_text("not json") is None

    def test_empty_string_returns_none(self):
        assert _parse_json_text("") is None

    def test_already_list_passthrough(self):
        val = ["all"]
        assert _parse_json_text(val) == ["all"]

    def test_already_dict_passthrough(self):
        val = {"requires_any": ["uses_ai"]}
        assert _parse_json_text(val) == val

    def test_integer_returns_none(self):
        assert _parse_json_text(42) is None


# =============================================================================
# _matches_industry
# =============================================================================

|
||||
def test_null_matches_any_industry(self):
|
||||
assert _matches_industry(None, "Telekommunikation") is True
|
||||
|
||||
def test_all_matches_any_industry(self):
|
||||
assert _matches_industry('["all"]', "Telekommunikation") is True
|
||||
assert _matches_industry('["all"]', "Energie") is True
|
||||
|
||||
def test_specific_industry_matches(self):
|
||||
assert _matches_industry(
|
||||
'["Telekommunikation", "Energie"]', "Telekommunikation"
|
||||
) is True
|
||||
|
||||
def test_specific_industry_no_match(self):
|
||||
assert _matches_industry(
|
||||
'["Telekommunikation", "Energie"]', "Gesundheitswesen"
|
||||
) is False
|
||||
|
||||
def test_malformed_json_matches(self):
|
||||
"""Malformed data should be treated as 'applies to everyone'."""
|
||||
assert _matches_industry("not json", "anything") is True
|
||||
|
||||
def test_all_with_other_industries(self):
|
||||
assert _matches_industry(
|
||||
'["all", "Telekommunikation"]', "Gesundheitswesen"
|
||||
) is True
|
||||
|
||||
|
||||
# =============================================================================
|
||||
# _matches_company_size
|
||||
# =============================================================================
|
||||
|
||||
|
||||
class TestMatchesCompanySize:
    def test_null_matches_any_size(self):
        assert _matches_company_size(None, "medium") is True

    def test_all_matches_any_size(self):
        assert _matches_company_size('["all"]', "micro") is True
        assert _matches_company_size('["all"]', "enterprise") is True

    def test_specific_size_matches(self):
        assert _matches_company_size(
            '["medium", "large", "enterprise"]', "large"
        ) is True

    def test_specific_size_no_match(self):
        assert _matches_company_size(
            '["medium", "large", "enterprise"]', "small"
        ) is False

    def test_micro_excluded_from_nis2(self):
        """NIS2 typically requires medium+."""
        assert _matches_company_size(
            '["medium", "large", "enterprise"]', "micro"
        ) is False

    def test_malformed_json_matches(self):
        assert _matches_company_size("broken", "medium") is True


# =============================================================================
# _matches_scope_signals
# =============================================================================

class TestMatchesScopeSignals:
    def test_null_conditions_always_match(self):
        assert _matches_scope_signals(None, ["uses_ai"]) is True
        assert _matches_scope_signals(None, []) is True

    def test_empty_requires_any_matches(self):
        assert _matches_scope_signals('{"requires_any": []}', ["uses_ai"]) is True

    def test_no_requires_any_key_matches(self):
        assert _matches_scope_signals(
            '{"description": "some text"}', ["uses_ai"]
        ) is True

    def test_requires_any_with_matching_signal(self):
        conditions = '{"requires_any": ["uses_ai"], "description": "AI Act"}'
        assert _matches_scope_signals(conditions, ["uses_ai"]) is True

    def test_requires_any_with_no_matching_signal(self):
        conditions = '{"requires_any": ["uses_ai"], "description": "AI Act"}'
        assert _matches_scope_signals(
            conditions, ["third_country_transfer"]
        ) is False

    def test_requires_any_with_one_of_multiple_matching(self):
        conditions = '{"requires_any": ["uses_ai", "processes_health_data"]}'
        assert _matches_scope_signals(
            conditions, ["processes_health_data", "financial_data"]
        ) is True

    def test_requires_any_with_no_signals_provided(self):
        conditions = '{"requires_any": ["uses_ai"]}'
        assert _matches_scope_signals(conditions, []) is False

    def test_malformed_json_matches(self):
        assert _matches_scope_signals("broken", ["uses_ai"]) is True

    def test_multiple_required_signals_any_match(self):
        """requires_any means at least ONE must match."""
        conditions = (
            '{"requires_any": ["uses_ai", "third_country_transfer", '
            '"processes_health_data"]}'
        )
        assert _matches_scope_signals(
            conditions, ["third_country_transfer"]
        ) is True

    def test_multiple_required_signals_none_match(self):
        conditions = (
            '{"requires_any": ["uses_ai", "third_country_transfer"]}'
        )
        assert _matches_scope_signals(
            conditions, ["financial_data", "employee_monitoring"]
        ) is False


# =============================================================================
# Integration-style: combined filtering scenarios
# =============================================================================

class TestCombinedFiltering:
    """Test typical real-world filtering scenarios."""

    def test_dsgvo_art5_applies_to_everyone(self):
        """DSGVO Art. 5 = all industries, all sizes, no scope conditions."""
        assert _matches_industry('["all"]', "Telekommunikation") is True
        assert _matches_company_size('["all"]', "micro") is True
        assert _matches_scope_signals(None, []) is True

    def test_nis2_art21_kritis_medium_plus(self):
        """NIS2 Art. 21 = KRITIS sectors, medium+."""
        industries = '["Energie", "Gesundheitswesen", "Digitale Infrastruktur", "Logistik / Transport"]'
        sizes = '["medium", "large", "enterprise"]'

        # Matches: Energie + large
        assert _matches_industry(industries, "Energie") is True
        assert _matches_company_size(sizes, "large") is True

        # No match: IT company
        assert _matches_industry(industries, "Technologie / IT") is False

        # No match: small company
        assert _matches_company_size(sizes, "small") is False

    def test_ai_act_scope_condition(self):
        """AI Act = all industries, all sizes, but only if uses_ai."""
        conditions = '{"requires_any": ["uses_ai"], "description": "Nur bei KI-Einsatz"}'

        # Company uses AI
        assert _matches_scope_signals(conditions, ["uses_ai"]) is True

        # Company does not use AI
        assert _matches_scope_signals(conditions, []) is False
        assert _matches_scope_signals(
            conditions, ["third_country_transfer"]
        ) is False

    def test_tkg_telekom_only(self):
        """TKG = only Telekommunikation, all sizes."""
        industries = '["Telekommunikation"]'

        assert _matches_industry(industries, "Telekommunikation") is True
        assert _matches_industry(industries, "Energie") is False
docker-compose.coolify.yml (new file, 234 lines)
@@ -0,0 +1,234 @@
# =========================================================
|
||||
# BreakPilot Core — Shared Infrastructure (Coolify)
|
||||
# =========================================================
|
||||
# Deployed via Coolify. SSL termination handled by Traefik.
|
||||
# External services (managed separately in Coolify):
|
||||
# - PostgreSQL (PostGIS), Qdrant, S3-compatible storage
|
||||
# Network: breakpilot-network (shared across all 3 repos)
|
||||
# =========================================================
|
||||
|
||||
networks:
|
||||
breakpilot-network:
|
||||
name: breakpilot-network
|
||||
driver: bridge
|
||||
|
||||
volumes:
|
||||
valkey_data:
|
||||
embedding_models:
|
||||
paddleocr_models:
|
||||
|
||||
services:
|
||||
|
||||
# =========================================================
|
||||
# CACHE
|
||||
# =========================================================
|
||||
valkey:
|
||||
image: valkey/valkey:8-alpine
|
||||
container_name: bp-core-valkey
|
||||
volumes:
|
||||
- valkey_data:/data
|
||||
command: valkey-server --appendonly yes --maxmemory 256mb --maxmemory-policy allkeys-lru
|
||||
healthcheck:
|
||||
test: ["CMD", "valkey-cli", "ping"]
|
||||
interval: 5s
|
||||
timeout: 3s
|
||||
retries: 5
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- breakpilot-network
|
||||
|
||||
# =========================================================
|
||||
# SHARED SERVICES
|
||||
# =========================================================
|
||||
consent-service:
|
||||
build:
|
||||
context: ./consent-service
|
||||
dockerfile: Dockerfile
|
||||
container_name: bp-core-consent-service
|
||||
expose:
|
||||
- "8081"
|
||||
environment:
|
||||
DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}
|
||||
JWT_SECRET: ${JWT_SECRET}
|
||||
JWT_REFRESH_SECRET: ${JWT_REFRESH_SECRET}
|
||||
PORT: 8081
|
||||
ENVIRONMENT: production
|
||||
ALLOWED_ORIGINS: "*"
|
||||
VALKEY_URL: redis://valkey:6379/0
|
||||
SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
|
||||
SMTP_HOST: ${SMTP_HOST}
|
||||
SMTP_PORT: ${SMTP_PORT:-587}
|
||||
SMTP_USERNAME: ${SMTP_USERNAME}
|
||||
SMTP_PASSWORD: ${SMTP_PASSWORD}
|
||||
SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
|
||||
SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.ai}
|
||||
FRONTEND_URL: ${FRONTEND_URL:-https://www.breakpilot.ai}
|
||||
depends_on:
|
||||
valkey:
|
||||
condition: service_healthy
|
||||
healthcheck:
|
||||
test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8081/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
start_period: 15s
|
||||
retries: 3
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- breakpilot-network
|
||||
|
||||
# =========================================================
|
||||
# RAG & EMBEDDING SERVICES
|
||||
# =========================================================
|
||||
rag-service:
|
||||
build:
|
||||
context: ./rag-service
|
||||
dockerfile: Dockerfile
|
||||
container_name: bp-core-rag-service
|
||||
expose:
|
||||
- "8097"
|
||||
environment:
|
||||
PORT: 8097
|
||||
QDRANT_URL: ${QDRANT_URL}
|
||||
QDRANT_API_KEY: ${QDRANT_API_KEY:-}
|
||||
MINIO_ENDPOINT: ${S3_ENDPOINT}
|
||||
MINIO_ACCESS_KEY: ${S3_ACCESS_KEY}
|
||||
MINIO_SECRET_KEY: ${S3_SECRET_KEY}
|
||||
MINIO_BUCKET: ${S3_BUCKET:-breakpilot-rag}
|
||||
MINIO_SECURE: ${S3_SECURE:-true}
|
||||
EMBEDDING_SERVICE_URL: http://embedding-service:8087
|
||||
OLLAMA_URL: ${OLLAMA_URL:-}
|
||||
OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-bge-m3}
|
||||
JWT_SECRET: ${JWT_SECRET}
|
||||
ENVIRONMENT: production
|
||||
depends_on:
|
||||
embedding-service:
|
||||
condition: service_healthy
|
||||
healthcheck:
|
||||
test: ["CMD", "curl", "-f", "http://127.0.0.1:8097/health"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
retries: 3
|
||||
start_period: 15s
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- breakpilot-network
|
||||
|
||||
embedding-service:
|
||||
build:
|
||||
context: ./embedding-service
|
||||
dockerfile: Dockerfile
|
||||
container_name: bp-core-embedding-service
|
||||
volumes:
|
||||
- embedding_models:/root/.cache/huggingface
|
||||
environment:
|
||||
EMBEDDING_BACKEND: ${EMBEDDING_BACKEND:-local}
|
||||
LOCAL_EMBEDDING_MODEL: ${LOCAL_EMBEDDING_MODEL:-BAAI/bge-m3}
|
||||
LOCAL_RERANKER_MODEL: ${LOCAL_RERANKER_MODEL:-cross-encoder/ms-marco-MiniLM-L-6-v2}
|
||||
PDF_EXTRACTION_BACKEND: ${PDF_EXTRACTION_BACKEND:-pymupdf}
|
||||
OPENAI_API_KEY: ${OPENAI_API_KEY:-}
|
||||
COHERE_API_KEY: ${COHERE_API_KEY:-}
|
||||
LOG_LEVEL: ${LOG_LEVEL:-INFO}
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
memory: 8G
|
||||
healthcheck:
|
||||
test: ["CMD", "python", "-c", "import httpx; r=httpx.get('http://127.0.0.1:8087/health'); r.raise_for_status()"]
|
||||
interval: 30s
|
||||
timeout: 10s
|
||||
start_period: 120s
|
||||
retries: 3
|
||||
restart: unless-stopped
|
||||
networks:
|
||||
- breakpilot-network

  # =========================================================
  # OCR SERVICE (PaddleOCR PP-OCRv5)
  # =========================================================
  paddleocr-service:
    build:
      context: ./paddleocr-service
      dockerfile: Dockerfile
    container_name: bp-core-paddleocr
    expose:
      - "8095"
    environment:
      PADDLEOCR_API_KEY: ${PADDLEOCR_API_KEY:-}
      FLAGS_use_mkldnn: "0"
    volumes:
      - paddleocr_models:/root/.paddleocr
    labels:
      - "traefik.http.services.paddleocr.loadbalancer.server.port=8095"
    deploy:
      resources:
        limits:
          memory: 6G
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8095/health"]
      interval: 30s
      timeout: 10s
      start_period: 300s
      retries: 5
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # PITCH DECK
  # =========================================================
  pitch-deck:
    build:
      context: ./pitch-deck
      dockerfile: Dockerfile
    container_name: bp-core-pitch-deck
    expose:
      - "3000"
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}
      PITCH_JWT_SECRET: ${PITCH_JWT_SECRET}
      PITCH_ADMIN_SECRET: ${PITCH_ADMIN_SECRET}
      PITCH_BASE_URL: ${PITCH_BASE_URL:-https://pitch.breakpilot.ai}
      MAGIC_LINK_TTL_HOURS: ${MAGIC_LINK_TTL_HOURS:-72}
      # Optional: bootstrap first admin via `npm run admin:create` inside the container.
      PITCH_ADMIN_BOOTSTRAP_EMAIL: ${PITCH_ADMIN_BOOTSTRAP_EMAIL:-}
      PITCH_ADMIN_BOOTSTRAP_NAME: ${PITCH_ADMIN_BOOTSTRAP_NAME:-}
      PITCH_ADMIN_BOOTSTRAP_PASSWORD: ${PITCH_ADMIN_BOOTSTRAP_PASSWORD:-}
      SMTP_HOST: ${SMTP_HOST}
      SMTP_PORT: ${SMTP_PORT:-587}
      SMTP_USERNAME: ${SMTP_USERNAME}
      SMTP_PASSWORD: ${SMTP_PASSWORD}
      SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
      SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.ai}
      NODE_ENV: production
    healthcheck:
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:3000/api/health"]
      interval: 30s
      timeout: 10s
      start_period: 15s
      retries: 3
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # HEALTH AGGREGATOR
  # =========================================================
  health-aggregator:
    build:
      context: ./scripts
      dockerfile: Dockerfile.health
    container_name: bp-core-health
    expose:
      - "8099"
    environment:
      PORT: 8099
      CHECK_SERVICES: "valkey:6379,consent-service:8081,rag-service:8097,embedding-service:8087,paddleocr-service:8095,pitch-deck:3000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8099/health"]
      interval: 30s
      timeout: 10s
      retries: 3
    restart: unless-stopped
    networks:
      - breakpilot-network
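`CHECK_SERVICES` is a flat comma-separated `host:port` list. How the aggregator actually consumes it is not shown in this diff; a hypothetical parsing sketch (`parse_check_services` is an illustrative name, not the service's real code):

```python
def parse_check_services(spec: str) -> list[tuple[str, int]]:
    """Split a comma-separated host:port list into (host, port) pairs.
    rpartition keeps the split robust if a host ever contains a colon."""
    pairs = []
    for item in spec.split(","):
        host, _, port = item.strip().rpartition(":")
        pairs.append((host, int(port)))
    return pairs

print(parse_check_services("valkey:6379,pitch-deck:3000"))
# [('valkey', 6379), ('pitch-deck', 3000)]
```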

175
docker-compose.hetzner.yml
Normal file
@@ -0,0 +1,175 @@
# =========================================================
# BreakPilot Core — Hetzner Override (x86_64)
# =========================================================
# Usage:
#   docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d \
#     postgres valkey qdrant ollama embedding-service rag-service \
#     backend-core consent-service health-aggregator
#
# Changes relative to the base file (docker-compose.yml):
#   - platform: linux/amd64 (instead of arm64)
#   - Ollama container for CPU embeddings (bge-m3)
#   - Mailpit replaced by a dummy (no mail dev server needed)
#   - Vault, Nginx, Gitea etc. disabled via profiles
#   - Network: external breakpilot-network (must exist beforehand)
# =========================================================

networks:
  breakpilot-network:
    external: true
    name: breakpilot-network

services:

  # =========================================================
  # NEW SERVICES
  # =========================================================

  # Ollama for embeddings (CPU-only, bge-m3)
  ollama:
    image: ollama/ollama:latest
    container_name: bp-core-ollama
    platform: linux/amd64
    volumes:
      - ollama_models:/root/.ollama
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://127.0.0.1:11434/api/tags || exit 1"]
      interval: 15s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: unless-stopped
    networks:
      - breakpilot-network
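The ollama image ships without any models, so bge-m3 has to be pulled once into the `ollama_models` volume before embeddings work. A hedged sketch of doing that through Ollama's REST API (`POST /api/pull` is the documented endpoint; whether this deployment scripts it this way is an assumption):

```python
import json
import urllib.request

def build_pull_request(base_url: str, model: str) -> urllib.request.Request:
    """Build the POST /api/pull request that downloads a model tag."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/pull",
        data=json.dumps({"name": model}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_pull_request("http://127.0.0.1:11434", "bge-m3")
print(req.get_method(), req.full_url)
# POST http://127.0.0.1:11434/api/pull
```

Sending the request (`urllib.request.urlopen(req)`) streams pull progress as JSON lines; a one-off `docker exec bp-core-ollama ollama pull bge-m3` achieves the same thing.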

  # =========================================================
  # PLATFORM OVERRIDES (arm64 → amd64)
  # =========================================================

  backend-core:
    platform: linux/amd64
    build:
      context: ./backend-core
      dockerfile: Dockerfile
      args:
        TARGETARCH: amd64
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}?options=-csearch_path%3Dcore,public
      JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
      ENVIRONMENT: ${ENVIRONMENT:-production}
      VALKEY_URL: redis://valkey:6379/0
      SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
      CONSENT_SERVICE_URL: http://consent-service:8081
      USE_VAULT_SECRETS: "false"
      SMTP_HOST: ${SMTP_HOST:-smtp.example.com}
      SMTP_PORT: ${SMTP_PORT:-587}
      SMTP_USERNAME: ${SMTP_USERNAME:-}
      SMTP_PASSWORD: ${SMTP_PASSWORD:-}
      SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
      SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.app}
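Nearly every value above uses compose's `${VAR:-default}` interpolation, which falls back when the variable is unset *or* empty (the `${VAR-default}` form would fall back only when unset). A stdlib sketch of the `:-` behaviour (`expand` is an illustrative helper, not compose code):

```python
def expand(env: dict, name: str, default: str = "") -> str:
    """${NAME:-default}: use the default when NAME is unset OR empty."""
    value = env.get(name, "")
    return value if value else default

env = {"SMTP_PORT": "", "POSTGRES_USER": "breakpilot"}
print(expand(env, "SMTP_PORT", "587"))      # empty -> default: 587
print(expand(env, "POSTGRES_USER", "x"))    # set -> kept: breakpilot
print(expand(env, "MISSING", "fallback"))   # unset -> default: fallback
```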

  consent-service:
    platform: linux/amd64
    environment:
      DATABASE_URL: postgres://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
      JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
      JWT_REFRESH_SECRET: ${JWT_REFRESH_SECRET:-your-refresh-secret}
      PORT: 8081
      ENVIRONMENT: ${ENVIRONMENT:-production}
      ALLOWED_ORIGINS: "*"
      VALKEY_URL: redis://valkey:6379/0
      SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
      SMTP_HOST: ${SMTP_HOST:-smtp.example.com}
      SMTP_PORT: ${SMTP_PORT:-587}
      SMTP_USERNAME: ${SMTP_USERNAME:-}
      SMTP_PASSWORD: ${SMTP_PASSWORD:-}
      SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
      SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.app}
      FRONTEND_URL: ${FRONTEND_URL:-https://admin-dev.breakpilot.ai}

  billing-service:
    platform: linux/amd64

  rag-service:
    platform: linux/amd64
    ports:
      - "8097:8097"
    environment:
      PORT: 8097
      QDRANT_URL: http://qdrant:6333
      MINIO_ENDPOINT: nbg1.your-objectstorage.com
      MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY:-T18RGFVXXG2ZHQ5404TP}
      MINIO_SECRET_KEY: ${MINIO_SECRET_KEY:-KOUU4WO6wh07cQjNgh0IZHkeKQrVfBz6hnIGpNss}
      MINIO_BUCKET: ${MINIO_BUCKET:-breakpilot-rag}
      MINIO_SECURE: "true"
      EMBEDDING_SERVICE_URL: http://embedding-service:8087
      OLLAMA_URL: http://ollama:11434
      OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-bge-m3}
      JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
      ENVIRONMENT: ${ENVIRONMENT:-production}

  embedding-service:
    platform: linux/amd64
    ports:
      - "8087:8087"

  health-aggregator:
    platform: linux/amd64
    environment:
      PORT: 8099
      CHECK_SERVICES: "postgres:5432,valkey:6379,qdrant:6333,backend-core:8000,rag-service:8097,embedding-service:8087"

  # =========================================================
  # DUMMY REPLACEMENTS FOR DEPENDENCIES
  # =========================================================
  # backend-core + consent-service depend on mailpit
  # (depends_on is merged on compose override and cannot be removed)
  # → replace Mailpit with a lightweight dummy

  mailpit:
    image: alpine:3.19
    entrypoint: ["sh", "-c", "echo 'Mailpit dummy on Hetzner' && tail -f /dev/null"]
    volumes: []
    ports: []
    environment: {}
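The dummy above relies on how `docker compose -f a.yml -f b.yml` merges files: mappings merge key by key, so a service defined in both files is overridden but never deleted, and `depends_on` entries from the base keep pointing at it. A rough stdlib sketch of that merge rule (simplified; real compose has extra list-merge special cases, and the base image name here is illustrative):

```python
def merge(base: dict, override: dict) -> dict:
    """Simplified compose override merge: nested dicts merge per key,
    anything else in the override replaces the base value."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"services": {
    "mailpit": {"image": "axllent/mailpit"},
    "backend-core": {"depends_on": ["mailpit"]},
}}
override = {"services": {"mailpit": {"image": "alpine:3.19"}}}

merged = merge(base, override)
print(merged["services"]["mailpit"]["image"])  # alpine:3.19 - replaced, not removed
print("backend-core" in merged["services"])    # True - untouched services survive
```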

  # Qdrant: RocksDB needs more open files
  qdrant:
    ulimits:
      nofile:
        soft: 65536
        hard: 65536
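Whether the ulimit actually took effect can be checked from inside the container; a small stdlib sketch (run inside the qdrant container it should report 65536/65536, run elsewhere it prints whatever the current host allows):

```python
import resource  # Unix-only

# Read the soft/hard open-file limits of the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-files limit: soft={soft} hard={hard}")
```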

  # minio: rag-service depends on it (depends_on)
  # Keep it running locally, but rag-service uses the external Hetzner object storage
  # minio stays unchanged (small, ~50MB RAM)

  # =========================================================
  # DISABLED SERVICES (via profiles)
  # =========================================================

  nginx:
    profiles: ["disabled"]
  vault:
    profiles: ["disabled"]
  vault-init:
    profiles: ["disabled"]
  vault-agent:
    profiles: ["disabled"]
  gitea:
    profiles: ["disabled"]
  gitea-runner:
    profiles: ["disabled"]
  night-scheduler:
    profiles: ["disabled"]
  admin-core:
    profiles: ["disabled"]
  pitch-deck:
    profiles: ["disabled"]
  levis-holzbau:
    profiles: ["disabled"]

volumes:
  ollama_models:
@@ -19,22 +19,10 @@ volumes:
  valkey_data:
  qdrant_data:
  minio_data:
  # Communication
  synapse_data:
  synapse_db_data:
  jitsi_web_config:
  jitsi_web_crontabs:
  jitsi_transcripts:
  jitsi_prosody_config:
  jitsi_prosody_plugins:
  jitsi_jicofo_config:
  jitsi_jvb_config:
  jibri_recordings:
  # CI/CD
  gitea_data:
  gitea_config:
  gitea_runner_data:
  woodpecker_data:
  # ERP
  erpnext_db_data:
  erpnext_redis_queue_data:
@@ -68,18 +56,19 @@ services:
      - "8091:8091"   # Voice Service (WSS)
      - "8093:8093"   # AI Compliance SDK
      - "8097:8097"   # RAG Service (new)
      #- "8098:8098"  # Control Pipeline (internal only, no Nginx port needed)
      - "8443:8443"   # Jitsi Meet
      - "3008:3008"   # Admin Core
      - "3010:3010"   # Portal Dashboard
      - "8011:8011"   # Compliance Docs (MkDocs)
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - vault_certs:/etc/nginx/certs:ro
      - ./nginx/html:/usr/share/nginx/html/portal:ro
      - /Users/benjaminadmin/rag-originals:/data/rag-originals:ro
    depends_on:
      vault-agent:
        condition: service_started
    extra_hosts:
      - "breakpilot-edu-search:host-gateway"
    restart: unless-stopped
    networks:
      - breakpilot-network
@@ -89,19 +78,20 @@ services:
  # =========================================================
  vault:
    image: hashicorp/vault:1.15
    entrypoint: ["vault"]
    command: server -config=/vault/config/config.hcl
    container_name: bp-core-vault
    ports:
      - "8200:8200"
    volumes:
      - vault_data:/vault/data
      - ./vault/config.hcl:/vault/config/config.hcl:ro
    cap_add:
      - IPC_LOCK
    environment:
      VAULT_DEV_ROOT_TOKEN_ID: ${VAULT_TOKEN:-breakpilot-dev-token}
      VAULT_DEV_LISTEN_ADDRESS: "0.0.0.0:8200"
      VAULT_ADDR: "http://127.0.0.1:8200"
    healthcheck:
      test: ["CMD", "vault", "status"]
      test: ["CMD-SHELL", "vault status; test $? -le 2"]
      interval: 10s
      timeout: 5s
      retries: 3
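The replacement `test:` line leans on `vault status` exit codes: 0 means unsealed, 2 means sealed (server is up but locked), 1 means a real error. Note that `test $? -le 2` also accepts exit code 1, so this variant treats even error exits as healthy; a sketch of both predicates (illustrative helper names, not Vault code):

```python
# vault status exit codes: 0 = unsealed, 2 = sealed, 1 = error
def passes_le2(code: int) -> bool:
    """Mirror of `vault status; test $? -le 2` from the healthcheck."""
    return code <= 2

def passes_strict(code: int) -> bool:
    """Stricter variant that rejects the error exit code."""
    return code in (0, 2)

for code in (0, 1, 2):
    print(code, passes_le2(code), passes_strict(code))
# 0 True True
# 1 True False
# 2 True True
```

A shell equivalent of the strict form would be `vault status || [ $? -eq 2 ]`.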
@@ -113,14 +103,16 @@ services:
    image: hashicorp/vault:1.15
    container_name: bp-core-vault-init
    volumes:
      - ./vault/init-pki.sh:/init-pki.sh:ro
      - ./vault/init-vault.sh:/vault/scripts/init-vault.sh:ro
      - ./vault/init-pki.sh:/vault/scripts/init-pki.sh:ro
      - ./vault/init-secrets.sh:/vault/scripts/init-secrets.sh:ro
      - vault_data:/vault/data
      - vault_agent_config:/vault/agent/data
      - vault_certs:/vault/certs
    environment:
      VAULT_ADDR: "http://vault:8200"
      VAULT_TOKEN: ${VAULT_TOKEN:-breakpilot-dev-token}
    entrypoint: /bin/sh
    command: /init-pki.sh
    command: /vault/scripts/init-vault.sh
    depends_on:
      vault:
        condition: service_healthy
@@ -191,26 +183,6 @@ services:
    networks:
      - breakpilot-network

  synapse-db:
    image: postgres:16-alpine
    container_name: bp-core-synapse-db
    profiles: [chat]
    environment:
      POSTGRES_USER: synapse
      POSTGRES_PASSWORD: ${SYNAPSE_DB_PASSWORD:-synapse_secret}
      POSTGRES_DB: synapse
      POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
    volumes:
      - synapse_db_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U synapse"]
      interval: 5s
      timeout: 5s
      retries: 5
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # VECTOR DB & OBJECT STORAGE
  # =========================================================
@@ -376,14 +348,18 @@ services:
    environment:
      PORT: 8097
      QDRANT_URL: http://qdrant:6333
      MINIO_ENDPOINT: minio:9000
      MINIO_ACCESS_KEY: ${MINIO_ROOT_USER:-breakpilot}
      MINIO_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-breakpilot123}
      MINIO_ENDPOINT: nbg1.your-objectstorage.com
      MINIO_ACCESS_KEY: T18RGFVXXG2ZHQ5404TP
      MINIO_SECRET_KEY: KOUU4WO6wh07cQjNgh0IZHkeKQrVfBz6hnIGpNss
      MINIO_BUCKET: ${MINIO_BUCKET:-breakpilot-rag}
      MINIO_SECURE: "false"
      MINIO_SECURE: "true"
      EMBEDDING_SERVICE_URL: http://embedding-service:8087
      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
      OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-bge-m3}
      JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
      ENVIRONMENT: ${ENVIRONMENT:-development}
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      qdrant:
        condition: service_healthy
@@ -401,6 +377,50 @@ services:
    networks:
      - breakpilot-network

  # =========================================================
  # CONTROL PIPELINE (developer-only, not customer-facing)
  # =========================================================
  control-pipeline:
    build:
      context: ./control-pipeline
      dockerfile: Dockerfile
    container_name: bp-core-control-pipeline
    platform: linux/arm64
    expose:
      - "8098"
    environment:
      PORT: 8098
      DATABASE_URL: postgresql://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
      SCHEMA_SEARCH_PATH: compliance,core,public
      QDRANT_URL: http://qdrant:6333
      EMBEDDING_SERVICE_URL: http://embedding-service:8087
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
      CONTROL_GEN_ANTHROPIC_MODEL: ${CONTROL_GEN_ANTHROPIC_MODEL:-claude-sonnet-4-6}
      DECOMPOSITION_LLM_MODEL: ${DECOMPOSITION_LLM_MODEL:-claude-haiku-4-5-20251001}
      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
      CONTROL_GEN_OLLAMA_MODEL: ${CONTROL_GEN_OLLAMA_MODEL:-qwen3.5:35b-a3b}
      SDK_URL: http://ai-compliance-sdk:8090
      JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
      ENVIRONMENT: ${ENVIRONMENT:-development}
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      postgres:
        condition: service_healthy
      qdrant:
        condition: service_healthy
      embedding-service:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8098/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    restart: unless-stopped
    networks:
      - breakpilot-network

  embedding-service:
    build:
      context: ./embedding-service
@@ -411,7 +431,7 @@ services:
      - embedding_models:/root/.cache/huggingface
    environment:
      EMBEDDING_BACKEND: ${EMBEDDING_BACKEND:-local}
      LOCAL_EMBEDDING_MODEL: ${LOCAL_EMBEDDING_MODEL:-sentence-transformers/all-MiniLM-L6-v2}
      LOCAL_EMBEDDING_MODEL: ${LOCAL_EMBEDDING_MODEL:-BAAI/bge-m3}
      LOCAL_RERANKER_MODEL: ${LOCAL_RERANKER_MODEL:-cross-encoder/ms-marco-MiniLM-L-6-v2}
      PDF_EXTRACTION_BACKEND: ${PDF_EXTRACTION_BACKEND:-pymupdf}
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
@@ -420,7 +440,7 @@ services:
    deploy:
      resources:
        limits:
          memory: 4G
          memory: 8G
    healthcheck:
      test: ["CMD", "python", "-c", "import httpx; r=httpx.get('http://127.0.0.1:8087/health'); r.raise_for_status()"]
      interval: 30s
@@ -457,199 +477,6 @@ services:
    networks:
      - breakpilot-network

  # =========================================================
  # COMMUNICATION
  # =========================================================
  synapse:
    image: matrixdotorg/synapse:latest
    container_name: bp-core-synapse
    profiles: [chat]
    ports:
      - "8008:8008"
      - "8448:8448"
    volumes:
      - synapse_data:/data
    environment:
      SYNAPSE_SERVER_NAME: ${SYNAPSE_SERVER_NAME:-macmini}
      SYNAPSE_REPORT_STATS: "no"
      SYNAPSE_NO_TLS: "true"
      SYNAPSE_ENABLE_REGISTRATION: ${SYNAPSE_ENABLE_REGISTRATION:-true}
      SYNAPSE_LOG_LEVEL: ${SYNAPSE_LOG_LEVEL:-WARNING}
      UID: "1000"
      GID: "1000"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8008/health"]
      interval: 30s
      timeout: 10s
      start_period: 30s
      retries: 3
    depends_on:
      synapse-db:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - breakpilot-network

  jitsi-web:
    image: jitsi/web:stable-9823
    container_name: bp-core-jitsi-web
    expose:
      - "80"
    volumes:
      - jitsi_web_config:/config
      - jitsi_web_crontabs:/var/spool/cron/crontabs
      - jitsi_transcripts:/usr/share/jitsi-meet/transcripts
    environment:
      ENABLE_XMPP_WEBSOCKET: "true"
      ENABLE_COLIBRI_WEBSOCKET: "true"
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jitsi}
      XMPP_BOSH_URL_BASE: http://jitsi-xmpp:5280
      XMPP_MUC_DOMAIN: ${XMPP_MUC_DOMAIN:-muc.meet.jitsi}
      XMPP_GUEST_DOMAIN: ${XMPP_GUEST_DOMAIN:-guest.meet.jitsi}
      TZ: ${TZ:-Europe/Berlin}
      PUBLIC_URL: ${JITSI_PUBLIC_URL:-https://macmini:8443}
      JICOFO_AUTH_USER: focus
      ENABLE_AUTH: ${JITSI_ENABLE_AUTH:-false}
      ENABLE_GUESTS: "true"
      ENABLE_RECORDING: "true"
      ENABLE_LIVESTREAMING: "false"
      DISABLE_HTTPS: "true"
      APP_NAME: "BreakPilot Meet"
      NATIVE_APP_NAME: "BreakPilot Meet"
      PROVIDER_NAME: "BreakPilot"
    depends_on:
      - jitsi-xmpp
    networks:
      breakpilot-network:
        aliases:
          - meet.jitsi

  jitsi-xmpp:
    image: jitsi/prosody:stable-9823
    container_name: bp-core-jitsi-xmpp
    volumes:
      - jitsi_prosody_config:/config
      - jitsi_prosody_plugins:/prosody-plugins-custom
    environment:
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jitsi}
      XMPP_AUTH_DOMAIN: ${XMPP_AUTH_DOMAIN:-auth.meet.jitsi}
      XMPP_MUC_DOMAIN: ${XMPP_MUC_DOMAIN:-muc.meet.jitsi}
      XMPP_INTERNAL_MUC_DOMAIN: ${XMPP_INTERNAL_MUC_DOMAIN:-internal-muc.meet.jitsi}
      XMPP_GUEST_DOMAIN: ${XMPP_GUEST_DOMAIN:-guest.meet.jitsi}
      XMPP_RECORDER_DOMAIN: ${XMPP_RECORDER_DOMAIN:-recorder.meet.jitsi}
      XMPP_CROSS_DOMAIN: "true"
      TZ: ${TZ:-Europe/Berlin}
      JICOFO_AUTH_USER: focus
      JICOFO_AUTH_PASSWORD: ${JICOFO_AUTH_PASSWORD:-jicofo_secret}
      JVB_AUTH_USER: jvb
      JVB_AUTH_PASSWORD: ${JVB_AUTH_PASSWORD:-jvb_secret}
      JIBRI_XMPP_USER: jibri
      JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD:-jibri_secret}
      JIBRI_RECORDER_USER: recorder
      JIBRI_RECORDER_PASSWORD: ${JIBRI_RECORDER_PASSWORD:-recorder_secret}
      LOG_LEVEL: ${XMPP_LOG_LEVEL:-warn}
      PUBLIC_URL: ${JITSI_PUBLIC_URL:-https://macmini:8443}
      ENABLE_AUTH: ${JITSI_ENABLE_AUTH:-false}
      ENABLE_GUESTS: "true"
    restart: unless-stopped
    networks:
      breakpilot-network:
        aliases:
          - xmpp.meet.jitsi

  jitsi-jicofo:
    image: jitsi/jicofo:stable-9823
    container_name: bp-core-jitsi-jicofo
    volumes:
      - jitsi_jicofo_config:/config
    environment:
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jitsi}
      XMPP_AUTH_DOMAIN: ${XMPP_AUTH_DOMAIN:-auth.meet.jitsi}
      XMPP_MUC_DOMAIN: ${XMPP_MUC_DOMAIN:-muc.meet.jitsi}
      XMPP_INTERNAL_MUC_DOMAIN: ${XMPP_INTERNAL_MUC_DOMAIN:-internal-muc.meet.jitsi}
      XMPP_SERVER: jitsi-xmpp
      JICOFO_AUTH_USER: focus
      JICOFO_AUTH_PASSWORD: ${JICOFO_AUTH_PASSWORD:-jicofo_secret}
      TZ: ${TZ:-Europe/Berlin}
      ENABLE_AUTH: ${JITSI_ENABLE_AUTH:-false}
      AUTH_TYPE: internal
      ENABLE_AUTO_OWNER: "true"
    depends_on:
      - jitsi-xmpp
    restart: unless-stopped
    networks:
      - breakpilot-network

  jitsi-jvb:
    image: jitsi/jvb:stable-9823
    container_name: bp-core-jitsi-jvb
    ports:
      - "10000:10000/udp"
      - "8080:8080"
    volumes:
      - jitsi_jvb_config:/config
    environment:
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jitsi}
      XMPP_AUTH_DOMAIN: ${XMPP_AUTH_DOMAIN:-auth.meet.jitsi}
      XMPP_INTERNAL_MUC_DOMAIN: ${XMPP_INTERNAL_MUC_DOMAIN:-internal-muc.meet.jitsi}
      XMPP_SERVER: jitsi-xmpp
      JVB_AUTH_USER: jvb
      JVB_AUTH_PASSWORD: ${JVB_AUTH_PASSWORD:-jvb_secret}
      JVB_PORT: 10000
      JVB_STUN_SERVERS: ${JVB_STUN_SERVERS:-stun.l.google.com:19302}
      TZ: ${TZ:-Europe/Berlin}
      PUBLIC_URL: ${JITSI_PUBLIC_URL:-https://macmini:8443}
      COLIBRI_REST_ENABLED: "true"
      ENABLE_COLIBRI_WEBSOCKET: "true"
    depends_on:
      - jitsi-xmpp
    restart: unless-stopped
    networks:
      - breakpilot-network

  jibri:
    build:
      context: ./docker/jibri
      dockerfile: Dockerfile
    container_name: bp-core-jibri
    volumes:
      - jibri_recordings:/recordings
      - /dev/shm:/dev/shm
    shm_size: 2gb
    cap_add:
      - SYS_ADMIN
      - NET_BIND_SERVICE
    environment:
      XMPP_DOMAIN: ${XMPP_DOMAIN:-meet.jitsi}
      XMPP_AUTH_DOMAIN: ${XMPP_AUTH_DOMAIN:-auth.meet.jitsi}
      XMPP_INTERNAL_MUC_DOMAIN: ${XMPP_INTERNAL_MUC_DOMAIN:-internal-muc.meet.jitsi}
      XMPP_RECORDER_DOMAIN: ${XMPP_RECORDER_DOMAIN:-recorder.meet.jitsi}
      XMPP_SERVER: jitsi-xmpp
      XMPP_MUC_DOMAIN: ${XMPP_MUC_DOMAIN:-muc.meet.jitsi}
      JIBRI_XMPP_USER: jibri
      JIBRI_XMPP_PASSWORD: ${JIBRI_XMPP_PASSWORD:-jibri_secret}
      JIBRI_RECORDER_USER: recorder
      JIBRI_RECORDER_PASSWORD: ${JIBRI_RECORDER_PASSWORD:-recorder_secret}
      JIBRI_BREWERY_MUC: JibriBrewery
      JIBRI_RECORDING_DIR: /recordings
      JIBRI_FINALIZE_SCRIPT: /finalize.sh
      TZ: ${TZ:-Europe/Berlin}
      DISPLAY: ":0"
      RESOLUTION: "1920x1080"
      MINIO_ENDPOINT: minio:9000
      MINIO_ACCESS_KEY: ${MINIO_ROOT_USER:-breakpilot}
      MINIO_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-breakpilot123}
      MINIO_BUCKET: ${MINIO_BUCKET:-breakpilot-recordings}
      BACKEND_WEBHOOK_URL: http://backend-core:8000/api/recordings/webhook
    depends_on:
      - jitsi-xmpp
      - minio
    profiles:
      - recording
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # DEVOPS & CI/CD
  # =========================================================
@@ -721,88 +548,6 @@ services:
    networks:
      - breakpilot-network

  woodpecker-server:
    image: woodpeckerci/woodpecker-server:v3
    container_name: bp-core-woodpecker-server
    ports:
      - "8090:8000"
    volumes:
      - woodpecker_data:/var/lib/woodpecker
    environment:
      WOODPECKER_OPEN: "true"
      WOODPECKER_HOST: ${WOODPECKER_HOST:-http://macmini:8090}
      WOODPECKER_ADMIN: ${WOODPECKER_ADMIN:-pilotadmin}
      WOODPECKER_GITEA: "true"
      WOODPECKER_GITEA_URL: http://gitea:3003
      WOODPECKER_GITEA_CLIENT: ${WOODPECKER_GITEA_CLIENT:-}
      WOODPECKER_GITEA_SECRET: ${WOODPECKER_GITEA_SECRET:-}
      WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-woodpecker-secret}
      WOODPECKER_DATABASE_DRIVER: sqlite3
      WOODPECKER_DATABASE_DATASOURCE: /var/lib/woodpecker/woodpecker.sqlite
      WOODPECKER_LOG_LEVEL: warn
      WOODPECKER_PLUGINS_PRIVILEGED: "plugins/docker"
      WOODPECKER_PLUGINS_TRUSTED_CLONE: "true"
    extra_hosts:
      - "macmini:192.168.178.100"
    depends_on:
      gitea:
        condition: service_healthy
    restart: unless-stopped
    networks:
      - breakpilot-network

  woodpecker-agent:
    image: woodpeckerci/woodpecker-agent:v3
    container_name: bp-core-woodpecker-agent
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      WOODPECKER_SERVER: woodpecker-server:9000
      WOODPECKER_AGENT_SECRET: ${WOODPECKER_AGENT_SECRET:-woodpecker-secret}
      WOODPECKER_MAX_WORKFLOWS: "2"
      WOODPECKER_LOG_LEVEL: warn
      WOODPECKER_BACKEND: docker
      DOCKER_HOST: unix:///var/run/docker.sock
      WOODPECKER_BACKEND_DOCKER_EXTRA_HOSTS: "macmini:192.168.178.100"
      WOODPECKER_BACKEND_DOCKER_NETWORK: breakpilot-network
    depends_on:
      - woodpecker-server
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # WORKFLOW ENGINE
  # =========================================================
  camunda:
    image: camunda/camunda-bpm-platform:7.21.0
    container_name: bp-core-camunda
    ports:
      - "8089:8080"
    environment:
      DB_DRIVER: org.postgresql.Driver
      DB_URL: jdbc:postgresql://postgres:5432/${POSTGRES_DB:-breakpilot_db}
      DB_USERNAME: ${POSTGRES_USER:-breakpilot}
      DB_PASSWORD: ${POSTGRES_PASSWORD:-breakpilot123}
      DB_VALIDATE_ON_BORROW: "true"
      WAIT_FOR: postgres:5432
      CAMUNDA_BPM_ADMIN_USER_ID: ${CAMUNDA_ADMIN_USER:-admin}
      CAMUNDA_BPM_ADMIN_USER_PASSWORD: ${CAMUNDA_ADMIN_PASSWORD:-admin}
    depends_on:
      postgres:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://127.0.0.1:8080/camunda/api/engine"]
      interval: 30s
      timeout: 10s
      start_period: 60s
      retries: 5
    profiles:
      - bpmn
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # DOCUMENTATION & UTILITIES
  # =========================================================
@@ -837,6 +582,9 @@ services:
    networks:
      - breakpilot-network

  # =========================================================
  # NIGHT SCHEDULER
  # =========================================================
  night-scheduler:
    build:
      context: ./night-scheduler
@@ -877,8 +625,6 @@ services:
    environment:
      NODE_ENV: production
      BACKEND_URL: http://backend-core:8000
      WOODPECKER_URL: http://bp-core-woodpecker-server:8000
      WOODPECKER_TOKEN: ${WOODPECKER_TOKEN:-}
      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
    extra_hosts:
      - "host.docker.internal:host-gateway"
@@ -1132,8 +878,10 @@ services:
    environment:
      NODE_ENV: production
      DATABASE_URL: postgres://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
      OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
      OLLAMA_MODEL: ${OLLAMA_MODEL:-qwen3:30b-a3b}
      LITELLM_URL: ${LITELLM_URL:-https://llm-dev.meghsakha.com}
      LITELLM_MODEL: ${LITELLM_MODEL:-gpt-oss-120b}
      LITELLM_API_KEY: ${LITELLM_API_KEY:-sk-0nAyxaMVbIqmz_ntnndzag}
      TTS_SERVICE_URL: http://bp-compliance-tts:8095
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
@@ -1142,3 +890,20 @@ services:
    restart: unless-stopped
    networks:
      - breakpilot-network

  # =========================================================
  # LEVIS HOLZBAU - children's woodworking website
  # =========================================================
  levis-holzbau:
    build:
      context: ./levis-holzbau
      dockerfile: Dockerfile
    container_name: bp-core-levis-holzbau
    platform: linux/arm64
    ports:
      - "3013:3000"
    environment:
      NODE_ENV: production
    restart: unless-stopped
    networks:
      - breakpilot-network
@@ -1,194 +1,77 @@
|
||||
# Umgebungs-Architektur
|
||||
|
||||
## Übersicht
|
||||
## Uebersicht
|
||||
|
||||
BreakPilot verwendet eine 3-Umgebungs-Strategie für sichere Entwicklung und Deployment:
|
||||
BreakPilot verwendet zwei Umgebungen:
|
||||
|
||||
```
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ Development │────▶│ Staging │────▶│ Production │
|
||||
│ (develop) │ │ (staging) │ │ (main) │
|
||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||
Tägliche Getesteter Code Produktionsreif
|
||||
Entwicklung
|
||||
┌─────────────────┐ ┌─────────────────┐
|
||||
│ Development │───── git push ────▶│ Production │
|
||||
│ (Mac Mini) │ │ (Coolify) │
|
||||
└─────────────────┘ └─────────────────┘
|
||||
Lokale Automatisch
|
||||
Entwicklung via Coolify
|
||||
```
|
||||
|
||||
## Umgebungen
|
||||
|
||||
### Development (Dev)
|
||||
### Development (Lokal — Mac Mini)
|
||||
|
||||
**Zweck:** Tägliche Entwicklungsarbeit
|
||||
**Zweck:** Lokale Entwicklung und Tests
|
||||
|
||||
| Eigenschaft | Wert |
|
||||
|-------------|------|
|
||||
| Git Branch | `develop` |
|
||||
| Compose File | `docker-compose.yml` + `docker-compose.override.yml` (auto) |
|
||||
| Env File | `.env.dev` |
|
||||
| Database | `breakpilot_dev` |
|
||||
| Git Branch | `main` |
|
||||
| Compose File | `docker-compose.yml` |
|
||||
| Database | Lokale PostgreSQL |
|
||||
| Debug | Aktiviert |
|
||||
| Hot-Reload | Aktiviert |
|
||||
|
||||
**Start:**
|
||||
```bash
|
||||
./scripts/start.sh dev
|
||||
# oder einfach:
|
||||
docker compose up -d
|
||||
ssh macmini "cd ~/Projekte/breakpilot-core && /usr/local/bin/docker compose up -d"
|
||||
```
|
||||
|
||||
### Staging
|
||||
### Production (Coolify)
|
||||
|
||||
**Zweck:** Getesteter, freigegebener Code vor Produktion
|
||||
|
||||
| Eigenschaft | Wert |
|
||||
|-------------|------|
|
||||
| Git Branch | `staging` |
|
||||
| Compose File | `docker-compose.yml` + `docker-compose.staging.yml` |
|
||||
| Env File | `.env.staging` |
|
||||
| Database | `breakpilot_staging` (separates Volume) |
|
||||
| Debug | Deaktiviert |
|
||||
| Hot-Reload | Deaktiviert |
|
||||
|
||||
**Start:**
|
||||
```bash
|
||||
./scripts/start.sh staging
|
||||
# oder:
|
||||
docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d
|
||||
```
|
||||
|
||||
### Production (Prod)
|
||||
|
||||
**Zweck:** Live-System für Endbenutzer (ab Launch)
|
||||
**Zweck:** Live-System
|
||||
|
||||
| Eigenschaft | Wert |
|
||||
|-------------|------|
|
||||
| Git Branch | `main` |
|
||||
| Compose File | `docker-compose.yml` + `docker-compose.prod.yml` |
|
||||
| Env File | `.env.prod` (NICHT im Repository!) |
|
||||
| Database | `breakpilot_prod` (separates Volume) |
|
||||
| Deployment | Coolify (automatisch bei Push auf gitea) |
|
||||
| Database | Externe PostgreSQL (TLS) |
|
||||
| Debug | Deaktiviert |
|
||||
| Vault | Pflicht (keine Env-Fallbacks) |
|
||||
|
||||
## Datenbank-Trennung
|
||||
|
||||
Jede Umgebung verwendet separate Docker Volumes für vollständige Datenisolierung:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ PostgreSQL Volumes │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ breakpilot-dev_postgres_data │ Development Database │
|
||||
│ breakpilot_staging_postgres │ Staging Database │
|
||||
│ breakpilot_prod_postgres │ Production Database │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```

## Port Mapping

So that several environments can run at the same time, they use different ports:

| Service | Dev Port | Staging Port | Prod Port |
|---------|----------|--------------|-----------|
| Backend | 8000 | 8001 | 8000 |
| PostgreSQL | 5432 | 5433 | - (internal) |
| MinIO | 9000/9001 | 9002/9003 | - (internal) |
| Qdrant | 6333/6334 | 6335/6336 | - (internal) |
| Mailpit | 8025/1025 | 8026/1026 | - (disabled) |

## Git Branching Strategy

```
main (Prod)      ← release merges only, protected
  │
  ▼
staging          ← tested code, review required
  │
  ▼
develop (Dev)    ← daily work, default branch
  │
  ▼
feature/*        ← feature branches (optional)
```

### Workflow

1. **Development:** Work on `develop`
2. **Code review:** Open a PR from a feature branch → `develop`
3. **Staging:** Promote `develop` → `staging` with tests
4. **Release:** Promote `staging` → `main` after approval

### Promotion Commands

```bash
# develop → staging
./scripts/promote.sh dev-to-staging

# staging → main (production)
./scripts/promote.sh staging-to-prod
git push origin main && git push gitea main
# Coolify builds and deploys automatically
```

## Secrets Management

### Development

- `.env.dev` contains development credentials
- Vault optional (dev token)
- Mailpit for e-mail testing

### Staging

- `.env.staging` contains test credentials
- Vault recommended
- Mailpit for e-mail safety

### Production

- `.env.prod` NOT in the repository
- Vault REQUIRED
- Real SMTP configuration

See also: [Secrets Management](./secrets-management.md)

## Docker Compose Architecture

```
docker-compose.yml                      ← base configuration (local, arm64)
 │
 ├── docker-compose.override.yml        ← dev (auto-loaded)
 │
 ├── docker-compose.staging.yml         ← staging (explicit)
 │
 └── docker-compose.coolify.yml         ← production override (amd64)
```

### Automatic Loading

Docker Compose automatically loads:

1. `docker-compose.yml`
2. `docker-compose.override.yml` (if present)

`docker compose up` therefore starts the dev environment by default. For the production build, Coolify automatically uses both compose files.

## Helper Scripts

| Script | Description |
|--------|--------------|
| `scripts/env-switch.sh` | Switches between environments |
| `scripts/start.sh` | Starts the services for an environment |
| `scripts/stop.sh` | Stops the services |
| `scripts/promote.sh` | Promotes code between branches |
| `scripts/status.sh` | Shows the current status |

## Verification

After setup, check:

```bash
# Show status
./scripts/status.sh

# Check branches
git branch -v

# Check volumes
docker volume ls | grep breakpilot
```

See also: [Secrets Management](./secrets-management.md)

## Related Documentation
docs-src/architecture/sdk-protection.md (new file, 317 lines)

# SDK Protection Middleware

## 1. What Is This About?

The SDK Protection Middleware protects the compliance SDK endpoints against a specific kind of attack: **systematic enumeration**. What does that mean?

> *A competitor registers as a paying customer and has a script slowly and in a distributed fashion query every TOM control, every audit aspect, and every assessment criterion. From the results, they reconstruct the entire compliance framework logic.*

A classic rate limiter (100 requests/minute) does not help here, because a clever attacker proceeds slowly -- perhaps only 20 requests per minute, but systematically and over hours. SDK Protection detects such patterns and reacts to them.

!!! info "Core design principle"
    **Normal users notice nothing.** A teacher working in the TOM module typically accesses 3-5 categories and repeats requests to the same endpoints. An attacker, by contrast, walks through 40+ categories in alphabetical order. That is exactly the difference the middleware detects.

---

## 2. How Does the Protection Work?

The middleware uses an **anomaly score system**. Every user has a score that starts at 0. Various suspicious behaviors increase the score; over time it decays again. The higher the score, the harder the user is throttled.

Think of it like a traffic light:

| Score | Light | Effect | Example |
|-------|-------|---------|----------|
| 0-29 | Green | No restriction | Normal user |
| 30-59 | Yellow | 1-3 second delay | Slightly unusual pattern |
| 60-84 | Orange | 5-10 second delay, reduced detail | Clearly suspicious behavior |
| 85+ | Red | Access blocked (HTTP 429) | Very likely an automated attack |
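
The traffic-light table can be read as a simple step function; a minimal sketch (the function name is illustrative, not the middleware's actual API):

```python
def throttle_level(score: float) -> int:
    """Map an anomaly score to the traffic-light level from the table above."""
    if score >= 85:
        return 3  # red: blocked (HTTP 429)
    if score >= 60:
        return 2  # orange: 5-10 s delay, reduced detail
    if score >= 30:
        return 1  # yellow: 1-3 s delay
    return 0      # green: no restriction

print([throttle_level(s) for s in (0, 35, 70, 90)])  # → [0, 1, 2, 3]
```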
### Score Decay

The score decays automatically: every 5 minutes it is multiplied by a factor of 0.95. A score of 60 therefore drops to roughly 30 within an hour -- provided no new suspicious behavior occurs.
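
The decay rule can be sketched in a few lines (an illustration, assuming whole 5-minute intervals are applied):

```python
def decayed_score(score: float, elapsed_seconds: float,
                  factor: float = 0.95, interval: float = 300.0) -> float:
    """Apply the time-based decay described above (×0.95 per 5 minutes)."""
    intervals = elapsed_seconds // interval
    return score * (factor ** intervals)

print(round(decayed_score(60, 3600)))  # → 32 (decayed from 60 after one hour)
```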
---

## 3. What Is Detected?

The middleware detects five distinct anomaly patterns:

### 3.1 High Category Diversity

**What:** A user accesses more than 40 different SDK categories within one hour.

**Why suspicious:** A normal user typically works with 3-10 categories. Someone who systematically walks through all of them is most likely harvesting data.

**Score increase:** +15

```
Normal:      tom/access-control → tom/access-control → tom/encryption → tom/encryption
             (2 distinct categories in one hour)

Suspicious:  tom/access-control → tom/encryption → tom/pseudonymization → tom/integrity
             → tom/availability → tom/resilience → dsfa/threshold → dsfa/necessity → ...
             (40+ distinct categories in one hour)
```
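
The diversity check boils down to counting distinct categories per user and hour. A minimal in-memory sketch (names and data structures are illustrative; the middleware keeps this state in Valkey):

```python
from collections import defaultdict

DIVERSITY_THRESHOLD = 40  # distinct categories per hour

seen: dict[tuple[str, int], set[str]] = defaultdict(set)

def record_category(user: str, hour: int, category: str) -> int:
    """Track a category access; return the score increase (+15 exactly once,
    when the user first exceeds the threshold of distinct categories)."""
    bucket = seen[(user, hour)]
    before = len(bucket)
    bucket.add(category)
    crossed = len(bucket) == DIVERSITY_THRESHOLD + 1 and len(bucket) > before
    return 15 if crossed else 0
```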

### 3.2 Burst Detection

**What:** A user sends more than 15 requests to the same category within 2 minutes.

**Why suspicious:** Even an eager user does not click the same endpoint 15 times a minute. This points to automated scraping.

**Score increase:** +20
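
The burst check amounts to a sliding 2-minute window per user and category. An illustrative in-memory sketch (the middleware keeps this state in a Valkey sorted set):

```python
from collections import defaultdict, deque

BURST_WINDOW = 120.0   # seconds
BURST_THRESHOLD = 15   # same-category requests allowed inside the window

hits: dict[tuple[str, str], deque] = defaultdict(deque)

def is_burst(user: str, category: str, now: float) -> bool:
    """Record one request; True once more than 15 requests hit
    the same category within the last 2 minutes."""
    window = hits[(user, category)]
    window.append(now)
    while window and window[0] <= now - BURST_WINDOW:
        window.popleft()  # drop timestamps outside the window
    return len(window) > BURST_THRESHOLD
```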

### 3.3 Sequential Enumeration

**What:** At least 70% of the last 10 accessed categories are in alphabetical or numerical order.

**Why suspicious:** Humans jump between categories -- they work thematically, not alphabetically. A script, on the other hand, often iterates over a sorted list.

**Score increase:** +25

```
Suspicious:  assessment_general → compliance_general → controls_general
             → dsfa_measures → dsfa_necessity → dsfa_residual → dsfa_risks
             → dsfa_threshold → eh_general → namespace_general
             (alphabetically sorted = script behavior)
```
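
The 70% criterion can be sketched as the share of adjacent category pairs that are in sorted order (an assumption about how the ratio is measured; the real detector may differ in detail):

```python
def looks_sequential(categories: list[str], threshold: float = 0.7) -> bool:
    """True if at least 70% of adjacent pairs are in ascending order."""
    if len(categories) < 2:
        return False
    ordered = sum(1 for a, b in zip(categories, categories[1:]) if a <= b)
    return ordered / (len(categories) - 1) >= threshold

scripted = ["assessment_general", "compliance_general", "controls_general",
            "dsfa_measures", "dsfa_necessity", "dsfa_residual"]
human = ["tom/encryption", "dsfa/threshold", "tom/access-control", "eh/general"]
print(looks_sequential(scripted), looks_sequential(human))  # → True False
```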

### 3.4 Unusual Hours

**What:** Requests between 0:00 and 5:00 UTC.

**Why suspicious:** Teachers work during the day. Whoever queries SDK endpoints at 3 a.m. is probably an automated process.

**Score increase:** +10
### 3.5 Multi-Tenant Access

**What:** A user accesses more than 3 different tenants within one hour.

**Why suspicious:** A normal user belongs to one tenant. Someone probing several may be trying to collect data across tenants.

**Score increase:** +15

---

## 4. Quota System (Volume Limits)

In addition to the anomaly score, there are classic volume limits over four time windows:

| Tier | per minute | per hour | per day | per month |
|------|-----------|-----------|---------|-----------|
| **Free** | 30 | 500 | 3,000 | 50,000 |
| **Standard** | 60 | 1,500 | 10,000 | 200,000 |
| **Enterprise** | 120 | 5,000 | 50,000 | 1,000,000 |

If a limit is exceeded in any window, the user immediately receives HTTP 429 -- regardless of the anomaly score.

---

## 5. Architecture

### Data Flow of an SDK Request

```
Request arrives
      │
      ▼
┌─────────────────────────────────────────────────────────────┐
│  Is the path protected?                                     │
│  (/api/sdk/*, /api/v1/tom/*, /api/v1/dsfa/*, ...)           │
│  No → forward directly                                      │
└──────────────┬──────────────────────────────────────────────┘
               │ Yes
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Extract user + tier + category                             │
│  (from session, API key, or X-SDK-Tier header)              │
└──────────────┬──────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Check multi-window quota                                   │
│  (minute / hour / day / month)                              │
│  Exceeded → return HTTP 429                                 │
└──────────────┬──────────────────────────────────────────────┘
               │ OK
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Load anomaly score (from Valkey)                           │
│  Apply time-based decay (×0.95 every 5 min)                 │
└──────────────┬──────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Run anomaly detectors:                                     │
│  ├── Diversity tracking     (+15 if >40 categories/h)       │
│  ├── Burst detection        (+20 if >15 same/2 min)         │
│  ├── Sequential enumeration (+25 if sorted)                 │
│  ├── Unusual hours          (+10 if 0-5 UTC)                │
│  └── Multi-tenant           (+15 if >3 tenants/h)           │
└──────────────┬──────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Determine throttle level                                   │
│  Level 3 (score ≥85) → HTTP 429                             │
│  Level 2 (score ≥60) → 5-10 s delay + reduced detail        │
│  Level 1 (score ≥30) → 1-3 s delay                          │
│  Level 0             → no restriction                       │
└──────────────┬──────────────────────────────────────────────┘
               │
               ▼
┌─────────────────────────────────────────────────────────────┐
│  Forward the request                                        │
│  Set response headers:                                      │
│  ├── X-SDK-Quota-Remaining-Minute/Hour                      │
│  ├── X-SDK-Throttle-Level                                   │
│  ├── X-SDK-Detail-Reduced (at level ≥2)                     │
│  └── X-BP-Trace (HMAC watermark)                            │
└─────────────────────────────────────────────────────────────┘
```

### Valkey Data Structures

The middleware stores all tracking data in Valkey (a Redis fork). If Valkey is unreachable, it automatically falls back to an in-memory implementation.

| Purpose | Valkey type | Key pattern | TTL |
|---------|-----------|------------|-----|
| Quota per time window | Sorted Set | `sdk_protect:quota:{user}:{window}` | window + 10 s |
| Category diversity | Set | `sdk_protect:diversity:{user}:{stunde}` | 3660 s |
| Burst tracking | Sorted Set | `sdk_protect:burst:{user}:{kategorie}` | 130 s |
| Sequence tracking | List | `sdk_protect:seq:{user}` | 310 s |
| Anomaly score | Hash | `sdk_protect:score:{user}` | 86400 s |
| Tenant tracking | Set | `sdk_protect:tenants:{user}:{stunde}` | 3660 s |
### Watermarking

Every response contains an `X-BP-Trace` header with an HMAC-based fingerprint. This makes it possible to prove afterwards which user retrieved which data and when -- without the user being able to tamper with the trace.

---

## 6. Protected Endpoints

The middleware protects all paths that serve SDK and compliance-relevant data:

| Path prefix | Area |
|-------------|---------|
| `/api/sdk/*` | Main SDK endpoints |
| `/api/compliance/*` | Compliance assessments |
| `/api/v1/tom/*` | Technical and organizational measures |
| `/api/v1/dsfa/*` | Data protection impact assessment |
| `/api/v1/vvt/*` | Record of processing activities |
| `/api/v1/controls/*` | Controls and measures |
| `/api/v1/assessment/*` | Assessment evaluations |
| `/api/v1/eh/*` | Expectation horizons |
| `/api/v1/namespace/*` | Namespace management |

`/health`, `/metrics`, and `/api/health` are not protected.

---

## 7. Admin Management

Anomaly scores can be viewed and managed via the admin dashboard:

| Endpoint | Method | Description |
|----------|---------|--------------|
| `/api/admin/middleware/sdk-protection/scores` | GET | Current anomaly scores of all users |
| `/api/admin/middleware/sdk-protection/stats` | GET | Statistics: users per throttle level |
| `/api/admin/middleware/sdk-protection/reset-score/{user_id}` | POST | Reset a user's score |
| `/api/admin/middleware/sdk-protection/tiers` | GET | Show tier configurations |
| `/api/admin/middleware/sdk-protection/tiers/{name}` | PUT | Change tier limits |

---

## 8. Files and Source Code

| File | Description |
|-------|--------------|
| `backend/middleware/sdk_protection.py` | Core middleware (~460 lines) |
| `backend/middleware/__init__.py` | Exports the middleware classes |
| `backend/main.py` | Registration in the FastAPI stack |
| `backend/middleware_admin_api.py` | Admin API endpoints |
| `backend/migrations/add_sdk_protection_tables.sql` | Database migration |
| `backend/tests/test_middleware.py` | 14 tests covering all detection mechanisms |

---

## 9. Database Tables

### sdk_anomaly_scores

Stores snapshots of the anomaly scores for audit and analysis.

| Column | Type | Description |
|--------|-----|--------------|
| `id` | UUID | Primary key |
| `user_id` | VARCHAR(255) | User identification |
| `score` | DECIMAL(5,2) | Current anomaly score |
| `throttle_level` | SMALLINT | Current throttle level (0-3) |
| `triggered_rules` | JSONB | Which rules fired |
| `endpoint_diversity_count` | INT | Number of distinct categories |
| `request_count_1h` | INT | Requests in the last hour |
| `snapshot_at` | TIMESTAMPTZ | Time of the snapshot |

### sdk_protection_tiers

Configurable quota tiers, editable via the admin API.

| Column | Type | Description |
|--------|-----|--------------|
| `tier_name` | VARCHAR(50) | Tier name (free, standard, enterprise) |
| `quota_per_minute` | INT | Maximum requests per minute |
| `quota_per_hour` | INT | Maximum requests per hour |
| `quota_per_day` | INT | Maximum requests per day |
| `quota_per_month` | INT | Maximum requests per month |
| `diversity_threshold` | INT | Max distinct categories per hour |
| `burst_threshold` | INT | Max same-category requests in 2 minutes |

---
## 10. Configuration

The middleware is registered in `main.py`:

```python
from middleware import SDKProtectionMiddleware

app.add_middleware(SDKProtectionMiddleware)
```

All parameters can be adjusted via the `SDKProtectionConfig` dataclass. The most important environment variables:

| Variable | Default | Description |
|----------|---------|--------------|
| `VALKEY_URL` | `redis://localhost:6379` | Connection to the Valkey instance |
| `SDK_WATERMARK_SECRET` | (generated) | HMAC secret for watermarks |
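
For illustration, a plausible shape of such a config, with defaults taken from the thresholds in this document (the actual `SDKProtectionConfig` fields may differ, hence the distinct name):

```python
from dataclasses import dataclass

@dataclass
class SDKProtectionConfigSketch:
    """Illustrative config mirroring the thresholds described in this document."""
    valkey_url: str = "redis://localhost:6379"
    diversity_threshold: int = 40     # distinct categories per hour → +15
    burst_threshold: int = 15         # same category per 2 minutes → +20
    sequence_ratio: float = 0.7       # sorted adjacent pairs → +25
    unusual_hours_utc: range = range(0, 5)  # requests at 0-5 UTC → +10
    tenant_threshold: int = 3         # tenants per hour → +15
    decay_factor: float = 0.95        # applied every 300 s
    block_score: int = 85             # level 3: HTTP 429
```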

---

## 11. Tests

The middleware is covered by 14 automated tests:

```bash
# Run all SDK Protection tests
docker compose run --rm --no-deps backend \
  python -m pytest tests/test_middleware.py -v -k sdk
```

| Test | Verifies |
|------|--------|
| `test_allows_normal_request` | A normal request passes through |
| `test_blocks_after_quota_exceeded` | 429 when the quota is exceeded |
| `test_diversity_tracking_increments_score` | Many categories raise the score |
| `test_burst_detection` | Rapid identical requests raise the score |
| `test_sequential_enumeration_detection` | Alphabetical patterns are detected |
| `test_progressive_throttling_level_1` | Delay at score >= 30 |
| `test_progressive_throttling_level_3_blocks` | Block at score >= 85 |
| `test_score_decay_over_time` | The score decays over time |
| `test_skips_non_protected_paths` | Non-SDK paths remain unthrottled |
| `test_watermark_header_present` | X-BP-Trace header is present |
| `test_fallback_to_inmemory` | Works without Valkey |
| `test_no_user_passes_through` | Anonymous requests pass through |
| `test_category_extraction` | Correct category mapping |
| `test_quota_headers_present` | Response headers are present |

│ Jitsi (5x)      │  │ BreakPilot Drive│  │                 │
│ Night Scheduler │  │                 │  │                 │
│ Health Agg.     │  │                 │  │                 │
│ Gitea Actions   │  │                 │  │                 │
│ ERP (optional)  │  │                 │  │                 │
└─────────────────┘  └─────────────────┘  └─────────────────┘
        │                    │                    │

| Frontend | Admin Core (Next.js, port 3008) |
| Networking | Nginx (reverse proxy + TLS) |
| Monitoring | Health Aggregator |
| DevOps | Gitea, Gitea Actions (act_runner), Night Scheduler, Mailpit |
| Communication | Jitsi Meet (5 containers), Synapse (Matrix chat) |
| ERP | ERPNext (optional, 9 containers) |

# CI/CD Pipeline

Overview of the deployment process for BreakPilot.

## Overview

| Repo | Deployment | Trigger | Compose File |
|------|-----------|---------|--------------|
| **breakpilot-core** | Coolify (automatic) | Push to `coolify` branch | `docker-compose.coolify.yml` |
| **breakpilot-compliance** | Coolify (automatic) | Push to `main` branch | `docker-compose.yml` + `docker-compose.coolify.yml` |
| **breakpilot-lehrer** | Mac Mini (local) | Manual `docker compose` | `docker-compose.yml` |
## Deployment Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                       Developer MacBook                         │
│                                                                 │
│  breakpilot-core/        → git push gitea coolify               │
│  breakpilot-compliance/  → git push gitea main                  │
│  breakpilot-lehrer/      → git push + ssh macmini docker ...    │
│                                                                 │
└───────────────────────────────┬─────────────────────────────────┘
                                │
                    ┌───────────┴───────────┐
                    │                       │
                    ▼                       ▼
┌───────────────────────────┐  ┌───────────────────────────┐
│   Coolify (Production)    │  │   Mac Mini (Local/Dev)    │
│                           │  │                           │
│  Gitea Actions            │  │  breakpilot-lehrer        │
│  ├── Tests                │  │  ├── studio-v2            │
│  └── Coolify API Deploy   │  │  ├── klausur-service      │
│                           │  │  ├── backend-lehrer       │
│  Core Services:           │  │  └── voice-service        │
│  ├── consent-service      │  │                           │
│  ├── rag-service          │  │  Core Services (local):   │
│  ├── embedding-service    │  │  ├── postgres             │
│  ├── paddleocr-service    │  │  ├── valkey, vault        │
│  └── health-aggregator    │  │  ├── nginx, gitea         │
│                           │  │  └── ...                  │
│  Compliance Services:     │  │                           │
│  ├── admin-compliance     │  │                           │
│  ├── backend-compliance   │  │                           │
│  ├── ai-compliance-sdk    │  │                           │
│  └── developer-portal     │  │                           │
└───────────────────────────┘  └───────────────────────────┘
```

## breakpilot-core → Coolify

### Pipeline
```yaml
# .gitea/workflows/deploy-coolify.yml
on:
  push:
    branches: [coolify]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via Coolify API
        # Triggers a Coolify build + deploy via the API
        # Secrets: COOLIFY_API_TOKEN, COOLIFY_RESOURCE_UUID, COOLIFY_BASE_URL
```
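
The deploy step can be sketched as a small authenticated API call; the `/api/v1/deploy?uuid=...` endpoint is an assumption about Coolify's deploy API, and the three values correspond to the secrets named above:

```python
import urllib.request

def coolify_deploy_request(base_url: str, token: str,
                           resource_uuid: str) -> urllib.request.Request:
    """Build the authenticated POST that would trigger a Coolify build + deploy.
    Endpoint path is an assumption; the secrets supply the three parameters."""
    url = f"{base_url}/api/v1/deploy?uuid={resource_uuid}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

# In the workflow, the values come from the secrets:
# COOLIFY_BASE_URL, COOLIFY_API_TOKEN, COOLIFY_RESOURCE_UUID
```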

### Workflow

```bash
# 1. Edit the code on the MacBook
# 2. Commit and push:
git push origin main && git push gitea main

# 3. For a production deploy:
git push gitea coolify

# 4. Check the status:
# https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-core/actions
```

### Coolify-deployed Services

| Service | Container | Description |
|---------|-----------|--------------|
| valkey | bp-core-valkey | Session cache |
| consent-service | bp-core-consent-service | Consent management (Go) |
| rag-service | bp-core-rag-service | Semantic search |
| embedding-service | bp-core-embedding-service | Text embeddings |
| paddleocr-service | bp-core-paddleocr | OCR engine (x86_64) |
| health-aggregator | bp-core-health | Health check aggregator |
## breakpilot-compliance → Coolify

### Pipeline

```yaml
# .gitea/workflows/ci.yaml
on:
  push:
    branches: [main, develop]

jobs:
  # Lint (PRs only)
  # Tests (Go, Python, Node.js)
  # Validate canonical controls
  # Deploy (main only, after all tests)
```

### Workflow

```bash
# Commit and push → Coolify deploys automatically:
git push origin main && git push gitea main

# Check CI status:
# https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions

# Health checks:
curl -sf https://api-dev.breakpilot.ai/health
curl -sf https://sdk-dev.breakpilot.ai/health
```

## breakpilot-lehrer → Mac Mini (local)

### Workflow

```bash
# 1. Edit the code on the MacBook
# 2. Commit and push:
git push origin main && git push gitea main

# 3. Pull on the Mac Mini and rebuild the container:
ssh macmini "git -C /Users/benjaminadmin/Projekte/breakpilot-lehrer pull --no-rebase origin main"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml build --no-cache <service>"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml up -d <service>"
```

## Gitea Actions

### Overview

BreakPilot uses **Gitea Actions** (GitHub Actions-compatible) as its CI/CD system. The `act_runner` runs as a container on the Mac Mini and executes the pipelines.

| Component | Container | Description |
|------------|-----------|--------------|
| Gitea | `bp-core-gitea` (port 3003) | Git server + Actions trigger |
| Gitea Runner | `bp-core-gitea-runner` | Executes Actions workflows |

### Pipeline Configuration

Workflows live in each repo under `.gitea/workflows/`:

| Repo | Workflow | Branch | Action |
|------|----------|--------|--------|
| breakpilot-core | `deploy-coolify.yml` | `coolify` | Coolify API deploy |
| breakpilot-compliance | `ci.yaml` | `main` | Tests + Coolify deploy |

### Renewing the Runner Token

```bash
# Generate a runner token in the Gitea UI:
# https://macmini:3003 → Settings → Actions → Runners → New Runner

# Set the token in .env:
GITEA_RUNNER_TOKEN=<neues_token>

# Restart the runner:
ssh macmini "/usr/local/bin/docker compose \
  -f /Users/benjaminadmin/Projekte/breakpilot-core/docker-compose.yml \
  up -d --force-recreate gitea-runner"
```

### Checking Pipeline Status

```bash
# Runner logs
ssh macmini "/usr/local/bin/docker logs -f bp-core-gitea-runner"
```

## Health Checks

### Production (Coolify)

```bash
# Core PaddleOCR
curl -sf https://ocr.breakpilot.com/health

# Compliance
curl -sf https://api-dev.breakpilot.ai/health
curl -sf https://sdk-dev.breakpilot.ai/health
```

### Local (Mac Mini)

```bash
# Core health aggregator
curl -sf http://macmini:8099/health

# Lehrer backend
curl -sf https://macmini:8001/health

# Klausur service
curl -sf https://macmini:8086/health
```
|
||||

## Troubleshooting

### Container does not start

```bash
# 1. Check the logs
ssh macmini "docker logs breakpilot-pwa-<service>-1"

# Check the logs (local)
ssh macmini "/usr/local/bin/docker logs bp-core-<service>"

# 2. Start the container manually for debug output
ssh macmini "docker compose -f .../docker-compose.yml run --rm <service>"

# 3. Open a shell inside the container
ssh macmini "docker exec -it breakpilot-pwa-<service>-1 /bin/sh"
```


### Port already in use

```bash
# Check what is using the port
ssh macmini "lsof -i :8000"

# Find the container publishing the port
ssh macmini "docker ps --filter publish=8000"

# Open a shell inside the container
ssh macmini "/usr/local/bin/docker exec -it bp-core-<service> /bin/sh"
```


### Build errors

```bash
ssh macmini "docker builder prune -a"
ssh macmini "docker compose build --no-cache <service>"
```

## Monitoring

### Resource usage

```bash
# CPU/memory of all containers
ssh macmini "docker stats --no-stream"

# Disk usage
ssh macmini "docker system df"
```

## Rollback

### Coolify

A redeploy of an older commit can be triggered by resetting the branch:

```bash
# Reset the branch to the previous commit and push
git reset --hard <previous-commit>
git push gitea coolify --force
```

### Cleanup

```bash
# Remove unused images/containers
ssh macmini "docker system prune -a --volumes"

# Only dangling images
ssh macmini "docker image prune"
```

### Local (Mac Mini)

```bash
# Tag the image as a backup
ssh macmini "docker tag breakpilot-lehrer-klausur-service:latest breakpilot-lehrer-klausur-service:backup"
```

## Environment Variables

Environment variables are managed via `.env` files and docker-compose.yml:

```yaml
# docker-compose.yml
services:
  backend:
    environment:
      - DATABASE_URL=postgresql://...
      - REDIS_URL=redis://valkey:6379
      - SECRET_KEY=${SECRET_KEY}
```

**Important**: Never commit sensitive values to Git. Instead:

- maintain the `.env` file on the server
- manage secrets via HashiCorp Vault (see below)
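Compose resolves the `${SECRET_KEY}` reference above from the server-side `.env` file at startup. A minimal Python sketch of that interpolation, for illustration only (Compose itself also handles defaults like `${VAR:-fallback}` and escaping, which this sketch omits):

```python
import re

def load_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping comments and blank lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def interpolate(compose_text: str, env: dict) -> str:
    """Replace ${VAR} references with values from the .env mapping."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), compose_text)

env = load_env("# server-side .env\nSECRET_KEY=s3cret\n")
print(interpolate("SECRET_KEY=${SECRET_KEY}", env))  # SECRET_KEY=s3cret
```

Unset variables interpolate to an empty string here, matching Compose's warning-plus-empty behavior for missing variables.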

## Woodpecker CI: Automated OAuth Integration

### Overview

The OAuth integration between Woodpecker CI and Gitea is **fully automated**. Credentials are stored in HashiCorp Vault and are regenerated automatically when needed.

!!! info "Why automated?"
    This automation follows DevSecOps best practices:

    - **Infrastructure as code**: everything is reproducible
    - **Disaster recovery**: lost credentials can be regenerated automatically
    - **Security**: secrets are managed centrally in Vault
    - **Onboarding**: new developers have nothing to configure manually

### Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         Mac Mini Server                         │
│                                                                 │
│  ┌───────────────┐       OAuth 2.0        ┌───────────────┐     │
│  │     Gitea     │ ←────────────────────→ │  Woodpecker   │     │
│  │  (Port 3003)  │   Client ID + Secret   │  (Port 8090)  │     │
│  └───────────────┘                        └───────────────┘     │
│          │                                        │             │
│          │ OAuth App                              │ Env Vars    │
│          │ (DB: oauth2_application)               │             │
│          │                                        │             │
│          ▼                                        ▼             │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │                HashiCorp Vault (Port 8200)                │  │
│  │                                                           │  │
│  │  secret/cicd/woodpecker:                                  │  │
│  │    - gitea_client_id                                      │  │
│  │    - gitea_client_secret                                  │  │
│  │                                                           │  │
│  │  secret/cicd/api-tokens:                                  │  │
│  │    - gitea_token (for API access)                         │  │
│  │    - woodpecker_token (for pipeline triggers)             │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```


### Credential Storage Locations

| Location | Path | Contents |
|----------|------|----------|
| **HashiCorp Vault** | `secret/cicd/woodpecker` | Client ID + secret (source of truth) |
| **.env file** | `WOODPECKER_GITEA_CLIENT/SECRET` | For Docker Compose (loaded from Vault) |
| **Gitea PostgreSQL** | `oauth2_application` table | OAuth app registration (hashed secret) |

### Troubleshooting: OAuth Errors

If the error "Client ID not registered" or "user does not exist [uid: 0]" appears:

```bash
# Option 1: regenerate automatically (recommended)
./scripts/sync-woodpecker-credentials.sh --regenerate

# Option 2: manual procedure
# 1. Load the credentials from Vault
vault kv get secret/cicd/woodpecker

# 2. Update .env
WOODPECKER_GITEA_CLIENT=<client_id>
WOODPECKER_GITEA_SECRET=<client_secret>

# 3. Sync to the Mac Mini
rsync .env macmini:~/Projekte/breakpilot-pwa/

# 4. Restart Woodpecker
ssh macmini "cd ~/Projekte/breakpilot-pwa && \
  docker compose up -d --force-recreate woodpecker-server"
```

### The Sync Script

The script `scripts/sync-woodpecker-credentials.sh` automates the whole process:

```bash
# Load the credentials from Vault and update .env
./scripts/sync-woodpecker-credentials.sh

# Generate new credentials (OAuth app in Gitea + Vault + .env)
./scripts/sync-woodpecker-credentials.sh --regenerate
```

What the script does:

1. **Reads** the current credentials from Vault
2. **Updates** the .env file automatically
3. **With `--regenerate`**:
   - deletes old OAuth apps in Gitea
   - creates a new OAuth app with a new client ID/secret
   - stores the credentials in Vault
   - updates .env

### Vault Access

```bash
# Vault token (development)
export VAULT_TOKEN=breakpilot-dev-token

# Read credentials
docker exec -e VAULT_TOKEN=$VAULT_TOKEN breakpilot-pwa-vault \
  vault kv get secret/cicd/woodpecker

# Write credentials
docker exec -e VAULT_TOKEN=$VAULT_TOKEN breakpilot-pwa-vault \
  vault kv put secret/cicd/woodpecker \
  gitea_client_id="..." \
  gitea_client_secret="..."
```

### Restarting Services After a Credentials Change

```bash
# Important: --force-recreate so the new env vars are loaded
cd /Users/benjaminadmin/Projekte/breakpilot-pwa
docker compose up -d --force-recreate woodpecker-server

# Check the logs
docker logs breakpilot-pwa-woodpecker-server --tail 50

# If problems occur: restore the backup
ssh macmini "docker tag breakpilot-lehrer-klausur-service:backup breakpilot-lehrer-klausur-service:latest"
```

BreakPilot consists of three independent projects:

| Repo | Stack | Container Prefix | Color |
|------|-------|------------------|-------|
| **breakpilot-lehrer** | Education stack (Team A) | `bp-lehrer-*` | Blue |
| **breakpilot-compliance** | GDPR/compliance stack (Team B) | `bp-compliance-*` | Purple |

### Deployment Model

| Repo | Deployment | Trigger |
|------|-----------|---------|
| **breakpilot-core** | Coolify (automatic) | Push to gitea main |
| **breakpilot-compliance** | Coolify (automatic) | Push to gitea main |
| **breakpilot-lehrer** | Mac Mini (local) | Manual docker compose |

## Core Services

| Service | Container | Port | Description |
|---------|-----------|------|-------------|
| Admin Core | bp-core-admin | 3008 | Admin dashboard (Next.js) |
| Health Aggregator | bp-core-health | 8099 | Service health monitoring |
| Night Scheduler | bp-core-night-scheduler | 8096 | Night shutdown |
| Pitch Deck | bp-core-pitch-deck | 3012 | Investor presentation |
| Mailpit | bp-core-mailpit | 8025 | E-mail (development) |
| Gitea | bp-core-gitea | 3003 | Git server |
| Woodpecker | bp-core-woodpecker-server | 8090 | CI/CD |
| Gitea Runner | bp-core-gitea-runner | - | CI/CD (Gitea Actions) |
| Jitsi (5 containers) | bp-core-jitsi-* | 8443 | Video conferencing |

## Nginx Routing Table

| Port | Upstream | Project |
|------|----------|---------|
| 443 | bp-lehrer-studio-v2:3001 | Lehrer |
| 3000 | bp-lehrer-website:3000 | Lehrer |
| 3002 | bp-lehrer-admin:3000 | Lehrer |
| 3006 | bp-compliance-developer-portal:3000 | Compliance |
| 3007 | bp-compliance-admin:3000 | Compliance |
| 3008 | bp-core-admin:3000 | Core |
| 8000 | bp-core-backend:8000 | Core |
| 8001 | bp-lehrer-backend:8001 | Lehrer |
| 8002 | bp-compliance-backend:8002 | Compliance |
| 8086 | bp-lehrer-klausur-service:8086 | Lehrer |
| 8087 | bp-core-embedding-service:8087 | Core |
| 8091 | bp-lehrer-voice-service:8091 | Lehrer |
| 8093 | bp-compliance-ai-sdk:8090 | Compliance |
| 8097 | bp-core-rag-service:8097 | Core |
| 8443 | bp-core-jitsi-web:80 | Core |

## Architecture

- [System architecture](architecture/system-architecture.md)

`document-templates/README.md` (new file, 80 lines)

# Document Templates V2

Extended compliance templates (DSFA, TOM, VVT, AVV) for the BreakPilot Document Generator.

**Branch:** `feature/document-templates-v2`
**Target integration:** breakpilot-compliance (after the refactoring is complete)
**Database:** `compliance.compliance_legal_templates` (shared PostgreSQL)

## Contents

### SQL Migrations (`migrations/`)

| File | Type | Description |
|------|------|-------------|
| `001_dsfa_template_v2.sql` | DSFA | Threshold analysis (WP248), SDM TOM, AI module, ~60 placeholders |
| `002_tom_sdm_template.sql` | TOM | 7 SDM protection goals, sector blocks, compliance assessment |
| `003_vvt_sector_templates.sql` | VVT | 6 sector templates (IT/SaaS, healthcare, retail, trades, education, consulting) |
| `004_avv_template.sql` | AVV | Data processing agreement per Art. 28, 12 sections, TOM annex |
| `005_additional_templates.sql` | Misc | Verpflichtungserklaerung (confidentiality commitment) + Art. 13/14 information duties |

### Python Generators (`generators/`)

| File | Description |
|------|-------------|
| `dsfa_template.py` | DSFA generator with threshold analysis, federal-state mapping, SDM TOM, Art. 36 |
| `tom_template.py` | TOM generator with SDM structure, NIS2/ISO 27001/AI Act extensions, sectors |
| `vvt_template.py` | VVT generator with 6 sector catalogs, Art. 30 validation |

### Scripts (`scripts/`)

| File | Description |
|------|-------------|
| `cleanup_temp_vorlagen.py` | Deletes temporary DPA templates from Qdrant (`temp_vorlagen=true`) |

## Integration into breakpilot-compliance

### 1. Run the SQL Migrations

```bash
# Run the migrations against the shared DB.
# On the Mac Mini (note -i so docker exec forwards stdin to psql):
ssh macmini "docker exec -i bp-core-postgres psql -U breakpilot -d breakpilot_db -f -" < migrations/001_dsfa_template_v2.sql
ssh macmini "docker exec -i bp-core-postgres psql -U breakpilot -d breakpilot_db -f -" < migrations/002_tom_sdm_template.sql
# ... etc.
```


### 2. Copy the Python Generators (for the compliance integration)

```bash
cp generators/*.py /path/to/breakpilot-compliance/backend-compliance/compliance/api/document_templates/
```

### 3. Register the New document_types

In `breakpilot-compliance/backend-compliance/compliance/api/legal_template_routes.py`, extend `VALID_DOCUMENT_TYPES` with:

- `verpflichtungserklaerung`
- `informationspflichten`
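
A hedged sketch of that change (the existing entries and whether `VALID_DOCUMENT_TYPES` is a set or a list are assumptions; only the two new names come from this README):

```python
# legal_template_routes.py (sketch; existing entries are assumptions)
VALID_DOCUMENT_TYPES = {
    "dsfa", "tom", "vvt", "avv",   # existing types (assumed)
    "verpflichtungserklaerung",    # new: confidentiality commitment
    "informationspflichten",       # new: Art. 13/14 information duties
}
```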

### 4. Run the Qdrant Cleanup

```bash
# Preview
ssh macmini "python3 /path/to/cleanup_temp_vorlagen.py --dry-run"

# Execute
ssh macmini "python3 /path/to/cleanup_temp_vorlagen.py"
```


## Template Syntax

- `{{PLACEHOLDER}}`: replaced with the context value
- `{{#IF FIELD}}...{{/IF}}`: conditional block (rendered only when the field is set)
- `{{#IF_NOT FIELD}}...{{/IF_NOT}}`: inverted conditional block
- `[BLOCK:ID]...[/BLOCK:ID]`: block that the rule engine can remove
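
The first three constructs can be sketched as a tiny renderer. This is a hypothetical helper for illustration only; the actual Document Generator implementation may differ, and nested conditionals and the `[BLOCK:ID]` rule engine are not covered here:

```python
import re

def render(template: str, ctx: dict) -> str:
    """Resolve placeholders and (non-nested) conditional blocks."""
    # {{#IF_NOT FIELD}}...{{/IF_NOT}}: keep the block only when the field is falsy
    template = re.sub(
        r"\{\{#IF_NOT (\w+)\}\}(.*?)\{\{/IF_NOT\}\}",
        lambda m: "" if ctx.get(m.group(1)) else m.group(2),
        template, flags=re.S)
    # {{#IF FIELD}}...{{/IF}}: keep the block only when the field is truthy
    template = re.sub(
        r"\{\{#IF (\w+)\}\}(.*?)\{\{/IF\}\}",
        lambda m: m.group(2) if ctx.get(m.group(1)) else "",
        template, flags=re.S)
    # {{PLACEHOLDER}}: substitute the context value
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), "")), template)

ctx = {"COMPANY": "ACME GmbH", "DPO": "Dr. X"}
print(render("{{COMPANY}}{{#IF DPO}}, DSB: {{DPO}}{{/IF}}", ctx))  # ACME GmbH, DSB: Dr. X
```

Conditionals are resolved before plain placeholders so that placeholders inside a kept block are still substituted.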

## License

All templates: MIT License, BreakPilot Compliance.
No text was taken from DPA documents; all wording is original.

`document-templates/generators/dsfa_template.py` (new file, 455 lines)

"""DSFA template generator V2 — creates DSFA skeleton from company profile.
|
||||
|
||||
Enhanced with:
|
||||
- Schwellwertanalyse (9 WP248 criteria)
|
||||
- Bundesland-specific Muss-Listen references
|
||||
- SDM-based TOM structure (7 Gewaehrleistungsziele)
|
||||
- Structured risk assessment (ISO 29134 methodology)
|
||||
- AI Act module (Section 8)
|
||||
- Art. 36 consultation assessment
|
||||
"""
|
||||
|
||||
from typing import Optional
|
||||
|
||||
# -- WP248 Kriterien --------------------------------------------------------
|
||||
|
||||
WP248_CRITERIA = [
|
||||
{"id": "K1", "label": "Bewertung oder Scoring (einschl. Profiling und Prognose)",
|
||||
"ctx_keys": ["has_profiling", "has_scoring"]},
|
||||
{"id": "K2", "label": "Automatisierte Entscheidungsfindung mit Rechtswirkung",
|
||||
"ctx_keys": ["has_automated_decisions"]},
|
||||
{"id": "K3", "label": "Systematische Ueberwachung von Personen",
|
||||
"ctx_keys": ["has_surveillance", "has_employee_monitoring", "has_video_surveillance"]},
|
||||
{"id": "K4", "label": "Verarbeitung sensibler Daten (Art. 9/10 DS-GVO)",
|
||||
"ctx_keys": ["processes_health_data", "processes_biometric_data", "processes_criminal_data"]},
|
||||
{"id": "K5", "label": "Datenverarbeitung in grossem Umfang",
|
||||
"ctx_keys": ["large_scale_processing"]},
|
||||
{"id": "K6", "label": "Verknuepfung oder Zusammenfuehrung von Datenbestaenden",
|
||||
"ctx_keys": ["data_matching", "data_combining"]},
|
||||
{"id": "K7", "label": "Daten zu schutzbeduerftigen Betroffenen",
|
||||
"ctx_keys": ["processes_minors_data", "processes_employee_data", "processes_patient_data"]},
|
||||
{"id": "K8", "label": "Innovative Nutzung neuer technologischer Loesungen",
|
||||
"ctx_keys": ["uses_ai", "uses_biometrics", "uses_iot"]},
|
||||
{"id": "K9", "label": "Verarbeitung hindert Betroffene an Rechtsausuebung",
|
||||
"ctx_keys": ["blocks_service_access", "blocks_contract"]},
|
||||
]
|
||||
# -- Bundesland -> supervisory authority mapping -----------------------------

BUNDESLAND_AUFSICHT = {
    "baden-wuerttemberg": ("LfDI Baden-Wuerttemberg", "DSK Muss-Liste + BW-spezifische Liste (Art. 35 Abs. 4)"),
    "bayern": ("BayLDA (nicht-oeffentlicher Bereich)", "BayLDA Muss-Liste (17.10.2018) + Fallbeispiel ISO 29134"),
    "berlin": ("BlnBDI", "BlnBDI Muss-Liste nicht-oeffentlich / oeffentlich"),
    "brandenburg": ("LDA Brandenburg", "LDA BB Muss-Liste allgemein / oeffentlich"),
    "bremen": ("LfDI Bremen", "LfDI HB Muss-Liste"),
    "hamburg": ("HmbBfDI", "HmbBfDI Muss-Liste nicht-oeffentlich / oeffentlich"),
    "hessen": ("HBDI", "DSK Muss-Liste (HBDI uebernimmt DSK-Liste)"),
    "mecklenburg-vorpommern": ("LfDI M-V", "LfDI M-V Muss-Liste"),
    "niedersachsen": ("LfD Niedersachsen", "LfD NI Muss-Liste + Pruefschema"),
    "nordrhein-westfalen": ("LDI NRW", "LDI NRW Muss-Liste nicht-oeffentlich / oeffentlich"),
    "rheinland-pfalz": ("LfDI RLP", "LfDI RLP Muss-Liste allgemein / oeffentlich"),
    "saarland": ("UDS Saarland", "DSK Muss-Liste (UDS uebernimmt DSK-Liste)"),
    "sachsen": ("SDB Sachsen", "SDB Sachsen Muss-Liste"),
    "sachsen-anhalt": ("LfD Sachsen-Anhalt", "LfD SA Muss-Liste allgemein / oeffentlich"),
    "schleswig-holstein": ("ULD Schleswig-Holstein", "ULD Muss-Liste + Planspiel-DSFA"),
    "thueringen": ("TLfDI", "TLfDI Muss-Liste (04.07.2018)"),
    "bund": ("BfDI", "BfDI Muss-Liste / DSFA-Hinweise"),
}

# -- SDM protection goals (Gewaehrleistungsziele) ----------------------------

SDM_GOALS = [
    {
        "id": "verfuegbarkeit",
        "label": "Verfuegbarkeit",
        "description": "Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.",
        "default_measures": [
            "Redundante Datenhaltung und regelmaessige Backups",
            "Disaster-Recovery-Plan mit definierten RTO/RPO-Werten",
            "USV und Notstromversorgung fuer kritische Systeme",
        ],
    },
    {
        "id": "integritaet",
        "label": "Integritaet",
        "description": "Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.",
        "default_measures": [
            "Pruefsummen und digitale Signaturen fuer Datenuebertragungen",
            "Eingabevalidierung und Plausibilitaetspruefungen",
            "Versionierung und Change-Management-Verfahren",
        ],
    },
    {
        "id": "vertraulichkeit",
        "label": "Vertraulichkeit",
        "description": "Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.",
        "default_measures": [
            "Verschluesselung: TLS 1.3 im Transit, AES-256 at Rest",
            "Rollenbasiertes Zugriffskonzept (RBAC) mit Least-Privilege-Prinzip",
            "Multi-Faktor-Authentifizierung fuer administrative Zugaenge",
        ],
    },
    {
        "id": "nichtverkettung",
        "label": "Nichtverkettung",
        "description": "Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.",
        "default_measures": [
            "Technische Zweckbindung durch Mandantentrennung",
            "Pseudonymisierung wo fachlich moeglich",
            "Getrennte Datenbanken / Schemata je Verarbeitungszweck",
        ],
    },
    {
        "id": "transparenz",
        "label": "Transparenz",
        "description": "Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.",
        "default_measures": [
            "Vollstaendiges Audit-Log aller Datenzugriffe und -aenderungen",
            "Verzeichnis der Verarbeitungstaetigkeiten (Art. 30 DS-GVO)",
            "Informationspflichten gemaess Art. 13/14 DS-GVO umgesetzt",
        ],
    },
    {
        "id": "intervenierbarkeit",
        "label": "Intervenierbarkeit",
        "description": "Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.",
        "default_measures": [
            "Self-Service-Portal oder dokumentierter Prozess fuer Betroffenenanfragen",
            "Technische Loeschfaehigkeit mit Nachweis (Loeschprotokoll)",
            "Datenexport in maschinenlesbarem Format (Art. 20 DS-GVO)",
        ],
    },
    {
        "id": "datenminimierung",
        "label": "Datenminimierung",
        "description": "Die Verarbeitung beschraenkt sich auf das erforderliche Mass.",
        "default_measures": [
            "Regelmaessige Pruefung der Erforderlichkeit erhobener Datenfelder",
            "Automatisierte Loeschung nach Ablauf der Aufbewahrungsfrist",
            "Anonymisierung / Aggregation fuer statistische Zwecke",
        ],
    },
]


def generate_dsfa_draft(ctx: dict) -> dict:
    """Generate a DSFA draft document from template context.

    Args:
        ctx: Flat dict from company-profile/template-context endpoint.

    Returns:
        Dict with DSFA fields ready for creation via POST /dsfa.
    """
    company = ctx.get("company_name", "Unbekannt")
    dpo = ctx.get("dpo_name", "")
    dpo_email = ctx.get("dpo_email", "")
    federal_state = ctx.get("federal_state", "").lower().replace(" ", "-")

    # --- Section 0: Schwellwertanalyse ---
    schwellwert = _generate_schwellwertanalyse(ctx)

    # --- Section 1: Verarbeitungsbeschreibung ---
    section_1 = _generate_section_1(ctx, company, dpo, dpo_email)

    # --- Section 2: Notwendigkeit ---
    section_2 = _generate_section_2(ctx)

    # --- Section 3: Risikobewertung ---
    section_3 = _generate_risk_assessment(ctx)

    # --- Section 4: Stakeholder-Konsultation ---
    section_4 = _generate_section_4(ctx)

    # --- Section 5: TOM nach SDM ---
    section_5 = _generate_sdm_tom_section(ctx)

    # --- Section 6: DSB-Stellungnahme ---
    section_6 = _generate_section_6(ctx, dpo)

    # --- Section 7: Ergebnis ---
    section_7 = _generate_section_7(ctx)

    # --- Section 8: KI-Modul ---
    ai_systems = ctx.get("ai_systems", [])
    involves_ai = len(ai_systems) > 0
    section_8 = _generate_ai_module(ctx) if involves_ai else None

    sections = {
        "section_0": {"title": "Schwellwertanalyse", "content": schwellwert["content"]},
        "section_1": {"title": "Allgemeine Informationen und Verarbeitungsbeschreibung", "content": section_1},
        "section_2": {"title": "Notwendigkeit und Verhaeltnismaessigkeit", "content": section_2},
        "section_3": {"title": "Risikobewertung", "content": section_3},
        "section_4": {"title": "Konsultation der Betroffenen", "content": section_4},
        "section_5": {"title": "Technische und organisatorische Massnahmen (SDM)", "content": section_5},
        "section_6": {"title": "Stellungnahme des DSB", "content": section_6},
        "section_7": {"title": "Ergebnis und Ueberpruefungsplan", "content": section_7},
    }
    if section_8:
        sections["section_8"] = {"title": "KI-spezifisches Modul (EU AI Act)", "content": section_8}

    # Assess Art. 36 consultation requirement
    art36_required = _assess_art36_consultation(ctx, schwellwert)

    return {
        "title": f"DSFA — {company}",
        "description": f"Datenschutz-Folgenabschaetzung fuer {company}",
        "status": "draft",
        "risk_level": "high" if involves_ai or schwellwert["criteria_met"] >= 3 else "medium",
        "involves_ai": involves_ai,
        "dpo_name": dpo,
        "federal_state": ctx.get("federal_state", ""),
        "sections": sections,
        "wp248_criteria_met": schwellwert["criteria_details"],
        "art35_abs3_triggered": schwellwert["art35_abs3"],
        "threshold_analysis": {
            "criteria_met_count": schwellwert["criteria_met"],
            "dsfa_required": schwellwert["dsfa_required"],
            "muss_liste_ref": schwellwert.get("muss_liste_ref", ""),
        },
        "consultation_requirement": {
            "art36_required": art36_required,
            "reason": "Restrisiko bleibt nach Massnahmen hoch" if art36_required else "Restrisiko akzeptabel",
        },
        "processing_systems": [s.get("name", "") for s in ctx.get("processing_systems", [])],
        "ai_systems_summary": [
            {"name": s.get("name"), "risk": s.get("risk_category", "unknown")}
            for s in ai_systems
        ],
    }


# -- Internal helpers --------------------------------------------------------

def _generate_schwellwertanalyse(ctx: dict) -> dict:
    """Evaluate 9 WP248 criteria against company profile."""
    criteria_details = []
    criteria_met = 0

    for criterion in WP248_CRITERIA:
        met = any(ctx.get(key) for key in criterion["ctx_keys"])
        criteria_details.append({
            "id": criterion["id"],
            "label": criterion["label"],
            "met": met,
        })
        if met:
            criteria_met += 1

    # Art. 35 Abs. 3 specific triggers
    art35_abs3 = []
    if ctx.get("has_profiling") and ctx.get("has_automated_decisions"):
        art35_abs3.append("Art. 35 Abs. 3 lit. a: Profiling mit Rechtswirkung")
    if any(ctx.get(k) for k in ["processes_health_data", "processes_biometric_data", "processes_criminal_data"]):
        if ctx.get("large_scale_processing"):
            art35_abs3.append("Art. 35 Abs. 3 lit. b: Umfangreiche Verarbeitung besonderer Kategorien")
    if ctx.get("has_surveillance"):
        art35_abs3.append("Art. 35 Abs. 3 lit. c: Systematische Ueberwachung oeffentlicher Bereiche")

    dsfa_required = criteria_met >= 2 or len(art35_abs3) > 0

    # Bundesland reference
    federal_state = ctx.get("federal_state", "").lower().replace(" ", "-")
    aufsicht_info = BUNDESLAND_AUFSICHT.get(federal_state, ("Nicht zugeordnet", "DSK Muss-Liste (allgemein)"))

    met_labels = [c["label"] for c in criteria_details if c["met"]]
    content_lines = [
        f"**Anzahl erfuellter WP248-Kriterien:** {criteria_met} von 9\n",
        f"**Erfuellte Kriterien:** {', '.join(met_labels) if met_labels else 'Keine'}\n",
    ]
    if art35_abs3:
        content_lines.append(f"**Art. 35 Abs. 3 DS-GVO direkt ausgeloest:** {'; '.join(art35_abs3)}\n")
    content_lines.append(
        f"\n**Ergebnis:** DSFA ist {'**erforderlich**' if dsfa_required else '**nicht erforderlich**'}."
    )
    if dsfa_required and criteria_met < 2:
        content_lines.append(" (Ausgeloest durch Art. 35 Abs. 3 DS-GVO)")

    return {
        "content": "\n".join(content_lines),
        "criteria_met": criteria_met,
        "criteria_details": criteria_details,
        "art35_abs3": art35_abs3,
        "dsfa_required": dsfa_required,
        "muss_liste_ref": aufsicht_info[1],
    }


def _generate_section_1(ctx: dict, company: str, dpo: str, dpo_email: str) -> str:
    federal_state = ctx.get("federal_state", "")
    aufsicht = BUNDESLAND_AUFSICHT.get(
        federal_state.lower().replace(" ", "-"), ("Nicht zugeordnet",)
    )[0]

    lines = [
        f"**Verantwortlicher:** {company}",
        f"**Datenschutzbeauftragter:** {dpo}" + (f" ({dpo_email})" if dpo_email else ""),
        f"**Zustaendige Aufsichtsbehoerde:** {aufsicht}",
    ]

    systems = ctx.get("processing_systems", [])
    if systems:
        lines.append("\n**Eingesetzte Verarbeitungssysteme:**")
        for s in systems:
            hosting = s.get("hosting", "")
            lines.append(f"- {s.get('name', 'N/A')}" + (f" ({hosting})" if hosting else ""))

    return "\n".join(lines)


def _generate_section_2(ctx: dict) -> str:
    lines = [
        "### Notwendigkeit\n",
        "Die Verarbeitung ist zur Erreichung des beschriebenen Zwecks erforderlich. ",
        "Alternative, weniger eingriffsintensive Massnahmen wurden geprueft.\n",
        "### Datenminimierung\n",
        "Die verarbeiteten Datenkategorien beschraenken sich auf das fuer den ",
        "Verarbeitungszweck erforderliche Minimum (Art. 5 Abs. 1 lit. c DS-GVO).\n",
    ]
    return "".join(lines)


def _generate_risk_assessment(ctx: dict) -> str:
    lines = ["## Risikoanalyse\n"]

    # Standard risks
    risks = [
        ("Unbefugter Zugriff auf personenbezogene Daten", "mittel", "hoch", "hoch"),
        ("Datenverlust durch technischen Ausfall", "niedrig", "hoch", "mittel"),
        ("Fehlerhafte Verarbeitung / Datenqualitaet", "niedrig", "mittel", "niedrig"),
        ("Zweckentfremdung erhobener Daten", "niedrig", "hoch", "mittel"),
    ]

    if ctx.get("has_ai_systems") or ctx.get("uses_ai"):
        risks.append(("Diskriminierung durch algorithmische Entscheidungen", "mittel", "hoch", "hoch"))
        risks.append(("Mangelnde Erklaerbarkeit von KI-Entscheidungen", "mittel", "mittel", "mittel"))

    if ctx.get("processes_health_data"):
        risks.append(("Offenlegung von Gesundheitsdaten", "niedrig", "hoch", "hoch"))

    if any(ctx.get(k) for k in ["third_country_transfer", "processes_in_third_country"]):
        risks.append(("Zugriff durch Behoerden in Drittlaendern", "mittel", "hoch", "hoch"))

    lines.append("| Risiko | Eintrittswahrscheinlichkeit | Schwere | Gesamt |")
    lines.append("|--------|----------------------------|---------|--------|")
    for risk_name, likelihood, severity, overall in risks:
        lines.append(f"| {risk_name} | {likelihood} | {severity} | **{overall}** |")

    lines.append("")

    high_risks = sum(1 for _, _, _, o in risks if o == "hoch")
    if high_risks > 0:
        lines.append(f"\n**{high_risks} Risiken mit Stufe 'hoch' identifiziert.** "
                     "Massnahmen gemaess Abschnitt 5 reduzieren das Restrisiko.")

    return "\n".join(lines)


def _generate_section_4(ctx: dict) -> str:
    lines = []
    if ctx.get("has_works_council"):
        lines.append("Der Betriebsrat wurde informiert und angehoert.")
    lines.append(
        "Eine Konsultation der Betroffenen gemaess Art. 35 Abs. 9 DS-GVO "
        "wird empfohlen, soweit verhaeltnismaessig und praktikabel."
    )
    return "\n".join(lines)


def _generate_sdm_tom_section(ctx: dict) -> str:
    """Generate TOM section structured by 7 SDM Gewaehrleistungsziele."""
    lines = []
    for goal in SDM_GOALS:
        lines.append(f"**{goal['label']}** — {goal['description']}\n")
        lines.append("| Massnahme | Typ | Status |")
        lines.append("|-----------|-----|--------|")
        for measure in goal["default_measures"]:
            mtype = "technisch" if any(
                kw in measure.lower()
                for kw in ["verschluesselung", "backup", "redundanz", "tls", "aes", "rbac", "mfa",
                           "pruefsumm", "validierung", "loeschfaehigkeit", "export", "automatisiert"]
            ) else "organisatorisch"
            lines.append(f"| {measure} | {mtype} | geplant |")
        lines.append("")
    return "\n".join(lines)


def _generate_section_6(ctx: dict, dpo: str) -> str:
    if dpo:
        return (
            f"Der Datenschutzbeauftragte ({dpo}) wurde konsultiert. "
            "Die Stellungnahme liegt bei bzw. wird nachgereicht."
        )
    return (
        "Ein Datenschutzbeauftragter wurde noch nicht benannt. "
        "Sofern eine Benennungspflicht besteht (Art. 37 DS-GVO), "
        "ist dies vor Abschluss der DSFA nachzuholen."
    )

|
||||
|
||||
def _generate_section_7(ctx: dict) -> str:
|
||||
review_months = ctx.get("review_cycle_months", 12)
|
||||
lines = [
|
||||
"### Ergebnis\n",
|
||||
"Die DSFA wurde gemaess Art. 35 DS-GVO durchgefuehrt. Die identifizierten Risiken ",
|
||||
"wurden bewertet und durch geeignete Massnahmen auf ein akzeptables Niveau reduziert.\n",
|
||||
"### Ueberprufungsplan\n",
|
||||
f"- **Regelmaessige Ueberprufung:** alle {review_months} Monate\n",
|
||||
"- **Trigger fuer ausserplanmaessige Ueberprufung:**\n",
|
||||
" - Wesentliche Aenderung der Verarbeitungstaetigkeit\n",
|
||||
" - Neue oder geaenderte Rechtsgrundlage\n",
|
||||
" - Sicherheitsvorfall mit Bezug zur Verarbeitung\n",
|
||||
" - Aenderung der eingesetzten Technologie oder Auftragsverarbeiter\n",
|
||||
" - Neue Erkenntnisse zu Risiken oder Bedrohungen\n",
|
||||
]
|
||||
return "".join(lines)


def _generate_ai_module(ctx: dict) -> str:
    """Generate Section 8 for AI systems (EU AI Act)."""
    lines = ["### Eingesetzte KI-Systeme\n"]

    ai_systems = ctx.get("ai_systems", [])
    if ai_systems:
        lines.append("| System | Zweck | Risikokategorie | Human Oversight |")
        lines.append("|--------|-------|-----------------|-----------------|")
        for s in ai_systems:
            risk = s.get("risk_category", "unbekannt")
            oversight = "Ja" if s.get("has_human_oversight") else "Nein"
            lines.append(f"| {s.get('name', 'N/A')} | {s.get('purpose', 'N/A')} | {risk} | {oversight} |")
        lines.append("")

    if ctx.get("subject_to_ai_act"):
        lines.append(
            "**Hinweis:** Das Unternehmen unterliegt dem EU AI Act (Verordnung (EU) 2024/1689). "
            "Fuer Hochrisiko-KI-Systeme ist eine grundrechtliche Folgenabschaetzung "
            "gemaess Art. 27 KI-VO durchzufuehren.\n"
        )

    high_risk = [s for s in ai_systems if s.get("risk_category") in ("high", "hoch")]
    if high_risk:
        lines.append("### Hochrisiko-KI-Systeme — Zusatzanforderungen\n")
        lines.append("Fuer die folgenden Systeme gelten die Anforderungen aus Kapitel III KI-VO:\n")
        for s in high_risk:
            lines.append(f"- **{s.get('name', 'N/A')}**: Risikomanagement (Art. 9), "
                         "Daten-Governance (Art. 10), Transparenz (Art. 13), "
                         "Human Oversight (Art. 14)\n")

    return "\n".join(lines)
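The table-building pattern used for the KI-Systeme overview can be exercised in isolation. A minimal sketch; the helper name and the sample system are invented for illustration and mirror only the row layout above:

```python
def render_ai_table(ai_systems: list[dict]) -> str:
    # Same header/row layout as the AI-systems table in _generate_ai_module.
    lines = ["| System | Zweck | Risikokategorie | Human Oversight |",
             "|--------|-------|-----------------|-----------------|"]
    for s in ai_systems:
        risk = s.get("risk_category", "unbekannt")
        oversight = "Ja" if s.get("has_human_oversight") else "Nein"
        lines.append(f"| {s.get('name', 'N/A')} | {s.get('purpose', 'N/A')} | {risk} | {oversight} |")
    return "\n".join(lines)

# Hypothetical sample data, not from the repository:
table = render_ai_table([
    {"name": "ScoringBot", "purpose": "Bonitaetspruefung",
     "risk_category": "hoch", "has_human_oversight": True},
])
```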


def _assess_art36_consultation(ctx: dict, schwellwert: dict) -> bool:
    """Determine if Art. 36 DS-GVO consultation with the supervisory authority is required.

    Art. 36 requires prior consultation when the DSFA indicates that the processing
    would result in a HIGH residual risk despite mitigation measures.
    """
    if schwellwert["criteria_met"] >= 4:
        return True
    if len(schwellwert.get("art35_abs3", [])) >= 2:
        return True
    ai_systems = ctx.get("ai_systems", [])
    high_risk_ai = [s for s in ai_systems if s.get("risk_category") in ("high", "hoch", "unacceptable")]
    if len(high_risk_ai) >= 2:
        return True
    return False
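The consultation check reduces to a disjunction of three independent triggers: enough Schwellwert criteria, enough Art. 35 Abs. 3 hits, or enough high-risk AI systems. A condensed sketch (the function name and scalar parameters are illustrative, not part of the module):

```python
def needs_art36_consultation(criteria_met: int, art35_abs3_hits: int,
                             high_risk_ai_count: int) -> bool:
    # Any single trigger suffices; mirrors the early-return chain
    # in _assess_art36_consultation with the counts precomputed.
    return criteria_met >= 4 or art35_abs3_hits >= 2 or high_risk_ai_count >= 2
```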

document-templates/generators/tom_template.py (new file, 285 lines)
@@ -0,0 +1,285 @@

"""TOM template generator V2 — SDM-structured TOM catalog.

Replaces the flat 17-measure list with a hierarchical structure based on
the 7 SDM Gewaehrleistungsziele (Standard-Datenschutzmodell V3.1a).
"""

# -- SDM-structured TOM catalog ---------------------------------------------

SDM_TOM_CATALOG = {
    "verfuegbarkeit": {
        "label": "Verfuegbarkeit",
        "sdm_baustein": "SDM-B11 (Aufbewahren)",
        "measures": [
            {"name": "Redundante Datenhaltung", "description": "RAID, Replikation, Geo-Redundanz", "type": "technical"},
            {"name": "Backup-Strategie", "description": "Taeglich inkrementell, woechentlich voll, verschluesselt", "type": "technical"},
            {"name": "Disaster-Recovery-Plan", "description": "Dokumentierte RTO/RPO-Werte, jaehrliche Tests", "type": "organizational"},
            {"name": "USV / Notstromversorgung", "description": "Unterbrechungsfreie Stromversorgung fuer kritische Systeme", "type": "technical"},
        ],
    },
    "integritaet": {
        "label": "Integritaet",
        "sdm_baustein": "SDM-B61 (Berichtigen)",
        "measures": [
            {"name": "Pruefsummen und Signaturen", "description": "Digitale Signaturen fuer Datenuebertragungen", "type": "technical"},
            {"name": "Eingabevalidierung", "description": "Plausibilitaetspruefungen auf allen Eingabeschnittstellen", "type": "technical"},
            {"name": "Change Management", "description": "Dokumentierte Aenderungsverfahren mit Freigabeprozess", "type": "organizational"},
            {"name": "Versionierung", "description": "Versionierung von Datensaetzen und Konfigurationen", "type": "technical"},
        ],
    },
    "vertraulichkeit": {
        "label": "Vertraulichkeit",
        "sdm_baustein": "SDM-B51 (Zugriffe regeln)",
        "measures": [
            {"name": "Verschluesselung im Transit", "description": "TLS 1.3 fuer alle Verbindungen", "type": "technical"},
            {"name": "Verschluesselung at Rest", "description": "AES-256 fuer gespeicherte Daten", "type": "technical"},
            {"name": "Zugriffskonzept (RBAC)", "description": "Rollenbasiert, Least-Privilege-Prinzip, regelmaessige Reviews", "type": "technical"},
            {"name": "Multi-Faktor-Authentifizierung", "description": "MFA fuer alle administrativen Zugaenge", "type": "technical"},
            {"name": "Physische Zutrittskontrolle", "description": "Schluessel, Kartenleser, Besucherprotokoll", "type": "technical"},
            {"name": "Vertraulichkeitsverpflichtung", "description": "Schriftliche Verpflichtung aller Mitarbeitenden", "type": "organizational"},
        ],
    },
    "nichtverkettung": {
        "label": "Nichtverkettung",
        "sdm_baustein": "SDM-B50 (Trennen)",
        "measures": [
            {"name": "Mandantentrennung", "description": "Logische Datentrennung nach Mandanten/Zweck", "type": "technical"},
            {"name": "Pseudonymisierung", "description": "Wo fachlich moeglich, Einsatz von Pseudonymen", "type": "technical"},
            {"name": "Zweckbindungspruefung", "description": "Pruefung bei jeder neuen Datennutzung", "type": "organizational"},
        ],
    },
    "transparenz": {
        "label": "Transparenz",
        "sdm_baustein": "SDM-B42 (Dokumentieren), SDM-B43 (Protokollieren)",
        "measures": [
            {"name": "Verarbeitungsverzeichnis", "description": "Art. 30 DS-GVO konformes VVT", "type": "organizational"},
            {"name": "Audit-Logging", "description": "Vollstaendige Protokollierung aller Datenzugriffe", "type": "technical"},
            {"name": "Informationspflichten", "description": "Art. 13/14 DS-GVO Datenschutzerklaerung", "type": "organizational"},
            {"name": "Datenpannen-Prozess", "description": "Dokumentierter Meldeprozess Art. 33/34 DS-GVO", "type": "organizational"},
        ],
    },
    "intervenierbarkeit": {
        "label": "Intervenierbarkeit",
        "sdm_baustein": "SDM-B60 (Loeschen), SDM-B61 (Berichtigen), SDM-B62 (Einschraenken)",
        "measures": [
            {"name": "Betroffenenanfragen-Prozess", "description": "Auskunft, Loeschung, Berichtigung, Widerspruch", "type": "organizational"},
            {"name": "Technische Loeschfaehigkeit", "description": "Loeschung mit Nachweis (Loeschprotokoll)", "type": "technical"},
            {"name": "Datenportabilitaet", "description": "Export in maschinenlesbarem Format (Art. 20)", "type": "technical"},
            {"name": "Sperrfunktion", "description": "Einschraenkung der Verarbeitung moeglich", "type": "technical"},
        ],
    },
    "datenminimierung": {
        "label": "Datenminimierung",
        "sdm_baustein": "SDM-B41 (Planen und Spezifizieren)",
        "measures": [
            {"name": "Erforderlichkeitspruefung", "description": "Regelmaessige Pruefung der erhobenen Datenfelder", "type": "organizational"},
            {"name": "Automatisierte Loeschung", "description": "Fristgerechte Loeschung nach Aufbewahrungsfrist", "type": "technical"},
            {"name": "Anonymisierung", "description": "Anonymisierung/Aggregation fuer Statistik", "type": "technical"},
            {"name": "Privacy by Design", "description": "Datenschutz ab Entwurfsphase neuer Verarbeitungen", "type": "organizational"},
        ],
    },
}

# -- Sector-specific extensions ----------------------------------------------

SECTOR_TOMS = {
    "it_saas": {
        "label": "IT / SaaS",
        "measures": [
            {"name": "Container-Isolation", "description": "Workload-Isolation zwischen Mandanten (Kubernetes Namespaces)", "type": "technical", "sdm_goal": "nichtverkettung"},
            {"name": "API-Security", "description": "Rate Limiting, OAuth 2.0, API-Key-Rotation", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "DevSecOps Pipeline", "description": "SAST/DAST in CI/CD, Dependency Scanning", "type": "technical", "sdm_goal": "integritaet"},
            {"name": "Secrets Management", "description": "Vault/KMS fuer Credentials, keine Hardcoded Secrets", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "gesundheitswesen": {
        "label": "Gesundheitswesen",
        "measures": [
            {"name": "Patientenakten-Verschluesselung", "description": "Ende-zu-Ende-Verschluesselung fuer Gesundheitsdaten (Art. 9)", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "Notfallzugriff", "description": "Break-the-Glass-Verfahren fuer medizinische Notfaelle", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
            {"name": "Forschungsdaten-Anonymisierung", "description": "Vollstaendige Anonymisierung vor Forschungsnutzung", "type": "technical", "sdm_goal": "datenminimierung"},
        ],
    },
    "finanzdienstleistungen": {
        "label": "Finanzdienstleistungen",
        "measures": [
            {"name": "Transaktions-Monitoring", "description": "Echtzeit-Ueberwachung auf Unregelmaessigkeiten (GwG)", "type": "technical", "sdm_goal": "integritaet"},
            {"name": "Aufbewahrungspflichten", "description": "10 Jahre Aufbewahrung gemaess AO/HGB, danach Loeschung", "type": "organizational", "sdm_goal": "datenminimierung"},
            {"name": "PCI-DSS Compliance", "description": "Payment Card Industry Standards fuer Kartendaten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "handel": {
        "label": "Handel / E-Commerce",
        "measures": [
            {"name": "Cookie-Consent-Management", "description": "TDDDG-konformes Einwilligungsmanagement", "type": "technical", "sdm_goal": "transparenz"},
            {"name": "Gastzugang-Option", "description": "Bestellung ohne Pflicht-Kundenkonto (Datenminimierung)", "type": "organizational", "sdm_goal": "datenminimierung"},
            {"name": "Zahlungsdaten-Tokenisierung", "description": "Keine direkte Speicherung von Zahlungsdaten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "handwerk": {
        "label": "Handwerk",
        "measures": [
            {"name": "Mobile-Device-Management", "description": "Absicherung mobiler Endgeraete auf Baustellen", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "Papierakten-Sicherung", "description": "Verschlossene Schraenke fuer physische Kundenakten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
}

# -- NIS2 / ISO 27001 / AI Act extensions -----------------------------------

NIS2_TOMS = [
    {"name": "Incident-Response-Plan", "description": "NIS2-konformer Vorfallreaktionsplan (72h Meldepflicht an BSI)", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
    {"name": "Supply-Chain-Security", "description": "Bewertung der Lieferkettensicherheit (BSIG 2025)", "type": "organizational", "sdm_goal": "integritaet"},
    {"name": "Vulnerability Management", "description": "Regelmaessige Schwachstellenscans, Patch-Management", "type": "technical", "sdm_goal": "integritaet"},
    {"name": "Security Awareness", "description": "Pflicht-Schulungen Cybersicherheit fuer Geschaeftsleitung", "type": "organizational", "sdm_goal": "vertraulichkeit"},
]

ISO27001_TOMS = [
    {"name": "ISMS Risikomanagement", "description": "ISO 27001 Anhang A — Informationssicherheits-Risikobewertung", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
    {"name": "Dokumentenlenkung", "description": "Versionierte Sicherheitsrichtlinien und -verfahren", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "Management Review", "description": "Jaehrliche Ueberpruefung des ISMS durch Geschaeftsleitung", "type": "organizational", "sdm_goal": "transparenz"},
]

AI_ACT_TOMS = [
    {"name": "KI-Risikoklassifizierung", "description": "Bewertung aller KI-Systeme nach EU AI Act Risikokategorien", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "Human Oversight", "description": "Menschliche Aufsicht fuer Hochrisiko-KI-Systeme (Art. 14 KI-VO)", "type": "organizational", "sdm_goal": "intervenierbarkeit"},
    {"name": "KI-Transparenz", "description": "Transparenzpflichten bei KI-Einsatz gegenueber Betroffenen (Art. 13 KI-VO)", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "KI-Bias-Monitoring", "description": "Ueberwachung auf diskriminierende Ergebnisse", "type": "technical", "sdm_goal": "integritaet"},
]


def generate_tom_drafts(ctx: dict) -> list[dict]:
    """Generate TOM measure drafts structured by SDM Gewaehrleistungsziele.

    Args:
        ctx: Flat dict from company-profile/template-context.

    Returns:
        List of TOM measure dicts with SDM goal assignment.
    """
    measures = []
    control_counter = 0

    # Base SDM measures
    for goal_key, goal_data in SDM_TOM_CATALOG.items():
        for m in goal_data["measures"]:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=goal_key,
                sdm_baustein=goal_data["sdm_baustein"],
                category=goal_data["label"],
                ctx=ctx,
            ))

    # Regulatory extensions
    if ctx.get("subject_to_nis2"):
        for m in NIS2_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="NIS2 / BSIG 2025",
                category="Cybersicherheit (NIS2)",
                ctx=ctx,
            ))

    if ctx.get("subject_to_iso27001"):
        for m in ISO27001_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="ISO 27001 Anhang A",
                category="ISMS (ISO 27001)",
                ctx=ctx,
            ))

    if ctx.get("subject_to_ai_act") or ctx.get("has_ai_systems"):
        for m in AI_ACT_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="EU AI Act (2024/1689)",
                category="KI-Compliance",
                ctx=ctx,
            ))

    # Sector-specific extensions
    sector = _detect_sector(ctx)
    if sector and sector in SECTOR_TOMS:
        sector_data = SECTOR_TOMS[sector]
        for m in sector_data["measures"]:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m.get("sdm_goal", "vertraulichkeit"),
                sdm_baustein=f"Sektor: {sector_data['label']}",
                category=f"Sektor ({sector_data['label']})",
                ctx=ctx,
            ))

    return measures
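The draft generator is essentially catalog concatenation with a running counter: base measures always, regulatory blocks only when the matching context flag is set, each measure stamped with a sequential control ID. A miniature sketch of that assembly; the two tiny catalogs here are hypothetical stand-ins for SDM_TOM_CATALOG and NIS2_TOMS:

```python
# Hypothetical miniature catalogs for illustration only.
BASE_MEASURES = [{"name": "Backup-Strategie"}, {"name": "MFA"}]
NIS2_EXTRAS = [{"name": "Incident-Response-Plan"}]

def assemble(ctx: dict) -> list[str]:
    # Conditional concatenation, then sequential IDs in the same
    # TOM-SDM-### zero-padded format that _build_measure emits.
    selected = BASE_MEASURES + (NIS2_EXTRAS if ctx.get("subject_to_nis2") else [])
    return [f"TOM-SDM-{i:03d} {m['name']}" for i, m in enumerate(selected, 1)]
```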


def sdm_coverage_summary(measures: list[dict]) -> dict:
    """Return coverage matrix: SDM goal -> measure count."""
    summary = {}
    for goal_key in SDM_TOM_CATALOG:
        count = sum(1 for m in measures if m.get("sdm_goal") == goal_key)
        summary[goal_key] = {
            "label": SDM_TOM_CATALOG[goal_key]["label"],
            "count": count,
        }
    return summary
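The coverage matrix is a per-goal count over the generated measures; goals with a zero count flag gaps in the TOM set. A standalone sketch of the counting step, without the label lookup (function name and sample data are illustrative):

```python
def coverage(measures: list[dict], goals: list[str]) -> dict[str, int]:
    # Count measures per SDM goal; goals absent from all measures get 0.
    return {g: sum(1 for m in measures if m.get("sdm_goal") == g) for g in goals}

sample = [{"sdm_goal": "vertraulichkeit"},
          {"sdm_goal": "vertraulichkeit"},
          {"sdm_goal": "transparenz"}]
result = coverage(sample, ["vertraulichkeit", "transparenz", "integritaet"])
```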


# -- Internal helpers --------------------------------------------------------

def _build_measure(counter: int, measure: dict, sdm_goal: str,
                   sdm_baustein: str, category: str, ctx: dict) -> dict:
    return {
        "control_id": f"TOM-SDM-{counter:03d}",
        "name": measure["name"],
        "description": measure["description"],
        "category": category,
        "type": measure.get("type", "organizational"),
        "sdm_goal": sdm_goal,
        "sdm_baustein_ref": sdm_baustein,
        "implementation_status": "not_implemented",
        "effectiveness_rating": "not_assessed",
        "responsible_department": "IT-Sicherheit",
        "priority": _assess_priority(measure, ctx),
        "review_frequency": f"{ctx.get('review_cycle_months', 12)} Monate",
    }


def _assess_priority(measure: dict, ctx: dict) -> str:
    name_lower = measure.get("name", "").lower()
    if any(kw in name_lower for kw in ["verschluesselung", "mfa", "incident", "ki-risiko"]):
        return "high"
    if any(kw in name_lower for kw in ["backup", "zugriff", "logging", "loeschung"]):
        return "high"
    return "medium"


def _detect_sector(ctx: dict) -> str | None:
    """Map company industry to sector key."""
    industry = (ctx.get("industry") or "").lower()
    mapping = {
        "technologie": "it_saas", "it": "it_saas", "saas": "it_saas", "software": "it_saas",
        "gesundheit": "gesundheitswesen", "pharma": "gesundheitswesen", "medizin": "gesundheitswesen",
        "finanz": "finanzdienstleistungen", "bank": "finanzdienstleistungen", "versicherung": "finanzdienstleistungen",
        "handel": "handel", "e-commerce": "handel", "einzelhandel": "handel", "shop": "handel",
        "handwerk": "handwerk", "bau": "handwerk", "kfz": "handwerk",
    }
    # Check longer keywords first so short keys like "it" cannot shadow
    # longer industry names ("gesundheit" contains the substring "it").
    for keyword in sorted(mapping, key=len, reverse=True):
        if keyword in industry:
            return mapping[keyword]
    return None

document-templates/generators/vvt_template.py (new file, 393 lines)
@@ -0,0 +1,393 @@

"""VVT template generator V2 — sector-specific VVT activity drafts.

Generates Art. 30 DS-GVO compliant VVT entries with sector-specific
standard processing activities inspired by BayLDA patterns.
"""

from typing import Optional

# -- Sector activity catalogs ------------------------------------------------

SECTOR_ACTIVITIES = {
    "it_saas": [
        {
            "name": "SaaS-Plattformbetrieb",
            "purposes": ["Bereitstellung und Betrieb der SaaS-Plattform"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung)"],
            "data_subject_categories": ["Kunden", "Endnutzer"],
            "personal_data_categories": ["Stammdaten", "Nutzungsdaten", "Inhaltsdaten", "Logdaten"],
            "recipient_categories": ["Hosting-Anbieter (AVV)", "Support-Dienstleister (AVV)"],
            "retention_period": "90 Tage nach Vertragsende + gesetzl. Aufbewahrung",
            "tom_description": "Mandantentrennung, Verschluesselung, RBAC",
            "dpia_required": True,
        },
        {
            "name": "Kundenverwaltung / CRM",
            "purposes": ["Verwaltung von Kundenbeziehungen, Vertragsmanagement"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden", "Ansprechpartner", "Interessenten"],
            "personal_data_categories": ["Kontaktdaten", "Vertragsdaten", "Kommunikationshistorie"],
            "recipient_categories": ["CRM-Anbieter (AVV)"],
            "retention_period": "3 Jahre nach letztem Kontakt, 10 Jahre Rechnungsdaten",
            "tom_description": "Zugriffsbeschraenkung Vertrieb/Support, Protokollierung",
        },
        {
            "name": "E-Mail-Marketing / Newsletter",
            "purposes": ["Versand von Produkt-Updates und Marketing-Newsletter"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung)", "UWG §7"],
            "data_subject_categories": ["Newsletter-Abonnenten"],
            "personal_data_categories": ["E-Mail-Adresse", "Name", "Oeffnungs-/Klickverhalten"],
            "recipient_categories": ["E-Mail-Dienstleister (AVV)"],
            "retention_period": "Unverzueglich nach Widerruf",
            "tom_description": "Double-Opt-In, einfache Abmeldefunktion",
        },
        {
            "name": "Webanalyse",
            "purposes": ["Analyse der Website-Nutzung zur Verbesserung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung via Cookie-Banner)"],
            "data_subject_categories": ["Website-Besucher"],
            "personal_data_categories": ["IP-Adresse (anonymisiert)", "Seitenaufrufe", "Geraeteinformationen"],
            "recipient_categories": ["Analyse-Anbieter (AVV)"],
            "retention_period": "14 Monate",
            "tom_description": "IP-Anonymisierung, Cookie-Consent (TDDDG §25)",
        },
        {
            "name": "Bewerbermanagement",
            "purposes": ["Bearbeitung von Bewerbungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO i.V.m. §26 BDSG"],
            "data_subject_categories": ["Bewerber"],
            "personal_data_categories": ["Kontaktdaten", "Lebenslauf", "Qualifikationen"],
            "recipient_categories": ["Fachabteilung"],
            "retention_period": "6 Monate nach Verfahrensabschluss (AGG)",
            "tom_description": "Zugriffsschutz Bewerbungsportal, verschluesselte Uebertragung",
        },
        {
            "name": "Mitarbeiterverwaltung / HR",
            "purposes": ["Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG"],
            "data_subject_categories": ["Beschaeftigte"],
            "personal_data_categories": ["Stammdaten", "Vertragsdaten", "Bankverbindung", "Arbeitszeiten"],
            "recipient_categories": ["Lohnbuero (AVV)", "Finanzamt", "Sozialversicherungstraeger"],
            "retention_period": "10 Jahre nach Austritt",
            "tom_description": "Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung",
        },
        {
            "name": "Support-Ticketing",
            "purposes": ["Bearbeitung von Kundenanfragen und Stoerungsmeldungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden", "Endnutzer"],
            "personal_data_categories": ["Kontaktdaten", "Ticket-Inhalt", "Systemlogs"],
            "recipient_categories": ["Support-Tool-Anbieter (AVV)"],
            "retention_period": "2 Jahre nach Ticket-Schliessung",
            "tom_description": "Rollenbasierter Zugriff, Pseudonymisierung in Reports",
        },
        {
            "name": "Logging und Monitoring",
            "purposes": ["Sicherheitsueberwachung, Fehleranalyse"],
            "legal_bases": ["Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: IT-Sicherheit)"],
            "data_subject_categories": ["Plattform-Nutzer", "Administratoren"],
            "personal_data_categories": ["IP-Adressen", "Zugriffszeitpunkte", "Fehlerprotokolle"],
            "recipient_categories": ["Log-Management-Anbieter (AVV)"],
            "retention_period": "30 Tage Anwendungslogs, 90 Tage Sicherheitslogs",
            "tom_description": "Zugriffsschutz Logdaten, automatische Rotation",
        },
    ],
    "gesundheitswesen": [
        {
            "name": "Patientenverwaltung",
            "purposes": ["Patientenakte, Behandlungsdokumentation"],
            "legal_bases": ["Art. 9 Abs. 2 lit. h DS-GVO i.V.m. §630f BGB"],
            "data_subject_categories": ["Patienten"],
            "personal_data_categories": ["Stammdaten", "Versicherung", "Diagnosen", "Befunde (Art. 9)"],
            "recipient_categories": ["PVS-Anbieter (AVV)", "Labor (AVV)", "ueberweisende Aerzte"],
            "retention_period": "10 Jahre nach letzter Behandlung (§630f BGB)",
            "tom_description": "Verschluesselung Patientenakte, Notfallzugriff",
            "dpia_required": True,
        },
        {
            "name": "Abrechnung (KV/PKV)",
            "purposes": ["Abrechnung aerztlicher Leistungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c DS-GVO", "Art. 9 Abs. 2 lit. h DS-GVO"],
            "data_subject_categories": ["Patienten"],
            "personal_data_categories": ["Stammdaten", "Versicherung", "Diagnosen (ICD)", "Leistungsziffern"],
            "recipient_categories": ["KV", "PKV", "Abrechnungsstelle (AVV)"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "Verschluesselte Uebermittlung (KV-Connect/KIM)",
        },
    ],
    "handel": [
        {
            "name": "Bestellabwicklung",
            "purposes": ["Bestellannahme, Versand, Rechnungsstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden (Besteller)"],
            "personal_data_categories": ["Kontaktdaten", "Lieferadresse", "Bestelldaten", "Rechnungsdaten"],
            "recipient_categories": ["Versanddienstleister", "Zahlungsanbieter (AVV)"],
            "retention_period": "10 Jahre Rechnungen, 3 Jahre Bestelldaten",
            "tom_description": "Verschluesselte Uebertragung, Zugriffsschutz",
        },
        {
            "name": "Kundenkonto",
            "purposes": ["Bereitstellung Kundenkonto (optional)"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a/b DS-GVO"],
            "data_subject_categories": ["Registrierte Kunden"],
            "personal_data_categories": ["Stammdaten", "Passwort (gehasht)", "Bestellhistorie"],
            "recipient_categories": ["Shop-Plattform (AVV)"],
            "retention_period": "Sofort nach Kontoloesch-Anfrage, Rechnungen 10 Jahre",
            "tom_description": "MFA-Option, bcrypt Passwortspeicherung, Gastzugang-Alternative",
        },
        {
            "name": "Zahlungsabwicklung",
            "purposes": ["Abwicklung von Zahlungsvorgaengen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Zahlende Kunden"],
            "personal_data_categories": ["Zahlungsart", "Transaktionsdaten"],
            "recipient_categories": ["Payment-Service-Provider"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "PCI-DSS, Tokenisierung, keine direkte Kartenspeicherung",
        },
    ],
    "handwerk": [
        {
            "name": "Kundenauftraege und Angebotserstellung",
            "purposes": ["Angebotserstellung, Auftragsabwicklung, Rechnungsstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden (Privat/Gewerbe)"],
            "personal_data_categories": ["Kontaktdaten", "Objektadresse", "Auftrag", "Rechnungsdaten"],
            "recipient_categories": ["Steuerberater", "ggf. Subunternehmer"],
            "retention_period": "10 Jahre Rechnungen, 5 Jahre Gewaehrleistung",
            "tom_description": "Zugriffskontrolle Auftragssystem",
        },
        {
            "name": "Baustellendokumentation",
            "purposes": ["Dokumentation Baufortschritt, Maengelprotokoll"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/f DS-GVO"],
            "data_subject_categories": ["Kunden", "Mitarbeitende"],
            "personal_data_categories": ["Fotos", "Protokolle", "Abnahmedokumente"],
            "recipient_categories": ["Auftraggeber", "Architekten"],
            "retention_period": "5 Jahre nach Abnahme",
            "tom_description": "Projektordner mit Zugriffsbeschraenkung",
        },
    ],
    "bildung": [
        {
            "name": "Schueler-/Studierendenverwaltung",
            "purposes": ["Verwaltung von Schueler-/Studierendendaten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz"],
            "data_subject_categories": ["Schueler/Studierende (ggf. Minderjaehrige)", "Erziehungsberechtigte"],
            "personal_data_categories": ["Stammdaten", "Kontaktdaten Erziehungsberechtigte"],
            "recipient_categories": ["Schulverwaltungssoftware (AVV)", "Schulbehoerde"],
            "retention_period": "Gemaess Schulgesetz (i.d.R. 5 Jahre nach Abgang)",
            "tom_description": "Besonderer Zugriffsschutz, Einwilligung Erziehungsberechtigte",
            "dpia_required": True,
        },
        {
            "name": "Notenverarbeitung",
            "purposes": ["Leistungsbewertung, Zeugniserstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz"],
            "data_subject_categories": ["Schueler/Studierende"],
            "personal_data_categories": ["Noten", "Leistungsbewertungen", "Pruefungsergebnisse"],
            "recipient_categories": ["Lehrkraefte", "Schulleitung"],
            "retention_period": "Zeugniskopien 50 Jahre, Einzelnoten 2 Jahre",
            "tom_description": "Zugriffsbeschraenkung auf Fachlehrkraft, verschluesselt",
        },
    ],
    "beratung": [
        {
            "name": "Mandantenverwaltung",
            "purposes": ["Verwaltung von Mandantenbeziehungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Mandanten", "Ansprechpartner"],
            "personal_data_categories": ["Kontaktdaten", "Vertragsdaten", "Korrespondenz"],
            "recipient_categories": ["Kanzleisoftware (AVV)", "Steuerberater"],
            "retention_period": "10 Jahre Rechnungen, 5 Jahre Handakten",
            "tom_description": "Mandantengeheimnis, Need-to-know-Prinzip",
        },
        {
            "name": "Projektmanagement",
            "purposes": ["Planung und Steuerung von Beratungsprojekten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/f DS-GVO"],
            "data_subject_categories": ["Projektbeteiligte"],
            "personal_data_categories": ["Projektdaten", "Aufgaben", "Zeiterfassung"],
            "recipient_categories": ["PM-Tool (AVV)", "Mandant"],
            "retention_period": "2 Jahre nach Projektabschluss",
            "tom_description": "Projektspezifische Zugriffsrechte, Mandantentrennung",
        },
        {
            "name": "Zeiterfassung und Abrechnung",
            "purposes": ["Stundenerfassung, Abrechnung gegenueber Mandanten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Berater/Mitarbeitende", "Mandanten"],
            "personal_data_categories": ["Arbeitszeiten", "Taetigkeitsbeschreibungen", "Stundensaetze"],
            "recipient_categories": ["Abrechnungssystem (AVV)", "Buchhaltung"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "Zugriff nur eigene Zeiten + Projektleitung",
        },
    ],
}

# Industry -> Sector mapping
INDUSTRY_SECTOR_MAP = {
    "technologie": "it_saas", "it": "it_saas", "saas": "it_saas", "software": "it_saas",
    "it dienstleistungen": "it_saas",
    "gesundheit": "gesundheitswesen", "pharma": "gesundheitswesen",
    "e-commerce": "handel", "handel": "handel", "einzelhandel": "handel",
    "handwerk": "handwerk", "bau": "handwerk", "kfz": "handwerk",
    "bildung": "bildung", "schule": "bildung", "hochschule": "bildung",
    "beratung": "beratung", "consulting": "beratung", "kanzlei": "beratung",
    "recht": "beratung",
}


def generate_vvt_drafts(ctx: dict) -> list[dict]:
    """Generate VVT activity drafts, sector-specific if possible.

    Args:
        ctx: Flat dict from company-profile/template-context.

    Returns:
        List of VVT activity dicts ready for creation.
    """
    company = ctx.get("company_name", "Unbekannt")
    dpo = ctx.get("dpo_name", "")
    sector = _detect_sector(ctx)

    # Use sector-specific activities if available, else generate from systems
    if sector and sector in SECTOR_ACTIVITIES:
        activities = _generate_sector_vvt(ctx, sector, company, dpo)
    else:
        activities = _generate_system_vvt(ctx, company, dpo)

    # Always add a standard HR activity if not already present
    has_hr = any("mitarbeiter" in a.get("name", "").lower() or "hr" in a.get("name", "").lower()
                 for a in activities)
    if not has_hr and len(activities) > 0:
        activities.append(_build_hr_activity(len(activities) + 1, company, dpo))

    return activities
|
||||
|
||||
def _detect_sector(ctx: dict) -> Optional[str]:
|
||||
industry = (ctx.get("industry") or "").lower().strip()
|
||||
for keyword, sector in INDUSTRY_SECTOR_MAP.items():
|
||||
if keyword in industry:
|
||||
return sector
|
||||
return None
|
||||
|
||||
|
||||
def _generate_sector_vvt(ctx: dict, sector: str, company: str, dpo: str) -> list[dict]:
    activities = []
    sector_data = SECTOR_ACTIVITIES[sector]

    for i, template in enumerate(sector_data, 1):
        activity = {
            "vvt_id": f"VVT-{sector.upper()[:3]}-{i:03d}",
            "name": template["name"],
            "description": f"Automatisch generierter VVT-Eintrag: {template['name']}",
            "purposes": template["purposes"],
            "legal_bases": template["legal_bases"],
            "data_subject_categories": template["data_subject_categories"],
            "personal_data_categories": template["personal_data_categories"],
            "recipient_categories": template["recipient_categories"],
            "third_country_transfers": _assess_third_country(ctx),
            "retention_period": {"default": template["retention_period"]},
            "tom_description": template["tom_description"],
            "business_function": _infer_business_function(template["name"]),
            "systems": [],
            "protection_level": "HIGH" if template.get("dpia_required") else "MEDIUM",
            "dpia_required": template.get("dpia_required", False),
            "status": "DRAFT",
            "responsible": dpo or company,
            "source_sector": sector,
        }
        activities.append(activity)

    return activities


def _generate_system_vvt(ctx: dict, company: str, dpo: str) -> list[dict]:
    """Fallback: generate VVT per processing system (original approach)."""
    systems = ctx.get("processing_systems", [])
    activities = []

    for i, system in enumerate(systems, 1):
        name = system.get("name", f"System {i}")
        vendor = system.get("vendor", "")
        hosting = system.get("hosting", "on-premise")
        categories = system.get("personal_data_categories", [])

        activity = {
            "vvt_id": f"VVT-SYS-{i:03d}",
            "name": f"Verarbeitung in {name}",
            "description": f"VVT-Eintrag fuer System '{name}'"
                           + (f" (Anbieter: {vendor})" if vendor else ""),
            "purposes": [f"Datenverarbeitung via {name}"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung)"],
            "data_subject_categories": [],
            "personal_data_categories": categories,
            "recipient_categories": [vendor] if vendor else [],
            "third_country_transfers": _assess_third_country_hosting(hosting),
            "retention_period": {"default": "Gemaess Loeschfristenkatalog"},
            "tom_description": f"Siehe TOM-Katalog fuer {name}",
            "business_function": "IT",
            "systems": [name],
            "deployment_model": hosting,
            "protection_level": "HIGH" if len(categories) > 3 else "MEDIUM",
            "dpia_required": len(categories) > 3,
            "status": "DRAFT",
            "responsible": dpo or company,
        }
        activities.append(activity)

    return activities


def _build_hr_activity(index: int, company: str, dpo: str) -> dict:
    return {
        "vvt_id": f"VVT-STD-{index:03d}",
        "name": "Mitarbeiterverwaltung / HR",
        "description": "Standard-Verarbeitungstaetigkeit Personalverwaltung",
        "purposes": ["Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung"],
        "legal_bases": ["Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG"],
        "data_subject_categories": ["Beschaeftigte"],
        "personal_data_categories": ["Stammdaten", "Vertragsdaten", "Bankverbindung", "Arbeitszeiten"],
        "recipient_categories": ["Lohnbuero (AVV)", "Finanzamt", "Sozialversicherungstraeger"],
        "third_country_transfers": [],
        "retention_period": {"default": "10 Jahre nach Austritt"},
        "tom_description": "Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung",
        "business_function": "HR",
        "systems": [],
        "protection_level": "HIGH",
        "dpia_required": False,
        "status": "DRAFT",
        "responsible": dpo or company,
    }


def _assess_third_country(ctx: dict) -> list:
    if ctx.get("third_country_transfer"):
        return [{"country": "Abhaengig von Dienstleister", "mechanism": "Pruefung erforderlich"}]
    return []


def _assess_third_country_hosting(hosting: str) -> list:
    if hosting in ("us-cloud", "international"):
        return [{"country": "USA", "mechanism": "EU-US Data Privacy Framework"}]
    return []


def _infer_business_function(name: str) -> str:
    name_lower = name.lower()
    if any(kw in name_lower for kw in ["mitarbeiter", "hr", "personal", "bewerbung"]):
        return "HR"
    if any(kw in name_lower for kw in ["abrechnung", "rechnung", "zahlung", "buchhaltung"]):
        return "Finanzen"
    if any(kw in name_lower for kw in ["marketing", "newsletter", "webanalyse", "crm", "akquise"]):
        return "Marketing/Vertrieb"
    if any(kw in name_lower for kw in ["support", "ticket", "kundenservice"]):
        return "Support"
    if any(kw in name_lower for kw in ["patient", "befund", "labor", "termin"]):
        return "Medizin"
    if any(kw in name_lower for kw in ["schueler", "noten", "lernplattform"]):
        return "Paedagogik"
    return "IT"
405  document-templates/migrations/001_dsfa_template_v2.sql  Normal file
@@ -0,0 +1,405 @@
-- Migration 001: DSFA Template V2 — Datenschutz-Folgenabschaetzung
-- Archiviert V1 (aus Migration 025) und fuegt erweiterte V2 ein.
-- Zielrepo: breakpilot-compliance (spaetere Integration)

-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'dsfa'
  AND status = 'published';

-- 2. DSFA V2 einfuegen
INSERT INTO compliance.compliance_legal_templates (
    tenant_id, document_type, title, description, language, jurisdiction,
    version, status, license_name, source_name, attribution_required,
    is_complete_document, placeholders, content
) VALUES (
    '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
    'dsfa',
    'Datenschutz-Folgenabschaetzung (DSFA) gemaess Art. 35 DSGVO — V2',
    'Erweiterte Vorlage fuer eine Datenschutz-Folgenabschaetzung mit Schwellwertanalyse (WP248), SDM-basierter TOM-Struktur, strukturierter Risikobewertung nach ISO 29134 und KI-Modul (EU AI Act). Geeignet fuer alle Verarbeitungen, die einer DSFA beduerfen.',
    'de',
    'EU/DSGVO',
    '2.0',
    'published',
    'MIT',
    'BreakPilot Compliance',
    false,
    true,
    CAST('[
        "{{ORGANISATION_NAME}}",
        "{{ORGANISATION_ADRESSE}}",
        "{{DSB_NAME}}",
        "{{DSB_KONTAKT}}",
        "{{BUNDESLAND}}",
        "{{AUFSICHTSBEHOERDE}}",
        "{{ERSTELLT_VON}}",
        "{{ERSTELLT_AM}}",
        "{{GENEHMIGT_VON}}",
        "{{GENEHMIGT_AM}}",
        "{{WP248_K1_BEWERTUNG_SCORING}}",
        "{{WP248_K2_AUTOMATISIERTE_ENTSCHEIDUNG}}",
        "{{WP248_K3_SYSTEMATISCHE_UEBERWACHUNG}}",
        "{{WP248_K4_SENSIBLE_DATEN}}",
        "{{WP248_K5_GROSSER_UMFANG}}",
        "{{WP248_K6_DATENVERKNUEPFUNG}}",
        "{{WP248_K7_SCHUTZBEDUERFTIGE_BETROFFENE}}",
        "{{WP248_K8_INNOVATIVE_TECHNOLOGIE}}",
        "{{WP248_K9_RECHTSAUSUEBUNG_HINDERT}}",
        "{{SCHWELLWERT_ERGEBNIS}}",
        "{{MUSS_LISTEN_REFERENZ}}",
        "{{VERARBEITUNG_TITEL}}",
        "{{VERARBEITUNG_BESCHREIBUNG}}",
        "{{VERARBEITUNG_UMFANG}}",
        "{{VERARBEITUNG_KONTEXT}}",
        "{{VERARBEITUNGSMITTEL}}",
        "{{ZWECK_VERARBEITUNG}}",
        "{{RECHTSGRUNDLAGE}}",
        "{{RECHTSGRUNDLAGE_DETAILS}}",
        "{{DATENKATEGORIEN}}",
        "{{BETROFFENENGRUPPEN}}",
        "{{EMPFAENGER}}",
        "{{DRITTLANDTRANSFER}}",
        "{{SPEICHERDAUER}}",
        "{{GEMEINSAME_VERANTWORTUNG_DETAILS}}",
        "{{AUFTRAGSVERARBEITER_DETAILS}}",
        "{{NOTWENDIGKEIT_BEWERTUNG}}",
        "{{VERHAELTNISMAESSIGKEIT_BEWERTUNG}}",
        "{{DATENMINIMIERUNG_NACHWEIS}}",
        "{{ALTERNATIVEN_GEPRUEFT}}",
        "{{SPEICHERBEGRENZUNG_NACHWEIS}}",
        "{{RISIKO_METHODIK}}",
        "{{RISIKEN_TABELLE}}",
        "{{GESAMT_RISIKO_NIVEAU}}",
        "{{KONSULTATION_BETROFFENE}}",
        "{{KONSULTATION_BETRIEBSRAT}}",
        "{{TOM_VERFUEGBARKEIT}}",
        "{{TOM_INTEGRITAET}}",
        "{{TOM_VERTRAULICHKEIT}}",
        "{{TOM_NICHTVERKETTUNG}}",
        "{{TOM_TRANSPARENZ}}",
        "{{TOM_INTERVENIERBARKEIT}}",
        "{{TOM_DATENMINIMIERUNG}}",
        "{{DSB_STELLUNGNAHME}}",
        "{{DSB_DATUM}}",
        "{{ART36_BEGRUENDUNG}}",
        "{{DSFA_ERGEBNIS}}",
        "{{RESTRISIKO_BEWERTUNG}}",
        "{{UEBERPRUFUNGSINTERVALL}}",
        "{{NAECHSTE_UEBERPRUFUNG}}",
        "{{AENDERUNGSTRIGGER}}",
        "{{KI_SYSTEME_DETAILS}}",
        "{{KI_GRUNDRECHTSPRUEFUNG}}"
    ]' AS jsonb),
$template$# Datenschutz-Folgenabschaetzung (DSFA)
**gemaess Art. 35 DS-GVO**

---

## 0. Schwellwertanalyse

Vor Durchfuehrung einer vollstaendigen DSFA ist zu pruefen, ob die geplante Verarbeitung eine solche erfordert. Die Pruefung erfolgt anhand der neun Kriterien der WP29/EDPB-Leitlinien (WP 248 rev.01) sowie der Muss-Liste der zustaendigen Aufsichtsbehoerde.

### 0.1 WP248-Kriterien (Art. 29-Datenschutzgruppe)

Sobald mindestens **zwei** der folgenden Kriterien zutreffen, ist eine DSFA in der Regel erforderlich.

| Nr. | Kriterium | Zutreffend? | Begruendung |
|-----|-----------|-------------|-------------|
| K1 | Bewertung oder Scoring (einschl. Profiling und Prognose) | {{WP248_K1_BEWERTUNG_SCORING}} | |
| K2 | Automatisierte Entscheidungsfindung mit Rechtswirkung oder aehnlich erheblicher Wirkung | {{WP248_K2_AUTOMATISIERTE_ENTSCHEIDUNG}} | |
| K3 | Systematische Ueberwachung von Personen | {{WP248_K3_SYSTEMATISCHE_UEBERWACHUNG}} | |
| K4 | Verarbeitung sensibler Daten oder hoechst persoenlicher Daten (Art. 9, 10 DS-GVO) | {{WP248_K4_SENSIBLE_DATEN}} | |
| K5 | Datenverarbeitung in grossem Umfang | {{WP248_K5_GROSSER_UMFANG}} | |
| K6 | Verknuepfung oder Zusammenfuehrung von Datenbestaenden | {{WP248_K6_DATENVERKNUEPFUNG}} | |
| K7 | Daten zu schutzbeduerftigen Betroffenen (Kinder, Beschaeftigte, Patienten) | {{WP248_K7_SCHUTZBEDUERFTIGE_BETROFFENE}} | |
| K8 | Innovative Nutzung oder Anwendung neuer technologischer Loesungen | {{WP248_K8_INNOVATIVE_TECHNOLOGIE}} | |
| K9 | Verarbeitung, die Betroffene an der Ausuebung eines Rechts oder der Nutzung einer Dienstleistung hindert | {{WP248_K9_RECHTSAUSUEBUNG_HINDERT}} | |

### 0.2 Muss-Liste der Aufsichtsbehoerde

**Bundesland:** {{BUNDESLAND}}
**Zustaendige Aufsichtsbehoerde:** {{AUFSICHTSBEHOERDE}}
**Referenz:** {{MUSS_LISTEN_REFERENZ}}

### 0.3 Ergebnis der Schwellwertanalyse

{{SCHWELLWERT_ERGEBNIS}}

---

## 1. Allgemeine Informationen und Verarbeitungsbeschreibung

| Feld | Inhalt |
|------|--------|
| **Organisation** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} |
| **DSB-Kontakt** | {{DSB_KONTAKT}} |
| **Erstellt von** | {{ERSTELLT_VON}} |
| **Erstellt am** | {{ERSTELLT_AM}} |
{{#IF GENEHMIGT_VON}}| **Genehmigt von** | {{GENEHMIGT_VON}} |
| **Genehmigt am** | {{GENEHMIGT_AM}} |
{{/IF}}

### 1.1 Bezeichnung der Verarbeitungstaetigkeit

**{{VERARBEITUNG_TITEL}}**

### 1.2 Beschreibung der Verarbeitung

{{VERARBEITUNG_BESCHREIBUNG}}

### 1.3 Umfang und Kontext

| Aspekt | Beschreibung |
|--------|--------------|
| **Umfang** | {{VERARBEITUNG_UMFANG}} |
| **Kontext** | {{VERARBEITUNG_KONTEXT}} |
| **Eingesetzte Verarbeitungsmittel** | {{VERARBEITUNGSMITTEL}} |

### 1.4 Zweck der Verarbeitung

{{ZWECK_VERARBEITUNG}}

### 1.5 Rechtsgrundlage

**Rechtsgrundlage:** {{RECHTSGRUNDLAGE}}

{{#IF RECHTSGRUNDLAGE_DETAILS}}
**Erlaeuterung:** {{RECHTSGRUNDLAGE_DETAILS}}
{{/IF}}

### 1.6 Verarbeitete Datenkategorien

{{DATENKATEGORIEN}}

### 1.7 Betroffene Personengruppen

{{BETROFFENENGRUPPEN}}

### 1.8 Empfaenger und Auftragsverarbeiter

{{EMPFAENGER}}

{{#IF DRITTLANDTRANSFER}}
### 1.9 Uebermittlung in Drittlaender

{{DRITTLANDTRANSFER}}
{{/IF}}

### 1.10 Speicherdauer und Loeschfristen

{{SPEICHERDAUER}}

{{#IF GEMEINSAME_VERANTWORTUNG_DETAILS}}
### 1.11 Gemeinsame Verantwortlichkeit (Art. 26 DS-GVO)

{{GEMEINSAME_VERANTWORTUNG_DETAILS}}
{{/IF}}

{{#IF AUFTRAGSVERARBEITER_DETAILS}}
### 1.12 Auftragsverarbeitung (Art. 28 DS-GVO)

{{AUFTRAGSVERARBEITER_DETAILS}}
{{/IF}}

---

## 2. Notwendigkeit und Verhaeltnismaessigkeit

### 2.1 Notwendigkeit der Verarbeitung

{{NOTWENDIGKEIT_BEWERTUNG}}

### 2.2 Verhaeltnismaessigkeit

{{VERHAELTNISMAESSIGKEIT_BEWERTUNG}}

### 2.3 Pruefung der Grundsaetze (Art. 5 DS-GVO)

| Grundsatz | Einhaltung | Nachweis |
|-----------|------------|----------|
| **Zweckbindung** (Art. 5 Abs. 1 lit. b) | Die Verarbeitung erfolgt ausschliesslich fuer die angegebenen Zwecke. | Siehe Abschnitt 1.4 |
| **Datenminimierung** (Art. 5 Abs. 1 lit. c) | {{DATENMINIMIERUNG_NACHWEIS}} | |
| **Richtigkeit** (Art. 5 Abs. 1 lit. d) | Verfahren zur Sicherstellung der Datenqualitaet sind implementiert. | |
| **Speicherbegrenzung** (Art. 5 Abs. 1 lit. e) | {{SPEICHERBEGRENZUNG_NACHWEIS}} | |
| **Integritaet und Vertraulichkeit** (Art. 5 Abs. 1 lit. f) | Technische und organisatorische Massnahmen gemaess Abschnitt 5 umgesetzt. | Siehe Abschnitt 5 |

### 2.4 Pruefung alternativer Verarbeitungsmoeglichkeiten

{{ALTERNATIVEN_GEPRUEFT}}

---

## 3. Risikobewertung

### 3.1 Methodik

{{RISIKO_METHODIK}}

Die Risikobewertung erfolgt anhand zweier Dimensionen:
- **Schwere des Schadens** fuer die Betroffenen (gering / ueberschaubar / substanziell / gross)
- **Eintrittswahrscheinlichkeit** (gering / mittel / hoch / sehr hoch)

| | Schwere: Gering | Schwere: Ueberschaubar | Schwere: Substanziell | Schwere: Gross |
|---|---|---|---|---|
| **Wahrscheinlichkeit: Sehr hoch** | Mittel | Hoch | Sehr hoch | Sehr hoch |
| **Wahrscheinlichkeit: Hoch** | Niedrig | Mittel | Hoch | Sehr hoch |
| **Wahrscheinlichkeit: Mittel** | Niedrig | Niedrig | Mittel | Hoch |
| **Wahrscheinlichkeit: Gering** | Niedrig | Niedrig | Niedrig | Mittel |

### 3.2 Identifizierte Risiken

{{RISIKEN_TABELLE}}

### 3.3 Gesamtrisikobewertung

{{GESAMT_RISIKO_NIVEAU}}

---

## 4. Konsultation der Betroffenen und Interessentraeger

### 4.1 Konsultation der Betroffenen (Art. 35 Abs. 9 DS-GVO)

{{#IF KONSULTATION_BETROFFENE}}
{{KONSULTATION_BETROFFENE}}
{{/IF}}
{{#IF_NOT KONSULTATION_BETROFFENE}}
Eine Konsultation der Betroffenen wurde nicht durchgefuehrt. Begruendung: [Bitte ergaenzen — z. B. Unverhaeltnismaessigkeit, Geheimhaltungsinteressen, fehlende Praktikabilitaet].
{{/IF_NOT}}

{{#IF KONSULTATION_BETRIEBSRAT}}
### 4.2 Beteiligung der Arbeitnehmervertretung

{{KONSULTATION_BETRIEBSRAT}}
{{/IF}}

---

## 5. Technische und organisatorische Massnahmen (TOM)

Die Massnahmen sind nach den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a) strukturiert.

### 5.1 Verfuegbarkeit

Ziel: Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.

{{TOM_VERFUEGBARKEIT}}

### 5.2 Integritaet

Ziel: Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.

{{TOM_INTEGRITAET}}

### 5.3 Vertraulichkeit

Ziel: Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.

{{TOM_VERTRAULICHKEIT}}

### 5.4 Nichtverkettung

Ziel: Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.

{{TOM_NICHTVERKETTUNG}}

### 5.5 Transparenz

Ziel: Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.

{{TOM_TRANSPARENZ}}

### 5.6 Intervenierbarkeit

Ziel: Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.

{{TOM_INTERVENIERBARKEIT}}

### 5.7 Datenminimierung

Ziel: Die Verarbeitung beschraenkt sich auf das erforderliche Mass.

{{TOM_DATENMINIMIERUNG}}

---

## 6. Stellungnahme des Datenschutzbeauftragten

### 6.1 Konsultation des DSB

{{DSB_STELLUNGNAHME}}

{{#IF DSB_DATUM}}
**Datum der Stellungnahme:** {{DSB_DATUM}}
{{/IF}}

### 6.2 Pruefung der Konsultationspflicht (Art. 36 DS-GVO)

Sofern das Restrisiko nach Umsetzung aller Massnahmen **hoch** bleibt, ist vor Beginn der Verarbeitung die zustaendige Aufsichtsbehoerde zu konsultieren (Art. 36 Abs. 1 DS-GVO).

{{#IF ART36_BEGRUENDUNG}}
{{ART36_BEGRUENDUNG}}
{{/IF}}
{{#IF_NOT ART36_BEGRUENDUNG}}
Nach Umsetzung der beschriebenen Massnahmen wird das Restrisiko als akzeptabel eingestuft. Eine Konsultation der Aufsichtsbehoerde ist nicht erforderlich.
{{/IF_NOT}}

---

## 7. Ergebnis und Ueberpruefungsplan

### 7.1 Ergebnis der DSFA

{{DSFA_ERGEBNIS}}

### 7.2 Restrisikobewertung

{{RESTRISIKO_BEWERTUNG}}

### 7.3 Ueberpruefungsplan

| Aspekt | Festlegung |
|--------|------------|
| **Regelmaessiges Ueberpruefungsintervall** | {{UEBERPRUFUNGSINTERVALL}} |
| **Naechste geplante Ueberpruefung** | {{NAECHSTE_UEBERPRUFUNG}} |

### 7.4 Trigger fuer ausserplanmaessige Ueberpruefung

{{AENDERUNGSTRIGGER}}

---

{{#IF KI_SYSTEME_DETAILS}}
## 8. KI-spezifisches Modul (EU AI Act)

Dieses Kapitel ist relevant, da KI-Systeme in der beschriebenen Verarbeitung eingesetzt werden.

### 8.1 Eingesetzte KI-Systeme

{{KI_SYSTEME_DETAILS}}

### 8.2 Grundrechtliche Folgenabschaetzung (Art. 27 KI-VO)

{{KI_GRUNDRECHTSPRUEFUNG}}

{{/IF}}

---

## Unterschriften

| Rolle | Name | Datum | Unterschrift |
|-------|------|-------|--------------|
| Erstellt von | {{ERSTELLT_VON}} | {{ERSTELLT_AM}} | _________________ |
{{#IF GENEHMIGT_VON}}| Datenschutzbeauftragter | {{GENEHMIGT_VON}} | {{GENEHMIGT_AM}} | _________________ |
{{/IF}}
| Verantwortlicher | | | _________________ |

---

*Erstellt mit BreakPilot Compliance. Dieses Dokument ist vertraulich und nur fuer den internen Gebrauch bestimmt.*
$template$
) ON CONFLICT DO NOTHING;
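The severity/likelihood matrix in section 3.1 of the template above is a fixed lookup; as a standalone sketch (labels mirror the table, names are illustrative):

```python
# Risk matrix from section 3.1 of the DSFA template:
# severity (columns) x likelihood (rows) -> overall risk level.
SEVERITY = ["gering", "ueberschaubar", "substanziell", "gross"]
MATRIX = {  # likelihood -> levels in SEVERITY order
    "sehr hoch": ["Mittel", "Hoch", "Sehr hoch", "Sehr hoch"],
    "hoch":      ["Niedrig", "Mittel", "Hoch", "Sehr hoch"],
    "mittel":    ["Niedrig", "Niedrig", "Mittel", "Hoch"],
    "gering":    ["Niedrig", "Niedrig", "Niedrig", "Mittel"],
}

def risk_level(severity: str, likelihood: str) -> str:
    return MATRIX[likelihood][SEVERITY.index(severity)]

print(risk_level("substanziell", "hoch"))  # Hoch
```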
247  document-templates/migrations/002_tom_sdm_template.sql  Normal file
@@ -0,0 +1,247 @@
-- Migration 002: TOM Template V2 — nach SDM-Gewaehrleistungszielen
-- Archiviert V1 und fuegt SDM-strukturierte TOM-Dokumentation ein.

-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'tom_documentation'
  AND status = 'published';

-- 2. TOM V2 einfuegen
INSERT INTO compliance.compliance_legal_templates (
    tenant_id, document_type, title, description, language, jurisdiction,
    version, status, license_name, source_name, attribution_required,
    is_complete_document, placeholders, content
) VALUES (
    '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
    'tom_documentation',
    'Technische und Organisatorische Massnahmen (TOM) nach SDM V3.1a',
    'TOM-Dokumentation strukturiert nach den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a). Mit sektorspezifischen Ergaenzungen und Compliance-Bewertung.',
    'de',
    'EU/DSGVO',
    '2.0',
    'published',
    'MIT',
    'BreakPilot Compliance',
    false,
    true,
    CAST('[
        "{{ORGANISATION_NAME}}",
        "{{ORGANISATION_ADRESSE}}",
        "{{DSB_NAME}}",
        "{{DSB_KONTAKT}}",
        "{{ERSTELLT_VON}}",
        "{{ERSTELLT_AM}}",
        "{{VERSION}}",
        "{{GELTUNGSBEREICH}}",
        "{{SCHUTZBEDARF_VERTRAULICHKEIT}}",
        "{{SCHUTZBEDARF_INTEGRITAET}}",
        "{{SCHUTZBEDARF_VERFUEGBARKEIT}}",
        "{{GESAMTSCHUTZNIVEAU}}",
        "{{TOM_VERFUEGBARKEIT}}",
        "{{TOM_INTEGRITAET}}",
        "{{TOM_VERTRAULICHKEIT}}",
        "{{TOM_NICHTVERKETTUNG}}",
        "{{TOM_TRANSPARENZ}}",
        "{{TOM_INTERVENIERBARKEIT}}",
        "{{TOM_DATENMINIMIERUNG}}",
        "{{TOM_SEKTOR_ERGAENZUNGEN}}",
        "{{COMPLIANCE_BEWERTUNG}}",
        "{{NAECHSTE_UEBERPRUFUNG}}",
        "{{UEBERPRUFUNGSINTERVALL}}"
    ]' AS jsonb),
$template$# Technische und Organisatorische Massnahmen (TOM)
**gemaess Art. 32 DS-GVO — strukturiert nach SDM V3.1a**

---

## 1. Allgemeine Informationen

| Feld | Inhalt |
|------|--------|
| **Organisation** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} ({{DSB_KONTAKT}}) |
| **Erstellt von** | {{ERSTELLT_VON}} |
| **Erstellt am** | {{ERSTELLT_AM}} |
| **Version** | {{VERSION}} |

### 1.1 Geltungsbereich

{{GELTUNGSBEREICH}}

---

## 2. Schutzbedarfsanalyse

Die Schutzbedarfsanalyse bildet die Grundlage fuer die Auswahl angemessener Massnahmen. Der Schutzbedarf wird fuer die drei klassischen Schutzziele bewertet.

| Schutzziel | Schutzbedarf | Begruendung |
|------------|-------------|-------------|
| **Vertraulichkeit** | {{SCHUTZBEDARF_VERTRAULICHKEIT}} | |
| **Integritaet** | {{SCHUTZBEDARF_INTEGRITAET}} | |
| **Verfuegbarkeit** | {{SCHUTZBEDARF_VERFUEGBARKEIT}} | |

**Gesamtschutzniveau:** {{GESAMTSCHUTZNIVEAU}}

*Bewertungsskala: normal / hoch / sehr hoch*

---

## 3. Massnahmen nach SDM-Gewaehrleistungszielen

Die folgende Struktur folgt den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a) der Datenschutzkonferenz.

### 3.1 Verfuegbarkeit

**Ziel:** Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.

**Referenz:** SDM-Baustein 11 (Aufbewahren)

{{TOM_VERFUEGBARKEIT}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Redundante Datenhaltung (RAID, Replikation) | technisch | | IT-Betrieb | 12 Monate |
| Regelmaessige Backups (taeglich inkrementell, woechentlich voll) | technisch | | IT-Betrieb | 6 Monate |
| Disaster-Recovery-Plan mit dokumentierten RTO/RPO | organisatorisch | | IT-Sicherheit | 12 Monate |
| USV und Notstromversorgung | technisch | | Facility Mgmt | 12 Monate |
| Wiederherstellungstests (mind. jaehrlich) | organisatorisch | | IT-Betrieb | 12 Monate |

### 3.2 Integritaet

**Ziel:** Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.

**Referenz:** SDM-Baustein 61 (Berichtigen)

{{TOM_INTEGRITAET}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Pruefsummen und digitale Signaturen | technisch | | IT-Entwicklung | 12 Monate |
| Eingabevalidierung und Plausibilitaetspruefungen | technisch | | IT-Entwicklung | bei Release |
| Change-Management-Verfahren | organisatorisch | | IT-Betrieb | 12 Monate |
| Versionierung von Datensaetzen | technisch | | IT-Entwicklung | 12 Monate |

### 3.3 Vertraulichkeit

**Ziel:** Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.

**Referenz:** SDM-Baustein 51 (Zugriffe regeln)

{{TOM_VERTRAULICHKEIT}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Verschluesselung im Transit (TLS 1.3) | technisch | | IT-Sicherheit | 12 Monate |
| Verschluesselung at Rest (AES-256) | technisch | | IT-Sicherheit | 12 Monate |
| Rollenbasiertes Zugriffskonzept (RBAC, Least Privilege) | technisch | | IT-Sicherheit | 6 Monate |
| Multi-Faktor-Authentifizierung (MFA) | technisch | | IT-Sicherheit | 12 Monate |
| Physische Zutrittskontrolle (Schluessel, Kartenleser) | technisch | | Facility Mgmt | 12 Monate |
| Vertraulichkeitsverpflichtung Mitarbeitende | organisatorisch | | HR / DSB | bei Eintritt |
| Passwortrichtlinie (Komplexitaet, Ablauf, Historie) | organisatorisch | | IT-Sicherheit | 12 Monate |

### 3.4 Nichtverkettung

**Ziel:** Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.

**Referenz:** SDM-Baustein 50 (Trennen)

{{TOM_NICHTVERKETTUNG}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Mandantentrennung (logisch oder physisch) | technisch | | IT-Architektur | 12 Monate |
| Pseudonymisierung wo fachlich moeglich | technisch | | IT-Entwicklung | 12 Monate |
| Zweckbindungspruefung bei neuen Datennutzungen | organisatorisch | | DSB | bei Bedarf |
| Getrennte Datenbanken je Verarbeitungszweck | technisch | | IT-Architektur | 12 Monate |

### 3.5 Transparenz

**Ziel:** Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.

**Referenz:** SDM-Baustein 42 (Dokumentieren), SDM-Baustein 43 (Protokollieren)

{{TOM_TRANSPARENZ}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Verzeichnis der Verarbeitungstaetigkeiten (Art. 30) | organisatorisch | | DSB | 12 Monate |
| Vollstaendiges Audit-Log aller Datenzugriffe | technisch | | IT-Betrieb | 6 Monate |
| Datenschutzerklaerung (Art. 13/14 DS-GVO) | organisatorisch | | DSB / Recht | bei Aenderung |
| Dokumentierte Prozesse fuer Datenpannen-Meldung | organisatorisch | | DSB | 12 Monate |

### 3.6 Intervenierbarkeit

**Ziel:** Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.

**Referenz:** SDM-Baustein 60 (Loeschen), SDM-Baustein 61 (Berichtigen), SDM-Baustein 62 (Einschraenken)

{{TOM_INTERVENIERBARKEIT}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Prozess fuer Betroffenenanfragen (Auskunft, Loeschung, Berichtigung) | organisatorisch | | DSB | 12 Monate |
| Technische Loeschfaehigkeit mit Nachweis | technisch | | IT-Entwicklung | 12 Monate |
| Datenexport in maschinenlesbarem Format (Art. 20) | technisch | | IT-Entwicklung | 12 Monate |
| Sperrfunktion (Einschraenkung der Verarbeitung) | technisch | | IT-Entwicklung | 12 Monate |
| Widerspruchsmoeglichkeit gegen Verarbeitung | organisatorisch | | DSB | 12 Monate |

### 3.7 Datenminimierung

**Ziel:** Die Verarbeitung beschraenkt sich auf das erforderliche Mass.

**Referenz:** SDM-Baustein 41 (Planen und Spezifizieren)

{{TOM_DATENMINIMIERUNG}}

| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Regelmaessige Pruefung der Erforderlichkeit | organisatorisch | | DSB | 12 Monate |
| Automatisierte Loeschung nach Fristablauf | technisch | | IT-Entwicklung | 6 Monate |
| Anonymisierung fuer statistische Zwecke | technisch | | IT-Entwicklung | bei Bedarf |
| Privacy by Design bei neuen Verarbeitungen | organisatorisch | | IT-Architektur / DSB | bei Bedarf |
| Loeschfristenkatalog (dokumentiert) | organisatorisch | | DSB / Recht | 12 Monate |

---

## 4. Sektorspezifische Ergaenzungen

{{#IF TOM_SEKTOR_ERGAENZUNGEN}}
{{TOM_SEKTOR_ERGAENZUNGEN}}
{{/IF}}
{{#IF_NOT TOM_SEKTOR_ERGAENZUNGEN}}
Keine sektorspezifischen Ergaenzungen erforderlich.
{{/IF_NOT}}

---

## 5. Compliance-Bewertung

{{#IF COMPLIANCE_BEWERTUNG}}
{{COMPLIANCE_BEWERTUNG}}
{{/IF}}
{{#IF_NOT COMPLIANCE_BEWERTUNG}}
Die Compliance-Bewertung erfolgt nach erstmaliger Implementierung aller Massnahmen.
{{/IF_NOT}}

---

## 6. Ueberpruefungsplan

| Aspekt | Festlegung |
|--------|------------|
| **Regelmaessige Ueberpruefung** | {{UEBERPRUFUNGSINTERVALL}} |
| **Naechste geplante Ueberpruefung** | {{NAECHSTE_UEBERPRUFUNG}} |

**Trigger fuer ausserplanmaessige Ueberpruefung:**
- Sicherheitsvorfall oder Datenpanne
- Wesentliche Aenderung der Verarbeitungssysteme
- Neue regulatorische Anforderungen (z. B. NIS2, AI Act)
- Ergebnisse interner oder externer Audits

---

*Erstellt mit BreakPilot Compliance. Struktur basiert auf dem Standard-Datenschutzmodell (SDM V3.1a) der Datenschutzkonferenz.*
$template$
) ON CONFLICT DO NOTHING;
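Both templates above use `{{KEY}}` placeholders plus `{{#IF ...}}`/`{{#IF_NOT ...}}` blocks. The rendering engine is not part of this diff; a minimal regex-based sketch of the assumed semantics (a block survives only when its context value is truthy, missing placeholders render empty):

```python
import re

def render(template: str, ctx: dict) -> str:
    """Sketch of the assumed {{#IF}}/{{#IF_NOT}}/{{KEY}} template semantics."""
    def keep_if(m):      # {{#IF X}}...{{/IF}}: keep body only when ctx[X] is truthy
        return m.group(2) if ctx.get(m.group(1)) else ""
    def keep_if_not(m):  # {{#IF_NOT X}}...{{/IF_NOT}}: the inverse
        return "" if ctx.get(m.group(1)) else m.group(2)
    out = re.sub(r"\{\{#IF_NOT (\w+)\}\}(.*?)\{\{/IF_NOT\}\}", keep_if_not, template, flags=re.S)
    out = re.sub(r"\{\{#IF (\w+)\}\}(.*?)\{\{/IF\}\}", keep_if, out, flags=re.S)
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), "")), out)

tpl = "{{#IF DSB_DATUM}}Datum der Stellungnahme: {{DSB_DATUM}}{{/IF}}"
print(render(tpl, {"DSB_DATUM": "2024-05-01"}))  # Datum der Stellungnahme: 2024-05-01
print(repr(render(tpl, {})))                     # ''
```

Note this sketch is non-nesting and single-pass; the real renderer may differ.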
663  document-templates/migrations/003_vvt_sector_templates.sql  Normal file
@@ -0,0 +1,663 @@

-- Migration 003: VVT Sector Templates — Branchenspezifische Verarbeitungsverzeichnisse
-- 6 Branchen-Muster + 1 allgemeine V2-Vorlage

-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'vvt_register'
  AND status = 'published';

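The archive step implements a supersede pattern: every published `vvt_register` row is flipped to `archived` before the V2 rows are inserted, so only the new rows remain `published`. A small in-memory sketch of that invariant, with hypothetical dicts standing in for table rows:

```python
# Hypothetical stand-in for compliance.compliance_legal_templates rows.
rows = [
    {"document_type": "vvt_register", "version": "1.0", "status": "published"},
    {"document_type": "tom_doc", "version": "1.0", "status": "published"},
]

def supersede(rows, document_type, new_version):
    """Archive published rows of one document_type, then insert the new version as published."""
    for r in rows:
        if r["document_type"] == document_type and r["status"] == "published":
            r["status"] = "archived"  # mirrors: UPDATE ... SET status = 'archived'
    # mirrors the subsequent INSERT with status = 'published'
    rows.append({"document_type": document_type, "version": new_version, "status": "published"})

supersede(rows, "vvt_register", "2.0")
published = [r["version"] for r in rows
             if r["document_type"] == "vvt_register" and r["status"] == "published"]
# published == ["2.0"]; the tom_doc row is untouched.
```

Doing the UPDATE before the INSERTs keeps the migration safe to reason about: rows of other `document_type` values are never affected.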
-- 2. Allgemeine VVT V2 Vorlage (branchenuebergreifend)
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'Verzeichnis von Verarbeitungstaetigkeiten (VVT) gemaess Art. 30 DS-GVO — V2',
  'Erweiterte VVT-Vorlage mit vollstaendiger Art. 30 Struktur, Loeschfristen-Integration und DSFA-Verweis. Branchenuebergreifend einsetzbar.',
  'de',
  'EU/DSGVO',
  '2.0',
  'published',
  'MIT',
  'BreakPilot Compliance',
  false,
  true,
  CAST('[
    "{{ORGANISATION_NAME}}",
    "{{ORGANISATION_ADRESSE}}",
    "{{VERTRETER_NAME}}",
    "{{DSB_NAME}}",
    "{{DSB_KONTAKT}}",
    "{{ERSTELLT_AM}}",
    "{{VERSION}}",
    "{{VVT_NR}}",
    "{{VERARBEITUNG_NAME}}",
    "{{VERARBEITUNG_BESCHREIBUNG}}",
    "{{ZWECKE}}",
    "{{RECHTSGRUNDLAGEN}}",
    "{{BETROFFENE}}",
    "{{DATENKATEGORIEN}}",
    "{{EMPFAENGER}}",
    "{{DRITTLAND}}",
    "{{DRITTLAND_GARANTIEN}}",
    "{{LOESCHFRISTEN}}",
    "{{TOM_REFERENZ}}",
    "{{SYSTEME}}",
    "{{VERANTWORTLICHER}}",
    "{{RISIKOBEWERTUNG}}",
    "{{DSFA_ERFORDERLICH}}",
    "{{LETZTE_PRUEFUNG}}",
    "{{NAECHSTE_PRUEFUNG}}",
    "{{STATUS}}"
  ]' AS jsonb),
  $template$# Verzeichnis von Verarbeitungstaetigkeiten (VVT)
**gemaess Art. 30 DS-GVO**

---

## Angaben zum Verantwortlichen

| Feld | Inhalt |
|------|--------|
| **Name / Firma** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Vertreter des Verantwortlichen** | {{VERTRETER_NAME}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} ({{DSB_KONTAKT}}) |
| **Stand** | {{ERSTELLT_AM}} |
| **Version** | {{VERSION}} |

---

## Verarbeitungstaetigkeit

### Stammdaten

| Pflichtfeld (Art. 30) | Inhalt |
|------------------------|--------|
| **VVT-Nr.** | {{VVT_NR}} |
| **Bezeichnung** | {{VERARBEITUNG_NAME}} |
| **Beschreibung** | {{VERARBEITUNG_BESCHREIBUNG}} |

### Zweck und Rechtsgrundlage

| Pflichtfeld | Inhalt |
|-------------|--------|
| **Zweck(e) der Verarbeitung** | {{ZWECKE}} |
| **Rechtsgrundlage(n)** | {{RECHTSGRUNDLAGEN}} |

### Betroffene und Daten

| Pflichtfeld | Inhalt |
|-------------|--------|
| **Kategorien betroffener Personen** | {{BETROFFENE}} |
| **Kategorien personenbezogener Daten** | {{DATENKATEGORIEN}} |

### Empfaenger und Uebermittlung

| Pflichtfeld | Inhalt |
|-------------|--------|
| **Kategorien von Empfaengern** | {{EMPFAENGER}} |

{{#IF DRITTLAND}}
| **Uebermittlung in Drittlaender** | {{DRITTLAND}} |
| **Geeignete Garantien (Art. 46)** | {{DRITTLAND_GARANTIEN}} |
{{/IF}}

### Fristen und Schutzmassnahmen

| Pflichtfeld | Inhalt |
|-------------|--------|
| **Loeschfristen** | {{LOESCHFRISTEN}} |
| **TOM-Beschreibung (Art. 32)** | {{TOM_REFERENZ}} |

### Zusaetzliche Angaben (empfohlen)

| Feld | Inhalt |
|------|--------|
| **Eingesetzte Systeme** | {{SYSTEME}} |
| **Verantwortliche Abteilung** | {{VERANTWORTLICHER}} |
| **Risikobewertung** | {{RISIKOBEWERTUNG}} |
| **DSFA erforderlich?** | {{DSFA_ERFORDERLICH}} |
| **Letzte Pruefung** | {{LETZTE_PRUEFUNG}} |
| **Naechste Pruefung** | {{NAECHSTE_PRUEFUNG}} |
| **Status** | {{STATUS}} |

---

*Erstellt mit BreakPilot Compliance. Struktur entspricht Art. 30 Abs. 1 DS-GVO.*
$template$
) ON CONFLICT DO NOTHING;

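The `placeholders` column duplicates information that is already present in `content`, so the two can drift apart when a template is edited. A consistency check that extracts every `{{NAME}}` token from the template body and diffs it against the declared list catches such drift. This is an illustrative sketch (the sample strings are abbreviated, not the full template above):

```python
import re, json

def declared_vs_used(placeholders_json: str, content: str):
    """Compare the declared placeholder list against {{NAME}} tokens used in content."""
    declared = {p.strip("{}") for p in json.loads(placeholders_json)}
    # Section markers like {{#IF X}} / {{/IF}} contain '#' or '/', so the
    # [A-Z_]+ character class skips them: they are control flow, not placeholders.
    used = set(re.findall(r"\{\{([A-Z_]+)\}\}", content))
    return declared - used, used - declared

decl = '["{{VVT_NR}}", "{{ZWECKE}}"]'
body = "| **VVT-Nr.** | {{VVT_NR}} |\n| **Zweck** | {{ZWECKE}} |\n| {{STATUS}} |"
unused, undeclared = declared_vs_used(decl, body)
# unused == set(); undeclared == {"STATUS"}
```

Run against each row's `placeholders`/`content` pair, this could serve as a migration lint step before publishing a template version.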
-- 3. VVT Branchenvorlage: IT / SaaS
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: IT / SaaS-Unternehmen',
  'Vorbefuelltes Verarbeitungsverzeichnis mit typischen Verarbeitungstaetigkeiten eines IT- oder SaaS-Unternehmens. Enthaelt 8 Standard-Verarbeitungen.',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: IT / SaaS-Unternehmen

Die folgenden Verarbeitungstaetigkeiten sind typisch fuer IT- und SaaS-Unternehmen. Bitte pruefen und an Ihre konkrete Situation anpassen.

---

## VVT-001: SaaS-Plattformbetrieb

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bereitstellung und Betrieb der SaaS-Plattform fuer Kunden |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Endnutzer der Plattform |
| **Datenkategorien** | Stammdaten, Nutzungsdaten, Inhaltsdaten, technische Logdaten |
| **Empfaenger** | Hosting-Anbieter (AVV), Support-Dienstleister (AVV) |
| **Loeschfrist** | 90 Tage nach Vertragsende + gesetzliche Aufbewahrungsfristen |
| **TOM** | Siehe TOM-Dokumentation: Mandantentrennung, Verschluesselung, RBAC |
| **DSFA erforderlich?** | Abhaengig von Art und Umfang der verarbeiteten Daten |

## VVT-002: Kundenverwaltung / CRM

| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Kundenbeziehungen, Vertragsmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Ansprechpartner, Interessenten |
| **Datenkategorien** | Kontaktdaten, Vertragsdaten, Kommunikationshistorie |
| **Empfaenger** | CRM-Anbieter (AVV), ggf. Vertriebspartner |
| **Loeschfrist** | 3 Jahre nach letztem Kontakt (Verjaehrung), 10 Jahre Rechnungsdaten (HGB/AO) |
| **TOM** | Zugriffsbeschraenkung auf Vertrieb/Support, Protokollierung |

## VVT-003: E-Mail-Marketing / Newsletter

| Feld | Inhalt |
|------|--------|
| **Zweck** | Versand von Produkt-Updates, Marketing-Newsletter |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung) + UWG §7 |
| **Betroffene** | Newsletter-Abonnenten |
| **Datenkategorien** | E-Mail-Adresse, Name, Oeffnungs-/Klickverhalten |
| **Empfaenger** | E-Mail-Dienstleister (AVV) |
| **Loeschfrist** | Unverzueglich nach Widerruf der Einwilligung |
| **TOM** | Double-Opt-In, einfache Abmeldefunktion |

## VVT-004: Webanalyse

| Feld | Inhalt |
|------|--------|
| **Zweck** | Analyse der Website-Nutzung zur Verbesserung des Angebots |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung via Cookie-Banner) |
| **Betroffene** | Website-Besucher |
| **Datenkategorien** | IP-Adresse (anonymisiert), Seitenaufrufe, Verweildauer, Geraeteinformationen |
| **Empfaenger** | Analyse-Anbieter (AVV) |
| **Loeschfrist** | 14 Monate (max. Cookie-Laufzeit) |
| **TOM** | IP-Anonymisierung, Cookie-Consent-Management (TDDDG §25) |

## VVT-005: Bewerbermanagement

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Bewerbungen, Auswahlverfahren |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO i.V.m. §26 BDSG (Beschaeftigungsverhaeltnis) |
| **Betroffene** | Bewerberinnen und Bewerber |
| **Datenkategorien** | Kontaktdaten, Lebenslauf, Qualifikationen, Bewerbungsunterlagen |
| **Empfaenger** | Fachabteilung, ggf. Personaldienstleister (AVV) |
| **Loeschfrist** | 6 Monate nach Abschluss des Verfahrens (AGG-Frist) |
| **TOM** | Zugriffsschutz auf Bewerbungsportal, verschluesselte Uebertragung |

## VVT-006: Mitarbeiterverwaltung / HR

| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Lohn-/Gehaltsabrechnung, Arbeitszeiterfassung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Sozialversicherung, Arbeitszeitdaten |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Sozialversicherungstraeger |
| **Loeschfrist** | 10 Jahre nach Austritt (steuerliche Aufbewahrung), Personalakte 3 Jahre |
| **TOM** | Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung |

## VVT-007: Support-Ticketing

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Kundenanfragen und Stoerungsmeldungen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Endnutzer |
| **Datenkategorien** | Kontaktdaten, Ticket-Inhalt, Screenshots, Systemlogs |
| **Empfaenger** | Support-Tool-Anbieter (AVV), ggf. Entwicklungsteam |
| **Loeschfrist** | 2 Jahre nach Ticket-Schliessung |
| **TOM** | Rollenbasierter Zugriff, Pseudonymisierung in internen Reports |

## VVT-008: Logging und Monitoring

| Feld | Inhalt |
|------|--------|
| **Zweck** | Sicherheitsueberwachung, Fehleranalyse, Leistungsoptimierung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: IT-Sicherheit) |
| **Betroffene** | Nutzer der Plattform, Administratoren |
| **Datenkategorien** | IP-Adressen, Zugriffszeitpunkte, Fehlerprotokolle, Performance-Metriken |
| **Empfaenger** | Log-Management-Anbieter (AVV) |
| **Loeschfrist** | 30 Tage Anwendungslogs, 90 Tage Sicherheitslogs |
| **TOM** | Zugriffsschutz auf Logdaten, automatische Rotation |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage IT / SaaS.*
$template$
) ON CONFLICT DO NOTHING;

-- 4. VVT Branchenvorlage: Gesundheitswesen
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: Gesundheitswesen',
  'Vorbefuelltes Verarbeitungsverzeichnis mit typischen Verarbeitungen im Gesundheitswesen (Arztpraxis, MVZ, Klinik). Beruecksichtigt besondere Kategorien nach Art. 9 DS-GVO.',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: Gesundheitswesen

Typische Verarbeitungstaetigkeiten fuer Arztpraxen, MVZ und Kliniken. **Besonderheit:** Verarbeitung besonderer Kategorien personenbezogener Daten (Art. 9 DS-GVO — Gesundheitsdaten).

---

## VVT-G01: Patientenverwaltung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Fuehrung der Patientenakte, Behandlungsdokumentation |
| **Rechtsgrundlage** | Art. 9 Abs. 2 lit. h DS-GVO i.V.m. §630f BGB (Dokumentationspflicht) |
| **Betroffene** | Patienten |
| **Datenkategorien** | Stammdaten, Versicherungsdaten, Diagnosen, Befunde, Behandlungsverlaeufe (Art. 9) |
| **Empfaenger** | Praxisverwaltungssystem-Anbieter (AVV), Labor (AVV), ueberweisende Aerzte |
| **Loeschfrist** | 10 Jahre nach letzter Behandlung (§630f Abs. 3 BGB), Strahlenpass 30 Jahre |
| **TOM** | Verschluesselung Patientenakte, Zugriffsschutz (nur behandelnde Aerzte), Notfallzugriff |
| **DSFA erforderlich?** | Ja (umfangreiche Verarbeitung Art. 9 Daten) |

## VVT-G02: Terminmanagement

| Feld | Inhalt |
|------|--------|
| **Zweck** | Organisation und Verwaltung von Patienten-Terminen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Behandlungsvertrag) |
| **Betroffene** | Patienten |
| **Datenkategorien** | Name, Kontaktdaten, Terminwunsch, ggf. Behandlungsgrund |
| **Empfaenger** | Online-Terminbuchungs-Anbieter (AVV) |
| **Loeschfrist** | 6 Monate nach Termin (sofern nicht zur Patientenakte) |
| **TOM** | Verschluesselte Uebertragung, Zugriffsschutz Terminkalender |

## VVT-G03: Abrechnung (KV / PKV)

| Feld | Inhalt |
|------|--------|
| **Zweck** | Abrechnung aerztlicher Leistungen gegenueber Krankenkassen / Privatpatienten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c DS-GVO (gesetzliche Pflicht), Art. 9 Abs. 2 lit. h |
| **Betroffene** | Patienten |
| **Datenkategorien** | Stammdaten, Versicherungsdaten, Diagnosen (ICD), Leistungsziffern (EBM/GOAe) |
| **Empfaenger** | KV (Kassenaerztliche Vereinigung), PKV, Abrechnungsstelle (AVV) |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung AO) |
| **TOM** | Verschluesselte Datenuebermittlung (KV-Connect/KIM), Zugriffskontrolle |

## VVT-G04: Laborbefunde

| Feld | Inhalt |
|------|--------|
| **Zweck** | Beauftragung und Empfang von Laboruntersuchungen |
| **Rechtsgrundlage** | Art. 9 Abs. 2 lit. h DS-GVO |
| **Betroffene** | Patienten |
| **Datenkategorien** | Proben-ID, Untersuchungsparameter, Befundergebnisse (Art. 9) |
| **Empfaenger** | Labordienstleister (AVV) |
| **Loeschfrist** | 10 Jahre (Dokumentationspflicht) |
| **TOM** | Pseudonymisierung der Proben, verschluesselte Uebertragung |

## VVT-G05: Mitarbeiterverwaltung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Dienstplanung, Lohnabrechnung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte (Aerzte, MFA, Verwaltung) |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Dienstzeiten |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Sozialversicherungstraeger |
| **Loeschfrist** | 10 Jahre nach Austritt |
| **TOM** | Zugriffsschutz (nur HR/Praxisleitung) |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage Gesundheitswesen.*
$template$
) ON CONFLICT DO NOTHING;

-- 5. VVT Branchenvorlage: Handel / E-Commerce
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: Handel / E-Commerce',
  'Vorbefuelltes Verarbeitungsverzeichnis fuer Online-Shops und Einzelhaendler. Beruecksichtigt TDDDG, Fernabsatzrecht und Zahlungsdienste.',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: Handel / E-Commerce

Typische Verarbeitungstaetigkeiten fuer Online-Shops und Einzelhandel.

---

## VVT-H01: Bestellabwicklung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bestellannahme, Versand, Rechnungsstellung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden (Besteller) |
| **Datenkategorien** | Kontaktdaten, Lieferadresse, Bestelldaten, Rechnungsdaten |
| **Empfaenger** | Versanddienstleister, Zahlungsanbieter (AVV), Warenwirtschaft |
| **Loeschfrist** | 10 Jahre Rechnungsdaten (AO/HGB), 3 Jahre Bestelldaten (Verjaehrung) |
| **TOM** | Verschluesselte Uebertragung, Zugriffsschutz Bestellsystem |

## VVT-H02: Kundenkonto

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bereitstellung eines Kundenkontos (optional, nicht Pflicht) |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a/b DS-GVO |
| **Betroffene** | Registrierte Kunden |
| **Datenkategorien** | Stammdaten, Passwort (gehasht), Bestellhistorie, Wunschliste |
| **Empfaenger** | Shop-Plattform-Anbieter (AVV) |
| **Loeschfrist** | Unverzueglich nach Antrag auf Kontoloeschung, Rechnungsdaten 10 Jahre |
| **TOM** | MFA-Option, sichere Passwortspeicherung (bcrypt), Gastzugang-Alternative |

## VVT-H03: Zahlungsabwicklung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Abwicklung von Zahlungsvorgaengen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Zahlende Kunden |
| **Datenkategorien** | Zahlungsart, Transaktionsdaten (keine Kartennummern bei Tokenisierung) |
| **Empfaenger** | Payment-Service-Provider (eigene Verantwortung oder AVV) |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung) |
| **TOM** | PCI-DSS Compliance, Tokenisierung, keine direkte Kartenspeicherung |

## VVT-H04: Newsletter / E-Mail-Marketing

| Feld | Inhalt |
|------|--------|
| **Zweck** | Versand von Angeboten und Produktneuheiten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung) + UWG §7 Abs. 3 (Bestandskunden) |
| **Betroffene** | Newsletter-Abonnenten |
| **Datenkategorien** | E-Mail-Adresse, Name, Kaufhistorie (Bestandskunden), Oeffnungsraten |
| **Empfaenger** | Newsletter-Dienstleister (AVV) |
| **Loeschfrist** | Sofort nach Abmeldung |
| **TOM** | Double-Opt-In, Abmeldelink in jeder E-Mail |

## VVT-H05: Webanalyse und Tracking

| Feld | Inhalt |
|------|--------|
| **Zweck** | Analyse des Nutzerverhaltens im Shop, Conversion-Optimierung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung, TDDDG §25) |
| **Betroffene** | Website-Besucher |
| **Datenkategorien** | Anonymisierte IP, Seitenaufrufe, Klickpfade, Warenkorbdaten |
| **Empfaenger** | Analyse-Anbieter (AVV) |
| **Loeschfrist** | 14 Monate |
| **TOM** | IP-Anonymisierung, Cookie-Consent-Management, Opt-Out |

## VVT-H06: Retouren und Widerruf

| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Retouren und Widerrufen (Fernabsatzrecht) |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO |
| **Betroffene** | Kunden (Verbraucher) |
| **Datenkategorien** | Bestelldaten, Retourengrund, Erstattungsdaten |
| **Empfaenger** | Logistikdienstleister, Zahlungsanbieter |
| **Loeschfrist** | 3 Jahre (Verjaehrung), Buchhaltung 10 Jahre |
| **TOM** | Nachvollziehbare Retourenprozesse, Zugriffsbeschraenkung |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage Handel / E-Commerce.*
$template$
) ON CONFLICT DO NOTHING;

-- 6. VVT Branchenvorlage: Handwerk
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: Handwerksbetrieb',
  'Vorbefuelltes Verarbeitungsverzeichnis fuer Handwerksbetriebe (Bau, Kfz, Elektro, etc.).',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: Handwerksbetrieb

Typische Verarbeitungstaetigkeiten fuer Handwerksbetriebe.

---

## VVT-HW01: Kundenauftraege und Angebotserstellung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Angebotserstellung, Auftragsabwicklung, Rechnungsstellung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden (Privat und Gewerbe) |
| **Datenkategorien** | Kontaktdaten, Objektadresse, Auftragsbeschreibung, Rechnungsdaten |
| **Empfaenger** | Buchhaltung, Steuerberater, ggf. Subunternehmer |
| **Loeschfrist** | 10 Jahre Rechnungen (AO/HGB), 5 Jahre Gewaehrleistung (BGB) |
| **TOM** | Zugriffskontrolle Auftragssystem, verschluesselte Speicherung |

## VVT-HW02: Mitarbeiterverwaltung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte, Auszubildende |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Arbeitszeiten, Gesundheitszeugnisse |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Berufsgenossenschaft |
| **Loeschfrist** | 10 Jahre nach Austritt |
| **TOM** | Verschlossene Personalakte, Zugriffsschutz |

## VVT-HW03: Baustellendokumentation

| Feld | Inhalt |
|------|--------|
| **Zweck** | Dokumentation von Baufortschritt, Maengelprotokoll |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/f DS-GVO (Vertrag + berechtigtes Interesse) |
| **Betroffene** | Kunden, Mitarbeitende auf der Baustelle |
| **Datenkategorien** | Fotos (ggf. mit Personen), Protokolle, Abnahmedokumente |
| **Empfaenger** | Auftraggeber, Architekten, Baugutachter |
| **Loeschfrist** | 5 Jahre nach Abnahme (Verjaehrung), Fotos nach Projektabschluss |
| **TOM** | Beschraenkter Zugriff auf Projektordner, keine oeffentliche Cloud ohne AVV |

## VVT-HW04: Materialwirtschaft

| Feld | Inhalt |
|------|--------|
| **Zweck** | Materialbeschaffung, Lagerverwaltung, Lieferantenmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Lieferanten (Ansprechpartner) |
| **Datenkategorien** | Firmendaten, Ansprechpartner, Bestellhistorie, Konditionen |
| **Empfaenger** | Grosshandel, Buchhaltung |
| **Loeschfrist** | 6 Jahre (Handelsbriefe HGB), 10 Jahre Rechnungen |
| **TOM** | Zugriffskontrolle ERP/Warenwirtschaft |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage Handwerksbetrieb.*
$template$
) ON CONFLICT DO NOTHING;

-- 7. VVT Branchenvorlage: Bildung
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: Bildungseinrichtung',
  'Vorbefuelltes Verarbeitungsverzeichnis fuer Schulen, Hochschulen und Bildungstraeger. Beruecksichtigt Schueler-/Studentendaten als schutzbeduerftige Betroffene.',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: Bildungseinrichtung

Typische Verarbeitungstaetigkeiten fuer Schulen, Hochschulen und Bildungstraeger.

---

## VVT-B01: Schueler-/Studierendenverwaltung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Schueler-/Studierendendaten, Anmeldung, Klassenzuordnung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Landesschulgesetz |
| **Betroffene** | Schueler/Studierende (ggf. Minderjaehrige — besonders schutzbeduerftig), Erziehungsberechtigte |
| **Datenkategorien** | Stammdaten, Kontaktdaten Erziehungsberechtigte, Klassenzuordnung |
| **Empfaenger** | Schulverwaltungssoftware-Anbieter (AVV), Schulbehoerde |
| **Loeschfrist** | Gemaess Landesschulgesetz (i.d.R. 5 Jahre nach Abgang) |
| **TOM** | Besonderer Zugriffsschutz, Altersverifizierung, Einwilligung Erziehungsberechtigte |
| **DSFA erforderlich?** | Ja (schutzbeduerftige Betroffene, ggf. grosser Umfang) |

## VVT-B02: Notenverarbeitung und Zeugniserstellung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Leistungsbewertung, Zeugnis- und Notenverwaltung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz |
| **Betroffene** | Schueler/Studierende |
| **Datenkategorien** | Noten, Leistungsbewertungen, Pruefungsergebnisse |
| **Empfaenger** | Lehrkraefte, Schulleitung, Pruefungsamt |
| **Loeschfrist** | Zeugniskopien: 50 Jahre (Nachweispflicht), Einzelnoten: 2 Jahre |
| **TOM** | Zugriffsbeschraenkung auf Fachlehrkraft, verschluesselte Speicherung |

## VVT-B03: Lernplattform / LMS

| Feld | Inhalt |
|------|--------|
| **Zweck** | Digitaler Unterricht, Aufgabenverteilung, Kommunikation |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. e DS-GVO (oeffentliches Interesse) / lit. a (Einwilligung bei Minderjaehrigen) |
| **Betroffene** | Schueler/Studierende, Lehrkraefte |
| **Datenkategorien** | Nutzungsdaten, eingereichte Aufgaben, Chat-Nachrichten |
| **Empfaenger** | LMS-Anbieter (AVV), Hosting-Provider (AVV) |
| **Loeschfrist** | Kursende + 1 Schuljahr |
| **TOM** | Datensparsamkeit, keine Lernanalytics ohne Einwilligung, Hosting in EU |

## VVT-B04: Elternkommunikation

| Feld | Inhalt |
|------|--------|
| **Zweck** | Information und Kommunikation mit Erziehungsberechtigten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. e DS-GVO |
| **Betroffene** | Erziehungsberechtigte |
| **Datenkategorien** | Kontaktdaten, Nachrichteninhalt |
| **Empfaenger** | Kommunikationsplattform-Anbieter (AVV) |
| **Loeschfrist** | Ende des Schuljahres bzw. Abgang des Kindes |
| **TOM** | Verschluesselte Kommunikation, kein WhatsApp/Social Media |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage Bildungseinrichtung.*
$template$
) ON CONFLICT DO NOTHING;

-- 8. VVT Branchenvorlage: Beratung / Dienstleistung
INSERT INTO compliance.compliance_legal_templates (
  tenant_id, document_type, title, description, language, jurisdiction,
  version, status, license_name, source_name, attribution_required,
  is_complete_document, placeholders, content
) VALUES (
  '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
  'vvt_register',
  'VVT Branchenvorlage: Beratung / Dienstleistung',
  'Vorbefuelltes Verarbeitungsverzeichnis fuer Beratungsunternehmen, Kanzleien und Dienstleister.',
  'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
  '[]'::jsonb,
  $template$# VVT Branchenvorlage: Beratung / Dienstleistung

Typische Verarbeitungstaetigkeiten fuer Beratungsunternehmen, Kanzleien und professionelle Dienstleister.

---

## VVT-D01: Mandantenverwaltung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Mandanten-/Kundenbeziehungen, Vertragsdokumentation |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Mandanten, Ansprechpartner |
| **Datenkategorien** | Kontaktdaten, Vertragsdaten, Korrespondenz, Rechnungsdaten |
| **Empfaenger** | Kanzleisoftware-Anbieter (AVV), Steuerberater |
| **Loeschfrist** | 10 Jahre Rechnungen, 5 Jahre Handakten (Berufsrecht), 3 Jahre sonstige |
| **TOM** | Mandantengeheimnis, verschluesselte Speicherung, Need-to-know-Prinzip |

## VVT-D02: Projektmanagement

| Feld | Inhalt |
|------|--------|
| **Zweck** | Planung und Steuerung von Beratungsprojekten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/f DS-GVO |
| **Betroffene** | Projektbeteiligte (Mandant + intern) |
| **Datenkategorien** | Projektdaten, Aufgaben, Zeiterfassung, Ergebnisdokumente |
| **Empfaenger** | Projektmanagement-Tool (AVV), Mandant |
| **Loeschfrist** | 2 Jahre nach Projektabschluss |
| **TOM** | Projektspezifische Zugriffsrechte, Mandantentrennung |

## VVT-D03: Zeiterfassung und Abrechnung

| Feld | Inhalt |
|------|--------|
| **Zweck** | Erfassung geleisteter Stunden, Abrechnung gegenueber Mandanten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Berater/Mitarbeitende, Mandanten |
| **Datenkategorien** | Arbeitszeiten, Taetigkeitsbeschreibungen, Stundensaetze |
| **Empfaenger** | Abrechnungssystem (AVV), Buchhaltung |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung) |
| **TOM** | Zugriffsbeschraenkung (nur eigene Zeiten + Projektleitung) |

## VVT-D04: Dokumentenmanagement

| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung und Archivierung von Mandantendokumenten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO |
| **Betroffene** | Mandanten, ggf. Dritte in Dokumenten |
| **Datenkategorien** | Vertraege, Gutachten, Korrespondenz, Berichte |
| **Empfaenger** | DMS-Anbieter (AVV), Cloud-Speicher (AVV) |
| **Loeschfrist** | Gemaess Berufsrecht und Mandatsvereinbarung |
| **TOM** | Dokumentenklassifizierung, Versionierung, Zugriffsprotokollierung |

## VVT-D05: CRM und Akquise

| Feld | Inhalt |
|------|--------|
| **Zweck** | Kontaktpflege, Akquise, Beziehungsmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: Geschaeftsanbahnung) |
| **Betroffene** | Interessenten, Geschaeftskontakte |
| **Datenkategorien** | Kontaktdaten, Firma, Branche, Gespraechsnotizen |
| **Empfaenger** | CRM-Anbieter (AVV) |
| **Loeschfrist** | 3 Jahre nach letztem Kontakt |
| **TOM** | Widerspruchsmoeglichkeit, Datenminimierung |

---

*Erstellt mit BreakPilot Compliance. Branchenvorlage Beratung / Dienstleistung.*
$template$
) ON CONFLICT DO NOTHING;

212
document-templates/migrations/004_avv_template.sql
Normal file
@@ -0,0 +1,212 @@
-- Migration 004: AVV Template — Auftragsverarbeitungsvertrag (Art. 28 DS-GVO)
-- Deutsche AVV-Vorlage mit allen Pflichtinhalten.

INSERT INTO compliance.compliance_legal_templates (
    tenant_id, document_type, title, description, language, jurisdiction,
    version, status, license_name, source_name, attribution_required,
    is_complete_document, placeholders, content
) VALUES (
    '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
    'dpa',
    'Auftragsverarbeitungsvertrag (AVV) gemaess Art. 28 DS-GVO',
    'Vollstaendiger Auftragsverarbeitungsvertrag mit allen Pflichtinhalten nach Art. 28 Abs. 3 DS-GVO. Inkl. TOM-Anlage und Drittlandtransfer-Klausel.',
    'de',
    'EU/DSGVO',
    '2.0',
    'published',
    'MIT',
    'BreakPilot Compliance',
    false,
    true,
    CAST('[
        "{{VERANTWORTLICHER_NAME}}",
        "{{VERANTWORTLICHER_ADRESSE}}",
        "{{VERANTWORTLICHER_VERTRETER}}",
        "{{AUFTRAGSVERARBEITER_NAME}}",
        "{{AUFTRAGSVERARBEITER_ADRESSE}}",
        "{{AUFTRAGSVERARBEITER_VERTRETER}}",
        "{{VERTRAGSGEGENSTAND}}",
        "{{VERTRAGSDAUER}}",
        "{{VERARBEITUNGSZWECK}}",
        "{{ART_DER_VERARBEITUNG}}",
        "{{DATENKATEGORIEN}}",
        "{{BETROFFENE}}",
        "{{UNTERAUFTRAGSVERARBEITER_LISTE}}",
        "{{TOM_ANLAGE}}",
        "{{DRITTLANDTRANSFER_DETAILS}}",
        "{{ORT_DATUM}}",
        "{{WEISUNGSBERECHTIGTER}}",
        "{{KONTAKT_DATENSCHUTZ_AV}}"
    ]' AS jsonb),
$template$# Auftragsverarbeitungsvertrag (AVV)
**gemaess Art. 28 Abs. 3 DS-GVO**

---

## Vertragsparteien

**Verantwortlicher (Auftraggeber):**
{{VERANTWORTLICHER_NAME}}
{{VERANTWORTLICHER_ADRESSE}}
Vertreten durch: {{VERANTWORTLICHER_VERTRETER}}

**Auftragsverarbeiter (Auftragnehmer):**
{{AUFTRAGSVERARBEITER_NAME}}
{{AUFTRAGSVERARBEITER_ADRESSE}}
Vertreten durch: {{AUFTRAGSVERARBEITER_VERTRETER}}

---

## §1 Gegenstand und Dauer

(1) Der Auftragsverarbeiter verarbeitet personenbezogene Daten im Auftrag des Verantwortlichen. Gegenstand der Auftragsverarbeitung ist:

{{VERTRAGSGEGENSTAND}}

(2) Die Dauer der Verarbeitung entspricht der Laufzeit des Hauptvertrags: {{VERTRAGSDAUER}}.

---

## §2 Art und Zweck der Verarbeitung

(1) **Zweck:** {{VERARBEITUNGSZWECK}}

(2) **Art der Verarbeitung:** {{ART_DER_VERARBEITUNG}}

---

## §3 Art der personenbezogenen Daten

{{DATENKATEGORIEN}}

---

## §4 Kategorien betroffener Personen

{{BETROFFENE}}

---

## §5 Pflichten des Verantwortlichen

(1) Der Verantwortliche ist fuer die Rechtmaessigkeit der Datenverarbeitung verantwortlich.

(2) Der Verantwortliche erteilt Weisungen zur Datenverarbeitung. Weisungsberechtigt ist: {{WEISUNGSBERECHTIGTER}}.

(3) Der Verantwortliche informiert den Auftragsverarbeiter unverzueglich, wenn er Fehler oder Unregelmaessigkeiten feststellt.

(4) Der Verantwortliche ist verpflichtet, alle im Rahmen des Vertragsverhaeltnisses erlangten Kenntnisse vertraulich zu behandeln.

---

## §6 Pflichten des Auftragsverarbeiters

(1) Der Auftragsverarbeiter verarbeitet die Daten ausschliesslich auf dokumentierte Weisung des Verantwortlichen (Art. 28 Abs. 3 lit. a DS-GVO), es sei denn, er ist durch Unionsrecht oder nationales Recht hierzu verpflichtet.

(2) Der Auftragsverarbeiter gewaehrleistet, dass sich die zur Verarbeitung befugten Personen zur Vertraulichkeit verpflichtet haben oder einer angemessenen gesetzlichen Verschwiegenheitspflicht unterliegen (Art. 28 Abs. 3 lit. b).

(3) Der Auftragsverarbeiter trifft alle erforderlichen technischen und organisatorischen Massnahmen gemaess Art. 32 DS-GVO (siehe Anlage 1: TOM).

(4) Der Auftragsverarbeiter beachtet die Bedingungen fuer die Inanspruchnahme von Unterauftragsverarbeitern (§7 dieses Vertrags).

(5) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Erfuellung der Betroffenenrechte (Art. 15-22 DS-GVO) durch geeignete technische und organisatorische Massnahmen (Art. 28 Abs. 3 lit. e).

(6) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Einhaltung der Pflichten aus Art. 32-36 DS-GVO (Sicherheit, Meldepflichten, DSFA, Konsultation).

(7) Der Auftragsverarbeiter loescht oder gibt nach Wahl des Verantwortlichen alle personenbezogenen Daten nach Beendigung der Auftragsverarbeitung zurueck und loescht vorhandene Kopien, es sei denn, eine Aufbewahrungspflicht besteht (Art. 28 Abs. 3 lit. g).

(8) Der Auftragsverarbeiter stellt dem Verantwortlichen alle erforderlichen Informationen zum Nachweis der Einhaltung der Pflichten zur Verfuegung und ermoeglicht Ueberpruefungen/Audits (Art. 28 Abs. 3 lit. h).

(9) Der Auftragsverarbeiter informiert den Verantwortlichen unverzueglich, wenn eine Weisung nach seiner Auffassung gegen datenschutzrechtliche Vorschriften verstoesst.

(10) Der Auftragsverarbeiter benennt einen Ansprechpartner fuer den Datenschutz: {{KONTAKT_DATENSCHUTZ_AV}}.

---

## §7 Unterauftragsverarbeitung

(1) Der Auftragsverarbeiter darf Unterauftragsverarbeiter nur mit vorheriger schriftlicher Genehmigung des Verantwortlichen einsetzen. Es wird eine allgemeine Genehmigung erteilt, wobei der Auftragsverarbeiter den Verantwortlichen ueber beabsichtigte Aenderungen mindestens 14 Tage im Voraus informiert. Der Verantwortliche kann Einspruch erheben.

(2) Aktuelle Unterauftragsverarbeiter:

{{UNTERAUFTRAGSVERARBEITER_LISTE}}

(3) Der Auftragsverarbeiter stellt vertraglich sicher, dass die Unterauftragsverarbeiter dieselben Datenschutzpflichten einhalten.

{{#IF DRITTLANDTRANSFER_DETAILS}}
---

## §8 Uebermittlung in Drittlaender

(1) Eine Uebermittlung personenbezogener Daten in Drittlaender erfolgt nur unter Einhaltung der Voraussetzungen der Art. 44-49 DS-GVO.

(2) Details:

{{DRITTLANDTRANSFER_DETAILS}}
{{/IF}}

---

## §9 Kontrollrechte und Audits

(1) Der Verantwortliche hat das Recht, die Einhaltung der Vorschriften durch den Auftragsverarbeiter zu ueberpruefen. Dies umfasst Inspektionen vor Ort, Dokumentenpruefungen und die Einholung von Auskuenften.

(2) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Durchfuehrung und gewaehrt Zugang zu relevanten Raeumlichkeiten und Systemen mit angemessener Vorankuendigung (in der Regel 14 Tage).

(3) Alternativ kann der Auftragsverarbeiter aktuelle Zertifizierungen (z. B. ISO 27001, SOC 2) oder Auditberichte unabhaengiger Pruefer vorlegen.

---

## §10 Meldung von Datenpannen

(1) Der Auftragsverarbeiter informiert den Verantwortlichen unverzueglich (in der Regel innerhalb von 24 Stunden) nach Kenntniserlangung ueber eine Verletzung des Schutzes personenbezogener Daten (Art. 33 Abs. 2 DS-GVO).

(2) Die Meldung umfasst mindestens die Art der Datenpanne, die betroffenen Kategorien und ungefaehre Anzahl der Betroffenen, die wahrscheinlichen Folgen und die ergriffenen Gegenmassnahmen.

---

## §11 Haftung

Die Haftung richtet sich nach Art. 82 DS-GVO. Der Auftragsverarbeiter haftet fuer Schaeden, die durch eine nicht den Vorgaben der DS-GVO entsprechende Verarbeitung oder durch Handeln entgegen den Weisungen des Verantwortlichen verursacht wurden.

---

## §12 Laufzeit und Kuendigung

(1) Dieser AVV tritt mit Unterzeichnung in Kraft und endet automatisch mit Beendigung des Hauptvertrags.

(2) Eine ausserordentliche Kuendigung ist bei schwerem Verstoss gegen diesen Vertrag oder datenschutzrechtliche Vorschriften moeglich.

(3) Nach Vertragsende hat der Auftragsverarbeiter alle personenbezogenen Daten gemaess §6 Abs. 7 zu loeschen oder zurueckzugeben.

---

## §13 Schlussbestimmungen

(1) Aenderungen dieses Vertrags beduerfen der Schriftform.

(2) Sollten einzelne Bestimmungen unwirksam sein, bleibt die Wirksamkeit des uebrigen Vertrags unberuehrt.

(3) Es gilt das Recht der Bundesrepublik Deutschland.

---

## Anlage 1: Technische und Organisatorische Massnahmen (TOM)

{{TOM_ANLAGE}}

---

## Unterschriften

| | Verantwortlicher | Auftragsverarbeiter |
|---|---|---|
| **Ort, Datum** | {{ORT_DATUM}} | {{ORT_DATUM}} |
| **Name** | {{VERANTWORTLICHER_VERTRETER}} | {{AUFTRAGSVERARBEITER_VERTRETER}} |
| **Unterschrift** | _________________ | _________________ |

---

*Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;
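The `placeholders` column above duplicates the `{{...}}` tokens that appear in `content`, so the two can drift apart. A minimal consistency check, not part of the migrations themselves; the function name and regex are assumptions for this sketch:

```python
import json
import re

def check_placeholder_coverage(placeholders_json: str, content: str) -> list[str]:
    """Return declared placeholders that never occur in the template content.

    The regex matches plain {{NAME}} tokens only; conditional markers like
    {{#IF X}} are intentionally excluded by requiring [A-Z0-9_] after '{{'.
    """
    declared = json.loads(placeholders_json)
    used = set(re.findall(r'\{\{[A-Z0-9_]+\}\}', content))
    return [p for p in declared if p not in used]

# Example: one declared placeholder is missing from the body
missing = check_placeholder_coverage(
    '["{{VERTRAGSDAUER}}", "{{TOM_ANLAGE}}"]',
    "Laufzeit: {{VERTRAGSDAUER}}."
)
```

Running such a check against each migration before publishing a template would catch renamed or dropped placeholders early.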
249
document-templates/migrations/005_additional_templates.sql
Normal file
@@ -0,0 +1,249 @@
-- Migration 005: Zusaetzliche Templates — Verpflichtungserklaerung + Art. 13/14

-- 1. Verpflichtungserklaerung (Vertraulichkeit Mitarbeitende)
INSERT INTO compliance.compliance_legal_templates (
    tenant_id, document_type, title, description, language, jurisdiction,
    version, status, license_name, source_name, attribution_required,
    is_complete_document, placeholders, content
) VALUES (
    '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
    'verpflichtungserklaerung',
    'Verpflichtungserklaerung auf das Datengeheimnis',
    'Vorlage zur Verpflichtung von Mitarbeitenden auf die Vertraulichkeit und das Datengeheimnis gemaess DS-GVO. Fuer Onboarding-Prozesse.',
    'de',
    'DE',
    '1.0',
    'published',
    'MIT',
    'BreakPilot Compliance',
    false,
    true,
    CAST('[
        "{{UNTERNEHMEN_NAME}}",
        "{{UNTERNEHMEN_ADRESSE}}",
        "{{MITARBEITER_NAME}}",
        "{{MITARBEITER_ABTEILUNG}}",
        "{{DSB_NAME}}",
        "{{DSB_KONTAKT}}",
        "{{ORT_DATUM}}",
        "{{SCHULUNGSDATUM}}"
    ]' AS jsonb),
$template$# Verpflichtung auf das Datengeheimnis
**gemaess Art. 28 Abs. 3 lit. b, Art. 29, Art. 32 Abs. 4 DS-GVO**

---

## 1. Verpflichtung

Ich, **{{MITARBEITER_NAME}}**, Abteilung **{{MITARBEITER_ABTEILUNG}}**, werde hiermit auf die Vertraulichkeit im Umgang mit personenbezogenen Daten verpflichtet.

**Arbeitgeber:** {{UNTERNEHMEN_NAME}}, {{UNTERNEHMEN_ADRESSE}}

Ich verpflichte mich, personenbezogene Daten, die mir im Rahmen meiner Taetigkeit bekannt werden, nur gemaess den erteilten Weisungen zu verarbeiten. Diese Verpflichtung gilt auch nach Beendigung des Beschaeftigungsverhaeltnisses fort.

---

## 2. Pflichten im Einzelnen

Mir ist bekannt, dass ich verpflichtet bin:

- Personenbezogene Daten nur im Rahmen meiner Aufgaben und nach Weisung des Verantwortlichen zu verarbeiten.
- Die Vertraulichkeit personenbezogener Daten zu wahren und diese nicht unbefugt an Dritte weiterzugeben.
- Personenbezogene Daten vor unbefugtem Zugriff, Verlust und Missbrauch zu schuetzen.
- Den Datenschutzbeauftragten unverzueglich ueber Datenschutzvorfaelle oder -verletzungen zu informieren.
- Keine personenbezogenen Daten fuer private Zwecke zu verwenden.
- Mobile Datentraeger und Zugangsmedien sorgfaeltig aufzubewahren.
- Passwoerter nicht weiterzugeben und regelmaessig zu aendern.

---

## 3. Rechtsfolgen bei Verstoss

Ein Verstoss gegen das Datengeheimnis kann folgende Konsequenzen haben:

- **Arbeitsrechtliche Massnahmen** bis hin zur fristlosen Kuendigung
- **Schadensersatzansprueche** des Arbeitgebers oder der Betroffenen (Art. 82 DS-GVO)
- **Ordnungswidrigkeiten oder Straftaten** nach BDSG und StGB (§§ 42, 43 BDSG; §§ 201-206 StGB)

---

## 4. Datenschutzschulung

{{#IF SCHULUNGSDATUM}}
Ich habe am **{{SCHULUNGSDATUM}}** eine Datenschutzschulung erhalten und wurde ueber die wesentlichen Grundsaetze der DS-GVO unterrichtet.
{{/IF}}
{{#IF_NOT SCHULUNGSDATUM}}
Eine Datenschutzschulung wird im Rahmen des Onboarding durchgefuehrt.
{{/IF_NOT}}

---

## 5. Ansprechpartner

Bei Fragen zum Datenschutz wende ich mich an den Datenschutzbeauftragten:
**{{DSB_NAME}}** — {{DSB_KONTAKT}}

---

## 6. Bestaetigung

Ich habe diese Verpflichtungserklaerung gelesen und verstanden. Ich bin mir meiner Pflichten bewusst.

| | Mitarbeitende/r | Arbeitgeber |
|---|---|---|
| **Ort, Datum** | {{ORT_DATUM}} | {{ORT_DATUM}} |
| **Name** | {{MITARBEITER_NAME}} | |
| **Unterschrift** | _________________ | _________________ |

---

*Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;
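The templates rely on a small rendering syntax: `{{NAME}}` substitution plus `{{#IF X}}/{{/IF}}` and `{{#IF_NOT X}}/{{/IF_NOT}}` blocks that are kept or dropped depending on whether a value is set. The actual renderer lives elsewhere in the repo; this is only a sketch of the semantics the templates imply, with invented names and the assumption that blocks are not nested:

```python
import re

def render_template(template: str, values: dict) -> str:
    """Resolve {{#IF X}}/{{#IF_NOT X}} blocks, then substitute {{X}} placeholders."""
    def resolve_if(match: re.Match) -> str:
        negated = match.group(1) == 'IF_NOT'
        var, body = match.group(2), match.group(3)
        present = bool(values.get(var))
        # Keep the body when the condition holds, drop it otherwise
        return body if present != negated else ''

    # Non-greedy body match; backreference \1 pairs {{#IF}} with {{/IF}}
    block_re = re.compile(r'\{\{#(IF|IF_NOT)\s+(\w+)\}\}(.*?)\{\{/\1\}\}', re.DOTALL)
    out = block_re.sub(resolve_if, template)
    return re.sub(r'\{\{(\w+)\}\}', lambda m: str(values.get(m.group(1), '')), out)

# An empty SCHULUNGSDATUM selects the IF_NOT branch, as in section 4 above
text = render_template(
    "Schulung: {{#IF SCHULUNGSDATUM}}am {{SCHULUNGSDATUM}}{{/IF}}"
    "{{#IF_NOT SCHULUNGSDATUM}}im Onboarding{{/IF_NOT}}",
    {"SCHULUNGSDATUM": ""}
)
```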

-- 2. Art. 13/14 Informationspflichten-Muster
INSERT INTO compliance.compliance_legal_templates (
    tenant_id, document_type, title, description, language, jurisdiction,
    version, status, license_name, source_name, attribution_required,
    is_complete_document, placeholders, content
) VALUES (
    '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
    'informationspflichten',
    'Informationspflichten gemaess Art. 13/14 DS-GVO',
    'Mustertext fuer Datenschutzhinweise nach Art. 13 (Direkterhebung) und Art. 14 (Dritterhebung) DS-GVO. Mit bedingten Bloecken fuer beide Varianten.',
    'de',
    'EU/DSGVO',
    '1.0',
    'published',
    'MIT',
    'BreakPilot Compliance',
    false,
    true,
    CAST('[
        "{{VERANTWORTLICHER_NAME}}",
        "{{VERANTWORTLICHER_ADRESSE}}",
        "{{VERANTWORTLICHER_KONTAKT}}",
        "{{DSB_NAME}}",
        "{{DSB_KONTAKT}}",
        "{{VERARBEITUNGSZWECK}}",
        "{{RECHTSGRUNDLAGE}}",
        "{{BERECHTIGTES_INTERESSE}}",
        "{{DATENKATEGORIEN}}",
        "{{DATENQUELLE}}",
        "{{EMPFAENGER}}",
        "{{DRITTLANDTRANSFER}}",
        "{{SPEICHERDAUER}}",
        "{{AUFSICHTSBEHOERDE}}",
        "{{AUTOMATISIERTE_ENTSCHEIDUNG}}",
        "{{PFLICHT_ODER_FREIWILLIG}}"
    ]' AS jsonb),
$template$# Datenschutzhinweise
**gemaess Art. 13 und Art. 14 der Datenschutz-Grundverordnung (DS-GVO)**

---

## 1. Verantwortlicher

{{VERANTWORTLICHER_NAME}}
{{VERANTWORTLICHER_ADRESSE}}
Kontakt: {{VERANTWORTLICHER_KONTAKT}}

{{#IF DSB_NAME}}
## 2. Datenschutzbeauftragter

{{DSB_NAME}}
{{DSB_KONTAKT}}
{{/IF}}

---

## 3. Zweck und Rechtsgrundlage der Verarbeitung

Wir verarbeiten Ihre personenbezogenen Daten zu folgenden Zwecken:

{{VERARBEITUNGSZWECK}}

**Rechtsgrundlage:** {{RECHTSGRUNDLAGE}}

{{#IF BERECHTIGTES_INTERESSE}}
**Berechtigtes Interesse (Art. 6 Abs. 1 lit. f DS-GVO):** {{BERECHTIGTES_INTERESSE}}
{{/IF}}

---

## 4. Kategorien personenbezogener Daten

{{DATENKATEGORIEN}}

{{#IF DATENQUELLE}}
## 5. Herkunft der Daten (Art. 14 DS-GVO)

Die Daten wurden nicht bei Ihnen direkt erhoben, sondern stammen aus folgender Quelle:

{{DATENQUELLE}}
{{/IF}}

---

## 6. Empfaenger und Uebermittlung

Ihre Daten werden an folgende Empfaenger bzw. Kategorien von Empfaengern uebermittelt:

{{EMPFAENGER}}

{{#IF DRITTLANDTRANSFER}}
### Uebermittlung in Drittlaender

{{DRITTLANDTRANSFER}}
{{/IF}}

---

## 7. Speicherdauer

{{SPEICHERDAUER}}

---

## 8. Ihre Rechte

Sie haben gegenueber dem Verantwortlichen folgende Rechte hinsichtlich Ihrer personenbezogenen Daten:

- **Auskunftsrecht** (Art. 15 DS-GVO): Sie koennen Auskunft ueber die gespeicherten Daten verlangen.
- **Berichtigungsrecht** (Art. 16 DS-GVO): Sie koennen die Berichtigung unrichtiger Daten verlangen.
- **Loeschungsrecht** (Art. 17 DS-GVO): Sie koennen die Loeschung Ihrer Daten verlangen, sofern keine Aufbewahrungspflicht besteht.
- **Einschraenkung** (Art. 18 DS-GVO): Sie koennen die Einschraenkung der Verarbeitung verlangen.
- **Datenuebertragbarkeit** (Art. 20 DS-GVO): Sie koennen Ihre Daten in einem strukturierten, maschinenlesbaren Format erhalten.
- **Widerspruchsrecht** (Art. 21 DS-GVO): Sie koennen der Verarbeitung widersprechen, insbesondere bei Direktwerbung.

{{#IF RECHTSGRUNDLAGE}}
- **Widerrufsrecht** (Art. 7 Abs. 3 DS-GVO): Sofern die Verarbeitung auf Einwilligung beruht, koennen Sie diese jederzeit widerrufen, ohne dass die Rechtmaessigkeit der bis dahin erfolgten Verarbeitung beruehrt wird.
{{/IF}}

---

## 9. Beschwerderecht

Sie haben das Recht, sich bei einer Aufsichtsbehoerde zu beschweren:

{{AUFSICHTSBEHOERDE}}

---

{{#IF AUTOMATISIERTE_ENTSCHEIDUNG}}
## 10. Automatisierte Entscheidungsfindung (Art. 22 DS-GVO)

{{AUTOMATISIERTE_ENTSCHEIDUNG}}
{{/IF}}

{{#IF PFLICHT_ODER_FREIWILLIG}}
## 11. Bereitstellung der Daten

{{PFLICHT_ODER_FREIWILLIG}}
{{/IF}}

---

*Stand: Siehe Versionsdatum des Dokuments. Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;
137
document-templates/scripts/cleanup_temp_vorlagen.py
Normal file
@@ -0,0 +1,137 @@
#!/usr/bin/env python3
"""Cleanup script: Delete temporary DPA template documents from Qdrant.

Removes all points with payload field `temp_vorlagen=true` from
the bp_compliance_datenschutz collection.

Usage:
    python cleanup_temp_vorlagen.py --dry-run   # Preview only
    python cleanup_temp_vorlagen.py             # Execute deletion
    python cleanup_temp_vorlagen.py --qdrant-url http://localhost:6333
"""

import argparse
import json
import sys
from typing import Optional
from urllib.request import Request, urlopen
from urllib.error import URLError


def qdrant_request(base_url: str, method: str, path: str, body: Optional[dict] = None) -> dict:
    url = f"{base_url}{path}"
    data = json.dumps(body).encode() if body else None
    headers = {"Content-Type": "application/json"} if data else {}
    req = Request(url, data=data, headers=headers, method=method)
    with urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


def count_temp_vorlagen(base_url: str, collection: str) -> int:
    """Count points with temp_vorlagen=true."""
    body = {
        "filter": {
            "must": [
                {"key": "temp_vorlagen", "match": {"value": True}}
            ]
        },
        "exact": True,
    }
    result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/count", body)
    return result.get("result", {}).get("count", 0)


def list_temp_regulation_ids(base_url: str, collection: str) -> list[dict]:
    """Get distinct regulation_ids of temp documents."""
    body = {
        "filter": {
            "must": [
                {"key": "temp_vorlagen", "match": {"value": True}}
            ]
        },
        "limit": 500,
        "with_payload": ["regulation_id", "title", "source"],
    }
    result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/scroll", body)
    points = result.get("result", {}).get("points", [])

    seen = {}
    for p in points:
        payload = p.get("payload", {})
        rid = payload.get("regulation_id", "unknown")
        if rid not in seen:
            seen[rid] = {
                "regulation_id": rid,
                "title": payload.get("title", ""),
                "source": payload.get("source", ""),
            }
    return list(seen.values())


def delete_temp_vorlagen(base_url: str, collection: str) -> str:
    """Delete all points with temp_vorlagen=true; returns the API status string."""
    body = {
        "filter": {
            "must": [
                {"key": "temp_vorlagen", "match": {"value": True}}
            ]
        }
    }
    result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/delete", body)
    return result.get("status", "unknown")


def main():
    parser = argparse.ArgumentParser(description="Delete temp DPA templates from Qdrant")
    parser.add_argument("--qdrant-url", default="http://localhost:6333",
                        help="Qdrant URL (default: http://localhost:6333)")
    parser.add_argument("--collection", default="bp_compliance_datenschutz",
                        help="Qdrant collection name")
    parser.add_argument("--dry-run", action="store_true",
                        help="Only count and list, do not delete")
    args = parser.parse_args()

    print(f"Qdrant URL: {args.qdrant_url}")
    print(f"Collection: {args.collection}")
    print()

    try:
        count = count_temp_vorlagen(args.qdrant_url, args.collection)
    except URLError as e:
        print(f"ERROR: Cannot connect to Qdrant at {args.qdrant_url}: {e}")
        sys.exit(1)

    print(f"Gefundene Punkte mit temp_vorlagen=true: {count}")

    if count == 0:
        print("Nichts zu loeschen.")
        return

    docs = list_temp_regulation_ids(args.qdrant_url, args.collection)
    print(f"\nBetroffene Dokumente ({len(docs)}):")
    for doc in sorted(docs, key=lambda d: d["regulation_id"]):
        source = f" [{doc['source']}]" if doc.get("source") else ""
        title = f" — {doc['title']}" if doc.get("title") else ""
        print(f"  - {doc['regulation_id']}{title}{source}")

    if args.dry_run:
        print(f"\n[DRY-RUN] Wuerde {count} Punkte loeschen. Keine Aenderung durchgefuehrt.")
        return

    print(f"\nLoesche {count} Punkte ...")
    status = delete_temp_vorlagen(args.qdrant_url, args.collection)
    print(f"Status: {status}")

    remaining = count_temp_vorlagen(args.qdrant_url, args.collection)
    print(f"Verbleibende temp_vorlagen Punkte: {remaining}")

    if remaining == 0:
        print("Cleanup erfolgreich abgeschlossen.")
    else:
        print(f"WARNUNG: {remaining} Punkte konnten nicht geloescht werden.")


if __name__ == "__main__":
    main()
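All three Qdrant helpers in the script send the same payload filter, which makes the filter a natural candidate for a shared builder. A sketch only; the helper name is invented here, and the request bodies are exactly those the script already uses:

```python
def temp_vorlagen_filter() -> dict:
    """Shared Qdrant payload filter selecting points with temp_vorlagen=true."""
    return {
        "filter": {
            "must": [
                {"key": "temp_vorlagen", "match": {"value": True}}
            ]
        }
    }

# The count/scroll/delete bodies only differ in the extra keys they add
count_body = {**temp_vorlagen_filter(), "exact": True}
scroll_body = {**temp_vorlagen_filter(), "limit": 500,
               "with_payload": ["regulation_id", "title", "source"]}
delete_body = temp_vorlagen_filter()
```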
@@ -251,14 +251,251 @@ async def rerank_cohere(query: str, documents: List[str], top_k: int = 5) -> Lis

GERMAN_ABBREVIATIONS = {
    'bzw', 'ca', 'chr', 'd.h', 'dr', 'etc', 'evtl', 'ggf', 'inkl', 'max',
    'min', 'mio', 'mrd', 'nr', 'prof', 's', 'sog', 'u.a', 'u.ä', 'usw',
    'v.a', 'vgl', 'vs', 'z.b', 'z.t', 'zzgl', 'abs', 'art', 'abschn',
    'anh', 'anl', 'aufl', 'bd', 'bes', 'bzgl', 'dgl', 'einschl', 'entspr',
    'erg', 'erl', 'gem', 'grds', 'hrsg', 'insb', 'ivm', 'kap', 'lit',
    'nachf', 'rdnr', 'rn', 'rz', 'ua', 'uvm', 'vorst', 'ziff'
}
# English abbreviations that don't end sentences
ENGLISH_ABBREVIATIONS = {
    'e.g', 'i.e', 'etc', 'vs', 'al', 'approx', 'avg', 'dept', 'dr', 'ed',
    'est', 'fig', 'govt', 'inc', 'jr', 'ltd', 'max', 'min', 'mr', 'mrs',
    'ms', 'no', 'prof', 'pt', 'ref', 'rev', 'sec', 'sgt', 'sr', 'st',
    'vol', 'cf', 'ch', 'cl', 'col', 'corp', 'cpl', 'def', 'dist', 'div',
    'gen', 'hon', 'illus', 'intl', 'natl', 'org', 'para', 'pp', 'repr',
    'resp', 'supp', 'tech', 'temp', 'treas', 'univ'
}

# Combined abbreviations for both languages
ALL_ABBREVIATIONS = GERMAN_ABBREVIATIONS | ENGLISH_ABBREVIATIONS

# Regex pattern for legal section headers (§, Art., Article, Section, etc.)
import re

_LEGAL_SECTION_RE = re.compile(
    r'^(?:'
    r'§\s*\d+'                     # § 25, § 5a
    r'|Art(?:ikel|icle|\.)\s*\d+'  # Artikel 5, Article 12, Art. 3
    r'|Section\s+\d+'              # Section 4.2
    r'|Abschnitt\s+\d+'            # Abschnitt 3
    r'|Kapitel\s+\d+'              # Kapitel 2
    r'|Chapter\s+\d+'              # Chapter 3
    r'|Anhang\s+[IVXLC\d]+'        # Anhang III
    r'|Annex\s+[IVXLC\d]+'         # Annex XII
    r'|TEIL\s+[IVXLC\d]+'          # TEIL II
    r'|Part\s+[IVXLC\d]+'          # Part III
    r'|Recital\s+\d+'              # Recital 42
    r'|Erwaegungsgrund\s+\d+'      # Erwaegungsgrund 26
    r')',
    re.IGNORECASE | re.MULTILINE
)

# Regex for any heading-like line (Markdown ## or ALL-CAPS line)
_HEADING_RE = re.compile(
    r'^(?:'
    r'#{1,6}\s+.+'                 # Markdown headings
    r'|[A-ZÄÖÜ][A-ZÄÖÜ\s\-]{5,}$'  # ALL-CAPS lines (>5 chars)
    r')',
    re.MULTILINE
)

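The section-header pattern can be sanity-checked against a few sample lines. The snippet restates a subset of `_LEGAL_SECTION_RE` so it runs standalone:

```python
import re

# Standalone copy of a subset of the alternatives in _LEGAL_SECTION_RE
legal_section_re = re.compile(
    r'^(?:'
    r'§\s*\d+'
    r'|Art(?:ikel|icle|\.)\s*\d+'
    r'|Recital\s+\d+'
    r'|Anhang\s+[IVXLC\d]+'
    r')',
    re.IGNORECASE | re.MULTILINE
)

# Header-like lines match; ordinary sentences do not
hits = [bool(legal_section_re.match(s)) for s in
        ['§ 25 Datenschutz', 'Art. 6 Abs. 1', 'Recital 42',
         'Anhang III', 'Der Vertrag endet.']]
```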

def _detect_language(text: str) -> str:
    """Simple heuristic: count German vs English marker words."""
    sample = text[:5000].lower()
    de_markers = sum(1 for w in ['der', 'die', 'das', 'und', 'ist', 'für', 'von',
                                 'werden', 'nach', 'gemäß', 'sowie', 'durch']
                     if f' {w} ' in sample)
    en_markers = sum(1 for w in ['the', 'and', 'for', 'that', 'with', 'shall',
                                 'must', 'should', 'which', 'from', 'this']
                     if f' {w} ' in sample)
    return 'de' if de_markers > en_markers else 'en'


def _protect_abbreviations(text: str) -> str:
    """Replace dots in abbreviations with placeholders to prevent false sentence splits."""
    protected = text
    for abbrev in ALL_ABBREVIATIONS:
        pattern = re.compile(r'\b(' + re.escape(abbrev) + r')\.', re.IGNORECASE)
        # Use lambda to preserve original case of the matched abbreviation
        protected = pattern.sub(lambda m: m.group(1).replace('.', '<DOT>') + '<ABBR>', protected)
    # Protect decimals (3.14) and ordinals (1. Absatz)
    protected = re.sub(r'(\d)\.(\d)', r'\1<DECIMAL>\2', protected)
    protected = re.sub(r'(\d+)\.\s', r'\1<ORD> ', protected)
    return protected


def _restore_abbreviations(text: str) -> str:
    """Restore placeholders back to dots."""
    return (text
            .replace('<DOT>', '.')
            .replace('<ABBR>', '.')
            .replace('<DECIMAL>', '.')
            .replace('<ORD>', '.'))


def _split_sentences(text: str) -> List[str]:
    """Split text into sentences, respecting abbreviations in DE and EN."""
    protected = _protect_abbreviations(text)
    # Split after sentence-ending punctuation followed by uppercase or newline
    sentence_pattern = r'(?<=[.!?])\s+(?=[A-ZÄÖÜÀ-Ý])|(?<=[.!?])\s*\n'
    raw = re.split(sentence_pattern, protected)
    sentences = []
    for s in raw:
        s = _restore_abbreviations(s).strip()
        if s:
            sentences.append(s)
    return sentences
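The protect, split, restore round trip can be illustrated with a self-contained miniature that uses a tiny abbreviation set in place of `ALL_ABBREVIATIONS`; the helper name is invented for this sketch:

```python
import re

ABBREVS = {'art', 'z.b'}  # miniature stand-in for ALL_ABBREVIATIONS

def split_mini(text: str) -> list[str]:
    """Protect abbreviation dots, split on sentence ends, restore dots."""
    protected = text
    for abbrev in ABBREVS:
        pattern = re.compile(r'\b(' + re.escape(abbrev) + r')\.', re.IGNORECASE)
        protected = pattern.sub(lambda m: m.group(1).replace('.', '<DOT>') + '<ABBR>', protected)
    # "Art." and "z.B." no longer end in a dot, so the split skips them
    parts = re.split(r'(?<=[.!?])\s+(?=[A-ZÄÖÜ])', protected)
    return [p.replace('<DOT>', '.').replace('<ABBR>', '.').strip() for p in parts]

sentences = split_mini("Gemaess Art. 5 DS-GVO gilt z.B. Folgendes. Zweiter Satz folgt.")
```

Without the protection step, the split would fire after "Art." and "z.B." and produce four fragments instead of two sentences.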

def _extract_section_header(line: str) -> Optional[str]:
    """Extract a legal section header from a line, or None."""
    m = _LEGAL_SECTION_RE.match(line.strip())
    if m:
        return line.strip()
    m = _HEADING_RE.match(line.strip())
    if m:
        return line.strip()
    return None


def chunk_text_legal(text: str, chunk_size: int, overlap: int) -> List[str]:
    """
    Legal-document-aware chunking.

    Strategy:
    1. Split on legal section boundaries (§, Art., Section, Chapter, etc.)
    2. Within each section, split on paragraph boundaries (double newline)
    3. Within each paragraph, split on sentence boundaries
    4. Prepend section header as context prefix to every chunk
    5. Add overlap from previous chunk

    Works for both German (DSGVO, BGB, AI Act DE) and English (NIST, SLSA, CRA EN) texts.
    """
    if not text or len(text) <= chunk_size:
        return [text.strip()] if text and text.strip() else []

# --- Phase 1: Split into sections by legal headers ---
|
||||
lines = text.split('\n')
|
||||
sections = [] # list of (header, content)
|
||||
current_header = None
|
||||
current_lines = []
|
||||
|
||||
for line in lines:
|
||||
header = _extract_section_header(line)
|
||||
if header and current_lines:
|
||||
sections.append((current_header, '\n'.join(current_lines)))
|
||||
current_header = header
|
||||
current_lines = [line]
|
||||
elif header and not current_lines:
|
||||
current_header = header
|
||||
current_lines = [line]
|
||||
else:
|
||||
current_lines.append(line)
|
||||
|
||||
if current_lines:
|
||||
sections.append((current_header, '\n'.join(current_lines)))
|
||||
|
||||
# --- Phase 2: Within each section, split on paragraphs, then sentences ---
|
||||
raw_chunks = []
|
||||
|
||||
for section_header, section_text in sections:
|
||||
# Build context prefix (max 120 chars to leave room for content)
|
||||
prefix = ""
|
||||
if section_header:
|
||||
truncated = section_header[:120]
|
||||
prefix = f"[{truncated}] "
|
||||
|
||||
paragraphs = re.split(r'\n\s*\n', section_text)
|
||||
|
||||
current_chunk = prefix
|
||||
current_length = len(prefix)
|
||||
|
||||
for para in paragraphs:
|
||||
para = para.strip()
|
||||
if not para:
|
||||
continue
|
||||
|
||||
# If paragraph fits in remaining space, append
|
||||
if current_length + len(para) + 1 <= chunk_size:
|
||||
if current_chunk and not current_chunk.endswith(' '):
|
||||
current_chunk += '\n\n'
|
||||
current_chunk += para
|
||||
current_length = len(current_chunk)
|
||||
continue
|
||||
|
||||
            # Paragraph doesn't fit — flush current chunk if non-empty
            if current_chunk.strip() and current_chunk.strip() != prefix.strip():
                raw_chunks.append(current_chunk.strip())

            # If entire paragraph fits in a fresh chunk, start new chunk
            if len(prefix) + len(para) <= chunk_size:
                current_chunk = prefix + para
                current_length = len(current_chunk)
                continue

            # Paragraph too long — split by sentences
            sentences = _split_sentences(para)
            current_chunk = prefix
            current_length = len(prefix)

            for sentence in sentences:
                sentence_len = len(sentence)

                # Single sentence exceeds chunk_size — force-split
                if len(prefix) + sentence_len > chunk_size:
                    if current_chunk.strip() and current_chunk.strip() != prefix.strip():
                        raw_chunks.append(current_chunk.strip())
                    # Hard split the long sentence
                    remaining = sentence
                    while remaining:
                        take = chunk_size - len(prefix)
                        chunk_part = prefix + remaining[:take]
                        raw_chunks.append(chunk_part.strip())
                        remaining = remaining[take:]
                    current_chunk = prefix
                    current_length = len(prefix)
                    continue

                if current_length + sentence_len + 1 > chunk_size:
                    if current_chunk.strip() and current_chunk.strip() != prefix.strip():
                        raw_chunks.append(current_chunk.strip())
                    current_chunk = prefix + sentence
                    current_length = len(current_chunk)
                else:
                    if current_chunk and not current_chunk.endswith(' '):
                        current_chunk += ' '
                    current_chunk += sentence
                    current_length = len(current_chunk)

        # Flush remaining content for this section
        if current_chunk.strip() and current_chunk.strip() != prefix.strip():
            raw_chunks.append(current_chunk.strip())

    if not raw_chunks:
        return [text.strip()] if text.strip() else []

    # --- Phase 3: Add overlap ---
    final_chunks = []
    for i, chunk in enumerate(raw_chunks):
        if i > 0 and overlap > 0:
            prev = raw_chunks[i - 1]
            # Take overlap from end of previous chunk (but not the prefix)
            overlap_text = prev[-min(overlap, len(prev)):]
            # Only add overlap if it doesn't start mid-word
            space_idx = overlap_text.find(' ')
            if space_idx > 0:
                overlap_text = overlap_text[space_idx + 1:]
            if overlap_text:
                chunk = overlap_text + ' ' + chunk
        final_chunks.append(chunk.strip())

    return [c for c in final_chunks if c]
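The word-boundary trimming in Phase 3 can be checked in isolation. A minimal sketch; `word_aligned_overlap` is a hypothetical helper name, not part of the module:

```python
def word_aligned_overlap(prev: str, overlap: int) -> str:
    """Take up to `overlap` trailing characters of the previous chunk,
    then drop the first word fragment so the overlap text never starts
    mid-word (the first word is dropped even if it happens to be whole,
    since the cut point cannot know that)."""
    tail = prev[-min(overlap, len(prev)):]
    space_idx = tail.find(' ')
    if space_idx > 0:
        tail = tail[space_idx + 1:]
    return tail

print(word_aligned_overlap("the quick brown fox", 9))  # fox
```

Note the `space_idx > 0` guard: if the tail already starts at a word boundary (leading space), it is kept unchanged.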


def chunk_text_recursive(text: str, chunk_size: int, overlap: int) -> List[str]:
-    """Recursive character-based chunking."""
-    import re
-
+    """Recursive character-based chunking (legacy, use legal_recursive for legal docs)."""
    if not text or len(text) <= chunk_size:
        return [text] if text else []

@@ -315,36 +552,23 @@ def chunk_text_recursive(text: str, chunk_size: int, overlap: int) -> List[str]:


def chunk_text_semantic(text: str, chunk_size: int, overlap_sentences: int = 1) -> List[str]:
    """Semantic sentence-aware chunking."""
    import re

    if not text:
        return []

    if len(text) <= chunk_size:
        return [text.strip()]

    # Split into sentences (simplified for German)
    text = re.sub(r'\s+', ' ', text).strip()

-    # Protect abbreviations
-    protected = text
-    for abbrev in GERMAN_ABBREVIATIONS:
-        pattern = re.compile(r'\b' + re.escape(abbrev) + r'\.', re.IGNORECASE)
-        protected = pattern.sub(abbrev.replace('.', '<DOT>') + '<ABBR>', protected)
-
-    # Protect decimals and ordinals
-    protected = re.sub(r'(\d)\.(\d)', r'\1<DECIMAL>\2', protected)
-    protected = re.sub(r'(\d+)\.(\s)', r'\1<ORD>\2', protected)
+    protected = _protect_abbreviations(text)

    # Split on sentence endings
-    sentence_pattern = r'(?<=[.!?])\s+(?=[A-ZÄÖÜ])|(?<=[.!?])$'
+    sentence_pattern = r'(?<=[.!?])\s+(?=[A-ZÄÖÜÀ-Ý])|(?<=[.!?])$'
    raw_sentences = re.split(sentence_pattern, protected)

    # Restore protected characters
    sentences = []
    for s in raw_sentences:
-        s = s.replace('<DOT>', '.').replace('<ABBR>', '.').replace('<DECIMAL>', '.').replace('<ORD>', '.')
-        s = s.strip()
+        s = _restore_abbreviations(s).strip()
        if s:
            sentences.append(s)

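The placeholder scheme factored out above into `_protect_abbreviations` / `_restore_abbreviations` can be sketched minimally. This is an illustrative reimplementation, not the module's code: the function names `protect`/`restore` and the three-entry abbreviation list are assumptions, while the real module keeps a much larger DE + EN table.

```python
import re

# Assumed subset of the abbreviation table for illustration only.
ABBREVIATIONS = ["z.B", "Abs", "Art"]

def protect(text: str) -> str:
    # Dots inside known abbreviations and decimals become placeholder
    # tokens, so a sentence-splitting regex cannot fire inside them.
    for abbrev in ABBREVIATIONS:
        text = re.sub(re.escape(abbrev) + r'\.',
                      abbrev.replace('.', '<DOT>') + '<ABBR>', text)
    return re.sub(r'(\d)\.(\d)', r'\1<DECIMAL>\2', text)

def restore(text: str) -> str:
    # Inverse of protect(): every placeholder token becomes a dot again.
    for token in ('<DOT>', '<ABBR>', '<DECIMAL>'):
        text = text.replace(token, '.')
    return text

sample = "Gem. Art. 5 Abs. 1 DSGVO, Version 3.14."
assert restore(protect(sample)) == sample  # lossless round-trip
```

The round-trip property is exactly what the test file below exercises against the real helpers.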
@@ -638,7 +862,16 @@ async def rerank_documents(request: RerankRequest):


@app.post("/chunk", response_model=ChunkResponse)
async def chunk_text(request: ChunkRequest):
-    """Chunk text into smaller pieces."""
+    """Chunk text into smaller pieces.
+
+    Strategies:
+    - "recursive" (default): Legal-document-aware chunking with §/Art./Section
+      boundary detection, section context headers, paragraph-level splitting,
+      and sentence-level splitting respecting DE + EN abbreviations.
+    - "semantic": Sentence-aware chunking with overlap by sentence count.
+
+    The old plain recursive chunker has been retired and is no longer available.
+    """
    if not request.text:
        return ChunkResponse(chunks=[], count=0, strategy=request.strategy)

@@ -647,7 +880,9 @@ async def chunk_text(request: ChunkRequest):
        overlap_sentences = max(1, request.overlap // 100)
        chunks = chunk_text_semantic(request.text, request.chunk_size, overlap_sentences)
    else:
-        chunks = chunk_text_recursive(request.text, request.chunk_size, request.overlap)
+        # All strategies (recursive, legal_recursive, etc.) use the legal-aware chunker.
+        # The old plain recursive chunker is no longer exposed via the API.
+        chunks = chunk_text_legal(request.text, request.chunk_size, request.overlap)

    return ChunkResponse(
        chunks=chunks,
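In the semantic branch of the handler, the request's character-based `overlap` is converted to a sentence count via `max(1, request.overlap // 100)`. A standalone sketch of that mapping (hypothetical function name, mirroring the handler's one-liner):

```python
def overlap_to_sentences(overlap: int) -> int:
    # Every full 100 characters of requested overlap becomes one
    # sentence of overlap, with a floor of one sentence.
    return max(1, overlap // 100)

print(overlap_to_sentences(0), overlap_to_sentences(250))  # 1 2
```

So even a request with `overlap=0` still gets one sentence of overlap when the semantic strategy is chosen.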
288 embedding-service/test_chunking.py Normal file
@@ -0,0 +1,288 @@
"""
Tests for the legal-aware chunking pipeline.

Covers:
- Legal section header detection (§, Art., Section, Chapter, Annex)
- Section context prefix in every chunk
- Paragraph boundary splitting
- Sentence splitting with DE and EN abbreviation protection
- Overlap between chunks
- Fallback for non-legal text
- Long sentence force-splitting
"""

import pytest
from main import (
    chunk_text_legal,
    chunk_text_recursive,
    chunk_text_semantic,
    _extract_section_header,
    _split_sentences,
    _detect_language,
    _protect_abbreviations,
    _restore_abbreviations,
)


# =========================================================================
# Section header detection
# =========================================================================

class TestSectionHeaderDetection:

    def test_german_paragraph(self):
        assert _extract_section_header("§ 25 Informationspflichten") is not None

    def test_german_paragraph_with_letter(self):
        assert _extract_section_header("§ 5a Elektronischer Geschaeftsverkehr") is not None

    def test_german_artikel(self):
        assert _extract_section_header("Artikel 5 Grundsaetze") is not None

    def test_english_article(self):
        assert _extract_section_header("Article 12 Transparency") is not None

    def test_article_abbreviated(self):
        assert _extract_section_header("Art. 3 Definitions") is not None

    def test_english_section(self):
        assert _extract_section_header("Section 4.2 Risk Assessment") is not None

    def test_german_abschnitt(self):
        assert _extract_section_header("Abschnitt 3 Pflichten") is not None

    def test_chapter(self):
        assert _extract_section_header("Chapter 5 Obligations") is not None

    def test_german_kapitel(self):
        assert _extract_section_header("Kapitel 2 Anwendungsbereich") is not None

    def test_annex_roman(self):
        assert _extract_section_header("Annex XII Technical Documentation") is not None

    def test_german_anhang(self):
        assert _extract_section_header("Anhang III Hochrisiko-KI") is not None

    def test_part(self):
        assert _extract_section_header("Part III Requirements") is not None

    def test_markdown_heading(self):
        assert _extract_section_header("## 3.1 Overview") is not None

    def test_normal_text_not_header(self):
        assert _extract_section_header("This is a normal sentence.") is None

    def test_short_caps_not_header(self):
        assert _extract_section_header("OK") is None


# =========================================================================
# Language detection
# =========================================================================

class TestLanguageDetection:

    def test_german_text(self):
        text = "Die Verordnung ist für alle Mitgliedstaaten verbindlich und gilt nach dem Grundsatz der unmittelbaren Anwendbarkeit."
        assert _detect_language(text) == 'de'

    def test_english_text(self):
        text = "This regulation shall be binding in its entirety and directly applicable in all Member States."
        assert _detect_language(text) == 'en'


# =========================================================================
# Abbreviation protection
# =========================================================================

class TestAbbreviationProtection:

    def test_german_abbreviations(self):
        text = "gem. § 5 Abs. 1 bzw. § 6 Abs. 2 z.B. die Pflicht"
        protected = _protect_abbreviations(text)
        assert "." not in protected.replace("<DOT>", "").replace("<DECIMAL>", "").replace("<ORD>", "").replace("<ABBR>", "")
        restored = _restore_abbreviations(protected)
        assert "gem." in restored
        assert "z.B." in restored.replace("z.b.", "z.B.") or "z.b." in restored

    def test_english_abbreviations(self):
        text = "e.g. section 4.2, i.e. the requirements in vol. 1 ref. NIST SP 800-30."
        protected = _protect_abbreviations(text)
        # "e.g" and "i.e" should be protected
        restored = _restore_abbreviations(protected)
        assert "e.g." in restored

    def test_decimals_protected(self):
        text = "Version 3.14 of the specification requires 2.5 GB."
        protected = _protect_abbreviations(text)
        assert "<DECIMAL>" in protected
        restored = _restore_abbreviations(protected)
        assert "3.14" in restored


# =========================================================================
# Sentence splitting
# =========================================================================

class TestSentenceSplitting:

    def test_simple_german(self):
        text = "Erster Satz. Zweiter Satz. Dritter Satz."
        sentences = _split_sentences(text)
        assert len(sentences) >= 2

    def test_simple_english(self):
        text = "First sentence. Second sentence. Third sentence."
        sentences = _split_sentences(text)
        assert len(sentences) >= 2

    def test_german_abbreviation_not_split(self):
        text = "Gem. Art. 5 Abs. 1 DSGVO ist die Verarbeitung rechtmaessig. Der Verantwortliche muss dies nachweisen."
        sentences = _split_sentences(text)
        # Should NOT split at "Gem." or "Art." or "Abs."
        assert any("Gem" in s and "DSGVO" in s for s in sentences)

    def test_english_abbreviation_not_split(self):
        text = "See e.g. Section 4.2 for details. The standard also references vol. 1 of the NIST SP series."
        sentences = _split_sentences(text)
        assert any("e.g" in s and "Section" in s for s in sentences)

    def test_exclamation_and_question(self):
        text = "Is this valid? Yes it is! Continue processing."
        sentences = _split_sentences(text)
        assert len(sentences) >= 2


# =========================================================================
# Legal chunking
# =========================================================================

class TestChunkTextLegal:

    def test_small_text_single_chunk(self):
        text = "Short text."
        chunks = chunk_text_legal(text, chunk_size=1024, overlap=128)
        assert len(chunks) == 1
        assert chunks[0] == "Short text."

    def test_section_header_as_prefix(self):
        text = "§ 25 Informationspflichten\n\nDer Betreiber muss den Nutzer informieren. " * 20
        chunks = chunk_text_legal(text, chunk_size=200, overlap=0)
        assert len(chunks) > 1
        # Every chunk should have the section prefix
        for chunk in chunks:
            assert "[§ 25" in chunk or "§ 25" in chunk

    def test_article_prefix_english(self):
        text = "Article 12 Transparency\n\n" + "The provider shall ensure transparency of AI systems. " * 30
        chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
        assert len(chunks) > 1
        for chunk in chunks:
            assert "Article 12" in chunk

    def test_multiple_sections(self):
        text = (
            "§ 1 Anwendungsbereich\n\nDieses Gesetz gilt fuer alle Betreiber.\n\n"
            "§ 2 Begriffsbestimmungen\n\nIm Sinne dieses Gesetzes ist Betreiber, wer eine Anlage betreibt.\n\n"
            "§ 3 Pflichten\n\nDer Betreiber hat die Pflicht, die Anlage sicher zu betreiben."
        )
        chunks = chunk_text_legal(text, chunk_size=200, overlap=0)
        # Should have chunks from different sections
        section_headers = set()
        for chunk in chunks:
            if "[§ 1" in chunk:
                section_headers.add("§ 1")
            if "[§ 2" in chunk:
                section_headers.add("§ 2")
            if "[§ 3" in chunk:
                section_headers.add("§ 3")
        assert len(section_headers) >= 2

    def test_paragraph_boundaries_respected(self):
        para1 = "First paragraph with enough text to matter. " * 5
        para2 = "Second paragraph also with content. " * 5
        text = para1.strip() + "\n\n" + para2.strip()
        chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
        # Paragraphs should not be merged mid-sentence across chunk boundary
        assert len(chunks) >= 2

    def test_overlap_present(self):
        text = "Sentence one about topic A. " * 10 + "\n\n" + "Sentence two about topic B. " * 10
        chunks = chunk_text_legal(text, chunk_size=200, overlap=50)
        if len(chunks) > 1:
            # Second chunk should contain some text from end of first chunk
            end_of_first = chunks[0][-30:]
            # At least some overlap words should appear
            overlap_words = set(end_of_first.split())
            second_start_words = set(chunks[1][:80].split())
            assert len(overlap_words & second_start_words) > 0

    def test_nist_style_sections(self):
        text = (
            "Section 2.1 Risk Framing\n\n"
            "Risk framing establishes the context for risk-based decisions. "
            "Organizations must define their risk tolerance. " * 10 + "\n\n"
            "Section 2.2 Risk Assessment\n\n"
            "Risk assessment identifies threats and vulnerabilities. " * 10
        )
        chunks = chunk_text_legal(text, chunk_size=400, overlap=0)
        has_21 = any("Section 2.1" in c for c in chunks)
        has_22 = any("Section 2.2" in c for c in chunks)
        assert has_21 and has_22

    def test_markdown_heading_as_context(self):
        text = (
            "## 3.1 Overview\n\n"
            "This section provides an overview of the specification. " * 15
        )
        chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
        assert len(chunks) > 1
        for chunk in chunks:
            assert "3.1 Overview" in chunk

    def test_empty_text(self):
        assert chunk_text_legal("", 1024, 128) == []

    def test_whitespace_only(self):
        assert chunk_text_legal(" \n\n ", 1024, 128) == []

    def test_long_sentence_force_split(self):
        long_sentence = "A" * 2000
        chunks = chunk_text_legal(long_sentence, chunk_size=500, overlap=0)
        assert len(chunks) >= 4
        for chunk in chunks:
            assert len(chunk) <= 500 + 20  # small margin for prefix


# =========================================================================
# Legacy recursive chunking still works
# =========================================================================

class TestChunkTextRecursive:

    def test_basic_split(self):
        text = "Hello world. " * 200
        chunks = chunk_text_recursive(text, chunk_size=500, overlap=50)
        assert len(chunks) > 1
        for chunk in chunks:
            assert len(chunk) <= 600  # some margin for overlap

    def test_small_text(self):
        chunks = chunk_text_recursive("Short.", chunk_size=1024, overlap=128)
        assert chunks == ["Short."]


# =========================================================================
# Semantic chunking still works
# =========================================================================

class TestChunkTextSemantic:

    def test_basic_split(self):
        text = "First sentence. Second sentence. Third sentence. Fourth sentence. Fifth sentence."
        chunks = chunk_text_semantic(text, chunk_size=50, overlap_sentences=1)
        assert len(chunks) >= 2

    def test_small_text(self):
        chunks = chunk_text_semantic("Short.", chunk_size=1024, overlap_sentences=1)
        assert chunks == ["Short."]
5 levis-holzbau/.dockerignore Normal file
@@ -0,0 +1,5 @@
node_modules
.next
.git
Dockerfile
.dockerignore
27 levis-holzbau/Dockerfile Normal file
@@ -0,0 +1,27 @@
FROM node:20-alpine AS base

FROM base AS deps
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm ci

FROM base AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN mkdir -p public
RUN npm run build

FROM base AS runner
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]
25 levis-holzbau/app/globals.css Normal file
@@ -0,0 +1,25 @@
@tailwind base;
@tailwind components;
@tailwind utilities;

@import url('https://fonts.googleapis.com/css2?family=Quicksand:wght@500;600;700&family=Nunito:wght@400;600;700&display=swap');

html {
  scroll-behavior: smooth;
}

body {
  font-family: 'Nunito', sans-serif;
  background-color: #FDF8F0;
  color: #2C2C2C;
}

h1, h2, h3, h4, h5, h6 {
  font-family: 'Quicksand', sans-serif;
}

@layer utilities {
  .text-balance {
    text-wrap: balance;
  }
}
21 levis-holzbau/app/layout.tsx Normal file
@@ -0,0 +1,21 @@
import type { Metadata } from 'next'
import './globals.css'
import { Navbar } from '@/components/Navbar'
import { Footer } from '@/components/Footer'

export const metadata: Metadata = {
  title: 'LEVIS Holzbau — Kinder-Holzwerkstatt',
  description: 'Lerne Holzfiguren schnitzen und kleine Holzprojekte bauen! Kindgerechte Anleitungen fuer junge Holzwerker.',
}

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="de">
      <body className="min-h-screen flex flex-col">
        <Navbar />
        <main className="flex-1">{children}</main>
        <Footer />
      </body>
    </html>
  )
}
71 levis-holzbau/app/page.tsx Normal file
@@ -0,0 +1,71 @@
'use client'

import { motion } from 'framer-motion'
import { Hammer, TreePine, ShieldCheck } from 'lucide-react'
import { HeroSection } from '@/components/HeroSection'
import { ProjectCard } from '@/components/ProjectCard'
import { projects } from '@/lib/projects'

const features = [
  {
    icon: Hammer,
    title: 'Schnitzen',
    description: 'Lerne mit Schnitzmesser und Holz umzugehen und forme eigene Figuren.',
    color: 'bg-primary/10 text-primary',
  },
  {
    icon: TreePine,
    title: 'Bauen',
    description: 'Saege, leime und nagle — baue nuetzliche Dinge aus Holz!',
    color: 'bg-secondary/10 text-secondary',
  },
  {
    icon: ShieldCheck,
    title: 'Sicherheit',
    description: 'Jedes Projekt zeigt dir, wie du sicher mit Werkzeug arbeitest.',
    color: 'bg-accent/10 text-accent',
  },
]

export default function HomePage() {
  const featured = projects.slice(0, 4)

  return (
    <>
      <HeroSection />

      {/* Features */}
      <section className="max-w-6xl mx-auto px-4 py-16">
        <div className="grid grid-cols-1 sm:grid-cols-3 gap-6">
          {features.map((f, i) => (
            <motion.div
              key={f.title}
              className="bg-white rounded-2xl p-6 shadow-sm border border-primary/5 text-center"
              initial={{ opacity: 0, y: 20 }}
              animate={{ opacity: 1, y: 0 }}
              transition={{ delay: i * 0.1 }}
            >
              <div className={`w-14 h-14 rounded-xl ${f.color} flex items-center justify-center mx-auto mb-4`}>
                <f.icon className="w-7 h-7" />
              </div>
              <h3 className="font-heading font-bold text-lg mb-2">{f.title}</h3>
              <p className="text-sm text-dark/60">{f.description}</p>
            </motion.div>
          ))}
        </div>
      </section>

      {/* Popular Projects */}
      <section className="max-w-6xl mx-auto px-4 pb-16">
        <h2 className="font-heading font-bold text-3xl text-center mb-8">
          Beliebte Projekte
        </h2>
        <div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-6">
          {featured.map((p) => (
            <ProjectCard key={p.slug} project={p} />
          ))}
        </div>
      </section>
    </>
  )
}
120 levis-holzbau/app/projekte/[slug]/page.tsx Normal file
@@ -0,0 +1,120 @@
import { notFound } from 'next/navigation'
import Link from 'next/link'
import { ArrowLeft, Clock, Wrench, Package } from 'lucide-react'
import { projects, getProject, getRelatedProjects } from '@/lib/projects'
import { DifficultyBadge } from '@/components/DifficultyBadge'
import { AgeBadge } from '@/components/AgeBadge'
import { StepCard } from '@/components/StepCard'
import { SafetyTip } from '@/components/SafetyTip'
import { ToolIcon } from '@/components/ToolIcon'
import { ProjectIllustration } from '@/components/ProjectIllustration'
import { ProjectCard } from '@/components/ProjectCard'

export function generateStaticParams() {
  return projects.map((p) => ({ slug: p.slug }))
}

export default async function ProjectPage({ params }: { params: Promise<{ slug: string }> }) {
  const { slug } = await params
  const project = getProject(slug)
  if (!project) notFound()

  const related = getRelatedProjects(slug)

  return (
    <div className="max-w-4xl mx-auto px-4 py-8">
      {/* Back */}
      <Link href="/projekte" className="inline-flex items-center gap-1 text-accent hover:underline mb-6 text-sm font-semibold">
        <ArrowLeft className="w-4 h-4" /> Alle Projekte
      </Link>

      {/* Hero */}
      <div className="bg-white rounded-2xl shadow-sm border border-primary/5 overflow-hidden mb-8">
        <div className="bg-cream p-10 flex items-center justify-center">
          <ProjectIllustration slug={project.slug} size={180} />
        </div>
        <div className="p-6 sm:p-8">
          <div className="flex flex-wrap items-center gap-3 mb-3">
            <AgeBadge range={project.ageRange} />
            <DifficultyBadge level={project.difficulty} />
            <span className="flex items-center gap-1 text-sm text-dark/50">
              <Clock className="w-4 h-4" /> {project.duration}
            </span>
          </div>
          <h1 className="font-heading font-bold text-3xl sm:text-4xl mb-3">{project.name}</h1>
          <p className="text-dark/70 text-lg leading-relaxed">{project.description}</p>
        </div>
      </div>

      {/* Tools & Materials */}
      <div className="grid grid-cols-1 sm:grid-cols-2 gap-4 mb-8">
        <div className="bg-white rounded-2xl p-6 border border-primary/5">
          <h2 className="font-heading font-bold text-lg flex items-center gap-2 mb-4">
            <Wrench className="w-5 h-5 text-primary" /> Werkzeuge
          </h2>
          <ul className="space-y-2">
            {project.tools.map((t) => (
              <li key={t} className="flex items-center gap-2 text-sm">
                <ToolIcon name={t} />
                {t}
              </li>
            ))}
          </ul>
        </div>
        <div className="bg-white rounded-2xl p-6 border border-primary/5">
          <h2 className="font-heading font-bold text-lg flex items-center gap-2 mb-4">
            <Package className="w-5 h-5 text-secondary" /> Material
          </h2>
          <ul className="space-y-2">
            {project.materials.map((m) => (
              <li key={m} className="flex items-center gap-2 text-sm">
                <span className="w-2 h-2 rounded-full bg-secondary flex-shrink-0" />
                {m}
              </li>
            ))}
          </ul>
        </div>
      </div>

      {/* Safety */}
      <div className="space-y-3 mb-10">
        <h2 className="font-heading font-bold text-xl mb-2">Sicherheitshinweise</h2>
        {project.safetyTips.map((tip) => (
          <SafetyTip key={tip}>{tip}</SafetyTip>
        ))}
      </div>

      {/* Steps */}
      <div className="mb-10">
        <h2 className="font-heading font-bold text-xl mb-6">Schritt fuer Schritt</h2>
        <div className="space-y-0">
          {project.steps.map((step, i) => (
            <StepCard key={i} step={step} index={i} />
          ))}
        </div>
      </div>

      {/* Skills */}
      <div className="bg-secondary/5 rounded-2xl p-6 mb-12">
        <h2 className="font-heading font-bold text-xl mb-3">Was du lernst</h2>
        <div className="flex flex-wrap gap-2">
          {project.skills.map((s) => (
            <span key={s} className="px-3 py-1.5 bg-secondary/10 text-secondary rounded-full text-sm font-semibold">
              {s}
            </span>
          ))}
        </div>
      </div>

      {/* Related */}
      <div>
        <h2 className="font-heading font-bold text-xl mb-6">Aehnliche Projekte</h2>
        <div className="grid grid-cols-1 sm:grid-cols-3 gap-4">
          {related.map((p) => (
            <ProjectCard key={p.slug} project={p} />
          ))}
        </div>
      </div>
    </div>
  )
}
59 levis-holzbau/app/projekte/page.tsx Normal file
@@ -0,0 +1,59 @@
'use client'

import { useState } from 'react'
import { motion } from 'framer-motion'
import { ProjectCard } from '@/components/ProjectCard'
import { projects } from '@/lib/projects'

const filters = [
  { label: 'Alle', value: 0 },
  { label: 'Anfaenger', value: 1 },
  { label: 'Fortgeschritten', value: 2 },
  { label: 'Profi', value: 3 },
]

export default function ProjektePage() {
  const [filter, setFilter] = useState(0)
  const filtered = filter === 0 ? projects : projects.filter((p) => p.difficulty === filter)

  return (
    <div className="max-w-6xl mx-auto px-4 py-12">
      <motion.div
        initial={{ opacity: 0, y: -10 }}
        animate={{ opacity: 1, y: 0 }}
        className="text-center mb-10"
      >
        <h1 className="font-heading font-bold text-4xl mb-3">Alle Projekte</h1>
        <p className="text-dark/60 text-lg">Waehle ein Projekt und leg los!</p>
      </motion.div>

      {/* Filter */}
      <div className="flex justify-center gap-2 mb-10">
        {filters.map((f) => (
          <button
            key={f.value}
            onClick={() => setFilter(f.value as 0 | 1 | 2 | 3)}
            className={`px-4 py-2 rounded-xl font-semibold text-sm transition-colors ${
              filter === f.value
                ? 'bg-primary text-white'
                : 'bg-white text-dark/60 hover:bg-primary/5'
            }`}
          >
            {f.label}
          </button>
        ))}
      </div>

      {/* Grid */}
      <div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-3 gap-6">
        {filtered.map((p) => (
          <ProjectCard key={p.slug} project={p} />
        ))}
      </div>

      {filtered.length === 0 && (
        <p className="text-center text-dark/40 mt-12">Keine Projekte in dieser Kategorie.</p>
      )}
    </div>
  )
}
101
levis-holzbau/app/sicherheit/page.tsx
Normal file
101
levis-holzbau/app/sicherheit/page.tsx
Normal file
@@ -0,0 +1,101 @@
|
||||
'use client'
|
||||
|
||||
import { motion } from 'framer-motion'
|
||||
import { ShieldCheck, Eye, Hand, Scissors, AlertTriangle, Users } from 'lucide-react'
|
||||
import { SafetyTip } from '@/components/SafetyTip'
|
||||
|
||||
const rules = [
|
||||
{ icon: Users, title: 'Immer mit Erwachsenen', text: 'Bei Saegen, Bohren und Schnitzen muss immer ein Erwachsener dabei sein.' },
|
||||
{ icon: Hand, title: 'Vom Koerper weg', text: 'Schnitze, saege und schneide immer vom Koerper weg. So kannst du dich nicht verletzen.' },
|
||||
{ icon: Eye, title: 'Schutzbrille tragen', text: 'Beim Saegen und Schleifen fliegen Spaene — eine Schutzbrille schuetzt deine Augen.' },
|
||||
{ icon: Scissors, title: 'Werkzeug richtig halten', text: 'Greife Werkzeuge immer am Griff. Trage Messer und Saegen mit der Spitze nach unten.' },
|
||||
{ icon: AlertTriangle, title: 'Aufgeraeumter Arbeitsplatz', text: 'Raeume Werkzeug nach dem Benutzen weg. Ein ordentlicher Platz ist ein sicherer Platz!' },
|
||||
{ icon: ShieldCheck, title: 'Scharfes Werkzeug', text: 'Klingt komisch, aber: Scharfe Messer sind sicherer als stumpfe, weil du weniger Kraft brauchst.' },
|
||||
]
|
||||
|
||||
const toolGuides = [
|
||||
{ name: 'Schnitzmesser', age: 'Ab 6 Jahren (mit Hilfe)', tips: ['Immer vom Koerper weg schnitzen', 'Nach dem Benutzen zuklappen', 'Weiches Holz (Linde) verwenden'] },
|
||||
{ name: 'Handsaege', age: 'Ab 7 Jahren (mit Hilfe)', tips: ['Holz immer fest einspannen', 'Langsam und gleichmaessig saegen', 'Nicht auf die Klinge druecken'] },
|
||||
{ name: 'Hammer', age: 'Ab 5 Jahren', tips: ['Leichten Kinderhammer verwenden', 'Naegel mit Zange halten, nie mit Fingern', 'Auf stabile Unterlage achten'] },
|
||||
{ name: 'Schleifpapier', age: 'Ab 5 Jahren', tips: ['Immer in eine Richtung schleifen', 'Staub nicht einatmen', 'Erst grob, dann fein'] },
|
||||
  { name: 'Holzleim', age: 'Ab 5 Jahren', tips: ['Nicht giftig, aber nicht essen', 'Duenn auftragen reicht', 'Mindestens 1 Stunde trocknen lassen'] },
]

export default function SicherheitPage() {
  return (
    <div className="max-w-4xl mx-auto px-4 py-12">
      <motion.div
        initial={{ opacity: 0, y: -10 }}
        animate={{ opacity: 1, y: 0 }}
        className="text-center mb-12"
      >
        <div className="w-16 h-16 bg-warning/10 rounded-2xl flex items-center justify-center mx-auto mb-4">
          <ShieldCheck className="w-8 h-8 text-warning" />
        </div>
        <h1 className="font-heading font-bold text-4xl mb-3">Sicherheit geht vor!</h1>
        <p className="text-dark/60 text-lg max-w-2xl mx-auto">
          Holzarbeiten macht riesig Spass — aber nur, wenn du sicher arbeitest.
          Hier findest du die wichtigsten Regeln.
        </p>
      </motion.div>

      {/* Rules Grid */}
      <section className="mb-16">
        <h2 className="font-heading font-bold text-2xl mb-6">Die goldenen Regeln</h2>
        <div className="grid grid-cols-1 sm:grid-cols-2 gap-4">
          {rules.map((r, i) => (
            <motion.div
              key={r.title}
              className="bg-white rounded-2xl p-5 border border-primary/5 flex gap-4"
              initial={{ opacity: 0, y: 20 }}
              animate={{ opacity: 1, y: 0 }}
              transition={{ delay: i * 0.05 }}
            >
              <div className="w-10 h-10 bg-warning/10 rounded-xl flex items-center justify-center flex-shrink-0">
                <r.icon className="w-5 h-5 text-warning" />
              </div>
              <div>
                <h3 className="font-heading font-bold mb-1">{r.title}</h3>
                <p className="text-sm text-dark/60">{r.text}</p>
              </div>
            </motion.div>
          ))}
        </div>
      </section>

      {/* Tool Guides */}
      <section className="mb-16">
        <h2 className="font-heading font-bold text-2xl mb-6">Werkzeug-Guide</h2>
        <div className="space-y-4">
          {toolGuides.map((tool) => (
            <div key={tool.name} className="bg-white rounded-2xl p-5 border border-primary/5">
              <div className="flex items-center justify-between mb-3">
                <h3 className="font-heading font-bold text-lg">{tool.name}</h3>
                <span className="text-xs font-semibold bg-accent/10 text-accent px-2.5 py-1 rounded-full">{tool.age}</span>
              </div>
              <ul className="space-y-1.5">
                {tool.tips.map((tip) => (
                  <li key={tip} className="flex items-center gap-2 text-sm text-dark/70">
                    <span className="w-1.5 h-1.5 rounded-full bg-primary flex-shrink-0" />
                    {tip}
                  </li>
                ))}
              </ul>
            </div>
          ))}
        </div>
      </section>

      {/* Parents */}
      <section>
        <h2 className="font-heading font-bold text-2xl mb-4">Hinweise fuer Eltern</h2>
        <div className="space-y-3">
          <SafetyTip>Beaufsichtigen Sie Ihr Kind bei allen Projekten — besonders beim Umgang mit Schneidwerkzeugen.</SafetyTip>
          <SafetyTip>Stellen Sie altersgerechtes Werkzeug bereit. Kinderschnitzmesser haben abgerundete Spitzen.</SafetyTip>
          <SafetyTip>Richten Sie einen festen Arbeitsplatz ein — idealerweise auf einer stabilen Werkbank oder einem alten Tisch.</SafetyTip>
          <SafetyTip>Leinoel und Acrylfarben sind fuer Kinder unbedenklich. Vermeiden Sie Lacke mit Loesungsmitteln.</SafetyTip>
        </div>
      </section>
    </div>
  )
}
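The `transition={{ delay: i * 0.05 }}` pattern in the rules grid above staggers each card's entrance by its index. A minimal sketch of that delay calculation, extracted as a plain function (the name `staggerDelay` is ours for illustration, not part of the diff):

```typescript
// Hypothetical helper mirroring the stagger pattern used in the rules grid:
// card i begins animating i * step seconds after the first card.
export function staggerDelay(index: number, step = 0.05): number {
  return index * step
}

// e.g. four cards get delays of roughly 0, 0.05, 0.1 and 0.15 seconds
const delays = [0, 1, 2, 3].map((i) => staggerDelay(i))
```

Passing the loop index into `transition.delay` this way is a common framer-motion idiom when the cards animate independently; an alternative is `staggerChildren` on a parent variant.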
83
levis-holzbau/app/ueber/page.tsx
Normal file
@@ -0,0 +1,83 @@
'use client'

import { motion } from 'framer-motion'
import { TreePine, Heart, Sparkles, Users } from 'lucide-react'
import Link from 'next/link'

const reasons = [
  { icon: Sparkles, title: 'Kreativitaet', text: 'Du kannst dir selbst ausdenken, was du baust — und es dann wirklich machen!' },
  { icon: Heart, title: 'Stolz', text: 'Wenn du etwas mit deinen eigenen Haenden baust, macht dich das richtig stolz.' },
  { icon: TreePine, title: 'Natur', text: 'Holz ist ein natuerliches Material. Du lernst die Natur besser kennen.' },
  { icon: Users, title: 'Zusammen', text: 'Holzarbeiten macht zusammen mit Freunden oder der Familie am meisten Spass!' },
]

export default function UeberPage() {
  return (
    <div className="max-w-4xl mx-auto px-4 py-12">
      <motion.div
        initial={{ opacity: 0, y: -10 }}
        animate={{ opacity: 1, y: 0 }}
        className="text-center mb-12"
      >
        <h1 className="font-heading font-bold text-4xl mb-3">Ueber LEVIS Holzbau</h1>
        <p className="text-dark/60 text-lg max-w-2xl mx-auto">
          Wir zeigen dir, wie du aus einem einfachen Stueck Holz etwas Tolles machen kannst!
        </p>
      </motion.div>

      {/* Story */}
      <div className="bg-white rounded-2xl p-6 sm:p-8 border border-primary/5 mb-12">
        <h2 className="font-heading font-bold text-2xl mb-4">Was ist LEVIS Holzbau?</h2>
        <div className="space-y-4 text-dark/70 leading-relaxed">
          <p>
            LEVIS Holzbau ist deine Online-Holzwerkstatt! Hier findest du Anleitungen fuer tolle Projekte
            aus Holz — vom einfachen Zauberstab bis zum echten Vogelhaus.
          </p>
          <p>
            Jedes Projekt erklaert dir Schritt fuer Schritt, was du tun musst. Du siehst, welches Werkzeug
            und Material du brauchst, und wir zeigen dir immer, worauf du bei der Sicherheit achten musst.
          </p>
          <p>
            Egal ob du 6 oder 12 Jahre alt bist — fuer jedes Alter gibt es passende Projekte.
            Faengst du gerade erst an? Dann probier den Zauberstab oder die Nagelbilder. Bist du
            schon ein Profi? Dann trau dich an den Fliegenpilz!
          </p>
        </div>
      </div>

      {/* Why woodworking */}
      <h2 className="font-heading font-bold text-2xl mb-6 text-center">Warum Holzarbeiten Spass macht</h2>
      <div className="grid grid-cols-1 sm:grid-cols-2 gap-4 mb-12">
        {reasons.map((r, i) => (
          <motion.div
            key={r.title}
            className="bg-white rounded-2xl p-5 border border-primary/5 flex gap-4"
            initial={{ opacity: 0, y: 20 }}
            animate={{ opacity: 1, y: 0 }}
            transition={{ delay: i * 0.1 }}
          >
            <div className="w-10 h-10 bg-secondary/10 rounded-xl flex items-center justify-center flex-shrink-0">
              <r.icon className="w-5 h-5 text-secondary" />
            </div>
            <div>
              <h3 className="font-heading font-bold mb-1">{r.title}</h3>
              <p className="text-sm text-dark/60">{r.text}</p>
            </div>
          </motion.div>
        ))}
      </div>

      {/* CTA */}
      <div className="text-center bg-gradient-to-br from-primary/5 to-secondary/5 rounded-2xl p-8">
        <h2 className="font-heading font-bold text-2xl mb-3">Bereit loszulegen?</h2>
        <p className="text-dark/60 mb-6">Schau dir unsere Projekte an und such dir eins aus!</p>
        <Link
          href="/projekte"
          className="inline-flex items-center gap-2 bg-primary hover:bg-primary/90 text-white font-bold px-8 py-3 rounded-2xl transition-colors"
        >
          Zu den Projekten
        </Link>
      </div>
    </div>
  )
}
7
levis-holzbau/components/AgeBadge.tsx
Normal file
@@ -0,0 +1,7 @@
export function AgeBadge({ range }: { range: string }) {
  return (
    <span className="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-semibold bg-accent/10 text-accent">
      {range} Jahre
    </span>
  )
}
15
levis-holzbau/components/DifficultyBadge.tsx
Normal file
@@ -0,0 +1,15 @@
import { Hammer } from 'lucide-react'

export function DifficultyBadge({ level }: { level: 1 | 2 | 3 }) {
  const labels = ['Anfaenger', 'Fortgeschritten', 'Profi']
  return (
    <div className="flex items-center gap-1" title={labels[level - 1]}>
      {Array.from({ length: 3 }).map((_, i) => (
        <Hammer
          key={i}
          className={`w-4 h-4 ${i < level ? 'text-primary' : 'text-gray-300'}`}
        />
      ))}
    </div>
  )
}
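`DifficultyBadge` above encodes the level as filled hammers out of three: icon `i` is highlighted exactly when `i < level`. A sketch of that mapping as plain functions (the names `difficultyLabel` and `hammerFill` are illustrative, not part of the component):

```typescript
type Level = 1 | 2 | 3

// Same lookup the component uses for its tooltip title.
export function difficultyLabel(level: Level): string {
  const labels = ['Anfaenger', 'Fortgeschritten', 'Profi']
  return labels[level - 1]
}

// Which of the three hammer icons render as "filled" for a given level:
// icon i is filled exactly when i < level.
export function hammerFill(level: Level): boolean[] {
  return Array.from({ length: 3 }, (_, i) => i < level)
}
```

The `1 | 2 | 3` union type keeps `labels[level - 1]` in bounds at the call site, so no runtime bounds check is needed.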
17
levis-holzbau/components/Footer.tsx
Normal file
@@ -0,0 +1,17 @@
import { Heart } from 'lucide-react'
import { Logo } from './Logo'

export function Footer() {
  return (
    <footer className="bg-white border-t border-primary/10 mt-16">
      <div className="max-w-6xl mx-auto px-4 py-8">
        <div className="flex flex-col sm:flex-row items-center justify-between gap-4">
          <Logo size={32} />
          <p className="text-sm text-dark/50 flex items-center gap-1">
            Gemacht mit <Heart className="w-4 h-4 text-red-400 fill-red-400" /> fuer junge Holzwerker
          </p>
        </div>
      </div>
    </footer>
  )
}
Some files were not shown because too many files have changed in this diff.