Source Policy System - Implementation Plan
Summary
Whitelist-based data source management for the edu-search-service, available under /compliance/source-policy. Verifiable by auditors, with a complete audit trail.
Core principles:
- Only official open-data portals and official sources (§5 UrhG)
- Training on external data: FORBIDDEN
- All changes are logged (audit trail)
- PII blocklist with hard block
1. Architecture
┌─────────────────────────────────────────────────────────────────┐
│ admin-v2 (Next.js) │
│ /app/(admin)/compliance/source-policy/ │
│ ├── page.tsx (Dashboard + Tabs) │
│ └── components/ │
│ ├── SourcesTab.tsx (whitelist management) │
│ ├── OperationsMatrixTab.tsx (Lookup/RAG/Training/Export) │
│ ├── PIIRulesTab.tsx (PII blocklist) │
│ └── AuditTab.tsx (change history + export) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ edu-search-service (Go) │
│ NEW: internal/policy/ │
│ ├── models.go (data structures) │
│ ├── store.go (PostgreSQL CRUD) │
│ ├── enforcer.go (policy enforcement) │
│ ├── pii_detector.go (PII detection) │
│ └── audit.go (audit logging) │
│ │
│ MODIFIED: │
│ ├── crawler/crawler.go (whitelist check before fetch) │
│ ├── pipeline/pipeline.go (PII filter after extract) │
│ └── api/handlers/policy_handlers.go (Admin-API) │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ PostgreSQL │
│ NEW TABLES: │
│ - source_policies (versioned policies) │
│ - allowed_sources (whitelist per federal state) │
│ - operation_permissions (Lookup/RAG/Training/Export matrix) │
│ - pii_rules (regex/keyword blocklist) │
│ - policy_audit_log (immutable) │
│ - blocked_content_log (blocked URLs for audit) │
└─────────────────────────────────────────────────────────────────┘
2. Data Model
2.1 PostgreSQL Schema
-- Policies (versioned)
CREATE TABLE source_policies (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
version INTEGER NOT NULL DEFAULT 1,
name VARCHAR(255) NOT NULL,
bundesland VARCHAR(2), -- NULL = federal level/KMK
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT NOW(),
approved_by UUID,
approved_at TIMESTAMP
);
-- Whitelist
CREATE TABLE allowed_sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
policy_id UUID REFERENCES source_policies(id),
domain VARCHAR(255) NOT NULL,
name VARCHAR(255) NOT NULL,
license VARCHAR(50) NOT NULL, -- DL-DE-BY-2.0, CC-BY, §5 UrhG
legal_basis VARCHAR(100),
citation_template TEXT,
trust_boost DECIMAL(3,2) DEFAULT 0.50,
is_active BOOLEAN DEFAULT true
);
-- Operations Matrix
CREATE TABLE operation_permissions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id UUID REFERENCES allowed_sources(id),
operation VARCHAR(50) NOT NULL, -- lookup, rag, training, export
is_allowed BOOLEAN NOT NULL,
requires_citation BOOLEAN DEFAULT false,
notes TEXT
);
-- PII Blocklist
CREATE TABLE pii_rules (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
rule_type VARCHAR(50) NOT NULL, -- regex, keyword
pattern TEXT NOT NULL,
severity VARCHAR(20) DEFAULT 'block', -- block, warn, redact
is_active BOOLEAN DEFAULT true
);
-- Audit Log (immutable)
CREATE TABLE policy_audit_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
action VARCHAR(50) NOT NULL,
entity_type VARCHAR(50) NOT NULL,
entity_id UUID,
old_value JSONB,
new_value JSONB,
user_email VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW()
);
-- Blocked Content Log
CREATE TABLE blocked_content_log (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
url VARCHAR(2048) NOT NULL,
domain VARCHAR(255) NOT NULL,
block_reason VARCHAR(100) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
2.2 Initial Data
File: edu-search-service/policies/bundeslaender.yaml
federal:
  name: "KMK & Bundesebene"
  sources:
    - domain: "kmk.org"
      name: "Kultusministerkonferenz"
      license: "§5 UrhG"
      legal_basis: "Amtliche Werke (§5 UrhG)"
      citation_template: "Quelle: KMK, {title}, {date}"
    - domain: "bildungsserver.de"
      name: "Deutscher Bildungsserver"
      license: "DL-DE-BY-2.0"
NI:
  name: "Niedersachsen"
  sources:
    - domain: "nibis.de"
      name: "NiBiS Bildungsserver"
      license: "DL-DE-BY-2.0"
    - domain: "mk.niedersachsen.de"
      name: "Kultusministerium Niedersachsen"
      license: "§5 UrhG"
    - domain: "cuvo.nibis.de"
      name: "Kerncurricula Niedersachsen"
      license: "DL-DE-BY-2.0"
BY:
  name: "Bayern"
  sources:
    - domain: "km.bayern.de"
      name: "Bayerisches Kultusministerium"
      license: "§5 UrhG"
    - domain: "isb.bayern.de"
      name: "ISB Bayern"
      license: "DL-DE-BY-2.0"
    - domain: "lehrplanplus.bayern.de"
      name: "LehrplanPLUS"
      license: "DL-DE-BY-2.0"
# Default operations matrix
default_operations:
  lookup:
    allowed: true
    requires_citation: true
  rag:
    allowed: true
    requires_citation: true
  training:
    allowed: false # FORBIDDEN
  export:
    allowed: true
    requires_citation: true
# Default PII rules
pii_rules:
  - name: "Email Addresses"
    type: "regex"
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    severity: "block"
  - name: "German Phone Numbers"
    type: "regex"
    pattern: "(?:\\+49|0)[\\s.-]?\\d{2,4}[\\s.-]?\\d{3,}[\\s.-]?\\d{2,}"
    severity: "block"
  - name: "IBAN"
    type: "regex"
    pattern: "DE\\d{2}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{2}"
    severity: "block"
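The default PII rules above can be tried out with Go's standard regexp package. The following is a minimal sketch, not the actual pii_detector.go: the PIIRule and PIIMatch structs and the signature of DetectPII are assumptions made for illustration, and the patterns are written as raw strings, so the YAML double escaping is dropped.

```go
package main

import (
	"fmt"
	"regexp"
)

// PIIRule is a hypothetical in-memory form of one pii_rules entry.
type PIIRule struct {
	Name     string
	Pattern  *regexp.Regexp
	Severity string // "block", "warn" or "redact"
}

// PIIMatch records which rule fired and the text it matched.
type PIIMatch struct {
	Rule     string
	Severity string
	Text     string
}

// defaultPIIRules compiles the three default regex rules from the YAML.
func defaultPIIRules() []PIIRule {
	return []PIIRule{
		{Name: "Email Addresses", Severity: "block",
			Pattern: regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`)},
		{Name: "German Phone Numbers", Severity: "block",
			Pattern: regexp.MustCompile(`(?:\+49|0)[\s.-]?\d{2,4}[\s.-]?\d{3,}[\s.-]?\d{2,}`)},
		{Name: "IBAN", Severity: "block",
			Pattern: regexp.MustCompile(`DE\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{2}`)},
	}
}

// DetectPII returns every match of every rule in text.
func DetectPII(rules []PIIRule, text string) []PIIMatch {
	var matches []PIIMatch
	for _, r := range rules {
		for _, m := range r.Pattern.FindAllString(text, -1) {
			matches = append(matches, PIIMatch{Rule: r.Name, Severity: r.Severity, Text: m})
		}
	}
	return matches
}

// hasSeverity reports whether any match carries the given severity.
func hasSeverity(matches []PIIMatch, severity string) bool {
	for _, m := range matches {
		if m.Severity == severity {
			return true
		}
	}
	return false
}

func main() {
	rules := defaultPIIRules()
	for _, text := range []string{
		"Lehrplan Mathematik Klasse 7",
		"Kontakt: lehrer@example.org",
	} {
		matches := DetectPII(rules, text)
		fmt.Printf("%q: %d match(es), block=%v\n", text, len(matches), hasSeverity(matches, "block"))
	}
}
```

Note that the rules are not mutually exclusive: the phone pattern can also fire on digit groups inside an IBAN, so a caller should rely on the severity of the matches rather than on their count.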
3. Backend Implementation
3.1 New Files
| File | Description |
|---|---|
| internal/policy/models.go | Go structs (SourcePolicy, AllowedSource, PIIRule, etc.) |
| internal/policy/store.go | PostgreSQL CRUD with pgx |
| internal/policy/enforcer.go | CheckSource(), CheckOperation(), DetectPII() |
| internal/policy/audit.go | LogChange(), LogBlocked() |
| internal/policy/pii_detector.go | Regex-based PII detection |
| internal/api/handlers/policy_handlers.go | Admin endpoints |
| migrations/005_source_policies.sql | DB schema |
| policies/bundeslaender.yaml | Initial data |
3.2 API Endpoints
# Policies
GET /v1/admin/policies
POST /v1/admin/policies
PUT /v1/admin/policies/:id
# Sources (Whitelist)
GET /v1/admin/sources
POST /v1/admin/sources
PUT /v1/admin/sources/:id
DELETE /v1/admin/sources/:id
# Operations Matrix
GET /v1/admin/operations-matrix
PUT /v1/admin/operations/:id
# PII Rules
GET /v1/admin/pii-rules
POST /v1/admin/pii-rules
PUT /v1/admin/pii-rules/:id
DELETE /v1/admin/pii-rules/:id
POST /v1/admin/pii-rules/test # Test against sample text
# Audit
GET /v1/admin/policy-audit?from=&to=
GET /v1/admin/blocked-content?from=&to=
GET /v1/admin/compliance-report # PDF/JSON Export
# Live-Check
POST /v1/admin/check-compliance
Body: { "url": "...", "operation": "lookup" }
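To illustrate the live-check endpoint, here is a minimal sketch of a handler for POST /v1/admin/check-compliance using only net/http. The request fields follow the body shown above. The response shape (allowed, reason), the decideFunc hook that stands in for the enforcer, and the stubDecide and post helpers are assumptions for illustration, not the actual policy_handlers.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// checkRequest matches the documented body: {"url": "...", "operation": "lookup"}.
type checkRequest struct {
	URL       string `json:"url"`
	Operation string `json:"operation"`
}

// checkResponse is a hypothetical response shape.
type checkResponse struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason,omitempty"`
}

// decideFunc is a hypothetical hook standing in for the policy enforcer.
type decideFunc func(url, operation string) (allowed bool, reason string)

// checkComplianceHandler validates the request and delegates the decision.
func checkComplianceHandler(decide decideFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var req checkRequest
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.URL == "" || req.Operation == "" {
			http.Error(w, "body must be JSON with url and operation", http.StatusBadRequest)
			return
		}
		allowed, reason := decide(req.URL, req.Operation)
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(checkResponse{Allowed: allowed, Reason: reason})
	}
}

// stubDecide denies training and allows everything else (illustration only).
func stubDecide(url, operation string) (bool, string) {
	if operation == "training" {
		return false, "operation_forbidden"
	}
	return true, ""
}

// post sends a body to the handler in memory, without a running server.
func post(h http.Handler, body string) (int, checkResponse) {
	req := httptest.NewRequest(http.MethodPost, "/v1/admin/check-compliance", strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var resp checkResponse
	_ = json.Unmarshal(rec.Body.Bytes(), &resp) // stays zero-valued for non-JSON error bodies
	return rec.Code, resp
}

func main() {
	h := checkComplianceHandler(stubDecide)
	code, resp := post(h, `{"url":"https://nibis.de/test","operation":"lookup"}`)
	fmt.Println(code, resp.Allowed, resp.Reason)
}
```

Passing the decision logic in as a function keeps the handler testable without a database. In the real service the enforcer would presumably take that place.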
3.3 Crawler Integration
In crawler/crawler.go:
func (c *Crawler) FetchWithPolicy(ctx context.Context, url string) (*FetchResult, error) {
	// 1. Whitelist check before any network access.
	// Fails closed: a lookup error is treated like a non-whitelisted source.
	source, err := c.enforcer.CheckSource(ctx, url)
	if err != nil || source == nil {
		c.enforcer.LogBlocked(ctx, url, "not_whitelisted")
		return nil, ErrNotWhitelisted
	}

	// ... existing fetch (produces result and content) ...

	// 2. PII check after the fetch
	piiMatches := c.enforcer.DetectPII(content)
	if hasSeverity(piiMatches, "block") {
		c.enforcer.LogBlocked(ctx, url, "pii_detected")
		return nil, ErrPIIDetected
	}

	return result, nil
}
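The whitelist check in step 1 comes down to matching the URL's host against the domains in allowed_sources. The sketch below shows one way to do this. It assumes that a whitelisted domain also covers its subdomains and that the most specific entry wins (so cuvo.nibis.de is preferred over nibis.de); the plan does not specify either rule, and matchWhitelist is a hypothetical helper, not the actual CheckSource.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// matchWhitelist returns the most specific whitelisted domain that covers
// rawURL, or "" when the URL is not covered. Only http and https are accepted.
func matchWhitelist(rawURL string, allowed []string) string {
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return ""
	}
	host := strings.TrimSuffix(strings.ToLower(u.Hostname()), ".")
	best := ""
	for _, d := range allowed {
		d = strings.ToLower(d)
		// Match the domain itself or a true subdomain (note the leading dot),
		// so "evilnibis.de" does not match "nibis.de".
		if (host == d || strings.HasSuffix(host, "."+d)) && len(d) > len(best) {
			best = d
		}
	}
	return best
}

func main() {
	allowed := []string{"kmk.org", "nibis.de", "cuvo.nibis.de", "km.bayern.de"}
	for _, raw := range []string{
		"https://nibis.de/test",
		"https://cuvo.nibis.de/plan",
		"https://nibis.de.evil.com/",
	} {
		fmt.Printf("%s -> %q\n", raw, matchWhitelist(raw, allowed))
	}
}
```

Comparing parsed host names instead of doing a substring search on the raw URL avoids false positives from look-alike hosts and from whitelisted domains appearing in the path or query string.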
4. Frontend Implementation
4.1 Navigation Update
Add to the compliance category in lib/navigation.ts:
{
  id: 'source-policy',
  name: 'Quellen-Policy',
  href: '/compliance/source-policy',
  description: 'Datenquellen & Compliance',
  purpose: 'Whitelist zugelassener Datenquellen mit Operations-Matrix und PII-Blocklist.',
  audience: ['DSB', 'Compliance Officer', 'Auditor'],
  gdprArticles: ['Art. 5 (Rechtmaessigkeit)', 'Art. 6 (Rechtsgrundlage)'],
}
4.2 Page Structure
/app/(admin)/compliance/source-policy/
├── page.tsx # Main dashboard with tabs
└── components/
├── SourcesTab.tsx # Whitelist table with CRUD
├── OperationsMatrixTab.tsx # 4x4 matrix
├── PIIRulesTab.tsx # PII rules with test function
└── AuditTab.tsx # Change history + export
4.3 UI Layout
Stats cards (top):
- Active policies
- Approved sources
- Blocked (today)
- Compliance score
Tabs:
- Dashboard - Overview with quick stats
- Sources - Whitelist table (domain, name, license, status)
- Operations - Matrix with Lookup/RAG/Training/Export
- PII Rules - Blocklist with test function
- Audit - Change history with PDF/JSON export
Patterns (from audit-report/page.tsx):
- Tab navigation: bg-purple-600 text-white for the active tab
- Status badges: bg-green-100 text-green-700 for active
- Tables: hover:bg-slate-50
- Info boxes: bg-blue-50 border-blue-200
5. Affected Files
New files to create:
Backend (edu-search-service):
internal/policy/models.go
internal/policy/store.go
internal/policy/enforcer.go
internal/policy/audit.go
internal/policy/pii_detector.go
internal/api/handlers/policy_handlers.go
migrations/005_source_policies.sql
policies/bundeslaender.yaml
Frontend (admin-v2):
app/(admin)/compliance/source-policy/page.tsx
app/(admin)/compliance/source-policy/components/SourcesTab.tsx
app/(admin)/compliance/source-policy/components/OperationsMatrixTab.tsx
app/(admin)/compliance/source-policy/components/PIIRulesTab.tsx
app/(admin)/compliance/source-policy/components/AuditTab.tsx
Existing files to modify:
edu-search-service/cmd/server/main.go # Register policy endpoints
edu-search-service/internal/crawler/crawler.go # Add policy check
edu-search-service/internal/pipeline/pipeline.go # PII filter
edu-search-service/internal/database/database.go # Migrations
admin-v2/lib/navigation.ts # source-policy module
6. Implementation Order
Phase 1: Database & Models
- Create migration 005_source_policies.sql
- Go models in internal/policy/models.go
- Store layer in internal/policy/store.go
- YAML loader for initial data
Phase 2: Policy Enforcer
- internal/policy/enforcer.go - CheckSource, CheckOperation
- internal/policy/pii_detector.go - Regex-based detection
- internal/policy/audit.go - Logging
- Integration into the crawler
Phase 3: Admin API
- internal/api/handlers/policy_handlers.go
- Register routes in main.go
- Test the API
Phase 4: Frontend
- Main page with PagePurpose
- SourcesTab with whitelist CRUD
- OperationsMatrixTab
- PIIRulesTab with test function
- AuditTab with export
Phase 5: Testing & Deployment
- Unit tests for the enforcer
- Integration tests for the API
- E2E test for the frontend
- Deployment to the Mac Mini
7. Verification
After the backend (Phases 1-3):
# Run the migration
ssh macmini "cd /path/to/edu-search-service && go run ./cmd/migrate"
# Test the API
curl -X GET http://macmini:8088/v1/admin/policies
curl -X POST http://macmini:8088/v1/admin/check-compliance \
-H 'Content-Type: application/json' \
-d '{"url":"https://nibis.de/test","operation":"lookup"}'
After the frontend (Phase 4):
# Build & deploy
rsync -avz admin-v2/ macmini:/path/to/admin-v2/
ssh macmini "docker compose build admin-v2 && docker compose up -d admin-v2"
# Test
open https://macmini:3002/compliance/source-policy
Auditor checklist:
- All sources in the whitelist are documented
- Operations matrix shows Training = FORBIDDEN
- PII rules are active and testable
- Audit log shows all changes
- Blocked-content log shows blocked URLs
- PDF/JSON export works
8. KMK Specifics (§5 UrhG)
Legal basis:
- KMK resolutions (Beschluesse), agreements (Vereinbarungen), and EPA are official works under §5 UrhG
- Freely usable; attribution required
Citation format:
Quelle: KMK, [title of the resolution], [date]
Example: Quelle: KMK, Bildungsstandards im Fach Deutsch, 2003
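The citation format corresponds to the citation_template value "Quelle: KMK, {title}, {date}" from the initial data. A minimal sketch of filling such a template follows. renderCitation is a hypothetical helper, and the plan does not say how placeholders are actually substituted.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCitation replaces {name} placeholders in tmpl with the given field
// values. Placeholders without a matching field are left unchanged.
func renderCitation(tmpl string, fields map[string]string) string {
	pairs := make([]string, 0, 2*len(fields))
	for name, value := range fields {
		pairs = append(pairs, "{"+name+"}", value)
	}
	// strings.Replacer substitutes in a single pass, so values that happen to
	// contain braces are not expanded again.
	return strings.NewReplacer(pairs...).Replace(tmpl)
}

func main() {
	citation := renderCitation("Quelle: KMK, {title}, {date}", map[string]string{
		"title": "Bildungsstandards im Fach Deutsch",
		"date":  "2003",
	})
	fmt.Println(citation) // prints "Quelle: KMK, Bildungsstandards im Fach Deutsch, 2003"
}
```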
Approved document types:
- Beschluesse (resolutions)
- Vereinbarungen (agreements)
- EPA (Einheitliche Pruefungsanforderungen, uniform examination requirements)
- Empfehlungen (recommendations)
In the operations matrix:
| Operation | Allowed | Note |
|---|---|---|
| Lookup | Yes | Show source |
| RAG | Yes | Citation in output |
| Training | NO | FORBIDDEN |
| Export | Yes | Attribution |
9. Licenses
| License | Name | Attribution |
|---|---|---|
| DL-DE-BY-2.0 | Datenlizenz Deutschland | Yes |
| CC-BY | Creative Commons Attribution | Yes |
| CC-BY-SA | CC Attribution-ShareAlike | Yes + ShareAlike |
| CC0 | Public Domain | No |
| §5 UrhG | Official works (Amtliche Werke) | Yes (source) |
10. Current Status
Phase 1: Database & Models - COMPLETED
- Codebase exploration of edu-search-service
- Codebase exploration of admin-v2
- Plan documented
- Create migration 005_source_policies.sql
- Implement Go models (internal/policy/models.go)
- Implement store layer (internal/policy/store.go)
- Implement policy enforcer (internal/policy/enforcer.go)
- Implement PII detector (internal/policy/pii_detector.go)
- Implement audit logging (internal/policy/audit.go)
- Implement YAML loader (internal/policy/loader.go)
- Create initial-data YAML (policies/bundeslaender.yaml)
- Write unit tests (internal/policy/policy_test.go)
- Update README
Phase 2: Admin API - PENDING
- Implement API handlers (policy_handlers.go)
- Update main.go
- Test the API
Phase 3: Integration - PENDING
- Crawler integration
- Pipeline integration
Phase 4: Frontend - PENDING
- Create frontend page.tsx
- SourcesTab component
- OperationsMatrixTab component
- PIIRulesTab component
- AuditTab component
- Update navigation
Files created:
edu-search-service/
├── migrations/
│ └── 005_source_policies.sql # DB schema (6 tables)
├── internal/policy/
│ ├── models.go # Data structures & enums
│ ├── store.go # PostgreSQL CRUD
│ ├── enforcer.go # Policy enforcement
│ ├── pii_detector.go # PII detection
│ ├── audit.go # Audit logging
│ ├── loader.go # YAML loader
│ └── policy_test.go # Unit tests
└── policies/
└── bundeslaender.yaml # Initial data (8 federal states)