# Source Policy System - Implementation Plan
## Summary
Whitelist-based data source management for the edu-search-service, available under `/compliance/source-policy`. Designed to be verifiable by auditors, with a complete audit trail.
**Core principles:**
- Only official open-data portals and official sources (§5 UrhG)
- Training on external data: **PROHIBITED**
- All changes are logged (audit trail)
- PII blocklist with hard block
---
## 1. Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ admin-v2 (Next.js) │
│ /app/(admin)/compliance/source-policy/ │
│ ├── page.tsx (Dashboard + Tabs) │
│ └── components/ │
│ ├── SourcesTab.tsx (whitelist management) │
│ ├── OperationsMatrixTab.tsx (Lookup/RAG/Training/Export) │
│ ├── PIIRulesTab.tsx (PII blocklist) │
│ └── AuditTab.tsx (change history + export) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ edu-search-service (Go) │
│ NEW: internal/policy/ │
│ ├── models.go (data structures) │
│ ├── store.go (PostgreSQL CRUD) │
│ ├── enforcer.go (policy enforcement) │
│ ├── pii_detector.go (PII detection) │
│ └── audit.go (audit logging) │
│ │
│ MODIFIED: │
│ ├── crawler/crawler.go (whitelist check before fetch) │
│ ├── pipeline/pipeline.go (PII filter after extract) │
│ └── api/handlers/policy_handlers.go (admin API) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ PostgreSQL │
│ NEW TABLES: │
│ - source_policies (versioned policies) │
│ - allowed_sources (whitelist per federal state) │
│ - operation_permissions (Lookup/RAG/Training/Export matrix) │
│ - pii_rules (regex/keyword blocklist) │
│ - policy_audit_log (immutable) │
│ - blocked_content_log (blocked URLs for audit) │
└─────────────────────────────────────────────────────────────────┘
```
---
## 2. Data Model
### 2.1 PostgreSQL Schema
```sql
-- Policies (versioned)
CREATE TABLE source_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    version INTEGER NOT NULL DEFAULT 1,
    name VARCHAR(255) NOT NULL,
    bundesland VARCHAR(2), -- NULL = federal level/KMK
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW(),
    approved_by UUID,
    approved_at TIMESTAMP
);
-- Whitelist
CREATE TABLE allowed_sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    policy_id UUID REFERENCES source_policies(id),
    domain VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    license VARCHAR(50) NOT NULL, -- DL-DE-BY-2.0, CC-BY, §5 UrhG
    legal_basis VARCHAR(100),
    citation_template TEXT,
    trust_boost DECIMAL(3,2) DEFAULT 0.50,
    is_active BOOLEAN DEFAULT true
);
-- Operations matrix
CREATE TABLE operation_permissions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id UUID REFERENCES allowed_sources(id),
    operation VARCHAR(50) NOT NULL, -- lookup, rag, training, export
    is_allowed BOOLEAN NOT NULL,
    requires_citation BOOLEAN DEFAULT false,
    notes TEXT
);
-- PII blocklist
CREATE TABLE pii_rules (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    rule_type VARCHAR(50) NOT NULL, -- regex, keyword
    pattern TEXT NOT NULL,
    severity VARCHAR(20) DEFAULT 'block', -- block, warn, redact
    is_active BOOLEAN DEFAULT true
);
-- Audit log (immutable)
CREATE TABLE policy_audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    action VARCHAR(50) NOT NULL,
    entity_type VARCHAR(50) NOT NULL,
    entity_id UUID,
    old_value JSONB,
    new_value JSONB,
    user_email VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW()
);
-- Blocked content log
CREATE TABLE blocked_content_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    url VARCHAR(2048) NOT NULL,
    domain VARCHAR(255) NOT NULL,
    block_reason VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
```
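The schema above maps fairly directly onto Go types for `internal/policy/models.go`. The following is a minimal sketch, not the actual file: the type, field, and constant names are assumptions, and only the column names and the four operation values come from the schema. The `Validate` checks are likewise assumed application-level rules, since the schema itself does not constrain `operation` to a fixed set.

```go
package main

import "fmt"

// Operation mirrors the values documented for operation_permissions.operation.
type Operation string

const (
	OpLookup   Operation = "lookup"
	OpRAG      Operation = "rag"
	OpTraining Operation = "training"
	OpExport   Operation = "export"
)

// Valid reports whether op is one of the four documented operations.
func (op Operation) Valid() bool {
	switch op {
	case OpLookup, OpRAG, OpTraining, OpExport:
		return true
	}
	return false
}

// AllowedSource mirrors a row of allowed_sources (hypothetical field names).
type AllowedSource struct {
	ID               string
	PolicyID         string
	Domain           string
	Name             string
	License          string
	LegalBasis       string
	CitationTemplate string
	TrustBoost       float64
	IsActive         bool
}

// OperationPermission mirrors a row of operation_permissions.
type OperationPermission struct {
	SourceID         string
	Operation        Operation
	IsAllowed        bool
	RequiresCitation bool
	Notes            string
}

// Validate applies checks that the schema leaves to the application (assumption).
func (p OperationPermission) Validate() error {
	if p.SourceID == "" {
		return fmt.Errorf("operation permission: missing source id")
	}
	if !p.Operation.Valid() {
		return fmt.Errorf("operation permission: unknown operation %q", p.Operation)
	}
	return nil
}

func main() {
	// "scrape" is not a documented operation, so validation fails.
	p := OperationPermission{SourceID: "src-1", Operation: "scrape", IsAllowed: true}
	fmt.Println(p.Validate())
}
```

Using a named string type for the operation keeps the database values readable while letting the compiler catch most misuse of plain strings.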
### 2.2 Initial Data
File: `edu-search-service/policies/bundeslaender.yaml`
```yaml
federal:
  name: "KMK & Bundesebene"
  sources:
    - domain: "kmk.org"
      name: "Kultusministerkonferenz"
      license: "§5 UrhG"
      legal_basis: "Amtliche Werke (§5 UrhG)"
      citation_template: "Quelle: KMK, {title}, {date}"
    - domain: "bildungsserver.de"
      name: "Deutscher Bildungsserver"
      license: "DL-DE-BY-2.0"
NI:
  name: "Niedersachsen"
  sources:
    - domain: "nibis.de"
      name: "NiBiS Bildungsserver"
      license: "DL-DE-BY-2.0"
    - domain: "mk.niedersachsen.de"
      name: "Kultusministerium Niedersachsen"
      license: "§5 UrhG"
    - domain: "cuvo.nibis.de"
      name: "Kerncurricula Niedersachsen"
      license: "DL-DE-BY-2.0"
BY:
  name: "Bayern"
  sources:
    - domain: "km.bayern.de"
      name: "Bayerisches Kultusministerium"
      license: "§5 UrhG"
    - domain: "isb.bayern.de"
      name: "ISB Bayern"
      license: "DL-DE-BY-2.0"
    - domain: "lehrplanplus.bayern.de"
      name: "LehrplanPLUS"
      license: "DL-DE-BY-2.0"
# Default operations matrix
default_operations:
  lookup:
    allowed: true
    requires_citation: true
  rag:
    allowed: true
    requires_citation: true
  training:
    allowed: false # PROHIBITED
  export:
    allowed: true
    requires_citation: true
# Default PII rules
pii_rules:
  - name: "Email Addresses"
    type: "regex"
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    severity: "block"
  - name: "German Phone Numbers"
    type: "regex"
    pattern: "(?:\\+49|0)[\\s.-]?\\d{2,4}[\\s.-]?\\d{3,}[\\s.-]?\\d{2,}"
    severity: "block"
  - name: "IBAN"
    type: "regex"
    pattern: "DE\\d{2}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{2}"
    severity: "block"
```
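To illustrate how the default PII rules could be applied, here is a minimal sketch of a regex-based detector in Go. The three patterns are copied from the YAML above; the type names `PIIRule` and `PIIMatch` and the function signatures are assumptions and need not match the real `pii_detector.go`.

```go
package main

import (
	"fmt"
	"regexp"
)

// PIIRule is a hypothetical in-memory form of one pii_rules entry.
type PIIRule struct {
	Name     string
	Pattern  *regexp.Regexp
	Severity string // "block", "warn" or "redact"
}

// PIIMatch records which rule fired and on which text fragment.
type PIIMatch struct {
	Rule     string
	Severity string
	Text     string
}

// defaultRules compiles the three regex patterns from the YAML defaults.
var defaultRules = []PIIRule{
	{"Email Addresses", regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`), "block"},
	{"German Phone Numbers", regexp.MustCompile(`(?:\+49|0)[\s.-]?\d{2,4}[\s.-]?\d{3,}[\s.-]?\d{2,}`), "block"},
	{"IBAN", regexp.MustCompile(`DE\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{2}`), "block"},
}

// DetectPII returns every match of every rule in content.
func DetectPII(rules []PIIRule, content string) []PIIMatch {
	var matches []PIIMatch
	for _, r := range rules {
		for _, m := range r.Pattern.FindAllString(content, -1) {
			matches = append(matches, PIIMatch{Rule: r.Name, Severity: r.Severity, Text: m})
		}
	}
	return matches
}

// hasSeverity reports whether any match carries the given severity.
func hasSeverity(matches []PIIMatch, severity string) bool {
	for _, m := range matches {
		if m.Severity == severity {
			return true
		}
	}
	return false
}

func main() {
	text := "Kontakt: max.mustermann@example.org, IBAN DE89 3704 0044 0532 0130 00"
	matches := DetectPII(defaultRules, text)
	for _, m := range matches {
		fmt.Printf("%s: %q\n", m.Rule, m.Text)
	}
	fmt.Println("blocked:", hasSeverity(matches, "block"))
}
```

One thing this sketch makes visible: the phone pattern is broad enough to also fire on digit groups inside a spaced IBAN, so a single fragment can trigger more than one rule. That is harmless for a hard block, but worth keeping in mind if `warn` or `redact` severities are reported per rule.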
---
## 3. Backend Implementation
### 3.1 New Files
| File | Description |
|------|-------------|
| `internal/policy/models.go` | Go structs (SourcePolicy, AllowedSource, PIIRule, etc.) |
| `internal/policy/store.go` | PostgreSQL CRUD with pgx |
| `internal/policy/enforcer.go` | `CheckSource()`, `CheckOperation()`, `DetectPII()` |
| `internal/policy/audit.go` | `LogChange()`, `LogBlocked()` |
| `internal/policy/pii_detector.go` | Regex-based PII detection |
| `internal/api/handlers/policy_handlers.go` | Admin endpoints |
| `migrations/005_source_policies.sql` | DB schema |
| `policies/bundeslaender.yaml` | Initial data |
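
As an illustration of what `LogChange()` in `audit.go` might assemble before writing to `policy_audit_log`, the sketch below builds an audit entry with JSON snapshots of the old and new values. The struct and the `newAuditEntry` helper are hypothetical; only the column names come from the schema. A `nil` value is kept as an empty `RawMessage` so that a store layer could write SQL `NULL`, for example for `old_value` on a create.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// AuditEntry mirrors the columns of policy_audit_log (hypothetical Go shape).
type AuditEntry struct {
	Action     string
	EntityType string
	EntityID   string
	OldValue   json.RawMessage // nil means SQL NULL
	NewValue   json.RawMessage // nil means SQL NULL
	UserEmail  string
	CreatedAt  time.Time
}

// toJSON snapshots v as JSON; nil stays nil so it can be stored as NULL.
func toJSON(v interface{}) (json.RawMessage, error) {
	if v == nil {
		return nil, nil
	}
	b, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	return b, nil
}

// newAuditEntry captures the before/after state of a changed entity.
func newAuditEntry(action, entityType, entityID, userEmail string, oldVal, newVal interface{}) (AuditEntry, error) {
	oldJSON, err := toJSON(oldVal)
	if err != nil {
		return AuditEntry{}, fmt.Errorf("audit: encode old value: %w", err)
	}
	newJSON, err := toJSON(newVal)
	if err != nil {
		return AuditEntry{}, fmt.Errorf("audit: encode new value: %w", err)
	}
	return AuditEntry{
		Action:     action,
		EntityType: entityType,
		EntityID:   entityID,
		OldValue:   oldJSON,
		NewValue:   newJSON,
		UserEmail:  userEmail,
		CreatedAt:  time.Now().UTC(),
	}, nil
}

func main() {
	entry, err := newAuditEntry("update", "allowed_source", "src-1", "dsb@example.org",
		map[string]interface{}{"is_active": true},
		map[string]interface{}{"is_active": false})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s %s: %s -> %s\n", entry.Action, entry.EntityType, entry.OldValue, entry.NewValue)
}
```

Taking the snapshots at the moment of the change, rather than re-reading the row later, is what lets the log show exactly what an auditor would have seen before and after.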
### 3.2 API Endpoints
```
# Policies
GET /v1/admin/policies
POST /v1/admin/policies
PUT /v1/admin/policies/:id
# Sources (Whitelist)
GET /v1/admin/sources
POST /v1/admin/sources
PUT /v1/admin/sources/:id
DELETE /v1/admin/sources/:id
# Operations Matrix
GET /v1/admin/operations-matrix
PUT /v1/admin/operations/:id
# PII Rules
GET /v1/admin/pii-rules
POST /v1/admin/pii-rules
PUT /v1/admin/pii-rules/:id
DELETE /v1/admin/pii-rules/:id
POST /v1/admin/pii-rules/test # Test against sample text
# Audit
GET /v1/admin/policy-audit?from=&to=
GET /v1/admin/blocked-content?from=&to=
GET /v1/admin/compliance-report # PDF/JSON Export
# Live-Check
POST /v1/admin/check-compliance
Body: { "url": "...", "operation": "lookup" }
```
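For the live-check endpoint, a handler could look roughly like the sketch below. It uses only `net/http` from the standard library, which is an assumption: the plan does not name the router or framework. The response shape (`allowed`, `reason`) and the `checkFunc` hook standing in for the enforcer are also hypothetical; only the request body fields `url` and `operation` come from the endpoint list above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// checkRequest matches the documented body: { "url": "...", "operation": "lookup" }.
type checkRequest struct {
	URL       string `json:"url"`
	Operation string `json:"operation"`
}

// checkResponse is an assumed response shape, not taken from the plan.
type checkResponse struct {
	Allowed bool   `json:"allowed"`
	Reason  string `json:"reason,omitempty"`
}

var validOps = map[string]bool{"lookup": true, "rag": true, "training": true, "export": true}

// parseCheckRequest decodes and validates the request body.
func parseCheckRequest(body []byte) (checkRequest, error) {
	var req checkRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, fmt.Errorf("invalid JSON: %w", err)
	}
	if req.URL == "" {
		return req, errors.New("url is required")
	}
	if !validOps[req.Operation] {
		return req, fmt.Errorf("unknown operation %q", req.Operation)
	}
	return req, nil
}

// checkFunc is a placeholder for the real policy enforcer.
type checkFunc func(url, operation string) checkResponse

// checkComplianceHandler wires request parsing to the enforcer hook.
func checkComplianceHandler(check checkFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(io.LimitReader(r.Body, 1<<16))
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		req, err := parseCheckRequest(body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(check(req.URL, req.Operation))
	}
}

func main() {
	// Stub enforcer for the demo: training is denied, everything else allowed.
	stub := func(url, operation string) checkResponse {
		if operation == "training" {
			return checkResponse{Allowed: false, Reason: "training_prohibited"}
		}
		return checkResponse{Allowed: true}
	}
	body := strings.NewReader(`{"url":"https://nibis.de/test","operation":"training"}`)
	req := httptest.NewRequest(http.MethodPost, "/v1/admin/check-compliance", body)
	rec := httptest.NewRecorder()
	checkComplianceHandler(stub)(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the parsing in a plain function makes the validation rules testable without an HTTP server, and the same function could back the `pii-rules/test` endpoint style of "try it against a sample" checks.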
### 3.3 Crawler Integration
In `crawler/crawler.go`:
```go
func (c *Crawler) FetchWithPolicy(ctx context.Context, url string) (*FetchResult, error) {
	// 1. Whitelist check: lookup errors and unknown sources are both blocked
	source, err := c.enforcer.CheckSource(ctx, url)
	if err != nil || source == nil {
		c.enforcer.LogBlocked(ctx, url, "not_whitelisted")
		return nil, ErrNotWhitelisted
	}
	// ... existing fetch (produces content and result) ...
	// 2. PII check after fetch
	piiMatches := c.enforcer.DetectPII(content)
	if hasSeverity(piiMatches, "block") {
		c.enforcer.LogBlocked(ctx, url, "pii_detected")
		return nil, ErrPIIDetected
	}
	return result, nil
}
```
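The whitelist check in step 1 ultimately has to decide whether a URL's host belongs to an allowed domain. The sketch below shows one way to do that safely in Go. It assumes that subdomains of a listed domain are covered and that the most specific entry wins; since the initial data lists `cuvo.nibis.de` alongside `nibis.de`, the real `CheckSource()` may well be stricter. The function name `matchWhitelist` is hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// matchWhitelist returns the most specific whitelisted domain covering rawURL,
// or "" if the URL is malformed, not http(s), or not covered.
func matchWhitelist(domains []string, rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return ""
	}
	host := strings.ToLower(u.Hostname())
	best := ""
	for _, d := range domains {
		d = strings.ToLower(d)
		// Match on a label boundary so "evilnibis.de" does not pass as "nibis.de".
		if host == d || strings.HasSuffix(host, "."+d) {
			if len(d) > len(best) {
				best = d
			}
		}
	}
	return best
}

func main() {
	domains := []string{"kmk.org", "nibis.de", "cuvo.nibis.de"}
	urls := []string{
		"https://cuvo.nibis.de/kc",
		"https://evilnibis.de/",
		"https://www.kmk.org/beschluesse",
	}
	for _, u := range urls {
		fmt.Printf("%-34s -> %q\n", u, matchWhitelist(domains, u))
	}
}
```

Comparing parsed hostnames instead of raw URL substrings avoids the classic bypasses, such as a whitelisted domain appearing in the path or as a prefix of an attacker-controlled host.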
---
## 4. Frontend Implementation
### 4.1 Navigation Update
Add the following to the `compliance` category in `lib/navigation.ts`:
```typescript
{
  id: 'source-policy',
  name: 'Quellen-Policy',
  href: '/compliance/source-policy',
  description: 'Datenquellen & Compliance',
  purpose: 'Whitelist zugelassener Datenquellen mit Operations-Matrix und PII-Blocklist.',
  audience: ['DSB', 'Compliance Officer', 'Auditor'],
  gdprArticles: ['Art. 5 (Rechtmaessigkeit)', 'Art. 6 (Rechtsgrundlage)'],
}
```
### 4.2 Page Structure
```
/app/(admin)/compliance/source-policy/
├── page.tsx # Main dashboard with tabs
└── components/
├── SourcesTab.tsx # Whitelist table with CRUD
├── OperationsMatrixTab.tsx # 4x4 matrix
├── PIIRulesTab.tsx # PII rules with test function
└── AuditTab.tsx # Change history + export
```
### 4.3 UI Layout
**Stats cards (top):**
- Active policies
- Allowed sources
- Blocked (today)
- Compliance score
**Tabs:**
1. **Dashboard** - Overview with quick stats
2. **Sources** - Whitelist table (domain, name, license, status)
3. **Operations** - Matrix with Lookup/RAG/Training/Export
4. **PII Rules** - Blocklist with test function
5. **Audit** - Change history with PDF/JSON export
**Patterns (from audit-report/page.tsx):**
- Tab navigation: `bg-purple-600 text-white` for active
- Status badges: `bg-green-100 text-green-700` for active
- Tables: `hover:bg-slate-50`
- Info boxes: `bg-blue-50 border-blue-200`
---
## 5. Affected Files
### New files to create:
**Backend (edu-search-service):**
```
internal/policy/models.go
internal/policy/store.go
internal/policy/enforcer.go
internal/policy/audit.go
internal/policy/pii_detector.go
internal/api/handlers/policy_handlers.go
migrations/005_source_policies.sql
policies/bundeslaender.yaml
```
**Frontend (admin-v2):**
```
app/(admin)/compliance/source-policy/page.tsx
app/(admin)/compliance/source-policy/components/SourcesTab.tsx
app/(admin)/compliance/source-policy/components/OperationsMatrixTab.tsx
app/(admin)/compliance/source-policy/components/PIIRulesTab.tsx
app/(admin)/compliance/source-policy/components/AuditTab.tsx
```
### Existing files to modify:
```
edu-search-service/cmd/server/main.go # Register policy endpoints
edu-search-service/internal/crawler/crawler.go # Add policy check
edu-search-service/internal/pipeline/pipeline.go # PII filter
edu-search-service/internal/database/database.go # Migrations
admin-v2/lib/navigation.ts # source-policy module
```
---
## 6. Implementation Order
### Phase 1: Database & Models
1. Create migration `005_source_policies.sql`
2. Go models in `internal/policy/models.go`
3. Store layer in `internal/policy/store.go`
4. YAML loader for initial data
### Phase 2: Policy Enforcer
1. `internal/policy/enforcer.go` - CheckSource, CheckOperation
2. `internal/policy/pii_detector.go` - Regex-based detection
3. `internal/policy/audit.go` - Logging
4. Integration into the crawler
### Phase 3: Admin API
1. `internal/api/handlers/policy_handlers.go`
2. Register routes in main.go
3. Test the API
### Phase 4: Frontend
1. Main page with PagePurpose
2. SourcesTab with whitelist CRUD
3. OperationsMatrixTab
4. PIIRulesTab with test function
5. AuditTab with export
### Phase 5: Testing & Deployment
1. Unit tests for the enforcer
2. Integration tests for the API
3. E2E test for the frontend
4. Deployment to the Mac Mini
---
## 7. Verification
### After backend (Phases 1-3):
```bash
# Run the migration
ssh macmini "cd /path/to/edu-search-service && go run ./cmd/migrate"
# Test the API
curl -X GET http://macmini:8088/v1/admin/policies
curl -X POST http://macmini:8088/v1/admin/check-compliance \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://nibis.de/test","operation":"lookup"}'
```
### After frontend (Phase 4):
```bash
# Build & Deploy
rsync -avz admin-v2/ macmini:/path/to/admin-v2/
ssh macmini "docker compose build admin-v2 && docker compose up -d admin-v2"
# Test
open https://macmini:3002/compliance/source-policy
```
### Auditor checklist:
- [ ] All sources in the whitelist are documented
- [ ] Operations matrix shows Training = PROHIBITED
- [ ] PII rules are active and testable
- [ ] Audit log shows all changes
- [ ] Blocked-content log shows blocked URLs
- [ ] PDF/JSON export works
---
## 8. KMK Specifics (§5 UrhG)
**Legal basis:**
- KMK resolutions, agreements, and EPA are official works under §5 UrhG
- Freely usable; attribution required
**Citation format:**
```
Quelle: KMK, [title of the resolution], [date]
Example: Quelle: KMK, Bildungsstandards im Fach Deutsch, 2003
```
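The citation format corresponds to the `citation_template` field in the initial data (`"Quelle: KMK, {title}, {date}"`). A minimal sketch of how such a template could be filled in Go follows; the function name `renderCitation` is hypothetical, and only the two placeholders that appear in the YAML are handled.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCitation substitutes the {title} and {date} placeholders in a template.
func renderCitation(template, title, date string) string {
	// NewReplacer works in a single pass, so placeholder-like text inside
	// title or date is not expanded a second time.
	return strings.NewReplacer("{title}", title, "{date}", date).Replace(template)
}

func main() {
	fmt.Println(renderCitation("Quelle: KMK, {title}, {date}", "Bildungsstandards im Fach Deutsch", "2003"))
	// Prints: Quelle: KMK, Bildungsstandards im Fach Deutsch, 2003
}
```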
**Permitted document types:**
- Resolutions (Beschlüsse)
- Agreements (Vereinbarungen)
- EPA (Einheitliche Prüfungsanforderungen, uniform examination requirements)
- Recommendations (Empfehlungen)
**In the operations matrix:**
| Operation | Allowed | Note |
|-----------|---------|------|
| Lookup | Yes | Show source |
| RAG | Yes | Citation in output |
| Training | **NO** | PROHIBITED |
| Export | Yes | Attribution |
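
The matrix above can be enforced with a small lookup that treats unknown operations as errors and never lets training through. The following Go sketch is one possible shape for `CheckOperation()`; the signature, the `permission` type, and the idea of per-source overrides on top of the defaults are assumptions. The hard denial of training, regardless of overrides, reflects the core principle that training on external data is prohibited.

```go
package main

import (
	"errors"
	"fmt"
)

// permission is a hypothetical in-memory form of one matrix cell.
type permission struct {
	Allowed          bool
	RequiresCitation bool
}

// defaultMatrix mirrors default_operations from the initial data.
var defaultMatrix = map[string]permission{
	"lookup":   {true, true},
	"rag":      {true, true},
	"training": {false, false},
	"export":   {true, true},
}

var (
	ErrOperationDenied  = errors.New("operation not permitted for this source")
	ErrUnknownOperation = errors.New("unknown operation")
)

// CheckOperation resolves an operation against per-source overrides and the
// defaults. Training is denied before overrides are even consulted.
func CheckOperation(overrides map[string]permission, op string) (permission, error) {
	p, known := defaultMatrix[op]
	if !known {
		return permission{}, ErrUnknownOperation
	}
	if op == "training" {
		return permission{}, ErrOperationDenied
	}
	if o, ok := overrides[op]; ok {
		p = o
	}
	if !p.Allowed {
		return p, ErrOperationDenied
	}
	return p, nil
}

func main() {
	// An override that tries to enable training has no effect.
	overrides := map[string]permission{"training": {true, false}}
	for _, op := range []string{"lookup", "rag", "training", "export"} {
		p, err := CheckOperation(overrides, op)
		fmt.Printf("%-8s allowed=%v citation=%v err=%v\n", op, err == nil, p.RequiresCitation, err)
	}
}
```

Putting the training check ahead of the override lookup means a misconfigured row in `operation_permissions` cannot weaken the prohibition.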
---
## 9. Licenses
| License | Name | Attribution |
|---------|------|-------------|
| DL-DE-BY-2.0 | Datenlizenz Deutschland | Yes |
| CC-BY | Creative Commons Attribution | Yes |
| CC-BY-SA | CC Attribution-ShareAlike | Yes + ShareAlike |
| CC0 | Public Domain | No |
| §5 UrhG | Official works (amtliche Werke) | Yes (source) |
---
## 10. Current Status
**Phase 1: Database & Models - COMPLETED**
- [x] Codebase exploration of edu-search-service
- [x] Codebase exploration of admin-v2
- [x] Plan documented
- [x] Create migration 005_source_policies.sql
- [x] Implement Go models (internal/policy/models.go)
- [x] Implement store layer (internal/policy/store.go)
- [x] Implement policy enforcer (internal/policy/enforcer.go)
- [x] Implement PII detector (internal/policy/pii_detector.go)
- [x] Implement audit logging (internal/policy/audit.go)
- [x] Implement YAML loader (internal/policy/loader.go)
- [x] Create initial-data YAML (policies/bundeslaender.yaml)
- [x] Write unit tests (internal/policy/policy_test.go)
- [x] Update README
**Phase 2: Admin API - PENDING**
- [ ] Implement API handlers (policy_handlers.go)
- [ ] Update main.go
- [ ] Test the API
**Phase 3: Integration - PENDING**
- [ ] Crawler integration
- [ ] Pipeline integration
**Phase 4: Frontend - PENDING**
- [ ] Create frontend page.tsx
- [ ] SourcesTab component
- [ ] OperationsMatrixTab component
- [ ] PIIRulesTab component
- [ ] AuditTab component
- [ ] Update navigation
**Created files:**
```
edu-search-service/
├── migrations/
│ └── 005_source_policies.sql # DB schema (6 tables)
├── internal/policy/
│ ├── models.go # Data structures & enums
│ ├── store.go # PostgreSQL CRUD
│ ├── enforcer.go # Policy enforcement
│ ├── pii_detector.go # PII detection
│ ├── audit.go # Audit logging
│ ├── loader.go # YAML loader
│ └── policy_test.go # Unit tests
└── policies/
└── bundeslaender.yaml # Initial data (8 federal states)
```