Source Policy System - Implementation Plan

Summary

Whitelist-based data source management for the edu-search-service, available under /compliance/source-policy. Verifiable by auditors, with a complete audit trail.

Core principles:

  • Only official open-data portals and official sources (§5 UrhG)
  • Training with external data: PROHIBITED
  • All changes are logged (audit trail)
  • PII blocklist with hard block

1. Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    admin-v2 (Next.js)                           │
│  /app/(admin)/compliance/source-policy/                         │
│  ├── page.tsx (Dashboard + Tabs)                                │
│  └── components/                                                │
│      ├── SourcesTab.tsx (whitelist management)                  │
│      ├── OperationsMatrixTab.tsx (Lookup/RAG/Training/Export)   │
│      ├── PIIRulesTab.tsx (PII blocklist)                        │
│      └── AuditTab.tsx (change history + export)                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                  edu-search-service (Go)                        │
│  NEW: internal/policy/                                          │
│  ├── models.go        (data structures)                         │
│  ├── store.go         (PostgreSQL CRUD)                         │
│  ├── enforcer.go      (policy enforcement)                      │
│  ├── pii_detector.go  (PII detection)                           │
│  └── audit.go         (audit logging)                           │
│                                                                 │
│  MODIFIED:                                                      │
│  ├── crawler/crawler.go    (whitelist check before fetch)       │
│  ├── pipeline/pipeline.go  (PII filter after extract)           │
│  └── api/handlers/policy_handlers.go (admin API)                │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     PostgreSQL                                  │
│  NEW TABLES:                                                    │
│  - source_policies        (versioned policies)                  │
│  - allowed_sources        (whitelist per federal state)         │
│  - operation_permissions  (lookup/RAG/training/export matrix)   │
│  - pii_rules              (regex/keyword blocklist)             │
│  - policy_audit_log       (immutable)                           │
│  - blocked_content_log    (blocked URLs for audit)              │
└─────────────────────────────────────────────────────────────────┘

2. Data Model

2.1 PostgreSQL Schema

-- Policies (versioned)
CREATE TABLE source_policies (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    version INTEGER NOT NULL DEFAULT 1,
    name VARCHAR(255) NOT NULL,
    bundesland VARCHAR(2),  -- NULL = federal level/KMK
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW(),
    approved_by UUID,
    approved_at TIMESTAMP
);

-- Whitelist
CREATE TABLE allowed_sources (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    policy_id UUID REFERENCES source_policies(id),
    domain VARCHAR(255) NOT NULL,
    name VARCHAR(255) NOT NULL,
    license VARCHAR(50) NOT NULL,  -- DL-DE-BY-2.0, CC-BY, §5 UrhG
    legal_basis VARCHAR(100),
    citation_template TEXT,
    trust_boost DECIMAL(3,2) DEFAULT 0.50,
    is_active BOOLEAN DEFAULT true
);

-- Operations Matrix
CREATE TABLE operation_permissions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id UUID REFERENCES allowed_sources(id),
    operation VARCHAR(50) NOT NULL,  -- lookup, rag, training, export
    is_allowed BOOLEAN NOT NULL,
    requires_citation BOOLEAN DEFAULT false,
    notes TEXT
);

-- PII Blocklist
CREATE TABLE pii_rules (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    rule_type VARCHAR(50) NOT NULL,  -- regex, keyword
    pattern TEXT NOT NULL,
    severity VARCHAR(20) DEFAULT 'block',  -- block, warn, redact
    is_active BOOLEAN DEFAULT true
);

-- Audit Log (immutable)
CREATE TABLE policy_audit_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    action VARCHAR(50) NOT NULL,
    entity_type VARCHAR(50) NOT NULL,
    entity_id UUID,
    old_value JSONB,
    new_value JSONB,
    user_email VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Blocked Content Log
CREATE TABLE blocked_content_log (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    url VARCHAR(2048) NOT NULL,
    domain VARCHAR(255) NOT NULL,
    block_reason VARCHAR(100) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);
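
The tables above map fairly directly onto the Go structs planned for internal/policy/models.go. The following is a minimal sketch, not the actual implementation: it assumes IDs are carried as plain strings and nullable columns as pointers. The field names, the Operation type and its Valid method are illustrative assumptions.

```go
package main

import "time"

// Operation is one of the four values stored in operation_permissions.operation.
type Operation string

const (
	OpLookup   Operation = "lookup"
	OpRAG      Operation = "rag"
	OpTraining Operation = "training"
	OpExport   Operation = "export"
)

// Valid reports whether op is one of the four known operations.
func (op Operation) Valid() bool {
	switch op {
	case OpLookup, OpRAG, OpTraining, OpExport:
		return true
	}
	return false
}

// SourcePolicy mirrors the source_policies table.
// Assumption: UUIDs are kept as strings to avoid an external UUID package.
type SourcePolicy struct {
	ID         string
	Version    int
	Name       string
	Bundesland *string // nil = federal level/KMK
	IsActive   bool
	CreatedAt  time.Time
	ApprovedBy *string
	ApprovedAt *time.Time
}

// AllowedSource mirrors the allowed_sources table.
type AllowedSource struct {
	ID               string
	PolicyID         string
	Domain           string
	Name             string
	License          string // e.g. "DL-DE-BY-2.0", "CC-BY", "§5 UrhG"
	LegalBasis       *string
	CitationTemplate *string
	TrustBoost       float64
	IsActive         bool
}

// OperationPermission mirrors the operation_permissions table.
type OperationPermission struct {
	ID               string
	SourceID         string
	Operation        Operation
	IsAllowed        bool
	RequiresCitation bool
	Notes            *string
}
```

Validating the operation string at the boundary (for example in the admin API) keeps values outside the four known operations out of the matrix.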

2.2 Initial Data

File: edu-search-service/policies/bundeslaender.yaml

federal:
  name: "KMK & Bundesebene"
  sources:
    - domain: "kmk.org"
      name: "Kultusministerkonferenz"
      license: "§5 UrhG"
      legal_basis: "Amtliche Werke (§5 UrhG)"
      citation_template: "Quelle: KMK, {title}, {date}"
    - domain: "bildungsserver.de"
      name: "Deutscher Bildungsserver"
      license: "DL-DE-BY-2.0"

NI:
  name: "Niedersachsen"
  sources:
    - domain: "nibis.de"
      name: "NiBiS Bildungsserver"
      license: "DL-DE-BY-2.0"
    - domain: "mk.niedersachsen.de"
      name: "Kultusministerium Niedersachsen"
      license: "§5 UrhG"
    - domain: "cuvo.nibis.de"
      name: "Kerncurricula Niedersachsen"
      license: "DL-DE-BY-2.0"

BY:
  name: "Bayern"
  sources:
    - domain: "km.bayern.de"
      name: "Bayerisches Kultusministerium"
      license: "§5 UrhG"
    - domain: "isb.bayern.de"
      name: "ISB Bayern"
      license: "DL-DE-BY-2.0"
    - domain: "lehrplanplus.bayern.de"
      name: "LehrplanPLUS"
      license: "DL-DE-BY-2.0"

# Default Operations Matrix
default_operations:
  lookup:
    allowed: true
    requires_citation: true
  rag:
    allowed: true
    requires_citation: true
  training:
    allowed: false  # PROHIBITED
  export:
    allowed: true
    requires_citation: true

# Default PII Rules
pii_rules:
  - name: "Email Addresses"
    type: "regex"
    pattern: "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    severity: "block"
  - name: "German Phone Numbers"
    type: "regex"
    pattern: "(?:\\+49|0)[\\s.-]?\\d{2,4}[\\s.-]?\\d{3,}[\\s.-]?\\d{2,}"
    severity: "block"
  - name: "IBAN"
    type: "regex"
    pattern: "DE\\d{2}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{4}\\s?\\d{2}"
    severity: "block"
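
To illustrate how the default PII rules above could be applied, here is a minimal sketch of regex-based detection using only Go's standard regexp package. The three patterns are copied from the YAML (with the YAML string escaping removed). The PIIRule and PIIMatch types, the defaultPIIRules variable and the standalone DetectPII signature are assumptions for illustration and may differ from internal/policy/pii_detector.go.

```go
package main

import "regexp"

// PIIRule is a compiled blocklist rule (hypothetical type for this sketch).
type PIIRule struct {
	Name     string
	Pattern  *regexp.Regexp
	Severity string // "block", "warn" or "redact"
}

// PIIMatch records one hit of a rule in the scanned content.
type PIIMatch struct {
	Rule     string
	Severity string
	Text     string
}

// defaultPIIRules holds the three default rules from bundeslaender.yaml.
var defaultPIIRules = []PIIRule{
	{"Email Addresses", regexp.MustCompile(`[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}`), "block"},
	{"German Phone Numbers", regexp.MustCompile(`(?:\+49|0)[\s.-]?\d{2,4}[\s.-]?\d{3,}[\s.-]?\d{2,}`), "block"},
	{"IBAN", regexp.MustCompile(`DE\d{2}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{4}\s?\d{2}`), "block"},
}

// DetectPII returns every match of every rule in content.
func DetectPII(content string, rules []PIIRule) []PIIMatch {
	var matches []PIIMatch
	for _, r := range rules {
		for _, m := range r.Pattern.FindAllString(content, -1) {
			matches = append(matches, PIIMatch{Rule: r.Name, Severity: r.Severity, Text: m})
		}
	}
	return matches
}

// hasSeverity reports whether any match carries the given severity.
func hasSeverity(matches []PIIMatch, severity string) bool {
	for _, m := range matches {
		if m.Severity == severity {
			return true
		}
	}
	return false
}
```

Calling DetectPII("Contact: teacher@example.org", defaultPIIRules) yields a single "Email Addresses" match, so hasSeverity(matches, "block") is true and the content would be hard-blocked.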

3. Backend Implementation

3.1 New Files

File                                       Description
internal/policy/models.go                  Go structs (SourcePolicy, AllowedSource, PIIRule, etc.)
internal/policy/store.go                   PostgreSQL CRUD with pgx
internal/policy/enforcer.go                CheckSource(), CheckOperation(), DetectPII()
internal/policy/audit.go                   LogChange(), LogBlocked()
internal/policy/pii_detector.go            Regex-based PII detection
internal/api/handlers/policy_handlers.go   Admin endpoints
migrations/005_source_policies.sql         DB schema
policies/bundeslaender.yaml                Initial data

3.2 API Endpoints

# Policies
GET    /v1/admin/policies
POST   /v1/admin/policies
PUT    /v1/admin/policies/:id

# Sources (Whitelist)
GET    /v1/admin/sources
POST   /v1/admin/sources
PUT    /v1/admin/sources/:id
DELETE /v1/admin/sources/:id

# Operations Matrix
GET    /v1/admin/operations-matrix
PUT    /v1/admin/operations/:id

# PII Rules
GET    /v1/admin/pii-rules
POST   /v1/admin/pii-rules
PUT    /v1/admin/pii-rules/:id
DELETE /v1/admin/pii-rules/:id
POST   /v1/admin/pii-rules/test  # Test against sample text

# Audit
GET    /v1/admin/policy-audit?from=&to=
GET    /v1/admin/blocked-content?from=&to=
GET    /v1/admin/compliance-report  # PDF/JSON Export

# Live-Check
POST   /v1/admin/check-compliance
       Body: { "url": "...", "operation": "lookup" }

3.3 Crawler Integration

In crawler/crawler.go:

// FetchWithPolicy wraps the existing fetch with a whitelist check before the
// request and a PII check on the fetched content.
func (c *Crawler) FetchWithPolicy(ctx context.Context, url string) (*FetchResult, error) {
    // 1. Whitelist check. Fails closed: a lookup error and a missing
    //    whitelist entry both block the fetch.
    source, err := c.enforcer.CheckSource(ctx, url)
    if err != nil || source == nil {
        c.enforcer.LogBlocked(ctx, url, "not_whitelisted")
        return nil, ErrNotWhitelisted
    }

    // ... existing fetch ...

    // 2. PII check after fetch
    piiMatches := c.enforcer.DetectPII(content)
    if hasSeverity(piiMatches, "block") {
        c.enforcer.LogBlocked(ctx, url, "pii_detected")
        return nil, ErrPIIDetected
    }

    return result, nil
}
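
The whitelist check itself is not spelled out in this plan. One plausible way to match a URL against the allowed domains is sketched below, assuming a whitelisted domain also covers its subdomains (so nibis.de would cover cuvo.nibis.de). That subdomain rule, the function name matchWhitelist, and the plain string slice standing in for the allowed_sources table are all assumptions.

```go
package main

import (
	"errors"
	"net/url"
	"strings"
)

// ErrNotWhitelisted is returned when no allowed domain covers the URL.
var ErrNotWhitelisted = errors.New("source not whitelisted")

// matchWhitelist returns the whitelisted domain that covers rawURL.
// A domain matches the exact host or any subdomain of it; a host that merely
// ends with the same letters (evilnibis.de vs. nibis.de) does not match.
func matchWhitelist(rawURL string, allowed []string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return "", ErrNotWhitelisted
	}
	for _, d := range allowed {
		d = strings.ToLower(d)
		if host == d || strings.HasSuffix(host, "."+d) {
			return d, nil
		}
	}
	return "", ErrNotWhitelisted
}
```

Comparing against "." + domain rather than the bare suffix is the important detail: it keeps look-alike hosts such as evilnibis.de or nibis.de.example.com off the whitelist.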

4. Frontend Implementation

4.1 Navigation Update

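In lib/navigation.ts, add the following entry under the compliance category:

{
  id: 'source-policy',
  name: 'Quellen-Policy', // "Source Policy"
  href: '/compliance/source-policy',
  description: 'Datenquellen & Compliance', // "Data sources & compliance"
  // "Whitelist of approved data sources with operations matrix and PII blocklist."
  purpose: 'Whitelist zugelassener Datenquellen mit Operations-Matrix und PII-Blocklist.',
  audience: ['DSB', 'Compliance Officer', 'Auditor'], // DSB = data protection officer
  // Art. 5 (lawfulness), Art. 6 (legal basis)
  gdprArticles: ['Art. 5 (Rechtmaessigkeit)', 'Art. 6 (Rechtsgrundlage)'],
}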

4.2 Page Structure

/app/(admin)/compliance/source-policy/
├── page.tsx              # Main dashboard with tabs
└── components/
    ├── SourcesTab.tsx    # Whitelist table with CRUD
    ├── OperationsMatrixTab.tsx  # 4x4 Matrix
    ├── PIIRulesTab.tsx   # PII rules with test function
    └── AuditTab.tsx      # Change history + export

4.3 UI-Layout

Stats cards (top):

  • Active policies
  • Approved sources
  • Blocked (today)
  • Compliance score

Tabs:

  1. Dashboard - overview with quick stats
  2. Sources - whitelist table (domain, name, license, status)
  3. Operations - matrix with Lookup/RAG/Training/Export
  4. PII Rules - blocklist with test function
  5. Audit - change history with PDF/JSON export

Pattern (from audit-report/page.tsx):

  • Tab navigation: bg-purple-600 text-white for active
  • Status badges: bg-green-100 text-green-700 for active
  • Tables: hover:bg-slate-50
  • Info boxes: bg-blue-50 border-blue-200

5. Affected Files

New files to create:

Backend (edu-search-service):

internal/policy/models.go
internal/policy/store.go
internal/policy/enforcer.go
internal/policy/audit.go
internal/policy/pii_detector.go
internal/api/handlers/policy_handlers.go
migrations/005_source_policies.sql
policies/bundeslaender.yaml

Frontend (admin-v2):

app/(admin)/compliance/source-policy/page.tsx
app/(admin)/compliance/source-policy/components/SourcesTab.tsx
app/(admin)/compliance/source-policy/components/OperationsMatrixTab.tsx
app/(admin)/compliance/source-policy/components/PIIRulesTab.tsx
app/(admin)/compliance/source-policy/components/AuditTab.tsx

Existing files to modify:

edu-search-service/cmd/server/main.go           # Register policy endpoints
edu-search-service/internal/crawler/crawler.go  # Add policy check
edu-search-service/internal/pipeline/pipeline.go  # PII filter
edu-search-service/internal/database/database.go  # Migrations
admin-v2/lib/navigation.ts                      # source-policy module

6. Implementation Order

Phase 1: Database & Models

  1. Create migration 005_source_policies.sql
  2. Go models in internal/policy/models.go
  3. Store layer in internal/policy/store.go
  4. YAML loader for initial data

Phase 2: Policy Enforcer

  1. internal/policy/enforcer.go - CheckSource, CheckOperation
  2. internal/policy/pii_detector.go - regex-based detection
  3. internal/policy/audit.go - logging
  4. Integration into the crawler

Phase 3: Admin API

  1. internal/api/handlers/policy_handlers.go
  2. Register routes in main.go
  3. Test the API

Phase 4: Frontend

  1. Main page with PagePurpose
  2. SourcesTab with whitelist CRUD
  3. OperationsMatrixTab
  4. PIIRulesTab with test function
  5. AuditTab with export

Phase 5: Testing & Deployment

  1. Unit tests for the enforcer
  2. Integration tests for the API
  3. E2E test for the frontend
  4. Deployment to the Mac Mini

7. Verification

After backend (Phases 1-3):

# Run the migration
ssh macmini "cd /path/to/edu-search-service && go run ./cmd/migrate"

# Test the API
curl -X GET http://macmini:8088/v1/admin/policies
curl -X POST http://macmini:8088/v1/admin/check-compliance \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://nibis.de/test","operation":"lookup"}'

After frontend (Phase 4):

# Build & deploy
rsync -avz admin-v2/ macmini:/path/to/admin-v2/
ssh macmini "docker compose build admin-v2 && docker compose up -d admin-v2"

# Test
open https://macmini:3002/compliance/source-policy

Auditor checklist:

  • All sources are documented in the whitelist
  • Operations matrix shows Training = PROHIBITED
  • PII rules are active and testable
  • Audit log shows all changes
  • Blocked-content log shows blocked URLs
  • PDF/JSON export works

8. KMK Specifics (§5 UrhG)

Legal basis:

  • KMK resolutions, agreements, and EPA are official works under §5 UrhG
  • Free to use, attribution required

Citation format:

Quelle: KMK, [title of the resolution], [date]
Example: Quelle: KMK, Bildungsstandards im Fach Deutsch, 2003
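
The citation format corresponds to the citation_template value "Quelle: KMK, {title}, {date}" in the initial data. A minimal sketch of filling such a template is shown below. The function name renderCitation is hypothetical, and it assumes {title} and {date} are the only placeholders.

```go
package main

import "strings"

// renderCitation fills the {title} and {date} placeholders of a citation
// template such as "Quelle: KMK, {title}, {date}".
func renderCitation(template, title, date string) string {
	return strings.NewReplacer("{title}", title, "{date}", date).Replace(template)
}
```

With the template from the initial data, renderCitation("Quelle: KMK, {title}, {date}", "Bildungsstandards im Fach Deutsch", "2003") produces the example line above.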

Approved document types:

  • Beschluesse (resolutions)
  • Vereinbarungen (agreements)
  • EPA (Einheitliche Pruefungsanforderungen, uniform examination requirements)
  • Empfehlungen (recommendations)

In the operations matrix:

Operation  Allowed  Note
Lookup     Yes      Show source
RAG        Yes      Citation in output
Training   NO       PROHIBITED
Export     Yes      Attribution
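
The matrix can be read as a lookup from operation to permission. The sketch below shows one way CheckOperation might resolve it. It assumes per-source overrides are layered over the defaults, that training stays prohibited regardless of any override, and that unknown operations are denied. The Permission type, the signature, and these rules are illustrative assumptions.

```go
package main

// Permission is the result of an operations-matrix lookup.
type Permission struct {
	Allowed          bool
	RequiresCitation bool
}

// defaultOperations mirrors default_operations from bundeslaender.yaml.
var defaultOperations = map[string]Permission{
	"lookup":   {Allowed: true, RequiresCitation: true},
	"rag":      {Allowed: true, RequiresCitation: true},
	"training": {Allowed: false},
	"export":   {Allowed: true, RequiresCitation: true},
}

// CheckOperation resolves the permission for op.
// Assumption: training is a hard prohibition that no override can lift.
func CheckOperation(op string, overrides map[string]Permission) Permission {
	if op == "training" {
		return Permission{}
	}
	if p, ok := overrides[op]; ok {
		return p
	}
	// Unknown operations fall through to the zero value, i.e. denied.
	return defaultOperations[op]
}
```

Checking the training case before consulting overrides means a misconfigured row in operation_permissions cannot accidentally enable training.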

9. Licenses

License       Name                             Attribution
DL-DE-BY-2.0  Datenlizenz Deutschland          Yes
CC-BY         Creative Commons Attribution     Yes
CC-BY-SA      CC Attribution-ShareAlike        Yes + ShareAlike
CC0           Public Domain                    No
§5 UrhG       Official works (Amtliche Werke)  Yes (source)

10. Current Status

Phase 1: Database & Models - COMPLETED

  • Codebase exploration of edu-search-service
  • Codebase exploration of admin-v2
  • Plan documented
  • Create migration 005_source_policies.sql
  • Implement Go models (internal/policy/models.go)
  • Implement store layer (internal/policy/store.go)
  • Implement policy enforcer (internal/policy/enforcer.go)
  • Implement PII detector (internal/policy/pii_detector.go)
  • Implement audit logging (internal/policy/audit.go)
  • Implement YAML loader (internal/policy/loader.go)
  • Create initial-data YAML (policies/bundeslaender.yaml)
  • Write unit tests (internal/policy/policy_test.go)
  • Update README

Phase 2: Admin API - PENDING

  • Implement API handlers (policy_handlers.go)
  • Update main.go
  • Test the API

Phase 3: Integration - PENDING

  • Crawler integration
  • Pipeline integration

Phase 4: Frontend - PENDING

  • Create frontend page.tsx
  • SourcesTab component
  • OperationsMatrixTab component
  • PIIRulesTab component
  • AuditTab component
  • Update navigation

Files created:

edu-search-service/
├── migrations/
│   └── 005_source_policies.sql     # DB schema (6 tables)
├── internal/policy/
│   ├── models.go                   # Data structures & enums
│   ├── store.go                    # PostgreSQL CRUD
│   ├── enforcer.go                 # Policy enforcement
│   ├── pii_detector.go             # PII detection
│   ├── audit.go                    # Audit logging
│   ├── loader.go                   # YAML loader
│   └── policy_test.go              # Unit tests
└── policies/
    └── bundeslaender.yaml          # Initial data (8 federal states)