docs: instruction for hardcoded knowledge → Control Library migration
6 files with hardcoded legal knowledge identified. Review deadline: 2026-07-01. legal_basis_validator.py now logs a warning on every use. Adds an instruction file so that another session can execute the migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -2,7 +2,23 @@
 Legal Basis Validator - checks if the correct DSGVO legal basis (lit. a-f)
 is used for each processing purpose in the privacy policy.
 
-Common mistakes:
+⚠️ TECHNISCHE SCHULD / HARDCODED KNOWLEDGE:
+This module contains hardcoded legal-basis mappings (CORRECT_BASIS dict).
+This is a TEMPORARY fallback until the Control Library has corresponding controls.
+
+MEDIUM-TERM GOAL: replace this dict with RAG/Control Library queries.
+New controls should be generated in the pipeline, e.g.:
+  "Cookie tracking requires Art. 6(1)(a) consent (CJEU C-673/17 Planet49)"
+  → canonical_controls with scope_conditions + legal_ref
+
+UNTIL THEN: this dict serves as a fallback and logs a warning whenever
+it is consulted. Every new law or ruling requires updating BOTH the pipeline
+and this dict. The better option is to remove the dict and
+use controls only.
+
+Created: 2026-04-29 | Review-Datum: 2026-07-01 | Owner: Agent-Team
+
+Common mistakes detected:
 - Cookie tracking on lit. f (legitimate interest) instead of lit. a (consent)
 - Marketing emails on lit. f instead of lit. a
 - Analytics on lit. b (contract) - incorrect overextension
@@ -85,7 +101,15 @@ CORRECT_BASIS: dict[str, dict] = {
 
 
 def validate_legal_bases(dse_text: str) -> list[LitFinding]:
-    """Check if correct legal bases are used in the privacy policy."""
+    """Check if correct legal bases are used in the privacy policy.
+
+    ⚠️ Uses HARDCODED CORRECT_BASIS dict as fallback.
+    TODO: Replace with RAG/Control Library query when lit-mapping Controls exist.
+    """
+    logger.warning(
+        "legal_basis_validator: Using HARDCODED rules (CORRECT_BASIS dict). "
+        "This should be replaced with Control Library queries. Review date: 2026-07-01"
+    )
     findings = []
     text_lower = dse_text.lower()
 
@@ -0,0 +1,101 @@
# Instruction: Hardcoded Knowledge → Control Library Migration

**Branch:** `feat/zeroclaw-compliance-agent`
**Repo:** `/Users/benjaminadmin/Projekte/breakpilot-compliance/`
**Created:** 2026-04-29
**Review deadline:** 2026-07-01

## Problem

The Compliance Agent has 6 files with legal knowledge hardcoded in Python dicts.
This knowledge goes stale because the pipeline does not update it. In the long term,
everything must come from the Control Library (166k+ controls in `compliance.canonical_controls`).

## Inventory (all in the backend: `backend-compliance/compliance/services/`)

| File | Hardcoded content | Lines | Priority |
|------|-------------------|-------|----------|
| `legal_basis_validator.py` | `CORRECT_BASIS` dict: 7 lit. mappings (Art. 6 lit. a-f per purpose) | ~50 LOC | HIGH |
| `service_registry.py` | `SERVICE_REGISTRY` dict: 82 services with legal refs | ~500 LOC | MEDIUM |
| `mandatory_content_checker.py` | `MANDATORY_DSE_CONTENT` (9 fields) + `MANDATORY_IMPRESSUM_CONTENT` (5 fields) | ~80 LOC | MEDIUM |
| `relevance_filter.py` | `CONTROL_RELEVANCE` dict: 7 controls with keyword lists | ~60 LOC | MEDIUM |
| `consent-tester/services/script_analyzer.py` | `SERVICE_PATTERNS`: 19 services (duplicate of the registry) | ~70 LOC | LOW |
| `consent-tester/services/banner_detector.py` | `CMP_SELECTORS`: 10 CMPs | - | PERMANENT (technical) |

## Migration path per file

### Step 1: Generate controls in the pipeline

Have the pipeline generate a corresponding control for every dict entry.

Example for `legal_basis_validator.py`:

```sql
-- New control in canonical_controls:
INSERT INTO compliance.canonical_controls (title, objective, requirements, scope_conditions, tags)
VALUES (
    'Cookie tracking requires consent (lit. a)',
    'Cookie tracking and web analytics are permitted only with explicit consent (Art. 6(1)(a) GDPR).',
    'The legal basis for cookie tracking must be Art. 6(1)(a). Art. 6(1)(f) (legitimate interest) is not permissible under CJEU C-673/17 (Planet49).',
    '{"applies_when": "cookie_tracking OR web_analytics"}',
    ARRAY['legal_basis', 'lit_mapping', 'cookie', 'planet49']
);
```

Required controls (from `CORRECT_BASIS` in `legal_basis_validator.py`):
1. `cookie_tracking` → lit. a (Planet49)
2. `web_analytics` → lit. a (DSK guidance "Orientierungshilfe", §25 TDDDG)
3. `marketing_email` → lit. a (Art. 7 GDPR, §7 UWG)
4. `remarketing` → lit. a (§25 TDDDG)
5. `credit_check` → lit. b/f + mandatory Art. 22 notice
6. `social_media_embed` → lit. a (Fashion ID ruling)
7. `session_recording` → lit. a (§25 TDDDG)
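
The mappings in the list above could be turned into control rows mechanically. What follows is a minimal sketch, not the pipeline's actual generator: `LitMapping`, `MAPPINGS`, `INSERT_SQL` and `to_control_params` are hypothetical names, the generated title and text wording is illustrative, and the `%s` placeholders assume a psycopg-style database driver. Only the column names come from the SQL example in Step 1, and only three of the seven mappings are copied in.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class LitMapping:
    """Hypothetical record for one purpose-to-legal-basis rule."""
    purpose: str  # key as used in the list above, e.g. "cookie_tracking"
    basis: str    # e.g. "lit. a"
    source: str   # ruling or statute named in the list above


# Three entries copied from the list above; the others follow the same shape.
MAPPINGS = [
    LitMapping("cookie_tracking", "lit. a", "Planet49"),
    LitMapping("remarketing", "lit. a", "§25 TDDDG"),
    LitMapping("session_recording", "lit. a", "§25 TDDDG"),
]

# Parameterized form of the INSERT from Step 1 (placeholder style is an assumption).
INSERT_SQL = (
    "INSERT INTO compliance.canonical_controls "
    "(title, objective, requirements, scope_conditions, tags) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def to_control_params(m: LitMapping) -> tuple:
    """Build the parameter tuple for INSERT_SQL from one mapping."""
    title = f"{m.purpose} requires Art. 6(1) {m.basis}"
    objective = f"Processing for purpose '{m.purpose}' must rely on Art. 6(1) {m.basis} GDPR."
    requirements = f"Legal basis for {m.purpose} must be {m.basis} ({m.source})."
    scope = json.dumps({"applies_when": m.purpose})
    # The purpose key is included as a tag so that a lookup by purpose can match.
    tags = ["legal_basis", "lit_mapping", m.purpose]
    return (title, objective, requirements, scope, tags)


params = [to_control_params(m) for m in MAPPINGS]
```

Each tuple in `params` could then be passed to a cursor's `execute` together with `INSERT_SQL`. Whether the pipeline accepts direct inserts at all is not stated here, so treat this purely as an illustration of the data shape.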

Required controls (from `mandatory_content_checker.py`):
1. The privacy policy (DSE) must name the controller (Art. 13(1)(a))
2. The privacy policy must name the DPO contact (Art. 13(1)(b))
3. The privacy policy must name the purposes (Art. 13(1)(c))
4. The privacy policy must name the legal bases (Art. 13(1)(c))
5. The privacy policy must name the retention period (Art. 13(2)(a))
6. The privacy policy must name the data subject rights (Art. 13(2)(b-d))
7. The privacy policy must name the right to lodge a complaint (Art. 13(2)(d))
8. The privacy policy must name third-country transfers (Art. 13(1)(f))
9. The privacy policy must name automated decision-making (Art. 13(2)(f))

### Step 2: Change the agent code (Control Library first, dict as fallback)

```python
import logging

logger = logging.getLogger(__name__)

# BEFORE (hardcoded):
CORRECT_BASIS = {"cookie_tracking": {"correct": "lit. a"}}  # further keys and purposes elided

# AFTER (Control Library first):
async def get_legal_basis_rule(purpose: str) -> dict | None:
    # query_controls is the Control Library lookup; it is not defined in this document.
    controls = await query_controls(tags=["lit_mapping", purpose])
    if controls:
        return controls[0]  # the Control Library takes precedence
    logger.warning("No control found for %s, using hardcoded fallback", purpose)
    return CORRECT_BASIS.get(purpose)  # fallback
```

### Step 3: Remove the dicts

Once all controls are in the library and the agent finds them reliably:
- Delete the dicts
- Remove the warning logs
- Update the tests
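
Before deleting a dict, it may help to check mechanically that every fallback key is covered by a control. The following is a minimal sketch under stated assumptions: `purposes_without_control` is a hypothetical helper, and `library_tags` is a hand-written snapshot of tag sets rather than a real query result.

```python
def purposes_without_control(fallback: dict, library_tags: list[set]) -> list[str]:
    """Return fallback keys that no control covers via 'lit_mapping' plus the purpose tag."""
    missing = []
    for purpose in fallback:
        needed = {"lit_mapping", purpose}
        # A control covers the purpose only if it carries both tags.
        if not any(needed <= tags for tags in library_tags):
            missing.append(purpose)
    return sorted(missing)


# Hypothetical snapshot: tag sets of the controls currently in the library.
library_tags = [
    {"legal_basis", "lit_mapping", "cookie", "cookie_tracking"},
    {"lit_mapping", "remarketing"},
]
# Fallback dict reduced to its keys; the values do not matter for this check.
fallback = {"cookie_tracking": {}, "remarketing": {}, "credit_check": {}}

still_missing = purposes_without_control(fallback, library_tags)
safe_to_delete = not still_missing
```

One detail to watch: the SQL example in Step 1 tags its control with `cookie` and `planet49`, while the Step 2 lookup queries by the purpose key (for example `cookie_tracking`). A check like this one only passes if the purpose key is also stored as a tag, or if the lookup is adapted accordingly.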

## Which file is NOT migrated

`banner_detector.py`: the CMP selectors are technical CSS patterns, not
legal knowledge. They stay hardcoded and are updated when CMPs change their UI.

## Identifying markers in the code

All affected files have:
- `⚠️ TECHNISCHE SCHULD` ("technical debt") in the docstring
- `Review-Datum: 2026-07-01` ("review date") in the header
- `logger.warning("... HARDCODED rules ...")` on use
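
Because the markers are plain strings, the affected files can be located with a simple scan. This is a minimal sketch, assuming the marker strings appear verbatim in the source files: `MARKERS` and `find_markers` are hypothetical names, and the temporary directory only stands in for the real service directory.

```python
import tempfile
from pathlib import Path

# Marker strings as listed above.
MARKERS = ("TECHNISCHE SCHULD", "Review-Datum: 2026-07-01", "HARDCODED rules")


def find_markers(root: Path) -> dict[str, list[str]]:
    """Map each .py file under root to the markers found in it."""
    hits: dict[str, list[str]] = {}
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = [m for m in MARKERS if m in text]
        if found:
            hits[str(path.relative_to(root))] = found
    return hits


# Demo on a throwaway directory with one flagged and one clean file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "flagged.py").write_text('"""⚠️ TECHNISCHE SCHULD"""\n', encoding="utf-8")
    (root / "clean.py").write_text("x = 1\n", encoding="utf-8")
    result = find_markers(root)
```

Pointing `find_markers` at the service directory from the inventory would list the files that still carry the debt markers, which could double as a check once the review date has passed.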

## Memory file

Details: `/Users/benjaminadmin/.claude/projects/-Users-benjaminadmin-Projekte-breakpilot-lehrer/memory/hardcoded_knowledge_debt.md`