fix: Restrict sub-section detection to genuinely separate document types

Only Cookie and Widerruf sections are checked as separate documents.
Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are
part of the parent DSI and no longer generate false findings.

Added PLAN-rag-document-check.md for Phase 2:
- RAG-based checks with document-type-specific Controls
- DSFA checklist (Art. 35 + Landes-Listen)
- AVV checklist (Art. 28)
- Reference detection (sub-doc → parent doc)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-06 11:02:36 +02:00
parent 0416bb5d04
commit 13c5880f51
2 changed files with 95 additions and 10 deletions
@@ -249,17 +249,17 @@ def _run_checklist(text: str, doc_type: str, label: str, url: str, word_count: i
 # Section heading patterns → document type mapping
+# ONLY sections that are genuinely separate document types with their own checklists.
+# Everything else (Social Media, Betroffenenrechte, Dienste von Drittanbietern)
+# is part of the parent DSI and inherits its checks.
 SECTION_TYPE_MAP = [
-    (r"cookie", "cookie"),
-    (r"dienste?\s+von\s+drittanbieter", "dse"),
-    (r"social\s+media", "dse"),
-    (r"datensicherheit", "dse"),
-    (r"betroffenenrecht", "dse"),
-    (r"widerrufsrecht|widerruf", "widerruf"),
-    (r"impressum", "impressum"),
-    (r"nutzungsbedingung|agb|geschaeftsbedingung", "agb"),
-    (r"datenschutz(?:folge|risiko).*(?:analyse|abschaetzung)|dsfa", "dse"),
-    (r"datenschutzerkl(?:ae|ä)rung.*social", "dse"),
+    (r"^cookie", "cookie"),  # Cookie-Richtlinie → §25 TDDDG
+    (r"widerrufsrecht|widerrufsbelehrung", "widerruf"),  # Widerruf → §355 BGB
+    (r"^impressum$", "impressum"),  # Impressum → §5 TMG
+    (r"^(?:agb|allgemeine geschäftsbedingungen|nutzungsbedingungen)$", "agb"),
+    # NOTE: Social Media, DSFA, Datensicherheit, Betroffenenrechte are NOT
+    # separate documents — they are sections within the parent DSI.
+    # DSFA needs its own checklist (RAG-based) — Phase 2.
 ]
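Reviewer note: the hunk does not show how `SECTION_TYPE_MAP` is consumed, so the following is only a minimal sketch of the effect of the tightened patterns. It assumes headings are stripped and lowercased before matching and that patterns are applied with `re.search`. The function name `detect_section_type` is hypothetical and does not appear in this commit.

```python
import re
from typing import Optional

# New patterns copied from the hunk above.
SECTION_TYPE_MAP = [
    (r"^cookie", "cookie"),
    (r"widerrufsrecht|widerrufsbelehrung", "widerruf"),
    (r"^impressum$", "impressum"),
    (r"^(?:agb|allgemeine geschäftsbedingungen|nutzungsbedingungen)$", "agb"),
]


def detect_section_type(heading: str) -> Optional[str]:
    """Return the document type for a section heading (hypothetical helper).

    None means no pattern matched, i.e. the section would be treated as
    part of the parent DSI instead of as a separate document.
    """
    # Assumption: headings are normalized like this before matching.
    normalized = heading.strip().lower()
    for pattern, doc_type in SECTION_TYPE_MAP:
        if re.search(pattern, normalized):
            return doc_type
    return None


if __name__ == "__main__":
    for heading in (
        "Cookie-Richtlinie",
        "Widerrufsbelehrung",
        "Social Media",
        "Widerruf Ihrer Einwilligung",
    ):
        print(f"{heading!r}: {detect_section_type(heading)}")
```

Under these assumptions, "Social Media" and "Betroffenenrechte" no longer map to any type. A heading such as "Widerruf Ihrer Einwilligung", which the old `widerruf` alternative would have matched, now also stays with the parent DSI.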