fix: 5 false positives from etogruppe.com ground truth
Build + Deploy / build-tts (push) Successful in 1m38s
Build + Deploy / build-document-crawler (push) Successful in 41s
Build + Deploy / build-dsms-gateway (push) Successful in 26s
Build + Deploy / build-dsms-node (push) Successful in 12s
Build + Deploy / build-admin-compliance (push) Successful in 2m22s
Build + Deploy / build-backend-compliance (push) Successful in 3m21s
Build + Deploy / build-ai-sdk (push) Successful in 53s
Build + Deploy / build-developer-portal (push) Successful in 1m16s
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 20s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / nodejs-build (push) Successful in 3m18s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 59s
CI / test-python-backend (push) Successful in 47s
CI / test-python-document-crawler (push) Successful in 32s
CI / test-python-dsms-gateway (push) Successful in 27s
CI / validate-canonical-controls (push) Successful in 16s
Build + Deploy / trigger-orca (push) Successful in 3m23s
1. Soft hyphens (U+00AD, \xad) stripped before regex matching;
fixes "Datenübertragbarkeit" not matching
2. Art. 15/17/20: allow adjectives between "Recht auf" and keyword
("Recht auf unentgeltliche Auskunft" now matches)
3. DSB contact: regex spans up to 300 chars across newlines
(DSB section with company address between heading and email)
4. Löschkonzept: added "Fortfall", "Entfall", "Beendigung" as
deletion trigger words alongside "Ablauf"/"Wegfall"
Reduces etogruppe FPs from 5 to ~1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -72,9 +72,10 @@ ART13_CHECKLIST = [
         "label": "Kontaktdaten des DSB (E-Mail oder Telefon)",
         "level": 2, "parent": "dpo",
         "patterns": [
-            r"datenschutz(?:beauftragter?|beauftragte).*?[a-z0-9._%+\-]+@",
-            r"dsb.*?@|dpo.*?@",
+            r"datenschutz(?:beauftragter?|beauftragte)[\s\S]{0,300}[a-z0-9._%+\-]+@",
+            r"dsb[\s\S]{0,100}@|dpo[\s\S]{0,100}@",
             r"datenschutz@",
+            r"datenschutzbeauftragt[\s\S]{0,200}(?:e-?mail|telefon|fon)",
         ],
         "severity": "MEDIUM",
         "hint": "Art. 37(7) DSGVO verlangt Veroeffentlichung der Kontaktdaten des DSB. Mindestens eine E-Mail ist noetig — den Namen muessen Sie nicht nennen. Haeufiger Fehler: DSB wird erwaehnt, aber ohne jede Kontaktmoeglichkeit.",
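A minimal sketch of why the DSB pattern changed, assuming the checker applies each pattern with `re.search` on lowercased text and without `re.DOTALL` (the diff does not show the matching call). The sample text and the names `OLD`, `NEW`, and `sample` are made up for illustration.

```python
import re

# Old and new first pattern from the hunk above.
OLD = re.compile(r"datenschutz(?:beauftragter?|beauftragte).*?[a-z0-9._%+\-]+@")
NEW = re.compile(r"datenschutz(?:beauftragter?|beauftragte)[\s\S]{0,300}[a-z0-9._%+\-]+@")

# Hypothetical DSB section: heading, company address, then the email.
sample = (
    "Datenschutzbeauftragter\n"
    "Beispiel GmbH\n"
    "Musterstrasse 1, 12345 Musterstadt\n"
    "E-Mail: privacy@example.com\n"
).lower()

# Without DOTALL, "." stops at the first newline, so the old pattern
# never reaches the email on a later line.
old_hit = OLD.search(sample) is not None
# [\s\S] matches any character including newlines, capped at 300 chars.
new_hit = NEW.search(sample) is not None

print(old_hit, new_hit)
```

The `{0,300}` bound keeps the match local to the DSB section, so an unrelated email address far below the heading is less likely to count as a DSB contact.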
@@ -276,7 +277,9 @@ ART13_CHECKLIST = [
         "patterns": [
             r"l(?:oe|ö)schkonzept", r"l(?:oe|ö)schfrist",
             r"(?:regel|routinem(?:ae|ä)(?:ss|ß)ig).*l(?:oe|ö)sch",
-            r"nach\s+(?:ablauf|wegfall).*(?:gel(?:oe|ö)scht|l(?:oe|ö)sch)",
+            r"nach\s+(?:ablauf|wegfall|fortfall|entfall|beendigung).*(?:gel(?:oe|ö)scht|l(?:oe|ö)sch)",
+            r"(?:gel(?:oe|ö)scht|l(?:oe|ö)schung)\s+nach\s+(?:ablauf|wegfall|fortfall)",
+            r"zweck\s+(?:entf(?:ae|ä)llt|wegf(?:ae|ä)llt).*(?:gel(?:oe|ö)scht|l(?:oe|ö)sch)",
         ],
         "severity": "LOW",
         "hint": "Art. 5(1)(e) DSGVO (Speicherbegrenzung) erfordert ein Loeschkonzept. Beschreiben Sie den Prozess: automatische Loeschung nach Fristablauf, regelmaessige Pruefzyklen, oder Verweis auf DIN 66398 (Loeschkonzept). Reine Archivierung ohne Loeschfrist genuegt nicht.",
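The effect of the extra trigger words can be checked with a short sketch. The sentences are invented examples, and lowercasing before `re.search` is an assumption based on the `text_lower` variable in the checker.

```python
import re

# Old and new "nach <trigger> ... gelöscht" pattern from the hunk above.
OLD = re.compile(r"nach\s+(?:ablauf|wegfall).*(?:gel(?:oe|ö)scht|l(?:oe|ö)sch)")
NEW = re.compile(
    r"nach\s+(?:ablauf|wegfall|fortfall|entfall|beendigung)"
    r".*(?:gel(?:oe|ö)scht|l(?:oe|ö)sch)"
)

# Hypothetical privacy-policy sentences.
sentences = [
    "Die Daten werden nach Ablauf der Frist gelöscht.",
    "Die Daten werden nach Fortfall des Zwecks gelöscht.",
    "Die Daten werden nach Beendigung des Vertrags geloescht.",
]

# One (old_matches, new_matches) pair per sentence.
results = [
    (bool(OLD.search(s.lower())), bool(NEW.search(s.lower())))
    for s in sentences
]
print(results)
```

Only the first sentence matches the old pattern; all three match the new one.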
@@ -302,7 +305,7 @@ ART13_CHECKLIST = [
         "id": "rights_art15",
         "label": "Recht auf Auskunft (Art. 15)",
         "level": 2, "parent": "rights",
-        "patterns": [r"art\.\s*15", r"recht\s+auf\s+auskunft", r"right\s+(?:of|to)\s+access"],
+        "patterns": [r"art\.\s*15", r"recht\s+auf\s+(?:\w+\s+)?auskunft", r"auskunft\s+(?:ueber|über)\s+(?:herkunft|ihre)", r"right\s+(?:of|to)\s+access"],
         "severity": "LOW",
         "hint": "Art. 15 DSGVO: Betroffene koennen kostenlos Auskunft und eine Kopie aller Daten verlangen. Antwortfrist: 1 Monat (Art. 12(3)). Haeufiger Fehler: Kein Hinweis auf Kostenfreiheit oder den konkreten Anfrageweg (E-Mail-Adresse).",
     },
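A small sketch of what the optional group `(?:\w+\s+)?` does, assuming lowercased input as elsewhere in the checker. The phrases are illustrative. The group allows exactly one intervening word, so a two-word insertion still does not match this particular pattern.

```python
import re

OLD = re.compile(r"recht\s+auf\s+auskunft")
NEW = re.compile(r"recht\s+auf\s+(?:\w+\s+)?auskunft")

phrases = [
    "recht auf auskunft",                                 # no adjective
    "recht auf unentgeltliche auskunft",                  # one adjective
    "recht auf unentgeltliche und vollständige auskunft", # more than one word
]

old_hits = [bool(OLD.search(p)) for p in phrases]
new_hits = [bool(NEW.search(p)) for p in phrases]
print(old_hits, new_hits)
```

Keeping the gap to a single word is a conservative choice: it covers the "unentgeltliche Auskunft" case from the commit message without letting "Recht auf" pair up with a keyword that appears much later in the sentence.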
@@ -318,7 +321,7 @@ ART13_CHECKLIST = [
         "id": "rights_art17",
         "label": "Recht auf Loeschung (Art. 17)",
         "level": 2, "parent": "rights",
-        "patterns": [r"art\.\s*17", r"recht\s+auf\s+l(?:oe|ö)schung", r"right\s+to\s+erasure"],
+        "patterns": [r"art\.\s*17", r"recht\s+auf\s+(?:\w+\s+)?l(?:oe|ö)schung", r"(?:berichtigung|korrektur)\s+oder\s+l(?:oe|ö)schung", r"right\s+to\s+erasure"],
         "severity": "LOW",
         "hint": "Art. 17 DSGVO ('Recht auf Vergessenwerden'): Loeschung ist Pflicht, wenn Zweck entfaellt, Einwilligung widerrufen wird oder Daten unrechtmaessig verarbeitet wurden. Erwaehnen Sie auch die Ausnahmen (z.B. gesetzliche Aufbewahrungspflichten §257 HGB, §147 AO).",
     },
@@ -334,7 +337,7 @@ ART13_CHECKLIST = [
         "id": "rights_art20",
         "label": "Recht auf Datenportabilitaet (Art. 20)",
         "level": 2, "parent": "rights",
-        "patterns": [r"art\.\s*20", r"daten(?:ue|ü)bertragbarkeit|datenportabilit", r"right\s+to\s+data\s+portability"],
+        "patterns": [r"art\.\s*20", r"daten(?:ue|ü)bertrag(?:ung|barkeit)|datenportabilit", r"maschinenlesbar\w*\s+format", r"right\s+to\s+data\s+portability"],
         "severity": "LOW",
         "hint": "Art. 20 DSGVO: Gilt nur bei Verarbeitung auf Basis von Einwilligung (Art. 6(1)(a)) oder Vertrag (Art. 6(1)(b)) UND automatisierter Verarbeitung. Format: strukturiert, gaengig, maschinenlesbar (z.B. JSON, CSV). Nicht anwendbar bei Art. 6(1)(f).",
     },
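To illustrate the widened Art. 20 patterns, here is a sketch that treats a checklist item as satisfied when any of its patterns matches. That any-match rule and the helper `matches_any` are assumptions for illustration, since the diff does not show how the patterns are evaluated. The sample sentence is invented.

```python
import re

OLD_PATTERNS = [
    r"art\.\s*20",
    r"daten(?:ue|ü)bertragbarkeit|datenportabilit",
    r"right\s+to\s+data\s+portability",
]
NEW_PATTERNS = [
    r"art\.\s*20",
    r"daten(?:ue|ü)bertrag(?:ung|barkeit)|datenportabilit",
    r"maschinenlesbar\w*\s+format",
    r"right\s+to\s+data\s+portability",
]

def matches_any(patterns, text):
    """Hypothetical helper: True if any pattern is found in the lowercased text."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in patterns)

# A policy that describes portability without naming it.
text = (
    "Sie können verlangen, Ihre Daten in einem gängigen, "
    "maschinenlesbaren Format zu erhalten."
)

old_result = matches_any(OLD_PATTERNS, text)
new_result = matches_any(NEW_PATTERNS, text)
print(old_result, new_result)
```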
@@ -71,7 +71,10 @@ def check_document_completeness(
     Returns a list of findings (summary + missing items).
     """
     findings = []
-    text_lower = text.lower()
+    # Strip soft hyphens (­ / \xad) that CMS tools insert for word-breaking
+    # — they break regex matches on compound words like "Datenübertragbarkeit"
+    text_clean = text.replace("\xad", "").replace("­", "")
+    text_lower = text_clean.lower()

     if not text or len(text) < 50:
         findings.append({
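The soft-hyphen problem is easy to reproduce. The sketch below uses a hypothetical helper `strip_soft_hyphens` and an invented input string. It relies only on the fact that U+00AD (`"\xad"` in Python) is an ordinary character as far as `re` is concerned, so a compound word containing it no longer matches a literal pattern.

```python
import re

# Art. 20 pattern from the checklist.
PATTERN = re.compile(r"daten(?:ue|ü)bertrag(?:ung|barkeit)|datenportabilit")

def strip_soft_hyphens(text: str) -> str:
    """Hypothetical helper: remove U+00AD soft hyphens before matching."""
    return text.replace("\u00ad", "")

# Text as a CMS might emit it, with invisible break hints inside the word.
raw = "Recht auf Daten\u00adübertrag\u00adbarkeit"

before = PATTERN.search(raw.lower()) is not None
after = PATTERN.search(strip_soft_hyphens(raw).lower()) is not None
print(before, after)
```

Stripping once on the whole document, before lowercasing and matching, fixes every pattern at the same time instead of adding optional soft-hyphen groups to each regex.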