fix: 7 regex bugs from IHK Konstanz ground truth analysis
Build + Deploy / build-admin-compliance (push) Successful in 9s
Build + Deploy / build-backend-compliance (push) Successful in 8s
Build + Deploy / build-ai-sdk (push) Successful in 42s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 2m57s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 49s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Successful in 28s
CI / test-python-dsms-gateway (push) Successful in 23s
CI / validate-canonical-controls (push) Successful in 15s
Build + Deploy / trigger-orca (push) Successful in 2m24s

Fixes based on manual verification of all 30 failed checks:
1. Cookie table: recognize "folgende cookies" + column headers as text
2. Cookie names: add JSESSIONID, cookieinfo, et_id, BT_* patterns
3. Essential justified: match "sitzung zuordnen", "betrieb der website"
4. Social bookmarks: recognize as 2-click alternative
5. DSFA plural: "kanaelen" now matches alongside "kanal"
6. Section splitter: skip-headings no longer lose subsequent text
   (Risikoabwaegung section was cut from DSFA, losing risk scores)
7. Cookie legal basis: accept Art. 6(1)(f) in cookie context

Reduces false positives from 7 to ~1-2 for IHK Konstanz test case.
Ground truth table: zeroclaw/docs/ground-truth-ihk-konstanz.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-05-07 14:51:09 +02:00
parent f59f810638
commit fa4fd87102
5 changed files with 207 additions and 8 deletions
@@ -266,8 +266,11 @@ JOINT_CONTROLLER_CHECKLIST = [
r"(?:zwei|2)[\-\s]?klick",
r"(?:shariff|share[\-\s]?buttons?\s+ohne\s+tracking)",
r"(?:erst|nur)\s+(?:bei|nach|durch)\s+(?:klick|aktivierung).*(?:daten|verbindung)",
r"social\s*bookmark",
r"(?:kein|keine)\s+(?:social[\-\s]?media[\-\s]?)?plugin",
r"(?:link|verweis|grafik).*(?:weitergeleitet|weiterleitung)",
],
"severity": "LOW",
"hint": "Die verwendete technische Loesung (z.B. 2-Klick-Loesung oder Shariff) wird nicht beschrieben. Erlaeutern Sie, welche datenschutzfreundliche Technik eingesetzt wird, um den sofortigen Datentransfer an Plattformen zu verhindern.",
"hint": "Erlaeutern Sie die eingesetzte datenschutzfreundliche Technik: z.B. Social Bookmarks (reine Links ohne automatische Datenuebertragung), 2-Klick-Loesung oder Shariff-Buttons.",
},
]