1b5c6bd340
Tested BMW, Stadt Koeln, BfDI, Sparkasse, Caritas, TUEV Sued, Spiegel, ETO Gruppe, EUIPO.

Key findings:
- Stadt Koeln + ETO Gruppe best (95% correctness)
- BMW, Sparkasse, Spiegel genuinely deficient (verified)
- EUIPO uses EU Regulation 2018/1725, not GDPR; needs separate checklist
- ~0-2 false positives per website after LLM verification

7 regex fixes emerged from batch testing (soft hyphens, word insertions, numbered headings, German section names, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Batch Test Results (2026-05-08)
9 websites tested
| # | Website | Type | L1 | L2 | Completeness | Correctness | Words | Rating |
|---|---|---|---|---|---|---|---|---|
| 1 | Stadt Koeln | Municipality | 9/9 | 21/22 | 100% | 95% | 5910 | Exemplary |
| 2 | Caritas | Nonprofit | 9/9 | 19/22 | 100% | 86% | 9447 | Good |
| 3 | ETO Gruppe | Mid-sized company | 9/9 | 21/22 | 100% | 95% | 7312 | Exemplary |
| 4 | BfDI | Federal authority | 9/9 | 16/22 | 100% | 73% | 2014 | OK (short) |
| 5 | TUEV Sued | Testing org. | 8/9 | 15/21 | 89% | 71% | 9467 | Gaps |
| 6 | IHK Konstanz | Chamber | 9/9 | 18/22 | 100% | 82% | 6353 | Good |
| 7 | BMW | Corporation | 8/9 | 10/21 | 89% | 48% | 7207 | Deficient |
| 8 | Sparkasse | Finance | 7/9 | 10/20 | 78% | 50% | 12183 | Deficient |
| 9 | Spiegel | Media | 6/9 | 10/13 | 67% | 77% | 13698 | Deficient |
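The two percentage columns can be reproduced from the L1 and L2 score columns. The sketch below assumes that completeness is the rounded L1 ratio and correctness is the rounded L2 ratio; the report does not state this, but it matches every row of the table. The helper name `pct` is hypothetical.

```python
def pct(score: str) -> int:
    """Convert a 'passed/total' score such as '21/22' into a rounded percentage."""
    passed, total = (int(part) for part in score.split("/"))
    return round(100 * passed / total)


# (website, L1 score, L2 score) copied from three rows of the table.
rows = [
    ("Stadt Koeln", "9/9", "21/22"),
    ("BMW", "8/9", "10/21"),
    ("Spiegel", "6/9", "10/13"),
]

for website, l1, l2 in rows:
    # Assumption: completeness derives from L1, correctness from L2.
    print(f"{website}: completeness {pct(l1)}%, correctness {pct(l2)}%")
```

Note that the L2 denominator varies between rows (13 to 22), so correctness values are not measured against the same number of checks for every website.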
Special cases
- EUIPO (EU agency): 6/9 L1, 5/13 L2. It is subject to Regulation (EU) 2018/1725, not the GDPR, so it needs a separate checklist.
- dm, Zalando, HWK: text extraction fails (JS-heavy SPAs, consent wall blocks access)
Verified true positives
BMW, Sparkasse and Spiegel really do have incomplete privacy policies; verified against the original texts:
- BMW: no email address, no Art. 77 right to lodge a complaint, no article references for data subject rights
- Sparkasse: no DPO (data protection officer), no Art. 77
- Spiegel: no DPO, no Art. 77, no data subject rights
False positive rate
Across all 9 websites: ~0-2 FPs per website after LLM verification. Main cause of the remaining FPs: unusual phrasings that neither the regexes nor the LLM recognize.
Regex fixes that emerged from the batch test
- Soft-hyphen stripping (\xad): etogruppe
- "Recht auf [adjective] Auskunft" (right of access): word insertion
- "nach Fortfall" alongside "nach Ablauf": deletion policy
- DPO contact across line breaks: [\s\S]{0,300}
- Numbered headings ("5. Soziale Medien"): isdigit()
- Section splitter only at classified headings
- "Soziale Medien/Netzwerke" as social media heading
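Several of the fixes listed above can be illustrated with a minimal Python sketch. Only `\xad`, `[\s\S]{0,300}` and `isdigit()` come from the report; the actual patterns in the repository are not shown, so every pattern and name below (`normalize`, `RIGHT_OF_ACCESS`, `RETENTION`, `DPO_CONTACT`, `strip_heading_number`, `SOCIAL_HEADING`) is a hypothetical reconstruction.

```python
import re

SOFT_HYPHEN = "\xad"  # U+00AD, invisible hyphenation hint that splits words


def normalize(text: str) -> str:
    """Strip soft hyphens so that hyphenated words match the regexes again."""
    return text.replace(SOFT_HYPHEN, "")


# "Recht auf Auskunft", tolerating one inserted word such as an adjective.
RIGHT_OF_ACCESS = re.compile(r"Recht\s+auf\s+(?:\w+\s+)?Auskunft", re.IGNORECASE)

# Retention wording: accept "nach Fortfall" as well as "nach Ablauf".
RETENTION = re.compile(r"nach\s+(?:Ablauf|Fortfall)", re.IGNORECASE)

# DPO keyword followed within 300 characters by an email address.
# [\s\S] also matches newlines, which "." does not do by default.
DPO_CONTACT = re.compile(
    r"Datenschutzbeauftragte[\s\S]{0,300}?[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    re.IGNORECASE,
)

# Both German variants count as a social media heading.
SOCIAL_HEADING = re.compile(r"^Soziale\s+(?:Medien|Netzwerke)\b", re.IGNORECASE)


def strip_heading_number(heading: str) -> str:
    """Turn '5. Soziale Medien' into 'Soziale Medien'; leave other headings alone."""
    head, sep, rest = heading.partition(".")
    if sep and head.strip().isdigit():
        return rest.strip()
    return heading.strip()


sample = "Datenschutz\xadbeauftragter\nMax Mustermann\nE-Mail: dsb@example.org"
print(bool(DPO_CONTACT.search(normalize(sample))))
print(strip_heading_number("5. Soziale Medien"))
```

Normalizing the text once before running any pattern keeps the individual regexes simple; the alternative of allowing `\xad` inside every pattern would be harder to maintain.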