Benjamin Admin
8d6959e8b2
fix: Expand Art. 13 patterns for generic matching across all websites
...
Complaint (Art. 13(2)(d)):
+ 'recht auf beschwerde', 'art. 77', 'beschwerde...wenden/einlegen',
'zuständige behörde' — IHK uses 'Recht auf Beschwerde gem. Art. 77'
Legal basis (Art. 13(1)(c)):
+ 'gemäß Art.', '§ X IHKG/BDSG/LDSG/BBiG/TDDDG', 'einwilligung gem',
'verarbeitung auf grundlage' — catches statutory references
Third country (Art. 13(1)(f)):
+ 'Übermittlung ausserhalb', 'EWR/EEA', 'Data Privacy Framework'
Retention (Art. 13(2)(a)):
+ 'Dauer der Speicherung', 'Aufbewahrungsdauer/-pflicht/-zeit',
'gesetzliche Aufbewahrung' — common German DSE headings
All patterns are generic, not IHK-specific.
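
A minimal sketch of how the added phrases could be expressed as regular expressions, one list per Art. 13 field. The table name `ART13_PATTERNS`, the field keys, the helper `matched_fields`, and the exact regexes are assumptions for illustration, not the contents of the real checker:

```python
import re

# Hypothetical pattern table built from the phrases in the commit message.
# Matching runs against lower-cased text, so all patterns are lower-case.
ART13_PATTERNS = {
    "complaint": [
        r"recht auf beschwerde",
        r"art\.\s*77",
        r"beschwerde\b.{0,80}?\b(?:wenden|einlegen)",
        r"zuständige[nr]? behörde",
    ],
    "legal_basis": [
        r"gemäß art\.",
        r"§\s*\d+\w*\s+(?:ihkg|bdsg|ldsg|bbig|tdddg)",
        r"einwilligung gem",
        r"verarbeitung auf grundlage",
    ],
    "third_country": [
        r"übermittlung ausserhalb",
        r"\b(?:ewr|eea)\b",
        r"data privacy framework",
    ],
    "retention": [
        r"dauer der speicherung",
        r"aufbewahrungs(?:dauer|pflicht|zeit)",
        r"gesetzliche[nr]? aufbewahrung",
    ],
}


def matched_fields(text: str) -> set[str]:
    """Return the Art. 13 fields for which at least one pattern matches."""
    lowered = text.lower()
    return {
        field
        for field, patterns in ART13_PATTERNS.items()
        if any(re.search(p, lowered) for p in patterns)
    }
```

Because none of the patterns mention a site name, the same table applies to any German privacy notice, which is the point of the "generic, not IHK-specific" remark.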
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:45:02 +02:00
Benjamin Admin
e3ae35891f
fix: 0% completeness bug — SCORE finding was not generated at 100%
...
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created because of the guard 'if pct < 100'. The backend reads the
percentage from the SCORE finding and fell back to completeness=0 when
no such finding existed.
Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.
This was the cause of 8 documents showing '0% MANGELHAFT' despite
containing all required information.
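
The interaction between the two sides can be sketched as follows. The function names, the finding dictionary shape, and the non-OK severity label "WARNING" are assumptions for illustration; only the 'OK' severity and the 'if pct < 100' guard come from the commit message:

```python
def score_finding(passed: int, total: int) -> dict:
    """Build the SCORE finding for a document (hypothetical shape)."""
    pct = round(100 * passed / total) if total else 0
    # Before the fix, a guard like `if pct < 100:` wrapped this return,
    # so fully compliant documents produced no SCORE finding at all.
    severity = "OK" if pct == 100 else "WARNING"  # "WARNING" is a placeholder
    return {"type": "SCORE", "severity": severity, "pct": pct}


def completeness_from(findings: list[dict]) -> int:
    """Backend side: read the percentage from the SCORE finding."""
    for finding in findings:
        if finding.get("type") == "SCORE":
            return finding["pct"]
    # Fallback that showed up as "0%" when the SCORE finding was missing.
    return 0
```

With the finding always present, `completeness_from([score_finding(9, 9)])` yields 100 instead of hitting the fallback.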
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:34:04 +02:00
Benjamin Admin
6c5e086356
fix: DSI dedup — skip anchor links, filter noise, merge duplicates + fix false positives
...
Dedup fixes:
- Anchor links (#cookies, #betroffenenrechte) on same page are skipped entirely
- Noise titles filtered: 'drucken', 'nach oben', 'Datenschutz' (too generic)
- Documents with fewer than 50 words filtered (navigation snippets)
- Documents with identical word_count merged (same page, different title)
- URL-only titles filtered
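
The dedup rules above could be combined into a single pass along these lines. This is a sketch under assumptions: the document shape (`url`, `title`, `word_count`), the function name `dedup_documents`, and the URL-title heuristic are hypothetical, and "merged" is approximated by keeping the first document per word count:

```python
from urllib.parse import urldefrag

NOISE_TITLES = {"drucken", "nach oben", "datenschutz"}
MIN_WORDS = 50


def dedup_documents(docs: list[dict], page_url: str) -> list[dict]:
    """Filter discovered documents; each doc has url, title, word_count."""
    base = urldefrag(page_url).url
    seen_word_counts = set()
    kept = []
    for doc in docs:
        url, fragment = urldefrag(doc["url"])
        title = doc["title"].strip().lower()
        if fragment and url == base:
            continue  # anchor link into the page being scanned
        if title in NOISE_TITLES:
            continue  # navigation label or too generic
        if title.startswith(("http://", "https://", "www.")):
            continue  # title is only a URL
        if doc["word_count"] < MIN_WORDS:
            continue  # navigation snippet
        if doc["word_count"] in seen_word_counts:
            continue  # same word count: treated as the same page, retitled
        seen_word_counts.add(doc["word_count"])
        kept.append(doc)
    return kept
```

Using the word count as the merge key is cheap but coarse: two genuinely different documents with the same length would also collapse into one.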
False positive fixes (dsi_document_checker.py):
- 'Kontaktdaten des Verantwortlichen' pattern for controller check
- 'Zweck und Rechtsgrundlage' combined heading pattern
- 'Welche Daten werden verarbeitet' question-style headings
- 'Betroffenenrechte' as standalone heading
- 'Welche Rechte hat der Betroffene' question pattern
- 'Daten werden geloescht' retention pattern
- 'Auftragsverarbeiter' as recipient indicator
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:41:07 +02:00
Benjamin Admin
48146cddaf
feat: DSI document discovery + completeness check in agent scan workflow
...
Agent scan now automatically:
1. Discovers all legal documents via consent-tester /dsi-discovery endpoint
2. Classifies each as DSE/AGB/Widerruf/Cookie/Impressum
3. Checks completeness against type-specific checklists:
   - DSE: 9 Art. 13 DSGVO mandatory fields (controller, DPO, purposes,
     legal basis, recipients, third-country, retention, rights, complaint)
   - AGB: §305ff BGB (scope, contract formation, liability, jurisdiction)
   - Widerruf: §355 BGB (right info, 14-day deadline, form, consequences)
4. Adds findings per document to scan results
5. Shows discovered documents with completeness % in email summary
6. Returns discovered_documents list in API response
New files:
- dsi_document_checker.py (229 LOC) — checklists + classifier
- agent_scan_helpers.py (109 LOC) — extracted summary builder + corrections
Refactor: agent_scan_routes.py 537→448 LOC (under 500 budget)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 22:10:13 +02:00