fix: Context-aware Impressum checks + 3 regex fixes

3 Regex fixes:
- Telefon: matches '0761 / 48 98 09 01' format (spaces around /)
- Registergericht: matches 'AG Freiburg' (not just 'Amtsgericht')
- Vertretung: matches 'Geschaeftsfuehrung:' (not just 'Geschaeftsfuehrer:')
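
The three regex fixes can be sketched roughly as below. The patterns are hypothetical stand-ins written to accept the formats named above; they are not the regexes from this commit. Like the checker, the sketch matches against lowercased text. Note that a bare 'AG' can also mean Aktiengesellschaft, so a real pattern probably needs more context than this one.

```python
import re

# Hypothetical patterns (assumptions, not the commit's actual regexes).
# All are applied to lowercased text.

# Area code, optional '/' or '-' with optional spaces around it, then digit groups.
PHONE = re.compile(r"0\d{2,5}\s*[/-]?\s*\d[\d\s]{4,}\d")

# Either the long form 'amtsgericht' or the abbreviation 'ag', followed by a place name.
COURT = re.compile(r"\b(?:amtsgericht|ag)\s+[a-zäöü]+")

# 'Geschaeftsfuehrer:', 'Geschaeftsfuehrerin:' or 'Geschaeftsfuehrung:', with or without umlauts.
REPRESENTATION = re.compile(r"gesch(?:ae|ä)ftsf(?:ue|ü)hr(?:er(?:in)?|ung)\s*:")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern occurs anywhere in the lowercased text."""
    return pattern.search(text.lower()) is not None
```

For example, `matches(PHONE, "Tel.: 0761 / 48 98 09 01")` and `matches(COURT, "AG Freiburg")` both hold under these assumed patterns.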

6 checks changed from FAIL to INFO severity:
- V.i.S.d.P.: only relevant if website has editorial content
- Streitbeilegung: only relevant for B2C online shops
- Berufsrecht: only relevant for regulated professions
- Stammkapital: legally required but rarely enforced
- Aufsichtsbehoerde: only for licensed activities
- Berufshaftpflicht: only for mandatory insurance

INFO checks are excluded from the completeness percentage (numerator and denominator).
They appear as hints, not findings.
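
A minimal sketch of the scoring rule described above, assuming checks are dicts with "id", "patterns" and "severity" keys. The function name `score_checks`, the separate `hints` list, and the plain substring matching are illustrative assumptions, not the project's code.

```python
def score_checks(checks: list[dict], text: str) -> tuple[int, list[str], list[str]]:
    """Return (completeness_pct, finding_ids, hint_ids) for the given text."""
    text_lower = text.lower()
    present = 0    # scoreable checks that matched
    scoreable = 0  # checks that count towards the percentage
    findings: list[str] = []
    hints: list[str] = []
    for check in checks:
        is_info = check.get("severity") == "INFO"
        # Assumption: substring search stands in for the real pattern matching.
        passed = any(p in text_lower for p in check["patterns"])
        if is_info:
            # INFO checks never affect the score; a miss is only a hint.
            if not passed:
                hints.append(check["id"])
            continue
        scoreable += 1
        if passed:
            present += 1
        else:
            findings.append(check["id"])
    pct = round(present / scoreable * 100) if scoreable else 0
    return pct, findings, hints


checks = [
    {"id": "telefon", "severity": "HIGH", "patterns": ["telefon"]},
    {"id": "registergericht", "severity": "HIGH", "patterns": ["amtsgericht"]},
    {"id": "vertretung", "severity": "HIGH",
     "patterns": ["geschaeftsfuehrung", "geschaeftsfuehrer"]},
    {"id": "streitbeilegung", "severity": "INFO", "patterns": ["streitbeilegung"]},
]
text = "Telefon: 0761 / 48 98 09 01\nGeschaeftsfuehrung: Erika Musterfrau"
pct, findings, hints = score_checks(checks, text)
print(pct, findings, hints)  # 67 ['registergericht'] ['streitbeilegung']
```

Two of the three scoreable checks pass, giving 67%. Counting the missing INFO check in the denominator would have given 50% instead.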

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-11 15:23:19 +02:00
parent 916337b503
commit 0c25832b5c
2 changed files with 54 additions and 45 deletions
@@ -111,14 +111,19 @@ def check_document_completeness(
     passed_l1_ids: set[str] = set()
     all_checks: list[dict] = []
     l1_present = 0
+    l1_scoreable = 0  # Exclude INFO checks from score
     for check in l1_checks:
+        is_info = check.get("severity") == "INFO"
         match = _match_patterns(check["patterns"], text_lower)
         passed = match is not None
         if passed:
             passed_l1_ids.add(check["id"])
-            l1_present += 1
-        else:
+            if not is_info:
+                l1_present += 1
+        if not is_info:
+            l1_scoreable += 1
+        if not passed and not is_info:
             findings.append({
                 "code": f"DSI-MISSING-{check['id'].upper()}",
                 "severity": check.get("severity", "MEDIUM"),
@@ -175,7 +180,7 @@ def check_document_completeness(
})
     # ── Summary ───────────────────────────────────────────────────────
-    l1_total = len(l1_checks)
+    l1_total = l1_scoreable  # Exclude INFO checks from percentage
     completeness_pct = round(l1_present / l1_total * 100) if l1_total else 0
     correctness_pct = round(l2_passed / l2_total * 100) if l2_total else 0