feat: 4 remaining tasks — EU institutions, banner integration, JS-sites, Caritas fixes
Build + Deploy / build-ai-sdk (push) Failing after 36s
Build + Deploy / build-developer-portal (push) Successful in 8s
Build + Deploy / build-tts (push) Successful in 7s
Build + Deploy / build-document-crawler (push) Successful in 7s
Build + Deploy / build-admin-compliance (push) Successful in 8s
Build + Deploy / build-backend-compliance (push) Successful in 8s
CI / nodejs-build (push) Successful in 3m14s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 46s
CI / test-python-backend (push) Successful in 43s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 30s
CI / validate-canonical-controls (push) Successful in 16s
Build + Deploy / build-dsms-gateway (push) Successful in 8s
Build + Deploy / build-dsms-node (push) Successful in 8s
CI / branch-name (push) Has been skipped
Build + Deploy / trigger-orca (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 17s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
1. EU Institution Checks (Regulation (EU) 2018/1725):
- New doc_type "eu_institution" with 9 L1 + 15 L2 checks
- Both German + English patterns (EU institutions are multilingual)
- Auto-detection via "2018/1725", "EDSB", "EDPS" keywords
- Correct article references (Art. 15 instead of 13, Art. 5 instead of 6)

2. Banner Check Integration:
- banner_runner.py maps scan results to 36 L1/L2 structured checks
- BannerCheckTab shows hierarchical ChecklistView with hints
- 3-phase summary (cookies/scripts before/after consent)
- /scan endpoint now includes structured_checks in response

3. JS-heavy Website Fixes (dm, Zalando, HWK):
- dsi_helpers.py: goto_resilient (networkidle→domcontentloaded fallback)
- try_dismiss_consent_banner before text extraction
- PDF redirect detection (dm.de redirects to GCS PDF)

4. Caritas False Positive Fixes:
- Phone regex allows parentheses: +49 (0)761 → now matches
- "Recht auf Widerspruch" (3 words) + §23 KDG → matches Art. 21
- Church authorities: "Katholisches Datenschutzzentrum" recognized

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
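The phone-regex fix in item 4 is not part of the diff shown here, so the following is only a minimal sketch of how a pattern could accept the parenthesised trunk digit in "+49 (0)761". `PHONE_RE` and `find_phone` are hypothetical names chosen for illustration and are not taken from the repository.

```python
import re
from typing import Optional

# Hypothetical pattern, NOT the one from the commit: a country code or a
# leading 0, an optional "(0)" trunk digit, then 5-14 further digits that
# may be separated by spaces, dots, slashes or hyphens.
PHONE_RE = re.compile(
    r"(?:\+\d{1,3}|0)"       # "+49" or a national leading "0"
    r"[\s./-]*(?:\(0\))?"    # optional "(0)" as in "+49 (0)761"
    r"(?:[\s./-]*\d){5,14}"  # remaining digits with common separators
)


def find_phone(text: str) -> Optional[str]:
    """Return the first phone-number-like substring, or None."""
    match = PHONE_RE.search(text)
    return match.group(0) if match else None
```

Under these assumptions both the parenthesised form and the plain national form are found, while text without a number yields `None`.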
@@ -0,0 +1,175 @@
"""
Banner Runner: maps scan results to the L1/L2 check hierarchy.

Takes the raw ScanResponse dict and produces a structured_checks list
compatible with ChecklistView (same format as document checks).
"""

from checks.banner_checks import BANNER_CHECKLIST


def map_scan_to_checks(scan_result: dict) -> dict:
    """Map a /scan response to the L1/L2 banner check hierarchy.

    Returns dict with:
      - structured_checks: list of CheckItem dicts
      - completeness_pct: L1 pass rate (0-100)
      - correctness_pct: L2 pass rate (0-100)
    """
    # Collect all violation codes from every source
    violation_codes = _collect_violation_codes(scan_result)

    # Collect pass codes: some checks produce boolean signals, not violations
    pass_codes = _collect_pass_codes(scan_result)

    # Build structured checks
    checks: list[dict] = []
    l1_checks: list[dict] = []
    l2_checks: list[dict] = []

    for defn in BANNER_CHECKLIST:
        key = defn["check_key"]
        level = defn["level"]
        parent = defn.get("parent")

        # Determine pass/fail
        is_violation_key = key in violation_codes
        is_pass_key = key in pass_codes

        # check_key reported as a violation → failed
        # check_key reported only as an explicit pass → passed
        # check_key reported as neither → result follows banner_detected
        if is_violation_key:
            passed = False
            matched_text = violation_codes[key]
        elif is_pass_key:
            passed = True
            matched_text = pass_codes.get(key, "")
        else:
            # Key not found in violations or explicit passes.
            # If the scan ran (banner detected) → assume passed (no finding).
            # If no banner was detected → the check fails; this applies to
            # banner_detected itself and to every other unreported check.
            passed = bool(scan_result.get("banner_detected", False))
            matched_text = ""

        # L2 checks are skipped if their parent L1 failed
        skipped = False
        if level == 2 and parent:
            parent_check = next(
                (c for c in checks if c["id"] == parent), None
            )
            if parent_check and not parent_check["passed"]:
                skipped = True

        item = {
            "id": defn["id"],
            "label": defn["label"],
            "passed": passed and not skipped,
            "severity": defn["severity"],
            "level": level,
            "parent": parent,
            "skipped": skipped,
            "hint": defn.get("hint", ""),
            "matched_text": matched_text if passed else "",
        }
        checks.append(item)

        if level == 1:
            l1_checks.append(item)
        elif level == 2:
            l2_checks.append(item)

    # Compute percentages
    l1_total = len(l1_checks)
    l1_passed = sum(1 for c in l1_checks if c["passed"])
    completeness_pct = round(l1_passed / l1_total * 100) if l1_total else 0

    l2_active = [c for c in l2_checks if not c["skipped"]]
    l2_passed = sum(1 for c in l2_active if c["passed"])
    correctness_pct = round(l2_passed / len(l2_active) * 100) if l2_active else 0

    return {
        "structured_checks": checks,
        "completeness_pct": completeness_pct,
        "correctness_pct": correctness_pct,
    }


def _collect_violation_codes(scan: dict) -> dict[str, str]:
    """Collect check_key → violation text from all sources."""
    codes: dict[str, str] = {}

    # Banner text violations
    banner_checks = scan.get("banner_checks", {})
    for v in banner_checks.get("violations", []):
        code = v.get("code", "")
        if code:
            codes[code] = v.get("text", "")[:120]

    # Phase A violations (before consent)
    phase_a = scan.get("phases", {}).get("before_consent", {})
    for v in phase_a.get("violations", []):
        code = v.get("code", "")
        if code:
            codes[code] = v.get("text", "")[:120]

    # Phase B violations (after reject)
    phase_b = scan.get("phases", {}).get("after_reject", {})
    for v in phase_b.get("violations", []):
        code = v.get("code", "")
        if code:
            codes[code] = v.get("text", "")[:120]

    # Tracking services in phase A → tracking_before_consent
    tracking_a = phase_a.get("tracking_services", [])
    if tracking_a and "tracking_before_consent" not in codes:
        codes["tracking_before_consent"] = ", ".join(tracking_a[:5])

    # Cookies before consent → cookies_before_consent
    cookies_a = phase_a.get("cookies", [])
    tracking_cookies = [c for c in cookies_a if _is_tracking_cookie(c)]
    if tracking_cookies and "cookies_before_consent" not in codes:
        codes["cookies_before_consent"] = ", ".join(tracking_cookies[:5])

    # New tracking after reject → tracking_after_reject
    new_tracking_b = phase_b.get("new_tracking", [])
    if new_tracking_b and "tracking_after_reject" not in codes:
        codes["tracking_after_reject"] = ", ".join(new_tracking_b[:5])

    return codes


def _collect_pass_codes(scan: dict) -> dict[str, str]:
    """Collect explicit pass signals from scan results."""
    passes: dict[str, str] = {}

    # Banner detected
    if scan.get("banner_detected"):
        passes["banner_detected"] = scan.get("banner_provider", "detected")

    # Provider named
    provider = scan.get("banner_provider", "")
    if provider:
        passes["banner_provider_named"] = provider

    # Impressum link
    bc = scan.get("banner_checks", {})
    if bc.get("has_impressum_link"):
        passes["impressum_link"] = "Impressum-Link gefunden"
    if bc.get("has_dse_link"):
        passes["dse_link"] = "DSE-Link gefunden"

    return passes


_TRACKING_COOKIE_PREFIXES = (
    "_ga", "_gid", "_fbp", "_fbc", "IDE", "_gcl", "fr", "_pin",
    "_tt_", "li_sugr", "_hj", "mp_", "ajs_", "_clck", "_clsk",
)


def _is_tracking_cookie(name: str) -> bool:
    """Check if a cookie name is a known tracking cookie."""
    return any(name.startswith(p) for p in _TRACKING_COOKIE_PREFIXES)
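
The scoring rules in `map_scan_to_checks` above (a violation fails a check, an unreported check follows `banner_detected`, and an L2 check is skipped when its L1 parent failed) can be tried out in isolation with a condensed sketch. The four-entry `CHECKLIST` and the `score` helper are hypothetical stand-ins: the real `BANNER_CHECKLIST` is not shown in this diff, and explicit pass codes are left out for brevity.

```python
# Hypothetical miniature checklist; ids and parent links are invented
# for illustration and do not come from checks/banner_checks.py.
CHECKLIST = [
    {"id": "B1", "check_key": "banner_detected", "level": 1, "parent": None},
    {"id": "B2", "check_key": "tracking_before_consent", "level": 1, "parent": None},
    {"id": "B1.1", "check_key": "impressum_link", "level": 2, "parent": "B1"},
    {"id": "B2.1", "check_key": "cookies_before_consent", "level": 2, "parent": "B2"},
]


def score(violations: set, banner_detected: bool) -> dict:
    """Condensed version of the pass/skip/percentage rules."""
    passed_by_id = {}
    skipped_ids = set()
    for defn in CHECKLIST:
        # A reported violation fails; otherwise the result follows banner_detected.
        passed = banner_detected and defn["check_key"] not in violations
        parent = defn["parent"]
        # An L2 check is skipped (and counted as not passed) if its parent failed.
        if defn["level"] == 2 and parent and not passed_by_id.get(parent, True):
            skipped_ids.add(defn["id"])
            passed = False
        passed_by_id[defn["id"]] = passed

    def pct(ids: list) -> int:
        return round(sum(passed_by_id[i] for i in ids) / len(ids) * 100) if ids else 0

    l1_ids = [d["id"] for d in CHECKLIST if d["level"] == 1]
    # Skipped L2 checks are excluded from the correctness denominator.
    l2_active = [d["id"] for d in CHECKLIST
                 if d["level"] == 2 and d["id"] not in skipped_ids]
    return {
        "completeness_pct": pct(l1_ids),
        "correctness_pct": pct(l2_active),
        "skipped": sorted(skipped_ids),
    }
```

With a single `tracking_before_consent` violation this sketch yields 50 % completeness and 100 % correctness, because the skipped child `B2.1` drops out of the L2 denominator. When no banner is detected, every L2 check is skipped and both percentages fall to 0.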