feat(profile+report): P17: 4 polish items

A) Cookie policy architecture block: falls back to the privacy policy (DSE)
   text when the cookie document was deduplicated by P15. Now also detects
   single-document sites (the Safetykon pattern).

B) Concrete task list: removed the per-document cap (3) and raised the
   global cap from 10 to 20. Safetykon now shows 7 tasks instead of 4.

C) business_type classifier: uses the B2B service cluster from P14 as a boost.
   With 2 or more service indicators (CE certification / compliance / auditing),
   b2b_score is raised. Safetykon: "B2C consulting" → "B2B (consulting)".

D) Vendor extraction: falls back to the privacy policy (DSE) text when the
   cookie document was deduplicated and there are no CMP payloads. The LLM
   then extracts vendors from the DSE text. Safetykon: 0 → 1 vendor
   (Google Analytics detected from the DSE text).

Smoke test on Safetykon: all 4 polish items take effect, no regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-19 12:22:05 +02:00
parent f30a3ce471
commit 313982c6f1
3 changed files with 31 additions and 2 deletions
@@ -237,6 +237,13 @@ async def detect_business_profile(documents: dict[str, str]) -> BusinessProfile:
    b2g_score = _count_hits(full_text, _B2G_KEYWORDS)
    nonprofit_score = _count_hits(full_text, _NONPROFIT_KEYWORDS)
    # P17-C: B2B service-provider cluster (P14) as a boost. If a company offers
    # CE certification / compliance consulting / auditing / training, it is
    # usually B2B even when the strict B2B keywords do not match.
    b2b_service_boost = _count_hits(full_text, _B2B_SERVICE_POSITIVE)
    if b2b_service_boost >= 2:
        b2b_score += min(3, b2b_service_boost - 1)
    # Missing documents as signal
    has_agb = "agb" in documents
    has_widerruf = "widerruf" in documents
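For reviewers, the arithmetic of the P17-C boost can be checked in isolation. The following is a minimal sketch, not code from this commit: `apply_b2b_service_boost` is a hypothetical helper that reproduces only the `>= 2` threshold and the `min(3, hits - 1)` cap from the hunk, taking the service-indicator hit count as a plain integer instead of calling `_count_hits`.

```python
def apply_b2b_service_boost(b2b_score: int, service_hits: int) -> int:
    """Hypothetical stand-in for the P17-C rule shown in the hunk.

    Fewer than 2 service indicators leave the score unchanged; from 2 hits
    onward the score grows by (hits - 1), capped at +3.
    """
    if service_hits >= 2:
        b2b_score += min(3, service_hits - 1)
    return b2b_score


if __name__ == "__main__":
    # Show the boost applied to a base score of 0 for 0..5 indicator hits.
    for hits in range(6):
        print(hits, apply_b2b_service_boost(0, hits))
```

Under these assumptions a single indicator never changes the score, two indicators add 1, and four or more add the maximum of 3, so the service cluster can tip a borderline classification without outweighing strong B2C signals on its own.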