feat(service-detector): detect 118 services in legal texts (was 20)

New service_detector.py uses service_registry (88 entries) plus 30+
extra text patterns to detect services mentioned in DSI/legal texts.

Results on Spiegel: 31/32 services detected (97%, was 5/32 = 16%).
Each detected service carries metadata: name, category, country, and
EU adequacy status.
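
The registry-plus-patterns approach described above could look roughly like the sketch below. It is not the contents of service_detector.py, which this commit view does not show: the registry entries, the `EXTRA_PATTERNS` table, the fictional service names, and the function name `detect_services_sketch` are all assumptions made for illustration. Only the metadata keys (name, category, country, eu_adequate) follow the commit message and the diff.

```python
import re

# Hypothetical stand-in for service_registry (the real one has 88 entries).
# Service names and metadata values here are invented for illustration.
SERVICE_REGISTRY = [
    {"name": "ExampleAnalytics", "category": "analytics",
     "country": "US", "eu_adequate": False},
    {"name": "SampleCDN", "category": "cdn",
     "country": "DE", "eu_adequate": True},
]

# Hypothetical extra text patterns for spellings the plain name would miss.
EXTRA_PATTERNS = {
    "ExampleAnalytics": [r"\bexample[- ]analytics\b"],
}


def detect_services_sketch(text):
    """Return a metadata dict for every registry service mentioned in text."""
    found = []
    for entry in SERVICE_REGISTRY:
        # Match the literal service name or any of its extra patterns.
        patterns = [re.escape(entry["name"])]
        patterns += EXTRA_PATTERNS.get(entry["name"], [])
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            found.append(dict(entry))  # copy so callers cannot mutate the registry
    return found
```

Calling `detect_services_sketch("Wir nutzen Example-Analytics und SampleCDN.")` returns both entries: the first matches through its extra pattern, the second through its literal name.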

- Profiler now calls detect_services_in_text() instead of the previous
  20-entry list
- Profile extractor adds detected_services with full metadata
- Auto-generates a scope hint for non-EU services (Drittlandtransfer,
  i.e. third-country transfer)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Benjamin Admin
2026-05-13 16:00:15 +02:00
parent 3e61f381a7
commit 33bf2b7c5a
3 changed files with 154 additions and 4 deletions
@@ -64,6 +64,22 @@ def extract_profile_from_documents(
        "regulated_profession_type", ""
    )
    # ── Detected services (full list with metadata) ────────────
    try:
        from compliance.services.service_detector import detect_services_in_text
        detected = detect_services_in_text(all_text)
        result["detected_services"] = detected
        # Add non-EU services as scope hint (a missing "eu_adequate" counts as non-EU)
        non_eu = [s for s in detected if not s.get("eu_adequate")]
        if non_eu:
            # List at most five names; append "..." only when the list is truncated
            names = ", ".join(s["name"] for s in non_eu[:5])
            suffix = "..." if len(non_eu) > 5 else ""
            result["compliance_scope_hints"].append({
                "field": "hasThirdCountryTransfer",
                "value": True,
                "source": f"{len(non_eu)} Dienste ausserhalb EWR erkannt ({names}{suffix})",
            })
    except Exception as e:
        # Best-effort: a detection failure must not abort profile extraction
        logger.warning("Service detection failed: %s", e)
    # ── Scope hints from document content ────────────────────────
    _extract_scope_hints(all_text, result)
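
For reviewing the hint logic in isolation, the third-country check from this hunk can be lifted into a small pure function. This is a sketch only: the helper name `third_country_hint` and the `max_names` parameter are hypothetical and do not exist in the commit. The field name, the German source string, and the `eu_adequate` key are taken from the hunk. The sketch appends the ellipsis only when the name list is actually truncated.

```python
def third_country_hint(detected, max_names=5):
    """Build the hasThirdCountryTransfer scope hint, or None if not needed.

    Hypothetical helper; mirrors the hunk's logic for illustration only.
    """
    # As in the hunk, a missing or falsy "eu_adequate" counts as non-EU.
    non_eu = [s for s in detected if not s.get("eu_adequate")]
    if not non_eu:
        return None
    names = ", ".join(s["name"] for s in non_eu[:max_names])
    suffix = "..." if len(non_eu) > max_names else ""
    return {
        "field": "hasThirdCountryTransfer",
        "value": True,
        "source": f"{len(non_eu)} Dienste ausserhalb EWR erkannt ({names}{suffix})",
    }
```

Because a service without an `eu_adequate` key is treated as non-EU, an incomplete registry entry raises the hint rather than suppressing it, which is the more cautious default for a compliance check.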