feat(audit): P72 MC-Scope-Filter + P73 MC-Solution-Generator
CI / detect-changes (push) Successful in 12s
CI / sbom-scan (push) Has been skipped
CI / validate-canonical-controls (push) Successful in 15s
CI / loc-budget (push) Failing after 18s
CI / go-lint (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 41s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
P72: rag_document_checker now LEFT JOINs canonical_controls.scope_doc_type.
_filter_by_canonical_scope drops MCs whose scope explicitly points to an
incompatible doc type (mapping in _SCOPE_COMPATIBLE). Conservative by
design: 'other'/NULL/'process' are always kept, because the v1 heuristic
is not yet reliable enough for hard filtering.
Expected effect: ~10-15% fewer irrelevant MCs per doc, because e.g. a
TOM MC no longer shows up as a DSE finding.
P73: mc_solution_generator.py: a Qwen->OVH cascade generates, for each
HIGH/CRITICAL fail, a concrete insertion recommendation with an anchor
(where + what) and an effort estimate. JSON schema: {solution_text,
anchor_hint, effort_min}. In-process LRU cache (500 entries) keyed by
(mc_id, doc_md5). Max 3 solutions per doc type, global cap of 8, which
keeps latency < 60s. The blocks are attached in the mail render below
VVT under the heading 'Loesungs-Vorschlaege (KI-generiert)' ("Suggested
solutions (AI-generated)"). Disclaimer: not legal advice; review with
the data protection officer (DSB).
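The caps and the cache key described for P73 can be sketched as follows. This is a minimal illustration, not the code in mc_solution_generator.py, which is not part of the diff shown here. Only the numbers (3 per doc type, 8 overall, 500 entries) and the key shape (mc_id, doc_md5) come from the commit message; the names `select_fails`, `SolutionCache`, `cache_key`, and the dict fields `doc_type` and `severity` are assumptions.

```python
import hashlib
from collections import OrderedDict

# Limits taken from the commit message; all identifiers are hypothetical.
MAX_PER_DOC_TYPE = 3
GLOBAL_CAP = 8
CACHE_SIZE = 500
SEVERITIES = {"HIGH", "CRITICAL"}


def select_fails(fails: list[dict]) -> list[dict]:
    """Pick HIGH/CRITICAL fails: at most 3 per doc_type, at most 8 overall."""
    per_type: dict[str, int] = {}
    picked: list[dict] = []
    for fail in fails:
        if fail["severity"] not in SEVERITIES:
            continue
        seen = per_type.get(fail["doc_type"], 0)
        if seen >= MAX_PER_DOC_TYPE:
            continue
        per_type[fail["doc_type"]] = seen + 1
        picked.append(fail)
        if len(picked) >= GLOBAL_CAP:
            break
    return picked


class SolutionCache:
    """Small in-process LRU cache keyed by (mc_id, doc_md5)."""

    def __init__(self, maxsize: int = CACHE_SIZE) -> None:
        self._data: OrderedDict = OrderedDict()
        self._maxsize = maxsize

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._maxsize:
            self._data.popitem(last=False)  # evict least recently used


def cache_key(mc_id: str, doc_text: str) -> tuple[str, str]:
    """Build the (mc_id, doc_md5) key from the document text."""
    return mc_id, hashlib.md5(doc_text.encode("utf-8")).hexdigest()
```

Bounding the number of generated solutions before calling the model is what bounds the latency: with a global cap of 8, the worst case is 8 uncached model calls per run, and a repeated check of an unchanged document hits the cache instead.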
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -293,6 +293,59 @@ _MC_ALIAS_FALLBACK = {
 }


+# P72: compatible scope_doc_type values per operational doc_type.
+# 'other' / NULL / 'process' always stay in (the backfill is heuristic v1
+# and not reliable enough for hard filtering).
+_SCOPE_COMPATIBLE: dict[str, set[str]] = {
+    "dse": {"dse", "jc", "process", "tom", "accounting"},
+    "cookie": {"cookie_richtlinie", "banner_implementation",
+               "cmp_audit", "dse"},
+    "cookie_policy": {"cookie_richtlinie", "banner_implementation",
+                      "cmp_audit", "dse"},
+    "impressum": {"impressum", "agb"},
+    "agb": {"agb", "widerruf", "impressum"},
+    "nutzungsbedingungen": {"agb", "widerruf", "impressum"},
+    "widerruf": {"widerruf", "agb"},
+    "avv": {"avv", "tom", "jc", "process"},
+    "tom": {"tom", "avv", "process"},
+    "loeschkonzept": {"process", "dse", "accounting"},
+    "dsfa": {"process", "tom", "dse"},
+    "social_media": {"jc", "dse"},
+    "dsa": {"dse", "impressum"},
+    "legal_notice": {"impressum", "agb"},
+    "lizenzhinweise": {"agb", "impressum"},
+}
+_PERMISSIVE_SCOPES = {"other", "process", None, "", "null"}
+
+
+def _filter_by_canonical_scope(
+    controls: list[dict],
+    doc_type: str,
+) -> list[dict]:
+    """P72: drops MCs whose canonical scope_doc_type explicitly points to
+    an INCOMPATIBLE doc type. 'other'/NULL/'process' stay in (backfill v1
+    is still too unreliable).
+    """
+    compatible = _SCOPE_COMPATIBLE.get(doc_type)
+    if not compatible:
+        return controls
+    kept: list[dict] = []
+    dropped = 0
+    for c in controls:
+        scope = c.get("canonical_scope")
+        scope_norm = (scope or "").strip().lower() or None
+        if scope_norm in _PERMISSIVE_SCOPES or scope_norm in compatible:
+            kept.append(c)
+        else:
+            dropped += 1
+    if dropped:
+        logger.info(
+            "P72 scope-filter: %d/%d MCs out-of-scope fuer doc_type=%s",
+            dropped, len(controls), doc_type,
+        )
+    return kept
+
+
 def _load_text_only_ids(
     doc_type: str | None = None,
     business_scope: set[str] | None = None,
@@ -372,11 +425,19 @@ async def _load_controls(doc_type: str, db_url: str, limit: int,
         return []

     try:
-        query = """SELECT id, control_id, title, regulation, article,
-                          check_question, pass_criteria, fail_criteria, severity
-                   FROM compliance.doc_check_controls
-                   WHERE doc_type = $1
-                   ORDER BY severity DESC, title"""
+        # P72: LEFT JOIN canonical_controls.scope_doc_type to pull in the
+        # scope info. If an MC is explicitly classified for a different doc
+        # type (e.g. 'tom' instead of 'dse'), it is filtered out below.
+        # 'other' / NULL stay in (backfill not yet reliable enough).
+        query = """SELECT dc.id, dc.control_id, dc.title, dc.regulation,
+                          dc.article, dc.check_question, dc.pass_criteria,
+                          dc.fail_criteria, dc.severity,
+                          cc.scope_doc_type AS canonical_scope
+                   FROM compliance.doc_check_controls dc
+                   LEFT JOIN compliance.canonical_controls cc
+                          ON cc.id = dc.control_uuid
+                   WHERE dc.doc_type = $1
+                   ORDER BY dc.severity DESC, dc.title"""
         if limit > 0:
             query += f" LIMIT {limit}"

@@ -387,6 +448,12 @@ async def _load_controls(doc_type: str, db_url: str, limit: int,
                 rows = await conn.fetch(query, fallback)

         controls = [dict(r) for r in rows]
+
+        # P72: scope filter. Drop MCs whose canonical scope_doc_type
+        # explicitly points to a different doc type. Conservative:
+        # other/NULL/process stay in (classification too uncertain).
+        controls = _filter_by_canonical_scope(controls, doc_type)
+
         text_only = _load_text_only_ids(doc_type, business_scope)
         if text_only:
             before = len(controls)
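The keep/drop rule in `_filter_by_canonical_scope` can be tried out on single values with the sketch below. `scope_decision` is a hypothetical helper written for illustration, and the mapping is a two-entry excerpt of `_SCOPE_COMPATIBLE` from the diff above. One thing it makes visible: in the mapping as committed, 'tom' is listed as compatible with 'dse', so a tom-scoped MC is still kept for DSE documents, while it is dropped for 'impressum'.

```python
# Hypothetical helper for illustration; excerpt of the committed mapping.
SCOPE_COMPATIBLE = {
    "dse": {"dse", "jc", "process", "tom", "accounting"},
    "impressum": {"impressum", "agb"},
}
PERMISSIVE_SCOPES = {"other", "process", None, "", "null"}


def scope_decision(scope, doc_type):
    """Return 'keep' or 'drop', following the same rule as the diff."""
    compatible = SCOPE_COMPATIBLE.get(doc_type)
    if not compatible:
        return "keep"  # unmapped doc_type: the filter is a no-op
    # Normalise: trim, lowercase, and fold empty strings into None.
    scope_norm = (scope or "").strip().lower() or None
    if scope_norm in PERMISSIVE_SCOPES or scope_norm in compatible:
        return "keep"
    return "drop"


for scope, doc_type in [
    ("tom", "impressum"),     # explicit, incompatible scope
    (" TOM ", "dse"),         # normalised to 'tom', listed under 'dse'
    (None, "impressum"),      # SQL NULL from the LEFT JOIN
    ("NULL", "impressum"),    # literal string, normalised to 'null'
    ("tom", "unknown_type"),  # doc_type without a mapping entry
]:
    print(repr(scope), doc_type, scope_decision(scope, doc_type))
```

Because the JOIN is a LEFT JOIN, controls without a matching row in canonical_controls arrive with `canonical_scope` set to NULL and therefore always pass the filter.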