fix(cra): map scanner findings completely + reduce assess-from-scanner latency
CI / detect-changes (push) Successful in 17s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 13s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 25s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Item 2 (coverage): semgrep/gdpr findings without a CWE remained unmapped (~21%). The mapper now uses the scanner rule_id plus targeted keywords (gdpr -> data minimisation CRA-AI-17, path-traversal/prototype-pollution -> CRA-AI-20, nginx-header/Docker hardening -> CRA-AI-1/4, insecure-websocket -> CRA-AI-15). On real scanner data: unmapped 19/92 -> 0/92 (100% coverage).

Item 3 (latency): enrich_findings_with_breadth ran ~6 aggregate queries per (use_case, sub_topic) pair but only used the resulting control list. It now issues ONE batched query (breadth_controls_batch) for all pairs, plus a process-level cache (TTL 1800s). macmini: cold 0.23s / warm 0.000s.

Prod root cause: after the DB swap, atom_classification lacks a (use_case, sub_topic) index -> index recommended to the DB owner.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
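The CWE-first, keyword-fallback mapping described in item 2 can be sketched roughly as follows. This is a minimal illustration, not the mapper from this commit: the helper names `map_finding` and `coverage`, the finding dict shape (`cwe`, `rule_id`, `title` keys), the `cwe_map` argument and the exact keyword strings are all hypothetical. The commit message also leaves open whether "nginx-header/Docker hardening -> CRA-AI-1/4" means one control per keyword or both controls for both; the sketch assumes nginx-header -> CRA-AI-1 and Docker hardening -> CRA-AI-4.

```python
from typing import Any

# Hypothetical keyword table modelled on the commit message; the real mapper's
# keywords, their order and the CRA-AI-1/4 split are assumptions. First match wins.
KEYWORD_RULES: list[tuple[tuple[str, ...], list[str]]] = [
    (("gdpr",), ["CRA-AI-17"]),                                # data minimisation
    (("path-traversal", "prototype-pollution"), ["CRA-AI-20"]),
    (("nginx-header",), ["CRA-AI-1"]),                         # assumed split of "CRA-AI-1/4"
    (("docker",), ["CRA-AI-4"]),                               # assumed split of "CRA-AI-1/4"
    (("insecure-websocket",), ["CRA-AI-15"]),
]


def map_finding(finding: dict[str, Any], cwe_map: dict[str, list[str]]) -> list[str]:
    """Map one scanner finding to CRA control IDs (hypothetical helper).

    1. Prefer the CWE mapping when the finding carries a known CWE.
    2. Otherwise search the scanner rule_id (and title) for known keywords.
    3. Return [] when nothing matches, i.e. the finding stays unmapped.
    """
    cwe = finding.get("cwe")
    if cwe and cwe in cwe_map:
        return list(cwe_map[cwe])
    haystack = f"{finding.get('rule_id') or ''} {finding.get('title') or ''}".lower()
    for keywords, controls in KEYWORD_RULES:
        if any(kw in haystack for kw in keywords):
            return list(controls)
    return []


def coverage(findings: list[dict[str, Any]], cwe_map: dict[str, list[str]]) -> float:
    """Share of findings mapped to at least one control (1.0 == 100%)."""
    if not findings:
        return 1.0
    mapped = sum(1 for f in findings if map_finding(f, cwe_map))
    return mapped / len(findings)
```

Checking the CWE first keeps an existing CWE mapping authoritative; the rule_id keywords only catch findings, such as semgrep's gdpr rules, that carry no CWE at all.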
@@ -14,7 +14,7 @@ from __future__ import annotations
 
 from typing import Any, Optional
 
-from sqlalchemy import text
+from sqlalchemy import bindparam, text
 from sqlalchemy.orm import Session
 
 from compliance.data.use_case_registry import REGISTRY, is_valid_use_case
@@ -148,12 +148,82 @@ _ATOM_COUNT_SQL = text("""
 """)
 
 
+# Breadth fast-path: top-N atom controls for MANY (use_case, sub_topic) pairs in
+# ONE query. The CRA enrichment only needs this list — NOT the counts/facets/total
+# that controls_for_use_case also computes (those are 5 extra aggregate scans per
+# call, discarded by the caller). On prod (atom_classification currently lacks the
+# (use_case, sub_topic) index after the DB swap) collapsing ~6 queries × N pairs
+# into one scan is the difference between ~38s and a few seconds.
+_ATOM_BREADTH_BATCH_SQL = text("""
+    SELECT q.use_case, q.sub_topic, q.control_id, q.title, q.severity,
+           q.source_regulation, q.source_article
+    FROM (
+        SELECT ac.use_case, ac.sub_topic, cc.control_id, cc.title, cc.severity,
+               cpl.source_regulation, cpl.source_article,
+               row_number() OVER (
+                   PARTITION BY ac.use_case, ac.sub_topic
+                   ORDER BY CASE cc.severity WHEN 'critical' THEN 0 WHEN 'high' THEN 1
+                                             WHEN 'medium' THEN 2 ELSE 3 END, cc.title
+               ) AS rn
+        FROM atom_classification ac
+        JOIN canonical_controls cc ON cc.id = ac.control_uuid
+        LEFT JOIN LATERAL (
+            SELECT cpl.source_regulation, cpl.source_article
+            FROM control_parent_links cpl
+            WHERE cpl.control_uuid = ac.control_uuid LIMIT 1
+        ) cpl ON true
+        WHERE ac.relevant = true
+          AND (ac.addressee IS NULL OR ac.addressee NOT IN
+               ('aufsichtsbefugnis','staat_eu','dritter','meta'))
+          AND (ac.use_case, ac.sub_topic) IN :pairs
+    ) q
+    WHERE q.rn <= :per
+""").bindparams(bindparam("pairs", expanding=True))
+
+# Process-level memo: does the atom table exist? (never changes at runtime)
+_ATOM_TABLE_EXISTS: dict[str, Optional[bool]] = {"v": None}
+
+
 class UseCaseControlsService:
     """Topic → controls retrieval over the seeded use-case mappings."""
 
     def __init__(self, db: Session) -> None:
         self.db = db
 
+    def _atom_table_exists(self) -> bool:
+        if _ATOM_TABLE_EXISTS["v"] is None:
+            _ATOM_TABLE_EXISTS["v"] = self.db.execute(
+                text("SELECT to_regclass('compliance.atom_classification')")
+            ).scalar() is not None
+        return bool(_ATOM_TABLE_EXISTS["v"])
+
+    def breadth_controls_batch(
+        self, pairs, per: int = 3,
+    ) -> dict[tuple[str, str], list[dict[str, Any]]]:
+        """Top-``per`` atom controls for each (use_case, sub_topic) pair, in ONE
+        query. Returns {(use_case, sub_topic): [control dicts]}. Best-effort:
+        empty dict on any error or when the atom table is absent (caller then
+        leaves breadth empty — never breaks the assessment)."""
+        uniq = sorted({(uc, st) for uc, st in pairs if uc and st})
+        if not uniq or not self._atom_table_exists():
+            return {}
+        try:
+            rows = self.db.execute(
+                _ATOM_BREADTH_BATCH_SQL,
+                {"pairs": uniq, "per": min(max(int(per), 1), 50)},
+            ).fetchall()
+        except Exception:
+            return {}
+        out: dict[tuple[str, str], list[dict[str, Any]]] = {}
+        for r in rows:
+            out.setdefault((r.use_case, r.sub_topic), []).append({
+                "control_id": r.control_id, "title": r.title,
+                "source_regulation": r.source_regulation,
+                "source_article": r.source_article,
+                "severity": r.severity, "use_case": r.use_case,
+            })
+        return out
+
     def list_use_cases(self) -> list[dict[str, Any]]:
         """Registry use-cases with live counts — atom-grain (Haiku classification)
         plus the legacy master seed. Backs the coverage overview so every topic is
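To see what `_ATOM_BREADTH_BATCH_SQL` computes, the same top-N-per-group pattern (`row_number() OVER (PARTITION BY ...)` plus a row-value filter on the requested pairs) can be tried without PostgreSQL. The sketch below is a simplified stand-in under stated assumptions: it uses the stdlib `sqlite3` module (window functions need SQLite 3.25 or newer), a single hypothetical `controls` table instead of the `atom_classification`/`canonical_controls`/`control_parent_links` joins, and it drops the `relevant`/`addressee` filters. SQLite expects a subquery on the right-hand side of a row-value `IN`, so the pairs go into a `VALUES` CTE whose placeholders are expanded by hand, roughly the job `bindparam("pairs", expanding=True)` does in the commit. The names `breadth_batch` and `SCHEMA` are hypothetical.

```python
import sqlite3
from typing import Any, Iterable

# Hypothetical, flattened stand-in for the atom_classification +
# canonical_controls join used by the real query.
SCHEMA = """
CREATE TABLE controls (
    use_case   TEXT NOT NULL,
    sub_topic  TEXT NOT NULL,
    control_id TEXT NOT NULL,
    title      TEXT NOT NULL,
    severity   TEXT NOT NULL
);
"""


def breadth_batch(
    conn: sqlite3.Connection, pairs: Iterable[tuple[str, str]], per: int = 3
) -> dict[tuple[str, str], list[dict[str, Any]]]:
    """Top-`per` controls for every requested pair, fetched in ONE query."""
    uniq = sorted({(uc, st) for uc, st in pairs if uc and st})
    if not uniq:
        return {}
    per = min(max(int(per), 1), 50)  # same clamp as breadth_controls_batch
    # Only placeholders go into the f-string, one "(?, ?)" group per pair;
    # the actual values are bound as parameters below.
    values = ", ".join(["(?, ?)"] * len(uniq))
    sql = f"""
        WITH wanted(use_case, sub_topic) AS (VALUES {values})
        SELECT use_case, sub_topic, control_id, title, severity
        FROM (
            SELECT c.*, row_number() OVER (
                PARTITION BY c.use_case, c.sub_topic
                ORDER BY CASE c.severity WHEN 'critical' THEN 0 WHEN 'high' THEN 1
                                         WHEN 'medium' THEN 2 ELSE 3 END, c.title
            ) AS rn
            FROM controls c
            WHERE (c.use_case, c.sub_topic) IN (SELECT use_case, sub_topic FROM wanted)
        ) AS q
        WHERE rn <= ?
        ORDER BY use_case, sub_topic, rn
    """
    # Parameter order follows placeholder order: all pair values, then `per`.
    params = [v for pair in uniq for v in pair] + [per]
    out: dict[tuple[str, str], list[dict[str, Any]]] = {}
    for uc, st, cid, title, sev in conn.execute(sql, params):
        out.setdefault((uc, st), []).append(
            {"control_id": cid, "title": title, "severity": sev}
        )
    return out
```

Pairs that match no rows simply do not appear in the result, which mirrors how the commit's `breadth_controls_batch` only creates dict entries for rows it actually receives.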