breakpilot-compliance/ai-compliance-sdk/scripts/copy_iace_collections_to_prod.py
Benjamin Bönisch 43e02f794a
feat(cra): consume SBOM and DAST findings from the scanner MCP
Sharang's compliance-scanner-agent exposes SBOM (sbom_vuln_report) and DAST
(list_dast_findings) as separate MCP tools (not via list_findings). The new
fetch_all_findings(repo_id) pulls list_findings + SBOM + DAST in ONE
MCP session and normalizes them into the Finding schema:
- SBOM: one finding per vulnerable package (not per CVE), cwe=CWE-1395
  -> deterministically CRA-AI-22 (robust against package names such as "sqlite").
- DAST: cwe/endpoint/vuln_type carried over -> mapping via cwe/keywords.
assess-from-scanner uses fetch_all_findings and returns source.breakdown
(code/sbom/dast). DAST has no repo_id filter in the MCP -> dast_repo_scoped:false
(deployment-wide, flagged transparently).
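
The SBOM rule above (one finding per vulnerable package rather than per CVE, always tagged CWE-1395) could look roughly like the sketch below. This is not the code from the commit: the function name `normalize_sbom`, the input keys `package`, `version` and `cve`, the output fields, and the placeholder CVE ids in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict

# CWE-1395 is the id named in the commit message for SBOM findings.
SBOM_CWE = "CWE-1395"


def normalize_sbom(vulns):
    """Collapse per-CVE SBOM rows into one finding per vulnerable package.

    `vulns` is assumed to be a list of dicts with the hypothetical keys
    "package", "version" and "cve".
    """
    by_pkg = defaultdict(list)
    for v in vulns:
        by_pkg[(v["package"], v["version"])].append(v["cve"])

    findings = []
    for (name, version), cves in sorted(by_pkg.items()):
        findings.append(
            {
                "source": "sbom",
                "title": f"Vulnerable dependency {name}@{version}",
                # Fixed CWE so the control mapping never depends on the package name.
                "cwe": SBOM_CWE,
                "cves": sorted(cves),
            }
        )
    return findings
```

Called with three rows, two of which belong to the same package (`sqlite` with the made-up ids `CVE-0000-0001` and `CVE-0000-0002`), this returns two findings, both carrying `cwe == "CWE-1395"`. Grouping by package keeps the finding count stable when new CVEs are published for a dependency that is already flagged.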

Real MCP data: Kitchenasty 58 code + 35 sbom + 81 dast -> 174 mapped
(coverage 94.3%, all 35 SBOM -> CRA-AI-22).

Also includes the Qdrant->prod copy script (#42, verbatim macmini->prod).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 12:05:05 +02:00

90 lines
3.3 KiB
Python

#!/usr/bin/env python3
"""Verbatim copy of the IACE Qdrant knowledge-base collections to another Qdrant.

There is no RAG/embedding service on prod, so the normal ingest_iace_kb.sh has no
target there. Instead we copy the already-embedded points (id + vector + payload)
1:1 from the source Qdrant (macmini) to the destination (prod). No re-embedding,
no re-chunking → the destination is byte-identical and /sdk/v1/rag/search reads it
the same way. Idempotent: same point ids → upsert overwrites, no duplicates.

Usage (run on macmini; reads local Qdrant, writes prod Qdrant):
    SRC_QDRANT=http://localhost:6333 \
    DST_QDRANT=https://qdrant-dev.breakpilot.ai \
    DST_QDRANT_KEY=<prod-api-key> \
    python3 copy_iace_collections_to_prod.py
"""
import json
import os
import urllib.error
import urllib.request

SRC = os.environ.get("SRC_QDRANT", "http://localhost:6333").rstrip("/")
DST = os.environ["DST_QDRANT"].rstrip("/")
KEY = os.environ["DST_QDRANT_KEY"]
COLLECTIONS = os.environ.get(
    "COLLECTIONS", "bp_iace_accident_stats,bp_iace_safety_kb,bp_iace_failure_kb"
).split(",")
BATCH = 128


def _req(method, url, body=None, key=None):
    """Send a JSON request to Qdrant and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    r = urllib.request.Request(url, data=data, method=method)
    r.add_header("Content-Type", "application/json")
    if key:
        r.add_header("api-key", key)
    with urllib.request.urlopen(r, timeout=120) as resp:
        return json.loads(resp.read())


def _exists(base, col, key=None) -> bool:
    """Return True if the collection exists; only a 404 counts as 'missing'."""
    try:
        _req("GET", f"{base}/collections/{col}", key=key)
        return True
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise


def copy_collection(col: str) -> None:
    # Assumes a single unnamed vector config, i.e. {"size": ..., "distance": ...}.
    src_cfg = _req("GET", f"{SRC}/collections/{col}")["result"]["config"]["params"]["vectors"]
    size, dist = src_cfg["size"], src_cfg["distance"]
    if _exists(DST, col, KEY):
        print(f"  {col}: dst exists, upserting into it")
    else:
        _req("PUT", f"{DST}/collections/{col}", {"vectors": {"size": size, "distance": dist}}, KEY)
        print(f"  {col}: created on dst ({size}d {dist})")

    # Page through the source with the scroll API and upsert each batch as-is.
    offset, total = None, 0
    while True:
        body = {"limit": BATCH, "with_vector": True, "with_payload": True}
        if offset is not None:
            body["offset"] = offset
        res = _req("POST", f"{SRC}/collections/{col}/points/scroll", body)["result"]
        pts = res.get("points", [])
        if not pts:
            break
        upsert = [{"id": p["id"], "vector": p["vector"], "payload": p.get("payload", {})} for p in pts]
        _req("PUT", f"{DST}/collections/{col}/points?wait=true", {"points": upsert}, KEY)
        total += len(pts)
        offset = res.get("next_page_offset")
        if offset is None:
            break

    # Exact counts on both sides; dst may hold more points if it already existed.
    src_n = _req("POST", f"{SRC}/collections/{col}/points/count", {"exact": True})["result"]["count"]
    dst_n = _req("POST", f"{DST}/collections/{col}/points/count", {"exact": True}, KEY)["result"]["count"]
    flag = "OK" if dst_n >= src_n else "MISMATCH"
    print(f"  {col}: copied {total} | src={src_n} dst={dst_n} [{flag}]")


def main() -> None:
    print(f"Copy IACE collections {SRC} -> {DST}")
    for col in COLLECTIONS:
        copy_collection(col.strip())
    print("Done.")


if __name__ == "__main__":
    main()
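
The scroll loop in copy_collection can be exercised without a running Qdrant. The sketch below isolates the pagination logic behind a callable and pairs it with an in-memory stand-in for the scroll endpoint. Both `scroll_all` and `make_fake_scroll` are hypothetical helpers that are not part of the script. The stand-in assumes integer point ids and mimics the behaviour where `next_page_offset` is the id of the first point on the next page and is `None` on the last page.

```python
def scroll_all(fetch_page, batch=128):
    """Yield every point by following next_page_offset until it is None.

    `fetch_page(body)` stands in for POST /collections/{col}/points/scroll
    and must return a dict with "points" and "next_page_offset".
    """
    offset = None
    while True:
        body = {"limit": batch}
        if offset is not None:
            body["offset"] = offset
        res = fetch_page(body)
        pts = res.get("points", [])
        yield from pts
        offset = res.get("next_page_offset")
        if not pts or offset is None:
            return


def make_fake_scroll(points):
    """Build an in-memory fetch_page over `points`, ordered by integer id."""
    ordered = sorted(points, key=lambda p: p["id"])

    def fetch_page(body):
        start = body.get("offset")
        rest = [p for p in ordered if start is None or p["id"] >= start]
        page, tail = rest[: body["limit"]], rest[body["limit"]:]
        # The offset for the next call is the first id that was not returned.
        return {"points": page, "next_page_offset": tail[0]["id"] if tail else None}

    return fetch_page
```

With ten points and `batch=3`, `list(scroll_all(make_fake_scroll(points), batch=3))` walks four pages and returns all ten points in id order. Keeping the HTTP call behind a callable makes the stop conditions (empty page, missing offset) testable in isolation.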