43e02f794a
Sharang's compliance-scanner-agent exposes SBOM (sbom_vuln_report) and DAST (list_dast_findings) as separate MCP tools (not via list_findings). The new fetch_all_findings(repo_id) pulls list_findings + SBOM + DAST in ONE MCP session and normalizes them into the Finding schema:
- SBOM: one finding per vulnerable package (not per CVE), cwe=CWE-1395 -> deterministically CRA-AI-22 (robust against package names such as "sqlite").
- DAST: cwe/endpoint/vuln_type are carried over -> mapping via cwe/keywords.

assess-from-scanner uses fetch_all_findings and returns source.breakdown (code/sbom/dast). DAST has no repo_id filter in the MCP -> dast_repo_scoped:false (deployment-wide, flagged transparently).

Real MCP data: Kitchenasty 58 code + 35 sbom + 81 dast -> 174 mapped (coverage 94.3%, all 35 SBOM -> CRA-AI-22).

Also includes the Qdrant->prod copy script (#42, verbatim macmini->prod).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
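The SBOM rule in the commit message (one finding per vulnerable package rather than per CVE, always tagged CWE-1395 and mapped to CRA-AI-22) can be illustrated roughly as below. This is only a sketch: the function name `normalize_sbom`, the input row fields (`package`, `version`, `cve`), the output field names, and the placeholder CVE ids are assumptions for illustration, not the actual Finding schema or the real output of sbom_vuln_report.

```python
from collections import defaultdict


def normalize_sbom(vuln_rows):
    """Collapse per-CVE SBOM rows into one finding per vulnerable package.

    Field names here are hypothetical; only the CWE-1395 -> CRA-AI-22 rule
    comes from the commit message.
    """
    cves_by_pkg = defaultdict(list)
    for row in vuln_rows:
        cves_by_pkg[(row["package"], row.get("version", ""))].append(row["cve"])

    findings = []
    for (pkg, version), cves in sorted(cves_by_pkg.items()):
        findings.append({
            "source": "sbom",
            "package": pkg,
            "version": version,
            # Fixed CWE, so the control mapping never depends on the package
            # name (a name like "sqlite" cannot trigger a keyword match).
            "cwe": "CWE-1395",
            "control": "CRA-AI-22",
            "cves": sorted(cves),
        })
    return findings


rows = [
    {"package": "sqlite", "version": "3.39.0", "cve": "CVE-0000-0002"},
    {"package": "sqlite", "version": "3.39.0", "cve": "CVE-0000-0001"},
    {"package": "openssl", "version": "3.0.1", "cve": "CVE-0000-0003"},
]
findings = normalize_sbom(rows)
```

Setting the CWE on the finding itself, instead of deriving the control from free text, is what makes the mapping deterministic in this sketch.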
90 lines
3.3 KiB
Python
#!/usr/bin/env python3
"""Verbatim copy of the IACE Qdrant knowledge-base collections to another Qdrant.

There is no RAG/embedding service on prod, so the normal ingest_iace_kb.sh has no
target there. Instead we copy the already-embedded points (id + vector + payload)
1:1 from the source Qdrant (macmini) to the destination (prod). No re-embedding,
no re-chunking → the copied points are identical and /sdk/v1/rag/search reads them
the same way. Idempotent: same point ids → upsert overwrites, no duplicates.

Usage (run on macmini; reads local Qdrant, writes prod Qdrant):
    SRC_QDRANT=http://localhost:6333 \
    DST_QDRANT=https://qdrant-dev.breakpilot.ai \
    DST_QDRANT_KEY=<prod-api-key> \
    python3 copy_iace_collections_to_prod.py
"""
import json
import os
import urllib.error
import urllib.request

SRC = os.environ.get("SRC_QDRANT", "http://localhost:6333").rstrip("/")
DST = os.environ["DST_QDRANT"].rstrip("/")
KEY = os.environ["DST_QDRANT_KEY"]
COLLECTIONS = os.environ.get(
    "COLLECTIONS", "bp_iace_accident_stats,bp_iace_safety_kb,bp_iace_failure_kb"
).split(",")
BATCH = 128  # points per scroll/upsert round trip


def _req(method, url, body=None, key=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    r = urllib.request.Request(url, data=data, method=method)
    r.add_header("Content-Type", "application/json")
    if key:
        r.add_header("api-key", key)
    with urllib.request.urlopen(r, timeout=120) as resp:
        return json.loads(resp.read())


def _exists(base, col, key=None) -> bool:
    """Return True if the collection exists; only a 404 counts as missing."""
    try:
        _req("GET", f"{base}/collections/{col}", key=key)
        return True
    except urllib.error.HTTPError as e:
        if e.code == 404:
            return False
        raise


def copy_collection(col: str) -> None:
    # Assumes a single unnamed vector config (plain "size"/"distance" keys).
    src_cfg = _req("GET", f"{SRC}/collections/{col}")["result"]["config"]["params"]["vectors"]
    size, dist = src_cfg["size"], src_cfg["distance"]
    if _exists(DST, col, KEY):
        print(f"  {col}: dst exists, upserting into it")
    else:
        _req("PUT", f"{DST}/collections/{col}", {"vectors": {"size": size, "distance": dist}}, KEY)
        print(f"  {col}: created on dst ({size}d {dist})")

    # Page through the source with scroll and upsert each page into the destination.
    offset, total = None, 0
    while True:
        body = {"limit": BATCH, "with_vector": True, "with_payload": True}
        if offset is not None:
            body["offset"] = offset
        res = _req("POST", f"{SRC}/collections/{col}/points/scroll", body)["result"]
        pts = res.get("points", [])
        if not pts:
            break
        # "or {}" also covers points whose payload is returned as null.
        upsert = [{"id": p["id"], "vector": p["vector"], "payload": p.get("payload") or {}} for p in pts]
        _req("PUT", f"{DST}/collections/{col}/points?wait=true", {"points": upsert}, KEY)
        total += len(pts)
        offset = res.get("next_page_offset")
        if offset is None:
            break

    src_n = _req("POST", f"{SRC}/collections/{col}/points/count", {"exact": True})["result"]["count"]
    dst_n = _req("POST", f"{DST}/collections/{col}/points/count", {"exact": True}, KEY)["result"]["count"]
    # ">=" because a pre-existing destination may already hold additional points.
    flag = "OK" if dst_n >= src_n else "MISMATCH"
    print(f"  {col}: copied {total} | src={src_n} dst={dst_n} [{flag}]")


def main() -> None:
    print(f"Copy IACE collections {SRC} -> {DST}")
    for col in COLLECTIONS:
        if col.strip():  # skip empty names, e.g. from a trailing comma
            copy_collection(col.strip())
    print("Done.")


if __name__ == "__main__":
    main()
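The final check in copy_collection only compares point counts and accepts dst >= src, so a destination that already held other points would still report OK. A stricter comparison could look at the point ids themselves. The helper below is a sketch of that idea, not part of the script: `diff_point_ids` is a hypothetical name, and it assumes the id lists were collected beforehand, for example by scrolling each side with `with_vector` and `with_payload` set to false.

```python
def diff_point_ids(src_ids, dst_ids):
    """Return (missing_on_dst, extra_on_dst) for two iterables of point ids.

    Hypothetical helper. Qdrant ids can be unsigned integers or UUID strings,
    so the results are sorted by their string form to allow mixed types.
    """
    src, dst = set(src_ids), set(dst_ids)
    missing = sorted(src - dst, key=str)  # copied from source but absent on dst
    extra = sorted(dst - src, key=str)    # stale or unrelated points on dst
    return missing, extra


missing, extra = diff_point_ids([1, 2, 3], [2, 3, 4, 5])
print(f"missing on dst: {missing} | extra on dst: {extra}")
```

An empty `missing` list means every source point arrived; a non-empty `extra` list means the destination is a superset of the source rather than an exact copy.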