feat(cra): datasheet → Grenzen (limits) extractor (hybrid, local 35B)

Hybrid extraction from datasheet to IACE Grenzen (limits, ISO 12100): a
deterministic detector (interfaces/units via regex) plus a local 35B model via
llm_cascade (Qwen-local-first) for the semantic mapping onto the actual
LimitsFormData keys. Nothing is invented: a field not found in the text stays
empty, and every filled field carries a source citation.
Essential ISO 12100 fields that remain empty trigger targeted follow-up
questions (foreseeable_misuses, person_groups, qualification, temporal_limits …).
Endpoint: POST /api/v1/cra/extract-datasheet. 13 tests green (pure parts only).
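
A minimal sketch of the two deterministic pieces described above, assuming a
regex detector for interfaces and quantities and a check for essential fields
that are still empty. The patterns and the names detect and follow_up_fields
are hypothetical and not taken from cra_datasheet_extractor; the LLM mapping
step is left out, and the field list covers only the keys named in this message.

```python
import re

# Hypothetical patterns; the real detector's regexes are not shown in this commit.
INTERFACE_RE = re.compile(r"\b(Ethernet|RS-?485|RS-?232|USB|Profinet|Modbus)\b")
QUANTITY_RE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(V|A|kW|W|Hz|bar|°C|mm|kg)(?![A-Za-z])")

# Only the essential keys named in the commit message; the full list is elided there.
ESSENTIAL_FIELDS = ("foreseeable_misuses", "person_groups", "qualification", "temporal_limits")


def detect(text: str) -> dict:
    """Deterministic pass: collect interface names and (value, unit) pairs."""
    interfaces = sorted({m.group(1) for m in INTERFACE_RE.finditer(text)})
    quantities = [(m.group(1), m.group(2)) for m in QUANTITY_RE.finditer(text)]
    return {"interfaces": interfaces, "quantities": quantities}


def follow_up_fields(limits: dict) -> list:
    """Essential fields that are missing or empty and so need a follow-up question."""
    return [name for name in ESSENTIAL_FIELDS if not limits.get(name)]


sample = "Supply 24 V DC, 2.5 A. Interfaces: Ethernet, RS-485."
detected = detect(sample)
missing = follow_up_fields({"person_groups": "trained operators", "qualification": ""})
```

Keeping these parts free of model calls is what would let them be tested as
pure functions, independent of the local model.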

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Benjamin Admin
2026-06-16 19:06:07 +02:00
parent 62fafaaec5
commit cfdc5fe277
3 changed files with 268 additions and 0 deletions
@@ -18,6 +18,7 @@ from compliance.services.cra_finding_mapper import assess_findings_payload
from compliance.services.cra_applicability import (
    compute_verdict, compute_machinery_verdict, maturity as evidence_maturity, MACHINE_INTEGRATOR,
)
from compliance.services.cra_datasheet_extractor import extract_grenzen
from compliance.services.scanner_mcp_client import fetch_findings
from compliance.services.cra_snapshot_store import save_snapshot, list_snapshots, get_snapshot
from compliance.services.cra_use_case_controls import enrich_findings_with_breadth
@@ -137,6 +138,18 @@ async def assess_from_scanner(body: ScannerPullRequest):
    return result


class DatasheetRequest(BaseModel):
    text: str = ""


@router.post("/extract-datasheet")
async def extract_datasheet(body: DatasheetRequest):
    """Datasheet text -> IACE 'Grenzen' draft (limits + provenance) + the
    essential ISO-12100 fields still missing as targeted follow-up questions.
    Hybrid: deterministic interface/unit detector + local 35B (llm_cascade)."""
    return await extract_grenzen(body.text)


@router.get("/scanner-repos")
async def scanner_repos():
    """Distinct repo_ids the scanner has findings for, so the UI can pick which