Roadmap item 4. After WHAT applies / WHAT is missing / WHICH first, the GF asks HOW. The
Implementation Playbook renders, for one capability, the full journey — why / which regulations
it closes / tools / process / evidence / controls — and chains the Optimization Roadmap into
per-measure playbooks. Another renderer over the same Capability spine (ADR-003/004), not a new
engine: ~95% of the data already exists, it just needs a different rendering.
- compliance/playbook/: build_playbook() + playbooks_for_plan() (chains optimization -> playbook,
acyclic; reuses leverage for "closes which regulations"). Capabilities without curated content
render as honest status:missing stubs — the content-owed signal.
- knowledge/implementation_playbooks/: curated knowledge layer (Reasoning Knowledge Acquisition),
two deep expert drafts (SBOM, CVD/PSIRT, status draft, expert-draft-not-normative) + README.
The bottleneck is now CONTENT, not software; Playbook (own knowledge) != regulatory domain.
- ADR-004: Implementation Playbooks = renderer + knowledge layer; content is the bottleneck.
- reference suite: "Implementation Playbook" section renders the SBOM journey + Roadmap->Playbook
table (high-leverage caps flagged "fehlt (Inhalt)" — content backlog, highest leverage first).
- refactor: extracted markdown helpers to reference_scenarios/_helpers.py to keep generate.py
under the 500-LOC budget.
9 playbook tests (40 with optimization+transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + knowledge/ADR/reference = non-runtime -> no deploy (ADR-001).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Roadmap item 5. GAP analysis and measure-prioritisation are the SAME computation: Required −
Known = the Capability Delta. The Capability Delta Engine (RS-005) computes it once; renderers
read that ONE delta. Interview Renderer (missing info → questions) was already built; this adds
the Roadmap/Management Renderer (missing capabilities → measures ranked by regulatory leverage).
- compliance/optimization/: regulatory_leverage() + select_within_budget() (pure leverage math)
+ roadmap_from_delta(assessment, ...) — the keystone binding optimization to the RS-005 delta
(dependency optimization → transition_reasoning, acyclic; the delta engine stays hermetic).
leverage(measure) = number of regulatory requirements it closes at once (e.g. patch management
→ CRA+MaschinenVO+IEC62443+ISO27001 = 4). No new corpus, no new meta-model class (freeze v1.0).
- Welt-1 honesty: percentages are exact count ratios over the IDENTIFIED requirements (the known
delta), never "% gesetzeskonform".
- reference suite: "Regulatory Optimization" section runs the SAME convergence delta → ranked
measures + budget answer + the management sentence "of N identified requirements you close M
with the top-K measures (X%) — highest regulatory leverage".
- ADR-003: Capability Delta Engine — one delta, many renderers; rename Gap → Capability Delta.
13 optimization tests (31 with transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime → no deploy (ADR-001).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Roadmap item 2: the RTS now pin MaschinenVO + convergence Expected Outcomes, so the
convergence USP is a living regression, not just a one-off section.
- RTS-003 (machine + ISMS, networked): full multi-regulation archetype — maschinenvo
expected_delta + convergence expected_multi_target (links TP-ISO27001-CRA-MaschinenVO-v1).
Generator runs the convergence pattern through RS-005: 4/4 machine-safety delta MISSING +
4/4 expected multi-target caps converge. PASS.
- RTS-001 (component): MaschinenVO modeled as `uncertain` (a pure component is usually not a
machine; deciding question is_safety_component) — engine must never assert it applies. Honest,
parallel to the Data-Act handling.
- RTS-002 (machine, QMS-only): MaschinenVO `applies` (is_machine) but LOW convergence — no ISMS
means the cyber side is entirely delta, so few caps are shared. The honest contrast that the
convergence USP rewards companies who already run an ISMS.
- generator: per-RTS maschinenvo/convergence Soll-Ist checks; convergence pattern run once and
reused. Data Act stays `uncertain` everywhere, never asserted.
All 3 RTS PASS. 18 tests (transition+company), mypy --strict clean, check-loc 0.
Non-runtime (knowledge + reference harness) -> no deploy (ADR-001).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first multi-regulation pattern: each capability declares `covers_targets`, so we
can answer the convergence USP — "which capability satisfies CRA AND MaschinenVO at once?"
- knowledge: transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml (pattern_type:
regulatory_convergence, status draft). The cyber-safety bridge = MaschinenVO Annex III
1.1.9 "protection against corruption" overlapping CRA integrity. 4 convergence
capabilities cover BOTH; 5 CRA-only; 3 MaschinenVO-only.
- product: compliance/transition_reasoning/convergence.py — regulatory_convergence()
pure/deterministic/computed-not-stored, no new graph/class (freeze v1.0 untouched).
No app caller yet -> non-runtime, no deploy (ADR-001).
- reference suite: Cross-Regulation Capability Mapping section renders the customer
sentence "von N neuen Massnahmen erfuellen M gleichzeitig CRA und MaschinenVO".
- README: term -> Regulatory Transition / Convergence Pattern; covers_targets documented.
- tests: test_regulatory_convergence (18 transition+company pass), mypy --strict clean.
Curated expert knowledge, AI first draft (L1/draft) — Annex/Article refs indicative,
review_required by a machinery-safety expert.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three ANONYMIZED reference transition scenarios (no real company names stored) = canonical
regression scenarios that test the KNOWLEDGE, not just the engine. Each pins an Expected
Outcome (expected_likely_covered + expected_delta); every commit must reproduce it (identical
or better).
- RTS-001 automotive supplier (TISAX+ISO27001) -> CRA: mature ISMS, standard CRA delta.
- RTS-002 classic machine builder (ISO9001) -> CRA: only process discipline -> MUCH larger delta
(10 missing vs 3 covered). New TP-ISO9001-CRA-v1 pattern (different shape).
- RTS-003 networked machine builder (ISMS) -> CRA: highlights the Data Act.
Data Act is modelled as UNCERTAIN (a hypothesis), never a fixed gilt/gilt-nicht: the generator
checks the engine SURFACES the uncertainty + the deciding question (generates_usage_data) and
never wrongly ASSERTS applicability. All three RTS PASS.
Non-runtime knowledge + reference harness -> no deploy (ADR-001). Names deliberately absent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Transition KNOWLEDGE Patterns (renamed term -- curated knowledge, not an algorithm):
- 4 maturity levels: draft -> reviewed -> validated (domain expert) -> proven (field). "approved"
dropped; target is validated. TP-ISO27001-CRA set to reviewed (L2).
- 3 enrichments per pattern: confidence_source: relationship (curated, not an LLM estimate ->
computed-not-stored); why_asked (customer-facing: why the source does not suffice here); dropped_if
(what makes the question unnecessary). Applied to TP-ISO27001-CRA.
- New TP-ISMS-TISAX (draft): different character -- info-security module mostly covered; delta is
automotive-specific (prototype protection, TISAX labels, VDA ISA self-assessment, ENX assessment,
Art. 28 data protection). Proves the architecture is GENERIC, not CRA-tailored.
- Reference scenario 4 generalized to loop over ALL patterns through RS-005: both carried (CRA
17->17, TISAX 13->13) -> a living genericity + regression test for every future pattern.
Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISO9001->IATF16949.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(1) Harden the first Transition Pattern to the gold-standard template per quality checklist:
versioned transition_goal (ISO27001:2022 -> CRA, applies 2027-12-11), source_state_variants
(certified/isms_introduced/expired/limited_scope), each likely_covered assumption with a typed
relationship (supports|partially_supports, never equivalent) + verification + rationale (the Warum)
+ an auditor-checkable reviewable_claim, delta as missing-capability + needed-info, an explicit
rejected_assumptions section, and a determinism_goal. README schema updated to match.
(2) New Reference-Suite scenario 4 (Transition): the generator READS the pattern YAML and runs it
through the RS-005 Planning Engine + Company 2A -> coverage + question requests. Proves the
architecture fully carries the pattern (17 caps -> 17 coverage + 17 requests; 9 HIGH delta = the
real CRA gaps, 8 probably-covered from the ISMS). Now a living regression test: every future pattern
runs through the same engine.
Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISMS->TISAX once approved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reasoning session's new Knowledge Acquisition responsibility (re-charter): build and curate
the Transition Knowledge Base under backend-compliance/knowledge/transition_patterns/ (beside
reasoning/, not under it -- it is knowledge, not an engine).
First professional pattern TP-ISO27001-CRA-v1 (status: draft): separates what a mature ISMS
likely covers at the ORG level (probably_covered, needs product-level confirmation, never
auto-"erfuellt") from the CRA-specific delta with no ISO 27001 analogue (SBOM, support period +
secure signed updates, coordinated vulnerability disclosure, Art. 14 authority reporting,
product cyber risk assessment, CE conformity / technical documentation). Expert draft, not a
normative proof; review_required before customer use.
Non-runtime knowledge -> no deploy (ADR-001). Next: ISMS->TISAX.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Second reasoning mode, scope per user: the engine owns the INFORMATION GAPS, not the
questions. assess_transition(context, target_requirements, company_profile) emits
ranked TransitionQuestionRequest {capability, control, reason, question_intent,
expected_evidence, priority, information_gain} -- NOT rendered question text. Rendering
(intent+subject->sentence) is a separate swappable layer (RS-005.1), not here.
Consumes the Company Capability Profile (2A) as "have" + injected TargetRequirement
(Execution-owned placeholder) as "required" -- no required-capability data in product
code (EMPTY_REQUIREMENTS, mocks only in tests). A certification-derived capability is
probably_covered (Welt 1) -> a confirmation request, never already_covered/"erfuellt".
Deterministic, computed-not-stored, no percentages.
Activates 2A/2C/RCI (first consumer of the Company profile). Freeze-respecting: additive
package, no new graph/base class/meta-model class. 9 tests, mypy --strict clean, LOC ok.
No endpoint/UI/RAG; question rendering deliberately deferred to RS-005.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three real customer scenarios driven through the DEPLOYED engines (scope/map/
interpretation, RCI, company 2A, capability registry). Each scenario emits an
Architecture Coverage table DERIVED from the real run, so cells flip automatically
as domains land (e.g. Sz2/Environmental UNSUPPORTED -> PASS). The roll-up answers
"is BreakPilot better than six months ago" by real customer situations, not LOC.
Gaps captured as epics (NOT implemented): RS-001 Interpretation Pattern Library,
RS-002 Environmental Corpus, RS-003 Capability Linking (cap<->MCAP) + Company-Gap,
RS-004 MaschinenVO/EMV Registry Linking.
reference_scenarios/generate.py = reproducible source (ruff/mypy-exempt, NOT product
code, not imported by the app); reference_scenario_suite_v1.md = generated artifact.
No new product code; CRA patterns deliberately NOT built — the suite is now the measure.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Third instance of the identity-machine pattern (after Master Controls and Master
Obligations). New compliance/capability/ package: MasterCapability with stable MCAP
ids, CapabilityCandidate minting, seven typed relation types, a VERSIONED derivation
policy, and identity lifecycle (merge/split/deprecate/redirect with provenance).
Stored: identities, sources, relationship types, policy versions, lifecycle events,
provenance. Derived (never stored): confidence/status via evaluate_relation under a
policy version. Hard rule (structurally guarded): a certification alone can never
yield CONFIRMED — only CONFIRMS + concrete artifact (or expert) does.
Built from the Reasoning session per user directive but this IS the Compliance
Execution model (Execution owns Capability) — handed off via the board. Metadata-first:
CapabilityRelation is registry metadata, NOT a new meta-model class (freeze v1.0
untouched). No Company-Gap, no real ISO/cert mappings, no UI/RAG, no generic
canonicalization engine. 11 tests; mypy --strict clean; LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HEAD of the spine Company->Capability->Product->Regulation->Obligation->Procedure
->Evidence. New compliance/company/ package: CompanyContext container + a four-state
trust model (declared/inferred/confirmed/unknown).
Hard rule (structural): a certification yields at most an INFERRED candidate and is
never auto-treated as CONFIRMED/"erfuellt". A certification produces evidence-of-
capability; only real ExistingEvidence promotes a capability to CONFIRMED.
Ownership: Reasoning owns the container + trust-state; the Certification->Capability
mapping is Execution's domain, consumed via an injected contract. No mapping data in
product code (tests inject mocks). No endpoint/UI/RAG/new regs/controls; no meta-model
classes (freeze v1.0 untouched). 8 tests; mypy --strict clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RCI/Delta as a read-/reasoning layer ON TOP of the product-first pipeline. Answers
"what changes relative to my existing Regulatory Map?" — NOT "what does the new
law say in general". No UI, no ingestion (newsletter/mailbox), no RAG, no new
regulations/controls, no legal evaluation outside the stored map.
- 4 core objects (compliance/rci/schemas.py): ComplianceBaseline (snapshot of
profile + map + registry obligations + required/present evidence), RegulatoryChange
(simulated/provided INPUT), ObligationDelta (delta_type NEW|CHANGED|REMOVED|
ALREADY_COVERED|NEEDS_REVIEW|NOT_APPLICABLE), ChangeImpactSummary. delta_type is a
THIRD vocabulary, disjoint from ClaimCoverage (Welt 1) and ComplianceStatus (Welt 2).
- create_baseline() snapshots the existing pipeline once; assess_change() computes
deltas deterministically against the snapshot (no re-evaluation).
- 12 tests = the 5 acceptance questions (affects product? new/changed? already
covered by evidence? needs human review? not relevant?) + repeal/uncertain-reg/
missing-evidence/boundary. Existing pipeline tests stay green; mypy clean; LOC ok.
- App/reasoning types only — no compliance-meta-model classes (freeze v1.0 untouched).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thin adapter — it judges the customer's reading WITHIN the already-built
RegulatoryMap, it does not assess abstract legal questions and it is not RCI.
- Reuses the existing assess_interpretation (no new legal reasoning); the 6
verdicts (plausible/too_narrow/too_broad/partially_correct/unsupported/uncertain)
pass through unchanged.
- Restricts affected_regulations/affected_obligations to those present in the map
(intersection); links to the map's uncertain regulations.
- Touched unsupported domains (wastewater/chemicals/...) are reported as
future_corpus_domains (future_corpus_needed) — never pseudo-evaluated.
- Customer-readable explanation ("Ihre Interpretation ist wahrscheinlich zu eng. …
Betroffen in Ihrer Map: CRA.").
- POST /reasoning/interpretation-in-map (renders the map, then interprets).
- 7 tests; 63 green (existing reasoning MVP stays green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Map Renderer explains the engine's state, it does not extend it. Pure
composition of resolve_product_scope (scope verdict) + derive_obligations
(registry-linked obligations + overlaps) into one RegulatoryMap.
- product_summary, trigger_facts, applicable/uncertain/excluded regulations,
unsupported_domains, overlaps (shared_obligations), shared_evidence, and a
customer-readable executive_summary.
- No own legal decisions: applicable/uncertain mirror the scope verdict exactly.
- Obligations shown ONLY when registry-linkable (registry_anchor) — MaschinenVO/
EMV obligations are proposed, so they render empty + a note, never as linked.
Overlaps/shared_evidence likewise filtered to registry-linked members.
- Uncertain regulations link to the navigator question that would resolve them
(RED -> has_radio_module, DataAct -> generates_usage_data).
- Environmental appears only as unsupported_domain; executive_summary has NO
percentage (counts + "no further regulations identified" instead).
- POST /reasoning/regulatory-map (thin handler). Response types are presentation-
level, not meta-model classes (freeze v1.0 untouched).
- 9 tests; 56 green (existing reasoning MVP stays green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Connects the Navigator's fact-gate to the existing reasoning discover_scope —
the Scope Engine decides only once the minimum (P0) facts are released.
- resolve_product_scope(canonical): if not ready_for_scope -> NEEDS_FACTS
(missing_facts + suggested_questions, discover_scope NOT run); else project
canonical->reasoning profile and run the EXISTING discover_scope exactly once
-> RESOLVED with applicable/excluded/uncertain regulations.
- Environmental triggers surface ONLY as unsupported_domains (future_corpus_needed),
never as a legal evaluation — transparency, no false completeness.
- POST /reasoning/product-scope (thin handler) returns case NEEDS_FACTS or RESOLVED.
- No new scope rules, no new regulations, no environmental-law evaluation, no UI,
no Go, no RAG, no percent-compliance. Response types are application-level, not
meta-model classes (freeze v1.0 untouched).
- 6 tests incl. discover_scope spy (0 calls when gated, exactly 1 when ready),
category separation, environmental-as-unsupported-only. 47 tests green (existing
reasoning MVP tests stay green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 2 of the convergence sequence. The Navigator sits over the
CanonicalProductRegulatoryProfile (prefilled from company-profile / ProductWizard)
and reports ONLY which facts are still missing + prioritized questions to collect
them. It decides which facts are needed, NEVER what applies — that stays with the
Scope Engine (step 3). No regulation logic, no UI, no Go, no RAG.
- NavigatorQuestion (interaction type, NOT a compliance-meta-model class — freeze
v1.0 untouched): question_id, target_field, label, why_needed,
regulatory_domains_unblocked (static metadata), answer_type, options, priority.
- QUESTION_CATALOG: 12 questions over canonical gaps — P0 (markets, role,
lifecycle, machine/component), P1 (radio, usage-data, security-function,
environmental wastewater/air/chemicals triggers), P2 (structured BOM).
- engine: navigate() -> missing_facts + suggested_questions (priority-sorted) +
completeness_summary (ready_for_scope = no P0 missing); apply_answers() ->
updated profile. Pure field-presence; no scope import.
- 8 tests: <=10 questions for a filled company-profile, known facts not re-asked,
environmental = trigger questions only (no law evaluation), apply round-trip,
P0 ordering, ready_for_scope. 41 tests green, mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ONE canonical product profile so the Go gap engine and the Python reasoning
engine stop diverging ("SPS mit Remote Access" means the same everywhere).
gap.ProductProfile LEADS; the reasoning ProductProfile becomes an adapter/DTO.
Types + mappers only — no regulation logic, no Go changes, no UI, no new questions.
- CanonicalProductRegulatoryProfile mirrors gap.ProductProfile + the Navigator
gaps the audit found: economic-operator role, radio_module, generates_usage_data,
lifecycle_phase, structured BOM (ProductComponent), safety-vs-security split,
machine-vs-component + a forward-looking EnvironmentalImpact domain (wastewater/
air/chemicals triggers — fields only, no rules yet).
- Mappers: from_product_wizard (lossless), from_company_profile (prefill incl.
the machineBuilder block), to_gap_profile (emits the unchanged gap JSON shape),
to_reasoning_profile (projects into the reasoning ProductProfile; AI stays
delegated to ai-act/ucca). Only profile->reasoning is coupled; reasoning stays
hermetic.
- 10 tests = the 10 acceptance criteria incl. ProductWizard round-trip lossless,
markets no longer forced ['EU'], and canonical->reasoning->discover_scope
proving one semantic profile drives the engine. 33 tests green, mypy clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
"vollständig" still implied fulfillment. potentially_addresses now reads
"… adressiert N Pflichten direkt und M teilweise; K werden durch die Aussage
nicht berührt. … Dies ist keine Konformitätsaussage." Enum value kept
(potentially_addresses chosen over addresses_claimed for product clarity).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architecture-validation finding: the implementation mode produced compliance-
flavored output ("teilweise erfüllt", "covered") from a mere customer claim,
blurring the line to the Execution layer. This is a design decision, not a text
fix — the reasoning layer judges only the customer's STATEMENT, never conformity.
- CoverageStatus -> ClaimCoverage; values are claim-relative + carry "potential":
potentially_addresses / partially_addresses / does_not_address /
insufficient_information.
- ImplementationAssessment -> ClaimObligationMapping (coverage_status ->
claim_coverage); ImplementationResponse -> ImplementationReasoningResponse
(assessments -> mappings, + explicit `disclaimer`); request renamed; engine
entry assess_implementation -> reason_implementation_claim.
- Endpoint /reasoning/implementation-assessment -> /reasoning/implementation-reasoning.
- Summary/explanations reworded: "adressiert wahrscheinlich N Pflichten … für
eine Bewertung der tatsächlichen Umsetzung sind Nachweise erforderlich (keine
Konformitätsaussage)". No "erfüllt"/"abgedeckt" leaks.
- New guard test asserts no compliance verdict leaks (no "erfüllt"; disclaimer
separates ClaimCoverage from ComplianceStatus). 23 tests green, mypy clean.
Discovery (scope/obligations) was already structurally claim-free and unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deterministic reasoning layer ON TOP of the Legal Knowledge Graph (obligation
registry) and the Compliance Execution Graph (control mapping/evidence). Answers
which regulations apply to a concrete product, which obligations follow, whether
the customer's implementation covers them, and whether a customer interpretation
is too narrow/broad/plausible.
- ProductProfile with tri-state facts (Optional[bool]=None => uncertain, never
false security); safe predicate evaluator (no eval).
- 6 regulation triggers (CRA/MaschinenVO/RED/EMV/DataAct/NIS2) with missing-fact
prompts; 24 obligation scope rules.
- CRA obligation_ids RE-USED verbatim from the registry (93 ids) — never re-minted
(control_uuid trap); Machine/Data-Act flagged proposed=True.
- required_evidence constrained to the framework-agnostic shared evidence catalog;
capabilities echo the planned Obligation->Capability layer.
- Overlap groups (CRA<->MaschinenVO cyber-safety) + evidence-for-multiple (USP).
- 4 endpoints POST /reasoning/{scope,obligations,implementation-assessment,
interpretation-assessment}; thin handlers, registered in api/__init__.py.
- 22 tests (5 machine-builder scenarios + 10 acceptance questions). No DB
migration, no RAG, no new controls.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reichert die Obligation-Shadow-Telemetrie um zwei Felder an für die Cross-Firmen-
Auswertung: met_count (abgedeckte Obligations) + recall_limited_obligations (welche
Obligations recall-limitiert sind) — erlaubt die Konzentrations-Analyse über Firmen.
7-Firmen-Shadow: 136 Control-Findings → 29 Obligation-Findings (4,7×); recall_limited
nur 6/29, konzentriert auf third_country/safeguards in 2/7 Firmen → LLM-Fix bounded.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
gpt-oss-120b is a reasoning model: it spends output tokens on chain-of-thought
before the answer. deep_check called _call_ovh with max_tokens=400, which
length-capped it mid-reasoning -> content=null -> the OVH tier returned nothing
and the cascade always skipped Tier-2. Floor the OVH budget to >=2000, fall back
to reasoning_content when content is null, and raise the client timeout to 90s
for the slower reasoning path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generalise "Embedding finds, Claude decides" into the shared Pruefer-Library:
- router.route_and_check dispatches control -> sensor_classification -> Checker.
- build_spec reads sensor_classification (CONTENT/LLM -> judge=haiku, the
validated sufficiency tier; the Qwen-first cascade is disproven for sufficiency).
- LLMChecker gains a Haiku-direct tier (reuses the validated deep_check prompt).
- Cookie Layer-3 now routes through route_and_check instead of bespoke code, so
cookie is the first real router consumer -- proves the architecture end-to-end.
Reproduces the validated result via the shared path: FN 159->14, recall
0.13->0.92, precision 0.89 (vs bespoke 12/0.93/0.90 -- within Haiku noise).
Tests: 10/10 (router dispatch + build_spec + haiku tier + cookie rewire).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The embedding/boost auto-rescue is intentionally optimistic (finds the topic, not
fulfilment) -> 159 FN over-rescues vs Opus-GT (recall 0.13). Layer-3 re-judges
exactly the rescued passes with the validated Haiku judge (cohort
cookie_sufficiency_v1 P0.89/R0.91) -- NOT the Qwen-first cascade (local is
disproven as a sufficiency judge) -- and un-passes them when the obligation is
not concretely met. Gated to the full check (not skip_llm).
Measured (5-firm Opus-GT, engine+L3): FN 159->12, recall 0.13->0.93,
precision 0.96->0.90 (276 rescues corrected). "Embedding finds, Claude decides."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cookie agent loaded 100 controls, 11 of which have no COOKIE_POLICY in
applicable_artifacts -- Security/TOM/Audit (PROCESS) or Banner-behaviour
(BEHAVIOR) controls that produce nonsense findings against a cookie policy
(e.g. "TOMs not documented"). Add a cookie classification gate (analogous to the
DSE gate, keyed on COOKIE_POLICY, without the needs_review carve-out since the
artifact signal is decisive and the set is inventory-verified). Controls are
routed out, not deleted. Effect vs Opus-GT: FP 16->11, FN 179->159; the
remaining FN=159 over-rescue is a separate (judge/criteria) question, not routing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
compliance.doc_check_controls war auf prod historisch trippliziert
(Dump-Artefakt ohne PK/Unique: jede (doc_type, control_id)-Zeile 3x,
5622 statt 1874 ueber alle 8 doc_types). Die Migration dedupt idempotent
(kleinste ctid behalten) und setzt UNIQUE(doc_type, control_id), damit
sich die Triplikation nicht wiederholen kann.
Auf prod bereits direkt angewandt und in _migration_history registriert
(read-only verifiziert: 1874, alle doc_types total=distinct, Constraint
aktiv); dieser Commit codifiziert die Migration in der Deploy-Kette,
damit ein Restore aus einem aelteren Dump sie automatisch re-appliziert.
[migration-approved]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#1 _machinery_obligations: SET statement_timeout=4s + run_in_threadpool — auf
prod hing die maschinen-Query ~30s (langsame/unindizierte DB nach DB-Swap)
und blockierte den async-Worker. Jetzt: bei Langsamkeit graceful 'keine
Maschinen-Pflichten' statt Hang. (Fehlender prod-Index = Controls/DB-Session.)
#2 parse_grenzen_json: tolerant ggue. ```json-Fences / Prosa-umschlossenem JSON
(gehostete Modelle wie OVH ignorieren z.T. response_format) → Datenblatt-
Extraktion liefert auch ueber den OVH-Fallback Felder.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1) createProject las proj.id, der Create-Response ist aber {project:{id}} →
'Projekt anlegen' war kaputt. Jetzt proj.project?.id. E2E verifiziert
(create→put limits_form→get→delete = 200).
2) MaschinenVO-Sicherheitspflichten wurden in die CRA-Cyber-Buckets
(Code/Prozess/Doku) gemischt → fehl-kategorisiert (Maschinen-Safety ≠
CRA-Annex-I-Cyber). Jetzt eigene Response-Liste machinery_guideline +
eigener Frontend-Abschnitt 'Maschinensicherheit (MaschinenVO 2023/1230)';
geklebtes 'MaschVO'-Badge entfaellt damit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
qwen3.5:35b-a3b ist ein Thinking-Modell → generierte erst Reasoning, riss das
90s-Timeout → leere Extraktion. llm_cascade additiv um think-Param erweitert
(Cache-Key kennt think); Datenblatt-Extraktor setzt think=False → sauberes JSON
in ~1s. Default fuer alle anderen Cascade-Nutzer unveraendert.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
llm_cascade additiv modell-faehig (optionaler model-Param, Cache-Key kennt
model_hint → keine Kollision; Default unveraendert für alle anderen Nutzer).
Datenblatt-Extraktor nutzt jetzt qwen3.5:35b-a3b (CRA_DATASHEET_MODEL, gleiches
Modell wie der Compliance Advisor) für bessere semantische Zuordnung. Plus
llm_status (ok|empty|unavailable) + Logging statt stillem except; Frontend zeigt
bei 'unavailable' einen Hinweis statt leerer Felder (wichtig auf prod ohne
lokales Ollama → Cascade-Fallback bzw. Hinweis).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hybrid-Extraktion Datenblatt → IACE Grenzen (ISO 12100): deterministischer
Detektor (Schnittstellen/Einheiten per Regex) + lokales 35B via llm_cascade
(Qwen-lokal-first) fuer die semantische Zuordnung auf die echten LimitsFormData-
Keys. Nichts erfinden: Feld nicht im Text → leer + Quellen-Zitat je Feld.
Essenzielle ISO-12100-Felder, die leer bleiben → gezielte Rückfragen
(foreseeable_misuses, person_groups, qualification, temporal_limits …).
Endpoint POST /api/v1/cra/extract-datasheet. 13 Tests gruen (reine Teile).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3-Tier-MaschinenVO-Verdict (direkt / sicherheitsrelevant / nicht relevant) aus
Personengefährdungs-Signal: eine Komponente ist keine Maschine, aber wenn ihre
Funktion bei Fehler ODER Manipulation Personen gefaehrden kann (Bewegung, Laser/
Auge, Kraft, Temperatur, elektrisch), ist sie sicherheitsrelevant — Pflicht
trifft den Maschinenbauer, Zulieferer liefert Nachweise, und ein Cyber-Angriff
kann die Sicherheitsfunktion aushebeln (Cyber-Safety-Bruecke). OWIS-mit-Laser
landet so korrekt als 'sicherheitsrelevante Komponente'. Engine + /readiness
additiv; Frontend: Gefährdungs-Frage + -Typen, MaschinenVO-Ergebnisblock.
Presets aktualisiert (OWIS: Laser+Bewegung, Zwick: Bewegung). 22 Tests gruen.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reine, deterministische Verdict-Schicht ueber der bestehenden Annex-III/IV-
Klassifikation (kein vierter Klassifizierer): trennt Rechtspflicht von Markt-
Druck. Kern: das Inverkehrbringen (ab 11.12.2027), nicht der Entwicklungs-
zeitpunkt, entscheidet — Bestandsprodukte, die nach der Frist weiter verkauft
werden, fallen unter CRA. Producer-Typen (component/end_device/machine_
integrator/software_app) steuern Default-Annahmen (Anlagenbauer: Vernetzung/OTA
vorausgesetzt) + Verdict-Betonung (Komponente => Markt-Druck). Plus Evidence-
Checkliste (SBOM/VDP/Patch/Lifecycle/Threat-Model/Logging/Auth/Incident) +
Reifegrad. /readiness additiv erweitert (verdict/maturity/digital_elements/
producer_type). 15 Tests gruen. Beispiele: OWIS PS90+, ZwickRoell roboTest.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
prod-canonical_controls (aus dem DB-Swap) hat weder PK noch Unique auf id →
FK InvalidForeignKey. control_uuid bleibt UUID (logische Referenz), wie die
bereits FK-lose atom_classification auf prod. [migration-approved]
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Liest den Lebenszyklus jedes Befunds (status + tracker_issue_url) aus dem
Scanner zurück und rollt ihn zu einem Management-Bild auf: % erledigt,
4-Phasen (offen/in Arbeit/erledigt/ausgeschlossen), offenes Restrisiko nach
Schweregrad, Fortschritt je CRA-Anforderung und eine Aufgaben-/Ticket-Tabelle
mit Jira-Link. Neuer Endpoint GET/POST /api/v1/cra/progress (dünn → Service
cra_progress, rein deterministisch, kein /assess-Schema-Drift). Frontend:
ProgressView in Ebene 1 (CRACyberView), live je Scanner-Repo, sonst Demo-Status.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jede Normreferenz einer Maßnahme wird lizenzklassifiziert (eu_law /
public_domain / open / paid_reference) — paid-reference-Normen werden nur als
Verweis geführt, nie im Text gespeichert (idea/expression). Kuratierte
Maßnahmen tragen Tier 'core', KI-/Fallback-Maßnahmen 'review' (indikativ).
Frontend zeigt Quellen-Badges + "indikativ"-Kennzeichnung. Methodik in
docs-src/development/mapping-methodology.md (Szenario C, Due-Diligence).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>