The architecture is stable; from here the value comes from DOMAINS, not more software. Phase B is
organized as law-first Domain Knowledge Programs, each delivering the same production line: Corpus ->
Obligations -> Capabilities -> Transition Patterns -> Playbooks -> Reference Scenarios -> Completeness.
No new runtime framework (Freeze v1.0).
- knowledge/programs/README.md: reusable Domain Program blueprint (production line, per-stage ownership,
law-first ordering, planned programs Environmental/Automotive/IEC62443/Functional-Safety).
- knowledge/programs/environmental.yaml: the Environmental domain as DATA. Law-first: B1 Environmental
Regulatory Corpus (water/chemicals/emissions/energy/waste/product-responsibility — law + obligations
only) -> B2 Capability Model -> B3 Transition Patterns (ISO 14001 -> corpus, built LAST). ISO 14001
is a source state, NOT the domain.
- Ownership handoffs: B1 -> Legal Knowledge, B2 -> Compliance Execution, B3+/playbooks/reference ->
Reasoning. Coordinate via the board; no session builds another's artifacts.
- reference suite: "Domain Knowledge Programs" section renders the program stages + a measurable
Completeness baseline (6 areas, 0 assessed today) that flips automatically as stages land.
- ADR-008: from architecture to domains; Phase B as law-first programs; architecture frozen.
6 program-contract tests (law-first order + ownership pinned), check-loc 0. Knowledge data + ADR +
reference harness = non-runtime -> no deploy (ADR-001). No new module, no runtime change.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase A½. The move from feature to product development: for every assessment, answer "how sure are
we that this answer is COMPLETE?" — different from confidence. The product never claims full coverage;
it makes its own knowledge state transparent and auditable. Shows what we do NOT know and why.
- compliance/completeness/: assess_completeness(identified, corpus_status, uncertain, assumptions,
assessed_obligations) -> CompletenessReport. Separates IDENTIFIED from ASSESSED (validated corpus
AND determined applicability) and justifies every gap. Two kinds of open: corpus gap (future_corpus)
and applicability uncertainty (query_required + deciding question, e.g. Data Act / generates_usage_data).
- The metric is COUNTS, never a single percentage: "Identifiziert N · bewertet M · offen K ·
Unsicherheiten U · Begründung ja" + an honest audit statement.
- ADR-007: auditable honesty; phase order A factory -> A½ Completeness -> B new domains; the
transparency selling point. Deterministic, no LLM; corpus status + obligation count injected.
- reference suite: "Regulatory Completeness" section runs an industrial-dishwasher assessment
(assessed CRA/MaschinenVO; open EMV/Environmental=future_corpus, Data Act=query_required) and notes
Environmental flips open->validated automatically once the corpus lands.
11 completeness tests (54 with adjacent modules), mypy --strict clean (15 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents
arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake
classifies a new document (no content extraction) and intersects its signals with an index of the
existing knowledge to emit a Knowledge Package (an impact analysis).
- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios,
obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic,
NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns /
reference scenarios / (injected) obligations, whether it is a new domain, and a triage level
(HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package ->
Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high,
14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section
lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act).
10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The bottleneck is not content, it is knowledge PRODUCTION. Instead of writing 200 playbooks by
hand, generate drafts deterministically from data the software already owns, then have an expert
review them. Mirrors the legal pipeline (Gesetz -> Parser -> Obligation -> Review) for BreakPilot's
own knowledge: new Capability -> Registry -> Transition Pattern -> Playbook Draft Generator ->
Expert Review -> versioned Playbook.
- compliance/knowledge_production/: generate_playbook_draft(capability, requirement, control_links)
+ drafts_from_pattern(pattern) -> one PlaybookDraft per delta capability. Owned fields (why /
closes_regulations / expected_evidence / typical_controls) are assembled with per-field provenance;
the practitioner know-how (tools / process_steps / how_others) is left as an explicit TODO.
- DraftStatus lifecycle (Freigabestatus): draft_generated -> in_review -> reviewed -> validated ->
proven. Deterministic, NO LLM in the core (any model enrichment stays offline/advisory/propose-only).
- ADR-005: extends "the engine does not change, the corpus grows" with "and the corpus is not written
by hand — it is deterministically prepared, then curated".
- reference suite: "Knowledge Production" section turns the convergence pattern into 12 auto-assembled
drafts (why/closes/evidence filled, tools/steps TODO) -> review 12 drafts, don't write 12 playbooks.
10 tests (50 with playbook/optimization/transition/company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Roadmap item 4. After WHAT applies / WHAT is missing / WHICH first, the GF asks HOW. The
Implementation Playbook renders, for one capability, the full journey — why / which regulations
it closes / tools / process / evidence / controls — and chains the Optimization Roadmap into
per-measure playbooks. Another renderer over the same Capability spine (ADR-003/004), not a new
engine: ~95% of the data already exists, it just needs a different rendering.
- compliance/playbook/: build_playbook() + playbooks_for_plan() (chains optimization -> playbook,
acyclic; reuses leverage for "closes which regulations"). Capabilities without curated content
render as honest status:missing stubs — the content-owed signal.
- knowledge/implementation_playbooks/: curated knowledge layer (Reasoning Knowledge Acquisition),
two deep expert drafts (SBOM, CVD/PSIRT, status draft, expert-draft-not-normative) + README.
The bottleneck is now CONTENT, not software; Playbook (own knowledge) != regulatory domain.
- ADR-004: Implementation Playbooks = renderer + knowledge layer; content is the bottleneck.
- reference suite: "Implementation Playbook" section renders the SBOM journey + Roadmap->Playbook
table (high-leverage caps flagged "fehlt (Inhalt)" — content backlog, highest leverage first).
- refactor: extracted markdown helpers to reference_scenarios/_helpers.py to keep generate.py
under the 500-LOC budget.
9 playbook tests (40 with optimization+transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + knowledge/ADR/reference = non-runtime -> no deploy (ADR-001).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Roadmap item 5. GAP analysis and measure-prioritisation are the SAME computation: Required −
Known = the Capability Delta. The Capability Delta Engine (RS-005) computes it once; renderers
read that ONE delta. Interview Renderer (missing info → questions) was already built; this adds
the Roadmap/Management Renderer (missing capabilities → measures ranked by regulatory leverage).
- compliance/optimization/: regulatory_leverage() + select_within_budget() (pure leverage math)
+ roadmap_from_delta(assessment, ...) — the keystone binding optimization to the RS-005 delta
(dependency optimization → transition_reasoning, acyclic; the delta engine stays hermetic).
leverage(measure) = number of regulatory requirements it closes at once (e.g. patch management
→ CRA+MaschinenVO+IEC62443+ISO27001 = 4). No new corpus, no new meta-model class (freeze v1.0).
- Welt-1 honesty: percentages are exact count ratios over the IDENTIFIED requirements (the known
delta), never "% gesetzeskonform".
- reference suite: "Regulatory Optimization" section runs the SAME convergence delta → ranked
measures + budget answer + the management sentence "of N identified requirements you close M
with the top-K measures (X%) — highest regulatory leverage".
- ADR-003: Capability Delta Engine — one delta, many renderers; rename Gap → Capability Delta.
13 optimization tests (31 with transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime → no deploy (ADR-001).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The first multi-regulation pattern: each capability declares `covers_targets`, so we
can answer the convergence USP — "which capability satisfies CRA AND MaschinenVO at once?"
- knowledge: transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml (pattern_type:
regulatory_convergence, status draft). The cyber-safety bridge = MaschinenVO Annex III
1.1.9 "protection against corruption" overlapping CRA integrity. 4 convergence
capabilities cover BOTH; 5 CRA-only; 3 MaschinenVO-only.
- product: compliance/transition_reasoning/convergence.py — regulatory_convergence()
pure/deterministic/computed-not-stored, no new graph/class (freeze v1.0 untouched).
No app caller yet -> non-runtime, no deploy (ADR-001).
- reference suite: Cross-Regulation Capability Mapping section renders the customer
sentence "von N neuen Massnahmen erfuellen M gleichzeitig CRA und MaschinenVO".
- README: term -> Regulatory Transition / Convergence Pattern; covers_targets documented.
- tests: test_regulatory_convergence (18 transition+company pass), mypy --strict clean.
Curated expert knowledge, AI first draft (L1/draft) — Annex/Article refs indicative,
review_required by a machinery-safety expert.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Second reasoning mode, scope per user: the engine owns the INFORMATION GAPS, not the
questions. assess_transition(context, target_requirements, company_profile) emits
ranked TransitionQuestionRequest {capability, control, reason, question_intent,
expected_evidence, priority, information_gain} -- NOT rendered question text. Rendering
(intent+subject->sentence) is a separate swappable layer (RS-005.1), not here.
Consumes the Company Capability Profile (2A) as "have" + injected TargetRequirement
(Execution-owned placeholder) as "required" -- no required-capability data in product
code (EMPTY_REQUIREMENTS, mocks only in tests). A certification-derived capability is
probably_covered (Welt 1) -> a confirmation request, never already_covered/"erfuellt".
Deterministic, computed-not-stored, no percentages.
Activates 2A/2C/RCI (first consumer of the Company profile). Freeze-respecting: additive
package, no new graph/base class/meta-model class. 9 tests, mypy --strict clean, LOC ok.
No endpoint/UI/RAG; question rendering deliberately deferred to RS-005.1.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Third instance of the identity-machine pattern (after Master Controls and Master
Obligations). New compliance/capability/ package: MasterCapability with stable MCAP
ids, CapabilityCandidate minting, seven typed relation types, a VERSIONED derivation
policy, and identity lifecycle (merge/split/deprecate/redirect with provenance).
Stored: identities, sources, relationship types, policy versions, lifecycle events,
provenance. Derived (never stored): confidence/status via evaluate_relation under a
policy version. Hard rule (structurally guarded): a certification alone can never
yield CONFIRMED — only CONFIRMS + concrete artifact (or expert) does.
Built from the Reasoning session per user directive but this IS the Compliance
Execution model (Execution owns Capability) — handed off via the board. Metadata-first:
CapabilityRelation is registry metadata, NOT a new meta-model class (freeze v1.0
untouched). No Company-Gap, no real ISO/cert mappings, no UI/RAG, no generic
canonicalization engine. 11 tests; mypy --strict clean; LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HEAD of the spine Company->Capability->Product->Regulation->Obligation->Procedure
->Evidence. New compliance/company/ package: CompanyContext container + a four-state
trust model (declared/inferred/confirmed/unknown).
Hard rule (structural): a certification yields at most an INFERRED candidate and is
never auto-treated as CONFIRMED/"erfuellt". A certification produces evidence-of-
capability; only real ExistingEvidence promotes a capability to CONFIRMED.
Ownership: Reasoning owns the container + trust-state; the Certification->Capability
mapping is Execution's domain, consumed via an injected contract. No mapping data in
product code (tests inject mocks). No endpoint/UI/RAG/new regs/controls; no meta-model
classes (freeze v1.0 untouched). 8 tests; mypy --strict clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RCI/Delta as a read-/reasoning layer ON TOP of the product-first pipeline. Answers
"what changes relative to my existing Regulatory Map?" — NOT "what does the new
law say in general". No UI, no ingestion (newsletter/mailbox), no RAG, no new
regulations/controls, no legal evaluation outside the stored map.
- 4 core objects (compliance/rci/schemas.py): ComplianceBaseline (snapshot of
profile + map + registry obligations + required/present evidence), RegulatoryChange
(simulated/provided INPUT), ObligationDelta (delta_type NEW|CHANGED|REMOVED|
ALREADY_COVERED|NEEDS_REVIEW|NOT_APPLICABLE), ChangeImpactSummary. delta_type is a
THIRD vocabulary, disjoint from ClaimCoverage (Welt 1) and ComplianceStatus (Welt 2).
- create_baseline() snapshots the existing pipeline once; assess_change() computes
deltas deterministically against the snapshot (no re-evaluation).
- 12 tests = the 5 acceptance questions (affects product? new/changed? already
covered by evidence? needs human review? not relevant?) + repeal/uncertain-reg/
missing-evidence/boundary. Existing pipeline tests stay green; mypy clean; LOC ok.
- App/reasoning types only — no compliance-meta-model classes (freeze v1.0 untouched).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Thin adapter — it judges the customer's reading WITHIN the already-built
RegulatoryMap, it does not assess abstract legal questions and it is not RCI.
- Reuses the existing assess_interpretation (no new legal reasoning); the 6
verdicts (plausible/too_narrow/too_broad/partially_correct/unsupported/uncertain)
pass through unchanged.
- Restricts affected_regulations/affected_obligations to those present in the map
(intersection); links to the map's uncertain regulations.
- Touched unsupported domains (wastewater/chemicals/...) are reported as
future_corpus_domains (future_corpus_needed) — never pseudo-evaluated.
- Customer-readable explanation ("Ihre Interpretation ist wahrscheinlich zu eng. …
Betroffen in Ihrer Map: CRA.").
- POST /reasoning/interpretation-in-map (renders the map, then interprets).
- 7 tests; 63 green (existing reasoning MVP stays green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Map Renderer explains the engine's state, it does not extend it. Pure
composition of resolve_product_scope (scope verdict) + derive_obligations
(registry-linked obligations + overlaps) into one RegulatoryMap.
- product_summary, trigger_facts, applicable/uncertain/excluded regulations,
unsupported_domains, overlaps (shared_obligations), shared_evidence, and a
customer-readable executive_summary.
- No own legal decisions: applicable/uncertain mirror the scope verdict exactly.
- Obligations shown ONLY when registry-linkable (registry_anchor) — MaschinenVO/
EMV obligations are proposed, so they render empty + a note, never as linked.
Overlaps/shared_evidence likewise filtered to registry-linked members.
- Uncertain regulations link to the navigator question that would resolve them
(RED -> has_radio_module, DataAct -> generates_usage_data).
- Environmental appears only as unsupported_domain; executive_summary has NO
percentage (counts + "no further regulations identified" instead).
- POST /reasoning/regulatory-map (thin handler). Response types are presentation-
level, not meta-model classes (freeze v1.0 untouched).
- 9 tests; 56 green (existing reasoning MVP stays green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Connects the Navigator's fact-gate to the existing reasoning discover_scope —
the Scope Engine decides only once the minimum (P0) facts are released.
- resolve_product_scope(canonical): if not ready_for_scope -> NEEDS_FACTS
(missing_facts + suggested_questions, discover_scope NOT run); else project
canonical->reasoning profile and run the EXISTING discover_scope exactly once
-> RESOLVED with applicable/excluded/uncertain regulations.
- Environmental triggers surface ONLY as unsupported_domains (future_corpus_needed),
never as a legal evaluation — transparency, no false completeness.
- POST /reasoning/product-scope (thin handler) returns case NEEDS_FACTS or RESOLVED.
- No new scope rules, no new regulations, no environmental-law evaluation, no UI,
no Go, no RAG, no percent-compliance. Response types are application-level, not
meta-model classes (freeze v1.0 untouched).
- 6 tests incl. discover_scope spy (0 calls when gated, exactly 1 when ready),
category separation, environmental-as-unsupported-only. 47 tests green (existing
reasoning MVP tests stay green), mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Step 2 of the convergence sequence. The Navigator sits over the
CanonicalProductRegulatoryProfile (prefilled from company-profile / ProductWizard)
and reports ONLY which facts are still missing + prioritized questions to collect
them. It decides which facts are needed, NEVER what applies — that stays with the
Scope Engine (step 3). No regulation logic, no UI, no Go, no RAG.
- NavigatorQuestion (interaction type, NOT a compliance-meta-model class — freeze
v1.0 untouched): question_id, target_field, label, why_needed,
regulatory_domains_unblocked (static metadata), answer_type, options, priority.
- QUESTION_CATALOG: 12 questions over canonical gaps — P0 (markets, role,
lifecycle, machine/component), P1 (radio, usage-data, security-function,
environmental wastewater/air/chemicals triggers), P2 (structured BOM).
- engine: navigate() -> missing_facts + suggested_questions (priority-sorted) +
completeness_summary (ready_for_scope = no P0 missing); apply_answers() ->
updated profile. Pure field-presence; no scope import.
- 8 tests: <=10 questions for a filled company-profile, known facts not re-asked,
environmental = trigger questions only (no law evaluation), apply round-trip,
P0 ordering, ready_for_scope. 41 tests green, mypy clean, LOC ok.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ONE canonical product profile so the Go gap engine and the Python reasoning
engine stop diverging ("SPS mit Remote Access" means the same everywhere).
gap.ProductProfile LEADS; the reasoning ProductProfile becomes an adapter/DTO.
Types + mappers only — no regulation logic, no Go changes, no UI, no new questions.
- CanonicalProductRegulatoryProfile mirrors gap.ProductProfile + the Navigator
gaps the audit found: economic-operator role, radio_module, generates_usage_data,
lifecycle_phase, structured BOM (ProductComponent), safety-vs-security split,
machine-vs-component + a forward-looking EnvironmentalImpact domain (wastewater/
air/chemicals triggers — fields only, no rules yet).
- Mappers: from_product_wizard (lossless), from_company_profile (prefill incl.
the machineBuilder block), to_gap_profile (emits the unchanged gap JSON shape),
to_reasoning_profile (projects into the reasoning ProductProfile; AI stays
delegated to ai-act/ucca). Only profile->reasoning is coupled; reasoning stays
hermetic.
- 10 tests = the 10 acceptance criteria incl. ProductWizard round-trip lossless,
markets no longer forced ['EU'], and canonical->reasoning->discover_scope
proving one semantic profile drives the engine. 33 tests green, mypy clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Architecture-validation finding: the implementation mode produced compliance-
flavored output ("teilweise erfüllt", "covered") from a mere customer claim,
blurring the line to the Execution layer. This is a design decision, not a text
fix — the reasoning layer judges only the customer's STATEMENT, never conformity.
- CoverageStatus -> ClaimCoverage; values are claim-relative + carry "potential":
potentially_addresses / partially_addresses / does_not_address /
insufficient_information.
- ImplementationAssessment -> ClaimObligationMapping (coverage_status ->
claim_coverage); ImplementationResponse -> ImplementationReasoningResponse
(assessments -> mappings, + explicit `disclaimer`); request renamed; engine
entry assess_implementation -> reason_implementation_claim.
- Endpoint /reasoning/implementation-assessment -> /reasoning/implementation-reasoning.
- Summary/explanations reworded: "adressiert wahrscheinlich N Pflichten … für
eine Bewertung der tatsächlichen Umsetzung sind Nachweise erforderlich (keine
Konformitätsaussage)". No "erfüllt"/"abgedeckt" leaks.
- New guard test asserts no compliance verdict leaks (no "erfüllt"; disclaimer
separates ClaimCoverage from ComplianceStatus). 23 tests green, mypy clean.
Discovery (scope/obligations) was already structurally claim-free and unaffected.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deterministic reasoning layer ON TOP of the Legal Knowledge Graph (obligation
registry) and the Compliance Execution Graph (control mapping/evidence). Answers
which regulations apply to a concrete product, which obligations follow, whether
the customer's implementation covers them, and whether a customer interpretation
is too narrow/broad/plausible.
- ProductProfile with tri-state facts (Optional[bool]=None => uncertain, never
false security); safe predicate evaluator (no eval).
- 6 regulation triggers (CRA/MaschinenVO/RED/EMV/DataAct/NIS2) with missing-fact
prompts; 24 obligation scope rules.
- CRA obligation_ids RE-USED verbatim from the registry (93 ids) — never re-minted
(control_uuid trap); Machine/Data-Act flagged proposed=True.
- required_evidence constrained to the framework-agnostic shared evidence catalog;
capabilities echo the planned Obligation->Capability layer.
- Overlap groups (CRA<->MaschinenVO cyber-safety) + evidence-for-multiple (USP).
- 4 endpoints POST /reasoning/{scope,obligations,implementation-assessment,
interpretation-assessment}; thin handlers, registered in api/__init__.py.
- 22 tests (5 machine-builder scenarios + 10 acceptance questions). No DB
migration, no RAG, no new controls.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reichert die Obligation-Shadow-Telemetrie um zwei Felder an für die Cross-Firmen-
Auswertung: met_count (abgedeckte Obligations) + recall_limited_obligations (welche
Obligations recall-limitiert sind) — erlaubt die Konzentrations-Analyse über Firmen.
7-Firmen-Shadow: 136 Control-Findings → 29 Obligation-Findings (4,7×); recall_limited
nur 6/29, konzentriert auf third_country/safeguards in 2/7 Firmen → LLM-Fix bounded.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
gpt-oss-120b is a reasoning model: it spends output tokens on chain-of-thought
before the answer. deep_check called _call_ovh with max_tokens=400, which
length-capped it mid-reasoning -> content=null -> the OVH tier returned nothing
and the cascade always skipped Tier-2. Floor the OVH budget to >=2000, fall back
to reasoning_content when content is null, and raise the client timeout to 90s
for the slower reasoning path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Generalise "Embedding finds, Claude decides" into the shared Pruefer-Library:
- router.route_and_check dispatches control -> sensor_classification -> Checker.
- build_spec reads sensor_classification (CONTENT/LLM -> judge=haiku, the
validated sufficiency tier; the Qwen-first cascade is disproven for sufficiency).
- LLMChecker gains a Haiku-direct tier (reuses the validated deep_check prompt).
- Cookie Layer-3 now routes through route_and_check instead of bespoke code, so
cookie is the first real router consumer -- proves the architecture end-to-end.
Reproduces the validated result via the shared path: FN 159->14, recall
0.13->0.92, precision 0.89 (vs bespoke 12/0.93/0.90 -- within Haiku noise).
Tests: 10/10 (router dispatch + build_spec + haiku tier + cookie rewire).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The embedding/boost auto-rescue is intentionally optimistic (finds the topic, not
fulfilment) -> 159 FN over-rescues vs Opus-GT (recall 0.13). Layer-3 re-judges
exactly the rescued passes with the validated Haiku judge (cohort
cookie_sufficiency_v1 P0.89/R0.91) -- NOT the Qwen-first cascade (local is
disproven as a sufficiency judge) -- and un-passes them when the obligation is
not concretely met. Gated to the full check (not skip_llm).
Measured (5-firm Opus-GT, engine+L3): FN 159->12, recall 0.13->0.93,
precision 0.96->0.90 (276 rescues corrected). "Embedding finds, Claude decides."
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The cookie agent loaded 100 controls, 11 of which have no COOKIE_POLICY in
applicable_artifacts -- Security/TOM/Audit (PROCESS) or Banner-behaviour
(BEHAVIOR) controls that produce nonsense findings against a cookie policy
(e.g. "TOMs not documented"). Add a cookie classification gate (analogous to the
DSE gate, keyed on COOKIE_POLICY, without the needs_review carve-out since the
artifact signal is decisive and the set is inventory-verified). Controls are
routed out, not deleted. Effect vs Opus-GT: FP 16->11, FN 179->159; the
remaining FN=159 over-rescue is a separate (judge/criteria) question, not routing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
#1 _machinery_obligations: SET statement_timeout=4s + run_in_threadpool — auf
prod hing die maschinen-Query ~30s (langsame/unindizierte DB nach DB-Swap)
und blockierte den async-Worker. Jetzt: bei Langsamkeit graceful 'keine
Maschinen-Pflichten' statt Hang. (Fehlender prod-Index = Controls/DB-Session.)
#2 parse_grenzen_json: tolerant ggue. ```json-Fences / Prosa-umschlossenem JSON
(gehostete Modelle wie OVH ignorieren z.T. response_format) → Datenblatt-
Extraktion liefert auch ueber den OVH-Fallback Felder.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hybrid-Extraktion Datenblatt → IACE Grenzen (ISO 12100): deterministischer
Detektor (Schnittstellen/Einheiten per Regex) + lokales 35B via llm_cascade
(Qwen-lokal-first) fuer die semantische Zuordnung auf die echten LimitsFormData-
Keys. Nichts erfinden: Feld nicht im Text → leer + Quellen-Zitat je Feld.
Essenzielle ISO-12100-Felder, die leer bleiben → gezielte Rückfragen
(foreseeable_misuses, person_groups, qualification, temporal_limits …).
Endpoint POST /api/v1/cra/extract-datasheet. 13 Tests gruen (reine Teile).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
3-Tier-MaschinenVO-Verdict (direkt / sicherheitsrelevant / nicht relevant) aus
Personengefährdungs-Signal: eine Komponente ist keine Maschine, aber wenn ihre
Funktion bei Fehler ODER Manipulation Personen gefaehrden kann (Bewegung, Laser/
Auge, Kraft, Temperatur, elektrisch), ist sie sicherheitsrelevant — Pflicht
trifft den Maschinenbauer, Zulieferer liefert Nachweise, und ein Cyber-Angriff
kann die Sicherheitsfunktion aushebeln (Cyber-Safety-Bruecke). OWIS-mit-Laser
landet so korrekt als 'sicherheitsrelevante Komponente'. Engine + /readiness
additiv; Frontend: Gefährdungs-Frage + -Typen, MaschinenVO-Ergebnisblock.
Presets aktualisiert (OWIS: Laser+Bewegung, Zwick: Bewegung). 22 Tests gruen.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reine, deterministische Verdict-Schicht ueber der bestehenden Annex-III/IV-
Klassifikation (kein vierter Klassifizierer): trennt Rechtspflicht von Markt-
Druck. Kern: das Inverkehrbringen (ab 11.12.2027), nicht der Entwicklungs-
zeitpunkt, entscheidet — Bestandsprodukte, die nach der Frist weiter verkauft
werden, fallen unter CRA. Producer-Typen (component/end_device/machine_
integrator/software_app) steuern Default-Annahmen (Anlagenbauer: Vernetzung/OTA
vorausgesetzt) + Verdict-Betonung (Komponente => Markt-Druck). Plus Evidence-
Checkliste (SBOM/VDP/Patch/Lifecycle/Threat-Model/Logging/Auth/Incident) +
Reifegrad. /readiness additiv erweitert (verdict/maturity/digital_elements/
producer_type). 15 Tests gruen. Beispiele: OWIS PS90+, ZwickRoell roboTest.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Liest den Lebenszyklus jedes Befunds (status + tracker_issue_url) aus dem
Scanner zurück und rollt ihn zu einem Management-Bild auf: % erledigt,
4-Phasen (offen/in Arbeit/erledigt/ausgeschlossen), offenes Restrisiko nach
Schweregrad, Fortschritt je CRA-Anforderung und eine Aufgaben-/Ticket-Tabelle
mit Jira-Link. Neuer Endpoint GET/POST /api/v1/cra/progress (dünn → Service
cra_progress, rein deterministisch, kein /assess-Schema-Drift). Frontend:
ProgressView in Ebene 1 (CRACyberView), live je Scanner-Repo, sonst Demo-Status.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Jede Normreferenz einer Maßnahme wird lizenzklassifiziert (eu_law /
public_domain / open / paid_reference) — paid-reference-Normen werden nur als
Verweis geführt, nie im Text gespeichert (idea/expression). Kuratierte
Maßnahmen tragen Tier 'core', KI-/Fallback-Maßnahmen 'review' (indikativ).
Frontend zeigt Quellen-Badges + "indikativ"-Kennzeichnung. Methodik in
docs-src/development/mapping-methodology.md (Szenario C, Due-Diligence).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
(a) Labels: Module korrekt zugeordnet — Modul A = Selbstbewertung, Modul B+C =
benannte Stelle, EUCC = eigenes Zertifikat (nicht Modul H), "harmonisierte
Norm" ist kein Modul sondern Konformitätsvermutung. Für den CRA noch KEINE
harmonisierte Norm veröffentlicht → Kachel als "noch nicht verfügbar"
(erwartet ~2027), nicht wählbar, mit Hinweis. (page/path/documents-Labels.)
(b) Gating: wichtige Klasse II + kritische Produkte dürfen NICHT selbst bewerten;
harmonisierte Norm allein genügt dort nicht → ALLOWED_PATHS IMPORTANT_II/
CRITICAL = {eucc, notified_body}; DEFAULT_FOR II = notified_body. _PATH_HINT
entsprechend. Regressionstest test_cra_conformity_paths.py.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- _ATOM_LIST_SQL via LATERAL: zusaetzlich cpl.source_article (Gesetzes-Artikel)
im atom-grain Response. Spalte control_parent_links.source_article verifiziert
(macmini + prod).
- Registry-Mapper-Test (neue Domaenen) nach compliance/tests/ verschoben — CI
faehrt compliance/tests/, nicht tests/; schliesst die CI-Luecke der
6-neue-Use-Cases-Erweiterung.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Frontend (CRA/Cyber-Tab):
- Erklär-Zwischensätze je Ebene (Befund -> CRA-Anforderung -> Best-Practice-
Standard -> Maßnahmen) + "So liest du einen Befund"-Legende.
- Kuratierte M-Maßnahmen und atom-grain "Regulatorische Breite" in EINE Sektion
"Maßnahmen (wählbar)" zusammengeführt (statt zwei konkurrierender Listen).
- Standalone "Empfohlene Maßnahmen (Sollzustand)" entfernt (jetzt je Befund).
Backend:
- Atom-Controls-Query liefert jetzt cpl.source_article (Artikel/Anhang/Erwägungs-
grund-Anker) zusätzlich zu source_regulation; via LATERAL-Join.
- enrich_findings_with_breadth trägt source_article in regulatory_breadth.
- Daten waren schon ingestiert (682/691 CRA-Atome haben source_article) — wurden
nur nicht selektiert/angezeigt.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Follow-up to the machinery_reg_cyber.py removal: the readiness endpoint now pulls
Machinery Regulation 2023/1230 cyber-with-safety obligations from the shared
Controls-API (use_case=maschinen), tagged "Maschinen-VO", best-effort.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Machine/plant builders are hit by BOTH the CRA and the new Machinery Regulation.
New machinery_reg_cyber.py models its two well-corroborated Annex III cyber-with-
safety essential requirements (1.1.9 protection against corruption, 1.2.1 control-
system safety incl. foreseeable manipulation) in our own words; EU legal text is
freely reusable (Commission Decision 2011/833/EU, source acknowledged), harmonised
standards referenced by identifier only. The readiness check asks "is it
machinery?" and, if so, adds these obligations tagged "Maschinen-VO" alongside the
CRA ones — the combination is visible (regulations list + per-item source badge).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
For hardware CE projects (no repo) each networked component (controller/hmi/
gateway/drive/remote_access/sensor) yields typical ICS vulnerability CLASSES
(real CWE + "CISA-ICS — product-specific check" framing, NO fabricated CVEs);
they flow through the same CRA engine. /assess accepts components[]. MappedFinding
now echoes title/location/cwe so the response is self-contained for any finding
source. Live CISA-ICS/NVD per-product CVE lookup is the later enrichment.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Semantic breadth (2): each finding's CRA-AI is mapped to a network_security
sub_topic and enriched with atom-grain, framework-traceable obligations from the
shared Controls-API (compliance.atom_classification) — at the endpoint/view layer
(SessionLocal), NOT in the pure mapper. CRA-AI anchor + curated measure +
NIST/OWASP crosswalk stay the lead; this is breadth + source evidence. Only
network_security is queried (atom-grain), scoped by sub_topic + limit. Frontend
renders it under the collapsible best-practice depth (control_id · title · source).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>