Commit Graph

605 Commits

Author SHA1 Message Date
Benjamin Admin 86a783e72f docs(spec): Journey model PROPOSAL — validate "transition is the unit, rest are renderers"
Transition Pattern, Playbook and Reference Scenario describe the same transition from different
angles. The hypothesis: a single underlying object, the Journey, is the unit of knowledge; everything
else is a renderer ("rendered, not modeled"). Per the user's request this is a SPECIFICATION to
validate the assumption BEFORE any code — not a runtime module, not new architecture, decision pending.

- Conceptual Journey schema: identity (from->to) + source_variants + likely_covered[] + delta[] +
  rejected_assumptions; questions/measures/evidence/verification are DERIVED. Truth hierarchy:
  Atomic Requirement -> Capability -> Journey (x Company context on instantiation).
- Renderers: Transition Pattern (= the curated Journey core), Interview, Roadmap, Reference Scenario,
  Evidence view, role views (Sales/Auditor/Developer/GF) — four views, one Journey.
- GROUNDED validation against the real ISO 27001 -> CRA artifacts (TP-ISO27001-CRA-v1, the
  sbom/CVD playbooks, RTS-001/002/003). Honest finding: the unification HOLDS with two refinements
  that AVOID premature abstraction:
  1. Reference Scenario = Journey x Company context + expected outcome (no duplicated transition data).
  2. Playbook = a CAPABILITY renderer (reused across journeys), NOT a Journey renderer — stays
     capability-owned (ADR-004); the Journey aggregates, it does not own.
  => a transition is curated exactly once; ISO9001->MaschinenVO becomes one object, not four.
- Proposes the principle "rendered, not modeled" alongside computed-not-stored / derived-not-curated.

No code, no runtime change (Freeze). Non-runtime doc -> no deploy (ADR-001). If accepted -> ADR-011.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 07:14:15 +02:00
Benjamin Admin 18f5d0cb05 feat(programs): Operational Knowledge — the transition is the unit + Transition Coverage KPI
Customers don't buy "EMV domain"; they buy "we have ISO 9001, help us with the CRA". The sellable
unit of knowledge is the TRANSITION (from -> to), not the law and not the capability. This reframes
the backlog from "model EMV next" to "the top demanded transitions". No new runtime framework (ADR-010).

- knowledge/programs/transitions.yaml: the Operational Knowledge backlog — the ~20-30 actually demanded
  transitions (of ~N*(N-1) possible) with priority. ISO27001->CRA, ISO9001->CRA, ISO9001->MaschinenVO
  (all 5-star), IEC62443->CRA, TISAX->CRA, ISO27001/IEC62443->NIS2, ISO14001->Umweltrecht.
- Transition Coverage KPI (reference suite, computed-not-stored): per transition a status DERIVED from
  the transition-pattern corpus (reviewed/validated/proven -> Gold, draft -> 🟡, none -> ). Honest
  current state: ISO27001->CRA  reviewed, ISO9001->CRA 🟡 draft, rest . Highest-priority gap =
  ISO9001->MaschinenVO (the next Track-B work) — a far stronger product indicator than "EMV 30% modelled".
- Three knowledge layers documented: Regulatory -> Operational (transitions/playbooks/deltas, the
  biggest differentiator) -> Verification (Vision V2). A domain is a TRANSITION PROGRAM with two tracks:
  Track A breadth (model sources, @Legal-KG/@Execution) + Track B product (transitions/playbooks/RTS
  per source, @Reasoning).
- ADR-010: the transition is the unit of knowledge; Transition Coverage KPI; three layers; two tracks.

10 program/transition-contract tests, check-loc 0. Knowledge data + ADR + reference harness =
non-runtime -> no deploy (ADR-001). No new module, no runtime change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 23:48:45 +02:00
Benjamin Admin 1a9439d013 feat(programs): open Domain Knowledge Program v1 — 7-stage production line + per-domain KPI
The real bottleneck is domain MODELLING. Phase B is organized as one program with sub-programs per
domain, each run through the SAME 7-stage production line. No new runtime framework, no new module
(ADR-009, Freeze v1.0) — only program data + a derived reporting view.

- Customer enters by INDUSTRY, not regulation: Industry -> Domain Model -> Requirement Sources ->
  Requirements -> Capabilities -> ... -> Completeness.
- 7-stage checklist identical for every domain (Domain Model / Requirement Sources / Capability
  Registry / Transition Patterns / Playbooks / Reference Scenarios / Completeness) with per-stage
  ownership. README generalized to the framework.
- Each domain lists typical_requirement_sources + typical_certifications -> pre-onboarding capability
  HYPOTHESIS (the ETO insight; feeds Company 2A as inferred, never confirmed).
- Backlog v1 (by customer value): 1 Industrial Automation, 2 Environmental, 3 Automotive, 4 Medical,
  5 Energy. Five domain-definition shells (environmental restructured to the unified shape, law-first
  preserved).
- Per-domain KPI is DERIVED from the real corpus (computed-not-stored; sources modelled / transition
  patterns / playbooks / reference scenarios), NOT a curated number. Reference suite renders maturity
  bars: Industrial Automation 43% (3/7 sources) leads, Environmental 0% (work ahead). Backlog (value)
  and KPI (corpus state) are deliberately separated.
- ADR-009: Domain Knowledge Program framework. Honest known refinement: regulation-ID normalization
  (CRA vs Cyber Resilience Act) aliased in the KPI.

7 program-contract tests (backlog order + industry-first + derived-not-stored), check-loc 0.
Knowledge data + ADR + reference harness = non-runtime -> no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 18:49:06 +02:00
Benjamin Admin 9c02c2c4a2 feat(programs): start the Environmental Knowledge Program — domains, not architecture
The architecture is stable; from here the value comes from DOMAINS, not more software. Phase B is
organized as law-first Domain Knowledge Programs, each delivering the same production line: Corpus ->
Obligations -> Capabilities -> Transition Patterns -> Playbooks -> Reference Scenarios -> Completeness.
No new runtime framework (Freeze v1.0).

- knowledge/programs/README.md: reusable Domain Program blueprint (production line, per-stage ownership,
  law-first ordering, planned programs Environmental/Automotive/IEC62443/Functional-Safety).
- knowledge/programs/environmental.yaml: the Environmental domain as DATA. Law-first: B1 Environmental
  Regulatory Corpus (water/chemicals/emissions/energy/waste/product-responsibility — law + obligations
  only) -> B2 Capability Model -> B3 Transition Patterns (ISO 14001 -> corpus, built LAST). ISO 14001
  is a source state, NOT the domain.
- Ownership handoffs: B1 -> Legal Knowledge, B2 -> Compliance Execution, B3+/playbooks/reference ->
  Reasoning. Coordinate via the board; no session builds another's artifacts.
- reference suite: "Domain Knowledge Programs" section renders the program stages + a measurable
  Completeness baseline (6 areas, 0 assessed today) that flips automatically as stages land.
- ADR-008: from architecture to domains; Phase B as law-first programs; architecture frozen.

6 program-contract tests (law-first order + ownership pinned), check-loc 0. Knowledge data + ADR +
reference harness = non-runtime -> no deploy (ADR-001). No new module, no runtime change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 14:36:03 +02:00
Benjamin Admin aa99111a87 feat(completeness): Regulatory Completeness Engine — auditable coverage, not confidence
Phase A½. The move from feature to product development: for every assessment, answer "how sure are
we that this answer is COMPLETE?" — different from confidence. The product never claims full coverage;
it makes its own knowledge state transparent and auditable. Shows what we do NOT know and why.

- compliance/completeness/: assess_completeness(identified, corpus_status, uncertain, assumptions,
  assessed_obligations) -> CompletenessReport. Separates IDENTIFIED from ASSESSED (validated corpus
  AND determined applicability) and justifies every gap. Two kinds of open: corpus gap (future_corpus)
  and applicability uncertainty (query_required + deciding question, e.g. Data Act / generates_usage_data).
- The metric is COUNTS, never a single percentage: "Identifiziert N · bewertet M · offen K ·
  Unsicherheiten U · Begründung ja" + an honest audit statement.
- ADR-007: auditable honesty; phase order A factory -> A½ Completeness -> B new domains; the
  transparency selling point. Deterministic, no LLM; corpus status + obligation count injected.
- reference suite: "Regulatory Completeness" section runs an industrial-dishwasher assessment
  (assessed CRA/MaschinenVO; open EMV/Environmental=future_corpus, Data Act=query_required) and notes
  Environmental flips open->validated automatically once the corpus lands.

11 completeness tests (54 with adjacent modules), mypy --strict clean (15 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 14:16:12 +02:00
Benjamin Admin 07e392913f feat(knowledge-intake): classify a document + assess its impact before extraction
Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents
arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake
classifies a new document (no content extraction) and intersects its signals with an index of the
existing knowledge to emit a Knowledge Package (an impact analysis).

- compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios,
  obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic,
  NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns /
  reference scenarios / (injected) obligations, whether it is a new domain, and a triage level
  (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation.
- ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package ->
  Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review.
- reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high,
  14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section
  lives in _helpers.py to keep generate.py under the 500-LOC budget.
- Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act).

10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 13:58:59 +02:00
Benjamin Admin b6cfc0a503 feat(knowledge-production): Playbook Draft Generator — prepare the corpus deterministically
The bottleneck is not content, it is knowledge PRODUCTION. Instead of writing 200 playbooks by
hand, generate drafts deterministically from data the software already owns, then have an expert
review them. Mirrors the legal pipeline (Gesetz -> Parser -> Obligation -> Review) for BreakPilot's
own knowledge: new Capability -> Registry -> Transition Pattern -> Playbook Draft Generator ->
Expert Review -> versioned Playbook.

- compliance/knowledge_production/: generate_playbook_draft(capability, requirement, control_links)
  + drafts_from_pattern(pattern) -> one PlaybookDraft per delta capability. Owned fields (why /
  closes_regulations / expected_evidence / typical_controls) are assembled with per-field provenance;
  the practitioner know-how (tools / process_steps / how_others) is left as an explicit TODO.
- DraftStatus lifecycle (Freigabestatus): draft_generated -> in_review -> reviewed -> validated ->
  proven. Deterministic, NO LLM in the core (any model enrichment stays offline/advisory/propose-only).
- ADR-005: extends "the engine does not change, the corpus grows" with "and the corpus is not written
  by hand — it is deterministically prepared, then curated".
- reference suite: "Knowledge Production" section turns the convergence pattern into 12 auto-assembled
  drafts (why/closes/evidence filled, tools/steps TODO) -> review 12 drafts, don't write 12 playbooks.

10 tests (50 with playbook/optimization/transition/company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 13:31:31 +02:00
Benjamin Admin 78f0ffa9de feat(playbook): Implementation Playbooks — the Berater renderer ("wie komme ich dort hin?")
Roadmap item 4. After WHAT applies / WHAT is missing / WHICH first, the GF asks HOW. The
Implementation Playbook renders, for one capability, the full journey — why / which regulations
it closes / tools / process / evidence / controls — and chains the Optimization Roadmap into
per-measure playbooks. Another renderer over the same Capability spine (ADR-003/004), not a new
engine: ~95% of the data already exists, it just needs a different rendering.

- compliance/playbook/: build_playbook() + playbooks_for_plan() (chains optimization -> playbook,
  acyclic; reuses leverage for "closes which regulations"). Capabilities without curated content
  render as honest status:missing stubs — the content-owed signal.
- knowledge/implementation_playbooks/: curated knowledge layer (Reasoning Knowledge Acquisition),
  two deep expert drafts (SBOM, CVD/PSIRT, status draft, expert-draft-not-normative) + README.
  The bottleneck is now CONTENT, not software; Playbook (own knowledge) != regulatory domain.
- ADR-004: Implementation Playbooks = renderer + knowledge layer; content is the bottleneck.
- reference suite: "Implementation Playbook" section renders the SBOM journey + Roadmap->Playbook
  table (high-leverage caps flagged "fehlt (Inhalt)" — content backlog, highest leverage first).
- refactor: extracted markdown helpers to reference_scenarios/_helpers.py to keep generate.py
  under the 500-LOC budget.

9 playbook tests (40 with optimization+transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + knowledge/ADR/reference = non-runtime -> no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 10:38:13 +02:00
Benjamin Admin cfafa31ea2 feat(optimization): Regulatory Optimization — Roadmap/Management renderer over the Capability Delta
Roadmap item 5. GAP analysis and measure-prioritisation are the SAME computation: Required −
Known = the Capability Delta. The Capability Delta Engine (RS-005) computes it once; renderers
read that ONE delta. Interview Renderer (missing info → questions) was already built; this adds
the Roadmap/Management Renderer (missing capabilities → measures ranked by regulatory leverage).

- compliance/optimization/: regulatory_leverage() + select_within_budget() (pure leverage math)
  + roadmap_from_delta(assessment, ...) — the keystone binding optimization to the RS-005 delta
  (dependency optimization → transition_reasoning, acyclic; the delta engine stays hermetic).
  leverage(measure) = number of regulatory requirements it closes at once (e.g. patch management
  → CRA+MaschinenVO+IEC62443+ISO27001 = 4). No new corpus, no new meta-model class (freeze v1.0).
- Welt-1 honesty: percentages are exact count ratios over the IDENTIFIED requirements (the known
  delta), never "% gesetzeskonform".
- reference suite: "Regulatory Optimization" section runs the SAME convergence delta → ranked
  measures + budget answer + the management sentence "of N identified requirements you close M
  with the top-K measures (X%) — highest regulatory leverage".
- ADR-003: Capability Delta Engine — one delta, many renderers; rename Gap → Capability Delta.

13 optimization tests (31 with transition+company), mypy --strict clean, check-loc 0.
Product code with no app caller + ADR/reference = non-runtime → no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 09:49:38 +02:00
Benjamin Admin a0f72fc39b feat(rts): extend Reference Transition Scenarios to multi-regulation (CRA + MaschinenVO)
Roadmap item 2: the RTS now pin MaschinenVO + convergence Expected Outcomes, so the
convergence USP is a living regression, not just a one-off section.

- RTS-003 (machine + ISMS, networked): full multi-regulation archetype — maschinenvo
  expected_delta + convergence expected_multi_target (links TP-ISO27001-CRA-MaschinenVO-v1).
  Generator runs the convergence pattern through RS-005: 4/4 machine-safety delta MISSING +
  4/4 expected multi-target caps converge. PASS.
- RTS-001 (component): MaschinenVO modeled as `uncertain` (a pure component is usually not a
  machine; deciding question is_safety_component) — engine must never assert it applies. Honest,
  parallel to the Data-Act handling.
- RTS-002 (machine, QMS-only): MaschinenVO `applies` (is_machine) but LOW convergence — no ISMS
  means the cyber side is entirely delta, so few caps are shared. The honest contrast that the
  convergence USP rewards companies who already run an ISMS.
- generator: per-RTS maschinenvo/convergence Soll-Ist checks; convergence pattern run once and
  reused. Data Act stays `uncertain` everywhere, never asserted.

All 3 RTS PASS. 18 tests (transition+company), mypy --strict clean, check-loc 0.
Non-runtime (knowledge + reference harness) -> no deploy (ADR-001).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 09:26:01 +02:00
Benjamin Admin 66be23f0c4 feat(convergence): first Regulatory Convergence Pattern (ISO27001 -> CRA + MaschinenVO)
The first multi-regulation pattern: each capability declares `covers_targets`, so we
can answer the convergence USP — "which capability satisfies CRA AND MaschinenVO at once?"

- knowledge: transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml (pattern_type:
  regulatory_convergence, status draft). The cyber-safety bridge = MaschinenVO Annex III
  1.1.9 "protection against corruption" overlapping CRA integrity. 4 convergence
  capabilities cover BOTH; 5 CRA-only; 3 MaschinenVO-only.
- product: compliance/transition_reasoning/convergence.py — regulatory_convergence()
  pure/deterministic/computed-not-stored, no new graph/class (freeze v1.0 untouched).
  No app caller yet -> non-runtime, no deploy (ADR-001).
- reference suite: Cross-Regulation Capability Mapping section renders the customer
  sentence "von N neuen Massnahmen erfuellen M gleichzeitig CRA und MaschinenVO".
- README: term -> Regulatory Transition / Convergence Pattern; covers_targets documented.
- tests: test_regulatory_convergence (18 transition+company pass), mypy --strict clean.

Curated expert knowledge, AI first draft (L1/draft) — Annex/Article refs indicative,
review_required by a machinery-safety expert.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 09:12:30 +02:00
Benjamin Admin f78e03bd0a docs(knowledge): Reference Transition Scenarios (RTS-001..003) + ISO9001->CRA pattern
Three ANONYMIZED reference transition scenarios (no real company names stored) = canonical
regression scenarios that test the KNOWLEDGE, not just the engine. Each pins an Expected
Outcome (expected_likely_covered + expected_delta); every commit must reproduce it (identical
or better).

- RTS-001 automotive supplier (TISAX+ISO27001) -> CRA: mature ISMS, standard CRA delta.
- RTS-002 classic machine builder (ISO9001) -> CRA: only process discipline -> MUCH larger delta
  (10 missing vs 3 covered). New TP-ISO9001-CRA-v1 pattern (different shape).
- RTS-003 networked machine builder (ISMS) -> CRA: highlights the Data Act.

Data Act is modelled as UNCERTAIN (a hypothesis), never a fixed gilt/gilt-nicht: the generator
checks the engine SURFACES the uncertainty + the deciding question (generates_usage_data) and
never wrongly ASSERTS applicability. All three RTS PASS.

Non-runtime knowledge + reference harness -> no deploy (ADR-001). Names deliberately absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 08:46:20 +02:00
Benjamin Admin 0da093c046 docs(knowledge): TKP 4-level lifecycle + 3 enrichments + ISMS->TISAX (genericity proof)
Transition KNOWLEDGE Patterns (renamed term -- curated knowledge, not an algorithm):
- 4 maturity levels: draft -> reviewed -> validated (domain expert) -> proven (field). "approved"
  dropped; target is validated. TP-ISO27001-CRA set to reviewed (L2).
- 3 enrichments per pattern: confidence_source: relationship (curated, not an LLM estimate ->
  computed-not-stored); why_asked (customer-facing: why the source does not suffice here); dropped_if
  (what makes the question unnecessary). Applied to TP-ISO27001-CRA.
- New TP-ISMS-TISAX (draft): different character -- info-security module mostly covered; delta is
  automotive-specific (prototype protection, TISAX labels, VDA ISA self-assessment, ENX assessment,
  Art. 28 data protection). Proves the architecture is GENERIC, not CRA-tailored.
- Reference scenario 4 generalized to loop over ALL patterns through RS-005: both carried (CRA
  17->17, TISAX 13->13) -> a living genericity + regression test for every future pattern.

Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISO9001->IATF16949.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 08:29:30 +02:00
Benjamin Admin 4bfd552da7 docs(knowledge): TP-ISO27001->CRA gold standard + reference scenario (RS-005 regression)
(1) Harden the first Transition Pattern to the gold-standard template per quality checklist:
versioned transition_goal (ISO27001:2022 -> CRA, applies 2027-12-11), source_state_variants
(certified/isms_introduced/expired/limited_scope), each likely_covered assumption with a typed
relationship (supports|partially_supports, never equivalent) + verification + rationale (the Warum)
+ an auditor-checkable reviewable_claim, delta as missing-capability + needed-info, an explicit
rejected_assumptions section, and a determinism_goal. README schema updated to match.

(2) New Reference-Suite scenario 4 (Transition): the generator READS the pattern YAML and runs it
through the RS-005 Planning Engine + Company 2A -> coverage + question requests. Proves the
architecture fully carries the pattern (17 caps -> 17 coverage + 17 requests; 9 HIGH delta = the
real CRA gaps, 8 probably-covered from the ISMS). Now a living regression test: every future pattern
runs through the same engine.

Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISMS->TISAX once approved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 08:11:42 +02:00
Benjamin Admin bea8559f78 docs(knowledge): first Transition Pattern ISO27001 -> CRA (curated knowledge base)
Reasoning session's new Knowledge Acquisition responsibility (re-charter): build and curate
the Transition Knowledge Base under backend-compliance/knowledge/transition_patterns/ (beside
reasoning/, not under it -- it is knowledge, not an engine).

First professional pattern TP-ISO27001-CRA-v1 (status: draft): separates what a mature ISMS
likely covers at the ORG level (probably_covered, needs product-level confirmation, never
auto-"erfuellt") from the CRA-specific delta with no ISO 27001 analogue (SBOM, support period +
secure signed updates, coordinated vulnerability disclosure, Art. 14 authority reporting,
product cyber risk assessment, CE conformity / technical documentation). Expert draft, not a
normative proof; review_required before customer use.

Non-runtime knowledge -> no deploy (ADR-001). Next: ISMS->TISAX.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 07:50:42 +02:00
Benjamin Admin 77de7e794c feat(transition): Transition Reasoning v0 (RS-005) — Transition Planning Engine
Second reasoning mode, scope per user: the engine owns the INFORMATION GAPS, not the
questions. assess_transition(context, target_requirements, company_profile) emits
ranked TransitionQuestionRequest {capability, control, reason, question_intent,
expected_evidence, priority, information_gain} -- NOT rendered question text. Rendering
(intent+subject->sentence) is a separate swappable layer (RS-005.1), not here.

Consumes the Company Capability Profile (2A) as "have" + injected TargetRequirement
(Execution-owned placeholder) as "required" -- no required-capability data in product
code (EMPTY_REQUIREMENTS, mocks only in tests). A certification-derived capability is
probably_covered (Welt 1) -> a confirmation request, never already_covered/"erfuellt".
Deterministic, computed-not-stored, no percentages.

Activates 2A/2C/RCI (first consumer of the Company profile). Freeze-respecting: additive
package, no new graph/base class/meta-model class. 9 tests, mypy --strict clean, LOC ok.
No endpoint/UI/RAG; question rendering deliberately deferred to RS-005.1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-27 07:31:11 +02:00
Benjamin Admin 16371f2909 feat(reference): Reference Scenario Suite v1 (living regression reference, not docs)
Three real customer scenarios driven through the DEPLOYED engines (scope/map/
interpretation, RCI, company 2A, capability registry). Each scenario emits an
Architecture Coverage table DERIVED from the real run, so cells flip automatically
as domains land (e.g. Sz2/Environmental UNSUPPORTED -> PASS). The roll-up answers
"is BreakPilot better than six months ago" by real customer situations, not LOC.

Gaps captured as epics (NOT implemented): RS-001 Interpretation Pattern Library,
RS-002 Environmental Corpus, RS-003 Capability Linking (cap<->MCAP) + Company-Gap,
RS-004 MaschinenVO/EMV Registry Linking.

reference_scenarios/generate.py = reproducible source (ruff/mypy-exempt, NOT product
code, not imported by the app); reference_scenario_suite_v1.md = generated artifact.
No new product code; CRA patterns deliberately NOT built — the suite is now the measure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 22:48:27 +02:00
Benjamin Admin 6ccc6c87c1 feat(capability): Master Capability Registry v0 (Phase 2C, Compliance Execution domain)
Third instance of the identity-machine pattern (after Master Controls and Master
Obligations). New compliance/capability/ package: MasterCapability with stable MCAP
ids, CapabilityCandidate minting, seven typed relation types, a VERSIONED derivation
policy, and identity lifecycle (merge/split/deprecate/redirect with provenance).

Stored: identities, sources, relationship types, policy versions, lifecycle events,
provenance. Derived (never stored): confidence/status via evaluate_relation under a
policy version. Hard rule (structurally guarded): a certification alone can never
yield CONFIRMED — only CONFIRMS + concrete artifact (or expert) does.

Built from the Reasoning session per user directive but this IS the Compliance
Execution model (Execution owns Capability) — handed off via the board. Metadata-first:
CapabilityRelation is registry metadata, NOT a new meta-model class (freeze v1.0
untouched). No Company-Gap, no real ISO/cert mappings, no UI/RAG, no generic
canonicalization engine. 11 tests; mypy --strict clean; LOC ok.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 21:35:12 +02:00
Benjamin Admin 8c893ca783 feat(company): Company Intelligence 2A — Company Capability Profile foundation
HEAD of the spine Company->Capability->Product->Regulation->Obligation->Procedure
->Evidence. New compliance/company/ package: CompanyContext container + a four-state
trust model (declared/inferred/confirmed/unknown).

Hard rule (structural): a certification yields at most an INFERRED candidate and is
never auto-treated as CONFIRMED/"erfuellt". A certification produces evidence-of-
capability; only real ExistingEvidence promotes a capability to CONFIRMED.

Ownership: Reasoning owns the container + trust-state; the Certification->Capability
mapping is Execution's domain, consumed via an injected contract. No mapping data in
product code (tests inject mocks). No endpoint/UI/RAG/new regs/controls; no meta-model
classes (freeze v1.0 untouched). 8 tests; mypy --strict clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 14:59:42 +02:00
Benjamin Admin a5687bbc65 feat(rci): Regulatory Change Intelligence foundation (delta over the stored map)
RCI/Delta as a read-/reasoning layer ON TOP of the product-first pipeline. Answers
"what changes relative to my existing Regulatory Map?" — NOT "what does the new
law say in general". No UI, no ingestion (newsletter/mailbox), no RAG, no new
regulations/controls, no legal evaluation outside the stored map.

- 4 core objects (compliance/rci/schemas.py): ComplianceBaseline (snapshot of
  profile + map + registry obligations + required/present evidence), RegulatoryChange
  (simulated/provided INPUT), ObligationDelta (delta_type NEW|CHANGED|REMOVED|
  ALREADY_COVERED|NEEDS_REVIEW|NOT_APPLICABLE), ChangeImpactSummary. delta_type is a
  THIRD vocabulary, disjoint from ClaimCoverage (Welt 1) and ComplianceStatus (Welt 2).
- create_baseline() snapshots the existing pipeline once; assess_change() computes
  deltas deterministically against the snapshot (no re-evaluation).
- 12 tests = the 5 acceptance questions (affects product? new/changed? already
  covered by evidence? needs human review? not relevant?) + repeal/uncertain-reg/
  missing-evidence/boundary. Existing pipeline tests stay green; mypy clean; LOC ok.
- App/reasoning types only — no compliance-meta-model classes (freeze v1.0 untouched).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 13:45:23 +02:00
Benjamin Admin 50ae9e94d1 feat(interpretation-in-map): judge a customer interpretation within the map (step 5)
Thin adapter — it judges the customer's reading WITHIN the already-built
RegulatoryMap, it does not assess abstract legal questions and it is not RCI.

- Reuses the existing assess_interpretation (no new legal reasoning); the 6
  verdicts (plausible/too_narrow/too_broad/partially_correct/unsupported/uncertain)
  pass through unchanged.
- Restricts affected_regulations/affected_obligations to those present in the map
  (intersection); links to the map's uncertain regulations.
- Touched unsupported domains (wastewater/chemicals/...) are reported as
  future_corpus_domains (future_corpus_needed) — never pseudo-evaluated.
- Customer-readable explanation ("Ihre Interpretation ist wahrscheinlich zu eng. …
  Betroffen in Ihrer Map: CRA.").
- POST /reasoning/interpretation-in-map (renders the map, then interprets).
- 7 tests; 63 green (existing reasoning MVP stays green), mypy clean, LOC ok.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 10:58:00 +02:00
Benjamin Admin 9312ad18ef feat(regulatory-map): customer-readable read-model over the scope (step 4)
The Map Renderer explains the engine's state, it does not extend it. Pure
composition of resolve_product_scope (scope verdict) + derive_obligations
(registry-linked obligations + overlaps) into one RegulatoryMap.

- product_summary, trigger_facts, applicable/uncertain/excluded regulations,
  unsupported_domains, overlaps (shared_obligations), shared_evidence, and a
  customer-readable executive_summary.
- No own legal decisions: applicable/uncertain mirror the scope verdict exactly.
- Obligations shown ONLY when registry-linkable (registry_anchor) — MaschinenVO/
  EMV obligations are proposed, so they render empty + a note, never as linked.
  Overlaps/shared_evidence likewise filtered to registry-linked members.
- Uncertain regulations link to the navigator question that would resolve them
  (RED -> has_radio_module, DataAct -> generates_usage_data).
- Environmental appears only as unsupported_domain; executive_summary has NO
  percentage (counts + "no further regulations identified" instead).
- POST /reasoning/regulatory-map (thin handler). Response types are presentation-
  level, not meta-model classes (freeze v1.0 untouched).
- 9 tests; 56 green (existing reasoning MVP stays green), mypy clean, LOC ok.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 10:36:06 +02:00
Benjamin Admin 4e8eb2dc0e feat(product-scope): gate Navigator facts, then reuse discover_scope (step 3)
Connects the Navigator's fact-gate to the existing reasoning discover_scope —
the Scope Engine decides only once the minimum (P0) facts are released.

- resolve_product_scope(canonical): if not ready_for_scope -> NEEDS_FACTS
  (missing_facts + suggested_questions, discover_scope NOT run); else project
  canonical->reasoning profile and run the EXISTING discover_scope exactly once
  -> RESOLVED with applicable/excluded/uncertain regulations.
- Environmental triggers surface ONLY as unsupported_domains (future_corpus_needed),
  never as a legal evaluation — transparency, no false completeness.
- POST /reasoning/product-scope (thin handler) returns case NEEDS_FACTS or RESOLVED.
- No new scope rules, no new regulations, no environmental-law evaluation, no UI,
  no Go, no RAG, no percent-compliance. Response types are application-level, not
  meta-model classes (freeze v1.0 untouched).
- 6 tests incl. discover_scope spy (0 calls when gated, exactly 1 when ready),
  category separation, environmental-as-unsupported-only. 47 tests green (existing
  reasoning MVP tests stay green), mypy clean, LOC ok.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 10:21:27 +02:00
Benjamin Admin 78aeedafae feat(navigator): Product Regulatory Navigator as a thin missing-facts layer
Step 2 of the convergence sequence. The Navigator sits over the
CanonicalProductRegulatoryProfile (prefilled from company-profile / ProductWizard)
and reports ONLY which facts are still missing + prioritized questions to collect
them. It decides which facts are needed, NEVER what applies — that stays with the
Scope Engine (step 3). No regulation logic, no UI, no Go, no RAG.

- NavigatorQuestion (interaction type, NOT a compliance-meta-model class — freeze
  v1.0 untouched): question_id, target_field, label, why_needed,
  regulatory_domains_unblocked (static metadata), answer_type, options, priority.
- QUESTION_CATALOG: 12 questions over canonical gaps — P0 (markets, role,
  lifecycle, machine/component), P1 (radio, usage-data, security-function,
  environmental wastewater/air/chemicals triggers), P2 (structured BOM).
- engine: navigate() -> missing_facts + suggested_questions (priority-sorted) +
  completeness_summary (ready_for_scope = no P0 missing); apply_answers() ->
  updated profile. Pure field-presence; no scope import.
- 8 tests: <=10 questions for a filled company-profile, known facts not re-asked,
  environmental = trigger questions only (no law evaluation), apply round-trip,
  P0 ordering, ready_for_scope. 41 tests green, mypy clean, LOC ok.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 10:05:27 +02:00
Benjamin Admin 739a477d3f feat(profile): CanonicalProductRegulatoryProfile convergence layer (types + mappers + tests)
ONE canonical product profile so the Go gap engine and the Python reasoning
engine stop diverging ("SPS mit Remote Access" means the same everywhere).
gap.ProductProfile LEADS; the reasoning ProductProfile becomes an adapter/DTO.
Types + mappers only — no regulation logic, no Go changes, no UI, no new questions.

- CanonicalProductRegulatoryProfile mirrors gap.ProductProfile + the Navigator
  gaps the audit found: economic-operator role, radio_module, generates_usage_data,
  lifecycle_phase, structured BOM (ProductComponent), safety-vs-security split,
  machine-vs-component + a forward-looking EnvironmentalImpact domain (wastewater/
  air/chemicals triggers — fields only, no rules yet).
- Mappers: from_product_wizard (lossless), from_company_profile (prefill incl.
  the machineBuilder block), to_gap_profile (emits the unchanged gap JSON shape),
  to_reasoning_profile (projects into the reasoning ProductProfile; AI stays
  delegated to ai-act/ucca). Only profile->reasoning is coupled; reasoning stays
  hermetic.
- 10 tests = the 10 acceptance criteria incl. ProductWizard round-trip lossless,
  markets no longer forced ['EU'], and canonical->reasoning->discover_scope
  proving one semantic profile drives the engine. 33 tests green, mypy clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 09:52:46 +02:00
Benjamin Admin 6673c8052b fix(reasoning): drop "vollständig" from ClaimCoverage wording [F1 final]
"vollständig" still implied fulfillment. potentially_addresses now reads
"… adressiert N Pflichten direkt und M teilweise; K werden durch die Aussage
nicht berührt. … Dies ist keine Konformitätsaussage." Enum value kept
(potentially_addresses chosen over addresses_claimed for product clarity).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 00:49:20 +02:00
Benjamin Admin 5e5002c883 refactor(reasoning): enforce ClaimCoverage (Welt 1) vs ComplianceStatus (Welt 2) boundary [F1]
Architecture-validation finding: the implementation mode produced compliance-
flavored output ("teilweise erfüllt", "covered") from a mere customer claim,
blurring the line to the Execution layer. This is a design decision, not a text
fix — the reasoning layer judges only the customer's STATEMENT, never conformity.

- CoverageStatus -> ClaimCoverage; values are claim-relative + carry "potential":
  potentially_addresses / partially_addresses / does_not_address /
  insufficient_information.
- ImplementationAssessment -> ClaimObligationMapping (coverage_status ->
  claim_coverage); ImplementationResponse -> ImplementationReasoningResponse
  (assessments -> mappings, + explicit `disclaimer`); request renamed; engine
  entry assess_implementation -> reason_implementation_claim.
- Endpoint /reasoning/implementation-assessment -> /reasoning/implementation-reasoning.
- Summary/explanations reworded: "adressiert wahrscheinlich N Pflichten … für
  eine Bewertung der tatsächlichen Umsetzung sind Nachweise erforderlich (keine
  Konformitätsaussage)". No "erfüllt"/"abgedeckt" leaks.
- New guard test asserts no compliance verdict leaks (no "erfüllt"; disclaimer
  separates ClaimCoverage from ComplianceStatus). 23 tests green, mypy clean.

Discovery (scope/obligations) was already structurally claim-free and unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-26 00:37:57 +02:00
Benjamin Admin 1607c89459 feat(reasoning): Regulatory Reasoning Engine MVP (scope/obligations/implementation/interpretation)
Deterministic reasoning layer ON TOP of the Legal Knowledge Graph (obligation
registry) and the Compliance Execution Graph (control mapping/evidence). Answers
which regulations apply to a concrete product, which obligations follow, whether
the customer's implementation covers them, and whether a customer interpretation
is too narrow/broad/plausible.

- ProductProfile with tri-state facts (Optional[bool]=None => uncertain, never
  false security); safe predicate evaluator (no eval).
- 6 regulation triggers (CRA/MaschinenVO/RED/EMV/DataAct/NIS2) with missing-fact
  prompts; 24 obligation scope rules.
- CRA obligation_ids RE-USED verbatim from the registry (93 ids) — never re-minted
  (control_uuid trap); Machine/Data-Act flagged proposed=True.
- required_evidence constrained to the framework-agnostic shared evidence catalog;
  capabilities echo the planned Obligation->Capability layer.
- Overlap groups (CRA<->MaschinenVO cyber-safety) + evidence-for-multiple (USP).
- 4 endpoints POST /reasoning/{scope,obligations,implementation-assessment,
  interpretation-assessment}; thin handlers, registered in api/__init__.py.
- 22 tests (5 machine-builder scenarios + 10 acceptance questions). No DB
  migration, no RAG, no new controls.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-25 19:30:53 +02:00
Benjamin Admin e1b270c36e Add obligation discovery pipeline tooling
Sichert die validierte Obligation Discovery Pipeline aus /tmp als dauerhaftes,
committetes Tooling (scripts/obligation_discovery/) — der eigentliche Vermögenswert.

Stufen: precluster (Embedding-Cache + Mikro-Cluster) → meta_cluster (Review Units,
Skalierungs-Fix) → synthesize_obligations (Opus, Key aus ENV, Streaming, harte Tier-Regel,
Provenance) → validate_registry → merge_review_diff. Reine Helfer in _core.py, 16 Unit-Tests.

Doku docs-src/development/obligation_discovery_pipeline_v1.md mit Meilensteinen
(SBOM/Vuln reproduziert, Auth 4408→170 Review Units→54→kuriert 29) und der Architekturregel:
Runtime deterministisch, Discovery LLM-gestützt.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-25 07:41:45 +02:00
Benjamin Admin c1ea9458a7 Add met_count and recall_limited_obligations to shadow telemetry
Reichert die Obligation-Shadow-Telemetrie um zwei Felder an für die Cross-Firmen-
Auswertung: met_count (abgedeckte Obligations) + recall_limited_obligations (welche
Obligations recall-limitiert sind) — erlaubt die Konzentrations-Analyse über Firmen.

7-Firmen-Shadow: 136 Control-Findings → 29 Obligation-Findings (4,7×); recall_limited
nur 6/29, konzentriert auf third_country/safeguards in 2/7 Firmen → LLM-Fix bounded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 20:15:45 +02:00
Benjamin Admin 0631a98bdd Mark recall-limited obligations in DSE shadow telemetry
Trennt im Shadow drei Kategorien statt eines pauschalen FAILED:
  - echte Lücke (failed_by_current_checker)
  - redundanter Control-FP (kollabiert per OR zu MET)
  - Prüfer-Reichweitenproblem (recall_limited)

obligation_taxonomy.py: decision_method_required=LLM für recipients_disclosed,
third_country_transfer_disclosed, safeguards_disclosed, safeguards_accessible
(versioniertes Registry-Artefakt bis DB-Tabelle, v1-Spec). Empirisch: TeamViewer
0/22 kw+emb trotz erfüllter Pflicht (cos 0.49-0.57) → CONTENT/LLM-Klasse, kein Schwellen-Fix.

compute_obligation_shadow segregiert FAILED/PARTIAL über requires_llm(): teamviewer
5 Findings → 2 echte + 3 recall_limited. 9 neue Unit-Tests (41 gesamt grün).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 13:46:21 +02:00
Benjamin Admin c3542f7dfe feat(dse): obligation shadow telemetry
Verdrahtet die Obligation Aggregation Engine als Layer 4 (SHADOW) in v3_engine:
erzeugt aus den results zusätzlich Obligation-Ergebnisse AUSSCHLIESSLICH für die
Telemetrie. Greift NICHT in results ein — nutzer-sichtbare Findings unverändert.

- _obligation_shadow.py: fetch_obligation_markers (legal_obligations + applicability)
  + compute_obligation_shadow (pure): legacy_control_findings, obligation_shadow_results,
  collapse_factor, na_count, met_failed_delta, top_collapsed_obligations
- met-Signal = Legacy-passed (kein zusätzlicher Prüfer-Call/Key)

E2E (3 Firmen, echte Engine): 57 Control-Findings → 14 Obligation-Findings (4,1×);
Redundanz kollabiert wo Evidenz existiert, echte Lücken bleiben FAILED. 6 Unit-Tests grün.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 12:59:52 +02:00
Benjamin Admin 7ec29999a2 feat(obligation): obligation applicability predicates
Minimaler Applicability-Hook für die Obligation Aggregation Engine: entscheidet
aus dem Dokumenttext, ob eine bedingte Obligation anwendbar ist (True/False/None).

- has_third_country_transfer · uses_legitimate_interest · direct_marketing
  (+ Alias legitimate_interest_or_public_task)
- unbekanntes Prädikat → None → Aufrufer behält Default=anwendbar (fail-safe, nie stille NA)
- profiling/employment/telecom/health/data_act folgen als nächste Charge

Re-Benchmark (Opus-GT, 3 Firmen): Prädikate erkennen Transfer/berecht.Interesse/
Direktwerbung korrekt → keine falsche NA; NA-Flip-Probe bestätigt FEHLT→NA ohne Transfer.
14 Unit-Tests grün.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 12:43:42 +02:00
Benjamin Admin 402a42d30d feat(obligation): obligation-level aggregation engine
Erste Ausführung des Legal Obligation Layer v1: aggregiert Bewertungen auf
Kriterium-/Control-Ebene zu Findings auf Obligation-Ebene
(Regulation → Legal Obligation → Control → Criterion).

- regulierungs-agnostisch (obligation_id/tier/met/legal_basis/conditional)
- fail-safe: LM applicable=false→NA · keine erfüllt→FAILED · alle→MET · Teil→PARTIAL;
  BP/OPT covered→MET sonst OPEN (nie FAILED); LM unbewertbar→UNDETERMINED (Legacy behalten)
- Redundanz-Kollaps per OR pro legal_basis-Anforderung → kein künstliches PARTIAL
- Applicability als Hook (Prädikat-Engine folgt separat)

Shadow-Benchmark (Opus-GT, 3 Firmen): 38 Control-Findings → 13 Obligation-Findings
(2,9×); ~23 redundante Falsch-Positive strukturell korrigiert, echte Lücken erhalten,
PARTIAL=0. 16/16 Unit-Tests grün.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-24 12:28:03 +02:00
Benjamin Admin 067118b12d fix(cascade): give OVH/gpt-oss reasoning headroom so Tier-2 isn't silently dead
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 6s
CI / validate-canonical-controls (push) Successful in 5s
CI / loc-budget (push) Successful in 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 25s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
gpt-oss-120b is a reasoning model: it spends output tokens on chain-of-thought
before the answer. deep_check called _call_ovh with max_tokens=400, which
length-capped it mid-reasoning -> content=null -> the OVH tier returned nothing
and the cascade always skipped Tier-2. Floor the OVH budget to >=2000, fall back
to reasoning_content when content is null, and raise the client timeout to 90s
for the slower reasoning path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 17:37:48 +02:00
Benjamin Admin 5ff08a240b feat(dse): tiered 3-state evaluator + Layer-3 wiring (compliance_tier)
Getierte Auswertung mit compliance_tier-Gating (nur LEGAL_MINIMUM bestimmt
ERFÜLLT/TEILWEISE/FEHLT; BEST_PRACTICE/OPTIONAL → Empfehlungen). Deterministisch-
first: EMBEDDING-Präsenz + gecachter Haiku nur für Sufficiency → reproduzierbar
(löst die gemessene Judge-Varianz). Layer-3 in v3_engine gated auf tiered_criteria,
fail-safe (UNBESTIMMT → Legacy). Offene Kalibrierung: Präsenz-Schwelle (Schritt 2).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 17:37:48 +02:00
Benjamin Admin 3e3644f83d feat(checkers): platform router + Haiku sufficiency tier; cookie is first consumer
Generalise "Embedding finds, Claude decides" into the shared Pruefer-Library:
- router.route_and_check dispatches control -> sensor_classification -> Checker.
- build_spec reads sensor_classification (CONTENT/LLM -> judge=haiku, the
  validated sufficiency tier; the Qwen-first cascade is disproven for sufficiency).
- LLMChecker gains a Haiku-direct tier (reuses the validated deep_check prompt).
- Cookie Layer-3 now routes through route_and_check instead of bespoke code, so
  cookie is the first real router consumer -- proves the architecture end-to-end.

Reproduces the validated result via the shared path: FN 159->14, recall
0.13->0.92, precision 0.89 (vs bespoke 12/0.93/0.90 -- within Haiku noise).
Tests: 10/10 (router dispatch + build_spec + haiku tier + cookie rewire).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 17:37:48 +02:00
Benjamin Admin e809d0bc1c feat(cookie): Layer-3 sufficiency-judge — Haiku re-judges embedding/boost rescues
The embedding/boost auto-rescue is intentionally optimistic (finds the topic, not
fulfilment) -> 159 FN over-rescues vs Opus-GT (recall 0.13). Layer-3 re-judges
exactly the rescued passes with the validated Haiku judge (cohort
cookie_sufficiency_v1 P0.89/R0.91) -- NOT the Qwen-first cascade (local is
disproven as a sufficiency judge) -- and un-passes them when the obligation is
not concretely met. Gated to the full check (not skip_llm).

Measured (5-firm Opus-GT, engine+L3): FN 159->12, recall 0.13->0.93,
precision 0.96->0.90 (276 rescues corrected). "Embedding finds, Claude decides."

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 17:37:48 +02:00
Benjamin Admin 869e7aeb1e fix(cookie): gate non-COOKIE_POLICY controls out of the cookie-policy scan
The cookie agent loaded 100 controls, 11 of which have no COOKIE_POLICY in
applicable_artifacts -- Security/TOM/Audit (PROCESS) or Banner-behaviour
(BEHAVIOR) controls that produce nonsense findings against a cookie policy
(e.g. "TOMs not documented"). Add a cookie classification gate (analogous to the
DSE gate, keyed on COOKIE_POLICY, without the needs_review carve-out since the
artifact signal is decisive and the set is inventory-verified). Controls are
routed out, not deleted. Effect vs Opus-GT: FP 16->11, FN 179->159; the
remaining FN=159 over-rescue is a separate (judge/criteria) question, not routing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-22 17:37:48 +02:00
Benjamin_Boenisch 38a347a82a feat(platform): live-wire AGB v2 + DSE v3 + Architektur-Tab (#29)
CI / detect-changes (push) Successful in 7s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 9s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 24s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m11s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 24s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
AGB v2 (decision_method routing, 71%FP->~0) + DSE v3 (4-layer, recovered from container) + Architektur-Tab into /sdk/agent live path. Incl CI robustness (detect-changes.sh + PR-head checkout) + security (hardcoded Qdrant key removed, gitleaks allowlist).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-21 12:58:26 +00:00
Benjamin Admin 76d1dc5e00 fix(db): dedupe doc_check_controls 3x + unique constraint
CI / detect-changes (pull_request) Failing after 5s
CI / branch-name (pull_request) Successful in 1s
CI / guardrail-integrity (pull_request) Failing after 2s
CI / secret-scan (pull_request) Failing after 5s
CI / dep-audit (pull_request) Failing after 12s
CI / sbom-scan (pull_request) Failing after 3s
CI / build-sha-integrity (pull_request) Failing after 3s
CI / validate-canonical-controls (pull_request) Failing after 1s
CI / loc-budget (pull_request) Has been skipped
CI / go-lint (pull_request) Has been skipped
CI / python-lint (pull_request) Has been skipped
CI / nodejs-lint (pull_request) Has been skipped
CI / nodejs-build (pull_request) Has been skipped
CI / test-go (pull_request) Has been skipped
CI / iace-gt-coverage (pull_request) Has been skipped
CI / test-python-backend (pull_request) Has been skipped
CI / test-python-document-crawler (pull_request) Has been skipped
CI / test-python-dsms-gateway (pull_request) Has been skipped
CI / detect-changes (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 8s
CI / validate-canonical-controls (push) Successful in 6s
CI / loc-budget (push) Successful in 19s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 25s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
compliance.doc_check_controls war auf prod historisch trippliziert
(Dump-Artefakt ohne PK/Unique: jede (doc_type, control_id)-Zeile 3x,
5622 statt 1874 ueber alle 8 doc_types). Die Migration dedupt idempotent
(kleinste ctid behalten) und setzt UNIQUE(doc_type, control_id), damit
sich die Triplikation nicht wiederholen kann.

Auf prod bereits direkt angewandt und in _migration_history registriert
(read-only verifiziert: 1874, alle doc_types total=distinct, Constraint
aktiv); dieser Commit codifiziert die Migration in der Deploy-Kette,
damit ein Restore aus einem aelteren Dump sie automatisch re-appliziert.

[migration-approved]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-20 14:25:03 +02:00
Benjamin Bönisch 43e02f794a feat(cra): SBOM- + DAST-Findings aus dem Scanner-MCP konsumieren
CI / detect-changes (push) Successful in 8s
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 6s
CI / validate-canonical-controls (push) Successful in 10s
CI / loc-budget (push) Successful in 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Successful in 1m4s
CI / iace-gt-coverage (push) Successful in 15s
CI / test-python-backend (push) Successful in 24s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Sharangs compliance-scanner-agent exponiert SBOM (sbom_vuln_report) + DAST
(list_dast_findings) als eigene MCP-Tools (nicht via list_findings). Neuer
fetch_all_findings(repo_id) zieht list_findings + SBOM + DAST in EINER
MCP-Session und normalisiert ins Finding-Schema:
- SBOM: ein Finding pro verwundbarem Paket (nicht pro CVE), cwe=CWE-1395
  -> deterministisch CRA-AI-22 (robust gegen Paketnamen wie "sqlite").
- DAST: cwe/endpoint/vuln_type uebernommen -> Mapping via cwe/keywords.
assess-from-scanner nutzt fetch_all_findings + liefert source.breakdown
(code/sbom/dast). DAST hat im MCP keinen repo_id-Filter -> dast_repo_scoped:false
(deployment-weit, transparent geflaggt).

Echte MCP-Daten: Kitchenasty 58 code + 35 sbom + 81 dast -> 174 gemappt
(Coverage 94,3%, alle 35 SBOM -> CRA-AI-22).

Enthaelt zusaetzlich das Qdrant->Prod-Kopierскript (#42, verbatim macmini->prod).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-18 12:05:05 +02:00
Benjamin Bönisch 8f21650d74 feat(sdk): Kunden-Dokumente + CRA-Meldewesen, Screening aus Frontend genommen
CI / detect-changes (push) Successful in 16s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 15s
CI / validate-canonical-controls (push) Successful in 13s
CI / loc-budget (push) Successful in 25s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m9s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 31s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
- /sdk/dokumente: Kundensicht nur auf veroeffentlichte Rechtsdokumente
  (Ansehen + Download); Proxy mit Allow-List nur /public — Templates/Drafts/
  Generator bleiben unerreichbar.
- /sdk/cra-meldewesen: CRA Art. 14 Meldewesen (24h/72h/14d-Kaskade) mit
  Fristen-Tracking + ENISA-SRP-Export-Entwurf (kein Live-API). Backend:
  cra_meldewesen (pure, getestet) + cra_incident_store (schema-neutral ueber
  compliance_cra_documents) + /api/v1/cra/incidents (additiv, contract-safe).
- Screening (Self-Scan) aus dem Frontend genommen: Flow-Stepper-Eintrag
  ausgeblendet (visibleWhen), Dashboard-Kachel + Import-Button entfernt.
  Repo-Scanning laeuft extern im Compliance-Scanner; Backend-Router bleibt
  vorerst gemountet (Contract-Stabilitaet).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-17 21:21:28 +02:00
Benjamin Bönisch 72093e5501 fix(cra): Scanner-Findings vollstaendig mappen + assess-from-scanner-Latenz senken
CI / detect-changes (push) Successful in 17s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 13s
CI / validate-canonical-controls (push) Successful in 12s
CI / loc-budget (push) Successful in 25s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 30s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
Punkt 2 (Coverage): semgrep/gdpr-Findings ohne CWE blieben unmapped (~21%).
Der Mapper nutzt jetzt den scanner rule_id + gezielte Keywords (gdpr ->
Datenminimierung CRA-AI-17, path-traversal/prototype-pollution -> CRA-AI-20,
nginx-header/Docker-Hardening -> CRA-AI-1/4, insecure-websocket -> CRA-AI-15).
Reale Scanner-Daten: unmapped 19/92 -> 0/92 (Coverage 100%).

Punkt 3 (Latenz): enrich_findings_with_breadth lief ~6 Aggregat-Queries je
(use_case,sub_topic)-Paar, nutzte aber nur die Liste. Jetzt EINE batched Query
(breadth_controls_batch) fuer alle Paare + Prozess-Cache (TTL 1800s). macmini:
cold 0,23s / warm 0,000s. Prod-Root-Cause: atom_classification ohne
(use_case,sub_topic)-Index nach DB-Swap -> Index dem DB-Owner empfohlen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-17 13:17:51 +02:00
Benjamin Admin fda94afd5f fix(cra): prod hang-guard /readiness machinery + robuster Datenblatt-JSON-Parse
CI / detect-changes (push) Successful in 19s
CI / guardrail-integrity (push) Has been skipped
CI / branch-name (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 10s
CI / validate-canonical-controls (push) Successful in 9s
CI / loc-budget (push) Successful in 22s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Has been skipped
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 32s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
#1 _machinery_obligations: SET statement_timeout=4s + run_in_threadpool — auf
   prod hing die maschinen-Query ~30s (langsame/unindizierte DB nach DB-Swap)
   und blockierte den async-Worker. Jetzt: bei Langsamkeit graceful 'keine
   Maschinen-Pflichten' statt Hang. (Fehlender prod-Index = Controls/DB-Session.)
#2 parse_grenzen_json: tolerant ggue. ```json-Fences / Prosa-umschlossenem JSON
   (gehostete Modelle wie OVH ignorieren z.T. response_format) → Datenblatt-
   Extraktion liefert auch ueber den OVH-Fallback Felder.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-17 07:39:39 +02:00
Benjamin Admin 9e2655bfef fix(cra): IACE-Create id-Wrapper + MaschinenVO eigene Sektion
CI / detect-changes (push) Successful in 16s
CI / branch-name (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / secret-scan (push) Has been skipped
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / build-sha-integrity (push) Successful in 12s
CI / validate-canonical-controls (push) Successful in 11s
CI / loc-budget (push) Successful in 24s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m10s
CI / test-go (push) Has been skipped
CI / iace-gt-coverage (push) Has been skipped
CI / test-python-backend (push) Successful in 32s
CI / test-python-document-crawler (push) Has been skipped
CI / test-python-dsms-gateway (push) Has been skipped
1) createProject las proj.id, der Create-Response ist aber {project:{id}} →
   'Projekt anlegen' war kaputt. Jetzt proj.project?.id. E2E verifiziert
   (create→put limits_form→get→delete = 200).
2) MaschinenVO-Sicherheitspflichten wurden in die CRA-Cyber-Buckets
   (Code/Prozess/Doku) gemischt → fehl-kategorisiert (Maschinen-Safety ≠
   CRA-Annex-I-Cyber). Jetzt eigene Response-Liste machinery_guideline +
   eigener Frontend-Abschnitt 'Maschinensicherheit (MaschinenVO 2023/1230)';
   geklebtes 'MaschVO'-Badge entfaellt damit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-17 00:12:52 +02:00
Benjamin Admin fae826e1f7 fix(cra): 35B-Datenblatt-Extraktion — Thinking-Mode aus (think=false)
qwen3.5:35b-a3b ist ein Thinking-Modell → generierte erst Reasoning, riss das
90s-Timeout → leere Extraktion. llm_cascade additiv um think-Param erweitert
(Cache-Key kennt think); Datenblatt-Extraktor setzt think=False → sauberes JSON
in ~1s. Default fuer alle anderen Cascade-Nutzer unveraendert.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-16 20:22:57 +02:00
Benjamin Admin b217429d39 feat(cra): Datenblatt-Extraktion auf lokales 35B + llm_status-Fix
llm_cascade additiv modell-faehig (optionaler model-Param, Cache-Key kennt
model_hint → keine Kollision; Default unveraendert für alle anderen Nutzer).
Datenblatt-Extraktor nutzt jetzt qwen3.5:35b-a3b (CRA_DATASHEET_MODEL, gleiches
Modell wie der Compliance Advisor) für bessere semantische Zuordnung. Plus
llm_status (ok|empty|unavailable) + Logging statt stillem except; Frontend zeigt
bei 'unavailable' einen Hinweis statt leerer Felder (wichtig auf prod ohne
lokales Ollama → Cascade-Fallback bzw. Hinweis).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-16 19:53:48 +02:00
Benjamin Admin cfdc5fe277 feat(cra): Datenblatt→Grenzen-Extraktor (hybrid, lokales 35B)
Hybrid-Extraktion Datenblatt → IACE Grenzen (ISO 12100): deterministischer
Detektor (Schnittstellen/Einheiten per Regex) + lokales 35B via llm_cascade
(Qwen-lokal-first) fuer die semantische Zuordnung auf die echten LimitsFormData-
Keys. Nichts erfinden: Feld nicht im Text → leer + Quellen-Zitat je Feld.
Essenzielle ISO-12100-Felder, die leer bleiben → gezielte Rückfragen
(foreseeable_misuses, person_groups, qualification, temporal_limits …).
Endpoint POST /api/v1/cra/extract-datasheet. 13 Tests gruen (reine Teile).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-16 19:06:07 +02:00
Benjamin Admin 62fafaaec5 feat(cra): MaschinenVO-Gefährdungs-Ableitung + Cyber-Safety-Brücke
3-Tier-MaschinenVO-Verdict (direkt / sicherheitsrelevant / nicht relevant) aus
Personengefährdungs-Signal: eine Komponente ist keine Maschine, aber wenn ihre
Funktion bei Fehler ODER Manipulation Personen gefaehrden kann (Bewegung, Laser/
Auge, Kraft, Temperatur, elektrisch), ist sie sicherheitsrelevant — Pflicht
trifft den Maschinenbauer, Zulieferer liefert Nachweise, und ein Cyber-Angriff
kann die Sicherheitsfunktion aushebeln (Cyber-Safety-Bruecke). OWIS-mit-Laser
landet so korrekt als 'sicherheitsrelevante Komponente'. Engine + /readiness
additiv; Frontend: Gefährdungs-Frage + -Typen, MaschinenVO-Ergebnisblock.
Presets aktualisiert (OWIS: Laser+Bewegung, Zwick: Bewegung). 22 Tests gruen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-16 18:48:52 +02:00