breakpilot-compliance

Author	SHA1	Message	Date
Benjamin Admin	978052b5a2	fix(onboarding): decouple partial/indicative signals from detected — partial no longer removes a question Fix B of the pre-#59 semantic correction. The Silent Pass had only TWO effective states though the data carries three: a `detected` mapping (a concrete artifact) AND a `partial` mapping (an indicative signal, e.g. a CI pipeline -> secure-development-lifecycle) both flowed through capability_ids() and were fed to the Advisor as already-present, so a weak indication silently removed a question and undermined exactly the Welt-1/Welt-2 transparency we want to keep. Now three distinct states: - detected -> reduces the delta immediately (auto_detected, not asked). [unchanged] - partial -> raises assumption strength but does NOT replace the question (surfaced as `indications`, the capability stays in the delta and is still asked). - requirement -> describes a target, never the present state (already handled by Fix A's kind split). Changes (data + thin wiring, no new architecture): - SilentIntakeResult.capability_ids() returns only relationship==detected; new indicative_capability_ids() returns the partial ones. - advisor_start() gains indicative_capabilities (NOT fed into the profile) and surfaces result.indications = indicative ∩ required − auto_detected. - AdvisorResult / AdvisorResponse gain `indications` (additive, contract-safe); the service passes the indicative ids through. Tests: a partial CI signal is indicative-not-detected and does NOT shrink the delta; end-to-end it appears in `indications`, not `auto_detected`, and the gap is still asked. 28 onboarding tests pass, mypy --strict clean on the onboarding modules, demo runs, check-loc 0. Runtime effect -> deploy + smoke.	2026-06-28 16:02:35 +02:00
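The three states described in commit 978052b5a2 can be sketched as plain set arithmetic. This is a minimal illustration, not the repository's code: the `Mapping` class, the free functions and the capability ids are hypothetical stand-ins for `SilentIntakeResult.capability_ids()`, `indicative_capability_ids()` and `result.indications`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Mapping:
    """Hypothetical stand-in for one signal -> capability mapping."""
    capability_id: str
    relationship: str  # "detected" or "partial"


def capability_ids(mappings: Iterable[Mapping]) -> FrozenSet[str]:
    # Only concrete artifacts count as present.
    return frozenset(m.capability_id for m in mappings if m.relationship == "detected")


def indicative_capability_ids(mappings: Iterable[Mapping]) -> FrozenSet[str]:
    # Partial mappings are indications only; they never enter the profile.
    return frozenset(m.capability_id for m in mappings if m.relationship == "partial")


def indications(
    indicative: FrozenSet[str], required: FrozenSet[str], auto_detected: FrozenSet[str]
) -> FrozenSet[str]:
    # indications = indicative ∩ required − auto_detected (formula from the commit message)
    return (indicative & required) - auto_detected


# Illustrative data: an SBOM file was found, a CI pipeline only hints at an SDL.
mappings = [
    Mapping("sbom_creation", "detected"),
    Mapping("secure_development_lifecycle", "partial"),
]
required = frozenset({"sbom_creation", "secure_development_lifecycle", "signed_updates"})

auto_detected = capability_ids(mappings) & required
delta = required - auto_detected  # partial signals do NOT shrink the delta
hints = indications(indicative_capability_ids(mappings), required, auto_detected)
```

In this sketch the partially indicated capability remains in `delta` (so it is still asked) and additionally shows up in `hints`.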
Benjamin Admin	c39787ad96	fix(onboarding): separate observation vs requirement signals — a demanded SBOM is not a present SBOM Semantic correction of the knowledge base BEFORE the empirical loop (#59) is built — otherwise the Observation Store would learn from already-misclassified signals. The Silent Pass conflated two kinds of signal into one: an OBSERVATION ("I saw an SBOM in the repo") and a REQUIREMENT ("a tender DEMANDS an SBOM"). They were aliased to the same canonical id, so a tender clause read as "SBOM already present" and suppressed the very question that should have been asked. Fix — make the kind explicit and authoritative (no new architecture, data + thin wiring): - `kind` ∈ {observation, requirement} on ProducedSignal (producer may declare) and on the canonical SignalVocabularyEntry (AUTHORITATIVE — a mislabelled producer cannot collapse the two). - Vocabulary split: sbom_file_found → sbom_present (obs) + sbom_required (req); security_txt_or_cvd_policy → cvd_policy_present (obs) + psirt_required (req); add signed_updates_required. requirement signals are intentionally UNMAPPED in intake_signal_map (they describe a target, not state). - silent_intake() consumes ONLY kind==observation; requirement signals are preserved in `requirements_seen` (visible/auditable) but NEVER become a detected capability. - normalize_signals() stamps the vocabulary's kind onto every IntakeSignal; unknown ids still pass through. This is the same Observation-vs-Requirement split the Requirements Verification Platform rests on: observations are reality, requirements are targets, and their comparison is the delta. A tender / OEM spec / law now produces requirement signals; scanners / repos / documents produce observation signals. 
Tests: rewrote the two test_signal_producer cases that previously ASSERTED the bug (tender == repo) to pin the correct split; regression — `requires_sbom` yields no capability + stays in requirements_seen while `cyclonedx_found` still detects sbom_creation; endpoint-level regression that a tender requirement does not auto-detect and the gap stays asked; vocabulary-kind-overrides-mislabelled-producer. 25 onboarding tests pass, mypy --strict clean, demo runs, check-loc 0. Runtime effect → deploy + smoke. (Fix A; partial-vs-detected decoupling follows as Fix B before #59.)	2026-06-28 15:52:50 +02:00
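The observation-vs-requirement split in commit c39787ad96 can be illustrated with a small sketch. It is written under assumptions: the `VocabularyEntry` and `IntakeSignal` shapes, the single-signal `normalize_signal` helper, the alias assignment for `spdx_found` and `sbom_uploaded`, and the handling of unknown ids (keeping the producer's declared kind) are hypothetical. Only the canonical ids, the `requires_sbom` / `cyclonedx_found` behaviour and the `requirements_seen` name come from the commit message.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class VocabularyEntry:
    kind: str                # "observation" or "requirement" (authoritative)
    aliases: FrozenSet[str]  # producer dialects that reduce to this canonical id


# Hypothetical excerpt of the signal vocabulary after the split.
VOCABULARY: Dict[str, VocabularyEntry] = {
    "sbom_present": VocabularyEntry(
        "observation", frozenset({"cyclonedx_found", "spdx_found", "sbom_uploaded"})
    ),
    "sbom_required": VocabularyEntry("requirement", frozenset({"requires_sbom"})),
}

# Requirement signals are intentionally unmapped: they describe a target, not state.
INTAKE_SIGNAL_MAP: Dict[str, str] = {"sbom_present": "sbom_creation"}


@dataclass(frozen=True)
class IntakeSignal:
    signal_id: str
    kind: str


def normalize_signal(signal_id: str, declared_kind: str = "observation") -> IntakeSignal:
    for canonical, entry in VOCABULARY.items():
        if signal_id == canonical or signal_id in entry.aliases:
            # The vocabulary's kind overrides whatever the producer declared.
            return IntakeSignal(canonical, entry.kind)
    # Unknown ids pass through; keeping the declared kind is an assumption here.
    return IntakeSignal(signal_id, declared_kind)


def silent_intake(signals: List[IntakeSignal]) -> Tuple[Set[str], List[str]]:
    detected: Set[str] = set()
    requirements_seen: List[str] = []
    for signal in signals:
        if signal.kind == "requirement":
            requirements_seen.append(signal.signal_id)  # visible, never detected
        elif signal.signal_id in INTAKE_SIGNAL_MAP:
            detected.add(INTAKE_SIGNAL_MAP[signal.signal_id])
    return detected, requirements_seen
```

A mislabelled tender signal such as `normalize_signal("requires_sbom", declared_kind="observation")` still comes out as a requirement, so it lands in `requirements_seen` and the SBOM question stays open.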
Benjamin Admin	a4123ace71	feat: POST /onboarding/advisor-start — expose the Smart Onboarding Advisor at runtime (#58 ) This exposes the existing Smart Onboarding Advisor through a runtime endpoint; it does not add new reasoning logic. Tightly scoped: adapter boundary + endpoint, no big frontend, no persistence, no empirical learning, no new scanners, no LLM. POST /onboarding/advisor-start : (company + certifications + target + scanner_findings[ProducedSignal]) -> Normalizer -> Silent Knowledge Pass -> Advisor -> { silent_intake_summary, inferred_assumptions, rejected_assumptions, top_5_questions, capability_delta, top_measures, evidence_requests, completeness_summary, auto_detected, headline } GET /onboarding/targets : the supported target ids (CRA, TISAX, MDR, Environmental) compliance/services/onboarding_service.py is the app-caller: it loads the curated knowledge (hypothesis library, signal vocabulary + map, the target's required capabilities) once and calls the pure, tested orchestration (normalize_signals -> silent_intake -> advisor_start). The scanner ADAPTER boundary is the ProducedSignal format the request carries — existing scanners emit it, no new scanners. Thin handler (<30 LOC), registered in the auto-load list. No DB. Additive to the OpenAPI contract (contract test is additive-friendly; baseline regenerates on CI/py3.12). First deployable runtime feature -> dev deploy + smoke. mypy --strict clean, 22 onboarding tests pass, check-loc 0.	2026-06-28 15:14:00 +02:00
Benjamin Admin	c2c8f7e424	feat: Signal Producer interface + Normalizer — one signal language for all sources (before #58) Not scanner stubs — the scanners exist. The Silent Pass needs only their UNIFIED output. This adds the small common DATA FORMAT (not a new module/framework) the user asked for, exactly the Requirement-Source / MCAP / regulation-alias pattern: many inputs, one language. Producer A / B / C -> normalize_signals (vocabulary: id + aliases) -> canonical IntakeSignal -> Silent Pass - ProducedSignal {signal_id, source_type, confidence, evidence, provenance} = what ANY source emits (website scanner, repo scanner, PDF parser, tender parser, API, the user). - knowledge/onboarding/signal_vocabulary.yaml reduces producer dialects to a canonical signal: "SBOM present" arrives as cyclonedx_found / spdx_found / sbom_uploaded / requires_sbom (tender) — all become `sbom_file_found`. The Silent Pass cannot tell where it came from -> no per-scanner special logic, ever. - Unknown signals pass through (a new producer stays visible). confidence/evidence/provenance flow to the detected capability for the audit trail. A tender that "requires SBOM" now produces the same effect as a repo that HAS one — fits Vision V2 (Requirement Source over Regulation). Endpoint (#58) then has its final shape: POST -> Producers -> Normalizer -> Silent Pass -> Profile -> Delta -> Questions -> Roadmap. Non-runtime -> no deploy. mypy --strict clean, 14 onboarding tests pass, check-loc 0.	2026-06-28 14:49:57 +02:00
Benjamin Admin	9c33582412	feat: Silent Knowledge Pass — recognise before asking (Phase 0, before the endpoint) Not the endpoint yet — the bigger knowledge lever first. The Advisor can say "I need 5 answers" but does not yet decide what it can find out by ITSELF. The Silent Knowledge Pass runs in front of the Advisor and, from signals existing scanners/parsers already produce (website, repository, documents, product data), deterministically derives capabilities the company demonstrably HAS + product facts that drive scope — so every recognised item shrinks the delta and removes a question. compliance/onboarding/silent_intake.py: silent_intake(signals, signal_map) -> detected_capabilities (+ evidence already in hand) + product_facts. The signal->conclusion map is curated DATA (knowledge/onboarding/intake_signal_map.yaml), signals are injected (scanners are upstream). Pure, deterministic, no LLM. advisor_start gains detected_capabilities (folded into the profile at HIGH confidence -> covered, not asked) and an auto_detected result + headline. The experience flips from a question wall to "we already recognised 4 capabilities, 2 product facts and have 4 pieces of evidence in hand — only these few remain". Order now: Silent Pass -> #58 endpoint/frontend -> #59 empirical loop. NOT new architecture, just an orchestration step in front. Non-runtime (no app caller) -> no deploy. 15 onboarding tests pass, mypy --strict clean, check-loc 0.	2026-06-28 14:34:27 +02:00
Benjamin Admin	98d616d82b	feat: Observation Model — the empirical learning unit, defined BEFORE persistence (Task 59a) The learning point is not the hypothesis, it is the QUESTION — and confirmed/refuted is too coarse. "partial, only critical suppliers" or "certified but not lived" are not "wrong", they are valuable knowledge. So the chain is Hypothesis -> Question -> Observation -> (Review) -> Hypothesis, and the observation model must be defined cleanly before any store/API (else thousands of too-coarse observations get migrated later). compliance/onboarding/observations.py: - ObservationType: confirmed / partial / refuted / not_applicable / unknown (richer than binary). - Observation: {hypothesis_id, capability, question, answer (free text), observation_type, scope_note ("only critical suppliers"), evidence_uploaded, reviewed, reviewed_by}. - empirical_distribution() -> a DISTRIBUTION (confirmed 61 / partial 31 / refuted 8), not one %. - empirical_confidence() -> (confirmed + 0.5*partial) / (confirmed+partial+refuted); n.a./unknown excluded; None until calibrated. - REVIEW GATE: only reviewed observations calibrate — a raw answer never changes a hypothesis (no learning from outliers). Refactor: the hypothesis is now PURE curated knowledge — the binary observations counter and any confidence are removed from CapabilityHypothesis and the YAML; confidence is COMPUTED from the separate reviewed observation stream. Pure, mypy --strict clean. Persistence/aggregation/calibration are 59b/c/d. Non-runtime -> no deploy. 12 tests pass, check-loc 0.	2026-06-28 13:31:43 +02:00
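The confidence formula in commit 98d616d82b can be worked through with a short sketch. The `Observation` class here is reduced to the two fields the calculation needs, and counting only reviewed observations in the distribution is an assumption drawn from the review gate; the real `compliance/onboarding/observations.py` may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Observation:
    """Reduced, hypothetical version of the observation record."""
    observation_type: str  # confirmed / partial / refuted / not_applicable / unknown
    reviewed: bool


def empirical_distribution(observations: Iterable[Observation]) -> Counter:
    # Review gate: a raw, unreviewed answer never calibrates a hypothesis.
    return Counter(o.observation_type for o in observations if o.reviewed)


def empirical_confidence(observations: Iterable[Observation]) -> Optional[float]:
    dist = empirical_distribution(observations)
    # not_applicable and unknown are excluded from the denominator.
    denominator = dist["confirmed"] + dist["partial"] + dist["refuted"]
    if denominator == 0:
        return None  # not calibrated yet
    return (dist["confirmed"] + 0.5 * dist["partial"]) / denominator


# The distribution quoted in the commit message: confirmed 61 / partial 31 / refuted 8.
sample = (
    [Observation("confirmed", True)] * 61
    + [Observation("partial", True)] * 31
    + [Observation("refuted", True)] * 8
    + [Observation("refuted", False)] * 5       # unreviewed: ignored
    + [Observation("not_applicable", True)] * 3  # excluded from the ratio
)
confidence = empirical_confidence(sample)  # (61 + 0.5 * 31) / 100 = 0.765
```

With that distribution the confidence works out to 0.765, and an empty or fully unreviewed stream yields `None` rather than a misleading 0.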
Benjamin Admin	2d2cb2a244	feat: Certification Capability Hypotheses — capability-centric library + empirical confidence The bottleneck is knowledge, not the endpoint. This builds the knowledge the Onboarding Advisor needs, restructured per the user's key insight: NOT "ISO27001 -> 30 capabilities" but each hypothesis as its own object "capability -> supported_by: [certs]". A capability is written ONCE with all supporting certs, so the shared management-system core (document control, incident, supplier, audit, access, asset, monitoring, training, crypto, release, risk) covers most certifications with ~18 hypotheses instead of ~300 — and multi-certification merges AUTOMATICALLY (a company's inferred caps = every hypothesis whose supported_by intersects its certs). Welt-1 throughout: "IF cert present, EXPECT capability (verification required)", never "fulfilled". Capabilities NO cert suggests (SBOM, signed updates, CVD, support period) have no hypothesis -> they stay in the delta and get asked. confidence is EMPIRICAL: computed from real-onboarding observations (confirmed/(confirmed+refuted)), None until calibrated — never an LLM/expert score (record_observation + empirical_confidence). The long-term moat: knowledge that learns from reality, not from a norm. compliance/onboarding/hypotheses.py (resolve_for_certifications / inferred_hypotheses / empirical_confidence / record_observation) feeds the existing advisor_start unchanged; the demo now runs on the curated library. Pure, mypy --strict clean, library is DATA (no norm text, no real names). Non-runtime -> no deploy. 12 tests pass, check-loc 0.	2026-06-28 13:16:45 +02:00
Benjamin Admin	3ba90f49cf	feat: Smart Onboarding Advisor — make the knowledge usable in onboarding (ADR-012) The user-named "right next runtime step": stop building knowledge, start using it automatically in onboarding — no sales training, no regulation picking. compliance/onboarding/ is an ORCHESTRATOR (not a new engine) wiring Company 2A -> RS-005 -> optimization -> completeness: advisor_start(input, cert_hypotheses, target_requirements, ...) -> AdvisorResult From (company + products + certifications + target) it returns inferred_assumptions, rejected_assumptions, next_best_questions (<=5, ranked by information_gain + leverage + unknown_high_risk + evidence_missing, each self-explaining), capability_delta, top_measures, evidence_requests, unsupported_domains, completeness_summary. apply_answer() updates the profile (delta shrinks). Welt-1 throughout: certificates REDUCE questions but satisfy nothing automatically (verification_required); relevance(evidence,target) keeps ISO 14001 out of the CRA result. Certificate->capability hypotheses + target requirements are INJECTED (curated knowledge, outsourced; not in code). All 7 acceptance criteria pass; mypy --strict clean. First app-caller wiring the engines into a product flow — still no endpoint/persistence, so 0 runtime effect -> no deploy yet (deploys when POST /onboarding/advisor-start + frontend are wired). check-loc 0.	2026-06-28 12:45:49 +02:00
Benjamin Admin	a98076196b	feat: Capability Convergence Explanation — why the registry converges + Core/Domain (Phase Ω) The mature step after Medical is not the next domain but understanding WHY the registry converges. Three derived views over existing data (no ML, no new architecture): 1. Why converge? — a domain matrix per cross-domain MCAP + a curated REASON (the moat: not "MCAP-X exists" but "why MCAP-X must exist": software product / supply chain / product operation / universal process). 2. Capability Families — ~75 MCAPs collapse to ~15 curated families (knowledge/capability_families/ families.yaml), each with the reason it is universal or domain-specific. 3. Core vs Domain — a COMPUTED property (not a new class): Core recurs across >=2 independent domains AND source types; Domain stays in one. Medical made it obvious (new medical caps are nearly all Domain; update/SBOM/access/logging are Core). Non-runtime -> no deploy. 4 tests pass, check-loc 0.	2026-06-28 12:26:22 +02:00
Benjamin Admin	80f2e2f619	feat: Medical stress test (safety+security coupled) + Missing Convergence report (Phase Ω #3 ) Medical before Payment: the harder scientific test (safety AND security coupled, full lifecycle, deep risk/evidence demands). ISO 13485 runs through the SAME engine as ISO 27001 -> CRA, only new data, 0 runtime. The key result: IEC 81001-5-1 (health-software security) pulls in the SAME security MCAPs as the CRA, so Medical REUSES cyber capabilities (the safety/security coupling appears as capability reuse) while adding 7 genuinely new medical caps (clinical evaluation, software safety classification, ISO 14971 risk file, benefit-risk). rejected_assumptions intact. Effect on the convergence core: secure_signed_update_distribution 18 -> 24 and technical_vulnerability_management 17 -> 23, now spanning 3 domains (cyber + industrial + medical) — the core visibly GROWS, exactly the convergence signal. New 5th report: MISSING CONVERGENCE — deterministic (no ML) token-cluster detector for potential structural duplications: a name token shared by >=3 MCAPs across >=2 distinct sources is flagged for EXPERT REVIEW (never auto-merged). Surfaces e.g. the `risk` cluster (6 risk MCAPs across 6 sources) and `security`/`software`; single-source decompositions are filtered out. Complements Suspicious by looking at cross-source duplication, not single MCAPs. Also records the durable modelling rule extracted from the frequency fix: evidence is attributed to its ORIGIN; its value against a target is computed later (relevance(evidence,target)). Ledger now 8 sources, Architecture Stability 8/8 = 100%. Non-runtime -> no deploy. 29 tests pass, check-loc 0.	2026-06-28 12:09:52 +02:00
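The Missing Convergence rule from commit 80f2e2f619 (a name token shared by >=3 MCAPs across >=2 distinct sources is flagged for expert review) can be sketched as follows. The tokenisation on underscores, the stop-word list and the example registry are assumptions made for illustration; the actual detector is not shown in the log.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical filler tokens that should never form a cluster on their own.
STOP_WORDS = {"and", "of", "the", "for"}


def missing_convergence(
    mcaps: Dict[str, Set[str]], min_mcaps: int = 3, min_sources: int = 2
) -> Dict[str, List[str]]:
    """Map each flagged name token to the MCAP names sharing it.

    `mcaps` maps a capability name to the set of sources that require it.
    Flagged clusters are review candidates only and are never merged here.
    """
    names_by_token: Dict[str, Set[str]] = defaultdict(set)
    for name in mcaps:
        for token in set(name.split("_")) - STOP_WORDS:
            names_by_token[token].add(name)

    flagged: Dict[str, List[str]] = {}
    for token, names in names_by_token.items():
        sources: Set[str] = set()
        for name in names:
            sources |= mcaps[name]
        # Single-source decompositions are filtered out by the source threshold.
        if len(names) >= min_mcaps and len(sources) >= min_sources:
            flagged[token] = sorted(names)
    return flagged


# Illustrative registry excerpt (capability names partly hypothetical).
registry = {
    "product_cyber_risk_assessment": {"CRA"},
    "machine_safety_risk_assessment": {"MaschinenVO"},
    "medical_risk_file": {"ISO 13485"},
    "sbom_creation": {"CRA"},
}
clusters = missing_convergence(registry)
```

On this excerpt only the `risk` token is flagged: `assessment` is shared by just two capabilities, and three `risk` capabilities from a single source would fall below the source threshold.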
Benjamin Admin	c160bb8291	feat: Cross-Domain MCAP Convergence Analysis — which capabilities carry the system (Phase Ω pause) After Automotive, pause on domains and ask the deeper question: not "which MCAPs occur most often?" (frequency deceives) but "which MCAPs CARRY the largest part of the system?". A deterministic MCAP Impact Score (no AI) aggregates over the EXISTING data only: Impact = distinct Sources + Target Types + Domains + Journeys + Regulatory + Business Leverage Critically anti-frequency-deception: a `likely_covered` cap is attributed to its source CERT (one source), not to every target regulation — otherwise generic management caps win on raw frequency. With that fix the Core surfaces the true cross-cutting nodes: secure_signed_update_distribution (18), technical_vulnerability_management (17), access_control, incident_management, sbom_creation, product_cyber_risk_assessment — exactly the bridges the user predicted; the high-frequency single-domain environmental management caps correctly drop out. Four reports, pure aggregation (no runtime, no new architecture): Core (highest impact), Emerging (>=2 domains), Isolated (1 source/journey — specialised or convergence-not-yet-seen), Suspicious (too coarse: generic verbs; too fine: hyper-specific isolated names) — an abstraction-level review tool for domain experts. 11/62 caps already reach impact >=8; the method is ready to reveal whether a 30-50 MCAP core forms as Medical/Payment arrive. Non-runtime -> no deploy. 5 tests pass, check-loc 0.	2026-06-28 11:48:04 +02:00
Benjamin Admin	90c3fe16b5	feat: Automotive convergence stress test — same capability from many sources (Phase Ω #2) Not another domain to prove agnosticism (Environmental did that) but a DIFFERENT property: can the SAME capability be fed by many overlapping Requirement Sources at once without the model becoming unstable? Realistic setup — a supplier with ISO 9001 + IATF 16949 + TISAX + ASPICE + CSMS + SUMS developing an ECU for OEM X. Seven sources (CRA, UNECE R155/CSMS, R156/SUMS, IATF, TISAX, ASPICE, OEM X) with deliberate overlap, run through the SAME engine (0 runtime code, data only). Three new measurements (user-requested): - Capability Convergence: technical_vulnerability_management = 4 sources across 3 source TYPES (regulation + certification + contract); secure_signed_update_distribution = 4 sources. The overlap is where the economic value lives ("one capability replaces five evidence worlds"). - Existing-vs-New: 13/27 required caps reuse existing cyber/environmental MCAPs (48%) -> the registry is starting to converge; the automotive-specific rest (CSMS/SUMS/ASPICE/functional safety) is expectedly new (a maturity hint, not an architecture break). - Business Leverage: a convergent capability satisfies N regulations AND unlocks the OEM market — more convincing to a managing director than "satisfies five laws". (Regulatory Leverage counts regulations; Business Leverage counts regulations + markets/customers.) Ledger gains the automotive row (0/0, 14 new types, data_only); stability stays 7/7 = 100%. The verdict recommends the user's next step: NOT a new domain but PAUSE and analyse the registry for the cross-domain high-convergence core MCAPs. Non-runtime -> no deploy. 12 tests pass, check-loc 0.	2026-06-28 11:30:30 +02:00
Benjamin Admin	fbbd0957bd	feat: Environmental stress test — the architecture works OUTSIDE cyber (Phase Ω, data-only) First NON-cyber stress test. Every prior journey was cyber (infosec/software/product security). Environmental brings a completely different mental model (substance flows, emissions, water, chemicals, energy, circularity). The claim under test: RS-005 carries it UNCHANGED — only new DATA, zero runtime code. ISO 14001 (an EMS) is modelled as a Company Profile and run through the SAME engines as ISO 27001 -> CRA (new pattern transition_pattern_iso14001_to_environmental_v1.yaml, capabilities as VERBS): - ISO 14001 yields 5 environmental MANAGEMENT capabilities (Welt-1, probably present) - the concrete substance/emission/water/material EVIDENCE is the 11-capability delta - rejected_assumptions state what ISO 14001 does NOT produce (substance lists, REACH, emissions, battery passports, water analyses) — preserving the Welt-1/Welt-2 separation - the Journey Matcher stays domain-agnostic: ISO14001->Environmental 100%, cyber journeys 0% Result: a non-cyber domain ran through Reality -> ... -> Journey with 0 new runtime classes and 0 new pipeline — a stronger generality proof than ten more cyber regulations. Also extends the Architecture Stability ledger with the third KPI column the user requested — "new capability types" — as a granularity early indicator (a domain needing ~80 new types at 0 runtime would flag a too-coarse/too-fine capability model). Environmental = 16 types (5 mgmt + 11 evidence), in range. Ledger now flags cyber vs non_cyber family. Non-runtime -> no deploy. 19 tests pass, check-loc 0.	2026-06-28 11:10:07 +02:00
Benjamin Admin	cefacb87af	feat: Architecture Stability + Knowledge Velocity KPI — Phase Ω (Evidence of Generality) The focus has shifted: no more architecture epics (the Journey Matcher was the last building block). The question is no longer "can the architecture do this?" but "where does it fail under real domain knowledge?". This operationalises the two KPIs almost nobody measures, as a non-runtime, auditable ledger: - Architecture Stability : per integrated Requirement Source — new runtime classes? new pipeline? - Knowledge Velocity : can a domain EXPERT integrate a source data-only, without a developer? A new domain is a ROW in knowledge/architecture_stability/integration_ledger.yaml (data), never a code change — so the KPI improves by adding data, which IS the proof. Current state: 6 sources across 5 target types (CRA, MaschinenVO, TISAX, Tender, OEM, Environmental) = 6/6 = 100% stability and 100% data-only. The pipeline functions are listed honestly as one-time, domain-agnostic infrastructure (now frozen), so the KPI cannot be gamed. The test is a LIVING GUARDRAIL: it fails the day a source needs runtime code, surfacing the exact moment generality breaks. Non-runtime -> no deploy. 5 tests pass, check-loc 0.	2026-06-28 10:49:00 +02:00
Benjamin Admin	80bf1993e0	feat: Journey Matcher — the delta explains the journey (Delta -> Journey, ADR-011) The sanctioned last architectural building block. Reverses the order: not Goal -> Journey -> Delta but Goal -> Required -> Delta -> Journey. A Journey is the EXPLANATION of the Capability Delta, not its cause — so this is a Matcher/Explainer, not a Selector. New module compliance/journey_matcher/ = the third independent, interchangeable function of the pipeline, beside Company 2A (Evidence -> Capability) and RS-005 (Capability -> Delta): match_journeys(delta, journeys, context) -> ranked, auditable explanation - Looks ONLY at the Capability Delta — never at certificates, regulation, tenders or the goal. Journey signatures are certificate-agnostic capability clusters (Input -> Output pattern). - score = share of the delta a journey explains (recall over the missing capabilities); journey_only documents where a journey reaches beyond the delta so a broad journey is not silently preferred. - Deliberately dumb + deterministic (pure set overlap; NO ML/embeddings/LLM), fully auditable (matched / unexplained / journey_only / context signals); a learning ranker can sit on top later. - Signatures injected, engine hermetic. mypy --strict clean. Validated on the real patterns (demo): a CRA+MaschinenVO delta ranks the convergence journey 100%, "ISO27001 -> CRA" 56% (misses the machine-safety caps), "ISMS -> TISAX" 0%. This resolves the "Scope -> Journey" jump from Customer Mission #1. Freeze exception explicitly authorised; non-runtime -> no deploy. 12 tests pass, check-loc 0.	2026-06-28 10:36:43 +02:00
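The scoring rule in commit 80bf1993e0 (score = share of the delta a journey explains, with `journey_only` recording where a journey reaches beyond the delta) is pure set overlap and can be sketched briefly. The `JourneyMatch` shape, the tie-break by journey id, the omission of the `context` argument and the example capability ids are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class JourneyMatch:
    """Hypothetical result record; field names follow the commit message."""
    journey_id: str
    score: float                 # recall over the missing capabilities
    matched: FrozenSet[str]      # delta capabilities the journey explains
    unexplained: FrozenSet[str]  # delta capabilities it does not cover
    journey_only: FrozenSet[str]  # journey capabilities outside the delta


def match_journeys(
    delta: FrozenSet[str], journeys: Dict[str, FrozenSet[str]]
) -> List[JourneyMatch]:
    results = []
    for journey_id, signature in journeys.items():
        matched = delta & signature
        score = len(matched) / len(delta) if delta else 0.0
        results.append(
            JourneyMatch(journey_id, score, matched, delta - signature, signature - delta)
        )
    # Deterministic ranking: best explanation first, ties broken by id.
    return sorted(results, key=lambda m: (-m.score, m.journey_id))


# Illustrative delta and signatures; the numbers are not those of the real demo.
delta = frozenset({"sbom_creation", "signed_updates", "machine_risk", "guards"})
journeys = {
    "convergence": frozenset({"sbom_creation", "signed_updates", "machine_risk", "guards"}),
    "cyber_only": frozenset({"sbom_creation", "signed_updates", "psirt"}),
    "unrelated": frozenset({"supplier_audit"}),
}
ranking = match_journeys(delta, journeys)
```

Here `convergence` explains the whole delta (1.0), `cyber_only` explains half of it and reports `psirt` as `journey_only`, and `unrelated` scores 0.0.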
Benjamin Admin	dbf7b9b587	feat: Customer Mission #5 — a non-security target, evidence relevance flips both ways Closes the Evidence-Relevance(Target) claim by testing it on a deliberately NON-security target (a hand-authored environmental / material-evidence Required set — no corpus, no ISO-14001 norm model, no new module). One company profile, three targets through the same engine: - ISO 14001: keine (CRA) / keine (TISAX) / HOCH (environmental) <- flips - ISO 27001: hoch (CRA) / hoch (TISAX) / keine (environmental) <- flips the other way - PSIRT: hoch (CRA) / keine (TISAX) / keine (environmental) Proves relevance(evidence, target) is two-sided: no evidence is relevant "in itself"; relevance only arises against a target -> it must be computed, never stored as an attribute of the evidence. With this, the target-type diversity for the later selector is complete (Regulation · Certification · Contract/Tender · OEM-Spec · Environmental/Material) — five target types through one engine, so a Scope→Journey selector finally makes sense. Synthetic, no real names. Non-runtime -> no deploy. 5 tests.	2026-06-28 10:18:28 +02:00
pilotadmin	5cba0504df	Merge pull request 'Customer Mission #4 — a second, different contract target' (#32) from feat/customer-mission-4-second-contract into main	2026-06-28 09:54:42 +02:00
pilotadmin	77d6bc5551	Merge pull request 'Customer Mission #3 — one profile, three target types' (#31) from feat/customer-mission-3-target-types into main	2026-06-28 09:54:21 +02:00
Benjamin Admin	b71771e52e	feat: Customer Mission #4 — a second, different contract target (no tender-special-logic) One contract example (Mission #3's public tender) is not enough to safely generalise: it risks baking tender-shaped assumptions into the later Scope→Journey selector. This mission runs TWO deliberately different contract sub-types against the same company through the IDENTICAL engine: - public tender (procurement: pentest report, references, support SLA, SBOM) -> delta 4 - private OEM spec (Lastenheft: CSMS, functional safety, SUMS, ASPICE) -> delta 3 The two deltas are completely DISJOINT (no shared missing capability), proving the contracts are genuinely different — yet there is no per-contract code: assess_transition treats each as a plain Required set, exactly like a regulation or a certification. Evidence-Relevance is target-relative even between two contracts (TISAX worth more to the automotive OEM than to the generic tender). Conclusion: "Contract" as a requirement source is now covered by >=2 diverse cases, so the later selector can treat any contract uniformly. Synthetic company + synthetic contracts (NO real names). Non-runtime -> no deploy. 5 tests pass.	2026-06-28 09:42:31 +02:00
Benjamin Admin	256bb0607d	feat: Customer Mission #3 — one profile, three target TYPES (Requirements Verification proof) Proves the next thing after Mission #2: the pipeline is target-type-agnostic. One company profile runs against THREE deliberately different target types through the identical engine (assess_transition): - CRA (Regulation) -> delta 8 - TISAX (Certification) -> delta 3 - public tender (Contract, synthetic) -> delta 4 A regulation, a certification and a contract all reduce to required capabilities; Profile − Required = Delta does not care which. That is the Requirements Verification Platform: the requirement SOURCE is swappable, the pipeline stays. Makes Evidence-Relevance(Target) concrete: the same evidence is worth a different amount per target. PSIRT = hoch(CRA)/keine(TISAX)/mittel(tender); ISO 14001 = keine against all three security targets but would be hoch against an environmental target. Relevance is a function of the target, not an attribute of the evidence. Also: cross-target-TYPE convergence (8 capabilities satisfy >=2 of the 3 target types) — the leverage one level above law-convergence. Synthetic company + synthetic tender (NO real names). Non-runtime -> no deploy. 5 tests pass.	2026-06-28 09:30:28 +02:00
Benjamin Admin	ff9a66fb72	chore: regenerate Customer Mission #1 snapshot — IACE machine-safety playbooks close the gap (2/12 -> 7/12) The 4 machine-safety playbooks (+ CE conformity) delegated to the IACE session now exist, so Mission #1's end-to-end run finds content for them. Generated artifact only; non-runtime.	2026-06-28 09:10:31 +02:00
Benjamin Admin	dfb2c6dfdb	Merge remote-tracking branch 'origin/feat/iace-machinery-playbooks' into feat/customer-mission-2-multicert	2026-06-28 09:08:54 +02:00
Benjamin Admin	16d6ad4122	feat(knowledge): CE conformity + technical documentation playbook draft (machinery) 5th machinery-safety playbook, capability ce_conformity_assessment_and_technical_documentation — referenced by the ISO27001->CRA+MaschinenVO transition pattern and listed as content-missing. Covers MaschVO conformity assessment (Annex XI), technical file (Annex IV), EU declaration (Annex V) and CE marking; notes the CRA<->MaschinenVO integrated technical file. status: draft, with canonical_action verb. New file only -> non-runtime, no deploy, conflict-free ride-along. capability_id unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 09:02:06 +02:00
Benjamin Admin	3856bb3a4f	feat: Customer Mission #2 — the company arrives with a PROFILE, not a journey Second mission, deliberately different from #1: a highly-certified company (ISO 9001 + ISO 27001 + ISO 14001 + TISAX + CE + PSIRT) asking "what do WE still need for the CRA?". Stresses Mission #1's one open seam (Scope → Journey) and proves the reframe with the real engines: - The start is a Company Capability Profile (certs aggregated), NOT a single cert→target journey. Certifications are OBSERVATIONS feeding the profile. - Evidence is target-relative: ISO 14001 is in the profile but irrelevant to the CRA; PSIRT covers two CRA-delta capabilities. More evidence = smaller delta (12 → 9). - The "journey" is the computed delta (Profile, Target) — not a thing a selector picks. This SHRINKS Mission #1's jump: the seam is profile-intake + target-pick, not a journey-matcher engine. There is no "ISO 27001 → CRA"; only "Profile → CRA". Records the 5 per-mission selection-rationale questions (which journey/why/decisive info/model-extended?/new-parameter?). Selector input = (Company Profile, Target), which collapses the 2^N cert-combination explosion. Non-runtime (reference_scenarios + tests only) -> no deploy. 6 tests pass; check-loc 0.	2026-06-28 09:00:51 +02:00
Benjamin Admin	0b962b41fa	feat(knowledge): 4 machinery-safety implementation playbook drafts (Reasoning delegation) Fulfils the board delegation Reasoning -> IACE (line 45): expert FIRST DRAFTS for the 4 MaschinenVO capabilities the Reference-Suite playbook dashboard lists as "content missing": machine_safety_risk_assessment (ISO 12100), mechanical_safety_and_guards (ISO 14120/14119/13850/13849), operating_instructions_and_safety_information (ISO 12100 6.4 / IEC 82079), protection_against_corruption_of_safety_functions (MaschVO Annex III 1.1.9 = the CRA<->MaschinenVO cyber-safety bridge). Schema per knowledge/implementation_playbooks/README.md. status: draft (expert draft, non-normative). Includes the optional canonical_action verb-formulation (capability-is-a-verb experiment). New files only -> non-runtime, no deploy, conflict-free ride-along. Capability ids unchanged (Execution registry contract). Owner verifies + integrates. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 08:47:56 +02:00
Benjamin Admin	98f67e75d9	feat(mission): Customer Mission #1 — the platform as one connected expert system (end-to-end) Turn the architecture inside-out: instead of refining classes/registries/journeys, force the whole platform to behave as ONE expert system and run a real consulting project end-to-end — measuring how often the consultant has to "jump" (special-case glue instead of a clean engine-to-engine handoff). A Reference Scenario asks "is the knowledge correct?"; a Customer Mission asks "can a customer WORK with it?". This is the last big architecture test before broad corpus expansion. - reference_scenarios/mission_machine_builder.py: a synthetic machine builder (ISO9001 + ISMS + CE + PLC + remote maintenance + cloud + 80 devs + EU; no real names) asks "what must I do in the next 6 months?". Runs the REAL engines: Regulatory Map -> Journey selection -> Capability Delta (RS-005) -> Roadmap (leverage) -> Playbooks -> Evidence -> Verification -> Completeness, and produces the 6-month consulting answer ("the top-5 measures close 9/16 = 56%, starting with the ones that satisfy CRA AND MaschinenVO at once"). - Flow-Continuity audit (the actual test): 5 CLEAN, 2 JUMPS, 2 deliberate DEPENDENCIES. The two real seams: (1) Scope -> Journey (no `certs x targets -> journeys` selector engine; the data exists in transitions.yaml, only the selection is glue); (2) Evidence -> Verification (parked, Vision V2). The two dependencies (cert->capability map @Execution, corpus_status curation) are intended ownership boundaries, not architecture breaks. - Finding: the platform carries the WHOLE consulting flow end-to-end. Once the Scope->Journey selector exists, the foundation is essentially done — from there the work is knowledge, not architecture. 4 end-to-end tests (mission runs, exactly two known jumps, full flow present, no real company names). check-loc 0. Non-runtime harness -> no deploy (ADR-001). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 08:39:26 +02:00
Benjamin Admin	ecae5bc7f1	feat(vocabulary): Domain Vocabulary — identity vs representation; regulation aliases fix the KPI normalization Before the next Journey: the LANGUAGE. With 5 knowledge objects but no vocabulary, the same journey gets named four different ways (ISO9001->MaschinenVO vs Quality Management->Product Safety vs ...). The spec answers ONE question: which terms are IDENTITIES and which are REPRESENTATIONS of the same meaning? - spec docs-src/architecture/domain-vocabulary-spec-v1.md (PROPOSAL): identity hierarchy (Requirement RQ / Capability MCAP [Registry 2C] / regulation-source-target / Journey Class MJRN [PROVISIONAL] / Journey instance / Playbook MPLB); canonical name + aliases; capability vocabulary = the Capability Registry (not rebuilt); reorder Vocabulary -> Transition #2 -> #3 -> Rule of Three. - knowledge/vocabulary/regulations.yaml: regulation/standard IDENTITIES (id + canonical + aliases). SOLVES the regulation-ID normalization the KPIs flagged: CRA == "Cyber Resilience Act" == "Regulation (EU) 2024/2847" all resolve to `cra`; ISO9001/QMS -> iso9001; etc. Shared artifact (@Legal-KG/@Execution please adopt). - knowledge/vocabulary/journey_classes.yaml (PROVISIONAL): clusters our transitions into classes (Information Security -> Product Cybersecurity; Quality Management -> Product Compliance/Safety). Finding: ISO9001->MaschinenVO is an INSTANCE of an existing class (like ISO9001->CRA, ISO13485->MDR), not a new kind -> avoids duplication. Journey Class is a new abstraction -> its own Rule of Three (no MJRN minting yet). - reference suite: both KPIs now read aliases from regulations.yaml instead of hard-coded maps; the "Regelwerk-ID-Normalisierung" line flips TODO -> PASS. KPI numbers unchanged (vocab is a superset). - Side effect = Requirements Intelligence: a Tender "Security Patch Procedure" resolves to MCAP-0017. 7 vocabulary tests (17 with domain programs), check-loc 0. 
Knowledge data + spec + reference harness = non-runtime -> no deploy (ADR-001). No new module, no runtime change, no minting (Freeze). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-28 08:11:30 +02:00
Benjamin Admin	18f5d0cb05	feat(programs): Operational Knowledge — the transition is the unit + Transition Coverage KPI Customers don't buy "EMV domain"; they buy "we have ISO 9001, help us with the CRA". The sellable unit of knowledge is the TRANSITION (from -> to), not the law and not the capability. This reframes the backlog from "model EMV next" to "the top demanded transitions". No new runtime framework (ADR-010). - knowledge/programs/transitions.yaml: the Operational Knowledge backlog — the ~20-30 actually demanded transitions (of ~N*(N-1) possible) with priority. ISO27001->CRA, ISO9001->CRA, ISO9001->MaschinenVO (all 5-star), IEC62443->CRA, TISAX->CRA, ISO27001/IEC62443->NIS2, ISO14001->Umweltrecht. - Transition Coverage KPI (reference suite, computed-not-stored): per transition a status DERIVED from the transition-pattern corpus (reviewed/validated/proven -> Gold, draft -> 🟡, none -> ⚪). Honest current state: ISO27001->CRA ✅ reviewed, ISO9001->CRA 🟡 draft, rest ⚪. Highest-priority gap = ISO9001->MaschinenVO (the next Track-B work) — a far stronger product indicator than "EMV 30% modelled". - Three knowledge layers documented: Regulatory -> Operational (transitions/playbooks/deltas, the biggest differentiator) -> Verification (Vision V2). A domain is a TRANSITION PROGRAM with two tracks: Track A breadth (model sources, @Legal-KG/@Execution) + Track B product (transitions/playbooks/RTS per source, @Reasoning). - ADR-010: the transition is the unit of knowledge; Transition Coverage KPI; three layers; two tracks. 10 program/transition-contract tests, check-loc 0. Knowledge data + ADR + reference harness = non-runtime -> no deploy (ADR-001). No new module, no runtime change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 23:48:45 +02:00
Benjamin Admin	1a9439d013	feat(programs): open Domain Knowledge Program v1 — 7-stage production line + per-domain KPI The real bottleneck is domain MODELLING. Phase B is organized as one program with sub-programs per domain, each run through the SAME 7-stage production line. No new runtime framework, no new module (ADR-009, Freeze v1.0) — only program data + a derived reporting view. - Customer enters by INDUSTRY, not regulation: Industry -> Domain Model -> Requirement Sources -> Requirements -> Capabilities -> ... -> Completeness. - 7-stage checklist identical for every domain (Domain Model / Requirement Sources / Capability Registry / Transition Patterns / Playbooks / Reference Scenarios / Completeness) with per-stage ownership. README generalized to the framework. - Each domain lists typical_requirement_sources + typical_certifications -> pre-onboarding capability HYPOTHESIS (the ETO insight; feeds Company 2A as inferred, never confirmed). - Backlog v1 (by customer value): 1 Industrial Automation, 2 Environmental, 3 Automotive, 4 Medical, 5 Energy. Five domain-definition shells (environmental restructured to the unified shape, law-first preserved). - Per-domain KPI is DERIVED from the real corpus (computed-not-stored; sources modelled / transition patterns / playbooks / reference scenarios), NOT a curated number. Reference suite renders maturity bars: Industrial Automation 43% (3/7 sources) leads, Environmental 0% (work ahead). Backlog (value) and KPI (corpus state) are deliberately separated. - ADR-009: Domain Knowledge Program framework. Honest known refinement: regulation-ID normalization (CRA vs Cyber Resilience Act) aliased in the KPI. 7 program-contract tests (backlog order + industry-first + derived-not-stored), check-loc 0. Knowledge data + ADR + reference harness = non-runtime -> no deploy (ADR-001). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 18:49:06 +02:00
Benjamin Admin	9c02c2c4a2	feat(programs): start the Environmental Knowledge Program — domains, not architecture The architecture is stable; from here the value comes from DOMAINS, not more software. Phase B is organized as law-first Domain Knowledge Programs, each delivering the same production line: Corpus -> Obligations -> Capabilities -> Transition Patterns -> Playbooks -> Reference Scenarios -> Completeness. No new runtime framework (Freeze v1.0). - knowledge/programs/README.md: reusable Domain Program blueprint (production line, per-stage ownership, law-first ordering, planned programs Environmental/Automotive/IEC62443/Functional-Safety). - knowledge/programs/environmental.yaml: the Environmental domain as DATA. Law-first: B1 Environmental Regulatory Corpus (water/chemicals/emissions/energy/waste/product-responsibility — law + obligations only) -> B2 Capability Model -> B3 Transition Patterns (ISO 14001 -> corpus, built LAST). ISO 14001 is a source state, NOT the domain. - Ownership handoffs: B1 -> Legal Knowledge, B2 -> Compliance Execution, B3+/playbooks/reference -> Reasoning. Coordinate via the board; no session builds another's artifacts. - reference suite: "Domain Knowledge Programs" section renders the program stages + a measurable Completeness baseline (6 areas, 0 assessed today) that flips automatically as stages land. - ADR-008: from architecture to domains; Phase B as law-first programs; architecture frozen. 6 program-contract tests (law-first order + ownership pinned), check-loc 0. Knowledge data + ADR + reference harness = non-runtime -> no deploy (ADR-001). No new module, no runtime change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 14:36:03 +02:00
Benjamin Admin	aa99111a87	feat(completeness): Regulatory Completeness Engine — auditable coverage, not confidence Phase A½. The move from feature to product development: for every assessment, answer "how sure are we that this answer is COMPLETE?" — different from confidence. The product never claims full coverage; it makes its own knowledge state transparent and auditable. Shows what we do NOT know and why. - compliance/completeness/: assess_completeness(identified, corpus_status, uncertain, assumptions, assessed_obligations) -> CompletenessReport. Separates IDENTIFIED from ASSESSED (validated corpus AND determined applicability) and justifies every gap. Two kinds of open: corpus gap (future_corpus) and applicability uncertainty (query_required + deciding question, e.g. Data Act / generates_usage_data). - The metric is COUNTS, never a single percentage: "Identifiziert N · bewertet M · offen K · Unsicherheiten U · Begründung ja" + an honest audit statement. - ADR-007: auditable honesty; phase order A factory -> A½ Completeness -> B new domains; the transparency selling point. Deterministic, no LLM; corpus status + obligation count injected. - reference suite: "Regulatory Completeness" section runs an industrial-dishwasher assessment (assessed CRA/MaschinenVO; open EMV/Environmental=future_corpus, Data Act=query_required) and notes Environmental flips open->validated automatically once the corpus lands. 11 completeness tests (54 with adjacent modules), mypy --strict clean (15 files), check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 14:16:12 +02:00
Benjamin Admin	07e392913f	feat(knowledge-intake): classify a document + assess its impact before extraction Phase A1. The real knowledge production is not writing — it is TARGETED UPDATING: when 20 documents arrive, which 5 change our knowledge and which 15 are ignorable? Before the parser, Knowledge Intake classifies a new document (no content extraction) and intersects its signals with an index of the existing knowledge to emit a Knowledge Package (an impact analysis). - compliance/knowledge_intake/: build_knowledge_index(patterns, playbooks, reference_scenarios, obligation_index) + assess_document_impact(descriptor, index) -> KnowledgePackage. Deterministic, NO content extraction, NO LLM. Surfaces affected capabilities / playbooks / transition patterns / reference scenarios / (injected) obligations, whether it is a new domain, and a triage level (HIGH / LOW / NONE / NEW_DOMAIN) with a recommendation. - ADR-006: Knowledge Intake = classify + impact before extraction; full factory Intake -> Package -> Parser -> Draft -> Review -> Published; phase order A1 Intake / A2 Draft / A3 Review. - reference suite: "Knowledge Intake" section triages 3 example documents (CRA SBOM-FAQ -> high, 14C/2PB/3RTS/2Obl; environmental guidance -> new_domain; marketing blog -> ignorable). Section lives in _helpers.py to keep generate.py under the 500-LOC budget. - Honest known refinement surfaced by intake: regulation-ID normalization (CRA vs Cyber Resilience Act). 10 intake tests (60 with the adjacent modules), mypy --strict clean (16 files), check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 13:58:59 +02:00
Benjamin Admin	b6cfc0a503	feat(knowledge-production): Playbook Draft Generator — prepare the corpus deterministically The bottleneck is not content, it is knowledge PRODUCTION. Instead of writing 200 playbooks by hand, generate drafts deterministically from data the software already owns, then have an expert review them. Mirrors the legal pipeline (Gesetz -> Parser -> Obligation -> Review) for BreakPilot's own knowledge: new Capability -> Registry -> Transition Pattern -> Playbook Draft Generator -> Expert Review -> versioned Playbook. - compliance/knowledge_production/: generate_playbook_draft(capability, requirement, control_links) + drafts_from_pattern(pattern) -> one PlaybookDraft per delta capability. Owned fields (why / closes_regulations / expected_evidence / typical_controls) are assembled with per-field provenance; the practitioner know-how (tools / process_steps / how_others) is left as an explicit TODO. - DraftStatus lifecycle (Freigabestatus): draft_generated -> in_review -> reviewed -> validated -> proven. Deterministic, NO LLM in the core (any model enrichment stays offline/advisory/propose-only). - ADR-005: extends "the engine does not change, the corpus grows" with "and the corpus is not written by hand — it is deterministically prepared, then curated". - reference suite: "Knowledge Production" section turns the convergence pattern into 12 auto-assembled drafts (why/closes/evidence filled, tools/steps TODO) -> review 12 drafts, don't write 12 playbooks. 10 tests (50 with playbook/optimization/transition/company), mypy --strict clean, check-loc 0. Product code with no app caller + ADR/reference = non-runtime -> no deploy (ADR-001). Freeze-safe. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 13:31:31 +02:00
Benjamin Admin	78f0ffa9de	feat(playbook): Implementation Playbooks — the consultant renderer ("how do I get there?") Roadmap item 4. After WHAT applies / WHAT is missing / WHICH first, the managing director asks HOW. The Implementation Playbook renders, for one capability, the full journey — why / which regulations it closes / tools / process / evidence / controls — and chains the Optimization Roadmap into per-measure playbooks. Another renderer over the same Capability spine (ADR-003/004), not a new engine: ~95% of the data already exists, it just needs a different rendering. - compliance/playbook/: build_playbook() + playbooks_for_plan() (chains optimization -> playbook, acyclic; reuses leverage for "closes which regulations"). Capabilities without curated content render as honest status:missing stubs — the content-owed signal. - knowledge/implementation_playbooks/: curated knowledge layer (Reasoning Knowledge Acquisition), two deep expert drafts (SBOM, CVD/PSIRT, status draft, expert-draft-not-normative) + README. The bottleneck is now CONTENT, not software; Playbook (own knowledge) != regulatory domain. - ADR-004: Implementation Playbooks = renderer + knowledge layer; content is the bottleneck. - reference suite: "Implementation Playbook" section renders the SBOM journey + Roadmap->Playbook table (high-leverage caps flagged "fehlt (Inhalt)" — content backlog, highest leverage first). - refactor: extracted markdown helpers to reference_scenarios/_helpers.py to keep generate.py under the 500-LOC budget. 9 playbook tests (40 with optimization+transition+company), mypy --strict clean, check-loc 0. Product code with no app caller + knowledge/ADR/reference = non-runtime -> no deploy (ADR-001). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 10:38:13 +02:00
Benjamin Admin	cfafa31ea2	feat(optimization): Regulatory Optimization — Roadmap/Management renderer over the Capability Delta Roadmap item 5. GAP analysis and measure-prioritisation are the SAME computation: Required − Known = the Capability Delta. The Capability Delta Engine (RS-005) computes it once; renderers read that ONE delta. Interview Renderer (missing info → questions) was already built; this adds the Roadmap/Management Renderer (missing capabilities → measures ranked by regulatory leverage). - compliance/optimization/: regulatory_leverage() + select_within_budget() (pure leverage math) + roadmap_from_delta(assessment, ...) — the keystone binding optimization to the RS-005 delta (dependency optimization → transition_reasoning, acyclic; the delta engine stays hermetic). leverage(measure) = number of regulatory requirements it closes at once (e.g. patch management → CRA+MaschinenVO+IEC62443+ISO27001 = 4). No new corpus, no new meta-model class (freeze v1.0). - Welt-1 honesty: percentages are exact count ratios over the IDENTIFIED requirements (the known delta), never "% gesetzeskonform". - reference suite: "Regulatory Optimization" section runs the SAME convergence delta → ranked measures + budget answer + the management sentence "of N identified requirements you close M with the top-K measures (X%) — highest regulatory leverage". - ADR-003: Capability Delta Engine — one delta, many renderers; rename Gap → Capability Delta. 13 optimization tests (31 with transition+company), mypy --strict clean, check-loc 0. Product code with no app caller + ADR/reference = non-runtime → no deploy (ADR-001). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 09:49:38 +02:00
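The leverage ranking described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `Measure` type, its `cost` field, the example measures, and the greedy budget strategy are all assumptions; only the definition "leverage = number of regulatory requirements a measure closes at once" and the count-ratio reporting come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    """Hypothetical measure: a capability to build plus what it closes."""
    name: str
    closes: frozenset[str]  # identified requirements this measure closes
    cost: int               # assumed abstract cost unit, not from the source


def leverage(measure: Measure) -> int:
    # Leverage = how many regulatory requirements one measure closes at once.
    return len(measure.closes)


def rank_by_leverage(measures: list[Measure]) -> list[Measure]:
    # Highest leverage first; name as a deterministic tie-breaker.
    return sorted(measures, key=lambda m: (-leverage(m), m.name))


def select_within_budget_sketch(measures: list[Measure], budget: int) -> list[Measure]:
    # Assumed greedy strategy: take ranked measures while they still fit.
    chosen: list[Measure] = []
    remaining = budget
    for m in rank_by_leverage(measures):
        if m.cost <= remaining:
            chosen.append(m)
            remaining -= m.cost
    return chosen


def closed_counts(chosen: list[Measure], identified: set[str]) -> tuple[int, int]:
    # Exact count ratio over the IDENTIFIED requirements, never "% compliant".
    closed = set().union(*(m.closes for m in chosen)) & identified
    return len(closed), len(identified)


patch = Measure("patch_management", frozenset({"CRA", "MaschinenVO", "IEC62443", "ISO27001"}), 3)
sbom = Measure("sbom", frozenset({"CRA"}), 2)
guards = Measure("guards", frozenset({"MaschinenVO"}), 4)

plan = select_within_budget_sketch([sbom, guards, patch], budget=5)
closed, total = closed_counts(plan, {"CRA", "MaschinenVO", "IEC62443", "ISO27001"})
print([m.name for m in plan], f"{closed}/{total}")
```

The reporting helper returns counts rather than a single percentage, so a caller can phrase the result as "of N identified requirements you close M" without implying legal conformity.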
Benjamin Admin	a0f72fc39b	feat(rts): extend Reference Transition Scenarios to multi-regulation (CRA + MaschinenVO) Roadmap item 2: the RTS now pin MaschinenVO + convergence Expected Outcomes, so the convergence USP is a living regression, not just a one-off section. - RTS-003 (machine + ISMS, networked): full multi-regulation archetype — maschinenvo expected_delta + convergence expected_multi_target (links TP-ISO27001-CRA-MaschinenVO-v1). Generator runs the convergence pattern through RS-005: 4/4 machine-safety delta MISSING + 4/4 expected multi-target caps converge. PASS. - RTS-001 (component): MaschinenVO modeled as `uncertain` (a pure component is usually not a machine; deciding question is_safety_component) — engine must never assert it applies. Honest, parallel to the Data-Act handling. - RTS-002 (machine, QMS-only): MaschinenVO `applies` (is_machine) but LOW convergence — no ISMS means the cyber side is entirely delta, so few caps are shared. The honest contrast that the convergence USP rewards companies who already run an ISMS. - generator: per-RTS maschinenvo/convergence Soll-Ist checks; convergence pattern run once and reused. Data Act stays `uncertain` everywhere, never asserted. All 3 RTS PASS. 18 tests (transition+company), mypy --strict clean, check-loc 0. Non-runtime (knowledge + reference harness) -> no deploy (ADR-001). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 09:26:01 +02:00
Benjamin Admin	66be23f0c4	feat(convergence): first Regulatory Convergence Pattern (ISO27001 -> CRA + MaschinenVO) The first multi-regulation pattern: each capability declares `covers_targets`, so we can answer the convergence USP — "which capability satisfies CRA AND MaschinenVO at once?" - knowledge: transition_pattern_iso27001_to_cra_maschinenvo_v1.yaml (pattern_type: regulatory_convergence, status draft). The cyber-safety bridge = MaschinenVO Annex III 1.1.9 "protection against corruption" overlapping CRA integrity. 4 convergence capabilities cover BOTH; 5 CRA-only; 3 MaschinenVO-only. - product: compliance/transition_reasoning/convergence.py — regulatory_convergence() pure/deterministic/computed-not-stored, no new graph/class (freeze v1.0 untouched). No app caller yet -> non-runtime, no deploy (ADR-001). - reference suite: Cross-Regulation Capability Mapping section renders the customer sentence "von N neuen Massnahmen erfuellen M gleichzeitig CRA und MaschinenVO". - README: term -> Regulatory Transition / Convergence Pattern; covers_targets documented. - tests: test_regulatory_convergence (18 transition+company pass), mypy --strict clean. Curated expert knowledge, AI first draft (L1/draft) — Annex/Article refs indicative, review_required by a machinery-safety expert. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 09:12:30 +02:00
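As a rough illustration of the `covers_targets` idea in the commit above, the sketch below filters capabilities that cover every requested target. The type `PatternCapability`, the function name `regulatory_convergence_sketch`, and the placeholder capability ids are hypothetical; only the 4 / 5 / 3 split is taken from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PatternCapability:
    """Hypothetical stand-in for one capability entry of a convergence pattern."""
    capability_id: str
    covers_targets: frozenset[str]


def regulatory_convergence_sketch(
    capabilities: list[PatternCapability], targets: set[str]
) -> list[str]:
    # A capability converges when it covers ALL requested targets at once.
    wanted = frozenset(targets)
    return [c.capability_id for c in capabilities if wanted <= c.covers_targets]


def _caps(prefix: str, count: int, targets: set[str]) -> list[PatternCapability]:
    # Placeholder ids only; the real pattern lists named capabilities.
    return [PatternCapability(f"{prefix}_{i}", frozenset(targets)) for i in range(1, count + 1)]


pattern = (
    _caps("both", 4, {"cra", "maschinenvo"})
    + _caps("cra_only", 5, {"cra"})
    + _caps("maschinenvo_only", 3, {"maschinenvo"})
)

shared = regulatory_convergence_sketch(pattern, {"cra", "maschinenvo"})
print(f"{len(shared)} of {len(pattern)} new measures satisfy CRA and MaschinenVO at once")
```

Because the result is recomputed from the pattern data on every call, nothing about convergence needs to be stored, which matches the "computed-not-stored" wording of the commit.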
Benjamin Admin	f78e03bd0a	docs(knowledge): Reference Transition Scenarios (RTS-001..003) + ISO9001->CRA pattern Three ANONYMIZED reference transition scenarios (no real company names stored) = canonical regression scenarios that test the KNOWLEDGE, not just the engine. Each pins an Expected Outcome (expected_likely_covered + expected_delta); every commit must reproduce it (identical or better). - RTS-001 automotive supplier (TISAX+ISO27001) -> CRA: mature ISMS, standard CRA delta. - RTS-002 classic machine builder (ISO9001) -> CRA: only process discipline -> MUCH larger delta (10 missing vs 3 covered). New TP-ISO9001-CRA-v1 pattern (different shape). - RTS-003 networked machine builder (ISMS) -> CRA: highlights the Data Act. Data Act is modelled as UNCERTAIN (a hypothesis), never a fixed applies/does-not-apply: the generator checks the engine SURFACES the uncertainty + the deciding question (generates_usage_data) and never wrongly ASSERTS applicability. All three RTS PASS. Non-runtime knowledge + reference harness -> no deploy (ADR-001). Names deliberately absent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 08:46:20 +02:00
Benjamin Admin	0da093c046	docs(knowledge): TKP 4-level lifecycle + 3 enrichments + ISMS->TISAX (genericity proof) Transition KNOWLEDGE Patterns (renamed term -- curated knowledge, not an algorithm): - 4 maturity levels: draft -> reviewed -> validated (domain expert) -> proven (field). "approved" dropped; target is validated. TP-ISO27001-CRA set to reviewed (L2). - 3 enrichments per pattern: confidence_source: relationship (curated, not an LLM estimate -> computed-not-stored); why_asked (customer-facing: why the source does not suffice here); dropped_if (what makes the question unnecessary). Applied to TP-ISO27001-CRA. - New TP-ISMS-TISAX (draft): different character -- info-security module mostly covered; delta is automotive-specific (prototype protection, TISAX labels, VDA ISA self-assessment, ENX assessment, Art. 28 data protection). Proves the architecture is GENERIC, not CRA-tailored. - Reference scenario 4 generalized to loop over ALL patterns through RS-005: both carried (CRA 17->17, TISAX 13->13) -> a living genericity + regression test for every future pattern. Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISO9001->IATF16949. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 08:29:30 +02:00
Benjamin Admin	4bfd552da7	docs(knowledge): TP-ISO27001->CRA gold standard + reference scenario (RS-005 regression) (1) Harden the first Transition Pattern to the gold-standard template per quality checklist: versioned transition_goal (ISO27001:2022 -> CRA, applies 2027-12-11), source_state_variants (certified/isms_introduced/expired/limited_scope), each likely_covered assumption with a typed relationship (supports\|partially_supports, never equivalent) + verification + rationale (the Warum) + an auditor-checkable reviewable_claim, delta as missing-capability + needed-info, an explicit rejected_assumptions section, and a determinism_goal. README schema updated to match. (2) New Reference-Suite scenario 4 (Transition): the generator READS the pattern YAML and runs it through the RS-005 Planning Engine + Company 2A -> coverage + question requests. Proves the architecture fully carries the pattern (17 caps -> 17 coverage + 17 requests; 9 HIGH delta = the real CRA gaps, 8 probably-covered from the ISMS). Now a living regression test: every future pattern runs through the same engine. Non-runtime knowledge + reference harness -> no deploy (ADR-001). Next: ISMS->TISAX once approved. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 08:11:42 +02:00
Benjamin Admin	bea8559f78	docs(knowledge): first Transition Pattern ISO27001 -> CRA (curated knowledge base) Reasoning session's new Knowledge Acquisition responsibility (re-charter): build and curate the Transition Knowledge Base under backend-compliance/knowledge/transition_patterns/ (beside reasoning/, not under it -- it is knowledge, not an engine). First professional pattern TP-ISO27001-CRA-v1 (status: draft): separates what a mature ISMS likely covers at the ORG level (probably_covered, needs product-level confirmation, never auto-"erfuellt") from the CRA-specific delta with no ISO 27001 analogue (SBOM, support period + secure signed updates, coordinated vulnerability disclosure, Art. 14 authority reporting, product cyber risk assessment, CE conformity / technical documentation). Expert draft, not a normative proof; review_required before customer use. Non-runtime knowledge -> no deploy (ADR-001). Next: ISMS->TISAX. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 07:50:42 +02:00
Benjamin Admin	77de7e794c	feat(transition): Transition Reasoning v0 (RS-005) — Transition Planning Engine Second reasoning mode, scope per user: the engine owns the INFORMATION GAPS, not the questions. assess_transition(context, target_requirements, company_profile) emits ranked TransitionQuestionRequest {capability, control, reason, question_intent, expected_evidence, priority, information_gain} -- NOT rendered question text. Rendering (intent+subject->sentence) is a separate swappable layer (RS-005.1), not here. Consumes the Company Capability Profile (2A) as "have" + injected TargetRequirement (Execution-owned placeholder) as "required" -- no required-capability data in product code (EMPTY_REQUIREMENTS, mocks only in tests). A certification-derived capability is probably_covered (Welt 1) -> a confirmation request, never already_covered/"erfuellt". Deterministic, computed-not-stored, no percentages. Activates 2A/2C/RCI (first consumer of the Company profile). Freeze-respecting: additive package, no new graph/base class/meta-model class. 9 tests, mypy --strict clean, LOC ok. No endpoint/UI/RAG; question rendering deliberately deferred to RS-005.1. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-27 07:31:11 +02:00
Benjamin Admin	16371f2909	feat(reference): Reference Scenario Suite v1 (living regression reference, not docs) Three real customer scenarios driven through the DEPLOYED engines (scope/map/ interpretation, RCI, company 2A, capability registry). Each scenario emits an Architecture Coverage table DERIVED from the real run, so cells flip automatically as domains land (e.g. Sz2/Environmental UNSUPPORTED -> PASS). The roll-up answers "is BreakPilot better than six months ago" by real customer situations, not LOC. Gaps captured as epics (NOT implemented): RS-001 Interpretation Pattern Library, RS-002 Environmental Corpus, RS-003 Capability Linking (cap<->MCAP) + Company-Gap, RS-004 MaschinenVO/EMV Registry Linking. reference_scenarios/generate.py = reproducible source (ruff/mypy-exempt, NOT product code, not imported by the app); reference_scenario_suite_v1.md = generated artifact. No new product code; CRA patterns deliberately NOT built — the suite is now the measure. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 22:48:27 +02:00
Benjamin Admin	6ccc6c87c1	feat(capability): Master Capability Registry v0 (Phase 2C, Compliance Execution domain) Third instance of the identity-machine pattern (after Master Controls and Master Obligations). New compliance/capability/ package: MasterCapability with stable MCAP ids, CapabilityCandidate minting, seven typed relation types, a VERSIONED derivation policy, and identity lifecycle (merge/split/deprecate/redirect with provenance). Stored: identities, sources, relationship types, policy versions, lifecycle events, provenance. Derived (never stored): confidence/status via evaluate_relation under a policy version. Hard rule (structurally guarded): a certification alone can never yield CONFIRMED — only CONFIRMS + concrete artifact (or expert) does. Built from the Reasoning session per user directive but this IS the Compliance Execution model (Execution owns Capability) — handed off via the board. Metadata-first: CapabilityRelation is registry metadata, NOT a new meta-model class (freeze v1.0 untouched). No Company-Gap, no real ISO/cert mappings, no UI/RAG, no generic canonicalization engine. 11 tests; mypy --strict clean; LOC ok. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 21:35:12 +02:00
Benjamin Admin	8c893ca783	feat(company): Company Intelligence 2A — Company Capability Profile foundation HEAD of the spine Company->Capability->Product->Regulation->Obligation->Procedure ->Evidence. New compliance/company/ package: CompanyContext container + a four-state trust model (declared/inferred/confirmed/unknown). Hard rule (structural): a certification yields at most an INFERRED candidate and is never auto-treated as CONFIRMED/"erfuellt". A certification produces evidence-of- capability; only real ExistingEvidence promotes a capability to CONFIRMED. Ownership: Reasoning owns the container + trust-state; the Certification->Capability mapping is Execution's domain, consumed via an injected contract. No mapping data in product code (tests inject mocks). No endpoint/UI/RAG/new regs/controls; no meta-model classes (freeze v1.0 untouched). 8 tests; mypy --strict clean. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 14:59:42 +02:00
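The hard rule in the commit above (a certification yields at most INFERRED; only real evidence promotes to CONFIRMED) can be sketched as a small pure function. The enum values mirror the four states named in the commit; the function name, its boolean parameters, and the precedence of INFERRED over DECLARED are assumptions made for illustration.

```python
from enum import Enum


class TrustState(Enum):
    """The four trust states named in the commit message."""
    UNKNOWN = "unknown"
    DECLARED = "declared"
    INFERRED = "inferred"
    CONFIRMED = "confirmed"


def derive_trust_state(
    declared: bool, has_certification: bool, has_existing_evidence: bool
) -> TrustState:
    # Only concrete existing evidence may ever yield CONFIRMED.
    if has_existing_evidence:
        return TrustState.CONFIRMED
    # A certification is evidence-of-capability: at most INFERRED.
    if has_certification:
        return TrustState.INFERRED
    # Assumed ordering: a bare customer statement ranks below a certification.
    if declared:
        return TrustState.DECLARED
    return TrustState.UNKNOWN


print(derive_trust_state(declared=True, has_certification=True, has_existing_evidence=False))
```

Putting the evidence check first makes the rule structural: no combination of certifications and declarations can reach the CONFIRMED branch.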
Benjamin Admin	a5687bbc65	feat(rci): Regulatory Change Intelligence foundation (delta over the stored map) RCI/Delta as a read-/reasoning layer ON TOP of the product-first pipeline. Answers "what changes relative to my existing Regulatory Map?" — NOT "what does the new law say in general". No UI, no ingestion (newsletter/mailbox), no RAG, no new regulations/controls, no legal evaluation outside the stored map. - 4 core objects (compliance/rci/schemas.py): ComplianceBaseline (snapshot of profile + map + registry obligations + required/present evidence), RegulatoryChange (simulated/provided INPUT), ObligationDelta (delta_type NEW\|CHANGED\|REMOVED\| ALREADY_COVERED\|NEEDS_REVIEW\|NOT_APPLICABLE), ChangeImpactSummary. delta_type is a THIRD vocabulary, disjoint from ClaimCoverage (Welt 1) and ComplianceStatus (Welt 2). - create_baseline() snapshots the existing pipeline once; assess_change() computes deltas deterministically against the snapshot (no re-evaluation). - 12 tests = the 5 acceptance questions (affects product? new/changed? already covered by evidence? needs human review? not relevant?) + repeal/uncertain-reg/ missing-evidence/boundary. Existing pipeline tests stay green; mypy clean; LOC ok. - App/reasoning types only — no compliance-meta-model classes (freeze v1.0 untouched). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 13:45:23 +02:00
Benjamin Admin	50ae9e94d1	feat(interpretation-in-map): judge a customer interpretation within the map (step 5) Thin adapter — it judges the customer's reading WITHIN the already-built RegulatoryMap, it does not assess abstract legal questions and it is not RCI. - Reuses the existing assess_interpretation (no new legal reasoning); the 6 verdicts (plausible/too_narrow/too_broad/partially_correct/unsupported/uncertain) pass through unchanged. - Restricts affected_regulations/affected_obligations to those present in the map (intersection); links to the map's uncertain regulations. - Touched unsupported domains (wastewater/chemicals/...) are reported as future_corpus_domains (future_corpus_needed) — never pseudo-evaluated. - Customer-readable explanation ("Ihre Interpretation ist wahrscheinlich zu eng. … Betroffen in Ihrer Map: CRA."). - POST /reasoning/interpretation-in-map (renders the map, then interprets). - 7 tests; 63 green (existing reasoning MVP stays green), mypy clean, LOC ok. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:58:00 +02:00
Benjamin Admin	9312ad18ef	feat(regulatory-map): customer-readable read-model over the scope (step 4) The Map Renderer explains the engine's state, it does not extend it. Pure composition of resolve_product_scope (scope verdict) + derive_obligations (registry-linked obligations + overlaps) into one RegulatoryMap. - product_summary, trigger_facts, applicable/uncertain/excluded regulations, unsupported_domains, overlaps (shared_obligations), shared_evidence, and a customer-readable executive_summary. - No own legal decisions: applicable/uncertain mirror the scope verdict exactly. - Obligations shown ONLY when registry-linkable (registry_anchor) — MaschinenVO/ EMV obligations are proposed, so they render empty + a note, never as linked. Overlaps/shared_evidence likewise filtered to registry-linked members. - Uncertain regulations link to the navigator question that would resolve them (RED -> has_radio_module, DataAct -> generates_usage_data). - Environmental appears only as unsupported_domain; executive_summary has NO percentage (counts + "no further regulations identified" instead). - POST /reasoning/regulatory-map (thin handler). Response types are presentation- level, not meta-model classes (freeze v1.0 untouched). - 9 tests; 56 green (existing reasoning MVP stays green), mypy clean, LOC ok. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:36:06 +02:00
Benjamin Admin	4e8eb2dc0e	feat(product-scope): gate Navigator facts, then reuse discover_scope (step 3) Connects the Navigator's fact-gate to the existing reasoning discover_scope — the Scope Engine decides only once the minimum (P0) facts are released. - resolve_product_scope(canonical): if not ready_for_scope -> NEEDS_FACTS (missing_facts + suggested_questions, discover_scope NOT run); else project canonical->reasoning profile and run the EXISTING discover_scope exactly once -> RESOLVED with applicable/excluded/uncertain regulations. - Environmental triggers surface ONLY as unsupported_domains (future_corpus_needed), never as a legal evaluation — transparency, no false completeness. - POST /reasoning/product-scope (thin handler) returns case NEEDS_FACTS or RESOLVED. - No new scope rules, no new regulations, no environmental-law evaluation, no UI, no Go, no RAG, no percent-compliance. Response types are application-level, not meta-model classes (freeze v1.0 untouched). - 6 tests incl. discover_scope spy (0 calls when gated, exactly 1 when ready), category separation, environmental-as-unsupported-only. 47 tests green (existing reasoning MVP tests stay green), mypy clean, LOC ok. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:21:27 +02:00
Benjamin Admin	78aeedafae	feat(navigator): Product Regulatory Navigator as a thin missing-facts layer Step 2 of the convergence sequence. The Navigator sits over the CanonicalProductRegulatoryProfile (prefilled from company-profile / ProductWizard) and reports ONLY which facts are still missing + prioritized questions to collect them. It decides which facts are needed, NEVER what applies — that stays with the Scope Engine (step 3). No regulation logic, no UI, no Go, no RAG. - NavigatorQuestion (interaction type, NOT a compliance-meta-model class — freeze v1.0 untouched): question_id, target_field, label, why_needed, regulatory_domains_unblocked (static metadata), answer_type, options, priority. - QUESTION_CATALOG: 12 questions over canonical gaps — P0 (markets, role, lifecycle, machine/component), P1 (radio, usage-data, security-function, environmental wastewater/air/chemicals triggers), P2 (structured BOM). - engine: navigate() -> missing_facts + suggested_questions (priority-sorted) + completeness_summary (ready_for_scope = no P0 missing); apply_answers() -> updated profile. Pure field-presence; no scope import. - 8 tests: <=10 questions for a filled company-profile, known facts not re-asked, environmental = trigger questions only (no law evaluation), apply round-trip, P0 ordering, ready_for_scope. 41 tests green, mypy clean, LOC ok. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-26 10:05:27 +02:00
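
The fact-gate described in the navigator and product-scope commits above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the three-entry `CATALOG`, the dict-based profile, the `Question` class, and the return shapes are assumptions made for the example, and `discover_scope` is passed in as a plain callable so the sketch stays self-contained. Only the names `navigate`, `resolve_product_scope`, `ready_for_scope`, `missing_facts`, `suggested_questions`, `NEEDS_FACTS`, and `RESOLVED` come from the commit messages.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Question:
    """Hypothetical, reduced stand-in for NavigatorQuestion."""
    question_id: str
    target_field: str
    priority: int  # 0 = P0 (blocks scope), 1 = P1, 2 = P2


# Assumed three-entry catalog; the commit describes 12 questions.
CATALOG = [
    Question("q_bom", "structured_bom", 2),
    Question("q_radio", "has_radio_module", 1),
    Question("q_markets", "markets", 0),
]


def navigate(profile: dict[str, Any]) -> dict[str, Any]:
    """Report which facts are missing, by pure field presence."""
    # A field counts as known unless it is absent or None, so an
    # explicit False (e.g. "no radio module") is not asked again.
    missing = [q for q in CATALOG if profile.get(q.target_field) is None]
    missing.sort(key=lambda q: q.priority)  # P0 questions first
    return {
        "missing_facts": [q.target_field for q in missing],
        "suggested_questions": [q.question_id for q in missing],
        # Ready once no P0 fact is missing; P1/P2 gaps do not block.
        "ready_for_scope": all(q.priority != 0 for q in missing),
    }


def resolve_product_scope(
    profile: dict[str, Any],
    discover_scope: Callable[[dict[str, Any]], Any],
) -> dict[str, Any]:
    """Run discover_scope only after the P0 facts are present."""
    nav = navigate(profile)
    if not nav["ready_for_scope"]:
        # Gated: the scope engine is not called at all.
        return {
            "case": "NEEDS_FACTS",
            "missing_facts": nav["missing_facts"],
            "suggested_questions": nav["suggested_questions"],
        }
    # Released: the scope engine runs exactly once.
    return {"case": "RESOLVED", "scope": discover_scope(profile)}
```

Passing the scope function as a parameter mirrors the spy test mentioned in the product-scope commit: a counting stub can confirm zero calls while gated and exactly one call once the P0 facts are filled.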


631 Commits