Commit Graph

570 Commits

Author SHA1 Message Date
Benjamin Admin
28aa74b4b0 Merge remote-tracking branch 'gitea/main'
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 1m13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 49s
CI / test-python-voice (push) Successful in 38s
CI / test-bqas (push) Successful in 31s
# Conflicts:
#	pitch-deck/components/slides/MilestonesSlide.tsx
#	pitch-deck/lib/finanzplan/engine.ts
2026-04-27 13:14:54 +02:00
Benjamin Admin
8e37441782 perf(pipeline): switch back to v4 prompt — backfill costs nearly the same
v3+backfill=$31.60/10k vs v4=$33/10k — not worth the extra complexity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 00:44:23 +02:00
Benjamin Admin
6a0e7c947f perf(pipeline): switch to v3 prompt for generation, v4 fields via Haiku backfill
Remove applicability/scanner_hint/evidence_type/provides_context from
Pass 0b prompt to reduce output tokens (~40% less). These 6 fields are
added via cheap Haiku backfill afterwards (~$1.50 per 10k controls).

Saves ~$200 over the remaining 160k obligations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 00:14:47 +02:00
Benjamin Admin
3c1a2d9c41 Remove re-export shim from keycloak_auth.py, update consumer imports
- rbac_api.py: import get_current_user from auth.dependencies directly
- keycloak_auth.py: remove re-export of dependencies module symbols
- pdf_service.py, file_processor.py: remove misleading compat comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 00:13:30 +02:00
Benjamin Admin
92c86ec6ba [split-required] [guardrail-change] Enforce 500 LOC budget across all services
Install LOC guardrails (check-loc.sh, architecture.md, pre-commit hook)
and split all 44 files exceeding 500 LOC into domain-focused modules:

- consent-service (Go): models, handlers, services, database splits
- backend-core (Python): security_api, rbac_api, pdf_service, auth splits
- admin-core (TypeScript): 5 page.tsx + sidebar extractions
- pitch-deck (TypeScript): 6 slides, 3 UI components, engine.ts splits
- voice-service (Python): enhanced_task_orchestrator split

Result: 0 violations, 36 exempted (pipeline, tests, pure-data files).
Go build verified clean. No behavior changes — pure structural splits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-27 00:09:30 +02:00
Benjamin Admin
5ef039a6bc feat(pipeline): Pass 0b prompt v4 + Haiku backfill endpoint
Prompt v4 adds 6 new fields to Pass 0b output:
- applicability: condition rules (same format as dependency engine)
- check_type: expanded to 10 granular types
- scanner_hint: search_terms + negative_indicators for MCP
- manual_review_required_if: escalation conditions
- evidence_type: code/process/hybrid
- provides_context: context variables this control creates

New endpoint POST /generate/backfill-extended:
- Backfills existing 9k controls via Haiku Batch API (~$1.50)
- Adds all 6 new fields to generation_metadata
- Supports dry_run mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 23:14:59 +02:00
Benjamin Admin
96b8f25747 fix(pipeline): use action_type-derived phase order in ontology generator
LLM merge_key phases (e.g. "submission") don't always match PHASE_ORDER
keys. Derive phase order from action_type via get_phase_order() instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 20:32:58 +02:00
Benjamin Admin
42ab5ead26 feat(pipeline): implement Control Dependency Engine (Block 9)
Core engine (dependency_engine.py):
- 5 dependency types: prerequisite, supersedes, compensating_control,
  conditional_requirement, scope_exclusion
- Generic condition evaluator (JSONB rules with AND/OR/NOT/field ops)
- Priority-based conflict resolution
- Cycle detection (DFS) + topological sort
- Full evaluation with MCP-compatible dependency_resolution trace
- 39 tests all passing (incl. GHV scenario from user requirements)

Automatic generator (dependency_generator.py):
- Ontology-based: same normalized_object + phase sequence -> prerequisite
- Pattern-based: define->implement, implement->monitor, etc.
- Domain packs: YAML rules for GDPR, AI Act, CRA, Security, Labor Contracts
- 14 tests all passing

API routes (dependency_routes.py):
- CRUD for dependencies
- POST /evaluate with dependency resolution
- POST /generate (auto-generation with dry_run)
- POST /validate (cycle detection)
- GET /graph (nodes + edges for visualization)

Prompt enhancement (decomposition_pass.py):
- Added dependency_hints + lifecycle_phase_order to Pass 0b prompt
- Stored in generation_metadata for post-processing

DB migration: control_dependencies + control_evaluation_results tables

126 tests total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 20:28:10 +02:00
Benjamin Admin
5aaa62dca7 fix(pipeline): improve quality metrics heuristics
- Fix truncated title detection: only flag near-200-char titles or mid-word cutoffs
- Fix evidence leak detection: check title start patterns, not keyword substring
  ("nachweisen" verb is valid action, "Nachweis vorliegen" is evidence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:53:52 +02:00
Benjamin Admin
d583971afd feat(pipeline): add quality metrics endpoint for Pass 0b controls
GET /generate/quality-metrics — reports:
- controls_per_obligation ratio
- duplicate merge_key rate
- evidence leak rate
- truncated title rate
- MCP field coverage
- merge_key coverage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:51:27 +02:00
Benjamin Admin
d660a45bb5 feat(pipeline): implement golden test suite + fix ontology patterns
- Add test_golden_controls.py: 37 tests covering all 8 YAML categories
  (container, framework, evidence, negative, title, split, scope, merge_key)
- Fix evidence detection: handle German feminine articles (eine/einer/etc.)
- Fix framework detection: use verb stems for conjugated German verbs
- Add framework patterns: OWASP API6, CCM without CSA prefix, generic category
- Fix negative patterns: use "nicht übertragen/gespeichert/erscheinen" before
  generic "dürfen nicht" to correctly route prevent vs exclude

All 73 tests passing (36 ontology + 37 golden).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:48:12 +02:00
Benjamin Admin
d1f3b9ffcd feat(pipeline): add submit-pass0b endpoint for batch submission
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:42:06 +02:00
Benjamin Admin
d93321275c feat(pipeline): add batch API status + result processing endpoints
- GET /generate/batch-api-status/{batch_id} — check Anthropic batch status
- POST /generate/process-batch — process completed batch results (background)
- GET /generate/process-batch-status/{job_id} — poll processing progress

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:36:47 +02:00
Benjamin Admin
629b9d9ca5 feat(pipeline): store MCP fields (assertion, pass/fail criteria, check_type) in generation_metadata
- Add assertion, pass_criteria, fail_criteria, check_type to AtomicControlCandidate dataclass
- Parse MCP fields from LLM output in _process_pass0b_control
- Store MCP fields in generation_metadata JSON for later use by MCP scanner
- Fields default to empty when not present (backward-compatible with old prompts)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:32:56 +02:00
Benjamin Admin
7e3b1108e2 feat: integrate Ontology pre-LLM filter into Pass 0b submit
Obligations classified before API call:
- evidence → skipped (saves API cost)
- composite → skipped (not atomic)
- framework_container → skipped (decompose separately)
- atomic → sent to LLM

Filter stats returned in submit response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:13:32 +02:00
Benjamin Admin
b3fbbbacfe feat(control-pipeline): Control Ontology v1 — action types, evidence/container/framework detection
Block 7.1-7.2 from masterplan:
- 26 action_types with German aliases + phase mapping
- Negative obligation patterns (exclude, prevent, enforce)
- Container detection (11 composite objects that must not become atomic)
- Evidence detection (14 indicators + "X dokumentieren" pattern)
- Framework reference detection (OWASP, NIST, BSI, CSA, ISO patterns)
- classify_obligation() routes to: atomic, composite, evidence, framework_container
- build_canonical_key() for deterministic dedup
- 36 tests covering all classification functions

Also: merge_key bug fix in _process_pass0b_control()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 09:06:39 +02:00
Benjamin Admin
3a100fa1f1 feat: Pass 0b prompt v3 — compound action ban, evidence-of-action rule, pflicht-vs-prozess merge
Fixes from v2 evaluation (7.9/10 avg, 28 controls):
1. COMPOUND BAN: "durchführen UND Maßnahmen ergreifen" → pick primary action only
2. EVIDENCE-OF-ACTION: "Tests dokumentieren" → evidence field, not own control
3. PFLICHT=PROZESS: "Behörden informieren" + "Verfahren etablieren" = 1 control
4. MERGE-KEY BUG: merge_key from LLM output now stored in generation_metadata

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 00:25:38 +02:00
Benjamin Admin
fbeb93046d docs: Pass 0b v2 evaluation — 28 controls, 7.9/10 avg, 3 findings for v3
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 00:19:06 +02:00
Benjamin Admin
0cce8a2011 feat: add Golden Test Suite v1 (40 regression tests for Pass 0b pipeline)
8 categories: duplicate explosion, compound split, negative obligations,
container detection, framework decomposition, evidence leakage,
scope dimension, title quality. Includes global quality gates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 00:05:08 +02:00
Benjamin Admin
7a53f5bee1 feat: Pass 0b prompt v2 — container detection, merge-key, evidence separation, actionable titles
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-26 00:00:59 +02:00
Benjamin Admin
ea30ceb1f1 feat(control-pipeline): improved Pass 0b prompt for actionable control titles
Key changes to system prompt:
- Evidence/documentation belongs in evidence field, NOT as separate control
- SBOM = 1 control (not "maintain" + "document" separately)
- Security lifecycle phases (identify/assess/remediate/monitor) = separate controls
- Same object + same action + same actor = 1 control (merge, not split)
- Titles must contain the ACTION, not just the subject
  WRONG: "Vertraulichkeit Mitarbeiter"
  RIGHT: "Mitarbeiter zur Vertraulichkeit verpflichten"

Titles serve as MCP search queries against customer documents/code.
Bad titles = bad search results = unusable product.

All 52,566 old pass0b controls deprecated (not deleted) for full regeneration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 23:45:37 +02:00
Benjamin Admin
cd33777d75 fix: Pass 0b INSERT ON CONFLICT DO UPDATE + per-result commit/rollback
Prevents UniqueViolation from blocking entire batch. Each result
is committed individually, errors are rolled back without affecting
subsequent results.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 22:15:21 +02:00
Benjamin Admin
c73a489075 fix: Pass 0b filter — skip obligations whose parent already has pass0b controls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 21:54:32 +02:00
Benjamin Admin
7ddb572f5d fix: Pass 0b batch custom_id + result handler for numeric format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 16:08:19 +02:00
Benjamin Admin
1a3101066e fix: paginated indexing to avoid OOM on 53k controls
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 16:31:20 +02:00
Benjamin Admin
043bcb65d8 fix(control-pipeline): harmonization recheck indexes ALL drafts, not just atomics
Previous version searched against atomic_controls_dedup collection which
only contains Pass 0b atomic controls. Now creates a temporary collection
with ALL draft controls as reference, then checks targets against it.

Two phases:
1. Index ~53k reference drafts into temp Qdrant collection (batch 32)
2. Search each of 14k target controls, Embedding + LLM for borderline
3. Cleanup temp collection when done

Status updates every 50 controls (fixed counter bug).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 15:42:40 +02:00
Benjamin Admin
d31fccbe0e feat(control-pipeline): add harmonization recheck endpoint
POST /generate/harmonization-recheck verifies promoted controls
against Qdrant dedup collection via Embedding + LLM. Runs as stable
asyncio background task inside the container (no docker exec issues).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 13:25:56 +02:00
Sharang Parnerkar
41bc522b5b fix(pitch-deck): close auth gaps, isolate finanzplan scenario access, enforce TS
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 57s
CI / test-python-voice (push) Successful in 42s
CI / test-bqas (push) Successful in 42s
D1: Remove /api/admin/fp-patch from PUBLIC_PATHS — it was returning live financial
data (fp_liquiditaet rows) to any unauthenticated caller; middleware admin gate now
applies as it does for all /api/admin/* paths.

D2: Add PITCH_ADMIN_SECRET bearer guard to POST /api/financial-model (create scenario)
and PUT /api/financial-model/assumptions (update assumptions) — any authenticated
investor could previously create/modify global financial model data.

D3: Add PITCH_ADMIN_SECRET bearer guard to POST /api/finanzplan/compute — any
investor could trigger a full DB recomputation across all fp_* tables. Also replace
String(error) in error response with a static message.

D4: GET /api/finanzplan/[sheetName] now ignores ?scenarioId= for non-admin callers;
investors always receive the default scenario only. Previously any investor could
enumerate UUIDs and read any scenario's financials including other investors' plans.

D9: Remove `name` from the non-admin /api/finanzplan response — scenario names like
"Wandeldarlehen v2" reveal internal versioning to investors.

D10: Remove hardcoded postgres://breakpilot:breakpilot123@localhost fallback from
lib/db.ts — missing DATABASE_URL now fails loudly instead of silently using stale
credentials that are committed to the repository.

D6: Fix all 4 TypeScript errors that were masked by ignoreBuildErrors:true; bump
tsconfig target to ES2018 (regex s flag in ChatFAB), type lang as 'de'|'en' in
chat route, add 'as string' assertion in adapter.ts. Remove ignoreBuildErrors:true
from next.config.js so future type errors fail the build rather than being silently
shipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 09:08:50 +02:00
Sharang Parnerkar
75bd0c29f3 fix(pitch-deck): eliminate SYSTEM_PROMPT placeholder leak and fix liquidity tax ordering
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m43s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 1m2s
CI / test-python-voice (push) Successful in 45s
CI / test-bqas (push) Successful in 41s
C3: Split SYSTEM_PROMPT into PART1/PART2/PART3 constants; Kernbotschaft #9 and
VERSIONS-ISOLATION now concatenated directly at runtime instead of .replace() — a
whitespace mismatch can no longer cause placeholder text to leak verbatim to the LLM.

I2: Add second liquidity-chain pass (sumAus→ÜBERSCHUSS→rolling balance) after tax rows
(Gewerbesteuer/Körperschaftsteuer) are written to fp_liquiditaet, so first-run LIQUIDITÄT
figures include tax outflows without requiring a second engine invocation.

I6: Warn when loadFpLiquiditaetSummary finds no fp_liquiditaet rows for a named scenario,
surfacing scenario-name mismatches that would otherwise silently return empty context.

I8: Sanitize console.error calls in chat/route.ts (3 sites) and data/route.ts; cap
LiteLLM error body to 200 chars, use (error as Error).message for stream/handler errors.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 08:53:52 +02:00
Benjamin Admin
3ffa3f5793 feat(control-pipeline): add Document Compliance Engine — scope detection + document requirements
New service: document_scope_resolver.py with 28 document rules covering:
- Base (impressum, privacy_policy)
- Tracking (cookie_banner, cookie_policy)
- E-Commerce (AGB, withdrawal, shipping, pricing, payment)
- Digital (digital_content_terms, no_withdrawal_notice)
- SaaS (ToS, service_description, DPA, SLA)
- AI (transparency_notice, automated_decisions)
- Hardware (warranty, return, CE, safety)
- Environmental (WEEE, battery disposal)
- Marketplace (seller terms, ranking transparency)
- Subscription (cancellation terms)

API: POST /v1/document-compliance/required
Input: company flags + jurisdiction → Output: required documents + assessment

Includes confidence scoring, escalation detection (e.g. ecommerce
without distance_selling flag), and reasoning. 19 tests covering all
business model combinations including B2B-only exclusions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 08:39:55 +02:00
Sharang Parnerkar
59e55f8740 fix(pitch-deck): remove version name from isolation prompt to avoid leaking multiplicity
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m41s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 44s
CI / test-python-voice (push) Successful in 39s
CI / test-bqas (push) Successful in 31s
Using terms like 'Version X' or 'Szenario Y' in the VERSIONS-ISOLATION
instruction implies other versions exist. Rewritten to never reference
version/scenario names — just 'this pitch deck, created for you, the only one'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-24 08:27:58 +02:00
Benjamin Admin
f1359d63ba fix: handle new numeric batch custom_id format in Pass 0a result processing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 07:21:50 +02:00
Benjamin Admin
bbfcd44407 fix: use numeric batch index as custom_id (64 char limit, alphanumeric only)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:39:13 +02:00
Benjamin Admin
5da5a5597b fix: increase Batch API upload timeout to 600s for large payloads
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 00:31:50 +02:00
Sharang Parnerkar
b1ef6a85d6 fix(pitch-deck): dynamic VERSIONS-ISOLATION and Kernbotschaft from version data
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 35s
Removes all hardcoded version-specific numbers from SYSTEM_PROMPT (200k,
40k/160k L-Bank split, 195 Kunden, 3.3 Mio, 9 MA). These are now generated
at runtime from the investor's assigned pitch_version_data: funding amount,
instrument, fm_scenarios name, and 2030 financials (customers, revenue,
employees).

loadPitchContext() now returns { contextString, meta } so the POST handler
can build correct isolation and Kernbotschaft strings for any version —
Wandeldarlehen 200k, 1 Mio, or any future scenario.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:44:41 +02:00
Sharang Parnerkar
a795794f94 fix(pitch-deck): FAQ version-data priority override in chat system prompt
Some checks failed
Build pitch-deck / build-push-deploy (push) Successful in 1m10s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Has been cancelled
FAQ entries contain hardcoded financial numbers written for specific scenarios
(e.g. 470k Liquidität 2027, 200k/40k WD amounts). When an investor is on a
different version, those FAQ numbers would override the correct version-specific
context already injected from pitch_version_data.

Added an explicit priority instruction: version-specific Unternehmensdaten
always override FAQ content for any conflicting numbers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:40:07 +02:00
Sharang Parnerkar
4e27e05512 fix(pitch-deck): chat agent now uses investor's assigned version scenario
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m25s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 31s
loadPitchContext() now accepts a versionId and loads data from
pitch_version_data instead of hardcoded base table queries, matching
the pattern used by /api/data and /api/financial-model.

Also pulls fp_liquiditaet yearly summaries (LIQUIDITÄT, Summe ERTRÄGE,
etc.) for the matching fp_scenario so the agent quotes the correct
finanzplan numbers. Falls back to base tables when no version is assigned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:24:13 +02:00
Sharang Parnerkar
71b6f8f181 fix(pitch-deck): fix Liquidität engine label mismatches + MilestonesSlide types
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m38s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 38s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 33s
Engine now uses dynamic row_type-based summation instead of hardcoded label
strings that differed between scenarios (e.g. 'Summe ERTRÄGE' vs
'Summe EINZAHLUNGEN'), fixing stale 9.2M value in Wandeldarlehen scenarios.
Rolling balance now includes all financing cash flows via ÜBERSCHUSS chain.

MilestonesSlide: widen Theme type to union so t.key comparisons compile.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-23 22:07:38 +02:00
Benjamin Admin
38684dd903 feat(control-pipeline): add Assessment Layer to Applicability Engine
Adds confidence scoring, escalation detection, and reasoning to the
deterministic filter. All assessment is deterministic (no LLM).

Confidence scoring (0.0-1.0):
- +0.25 industry specified
- +0.15 company size specified
- +0.20-0.30 scope signals provided
- +0.15 controls found
- +0.15 no contradictions
- Capped at 0.75 for escalation cases

Escalation triggers:
- Contradictory signals (holds_client_funds without operates_payment_service)
- Ambiguous signals (provides_embedded_connectivity)
- Financial signals without explicit payment service declaration
- Incomplete profile (no industry, size, or signals)

Reasoning: template-based, includes active signals, control count,
scope-condition descriptions, and warnings.

Response now includes "assessment" field with confidence, escalation_flag,
escalation_reason, inferred_signals, reasoning, and warnings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 20:36:11 +02:00
Benjamin Admin
716bc651c4 fix(control-pipeline): remove fictional demo packages, add real DB integration tests
Deleted 3 packages that were copied without validation:
- applicability_demo/ (fictional control IDs, wrong API schema)
- applicability_demo_sdk/ (wrong endpoint URL, fictional request format)
- applicability_demo_ci/ (GitHub Actions instead of Gitea, duplicated code)

Replaced with real integration in test_applicability_use_cases.py:
- TestApplicabilityIntegration calls real get_applicable_controls()
- Checks source_citation->source and control_id domain prefixes
- Runs against actual DB when DATABASE_URL is set
- 128 structure/acceptance tests pass, 24 integration tests skip without DB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:59:56 +02:00
Benjamin Admin
27f12e4659 feat(control-pipeline): add CI regression suite for applicability tests
Makefile + pytest + GitHub Actions workflow for automated regression:
- make install / make eval / make test
- pytest integration with demo_cases.yaml
- Golden outputs for 6 priority cases
- Report generation (JSON + Markdown)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:12:44 +02:00
Benjamin Admin
a7c6ffe4dd feat(control-pipeline): add SDK endpoint demo package for applicability tests
Request payloads + response contract + api_runner.py for 6 priority cases.
Can be run directly against /v1/applicability/evaluate endpoint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:11:44 +02:00
Benjamin Admin
ae5c5c24eb feat(control-pipeline): add applicability demo test package with evaluator
6 priority demo cases with golden outputs, evaluator.py and run_demo.py:
- CASE-001: Webshop+Stripe (anti-PSD2 false positive)
- CASE-002: Bank+TAN-Generator (scope override for batteries)
- CASE-004: FinTech Wallet (true positive PSD2/AML)
- CASE-006: SaaS+SMS Gateway (anti-TKG false positive)
- CASE-008: Software→IoT Hardware (multi-regime scope)
- CASE-011: Embedded Finance (escalation case)

Self-test passes 6/6 against golden outputs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 19:08:31 +02:00
Benjamin Admin
e8ec50e0fc feat(control-pipeline): 24 demo test cases for applicability engine
YAML-based test package with 4 categories (6 each):
- Standard sector cases (Telko, SaaS, Energie, Automotive, Health, Law)
- Scope-beats-sector (Bank+Battery, KI-Recruiting, White-Label, Payments)
- False friends (Stripe!=PSD2, Hotline!=TKG, Repo-signals!=regulation)
- Escalation (IoT-SIM, FinTech unclear, Treuhand, KI-Diagnose)

Enforces 5 acceptance rules: no false certainty, scope>sector,
repo signals insufficient, standard first, 40%+ negative tests.

Scoring framework: must_include + must_not_include + reasoning + escalation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 17:42:38 +02:00
Benjamin Admin
1f8667c7da feat(control-pipeline): replace similarity-only dedup with LLM-verified dedup in pipeline
Stage 4 (Harmonization) now uses two-tier approach:
- Score >= 0.92: auto-duplicate (embedding only, fast)
- Score 0.85-0.92: LLM verification via local qwen3.5 (think=false, ~3s)
- Score < 0.85: not a duplicate

This eliminates ~44% false positives from pure embedding similarity.
LLM_DEDUP_ENABLED env var controls the feature (default: true).

Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe,
SaaS startup, energy provider, health app, automotive, law firm, etc.)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:57:37 +02:00
Benjamin Admin
bed41dcbdf feat(control-pipeline): add applicability backfill endpoint (Phase 5/C3)
POST /v1/canonical/generate/backfill-applicability enriches controls
with applicable_industries, applicable_company_size, scope_conditions
via Anthropic API. Targets ~26k controls from pipeline version < 3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 16:25:50 +02:00
Benjamin Admin
6694ab84a1 chore: trigger rebuild
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 47s
CI / test-python-voice (push) Successful in 43s
CI / test-bqas (push) Successful in 33s
2026-04-23 12:43:55 +02:00
Benjamin Admin
f721e97ff1 chore: diagnose WD liquiditaet sums
Some checks failed
CI / go-lint (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
Build pitch-deck / build-push-deploy (push) Successful in 1m27s
2026-04-23 12:39:20 +02:00
Benjamin Admin
d9f9fa0743 security: re-secure fp-patch
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
2026-04-23 12:30:23 +02:00
Benjamin Admin
7b72fac679 chore: trigger deploy
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 49s
CI / test-bqas (push) Has been cancelled
CI / test-python-voice (push) Has started running
2026-04-23 12:23:32 +02:00