Remove applicability/scanner_hint/evidence_type/provides_context from
Pass 0b prompt to reduce output tokens (~40% less). These 6 fields are
added via cheap Haiku backfill afterwards (~$1.50 per 10k controls).
Saves ~$200 over the remaining 160k obligations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prompt v4 adds 6 new fields to Pass 0b output:
- applicability: condition rules (same format as dependency engine)
- check_type: expanded to 10 granular types
- scanner_hint: search_terms + negative_indicators for MCP
- manual_review_required_if: escalation conditions
- evidence_type: code/process/hybrid
- provides_context: context variables this control creates
New endpoint POST /generate/backfill-extended:
- Backfills existing 9k controls via Haiku Batch API (~$1.50)
- Adds all 6 new fields to generation_metadata
- Supports dry_run mode
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLM merge_key phases (e.g. "submission") don't always match PHASE_ORDER
keys. Derive phase order from action_type via get_phase_order() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix truncated title detection: only flag near-200-char titles or mid-word cutoffs
- Fix evidence leak detection: check title start patterns, not keyword substring
("nachweisen" verb is valid action, "Nachweis vorliegen" is evidence)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- GET /generate/batch-api-status/{batch_id} — check Anthropic batch status
- POST /generate/process-batch — process completed batch results (background)
- GET /generate/process-batch-status/{job_id} — poll processing progress
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add assertion, pass_criteria, fail_criteria, check_type to AtomicControlCandidate dataclass
- Parse MCP fields from LLM output in _process_pass0b_control
- Store MCP fields in generation_metadata JSON for later use by MCP scanner
- Fields default to empty when not present (backward-compatible with old prompts)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Obligations classified before API call:
- evidence → skipped (saves API cost)
- composite → skipped (not atomic)
- framework_container → skipped (decompose separately)
- atomic → sent to LLM
Filter stats returned in submit response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes from v2 evaluation (7.9/10 avg, 28 controls):
1. COMPOUND BAN: "durchführen UND Maßnahmen ergreifen" → pick primary action only
2. EVIDENCE-OF-ACTION: "Tests dokumentieren" → evidence field, not own control
3. PFLICHT=PROZESS: "Behörden informieren" + "Verfahren etablieren" = 1 control
4. MERGE-KEY BUG: merge_key from LLM output now stored in generation_metadata
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Key changes to system prompt:
- Evidence/documentation belongs in evidence field, NOT as separate control
- SBOM = 1 control (not "maintain" + "document" separately)
- Security lifecycle phases (identify/assess/remediate/monitor) = separate controls
- Same object + same action + same actor = 1 control (merge, not split)
- Titles must contain the ACTION, not just the subject
WRONG: "Vertraulichkeit Mitarbeiter"
RIGHT: "Mitarbeiter zur Vertraulichkeit verpflichten"
Titles serve as MCP search queries against customer documents/code.
Bad titles = bad search results = unusable product.
All 52,566 old pass0b controls deprecated (not deleted) for full regeneration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prevents UniqueViolation from blocking entire batch. Each result
is committed individually, errors are rolled back without affecting
subsequent results.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous version searched against atomic_controls_dedup collection which
only contains Pass 0b atomic controls. Now creates a temporary collection
with ALL draft controls as reference, then checks targets against it.
Two phases:
1. Index ~53k reference drafts into temp Qdrant collection (batch 32)
2. Search each of 14k target controls, Embedding + LLM for borderline
3. Cleanup temp collection when done
Status updates every 50 controls (fixed counter bug).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
POST /generate/harmonization-recheck verifies promoted controls
against Qdrant dedup collection via Embedding + LLM. Runs as stable
asyncio background task inside the container (no docker exec issues).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
D1: Remove /api/admin/fp-patch from PUBLIC_PATHS — it was returning live financial
data (fp_liquiditaet rows) to any unauthenticated caller; middleware admin gate now
applies as it does for all /api/admin/* paths.
D2: Add PITCH_ADMIN_SECRET bearer guard to POST /api/financial-model (create scenario)
and PUT /api/financial-model/assumptions (update assumptions) — any authenticated
investor could previously create/modify global financial model data.
D3: Add PITCH_ADMIN_SECRET bearer guard to POST /api/finanzplan/compute — any
investor could trigger a full DB recomputation across all fp_* tables. Also replace
String(error) in error response with a static message.
D4: GET /api/finanzplan/[sheetName] now ignores ?scenarioId= for non-admin callers;
investors always receive the default scenario only. Previously any investor could
enumerate UUIDs and read any scenario's financials including other investors' plans.
D9: Remove `name` from the non-admin /api/finanzplan response — scenario names like
"Wandeldarlehen v2" reveal internal versioning to investors.
D10: Remove hardcoded postgres://breakpilot:breakpilot123@localhost fallback from
lib/db.ts — missing DATABASE_URL now fails loudly instead of silently using stale
credentials that are committed to the repository.
D6: Fix all 4 TypeScript errors that were masked by ignoreBuildErrors:true; bump
tsconfig target to ES2018 (regex s flag in ChatFAB), type lang as 'de'|'en' in
chat route, add 'as string' assertion in adapter.ts. Remove ignoreBuildErrors:true
from next.config.js so future type errors fail the build rather than being silently
shipped.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
C3: Split SYSTEM_PROMPT into PART1/PART2/PART3 constants; Kernbotschaft #9 and
VERSIONS-ISOLATION now concatenated directly at runtime instead of .replace() — a
whitespace mismatch can no longer cause placeholder text to leak verbatim to the LLM.
I2: Add second liquidity-chain pass (sumAus→ÜBERSCHUSS→rolling balance) after tax rows
(Gewerbesteuer/Körperschaftsteuer) are written to fp_liquiditaet, so first-run LIQUIDITÄT
figures include tax outflows without requiring a second engine invocation.
I6: Warn when loadFpLiquiditaetSummary finds no fp_liquiditaet rows for a named scenario,
surfacing scenario-name mismatches that would otherwise silently return empty context.
I8: Sanitize console.error calls in chat/route.ts (3 sites) and data/route.ts; cap
LiteLLM error body to 200 chars, use (error as Error).message for stream/handler errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Using terms like 'Version X' or 'Szenario Y' in the VERSIONS-ISOLATION
instruction implies other versions exist. Rewritten to never reference
version/scenario names — just 'this pitch deck, created for you, the only one'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Removes all hardcoded version-specific numbers from SYSTEM_PROMPT (200k,
40k/160k L-Bank split, 195 Kunden, 3.3 Mio, 9 MA). These are now generated
at runtime from the investor's assigned pitch_version_data: funding amount,
instrument, fm_scenarios name, and 2030 financials (customers, revenue,
employees).
loadPitchContext() now returns { contextString, meta } so the POST handler
can build correct isolation and Kernbotschaft strings for any version —
Wandeldarlehen 200k, 1 Mio, or any future scenario.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FAQ entries contain hardcoded financial numbers written for specific scenarios
(e.g. 470k Liquidität 2027, 200k/40k WD amounts). When an investor is on a
different version, those FAQ numbers would override the correct version-specific
context already injected from pitch_version_data.
Added an explicit priority instruction: version-specific Unternehmensdaten
always override FAQ content for any conflicting numbers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
loadPitchContext() now accepts a versionId and loads data from
pitch_version_data instead of hardcoded base table queries, matching
the pattern used by /api/data and /api/financial-model.
Also pulls fp_liquiditaet yearly summaries (LIQUIDITÄT, Summe ERTRÄGE,
etc.) for the matching fp_scenario so the agent quotes the correct
finanzplan numbers. Falls back to base tables when no version is assigned.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Engine now uses dynamic row_type-based summation instead of hardcoded label
strings that differed between scenarios (e.g. 'Summe ERTRÄGE' vs
'Summe EINZAHLUNGEN'), fixing stale 9.2M value in Wandeldarlehen scenarios.
Rolling balance now includes all financing cash flows via ÜBERSCHUSS chain.
MilestonesSlide: widen Theme type to union so t.key comparisons compile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds confidence scoring, escalation detection, and reasoning to the
deterministic filter. All assessment is deterministic (no LLM).
Confidence scoring (0.0-1.0):
- +0.25 industry specified
- +0.15 company size specified
- +0.20-0.30 scope signals provided
- +0.15 controls found
- +0.15 no contradictions
- Capped at 0.75 for escalation cases
Escalation triggers:
- Contradictory signals (holds_client_funds without operates_payment_service)
- Ambiguous signals (provides_embedded_connectivity)
- Financial signals without explicit payment service declaration
- Incomplete profile (no industry, size, or signals)
Reasoning: template-based, includes active signals, control count,
scope-condition descriptions, and warnings.
Response now includes "assessment" field with confidence, escalation_flag,
escalation_reason, inferred_signals, reasoning, and warnings.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deleted 3 packages that were copied without validation:
- applicability_demo/ (fictional control IDs, wrong API schema)
- applicability_demo_sdk/ (wrong endpoint URL, fictional request format)
- applicability_demo_ci/ (GitHub Actions instead of Gitea, duplicated code)
Replaced with real integration in test_applicability_use_cases.py:
- TestApplicabilityIntegration calls real get_applicable_controls()
- Checks source_citation->source and control_id domain prefixes
- Runs against actual DB when DATABASE_URL is set
- 128 structure/acceptance tests pass, 24 integration tests skip without DB
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makefile + pytest + GitHub Actions workflow for automated regression:
- make install / make eval / make test
- pytest integration with demo_cases.yaml
- Golden outputs for 6 priority cases
- Report generation (JSON + Markdown)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Request payloads + response contract + api_runner.py for 6 priority cases.
Can be run directly against /v1/applicability/evaluate endpoint.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 4 (Harmonization) now uses two-tier approach:
- Score >= 0.92: auto-duplicate (embedding only, fast)
- Score 0.85-0.92: LLM verification via local qwen3.5 (think=false, ~3s)
- Score < 0.85: not a duplicate
This eliminates ~44% false positives from pure embedding similarity.
LLM_DEDUP_ENABLED env var controls the feature (default: true).
Also adds 10 applicability use case tests (bank+TAN, webshop+Stripe,
SaaS startup, energy provider, health app, automotive, law firm, etc.)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
POST /v1/canonical/generate/backfill-applicability enriches controls
with applicable_industries, applicable_company_size, scope_conditions
via Anthropic API. Targets ~26k controls from pipeline version < 3.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>