breakpilot-core

Author	SHA1	Message	Date
Benjamin Admin	0bb9726ddd	Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core	2026-05-10 15:09:51 +02:00
Benjamin Admin	8510af46eb	feat(pipeline): MC Quality Overhaul — 74.5% → 92.8% accuracy, 5.3K → 13.6K MCs Phase 0: Quality Audit script (Claude Sonnet, 1750 samples) Phase 1: Object ontology expanded 31 → 74 tokens with descriptions + boundaries Phase 2: 174K controls re-classified via Haiku (10 batches, $50) - Generic tokens removed (documentation, procedure, process) - L2 sub-topics added (108K + 64K controls) - Bad subtopics fixed (stakeholder_*, escalation fragments) Phase 3: Re-clustering K=18704 (37K objects → 16.7K groups) Phase 4: Direct MC generation from canonical tokens (gpre2_direct_mc.py) Phase 5: Regulation-source split (gpre3, dry-run tested) New features: - Tenant-isolated document upload API (rag-service) - BAuA crawler (Playwright, 131 PDFs downloaded) - OSHA Technical Manual crawler (23 chapters) - CE obligation extractor (6141 obligations from Qdrant) RAG ingestion: - 126 BAuA PDFs (TRBS/TRGS/ASR): 27,664 chunks - OSHA Technical Manual: 7,241 chunks - OSHA 1910 Subpart O (full): 745 chunks - EuGH C-588/21 P: 216 chunks - EU 2018/1725: 842 chunks Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-10 15:08:15 +02:00
Benjamin Admin	81db904b3e	feat(legal-sources): add OSHA machinery safety standards + international norms mapping OSHA 29 CFR 1910 Subpart O (1910.211-1910.219) — complete machine guarding requirements. US federal law, public domain. International norms mapping table: China GB/T, Korea KS, India BIS equivalents to ISO/EN standards. Unfortunately all countries protect ISO copyright even for identical national adoptions (IDT). Only OSHA provides truly free machinery safety content. EU Excel harmonised standards list included for reference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-09 10:50:43 +02:00
Sharang Parnerkar	572052285c	fix: require button click to consume magic link token Email security gateways follow GET redirects automatically and were consuming the token before the investor clicked through. The verify page now shows an 'Access Pitch Deck' button; the token is only consumed on explicit click, which scanners cannot trigger. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 23:30:27 +02:00
Sharang Parnerkar	1ef22e6f95	fix: use PITCH_BASE_URL for short link redirects instead of request.url Behind Orca's reverse proxy, request.url resolves to http://127.0.0.1:3000 which causes redirects to go to the internal address instead of the public domain. Use PITCH_BASE_URL (already set in service.toml) as the base. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 10:55:53 +02:00
Sharang Parnerkar	d291af0e33	fix: whitelist /p/* in middleware so short links work without a session Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 10:42:09 +02:00
Sharang Parnerkar	76aad8b1d1	feat(pitch-deck): branded short links for magic URLs (pitch.breakpilot.ai/p/ab3xk2) - New pitch_short_links table stores 6-char alphanumeric codes mapped to magic link tokens - GET /p/[code] redirects to /auth/verify?token=... (302, validates expiry) - All magic link generation points (invite, generate-link, resend) now create a short code - Emails (invite + resend) use the short URL — less token-like, cleaner for spam filters - Copy-link UI shows short URL prominently with full URL as fallback - Migration 008 added to /api/admin/migrate Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 10:34:24 +02:00
Sharang Parnerkar	54f0919b73	feat(pitch-deck): translate financial plan row labels when lang=en - Add ROW_LABEL_MAP (DE→EN) covering GuV, Liquidität, Kunden, Betriebliche Aufwendungen rows - Add FORMULA_TOOLTIPS_EN with English tooltip text for all formula-driven rows - Add MONTH_LABELS_EN (Mrz→Mar, Mai→May, Okt→Oct) - LabelWithTooltip now accepts `de` flag, translates display text and tooltip accordingly - Month column headers switch between DE/EN month abbreviations - Falls back to original German label for any row not in the map Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-07 09:47:45 +02:00
Sharang Parnerkar	ec7eee8e3d	feat(pitch-deck): change preferred_lang for existing investors from detail page - GET /api/admin/investors/:id now returns preferred_lang - PATCH /api/admin/investors/:id accepts preferred_lang (de/en), validates value - Investor detail page: DE/EN toggle in the Pitch Version card, instant save on click Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 23:31:59 +02:00
Sharang Parnerkar	b0d273d3ab	feat(pitch-deck): add pitch version selection to investor invite form - Version dropdown on the invite form shows all committed versions - Selected version is assigned to the investor at creation time (no separate step needed) - API validates version is committed before upserting - Leaving the dropdown empty keeps any existing assignment (COALESCE behavior) - version_id included in audit log Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 23:27:23 +02:00
Sharang Parnerkar	17b9006b88	feat(pitch-deck): English email templates, investor language preference, link-only invite mode - Add English email template variants (greeting, message, closing, subject, CTA copy) - Add `preferred_lang` column to `pitch_investors` — stored per investor, deck opens in that language by default - Invite form: DE/EN language toggle that switches email defaults and pitch language setting - Invite form: "Send email" toggle — when off, creates investor + returns magic link without sending email (for cold outreach attachment) - `app/page.tsx`: initializes pitch language from investor's `preferred_lang` before first render (no flash) - Migration 007 added to `/api/admin/migrate` route for production rollout Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-06 23:18:40 +02:00
Benjamin Admin	e013702a02	Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core	2026-05-06 21:06:19 +02:00
Benjamin Admin	f022b489e2	docs: comprehensive session handover — Blocks F+G complete, next: MC quality refinement Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 21:06:01 +02:00
Benjamin Admin	0092c4fe47	feat(pipeline): G-pre1 refinement script for large object groups Splits master controls >200 members by re-clustering their object groups with k=4-20 per group. First round: 38 groups → 325 sub-groups → 253 new MCs. 25 generic MCs remain (monitoring, procedure, etc.) — need regulation-source split. Session summary: Block F complete, Control Generation (1,599+), Pass 0a/0b, Production Sync, G-pre1/2/3 Object Clustering + Master Controls + API, G1-G4 Compliance Execution Layer (Decision Trace, Commit Ledger, Decision Memory, Pre-Deployment Enforcement). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 20:41:49 +02:00
Benjamin Admin	d5bcd0bd5b	feat(pipeline): G4 Pre-Deployment Enforcement — CI/CD compliance gate New table: deployment_checks (verdict, blocking/warning controls, risk score) New API: POST /v1/deployment-checks (SDK asks: "can I deploy?") GET /v1/deployment-checks/{id} (check result) POST /v1/deployment-checks/{id}/override (manual override with justification) GET /v1/deployment-checks/stats (approval/block rate) Check logic: queries G1 decision_traces + G3 open failures per affected control. Verdict: approved (0 blocking) or blocked (with fix recommendations). 454 tests pass, 0 regressions. Block G complete: G1-G4 all implemented. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 20:24:45 +02:00
Benjamin Admin	c398e74d5e	feat(pipeline): G3 Full Decision Memory — compliance lifecycle event stream New table: decision_events (assessment→decision→fix→verification→failure cycle) New API: POST /v1/decision-events (record lifecycle event) GET /v1/decision-events (list with filters) GET /v1/decision-events/timeline/{control_id} (full chronological timeline) GET /v1/decision-events/stats (failure rate, cycle times) Each event captures input_state, output_state, actor, evidence. 454 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 20:16:25 +02:00
Benjamin Admin	e82f99b8cb	feat(pipeline): G2 Compliance Commit Ledger — code↔control audit trail New table: compliance_commits (commit hash, affected controls, risk level) New API: POST /v1/compliance-commits (SDK registers commit + impact) GET /v1/compliance-commits (list with filters) GET /v1/compliance-commits/by-control/{id} (all commits for a control) GET /v1/compliance-commits/stats (dashboard) GET /v1/compliance-commits/{id} (detail) GIN index on affected_control_ids for fast @> containment queries. 454 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 19:17:45 +02:00
Benjamin Admin	66a70ab31c	feat(pipeline): G1 Decision Trace — compliance decision tracking New table: decision_traces (status, reason, evidence, fix plan per control) New API: POST/GET/PUT /v1/decision-traces (CRUD for decisions) GET /v1/decision-traces/stats (compliance dashboard) GET /v1/controls/{id}/full-trace (Regulation→Obligation→Control→Decision→Evidence) 454 tests pass, 0 regressions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 18:26:21 +02:00
Benjamin Admin	ad24835940	feat(pipeline): G-pre1/2/3 — Object Clustering + Master Controls + API G-pre1: 144k objects clustered into 7,466 groups via Mini-Batch K-Means on bge-m3 embeddings. Two-stage: k=5000 base + sub-cluster groups >50. G-pre2: 5,114 Master Controls from lifecycle phase chains (define→implement→test→monitor), linking 172,504 atomic controls. G-pre3: REST API for Master Controls GET /v1/master-controls (list, search, filter) GET /v1/master-controls/stats GET /v1/master-controls/{mc_id} (detail with phase-controls) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-06 15:11:38 +02:00
Benjamin Admin	e683701a44	fix(gitea): remove /etc/timezone mount (macOS incompatible), use TZ env var Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 19:37:43 +02:00
Benjamin Admin	0bad74a3bd	docs: session handover — Block F complete, pipeline done, G-pre1 analysis Session 03-05.05.2026: - Block F1-F5 complete (DB migration of hardcoded dicts) - Control Generation: 1,599 controls + 11,522 obligations + 1,147 atomics - Production sync: 2,625 controls + 11,522 obligations synced - G-pre1 analysis: 183k objects → 144k after normalize (needs hierarchical clustering) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 18:02:10 +02:00
Benjamin Admin	22257a7ed8	feat(pipeline): F5 validation tests — verify DB matches hardcoded dicts 8 tests confirm all REGULATION_LICENSE_MAP, ACTION_TYPES, _NEGATIVE_PATTERNS, _ACTION_SYNONYMS, and _OBJECT_SYNONYMS entries are correctly migrated to DB. Dicts kept as fallback for DB-unavailability resilience. Block F complete: F1-F5 all done. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 16:06:59 +02:00
Benjamin Admin	a20de0b52b	feat(pipeline): F4 LLM synonym enrichment script Uses Ollama (qwen3.5:35b-a3b, think:false) to generate additional German synonyms for action types and object tokens. Results stored with source='llm' in action_synonyms/object_synonyms tables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 15:45:43 +02:00
Benjamin Admin	775d8b52f3	fix(vault): prevent CPU-burning init loop with marker file + idempotent checks Root cause: init scripts ran repeatedly (on container restart) and tried vault secrets enable / vault auth enable for already-existing paths. Vault logged ERRORs and burned 40-84% CPU in the loop. Fix: - Marker file /vault/data/.init-complete skips re-initialization - vault secrets list / vault auth list checks before enable calls - No more "path already in use" errors on subsequent runs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 11:46:16 +02:00
Sharang Parnerkar	f0a84e79ab	fix(preview): return fp_scenarios key so version-specific scenario is resolved in admin preview The preview-data API was returning `fm_scenarios` but PitchDeck reads `data.fp_scenarios`, so fpBaseScenarioId was always null and the Finanzplan slide fell back to the global default scenario (Base Case 200k) instead of the version's assigned scenario (e.g. 1 Mio. Euro Base). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-05 11:39:53 +02:00
Benjamin Admin	64f45be63a	feat(pipeline): add Pass 0a endpoint to core control-pipeline Registers /generate/run-pass0a and /generate/pass0a-status/{job_id} on the core control-pipeline (port 8098). Previously Pass 0a was only available on the compliance backend which connects to Production DB, causing a split-brain when controls are generated locally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-05 07:21:58 +02:00
Sharang Parnerkar	404963db77	feat(showcase): restore intro-presenter and executive-summary slides in showcase mode Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 23:14:18 +02:00
Sharang Parnerkar	0acbf25956	fix(showcase): hide Data Room link for showcase sessions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 23:12:57 +02:00
Sharang Parnerkar	2bd9b015eb	fix(showcase): block financial data from AI Q&A, fix FAB overflow, fix presenter slide mapping AI Q&A: fetch is_showcase from DB; showcase sessions receive no financial/funding context and have an explicit LLM guard refusing to discuss investment details. FAQ context and financial slide IDs stripped from system prompt. FAB: flex layout so Fullscreen button is always visible regardless of panel height. Presenter: pass activeSlideOrder to usePresenterMode so buildSlideAudioPlan maps slideIdx → slideId from the filtered list, not the full SLIDE_ORDER. Progress calculation also filters to active scripts only. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 23:00:55 +02:00
Sharang Parnerkar	be126a7a39	fix(pitch): showcase sidebar shows only filtered slides + AI presenter via FAB NavigationFAB and SlideOverview now accept slideNames prop and render only the active slide list (filtered for showcase mode). Adds AI presenter start button to the FAB footer so it's accessible even when intro-presenter slide is hidden. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 22:50:33 +02:00
Sharang Parnerkar	30a9165497	feat(pitch): showcase mode — per-investor toggle hides financial/investor slides for customer demos Adds is_showcase boolean to pitch_investors; when set, filters out financials, the ask, cap table, assumptions, finanzplan, risks, and intro-presenter slides. Slide navigation is fully dynamic — progress bar and counts update accordingly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 22:41:15 +02:00
Sharang Parnerkar	f2184be02f	fix: tab row counts use investor's scenario, not always Base Case /api/finanzplan now accepts ?scenarioId and uses it for the per-sheet row counts (the numbers in brackets on the tab bar). FinanzplanSlide passes fpBaseScenarioId when fetching the sheet list, so Wandeldarlehen investors see e.g. Personalkosten (9) instead of (35). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 15:21:40 +02:00
Sharang Parnerkar	06014d57b3	fix: derive fp_scenario IDs from version snapshot, eliminate hardcoded UUIDs The fm_scenarios array in each pitch version snapshot already stores the fp_scenario IDs directly (same pattern 1 Mio used). Wandeldarlehen snapshots were missing Bear/Bull entries — updated in DB to add them. - /api/data: include fp_scenarios in version response (was omitted) - PitchDeck: derive fpBaseScenarioId from data.fp_scenarios - useFpKPIs: accept fpBaseScenarioId instead of isWandeldarlehen boolean - AssumptionsSlide: find Bear/Base/Bull by name from fpScenarios prop - FinanzplanSlide: initialize from fpBaseScenarioId, use version scenarios for selector - FinancialsSlide / ExecutiveSummarySlide: pass fpBaseScenarioId to hook - types: add FpScenarioRef + fp_scenarios field to PitchData No UUID hardcoded in any component. Adding a new pitch version only requires setting the correct fp_scenario IDs in its fm_scenarios snapshot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 15:00:06 +02:00
Sharang Parnerkar	6c022d1a79	fix: allow investors to query fp_ scenarios by scenarioId AssumptionsSlide sends ?scenarioId=<uuid> for Bear/Base/Bull cards but the route was silently dropping it for non-admin requests, making all three cards return the same default Base Case data. Since fp_ financial projections are already investor-facing, any valid scenarioId is allowed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-04 14:27:14 +02:00
Benjamin Admin	e869cabc81	docs: session handover — F1-F3 done, control generation running Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-04 07:21:24 +02:00
Benjamin Admin	652e3a65a3	feat(pipeline): F2+F3 action/object ontology — DB-backed normalization Migrates ACTION_TYPES (26+8 types), _NEGATIVE_PATTERNS (22), _ACTION_SYNONYMS (65), and _OBJECT_SYNONYMS (75) from hardcoded dicts to DB tables. - SQL migration: 003_action_object_ontology.sql (3 tables) - Migration scripts: f2_migrate_actions.py (34 types, 145 synonyms), f3_migrate_objects.py (75 objects) - OntologyRegistry cache: 5min TTL, raises RuntimeError if empty (safe fallback to dicts) - control_ontology.classify_action/get_phase delegate to DB with dict fallback - control_dedup.normalize_action/normalize_object delegate to DB with dict fallback - 25 new tests, 446 total pass, 0 regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 23:47:53 +02:00
Benjamin Admin	aab8eeb335	Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core	2026-05-03 23:14:34 +02:00
Benjamin Admin	9437e029d0	feat(pipeline): F1 regulation registry — DB-backed license/source-type lookup Migrates REGULATION_LICENSE_MAP (135 entries) and SOURCE_REGULATION_CLASSIFICATION (58 entries) from hardcoded Python dicts to compliance.regulation_registry table. - SQL migration: 002_regulation_registry.sql (table + indexes + trigger) - Migration script: f1_migrate_regulation_registry.py (162 rows, --dry-run) - RegulationRegistry cache: 5min TTL, prefix fallback, graceful degradation - control_generator._classify_regulation() delegates to DB with dict fallback - source_type_classification.classify_source_regulation() delegates to DB - 34 new tests (lookup, cache, degradation, migration data consistency) - 421 total tests pass, 0 regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 23:14:06 +02:00
Benjamin Admin	4fd2bfefcd	docs: session handover updated for Block F start Next: F1 Regulation Registry (DB + API + Frontend + Auto-Create) Frontend at /sdk/regulation-registry in breakpilot-compliance admin Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:51:23 +02:00
Benjamin Admin	fac9280716	feat(pipeline): Block D5+-E complete session — 20k+ new chunks Session 02-03.05.2026 accomplishments: - D5+: NIST/ENISA PDF quality fix (0%→45% section rate) - D5+: 4 lost NIST PDFs restored (11k chunks) - D5+: Text normalization + section detection for NIST/BSI - D6: Citation backfill (3,651 controls updated, old archived) - E2: 8 DE laws ingested (ArbZG, MuSchG, GmbHG, AktG, InsO...) - E3: 5 EU regulations (CSRD, CSDDD, Taxonomy, eIDAS, Pay Trans.) - E4: Standards (GoBD, BAIT, VAIT) - E6: 3 CH + 4 AT laws (OR, DSV, ArG, ArbVG, AngG, AZG, NISG) - E7: 9 court judgments as full text (Schrems II 154 chunks, Meta 101, BVerfG 161, DSK OH 119, Planet49 42, SCHUFA 41, Schadenersatz 29, BAG 48, Google Fonts 14) - Infra: Qdrant snapshot mechanism, upload-before-delete safety Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 22:31:57 +02:00
Benjamin Admin	118be3540d	feat(pipeline): D6 citation backfill + E2/E3 law ingestion scripts - d6_citation_backfill.py: 3-tier matching (hash/prefix/overlap), archives old citations, updated 3,651 controls (93.6% coverage) - ingest_de_laws.py: 8 German laws ingested (ArbZG, MuSchG, NachwG, MiLoG, GmbHG, AktG, InsO, BUrlG — 1,629 chunks) - ingest_eu_regulations.py: EUR-Lex ingestion (needs manual HTML due to AWS WAF). CSRD, CSDDD, EU Taxonomy, eIDAS 2.0, Pay Transparency manually ingested (1,057 chunks) - Updated session handover with current state Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 13:19:27 +02:00
Benjamin Admin	a9671a572b	fix(embedding): single-number ALL-CAPS section detection for ENISA/BSI Add case-sensitive _SINGLE_NUM_ALLCAPS_RE for "1. INTRODUCTION" style headers (ENISA, BSI docs). Cannot use _LEGAL_SECTION_RE for this because it uses re.IGNORECASE which would false-positive on "1. Erstens" etc. Also re-downloaded 2 corrupt PDFs from nist.gov (nistir_8259a, nist_ai_rmf) — originals in MinIO were 263-byte XML error responses, not PDFs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 08:56:02 +02:00
Benjamin Admin	2f4a3f2ea2	fix(embedding): add NIST control IDs to _SECTION_NUMBER_RE _SECTION_NUMBER_RE only had patterns for §/Art/Section/Kapitel/Annex but missed NIST-style identifiers (AC-1, GV.OC-01, 3.1, A01:2021). This caused 0% section rate for all NIST/BSI/ENISA documents even though sections were correctly detected — the section NUMBER wasn't extracted from the header. Also adds: - reupload_legal_strategy.py: re-upload with legal chunking - extract_and_upload_nist.py: local PDF extraction workaround - qdrant-snapshot.sh: backup mechanism for Qdrant collections Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 07:42:06 +02:00
Benjamin Admin	0b0eed27b0	feat(embedding): NIST PDF text normalization + safe re-ingest script Fix broken multi-column PDF extraction for NIST/BSI/ENISA documents: - _normalize_pdf_text(): fixes broken section numbers (1 . 1 → 1.1), control IDs (AC - 1 → AC-1), ligatures, soft hyphens - pdfplumber tolerances increased (x=3,y=4) for better column handling - 3 new regex patterns: NIST CSF 2.0, NIST enhancements, OWASP Top 10 - reingest_nist.py: safe upload-before-delete for 4 lost NIST PDFs - reingest_d5.py: safety fix — upload first, verify, then delete old Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-03 06:42:46 +02:00
Benjamin Admin	97a7f6f264	docs: comprehensive session handover with full roadmap (Blocks A-G) Complete instructions for next session including: - Current quality metrics per document type - Prioritized action items (NIST fix, citation backfill, missing laws) - Full Block E-G roadmap with details - All critical files, DB state, test commands - Known issues (3 lost NIST PDFs, frontend 500s, D5 script safety) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 22:30:50 +02:00
Benjamin Admin	ff21bc258a	docs: session handover — D2-D5 complete, quality report, NIST plan Major session achievements: - Structural metadata end-to-end (D2-D4) - 430 docs re-ingested with new chunking - HTML stripping + charset detection (0% → 97.6%) - 20 EU regulations from EUR-Lex HTML (DSGVO: 0% → 92%) - Quality report script (500 controls: 13% fully correct) - Frontend requirements.map fix Open: NIST/ENISA text normalization, citation backfill, D5 script safety (upload-before-delete), BEG IV ingestion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 22:22:55 +02:00
Benjamin Admin	3009f3d13a	feat(embedding): add NIST/ENISA/standard section numbering to chunker Extends _LEGAL_SECTION_RE to detect: - Numbered sections: 1.1 Title, 2.3.1 Subtitle - Control family IDs: AC-1, AU-2, PO.1, PW.1.1 - Table/Figure/Appendix references Also adds EUR-Lex HTML replacement script. 58 embedding-service tests passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 19:24:10 +02:00
Benjamin Admin	5a6e588641	docs: update session handover — D2-D5 complete, EU PDF issue documented Session achieved: structural metadata end-to-end (D2-D4), overlap bug fix, HTML stripping with charset detection, 430/436 docs re-ingested. Remaining: ~40 EU Official Journal PDFs need HTML from EUR-Lex (broken multi-column PDF extraction), 3 missing EDPB PDFs, 1 corrupt PDF. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 17:34:34 +02:00
Benjamin Admin	41183ff93d	fix(docker): set PDF_EXTRACTION_BACKEND to auto (was pymupdf) The default was 'pymupdf' which doesn't exist as a backend, causing fallthrough to pypdf every time. With 'auto', the priority is: unstructured > pdfplumber > pypdf. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 17:30:33 +02:00
Benjamin Admin	75dda9ac92	feat(embedding): add pdfplumber backend for multi-column PDF extraction EU Official Journal PDFs (AI Act, CRA, NIS2, DSGVO, etc.) use multi-column layouts that pypdf breaks into fragmented words ("Ar tik el" instead of "Artikel"). pdfplumber handles these correctly. Backend priority: unstructured > pdfplumber > pypdf (auto mode). Also increases D5 re-ingestion timeout to 3600s for large PDFs. 58 embedding-service tests passing. pdfplumber: MIT license. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-02 15:42:25 +02:00
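
Commit 0b0eed27b0 above describes `_normalize_pdf_text()` as repairing section numbers split as `1 . 1`, control IDs split as `AC - 1`, ligatures, and soft hyphens. The repository's actual code is not shown in this log, so the following is only a minimal sketch of that kind of cleanup. The function name `normalize_pdf_text`, the regexes, and the ligature table are assumptions made for illustration.

```python
import re

# Hypothetical translation table: common Latin ligatures are expanded and
# the soft hyphen (U+00AD) is dropped. Not taken from the repository.
_LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\u00ad": "",
})

# Assumed pattern: a dot with a space on both sides, sitting between digits.
# Lookarounds leave the digits unconsumed, so chains like "3 . 1 . 2" work.
_SPLIT_SECTION_RE = re.compile(r"(?<=\d) \. (?=\d)")

# Assumed pattern: two or three capitals, a spaced hyphen, then digits.
_SPLIT_CONTROL_ID_RE = re.compile(r"\b([A-Z]{2,3}) - (\d+)\b")


def normalize_pdf_text(text: str) -> str:
    """Repair a few typical artifacts of multi-column PDF extraction."""
    text = text.translate(_LIGATURES)                 # ligatures, soft hyphens
    text = _SPLIT_SECTION_RE.sub(".", text)           # "1 . 1"  -> "1.1"
    text = _SPLIT_CONTROL_ID_RE.sub(r"\1-\2", text)   # "AC - 1" -> "AC-1"
    return text
```

Under these assumptions, `normalize_pdf_text("3 . 1 . 2 Access")` returns `"3.1.2 Access"` and `normalize_pdf_text("AC - 1 Policy")` returns `"AC-1 Policy"`. The patterns are deliberately narrow (spaces required on both sides) to limit false positives in ordinary prose; a real corpus would likely need broader rules.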


698 Commits