299 Commits

Author SHA1 Message Date
Benjamin Admin
88d0619184 fix(pitch-deck): SUMME Betriebliche includes Personalkosten + Abschreibungen
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m7s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 33s
Bug: "SUMME Betriebliche Aufwendungen" excluded Personalkosten and
Abschreibungen because they have is_sum_row=true. Result: both sum
rows showed identical values.

Fix: explicitly include Personalkosten and Abschreibungen rows in
the SUMME Betriebliche calculation.
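The corrected aggregation can be sketched as follows; the row shape and label matching are illustrative assumptions, not the actual component code:

```typescript
// Hypothetical shape of a Finanzplan grid row.
interface FpRow {
  label: string;
  is_sum_row: boolean;
  values: number[]; // one value per month
}

// Subtotal rows that must still be counted in the grand total.
const INCLUDED_SUBTOTALS = ["Personalkosten", "Abschreibungen"];

// "SUMME Betriebliche Aufwendungen" = all detail rows plus the
// Personalkosten and Abschreibungen subtotals, which carry
// is_sum_row=true and were previously (wrongly) filtered out.
function summeBetriebliche(rows: FpRow[], month: number): number {
  return rows
    .filter(
      (r) =>
        !r.is_sum_row ||
        INCLUDED_SUBTOTALS.some((l) => r.label.startsWith(l))
    )
    .reduce((acc, r) => acc + (r.values[month] ?? 0), 0);
}
```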

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:54:19 +02:00
Benjamin Admin
6111494460 fix(pitch-deck): remove Berechnen button + cell editing from Finanzplan
Some checks failed

Finanzplan is now read-only for investors:
- Removed "Berechnen" / "Compute" button
- Removed cell double-click editing
- Removed blue edit indicator dots
- All sums computed live in frontend (no manual recompute needed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:51:43 +02:00
Benjamin Admin
73e3749960 fix(pitch-deck): SUMME footer works in both annual and monthly view
All checks were successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:46:16 +02:00
Benjamin Admin
f57bdfa151 chore: trigger deploy
All checks were successful
2026-04-20 10:42:51 +02:00
Benjamin Admin
34b519eebb fix(pitch-deck): Investitionen tab shows values (was empty due to values_invest field)
All checks were successful
getValues() now reads values_invest for investment rows.
Previously only read values/values_total/values_brutto, missing the
invest-specific field name.
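A sketch of the widened fallback chain (field names are from the message; the row type itself is an assumption):

```typescript
// Value arrays live under different field names depending on the tab;
// investment rows use values_invest.
interface ApiRow {
  values?: number[];
  values_total?: number[];
  values_brutto?: number[];
  values_invest?: number[];
}

function getValues(row: ApiRow): number[] {
  return (
    row.values ??
    row.values_total ??
    row.values_brutto ??
    row.values_invest ?? // previously missing, so the tab rendered empty
    []
  );
}
```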

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:38:03 +02:00
Benjamin Admin
66fb265f22 feat(pitch-deck): collapsible year view in Finanzplan + remove section labels
All checks were successful
- Year navigation: "Alle Jahre" shows 5 annual columns, individual years show 12 months
- Default starts at single year view
- Annual view: flow rows show yearly sum, balance rows show Dec value
- Removed [section] labels from row display
- Footer sum only shown in monthly view (not annual)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:31:47 +02:00
Benjamin Admin
ec7326cfe1 feat(pitch-deck): live-compute sums for Liquidität + Kunden + Umsatz tabs
All checks were successful
Extended live-compute to ALL tabs:
- Liquidität: "Summe ERTRÄGE" = sum of einzahlung rows,
  "Summe AUSZAHLUNGEN" = sum of auszahlung rows
- Kunden: GESAMT rows = sum of tier detail rows
- Umsatz: GESAMTUMSATZ = sum of all revenue rows
- Materialaufwand: SUMME = sum of cost rows

ÜBERSCHUSS rows kept from DB (complex multi-step formula).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:26:23 +02:00
Benjamin Admin
67ed5e542d feat(pitch-deck): live-computed sum rows in Finanzplan (like Excel formulas)
Some checks failed
Sum rows (is_sum_row=true) are now computed live in the frontend from
their detail rows, not read from stale DB values. This means:
- Category sums (Versicherungen, Marketing, Sonstige etc.) always match
- "Summe sonstige" = all non-personal, non-AfA rows
- "SUMME Betriebliche" = all rows including personal + AfA
- No more manual recompute needed after DB changes

Also: chart labels increased from 7-8px to 11px for readability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:23:59 +02:00
Benjamin Admin
6ec27fdbf2 feat(pitch-deck): larger chart labels + 2 new charts (Liquidität + Revenue vs Costs)
All checks were successful
Charts tab:
- All bar labels increased from 7-8px to 11px (readable)
- New: Liquidität (Jahresende) bar chart — shows cash position per year
- New: Umsatz vs. Gesamtkosten — side-by-side bars per year
- All charts read from fpKPIs (fp_* source of truth)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:14:39 +02:00
Benjamin Admin
9513675d85 fix(pitch-deck): Finanzplan starts on GuV tab instead of Personalkosten
All checks were successful
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:11:39 +02:00
Benjamin Admin
8e2329be53 fix(pitch-deck): trial churn (25% leave after 3 months) + remove unit_cost label
All checks were successful
Churn model: 25% of new customers leave after 3 months (trial period).
Remaining customers have normal monthly churn (3% Starter, 2% Pro, 1% Ent).
Churn label shows "25% Trial + X%/Mon".
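This churn model can be sketched as a cohort function (the function name and cohort framing are illustrative; the rates are the ones quoted above):

```typescript
// 25% of each signup cohort leaves when the 3-month trial ends; the
// remaining 75% then churns at the tier's normal monthly rate
// (0.03 Starter, 0.02 Pro, 0.01 Enterprise).
function retainedAfterMonths(
  cohortSize: number,
  monthsSinceSignup: number,
  monthlyChurn: number
): number {
  if (monthsSinceSignup < 3) return cohortSize; // still in trial
  const postTrial = cohortSize * 0.75; // 25% trial churn at month 3
  return postTrial * Math.pow(1 - monthlyChurn, monthsSinceSignup - 3);
}
```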

DB: section 'unit_cost' renamed to 'einkauf' (removed English label from UI).
Code: unit price detection updated for new section name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 10:07:45 +02:00
Benjamin Admin
19214bfd66 fix(pitch-deck): remove Sonst. Erträge tab + add investments
All checks were successful
- Removed 'sonst_ertraege' from SHEET_LIST (empty, irrelevant for pre-seed)
- DB: Added Mac Studio (LLM Training) 13,000 EUR, Jan 2027, depreciated over 3 years (AfA)
- DB: Added Software-Lizenzen (GWG) 800 EUR/year (2026-2030)
- DB: Added Domain/SSL/Zertifikate (GWG) 500 EUR at founding
- DB: Removed GPU-Server (wrong assumption — Mac Studio used instead)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 09:52:19 +02:00
Benjamin Admin
53e61c6dcd fix(pitch-deck): no costs before founding month (FOUNDING_MONTH)
All checks were successful
Engine: All formula-based rows (F) now start at FOUNDING_MONTH (m8),
not m1. Affects: Fortbildung, Fahrzeug, KFZ, Reise, Bewirtung,
Internet, BG, Marketing, Serverkosten, Gewerbesteuer.

DB: All manual (M) betriebliche rows zeroed for m1-m7 across all
6 scenarios.
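The gating can be sketched as a wrapper around any formula row (names are illustrative, not the engine's actual API):

```typescript
const FOUNDING_MONTH = 8; // m8: the company does not exist before this

// Wrap a formula-based (F) row so months before founding are zero.
function gateToFounding(
  formula: (month: number) => number
): (month: number) => number {
  return (month) => (month < FOUNDING_MONTH ? 0 : formula(month));
}
```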

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 09:45:45 +02:00
Benjamin Admin
728f698f9e fix(pitch-deck): remove redundant Summe rows from Umsatz/Material/Kunden tabs + total line styling
All checks were successful
- Removed auto-generated SUMME footer from umsatzerloese, materialaufwand, kunden tabs
  (GESAMTUMSATZ/Bestandskunden gesamt rows already exist in DB data)
- GESAMT/Total rows now have thicker top border (border-t-2 white/20)
- unit_cost rows show unit price instead of annual sum

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 09:42:16 +02:00
Benjamin Admin
511a7de627 fix(pitch-deck): unit_cost rows show price not annual sum in Finanzplan
All checks were successful
Einkaufspreis rows (Mac Mini/Studio) showed sum of 12 months (e.g. 38,400)
instead of the unit price (3,200). Now detected via section='unit_cost'
or label contains 'Einkaufspreis' and shows the price value instead.
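A sketch of the detection and display logic (the row shape is assumed; the detection criteria are from the message):

```typescript
interface Row {
  section?: string;
  label: string;
  values: number[]; // monthly values
}

// Unit-price rows (e.g. "Einkaufspreis Mac Studio") should display the
// price itself, not the sum of twelve monthly copies of it.
function isUnitCostRow(row: Row): boolean {
  return row.section === "unit_cost" || row.label.includes("Einkaufspreis");
}

function displayValue(row: Row): number {
  return isUnitCostRow(row)
    ? (row.values.find((v) => v !== 0) ?? 0) // first non-zero = unit price
    : row.values.reduce((a, b) => a + b, 0); // annual sum
}
```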

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 09:36:42 +02:00
Benjamin Admin
9d82f15c53 fix(pitch-deck): FinanzplanSlide selects correct fp_scenario per version
All checks were successful
Bug: Finanzplan data grid always loaded Base Case (is_default=true)
even for Wandeldarlehen version, showing 35 employees + module-based
customers instead of lean 10-person plan.

Fix: isWandeldarlehen prop passed to FinanzplanSlide. On load, picks
Wandeldarlehen scenario by name match instead of is_default.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:37:00 +02:00
Benjamin Admin
b0918fd946 fix(pitch-deck): GuV + Cashflow tabs read from fp_* data, Break-Even as year
All checks were successful
FinancialsSlide:
- Break-Even: shows year (2029) instead of formatted number (2.029)
- GuV tab: replaced AnnualPLTable (useFinancialModel) with fp_guv data table
  Shows: Revenue, Personnel, EBIT, Taxes, Net Income per year
- Cashflow tab: replaced AnnualCashflowChart (useFinancialModel) with
  fp_liquiditaet bar chart showing cash position + EBIT per year
- Both tabs now show "Quelle: Finanzplan" label

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 20:33:30 +02:00
Benjamin Admin
7b31b462a0 fix(pitch-deck): 1 Mio investment amount everywhere (975k → 1M)
Some checks failed
Updated:
- CapTable: 975k → 1M, 19.6% → 20%, Gründer 37.5% → 37.3%
- FAQ: investment-captable answer updated to 1M
- Production DB: fp_liquiditaet Fremdkapital 975k → 1M (Base + Bear + Bull)
- Production DB: pitch_version_data funding amount → 1M
- All 3 scenarios (Base/Bear/Bull) recomputed with new amounts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 18:16:26 +02:00
Benjamin Admin
021faedfa3 fix(pitch-deck): CapTable slide — remove Gehälter/Gewinnverwendung, fix BAFA + 1M amounts
Some checks failed
Removed:
- Gründergehälter card (entire section)
- Gewinnverwendung card (entire section)
- Instrument line from Pre-Seed Runde

Updated:
- Series A → "Series A Ausblick (Optional)"
- Investment: 975.000 → 1.000.000 EUR
- Post-Money: 4.975.000 → 5.000.000 EUR
- BAFA INVEST: 20% → 15% Erwerbszuschuss + 25% Exit-Zuschuss

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 18:07:30 +02:00
Benjamin Admin
0b30c5e66c feat(pitch-deck): Bear/Bull scenarios from DB + Assumptions slide reads all 3
Some checks failed
4 new fp_scenarios created on production:
- Wandeldarlehen Bear (5%/8% growth, 1.5x churn → 11 customers 2030)
- Wandeldarlehen Bull (12%/16% growth, 0.7x churn → 999 customers 2030)
- 1 Mio Bear (6%/10% growth, 1.5x churn → 22 customers 2030)
- 1 Mio Bull (12%/18% growth, 0.7x churn → 2574 customers 2030)

AssumptionsSlide loads all 3 scenarios (Bear/Base/Bull) from their
respective fp_* tables. No more scaling factors — real DB data.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 18:04:59 +02:00
Benjamin Admin
824f8a7ff2 fix(pitch-deck): remove duplicate phases from GTM slide
Some checks failed
Phases were duplicated between GTM slide and Strategy slide.
GTM now shows only: ICP (Ideal Customer Profile) + Channel Mix.
Phases remain exclusively on Strategy slide (version-aware).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:57:45 +02:00
Benjamin Admin
5914ec6cd5 feat(pitch-deck): AIPipeline slide numbers from pitch_pipeline_stats DB table
Some checks failed
All KPI numbers on the AI Pipeline slide now load from the
pitch_pipeline_stats table via /api/pipeline-stats:
- Legal sources: 380+ (was hardcoded 75+)
- Unique controls: 25k+ (was 70k+)
- Obligations: 47k+ (from DB)
- EU regulations, DACH laws, frameworks: from DB
- Pipeline steps text: all counts dynamic

Numbers can be updated via SQL without code deploy:
UPDATE pitch_pipeline_stats SET value = X WHERE key = 'legal_sources';

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:55:22 +02:00
Benjamin Admin
30c63bbef6 feat(pitch-deck): Use of Funds computed from fp_* spending data
Some checks failed
Use of Funds pie chart now shows actual spending breakdown from fp_* tables
(months 8-24) instead of manually set percentages:
- Engineering & Personal: from fp_personalkosten
- Vertrieb & Marketing: from fp_betriebliche (marketing category)
- Betrieb & Infrastruktur: from fp_betriebliche (other categories)
- Hardware & Ausstattung: from fp_investitionen

Falls back to funding.use_of_funds if fp_* data not yet loaded.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:50:08 +02:00
Benjamin Admin
7be1a296c6 feat(pitch-deck): NRR + Payback formula-based from fp_* data
Some checks failed
- NRR: Revenue year N / Revenue year N-1 × 100 (no more "target > 120%")
- Payback: CAC / monthly gross profit (no more "target < 3 months")
- Both computed in useFpKPIs hook from fp_guv data
- BusinessModelSlide shows computed values with "(berechnet)" label
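The two formulas can be sketched as follows (function names are illustrative; the real computation lives in the useFpKPIs hook, which is not shown here):

```typescript
// NRR: revenue of year N over year N-1, as a percentage.
function nrr(revenueByYear: number[], yearIdx: number): number {
  return (revenueByYear[yearIdx] / revenueByYear[yearIdx - 1]) * 100;
}

// CAC payback in months: acquisition cost divided by the monthly
// gross profit a customer contributes.
function paybackMonths(cac: number, monthlyGrossProfit: number): number {
  return cac / monthlyGrossProfit;
}
```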

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:47:11 +02:00
Benjamin Admin
e524786ac0 chore: trigger deploy after GuV recompute + isWandeldarlehen fix
All checks were successful
2026-04-19 17:41:03 +02:00
Benjamin Admin
0ee2b1538a fix(pitch-deck): critical — isWandeldarlehen exact match, not includes
All checks were successful
Bug: 1 Mio version instrument "Stammkapital + Wandeldarlehen + Equity"
matched includes('wandeldarlehen'), applying lean logic to 1 Mio version.

Fix: === 'wandeldarlehen' (exact match).
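The one-line fix, sketched (the function and variable names are assumptions):

```typescript
// Before: instrument.toLowerCase().includes("wandeldarlehen") also
// matched "Stammkapital + Wandeldarlehen + Equity", i.e. the 1 Mio
// version. After: exact match only.
function isWandeldarlehen(instrument: string): boolean {
  return instrument.trim().toLowerCase() === "wandeldarlehen";
}
```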

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:11:57 +02:00
Benjamin Admin
dd6e2f8bd7 feat(pitch-deck): MOAT on USP slide + version-aware GTM slide
All checks were successful
USP Slide:
- Added 3 MOAT statements as prominent section:
  1. End-to-End Traceability (Law → Obligation → Control → Code)
  2. Continuous Compliance Engine (every change, real-time evidence)
  3. EU-Trust & Governance Stack (sovereign, GDPR/AI Act native)

GTM Slide (version-aware):
- Wandeldarlehen: Founder sales → Content/SEO → Organic growth
  3 phases (Pilot → Organic → Scaling), no AEs in 2027
- 1 Mio: Direct sales + Channel (unchanged)
  3 phases with 5-20 KMU target, 2 AEs from 2027

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:06:14 +02:00
Benjamin Admin
f66f32ee9d fix(pitch-deck): all financial slides now read from fp_* tables via useFpKPIs
All checks were successful
New shared hook: useFpKPIs — loads annual KPIs from fp_guv/liquiditaet/personal/kunden.
Replaces useFinancialModel (simplified model) for KPI display on all slides.

Slides updated:
- CompetitionSlide: "110 Gesetze" → "380+ Regularien & Normen"
- BusinessModelSlide: ACV + Gross Margin from fp_* (was useFinancialModel)
- ExecutiveSummarySlide: Unternehmensentwicklung from fp_* (was useFinancialModel)
- FinancialsSlide: KPI cards from fp_* (ARR, Customers, Break-Even, EBIT 2030)

All slides now show consistent numbers from the same source of truth (fp_* tables).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 17:00:27 +02:00
Benjamin Admin
de308b7397 fix(pitch-deck): Assumptions slide reads KPIs from fp_* tables + version-aware text
All checks were successful
- Base Case KPIs now loaded from fp_guv/fp_liquiditaet/fp_kunden (source of truth)
- Bear/Bull derived from Base with scaling factors
- Assumptions text conditional: Wandeldarlehen shows lean plan (3→10, 8%/15% growth),
  1 Mio shows original (5→35, aggressive growth)
- Removed dependency on useFinancialModel (simplified model)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 14:37:31 +02:00
Benjamin Admin
8402e57323 fix(pitch-deck): umlauts in RiskSlide + sidebar name for the risks slide
Some checks failed
- Corrected all ä/ö/ü umlauts in RiskSlide.tsx
- slideNames in i18n.ts: added 'Risiken & Mitigation' (DE) + 'Risks & Mitigation' (EN)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 14:34:06 +02:00
Benjamin Admin
1212f6ddfb feat(pitch-deck): version-aware Strategy slide (Wandeldarlehen vs 1 Mio)
All checks were successful
Strategy slide now shows different phases per pitch version:

Wandeldarlehen (lean):
- Phase 1: 3 people, ~60k ARR, prototype → production
- Phase 2: 4-5 people, ~200k ARR, first dev + security hire
- Phase 3: 5-7 people, ~500k-1M ARR, sales + break-even
- Phase 4: 7-10 people, ~2-3M ARR, profitable through organic growth

1 Mio (unchanged):
- Phase 1-4: 5→35 employees, 75k→10M ARR

Risks slide already visible for both versions (in slide order).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 11:49:31 +02:00
Benjamin Admin
ac2299226a feat(pitch-deck): add Risks & Mitigation slide (second-to-last slide)
All checks were successful
New slide with 6 risks and concrete mitigations:
1. AI Commoditization — Layer 2-6 moat, not Layer 1
2. US Platform Expansion — EU-only infrastructure, CLOUD Act barrier
3. Team/Key-Person Risk — Documentation, ESOP, early legal hire
4. Slow Customer Acquisition — Consulting revenue bridge, channel strategy
5. Regulatory Changes — Enlarges market, RAG indexes in days
6. Liquidity Risk — Organic growth, Pre-Seed BW option

Key quote: "We don't compete with AI. We compete with teams that
use AI better than we do."

Presenter script added for the risks slide.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 11:45:51 +02:00
Benjamin Admin
607dab4f26 fix(pitch-deck): KPIs + charts on slide 28 now read from fp_* tables directly
All checks were successful
Previously KPIs/Charts used useFinancialModel (simplified model) which had
different assumptions than the fp_* tables (source of truth).

Now: KPIs tab loads from fp_guv, fp_liquiditaet, fp_personalkosten, fp_kunden
via API. Charts (MRR, EBIT, Headcount) also use fp_* data.

Removed dependency on useFinancialModel and computeAnnualKPIs for this slide.
Added Liquidität (Dez) row to KPIs table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 08:37:46 +02:00
Benjamin Admin
3b8f9b595e fix(pitch-deck): lean cost structure for Wandeldarlehen scenario
All checks were successful
Engine formula adjustments (reduced for lean startup):
- Fortbildung: 500→300, Fahrzeug: 400→200, KFZ-Steuer: 50→25
- KFZ-Versicherung: 500→150, Reise: 100→75, Bewirtung: 200→100
- Serverkosten: 100→50 EUR per customer, base 500→300

Tooltips updated to match new values.

DB (production): All (M) rows reduced to lean levels:
- Raumkosten: 5000→0 (remote, no office)
- Versicherungen: ~1700→800/month (startup rates)
- Verbrauchsmaterial: 500→50, Werkzeuge: 300→100
- Rechts-/Beratung: founding costs only (m8-m10)

Result: liquidity at the end of 2027 ≈ 0 (4,496 EUR), break-even in 2029.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-19 08:25:57 +02:00
Benjamin Admin
84a0280c52 feat(pitch-deck): Gewerbesteuer formula + BG/Marketing/Telefon engine formulas + tooltips
All checks were successful
Engine:
- Gewerbesteuer (F): 12.25% of monthly profit (only when positive)
- Berufsgenossenschaft (F): 2.77% of brutto payroll
- Allgemeine Marketingkosten (F): 10% of revenue
- Internet/Mobilfunk (F): headcount × 50 EUR/month

UI: Tooltip for Gewerbesteuer formula added.

DB changes (production):
- Gewerbesteuer: (M) → (F), auto-calculated
- Rechtsanwalt/Datenschutz: new hire Oct 2026, 7500 EUR brutto
- Beratung & Services: new revenue line (5k→30k/Mon)
- Investitionen: Home Office 2500 EUR per new hire
- Marketing Videos moved to marketing category
- Bank → Bank-/Kreditkartengebühren
- Jahresabschluss costs filled (1000-2000 EUR/year)
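The engine formulas listed above could look roughly like this (the rates come from the message; the names and signatures are illustrative, not the engine's actual code):

```typescript
// Gewerbesteuer: 12.25% of monthly profit, only when positive.
const gewerbesteuer = (monthlyProfit: number): number =>
  monthlyProfit > 0 ? monthlyProfit * 0.1225 : 0;

// Berufsgenossenschaft: 2.77% of gross (brutto) payroll.
const berufsgenossenschaft = (bruttoPayroll: number): number =>
  bruttoPayroll * 0.0277;

// Allgemeine Marketingkosten: 10% of monthly revenue.
const marketingkosten = (monthlyRevenue: number): number =>
  monthlyRevenue * 0.1;

// Internet/Mobilfunk: 50 EUR per employee per month.
const internetMobilfunk = (headcount: number): number => headcount * 50;
```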

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 14:29:38 +02:00
Benjamin Admin
dc36e59d17 feat(pitch-deck): formula engine + tooltips for betriebliche Aufwendungen
All checks were successful
Engine formulas added:
- Berufsgenossenschaft (F): 2.77% of total gross (brutto) payroll (VBG IT rate)
- Internet/Mobilfunk (F): headcount × 50 EUR/month
- Allgemeine Marketingkosten (F): 10% of monthly revenue

UI: Hover tooltips on all (F) and computed rows showing the formula.
SUMME matcher updated for renamed "SUMME Betriebliche Aufwendungen".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 14:15:16 +02:00
Benjamin Admin
9bb689b7e6 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
2026-04-18 13:03:31 +02:00
Benjamin Admin
d01a50a4b1 feat(pitch-deck): formula-based betriebliche rows in Finanzplan engine
Compute engine now auto-calculates these rows from headcount/customers:
- Fort-/Weiterbildungskosten (F): employees (excl. founders) × 500 EUR/month
- Fahrzeugkosten (F): employees (excl. founders) × 400 EUR/month
- KFZ-Steuern (F): employees (excl. founders) × 50 EUR/month
- KFZ-Versicherung (F): employees (excl. founders) × 500 EUR/month
- Reisekosten (F): headcount × 100 EUR/month
- Bewirtungskosten (F): Enterprise customers × 200 EUR/month
- Serverkosten Cloud (F): existing customers × 100 EUR + 500 EUR base

Labels marked (F) for formula, (M) for manual in production DB.
Gesamtkosten matcher updated for renamed "SUMME Betriebliche Aufwendungen".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 13:03:18 +02:00
Sharang Parnerkar
51e75187ed feat(pitch-deck): add force recompute to bypass stale pitch_fm_results cache
All checks were successful
Adds `force: true` body param to POST /api/financial-model/compute that
skips the cached results check and recomputes from assumptions directly.
Exposes this via a "Force Recompute" button on the scenario edit admin page,
so updating assumptions directly in the DB can be followed by a cache bust
without touching the UI assumption flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:10:35 +02:00
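The `force` semantics described above amount to a cache-or-recompute branch; a minimal sketch (names hypothetical, the real handler reads `force` from the POST body and stores results in pitch_fm_results):

```typescript
type FMResult = { computedAt: number };

// force=false: serve cached results if present.
// force=true: skip the cache check and recompute from assumptions.
function getScenarioResults(
  cache: Map<string, FMResult>,
  scenarioId: string,
  computeFromAssumptions: () => FMResult,
  force = false,
): FMResult {
  if (!force) {
    const cached = cache.get(scenarioId);
    if (cached) return cached; // normal path: reuse stored results
  }
  const fresh = computeFromAssumptions(); // cache bust
  cache.set(scenarioId, fresh);
  return fresh;
}
```

This is why editing assumptions directly in the DB needs the "Force Recompute" button: the stored results are otherwise returned as-is.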
Sharang Parnerkar
e37fd3bbe4 fix: remove scenario dropdown from FinanzplanSlide
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 11:04:26 +02:00
Sharang Parnerkar
11fa490599 fix: finanzplan scenario selector — load from API, no hardcoded UUID
Replaces the FM-name-based 'wandeldarlehen' hack with a proper scenario
picker. Scenarios are fetched from /api/finanzplan, default is selected
automatically. Dropdown appears when multiple scenarios exist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:56:52 +02:00
Sharang Parnerkar
27ef21a4f0 feat: git SHA version badge in admin, fix finanzplan caching, drop gitea remote
- AdminShell: shows NEXT_PUBLIC_GIT_SHA in sidebar footer
- Dockerfile + build-pitch-deck.yml: pass --build-arg GIT_SHA at build time
- FinanzplanSlide: fetch with cache:no-store to always show current DB values
- finanzplan routes: Cache-Control: no-store to prevent CDN/proxy staling
- CLAUDE.md: remove dead gitea remote (only origin exists)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:47:51 +02:00
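The two no-store layers from the commit above, expressed as small helpers (illustrative only; the real code sets these inline in FinanzplanSlide and the route handlers):

```typescript
// Client side: opt this fetch out of Next's Data Cache so the slide
// always reflects current DB values.
function clientFetchInit() {
  return { cache: "no-store" as const };
}

// Server side: tell CDNs and reverse proxies not to keep a stale copy
// of the API response.
function apiResponseHeaders() {
  return { "Cache-Control": "no-store" };
}
```

Both are needed: the client flag defeats the framework cache, the response header defeats anything sitting in front of the app.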
Sharang Parnerkar
b3643ddee9 Merge branch 'main' of ssh://coolify.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
2026-04-17 10:39:54 +02:00
Sharang Parnerkar
68b7660ce3 docs: replace all Coolify references with Orca across core repo
CI/CD pipeline migrated from Coolify to Orca.
Updated CLAUDE.md, pre-push-checks, docs-src, and pitch-deck scripts/slides.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 10:39:47 +02:00
Benjamin Admin
2d61911d98 chore: trigger pitch-deck CI + deploy
2026-04-17 10:23:31 +02:00
Benjamin Admin
9f642901ab chore: trigger pitch-deck CI build
2026-04-17 09:53:47 +02:00
Benjamin Admin
add7400b78 chore: retrigger CI for pitch-deck fm_scenarios fix
2026-04-17 09:45:48 +02:00
Benjamin Admin
65cc5200ea chore: trigger coolify rebuild (fm_scenarios fix)
2026-04-17 08:55:11 +02:00
Benjamin Admin
ede93a7774 chore: trigger rebuild after build verification
2026-04-17 08:47:05 +02:00
Benjamin Admin
bc020e9f64 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
2026-04-17 08:36:07 +02:00
Benjamin Admin
bad4659d5b fix(pitch-deck): include fm_scenarios in preview-data API response
The admin preview was not returning fm_scenarios/fm_assumptions,
so preferredScenarioId was always null and all financial slides
fell back to Base Case (1M) instead of the version's scenario.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 08:35:22 +02:00
Sharang Parnerkar
e3b33ef596 docs: add AGENTS.python/go/typescript.md and pre-push check rules
Mandatory pre-push gates for all three language stacks with exact
commands, common pitfalls, and architecture rules. CLAUDE.md updated
with quick-reference section linking to the new files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:35:12 +02:00
Sharang Parnerkar
39255f2c9e fix(pitch-deck): hoist textLang const out of fetch object literal
Syntax error: const declaration was inside the options object.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 08:16:19 +02:00
Benjamin Admin
030991cb9a chore: trigger rebuild 2
2026-04-17 08:15:13 +02:00
Benjamin Admin
fa9b554f50 fix(pitch-deck): TTS letter spelling (CE/SAST/DAST) + Finanzplan slide loads version scenario
TTS:
- CE → "C. E." for letter-by-letter pronunciation
- SAST → "S. A. S. T.", DAST → "D. A. S. T."

Finanzplan Slide 28:
- Data grid now loads Wandeldarlehen fp_scenario when active FM scenario
  contains "wandeldarlehen" (scenarioId=c0000000-...-000000000200)
- Base Case version continues to load default fp_scenario

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 08:10:18 +02:00
Benjamin Admin
788714ecec chore: trigger coolify rebuild
2026-04-17 08:05:09 +02:00
Benjamin Admin
08ca17c876 fix(pitch-deck): presenter script — prototype status, no production claims before Aug 2026
- Traction slide: "funktionsfähig und deployed" → "Prototyp-Stadium, mit Testkunden validiert"
- "bereit für zahlende Kunden" → "Ab August 2026 produktiver Betrieb"
- SDK Demo: "produktive Plattform" → "funktionierender Prototyp, mit Testkunden validiert"
- USP: "produktive Engine" → "leistungsfähige Engine"

Until founding in August 2026, all references must indicate prototype/test status.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:20:21 +02:00
Benjamin Admin
c157e9cbca fix(pitch-deck): TTS language detection, technical FAQ, proper German umlauts + abbreviations
TTS Language Bug:
- ChatFAB: detect response language from text content instead of UI language
- German text with umlauts/ß triggers German TTS even when UI is in English

Presenter Script (German TTS pronunciation):
- Add proper umlauts (ä/ö/ü) throughout German text
- Expand abbreviations for clear pronunciation:
  DSGVO → Datenschutz-Grundverordnung
  SAST → Static Application Security Testing
  DAST → Dynamic Application Security Testing
  SBOM → Software Bill of Materials
  VVT → Verarbeitungsverzeichnis
  TOMs → technisch-organisatorische Maßnahmen
  BSI → Bundesamt für Sicherheit in der Informationstechnik
  KMU → kleine und mittlere Unternehmen, etc.

Technical FAQ (12 new entries):
- BGE-M3, RAG, Qdrant, Cross-Encoder, Hybrid Search
- SAST/DAST, SBOM, BSI, Cloud Providers (SysEleven/Hetzner)
- Controls/Prüfaspekte, Policy Engine, VVT/TOMs/DSFA

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:18:03 +02:00
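The content-based language detection described in the TTS bug fix above can be sketched in a few lines (hypothetical helper name; the real check lives in ChatFAB):

```typescript
// German umlauts and ß in the response text are a strong signal that
// the reply is German, even when the UI language is set to English.
function detectTtsLang(text: string): "de" | "en" {
  return /[äöüÄÖÜß]/.test(text) ? "de" : "en";
}
```

Detecting from the text rather than the UI setting keeps mixed-language sessions from routing German answers to the English voice.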
Benjamin Admin
9005a05bd7 fix(pitch-deck): version-aware financial model + layout fix + COMPLAI spelling
Critical fix: All financial slides now use the version's preferred scenario
instead of always defaulting to Base Case (1M). This ensures the
Wandeldarlehen version shows its own lean financial plan.

- useFinancialModel: add preferredScenarioId parameter
- PitchDeck: extract default scenario from previewData.fm_scenarios
- Pass preferredScenarioId to all 5 financial slides
- FinancialsSlide layout: remove empty right column, full-width charts
- Remove ScenarioSwitcher + unused slider from FinancialsSlide
- Fix COMPLEI → COMPLAI in presenter script (only TTS pronunciation differs)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 01:02:57 +02:00
Benjamin Admin
98081ae5eb fix(pitch-deck): add loading fallback for Unternehmensentwicklung tile
Shows "Lade Finanzplan..." when annualKPIs is empty (data not yet loaded)
instead of rendering nothing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 00:47:34 +02:00
Benjamin Admin
c99e35438c feat(pitch-deck): rewrite presenter script — emotional tone, correct numbers, all slides
- Fix "110 Gesetze" → "380+ Regularien" (all occurrences)
- Fix savings: 30k/20k → 13k/9k matching SavingsSlide KMU (55k total, 3.7x ROI)
- Fix "COMPLAI" → "COMPLEI" (pronunciation: like Ei, not AI)
- Remove "Frankreich/France" references
- Remove hardcoded financial projections (now reference computed data)
- Add missing slide scripts: usp, cap-table, customer-savings, annex-strategy,
  annex-finanzplan, annex-glossary, legal-disclaimer
- More emotional, positive, investor-focused tone throughout
- Fix "38 Verordnungen" → "380+ Regularien" in AI pipeline
- Fix module count: "12" → "65 Compliance-Module"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 00:43:30 +02:00
Benjamin Admin
1241a14ea5 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
2026-04-17 00:27:17 +02:00
Benjamin Admin
0712d18824 fix(pitch-deck): remove assumption sliders from Financials slide
Investors should not be able to modify business case assumptions.
Questions should be directed to founders via the AI chat agent.
Scenario switcher is kept for viewing different scenarios.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 00:27:01 +02:00
Sharang Parnerkar
71040dcd33 revert: remove <en> tag mixed-language approach from presenter scripts
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:37:23 +02:00
Sharang Parnerkar
0923d9b051 fix(presenter): strip <en> tags from displayed subtitle text
Tags are TTS-only markers; display should show plain text.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:32:02 +02:00
Sharang Parnerkar
909301a4de feat(pitch-deck): wrap English words with <en> tags for correct TTS pronunciation
DevSecOps, Onepager, SaaS, deployed, TypeScript, RegTech, OpenAI,
PostgreSQL, NVIDIA, GitLab, Full Compliance GPT, ERPNext — all marked
for English voice synthesis in German presenter script and FAQ.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:24:26 +02:00
Sharang Parnerkar
d548ce4199 fix(pitch-deck): refresh expired JWT from live DB session on cookie read
When jwtVerify fails (JWT expired), decode the token without expiry check
to recover sessionId, validate it against the DB, and reissue a fresh 24h
JWT. Fixes investors with old 1h JWTs being locked out on magic link re-click.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:18:52 +02:00
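The recovery step above (decode without expiry check, then revalidate against the DB) can be sketched without any library; the real code uses jose's `jwtVerify` and an async DB session lookup, shown here as a plain callback:

```typescript
// Decode a JWT payload with NO signature or expiry check. Acceptable
// here only because the recovered sessionId is re-validated against
// the live DB session before anything is reissued.
function decodeJwtPayload(token: string): Record<string, unknown> {
  const json = Buffer.from(token.split(".")[1], "base64url").toString("utf8");
  return JSON.parse(json);
}

function recoverSessionFromExpiredJwt(
  token: string,
  sessionIsLive: (id: string) => boolean, // DB lookup in the real code
): string | null {
  const sessionId = decodeJwtPayload(token).sessionId;
  if (typeof sessionId !== "string" || !sessionIsLive(sessionId)) return null;
  return sessionId; // caller reissues a fresh 24h JWT for this session
}
```

On success the caller mints a new token; on failure the request falls through to the normal unauthenticated path.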
Sharang Parnerkar
0188a46afb fix(pitch-deck): fix TTS pronunciation of 25.000+ in presenter scripts
Replace '25.000+' with 'über 25 Tausend' in DE text so Edge TTS speaks
it correctly instead of 'fünfundzwanzig Punkt null null null plus'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:14:57 +02:00
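The fix above was applied directly in the script text; as a general pre-TTS normalisation it would look like this (hypothetical helper, regex limited to the ".000+" pattern the commit mentions):

```typescript
// "25.000+" read literally becomes "fünfundzwanzig Punkt null null
// null plus"; spell the thousands out for German TTS instead.
function normalizeForGermanTts(text: string): string {
  return text.replace(/(\d{1,3})\.000\+/g, "über $1 Tausend");
}
```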
Sharang Parnerkar
d6be61cdcf fix(pitch-deck): align JWT expiry with session lifetime (24h)
JWT was set to 1h while the session cookie lived 24h. After 1 hour the
cookie persisted but jwtVerify failed, making /api/auth/me return 401
and the re-click redirect fall through to the already-used token error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 22:09:12 +02:00
Sharang Parnerkar
6e6525a416 fix(pitch-deck): pin presenter TTS to Edge TTS (de-DE-ConradNeural)
German permanently routes to compliance TTS service (Edge TTS neural
voice, Piper fallback). OVH DE path removed — no env var can flip it
back accidentally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:44:12 +02:00
Sharang Parnerkar
6a6b3e8cee feat(pitch-deck): make OVH DE TTS opt-in via OVH_TTS_ENABLED_DE env var
Without the flag, German routes to the compliance TTS service which uses
Edge TTS (de-DE-ConradNeural) with Piper as fallback — easier to A/B
between OVH and compliance/Edge TTS without code changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:40:32 +02:00
Sharang Parnerkar
09ac22f692 fix(pitch-deck): revert OVH synthesis rate to 16000 Hz
OVH honours sample_rate_hz and returns data at exactly the requested
rate, so synthesis and WAV header rates must always match. Decoupled
22050/16000 caused 22050 Hz PCM wrapped in a 16000 Hz header → slow
bloated playback. Both back to 16000 Hz.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:37:52 +02:00
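The sample-rate bugs in this and the following commits all reduce to one ratio: a player schedules PCM samples at the rate declared in the WAV header, so a header rate different from the actual PCM rate rescales speed and pitch. Illustrative arithmetic, not the actual audio code:

```typescript
// Perceived playback speed when raw PCM at pcmRateHz is wrapped in a
// WAV header declaring headerRateHz. < 1 means slower and deeper
// (e.g. 22050 Hz PCM in a 16000 Hz header); > 1 means chipmunk-fast.
function playbackSpeedFactor(pcmRateHz: number, headerRateHz: number): number {
  return headerRateHz / pcmRateHz;
}
```

This is the "half speed" case from the 44100 Hz commit below (44100 Hz PCM declared as 22050 Hz), and why the final fix pins synthesis and header to the same 16000 Hz.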
Sharang Parnerkar
5a476ac97d fix(pitch-deck): decouple OVH synthesis rate from WAV header rate
OVH uses sample_rate_hz in the request for internal synthesis quality
but always outputs raw PCM at 16000 Hz. Sending 22050 for synthesis
gives better pronunciation; declaring 16000 in the WAV header gives
correct playback speed. Previously both were the same value, forcing
a tradeoff between quality and speed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:26:54 +02:00
Sharang Parnerkar
4f2a963834 fix(pitch-deck): set OVH TTS sample rate to 16000 Hz (Riva native)
OVH Riva ignores the sample_rate_hz request param and always returns at
its native 16000 Hz. Declaring a higher rate in the WAV header causes
proportionally slower/deeper playback. 16000 Hz matches the actual output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:16:54 +02:00
Sharang Parnerkar
aa7bd79c51 fix(pitch-deck): bump OVH TTS default sample rate to 44100 Hz
22050 Hz declared in WAV header while Riva returns 44100 Hz native PCM
causes playback at half speed — deep, bloated voice. Aligning the
declared rate with the actual output fixes pitch and speed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:11:14 +02:00
Sharang Parnerkar
7701a34d7f feat(pitch-deck): redirect to pitch if valid session on magic link re-click
If an investor clicks the magic link again after already being logged in,
check /api/auth/me first — valid session → redirect to / immediately
instead of showing the 'link already used' error.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 21:08:26 +02:00
Sharang Parnerkar
d35e3f4705 fix(pitch-deck): split email.ts to fix client bundle including nodemailer
Client component (investors/new page) imported DEFAULT_MESSAGE etc. from
lib/email.ts which also top-level initialises nodemailer — webpack tried
to bundle fs/net/dns into the client chunk and failed.

Extract the pure constants + getDefaultGreeting into lib/email-templates.ts
(client-safe), keep nodemailer in lib/email.ts (server-only), update the
page to import from email-templates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:56:36 +02:00
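The split above follows a standard pattern: anything a client component imports must come from a module with no server-only top-level side effects. A sketch of the client-safe half (file names from the commit; the exported values are illustrative, not the real template text):

```typescript
// lib/email-templates.ts — pure constants and functions, safe to
// import from client components (no nodemailer, no Node built-ins).
export const DEFAULT_CLOSING = "Mit freundlichen Grüßen"; // illustrative
export function getDefaultGreeting(name: string): string {
  return `Guten Tag ${name},`; // illustrative greeting text
}
// lib/email.ts stays server-only: it initialises nodemailer at top
// level, which is exactly why client code must not import from it.
```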
Sharang Parnerkar
5d71a371d6 fix(pitch-deck): resolve Docker build failures — nodemailer webpack + jose Edge Runtime
- Add nodemailer to serverExternalPackages so webpack doesn't try to
  bundle fs/net/dns built-ins (was fatal build error)
- Import jwtVerify from jose/jwt/verify instead of the full jose index
  to avoid pulling in JWE deflate code incompatible with Edge Runtime

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 20:31:34 +02:00
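The nodemailer half of the fix above is a one-line config entry; a next.config.js sketch (option name as in current Next.js releases, where it replaced `experimental.serverComponentsExternalPackages`):

```javascript
// next.config.js — keep nodemailer as a runtime require on the server
// instead of letting webpack try to bundle its fs/net/dns built-ins.
module.exports = {
  serverExternalPackages: ["nodemailer"],
};
```

The jose half is just an import path change in the middleware, pulling in the verify function without the JWE code the Edge Runtime cannot run.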
Benjamin Admin
f75aef2a4a chore: trigger rebuild
2026-04-16 16:13:14 +02:00
Benjamin Admin
5264528940 style(pitch-deck): highlight Professional tier with silver border on BusinessModel slide
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 15:36:02 +02:00
Benjamin Admin
084183f3a4 fix(pitch-deck): sync Executive Summary + BusinessModel with compute engine
ExecutiveSummarySlide:
- Unternehmensentwicklung: hardcoded table → useFinancialModel + computeAnnualKPIs
  (MA, Kunden, ARR now computed from finanzplan DB for all versions)
- Pricing: aligned with BusinessModelSlide tiers (Starter/Professional/Enterprise)
  Enterprise: 40k → 50k (matching Folie 11)

BusinessModelSlide:
- ACV: hardcoded "15–50k" → computed from summary.final_arr / final_customers
- Gross Margin: hardcoded "> 80%" → computed from lastResult.gross_margin_pct

All financial numbers on all slides now flow from the same compute engine.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 09:02:07 +02:00
Benjamin Admin
e05d3e1554 fix(pitch-deck): sync Executive Summary savings with SavingsSlide (Folie 18) KMU data
The Kundenersparnis tile now matches the KMU tier from SavingsSlide:
- Pentests: 30k → 13k (actual savings vs without BreakPilot)
- CE-Beurt. 20k → CE-Risiko 9k
- Audit Mgr. 60k+ → Compliance-Zeit 15k + Audit-Vorb. 9k
- Total: 50-110k → 55k/Jahr (KMU, 3.7x ROI)
- HTML embed: "50.000+ EUR/Jahr" → "55.000 EUR/Jahr (3,7x ROI)"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:57:45 +02:00
Benjamin Admin
06f868abeb fix(pitch-deck): replace all hardcoded financial numbers with computed values
All financial data now flows from the same compute engine (useFinancialModel).
No more hardcoded numbers in any slide — all values are derived from the
finanzplan database, ensuring consistency across all pitch deck versions.

- FinanzplanSlide: KPI table + charts now use computeAnnualKPIs() from FMResult[]
- BusinessModelSlide: bottom-up calc (customers × ACV = ARR) from compute engine
- AssumptionsSlide: Base case from compute, Bear/Bull scaled from Base
- New helper: lib/finanzplan/annual-kpis.ts for 60-month → 5-year aggregation
- PitchDeck: passes investorId to all financial slides for version-aware data

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:48:37 +02:00
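The 60-month to 5-year aggregation helper mentioned above can be sketched like this (field names hypothetical; the convention shown — sum flows, take year-end stocks — is the usual one for this kind of rollup):

```typescript
type MonthResult = { revenue: number; customers: number };
type YearKPI = { year: number; revenue: number; endCustomers: number };

// Roll 12-month slices of monthly compute results up into annual KPIs.
function computeAnnualKPIs(months: MonthResult[]): YearKPI[] {
  const years: YearKPI[] = [];
  for (let y = 0; y < months.length / 12; y++) {
    const slice = months.slice(y * 12, y * 12 + 12);
    years.push({
      year: y + 1,
      revenue: slice.reduce((s, m) => s + m.revenue, 0), // flows: sum over the year
      endCustomers: slice[slice.length - 1].customers,   // stocks: December value
    });
  }
  return years;
}
```

With 60 monthly results this yields the 5 rows the KPI table and charts consume.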
Benjamin Admin
aed428312f feat(pitch-deck): bilingual email template + invite page with live preview
Email template (email.ts):
- Bilingual: German body + DE/EN legal footer
- Customizable greeting, message body, and closing
- Magic Link explanation box (hardcoded)
- Confidentiality & Disclaimer footer (hardcoded, bilingual)

Invite page (investors/new):
- Name is now required, Company is optional
- Editable fields: greeting, message, closing (with defaults)
- Live email preview panel (right side)
- Shows full email content before sending
- German UI labels

API (invite/route.ts):
- Passes greeting, message, closing to email function

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:34:23 +02:00
Benjamin Admin
32851ca9fb feat(pitch-deck): add confidentiality & disclaimer to magic link email
Adds legal footer to the investor invite email with:
- Confidentiality obligation (3 years, purpose limitation)
- Disclaimer (not an offer, projections only, risk of total loss)
- Jurisdiction: Konstanz, German law

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:23:05 +02:00
Benjamin Admin
cbee0b534f feat(pitch-deck): TheAsk — 40k/160k/200k tiles, BAFA+L-Bank hint, FAQ, skip CapTable for Wandeldarlehen
Some checks failed
Build pitch-deck / build-push-deploy (push) Successful in 1m8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Has been cancelled
- Funding tiles: 40k investor (from 20%) + 160k L-Bank = 200k, optional 400k row
- Remove Cap Table "Beispielrechnung" from TheAsk slide
- BAFA INVEST title: add hint that L-Bank+BAFA combination must be verified
- Skip CapTable slide entirely for Wandeldarlehen versions (useEffect auto-advance)
- FAQ: add Wandeldarlehen/Pre-Seed BW entry + BAFA+Pre-Seed compatibility entry
- FAQ: fix outdated BAFA INVEST percentage (20% → 15%) in investment-captable entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 08:20:29 +02:00
Benjamin Admin
8f44d907a5 feat(pitch-deck): TheAsk slide — Wandeldarlehen version with Pre-Seed BW, Cap Table
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 36s
- Conditional sections only shown when instrument is "Wandeldarlehen" (convertible loan)
- 200k investor ask + 200k L-Bank = 400k total funding visualization
- 3-step explanation: Investment → Conversion → Investor advantage
- Pre-Seed BW / L-Bank co-financing info box
- Cap Table before/after conversion example
- Use of Funds EUR amounts based on 400k total budget
- "1 Mio." version remains completely unaffected

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 07:36:44 +02:00
Benjamin Admin
24ce8ccd20 fix(pitch-deck): TheAsk slide — fix client-side crash
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 36s
- Replace emoji with Landmark icon
- Add JSON.parse fallback for use_of_funds
- Guard pieData labels and amounts
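
The JSON.parse fallback can be sketched as a small helper (the name `parseUseOfFunds` and the value shape are assumptions, not the actual component code):

```typescript
// Hypothetical sketch of the use_of_funds fallback: the field may arrive
// as a JSON string (from the DB) or as an already-parsed object.
function parseUseOfFunds(raw: unknown): Record<string, number> {
  if (raw == null) return {};
  if (typeof raw !== "string") return raw as Record<string, number>;
  try {
    return JSON.parse(raw);
  } catch {
    // Malformed JSON: render the slide with an empty breakdown
    // instead of crashing client-side.
    return {};
  }
}
```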

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:26:16 +02:00
Benjamin Admin
786993d8ca feat(pitch-deck): add BAFA INVEST program info to The Ask slide
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 35s
- 15% tax-free acquisition grant (corrected from 25%)
- 25% exit grant on capital gains
- Up to 40% effective support (entry + exit combined)
- Program extended until 31.12.2026
- Disclaimer to verify current terms at bafa.de

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 00:05:42 +02:00
Benjamin Admin
2b9788bdb0 feat(pitch-deck): add day/night mode toggle to sidebar
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m7s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 40s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 34s
- Theme toggle button below language toggle
- Uses existing theme-light CSS class from globals.css
- Moon/Sun icons with Nacht/Tag labels (DE) or Dark/Light (EN)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:58:48 +02:00
Benjamin Admin
91b5ce990f fix(pitch-deck): remove Kernmarkt label, pricing from product, bigger disclaimer
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m6s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 31s
- BusinessModel: remove "Kernmarkt" text, stronger highlight (shadow+border)
- Product: remove Pricing tile, split Deployment into 2 side-by-side
  cards (Cloud + Privacy Hardware), larger text
- Executive Summary: disclaimer font size increased (9px→11px, 10px→12px)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:50:49 +02:00
Benjamin Admin
936b4ccc51 fix(pitch-deck): glossary — align abbreviations with descriptions (items-baseline)
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 32s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:38:00 +02:00
Benjamin Admin
9e3f15ce4e fix(pitch-deck): increase font sizes on slides 8, 11, 18, 25, 27
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
- All text-[10px] → text-xs (12px)
- All text-[9px] → text-[11px]
- All text-[8px] → text-[10px]
- Affected: BusinessModel, Product, Savings, Strategy slides
- Engineering: revert LoC to 481K (compliance SDK only, not all repos)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:34:59 +02:00
Benjamin Admin
7523f47468 fix(pitch-deck): engineering slide — sync numbers with real data
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 30s
- 481K LoC → 960K+ (actual count across 3 repos)
- 10 Services → 320 Dokumente im RAG (aligned with Slide 7)
- 48+ SDK-Module → 70K+ Compliance Controls (from DB)
- 5 Infra → 12 Produkt-Module (aligned with Slide 8)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:26:26 +02:00
Benjamin Admin
6de8b33dd1 fix(pitch-deck): regulatory slide — white headers for requirements + how we help
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:23:07 +02:00
Benjamin Admin
79c01c85fa fix(pitch-deck): realistic savings — credible ROI for investors
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 29s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 33s
- Ext. DSB: 6k→3k (halved, not eliminated)
- Compliance docs: 0→2k (reduced effort, not zero)
- Personnel: "~2/8/40 MA savings" → "50% more productive compliance time"
  (realistic productivity gain, not full headcount elimination)
- ROIs now credible: KMU 3.7x, Mittelstand 6.4x, Konzern 15.6x
  (was 11x/21x/62x — too aggressive for investors)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:17:43 +02:00
Benjamin Admin
735cab2018 fix(pitch-deck): add pulse animation to MarketSlide inactive tabs
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 33s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:10:22 +02:00
Benjamin Admin
b4e8b74afb fix(pitch-deck): center KPI card labels and values
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 23:05:46 +02:00
Benjamin Admin
4b06933576 fix(pitch-deck): sync Executive Summary modules with Slide 8
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 36s
- Cookie-Generator → Tender Matching (RFQ against the codebase)
- Integration → AI Act Compliance (UCCA, Betriebsrat / works council)
- Text: "Integration in Kundenprozesse" (integration into customer processes) → AI Act + Tender Matching

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:59:41 +02:00
Benjamin Admin
89a6b90ca6 fix(pitch-deck): remaining umlauts + COMPLAI consistency
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m18s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 38s
- 17 more umlaut fixes (Konformitätsbewertung, Löschkonzept,
  Portabilität, Regelmäßige, etc.) across 6 files
- ComplAI → COMPLAI in all string contexts for consistency
- BrandName component used for JSX rendering (gradient AI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:54:09 +02:00
Benjamin Admin
f9b9cf0383 feat(pitch-deck): business model redesign + umlauts fix + tab animations
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 37s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 36s
Business Model slide completely rewritten:
- Left: 3 pricing tiers (Starter/Professional/Enterprise)
- Right: Unit Economics (ACV, Gross Margin, NRR, Payback)
- Bottom-up sizing: 1,200 customers × 8,400 ACV = 10M ARR
- Land & Expand arrow visualization

Umlauts: 75+ ae/oe/ue → ä/ö/ü replacements across 10 slide files
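
The bottom-up sizing in the bullet above checks out arithmetically:

```typescript
// Bottom-up market sizing from the slide: customers × ACV ≈ ARR.
const customers = 1_200;
const acvEur = 8_400;
const arrEur = customers * acvEur; // 10_080_000, i.e. ~10M ARR
```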

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:42:06 +02:00
Benjamin Admin
2de4d03d81 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m9s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 36s
2026-04-15 22:26:57 +02:00
Benjamin Admin
d2c2fd92cc feat(pitch-deck): tab pulse animation + BrandName in regulatory/competition
- Inactive tabs pulse gently (animate-[pulse_3s]) on:
  Competition, AIPipeline, Financials, Regulatory slides
- RegulatorySlide: "Wie ComplAI hilft" → BrandName component
- CompetitionSlide: "ComplAI" label → BrandName component

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 22:26:05 +02:00
Sharang Parnerkar
032df7f401 fix(pitch-deck): coerce pg NUMERIC to Number globally — fixes Finanzen crash
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 39s
CI / test-python-voice (push) Successful in 38s
CI / test-bqas (push) Successful in 29s
The Finanzen slide crashed on mount with "a.toFixed is not a function".
Traced to UnitEconomicsCards.tsx:59 calling ltvCacRatio.toFixed(1),
where ltvCacRatio arrives as a string.

Root cause: the cached path in /api/financial-model/compute returns raw
rows from pitch_fm_results. node-postgres returns NUMERIC / DECIMAL
columns as strings by default, so lastResult.ltv_cac_ratio (and every
other *_eur / *_pct / *_ratio field) flows through the app as a string.
Arithmetic-heavy code paths survived thanks to implicit string coercion
(`-`, `/`, `*`), but direct method calls like .toFixed() don't coerce,
which is why Unit Economics was the visible crash site.

Fix at the boundary: register a single types.setTypeParser(NUMERIC, …)
on the pg Pool so every query returns real numbers. All our NUMERIC
values fit well inside Number.MAX_SAFE_INTEGER, so parseFloat is safe.

One-line change, no component-level defensive coercions needed.
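
A minimal sketch of the boundary fix, assuming node-postgres; the parser is shown standalone here so the effect on `.toFixed` is visible:

```typescript
// In the real fix this function is registered once at the driver layer:
//   types.setTypeParser(1700, parseNumeric)  // 1700 = Postgres NUMERIC OID
// Safe because all stored values fit inside Number.MAX_SAFE_INTEGER.
const parseNumeric = (value: string): number => parseFloat(value);

// Before the fix, pg hands back "3.2" (a string) and .toFixed throws;
// after parsing, direct method calls work again.
const ltvCacRatio = parseNumeric("3.2");
const label = ltvCacRatio.toFixed(1); // "3.2"
```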

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 22:19:57 +02:00
Benjamin Admin
474f09ce88 fix(pitch-deck): USP compliance text position + regulatory KPI labels
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 39s
CI / test-bqas (push) Successful in 38s
- USP: Compliance block shifted right (left-7 → left-12)
- Regulatory: KPI labels more descriptive:
  Horizontal → "Gelten für alle Branchen" (apply to all industries)
  Sektorspezifisch → "Branchenspezifische Gesetze" (industry-specific laws)
  Industriesektoren → "Abgedeckte Branchen" (covered industries)
  Dokumente column → "Gesetze gesamt" (total laws)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:21:39 +02:00
Benjamin Admin
e920dd1b3f feat(pitch-deck): savings slide — aggressive personnel savings, fix terminology
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 38s
- Apps → Anwendungen/Softwareapplikationen
- Remove "Shift-Left" — replaced with KI-Automatisierung Personalersparnis (AI-driven personnel savings)
- KMU (25 employees): ~2 FTE saved = €120k/year → ROI 11x (was 3.5x)
- Mittelstand (100 employees): ~8 FTE saved = €480k → ROI 21x (was 7.5x)
- Konzern (500+ employees): ~40 FTE saved = €2.4M → ROI 62x (was 20.8x)
- Linear scaling of personnel savings across tiers
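
The linear scaling across tiers can be sanity-checked; the €60k-per-FTE figure below is implied by the numbers, not stated in the commit:

```typescript
// Each tier's claimed savings = FTE count × ~60k EUR (implied by 2 FTE = 120k).
const costPerFteEur = 60_000;
const tiers = [
  { fte: 2, claimedEur: 120_000 },    // KMU
  { fte: 8, claimedEur: 480_000 },    // Mittelstand
  { fte: 40, claimedEur: 2_400_000 }, // Konzern
];
const allLinear = tiers.every((t) => t.fte * costPerFteEur === t.claimedEur);
// allLinear === true
```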

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:15:01 +02:00
Benjamin Admin
5ddf8bbc3c fix(pitch-deck): architecture + GTM corrections
Some checks failed
Build pitch-deck / build-push-deploy (push) Successful in 1m17s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Has been cancelled
Architecture:
- "Daten verlassen nie das Unternehmen" → "nie BSI-zertifizierte Server in DE"
- "Keine Cloud-Abhängigkeit" → "100% EU-Cloud · Keine US-Anbieter"
- Mac Mini/Studio: remove GB/model specs, mark as (geplant, optional)

GTM:
- Phase 1 focus: Maschinenbau, Automotive, Elektro (was Healthcare, Finance)
- ICP: Produzierende Industrie (was Regulierte Branche)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:11:46 +02:00
Benjamin Admin
14cde7b3ee feat(pitch-deck): disclaimer 2 founders, glossary +12 terms, SDK demo + strategy fixes
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m16s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 38s
CI / test-bqas (push) Successful in 38s
Disclaimer: 2 founders (Bodman + Engen), all singular→plural
Glossary: +FISA 702, Cloud Act, BDSG, BSI, RAG, LLM, UCCA, FRIA,
  SDK, OWASP, NIST, ENISA, CE, RFQ (new Technology category)
SDK Demo: Müller Maschinenbau → Muster Maschinenbau (example customer)
Strategy: CANCOM/Bechtle disclaimer (planned, not yet contacted)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:03:48 +02:00
Benjamin Admin
581162cdb8 fix(pitch-deck): footer readability + finanzplan import endpoint
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m9s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 38s
- Regulatory landscape footer: text-xs text-white/50 (was text-[9px] text-white/20)
- New POST /api/admin/import-fp endpoint to import fp_* data from JSON dump

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:43:08 +02:00
Benjamin Admin
dc27fc5500 feat(pitch-deck): regulatory landscape based on real rag-documents.json
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m10s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 37s
Complete rewrite of Slide 7:
- 10 real VDMA/VDA/BDI industry sectors (was 11 mixed categories)
- 7 key EU regulations as columns (DSGVO, AI Act, NIS2, CRA,
  Maschinenverordnung, Data Act, Batterieverordnung)
- Actual document counts per industry (244 horizontal + sector-specific)
- Last column: total applicable documents (not regulation count)
- KPIs: 320 docs, 244 horizontal, 65 sector-specific, 10 sectors
- Footer explains horizontal vs sector-specific logic
- Subtitle: 320 Dokumente im RAG — 10 Industriesektoren

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:21:07 +02:00
Benjamin Admin
51649c874b Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m10s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 33s
2026-04-15 19:07:27 +02:00
Benjamin Admin
4d7836540a feat(pitch-deck): add admin migration endpoint for finanzplan tables
POST /api/admin/migrate creates all fp_* tables on production DB.
Admin-only, creates tables with IF NOT EXISTS for safe re-runs.
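
The idempotent-migration pattern the endpoint relies on might look like this (table and columns are illustrative, not the real fp_* schema):

```typescript
// CREATE TABLE IF NOT EXISTS makes the migration safe to re-run:
// a second POST /api/admin/migrate is a no-op for existing tables.
const createFpRows = `
  CREATE TABLE IF NOT EXISTS fp_rows (
    id         SERIAL PRIMARY KEY,
    label      TEXT NOT NULL,
    is_sum_row BOOLEAN NOT NULL DEFAULT FALSE
  );
`;
```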

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 19:06:32 +02:00
Sharang Parnerkar
3419e18d7f feat(pitch-deck): add Sharang Parnerkar photo
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 34s
GitHub avatar (github.com/mighty840) saved as /team/sharang-parnerkar.jpg.
Team-data JSON for both draft versions (Wandeldarlehen and The Ask 1 Mio)
was updated out-of-band via the admin API:

- Bio lengthened (~640 chars DE/EN) to match Benjamin's depth — now
  covers the ETO tenure (3→60 org scale), ETOPay, ViviSwap/MiCA,
  enterprise AI on AWS/Azure/SysEleven, embedded Rust work, and the
  ferrite-sdk open-source project.
- photo_url switched from empty to /team/sharang-parnerkar.jpg.
- Expertise tags unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:50:04 +02:00
Benjamin Admin
a9b71b9d23 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m7s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 34s
2026-04-15 18:45:08 +02:00
Benjamin Admin
e8a18c0025 perf(pitch-deck): fix slow financial slides — cached results + batch insert
- Compute endpoint now returns cached results if available (single SELECT
  instead of DELETE + 60 INSERTs)
- When recompute is needed, batch all 60 rows into a single INSERT
- Reduces DB calls from 61 to 2 (cached) or 3 (recompute)
- Fixes timeout/blank financial slides for investors
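
The 60-row batch can be built as one parameterized INSERT; a sketch (the column list is illustrative, not the real pitch_fm_results schema):

```typescript
// Turn N uniform rows into a single INSERT with $1..$n placeholders,
// replacing N separate round-trips with one.
function buildBatchInsert(rows: number[][]): { text: string; values: number[] } {
  const values: number[] = [];
  const tuples = rows.map((row, i) => {
    const base = i * row.length; // assumes uniform row width
    values.push(...row);
    return `(${row.map((_, j) => `$${base + j + 1}`).join(", ")})`;
  });
  return {
    text: `INSERT INTO pitch_fm_results (month, revenue_eur, costs_eur) VALUES ${tuples.join(", ")}`,
    values,
  };
}
```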

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:43:05 +02:00
Sharang Parnerkar
3e9a988aaf perf(pitch-deck): smooth SDK demo carousel — no blank frames, parallel preload
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 31s
The SDK Live Demo was janky: AnimatePresence mode="wait" unmounted the
current Image before mounting the next, so every advance forced a cold
fetch and left an empty black frame until the new image decoded. Only
the first three screenshots had priority; the rest fetched lazily, so
the first pass through the carousel repeatedly stalled.

Replaces the single swap-in/swap-out Image with a stack of 23 images
layered in an aspect-[1920/1080] container. Cross-fades are now pure
CSS opacity on always-mounted nodes, so there is no unmount and no gap.

Key details:
- priority on the first 3 (triggers <link rel="preload">); loading=eager
  on the remaining 20 so the browser starts all fetches at mount rather
  than deferring via IntersectionObserver.
- sizes="(max-width: 1024px) 100vw, 1024px" lets next/image serve the
  actual displayed resolution instead of the 1920 hint — fewer bytes,
  faster first paint.
- Load-gated reveal: a new `shown` state trails `current` until the
  target image fires onLoadingComplete. If the user clicks ahead of
  the network, the previous loaded screenshot stays visible — no more
  black flashes before images arrive.

Second pass through the carousel is instant (images are in-cache).
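
The load-gated reveal reduces to a small piece of state logic; a sketch under assumed names (`nextShown` is illustrative):

```typescript
// `shown` trails `current`: the stack only cross-fades to the target
// frame once that image has finished loading, so the previously loaded
// screenshot stays visible instead of a blank gap.
function nextShown(current: number, shown: number, loaded: Set<number>): number {
  return loaded.has(current) ? current : shown;
}

// User clicks ahead of the network: frame 5 requested, only 0-4 loaded.
const loadedFrames = new Set([0, 1, 2, 3, 4]);
let shownFrame = nextShown(5, 4, loadedFrames); // stays 4 (no black flash)
loadedFrames.add(5); // onLoadingComplete fires
shownFrame = nextShown(5, shownFrame, loadedFrames); // now 5
```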

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:35:55 +02:00
Sharang Parnerkar
01f05e4399 feat(pitch-deck): route DE presenter TTS through OVH via LiteLLM passthrough
Adds an OVH-backed branch to /api/presenter/tts so the German presenter
narration is synthesized by OVH AI Endpoints' nvr-tts-de-de (NVIDIA Riva)
reached through the LiteLLM passthrough at /tts-ovh/audio/*, which
injects the OVH API token server-side.

- DE requests now hit ${LITELLM_URL}/tts-ovh/audio/v1/tts/text_to_audio
  with the documented body shape (encoding=1, language_code=de-DE,
  voice_name=German-DE-Male-1, sample_rate_hz=22050) and return the
  audio/wav bytes upstream serves (confirmed RIFF-framed in a smoke test).
- EN continues to hit compliance-tts-service until OVH_TTS_URL_EN is set,
  making the eventual EN switch a single env flip.
- OVH and voice/url/sample-rate parameters are env-overridable
  (OVH_TTS_URL_DE, OVH_TTS_VOICE_DE, OVH_TTS_SAMPLE_RATE,
  OVH_TTS_URL_EN, OVH_TTS_VOICE_EN) so retuning doesn't need a redeploy.
- Defensive: OVH failures surface as 502 (no silent fallback) so upstream
  issues are visible during this test rollout.
- wrapPcmAsWav() helper is kept as a safety net in case OVH ever returns
  bare PCM instead of a full WAV.

Adds X-TTS-Source response header (ovh | compliance) to make
provenance observable from DevTools.
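
The language-based routing can be sketched as a pure function over the env (env var names and body fields are taken from the commit; how the route assembles the request is an assumption):

```typescript
type TtsTarget = { url: string; source: "ovh" | "compliance"; body?: object };

// DE goes to the OVH passthrough (env-overridable); EN stays on the
// existing compliance-tts-service until OVH_TTS_URL_EN is set.
function ttsTarget(lang: "de" | "en", env: Record<string, string | undefined>): TtsTarget {
  if (lang === "de") {
    return {
      url: env.OVH_TTS_URL_DE ?? `${env.LITELLM_URL}/tts-ovh/audio/v1/tts/text_to_audio`,
      source: "ovh",
      body: {
        encoding: 1,
        language_code: "de-DE",
        voice_name: env.OVH_TTS_VOICE_DE ?? "German-DE-Male-1",
        sample_rate_hz: Number(env.OVH_TTS_SAMPLE_RATE ?? 22050),
      },
    };
  }
  return env.OVH_TTS_URL_EN
    ? { url: env.OVH_TTS_URL_EN, source: "ovh" }
    : { url: "compliance-tts-service", source: "compliance" };
}
```

Making the target a single env flip (set OVH_TTS_URL_EN and EN routes to OVH too) matches the rollout plan in the commit.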

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:35:55 +02:00
Benjamin Admin
7c17e484c1 fix(pitch-deck): add /team to public paths for team photo access
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 33s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:23:52 +02:00
Benjamin Admin
ea39418738 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
2026-04-15 18:21:15 +02:00
Benjamin Admin
7f88ed0ed2 feat(pitch-deck): add Benjamin Boenisch photo + update team data
- Photo extracted from CV and placed in public/team/
- Team data updated via MCP (both versions):
  - Bio: 18+ years industry/strategy, SVP at ETO GRUPPE,
    60 employees, M&A, 11 patents, VDMA/CyberLAGO memberships
  - Role: CEO & Gründer (was CEO & Co-Founder)
  - Expertise tags: Strategie & M&A, DSGVO/AI Act/CRA,
    IoT & Embedded, Web3 & Blockchain, 11 Patente
  - photo_url set to /team/benjamin-boenisch.png

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:20:23 +02:00
Sharang Parnerkar
44659a9dd7 fix(pitch-deck): serve /screenshots/* past the auth middleware
Some checks failed
Build pitch-deck / build-push-deploy (push) Has been cancelled
CI / go-lint (push) Has been cancelled
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
The SDK Live Demo slide renders screenshots via next/image from
/public/screenshots/*.png. Because /screenshots was not on the
PUBLIC_PATHS list, every request was 307-redirected to /auth, and the
next/image optimizer responded with
  HTTP 400 "The requested resource isn't a valid image."
leaving the slide with empty dark frames (surfaced in the pitch preview).

next/image also bypasses middleware itself (see the matcher), but the
server-side fetch it performs for the source URL does hit middleware
and carries no investor cookie, so whitelisting the path is required
even for authenticated viewers.

These PNGs are public marketing assets — there's no reason to gate them.
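
The whitelist check might look like this (the PUBLIC_PATHS contents beyond /screenshots are assumed):

```typescript
// Paths served without an investor session. /screenshots must be listed
// because next/image's server-side source fetch carries no cookie.
const PUBLIC_PATHS = ["/auth", "/screenshots", "/team"];

function isPublicPath(pathname: string): boolean {
  return PUBLIC_PATHS.some((p) => pathname === p || pathname.startsWith(`${p}/`));
}
```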

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:20:16 +02:00
Sharang Parnerkar
87d7da0198 fix(pitch-deck): point SDK demo URL mockup at admin-dev.breakpilot.ai
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m9s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 30s
The SDK live-demo slide renders a fake browser URL bar to frame each
screenshot. It used admin.breakpilot.ai, but the demo instance that
investors can actually reach lives on admin-dev.breakpilot.ai.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 18:09:04 +02:00
Benjamin Admin
9675c1f896 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 31s
2026-04-15 18:00:45 +02:00
Benjamin Admin
9736476a0c feat(pitch-deck): legal disclaimer slide + projection footer on financial slides
New DisclaimerSlide (last slide):
- Full liability disclaimer (German/English)
- Confidentiality clause (purpose limitation, 3yr duration, Konstanz jurisdiction)
- Status as private individual in founding preparation

ProjectionFooter component on 4 financial slides:
- FinancialsSlide, TheAskSlide, FinanzplanSlide, CapTableSlide
- "Alle Finanzdaten sind Planzahlen" disclaimer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 18:00:08 +02:00
Sharang Parnerkar
03d420c984 feat(pitch-deck): self-service magic-link reissue on /auth
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m5s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 31s
CI / test-bqas (push) Successful in 31s
Investors who lost their session or whose invite token was already used
can now enter their email on /auth to receive a fresh access link,
without needing a manual re-invite from an admin.

- New /api/auth/request-link endpoint looks up the investor by email,
  issues a new pitch_magic_links row, and emails the link via the
  existing sendMagicLinkEmail path. Response is generic regardless of
  whether the email exists (enumeration resistance) and silently no-ops
  for revoked investors.
- Rate-limited both per-IP (authVerify preset) and per-email (magicLink
  preset, 3/hour — same ceiling as admin-invite/resend).
- /auth page now renders an email form; submits to the new endpoint and
  shows a generic "if invited, link sent" confirmation.
- Route-level tests cover validation, normalization, unknown email,
  revoked investor, and both rate-limit paths.
- End-to-end regression test wires request-link + verify against an
  in-memory fake DB and asserts the full flow: original invite used →
  replay rejected → email submission → fresh link → verify succeeds.
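
The enumeration-resistance property is simply that every lookup outcome maps to one response; sketched with illustrative names:

```typescript
type LookupOutcome = "unknown" | "revoked" | "active";

// Only "active" triggers side effects (new magic link + email), but the
// HTTP response is identical for every outcome, so a caller cannot
// probe which email addresses have been invited.
function requestLinkResponse(_outcome: LookupOutcome): { status: number; message: string } {
  return {
    status: 200,
    message: "If this email has been invited, a fresh access link is on its way.",
  };
}
```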

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-15 17:06:12 +02:00
Benjamin Admin
6b52719079 feat(pitch-deck): rename Traction → Meilensteine, update milestones data
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m10s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 36s
- i18n: Traction & Meilensteine → Meilensteine / Milestones
- slideNames updated (DE + EN)
- Chat display name updated
- Milestones data replaced via MCP (both versions):
  13 milestones chronologically: domains, DPMA, IHK, prototype,
  pilot customers, RAG pipeline, EUIPO, L-Bank, Gründerzuschuss,
  GmbH founding, onboarding, App Store, distribution
- Metrics updated: 385 docs, 25k controls, 12 modules, etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:20:47 +02:00
Benjamin Admin
a5b7d62969 fix(pitch-deck): USP cards wider (290px), circle larger (440px), more height
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 26s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:49:28 +02:00
Benjamin Admin
ef9e3699b2 fix(pitch-deck): USP cards overlap — increase container height to 520px
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 27s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 15:06:33 +02:00
Benjamin Admin
440367b69d feat(pitch-deck): USP font sizes match Solution slide, product modules updated
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 30s
USP slide:
- Title/subtitle same as Solution (text-4xl/text-lg)
- Card titles: text-base font-bold (was text-xs)
- Card descriptions: text-sm text-white/50 (was text-[10px])
- Circle text: text-sm (was text-[11px]/text-[9px])
- Cards 240px wide with GlassCard wrapper

Product slide:
- "Integration in Kundenprozesse" → "AI Act Compliance" (UCCA, Betriebsrat)
- "Cookie-Generator" → "Tender Matching" (RFQ against codebase)
- Remove "FR" badge from deployment options

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:45:14 +02:00
Benjamin Admin
801a5a43f5 feat(pitch-deck): USP slide — larger circle, title back, infinity hub
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m6s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 34s
- USP as slide title (GradientText) above
- Circle doubled to 380px with spinning ring
- Infinity symbol (∞) in center hub instead of text
- Compliance left, Code right inside circle — larger font
- 4 cards in corners (220px wide, larger text, ~5 lines each)
- Cards spread to corners (top/bottom, left/right)
- Dashed SVG lines connecting circle to cards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:33:28 +02:00
Benjamin Admin
9c23068a4f feat(pitch-deck): USP slide — large circle with cards on sides
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m7s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 31s
- Large spinning circle (320px) with USP hub in center
- Compliance items left, Code items right inside circle
- 4 arrows pointing outward to capability cards
- 2 cards left (RFQ, Bidirectional), 2 cards right (Process, Continuous)
- Longer descriptions (~5 lines per card)
- Grid layout: cards | circle | cards

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 14:26:59 +02:00
Benjamin Admin
d359b7b734 fix(pitch-deck): HowItWorks line behind icons, remove France refs, SOM label
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m6s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 30s
- Connection line: starts/ends between icons, opaque icon background
- Remove all "oder Frankreich/or France/oder FR/or FR" references
- Market subtitle: remove "Der Maschinenbau"
- SOM label: add "(nur Maschinen- und Anlagenbauer als Kernmarkt)"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:46:21 +02:00
Benjamin Admin
bd37ff807e fix(pitch-deck): USP slide complete redesign — grid layout
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m7s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 31s
CI / test-bqas (push) Successful in 31s
Replace broken absolute positioning with clean grid layout:
- Top: Compliance card | BreakPilot hub (spinning) | Code card
- Arrows + sync labels between cards
- Bottom: 4 capability cards in a row
- No more floating text, no overlapping elements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:25:03 +02:00
Benjamin Admin
40d2342086 fix(pitch-deck): fix JSX syntax error in USPSlide corner cards
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m3s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 29s
CI / test-bqas (push) Successful in 27s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:14:03 +02:00
Benjamin Admin
adf3bf8301 feat(pitch-deck): USP slide redesign + add to sidebar
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 20s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 33s
- USP added to slideNames (DE+EN) and chat display names
- Circular layout: BreakPilot hub center, rotating ring,
  Compliance & Code sections inside circle
- 4 capability cards in corners connected by dashed lines
- Removed variant toggle (kept variant A design)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 13:04:06 +02:00
Benjamin Admin
1b5ccd4dec feat(pitch-deck): solution text fixes + USP bridge 3 variants
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m5s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 38s
CI / test-bqas (push) Successful in 32s
- Solution: 30k → 15k+ EUR per year per application
- Solution: DE oder FR → Deutschland
- USP title: Unser USP → USP
- USP bridge: 3 switchable variants (A: circular loop,
  B: infinity loop, C: hexagonal hub) with toggle buttons

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:57:30 +02:00
Benjamin Admin
b5d8f9aed3 feat(pitch-deck): add USP slide + update cover and problem texts
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 29s
CI / test-bqas (push) Successful in 28s
- Cover: remove "für den Maschinenbau" from tagline
- Problem subtitle: Maschinenbauer → Deutsche und europäische Unternehmen
- New USP slide after Solution: bridge between compliance docs/audits
  and actual code implementation — RFQ verification, bidirectional sync,
  automated process compliance, continuous instead of annual checks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 12:44:32 +02:00
Benjamin Admin
c8171b0a1e chore(pitch-deck): trigger rebuild 2
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m22s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 28s
CI / test-bqas (push) Successful in 28s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 11:05:40 +02:00
Benjamin Admin
7e15ef3725 chore(pitch-deck): trigger rebuild for i18n Problem slide changes
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 24s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Failing after 9s
CI / test-python-voice (push) Failing after 11s
CI / test-bqas (push) Failing after 10s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:54:03 +02:00
Benjamin Admin
e3a3802f5b chore(pitch-deck): trigger rebuild for i18n changes
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Failing after 8s
CI / test-python-voice (push) Failing after 9s
CI / test-bqas (push) Failing after 10s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 08:53:17 +02:00
Benjamin Admin
93e319e9fb feat(pitch-deck): rewrite Problem slide cards for investors
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Failing after 8s
CI / test-python-voice (push) Failing after 5s
CI / test-bqas (push) Failing after 5s
- Card 1 (KI-Dilemma): clearer framing of sovereignty vs competitiveness
- Card 2: Patriots Act → Patriot Act + FISA 702, Schrems II reference
- Card 3: 50.000+ EUR → Nicht tragbar / Unsustainable, focus on
  AI Act, NIS2, CRA since 2024, competitive disadvantage vs US/Asia,
  supply chain costs, geopolitical pressure
- Quote updated: Maschinenbauer → Produzierende Unternehmen

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 07:59:59 +02:00
Benjamin Admin
6626d2a8f9 fix(pitch-deck): fix ReferenceError in ChatFAB breaking 2nd message
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 28s
CI / test-bqas (push) Successful in 28s
faqMatch (undefined) → faqMatches[0]. The undefined variable caused
a ReferenceError after streaming completed, which the catch block
turned into "Verbindung fehlgeschlagen" for every subsequent message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:53:42 +02:00
Benjamin Admin
3dbc470158 feat: DSFA generator — FISA 702 risks for US cloud providers
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 26s
CI / test-python-voice (push) Successful in 29s
CI / test-bqas (push) Successful in 30s
Automatically detects US providers (AWS, Azure, Google, Microsoft, OpenAI,
Anthropic, Oracle, Amazon) and adds 3 third-country risks:
- FISA 702 access cannot be ruled out
- An EU server location does not protect against US legal access
- No effective legal remedies for EU data subjects
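A minimal sketch of this detection, assuming a simple substring match over the provider name (the function and constant names are made up for illustration; the keyword and risk lists mirror the commit message):

```python
US_PROVIDERS = ("aws", "azure", "google", "microsoft", "openai",
                "anthropic", "oracle", "amazon")

THIRD_COUNTRY_RISKS = [
    "FISA 702 access cannot be ruled out",
    "An EU server location does not protect against US legal access",
    "No effective legal remedies for EU data subjects",
]

def third_country_risks(provider_name):
    """Return the three third-country risks if the provider is US-based."""
    name = provider_name.lower()
    if any(p in name for p in US_PROVIDERS):
        return THIRD_COUNTRY_RISKS
    return []
```

Note the deliberate asymmetry: an EU region of a US provider still matches, which is exactly the point of the second risk.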

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:47:21 +02:00
Benjamin Admin
e5d0386cfb feat(pitch-deck): add FISA 702 FAQ entries for investor agent
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m1s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 26s
5 new FAQ entries covering:
- FISA 702 basics (PRISM, Upstream, Schrems II)
- EU cloud region myth (extraterritorial US law)
- DSFA contradiction (risk acceptance vs risk elimination)
- Market opportunity (structural independence)
- BreakPilot architecture (BSI, SysEleven, Hetzner)

Also: middleware fix to allow admin sessions on investor routes
(enables chat in preview mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:27:47 +02:00
Benjamin Admin
ff071af2a0 fix(pitch-deck): allow admin sessions to access investor routes
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m3s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 34s
Admins in preview mode can now use /api/chat and other investor
endpoints without needing a separate investor login.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 00:13:13 +02:00
Benjamin Admin
fcdcbc51e3 fix(pitch-deck): regulatory matrix header positioning
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m12s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 30s
- Regulatorien + Branche moved to top header row
- Branche: white/70 instead of white/30 for readability
- Regulatorien: indigo color instead of grey

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:59:53 +02:00
Benjamin Admin
7b8f8d4b5a fix(pitch-deck): regulatory matrix — remove legend, stagger headers
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m1s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 29s
- Remove colored dot legend row (redundant with column headers)
- Stagger column headers on 2 rows (odd/even) to save space
- Last column: Reg. → Regulatorien

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:54:59 +02:00
Benjamin Admin
f385c612f5 fix(pitch-deck): regulatory matrix header alignment + labels
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 32s
- Column headers: centered text labels instead of icons
- Remove colored dots from headers
- Last column: # → Reg. (Regulierungen)
- Consistent column width for last column

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:45:24 +02:00
Benjamin Admin
9166d9dade fix(pitch-deck): resolve merge conflict in AIPipelineSlide — keep updated version
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m0s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 31s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:42:13 +02:00
Benjamin Admin
7ae5bc0fd5 feat(pitch-deck): overhaul AI Pipeline slide with real data
- Hero stats: 75+ sources, 70k+ controls, 47k+ obligations
- RAG tab: source categories with investor-friendly explanations
  (why court rulings matter, why frameworks define state of art)
- Remove inflated numbers (was 110+ regulations, now accurate 75+)
- Quality tab: continuous expansion, cross-regulation mapping
- Remove NiBiS/education references (irrelevant for compliance)
- All numbers verified against production database

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 17:40:27 +02:00
Sharang Parnerkar
242ed1101e style(team): tighter card layout — equal height, equity pill, GitHub/LinkedIn detection
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m11s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 31s
- grid items-stretch so cards match height
- Smaller avatar (16->64px) to free vertical space
- Equity moved to a top-right pill (compact); decimals collapsed via equityDisplay()
- Profile link icon auto-detects GitHub vs LinkedIn vs generic
- Expertise tags get their own divider strip at card bottom — cleaner hierarchy
- Card background lightened from 0.08 to 0.04 with subtle hover border

Bio text itself shortened on the data side (both draft versions via admin API).
2026-04-14 16:25:37 +02:00
Sharang Parnerkar
8b2e9ac328 content(pitch-deck): tidy slide text — remove OVH, generalize issue tracker, add live support, Mac Studio option
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 32s
Solution slide:
- Continuous Code Security: "Jira tickets" -> "tickets in the issue tracker of your choice"
- German Cloud / Full Integration: removed OVH (now "BSI cloud DE or FR"),
  removed "AI task creation from audio", added "Live support via Jitsi (video) and Matrix (chat)",
  "Mac Mini" -> "Mac Mini/Studio"

Products / Modular toolkit slide:
- Regional bubble: "OVH FR" -> "FR"

How It Works:
- Cloud step: removed OVH and "pre-configured Mac Mini" mentions

Engineering deep dive:
- "Docker Containers" stat -> "Services"; "Coolify -> Hetzner" -> "orca -> Hetzner"
- "Dockerfiles / Fully containerized" stat -> "Infra Components / orca (Rust) + infisical + pg + qdrant"
- devopsStack: Coolify -> orca (Rust), Docker Compose -> Private Registry (registry.meghsakha.com),
  HashiCorp Vault -> Infisical, EU-Cloud list drops OVH
- Service Architecture Infrastructure section: add orca (Rust), Infisical, Private Registry
- Footer note drops OVH

Chat / Presenter (consistency):
- chat/route.ts system prompt: OVH removed, Jira-Integration -> Issue-Tracker-Integration
- presenter-faq.ts + presenter-script.ts: OVH references removed across all answers,
  Jira mentioned alongside GitLab/Linear/Gitea as examples, Mac Mini -> Mac Mini/Studio

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 16:14:40 +02:00
Benjamin Admin
084d09e9bd fix(pitch-deck): revert banner test text back to Draft
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 13s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 31s
CI / test-bqas (push) Successful in 31s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:32:46 +02:00
Benjamin Admin
646143ce5a Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 1m3s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 31s
2026-04-14 15:20:56 +02:00
Benjamin Admin
00d802f965 test(pitch-deck): banner text Draft → Draft V1 — deployment test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:20:41 +02:00
Sharang Parnerkar
ebb7575f2c test: retrigger with http:// webhook URL
All checks were successful
Build pitch-deck / build-push-deploy (push) Successful in 15s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 30s
2026-04-14 09:32:36 +02:00
Sharang Parnerkar
d0539d0f2f ci: use http:// for orca webhook (port 6880 serves plain HTTP)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been cancelled
CI / nodejs-lint (push) Has been cancelled
CI / test-go-consent (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
CI / test-bqas (push) Has been cancelled
2026-04-14 09:32:08 +02:00
Sharang Parnerkar
8e92a93aa8 test: verify full CI pipeline with registry auth + orca webhook
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 14s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 34s
CI / test-python-voice (push) Successful in 34s
CI / test-bqas (push) Successful in 32s
2026-04-14 09:27:05 +02:00
Sharang Parnerkar
f794347827 ci: add docker login step for registry.meghsakha.com
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 30s
CI / test-bqas (push) Has been cancelled
CI / test-python-voice (push) Has been cancelled
Requires Gitea Actions secrets: REGISTRY_USERNAME, REGISTRY_PASSWORD
2026-04-14 09:26:12 +02:00
Sharang Parnerkar
1af160eed0 test: trigger orca webhook via CI
Some checks failed
Build pitch-deck / build-push-deploy (push) Failing after 10s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 32s
2026-04-14 09:22:10 +02:00
Sharang Parnerkar
eb118ebf92 ci: re-add HMAC-SHA256 signing on orca webhook (ORCA_WEBHOOK_SECRET)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 31s
2026-04-14 08:31:29 +02:00
Sharang Parnerkar
dbb476cc3b ci: drop HMAC signing (orca webhooks have no secret by default)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 31s
CI / test-bqas (push) Successful in 32s
2026-04-14 08:27:22 +02:00
Sharang Parnerkar
9345efc3f0 ci(pipeline): trigger orca redeploy after image push, remove coolify
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 32s
build-pitch-deck workflow now posts an HMAC-signed push event to orca's
webhook endpoint after the image is built + pushed. This avoids the race
where orca would otherwise redeploy with the old :latest image before
CI finishes pushing the new one.

Removed the obsolete deploy-coolify.yml (wrong branch, wrong system) and
stripped the deploy-coolify job from ci.yaml.

Requires Gitea Actions secret: ORCA_WEBHOOK_SECRET_PITCH_DECK
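The signing step can be sketched with the standard library. The header name and payload shape are assumptions — the commit only states that the push event is HMAC-SHA256-signed with the ORCA_WEBHOOK_SECRET_PITCH_DECK secret:

```python
import hashlib
import hmac
import json

def sign_payload(secret, body):
    """Hex-encoded HMAC-SHA256 signature over the raw webhook body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()

# Illustrative payload; orca's actual event format may differ.
body = json.dumps({"ref": "refs/heads/main", "event": "push"}).encode()
signature = sign_payload("example-secret", body)
# The CI step would POST `body` with the signature in a header,
# e.g. X-Signature, and orca recomputes the HMAC to verify it.
```

Signing the exact bytes that are sent (rather than a re-serialized copy) is what makes verification on the receiving side deterministic.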
2026-04-14 08:20:05 +02:00
Benjamin Admin
c4e993e3f8 fix: filter empty controls (title/objective=None) before storing
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 44s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 30s
CI / Deploy (push) Failing after 4s
- Batch post-processing: controls with title/objective = None/null/"" are
  filtered out and not stored. The title is derived from the objective if
  only the title is missing.
- _store_control: a pre-store quality guard rejects empty controls
- Prevents "None" controls caused by LLM parsing errors
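The filtering rule can be sketched as follows; the field names and the title-derivation heuristic (first few words of the objective) are assumptions:

```python
def clean_controls(controls):
    """Drop controls whose objective is irrecoverably empty; derive a
    title from the objective when only the title is missing."""
    empty = {None, "", "None", "null"}
    kept = []
    for c in controls:
        if c.get("objective") in empty:
            continue                      # nothing to recover: filter out
        if c.get("title") in empty:
            # assumed heuristic: title = first words of the objective
            c = {**c, "title": " ".join(c["objective"].split()[:8])}
        kept.append(c)
    return kept
```

Treating the literal strings "None" and "null" as empty catches the LLM-parsing artifacts the commit mentions, not just genuine nulls.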

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 06:59:47 +02:00
Benjamin Admin
a58d1aa403 fix: CRITICAL — fix 12 pipeline bugs, rescue 36,000 lost controls
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 31s
CI / Deploy (push) Failing after 2s
Root cause: _generate_control_id produced ID collisions (string sort instead
of numeric), ON CONFLICT DO NOTHING silently discarded controls, and chunks
were marked "processed" even though the store failed → permanently lost.

Fixes:
1. _generate_control_id: numeric MAX instead of string sort, collision guard
   with UUID-suffix fallback, exceptions are logged instead of swallowed
2. _store_control: ON CONFLICT DO UPDATE instead of DO NOTHING → ID is always returned
3. Store logic: a chunk is NO longer marked as processed on store_failed
   → retry possible on the next run
4. Counters: controls_generated is only incremented on a successful store
   New counters: controls_stored + controls_store_failed
5. Anthropic API: HTTP 429/500/502/503/504 are now retried (2 attempts)
6. Monitoring: progress log shows store rate (%), ALARM below 80%
7. Post-job validation: compares generated vs stored vs DB reality
   WARNING if store_failed > 0, CRITICAL if rate < 90%
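The root-cause fix (item 1) amounts to taking the numeric maximum of existing IDs rather than the lexicographic one. A sketch under an assumed CTRL-&lt;n&gt; ID format — "CTRL-9" sorts after "CTRL-10" as a string, so a string sort would regenerate "CTRL-10" and collide:

```python
import re
import uuid

def next_control_id(existing, prefix="CTRL"):
    """Numeric-MAX successor with a UUID-suffix fallback as collision
    guard. ID format is an assumption for illustration."""
    nums = [int(m.group(1)) for s in existing
            if (m := re.fullmatch(rf"{prefix}-(\d+)", s))]
    candidate = f"{prefix}-{max(nums, default=0) + 1}"
    if candidate in existing:             # collision guard
        candidate = f"{prefix}-{uuid.uuid4().hex[:8]}"
    return candidate
```

With {"CTRL-9", "CTRL-10"} the numeric max is 10, so the next ID is "CTRL-11"; a lexicographic max would have picked "CTRL-9" and re-issued "CTRL-10".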

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 00:39:12 +02:00
Benjamin Admin
d7ed5ce8c5 fix(pitch-deck): add 8 missing slides to renderSlide switch
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 1m4s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 35s
CI / Deploy (push) Failing after 2s
ExecutiveSummary, RegulatoryLandscape, CapTable, Savings,
SDKDemo, Strategy, Finanzplan, Glossary

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 22:36:14 +02:00
Benjamin Admin
512088ab93 feat(pitch-deck): HTTPS via Nginx reverse proxy on port 3012
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 56s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 33s
CI / Deploy (push) Failing after 4s
- Add Nginx SSL server block for pitch-deck on port 3012
- Route through Nginx instead of direct container port
- Restore secure cookie flag (requires HTTPS)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:13:52 +02:00
Benjamin Admin
32b5e0223d fix(pitch-deck): use explicit PITCH_SECURE_COOKIE flag for cookie security
HTTP access on local network was blocked by secure cookie flag when
NODE_ENV=production. Now requires explicit opt-in via env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:11:36 +02:00
Benjamin Admin
9354cbf775 fix(pitch-deck): add PITCH_JWT_SECRET + PITCH_ADMIN_SECRET env vars
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 17:00:57 +02:00
Benjamin Admin
756d068b4f fix: default skip_web_search to True — 5x faster pipeline
Anchor search (DuckDuckGo + RAG via SDK) slows the pipeline from
~50 chunks/min to ~10 chunks/min. Anchors (OWASP/NIST references)
can be backfilled later in a batch job.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:26:01 +02:00
Benjamin Admin
c02a7bd8a6 feat(pitch-deck): show version name + status in preview banner
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 12:21:59 +02:00
Benjamin Admin
b6d3fad6ab Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-core into feature/payment-compliance-module 2026-04-13 11:49:25 +02:00
Sharang Parnerkar
27479ee553 docs(mcp-server): add README + gitignore .mcp.json
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 1m2s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 34s
CI / Deploy (push) Failing after 3s
Setup instructions for the pitch version MCP server.
.mcp.json contains the admin secret and is gitignored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:36:54 +02:00
Sharang Parnerkar
82a5d62f44 feat(pitch-deck): MCP server for pitch version management via Claude Code
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 1m8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 36s
CI / test-python-voice (push) Successful in 42s
CI / test-bqas (push) Successful in 40s
CI / Deploy (push) Failing after 3s
Stdio MCP server that wraps the pitch-deck admin API, exposing 11 tools:
list_versions, create_version, get_version, get_table_data,
update_table_data, commit_version, fork_version, diff_versions,
list_investors, assign_version, invite_investor.

Authenticates via PITCH_ADMIN_SECRET bearer token against the deployed
pitch-deck API. All existing auth, validation, and audit logging is
reused — the MCP server is a thin adapter.

Usage: add to ~/.claude/settings.json mcpServers, set PITCH_API_URL
and PITCH_ADMIN_SECRET env vars. See mcp-server/README.md (to be added).
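Because the server is a thin adapter, each MCP tool reduces to one authenticated HTTP call against the admin API. A sketch of the request construction — the URL and path below are placeholders; only the PITCH_ADMIN_SECRET bearer token is stated in the commit:

```python
import json
import urllib.request

PITCH_API_URL = "https://pitch.example.invalid"  # assumption: set via env var

def call_admin_api(path, secret, payload):
    """Build the bearer-authenticated request an MCP tool would send;
    the deployed API handles auth, validation, and audit logging."""
    return urllib.request.Request(
        f"{PITCH_API_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {secret}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Keeping the adapter this thin means no business logic is duplicated client-side: a bad token or invalid payload is rejected by the same server code paths the admin UI uses.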

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 10:32:45 +02:00
Benjamin Admin
bc23c6815a docs: update README — BV + FRIA templates + domain risks
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:05:05 +02:00
Benjamin Admin
7dd2dc89a9 test: FRIA + DSFA domain-risk tests, 15/15 passing
FRIA: minimal context, domain rights (HR/Edu/HC), universal rights,
      mitigation measures, public entities, risk matrix, affected persons.
DSFA: domain-specific risks (AGG, inequality of opportunity, misdiagnosis,
      credit scoring); no extra risks without a domain.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 06:58:36 +02:00
Benjamin Admin
57462899f6 fix: DSFA generator: domain-specific risks (HR/Edu/HC/Finance)
The risk analysis now detects the domain context and automatically adds
domain-specific risks:
- HR: AGG violations, reversed burden of proof, Art. 22, proxy discrimination
- Education: inequality of opportunity, minors, misgrading
- Healthcare: misdiagnosis, triage, patient autonomy
- Finance: credit-scoring discrimination, denial of service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 22:36:11 +02:00
Benjamin Admin
f23b872c54 feat: FRIA template (Art. 27 AI Act), 7th document template
Fundamental rights impact assessment with 8 sections, ~26 placeholders,
and conditional blocks for education/HR/public bodies.
Python generator with a domain→fundamental-rights mapping (Education, HR, Healthcare, Finance).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 16:38:59 +02:00
Benjamin Admin
55f7195edd test: BV generator tests, 9 tests (all passing)
Covers: minimal/full context, prohibited uses (AI/standard),
data-category mapping, TOMs at high conflict scores, retention periods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 11:02:32 +02:00
Benjamin Admin
b14be8583d feat: works council compliance: BAG ingestion script + BV template
1. BAG ruling ingestion script (21 curated rulings on §87 BetrVG)
   - Microsoft 365, SAP ERP, email, standard software, video, SaaS/cloud
   - 14 ingested successfully (4,726 chunks in bp_compliance_datenschutz)
2. Betriebsvereinbarung template (6th document template)
   - SQL migration with 13 sections (A-M), ~30 placeholders
   - conditional blocks for AI systems, video, HR
   - Python generator with automatic TOM population

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-12 10:49:01 +02:00
Benjamin Admin
67ad7c236b Merge remote-tracking branch 'gitea/main'
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 33s
CI / Deploy (push) Failing after 4s
2026-04-12 09:08:04 +02:00
Benjamin Admin
f89ce46631 fix: pipeline scaling: 8 optimizations for 80k+ controls
1. control_generator: GeneratorResult.status default "completed" → "running" (bug fix)
2. control_generator: Anthropic API calls with per-phase timeouts + retry on disconnect
3. control_generator: regulation_exclude filter + harmonization via Qdrant instead of in-memory
4. decomposition_pass: enrich pass uses batched UPDATEs (400k → ~400 DB calls)
5. decomposition_pass: merge pass as a single query instead of N+1
6. batch_dedup_runner: cross-group dedup parallelized (asyncio.gather)
7. canonical_control_routes: framework controls API pagination (limit/offset)
8. DB indexes: idx_oc_parent_release, idx_oc_trigger_null, idx_cc_framework

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 14:09:32 +02:00
Benjamin Admin
fc71117bf2 feat: document templates V2: DSFA, TOM, VVT, AVV, Verpflichtung, Art. 13/14
Extended compliance templates for the document generator:
- DSFA V2: threshold analysis (9 WP248 criteria), SDM-based TOMs,
  structured risk assessment, AI module (AI Act), Art. 36 check
- TOM V2: 7 SDM assurance goals, sector extensions,
  NIS2/ISO27001/AI Act variants
- VVT V2: 6 industry patterns (IT/SaaS, healthcare, retail, trades,
  education, consulting) + general Art. 30 template
- AVV V2: complete Art. 28 contract with TOM annex
- Verpflichtungserklaerung: employee confidentiality commitment
- Art. 13/14 information-duty templates

Includes SQL migrations (compliance_legal_templates), Python generators,
and a Qdrant cleanup script. Feature branch for later integration
into breakpilot-compliance.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 11:39:39 +02:00
Sharang Parnerkar
ea752088f6 feat(pitch-admin): structured form editors, bilingual fields, version preview
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 59s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 32s
CI / Deploy (push) Failing after 4s
Replaces raw JSON textarea in version editor with proper form UIs:

- Company: single-record form with side-by-side DE/EN tagline + mission
- Team: expandable card list with bilingual role/bio, expertise tags
- Financials: year-by-year table with numeric inputs
- Market: TAM/SAM/SOM row table
- Competitors: card list with strengths/weaknesses tag arrays
- Features: card list with DE/EN names + checkbox matrix
- Milestones: card list with DE/EN title/description + status dropdown
- Metrics: card list with DE/EN labels
- Funding: form + nested use_of_funds table
- Products: card list with DE/EN capabilities + feature tag arrays
- FM Scenarios: card list with color picker
- FM Assumptions: row table

Shared editor primitives (components/pitch-admin/editors/):
  BilingualField, FormField, ArrayField, RowTable, CardList

"Edit as JSON" toggle preserved as escape hatch on every tab.

Preview: admin clicks "Preview" on version editor → opens
/pitch-preview/[versionId] in new tab showing the full pitch deck
with that version's data. Admin-cookie gated (no investor auth).
Yellow "PREVIEW MODE" banner at top.

Also fixes the [object Object] inline table type cast in FM editor.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:34:42 +02:00
Sharang Parnerkar
edadf39445 fix(pitch-admin): render JSONB arrays as inline table editors
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 57s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 30s
CI / Deploy (push) Failing after 3s
Arrays of objects (funding_schedule, founder_salary_schedule, etc.)
now render as editable tables with per-field inputs and add/remove
row buttons, instead of a raw JSON string in a single text input.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 10:09:26 +02:00
1c3cec2c06 feat(pitch-deck): full pitch versioning with git-style history (#4)
Some checks failed
Build pitch-deck / build-and-push (push) Failing after 1m8s
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 32s
CI / test-bqas (push) Successful in 32s
CI / Deploy (push) Failing after 4s
Full pitch versioning: 12 data tables versioned as JSONB snapshots,
git-style parent chain (draft→commit→fork), per-investor assignment,
side-by-side diff engine, version-aware /api/data + /api/financial-model.

Bug fixes: FM editor [object Object] for JSONB arrays, admin scroll.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 07:37:33 +00:00
Sharang Parnerkar
746daaef6d ci: add Gitea Actions workflow to build + push pitch-deck image
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 31s
CI / Deploy (push) Failing after 5s
Builds and pushes to registry.meghsakha.com/breakpilot/pitch-deck
on every push to main that touches pitch-deck/ files. Tags with
:latest and :SHORT_SHA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 08:36:23 +02:00
Benjamin Admin
441d5740bd feat: applicability engine + API filters + DB sync + cleanup
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 37s
CI / Deploy (push) Failing after 2s
- Applicability engine (deterministic, no LLM): filters controls
  by industry, company size, and scope signals
- API filters on GET /controls, /controls-count, /controls-meta
- POST /controls/applicable endpoint for company-profile matching
- 35 unit tests for the engine
- fixed port 8098 conflict with nginx (expose only, no host port)
- CLAUDE.md: added control-pipeline documentation
- deleted 6 international laws (ES/FR/HU/NL/SE/CZ; DACH only)
- DB backup import script (import_backup.py)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 21:58:17 +02:00
Benjamin Admin
ee5241a7bc merge: gitea/main — resolve pitch-deck conflicts (accept theirs)
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 45s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 34s
CI / Deploy (push) Failing after 5s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:43:32 +02:00
Benjamin Admin
e3ab428b91 feat: migrate control-pipeline service out of the compliance repo
Control pipeline (pass 0a/0b, BatchDedup, generator) moved into Core as a
standalone service so the compliance repo can be refactored independently.
It still writes to the compliance schema of the shared PostgreSQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 14:40:47 +02:00
c7ab569b2b feat(pitch-deck): admin UI for investor + financial-model management (#3)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 42s
CI / test-python-voice (push) Successful in 30s
CI / test-bqas (push) Successful in 30s
CI / Deploy (push) Successful in 2s
Adds /pitch-admin dashboard with real bcrypt admin accounts and full
audit attribution for every state-changing action.

- pitch_admins + pitch_admin_sessions tables (migration 002)
- pitch_audit_logs.admin_id + target_investor_id columns
- lib/admin-auth.ts: bcryptjs, single-session, jose JWT with audience claim
- middleware.ts: two-cookie gating with bearer-secret CLI fallback
- 14 new API routes (admin-auth, dashboard, investor detail/edit/resend,
  admins CRUD, fm scenarios + assumptions PATCH)
- 9 admin pages: login, dashboard, investors list/new/[id], audit,
  financial-model list/[id], admins
- Bootstrap CLI: npm run admin:create
- 36 vitest tests covering auth, admin-auth, rate-limit primitives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 10:36:16 +00:00
645973141c feat(pitch-deck): passwordless investor auth, audit logs, snapshots & PWA (#2)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 27s
CI / test-python-voice (push) Successful in 25s
CI / test-bqas (push) Successful in 27s
CI / Deploy (push) Successful in 6s
Adds investor-facing access controls, persistence, and PWA support to the pitch deck:

- Passwordless magic-link auth (jose JWT + nodemailer SMTP)
- Per-investor audit logging (logins, slide views, assumption changes, chat)
- Financial model snapshot persistence (auto-save/restore per investor)
- PWA support (manifest, service worker, offline caching, branded icons)
- Safeguards: email watermark overlay, security headers, content protection,
  rate limiting, IP/new-IP detection, single active session per investor
- Admin API: invite, list investors, revoke, query audit logs
- pitch-deck service added to docker-compose.coolify.yml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 08:48:38 +00:00
Benjamin Admin
68692ade4e fix: DB pool 5→20 + KPI/chart tabs skip DB load
Pool size raised from 5 to 20 (connection exhaustion under
parallel Finanzplan queries + compute + API calls)

The KPIs/Charts tabs no longer load DB data (virtual tabs,
data is hardcoded) → instant rendering

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:56:40 +01:00
Benjamin Admin
49908d72d0 feat: integrate churn rate into customer numbers
Monthly churn rates per segment:
  Startup: 3%, small SME: 2%, mid-size SME: 1.5%, Enterprise: 0.5%

New-customer numbers raised to offset churn:
  Dec 2026: 17 (was 14), Dec 2027: 132 (was 117)
  Dec 2030: 1,322 (was 1,200)

ARR rises to ~11.1M (the higher new-customer intake compensates for attrition)
One-pager company-development table synced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:33:22 +01:00
Benjamin Admin
1b5c2a156c feat: KPI + chart tabs in the Finanzplan + ROI corrected
KPIs tab: 15 metrics per year (2026-2030)
  MRR, ARR, customers, ARPU, headcount, revenue per employee,
  personnel costs, EBIT, EBIT margin, taxes, net income,
  server cost per customer, gross margin, burn rate, runway

Charts tab:
  - MRR & customer growth (bar chart, 5 years)
  - EBIT (red/green for loss/profit)
  - headcount ramp (bar chart, 5→35)

ROI corrected (savings ÷ price):
  SME: 3.5x, Mittelstand: 7.5x, enterprise: 20.8x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:17:31 +01:00
Benjamin Admin
159d07efd5 feat: glossary slide with 27 abbreviations in 4 categories
Final slide "Glossary & Abbreviations":
- Code security & DevSecOps: SAST, DAST, SBOM, DevSecOps, SCA, CI/CD, AppSec
- Compliance & data protection: DSGVO, VVT, TOMs, DSFA, DSR, DSB, ISMS
- EU regulations: AI Act, CRA, NIS2, MVO, TISAX
- Business metrics: ARR, MRR, CAC, LTV, ARPU, SaaS, ESOP, ROI

Each abbreviation with its expanded name + short description (DE+EN)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 23:03:27 +01:00
Benjamin Admin
06431be40d feat: customer savings slide + savings FAQ
New "Customer Savings" slide with 3 company types:
  SME (25 employees): 97,750→44,530 = 53,220 EUR saved (ROI 9.1x)
  Mittelstand (100 employees): 419,500→193,880 = 225,620 EUR (ROI 12.6x)
  Enterprise (500+ employees): 2,113,500→1,074,080 = 1,039,420 EUR (ROI 17.4x)

Detailed breakdown per cost item:
  pentests per application, CE software risk per product,
  compliance team, developer productivity (IDC: 19% time lost),
  TISAX/ISO, CRA/NIS2, incident response

2 new FAQs: savings-detail (priority 10) + savings-pentest
System prompt updated with the concrete figures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 22:54:07 +01:00
Benjamin Admin
9f3e5bbf9f fix: sum row for revenue + customers; customers use December value
- sum row also for revenue and customers
- customer sheets: year column shows the December value (a stock, not a sum)
- pre-existing sum rows are no longer double-counted

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:33:22 +01:00
Benjamin Admin
a66b76001b fix: personnel cost sorting + umlauts in DB + sum rows
- founders always get sort_order 1+2, then sorted by start_date
- both founders on exactly the same salary (7,000 EUR/month from Jan 2027)
- all position names numbered through (Pos 3 to Pos 35)

Umlauts in DB labels (Liquidität, GuV, Betriebliche):
  Umsatzerloese→Umsatzerlöse, UEBERSCHUSS→ÜBERSCHUSS,
  Koerperschaftsteuer→Körperschaftsteuer, etc.
Engine labels updated in sync.

Sum row (SUMME) rendered as tfoot for:
  Personalkosten, Materialaufwand, Betriebliche Aufwendungen,
  Investitionen, Sonstige Erträge

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 21:22:45 +01:00
Benjamin Admin
3188054462 feat: cap table slide + INVEST 20% + ESOP + founder salaries
New "Investment & Cap Table" slide after The Ask:
- pie chart: founders 75%, investor 19.6%, ESOP 5.4%
- pre-seed details: 4M pre-money, 975k investment, 4.975M post-money
- founder salaries: 0 (2026) → 7k (2027) → 8k (2028) → 9.1k (2029+)
- profit allocation: 100% reinvested, no dividends until Series A
- INVEST program (BAFA): 20% grant = 195,000 EUR back
- ESOP: 5.4% for key employees, 4-year vesting, 1-year cliff
- Series A outlook: 15-25M valuation at 3M+ ARR

Finanzplan: founders at 7,000 EUR/month from Jan 2027, 14% annual raise

FAQs: cap table + profit allocation as flowing prose

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 18:20:02 +01:00
Benjamin Admin
5fd65e8a38 feat: tax calculation in the GuV: corporate + trade tax + loss carryforward
Stockach 78333, municipal multiplier (Hebesatz) 350%:
- trade tax: 3.5% × 3.5 = 12.25%
- corporate tax: 15% + 5.5% solidarity surcharge = 15.825%
- combined: ~28.08% on profit

Loss carryforward:
- losses accumulate and are offset against future profits
- up to 1M EUR: 100% deductible
- above 1M EUR: only 60% (minimum taxation)

GuV rows: trade tax, corporate tax, total taxes,
result after taxes, net income

Liquidity: taxes as monthly cash outflows (1/12 of the annual amount)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 17:07:09 +01:00
Benjamin Admin
34d2529e04 feat: investor agent: FAQs as LLM context instead of direct streaming
Architecture change: FAQ answers are NO longer streamed directly.
Instead, the top 3 relevant FAQ entries are passed to the LLM as
context. The LLM interprets the question, combines several FAQs
for complex questions, and answers naturally.

Before: question → keyword match → stream FAQ directly (LLM bypassed)
After: question → top-3 FAQ matches → injected into the LLM prompt as context → LLM answers

New functions:
- matchFAQMultiple(): top-N matches instead of just the best one
- buildFAQContext(): builds the context string for LLM injection
- faqContext instead of faqAnswer in the request body
- system prompt instruction: "combine as needed, natural flowing prose"

Fixes: complex questions spanning 2+ topics are now answered correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:57:47 +01:00
Benjamin Admin
928556aa89 feat: detail Bechtle/CANCOM channel strategy on strategy slide + FAQ
Strategy slide: new section "Two routes to the Mittelstand"
- CANCOM Cloud Marketplace: TecDAX-listed, ISV partner program, 3-6 months
  to listing, immediately visible nationwide, hundreds of sales reps
- Bechtle system houses: 15,000 employees, 85+ locations, 70,000 customers,
  regional entry → local champion → national listing (12-18 months)
- quote: "Direct sales scale linearly; channel scales exponentially"

FAQ updated: full Bechtle/CANCOM explanation as flowing prose
with concrete figures and a timeline for investors

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:45:30 +01:00
Benjamin Admin
720493f26b feat: company strategy: new slide + channel-first + 35 roles reworked
New slide "Appendix: Strategy":
- USP comparison: code security vs. compliance vs. BreakPilot (3 tiles)
- 4 phases: Foundation → Traction → Scale → Leadership
- channel-first argument: Bechtle/CANCOM instead of a sales army
- company build-out from 5 to 35 with ARR targets per phase

35 positions (DB) restructured:
- phase 1: security engineer + CE risk engineer (product focus)
- phase 2: channel manager Bechtle (month 6!) + DevSecOps + AI
- phase 3: first direct sales + compliance lawyer + pentester
- phases 4+5: VP Sales, enterprise, EU expansion, developer relations

New FAQs:
- competitor-focus: German competitors + source code security (priority 10)
- strategy-channel-first: Bechtle/CANCOM channel strategy
- team-hiring-order: updated with the new hiring order

Sharang Parnerkar corrected in the DB.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:17:32 +01:00
Benjamin Admin
ab13254636 fix: investor agent: flowing prose instead of bullet lists + German role names
System prompt: "Answer like a human in conversation, no bullet lists,
explain the WHY, TTS-optimized"

All 6 team FAQs + the module FAQ rewritten as natural flowing prose:
- German role names (Vertriebsmitarbeiter, Kundenbetreuer, etc.)
- reasoning embedded ("The reason is...", "We deliberately...")
- transition sentences for a natural speaking flow
- 3-5 paragraphs per answer instead of bullet lists

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:44:06 +01:00
Benjamin Admin
104a506b6f feat: investor agent FAQ: team build-out + 12 modules + system prompt
6 new FAQ entries:
- team-structure: 35-employee org chart with department distribution
- team-hiring-order: hiring order for years 1-5 with the reasoning
- team-why-compliance-first: why a DPO before engineers (DataGuard/heyData pattern)
- team-competitor-comparison: Vanta/Drata/DataGuard/heyData/Sprinto/Delve teams
- team-engineering-ratio: 37% engineering, and why not more
- modules-overview: all 12 modules listed individually

System prompt (chat API) fully updated:
- 12 modules instead of 65+
- 110 laws, 25,000 audit aspects
- the strategic dilemma as the core problem
- Finanzplan figures: 1,200 customers, 10M ARR, break-even 2029
- team build-out as core message #8

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:28:00 +01:00
Benjamin Admin
92290b9035 fix: Finanzplan font colors + 35 staff roles + 12 modules
Fonts: text-white/20 → text-white/50 for section/position labels,
  text-white/40 → text-white/60 for table headers (readable in both modes)

35 roles based on competitor research (Vanta, DataGuard, heyData):
  Year 1 (5): CEO, CTO, compliance consultant, 2× full-stack engineer
  Year 2 (+5=10): sales, CSM, AI engineer, head of product, frontend
  Year 3 (+7=17): sales #2, DevOps, marketing, compliance #2, sr. backend, CSM #2, SDR
  Year 4 (+8=25): VP Sales, pre-sales, security, VP Marketing, events, HR, CSM #3, QA
  Year 5 (+10=35): sales DACH, SDR #2, ML, DevRel, finance, frontend #2, legal, BD, backend #2, eng. manager

12 modules: +cookie generator on slide 7 + one-pager

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 08:08:43 +01:00
Benjamin Admin
b5d855d117 feat: presenter skip back/forward with slide sync
- prevSlide() in usePresenterMode: jumps to the previous slide,
  stops the current audio, starts the previous slide's presentation
- SkipBack button in PresenterOverlay next to SkipForward
- both buttons jump to the correct slide AND start its audio

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:57:23 +01:00
Benjamin Admin
1bd57da627 feat: presenter script updated + COMPLAI + cookie generator (12 modules)
Presenter script fully synced:
- COMPLAI instead of ComplAI throughout
- 12 modules listed (incl. DSR, consent, emergency plans, cookie generator)
- 110 laws instead of 84
- 25,000 audit aspects instead of "controls"
- SOM 24M instead of 7.2M
- incorporation Jul/Aug 2026 instead of Q4
- umlauts fixed (standardmäßig, wählbar, Lücken, abschließen)

Slide 3 (cover): COMPLAI rendered large via the BrandName component
Slide 7: +cookie generator as the 12th module
One-pager: +cookie generator
DB: metrics set to 12 modules

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:46:06 +01:00
Benjamin Admin
f9c03c30d9 feat: 3 new modules: DSR, consent, emergency plans (8→11 modules)
Slide 7 (modular toolkit): 11 modules in a 4-column grid
  New: DSR/data subject rights, consent management, emergency plans

One-pager: 11 modules in compact form (shorter labels for A3)

AI pipeline: "1,500+ obligations" → "derived obligations" (not verified)
Traction: 11 modules in DB metrics

Umlauts: fuer→für, Loeschfristen→Löschfristen in ProductSlide

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 21:31:20 +01:00
Benjamin Admin
f2b225106d fix: umlauts correct everywhere + milestone dates updated
Umlauts: ä, ö, ü in i18n.ts, presenter-script.ts, presenter-faq.ts
  (oe→ö, ae→ä, ue→ü, ~60 replacements in total)

Milestones (DB):
  - platform development: January 2026
  - compliance SDK, 8 modules: March 2026
  - RAG, 110 regulations: April 2026
  - 2 pilot customers: January to July 2026
  - GmbH incorporation: Jul/Aug 2026

AI pipeline: 110+ regulations, 25,000+ audit aspects, 1,500+ obligations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 19:33:47 +01:00
Benjamin Admin
29d3ec60d0 fix: AI pipeline deep dive updated
- ingestion: 110+ regulations/laws (was 38+), 25,000+ audit aspects
- obligations engine: 1,500+ obligations (was 325+)
- vector store: 25,000+ audit aspects · 110 laws · 1,500+ obligations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:27:36 +01:00
Benjamin Admin
bbf038d228 feat: assumptions & sensitivity: 3 cases from the Finanzplan
Bear case: 50% slower, 8% churn → 600 customers, 4.2M ARR, break-even 2030
Base case: as per Finanzplan → 1,200 customers, 10M ARR, break-even 2029
Bull case: 50% faster, 8% enterprise share → 2,000 customers, 18M ARR, break-even 2028

Old scenario sliders and mock data removed entirely.
Comparison table at the bottom for a quick overview.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:24:59 +01:00
Benjamin Admin
c967d80aed feat: slide 14 financials straight from the Finanzplan DB
- mock data and scenario toggle removed
- automatically loads Finanzplan data on open
- KPIs: ARR 2030, headcount 2030, break-even year, cash at end
- overview: revenue vs. costs chart + waterfall + cashflow
- GuV: directly from the fp_guv DB table (no mocks)
- cashflow: AnnualCashflowChart with 1M initial funding
- no more slider/scenario sidebar (not relevant)
- umlauts fixed (Übersicht instead of Uebersicht)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 18:00:15 +01:00
Benjamin Admin
11c0c1df38 fix: liquidity: operating surplus without capital injections
Surplus = operating cashflow ONLY:
  Inflows: revenue + other income + prepayments (WITHOUT equity/debt)
  Outflows: materials + personnel + other costs + taxes (WITHOUT loan repayments)
  = operating surplus

Balance = previous month + operating surplus + financing
  Financing = equity + debt - loan repayments (tracked separately)

This way the surplus shows true operating performance,
while capital injections only appear in the account balance.

Marketing: 5,000 EUR/month from Jul 2027 (was 20k)
All values Math.round()'d, i.e. integers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 17:40:50 +01:00
Benjamin Admin
f849fd729a fix: liquidity balance + integer values + year column
- balance/LIQUIDITAET: year column shows the December value (not the sum)
- all values are integers (no decimal places)
- engine: gross pay, social contributions, depreciation, materials all Math.round()'d
- formatCell: always maximumFractionDigits: 0
- GuV: annual values rounded

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 17:26:30 +01:00
Benjamin Admin
85949dbf8e fix: even headcount ramp + customers/revenue synced
Staff: 35 positions spread evenly across the months
  2026: Aug→Dec (1/month = 5)
  2027: Feb/Apr/Jun/Sep/Nov (+5 = 10)
  2028: Feb/Mar/May/Jul/Aug/Oct/Dec (+7 = 17)
  2029: Jan/Mar/Apr/Jun/Jul/Sep/Oct/Dec (+8 = 25)
  2030: Jan→Nov, almost every month (+10 = 35)

Customers by pricing tier (75/15/7/3%):
  Dec 2026: 14 → 73k ARR
  Dec 2027: 117 → 1.0M ARR
  Dec 2028: 370 → 3.2M ARR
  Dec 2029: 726 → 6.2M ARR
  Dec 2030: 1,200 → 10.0M ARR

One-pager company-development table synced.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:48:30 +01:00
Benjamin Admin
6fba87fdd9 fix: PDF template page size + Finanzplan data synced
PDF: @page 297mm x 680mm with 30mm margins (matches the tested format)
Staff: 35 positions (5/10/17/25/35 employees per year)
Customers: ~20/122/379/733/1213 spread across 4 pricing tiers
  Startup 60%, small SME 25%, mid-size SME 10%, Enterprise 5%

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:30:02 +01:00
Benjamin Admin
c7236ef7e8 fix: one-pager copy changes + audit aspects
- CE software risk: "at the code level, already during development"
- "Compliance GPT" without "real-time"
- problem section, added bullet: "EU regulation does not distinguish small from large"
- "security controls" → "audit aspects" (hero + KPI tile)
- pricing: "Startup" without "/ <10"
- market: SOM footnoted with * "machinery and plant engineering only"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 15:13:26 +01:00
Benjamin Admin
307af5c901 fix: one-pager copy + even columns
Problem: "High costs for pentests and audits, which run only once a year"
Solution: +CE software risk assessment in real time, +Compliance GPT,
  "obligations" instead of "CE risk assessments", Jira removed
Columns: grid-cols-4 / grid-cols-6, evenly distributed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:52:35 +01:00
Benjamin Admin
625906f75a fix: one-pager: tile layout with equal heights
- grid-rows-2 for all 3 columns → heights exactly in sync
- revenue forecast → "company development" with an integrated headcount column
- employee tile → "target markets" (machinery, automotive, suppliers, manufacturing)
- company development: 4 columns (year, headcount, customers, ARR)
- left + right tiles match the height of the middle one

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:27:40 +01:00
Benjamin Admin
129072e0f0 fix: one-pager layout + competitor data updated
Layout: grid-cols-[1fr_1.6fr_1fr], flex-1 for equal heights
- left: employees + market (narrow, equal height)
- middle: revenue + competitors (wider, grid columns for clean alignment)
- right: pricing + customer savings (narrow, equal height)

Competitors updated with researched data:
- +Delve (🇺🇸 2024, 24 employees, $2.6M ARR, $35M raised)
- +headcount column (MA) for all
- Sprinto: $38M ARR (Latka)
- DataGuard: €20-30M ARR, €65M raised
- Proliance: €5-10M ARR
- heyData: €3-10M ARR

Go-to-market: colored bullet points per phase
Columns in revenue + competitors: grid with 1fr instead of fixed px widths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 14:05:54 +01:00
Benjamin Admin
dbc4e59e24 fix: one-pager layout: 3 stacked columns + 5th problem bullet
Layout: narrow | wide | narrow (grid-cols-[1fr_2fr_1fr])
- left: employees + market (stacked, narrow)
- middle: revenue forecast + competitors (stacked, wide),
  with clean grid columns and larger column headers
- right: pricing + customer savings (stacked, narrow)

Problem: 5th bullet "Pentests and CE certifications cost 50,000+
EUR/year, yet only test once"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:42:32 +01:00
Benjamin Admin
cf476ea986 fix: one-pager polish
- employees: 5/10/17/25/35 (instead of 5→10 etc.)
- competitors: +founding year +customer count columns
- revenue forecast: +customer count, higher figures (30→1,200 customers, 8.5M ARR)
- integration: "Jira" removed, just "ticket systems, workflows"
- compliance docs: "AGB, DSE" → "obligations"
- COMPLAI platform: "Jitsi, Matrix, full integration" removed
- problem: "risk losing control ..."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 13:22:56 +01:00
Benjamin Admin
c989af42f5 fix: Onepager — CE software risk, larger roadmap, 3 new tiles
- CE risk assessment → CE software risk assessment everywhere
- Competitors: column headers "Umsatz" (revenue) + "Invest" (funding)
- Go-to-market roadmap: larger font (text-xs items, text-sm titles)
- 3 new tiles: revenue forecast (ARR 2026-2030),
  headcount growth (5→25), pricing by company size

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:59:28 +01:00
Benjamin Admin
d3247ef090 fix: Onepager — final Problem/Solution/USP copy, founder tile removed
- USP: fuller text + "100% data sovereignty without US dependency"
- Problem: "An unsolvable decision" with 4 precise bullet points
- Solution: "Audit-ready at any time" with 5 bullet points
- Founder tile removed → 3-tile grid (savings, competitors, market)
- Competitors: slightly larger font (10px), more readable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:47:46 +01:00
Benjamin Admin
90c7f9d8ec feat: Onepager completely reworked
- Problem + Solution: bullet points instead of running text
- USP: larger heading + "100% data sovereignty"
- KPIs: removed 1M funding, added 80% time savings + 10x cheaper
- Scanner: "integration into customer processes" instead of "Jira integration"
- 8 modules: same look as slide 7, with icons + descriptions
  8th module: "Secure communication: chat + video with AI notetaker"
- Business model → customer savings (pentests 30k, CE 20k, audit 60k+)
- Competitors: added revenue (ARR) + funding amount
- Umlauts correct everywhere (ä, ö, ü)
- COMPLAI with the "AI" in color

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:31:24 +01:00
Benjamin Admin
c43d39fd7f feat: Executive Summary completely reworked
- Problem: strategic dilemma (AI vs. data sovereignty, 30,000+ companies)
- Solution: continuous compliance instead of point-in-time audits
- Roadmap: go-to-market phases 1-3 (instead of quarter tiles), incorporation Jul/Aug 2026
- 8 modules as a compact building-block strip
- Competitor tile: 6 competitors with flag + valuation
- Umlauts: ä, ö, ü instead of ae, oe, ue in all German texts
- COMPLAI instead of ComplAI, with "AI" set off in color
- USP: "on German or French cloud"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:05:36 +01:00
Benjamin Admin
8aca75118c fix: corrected figures and copy — Problem, USP, KPIs
Problem text: new wording (US AI providers, 30,000+ companies,
whether 10 or 5,000 employees, data-misuse risk)

USP: "on German or French cloud"

KPI tiles: removed 170+ original documents, 40,000→25,000+
security controls, 84→110 laws & regulations (EU+DACH only),
761K→500K+ lines of code

Consistent across: i18n (DE+EN), Executive Summary (slide+PDF),
Competition, AI Pipeline, SDK Demo, Regulatory Landscape,
presenter script, FAQ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 11:54:26 +01:00
Benjamin Admin
6bf2692faa fix: Executive Summary adjustments
- Title: "BreakPilot COMPLAI" with "AI" set off in color
- Subtitle: "Onepager" instead of "Executive Summary"
- Hero: new text with 25,000 atomic security controls,
  "our customers" instead of "machine builders", no data sovereignty in the title
- USP: "CE software risk assessment for our customers"
- PDF template updated in sync

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 11:37:25 +01:00
Benjamin Admin
2d85ef310a fix: increased font sizes everywhere on the Executive Summary
All text roughly 2 steps larger:
- Hero: text-xs → text-sm
- USP: text-[10px]/text-xs → text-xs/text-sm
- Problem/Solution: text-[10px] → text-sm
- KPI labels: text-[8px] → text-[10px], values: text-base → text-lg
- Scanner/Platform: text-xs → text-sm (titles), text-[9px] → text-xs (items)
- Roadmap: text-[10px] → text-xs
- Bottom tiles: text-[9px] → text-xs
- Founders: text-[9px]/text-[8px] → text-xs/text-[10px]
- Disclaimer: text-[7px] → text-[9px]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:23:00 +01:00
Benjamin Admin
774a0ba6db feat: disclaimer on the Executive Summary (slide + PDF)
Full disclaimer text (DE + EN) at the end of the Executive Summary:
- Slide: subtle box with 7px font, before the download button
- PDF (DIN A3): same text in #94a3b8 before the footer
Content: no investment advice, forward-looking statements,
Team Breakpilot (not yet a GmbH), confidentiality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 09:11:21 +01:00
Benjamin Admin
566a8bf84e feat: day mode fully in the Onepager design
- White background (#fff) instead of gray
- Plus Jakarta Sans font (as on the Onepager)
- Solid cards (#f8fafc, #e2e8f0 borders) instead of glass effects
- No backdrop blur in light mode
- Particles fully hidden
- Onepager color hierarchy: #1a1a2e → #334155 → #475569 → #64748b → #94a3b8
- Accent backgrounds: #eef2ff (indigo), #ecfdf5 (emerald), #fefce8 (amber)
- Sidebar/chat: white with #e2e8f0 borders
- Clean shadows instead of glow effects
- KPI glow dots hidden

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:49:50 +01:00
Benjamin Admin
3567845235 feat: Executive Summary completely reworked — Onepager + exec summary combined
Slide view (scrollable, does not fit on one screen):
- Hero text: 4 lines of platform description (from the Onepager)
- USP banner
- Problem + Solution (from the previous exec summary)
- 6 KPI tiles (170+, 40k+, 84, 10, 761K, 1M)
- Compliance Scanner features (5 points, from the Onepager)
- ComplAI platform features (5 points, from the Onepager)
- Roadmap: Q4/2026 → Q3/2029 break-even
- 4 columns: business model, target markets, founders, funding + market

PDF download (DIN A3 portrait, 297x420mm):
- Plus Jakarta Sans font
- Gradient top bar
- All sections optimized for A3
- Print-ready with print-color-adjust

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:41:57 +01:00
Benjamin Admin
c4d8da6d0d fix: day mode — sidebar, chat panel, modals, tiles readable
- NavigationFAB/ChatFAB: bg-black/* → white background in light mode
- Hover states: bg-white/* → light gray tones
- Shadows: dark shadows → light shadows
- Modal backdrops: transparent instead of dark
- Input fields, KPI cards, progress bar adjusted
- Colored accent backgrounds (red/green/amber) lightened

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:27:43 +01:00
Benjamin Admin
fa8010cf91 feat: day/night mode for the entire pitch deck
- CSS-variable-based theming (globals.css)
- A .theme-light class on the html element switches everything over
- Toggle button at top right (sun/moon icon)
- Light mode: light backgrounds, dark text, muted glass effects
- All text-white/* classes are remapped via CSS overrides
- Particle background at 8% opacity in light mode
- No text-shadow-glow in light mode

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 08:08:16 +01:00
Benjamin Admin
16de384831 fix: GuV (P&L) tab as an annual table (y2026-y2030) instead of a monthly grid
The GuV sheet has annual keys (y2026) instead of monthly keys (m1-m60).
Dedicated table with 5 year columns; year navigation hidden.
All sum rows (EBIT, result, net income) highlighted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 23:50:53 +01:00
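The yearly-vs-monthly key distinction above can be sketched with a small helper (hypothetical; it only assumes the key shapes y2026 and m1-m60 named in the commit):

```python
import re

def sheet_granularity(keys):
    """Classify a sheet's column keys: 'annual' for y2026-style keys,
    'monthly' for m1-m60-style keys, 'mixed' otherwise."""
    if keys and all(re.fullmatch(r"y\d{4}", k) for k in keys):
        return "annual"
    if keys and all(re.fullmatch(r"m\d{1,2}", k) for k in keys):
        return "monthly"
    return "mixed"

# GuV carries annual keys, the other Finanzplan sheets monthly ones:
print(sheet_granularity(["y2026", "y2027", "y2028", "y2029", "y2030"]))  # prints: annual
print(sheet_granularity([f"m{i}" for i in range(1, 61)]))                # prints: monthly
```

The UI can then pick the annual table or the monthly grid per sheet instead of special-casing the GuV tab by name.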
Benjamin Admin
a01e6cb88e feat: phases 5+6 — Finanzplan bridge + Financials slide sync
- Adapter: fp_* tables → FMResult interface (60 months)
- Compute endpoint: source=finanzplan delegates to the Finanzplan engine
- useFinancialModel hook: computeFromFinanzplan() + finanzplanResults
- FinancialsSlide: toggle "Szenario-Modell" (scenario model) vs "Finanzplan (Excel)"
- Incorporation-date fix: equity + debt capital in Aug (m8), office costs from Aug
- Startup pricing tier: <10 employees from 3,600 EUR/year, 14-day trial

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 20:15:30 +01:00
Benjamin Admin
a58cd16f01 feat: Finanzplan phases 1-4 — DB + engine + API + spreadsheet UI
Phase 1: DB schema (12 fp_* tables) + Excel import (332 rows imported)
Phase 2: compute engine (personnel, investment, revenue, materials, operating costs, liquidity, P&L)
Phase 3: API (/api/finanzplan/ — GET sheets, PUT cells, POST compute)
Phase 4: spreadsheet UI (FinanzplanSlide as an annex with tab bar, editable grid, year navigation)

Additionally:
- Incorporation date moved: Feb→Aug 2026 (DB + personnel costs)
- New pricing tier: startup/<10 employees from 3,600 EUR/year (14-day trial, credit card)
- Competition slide: pricing tiers updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 19:26:46 +01:00
Benjamin Admin
f514667ef9 feat: modular building blocks + headcount-based pricing + savings ROI
Products: 8 modules as building blocks (Code Security, CE risk, compliance docs,
Audit Manager, LLM, Academy, Jira, Full Compliance)
Pricing: by headcount (<50: 15k, 50-250: 30k, 250+: 40-50k EUR/year)
Cloud as standard (BSI DE/OVH FR), Mac Mini only for <10 employees

Business model: ROI calculation instead of hardware amortization
(customer pays 40-50k, saves 50-110k: pentests, CE, audit manager)

How it works: sign a cloud contract instead of installing hardware,
prepare for the audit instead of merely passing it

Competition: pricing tiers switched to the cloud model
FAQ: removed all 65+ references + old tier prices

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 18:10:11 +01:00
Benjamin Admin
9e712465af feat: audit deviations end-to-end in Solution + Executive Summary
After the audit: work off major/minor nonconformities automatically —
assign roles, set due dates, create tickets, request evidence,
escalate to management. No Excel, no chasing people.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:40:50 +01:00
Benjamin Admin
bf22d436fb feat: problem narrative — AI dilemma instead of fine statistics
Real SME concerns instead of an irrelevant 4.1B statistic:
1. AI dilemma: they want AI, but no Copilot/Claude in their code
2. Patriot Act: even the US players' EU servers are not safe
3. Regulatory tsunami: 5+ laws, 50k/year spot checks

Quote: "Machine builders need an AI solution that runs in Germany,
protects their code, and automates compliance."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:34:15 +01:00
Benjamin Admin
f689b892de feat: complete story rework — SME machine-building narrative
Problem: regulatory tsunami (5+ laws, personal managing-director liability),
annual spot checks (50k+ EUR/year), data sovereignty (0 German alternatives)

Solution: continuous code security instead of spot checks,
compliance on autopilot (VVT, TOMs, DSFA, deletion periods, CE),
German cloud (BSI DE / OVH FR), Jitsi, Matrix, Jira integration

ROI: customer pays 50k/year, saves 50k+ (pentests, CE, audit manager)

DB: funding 1M EUR, SOM 24M EUR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:25:40 +01:00
Benjamin Admin
2f2338c973 feat: Executive Summary reworked — core features instead of hardware
- Funding: 1M EUR (DB), use of funds: 35% sales, 20% workshops
- SOM: 7.2M → 24M EUR (DB), competitor benchmark
- Executive Summary: Mac Mini/Studio removed; instead:
  Full Compliance GPT, ISMS, CE risk assessment, DAST/SAST/SBOM,
  VVT, TOMs, DSFA, deletion periods, Jira integration
- USP: full AI compliance check + CE software + DevSecOps
- Business model copy updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 17:01:25 +01:00
Benjamin Admin
10eb0ce5f9 feat: machine building as an industry + counts 9→10 industries
- Machine building as a new core industry in the matrix (15 regulations)
- All industry counts updated (synced with breakpilot-lehrer)
- 9→10 industries consistent everywhere (i18n, KPIs, presenter, FAQ)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:56:19 +01:00
Benjamin Admin
32616504a6 feat: corrected RAG figures + industry-regulation matrix
- All figures updated: 170+ original documents, 40,000+ controls,
  84 regulations, 9 industries (instead of 57 modules / 19 regulations / 2,274 texts)
- New slide: regulatory landscape with an industry-regulation matrix
- Consistent across: Solution, Executive Summary (slide+PDF), Competition,
  AI Pipeline, SDK Demo, presenter script, FAQ

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:40:44 +01:00
Benjamin Admin
4bce3724f2 feat: Executive Summary Onepager slide with PDF download
New slide as the first content slide (after the intro) with a compact
investor overview: problem/solution, KPIs, market, team, funding.
PDF download via window.print() without additional dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 15:00:54 +01:00
Benjamin Admin
322e2d9cb3 feat(embedding): implement legal-aware chunking pipeline
Replace plain recursive chunker with legal-aware chunking that:
- Detects legal section headers (§, Art., Section, Chapter, Annex)
- Adds section context prefix to every chunk
- Splits on paragraph boundaries then sentence boundaries
- Protects DE + EN abbreviations (80+ patterns) from false splits
- Supports language detection for locale-specific processing
- Force-splits overlong sentences at word boundaries

The old plain_recursive API option is removed — all non-semantic
strategies now route through chunk_text_legal().

Includes 40 tests covering header detection, abbreviation protection,
sentence splitting, and legal chunking behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-22 09:18:23 +01:00
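The bullets above describe a mechanical pipeline, which can be sketched compactly. This is not the actual chunk_text_legal() implementation; the header regex and abbreviation list here are illustrative assumptions:

```python
import re

# Illustrative subset of the protected abbreviations (the real pipeline
# protects 80+ DE + EN patterns).
ABBREVIATIONS = {"Abs.", "Art.", "Nr.", "z.B.", "u.a.", "e.g.", "i.e."}

# Legal section headers as named in the commit: §, Art., Section, Chapter, Annex.
HEADER_RE = re.compile(r"^(§\s*\d+\w*|Art\.\s*\d+\w*|Section\s+\d+|Chapter\s+\d+|Annex\s+\w+)")

def split_sentences(text):
    """Split on sentence-final periods, skipping protected abbreviations."""
    sentences, buf = [], ""
    for token in text.split():
        buf = f"{buf} {token}".strip()
        if token.endswith(".") and token not in ABBREVIATIONS:
            sentences.append(buf)
            buf = ""
    if buf:
        sentences.append(buf)
    return sentences

def chunk_legal(text):
    """Split on paragraph boundaries, then sentence boundaries, and
    prefix every chunk with the section header it falls under."""
    chunks, header = [], ""
    for para in text.split("\n\n"):
        match = HEADER_RE.match(para.strip())
        if match:
            header = match.group(1)
        for sentence in split_sentences(para):
            chunks.append(f"[{header}] {sentence}" if header else sentence)
    return chunks
```

Every chunk under a "§ 5 ..." paragraph then carries a "[§ 5]" prefix, which is what lets retrieval map a hit back to its section.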
Benjamin Admin
c1a8b9d936 feat(pitch-deck): update Engineering + AI Pipeline slides with current data
Engineering slide:
- Woodpecker CI → Gitea Actions + Coolify
- Stats: 481K LOC, 10 containers, 48+ modules, 14 Dockerfiles
- Infrastructure: Hetzner + SysEleven (BSI) + OVH, no US providers
- Service architecture: compliance-only (Frontend, Backend, Infra)

AI Pipeline slide:
- 38+ indexed regulations, 6,259 extracted controls, 325+ obligations
- 6 Qdrant collections, 2,274+ chunks
- UCCA policy engine (45 rules, E0-E3 escalation)
- LLM: 120B on OVH + 1000B on SysEleven (BSI), via LiteLLM
- QA: PDF-QA pipeline, Gitea Actions CI, Coolify deploy

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 23:08:34 +01:00
Benjamin Admin
c374600833 fix(pitch-deck): set proper ownership on public/ dir for standalone mode
Screenshots were owned by root but Next.js standalone runs as nextjs user,
causing image optimization to fail with 'not a valid image' error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 20:57:51 +01:00
Benjamin Admin
87b00a94c0 feat(pitch): add SDK demo slide with screenshot gallery + inline preview
- New annex slide 'annex-sdk-demo' with auto-scrolling screenshot gallery
  (22 real screenshots from Müller Maschinenbau demo project)
- Browser chrome mockup, fullscreen view, thumbnail strip navigation
- Inline SDK dashboard preview on Product slide
- Seed script for creating demo data + taking Playwright screenshots
- Presenter script for SDK demo narration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 20:51:17 +01:00
Benjamin Admin
978f0297eb feat(pitch): rewrite pitch content — Cloud SDK as core product
Restructure all pitch messaging: Cloud-based SDK platform with 65+ modules
is the CORE product. Mac Mini/Studio repositioned as side product for small
firms. Updated presenter scripts (20 slides), FAQ (35 entries), and chat
system prompt with new Kernbotschaften covering company compliance, Code/CE
scanning, EU AI hosting, Jira integration, and additional features.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 18:10:33 +01:00
Benjamin Admin
959986356b feat(chat): TTS for chat responses + fix team FAQ with real founder names
- Chat answers are now read aloud via Edge TTS (auto, with mute toggle)
- FAQ team answer: vague text → Benjamin Boenisch (CEO) + Sharang (CTO)
- System prompt: explicit instruction to always cite team names from DB
- Speaker icon in chat header shows speaking state, click to mute/unmute
- Audio stops on new message, chat close, or mute

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 17:18:30 +01:00
Benjamin Admin
f126b40574 feat(presenter): continuous speech — no gaps between paragraphs/slides
- Concatenate all paragraphs + transition hint into one TTS call per slide
  → natural prosody, zero gaps within a slide
- Pre-fetch next slide's audio during current playback → seamless transitions
- Advance slide during transition phrase ("Let us look at...")
- Pause/resume without destroying audio → instant continue
- Subtitle display synced to playback position via timeupdate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 17:02:13 +01:00
Benjamin Admin
fa4027d027 fix(chat): extract SLIDE_ORDER to shared module for server-side import
useSlideNavigation.ts has 'use client' — server API routes can't import
from it. Move SLIDE_ORDER to lib/slide-order.ts (no 'use client') and
re-export from useSlideNavigation for backwards compat.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 15:02:06 +01:00
Benjamin Admin
9da9b323fc fix(presenter): fix resume after chat interruption + sync stateRef
stateRef was still 'resuming' when advanceRef.current() ran,
causing it to bail out. Now sync stateRef immediately before advance.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 14:04:39 +01:00
Benjamin Admin
eb263ce7a4 fix(presenter): replace crypto.subtle with simple hash for HTTP compatibility
crypto.subtle requires HTTPS context. Use simple string hash instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:42:53 +01:00
Benjamin Admin
aece5f7414 fix(presenter): unlock audio playback via AudioContext on user gesture
Browser autoplay policy blocks audio.play() outside user gesture.
Use AudioContext to unlock audio immediately in click handler.
Add console logging for TTS debugging.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:38:16 +01:00
Benjamin Admin
ddabda6f05 feat(presenter): replace Web Speech API with Piper TTS for high-quality voice
- New API route /api/presenter/tts proxies to compliance-tts-service
- usePresenterMode now uses Audio element with Piper-generated MP3
- Client-side audio caching (text hash → blob URL) avoids re-synthesis
- Graceful fallback to word-count timer if TTS service unavailable
- Add TTS_SERVICE_URL env var to pitch-deck Docker config

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:23:37 +01:00
Benjamin Admin
bcbceba31c feat(presenter): add browser TTS (Web Speech API) + fix German umlauts
- Integrate Web Speech API into usePresenterMode for text-to-speech
- Speech-driven paragraph advancement (falls back to timer if TTS unavailable)
- TTS toggle button (Volume2/VolumeX) in PresenterOverlay
- Chrome keepAlive workaround for long speeches
- Voice selection: prefers premium/neural voices, falls back to any matching lang
- Fix all German umlauts across presenter-script, presenter-faq, i18n, route.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 12:11:12 +01:00
Benjamin Admin
3a2567b44d feat(pitch-deck): add AI Presenter mode with LiteLLM migration and FAQ system
- Migrate chat API from Ollama to LiteLLM (OpenAI-compatible SSE)
- Add 15-min presenter storyline with bilingual scripts for all 20 slides
- Add FAQ system (30 entries) with keyword matching for instant answers
- Add IntroPresenterSlide with avatar placeholder and start button
- Add PresenterOverlay (progress bar, subtitle text, play/pause/stop)
- Add AvatarPlaceholder with pulse animation during speaking
- Add usePresenterMode hook (state machine: idle→presenting→paused→answering→resuming)
- Add 'P' keyboard shortcut to toggle presenter mode
- Support [GOTO:slide-id] markers in chat responses
- Dynamic slide count (was hardcoded 13, now from SLIDE_ORDER)
- TTS stub prepared for future Piper integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 11:45:55 +01:00
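The FAQ keyword matching mentioned above could look roughly like this (a hypothetical sketch; the entries are invented, the real system has 30 of them):

```python
def match_faq(question, faq):
    """Score each FAQ entry by how many of its keywords occur in the
    question; return the best-scoring answer, or None when nothing matches."""
    q = question.lower()
    best_score, best_answer = 0, None
    for entry in faq:
        score = sum(1 for kw in entry["keywords"] if kw in q)
        if score > best_score:
            best_score, best_answer = score, entry["answer"]
    return best_answer

FAQ = [
    {"keywords": ["team", "founder"], "answer": "See the team slide."},
    {"keywords": ["price", "pricing", "cost"], "answer": "See the pricing tiers."},
]

print(match_faq("What does the pricing cost?", FAQ))  # prints: See the pricing tiers.
print(match_faq("Do you use blockchain?", FAQ))       # prints: None
```

A no-match result would then fall through to the LLM chat path, keeping keyword hits instant.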
Benjamin Admin
df0a9d6cf0 feat(pitch-deck): update TAM/SAM/SOM with bottom-up competitor revenue validation
MarketSlide:
- TAM sources updated: bottom-up from Top-10 competitor revenues (>$1.13B known)
- SAM increased €850M → €950M, growth 19.5% → 24% (NIS2/CRA/AI Act expansion)
- SAM source: bottom-up DACH revenues (DataGuard €52M, heyData €15M, etc.)
- SOM growth increased to 30%, benchmark against Proliance/heyData
- TAM growth updated to 18.5% (compliance automation wave 30-45% vs GRC avg 13.8%)

ProblemSlide:
- Added 3rd source to DSGVO card: market validation with real competitor revenues
- Highlights: Vanta $220M/$4.15B, Top-10 >$1.1B, 80% still manual

DB (pitch_market):
- SAM value_eur: 850M → 950M
- Growth rates: TAM 16.2→18.5, SAM 19.5→24.0, SOM 25→30
- Source strings updated to reference bottom-up methodology

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 08:33:18 +01:00
Benjamin Admin
38363b2837 feat(pitch-deck): rewrite CompetitionSlide with 6 detailed competitor profiles
- Add Vanta, Drata, Sprinto (international) alongside Proliance, DataGuard, heyData (DACH)
- Each card: HQ city/country, offices, employees, revenue, customers + countries, funding, investors, AI badge
- Two tabs: Overview & Comparison / Feature Matrix (Detail)
- 44-feature comparison table with collapsible sections: Top 5 Unterschiede (top 5 differences), Alle Features (all features), USP
- Efficiency ratios table (revenue/employee, customers/employee)
- DACH landscape note (Secjur, Usercentrics, Caralegal, 2B Advice, OneTrust)
- Research-backed data: Vanta $220M/$4.15B, Drata $100M/$2B, Sprinto $38M, DataGuard €52M, heyData €15M
- Dynamic feature/USP counts in subtitle
- Bilingual (de/en) with i18n subtitle update

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 08:26:20 +01:00
Benjamin Admin
96f94475f6 fix: downgrade to PaddleOCR 2.x — 3.x uses too much RAM on CPU
PaddlePaddle 3.x + PP-OCRv5 requires >6GB RAM and has oneDNN
compatibility issues on CPU. PaddleOCR 2.x with PP-OCRv4 works
reliably with ~2-3GB RAM and has no MKLDNN issues.

- Pin paddlepaddle<3.0.0 and paddleocr<3.0.0
- Simplify main.py — single init strategy, direct 2.x result format
- Re-enable warmup (fits in memory with 2.x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 19:13:33 +01:00
Benjamin Admin
3fd3336f6c fix: force-disable oneDNN via paddle.set_flags and enable_mkldnn=False
Previous FLAGS_use_mkldnn env var was ignored by PaddlePaddle 3.x.
Now using paddle.set_flags() API and PaddleOCR enable_mkldnn param.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 19:01:46 +01:00
Benjamin Admin
eaba087d11 fix: disable oneDNN/MKLDNN and support PaddleOCR 3.x result format
- Set FLAGS_use_mkldnn=0 before paddle import to avoid
  ConvertPirAttribute2RuntimeAttribute error
- Support both PaddleOCR 2.x (list) and 3.x (dict) result formats
- Use use_textline_orientation (3.x) instead of use_angle_cls
- Remove latin lang fallback (not supported in 3.x)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:52:31 +01:00
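A normalizer bridging the two result shapes might look roughly like this (a sketch assuming the 2.x nested [box, (text, score)] list format and a 3.x dict with rec_texts/rec_scores, as the commit implies):

```python
def normalize_ocr_result(result):
    """Flatten PaddleOCR output into (text, score) pairs, accepting both
    the 2.x nested-list format and the 3.x dict format."""
    lines = []
    for page in result:
        if isinstance(page, dict):   # 3.x: {'rec_texts': [...], 'rec_scores': [...]}
            lines.extend(zip(page["rec_texts"], page["rec_scores"]))
        elif page:                   # 2.x: [[box, (text, score)], ...]; page may be None
            for _box, (text, score) in page:
                lines.append((text, score))
    return lines

# Both shapes reduce to the same pairs:
box = [[0, 0], [100, 0], [100, 20], [0, 20]]
v2 = [[[box, ("Rechnung", 0.98)]]]
v3 = [{"rec_texts": ["Rechnung"], "rec_scores": [0.98]}]
print(normalize_ocr_result(v2) == normalize_ocr_result(v3))  # prints: True
```

Keeping the version handling inside one normalizer means the rest of the endpoint stays version-agnostic.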
Benjamin Admin
ed2cc234b8 fix: add error handling and logging to OCR endpoint
Return detailed error message instead of generic 500, and handle
empty OCR results gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:37:32 +01:00
Benjamin Admin
ffd3fd1d7c fix: remove warmup OCR call — causes OOM on 6G container
The warmup OCR call during startup pushes memory over 6G and causes
OOM kills + restart loops. First real OCR request will be slow
(JIT compilation) but container stays stable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:24:55 +01:00
Benjamin Admin
23694b6555 fix: increase paddleocr memory limit 4G → 6G
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:08:33 +01:00
Benjamin Admin
8979aa8e43 fix: add warmup OCR call to avoid timeout on first request
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 16:56:08 +01:00
Benjamin Admin
c433bc021e docs: add post-push deploy monitoring to CLAUDE.md
After every push to gitea, Claude automatically polls health endpoints
and notifies the user when deployment is ready for testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:45:09 +01:00
Benjamin Admin
f4ed1eb10c feat: add paddleocr-service to Coolify compose
Add PaddleOCR PP-OCRv5 service with 4G memory limit, model volume,
and health check (5min start period for model loading). Domain routing
(ocr.breakpilot.com) to be configured in Coolify UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:43:11 +01:00
Benjamin Admin
9c8663a0f1 Merge gitea/main: accept Coolify compose config
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:27:29 +01:00
Benjamin Admin
d1632fca17 docs: update all docs to reflect Coolify deployment model
Replace Hetzner references with Coolify. Deployment is now:
- Core + Compliance: Push gitea → Coolify auto-deploys
- Lehrer: stays local on Mac Mini

Updated: CLAUDE.md, MkDocs CI/CD pipeline, MkDocs index, environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 12:18:25 +01:00
fcf8aa8652 fix: migrate deployment from Hetzner to Coolify (#1)
## Summary
- Add Coolify deployment configuration (docker-compose, healthchecks, network setup)
- Replace deploy-hetzner CI job with Coolify webhook deploy
- Externalize postgres, qdrant, S3 for Coolify environment
- Remove services not needed for SDK deployment (voice, jitsi, synapse)

## All changes since branch creation
- Coolify docker-compose with healthchecks for all services
- CI pipeline: deploy-hetzner → deploy-coolify (simple webhook curl)
- QDRANT_API_KEY support in rag-service
- Alpine-compatible Dockerfile fixes

Co-authored-by: Sharang Parnerkar <parnerkarsharang@gmail.com>
Reviewed-on: #1
2026-03-13 10:45:18 +00:00
Benjamin Admin
65177d3ff7 fix: robust PaddleOCR init with multiple fallback strategies
PaddleOCR 3.x removed show_log param and lang='latin'. Try multiple
init strategies in order until one succeeds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 11:09:33 +01:00
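The "try strategies in order until one succeeds" pattern is easy to isolate; a sketch with a dummy constructor standing in for PaddleOCR (parameter names taken from the commit, behavior simulated):

```python
def init_with_fallbacks(ctor, strategies):
    """Try each kwargs dict in order; return the first instance that
    constructs without raising, plus the strategy that worked."""
    last_err = None
    for kwargs in strategies:
        try:
            return ctor(**kwargs), kwargs
        except TypeError as err:   # e.g. params removed in 3.x
            last_err = err
    raise RuntimeError(f"all init strategies failed: {last_err}")

# Dummy stand-in that rejects the removed 3.x options the commit mentions.
def fake_paddle_ocr(**kwargs):
    if "show_log" in kwargs or kwargs.get("lang") == "latin":
        raise TypeError("unexpected argument")
    return {"ok": True, **kwargs}

strategies = [
    {"lang": "latin", "show_log": False},  # 2.x-style init: fails on 3.x
    {"lang": "german"},                    # plain 3.x init: succeeds
]
inst, used = init_with_fallbacks(fake_paddle_ocr, strategies)
print(used)  # prints: {'lang': 'german'}
```

Ordering the strategies from richest to plainest keeps the best configuration whenever the installed version supports it.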
Benjamin Admin
559d6a351c fix: resolve stash conflict
2026-03-13 10:59:30 +01:00
Benjamin Admin
8fd11998e4 merge: resolve docker-compose.coolify.yml conflict (accept remote) 2026-03-13 10:56:36 +01:00
Benjamin Admin
4ce649aa71 fix: upgrade PaddleOCR to 3.x for PP-OCRv5 and stability
Old paddlepaddle==2.6.2 + paddleocr==2.8.1 caused hangs on first OCR
request. Upgrading to paddlepaddle>=3.0.0 + paddleocr>=2.9.0 enables
native PP-OCRv5 support and fixes stability issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:53:18 +01:00
Sharang Parnerkar
cf2cabd098 Remove services not needed by SDK from Coolify deployment
Remove backend-core, billing-service, night-scheduler, and admin-core
as they are not used by any compliance/SDK service. Update
health-aggregator CHECK_SERVICES to reference consent-service instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
8ee02bd2e4 Add healthchecks to backend-core, consent-service, billing-service, admin-core
Coolify/Traefik requires healthchecks to route traffic to containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
d9687725e5 Remove Traefik labels from coolify compose — Coolify handles routing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
6c3911ca47 Fix admin-core build: ensure public directory exists before build
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
30807d1ce1 Fix backend-core TARGETARCH: auto-detect instead of hardcoded arm64
The Dockerfile hardcoded TARGETARCH=arm64 for Mac Mini. Coolify server
is x86_64, causing exit code 126 (wrong binary arch). Now uses Docker
BuildKit's auto-detected TARGETARCH with dpkg fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
82c28a2b6e Add QDRANT_API_KEY support to rag-service
- Add QDRANT_API_KEY to config.py (empty string = no auth)
- Pass api_key to QdrantClient constructor (None when empty)
- Add QDRANT_API_KEY to coolify compose and env example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
86624d72dd Sync coolify compose with main: remove voice-service, update rag/embedding
- Remove voice-service (removed in main branch)
- Remove voice_session_data volume
- Add OLLAMA_URL and OLLAMA_EMBED_MODEL to rag-service
- Update embedding-service default model to BAAI/bge-m3, memory 4G→8G
- Update health-aggregator CHECK_SERVICES (remove voice-service)
- Update .env.coolify.example accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
9218664400 fix: use Alpine-compatible addgroup/adduser flags in Dockerfiles
Replace --system/--gid/--uid (Debian syntax) with -S/-g/-u (BusyBox/Alpine).
Coolify ARG injection causes exit code 255 with Debian-style flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
8fa5d9061a refactor(coolify): externalize postgres, qdrant, S3; remove jitsi/synapse
- Remove PostgreSQL, Qdrant, MinIO services (managed separately in Coolify)
- Remove Jitsi stack (web, xmpp, jicofo, jvb) and Synapse/synapse-db
- Add POSTGRES_HOST, QDRANT_URL, S3_ENDPOINT/S3_ACCESS_KEY/S3_SECRET_KEY env vars
- Remove Traefik labels from internal-only services
- Health aggregator no longer checks external services
- Core now has 10 services: valkey + 9 application services

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Sharang Parnerkar
84002f5719 feat: add Coolify deployment configuration
Add docker-compose.coolify.yml (17 services), .env.coolify.example,
and Gitea Action workflow for Coolify API deployment. Removes nginx,
vault, gitea, woodpecker, mailpit, and dev-only services. Adds Traefik
labels for *.breakpilot.ai domain routing with Let's Encrypt SSL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 10:16:59 +01:00
Benjamin Admin
8b87b90cbb fix(qdrant): Increase ulimits for RocksDB (Too many open files)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 31s
CI / deploy-hetzner (push) Successful in 40s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 22:31:16 +01:00
Benjamin Admin
be45adb975 fix(rag): Auto-create Qdrant collection on first index
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 36s
CI / test-bqas (push) Successful in 31s
CI / deploy-hetzner (push) Successful in 38s
Collections may not exist if init_collections() failed at startup
(e.g. Qdrant not ready). Now index_documents() ensures the
collection exists before upserting.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 21:02:05 +01:00
Benjamin Admin
7c932c441f feat(rag): Add bp_compliance_gesetze + bp_compliance_ce collections
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 35s
CI / test-python-voice (push) Successful in 50s
CI / test-bqas (push) Successful in 33s
CI / deploy-hetzner (push) Successful in 39s
Required for Verbraucherschutz + EU law ingestion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:41:26 +01:00
Benjamin Admin
1eb402b3da fix(ci): Remove Ollama host port binding — port 11434 already in use
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 31s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 31s
CI / deploy-hetzner (push) Successful in 1m18s
Host already has Ollama running (LibreChat). Our container only needs
internal docker network access via container name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:04:32 +01:00
Benjamin Admin
963e824328 fix(ci): Use external network + pre-create breakpilot-network
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 30s
CI / deploy-hetzner (push) Failing after 15s
Network already exists from compliance project — use external: true
and pre-create with docker network create before docker compose up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:01:17 +01:00
Benjamin Admin
c0782e0039 fix(ci): Fix backend-core TARGETARCH for amd64 + set -e in deploy
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 33s
CI / test-python-voice (push) Successful in 33s
CI / test-bqas (push) Successful in 33s
CI / deploy-hetzner (push) Failing after 1m17s
- backend-core Dockerfile defaults TARGETARCH=arm64, override with build arg
- Add set -e in helper container to fail fast on build errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:51:19 +01:00
Benjamin Admin
44d66e2d6c feat(ci): Add Hetzner deployment for Core services
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 32s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 34s
CI / deploy-hetzner (push) Successful in 3m29s
- docker-compose.hetzner.yml: Override for x86_64 (platform, ports,
  Ollama container for CPU embeddings, mailpit dummy, disabled services)
- CI: deploy-hetzner job using helper-container pattern
- Services: postgres, valkey, qdrant, ollama, backend-core, consent-service,
  rag-service, embedding-service, health-aggregator

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 19:42:41 +01:00
Benjamin Admin
f9b475db8f fix: Ensure public/ dir exists in Docker build for levis-holzbau
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 38s
CI / test-python-voice (push) Successful in 35s
CI / test-bqas (push) Successful in 38s
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 10:06:54 +01:00
Benjamin Admin
0770ff499b feat: Add LEVIS Holzbau — Kinder-Holzwerk-Website (Port 3013)
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-consent (push) Successful in 39s
CI / test-python-voice (push) Successful in 37s
CI / test-bqas (push) Successful in 37s
New static website for children (ages 6-12) with 8 wood projects,
SVG illustrations, safety notes, and child-friendly design.
Next.js 15 + Tailwind + Framer Motion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 10:03:21 +01:00
Benjamin Admin
32aade553d Switch MinIO from local to Hetzner Object Storage
Migrate rag-service S3 config from local MinIO (minio:9000) to
Hetzner Object Storage (nbg1.your-objectstorage.com) with HTTPS.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 14:07:26 +01:00
292 changed files with 53778 additions and 1706 deletions

227
.claude/AGENTS.go.md Normal file
View File

@@ -0,0 +1,227 @@
# AGENTS.go.md — Go Agent Rules
Applies to: `ai-compliance-sdk/` (Go/Gin service)
---
## NON-NEGOTIABLE: Pre-Push Checklist
**BEFORE every `git push`, run ALL of the following from the module root. A single failure blocks the push.**
```bash
# 1. Format (gofmt is non-negotiable — unformatted code fails CI)
gofmt -l . | grep -q . && echo "FORMATTING ERRORS — run: gofmt -w ." && exit 1 || true
# 2. Vet (catches suspicious code that compiles but is likely wrong)
go vet ./...
# 3. Lint (golangci-lint aggregates 50+ linters — the de-facto standard)
golangci-lint run --timeout=5m ./...
# 4. Tests with race detector
go test -race -count=1 ./...
# 5. Build verification (catches import errors, missing implementations)
go build ./...
```
**One-liner pre-push gate:**
```bash
gofmt -l . | grep -q . && exit 1; go vet ./... && golangci-lint run --timeout=5m && go test -race -count=1 ./... && go build ./...
```
### Why each check matters
| Check | Catches | Time |
|-------|---------|------|
| `gofmt` | Formatting violations (CI rejects unformatted code) | <1s |
| `go vet` | Printf format mismatches, unreachable code, shadowed vars | <5s |
| `golangci-lint` | 50+ static analysis checks (errcheck, staticcheck, etc.) | 10-30s |
| `go test -race` | Race conditions (invisible without this flag) | 10-60s |
| `go build` | Import errors, interface mismatches | <5s |
---
## golangci-lint Configuration
Config lives in `.golangci.yml` at the repo root. Minimum required linters:
```yaml
linters:
  enable:
    - errcheck      # unchecked errors are bugs
    - gosimple      # code simplification
    - govet         # go vet findings
    - ineffassign   # useless assignments
    - staticcheck   # advanced static analysis (SA*, S*, QF*)
    - unused        # unused code
    - gofmt         # formatting
    - goimports     # import organization
    - gocritic      # opinionated style checks
    - noctx         # HTTP requests without context
    - bodyclose     # unclosed HTTP response bodies
    - exhaustive    # exhaustive switch on enums
    - wrapcheck     # errors from external packages must be wrapped

linters-settings:
  errcheck:
    check-blank: true  # blank identifier for errors is a bug
  govet:
    enable-all: true

issues:
  max-issues-per-linter: 0
  max-same-issues: 0
```
**Never suppress with `//nolint:` without a comment explaining why it's safe.**
---
## Code Structure (Hexagonal Architecture)
```
ai-compliance-sdk/
├── cmd/
│   └── server/main.go    # thin: parse flags, wire deps, call app.Run()
├── internal/
│   ├── app/              # dependency wiring
│   ├── domain/           # pure business logic, no framework deps
│   ├── ports/            # interfaces (repositories, external services)
│   ├── adapters/
│   │   ├── http/         # Gin handlers (≤30 LOC per handler)
│   │   ├── postgres/     # DB adapters implementing ports
│   │   └── external/     # third-party API clients
│   └── services/         # orchestration between domain + ports
└── pkg/                  # exported, reusable packages
```
**Handler constraint — max 30 lines per handler:**
```go
func (h *RiskHandler) GetRisk(c *gin.Context) {
	id, err := uuid.Parse(c.Param("id"))
	if err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": "invalid id"})
		return
	}
	risk, err := h.service.Get(c.Request.Context(), id)
	if err != nil {
		h.handleError(c, err)
		return
	}
	c.JSON(http.StatusOK, risk)
}
```
---
## Error Handling
```go
// REQUIRED: wrap errors with context
if err != nil {
	return fmt.Errorf("get risk %s: %w", id, err)
}

// REQUIRED: define sentinel errors in domain package
var ErrNotFound = errors.New("not found")
var ErrUnauthorized = errors.New("unauthorized")

// REQUIRED: check errors — never use _ for error returns
result, err := service.Do(ctx, input)
if err != nil {
	// handle it
}
```
**`errcheck` linter enforces this — zero tolerance for unchecked errors.**
---
## Testing Requirements
```
internal/
├── domain/
│   ├── risk.go
│   └── risk_test.go         # unit: pure functions, no I/O
├── adapters/
│   ├── http/
│   │   ├── handler.go
│   │   └── handler_test.go  # httptest-based, mock service
│   └── postgres/
│       ├── repo.go
│       └── repo_test.go     # integration: testcontainers or real DB
```
**Test naming convention:**
```go
func TestRiskService_Get_ReturnsRisk(t *testing.T) {}
func TestRiskService_Get_NotFound_ReturnsError(t *testing.T) {}
func TestRiskService_Get_DBError_WrapsError(t *testing.T) {}
```
**Table-driven tests are mandatory for functions with multiple cases:**
```go
func TestValidateInput(t *testing.T) {
	tests := []struct {
		name    string
		input   string
		wantErr bool
	}{
		{"valid", "ok", false},
		{"empty", "", true},
		{"too long", strings.Repeat("x", 300), true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			err := validateInput(tt.input)
			if (err != nil) != tt.wantErr {
				t.Errorf("got err=%v, wantErr=%v", err, tt.wantErr)
			}
		})
	}
}
```
```bash
# Pre-push: unit tests only (fast)
go test -race -count=1 -run "^TestUnit" ./...
# CI: all tests
go test -race -count=1 -coverprofile=coverage.out ./...
go tool cover -func=coverage.out | grep total
```
---
## Context Propagation
Every function that does I/O (DB, HTTP, file) **must** accept and pass `context.Context` as the first argument:
```go
// REQUIRED
func (r *RiskRepo) Get(ctx context.Context, id uuid.UUID) (*Risk, error) {
return r.db.QueryRowContext(ctx, query, id).Scan(...)
}
// FORBIDDEN — no context
func (r *RiskRepo) Get(id uuid.UUID) (*Risk, error) { ... }
```
`noctx` linter enforces HTTP client context. Manual review required for DB calls.
---
## Common Pitfalls That Break CI
| Pitfall | Prevention |
|---------|------------|
| Unformatted code | `gofmt -w .` before commit |
| Unchecked error return from `rows.Close()` / `resp.Body.Close()` | `errcheck` + `bodyclose` linters |
| Goroutine leak (goroutine started but never stopped) | code review + `goleak` in tests (the race detector does not catch leaks) |
| Shadowed `err` variable in nested scope | govet `shadow` check |
| HTTP response body not closed | `bodyclose` linter |
| `interface{}` instead of `any` (Go 1.18+) | `gocritic` |
| Missing context on DB/HTTP calls | `noctx` linter |
| Returning concrete type from constructor instead of interface | breaks testability |
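The shadowed-`err` pitfall from the table is subtle enough to deserve a sketch. A minimal, hypothetical example (the `lookup` and `shadowedErr` names are illustrative, not from this codebase):

```go
package main

import (
	"errors"
	"fmt"
)

// lookup is a stand-in for any call that can fail.
func lookup(ok bool) (string, error) {
	if !ok {
		return "", errors.New("not found")
	}
	return "value", nil
}

// shadowedErr shows the pitfall: `:=` inside the if statement declares a
// NEW err that shadows the outer one, so the failure never escapes.
func shadowedErr() error {
	var err error
	if v, err := lookup(false); err != nil {
		_ = v // only the inner err sees the failure
	}
	return err // outer err is still nil: the caller thinks everything worked
}

func main() {
	fmt.Println(shadowedErr()) // <nil>, despite lookup failing
}
```

golangci-lint's govet `shadow` check flags exactly this pattern.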

157
.claude/AGENTS.python.md Normal file
View File

@@ -0,0 +1,157 @@
# AGENTS.python.md — Python Agent Rules
Applies to: `backend-compliance/`, `ai-compliance-sdk/` (Python path), `compliance-tts-service/`, `document-crawler/`, `dsms-gateway/` (Python services)
---
## NON-NEGOTIABLE: Pre-Push Checklist
**BEFORE every `git push`, run ALL of the following from the service directory. A single failure blocks the push.**
```bash
# 1. Fast lint (Ruff — catches syntax errors, unused imports, style violations)
ruff check .
# 2. Auto-fix safe issues, then re-check
ruff check --fix . && ruff check .
# 3. Type checking (mypy strict on new modules, standard on legacy)
mypy . --ignore-missing-imports --no-error-summary
# 4. Unit tests only (fast, no external deps)
pytest tests/unit/ -x -q --no-header
# 5. Verify the service starts (catches import errors, missing env vars with defaults)
python -c "import app" 2>/dev/null || python -c "import main" 2>/dev/null || true
```
**One-liner pre-push gate (run from service root):**
```bash
ruff check . && mypy . --ignore-missing-imports --no-error-summary && pytest tests/ -x -q --no-header
```
### Why each check matters
| Check | Catches | Time |
|-------|---------|------|
| `ruff check` | Syntax errors, unused imports, undefined names | <2s |
| `mypy` | Type mismatches, wrong argument types | 5-15s |
| `pytest -x` | Logic errors, regressions | 10-60s |
| import check | Missing packages, circular imports | <1s |
---
## Code Style (Ruff)
Config lives in `pyproject.toml`. Do **not** add per-file `# noqa` suppressions without a comment explaining why.
```toml
[tool.ruff]
line-length = 100
target-version = "py311"
[tool.ruff.lint]
select = ["E", "F", "W", "I", "N", "UP", "B", "C4", "SIM", "TCH"]
ignore = ["E501"] # line length handled by formatter
[tool.ruff.lint.per-file-ignores]
"tests/*" = ["S101"] # assert is fine in tests
```
**Blocked patterns:**
- `from module import *` — always name imports explicitly
- Bare `except:` — use `except Exception as e:` at minimum
- `print()` in production code — use `logger`
- Mutable default arguments: `def f(x=[])` → `def f(x=None)`
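A quick illustration of the mutable-default pitfall: the default list is created once, when the function is defined, and shared across every call.

```python
def bad(x=[]):  # the default list is created once, at definition time
    x.append(1)
    return x

def good(x=None):  # a fresh list per call
    if x is None:
        x = []
    x.append(1)
    return x

print(bad(), bad())    # both calls return the SAME list: [1, 1] [1, 1]
print(good(), good())  # [1] [1]
```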
---
## Type Annotations
All new functions **must** have complete type annotations. Use `from __future__ import annotations` for forward references.
```python
# Required
async def get_tenant(tenant_id: str, db: AsyncSession) -> TenantModel | None:
    ...

# Required for complex types
from typing import Sequence

def list_risks(filters: dict[str, str]) -> Sequence[RiskModel]:
    ...
```
**Mypy rules:**
- `--disallow-untyped-defs` on new files
- `--strict` on new modules (not legacy)
- Never use `type: ignore` without a comment
---
## FastAPI-Specific Rules
```python
# Handlers stay thin — delegate to service layer
@router.get("/risks/{risk_id}", response_model=RiskResponse)
async def get_risk(risk_id: UUID, service: RiskService = Depends(get_risk_service)):
    return await service.get(risk_id)  # ≤5 lines per handler

# Always use response_model — never return raw dicts from endpoints
# Always validate input with Pydantic — no manual dict parsing
# Use HTTPException with specific status codes, never bare 500
```
---
## Testing Requirements
```
tests/
├── unit/ # Pure logic tests, no DB/HTTP (run on every push)
├── integration/ # Requires running services (run in CI only)
└── contracts/ # OpenAPI snapshot tests (run on API changes)
```
**Unit test requirements:**
- Every new function → at least one happy-path test
- Every bug fix → regression test that would have caught it
- Mock all I/O: DB calls, HTTP calls, filesystem reads
```bash
# Run unit tests only (fast, for pre-push)
pytest tests/unit/ -x -q
# Run with coverage (for CI)
pytest tests/ --cov=. --cov-report=term-missing --cov-fail-under=70
```
---
## Dependency Management
```bash
# Check new package license before adding
pip show <package> | grep -E "License|Home-page"
# After adding to requirements.txt — verify no GPL/AGPL
pip-licenses --fail-on="GPL;AGPL" 2>/dev/null || echo "Check licenses manually"
```
**Never add:**
- GPL/AGPL licensed packages
- Packages with known CVEs (`pip audit`)
- Packages that only exist for dev (`pytest`, `ruff`) to production requirements
---
## Common Pitfalls That Break CI
| Pitfall | Prevention |
|---------|------------|
| `const x = ...` inside dict literal (wrong language!) | Run ruff before push |
| Pydantic v1 syntax in v2 project | Use `model_config`, not `class Config` |
| Sync function called inside async without `run_in_executor` | mypy + async linter |
| Missing `await` on coroutine | mypy catches this |
| `datetime.utcnow()` (deprecated) | Use `datetime.now(timezone.utc)` |
| Bare `except:` swallowing errors silently | ruff B001/E722 catches this |
| Unused imports left in committed code | ruff F401 catches this |
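As a concrete example of the `datetime.utcnow()` row: the deprecated call returns a naive timestamp, while `datetime.now(timezone.utc)` returns an aware one, and mixing the two fails at comparison time.

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # deprecated: tzinfo is None
aware = datetime.now(timezone.utc)  # preferred: carries timezone.utc

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

try:
    naive < aware
except TypeError as exc:
    print(f"TypeError: {exc}")  # naive/aware datetimes cannot be compared
```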

186
.claude/AGENTS.typescript.md Normal file
View File

@@ -0,0 +1,186 @@
# AGENTS.typescript.md — TypeScript/Next.js Agent Rules
Applies to: `pitch-deck/`, `admin-v2/` (Next.js apps in this repo)
---
## NON-NEGOTIABLE: Pre-Push Checklist
**BEFORE every `git push`, run ALL of the following from the Next.js app directory. A single failure blocks the push.**
```bash
# 1. Type check (catches the class of bug that broke ChatFAB.tsx — const inside object)
npx tsc --noEmit
# 2. Lint (ESLint with TypeScript-aware rules)
npm run lint
# 3. Production build (THE most important check — passes lint/types but still fails build)
npm run build
```
**One-liner pre-push gate:**
```bash
npx tsc --noEmit && npm run lint && npm run build
```
> **Why `npm run build` is mandatory:** Next.js performs additional checks during build (server component boundaries, missing env vars referenced in code, RSC/client component violations) that `tsc` and ESLint alone do not catch. The ChatFAB syntax error (`const` inside object literal) is exactly the kind of error caught only by build.
### Why each check matters
| Check | Catches | Time |
|-------|---------|------|
| `tsc --noEmit` | Type errors, wrong prop types, missing members | 5-20s |
| `eslint` | React hooks rules, import order, unused vars | 5-15s |
| `next build` | Server/client boundary violations, missing deps, syntax errors in JSX, env var issues | 30-120s |
---
## TypeScript Configuration
`tsconfig.json` must have strict mode enabled:
```json
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true,
    "strictNullChecks": true,
    "noUncheckedIndexedAccess": true,
    "exactOptionalPropertyTypes": true,
    "noUnusedLocals": true,
    "noUnusedParameters": true,
    "noFallthroughCasesInSwitch": true
  }
}
```
**Never use `// @ts-ignore` or `// @ts-expect-error` without a comment explaining why it's unavoidable.**
---
## ESLint Configuration
```json
{
  "extends": [
    "next/core-web-vitals",
    "plugin:@typescript-eslint/recommended-type-checked"
  ],
  "rules": {
    "@typescript-eslint/no-explicit-any": "error",
    "@typescript-eslint/no-unused-vars": "error",
    "@typescript-eslint/no-floating-promises": "error",
    "@typescript-eslint/await-thenable": "error",
    "react-hooks/exhaustive-deps": "error",
    "no-console": "warn"
  }
}
```
**`@typescript-eslint/no-floating-promises`** — catches `await`-less async calls that silently swallow errors.
**`react-hooks/exhaustive-deps`** — catches missing deps in `useEffect`/`useCallback` (source of stale closure bugs).
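A minimal sketch of the floating-promise failure mode the rule catches (the `save` and `handler` names are illustrative):

```typescript
// A rejected promise that nobody awaits disappears silently.
async function save(): Promise<void> {
  throw new Error("db down");
}

// save();  // floating — `no-floating-promises` flags this line

// Fixed: await inside try/catch so the rejection actually surfaces.
async function handler(): Promise<string> {
  try {
    await save();
    return "saved";
  } catch {
    return "failed";
  }
}

handler().then(console.log); // prints "failed"
```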
---
## Next.js 15 Rules (App Router)
### Server vs Client boundary
```typescript
// Server Component (default) — no 'use client' needed
// Can: fetch data, access DB, read env vars, import server-only packages
async function Page() {
  const data = await fetchData() // direct async/await
  return <ClientComponent data={data} />
}

// Client Component — must have 'use client' at top
'use client'
// Can: use hooks, handle events, access browser APIs
// Cannot: import server-only packages (nodemailer, fs, db pool)
```
**Common violation:** Importing `lib/email.ts` (which imports nodemailer) from a client component → use `lib/email-templates.ts` instead.
### Route Handler typing
```typescript
// Always type request and use NextResponse
export async function GET(request: Request): Promise<NextResponse> {
  const { searchParams } = new URL(request.url)
  return NextResponse.json({ data })
}
```
### Environment variables
```typescript
// Server-only env vars: access directly
const secret = process.env.PITCH_ADMIN_SECRET // fine in server components
// Client env vars: must be prefixed NEXT_PUBLIC_
const url = process.env.NEXT_PUBLIC_API_URL // accessible in browser
// Never access server-only env vars in 'use client' components
```
---
## Component Architecture
```
app/
├── (route-group)/
│   ├── page.tsx               # Server Component — data fetching
│   └── _components/           # Colocated components for this route
│       ├── ClientThing.tsx    # 'use client' when needed
│       └── ServerThing.tsx    # Server by default
components/
└── ui/                        # Shared presentational components
lib/
├── server-only-module.ts      # import 'server-only' at top
└── shared-module.ts           # safe for both server and client
```
**Rules:**
- Push `'use client'` boundary as deep as possible (toward leaves)
- Never import server-only modules from client components
- Colocate `_components/` and `_hooks/` per route when they're route-specific
---
## Testing Requirements
```bash
# Type check (fastest, run first)
npx tsc --noEmit
# Unit tests (Vitest)
npx vitest run
# E2E tests (Playwright — CI only, requires running server)
npx playwright test
```
**Test every:**
- Custom hook (`usePresenterMode`, `useSlideNavigation`)
- Utility function (`lib/auth.ts` helpers, `lib/email-templates.ts`)
- API route handler (mock DB, assert response shape)
---
## Common Pitfalls That Break CI
| Pitfall | Prevention |
|---------|------------|
| `const x = ...` inside object literal | `tsc --noEmit` + `npm run build` |
| Server-only import in client component | `import 'server-only'` guard + ESLint |
| Missing `await` on async function call | `@typescript-eslint/no-floating-promises` |
| `useEffect` with missing dependency | `react-hooks/exhaustive-deps` error |
| `any` type hiding type errors | `@typescript-eslint/no-explicit-any` error |
| Unused variable left after refactor | `noUnusedLocals` in tsconfig |
| `process.env.SECRET` in client component | Next.js build error |
| Forgetting `export default` on page component | Next.js build error |
| Calling server action from server component | must use route handler instead |
| `jose` full import in Edge Runtime | Use specific subpath: `jose/jwt/verify` |

View File

@@ -2,28 +2,53 @@
## Development Environment (IMPORTANT - ALWAYS READ FIRST)
### Two-Machine Setup
### Two-Machine Setup + Orca
| Device | Role | Tasks |
|--------|-------|----------|
| **MacBook** | Development | Claude terminal, code development, browser (frontend tests) |
| **Mac Mini** | Server | Docker, all services, tests, builds, deployment |
| **Mac Mini** | Local server | Docker for local dev/tests (NOT for production!) |
| **Orca** | Production | Automatic build + deploy on push to gitea |
**IMPORTANT:** Code is edited directly on the MacBook in this repo. Docker and services run on the Mac Mini.
**IMPORTANT:** Code is edited directly on the MacBook in this repo. Production deployment runs automatically via Orca.
### Development Workflow
### Development Workflow (CI/CD — Orca)
```bash
# 1. Edit code on the MacBook (this directory)
# 2. Commit and push:
git push origin main && git push gitea main
# 2. Commit and push to BOTH remotes:
git push origin main
# 3. Pull on the Mac Mini and rebuild containers:
# 3. DONE! Pushing to gitea automatically triggers:
# - Gitea Actions: Tests
# - Orca: Build → Deploy
```
**NEVER** click "Redeploy" manually in Orca — Gitea Actions triggers Orca automatically.
**ALWAYS push to `main`** — both origin and gitea.
### Post-Push Deploy Monitoring (MANDATORY after every push to gitea)
**Whenever Claude pushes to gitea, deploy monitoring MUST run automatically afterwards:**
1. Tell the user immediately: "Deploy started, I am monitoring the status..."
2. Poll health checks in the background (every 20 seconds, max 5 minutes):
```bash
curl -sf https://api-dev.breakpilot.ai/health # Compliance Backend
curl -sf https://sdk-dev.breakpilot.ai/health # AI SDK
```
3. As soon as ALL endpoints are healthy, report to the user in chat:
**"Deploy complete! You can test now."**
4. If still not healthy after 5 minutes → error message pointing to the Orca logs.
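Steps 2-4 could be sketched as a small polling helper (a sketch only: the 20-second interval, 5-minute cap, and URLs are the values stated above; the `TRIES`/`INTERVAL` overrides exist just for testability):

```bash
# Polls all given health URLs; succeeds once every endpoint answers.
wait_for_deploy() {
  local tries="${TRIES:-15}" interval="${INTERVAL:-20}"  # 15 * 20s = 5 min
  for _ in $(seq 1 "$tries"); do
    local healthy=true
    for url in "$@"; do
      curl -sf "$url" >/dev/null || healthy=false
    done
    if $healthy; then
      echo "Deploy complete! You can test now."
      return 0
    fi
    sleep "$interval"
  done
  echo "Still unhealthy after $((tries * interval))s; check the Orca logs." >&2
  return 1
}

# wait_for_deploy https://api-dev.breakpilot.ai/health https://sdk-dev.breakpilot.ai/health
```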
### Local Development (Mac Mini — optional, dev/tests only)
```bash
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && git pull --no-rebase origin main"
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && /usr/local/bin/docker compose build --no-cache <service> && /usr/local/bin/docker compose up -d <service>"
```
### SSH Connection (for Docker/tests)
### SSH Connection (for local Docker/tests)
```bash
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && <cmd>"
@@ -51,6 +76,14 @@ networks:
name: breakpilot-network # fixed name, no auto-prefix!
```
### Deployment Model
| Repo | Deployment | Trigger |
|------|-----------|---------|
| **breakpilot-core** | Orca (automatic) | Push to gitea main |
| **breakpilot-compliance** | Orca (automatic) | Push to gitea main |
| **breakpilot-lehrer** | Mac Mini (local) | Manual docker compose |
---
## Main URLs (via Nginx Reverse Proxy)
@@ -161,7 +194,7 @@ networks:
| `compliance` | Compliance | compliance_*, dsr, gdpr, sdk_tenants, consent_admin |
```bash
# DB access
# DB access (local)
ssh macmini "docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db"
```
@@ -185,15 +218,45 @@ breakpilot-core/
├── gitea/ # Gitea config
├── docs-src/ # MkDocs sources
├── mkdocs.yml # MkDocs config
├── control-pipeline/ # RAG/control pipeline (Port 8098)
├── scripts/ # Helper scripts
└── docker-compose.yml # Main compose file (28+ services)
```
---
## Control Pipeline (IMPORTANT)
**Since 2026-04-09 the entire RAG/control pipeline lives in the core repo** (`control-pipeline/`), NOT in the compliance repo anymore. All pipeline work (Pass 0a/0b, BatchDedup, Control Generator, Enrichment) happens exclusively here.
- **Port:** 8098
- **Container:** bp-core-control-pipeline
- **DB:** Writes to the `compliance` schema of the shared PostgreSQL
- **The compliance repo is NOT used for pipeline changes**
```bash
# Container on the Mac Mini
ssh macmini "cd ~/Projekte/breakpilot-core && /usr/local/bin/docker compose build --no-cache control-pipeline && /usr/local/bin/docker compose up -d --no-deps control-pipeline"
# Health
ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/health"
# Logs
ssh macmini "/usr/local/bin/docker logs -f bp-core-control-pipeline"
```
---
## Common Commands
### Docker
### Deployment (CI/CD — standard path)
```bash
# Commit and push → Orca deploys automatically:
git push origin main
```
### Local Docker Commands (Mac Mini — dev/tests only)
```bash
# Start all core services
@@ -211,35 +274,50 @@ ssh macmini "/usr/local/bin/docker ps --filter name=bp-core"
**IMPORTANT:** The Docker path on the Mac Mini is `/usr/local/bin/docker` (not in the default SSH PATH).
### Start All 3 Projects
```bash
# 1. Core (MUST come first!)
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && /usr/local/bin/docker compose up -d"
# Wait for health:
ssh macmini "curl -sf http://127.0.0.1:8099/health"
# 2. Lehrer
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-lehrer && /usr/local/bin/docker compose up -d"
# 3. Compliance
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-compliance && /usr/local/bin/docker compose up -d"
```
### Git
```bash
# Push to BOTH remotes (MANDATORY!):
ssh macmini "cd /Users/benjaminadmin/Projekte/breakpilot-core && git push all main"
git push origin main
# Remotes:
# origin: local Gitea (macmini:3003)
# gitea: gitea.meghsakha.com
# all: both at once
```
---
## Pre-Push Checks (MANDATORY — BEFORE EVERY PUSH)
> Full detail: `.claude/rules/pre-push-checks.md` | Stack rules: `AGENTS.python.md`, `AGENTS.go.md`, `AGENTS.typescript.md`
**NEVER push without these checks. CI failures block the entire deploy.**
### Python (backend-core, rag-service, embedding-service, control-pipeline)
```bash
cd <service-dir>
ruff check . && mypy . --ignore-missing-imports --no-error-summary && pytest tests/ -x -q --no-header
```
### Go (consent-service, billing-service)
```bash
cd <service-dir>
gofmt -l . | grep -q . && exit 1; go vet ./... && golangci-lint run --timeout=5m && go test -race ./... && go build ./...
```
### TypeScript/Next.js (pitch-deck, admin-v2)
```bash
cd pitch-deck # or admin-v2
npx tsc --noEmit && npm run lint && npm run build
```
> `npm run build` is MANDATORY — `tsc` alone is not enough. Syntax errors such as a stray `const` inside an object literal are only caught by the build.
---
## Core Principles
### 1. Open Source Policy


@@ -0,0 +1,74 @@
# Pre-Push Checks (MANDATORY)
## Rule
**NEVER push to any remote without first running and confirming ALL checks pass for every changed language stack.**
This rule exists because CI failures break the deploy pipeline for everyone and waste ~5 minutes per failed build. A 60-second local check prevents that.
---
## Quick Reference by Stack
### Python (backend-compliance, ai-compliance-sdk, compliance-tts-service)
```bash
cd <service-dir>
ruff check . && mypy . --ignore-missing-imports --no-error-summary && pytest tests/ -x -q --no-header
```
Blocks on: syntax errors, type errors, failing tests.
### Go (ai-compliance-sdk Go path)
```bash
cd <service-dir>
gofmt -l . | grep -q . && exit 1; go vet ./... && golangci-lint run --timeout=5m && go test -race ./... && go build ./...
```
Blocks on: formatting, vet findings, lint violations, test failures, build errors.
### TypeScript/Next.js (admin-compliance, developer-portal)
```bash
cd <nextjs-app-dir>
npx tsc --noEmit && npm run lint && npm run build
```
Blocks on: type errors, lint violations, **build failures**.
> `npm run build` is mandatory — `tsc` passes but `next build` fails more often than you'd expect (server/client boundary violations, env var issues, JSX syntax errors).
---
## What Claude Must Do Before Every Push
1. Identify which services/apps were changed in this task
2. Run the appropriate gate command(s) from the Quick Reference above
3. If any check fails: fix it, re-run, confirm green
4. Only then run `git push origin main`
**No exceptions.** A push that skips pre-push checks and breaks CI is worse than a delayed push.
---
## CI vs Local Checks
| Stage | Where | What |
|-------|-------|------|
| Pre-push (local) | Claude runs | Lint + type check + unit tests + build |
| CI (Gitea Actions) | Automatic on push | Same + integration tests + contract tests |
| Deploy (Orca) | Automatic after CI | Docker build + health check |
Local checks catch 90% of CI failures in seconds. CI is the safety net, not the first line of defense.
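The gating described above can be sketched as a tiny runner. This is an illustrative sketch, not the project's actual tooling; the stack names and commands are examples mirroring the sections above.

```python
import subprocess

# Illustrative stack -> gate-command map; adjust per service directory.
GATES = {
    "python": "ruff check . && mypy . --ignore-missing-imports && pytest -x -q",
    "go": "go vet ./... && go test -race ./... && go build ./...",
    "typescript": "npx tsc --noEmit && npm run lint && npm run build",
}

def run_gate(stack: str, cwd: str = ".") -> bool:
    """Run one stack's pre-push gate; True means the push may proceed."""
    result = subprocess.run(GATES[stack], shell=True, cwd=cwd)
    return result.returncode == 0
```

Chaining with `&&` means the gate stops at the first failing check, which keeps the feedback loop short.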
---
## Failures That Were Caused by Skipping Pre-Push Checks
- `ChatFAB.tsx`: `const textLang` inside fetch object literal — caught by `tsc --noEmit` and `npm run build`
- `nodemailer` webpack error: server-only import in client component — caught by `npm run build`
- `jose` Edge Runtime error: full package import — caught by `npm run build`
- `main.py` `<en>` tags spoken: missing `import re` — caught by `python -c "import main"`
These all caused a broken deploy. Each would have been caught in <60 seconds locally.


@@ -0,0 +1,66 @@
# Build + push pitch-deck Docker image to registry.meghsakha.com
# and trigger orca redeploy on every push to main that touches pitch-deck/.
#
# Requires Gitea Actions secret: ORCA_WEBHOOK_SECRET
# (must match the `secret` field in ~/.orca/webhooks.json on the orca master)
name: Build pitch-deck
on:
push:
branches: [main]
paths:
- 'pitch-deck/**'
jobs:
build-push-deploy:
runs-on: docker
container:
image: docker:27-cli
steps:
- name: Checkout
run: |
apk add --no-cache git openssl curl
git clone --depth 1 --branch ${GITHUB_REF_NAME} ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}.git .
- name: Login to registry
env:
REGISTRY_USERNAME: ${{ secrets.REGISTRY_USERNAME }}
REGISTRY_PASSWORD: ${{ secrets.REGISTRY_PASSWORD }}
run: |
echo "$REGISTRY_PASSWORD" | docker login registry.meghsakha.com -u "$REGISTRY_USERNAME" --password-stdin
- name: Build image
run: |
cd pitch-deck
SHORT_SHA=$(git rev-parse --short HEAD)
docker build \
--build-arg GIT_SHA=${SHORT_SHA} \
-t registry.meghsakha.com/breakpilot/pitch-deck:latest \
-t registry.meghsakha.com/breakpilot/pitch-deck:${SHORT_SHA} \
.
- name: Push to registry
run: |
SHORT_SHA=$(git rev-parse --short HEAD)
docker push registry.meghsakha.com/breakpilot/pitch-deck:latest
docker push registry.meghsakha.com/breakpilot/pitch-deck:${SHORT_SHA}
echo "Pushed :latest + :${SHORT_SHA}"
- name: Trigger orca redeploy
env:
ORCA_WEBHOOK_SECRET: ${{ secrets.ORCA_WEBHOOK_SECRET }}
ORCA_WEBHOOK_URL: http://46.225.100.82:6880/api/v1/webhooks/github
run: |
SHA=$(git rev-parse HEAD)
PAYLOAD="{\"ref\":\"refs/heads/main\",\"repository\":{\"full_name\":\"${GITHUB_REPOSITORY}\"},\"head_commit\":{\"id\":\"$SHA\",\"message\":\"ci: pitch-deck image build\"}}"
SIG=$(printf '%s' "$PAYLOAD" | openssl dgst -sha256 -hmac "$ORCA_WEBHOOK_SECRET" -r | awk '{print $1}')
curl -sSf -k \
-X POST \
-H "Content-Type: application/json" \
-H "X-GitHub-Event: push" \
-H "X-Hub-Signature-256: sha256=$SIG" \
-d "$PAYLOAD" \
"$ORCA_WEBHOOK_URL" \
|| { echo "Orca redeploy failed"; exit 1; }
echo "Orca redeploy triggered"


@@ -138,3 +138,8 @@ jobs:
pip install --quiet --no-cache-dir -r requirements.txt 2>/dev/null || true
pip install --quiet --no-cache-dir fastapi uvicorn pydantic pytest pytest-asyncio
python -m pytest tests/bqas/ -v --tb=short || true
# ========================================
# Deploys now handled by per-service workflows (e.g. build-pitch-deck.yml)
# which trigger orca webhooks directly after building + pushing the image.
# ========================================


@@ -1,27 +0,0 @@
name: Deploy to Coolify
on:
push:
branches:
- coolify
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- name: Deploy via Coolify API
run: |
echo "Deploying breakpilot-core to Coolify..."
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
-X POST \
-H "Authorization: Bearer ${{ secrets.COOLIFY_API_TOKEN }}" \
-H "Content-Type: application/json" \
-d '{"uuid": "${{ secrets.COOLIFY_RESOURCE_UUID }}", "force_rebuild": true}' \
"${{ secrets.COOLIFY_BASE_URL }}/api/v1/deploy")
echo "HTTP Status: $HTTP_STATUS"
if [ "$HTTP_STATUS" -ne 200 ] && [ "$HTTP_STATUS" -ne 201 ]; then
echo "Deployment failed with status $HTTP_STATUS"
exit 1
fi
echo "Deployment triggered successfully!"

.gitignore

@@ -7,6 +7,7 @@
secrets/
*.pem
*.key
.mcp.json
# Node
node_modules/


@@ -0,0 +1,19 @@
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
curl \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8098
HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
CMD curl -f http://127.0.0.1:8098/health || exit 1
CMD ["python", "main.py"]


@@ -0,0 +1,8 @@
from fastapi import APIRouter
from api.control_generator_routes import router as generator_router
from api.canonical_control_routes import router as canonical_router
router = APIRouter()
router.include_router(generator_router)
router.include_router(canonical_router)


@@ -0,0 +1,67 @@
import os
class Settings:
"""Environment-based configuration for control-pipeline."""
# Database (compliance schema)
DATABASE_URL: str = os.getenv(
"DATABASE_URL",
"postgresql://breakpilot:breakpilot123@localhost:5432/breakpilot_db",
)
SCHEMA_SEARCH_PATH: str = os.getenv(
"SCHEMA_SEARCH_PATH", "compliance,core,public"
)
# Qdrant (vector search for dedup)
QDRANT_URL: str = os.getenv("QDRANT_URL", "http://localhost:6333")
QDRANT_API_KEY: str = os.getenv("QDRANT_API_KEY", "")
# Embedding Service
EMBEDDING_SERVICE_URL: str = os.getenv(
"EMBEDDING_SERVICE_URL", "http://embedding-service:8087"
)
# LLM - Anthropic
ANTHROPIC_API_KEY: str = os.getenv("ANTHROPIC_API_KEY", "")
CONTROL_GEN_ANTHROPIC_MODEL: str = os.getenv(
"CONTROL_GEN_ANTHROPIC_MODEL", "claude-sonnet-4-6"
)
DECOMPOSITION_LLM_MODEL: str = os.getenv(
"DECOMPOSITION_LLM_MODEL", "claude-haiku-4-5-20251001"
)
CONTROL_GEN_LLM_TIMEOUT: int = int(
os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180")
)
# LLM - Ollama (fallback)
OLLAMA_URL: str = os.getenv(
"OLLAMA_URL", "http://host.docker.internal:11434"
)
CONTROL_GEN_OLLAMA_MODEL: str = os.getenv(
"CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b"
)
# SDK Service (for RAG search proxy)
SDK_URL: str = os.getenv(
"SDK_URL", "http://ai-compliance-sdk:8090"
)
# Auth
JWT_SECRET: str = os.getenv("JWT_SECRET", "")
# Server
PORT: int = int(os.getenv("PORT", "8098"))
LOG_LEVEL: str = os.getenv("LOG_LEVEL", "INFO")
ENVIRONMENT: str = os.getenv("ENVIRONMENT", "development")
# Pipeline
DECOMPOSITION_BATCH_SIZE: int = int(
os.getenv("DECOMPOSITION_BATCH_SIZE", "5")
)
DECOMPOSITION_LLM_TIMEOUT: int = int(
os.getenv("DECOMPOSITION_LLM_TIMEOUT", "120")
)
settings = Settings()
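One property of this class-attribute pattern worth keeping in mind: every value is read exactly once, when the class body executes at import time. Later changes to the environment are not picked up. A minimal sketch with illustrative names:

```python
import os

os.environ["DEMO_PORT"] = "9000"

class DemoSettings:
    # Evaluated now, while the class body executes.
    PORT: int = int(os.getenv("DEMO_PORT", "8098"))

os.environ["DEMO_PORT"] = "7000"  # too late: the class already read the value
assert DemoSettings.PORT == 9000
```

For a containerized service this is fine, since the environment is fixed before the process starts; it only matters in tests that mutate `os.environ` after import.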


@@ -0,0 +1,205 @@
"""
Source-Type-Klassifikation fuer Regulierungen und Frameworks.
Dreistufiges Modell der normativen Verbindlichkeit:
Stufe 1 — GESETZ (law):
Rechtlich bindend. Bussgeld bei Verstoss.
Beispiele: DSGVO, NIS2, AI Act, CRA
Stufe 2 — LEITLINIE (guideline):
Offizielle Auslegungshilfe von Aufsichtsbehoerden.
Beweislastumkehr: Wer abweicht, muss begruenden warum.
Beispiele: EDPB-Leitlinien, BSI-Standards, WP29-Dokumente
Stufe 3 — FRAMEWORK (framework):
Freiwillige Best Practices, nicht rechtsverbindlich.
Aber: Koennen als "Stand der Technik" herangezogen werden.
Beispiele: ENISA, NIST, OWASP, OECD, CISA
Mapping: source_regulation (aus control_parent_links) -> source_type
"""
# --- Typ-Definitionen ---
SOURCE_TYPE_LAW = "law" # Gesetz/Verordnung/Richtlinie — normative_strength bleibt
SOURCE_TYPE_GUIDELINE = "guideline" # Leitlinie/Standard — max "should"
SOURCE_TYPE_FRAMEWORK = "framework" # Framework/Best Practice — max "may"
# Max erlaubte normative_strength pro source_type
# DB-Constraint erlaubt: must, should, may (NICHT "can")
NORMATIVE_STRENGTH_CAP: dict[str, str] = {
SOURCE_TYPE_LAW: "must", # keine Begrenzung
SOURCE_TYPE_GUIDELINE: "should", # max "should"
SOURCE_TYPE_FRAMEWORK: "may", # max "may" (= "kann")
}
# Reihenfolge fuer Vergleiche (hoeher = staerker)
STRENGTH_ORDER: dict[str, int] = {
"may": 1, # KANN (DB-Wert)
"can": 1, # Alias — wird in cap_normative_strength zu "may" normalisiert
"should": 2,
"must": 3,
}
def cap_normative_strength(original: str, source_type: str) -> str:
"""
Caps the normative_strength based on the source_type.
Examples:
cap_normative_strength("must", "framework") -> "may"
cap_normative_strength("should", "law") -> "should"
cap_normative_strength("must", "guideline") -> "should"
"""
if original == "can":
original = "may"  # normalize alias; the DB constraint does not allow "can"
cap = NORMATIVE_STRENGTH_CAP.get(source_type, "must")
cap_level = STRENGTH_ORDER.get(cap, 3)
original_level = STRENGTH_ORDER.get(original, 3)
if original_level > cap_level:
return cap
return original
def get_highest_source_type(source_types: list[str]) -> str:
"""
Determines the highest source_type in a list.
A law trumps everything else.
Examples:
get_highest_source_type(["framework", "law"]) -> "law"
get_highest_source_type(["framework", "guideline"]) -> "guideline"
"""
type_order = {SOURCE_TYPE_FRAMEWORK: 1, SOURCE_TYPE_GUIDELINE: 2, SOURCE_TYPE_LAW: 3}
if not source_types:
return SOURCE_TYPE_FRAMEWORK
return max(source_types, key=lambda t: type_order.get(t, 0))
# ============================================================================
# Classification: source_regulation -> source_type
#
# This map is used for the backfill and for future pipeline runs.
# Register new regulations here!
# ============================================================================
SOURCE_REGULATION_CLASSIFICATION: dict[str, str] = {
# --- EU regulations (directly binding) ---
"DSGVO (EU) 2016/679": SOURCE_TYPE_LAW,
"KI-Verordnung (EU) 2024/1689": SOURCE_TYPE_LAW,
"Cyber Resilience Act (CRA)": SOURCE_TYPE_LAW,
"NIS2-Richtlinie (EU) 2022/2555": SOURCE_TYPE_LAW,
"Data Act": SOURCE_TYPE_LAW,
"Data Governance Act (DGA)": SOURCE_TYPE_LAW,
"Markets in Crypto-Assets (MiCA)": SOURCE_TYPE_LAW,
"Maschinenverordnung (EU) 2023/1230": SOURCE_TYPE_LAW,
"Batterieverordnung (EU) 2023/1542": SOURCE_TYPE_LAW,
"AML-Verordnung": SOURCE_TYPE_LAW,
# --- EU directives (binding after national transposition) ---
# For compliance purposes, treat like laws
# --- National laws ---
"Bundesdatenschutzgesetz (BDSG)": SOURCE_TYPE_LAW,
"Telekommunikationsgesetz": SOURCE_TYPE_LAW,
"Telekommunikationsgesetz Oesterreich": SOURCE_TYPE_LAW,
"Gewerbeordnung (GewO)": SOURCE_TYPE_LAW,
"Handelsgesetzbuch (HGB)": SOURCE_TYPE_LAW,
"Abgabenordnung (AO)": SOURCE_TYPE_LAW,
"IFRS-Übernahmeverordnung": SOURCE_TYPE_LAW,
"Österreichisches Datenschutzgesetz (DSG)": SOURCE_TYPE_LAW,
"LOPDGDD - Ley Orgánica de Protección de Datos (Spanien)": SOURCE_TYPE_LAW,
"Loi Informatique et Libertés (Frankreich)": SOURCE_TYPE_LAW,
"Információs önrendelkezési jog törvény (Ungarn)": SOURCE_TYPE_LAW,
"EU Blue Guide 2022": SOURCE_TYPE_LAW,
# --- EDPB/WP29 guidelines (official interpretive guidance) ---
"EDPB Leitlinien 01/2019 (Zertifizierung)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 01/2020 (Datentransfers)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 01/2020 (Vernetzte Fahrzeuge)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 01/2022 (BCR)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 01/2024 (Berechtigtes Interesse)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 04/2019 (Data Protection by Design)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 05/2020 - Einwilligung": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 07/2020 (Datentransfers)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 08/2020 (Social Media)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 09/2022 (Data Breach)": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien 09/2022 - Meldung von Datenschutzverletzungen": SOURCE_TYPE_GUIDELINE,
"EDPB Empfehlungen 01/2020 - Ergaenzende Massnahmen fuer Datentransfers": SOURCE_TYPE_GUIDELINE,
"EDPB Leitlinien - Berechtigtes Interesse (Art. 6(1)(f))": SOURCE_TYPE_GUIDELINE,
"WP244 Leitlinien (Profiling)": SOURCE_TYPE_GUIDELINE,
"WP251 Leitlinien (Profiling)": SOURCE_TYPE_GUIDELINE,
"WP260 Leitlinien (Transparenz)": SOURCE_TYPE_GUIDELINE,
# --- BSI standards (official technical guidelines) ---
"BSI-TR-03161-1": SOURCE_TYPE_GUIDELINE,
"BSI-TR-03161-2": SOURCE_TYPE_GUIDELINE,
"BSI-TR-03161-3": SOURCE_TYPE_GUIDELINE,
# --- ENISA (EU agency, but its recommendations are not legally binding) ---
"ENISA Cybersecurity State 2024": SOURCE_TYPE_FRAMEWORK,
"ENISA ICS/SCADA Dependencies": SOURCE_TYPE_FRAMEWORK,
"ENISA Supply Chain Good Practices": SOURCE_TYPE_FRAMEWORK,
"ENISA Threat Landscape Supply Chain": SOURCE_TYPE_FRAMEWORK,
# --- NIST (US standards, treated internationally as best practice) ---
"NIST AI Risk Management Framework": SOURCE_TYPE_FRAMEWORK,
"NIST Cybersecurity Framework 2.0": SOURCE_TYPE_FRAMEWORK,
"NIST SP 800-207 (Zero Trust)": SOURCE_TYPE_FRAMEWORK,
"NIST SP 800-218 (SSDF)": SOURCE_TYPE_FRAMEWORK,
"NIST SP 800-53 Rev. 5": SOURCE_TYPE_FRAMEWORK,
"NIST SP 800-63-3": SOURCE_TYPE_FRAMEWORK,
# --- OWASP (community standards) ---
"OWASP API Security Top 10 (2023)": SOURCE_TYPE_FRAMEWORK,
"OWASP ASVS 4.0": SOURCE_TYPE_FRAMEWORK,
"OWASP MASVS 2.0": SOURCE_TYPE_FRAMEWORK,
"OWASP SAMM 2.0": SOURCE_TYPE_FRAMEWORK,
"OWASP Top 10 (2021)": SOURCE_TYPE_FRAMEWORK,
# --- Other frameworks ---
"OECD KI-Empfehlung": SOURCE_TYPE_FRAMEWORK,
"CISA Secure by Design": SOURCE_TYPE_FRAMEWORK,
}
def classify_source_regulation(source_regulation: str) -> str:
"""
Classifies a source_regulation as law, guideline, or framework.
Uses exact matching against the map. For unknown sources the type is
guessed from keywords; the fallback is 'framework' (the most
conservative result).
"""
if not source_regulation:
return SOURCE_TYPE_FRAMEWORK
# Exact match
if source_regulation in SOURCE_REGULATION_CLASSIFICATION:
return SOURCE_REGULATION_CLASSIFICATION[source_regulation]
# Heuristics for unknown sources
lower = source_regulation.lower()
# Detect laws
law_indicators = [
"verordnung", "richtlinie", "gesetz", "directive", "regulation",
"(eu)", "(eg)", "act", "ley", "loi", "törvény", "código",
]
if any(ind in lower for ind in law_indicators):
return SOURCE_TYPE_LAW
# Detect guidelines
guideline_indicators = [
"edpb", "leitlinie", "guideline", "wp2", "bsi", "empfehlung",
]
if any(ind in lower for ind in guideline_indicators):
return SOURCE_TYPE_GUIDELINE
# Detect frameworks
framework_indicators = [
"enisa", "nist", "owasp", "oecd", "cisa", "framework", "iso",
]
if any(ind in lower for ind in framework_indicators):
return SOURCE_TYPE_FRAMEWORK
# Conservative: unknown = framework (lowest bindingness)
return SOURCE_TYPE_FRAMEWORK


@@ -0,0 +1,37 @@
"""Database session factory for control-pipeline.
Connects to the shared PostgreSQL with search_path set to compliance schema.
"""
from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker
from config import settings
engine = create_engine(
settings.DATABASE_URL,
pool_pre_ping=True,
pool_size=5,
max_overflow=10,
echo=False,
)
@event.listens_for(engine, "connect")
def set_search_path(dbapi_connection, connection_record):
cursor = dbapi_connection.cursor()
cursor.execute(f"SET search_path TO {settings.SCHEMA_SEARCH_PATH}")
cursor.close()
dbapi_connection.commit()
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
def get_db():
"""FastAPI dependency for DB sessions."""
db = SessionLocal()
try:
yield db
finally:
db.close()
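The try/finally around the yield is what guarantees the session is closed after each request. A stand-in sketch of how FastAPI drives such a dependency; `FakeSession` is a hypothetical substitute for the real SQLAlchemy session:

```python
class FakeSession:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

def get_db():
    db = FakeSession()
    try:
        yield db
    finally:
        db.close()

gen = get_db()
db = next(gen)          # FastAPI injects the yielded session into the route
assert db.closed is False
gen.close()             # after the response, the generator is finalized
assert db.closed is True
```

Closing the generator raises GeneratorExit at the yield, so the finally block runs even if the route handler raised; that is why the cleanup is reliable.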

control-pipeline/main.py

@@ -0,0 +1,88 @@
import logging
from contextlib import asynccontextmanager
import uvicorn
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from config import settings
from db.session import engine
logging.basicConfig(
level=getattr(logging, settings.LOG_LEVEL, logging.INFO),
format="%(asctime)s [%(name)s] %(levelname)s: %(message)s",
)
logger = logging.getLogger("control-pipeline")
@asynccontextmanager
async def lifespan(app: FastAPI):
"""Startup: verify DB and Qdrant connectivity."""
logger.info("Control-Pipeline starting up ...")
# Verify database connection
from sqlalchemy import text
try:
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
logger.info("Database connection OK")
except Exception as exc:
logger.error("Database connection failed: %s", exc)
yield
logger.info("Control-Pipeline shutting down ...")
app = FastAPI(
title="BreakPilot Control Pipeline",
description="Control generation, decomposition, and deduplication pipeline for the BreakPilot compliance platform.",
version="1.0.0",
lifespan=lifespan,
)
# CORS
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Routers
from api import router as api_router # noqa: E402
app.include_router(api_router)
# Health
@app.get("/health")
async def health():
"""Liveness probe."""
db_ok = False
from sqlalchemy import text
try:
with engine.connect() as conn:
conn.execute(text("SELECT 1"))
db_ok = True
except Exception:
pass
status = "healthy" if db_ok else "degraded"
return {
"status": status,
"service": "control-pipeline",
"version": "1.0.0",
"dependencies": {
"postgres": "ok" if db_ok else "unavailable",
},
}
if __name__ == "__main__":
uvicorn.run(
"main:app",
host="0.0.0.0",
port=settings.PORT,
reload=False,
log_level="info",
)


@@ -0,0 +1,22 @@
# Web Framework
fastapi>=0.123.0
uvicorn[standard]>=0.27.0
# Database
SQLAlchemy>=2.0.36
psycopg2-binary>=2.9.10
# HTTP Client
httpx>=0.28.0
# Validation
pydantic>=2.5.0
# AI - Anthropic Claude
anthropic>=0.75.0
# Vector DB (dedup)
qdrant-client>=1.7.0
# Auth
python-jose[cryptography]>=3.3.0


@@ -0,0 +1,219 @@
"""
Import compliance backup into local PostgreSQL.
Fixes Python-style lists/dicts in JSONB fields to valid JSON.
"""
import ast
import gzip
import json
import re
import sys
import psycopg2
DB_URL = "postgresql://breakpilot:breakpilot123@localhost:5432/breakpilot_db"
BACKUP_PATH = "/tmp/compliance-db-2026-03-28_16-25-19.sql.gz"
# Tables with JSONB columns that need Python→JSON conversion
JSONB_TABLES = {
"canonical_controls",
"canonical_controls_pre_dedup",
"obligation_candidates",
"control_dedup_reviews",
"canonical_generation_jobs",
"canonical_processed_chunks",
}
def fix_python_value(val: str) -> str | None:
"""Convert a Python repr to a JSON string for JSONB fields."""
if val == "NULL":
return None
# Strip outer SQL quotes
if val.startswith("'") and val.endswith("'"):
# Unescape SQL single quotes
inner = val[1:-1].replace("''", "'")
else:
return val
# Try to parse as Python literal and convert to JSON
try:
obj = ast.literal_eval(inner)
return json.dumps(obj, ensure_ascii=False)
except (ValueError, SyntaxError):
# Already valid JSON or plain string
return inner
def process_line(line: str, conn) -> bool:
"""Process a single SQL line. Returns True if the INSERT was executed."""
line = line.strip()
# Skip everything that is not an INSERT (SET statements, comments, etc.)
if not line.startswith("INSERT INTO"):
return False
# Execute directly for non-JSONB tables
table_match = re.match(r'INSERT INTO "(\w+)"', line)
if not table_match:
return False
table = table_match.group(1)
if table not in JSONB_TABLES:
# Execute as-is
try:
with conn.cursor() as cur:
cur.execute(line)
return True
except Exception:
conn.rollback()
return False
# For JSONB tables: use psycopg2 parameterized query
# Extract column names and values
cols_match = re.match(r'INSERT INTO "\w+" \(([^)]+)\) VALUES \(', line)
if not cols_match:
return False
col_names = [c.strip().strip('"') for c in cols_match.group(1).split(",")]
# Extract VALUES portion
vals_start = line.index("VALUES (") + 8
vals_str = line[vals_start:-2] # Remove trailing );
# Parse SQL values (handling nested quotes and parentheses)
values = []
current = ""
in_quote = False
depth = 0
i = 0
while i < len(vals_str):
c = vals_str[i]
if in_quote:
if c == "'" and i + 1 < len(vals_str) and vals_str[i + 1] == "'":
current += "''"
i += 2
continue
elif c == "'":
current += "'"
in_quote = False
else:
current += c
else:
if c == "'":
current += "'"
in_quote = True
elif c == "(" :
depth += 1
current += c
elif c == ")":
depth -= 1
current += c
elif c == "," and depth == 0:
values.append(current.strip())
current = ""
else:
current += c
i += 1
values.append(current.strip())
if len(values) != len(col_names):
# Fallback: try direct execution
try:
with conn.cursor() as cur:
cur.execute(line)
return True
except Exception:
conn.rollback()
return False
# Convert values
params = []
placeholders = []
for col, val in zip(col_names, values):
if val == "NULL":
params.append(None)
placeholders.append("%s")
elif val in ("TRUE", "true"):
params.append(True)
placeholders.append("%s")
elif val in ("FALSE", "false"):
params.append(False)
placeholders.append("%s")
elif val.startswith("'") and val.endswith("'"):
inner = val[1:-1].replace("''", "'")
# Check if this looks like a Python literal (list/dict)
stripped = inner.strip()
if stripped and stripped[0] in ("[", "{") and stripped not in ("[]", "{}"):
try:
obj = ast.literal_eval(inner)
params.append(json.dumps(obj, ensure_ascii=False))
except (ValueError, SyntaxError):
params.append(inner)
else:
params.append(inner)
placeholders.append("%s")
else:
# Numeric or other
try:
if "." in val:
params.append(float(val))
else:
params.append(int(val))
except ValueError:
params.append(val)
placeholders.append("%s")
col_list = ", ".join(f'"{c}"' for c in col_names)
ph_list = ", ".join(placeholders)
sql = f'INSERT INTO "{table}" ({col_list}) VALUES ({ph_list})'
try:
with conn.cursor() as cur:
cur.execute(sql, params)
return True
except Exception as e:
conn.rollback()
if "duplicate key" not in str(e):
print(f" ERROR [{table}]: {str(e)[:120]}", file=sys.stderr)
return False
def main():
conn = psycopg2.connect(DB_URL)
conn.autocommit = True
with conn.cursor() as cur:
cur.execute("SET search_path TO compliance, public")
total = 0
ok = 0
errors = 0
print(f"Reading {BACKUP_PATH}...")
with gzip.open(BACKUP_PATH, "rt", encoding="utf-8") as f:
buffer = ""
for line in f:
buffer += line
if not buffer.rstrip().endswith(";"):
continue
# Complete SQL statement
stmt = buffer.strip()
buffer = ""
if not stmt.startswith("INSERT"):
continue
total += 1
if process_line(stmt, conn):
ok += 1
else:
errors += 1
if total % 10000 == 0:
print(f" {total:>8} processed, {ok} ok, {errors} errors")
print(f"\nDONE: {total} total, {ok} ok, {errors} errors")
conn.close()
if __name__ == "__main__":
main()
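The core repr-to-JSON conversion the script relies on can be exercised in isolation. A self-contained sketch; `py_repr_to_json` is an illustrative name, not a function from the script:

```python
import ast
import json

def py_repr_to_json(text: str) -> str:
    """Convert a Python-literal repr (as found in the broken dump) to JSON."""
    try:
        # ast.literal_eval safely parses Python literals (lists, dicts,
        # True/False/None); json.dumps then re-serializes them as valid JSON.
        return json.dumps(ast.literal_eval(text), ensure_ascii=False)
    except (ValueError, SyntaxError):
        return text  # already valid JSON, or a plain string

assert py_repr_to_json("{'a': True, 'b': None}") == '{"a": true, "b": null}'
assert py_repr_to_json("['x', 'y']") == '["x", "y"]'
```

`ast.literal_eval` is the safe choice here: unlike `eval`, it only accepts literal expressions, so a malicious value in the dump cannot execute code.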


@@ -0,0 +1,284 @@
"""Ingest BAG (Bundesarbeitsgericht) court decisions into RAG.
Downloads PDFs from bundesarbeitsgericht.de and uploads them to the
bp_compliance_datenschutz Qdrant collection via the RAG-Service API.
These decisions are curated for IT/KI-Mitbestimmung relevance (§87 BetrVG).
Usage:
python scripts/ingest_bag_urteile.py [--rag-url https://macmini:8097] [--dry-run]
"""
import argparse
import json
import re
import time
import httpx
# ---------------------------------------------------------------------------
# Curated BAG decisions for IT/AI works council co-determination
# ---------------------------------------------------------------------------
BAG_DECISIONS = [
# --- M365 / Copilot / Standardsoftware ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-20-21/",
"case_number": "1 ABR 20/21",
"date": "2022-03-08",
"subject": "Microsoft Office 365 — Mitbestimmung",
"keywords": ["Microsoft 365", "Standardsoftware", "Ueberwachung", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abn-36-18/",
"case_number": "1 ABN 36/18",
"date": "2018-10-23",
"subject": "Excel / Standardsoftware — keine Geringfuegigkeitsschwelle",
"keywords": ["Excel", "Standardsoftware", "Geringfuegigkeit", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-45-11/",
"case_number": "1 ABR 45/11",
"date": "2012-09-25",
"subject": "SAP ERP im Personalwesen",
"keywords": ["SAP", "ERP", "Personalwesen", "Verhaltenskontrolle", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-31-19/",
"case_number": "1 ABR 31/19",
"date": "2021-01-27",
"subject": "E-Mail-Kommunikationssoftware — Mitbestimmung",
"keywords": ["E-Mail", "Kommunikation", "Software", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-13-17/",
"case_number": "1 ABR 13/17",
"date": "2019-07-09",
"subject": "IT-System fuer Mitarbeiterbefragung",
"keywords": ["Mitarbeiterbefragung", "Feedback", "technische Einrichtung", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-16-23/",
"case_number": "1 ABR 16/23",
"date": "2024-07-16",
"subject": "Headset-System — Geraetenutzungsdaten",
"keywords": ["Headset", "Geraetenutzung", "Ueberwachung", "§87 BetrVG"],
},
# --- Surveillance, social media, third-party platforms ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-7-15/",
"case_number": "1 ABR 7/15",
"date": "2016-12-13",
"subject": "Facebook-Seite — indirekte Leistungsueberwachung",
"keywords": ["Facebook", "Social Media", "Besucherbeitraege", "Ueberwachung", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-43-12/",
"case_number": "1 ABR 43/12",
"date": "2013-12-10",
"subject": "Google Maps — indirekte Ueberwachung / Definition Ueberwachung",
"keywords": ["Google Maps", "Routenplaner", "indirekte Ueberwachung", "Definition", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-68-13/",
"case_number": "1 ABR 68/13",
"date": "2015-07-21",
"subject": "Ueberwachung durch technische Einrichtung eines Dritten (SaaS/Cloud)",
"keywords": ["Drittsystem", "SaaS", "Cloud", "Ueberwachung", "§87 BetrVG"],
},
# --- Video, workload, performance metrics ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-78-11/",
"case_number": "1 ABR 78/11",
"date": "2012-12-11",
"subject": "Videoueberwachung — Grundsatzentscheidung",
"keywords": ["Videoueberwachung", "Kamera", "Arbeitsplatz", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-46-15/",
"case_number": "1 ABR 46/15",
"date": "2017-04-25",
"subject": "Belastungsstatistik — dauerhafte Kennzahlenueberwachung",
"keywords": ["Belastungsstatistik", "Kennzahlen", "Analytics", "Persoenlichkeitsrecht", "§87 BetrVG"],
},
# --- Negative / delimiting cases ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-32-16/",
"case_number": "1 ABR 32/16",
"date": "2017-12-19",
"subject": "Anti-Terror-Listen — keine Mitbestimmung",
"keywords": ["Anti-Terror", "Sanktionsliste", "keine Mitbestimmung", "Abgrenzung", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-22-21/",
"case_number": "1 ABR 22/21",
"date": "2022-09-13",
"subject": "Elektronische Arbeitszeiterfassung — Initiativrecht",
"keywords": ["Arbeitszeiterfassung", "Initiativrecht", "digitale Systeme", "§87 BetrVG"],
},
# --- Historic landmark decisions ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-43-81/",
"case_number": "1 ABR 43/81",
"date": "1983-12-06",
"subject": "Grundsatz technische Ueberwachung — Eignung genuegt",
"keywords": ["Grundsatz", "Eignung", "technische Einrichtung", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-23-82/",
"case_number": "1 ABR 23/82",
"date": "1984-09-14",
"subject": "Erste Grundlinie IT-Systeme",
"keywords": ["IT-System", "Grundlinie", "technische Einrichtung", "§87 BetrVG"],
},
# --- E-Mail / Internet ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-46-10/",
"case_number": "1 ABR 46/10",
"date": "2012-02-07",
"subject": "Internet- und E-Mail-Nutzung — Kommunikationsdaten",
"keywords": ["Internet", "E-Mail", "Kommunikationsdaten", "Auswertung", "§87 BetrVG"],
},
# --- HR / appraisal systems ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-40-07/",
"case_number": "1 ABR 40/07",
"date": "2008-07-22",
"subject": "Beurteilungssysteme — §94/§95 BetrVG",
"keywords": ["Beurteilung", "Bewertungssystem", "HR", "§94 BetrVG", "§95 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-16-07/",
"case_number": "1 ABR 16/07",
"date": "2008-03-18",
"subject": "Personalfrageboegen — Bewertung",
"keywords": ["Personalfragebogen", "Bewertung", "HR-Tools", "§94 BetrVG"],
},
# --- Video / physical surveillance ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-21-03/",
"case_number": "1 ABR 21/03",
"date": "2004-06-29",
"subject": "Videoueberwachung Arbeitsplatz",
"keywords": ["Video", "Kamera", "Arbeitsplatz", "Ueberwachung", "§87 BetrVG"],
},
# --- Jurisdiction ---
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-2-05/",
"case_number": "1 ABR 2/05",
"date": "2006-05-03",
"subject": "Zustaendigkeit Betriebsrat bei konzernweiten Tools",
"keywords": ["Zustaendigkeit", "Konzern", "Gesamtbetriebsrat", "§87 BetrVG"],
},
{
"url": "https://www.bundesarbeitsgericht.de/entscheidung/1-abr-58-04/",
"case_number": "1 ABR 58/04",
"date": "2006-03-28",
"subject": "Mitbestimmung bei Einfuehrung technischer Systeme",
"keywords": ["Systemeinführung", "technische Systeme", "Mitbestimmung", "§87 BetrVG"],
},
]
def normalize_case_number(case_number: str) -> str:
"""Normalize case number for use as regulation_id."""
return re.sub(r"[^a-z0-9]", "_", case_number.lower()).strip("_")
def download_decision(url: str, client: httpx.Client) -> bytes:
"""Download a BAG decision page as HTML."""
resp = client.get(url, follow_redirects=True)
resp.raise_for_status()
return resp.content
def upload_to_rag(
file_bytes: bytes,
filename: str,
metadata: dict,
rag_url: str,
client: httpx.Client,
) -> dict:
"""Upload a document to the RAG service."""
files = {"file": (filename, file_bytes, "text/html")}
data = {
"collection": "bp_compliance_datenschutz",
"data_type": "compliance_datenschutz",
"bundesland": "bund",
"use_case": "court_decision",
"year": metadata.get("date", "2024")[:4],
"chunk_strategy": "legal",
"chunk_size": "512",
"chunk_overlap": "50",
"metadata_json": json.dumps(metadata),
}
resp = client.post(f"{rag_url}/api/v1/documents/upload", files=files, data=data)
resp.raise_for_status()
return resp.json()
def main():
parser = argparse.ArgumentParser(description="Ingest BAG court decisions into RAG")
parser.add_argument("--rag-url", default="https://macmini:8097", help="RAG service URL")
parser.add_argument("--dry-run", action="store_true", help="Download only, don't upload")
args = parser.parse_args()
client = httpx.Client(timeout=60, verify=False)  # verify=False: skip TLS verification for the internal RAG host
stats = {"downloaded": 0, "uploaded": 0, "errors": 0}
for decision in BAG_DECISIONS:
case_id = normalize_case_number(decision["case_number"])
print(f"\n--- {decision['case_number']}: {decision['subject']} ---")
# Download
try:
html_bytes = download_decision(decision["url"], client)
stats["downloaded"] += 1
print(f" Downloaded: {len(html_bytes)} bytes")
except Exception as e:
print(f" ERROR downloading: {e}")
stats["errors"] += 1
continue
if args.dry_run:
continue
# Upload
metadata = {
"regulation_id": f"bag_{case_id}",
"regulation_name_de": f"BAG {decision['case_number']}{decision['subject']}",
"category": "arbeitsrecht",
"source": "bundesarbeitsgericht.de",
"doc_type": "court_decision",
"license": "public_domain_§5_UrhG",
"court": "BAG",
"case_number": decision["case_number"],
"date": decision["date"],
"subject_matter": decision["subject"],
"keywords": decision["keywords"],
}
try:
result = upload_to_rag(
html_bytes,
f"bag_{case_id}.html",
metadata,
args.rag_url,
client,
)
stats["uploaded"] += 1
print(f" Uploaded: {result.get('chunks_count', '?')} chunks, doc_id={result.get('document_id', '?')}")
except Exception as e:
print(f" ERROR uploading: {e}")
stats["errors"] += 1
time.sleep(1) # Rate limiting
print(f"\n=== Done: {stats['downloaded']} downloaded, {stats['uploaded']} uploaded, {stats['errors']} errors ===")
if __name__ == "__main__":
main()
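The `normalize_case_number` helper above is the only string handling the ingest relies on for stable IDs; restated standalone, it turns a BAG case number into a slug:

```python
import re

def normalize_case_number(case_number: str) -> str:
    # Lowercase, replace every non-alphanumeric character with "_", trim edges.
    return re.sub(r"[^a-z0-9]", "_", case_number.lower()).strip("_")

print(normalize_case_number("1 ABR 16/07"))  # 1_abr_16_07
```

The same slug feeds both the `regulation_id` (`bag_1_abr_16_07`) and the upload filename.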

@@ -0,0 +1,187 @@
"""
Anchor Finder — finds open-source references (OWASP, NIST, ENISA) for controls.
Two-stage search:
Stage A: RAG-internal search for open-source chunks matching the control topic
Stage B: Web search via DuckDuckGo Instant Answer API (no API key needed)
Only open-source references (Rule 1+2) are accepted as anchors.
"""
import logging
from dataclasses import dataclass
from typing import List, Optional
import httpx
from .rag_client import ComplianceRAGClient, get_rag_client
from .control_generator import (
GeneratedControl,
REGULATION_LICENSE_MAP,
_RULE2_PREFIXES,
_RULE3_PREFIXES,
_classify_regulation,
)
logger = logging.getLogger(__name__)
# Regulation codes that are safe to reference as open anchors (Rule 1+2)
_OPEN_SOURCE_RULES = {1, 2}
@dataclass
class OpenAnchor:
framework: str
ref: str
url: str
class AnchorFinder:
"""Finds open-source references to anchor generated controls."""
def __init__(self, rag_client: Optional[ComplianceRAGClient] = None):
self.rag = rag_client or get_rag_client()
async def find_anchors(
self,
control: GeneratedControl,
skip_web: bool = False,
min_anchors: int = 2,
) -> List[OpenAnchor]:
"""Find open-source anchors for a control."""
# Stage A: RAG-internal search
anchors = await self._search_rag_for_open_anchors(control)
# Stage B: Web search if not enough anchors
if len(anchors) < min_anchors and not skip_web:
web_anchors = await self._search_web(control)
# Deduplicate by framework+ref
existing_keys = {(a.framework, a.ref) for a in anchors}
for wa in web_anchors:
if (wa.framework, wa.ref) not in existing_keys:
anchors.append(wa)
return anchors
async def _search_rag_for_open_anchors(self, control: GeneratedControl) -> List[OpenAnchor]:
"""Search RAG for chunks from open sources matching the control topic."""
# Build search query from control title + first 3 tags
tags_str = " ".join(control.tags[:3]) if control.tags else ""
query = f"{control.title} {tags_str}".strip()
results = await self.rag.search_with_rerank(
query=query,
collection="bp_compliance_ce",
top_k=15,
)
anchors: List[OpenAnchor] = []
seen: set[str] = set()
for r in results:
if not r.regulation_code:
continue
# Only accept open-source references
license_info = _classify_regulation(r.regulation_code)
if license_info.get("rule") not in _OPEN_SOURCE_RULES:
continue
# Build reference key for dedup
ref = r.article or r.category or ""
key = f"{r.regulation_code}:{ref}"
if key in seen:
continue
seen.add(key)
framework_name = license_info.get("name", r.regulation_name or r.regulation_short or r.regulation_code)
url = r.source_url or self._build_reference_url(r.regulation_code, ref)
anchors.append(OpenAnchor(
framework=framework_name,
ref=ref,
url=url,
))
if len(anchors) >= 5:
break
return anchors
async def _search_web(self, control: GeneratedControl) -> List[OpenAnchor]:
"""Search DuckDuckGo Instant Answer API for open references."""
keywords = f"{control.title} security control OWASP NIST"
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(
"https://api.duckduckgo.com/",
params={
"q": keywords,
"format": "json",
"no_html": "1",
"skip_disambig": "1",
},
)
if resp.status_code != 200:
return []
data = resp.json()
anchors: List[OpenAnchor] = []
# Parse RelatedTopics
for topic in data.get("RelatedTopics", [])[:10]:
url = topic.get("FirstURL", "")
text = topic.get("Text", "")
if not url:
continue
# Only accept known open-source domains
framework = self._identify_framework_from_url(url)
if framework:
anchors.append(OpenAnchor(
framework=framework,
ref=text[:100] if text else url,
url=url,
))
if len(anchors) >= 3:
break
return anchors
except Exception as e:
logger.warning("Web anchor search failed: %s", e)
return []
@staticmethod
def _identify_framework_from_url(url: str) -> Optional[str]:
"""Identify if a URL belongs to a known open-source framework."""
url_lower = url.lower()
if "owasp.org" in url_lower:
return "OWASP"
if "nist.gov" in url_lower or "csrc.nist.gov" in url_lower:
return "NIST"
if "enisa.europa.eu" in url_lower:
return "ENISA"
if "cisa.gov" in url_lower:
return "CISA"
if "eur-lex.europa.eu" in url_lower:
return "EU Law"
return None
@staticmethod
def _build_reference_url(regulation_code: str, ref: str) -> str:
"""Build a reference URL for known frameworks."""
code = regulation_code.lower()
if code.startswith("owasp"):
return "https://owasp.org/www-project-application-security-verification-standard/"
if code.startswith("nist"):
return "https://csrc.nist.gov/publications"
if code.startswith("enisa"):
return "https://www.enisa.europa.eu/publications"
if code.startswith("eu_"):
return "https://eur-lex.europa.eu/"
if code == "cisa_secure_by_design":
return "https://www.cisa.gov/securebydesign"
return ""

@@ -0,0 +1,245 @@
"""
Applicability Engine -- filters controls based on company profile + scope answers.
Deterministic, no LLM needed. Implements Scoped Control Applicability (Phase C2).
Filtering logic:
- Controls with NULL applicability fields are INCLUDED (apply to everyone).
- Controls with '["all"]' match all queries.
- Industry: control applies if its applicable_industries contains the requested
industry OR contains "all" OR is NULL.
- Company size: control applies if its applicable_company_size contains the
requested size OR contains "all" OR is NULL.
- Scope signals: control applies if it has NO scope_conditions, or the company
has at least one of the required signals (requires_any logic).
"""
from __future__ import annotations
import json
import logging
from typing import Any, Optional
from sqlalchemy import text
from db.session import SessionLocal
logger = logging.getLogger(__name__)
# Valid company sizes (ordered smallest to largest)
VALID_SIZES = ("micro", "small", "medium", "large", "enterprise")
def _parse_json_text(value: Any) -> Any:
"""Parse a TEXT column that stores JSON. Returns None if unparseable."""
if value is None:
return None
if isinstance(value, (list, dict)):
return value
if isinstance(value, str):
try:
return json.loads(value)
except (json.JSONDecodeError, ValueError):
return None
return None
def _matches_industry(applicable_industries_raw: Any, industry: str) -> bool:
"""Check if a control's applicable_industries matches the requested industry."""
industries = _parse_json_text(applicable_industries_raw)
if industries is None:
return True # NULL = applies to everyone
if not isinstance(industries, list):
return True # malformed = include
if "all" in industries:
return True
return industry in industries
def _matches_company_size(applicable_company_size_raw: Any, company_size: str) -> bool:
"""Check if a control's applicable_company_size matches the requested size."""
sizes = _parse_json_text(applicable_company_size_raw)
if sizes is None:
return True # NULL = applies to everyone
if not isinstance(sizes, list):
return True # malformed = include
if "all" in sizes:
return True
return company_size in sizes
def _matches_scope_signals(
scope_conditions_raw: Any, scope_signals: list[str]
) -> bool:
"""Check if a control's scope_conditions are satisfied by the given signals.
A control with scope_conditions = {"requires_any": ["uses_ai", "processes_health_data"]}
matches if the company has at least one of those signals.
A control with NULL or empty scope_conditions always matches.
"""
conditions = _parse_json_text(scope_conditions_raw)
if conditions is None:
return True # no conditions = applies to everyone
if not isinstance(conditions, dict):
return True # malformed = include
requires_any = conditions.get("requires_any", [])
if not requires_any:
return True # no required signals = applies to everyone
# Company must have at least one of the required signals
return bool(set(requires_any) & set(scope_signals))
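All three matchers above default to inclusive (NULL or malformed metadata includes the control); only `requires_any` can exclude. The rule condensed, under the same semantics:

```python
from typing import Optional

def matches_scope_signals(conditions: Optional[dict], signals: list) -> bool:
    # NULL / empty conditions: control applies to everyone.
    if not conditions or not conditions.get("requires_any"):
        return True
    # requires_any: one overlapping company signal is enough.
    return bool(set(conditions["requires_any"]) & set(signals))

cond = {"requires_any": ["uses_ai", "processes_health_data"]}
print(matches_scope_signals(cond, ["uses_ai", "third_country_transfer"]))  # True
print(matches_scope_signals(cond, ["third_country_transfer"]))             # False
print(matches_scope_signals(None, []))                                     # True
```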
def get_applicable_controls(
db,
industry: Optional[str] = None,
company_size: Optional[str] = None,
scope_signals: Optional[list[str]] = None,
limit: int = 100,
offset: int = 0,
) -> dict[str, Any]:
"""
Returns controls applicable to the given company profile.
Uses SQL pre-filtering with LIKE for performance, then Python post-filtering
for precise JSON matching (since columns are TEXT, not JSONB).
Args:
db: SQLAlchemy session
industry: e.g. "Telekommunikation", "Energie", "Gesundheitswesen"
company_size: e.g. "medium", "large", "enterprise"
scope_signals: e.g. ["uses_ai", "third_country_transfer"]
limit: max results to return (applied after filtering)
offset: pagination offset (applied after filtering)
Returns:
dict with total_applicable count, paginated controls, and breakdown stats
"""
if scope_signals is None:
scope_signals = []
# SQL pre-filter: broad match to reduce Python-side filtering
query = """
SELECT id, framework_id, control_id, title, objective, rationale,
scope, requirements, test_procedure, evidence,
severity, risk_score, implementation_effort,
evidence_confidence, open_anchors, release_state, tags,
license_rule, source_original_text, source_citation,
customer_visible, verification_method, category, evidence_type,
target_audience, generation_metadata, generation_strategy,
applicable_industries, applicable_company_size, scope_conditions,
parent_control_uuid, decomposition_method, pipeline_version,
created_at, updated_at
FROM canonical_controls
WHERE release_state NOT IN ('duplicate', 'deprecated', 'rejected')
"""
params: dict[str, Any] = {}
# SQL-level pre-filtering (broad, may include false positives)
if industry:
query += """ AND (applicable_industries IS NULL
OR applicable_industries LIKE '%"all"%'
OR applicable_industries LIKE '%' || :industry || '%')"""
params["industry"] = industry
if company_size:
query += """ AND (applicable_company_size IS NULL
OR applicable_company_size LIKE '%"all"%'
OR applicable_company_size LIKE '%' || :company_size || '%')"""
params["company_size"] = company_size
# For scope_signals we cannot do precise SQL filtering on requires_any,
# but we can at least exclude controls whose scope_conditions text
# does not contain any of the requested signals (if only 1 signal).
# With multiple signals we skip SQL pre-filter and do it in Python.
if scope_signals and len(scope_signals) == 1:
query += """ AND (scope_conditions IS NULL
OR scope_conditions LIKE '%' || :scope_sig || '%')"""
params["scope_sig"] = scope_signals[0]
query += " ORDER BY control_id"
rows = db.execute(text(query), params).fetchall()
# Python-level precise filtering
applicable = []
for r in rows:
if industry and not _matches_industry(r.applicable_industries, industry):
continue
if company_size and not _matches_company_size(
r.applicable_company_size, company_size
):
continue
if scope_signals and not _matches_scope_signals(
r.scope_conditions, scope_signals
):
continue
applicable.append(r)
total_applicable = len(applicable)
# Apply pagination
paginated = applicable[offset : offset + limit]
# Build domain breakdown
domain_counts: dict[str, int] = {}
for r in applicable:
domain = r.control_id.split("-")[0].upper() if r.control_id else "UNKNOWN"
domain_counts[domain] = domain_counts.get(domain, 0) + 1
# Build severity breakdown
severity_counts: dict[str, int] = {}
for r in applicable:
sev = r.severity or "unknown"
severity_counts[sev] = severity_counts.get(sev, 0) + 1
# Build industry breakdown (from matched controls)
industry_counts: dict[str, int] = {}
for r in applicable:
industries = _parse_json_text(r.applicable_industries)
if isinstance(industries, list):
for ind in industries:
industry_counts[ind] = industry_counts.get(ind, 0) + 1
else:
industry_counts["unclassified"] = (
industry_counts.get("unclassified", 0) + 1
)
return {
"total_applicable": total_applicable,
"limit": limit,
"offset": offset,
"controls": [_row_to_control(r) for r in paginated],
"breakdown": {
"by_domain": domain_counts,
"by_severity": severity_counts,
"by_industry": industry_counts,
},
}
def _row_to_control(r) -> dict[str, Any]:
"""Convert a DB row to a control dict for API response."""
return {
"id": str(r.id),
"framework_id": str(r.framework_id),
"control_id": r.control_id,
"title": r.title,
"objective": r.objective,
"rationale": r.rationale,
"severity": r.severity,
"category": r.category,
"verification_method": r.verification_method,
"evidence_type": getattr(r, "evidence_type", None),
"target_audience": r.target_audience,
"applicable_industries": r.applicable_industries,
"applicable_company_size": r.applicable_company_size,
"scope_conditions": r.scope_conditions,
"release_state": r.release_state,
"control_id_domain": (
r.control_id.split("-")[0].upper() if r.control_id else None
),
"created_at": r.created_at.isoformat() if r.created_at else None,
"updated_at": r.updated_at.isoformat() if r.updated_at else None,
}
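The SQL LIKE pre-filter is deliberately broad and the Python pass makes it precise: `LIKE '%Energie%'` also matches a control scoped to a longer value, which exact list membership then rejects. A sketch of that two-stage behavior (`"Energiewirtschaft"` is a hypothetical stored value, not from the module):

```python
import json
from typing import Optional

def sql_prefilter_matches(raw: Optional[str], industry: str) -> bool:
    # Mirrors the SQL: IS NULL OR LIKE '%"all"%' OR LIKE '%<industry>%'.
    return raw is None or '"all"' in raw or industry in raw

def python_matches(raw: Optional[str], industry: str) -> bool:
    # Precise pass: exact membership in the parsed JSON list.
    if raw is None:
        return True
    industries = json.loads(raw)
    return "all" in industries or industry in industries

raw = '["Energiewirtschaft"]'  # hypothetical stored value
print(sql_prefilter_matches(raw, "Energie"))  # True  (substring false positive)
print(python_matches(raw, "Energie"))         # False (rejected by exact match)
```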

@@ -0,0 +1,631 @@
"""Batch Dedup Runner — Orchestrates deduplication of ~85k atomare Controls.
Reduces Pass 0b controls from ~85k to ~18-25k unique Master Controls via:
Phase 1: Intra-Group Dedup — same merge_group_hint → pick best, link rest
(85k → ~52k, mostly title-identical short-circuit, no embeddings)
Phase 2: Cross-Group Dedup — embed masters, search Qdrant for similar
masters with different hints (52k → ~18-25k)
All Pass 0b controls have pattern_id=NULL. The primary grouping key is
merge_group_hint (format: "action_type:norm_obj:trigger_key"), which
encodes the normalized action, object, and trigger.
Usage:
runner = BatchDedupRunner(db)
stats = await runner.run(dry_run=True) # preview
stats = await runner.run(dry_run=False) # execute
stats = await runner.run(hint_filter="implement:multi_factor_auth:none")
"""
import asyncio
import json
import logging
import time
from collections import defaultdict
from sqlalchemy import text
from services.control_dedup import (
canonicalize_text,
ensure_qdrant_collection,
get_embedding,
normalize_action,
normalize_object,
qdrant_search_cross_regulation,
qdrant_upsert,
LINK_THRESHOLD,
REVIEW_THRESHOLD,
)
logger = logging.getLogger(__name__)
DEDUP_COLLECTION = "atomic_controls_dedup"
# ── Quality Score ────────────────────────────────────────────────────────
def quality_score(control: dict) -> float:
"""Score a control by richness of requirements, tests, evidence, and objective.
Higher score = better candidate for master control.
"""
score = 0.0
reqs = control.get("requirements") or "[]"
if isinstance(reqs, str):
try:
reqs = json.loads(reqs)
except (json.JSONDecodeError, TypeError):
reqs = []
score += len(reqs) * 2.0
tests = control.get("test_procedure") or "[]"
if isinstance(tests, str):
try:
tests = json.loads(tests)
except (json.JSONDecodeError, TypeError):
tests = []
score += len(tests) * 1.5
evidence = control.get("evidence") or "[]"
if isinstance(evidence, str):
try:
evidence = json.loads(evidence)
except (json.JSONDecodeError, TypeError):
evidence = []
score += len(evidence) * 1.0
objective = control.get("objective") or ""
score += min(len(objective) / 200, 3.0)
return score
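Under this weighting, a control with three requirements, two test steps, one evidence item, and a 200-character objective scores 3·2.0 + 2·1.5 + 1·1.0 + 1.0 = 11.0. A quick check against a condensed copy of the scorer:

```python
import json

def quality_score(control: dict) -> float:
    def _count(key):
        # JSON-encoded TEXT columns; unparseable values count as empty.
        raw = control.get(key) or "[]"
        try:
            return len(json.loads(raw)) if isinstance(raw, str) else len(raw)
        except (json.JSONDecodeError, TypeError):
            return 0
    score = _count("requirements") * 2.0
    score += _count("test_procedure") * 1.5
    score += _count("evidence") * 1.0
    score += min(len(control.get("objective") or "") / 200, 3.0)  # capped at 3.0
    return score

ctrl = {
    "requirements": json.dumps(["r1", "r2", "r3"]),
    "test_procedure": json.dumps(["t1", "t2"]),
    "evidence": json.dumps(["e1"]),
    "objective": "x" * 200,
}
print(quality_score(ctrl))  # 11.0
```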
# ── Batch Dedup Runner ───────────────────────────────────────────────────
class BatchDedupRunner:
"""Batch dedup orchestrator for existing Pass 0b atomic controls."""
def __init__(self, db, collection: str = DEDUP_COLLECTION):
self.db = db
self.collection = collection
self.stats = {
"total_controls": 0,
"unique_hints": 0,
"phase1_groups_processed": 0,
"masters": 0,
"linked": 0,
"review": 0,
"new_controls": 0,
"parent_links_transferred": 0,
"cross_group_linked": 0,
"cross_group_review": 0,
"errors": 0,
"skipped_title_identical": 0,
}
self._progress_phase = ""
self._progress_count = 0
self._progress_total = 0
async def run(
self,
dry_run: bool = False,
hint_filter: str = None,
) -> dict:
"""Run the full batch dedup pipeline.
Args:
dry_run: If True, compute stats but don't modify DB/Qdrant.
hint_filter: If set, only process groups matching this hint prefix.
Returns:
Stats dict with counts.
"""
start = time.monotonic()
logger.info("BatchDedup starting (dry_run=%s, hint_filter=%s)",
dry_run, hint_filter)
if not dry_run:
await ensure_qdrant_collection(collection=self.collection)
# Phase 1: Intra-group dedup (same merge_group_hint)
self._progress_phase = "phase1"
groups = self._load_merge_groups(hint_filter)
self._progress_total = self.stats["total_controls"]
for hint, controls in groups:
try:
await self._process_hint_group(hint, controls, dry_run)
self.stats["phase1_groups_processed"] += 1
except Exception as e:
logger.error("BatchDedup Phase 1 error on hint %s: %s", hint, e)
self.stats["errors"] += 1
try:
self.db.rollback()
except Exception:
pass
logger.info(
"BatchDedup Phase 1 done: %d masters, %d linked, %d review",
self.stats["masters"], self.stats["linked"], self.stats["review"],
)
# Phase 2: Cross-group dedup via embeddings
if not dry_run:
self._progress_phase = "phase2"
await self._run_cross_group_pass()
elapsed = time.monotonic() - start
self.stats["elapsed_seconds"] = round(elapsed, 1)
logger.info("BatchDedup completed in %.1fs: %s", elapsed, self.stats)
return self.stats
def _load_merge_groups(self, hint_filter: str = None) -> list:
"""Load all Pass 0b controls grouped by merge_group_hint, largest first."""
conditions = [
"decomposition_method = 'pass0b'",
"release_state != 'deprecated'",
"release_state != 'duplicate'",
]
params = {}
if hint_filter:
conditions.append("generation_metadata->>'merge_group_hint' LIKE :hf")
params["hf"] = f"{hint_filter}%"
where = " AND ".join(conditions)
rows = self.db.execute(text(f"""
SELECT id::text, control_id, title, objective,
pattern_id, requirements::text, test_procedure::text,
evidence::text, release_state,
generation_metadata->>'merge_group_hint' as merge_group_hint,
generation_metadata->>'action_object_class' as action_object_class
FROM canonical_controls
WHERE {where}
ORDER BY control_id
"""), params).fetchall()
by_hint = defaultdict(list)
for r in rows:
by_hint[r[9] or ""].append({
"uuid": r[0],
"control_id": r[1],
"title": r[2],
"objective": r[3],
"pattern_id": r[4],
"requirements": r[5],
"test_procedure": r[6],
"evidence": r[7],
"release_state": r[8],
"merge_group_hint": r[9] or "",
"action_object_class": r[10] or "",
})
self.stats["total_controls"] = len(rows)
self.stats["unique_hints"] = len(by_hint)
sorted_groups = sorted(by_hint.items(), key=lambda x: len(x[1]), reverse=True)
logger.info("BatchDedup loaded %d controls in %d hint groups",
len(rows), len(sorted_groups))
return sorted_groups
def _sub_group_by_merge_hint(self, controls: list) -> dict:
"""Group controls by merge_group_hint composite key."""
groups = defaultdict(list)
for c in controls:
hint = c["merge_group_hint"]
if hint:
groups[hint].append(c)
else:
groups[f"__no_hint_{c['uuid']}"].append(c)
return dict(groups)
async def _process_hint_group(
self,
hint: str,
controls: list,
dry_run: bool,
):
"""Process all controls sharing the same merge_group_hint.
Within a hint group, all controls share action+object+trigger.
The best-quality control becomes master, rest are linked as duplicates.
"""
if len(controls) < 2:
# Singleton → always master
self.stats["masters"] += 1
if not dry_run:
await self._embed_and_index(controls[0])
self._progress_count += 1
self._log_progress(hint)
return
# Sort by quality score (best first)
sorted_group = sorted(controls, key=quality_score, reverse=True)
master = sorted_group[0]
self.stats["masters"] += 1
if not dry_run:
await self._embed_and_index(master)
for candidate in sorted_group[1:]:
# All share the same hint → check title similarity
if candidate["title"].strip().lower() == master["title"].strip().lower():
# Identical title → direct link (no embedding needed)
self.stats["linked"] += 1
self.stats["skipped_title_identical"] += 1
if not dry_run:
await self._mark_duplicate(master, candidate, confidence=1.0)
else:
# Different title within same hint → still likely duplicate
# Use embedding to verify
await self._check_and_link_within_group(master, candidate, dry_run)
self._progress_count += 1
self._log_progress(hint)
async def _check_and_link_within_group(
self,
master: dict,
candidate: dict,
dry_run: bool,
):
"""Check if candidate (same hint group) is duplicate of master via embedding."""
parts = candidate["merge_group_hint"].split(":", 2)
action = parts[0] if len(parts) > 0 else ""
obj = parts[1] if len(parts) > 1 else ""
canonical = canonicalize_text(action, obj, candidate["title"])
embedding = await get_embedding(canonical)
if not embedding:
# Can't embed → link anyway (same hint = same action+object)
self.stats["linked"] += 1
if not dry_run:
await self._mark_duplicate(master, candidate, confidence=0.90)
return
# Search the dedup collection (unfiltered — pattern_id is NULL)
results = await qdrant_search_cross_regulation(
embedding, top_k=3, collection=self.collection,
)
if not results:
# No Qdrant matches yet (master might not be indexed yet) → link to master
self.stats["linked"] += 1
if not dry_run:
await self._mark_duplicate(master, candidate, confidence=0.90)
return
best = results[0]
best_score = best.get("score", 0.0)
best_payload = best.get("payload", {})
best_uuid = best_payload.get("control_uuid", "")
if best_score > LINK_THRESHOLD:
self.stats["linked"] += 1
if not dry_run:
await self._mark_duplicate_to(best_uuid, candidate, confidence=best_score)
elif best_score > REVIEW_THRESHOLD:
self.stats["review"] += 1
if not dry_run:
self._write_review(candidate, best_payload, best_score)
else:
# Very different despite same hint → new master
self.stats["new_controls"] += 1
if not dry_run:
await self._index_with_embedding(candidate, embedding)
async def _run_cross_group_pass(self):
"""Phase 2: Find cross-group duplicates among surviving masters.
After Phase 1, ~52k masters remain. Many have similar semantics
despite different merge_group_hints (e.g. different German spellings).
This pass embeds all masters and finds near-duplicates via Qdrant.
"""
logger.info("BatchDedup Phase 2: Cross-group pass starting...")
rows = self.db.execute(text("""
SELECT id::text, control_id, title,
generation_metadata->>'merge_group_hint' as merge_group_hint
FROM canonical_controls
WHERE decomposition_method = 'pass0b'
AND release_state != 'duplicate'
AND release_state != 'deprecated'
ORDER BY control_id
""")).fetchall()
self._progress_total = len(rows)
self._progress_count = 0
logger.info("BatchDedup Cross-group: %d masters to check", len(rows))
cross_linked = 0
cross_review = 0
# Process in parallel batches for embedding + Qdrant search
PARALLEL_BATCH = 10
async def _embed_and_search(r):
"""Embed one control and search Qdrant — safe for asyncio.gather."""
hint = r[3] or ""
parts = hint.split(":", 2)
action = parts[0] if len(parts) > 0 else ""
obj = parts[1] if len(parts) > 1 else ""
canonical = canonicalize_text(action, obj, r[2])
embedding = await get_embedding(canonical)
if not embedding:
return None
results = await qdrant_search_cross_regulation(
embedding, top_k=5, collection=self.collection,
)
return (r, results)
for batch_start in range(0, len(rows), PARALLEL_BATCH):
batch = rows[batch_start:batch_start + PARALLEL_BATCH]
tasks = [_embed_and_search(r) for r in batch]
results_batch = await asyncio.gather(*tasks, return_exceptions=True)
for res in results_batch:
if res is None or isinstance(res, Exception):
if isinstance(res, Exception):
logger.error("BatchDedup embed/search error: %s", res)
self.stats["errors"] += 1
continue
r, results = res
ctrl_uuid = r[0]
hint = r[3] or ""
if not results:
continue
for match in results:
match_score = match.get("score", 0.0)
match_payload = match.get("payload", {})
match_uuid = match_payload.get("control_uuid", "")
if match_uuid == ctrl_uuid:
continue
if match_score > LINK_THRESHOLD:
try:
self.db.execute(text("""
UPDATE canonical_controls
SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
WHERE id = CAST(:dup AS uuid)
AND release_state != 'duplicate'
"""), {"master": match_uuid, "dup": ctrl_uuid})
self.db.execute(text("""
INSERT INTO control_parent_links
(control_uuid, parent_control_uuid, link_type, confidence)
VALUES (CAST(:cu AS uuid), CAST(:pu AS uuid), 'cross_regulation', :conf)
ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
"""), {"cu": match_uuid, "pu": ctrl_uuid, "conf": match_score})
transferred = self._transfer_parent_links(match_uuid, ctrl_uuid)
self.stats["parent_links_transferred"] += transferred
self.db.commit()
cross_linked += 1
except Exception as e:
logger.error("BatchDedup cross-group link error %s%s: %s",
ctrl_uuid, match_uuid, e)
self.db.rollback()
self.stats["errors"] += 1
break
elif match_score > REVIEW_THRESHOLD:
self._write_review(
{"control_id": r[1], "title": r[2], "objective": "",
"merge_group_hint": hint, "pattern_id": None},
match_payload, match_score,
)
cross_review += 1
break
processed = min(batch_start + PARALLEL_BATCH, len(rows))
self._progress_count = processed
if processed % 500 < PARALLEL_BATCH:
logger.info("BatchDedup Cross-group: %d/%d checked, %d linked, %d review",
processed, len(rows), cross_linked, cross_review)
self.stats["cross_group_linked"] = cross_linked
self.stats["cross_group_review"] = cross_review
logger.info("BatchDedup Cross-group complete: %d linked, %d review",
cross_linked, cross_review)
# ── Qdrant Helpers ───────────────────────────────────────────────────
async def _embed_and_index(self, control: dict):
"""Compute embedding and index a control in the dedup Qdrant collection."""
parts = control["merge_group_hint"].split(":", 2)
action = parts[0] if len(parts) > 0 else ""
obj = parts[1] if len(parts) > 1 else ""
norm_action = normalize_action(action)
norm_object = normalize_object(obj)
canonical = canonicalize_text(action, obj, control["title"])
embedding = await get_embedding(canonical)
if not embedding:
return
await qdrant_upsert(
point_id=control["uuid"],
embedding=embedding,
payload={
"control_uuid": control["uuid"],
"control_id": control["control_id"],
"title": control["title"],
"pattern_id": control.get("pattern_id"),
"action_normalized": norm_action,
"object_normalized": norm_object,
"canonical_text": canonical,
"merge_group_hint": control["merge_group_hint"],
},
collection=self.collection,
)
async def _index_with_embedding(self, control: dict, embedding: list):
"""Index a control with a pre-computed embedding."""
parts = control["merge_group_hint"].split(":", 2)
action = parts[0] if len(parts) > 0 else ""
obj = parts[1] if len(parts) > 1 else ""
norm_action = normalize_action(action)
norm_object = normalize_object(obj)
canonical = canonicalize_text(action, obj, control["title"])
await qdrant_upsert(
point_id=control["uuid"],
embedding=embedding,
payload={
"control_uuid": control["uuid"],
"control_id": control["control_id"],
"title": control["title"],
"pattern_id": control.get("pattern_id"),
"action_normalized": norm_action,
"object_normalized": norm_object,
"canonical_text": canonical,
"merge_group_hint": control["merge_group_hint"],
},
collection=self.collection,
)
# ── DB Write Helpers ─────────────────────────────────────────────────
async def _mark_duplicate(self, master: dict, candidate: dict, confidence: float):
"""Mark candidate as duplicate of master, transfer parent links."""
try:
self.db.execute(text("""
UPDATE canonical_controls
SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
WHERE id = CAST(:cand AS uuid)
"""), {"master": master["uuid"], "cand": candidate["uuid"]})
self.db.execute(text("""
INSERT INTO control_parent_links
(control_uuid, parent_control_uuid, link_type, confidence)
VALUES (CAST(:master AS uuid), CAST(:cand_parent AS uuid), 'dedup_merge', :conf)
ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
"""), {"master": master["uuid"], "cand_parent": candidate["uuid"], "conf": confidence})
transferred = self._transfer_parent_links(master["uuid"], candidate["uuid"])
self.stats["parent_links_transferred"] += transferred
self.db.commit()
except Exception as e:
logger.error("BatchDedup _mark_duplicate error %s%s: %s",
candidate["uuid"], master["uuid"], e)
self.db.rollback()
raise
async def _mark_duplicate_to(self, master_uuid: str, candidate: dict, confidence: float):
"""Mark candidate as duplicate of a Qdrant-matched master."""
try:
self.db.execute(text("""
UPDATE canonical_controls
SET release_state = 'duplicate', merged_into_uuid = CAST(:master AS uuid)
WHERE id = CAST(:cand AS uuid)
"""), {"master": master_uuid, "cand": candidate["uuid"]})
self.db.execute(text("""
INSERT INTO control_parent_links
(control_uuid, parent_control_uuid, link_type, confidence)
VALUES (CAST(:master AS uuid), CAST(:cand_parent AS uuid), 'dedup_merge', :conf)
ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
"""), {"master": master_uuid, "cand_parent": candidate["uuid"], "conf": confidence})
transferred = self._transfer_parent_links(master_uuid, candidate["uuid"])
self.stats["parent_links_transferred"] += transferred
self.db.commit()
except Exception as e:
logger.error("BatchDedup _mark_duplicate_to error %s%s: %s",
candidate["uuid"], master_uuid, e)
self.db.rollback()
raise
def _transfer_parent_links(self, master_uuid: str, duplicate_uuid: str) -> int:
"""Move existing parent links from duplicate to master."""
rows = self.db.execute(text("""
SELECT parent_control_uuid::text, link_type, confidence,
source_regulation, source_article, obligation_candidate_id::text
FROM control_parent_links
WHERE control_uuid = CAST(:dup AS uuid)
AND link_type = 'decomposition'
"""), {"dup": duplicate_uuid}).fetchall()
transferred = 0
for r in rows:
parent_uuid = r[0]
if parent_uuid == master_uuid:
continue
self.db.execute(text("""
INSERT INTO control_parent_links
(control_uuid, parent_control_uuid, link_type, confidence,
source_regulation, source_article, obligation_candidate_id)
VALUES (CAST(:cu AS uuid), CAST(:pu AS uuid), :lt, :conf,
:sr, :sa, CAST(:oci AS uuid))
ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
"""), {
"cu": master_uuid,
"pu": parent_uuid,
"lt": r[1],
"conf": float(r[2]) if r[2] else 1.0,
"sr": r[3],
"sa": r[4],
"oci": r[5],
})
transferred += 1
return transferred
def _write_review(self, candidate: dict, matched_payload: dict, score: float):
"""Write a dedup review entry for borderline matches."""
try:
self.db.execute(text("""
INSERT INTO control_dedup_reviews
(candidate_control_id, candidate_title, candidate_objective,
matched_control_uuid, matched_control_id,
similarity_score, dedup_stage, dedup_details)
VALUES (:ccid, :ct, :co, CAST(:mcu AS uuid), :mci,
:ss, 'batch_dedup', CAST(:dd AS jsonb))
"""), {
"ccid": candidate["control_id"],
"ct": candidate["title"],
"co": candidate.get("objective", ""),
"mcu": matched_payload.get("control_uuid"),
"mci": matched_payload.get("control_id"),
"ss": score,
"dd": json.dumps({
"merge_group_hint": candidate.get("merge_group_hint", ""),
"pattern_id": candidate.get("pattern_id"),
}),
})
self.db.commit()
except Exception as e:
logger.error("BatchDedup _write_review error: %s", e)
self.db.rollback()
raise
# ── Progress ─────────────────────────────────────────────────────────
def _log_progress(self, hint: str):
"""Log progress every 500 controls."""
if self._progress_count > 0 and self._progress_count % 500 == 0:
logger.info(
"BatchDedup [%s] %d/%d — masters=%d, linked=%d, review=%d",
self._progress_phase, self._progress_count, self._progress_total,
self.stats["masters"], self.stats["linked"], self.stats["review"],
)
def get_status(self) -> dict:
"""Return current progress stats (for status endpoint)."""
return {
"phase": self._progress_phase,
"progress": self._progress_count,
"total": self._progress_total,
**self.stats,
}
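The link-transfer rule in `_transfer_parent_links` can be sketched in memory (a hypothetical standalone helper, not part of the service): links from the duplicate are re-pointed at the master, self-links are skipped, and pairs that already exist are left untouched, mirroring the `ON CONFLICT DO NOTHING` semantics of the SQL above.

```python
# In-memory sketch of _transfer_parent_links. links is a set of
# (control_uuid, parent_uuid) tuples standing in for control_parent_links.
def transfer_links(master: str, duplicate: str, links: set) -> int:
    transferred = 0
    for control, parent in sorted(links):  # sorted() snapshots the set
        if control != duplicate:
            continue
        if parent == master:               # would create a self-link — skip
            continue
        if (master, parent) in links:      # already linked — DO NOTHING
            continue
        links.add((master, parent))
        transferred += 1
    return transferred

links = {("dup", "p1"), ("dup", "master"), ("master", "p2"), ("dup", "p2")}
moved = transfer_links("master", "dup", links)
```

Only `("dup", "p1")` is actually moved here: the self-link and the already-present `("master", "p2")` pair are skipped.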


@@ -0,0 +1,438 @@
"""
Citation Backfill Service — enrich existing controls with article/paragraph provenance.
3-tier matching strategy:
Tier 1 — Hash match: sha256(source_original_text) → RAG chunk lookup
Tier 2 — Regex parse: split concatenated "DSGVO Art. 35" → regulation + article
Tier 3 — Ollama LLM: ask local LLM to identify article/paragraph from text
"""
import hashlib
import json
import logging
import os
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional
import httpx
from sqlalchemy import text
from sqlalchemy.orm import Session
from .rag_client import ComplianceRAGClient, RAGSearchResult
logger = logging.getLogger(__name__)
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")
LLM_TIMEOUT = float(os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180"))
ALL_COLLECTIONS = [
"bp_compliance_ce",
"bp_compliance_gesetze",
"bp_compliance_datenschutz",
"bp_dsfa_corpus",
"bp_legal_templates",
]
BACKFILL_SYSTEM_PROMPT = (
"Du bist ein Rechtsexperte. Deine Aufgabe ist es, aus einem Gesetzestext "
"den genauen Artikel und Absatz zu bestimmen. Antworte NUR mit validem JSON."
)
# Regex to split concatenated source like "DSGVO Art. 35" or "NIS2 Artikel 21 Abs. 2"
_SOURCE_ARTICLE_RE = re.compile(
r"^(.+?)\s+(Art(?:ikel)?\.?\s*\d+.*)$", re.IGNORECASE
)
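The Tier-2 regex above splits a concatenated source string into regulation name and article reference; a quick standalone check of the same pattern:

```python
import re

# Same pattern as _SOURCE_ARTICLE_RE: a regulation name, whitespace,
# then "Art."/"Artikel" plus a number and any trailing detail.
SOURCE_ARTICLE_RE = re.compile(
    r"^(.+?)\s+(Art(?:ikel)?\.?\s*\d+.*)$", re.IGNORECASE
)

m = SOURCE_ARTICLE_RE.match("NIS2 Artikel 21 Abs. 2")
name, article = m.group(1), m.group(2)
```

The lazy first group keeps the regulation name minimal, so the full article tail ends up in group 2.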
@dataclass
class MatchResult:
article: str
paragraph: str
method: str # "hash", "regex", "llm"
@dataclass
class BackfillResult:
total_controls: int = 0
matched_hash: int = 0
matched_regex: int = 0
matched_llm: int = 0
unmatched: int = 0
updated: int = 0
errors: list = field(default_factory=list)
class CitationBackfill:
"""Backfill article/paragraph into existing control source_citations."""
def __init__(self, db: Session, rag_client: ComplianceRAGClient):
self.db = db
self.rag = rag_client
self._rag_index: dict[str, RAGSearchResult] = {}
async def run(self, dry_run: bool = True, limit: int = 0) -> BackfillResult:
"""Main entry: iterate controls missing article/paragraph, match to RAG, update."""
result = BackfillResult()
# Load controls needing backfill
controls = self._load_controls_needing_backfill(limit)
result.total_controls = len(controls)
logger.info("Backfill: %d controls need article/paragraph enrichment", len(controls))
if not controls:
return result
# Collect hashes we need to find — only build index for controls with source text
needed_hashes: set[str] = set()
for ctrl in controls:
src = ctrl.get("source_original_text")
if src:
needed_hashes.add(hashlib.sha256(src.encode()).hexdigest())
if needed_hashes:
# Build targeted RAG index — only scroll collections that our controls reference
logger.info("Building targeted RAG hash index for %d source texts...", len(needed_hashes))
await self._build_rag_index_targeted(controls)
logger.info("RAG index built: %d chunks indexed, %d hashes needed", len(self._rag_index), len(needed_hashes))
else:
logger.info("No source_original_text found — skipping RAG index build")
# Process each control
for i, ctrl in enumerate(controls):
if i > 0 and i % 100 == 0:
logger.info("Backfill progress: %d/%d processed", i, result.total_controls)
try:
match = await self._match_control(ctrl)
if match:
if match.method == "hash":
result.matched_hash += 1
elif match.method == "regex":
result.matched_regex += 1
elif match.method == "llm":
result.matched_llm += 1
if not dry_run:
self._update_control(ctrl, match)
result.updated += 1
else:
logger.debug(
"DRY RUN: Would update %s with article=%s paragraph=%s (method=%s)",
ctrl["control_id"], match.article, match.paragraph, match.method,
)
else:
result.unmatched += 1
except Exception as e:
error_msg = f"Error backfilling {ctrl.get('control_id', '?')}: {e}"
logger.error(error_msg)
result.errors.append(error_msg)
if not dry_run:
try:
self.db.commit()
except Exception as e:
logger.error("Backfill commit failed: %s", e)
result.errors.append(f"Commit failed: {e}")
logger.info(
"Backfill complete: %d total, hash=%d regex=%d llm=%d unmatched=%d updated=%d",
result.total_controls, result.matched_hash, result.matched_regex,
result.matched_llm, result.unmatched, result.updated,
)
return result
def _load_controls_needing_backfill(self, limit: int = 0) -> list[dict]:
"""Load controls where source_citation exists but lacks separate 'article' key."""
query = """
SELECT id, control_id, source_citation, source_original_text,
generation_metadata, license_rule
FROM canonical_controls
WHERE license_rule IN (1, 2)
AND source_citation IS NOT NULL
AND (
source_citation->>'article' IS NULL
OR source_citation->>'article' = ''
)
ORDER BY control_id
"""
if limit > 0:
query += f" LIMIT {limit}"
result = self.db.execute(text(query))
cols = result.keys()
controls = []
for row in result:
ctrl = dict(zip(cols, row))
ctrl["id"] = str(ctrl["id"])
# Parse JSON fields
for jf in ("source_citation", "generation_metadata"):
if isinstance(ctrl.get(jf), str):
try:
ctrl[jf] = json.loads(ctrl[jf])
except (json.JSONDecodeError, TypeError):
ctrl[jf] = {}
controls.append(ctrl)
return controls
async def _build_rag_index_targeted(self, controls: list[dict]):
"""Build RAG index by scrolling only collections relevant to our controls.
Uses regulation codes from generation_metadata to identify which collections
to search, falling back to all collections only if needed.
"""
# Determine which collections are relevant based on regulation codes
regulation_to_collection = self._map_regulations_to_collections(controls)
collections_to_search = set(regulation_to_collection.values()) or set(ALL_COLLECTIONS)
logger.info("Targeted index: searching %d collections: %s",
len(collections_to_search), ", ".join(collections_to_search))
for collection in collections_to_search:
offset = None
page = 0
seen_offsets: set[str] = set()
while True:
chunks, next_offset = await self.rag.scroll(
collection=collection, offset=offset, limit=200,
)
if not chunks:
break
for chunk in chunks:
if chunk.text and len(chunk.text.strip()) >= 50:
h = hashlib.sha256(chunk.text.encode()).hexdigest()
self._rag_index[h] = chunk
page += 1
if page % 50 == 0:
logger.info("Indexing %s: page %d (%d chunks so far)",
collection, page, len(self._rag_index))
if not next_offset:
break
if next_offset in seen_offsets:
logger.warning("Scroll loop in %s at page %d — stopping", collection, page)
break
seen_offsets.add(next_offset)
offset = next_offset
logger.info("Indexed collection %s: %d pages", collection, page)
def _map_regulations_to_collections(self, controls: list[dict]) -> dict[str, str]:
"""Map regulation codes from controls to likely Qdrant collections."""
# Heuristic: regulation code prefix → collection
collection_map = {
"eu_": "bp_compliance_gesetze",
"dsgvo": "bp_compliance_datenschutz",
"bdsg": "bp_compliance_gesetze",
"ttdsg": "bp_compliance_gesetze",
"nist_": "bp_compliance_ce",
"owasp": "bp_compliance_ce",
"bsi_": "bp_compliance_ce",
"enisa": "bp_compliance_ce",
"at_": "bp_compliance_recht",
"fr_": "bp_compliance_recht",
"es_": "bp_compliance_recht",
}
result: dict[str, str] = {}
for ctrl in controls:
meta = ctrl.get("generation_metadata") or {}
reg = meta.get("source_regulation", "")
if not reg:
continue
for prefix, coll in collection_map.items():
if reg.startswith(prefix):
result[reg] = coll
break
else:
# Unknown regulation — search all
for coll in ALL_COLLECTIONS:
result[f"_all_{coll}"] = coll
return result
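The prefix heuristic above can be condensed to a minimal sketch (a subset of the real map, with a shortened fallback list for illustration): the first matching prefix wins, and unknown regulation codes fall back to searching every collection.

```python
# Condensed sketch of _map_regulations_to_collections' prefix heuristic.
COLLECTION_MAP = {
    "eu_": "bp_compliance_gesetze",
    "dsgvo": "bp_compliance_datenschutz",
    "nist_": "bp_compliance_ce",
}
ALL = ["bp_compliance_ce", "bp_compliance_gesetze"]

def map_regulation(reg: str) -> list:
    for prefix, coll in COLLECTION_MAP.items():
        if reg.startswith(prefix):
            return [coll]          # first matching prefix wins
    return list(ALL)               # unknown regulation — search everything

targeted = map_regulation("dsgvo")
fallback = map_regulation("iso_27001")
```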
async def _match_control(self, ctrl: dict) -> Optional[MatchResult]:
"""3-tier matching: hash → regex → LLM."""
# Tier 1: Hash match against RAG index
source_text = ctrl.get("source_original_text")
if source_text:
h = hashlib.sha256(source_text.encode()).hexdigest()
chunk = self._rag_index.get(h)
if chunk and (chunk.article or chunk.paragraph):
return MatchResult(
article=chunk.article or "",
paragraph=chunk.paragraph or "",
method="hash",
)
# Tier 2: Regex parse concatenated source
citation = ctrl.get("source_citation") or {}
source_str = citation.get("source", "")
parsed = _parse_concatenated_source(source_str)
if parsed and parsed["article"]:
return MatchResult(
article=parsed["article"],
paragraph="", # Regex can't extract paragraph from concatenated format
method="regex",
)
# Tier 3: Ollama LLM
if source_text:
return await self._llm_match(ctrl)
return None
async def _llm_match(self, ctrl: dict) -> Optional[MatchResult]:
"""Use Ollama to identify article/paragraph from source text."""
citation = ctrl.get("source_citation") or {}
regulation_name = citation.get("source", "")
metadata = ctrl.get("generation_metadata") or {}
regulation_code = metadata.get("source_regulation", "")
source_text = ctrl.get("source_original_text", "")
prompt = f"""Analysiere den folgenden Gesetzestext und bestimme den genauen Artikel und Absatz.
Gesetz: {regulation_name} (Code: {regulation_code})
Text:
---
{source_text[:2000]}
---
Antworte NUR mit JSON:
{{"article": "Art. XX", "paragraph": "Abs. Y"}}
Falls kein spezifischer Absatz erkennbar ist, setze paragraph auf "".
Falls kein Artikel erkennbar ist, setze article auf "".
Bei deutschen Gesetzen mit § verwende: "§ XX" statt "Art. XX"."""
try:
raw = await _llm_ollama(prompt, BACKFILL_SYSTEM_PROMPT)
data = _parse_json(raw)
if data and (data.get("article") or data.get("paragraph")):
return MatchResult(
article=data.get("article", ""),
paragraph=data.get("paragraph", ""),
method="llm",
)
except Exception as e:
logger.warning("LLM match failed for %s: %s", ctrl.get("control_id"), e)
return None
def _update_control(self, ctrl: dict, match: MatchResult):
"""Update source_citation and generation_metadata in DB."""
citation = ctrl.get("source_citation") or {}
# Clean the source name: remove concatenated article if present
source_str = citation.get("source", "")
parsed = _parse_concatenated_source(source_str)
if parsed:
citation["source"] = parsed["name"]
# Add separate article/paragraph fields
citation["article"] = match.article
citation["paragraph"] = match.paragraph
# Update generation_metadata
metadata = ctrl.get("generation_metadata") or {}
if match.article:
metadata["source_article"] = match.article
metadata["source_paragraph"] = match.paragraph
metadata["backfill_method"] = match.method
metadata["backfill_at"] = datetime.now(timezone.utc).isoformat()
self.db.execute(
text("""
UPDATE canonical_controls
SET source_citation = :citation,
generation_metadata = :metadata,
updated_at = NOW()
WHERE id = CAST(:id AS uuid)
"""),
{
"id": ctrl["id"],
"citation": json.dumps(citation),
"metadata": json.dumps(metadata),
},
)
def _parse_concatenated_source(source: str) -> Optional[dict]:
    """Parse 'DSGVO Art. 35' → {name: 'DSGVO', article: 'Art. 35'}.
    Also handles '§' format: 'BDSG § 42' → {name: 'BDSG', article: '§ 42'}.
    """
if not source:
return None
# Try Art./Artikel pattern
m = _SOURCE_ARTICLE_RE.match(source)
if m:
return {"name": m.group(1).strip(), "article": m.group(2).strip()}
# Try § pattern
m2 = re.match(r"^(.+?)\s+(§\s*\d+.*)$", source)
if m2:
return {"name": m2.group(1).strip(), "article": m2.group(2).strip()}
return None
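The two patterns combine into the following self-contained sketch of `_parse_concatenated_source` (the regexes are copied from above; everything else is illustrative), covering the `Art.` path, the `§` path, and the no-match case:

```python
import re
from typing import Optional

# Self-contained copies of the two patterns used above.
ART_RE = re.compile(r"^(.+?)\s+(Art(?:ikel)?\.?\s*\d+.*)$", re.IGNORECASE)
PARA_RE = re.compile(r"^(.+?)\s+(§\s*\d+.*)$")

def parse_source(source: str) -> Optional[dict]:
    for rx in (ART_RE, PARA_RE):           # Art./Artikel first, then §
        m = rx.match(source)
        if m:
            return {"name": m.group(1).strip(), "article": m.group(2).strip()}
    return None

a = parse_source("DSGVO Art. 35")
b = parse_source("BDSG § 42")
c = parse_source("ISO 27001")  # no article marker — stays unparsed
```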
async def _llm_ollama(prompt: str, system_prompt: Optional[str] = None) -> str:
"""Call Ollama chat API for backfill matching."""
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
payload = {
"model": OLLAMA_MODEL,
"messages": messages,
"stream": False,
"format": "json",
"options": {"num_predict": 256},
"think": False,
}
try:
async with httpx.AsyncClient(timeout=LLM_TIMEOUT) as client:
resp = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
if resp.status_code != 200:
logger.error("Ollama backfill failed %d: %s", resp.status_code, resp.text[:300])
return ""
data = resp.json()
msg = data.get("message", {})
if isinstance(msg, dict):
return msg.get("content", "")
return data.get("response", str(msg))
except Exception as e:
logger.error("Ollama backfill request failed: %s", e)
return ""
def _parse_json(raw: str) -> Optional[dict]:
"""Extract JSON object from LLM output."""
if not raw:
return None
# Try direct parse
try:
return json.loads(raw)
except json.JSONDecodeError:
pass
# Try extracting from markdown code block
m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", raw, re.DOTALL)
if m:
try:
return json.loads(m.group(1))
except json.JSONDecodeError:
pass
# Try finding first { ... }
m = re.search(r"\{[^{}]*\}", raw)
if m:
try:
return json.loads(m.group(0))
except json.JSONDecodeError:
pass
return None
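The tiered extraction in `_parse_json` behaves like this minimal sketch (the markdown-code-block tier is omitted here for brevity; only the direct parse and the first-bare-object fallback are shown):

```python
import json
import re

# Sketch of _parse_json: try the whole output as JSON, else grab the
# first flat {...} object from the raw LLM text.
def extract_json(raw: str):
    try:
        return json.loads(raw)             # tier 1: output is pure JSON
    except json.JSONDecodeError:
        pass
    m = re.search(r"\{[^{}]*\}", raw)      # last tier: first brace-delimited object
    if m:
        try:
            return json.loads(m.group(0))
        except json.JSONDecodeError:
            pass
    return None

direct = extract_json('{"article": "Art. 35", "paragraph": ""}')
wrapped = extract_json('Antwort: {"article": "§ 42", "paragraph": ""} Ende')
missing = extract_json("kein JSON hier")
```

Note the bare-object regex only matches flat objects (no nested braces), which is exactly why it is the last resort.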


@@ -0,0 +1,546 @@
"""Control Composer — Pattern + Obligation → Master Control.
Takes an obligation (from ObligationExtractor) and a matched control pattern
(from PatternMatcher), then uses LLM to compose a structured, actionable
Master Control. Replaces the old Stage 3 (STRUCTURE/REFORM) with a
pattern-guided approach.
Three composition modes based on license rules:
Rule 1: Obligation + Pattern + original text → full control
Rule 2: Obligation + Pattern + original text + citation → control
Rule 3: Obligation + Pattern (NO original text) → reformulated control
Fallback: No pattern match → basic generation (tagged needs_pattern_assignment)
Part of the Multi-Layer Control Architecture (Phase 6 of 8).
"""
import json
import logging
import os
from dataclasses import dataclass, field
from typing import Optional
from services.obligation_extractor import (
ObligationMatch,
_llm_ollama,
_parse_json,
)
from services.pattern_matcher import (
ControlPattern,
PatternMatchResult,
)
logger = logging.getLogger(__name__)
OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")
# Valid values for generated control fields
VALID_SEVERITIES = {"low", "medium", "high", "critical"}
VALID_EFFORTS = {"s", "m", "l", "xl"}
VALID_VERIFICATION = {"code_review", "document", "tool", "hybrid"}
@dataclass
class ComposedControl:
"""A Master Control composed from an obligation + pattern."""
# Core fields (match canonical_controls schema)
control_id: str = ""
title: str = ""
objective: str = ""
rationale: str = ""
scope: dict = field(default_factory=dict)
requirements: list = field(default_factory=list)
test_procedure: list = field(default_factory=list)
evidence: list = field(default_factory=list)
severity: str = "medium"
risk_score: float = 5.0
implementation_effort: str = "m"
open_anchors: list = field(default_factory=list)
release_state: str = "draft"
tags: list = field(default_factory=list)
# 3-Rule License fields
license_rule: Optional[int] = None
source_original_text: Optional[str] = None
source_citation: Optional[dict] = None
customer_visible: bool = True
# Classification
verification_method: Optional[str] = None
category: Optional[str] = None
target_audience: Optional[list] = None
# Pattern + Obligation linkage
pattern_id: Optional[str] = None
obligation_ids: list = field(default_factory=list)
# Metadata
generation_metadata: dict = field(default_factory=dict)
composition_method: str = "pattern_guided" # pattern_guided | fallback
def to_dict(self) -> dict:
"""Serialize for DB storage or API response."""
return {
"control_id": self.control_id,
"title": self.title,
"objective": self.objective,
"rationale": self.rationale,
"scope": self.scope,
"requirements": self.requirements,
"test_procedure": self.test_procedure,
"evidence": self.evidence,
"severity": self.severity,
"risk_score": self.risk_score,
"implementation_effort": self.implementation_effort,
"open_anchors": self.open_anchors,
"release_state": self.release_state,
"tags": self.tags,
"license_rule": self.license_rule,
"source_original_text": self.source_original_text,
"source_citation": self.source_citation,
"customer_visible": self.customer_visible,
"verification_method": self.verification_method,
"category": self.category,
"target_audience": self.target_audience,
"pattern_id": self.pattern_id,
"obligation_ids": self.obligation_ids,
"generation_metadata": self.generation_metadata,
"composition_method": self.composition_method,
}
class ControlComposer:
"""Composes Master Controls from obligations + patterns.
Usage::
composer = ControlComposer()
control = await composer.compose(
obligation=obligation_match,
pattern_result=pattern_match_result,
chunk_text="...",
license_rule=1,
source_citation={...},
)
"""
async def compose(
self,
obligation: ObligationMatch,
pattern_result: PatternMatchResult,
chunk_text: Optional[str] = None,
license_rule: int = 3,
source_citation: Optional[dict] = None,
regulation_code: Optional[str] = None,
) -> ComposedControl:
"""Compose a Master Control from obligation + pattern.
Args:
obligation: The extracted obligation (from ObligationExtractor).
pattern_result: The matched pattern (from PatternMatcher).
chunk_text: Original RAG chunk text (only used for Rules 1-2).
license_rule: 1=free, 2=citation, 3=restricted.
source_citation: Citation metadata for Rule 2.
regulation_code: Source regulation code.
Returns:
ComposedControl ready for storage.
"""
pattern = pattern_result.pattern if pattern_result else None
if pattern:
control = await self._compose_with_pattern(
obligation, pattern, chunk_text, license_rule, source_citation,
)
else:
control = await self._compose_fallback(
obligation, chunk_text, license_rule, source_citation,
)
# Set linkage fields
control.pattern_id = pattern.id if pattern else None
if obligation.obligation_id:
control.obligation_ids = [obligation.obligation_id]
# Set license fields
control.license_rule = license_rule
if license_rule in (1, 2) and chunk_text:
control.source_original_text = chunk_text
if license_rule == 2 and source_citation:
control.source_citation = source_citation
if license_rule == 3:
control.customer_visible = False
control.source_original_text = None
control.source_citation = None
# Build metadata
control.generation_metadata = {
"composition_method": control.composition_method,
"pattern_id": control.pattern_id,
"pattern_confidence": round(pattern_result.confidence, 3) if pattern_result else 0,
"pattern_method": pattern_result.method if pattern_result else "none",
"obligation_id": obligation.obligation_id,
"obligation_method": obligation.method,
"obligation_confidence": round(obligation.confidence, 3),
"license_rule": license_rule,
"regulation_code": regulation_code,
}
# Validate and fix fields
_validate_control(control)
return control
async def compose_batch(
self,
items: list[dict],
) -> list[ComposedControl]:
"""Compose multiple controls.
Args:
items: List of dicts with keys: obligation, pattern_result,
chunk_text, license_rule, source_citation, regulation_code.
Returns:
List of ComposedControl instances.
"""
results = []
for item in items:
control = await self.compose(
obligation=item["obligation"],
pattern_result=item.get("pattern_result", PatternMatchResult()),
chunk_text=item.get("chunk_text"),
license_rule=item.get("license_rule", 3),
source_citation=item.get("source_citation"),
regulation_code=item.get("regulation_code"),
)
results.append(control)
return results
# -----------------------------------------------------------------------
# Pattern-guided composition
# -----------------------------------------------------------------------
async def _compose_with_pattern(
self,
obligation: ObligationMatch,
pattern: ControlPattern,
chunk_text: Optional[str],
license_rule: int,
source_citation: Optional[dict],
) -> ComposedControl:
"""Use LLM to fill the pattern template with obligation-specific details."""
prompt = _build_compose_prompt(obligation, pattern, chunk_text, license_rule)
system_prompt = _compose_system_prompt(license_rule)
llm_result = await _llm_ollama(prompt, system_prompt)
if not llm_result:
return self._compose_from_template(obligation, pattern)
parsed = _parse_json(llm_result)
if not parsed:
return self._compose_from_template(obligation, pattern)
control = ComposedControl(
title=parsed.get("title", pattern.name_de)[:255],
objective=parsed.get("objective", pattern.objective_template),
rationale=parsed.get("rationale", pattern.rationale_template),
requirements=_ensure_list(parsed.get("requirements", pattern.requirements_template)),
test_procedure=_ensure_list(parsed.get("test_procedure", pattern.test_procedure_template)),
evidence=_ensure_list(parsed.get("evidence", pattern.evidence_template)),
severity=parsed.get("severity", pattern.severity_default),
implementation_effort=parsed.get("implementation_effort", pattern.implementation_effort_default),
category=parsed.get("category", pattern.category),
tags=_ensure_list(parsed.get("tags", pattern.tags)),
target_audience=_ensure_list(parsed.get("target_audience", [])),
verification_method=parsed.get("verification_method"),
open_anchors=_anchors_from_pattern(pattern),
composition_method="pattern_guided",
)
return control
def _compose_from_template(
self,
obligation: ObligationMatch,
pattern: ControlPattern,
) -> ComposedControl:
"""Fallback: fill template directly without LLM (when LLM fails)."""
obl_title = obligation.obligation_title or ""
obl_text = obligation.obligation_text or ""
title = f"{pattern.name_de}"
if obl_title:
            title = f"{pattern.name_de} — {obl_title}"
objective = pattern.objective_template
if obl_text and len(obl_text) > 20:
objective = f"{pattern.objective_template} Bezug: {obl_text[:200]}"
return ComposedControl(
title=title[:255],
objective=objective,
rationale=pattern.rationale_template,
requirements=list(pattern.requirements_template),
test_procedure=list(pattern.test_procedure_template),
evidence=list(pattern.evidence_template),
severity=pattern.severity_default,
implementation_effort=pattern.implementation_effort_default,
category=pattern.category,
tags=list(pattern.tags),
open_anchors=_anchors_from_pattern(pattern),
composition_method="template_only",
)
# -----------------------------------------------------------------------
# Fallback (no pattern)
# -----------------------------------------------------------------------
async def _compose_fallback(
self,
obligation: ObligationMatch,
chunk_text: Optional[str],
license_rule: int,
source_citation: Optional[dict],
) -> ComposedControl:
"""Generate a control without a pattern template (old-style)."""
prompt = _build_fallback_prompt(obligation, chunk_text, license_rule)
system_prompt = _compose_system_prompt(license_rule)
llm_result = await _llm_ollama(prompt, system_prompt)
        parsed = (_parse_json(llm_result) or {}) if llm_result else {}
obl_text = obligation.obligation_text or ""
control = ComposedControl(
title=parsed.get("title", obl_text[:100] if obl_text else "Untitled Control")[:255],
objective=parsed.get("objective", obl_text[:500]),
rationale=parsed.get("rationale", "Aus gesetzlicher Pflicht abgeleitet."),
requirements=_ensure_list(parsed.get("requirements", [])),
test_procedure=_ensure_list(parsed.get("test_procedure", [])),
evidence=_ensure_list(parsed.get("evidence", [])),
severity=parsed.get("severity", "medium"),
implementation_effort=parsed.get("implementation_effort", "m"),
category=parsed.get("category"),
tags=_ensure_list(parsed.get("tags", [])),
target_audience=_ensure_list(parsed.get("target_audience", [])),
verification_method=parsed.get("verification_method"),
composition_method="fallback",
release_state="needs_review",
)
return control
# ---------------------------------------------------------------------------
# Prompt builders
# ---------------------------------------------------------------------------
def _compose_system_prompt(license_rule: int) -> str:
"""Build the system prompt based on license rule."""
if license_rule == 3:
return (
"Du bist ein Security-Compliance-Experte. Deine Aufgabe ist es, "
"eigenstaendige Security Controls zu formulieren. "
"Du formulierst IMMER in eigenen Worten. "
"KOPIERE KEINE Saetze aus dem Quelltext. "
"Verwende eigene Begriffe und Struktur. "
"NENNE NICHT die Quelle. Keine proprietaeren Bezeichner. "
"Antworte NUR mit validem JSON."
)
return (
"Du bist ein Security-Compliance-Experte. "
"Erstelle ein praxisorientiertes, umsetzbares Security Control. "
"Antworte NUR mit validem JSON."
)
def _build_compose_prompt(
obligation: ObligationMatch,
pattern: ControlPattern,
chunk_text: Optional[str],
license_rule: int,
) -> str:
"""Build the LLM prompt for pattern-guided composition."""
obl_section = _obligation_section(obligation)
pattern_section = _pattern_section(pattern)
if license_rule == 3:
context_section = "KONTEXT: Intern analysiert (keine Quellenangabe)."
elif chunk_text:
context_section = f"KONTEXT (Originaltext):\n{chunk_text[:2000]}"
else:
context_section = "KONTEXT: Kein Originaltext verfuegbar."
return f"""Erstelle ein PRAXISORIENTIERTES Security Control.
{obl_section}
{pattern_section}
{context_section}
AUFGABE:
Fuelle das Muster mit pflicht-spezifischen Details.
Das Ergebnis muss UMSETZBAR sein — keine Gesetzesparaphrase.
Formuliere konkret und handlungsorientiert.
Antworte als JSON:
{{
"title": "Kurzer praegnanter Titel (max 100 Zeichen, deutsch)",
"objective": "Was soll erreicht werden? (1-3 Saetze)",
"rationale": "Warum ist das wichtig? (1-2 Saetze)",
"requirements": ["Konkrete Anforderung 1", "Anforderung 2", ...],
"test_procedure": ["Pruefschritt 1", "Pruefschritt 2", ...],
"evidence": ["Nachweis 1", "Nachweis 2", ...],
"severity": "low|medium|high|critical",
"implementation_effort": "s|m|l|xl",
"category": "{pattern.category}",
"tags": ["tag1", "tag2"],
"target_audience": ["unternehmen", "behoerden", "entwickler"],
"verification_method": "code_review|document|tool|hybrid"
}}"""
def _build_fallback_prompt(
obligation: ObligationMatch,
chunk_text: Optional[str],
license_rule: int,
) -> str:
"""Build the LLM prompt for fallback composition (no pattern)."""
obl_section = _obligation_section(obligation)
if license_rule == 3:
context_section = "KONTEXT: Intern analysiert (keine Quellenangabe)."
elif chunk_text:
context_section = f"KONTEXT (Originaltext):\n{chunk_text[:2000]}"
else:
context_section = "KONTEXT: Kein Originaltext verfuegbar."
return f"""Erstelle ein Security Control aus der folgenden Pflicht.
{obl_section}
{context_section}
AUFGABE:
Formuliere ein umsetzbares Security Control.
Keine Gesetzesparaphrase — konkrete Massnahmen beschreiben.
Antworte als JSON:
{{
"title": "Kurzer praegnanter Titel (max 100 Zeichen, deutsch)",
"objective": "Was soll erreicht werden? (1-3 Saetze)",
"rationale": "Warum ist das wichtig? (1-2 Saetze)",
"requirements": ["Konkrete Anforderung 1", "Anforderung 2", ...],
"test_procedure": ["Pruefschritt 1", "Pruefschritt 2", ...],
"evidence": ["Nachweis 1", "Nachweis 2", ...],
"severity": "low|medium|high|critical",
"implementation_effort": "s|m|l|xl",
"category": "one of: authentication, encryption, data_protection, etc.",
"tags": ["tag1", "tag2"],
"target_audience": ["unternehmen"],
"verification_method": "code_review|document|tool|hybrid"
}}"""
def _obligation_section(obligation: ObligationMatch) -> str:
"""Format the obligation for the prompt."""
parts = ["PFLICHT (was das Gesetz verlangt):"]
if obligation.obligation_title:
parts.append(f" Titel: {obligation.obligation_title}")
if obligation.obligation_text:
parts.append(f" Beschreibung: {obligation.obligation_text[:500]}")
if obligation.obligation_id:
parts.append(f" ID: {obligation.obligation_id}")
if obligation.regulation_id:
parts.append(f" Rechtsgrundlage: {obligation.regulation_id}")
if not obligation.obligation_text and not obligation.obligation_title:
parts.append(" (Keine spezifische Pflicht extrahiert)")
return "\n".join(parts)
def _pattern_section(pattern: ControlPattern) -> str:
"""Format the pattern for the prompt."""
reqs = "\n ".join(f"- {r}" for r in pattern.requirements_template[:5])
tests = "\n ".join(f"- {t}" for t in pattern.test_procedure_template[:3])
return f"""MUSTER (wie man es typischerweise umsetzt):
Pattern: {pattern.name_de} ({pattern.id})
Domain: {pattern.domain}
Ziel-Template: {pattern.objective_template}
Anforderungs-Template:
{reqs}
Pruefverfahren-Template:
{tests}"""
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _ensure_list(value) -> list:
"""Ensure a value is a list of strings."""
if isinstance(value, list):
return [str(v) for v in value if v]
if isinstance(value, str):
return [value]
return []
def _anchors_from_pattern(pattern: ControlPattern) -> list:
"""Convert pattern's open_anchor_refs to control anchor format."""
anchors = []
for ref in pattern.open_anchor_refs:
anchors.append({
"framework": ref.get("framework", ""),
"control_id": ref.get("ref", ""),
"title": "",
"alignment_score": 0.8,
})
return anchors
def _validate_control(control: ComposedControl) -> None:
"""Validate and fix control field values."""
# Severity
if control.severity not in VALID_SEVERITIES:
control.severity = "medium"
# Implementation effort
if control.implementation_effort not in VALID_EFFORTS:
control.implementation_effort = "m"
# Verification method
if control.verification_method and control.verification_method not in VALID_VERIFICATION:
control.verification_method = None
# Risk score
if not (0 <= control.risk_score <= 10):
control.risk_score = _severity_to_risk(control.severity)
# Title length
if len(control.title) > 255:
control.title = control.title[:252] + "..."
# Ensure minimum content
if not control.objective:
control.objective = control.title
if not control.rationale:
control.rationale = "Aus regulatorischer Anforderung abgeleitet."
if not control.requirements:
control.requirements = ["Anforderung gemaess Pflichtbeschreibung umsetzen"]
if not control.test_procedure:
control.test_procedure = ["Umsetzung der Anforderungen pruefen"]
if not control.evidence:
control.evidence = ["Dokumentation der Umsetzung"]
def _severity_to_risk(severity: str) -> float:
"""Map severity to a default risk score."""
return {
"critical": 9.0,
"high": 7.0,
"medium": 5.0,
"low": 3.0,
}.get(severity, 5.0)
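The defaulting behaviour of `_validate_control` plus `_severity_to_risk` reduces to this small sketch (a hypothetical condensed helper covering just severity and risk score):

```python
# Sketch of the severity/risk normalization in _validate_control:
# invalid severities fall back to "medium", out-of-range risk scores
# fall back to the severity-keyed default.
VALID_SEVERITIES = {"low", "medium", "high", "critical"}
SEVERITY_RISK = {"critical": 9.0, "high": 7.0, "medium": 5.0, "low": 3.0}

def normalize(severity: str, risk_score: float):
    if severity not in VALID_SEVERITIES:
        severity = "medium"
    if not (0 <= risk_score <= 10):
        risk_score = SEVERITY_RISK.get(severity, 5.0)
    return severity, risk_score

sev, risk = normalize("extreme", 42.0)
```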


@@ -0,0 +1,745 @@
"""Control Deduplication Engine — 4-Stage Matching Pipeline.
Prevents duplicate atomic controls during Pass 0b by checking candidates
against existing controls before insertion.
Stages:
1. Pattern-Gate: pattern_id must match (hard gate)
2. Action-Check: normalized action verb must match (hard gate)
3. Object-Norm: normalized object must match (soft gate with high threshold)
4. Embedding: cosine similarity with tiered thresholds (Qdrant)
Verdicts:
- NEW: create a new atomic control
- LINK: add parent link to existing control (similarity > LINK_THRESHOLD)
- REVIEW: queue for human review (REVIEW_THRESHOLD < sim < LINK_THRESHOLD)
"""
import logging
import os
import re
from dataclasses import dataclass, field
from typing import Optional, Callable, Awaitable
import httpx
logger = logging.getLogger(__name__)
# ── Configuration ────────────────────────────────────────────────────
DEDUP_ENABLED = os.getenv("DEDUP_ENABLED", "true").lower() == "true"
LINK_THRESHOLD = float(os.getenv("DEDUP_LINK_THRESHOLD", "0.92"))
REVIEW_THRESHOLD = float(os.getenv("DEDUP_REVIEW_THRESHOLD", "0.85"))
LINK_THRESHOLD_DIFF_OBJECT = float(os.getenv("DEDUP_LINK_THRESHOLD_DIFF_OBJ", "0.95"))
CROSS_REG_LINK_THRESHOLD = float(os.getenv("DEDUP_CROSS_REG_THRESHOLD", "0.95"))
QDRANT_COLLECTION = os.getenv("DEDUP_QDRANT_COLLECTION", "atomic_controls")
QDRANT_URL = os.getenv("QDRANT_URL", "http://host.docker.internal:6333")
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://embedding-service:8087")
# ── Result Dataclass ─────────────────────────────────────────────────
@dataclass
class DedupResult:
"""Outcome of the dedup check."""
verdict: str # "new" | "link" | "review"
matched_control_uuid: Optional[str] = None
matched_control_id: Optional[str] = None
matched_title: Optional[str] = None
stage: str = "" # which stage decided
similarity_score: float = 0.0
link_type: str = "dedup_merge" # "dedup_merge" | "cross_regulation"
details: dict = field(default_factory=dict)
# ── Action Normalization ─────────────────────────────────────────────
_ACTION_SYNONYMS: dict[str, str] = {
# German → canonical English
"implementieren": "implement",
"umsetzen": "implement",
"einrichten": "implement",
"einführen": "implement",
"aufbauen": "implement",
"bereitstellen": "implement",
"aktivieren": "implement",
"konfigurieren": "configure",
"einstellen": "configure",
"parametrieren": "configure",
"testen": "test",
"prüfen": "test",
"überprüfen": "test",
"verifizieren": "test",
"validieren": "test",
"kontrollieren": "test",
"auditieren": "audit",
"dokumentieren": "document",
"protokollieren": "log",
"aufzeichnen": "log",
"loggen": "log",
"überwachen": "monitor",
"monitoring": "monitor",
"beobachten": "monitor",
"schulen": "train",
"trainieren": "train",
"sensibilisieren": "train",
"löschen": "delete",
"entfernen": "delete",
"verschlüsseln": "encrypt",
"sperren": "block",
"beschränken": "restrict",
"einschränken": "restrict",
"begrenzen": "restrict",
"autorisieren": "authorize",
"genehmigen": "authorize",
"freigeben": "authorize",
"authentifizieren": "authenticate",
"identifizieren": "identify",
"melden": "report",
"benachrichtigen": "notify",
"informieren": "notify",
"aktualisieren": "update",
"erneuern": "update",
"sichern": "backup",
"wiederherstellen": "restore",
# English passthrough
"implement": "implement",
"configure": "configure",
"test": "test",
"verify": "test",
"validate": "test",
"audit": "audit",
"document": "document",
"log": "log",
"monitor": "monitor",
"train": "train",
"delete": "delete",
"encrypt": "encrypt",
"restrict": "restrict",
"authorize": "authorize",
"authenticate": "authenticate",
"report": "report",
"update": "update",
"backup": "backup",
"restore": "restore",
}
def normalize_action(action: str) -> str:
"""Normalize an action verb to a canonical English form."""
if not action:
return ""
action = action.strip().lower()
# Strip German infinitive/conjugation suffixes for lookup
action_base = re.sub(r"(en|t|st|e|te|tet|end)$", "", action)
# Try exact match first, then base form
if action in _ACTION_SYNONYMS:
return _ACTION_SYNONYMS[action]
if action_base in _ACTION_SYNONYMS:
return _ACTION_SYNONYMS[action_base]
# Fuzzy: check if action starts with any known verb
for verb, canonical in _ACTION_SYNONYMS.items():
if action.startswith(verb) or verb.startswith(action):
return canonical
return action # fallback: return as-is
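A self-contained sketch of the lookup order above (exact match first, then the suffix-stripped base form, then the prefix fallback), using a toy three-entry table rather than the full `_ACTION_SYNONYMS`:

```python
import re

# Toy subset of the _ACTION_SYNONYMS table (illustrative only).
SYNONYMS = {"implementieren": "implement", "testen": "test", "verify": "test"}

def normalize(action: str) -> str:
    """Lookup order: exact match, suffix-stripped base form, prefix fallback."""
    action = action.strip().lower()
    # Strip common German infinitive/conjugation endings for the base-form lookup
    base = re.sub(r"(en|t|st|e|te|tet|end)$", "", action)
    if action in SYNONYMS:
        return SYNONYMS[action]
    if base in SYNONYMS:
        return SYNONYMS[base]
    for verb, canonical in SYNONYMS.items():
        if action.startswith(verb) or verb.startswith(action):
            return canonical
    return action  # unknown verbs pass through unchanged
```

Unknown verbs deliberately survive unchanged, so downstream stages still see a comparable token.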
# ── Object Normalization ─────────────────────────────────────────────
_OBJECT_SYNONYMS: dict[str, str] = {
# Authentication / Access
"mfa": "multi_factor_auth",
"multi-faktor-authentifizierung": "multi_factor_auth",
"mehrfaktorauthentifizierung": "multi_factor_auth",
"multi-factor authentication": "multi_factor_auth",
"two-factor": "multi_factor_auth",
"2fa": "multi_factor_auth",
"passwort": "password_policy",
"kennwort": "password_policy",
"password": "password_policy",
"zugangsdaten": "credentials",
"credentials": "credentials",
"admin-konten": "privileged_access",
"admin accounts": "privileged_access",
"administratorkonten": "privileged_access",
"privilegierte zugriffe": "privileged_access",
"privileged accounts": "privileged_access",
"remote-zugriff": "remote_access",
"fernzugriff": "remote_access",
"remote access": "remote_access",
"session": "session_management",
"sitzung": "session_management",
"sitzungsverwaltung": "session_management",
# Encryption
"verschlüsselung": "encryption",
"encryption": "encryption",
"kryptografie": "encryption",
"kryptografische verfahren": "encryption",
"schlüssel": "key_management",
"key management": "key_management",
"schlüsselverwaltung": "key_management",
"zertifikat": "certificate_management",
"certificate": "certificate_management",
"tls": "transport_encryption",
"ssl": "transport_encryption",
"https": "transport_encryption",
# Network
"firewall": "firewall",
"netzwerk": "network_security",
"network": "network_security",
"vpn": "vpn",
"segmentierung": "network_segmentation",
"segmentation": "network_segmentation",
# Logging / Monitoring
"audit-log": "audit_logging",
"audit log": "audit_logging",
"protokoll": "audit_logging",
"logging": "audit_logging",
"monitoring": "monitoring",
"überwachung": "monitoring",
"alerting": "alerting",
"alarmierung": "alerting",
"siem": "siem",
# Data
"personenbezogene daten": "personal_data",
"personal data": "personal_data",
"sensible daten": "sensitive_data",
"sensitive data": "sensitive_data",
"datensicherung": "backup",
"backup": "backup",
"wiederherstellung": "disaster_recovery",
"disaster recovery": "disaster_recovery",
# Policy / Process
"richtlinie": "policy",
"policy": "policy",
"verfahrensanweisung": "procedure",
"procedure": "procedure",
"prozess": "process",
"schulung": "training",
"training": "training",
"awareness": "awareness",
"sensibilisierung": "awareness",
# Incident
"vorfall": "incident",
"incident": "incident",
"sicherheitsvorfall": "security_incident",
"security incident": "security_incident",
# Vulnerability
"schwachstelle": "vulnerability",
"vulnerability": "vulnerability",
"patch": "patch_management",
"update": "patch_management",
"patching": "patch_management",
}
# Precompile for substring matching (longest first)
_OBJECT_KEYS_SORTED = sorted(_OBJECT_SYNONYMS.keys(), key=len, reverse=True)
def normalize_object(obj: str) -> str:
"""Normalize a compliance object to a canonical token."""
if not obj:
return ""
obj_lower = obj.strip().lower()
# Exact match
if obj_lower in _OBJECT_SYNONYMS:
return _OBJECT_SYNONYMS[obj_lower]
# Substring match (longest first)
for phrase in _OBJECT_KEYS_SORTED:
if phrase in obj_lower:
return _OBJECT_SYNONYMS[phrase]
# Fallback: strip articles/prepositions, join with underscore
cleaned = re.sub(r"\b(der|die|das|den|dem|des|ein|eine|eines|einem|einen"
r"|für|von|zu|auf|in|an|bei|mit|nach|über|unter|the|a|an"
r"|for|of|to|on|in|at|by|with)\b", "", obj_lower)
tokens = [t for t in cleaned.split() if len(t) > 2]
return "_".join(tokens[:4]) if tokens else obj_lower.replace(" ", "_")
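Longest-first substring matching matters when one synonym key is a substring of another. A toy sketch with two overlapping keys:

```python
# Toy subset of _OBJECT_SYNONYMS; longest-first matching keeps the more
# specific phrase from being shadowed by its substring.
SYN = {"daten": "data", "personenbezogene daten": "personal_data"}
KEYS = sorted(SYN, key=len, reverse=True)

def normalize_obj(obj: str) -> str:
    obj = obj.strip().lower()
    if obj in SYN:          # exact match wins
        return SYN[obj]
    for phrase in KEYS:     # longest phrase checked first
        if phrase in obj:
            return SYN[phrase]
    return obj.replace(" ", "_")  # crude fallback for the sketch
```

With ascending-length iteration, "personenbezogene daten" would collapse to the generic `data` token; the descending sort prevents that.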
# ── Canonicalization ─────────────────────────────────────────────────
def canonicalize_text(action: str, obj: str, title: str = "") -> str:
"""Build a canonical English text for embedding.
Transforms German compliance text into normalized English tokens
for more stable embedding comparisons.
"""
norm_action = normalize_action(action)
norm_object = normalize_object(obj)
# Build canonical sentence
parts = [norm_action, norm_object]
if title:
# Add title keywords (stripped of common filler)
title_clean = re.sub(
r"\b(und|oder|für|von|zu|der|die|das|den|dem|des|ein|eine"
r"|bei|mit|nach|gemäß|gem\.|laut|entsprechend)\b",
"", title.lower()
)
title_tokens = [t for t in title_clean.split() if len(t) > 3][:5]
if title_tokens:
parts.append("for")
parts.extend(title_tokens)
return " ".join(parts)
# ── Embedding Helper ─────────────────────────────────────────────────
async def get_embedding(text: str) -> list[float]:
"""Get embedding vector for a single text via embedding service."""
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{EMBEDDING_URL}/embed",
json={"texts": [text]},
)
resp.raise_for_status()
embeddings = resp.json().get("embeddings", [])
return embeddings[0] if embeddings else []
except Exception as e:
logger.warning("Embedding failed: %s", e)
return []
def cosine_similarity(a: list[float], b: list[float]) -> float:
"""Compute cosine similarity between two vectors."""
if not a or not b or len(a) != len(b):
return 0.0
dot = sum(x * y for x, y in zip(a, b))
norm_a = sum(x * x for x in a) ** 0.5
norm_b = sum(x * x for x in b) ** 0.5
if norm_a == 0 or norm_b == 0:
return 0.0
return dot / (norm_a * norm_b)
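`cosine_similarity` is a pure function, so a quick sanity check is cheap. Same arithmetic, restated here so the snippet runs standalone:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 for empty, mismatched, or zero-norm vectors."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)
```

Identical vectors score 1.0, orthogonal vectors 0.0, and the degenerate cases fall back to 0.0 rather than raising.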
# ── Qdrant Helpers ───────────────────────────────────────────────────
async def qdrant_search(
embedding: list[float],
pattern_id: str,
top_k: int = 10,
collection: Optional[str] = None,
) -> list[dict]:
"""Search Qdrant for similar atomic controls, filtered by pattern_id."""
if not embedding:
return []
coll = collection or QDRANT_COLLECTION
body: dict = {
"vector": embedding,
"limit": top_k,
"with_payload": True,
"filter": {
"must": [
{"key": "pattern_id", "match": {"value": pattern_id}}
]
},
}
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{QDRANT_URL}/collections/{coll}/points/search",
json=body,
)
if resp.status_code != 200:
logger.warning("Qdrant search failed: %d", resp.status_code)
return []
return resp.json().get("result", [])
except Exception as e:
logger.warning("Qdrant search error: %s", e)
return []
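The request body sent to Qdrant's `POST /collections/{coll}/points/search` can be built and inspected without a running server; the pattern id below is made up for illustration:

```python
def search_body(embedding: list[float], pattern_id: str, top_k: int = 10) -> dict:
    """Shape of the filtered search payload used by qdrant_search above."""
    return {
        "vector": embedding,
        "limit": top_k,
        "with_payload": True,
        "filter": {
            # Hard gate: only candidates sharing the same pattern_id compete
            "must": [{"key": "pattern_id", "match": {"value": pattern_id}}]
        },
    }
```

The cross-regulation variant is the same body minus the `filter` key, which is why it needs the stricter 0.95 threshold.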
async def qdrant_search_cross_regulation(
embedding: list[float],
top_k: int = 5,
collection: Optional[str] = None,
) -> list[dict]:
"""Search Qdrant for similar controls across ALL regulations (no pattern_id filter).
Used for cross-regulation linking (e.g. DSGVO Art. 25 ↔ NIS2 Art. 21).
"""
if not embedding:
return []
coll = collection or QDRANT_COLLECTION
body: dict = {
"vector": embedding,
"limit": top_k,
"with_payload": True,
}
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{QDRANT_URL}/collections/{coll}/points/search",
json=body,
)
if resp.status_code != 200:
logger.warning("Qdrant cross-reg search failed: %d", resp.status_code)
return []
return resp.json().get("result", [])
except Exception as e:
logger.warning("Qdrant cross-reg search error: %s", e)
return []
async def qdrant_upsert(
point_id: str,
embedding: list[float],
payload: dict,
collection: Optional[str] = None,
) -> bool:
"""Upsert a single point into a Qdrant collection."""
if not embedding:
return False
coll = collection or QDRANT_COLLECTION
body = {
"points": [{
"id": point_id,
"vector": embedding,
"payload": payload,
}]
}
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.put(
f"{QDRANT_URL}/collections/{coll}/points",
json=body,
)
return resp.status_code == 200
except Exception as e:
logger.warning("Qdrant upsert error: %s", e)
return False
async def ensure_qdrant_collection(
vector_size: int = 1024,
collection: Optional[str] = None,
) -> bool:
"""Create a Qdrant collection if it doesn't exist (idempotent)."""
coll = collection or QDRANT_COLLECTION
try:
async with httpx.AsyncClient(timeout=10.0) as client:
# Check if exists
resp = await client.get(f"{QDRANT_URL}/collections/{coll}")
if resp.status_code == 200:
return True
# Create
resp = await client.put(
f"{QDRANT_URL}/collections/{coll}",
json={
"vectors": {"size": vector_size, "distance": "Cosine"},
},
)
if resp.status_code == 200:
logger.info("Created Qdrant collection: %s", coll)
# Create payload indexes
for field_name in ["pattern_id", "action_normalized", "object_normalized", "control_id"]:
await client.put(
f"{QDRANT_URL}/collections/{coll}/index",
json={"field_name": field_name, "field_schema": "keyword"},
)
return True
logger.error("Failed to create Qdrant collection: %d", resp.status_code)
return False
except Exception as e:
logger.warning("Qdrant collection check error: %s", e)
return False
# ── Main Dedup Checker ───────────────────────────────────────────────
class ControlDedupChecker:
"""4-stage dedup checker for atomic controls.
Usage:
checker = ControlDedupChecker(db_session)
result = await checker.check_duplicate(candidate_action, candidate_object, candidate_title, pattern_id)
if result.verdict == "link":
checker.add_parent_link(result.matched_control_uuid, parent_uuid)
elif result.verdict == "review":
checker.write_review(candidate, result)
else:
# Insert new control
"""
def __init__(
self,
db,
embed_fn: Optional[Callable[[str], Awaitable[list[float]]]] = None,
search_fn: Optional[Callable] = None,
):
self.db = db
self._embed = embed_fn or get_embedding
self._search = search_fn or qdrant_search
self._cache: dict[str, list[dict]] = {} # pattern_id → existing controls
def _load_existing(self, pattern_id: str) -> list[dict]:
"""Load existing atomic controls with same pattern_id from DB."""
if pattern_id in self._cache:
return self._cache[pattern_id]
from sqlalchemy import text
rows = self.db.execute(text("""
SELECT id::text, control_id, title, objective,
pattern_id,
generation_metadata->>'obligation_type' as obligation_type
FROM canonical_controls
WHERE parent_control_uuid IS NOT NULL
AND release_state != 'deprecated'
AND pattern_id = :pid
"""), {"pid": pattern_id}).fetchall()
result = [
{
"uuid": r[0], "control_id": r[1], "title": r[2],
"objective": r[3], "pattern_id": r[4],
"obligation_type": r[5],
}
for r in rows
]
self._cache[pattern_id] = result
return result
async def check_duplicate(
self,
action: str,
obj: str,
title: str,
pattern_id: Optional[str],
) -> DedupResult:
"""Run the 4-stage dedup pipeline + cross-regulation linking.
Returns DedupResult with verdict: new/link/review.
"""
# No pattern_id → can't dedup meaningfully
if not pattern_id:
return DedupResult(verdict="new", stage="no_pattern")
# Stage 1: Pattern-Gate
existing = self._load_existing(pattern_id)
if not existing:
return DedupResult(
verdict="new", stage="pattern_gate",
details={"reason": "no existing controls with this pattern_id"},
)
# Stage 2: Action-Check
norm_action = normalize_action(action)
# We don't have action stored on existing controls from DB directly,
# so we use embedding for controls that passed pattern gate.
# But we CAN check via generation_metadata if available.
# Stage 3: Object-Normalization
norm_object = normalize_object(obj)
# Stage 4: Embedding Similarity
canonical = canonicalize_text(action, obj, title)
embedding = await self._embed(canonical)
if not embedding:
# Can't compute embedding → default to new
return DedupResult(
verdict="new", stage="embedding_unavailable",
details={"canonical_text": canonical},
)
# Search Qdrant
results = await self._search(embedding, pattern_id, top_k=5)
if not results:
# No intra-pattern matches → try cross-regulation
return await self._check_cross_regulation(embedding, DedupResult(
verdict="new", stage="no_qdrant_matches",
details={"canonical_text": canonical, "action": norm_action, "object": norm_object},
))
# Evaluate best match
best = results[0]
best_score = best.get("score", 0.0)
best_payload = best.get("payload", {})
best_action = best_payload.get("action_normalized", "")
best_object = best_payload.get("object_normalized", "")
# Action differs → NEW (even if embedding is high)
if best_action and norm_action and best_action != norm_action:
return await self._check_cross_regulation(embedding, DedupResult(
verdict="new", stage="action_mismatch",
similarity_score=best_score,
matched_control_id=best_payload.get("control_id"),
details={
"candidate_action": norm_action,
"existing_action": best_action,
"similarity": best_score,
},
))
# Object differs → use higher threshold
if best_object and norm_object and best_object != norm_object:
if best_score > LINK_THRESHOLD_DIFF_OBJECT:
return DedupResult(
verdict="link", stage="embedding_diff_object",
matched_control_uuid=best_payload.get("control_uuid"),
matched_control_id=best_payload.get("control_id"),
matched_title=best_payload.get("title"),
similarity_score=best_score,
details={"candidate_object": norm_object, "existing_object": best_object},
)
return await self._check_cross_regulation(embedding, DedupResult(
verdict="new", stage="object_mismatch_below_threshold",
similarity_score=best_score,
matched_control_id=best_payload.get("control_id"),
details={
"candidate_object": norm_object,
"existing_object": best_object,
"threshold": LINK_THRESHOLD_DIFF_OBJECT,
},
))
# Same action + same object → tiered thresholds
if best_score > LINK_THRESHOLD:
return DedupResult(
verdict="link", stage="embedding_match",
matched_control_uuid=best_payload.get("control_uuid"),
matched_control_id=best_payload.get("control_id"),
matched_title=best_payload.get("title"),
similarity_score=best_score,
)
if best_score > REVIEW_THRESHOLD:
return DedupResult(
verdict="review", stage="embedding_review",
matched_control_uuid=best_payload.get("control_uuid"),
matched_control_id=best_payload.get("control_id"),
matched_title=best_payload.get("title"),
similarity_score=best_score,
)
return await self._check_cross_regulation(embedding, DedupResult(
verdict="new", stage="embedding_below_threshold",
similarity_score=best_score,
details={"threshold": REVIEW_THRESHOLD},
))
async def _check_cross_regulation(
self,
embedding: list[float],
intra_result: DedupResult,
) -> DedupResult:
"""Second pass: cross-regulation linking for controls deemed 'new'.
Searches Qdrant WITHOUT pattern_id filter. Uses a higher threshold
(0.95) to avoid false positives across regulation boundaries.
"""
if intra_result.verdict != "new" or not embedding:
return intra_result
cross_results = await qdrant_search_cross_regulation(embedding, top_k=5)
if not cross_results:
return intra_result
best = cross_results[0]
best_score = best.get("score", 0.0)
if best_score > CROSS_REG_LINK_THRESHOLD:
best_payload = best.get("payload", {})
return DedupResult(
verdict="link",
stage="cross_regulation",
matched_control_uuid=best_payload.get("control_uuid"),
matched_control_id=best_payload.get("control_id"),
matched_title=best_payload.get("title"),
similarity_score=best_score,
link_type="cross_regulation",
details={
"cross_reg_score": best_score,
"cross_reg_threshold": CROSS_REG_LINK_THRESHOLD,
},
)
return intra_result
def add_parent_link(
self,
control_uuid: str,
parent_control_uuid: str,
link_type: str = "dedup_merge",
confidence: float = 0.0,
source_regulation: Optional[str] = None,
source_article: Optional[str] = None,
obligation_candidate_id: Optional[str] = None,
) -> None:
"""Add a parent link to an existing atomic control."""
from sqlalchemy import text
self.db.execute(text("""
INSERT INTO control_parent_links
(control_uuid, parent_control_uuid, link_type, confidence,
source_regulation, source_article, obligation_candidate_id)
VALUES (:cu, :pu, :lt, :conf, :sr, :sa, :oci::uuid)
ON CONFLICT (control_uuid, parent_control_uuid) DO NOTHING
"""), {
"cu": control_uuid,
"pu": parent_control_uuid,
"lt": link_type,
"conf": confidence,
"sr": source_regulation,
"sa": source_article,
"oci": obligation_candidate_id,
})
self.db.commit()
def write_review(
self,
candidate_control_id: str,
candidate_title: str,
candidate_objective: str,
result: DedupResult,
parent_control_uuid: Optional[str] = None,
obligation_candidate_id: Optional[str] = None,
) -> None:
"""Write a dedup review queue entry."""
import json
from sqlalchemy import text
self.db.execute(text("""
INSERT INTO control_dedup_reviews
(candidate_control_id, candidate_title, candidate_objective,
matched_control_uuid, matched_control_id,
similarity_score, dedup_stage, dedup_details,
parent_control_uuid, obligation_candidate_id)
VALUES (:ccid, :ct, :co, :mcu::uuid, :mci, :ss, :ds,
:dd::jsonb, :pcu::uuid, :oci)
"""), {
"ccid": candidate_control_id,
"ct": candidate_title,
"co": candidate_objective,
"mcu": result.matched_control_uuid,
"mci": result.matched_control_id,
"ss": result.similarity_score,
"ds": result.stage,
"dd": json.dumps(result.details),
"pcu": parent_control_uuid,
"oci": obligation_candidate_id,
})
self.db.commit()
async def index_control(
self,
control_uuid: str,
control_id: str,
title: str,
action: str,
obj: str,
pattern_id: str,
collection: Optional[str] = None,
) -> bool:
"""Index a new atomic control in Qdrant for future dedup checks."""
norm_action = normalize_action(action)
norm_object = normalize_object(obj)
canonical = canonicalize_text(action, obj, title)
embedding = await self._embed(canonical)
if not embedding:
return False
return await qdrant_upsert(
point_id=control_uuid,
embedding=embedding,
payload={
"control_uuid": control_uuid,
"control_id": control_id,
"title": title,
"pattern_id": pattern_id,
"action_normalized": norm_action,
"object_normalized": norm_object,
"canonical_text": canonical,
},
collection=collection,
)
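The tiered thresholds in `check_duplicate` reduce to a small decision table. A standalone sketch with the same default thresholds (Qdrant lookup and the cross-regulation pass omitted):

```python
# Default thresholds from the module configuration above.
LINK, REVIEW, LINK_DIFF_OBJ = 0.92, 0.85, 0.95

def verdict(score: float, same_action: bool = True, same_object: bool = True) -> str:
    """Simplified mirror of the stage-4 tiering in check_duplicate."""
    if not same_action:
        return "new"                 # action mismatch is a hard gate
    if not same_object:
        # different object: only link above the stricter threshold
        return "link" if score > LINK_DIFF_OBJ else "new"
    if score > LINK:
        return "link"
    if score > REVIEW:
        return "review"
    return "new"
```

Note the strict `>` comparisons: a score exactly at `LINK_THRESHOLD` lands in the review band, matching the docstring's "REVIEW_THRESHOLD < sim < LINK_THRESHOLD".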

File diff suppressed because it is too large


@@ -0,0 +1,154 @@
"""
Control Status Transition State Machine.
Enforces that controls cannot be set to "pass" without sufficient evidence.
Prevents Compliance-Theater where controls claim compliance without real proof.
Transition rules:
planned → in_progress : always allowed
in_progress → pass : requires ≥1 evidence with confidence ≥ E2 and
truth_status in (uploaded, observed, validated_internal,
accepted_by_auditor, provided_to_auditor)
in_progress → partial : requires ≥1 evidence (any level)
pass → fail : always allowed (degradation)
any → n/a : requires status_justification
any → planned : always allowed (reset)
"""
from typing import Any, List, Optional, Tuple
# EvidenceDB is an ORM model from compliance — we only need duck-typed objects
# with .confidence_level and .truth_status attributes.
EvidenceDB = Any
# Confidence level ordering for comparisons
CONFIDENCE_ORDER = {"E0": 0, "E1": 1, "E2": 2, "E3": 3, "E4": 4}
# Truth statuses that qualify as "real" evidence for pass transitions
VALID_TRUTH_STATUSES = {"uploaded", "observed", "validated_internal", "accepted_by_auditor", "provided_to_auditor"}
def validate_transition(
current_status: str,
new_status: str,
evidence_list: Optional[List[EvidenceDB]] = None,
status_justification: Optional[str] = None,
bypass_for_auto_updater: bool = False,
) -> Tuple[bool, List[str]]:
"""
Validate whether a control status transition is allowed.
Args:
current_status: Current control status value (e.g. "planned", "pass")
new_status: Requested new status
evidence_list: List of EvidenceDB objects linked to this control
status_justification: Text justification (required for n/a transitions)
bypass_for_auto_updater: If True, skip evidence checks (used by CI/CD auto-updater
which creates evidence atomically with status change)
Returns:
Tuple of (allowed: bool, violations: list[str])
"""
violations: List[str] = []
evidence_list = evidence_list or []
# Same status → no-op, always allowed
if current_status == new_status:
return True, []
# Reset to planned is always allowed
if new_status == "planned":
return True, []
# n/a requires justification
if new_status == "n/a":
if not status_justification or not status_justification.strip():
violations.append("Transition to 'n/a' requires a status_justification explaining why this control is not applicable.")
return len(violations) == 0, violations
# Degradation: pass → fail is always allowed
if current_status == "pass" and new_status == "fail":
return True, []
# planned → in_progress: always allowed
if current_status == "planned" and new_status == "in_progress":
return True, []
# in_progress → partial: needs at least 1 evidence
if new_status == "partial":
if not bypass_for_auto_updater and len(evidence_list) == 0:
violations.append("Transition to 'partial' requires at least 1 evidence record.")
return len(violations) == 0, violations
# planned/fail → pass: must go through in_progress first
if current_status in ("planned", "fail") and new_status == "pass":
if bypass_for_auto_updater:
return True, []
violations.append(
f"Direct transition from '{current_status}' to 'pass' is not allowed. "
f"Move to 'in_progress' first, then to 'pass' with qualifying evidence."
)
return False, violations
# in_progress/partial → pass: strict requirements
if new_status == "pass":
if bypass_for_auto_updater:
return True, []
if len(evidence_list) == 0:
violations.append("Transition to 'pass' requires at least 1 evidence record.")
return False, violations
# Check for at least one qualifying evidence
has_qualifying = False
for e in evidence_list:
conf = getattr(e, "confidence_level", None)
truth = getattr(e, "truth_status", None)
# Get string values from enum or string
conf_val = conf.value if hasattr(conf, "value") else str(conf) if conf else "E1"
truth_val = truth.value if hasattr(truth, "value") else str(truth) if truth else "uploaded"
if CONFIDENCE_ORDER.get(conf_val, 1) >= CONFIDENCE_ORDER["E2"] and truth_val in VALID_TRUTH_STATUSES:
has_qualifying = True
break
if not has_qualifying:
violations.append(
"Transition to 'pass' requires at least 1 evidence with confidence >= E2 "
"and truth_status in (uploaded, observed, validated_internal, "
"accepted_by_auditor, provided_to_auditor). "
"Current evidence does not meet this threshold."
)
return len(violations) == 0, violations
# in_progress → fail: always allowed
if new_status == "fail":
return True, []
# All other transitions (e.g. fail → partial, partial → in_progress) are allowed
return True, []
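Because the evidence gate only reads `.confidence_level` and `.truth_status`, it can be exercised with duck-typed objects. A minimal self-contained sketch of that per-evidence check:

```python
from types import SimpleNamespace

# Same orderings/statuses as the module constants above.
CONFIDENCE_ORDER = {"E0": 0, "E1": 1, "E2": 2, "E3": 3, "E4": 4}
VALID = {"uploaded", "observed", "validated_internal",
         "accepted_by_auditor", "provided_to_auditor"}

def qualifies_for_pass(evidence_list) -> bool:
    """Mirror of the per-evidence gate inside validate_transition."""
    for e in evidence_list:
        conf = getattr(e, "confidence_level", None) or "E1"
        truth = getattr(e, "truth_status", None) or "uploaded"
        if CONFIDENCE_ORDER.get(str(conf), 1) >= 2 and str(truth) in VALID:
            return True
    return False

weak = SimpleNamespace(confidence_level="E1", truth_status="uploaded")
strong = SimpleNamespace(confidence_level="E3", truth_status="validated_internal")
```

One qualifying record is enough; weak evidence alongside it does not block the transition.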

File diff suppressed because it is too large


@@ -0,0 +1,714 @@
"""Framework Decomposition Engine — decomposes framework-container obligations.
Sits between Pass 0a (obligation extraction) and Pass 0b (atomic control
composition). Detects obligations that reference a framework domain (e.g.
"CCM-Praktiken fuer AIS") and decomposes them into concrete sub-obligations
using an internal framework registry.
Three routing types:
atomic → pass through to Pass 0b unchanged
compound → split compound verbs, then Pass 0b
framework_container → decompose via registry, then Pass 0b
The registry is a set of JSON files under compliance/data/frameworks/.
"""
import json
import logging
import os
import re
import uuid
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Registry loading
# ---------------------------------------------------------------------------
_REGISTRY_DIR = Path(__file__).resolve().parent.parent / "data" / "frameworks"
_REGISTRY: dict[str, dict] = {} # framework_id → framework dict
def _load_registry() -> dict[str, dict]:
"""Load all framework JSON files from the registry directory."""
registry: dict[str, dict] = {}
if not _REGISTRY_DIR.is_dir():
logger.warning("Framework registry dir not found: %s", _REGISTRY_DIR)
return registry
for fpath in sorted(_REGISTRY_DIR.glob("*.json")):
try:
with open(fpath, encoding="utf-8") as f:
fw = json.load(f)
fw_id = fw.get("framework_id", fpath.stem)
registry[fw_id] = fw
logger.info(
"Loaded framework: %s (%d domains)",
fw_id,
len(fw.get("domains", [])),
)
except Exception:
logger.exception("Failed to load framework file: %s", fpath)
return registry
def get_registry() -> dict[str, dict]:
"""Return the global framework registry (lazy-loaded)."""
global _REGISTRY
if not _REGISTRY:
_REGISTRY = _load_registry()
return _REGISTRY
def reload_registry() -> dict[str, dict]:
"""Force-reload the framework registry from disk."""
global _REGISTRY
_REGISTRY = _load_registry()
return _REGISTRY
# ---------------------------------------------------------------------------
# Framework alias index (built from registry)
# ---------------------------------------------------------------------------
def _build_alias_index(registry: dict[str, dict]) -> dict[str, str]:
"""Build a lowercase alias → framework_id lookup."""
idx: dict[str, str] = {}
for fw_id, fw in registry.items():
# Framework-level aliases
idx[fw_id.lower()] = fw_id
name = fw.get("display_name", "")
if name:
idx[name.lower()] = fw_id
# Common short forms
for part in fw_id.lower().replace("_", " ").split():
if len(part) >= 3:
idx[part] = fw_id
return idx
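With a toy registry entry (the framework id and display name below are made up), the alias index expands one framework into several lookup keys:

```python
def build_alias_index(registry: dict[str, dict]) -> dict[str, str]:
    """Same construction as _build_alias_index above."""
    idx: dict[str, str] = {}
    for fw_id, fw in registry.items():
        idx[fw_id.lower()] = fw_id                    # the id itself
        name = fw.get("display_name", "")
        if name:
            idx[name.lower()] = fw_id                 # the display name
        for part in fw_id.lower().replace("_", " ").split():
            if len(part) >= 3:                        # short forms like "ccm"
                idx[part] = fw_id
    return idx

registry = {"CSA_CCM": {"display_name": "CSA Cloud Controls Matrix"}}
idx = build_alias_index(registry)
```

So "csa_ccm", the full display name, and each id fragment of length ≥ 3 all resolve to the same framework.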
# ---------------------------------------------------------------------------
# Routing — classify obligation type
# ---------------------------------------------------------------------------
# Extended patterns for framework detection (beyond the simple _COMPOSITE_RE
# in decomposition_pass.py — here we also capture the framework name)
_FRAMEWORK_PATTERN = re.compile(
r"(?:praktiken|kontrollen|ma(?:ss|ß)nahmen|anforderungen|vorgaben|controls|practices|measures|requirements)"
r"\s+(?:f(?:ue|ü)r|aus|gem(?:ae|ä)(?:ss|ß)|nach|from|of|for|per)\s+"
r"(.+?)(?:\s+(?:m(?:ue|ü)ssen|sollen|sind|werden|implementieren|umsetzen|einf(?:ue|ü)hren)|\.|,|$)",
re.IGNORECASE,
)
# Direct framework name references
_DIRECT_FRAMEWORK_RE = re.compile(
r"\b(?:CSA\s*CCM|NIST\s*(?:SP\s*)?800-53|OWASP\s*(?:ASVS|SAMM|Top\s*10)"
r"|CIS\s*Controls|BSI\s*(?:IT-)?Grundschutz|ENISA|ISO\s*2700[12]"
r"|COBIT|SOX|PCI\s*DSS|HITRUST|SOC\s*2|KRITIS)\b",
re.IGNORECASE,
)
# Compound verb patterns (multiple main verbs)
_COMPOUND_VERB_RE = re.compile(
r"\b(?:und|sowie|als\s+auch|or|and)\b",
re.IGNORECASE,
)
# No-split phrases that look compound but aren't
_NO_SPLIT_PHRASES = [
"pflegen und aufrechterhalten",
"dokumentieren und pflegen",
"definieren und dokumentieren",
"erstellen und freigeben",
"pruefen und genehmigen",
"identifizieren und bewerten",
"erkennen und melden",
"define and maintain",
"create and maintain",
"establish and maintain",
"monitor and review",
"detect and respond",
]
@dataclass
class RoutingResult:
"""Result of obligation routing classification."""
routing_type: str # atomic | compound | framework_container | unknown_review
framework_ref: Optional[str] = None
framework_domain: Optional[str] = None
domain_title: Optional[str] = None
confidence: float = 0.0
reason: str = ""
def classify_routing(
obligation_text: str,
action_raw: str,
object_raw: str,
condition_raw: Optional[str] = None,
) -> RoutingResult:
"""Classify an obligation into atomic / compound / framework_container."""
# --- Step 1: Framework container detection ---
fw_result = _detect_framework(obligation_text, object_raw)
if fw_result.routing_type == "framework_container":
return fw_result
# --- Step 2: Compound verb detection ---
if _is_compound_obligation(action_raw, obligation_text):
return RoutingResult(
routing_type="compound",
confidence=0.7,
reason="multiple_main_verbs",
)
# --- Step 3: Default = atomic ---
return RoutingResult(
routing_type="atomic",
confidence=0.9,
reason="single_action_single_object",
)
def _detect_framework(
obligation_text: str, object_raw: str,
) -> RoutingResult:
"""Detect if obligation references a framework domain."""
combined = f"{obligation_text} {object_raw}"
registry = get_registry()
alias_idx = _build_alias_index(registry)
# Strategy 1: direct framework name match
m = _DIRECT_FRAMEWORK_RE.search(combined)
if m:
fw_name = m.group(0).strip()
fw_id = _resolve_framework_id(fw_name, alias_idx, registry)
if fw_id:
domain_id, domain_title = _match_domain(
combined, registry[fw_id],
)
return RoutingResult(
routing_type="framework_container",
framework_ref=fw_id,
framework_domain=domain_id,
domain_title=domain_title,
confidence=0.95 if domain_id else 0.75,
reason=f"direct_framework_match:{fw_name}",
)
else:
# Framework name recognized but not in registry
return RoutingResult(
routing_type="framework_container",
framework_ref=None,
framework_domain=None,
confidence=0.6,
reason=f"direct_framework_match_no_registry:{fw_name}",
)
# Strategy 2: pattern match ("Praktiken fuer X")
m2 = _FRAMEWORK_PATTERN.search(combined)
if m2:
ref_text = m2.group(1).strip()
fw_id, domain_id, domain_title = _resolve_from_ref_text(
ref_text, registry, alias_idx,
)
if fw_id:
return RoutingResult(
routing_type="framework_container",
framework_ref=fw_id,
framework_domain=domain_id,
domain_title=domain_title,
confidence=0.85 if domain_id else 0.65,
reason=f"pattern_match:{ref_text}",
)
# Strategy 3: keyword-heavy object
if _has_framework_keywords(object_raw):
return RoutingResult(
routing_type="framework_container",
framework_ref=None,
framework_domain=None,
confidence=0.5,
reason="framework_keywords_in_object",
)
return RoutingResult(routing_type="atomic", confidence=0.0)
def _resolve_framework_id(
name: str,
alias_idx: dict[str, str],
registry: dict[str, dict],
) -> Optional[str]:
"""Resolve a framework name to its registry ID."""
normalized = re.sub(r"\s+", " ", name.strip().lower())
# Direct alias match
if normalized in alias_idx:
return alias_idx[normalized]
# Try compact form (strip spaces, hyphens, underscores)
compact = re.sub(r"[\s_\-]+", "", normalized)
for alias, fw_id in alias_idx.items():
if re.sub(r"[\s_\-]+", "", alias) == compact:
return fw_id
# Substring match in display names
for fw_id, fw in registry.items():
display = fw.get("display_name", "").lower()
if normalized in display or display in normalized:
return fw_id
# Partial match: check if normalized contains any alias (for multi-word refs)
for alias, fw_id in alias_idx.items():
if len(alias) >= 4 and alias in normalized:
return fw_id
return None
def _match_domain(
text: str, framework: dict,
) -> tuple[Optional[str], Optional[str]]:
"""Match a domain within a framework from text references."""
text_lower = text.lower()
best_id: Optional[str] = None
best_title: Optional[str] = None
best_score = 0
for domain in framework.get("domains", []):
score = 0
domain_id = domain["domain_id"]
title = domain.get("title", "")
# Exact domain ID match (e.g. "AIS")
if re.search(rf"\b{re.escape(domain_id)}\b", text, re.IGNORECASE):
score += 10
# Full title match
if title.lower() in text_lower:
score += 8
# Alias match
for alias in domain.get("aliases", []):
if alias.lower() in text_lower:
score += 6
break
# Keyword overlap
kw_hits = sum(
1 for kw in domain.get("keywords", [])
if kw.lower() in text_lower
)
score += kw_hits
if score > best_score:
best_score = score
best_id = domain_id
best_title = title
if best_score >= 3:
return best_id, best_title
return None, None
def _resolve_from_ref_text(
ref_text: str,
registry: dict[str, dict],
alias_idx: dict[str, str],
) -> tuple[Optional[str], Optional[str], Optional[str]]:
"""Resolve framework + domain from a reference text like 'AIS' or 'Application Security'."""
ref_lower = ref_text.lower()
for fw_id, fw in registry.items():
for domain in fw.get("domains", []):
# Check domain ID
if domain["domain_id"].lower() in ref_lower:
return fw_id, domain["domain_id"], domain.get("title")
# Check title
if domain.get("title", "").lower() in ref_lower:
return fw_id, domain["domain_id"], domain.get("title")
# Check aliases
for alias in domain.get("aliases", []):
if alias.lower() in ref_lower or ref_lower in alias.lower():
return fw_id, domain["domain_id"], domain.get("title")
return None, None, None
_FRAMEWORK_KW_SET = {
"praktiken", "kontrollen", "massnahmen", "maßnahmen",
"anforderungen", "vorgaben", "framework", "standard",
"baseline", "katalog", "domain", "family", "category",
"practices", "controls", "measures", "requirements",
}
def _has_framework_keywords(text: str) -> bool:
"""Check if text contains framework-indicator keywords."""
words = set(re.findall(r"[a-zäöüß]+", text.lower()))
return len(words & _FRAMEWORK_KW_SET) >= 2
def _is_compound_obligation(action_raw: str, obligation_text: str) -> bool:
"""Detect if the obligation has multiple competing main verbs."""
if not action_raw:
return False
action_lower = action_raw.lower().strip()
# Check no-split phrases first
for phrase in _NO_SPLIT_PHRASES:
if phrase in action_lower:
return False
# Must have a conjunction
if not _COMPOUND_VERB_RE.search(action_lower):
return False
# Split by conjunctions and check if we get 2+ meaningful verbs
parts = re.split(r"\b(?:und|sowie|als\s+auch|or|and)\b", action_lower)
meaningful = [p.strip() for p in parts if len(p.strip()) >= 3]
return len(meaningful) >= 2
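The conjunction-split heuristic in `_is_compound_obligation` can be sketched standalone (the `_NO_SPLIT_PHRASES` guard is omitted here for brevity; `is_compound` is an illustrative name, not the module's):

```python
import re

# Split the action on German/English conjunctions and require at least
# two "meaningful" parts (>= 3 chars), mirroring the logic above.
_CONJ = re.compile(r"\b(?:und|sowie|als\s+auch|or|and)\b")

def is_compound(action: str) -> bool:
    action = action.lower().strip()
    if not _CONJ.search(action):
        return False
    parts = re.split(r"\b(?:und|sowie|als\s+auch|or|and)\b", action)
    meaningful = [p.strip() for p in parts if len(p.strip()) >= 3]
    return len(meaningful) >= 2

print(is_compound("dokumentieren und ueberwachen"))  # True
print(is_compound("implementieren"))                 # False
```

A bare conjunction with no verbs on either side ("und") yields no meaningful parts and is correctly not flagged as compound.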
# ---------------------------------------------------------------------------
# Framework Decomposition
# ---------------------------------------------------------------------------
@dataclass
class DecomposedObligation:
"""A concrete obligation derived from a framework container."""
obligation_candidate_id: str
parent_control_id: str
parent_framework_container_id: str
source_ref_law: str
source_ref_article: str
obligation_text: str
actor: str
action_raw: str
object_raw: str
condition_raw: Optional[str] = None
trigger_raw: Optional[str] = None
routing_type: str = "atomic"
release_state: str = "decomposed"
subcontrol_id: str = ""
# Metadata
action_hint: str = ""
object_hint: str = ""
object_class: str = ""
keywords: list[str] = field(default_factory=list)
@dataclass
class FrameworkDecompositionResult:
"""Result of framework decomposition."""
framework_container_id: str
source_obligation_candidate_id: str
framework_ref: Optional[str]
framework_domain: Optional[str]
domain_title: Optional[str]
matched_subcontrols: list[str]
decomposition_confidence: float
release_state: str # decomposed | unmatched | error
decomposed_obligations: list[DecomposedObligation]
issues: list[str]
def decompose_framework_container(
obligation_candidate_id: str,
parent_control_id: str,
obligation_text: str,
framework_ref: Optional[str],
framework_domain: Optional[str],
actor: str = "organization",
) -> FrameworkDecompositionResult:
"""Decompose a framework-container obligation into concrete sub-obligations.
Steps:
1. Resolve framework from registry
2. Resolve domain within framework
3. Select relevant subcontrols (keyword filter or full domain)
4. Generate decomposed obligations
"""
container_id = f"FWC-{uuid.uuid4().hex[:8]}"
registry = get_registry()
issues: list[str] = []
# Step 1: Resolve framework
fw = None
if framework_ref and framework_ref in registry:
fw = registry[framework_ref]
else:
# Try to find by name in text
fw, framework_ref = _find_framework_in_text(obligation_text, registry)
if not fw:
issues.append("ERROR: framework_not_matched")
return FrameworkDecompositionResult(
framework_container_id=container_id,
source_obligation_candidate_id=obligation_candidate_id,
framework_ref=framework_ref,
framework_domain=framework_domain,
domain_title=None,
matched_subcontrols=[],
decomposition_confidence=0.0,
release_state="unmatched",
decomposed_obligations=[],
issues=issues,
)
# Step 2: Resolve domain
domain_data = None
domain_title = None
if framework_domain:
for d in fw.get("domains", []):
if d["domain_id"].lower() == framework_domain.lower():
domain_data = d
domain_title = d.get("title")
break
if not domain_data:
# Try matching from text
domain_id, domain_title = _match_domain(obligation_text, fw)
if domain_id:
for d in fw.get("domains", []):
if d["domain_id"] == domain_id:
domain_data = d
framework_domain = domain_id
break
if not domain_data:
issues.append("WARN: domain_not_matched — using all domains")
# Fall back to all subcontrols across all domains
all_subcontrols = []
for d in fw.get("domains", []):
for sc in d.get("subcontrols", []):
sc["_domain_id"] = d["domain_id"]
all_subcontrols.append(sc)
subcontrols = _select_subcontrols(obligation_text, all_subcontrols)
if not subcontrols:
issues.append("ERROR: no_subcontrols_matched")
return FrameworkDecompositionResult(
framework_container_id=container_id,
source_obligation_candidate_id=obligation_candidate_id,
framework_ref=framework_ref,
framework_domain=framework_domain,
domain_title=None,
matched_subcontrols=[],
decomposition_confidence=0.0,
release_state="unmatched",
decomposed_obligations=[],
issues=issues,
)
else:
# Step 3: Select subcontrols from domain
raw_subcontrols = domain_data.get("subcontrols", [])
subcontrols = _select_subcontrols(obligation_text, raw_subcontrols)
if not subcontrols:
# Full domain decomposition
subcontrols = raw_subcontrols
# Quality check: too many subcontrols
if len(subcontrols) > 25:
issues.append(f"WARN: {len(subcontrols)} subcontrols — may be too broad")
# Step 4: Generate decomposed obligations
display_name = fw.get("display_name", framework_ref or "Unknown")
decomposed: list[DecomposedObligation] = []
matched_ids: list[str] = []
for sc in subcontrols:
sc_id = sc.get("subcontrol_id", "")
matched_ids.append(sc_id)
action_hint = sc.get("action_hint", "")
object_hint = sc.get("object_hint", "")
# Quality warnings
if not action_hint:
issues.append(f"WARN: {sc_id} missing action_hint")
if not object_hint:
issues.append(f"WARN: {sc_id} missing object_hint")
obl_id = f"{obligation_candidate_id}-{sc_id}"
decomposed.append(DecomposedObligation(
obligation_candidate_id=obl_id,
parent_control_id=parent_control_id,
parent_framework_container_id=container_id,
source_ref_law=display_name,
source_ref_article=sc_id,
obligation_text=sc.get("statement", ""),
actor=actor,
action_raw=action_hint or _infer_action(sc.get("statement", "")),
object_raw=object_hint or _infer_object(sc.get("statement", "")),
routing_type="atomic",
release_state="decomposed",
subcontrol_id=sc_id,
action_hint=action_hint,
object_hint=object_hint,
object_class=sc.get("object_class", ""),
keywords=sc.get("keywords", []),
))
# Check if decomposed are identical to container
for d in decomposed:
if d.obligation_text.strip() == obligation_text.strip():
issues.append(f"WARN: {d.subcontrol_id} identical to container text")
confidence = _compute_decomposition_confidence(
framework_ref, framework_domain, domain_data, len(subcontrols), issues,
)
return FrameworkDecompositionResult(
framework_container_id=container_id,
source_obligation_candidate_id=obligation_candidate_id,
framework_ref=framework_ref,
framework_domain=framework_domain,
domain_title=domain_title,
matched_subcontrols=matched_ids,
decomposition_confidence=confidence,
release_state="decomposed",
decomposed_obligations=decomposed,
issues=issues,
)
def _find_framework_in_text(
text: str, registry: dict[str, dict],
) -> tuple[Optional[dict], Optional[str]]:
"""Try to find a framework by searching text for known names."""
alias_idx = _build_alias_index(registry)
m = _DIRECT_FRAMEWORK_RE.search(text)
if m:
fw_id = _resolve_framework_id(m.group(0), alias_idx, registry)
if fw_id and fw_id in registry:
return registry[fw_id], fw_id
return None, None
def _select_subcontrols(
obligation_text: str, subcontrols: list[dict],
) -> list[dict]:
"""Select relevant subcontrols based on keyword matching.
Returns empty list if no targeted match found (caller falls back to
full domain).
"""
text_lower = obligation_text.lower()
scored: list[tuple[int, dict]] = []
for sc in subcontrols:
score = 0
for kw in sc.get("keywords", []):
if kw.lower() in text_lower:
score += 1
# Title match
title = sc.get("title", "").lower()
if title and title in text_lower:
score += 3
# Object hint in text
obj = sc.get("object_hint", "").lower()
if obj and obj in text_lower:
score += 2
if score > 0:
scored.append((score, sc))
if not scored:
return []
# Only return those with meaningful overlap (score >= 2)
scored.sort(key=lambda x: x[0], reverse=True)
return [sc for score, sc in scored if score >= 2]
def _infer_action(statement: str) -> str:
"""Infer a basic action verb from a statement."""
s = statement.lower()
if any(w in s for w in ["definiert", "definieren", "define"]):
return "definieren"
if any(w in s for w in ["implementiert", "implementieren", "implement"]):
return "implementieren"
if any(w in s for w in ["dokumentiert", "dokumentieren", "document"]):
return "dokumentieren"
if any(w in s for w in ["ueberwacht", "ueberwachen", "monitor"]):
return "ueberwachen"
if any(w in s for w in ["getestet", "testen", "test"]):
return "testen"
if any(w in s for w in ["geschuetzt", "schuetzen", "protect"]):
return "implementieren"
if any(w in s for w in ["verwaltet", "verwalten", "manage"]):
return "pflegen"
if any(w in s for w in ["gemeldet", "melden", "report"]):
return "melden"
return "implementieren"
def _infer_object(statement: str) -> str:
"""Infer the primary object from a statement (first noun phrase)."""
# Simple heuristic: take the text after "muessen"/"muss" up to the verb
m = re.search(
r"(?:muessen|muss|m(?:ü|ue)ssen)\s+(.+?)(?:\s+werden|\s+sein|\.|,|$)",
statement,
re.IGNORECASE,
)
if m:
return m.group(1).strip()[:80]
# Fallback: first 80 chars
return statement[:80] if statement else ""
def _compute_decomposition_confidence(
framework_ref: Optional[str],
domain: Optional[str],
domain_data: Optional[dict],
num_subcontrols: int,
issues: list[str],
) -> float:
"""Compute confidence score for the decomposition."""
score = 0.3
if framework_ref:
score += 0.25
if domain:
score += 0.20
if domain_data:
score += 0.10
if 1 <= num_subcontrols <= 15:
score += 0.10
elif num_subcontrols > 15:
score += 0.05 # less confident with too many
# Penalize errors
errors = sum(1 for i in issues if i.startswith("ERROR:"))
score -= errors * 0.15
return round(max(min(score, 1.0), 0.0), 2)
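The additive scoring above can be restated standalone to make the weights concrete (`"bsi_c5"` and `"AIS"` are hypothetical registry IDs used only for illustration):

```python
# Same weights as _compute_decomposition_confidence: base 0.3, plus
# bonuses for resolved framework/domain and a sane subcontrol count,
# minus 0.15 per ERROR issue, clamped to [0, 1].
def decomposition_confidence(framework_ref, domain, domain_data,
                             num_subcontrols, num_errors):
    score = 0.3
    if framework_ref:
        score += 0.25
    if domain:
        score += 0.20
    if domain_data:
        score += 0.10
    if 1 <= num_subcontrols <= 15:
        score += 0.10
    elif num_subcontrols > 15:
        score += 0.05  # less confident with too many
    score -= num_errors * 0.15
    return round(max(min(score, 1.0), 0.0), 2)

# Fully resolved container, 8 subcontrols, no errors:
print(decomposition_confidence("bsi_c5", "AIS", {"title": "x"}, 8, 0))  # 0.95
# Nothing resolved, one ERROR issue:
print(decomposition_confidence(None, None, None, 0, 1))  # 0.15
```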
# ---------------------------------------------------------------------------
# Registry statistics (for admin/debugging)
# ---------------------------------------------------------------------------
def registry_stats() -> dict:
"""Return summary statistics about the loaded registry."""
reg = get_registry()
stats = {
"frameworks": len(reg),
"details": [],
}
total_domains = 0
total_subcontrols = 0
for fw_id, fw in reg.items():
domains = fw.get("domains", [])
n_sc = sum(len(d.get("subcontrols", [])) for d in domains)
total_domains += len(domains)
total_subcontrols += n_sc
stats["details"].append({
"framework_id": fw_id,
"display_name": fw.get("display_name", ""),
"domains": len(domains),
"subcontrols": n_sc,
})
stats["total_domains"] = total_domains
stats["total_subcontrols"] = total_subcontrols
return stats
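A minimal standalone sketch of the `_select_subcontrols` scoring (+1 per keyword hit, +3 for a title match, +2 for an object-hint match, keep score >= 2); the subcontrol records here are invented for illustration:

```python
def select_subcontrols(obligation_text, subcontrols):
    text = obligation_text.lower()
    scored = []
    for sc in subcontrols:
        score = sum(1 for kw in sc.get("keywords", []) if kw.lower() in text)
        title = sc.get("title", "").lower()
        if title and title in text:
            score += 3
        obj = sc.get("object_hint", "").lower()
        if obj and obj in text:
            score += 2
        if score > 0:
            scored.append((score, sc))
    scored.sort(key=lambda x: x[0], reverse=True)
    # Only keep meaningful overlap; empty result => caller falls back
    # to the full domain.
    return [sc for score, sc in scored if score >= 2]

subs = [
    {"subcontrol_id": "AIS-01", "title": "penetration tests",
     "keywords": ["penetration", "test"]},
    {"subcontrol_id": "AIS-02", "title": "secure coding",
     "keywords": ["coding"]},
]
hits = select_subcontrols("Regelmäßige Penetration Tests sind durchzuführen.", subs)
print([sc["subcontrol_id"] for sc in hits])  # ['AIS-01']
```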


@@ -0,0 +1,116 @@
"""
License Gate — checks whether a given source may be used for a specific purpose.
Usage types:
- analysis: Read + analyse internally (TDM under UrhG 44b)
- store_excerpt: Store verbatim excerpt in vault
- ship_embeddings: Ship embeddings in product
- ship_in_product: Ship text/content in product
Policy is driven by the canonical_control_sources table columns:
allowed_analysis, allowed_store_excerpt, allowed_ship_embeddings, allowed_ship_in_product
"""
from __future__ import annotations
import logging
from typing import Any
from sqlalchemy import text
from sqlalchemy.orm import Session
logger = logging.getLogger(__name__)
USAGE_COLUMN_MAP = {
"analysis": "allowed_analysis",
"store_excerpt": "allowed_store_excerpt",
"ship_embeddings": "allowed_ship_embeddings",
"ship_in_product": "allowed_ship_in_product",
}
def check_source_allowed(db: Session, source_id: str, usage_type: str) -> bool:
"""Check whether *source_id* may be used for *usage_type*.
Returns False if the source is unknown or the usage is not allowed.
"""
col = USAGE_COLUMN_MAP.get(usage_type)
if col is None:
logger.warning("Unknown usage_type=%s", usage_type)
return False
row = db.execute(
text(f"SELECT {col} FROM canonical_control_sources WHERE source_id = :sid"),
{"sid": source_id},
).fetchone()
if row is None:
logger.warning("Source %s not found in registry", source_id)
return False
return bool(row[0])
def get_license_matrix(db: Session) -> list[dict[str, Any]]:
"""Return the full license matrix with allowed usages per license."""
rows = db.execute(
text("""
SELECT license_id, name, terms_url, commercial_use,
ai_training_restriction, tdm_allowed_under_44b,
deletion_required, notes
FROM canonical_control_licenses
ORDER BY license_id
""")
).fetchall()
return [
{
"license_id": r.license_id,
"name": r.name,
"terms_url": r.terms_url,
"commercial_use": r.commercial_use,
"ai_training_restriction": r.ai_training_restriction,
"tdm_allowed_under_44b": r.tdm_allowed_under_44b,
"deletion_required": r.deletion_required,
"notes": r.notes,
}
for r in rows
]
def get_source_permissions(db: Session) -> list[dict[str, Any]]:
"""Return all sources with their permission flags."""
rows = db.execute(
text("""
SELECT s.source_id, s.title, s.publisher, s.url, s.version_label,
s.language, s.license_id,
s.allowed_analysis, s.allowed_store_excerpt,
s.allowed_ship_embeddings, s.allowed_ship_in_product,
s.vault_retention_days, s.vault_access_tier,
l.name AS license_name, l.commercial_use
FROM canonical_control_sources s
JOIN canonical_control_licenses l ON l.license_id = s.license_id
ORDER BY s.source_id
""")
).fetchall()
return [
{
"source_id": r.source_id,
"title": r.title,
"publisher": r.publisher,
"url": r.url,
"version_label": r.version_label,
"language": r.language,
"license_id": r.license_id,
"license_name": r.license_name,
"commercial_use": r.commercial_use,
"allowed_analysis": r.allowed_analysis,
"allowed_store_excerpt": r.allowed_store_excerpt,
"allowed_ship_embeddings": r.allowed_ship_embeddings,
"allowed_ship_in_product": r.allowed_ship_in_product,
"vault_retention_days": r.vault_retention_days,
"vault_access_tier": r.vault_access_tier,
}
for r in rows
]
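The gate's lookup logic can be sketched without a database: a dict stands in for the `canonical_control_sources` table, and `"src_bsi_c5"` is a hypothetical `source_id`:

```python
# Usage-type -> permission-column mapping, as in the module above.
USAGE_COLUMN_MAP = {
    "analysis": "allowed_analysis",
    "store_excerpt": "allowed_store_excerpt",
    "ship_embeddings": "allowed_ship_embeddings",
    "ship_in_product": "allowed_ship_in_product",
}

SOURCES = {
    "src_bsi_c5": {
        "allowed_analysis": True,
        "allowed_store_excerpt": True,
        "allowed_ship_embeddings": False,
        "allowed_ship_in_product": False,
    },
}

def check_source_allowed(source_id: str, usage_type: str) -> bool:
    col = USAGE_COLUMN_MAP.get(usage_type)
    if col is None:
        return False  # unknown usage type: deny
    row = SOURCES.get(source_id)
    if row is None:
        return False  # unknown source: deny
    return bool(row.get(col))

print(check_source_allowed("src_bsi_c5", "analysis"))         # True
print(check_source_allowed("src_bsi_c5", "ship_in_product"))  # False
```

As in the real implementation, both unknown sources and unknown usage types fail closed.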


@@ -0,0 +1,624 @@
"""
LLM Provider Abstraction for Compliance AI Features.
Supports:
- Anthropic Claude API (default)
- Self-Hosted LLMs (Ollama, vLLM, LocalAI, etc.)
- HashiCorp Vault integration for secure API key storage
Configuration via environment variables:
- COMPLIANCE_LLM_PROVIDER: "anthropic" or "self_hosted"
- ANTHROPIC_API_KEY: API key for Claude (or loaded from Vault)
- ANTHROPIC_MODEL: Model name (default: claude-sonnet-4-20250514)
- SELF_HOSTED_LLM_URL: Base URL for self-hosted LLM
- SELF_HOSTED_LLM_MODEL: Model name for self-hosted
- SELF_HOSTED_LLM_KEY: Optional API key for self-hosted
Vault Configuration:
- VAULT_ADDR: Vault server address (e.g., http://vault:8200)
- VAULT_TOKEN: Vault authentication token
- USE_VAULT_SECRETS: Set to "true" to enable Vault integration
- VAULT_SECRET_PATH: Path to secrets (default: secret/breakpilot/api_keys)
"""
import os
import asyncio
import logging
from abc import ABC, abstractmethod
from typing import List, Optional, Dict, Any
from dataclasses import dataclass
from enum import Enum
import httpx
logger = logging.getLogger(__name__)
# =============================================================================
# Vault Integration
# =============================================================================
class VaultClient:
"""
HashiCorp Vault client for retrieving secrets.
Supports KV v2 secrets engine.
"""
def __init__(
self,
addr: Optional[str] = None,
token: Optional[str] = None
):
self.addr = addr or os.getenv("VAULT_ADDR", "http://localhost:8200")
self.token = token or os.getenv("VAULT_TOKEN")
self._cache: Dict[str, Any] = {}
self._cache_ttl = 300 # intended 5-minute TTL; note: expiry is not enforced yet, cached entries live for the process lifetime
def _get_headers(self) -> Dict[str, str]:
"""Get request headers with Vault token."""
headers = {"Content-Type": "application/json"}
if self.token:
headers["X-Vault-Token"] = self.token
return headers
def get_secret(self, path: str, key: str = "value") -> Optional[str]:
"""
Get a secret from Vault KV v2.
Args:
path: Secret path (e.g., "breakpilot/api_keys/anthropic")
key: Key within the secret data (default: "value")
Returns:
Secret value or None if not found
"""
cache_key = f"{path}:{key}"
# Check cache first
if cache_key in self._cache:
return self._cache[cache_key]
try:
# KV v2 uses /data/ in the path
full_path = f"{self.addr}/v1/secret/data/{path}"
response = httpx.get(
full_path,
headers=self._get_headers(),
timeout=10.0
)
if response.status_code == 200:
data = response.json()
secret_data = data.get("data", {}).get("data", {})
secret_value = secret_data.get(key)
if secret_value:
self._cache[cache_key] = secret_value
logger.info(f"Successfully loaded secret from Vault: {path}")
return secret_value
elif response.status_code == 404:
logger.warning(f"Secret not found in Vault: {path}")
else:
logger.error(f"Vault error {response.status_code}: {response.text}")
except httpx.RequestError as e:
logger.error(f"Failed to connect to Vault at {self.addr}: {e}")
except Exception as e:
logger.error(f"Error retrieving secret from Vault: {e}")
return None
def get_anthropic_key(self) -> Optional[str]:
"""Get Anthropic API key from Vault."""
path = os.getenv("VAULT_ANTHROPIC_PATH", "breakpilot/api_keys/anthropic")
return self.get_secret(path, "value")
def is_available(self) -> bool:
"""Check if Vault is available and authenticated."""
try:
response = httpx.get(
f"{self.addr}/v1/sys/health",
headers=self._get_headers(),
timeout=5.0
)
return response.status_code in (200, 429, 472, 473, 501, 503)
except Exception:
return False
# Singleton Vault client
_vault_client: Optional[VaultClient] = None
def get_vault_client() -> VaultClient:
"""Get shared Vault client instance."""
global _vault_client
if _vault_client is None:
_vault_client = VaultClient()
return _vault_client
def get_secret_from_vault_or_env(
vault_path: str,
env_var: str,
vault_key: str = "value"
) -> Optional[str]:
"""
Get a secret, trying Vault first, then falling back to environment variable.
Args:
vault_path: Path in Vault (e.g., "breakpilot/api_keys/anthropic")
env_var: Environment variable name as fallback
vault_key: Key within Vault secret data
Returns:
Secret value or None
"""
use_vault = os.getenv("USE_VAULT_SECRETS", "").lower() in ("true", "1", "yes")
if use_vault:
vault = get_vault_client()
secret = vault.get_secret(vault_path, vault_key)
if secret:
return secret
logger.info(f"Vault secret not found, falling back to env: {env_var}")
return os.getenv(env_var)
class LLMProviderType(str, Enum):
"""Supported LLM provider types."""
ANTHROPIC = "anthropic"
SELF_HOSTED = "self_hosted"
OLLAMA = "ollama" # Alias for self_hosted (Ollama-specific)
MOCK = "mock" # For testing
@dataclass
class LLMResponse:
"""Standard response from LLM."""
content: str
model: str
provider: str
usage: Optional[Dict[str, int]] = None
raw_response: Optional[Dict[str, Any]] = None
@dataclass
class LLMConfig:
"""Configuration for LLM provider."""
provider_type: LLMProviderType
api_key: Optional[str] = None
model: str = "claude-sonnet-4-20250514"
base_url: Optional[str] = None
max_tokens: int = 4096
temperature: float = 0.3
timeout: float = 60.0
class LLMProvider(ABC):
"""Abstract base class for LLM providers."""
def __init__(self, config: LLMConfig):
self.config = config
@abstractmethod
async def complete(
self,
prompt: str,
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
temperature: Optional[float] = None
) -> LLMResponse:
"""Generate a completion for the given prompt."""
pass
@abstractmethod
async def batch_complete(
self,
prompts: List[str],
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
rate_limit: float = 1.0
) -> List[LLMResponse]:
"""Generate completions for multiple prompts with rate limiting."""
pass
@property
@abstractmethod
def provider_name(self) -> str:
"""Return the provider name."""
pass
class AnthropicProvider(LLMProvider):
"""Claude API Provider using Anthropic's official API."""
ANTHROPIC_API_URL = "https://api.anthropic.com/v1/messages"
def __init__(self, config: LLMConfig):
super().__init__(config)
if not config.api_key:
raise ValueError("Anthropic API key is required")
self.api_key = config.api_key
self.model = config.model or "claude-sonnet-4-20250514"
@property
def provider_name(self) -> str:
return "anthropic"
async def complete(
self,
prompt: str,
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
temperature: Optional[float] = None
) -> LLMResponse:
"""Generate completion using Claude API."""
headers = {
"x-api-key": self.api_key,
"anthropic-version": "2023-06-01",
"content-type": "application/json"
}
messages = [{"role": "user", "content": prompt}]
payload = {
"model": self.model,
"max_tokens": max_tokens or self.config.max_tokens,
"messages": messages
}
if system_prompt:
payload["system"] = system_prompt
if temperature is not None:
payload["temperature"] = temperature
elif self.config.temperature is not None:
payload["temperature"] = self.config.temperature
async with httpx.AsyncClient(timeout=self.config.timeout) as client:
try:
response = await client.post(
self.ANTHROPIC_API_URL,
headers=headers,
json=payload
)
response.raise_for_status()
data = response.json()
content = ""
if data.get("content"):
content = data["content"][0].get("text", "")
return LLMResponse(
content=content,
model=self.model,
provider=self.provider_name,
usage=data.get("usage"),
raw_response=data
)
except httpx.HTTPStatusError as e:
logger.error(f"Anthropic API error: {e.response.status_code} - {e.response.text}")
raise
except Exception as e:
logger.error(f"Anthropic API request failed: {e}")
raise
async def batch_complete(
self,
prompts: List[str],
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
rate_limit: float = 1.0
) -> List[LLMResponse]:
"""Process multiple prompts with rate limiting."""
results = []
for i, prompt in enumerate(prompts):
if i > 0:
await asyncio.sleep(rate_limit)
try:
result = await self.complete(
prompt=prompt,
system_prompt=system_prompt,
max_tokens=max_tokens
)
results.append(result)
except Exception as e:
logger.error(f"Failed to process prompt {i}: {e}")
# Append error response
results.append(LLMResponse(
content=f"Error: {str(e)}",
model=self.model,
provider=self.provider_name
))
return results
class SelfHostedProvider(LLMProvider):
"""Self-Hosted LLM Provider supporting Ollama, vLLM, LocalAI, etc."""
def __init__(self, config: LLMConfig):
super().__init__(config)
if not config.base_url:
raise ValueError("Base URL is required for self-hosted provider")
self.base_url = config.base_url.rstrip("/")
self.model = config.model
self.api_key = config.api_key
@property
def provider_name(self) -> str:
return "self_hosted"
def _detect_api_format(self) -> str:
"""Detect the API format based on URL patterns."""
if "11434" in self.base_url or "ollama" in self.base_url.lower():
return "ollama"
elif "openai" in self.base_url.lower() or "v1" in self.base_url:
return "openai"
else:
return "ollama" # Default to Ollama format
async def complete(
self,
prompt: str,
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
temperature: Optional[float] = None
) -> LLMResponse:
"""Generate completion using self-hosted LLM."""
api_format = self._detect_api_format()
headers = {"content-type": "application/json"}
if self.api_key:
headers["Authorization"] = f"Bearer {self.api_key}"
if api_format == "ollama":
# Ollama API format
endpoint = f"{self.base_url}/api/generate"
full_prompt = prompt
if system_prompt:
full_prompt = f"{system_prompt}\n\n{prompt}"
payload = {
"model": self.model,
"prompt": full_prompt,
"stream": False,
"think": False, # Disable thinking mode (qwen3.5 etc.)
"options": {}
}
if max_tokens:
payload["options"]["num_predict"] = max_tokens
if temperature is not None:
payload["options"]["temperature"] = temperature
else:
# OpenAI-compatible format (vLLM, LocalAI, etc.)
endpoint = f"{self.base_url}/v1/chat/completions"
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
payload = {
"model": self.model,
"messages": messages,
"max_tokens": max_tokens or self.config.max_tokens,
"temperature": temperature if temperature is not None else self.config.temperature
}
async with httpx.AsyncClient(timeout=self.config.timeout) as client:
try:
response = await client.post(endpoint, headers=headers, json=payload)
response.raise_for_status()
data = response.json()
# Parse response based on format
if api_format == "ollama":
content = data.get("response", "")
else:
# OpenAI format (guard against an empty "choices" list)
choices = data.get("choices") or [{}]
content = choices[0].get("message", {}).get("content", "")
return LLMResponse(
content=content,
model=self.model,
provider=self.provider_name,
usage=data.get("usage"),
raw_response=data
)
except httpx.HTTPStatusError as e:
logger.error(f"Self-hosted LLM error: {e.response.status_code} - {e.response.text}")
raise
except Exception as e:
logger.error(f"Self-hosted LLM request failed: {e}")
raise
async def batch_complete(
self,
prompts: List[str],
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
rate_limit: float = 0.5 # Self-hosted can be faster
) -> List[LLMResponse]:
"""Process multiple prompts with rate limiting."""
results = []
for i, prompt in enumerate(prompts):
if i > 0:
await asyncio.sleep(rate_limit)
try:
result = await self.complete(
prompt=prompt,
system_prompt=system_prompt,
max_tokens=max_tokens
)
results.append(result)
except Exception as e:
logger.error(f"Failed to process prompt {i}: {e}")
results.append(LLMResponse(
content=f"Error: {str(e)}",
model=self.model,
provider=self.provider_name
))
return results
class MockProvider(LLMProvider):
"""Mock provider for testing without actual API calls."""
def __init__(self, config: LLMConfig):
super().__init__(config)
self.responses: List[str] = []
self.call_count = 0
@property
def provider_name(self) -> str:
return "mock"
def set_responses(self, responses: List[str]):
"""Set predetermined responses for testing."""
self.responses = responses
self.call_count = 0
async def complete(
self,
prompt: str,
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
temperature: Optional[float] = None
) -> LLMResponse:
"""Return mock response."""
if self.responses:
content = self.responses[self.call_count % len(self.responses)]
else:
content = f"Mock response for: {prompt[:50]}..."
self.call_count += 1
return LLMResponse(
content=content,
model="mock-model",
provider=self.provider_name,
usage={"input_tokens": len(prompt), "output_tokens": len(content)}
)
async def batch_complete(
self,
prompts: List[str],
system_prompt: Optional[str] = None,
max_tokens: Optional[int] = None,
rate_limit: float = 0.0
) -> List[LLMResponse]:
"""Return mock responses for batch."""
return [await self.complete(p, system_prompt, max_tokens) for p in prompts]
def get_llm_config() -> LLMConfig:
"""
Create LLM config from environment variables or Vault.
Priority for API key:
1. Vault (if USE_VAULT_SECRETS=true and Vault is available)
2. Environment variable (ANTHROPIC_API_KEY)
"""
provider_type_str = os.getenv("COMPLIANCE_LLM_PROVIDER", "anthropic")
try:
provider_type = LLMProviderType(provider_type_str)
except ValueError:
logger.warning(f"Unknown LLM provider: {provider_type_str}, falling back to mock")
provider_type = LLMProviderType.MOCK
# Get API key from Vault or environment
api_key = None
if provider_type == LLMProviderType.ANTHROPIC:
api_key = get_secret_from_vault_or_env(
vault_path="breakpilot/api_keys/anthropic",
env_var="ANTHROPIC_API_KEY"
)
elif provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
api_key = get_secret_from_vault_or_env(
vault_path="breakpilot/api_keys/self_hosted_llm",
env_var="SELF_HOSTED_LLM_KEY"
)
# Select model based on provider type
if provider_type == LLMProviderType.ANTHROPIC:
model = os.getenv("ANTHROPIC_MODEL", "claude-sonnet-4-20250514")
elif provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
model = os.getenv("SELF_HOSTED_LLM_MODEL", "qwen2.5:14b")
else:
model = "mock-model"
return LLMConfig(
provider_type=provider_type,
api_key=api_key,
model=model,
base_url=os.getenv("SELF_HOSTED_LLM_URL"),
max_tokens=int(os.getenv("COMPLIANCE_LLM_MAX_TOKENS", "4096")),
temperature=float(os.getenv("COMPLIANCE_LLM_TEMPERATURE", "0.3")),
timeout=float(os.getenv("COMPLIANCE_LLM_TIMEOUT", "60.0"))
)
def get_llm_provider(config: Optional[LLMConfig] = None) -> LLMProvider:
"""
Factory function to get the appropriate LLM provider based on configuration.
Usage:
provider = get_llm_provider()
response = await provider.complete("Analyze this requirement...")
"""
if config is None:
config = get_llm_config()
if config.provider_type == LLMProviderType.ANTHROPIC:
if not config.api_key:
logger.warning("No Anthropic API key found, using mock provider")
return MockProvider(config)
return AnthropicProvider(config)
elif config.provider_type in (LLMProviderType.SELF_HOSTED, LLMProviderType.OLLAMA):
if not config.base_url:
logger.warning("No self-hosted LLM URL found, using mock provider")
return MockProvider(config)
return SelfHostedProvider(config)
elif config.provider_type == LLMProviderType.MOCK:
return MockProvider(config)
else:
raise ValueError(f"Unsupported LLM provider type: {config.provider_type}")
# Singleton instance for reuse
_provider_instance: Optional[LLMProvider] = None
def get_shared_provider() -> LLMProvider:
"""Get a shared LLM provider instance."""
global _provider_instance
if _provider_instance is None:
_provider_instance = get_llm_provider()
return _provider_instance
def reset_shared_provider():
"""Reset the shared provider instance (useful for testing)."""
global _provider_instance
_provider_instance = None
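# The shared-provider functions above implement a lazy module-level singleton.
# A minimal self-contained sketch of the same pattern (hypothetical `Provider`
# class, not the module's real provider classes):

```python
from typing import Optional


class Provider:
    """Hypothetical stand-in for an LLM provider."""

    def __init__(self, model: str):
        self.model = model


_instance: Optional[Provider] = None


def get_shared(model: str = "mock-model") -> Provider:
    # Create the provider on first use, then reuse the cached instance.
    global _instance
    if _instance is None:
        _instance = Provider(model)
    return _instance


def reset_shared() -> None:
    # Drop the cached instance (useful for testing, like reset_shared_provider).
    global _instance
    _instance = None
```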


@@ -0,0 +1,59 @@
"""Shared normative language patterns for assertion classification.
Extracted from decomposition_pass.py for reuse in the assertion engine.
"""
import re
_PFLICHT_SIGNALS = [
r"\bmüssen\b", r"\bmuss\b", r"\bhat\s+sicherzustellen\b",
r"\bhaben\s+sicherzustellen\b", r"\bsind\s+verpflichtet\b",
r"\bist\s+verpflichtet\b",
r"\bist\s+zu\s+\w+en\b", r"\bsind\s+zu\s+\w+en\b",
r"\bhat\s+zu\s+\w+en\b", r"\bhaben\s+zu\s+\w+en\b",
r"\bist\s+\w+zu\w+en\b", r"\bsind\s+\w+zu\w+en\b",
r"\bist\s+\w+\s+zu\s+\w+en\b", r"\bsind\s+\w+\s+zu\s+\w+en\b",
r"\bhat\s+\w+\s+zu\s+\w+en\b", r"\bhaben\s+\w+\s+zu\s+\w+en\b",
r"\bshall\b", r"\bmust\b", r"\brequired\b",
r"\b\w+zuteilen\b", r"\b\w+zuwenden\b", r"\b\w+zustellen\b", r"\b\w+zulegen\b",
r"\b\w+zunehmen\b", r"\b\w+zuführen\b", r"\b\w+zuhalten\b", r"\b\w+zusetzen\b",
r"\b\w+zuweisen\b", r"\b\w+zuordnen\b", r"\b\w+zufügen\b", r"\b\w+zugeben\b",
r"\bist\b.{1,80}\bzu\s+\w+en\b", r"\bsind\b.{1,80}\bzu\s+\w+en\b",
]
PFLICHT_RE = re.compile("|".join(_PFLICHT_SIGNALS), re.IGNORECASE)
_EMPFEHLUNG_SIGNALS = [
r"\bsoll\b", r"\bsollen\b", r"\bsollte\b", r"\bsollten\b",
r"\bgewährleisten\b", r"\bsicherstellen\b",
r"\bshould\b", r"\bensure\b", r"\brecommend\w*\b",
r"\bnachweisen\b", r"\beinhalten\b", r"\bunterlassen\b", r"\bwahren\b",
r"\bdokumentieren\b", r"\bimplementieren\b", r"\büberprüfen\b", r"\büberwachen\b",
r"\bprüfen,\s+ob\b", r"\bkontrollieren,\s+ob\b",
]
EMPFEHLUNG_RE = re.compile("|".join(_EMPFEHLUNG_SIGNALS), re.IGNORECASE)
_KANN_SIGNALS = [
r"\bkann\b", r"\bkönnen\b", r"\bdarf\b", r"\bdürfen\b",
r"\bmay\b", r"\boptional\b",
]
KANN_RE = re.compile("|".join(_KANN_SIGNALS), re.IGNORECASE)
NORMATIVE_RE = re.compile(
"|".join(_PFLICHT_SIGNALS + _EMPFEHLUNG_SIGNALS + _KANN_SIGNALS),
re.IGNORECASE,
)
_RATIONALE_SIGNALS = [
r"\bda\s+", r"\bweil\b", r"\bgrund\b", r"\berwägung",
r"\bbecause\b", r"\breason\b", r"\brationale\b",
r"\bkönnen\s+.*\s+verursachen\b", r"\bführt\s+zu\b",
]
RATIONALE_RE = re.compile("|".join(_RATIONALE_SIGNALS), re.IGNORECASE)
# Evidence-related keywords (for fact detection)
_EVIDENCE_KEYWORDS = [
r"\bnachweis\b", r"\bzertifikat\b", r"\baudit.report\b",
r"\bprotokoll\b", r"\bdokumentation\b", r"\bbericht\b",
r"\bcertificate\b", r"\bevidence\b", r"\bproof\b",
]
EVIDENCE_RE = re.compile("|".join(_EVIDENCE_KEYWORDS), re.IGNORECASE)
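# The compiled pattern groups above lend themselves to a small classifier that
# labels a sentence by its strongest normative signal (Pflicht before
# Empfehlung before Kann). A sketch with a deliberately reduced signal set —
# the full module defines many more patterns:

```python
import re

# Reduced signal lists for illustration only.
PFLICHT_RE = re.compile(r"\bmuss\b|\bmüssen\b|\bshall\b|\bmust\b", re.IGNORECASE)
EMPFEHLUNG_RE = re.compile(r"\bsoll\b|\bsollte\b|\bshould\b", re.IGNORECASE)
KANN_RE = re.compile(r"\bkann\b|\bdarf\b|\bmay\b", re.IGNORECASE)


def classify(sentence: str) -> str:
    # Strongest signal wins: obligation > recommendation > permission.
    if PFLICHT_RE.search(sentence):
        return "pflicht"
    if EMPFEHLUNG_RE.search(sentence):
        return "empfehlung"
    if KANN_RE.search(sentence):
        return "kann"
    return "none"
```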


@@ -0,0 +1,563 @@
"""Obligation Extractor — 3-Tier Chunk-to-Obligation Linking.
Maps RAG chunks to obligations from the v2 obligation framework using
three tiers (fastest first):
Tier 1: EXACT MATCH — regulation_code + article → obligation_id (~40%)
Tier 2: EMBEDDING — chunk text vs. obligation descriptions (~30%)
Tier 3: LLM EXTRACT — local Ollama extracts obligation text (~25%)
Part of the Multi-Layer Control Architecture (Phase 4 of 8).
"""
import json
import logging
import os
import re
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
EMBEDDING_URL = os.getenv("EMBEDDING_URL", "http://embedding-service:8087")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://host.docker.internal:11434")
OLLAMA_MODEL = os.getenv("CONTROL_GEN_OLLAMA_MODEL", "qwen3.5:35b-a3b")
LLM_TIMEOUT = float(os.getenv("CONTROL_GEN_LLM_TIMEOUT", "180"))
# Embedding similarity thresholds for Tier 2
EMBEDDING_MATCH_THRESHOLD = 0.80
EMBEDDING_CANDIDATE_THRESHOLD = 0.60
# ---------------------------------------------------------------------------
# Regulation code mapping: RAG chunk codes → obligation file regulation IDs
# ---------------------------------------------------------------------------
_REGULATION_CODE_TO_ID = {
# DSGVO
"eu_2016_679": "dsgvo",
"dsgvo": "dsgvo",
"gdpr": "dsgvo",
# AI Act
"eu_2024_1689": "ai_act",
"ai_act": "ai_act",
"aiact": "ai_act",
# NIS2
"eu_2022_2555": "nis2",
"nis2": "nis2",
"bsig": "nis2",
# BDSG
"bdsg": "bdsg",
# TTDSG
"ttdsg": "ttdsg",
# DSA
"eu_2022_2065": "dsa",
"dsa": "dsa",
# Data Act
"eu_2023_2854": "data_act",
"data_act": "data_act",
# EU Machinery
"eu_2023_1230": "eu_machinery",
"eu_machinery": "eu_machinery",
# DORA
"eu_2022_2554": "dora",
"dora": "dora",
}
@dataclass
class ObligationMatch:
"""Result of obligation extraction."""
obligation_id: Optional[str] = None
obligation_title: Optional[str] = None
obligation_text: Optional[str] = None
method: str = "none" # exact_match | embedding_match | llm_extracted | inferred
confidence: float = 0.0
regulation_id: Optional[str] = None # e.g. "dsgvo"
def to_dict(self) -> dict:
return {
"obligation_id": self.obligation_id,
"obligation_title": self.obligation_title,
"obligation_text": self.obligation_text,
"method": self.method,
"confidence": self.confidence,
"regulation_id": self.regulation_id,
}
@dataclass
class _ObligationEntry:
"""Internal representation of a loaded obligation."""
id: str
title: str
description: str
regulation_id: str
articles: list[str] = field(default_factory=list) # normalized: ["art. 30", "§ 38"]
embedding: list[float] = field(default_factory=list)
class ObligationExtractor:
"""3-Tier obligation extraction from RAG chunks.
Usage::
extractor = ObligationExtractor()
await extractor.initialize() # loads obligations + embeddings
match = await extractor.extract(
chunk_text="...",
regulation_code="eu_2016_679",
article="Art. 30",
paragraph="Abs. 1",
)
"""
def __init__(self):
self._article_lookup: dict[str, list[str]] = {} # "dsgvo/art. 30" → ["DSGVO-OBL-001"]
self._obligations: dict[str, _ObligationEntry] = {} # id → entry
self._obligation_embeddings: list[list[float]] = []
self._obligation_ids: list[str] = []
self._initialized = False
async def initialize(self) -> None:
"""Load all obligations from v2 JSON files and compute embeddings."""
if self._initialized:
return
self._load_obligations()
await self._compute_embeddings()
self._initialized = True
logger.info(
"ObligationExtractor initialized: %d obligations, %d article lookups, %d embeddings",
len(self._obligations),
len(self._article_lookup),
sum(1 for e in self._obligation_embeddings if e),
)
async def extract(
self,
chunk_text: str,
regulation_code: str,
article: Optional[str] = None,
paragraph: Optional[str] = None,
) -> ObligationMatch:
"""Extract obligation from a chunk using 3-tier strategy."""
if not self._initialized:
await self.initialize()
reg_id = _normalize_regulation(regulation_code)
# Tier 1: Exact match via article lookup
if article:
match = self._tier1_exact(reg_id, article)
if match:
return match
# Tier 2: Embedding similarity
match = await self._tier2_embedding(chunk_text, reg_id)
if match:
return match
# Tier 3: LLM extraction
match = await self._tier3_llm(chunk_text, regulation_code, article)
return match
# -----------------------------------------------------------------------
# Tier 1: Exact Match
# -----------------------------------------------------------------------
def _tier1_exact(self, reg_id: Optional[str], article: str) -> Optional[ObligationMatch]:
"""Look up obligation by regulation + article."""
if not reg_id:
return None
norm_article = _normalize_article(article)
key = f"{reg_id}/{norm_article}"
obl_ids = self._article_lookup.get(key)
if not obl_ids:
return None
# Take the first match (highest priority)
obl_id = obl_ids[0]
entry = self._obligations.get(obl_id)
if not entry:
return None
return ObligationMatch(
obligation_id=entry.id,
obligation_title=entry.title,
obligation_text=entry.description,
method="exact_match",
confidence=1.0,
regulation_id=reg_id,
)
# -----------------------------------------------------------------------
# Tier 2: Embedding Match
# -----------------------------------------------------------------------
async def _tier2_embedding(
self, chunk_text: str, reg_id: Optional[str]
) -> Optional[ObligationMatch]:
"""Find nearest obligation by embedding similarity."""
if not self._obligation_embeddings:
return None
chunk_embedding = await _get_embedding(chunk_text[:2000])
if not chunk_embedding:
return None
best_idx = -1
best_score = 0.0
for i, obl_emb in enumerate(self._obligation_embeddings):
if not obl_emb:
continue
# Prefer same-regulation matches
obl_id = self._obligation_ids[i]
entry = self._obligations.get(obl_id)
score = _cosine_sim(chunk_embedding, obl_emb)
# Domain bonus: +0.05 if same regulation
if entry and reg_id and entry.regulation_id == reg_id:
score += 0.05
if score > best_score:
best_score = score
best_idx = i
if best_idx < 0:
return None
# Remove domain bonus for threshold comparison
raw_score = best_score
obl_id = self._obligation_ids[best_idx]
entry = self._obligations.get(obl_id)
if entry and reg_id and entry.regulation_id == reg_id:
raw_score -= 0.05
if raw_score >= EMBEDDING_MATCH_THRESHOLD:
return ObligationMatch(
obligation_id=entry.id if entry else obl_id,
obligation_title=entry.title if entry else None,
obligation_text=entry.description if entry else None,
method="embedding_match",
confidence=round(min(raw_score, 1.0), 3),
regulation_id=entry.regulation_id if entry else reg_id,
)
return None
# -----------------------------------------------------------------------
# Tier 3: LLM Extraction
# -----------------------------------------------------------------------
async def _tier3_llm(
self, chunk_text: str, regulation_code: str, article: Optional[str]
) -> ObligationMatch:
"""Use local LLM to extract the obligation from the chunk."""
prompt = f"""Analysiere den folgenden Gesetzestext und extrahiere die zentrale rechtliche Pflicht.
Text:
{chunk_text[:3000]}
Quelle: {regulation_code} {article or ''}
Antworte NUR als JSON:
{{
"obligation_text": "Die zentrale Pflicht in einem Satz",
"actor": "Wer muss handeln (z.B. Verantwortlicher, Auftragsverarbeiter)",
"action": "Was muss getan werden",
"normative_strength": "muss|soll|kann"
}}"""
system_prompt = (
"Du bist ein Rechtsexperte fuer EU-Datenschutz- und Digitalrecht. "
"Extrahiere die zentrale rechtliche Pflicht aus Gesetzestexten. "
"Antworte ausschliesslich als JSON."
)
result_text = await _llm_ollama(prompt, system_prompt)
if not result_text:
return ObligationMatch(
method="llm_extracted",
confidence=0.0,
regulation_id=_normalize_regulation(regulation_code),
)
parsed = _parse_json(result_text)
obligation_text = parsed.get("obligation_text", result_text[:500])
return ObligationMatch(
obligation_id=None,
obligation_title=None,
obligation_text=obligation_text,
method="llm_extracted",
confidence=0.60,
regulation_id=_normalize_regulation(regulation_code),
)
# -----------------------------------------------------------------------
# Initialization helpers
# -----------------------------------------------------------------------
def _load_obligations(self) -> None:
"""Load all obligation files from v2 framework."""
v2_dir = _find_obligations_dir()
if not v2_dir:
logger.warning("Obligations v2 directory not found — Tier 1 disabled")
return
manifest_path = v2_dir / "_manifest.json"
if not manifest_path.exists():
logger.warning("Manifest not found at %s", manifest_path)
return
with open(manifest_path) as f:
manifest = json.load(f)
for reg_info in manifest.get("regulations", []):
reg_id = reg_info["id"]
reg_file = v2_dir / reg_info["file"]
if not reg_file.exists():
logger.warning("Regulation file not found: %s", reg_file)
continue
with open(reg_file) as f:
data = json.load(f)
for obl in data.get("obligations", []):
obl_id = obl["id"]
entry = _ObligationEntry(
id=obl_id,
title=obl.get("title", ""),
description=obl.get("description", ""),
regulation_id=reg_id,
)
# Build article lookup from legal_basis
for basis in obl.get("legal_basis", []):
article_raw = basis.get("article", "")
if article_raw:
norm_art = _normalize_article(article_raw)
key = f"{reg_id}/{norm_art}"
if key not in self._article_lookup:
self._article_lookup[key] = []
self._article_lookup[key].append(obl_id)
entry.articles.append(norm_art)
self._obligations[obl_id] = entry
logger.info(
"Loaded %d obligations from %d regulations",
len(self._obligations),
len(manifest.get("regulations", [])),
)
async def _compute_embeddings(self) -> None:
"""Compute embeddings for all obligation descriptions."""
if not self._obligations:
return
self._obligation_ids = list(self._obligations.keys())
texts = [
f"{self._obligations[oid].title}: {self._obligations[oid].description}"
for oid in self._obligation_ids
]
logger.info("Computing embeddings for %d obligations...", len(texts))
self._obligation_embeddings = await _get_embeddings_batch(texts)
valid = sum(1 for e in self._obligation_embeddings if e)
logger.info("Got %d/%d valid embeddings", valid, len(texts))
# -----------------------------------------------------------------------
# Stats
# -----------------------------------------------------------------------
def stats(self) -> dict:
"""Return initialization statistics."""
return {
"total_obligations": len(self._obligations),
"article_lookups": len(self._article_lookup),
"embeddings_valid": sum(1 for e in self._obligation_embeddings if e),
"regulations": list(
{e.regulation_id for e in self._obligations.values()}
),
"initialized": self._initialized,
}
# ---------------------------------------------------------------------------
# Module-level helpers (reusable by other modules)
# ---------------------------------------------------------------------------
def _normalize_regulation(regulation_code: str) -> Optional[str]:
"""Map a RAG regulation_code to obligation framework regulation ID."""
if not regulation_code:
return None
code = regulation_code.lower().strip()
# Direct lookup
if code in _REGULATION_CODE_TO_ID:
return _REGULATION_CODE_TO_ID[code]
# Prefix matching for families
for prefix, reg_id in [
("eu_2016_679", "dsgvo"),
("eu_2024_1689", "ai_act"),
("eu_2022_2555", "nis2"),
("eu_2022_2065", "dsa"),
("eu_2023_2854", "data_act"),
("eu_2023_1230", "eu_machinery"),
("eu_2022_2554", "dora"),
]:
if code.startswith(prefix):
return reg_id
return None
def _normalize_article(article: str) -> str:
"""Normalize article references for consistent lookup.
Examples:
"Art. 30" → "art. 30"
"§ 38 BDSG" → "§ 38"
"Article 10" → "art. 10"
"Art. 30 Abs. 1" → "art. 30"
"Artikel 35" → "art. 35"
"""
if not article:
return ""
s = article.strip()
# Remove trailing law name: "§ 38 BDSG" → "§ 38"
s = re.sub(r"\s+(DSGVO|BDSG|TTDSG|DSA|NIS2|DORA|AI.?Act)\s*$", "", s, flags=re.IGNORECASE)
# Remove paragraph references: "Art. 30 Abs. 1" → "Art. 30"
s = re.sub(r"\s+(Abs|Absatz|para|paragraph|lit|Satz)\.?\s+.*$", "", s, flags=re.IGNORECASE)
# Normalize "Article" / "Artikel" → "Art."
s = re.sub(r"^(Article|Artikel)\s+", "Art. ", s, flags=re.IGNORECASE)
return s.lower().strip()
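# The three normalization passes above can be exercised standalone; a sketch
# replicating them outside the class context, so the docstring examples can be
# checked directly:

```python
import re


def normalize_article(article: str) -> str:
    # Pass 1: strip a trailing law name ("§ 38 BDSG" → "§ 38").
    # Pass 2: strip paragraph-level suffixes ("Art. 30 Abs. 1" → "Art. 30").
    # Pass 3: unify "Article"/"Artikel" to "Art.", then lowercase.
    s = article.strip()
    s = re.sub(r"\s+(DSGVO|BDSG|TTDSG|DSA|NIS2|DORA|AI.?Act)\s*$", "", s, flags=re.IGNORECASE)
    s = re.sub(r"\s+(Abs|Absatz|para|paragraph|lit|Satz)\.?\s+.*$", "", s, flags=re.IGNORECASE)
    s = re.sub(r"^(Article|Artikel)\s+", "Art. ", s, flags=re.IGNORECASE)
    return s.lower().strip()
```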
def _cosine_sim(a: list[float], b: list[float]) -> float:
"""Compute cosine similarity between two vectors."""
if not a or not b or len(a) != len(b):
return 0.0
dot = sum(x * y for x, y in zip(a, b))
norm_a = sum(x * x for x in a) ** 0.5
norm_b = sum(x * x for x in b) ** 0.5
if norm_a == 0 or norm_b == 0:
return 0.0
return dot / (norm_a * norm_b)
def _find_obligations_dir() -> Optional[Path]:
"""Locate the obligations v2 directory."""
candidates = [
Path(__file__).resolve().parent.parent.parent.parent
/ "ai-compliance-sdk" / "policies" / "obligations" / "v2",
Path("/app/ai-compliance-sdk/policies/obligations/v2"),
Path("ai-compliance-sdk/policies/obligations/v2"),
]
for p in candidates:
if p.is_dir() and (p / "_manifest.json").exists():
return p
return None
async def _get_embedding(text: str) -> list[float]:
"""Get embedding vector for a single text."""
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{EMBEDDING_URL}/embed",
json={"texts": [text]},
)
resp.raise_for_status()
embeddings = resp.json().get("embeddings", [])
return embeddings[0] if embeddings else []
except Exception:
return []
async def _get_embeddings_batch(
texts: list[str], batch_size: int = 32
) -> list[list[float]]:
"""Get embeddings for multiple texts in batches."""
all_embeddings: list[list[float]] = []
for i in range(0, len(texts), batch_size):
batch = texts[i : i + batch_size]
try:
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.post(
f"{EMBEDDING_URL}/embed",
json={"texts": batch},
)
resp.raise_for_status()
embeddings = resp.json().get("embeddings", [])
all_embeddings.extend(embeddings)
except Exception as e:
logger.warning("Batch embedding failed for %d texts: %s", len(batch), e)
all_embeddings.extend([[] for _ in batch])
return all_embeddings
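# The batch helper above degrades gracefully: a failed batch yields empty
# vectors rather than aborting the run, so output length always matches input
# length. The slicing-and-fallback shape, sketched without the HTTP call
# (`fake_embed` is a hypothetical stand-in for the embedding service):

```python
def fake_embed(batch):
    # Stand-in for the embedding-service call; fails on empty strings.
    if any(not t for t in batch):
        raise ValueError("empty text")
    return [[float(len(t))] for t in batch]


def batched_process(texts, batch_size=32):
    # Process texts in fixed-size slices; on failure, pad with empty
    # results so len(output) == len(input) is preserved.
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        try:
            results.extend(fake_embed(batch))
        except Exception:
            results.extend([[] for _ in batch])
    return results
```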
async def _llm_ollama(prompt: str, system_prompt: Optional[str] = None) -> str:
"""Call local Ollama for LLM extraction."""
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
payload = {
"model": OLLAMA_MODEL,
"messages": messages,
"stream": False,
"format": "json",
"options": {"num_predict": 512},
"think": False,
}
try:
async with httpx.AsyncClient(timeout=LLM_TIMEOUT) as client:
resp = await client.post(f"{OLLAMA_URL}/api/chat", json=payload)
if resp.status_code != 200:
logger.error(
"Ollama chat failed %d: %s", resp.status_code, resp.text[:300]
)
return ""
data = resp.json()
return data.get("message", {}).get("content", "")
except Exception as e:
logger.warning("Ollama call failed: %s", e)
return ""
def _parse_json(text: str) -> dict:
"""Extract JSON from LLM response text."""
# Try direct parse
try:
return json.loads(text)
except json.JSONDecodeError:
pass
# Try extracting JSON block
match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
if match:
try:
return json.loads(match.group())
except json.JSONDecodeError:
pass
return {}
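# The two-stage parse above (direct load, then first brace-delimited block) can
# be exercised directly; a sketch with the same fallback logic:

```python
import json
import re


def parse_json(text: str) -> dict:
    # Stage 1: the whole response is valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Stage 2: pull the first flat {...} block out of surrounding prose.
    match = re.search(r"\{[^{}]*\}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            pass
    return {}
```

# Note the brace regex only recovers flat objects; a nested JSON object
# embedded in prose falls through to the empty-dict fallback.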


@@ -0,0 +1,532 @@
"""Pattern Matcher — Obligation-to-Control-Pattern Linking.
Maps obligations (from the ObligationExtractor) to control patterns
using two tiers:
Tier 1: KEYWORD MATCH — obligation_match_keywords from patterns (~70%)
Tier 2: EMBEDDING — cosine similarity with domain bonus (~25%)
Part of the Multi-Layer Control Architecture (Phase 5 of 8).
"""
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
import yaml
from services.obligation_extractor import (
_cosine_sim,
_get_embedding,
_get_embeddings_batch,
)
logger = logging.getLogger(__name__)
# Minimum keyword score to accept a match (at least 2 keyword hits)
KEYWORD_MATCH_MIN_HITS = 2
# Embedding threshold for Tier 2
EMBEDDING_PATTERN_THRESHOLD = 0.75
# Domain bonus when regulation maps to the pattern's domain
DOMAIN_BONUS = 0.10
# Map regulation IDs to pattern domains that are likely relevant
_REGULATION_DOMAIN_AFFINITY = {
"dsgvo": ["DATA", "COMP", "GOV"],
"bdsg": ["DATA", "COMP"],
"ttdsg": ["DATA"],
"ai_act": ["AI", "COMP", "DATA"],
"nis2": ["SEC", "INC", "NET", "LOG", "CRYP"],
"dsa": ["DATA", "COMP"],
"data_act": ["DATA", "COMP"],
"eu_machinery": ["SEC", "COMP"],
"dora": ["SEC", "INC", "FIN", "COMP"],
}
@dataclass
class ControlPattern:
"""Python representation of a control pattern from YAML."""
id: str
name: str
name_de: str
domain: str
category: str
description: str
objective_template: str
rationale_template: str
requirements_template: list[str] = field(default_factory=list)
test_procedure_template: list[str] = field(default_factory=list)
evidence_template: list[str] = field(default_factory=list)
severity_default: str = "medium"
implementation_effort_default: str = "m"
obligation_match_keywords: list[str] = field(default_factory=list)
tags: list[str] = field(default_factory=list)
composable_with: list[str] = field(default_factory=list)
open_anchor_refs: list[dict] = field(default_factory=list)
@dataclass
class PatternMatchResult:
"""Result of pattern matching."""
pattern: Optional[ControlPattern] = None
pattern_id: Optional[str] = None
method: str = "none" # keyword | embedding | combined | none
confidence: float = 0.0
keyword_hits: int = 0
total_keywords: int = 0
embedding_score: float = 0.0
domain_bonus_applied: bool = False
composable_patterns: list[str] = field(default_factory=list)
def to_dict(self) -> dict:
return {
"pattern_id": self.pattern_id,
"method": self.method,
"confidence": round(self.confidence, 3),
"keyword_hits": self.keyword_hits,
"total_keywords": self.total_keywords,
"embedding_score": round(self.embedding_score, 3),
"domain_bonus_applied": self.domain_bonus_applied,
"composable_patterns": self.composable_patterns,
}
class PatternMatcher:
"""Links obligations to control patterns using keyword + embedding matching.
Usage::
matcher = PatternMatcher()
await matcher.initialize()
result = await matcher.match(
obligation_text="Fuehrung eines Verarbeitungsverzeichnisses...",
regulation_id="dsgvo",
)
print(result.pattern_id) # e.g. "CP-COMP-001"
print(result.confidence) # e.g. 0.85
"""
def __init__(self):
self._patterns: list[ControlPattern] = []
self._by_id: dict[str, ControlPattern] = {}
self._by_domain: dict[str, list[ControlPattern]] = {}
self._keyword_index: dict[str, list[str]] = {} # keyword → [pattern_ids]
self._pattern_embeddings: list[list[float]] = []
self._pattern_ids: list[str] = []
self._initialized = False
async def initialize(self) -> None:
"""Load patterns from YAML and compute embeddings."""
if self._initialized:
return
self._load_patterns()
self._build_keyword_index()
await self._compute_embeddings()
self._initialized = True
logger.info(
"PatternMatcher initialized: %d patterns, %d keywords, %d embeddings",
len(self._patterns),
len(self._keyword_index),
sum(1 for e in self._pattern_embeddings if e),
)
async def match(
self,
obligation_text: str,
regulation_id: Optional[str] = None,
top_n: int = 1,
) -> PatternMatchResult:
"""Match obligation text to the best control pattern.
Args:
obligation_text: The obligation description to match against.
regulation_id: Source regulation (for domain bonus).
top_n: Number of top results to consider for composability.
Returns:
PatternMatchResult with the best match.
"""
if not self._initialized:
await self.initialize()
if not obligation_text or not self._patterns:
return PatternMatchResult()
# Tier 1: Keyword matching
keyword_result = self._tier1_keyword(obligation_text, regulation_id)
# Tier 2: Embedding matching
embedding_result = await self._tier2_embedding(obligation_text, regulation_id)
# Combine scores: prefer keyword match, boost with embedding if available
best = self._combine_results(keyword_result, embedding_result)
# Attach composable patterns
if best.pattern:
best.composable_patterns = [
pid for pid in best.pattern.composable_with
if pid in self._by_id
]
return best
async def match_top_n(
self,
obligation_text: str,
regulation_id: Optional[str] = None,
n: int = 3,
) -> list[PatternMatchResult]:
"""Return top-N pattern matches sorted by confidence descending."""
if not self._initialized:
await self.initialize()
if not obligation_text or not self._patterns:
return []
keyword_scores = self._keyword_scores(obligation_text, regulation_id)
embedding_scores = await self._embedding_scores(obligation_text, regulation_id)
# Merge scores
all_pattern_ids = set(keyword_scores.keys()) | set(embedding_scores.keys())
results: list[PatternMatchResult] = []
for pid in all_pattern_ids:
pattern = self._by_id.get(pid)
if not pattern:
continue
kw_score = keyword_scores.get(pid, (0, 0, 0.0)) # (hits, total, score)
emb_score = embedding_scores.get(pid, (0.0, False)) # (score, bonus_applied)
kw_hits, kw_total, kw_confidence = kw_score
emb_confidence, bonus_applied = emb_score
# Combined confidence: max of keyword and embedding, with boost if both
if kw_confidence > 0 and emb_confidence > 0:
combined = max(kw_confidence, emb_confidence) + 0.05
method = "combined"
elif kw_confidence > 0:
combined = kw_confidence
method = "keyword"
else:
combined = emb_confidence
method = "embedding"
results.append(PatternMatchResult(
pattern=pattern,
pattern_id=pid,
method=method,
confidence=min(combined, 1.0),
keyword_hits=kw_hits,
total_keywords=kw_total,
embedding_score=emb_confidence,
domain_bonus_applied=bonus_applied,
composable_patterns=[
p for p in pattern.composable_with if p in self._by_id
],
))
# Sort by confidence descending
results.sort(key=lambda r: r.confidence, reverse=True)
return results[:n]
# -----------------------------------------------------------------------
# Tier 1: Keyword Match
# -----------------------------------------------------------------------
def _tier1_keyword(
self, obligation_text: str, regulation_id: Optional[str]
) -> Optional[PatternMatchResult]:
"""Match by counting keyword hits in the obligation text."""
scores = self._keyword_scores(obligation_text, regulation_id)
if not scores:
return None
# Find best match
best_pid = max(scores, key=lambda pid: scores[pid][2])
hits, total, confidence = scores[best_pid]
if hits < KEYWORD_MATCH_MIN_HITS:
return None
pattern = self._by_id.get(best_pid)
if not pattern:
return None
# Check domain bonus
bonus_applied = False
if regulation_id and self._domain_matches(pattern.domain, regulation_id):
confidence = min(confidence + DOMAIN_BONUS, 1.0)
bonus_applied = True
return PatternMatchResult(
pattern=pattern,
pattern_id=best_pid,
method="keyword",
confidence=confidence,
keyword_hits=hits,
total_keywords=total,
domain_bonus_applied=bonus_applied,
)
def _keyword_scores(
self, text: str, regulation_id: Optional[str]
) -> dict[str, tuple[int, int, float]]:
"""Compute keyword match scores for all patterns.
Returns dict: pattern_id → (hits, total_keywords, confidence).
"""
text_lower = text.lower()
hits_by_pattern: dict[str, int] = {}
for keyword, pattern_ids in self._keyword_index.items():
if keyword in text_lower:
for pid in pattern_ids:
hits_by_pattern[pid] = hits_by_pattern.get(pid, 0) + 1
result: dict[str, tuple[int, int, float]] = {}
for pid, hits in hits_by_pattern.items():
pattern = self._by_id.get(pid)
if not pattern:
continue
total = len(pattern.obligation_match_keywords)
confidence = hits / total if total > 0 else 0.0
result[pid] = (hits, total, confidence)
return result
# -----------------------------------------------------------------------
# Tier 2: Embedding Match
# -----------------------------------------------------------------------
async def _tier2_embedding(
self, obligation_text: str, regulation_id: Optional[str]
) -> Optional[PatternMatchResult]:
"""Match by embedding similarity against pattern objective_templates."""
scores = await self._embedding_scores(obligation_text, regulation_id)
if not scores:
return None
best_pid = max(scores, key=lambda pid: scores[pid][0])
emb_score, bonus_applied = scores[best_pid]
if emb_score < EMBEDDING_PATTERN_THRESHOLD:
return None
pattern = self._by_id.get(best_pid)
if not pattern:
return None
return PatternMatchResult(
pattern=pattern,
pattern_id=best_pid,
method="embedding",
confidence=min(emb_score, 1.0),
embedding_score=emb_score,
domain_bonus_applied=bonus_applied,
)
async def _embedding_scores(
self, obligation_text: str, regulation_id: Optional[str]
) -> dict[str, tuple[float, bool]]:
"""Compute embedding similarity scores for all patterns.
Returns dict: pattern_id → (score, domain_bonus_applied).
"""
if not self._pattern_embeddings:
return {}
chunk_embedding = await _get_embedding(obligation_text[:2000])
if not chunk_embedding:
return {}
result: dict[str, tuple[float, bool]] = {}
for i, pat_emb in enumerate(self._pattern_embeddings):
if not pat_emb:
continue
pid = self._pattern_ids[i]
pattern = self._by_id.get(pid)
if not pattern:
continue
score = _cosine_sim(chunk_embedding, pat_emb)
# Domain bonus
bonus_applied = False
if regulation_id and self._domain_matches(pattern.domain, regulation_id):
score += DOMAIN_BONUS
bonus_applied = True
result[pid] = (score, bonus_applied)
return result
# -----------------------------------------------------------------------
# Score combination
# -----------------------------------------------------------------------
def _combine_results(
self,
keyword_result: Optional[PatternMatchResult],
embedding_result: Optional[PatternMatchResult],
) -> PatternMatchResult:
"""Combine keyword and embedding results into the best match."""
if not keyword_result and not embedding_result:
return PatternMatchResult()
if not keyword_result:
return embedding_result
if not embedding_result:
return keyword_result
# Both matched — check if they agree
if keyword_result.pattern_id == embedding_result.pattern_id:
# Same pattern: boost confidence
combined_confidence = min(
max(keyword_result.confidence, embedding_result.confidence) + 0.05,
1.0,
)
return PatternMatchResult(
pattern=keyword_result.pattern,
pattern_id=keyword_result.pattern_id,
method="combined",
confidence=combined_confidence,
keyword_hits=keyword_result.keyword_hits,
total_keywords=keyword_result.total_keywords,
embedding_score=embedding_result.embedding_score,
domain_bonus_applied=(
keyword_result.domain_bonus_applied
or embedding_result.domain_bonus_applied
),
)
# Different patterns: pick the one with higher confidence
if keyword_result.confidence >= embedding_result.confidence:
return keyword_result
return embedding_result
# -----------------------------------------------------------------------
# Domain affinity
# -----------------------------------------------------------------------
@staticmethod
def _domain_matches(pattern_domain: str, regulation_id: str) -> bool:
"""Check if a pattern's domain has affinity with a regulation."""
affine_domains = _REGULATION_DOMAIN_AFFINITY.get(regulation_id, [])
return pattern_domain in affine_domains
# -----------------------------------------------------------------------
# Initialization helpers
# -----------------------------------------------------------------------
def _load_patterns(self) -> None:
"""Load control patterns from YAML files."""
patterns_dir = _find_patterns_dir()
if not patterns_dir:
logger.warning("Control patterns directory not found")
return
for yaml_file in sorted(patterns_dir.glob("*.yaml")):
if yaml_file.name.startswith("_"):
continue
try:
with open(yaml_file) as f:
data = yaml.safe_load(f)
if not data or "patterns" not in data:
continue
for p in data["patterns"]:
pattern = ControlPattern(
id=p["id"],
name=p["name"],
name_de=p["name_de"],
domain=p["domain"],
category=p["category"],
description=p["description"],
objective_template=p["objective_template"],
rationale_template=p["rationale_template"],
requirements_template=p.get("requirements_template", []),
test_procedure_template=p.get("test_procedure_template", []),
evidence_template=p.get("evidence_template", []),
severity_default=p.get("severity_default", "medium"),
implementation_effort_default=p.get("implementation_effort_default", "m"),
obligation_match_keywords=p.get("obligation_match_keywords", []),
tags=p.get("tags", []),
composable_with=p.get("composable_with", []),
open_anchor_refs=p.get("open_anchor_refs", []),
)
self._patterns.append(pattern)
self._by_id[pattern.id] = pattern
domain_list = self._by_domain.setdefault(pattern.domain, [])
domain_list.append(pattern)
except Exception as e:
logger.error("Failed to load %s: %s", yaml_file.name, e)
logger.info("Loaded %d patterns from %s", len(self._patterns), patterns_dir)
def _build_keyword_index(self) -> None:
"""Build reverse index: keyword → [pattern_ids]."""
for pattern in self._patterns:
for kw in pattern.obligation_match_keywords:
lower_kw = kw.lower()
if lower_kw not in self._keyword_index:
self._keyword_index[lower_kw] = []
self._keyword_index[lower_kw].append(pattern.id)
async def _compute_embeddings(self) -> None:
"""Compute embeddings for all pattern objective templates."""
if not self._patterns:
return
self._pattern_ids = [p.id for p in self._patterns]
texts = [
f"{p.name_de}: {p.objective_template}"
for p in self._patterns
]
logger.info("Computing embeddings for %d patterns...", len(texts))
self._pattern_embeddings = await _get_embeddings_batch(texts)
valid = sum(1 for e in self._pattern_embeddings if e)
logger.info("Got %d/%d valid pattern embeddings", valid, len(texts))
# -----------------------------------------------------------------------
# Public helpers
# -----------------------------------------------------------------------
def get_pattern(self, pattern_id: str) -> Optional[ControlPattern]:
"""Get a pattern by its ID."""
return self._by_id.get(pattern_id.upper())
def get_patterns_by_domain(self, domain: str) -> list[ControlPattern]:
"""Get all patterns for a domain."""
return self._by_domain.get(domain.upper(), [])
def stats(self) -> dict:
"""Return matcher statistics."""
return {
"total_patterns": len(self._patterns),
"domains": list(self._by_domain.keys()),
"keywords": len(self._keyword_index),
"embeddings_valid": sum(1 for e in self._pattern_embeddings if e),
"initialized": self._initialized,
}
def _find_patterns_dir() -> Optional[Path]:
"""Locate the control_patterns directory."""
candidates = [
Path(__file__).resolve().parent.parent.parent.parent
/ "ai-compliance-sdk" / "policies" / "control_patterns",
Path("/app/ai-compliance-sdk/policies/control_patterns"),
Path("ai-compliance-sdk/policies/control_patterns"),
]
for p in candidates:
if p.is_dir():
return p
return None
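The reverse index that `_build_keyword_index` constructs can be sketched standalone; the pattern records below are illustrative stand-ins, not real library entries:

```python
# Minimal sketch of the keyword -> [pattern_ids] reverse index built by
# _build_keyword_index; pattern data here is illustrative only.
patterns = [
    {"id": "AC-01", "obligation_match_keywords": ["Zugriffskontrolle", "Access Control"]},
    {"id": "LOG-01", "obligation_match_keywords": ["Protokollierung", "Access Control"]},
]

keyword_index: dict[str, list[str]] = {}
for pattern in patterns:
    for kw in pattern["obligation_match_keywords"]:
        # Keywords are lowercased so matching is case-insensitive.
        keyword_index.setdefault(kw.lower(), []).append(pattern["id"])

print(keyword_index["access control"])  # ['AC-01', 'LOG-01']
```

A keyword shared by several patterns maps to all of them, which is why the matcher later counts keyword hits per pattern instead of taking the first entry.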


@@ -0,0 +1,670 @@
"""Pipeline Adapter — New 10-Stage Pipeline Integration.
Bridges the existing 7-stage control_generator pipeline with the new
multi-layer components (ObligationExtractor, PatternMatcher, ControlComposer).
New pipeline flow:
chunk → license_classify
→ obligation_extract (Stage 4 — NEW)
→ pattern_match (Stage 5 — NEW)
→ control_compose (Stage 6 — replaces old Stage 3)
→ harmonize → anchor → store + crosswalk → mark processed
Can be used in two modes:
1. INLINE: Called from _process_batch() to enrich the pipeline
2. STANDALONE: Process chunks directly through new stages
Part of the Multi-Layer Control Architecture (Phase 7 of 8).
"""
import hashlib
import json
import logging
from dataclasses import dataclass, field
from typing import Optional
from sqlalchemy import text
from sqlalchemy.orm import Session
from services.control_composer import ComposedControl, ControlComposer
from services.obligation_extractor import ObligationExtractor, ObligationMatch
from services.pattern_matcher import PatternMatcher, PatternMatchResult
logger = logging.getLogger(__name__)
@dataclass
class PipelineChunk:
"""Input chunk for the new pipeline stages."""
text: str
collection: str = ""
regulation_code: str = ""
article: Optional[str] = None
paragraph: Optional[str] = None
license_rule: int = 3
license_info: dict = field(default_factory=dict)
source_citation: Optional[dict] = None
chunk_hash: str = ""
def compute_hash(self) -> str:
if not self.chunk_hash:
self.chunk_hash = hashlib.sha256(self.text.encode()).hexdigest()
return self.chunk_hash
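`compute_hash` boils down to a cached SHA-256 hex digest of the raw chunk text, which is easy to verify in isolation (the sample text is illustrative):

```python
import hashlib

# Sketch of PipelineChunk.compute_hash: the chunk hash is the SHA-256
# hex digest of the chunk text, computed once and then cached.
text = "Der Verantwortliche führt ein Verzeichnis von Verarbeitungstätigkeiten."
chunk_hash = hashlib.sha256(text.encode()).hexdigest()
print(len(chunk_hash))  # 64 hex characters
```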
@dataclass
class PipelineResult:
"""Result of processing a chunk through the new pipeline."""
chunk: PipelineChunk
obligation: ObligationMatch = field(default_factory=ObligationMatch)
pattern_result: PatternMatchResult = field(default_factory=PatternMatchResult)
control: Optional[ComposedControl] = None
crosswalk_written: bool = False
error: Optional[str] = None
def to_dict(self) -> dict:
return {
"chunk_hash": self.chunk.chunk_hash,
"obligation": self.obligation.to_dict() if self.obligation else None,
"pattern": self.pattern_result.to_dict() if self.pattern_result else None,
"control": self.control.to_dict() if self.control else None,
"crosswalk_written": self.crosswalk_written,
"error": self.error,
}
class PipelineAdapter:
"""Integrates ObligationExtractor + PatternMatcher + ControlComposer.
Usage::
adapter = PipelineAdapter(db)
await adapter.initialize()
result = await adapter.process_chunk(PipelineChunk(
text="...",
regulation_code="eu_2016_679",
article="Art. 30",
license_rule=1,
))
"""
def __init__(self, db: Optional[Session] = None):
self.db = db
self._extractor = ObligationExtractor()
self._matcher = PatternMatcher()
self._composer = ControlComposer()
self._initialized = False
async def initialize(self) -> None:
"""Initialize all sub-components."""
if self._initialized:
return
await self._extractor.initialize()
await self._matcher.initialize()
self._initialized = True
logger.info("PipelineAdapter initialized")
async def process_chunk(self, chunk: PipelineChunk) -> PipelineResult:
"""Process a single chunk through the new 3-stage pipeline.
Stage 4: Obligation Extract
Stage 5: Pattern Match
Stage 6: Control Compose
"""
if not self._initialized:
await self.initialize()
chunk.compute_hash()
result = PipelineResult(chunk=chunk)
try:
# Stage 4: Obligation Extract
result.obligation = await self._extractor.extract(
chunk_text=chunk.text,
regulation_code=chunk.regulation_code,
article=chunk.article,
paragraph=chunk.paragraph,
)
# Stage 5: Pattern Match
obligation_text = (
result.obligation.obligation_text
or result.obligation.obligation_title
or chunk.text[:500]
)
result.pattern_result = await self._matcher.match(
obligation_text=obligation_text,
regulation_id=result.obligation.regulation_id,
)
# Stage 6: Control Compose
result.control = await self._composer.compose(
obligation=result.obligation,
pattern_result=result.pattern_result,
chunk_text=chunk.text if chunk.license_rule in (1, 2) else None,
license_rule=chunk.license_rule,
source_citation=chunk.source_citation,
regulation_code=chunk.regulation_code,
)
except Exception as e:
logger.error("Pipeline processing failed: %s", e)
result.error = str(e)
return result
async def process_batch(self, chunks: list[PipelineChunk]) -> list[PipelineResult]:
"""Process multiple chunks through the pipeline."""
results = []
for chunk in chunks:
result = await self.process_chunk(chunk)
results.append(result)
return results
def write_crosswalk(self, result: PipelineResult, control_uuid: str) -> bool:
"""Write obligation_extraction + crosswalk_matrix rows for a processed chunk.
Called AFTER the control is stored in canonical_controls.
"""
if not self.db or not result.control:
return False
chunk = result.chunk
obligation = result.obligation
pattern = result.pattern_result
try:
# 1. Write obligation_extraction row
self.db.execute(
text("""
INSERT INTO obligation_extractions (
chunk_hash, collection, regulation_code,
article, paragraph, obligation_id,
obligation_text, confidence, extraction_method,
pattern_id, pattern_match_score, control_uuid
) VALUES (
:chunk_hash, :collection, :regulation_code,
:article, :paragraph, :obligation_id,
:obligation_text, :confidence, :extraction_method,
:pattern_id, :pattern_match_score,
CAST(:control_uuid AS uuid)
)
"""),
{
"chunk_hash": chunk.chunk_hash,
"collection": chunk.collection,
"regulation_code": chunk.regulation_code,
"article": chunk.article,
"paragraph": chunk.paragraph,
"obligation_id": obligation.obligation_id if obligation else None,
"obligation_text": (
obligation.obligation_text[:2000]
if obligation and obligation.obligation_text
else None
),
"confidence": obligation.confidence if obligation else 0,
"extraction_method": obligation.method if obligation else "none",
"pattern_id": pattern.pattern_id if pattern else None,
"pattern_match_score": pattern.confidence if pattern else 0,
"control_uuid": control_uuid,
},
)
# 2. Write crosswalk_matrix row
self.db.execute(
text("""
INSERT INTO crosswalk_matrix (
regulation_code, article, paragraph,
obligation_id, pattern_id,
master_control_id, master_control_uuid,
confidence, source
) VALUES (
:regulation_code, :article, :paragraph,
:obligation_id, :pattern_id,
:master_control_id,
CAST(:master_control_uuid AS uuid),
:confidence, :source
)
"""),
{
"regulation_code": chunk.regulation_code,
"article": chunk.article,
"paragraph": chunk.paragraph,
"obligation_id": obligation.obligation_id if obligation else None,
"pattern_id": pattern.pattern_id if pattern else None,
"master_control_id": result.control.control_id,
"master_control_uuid": control_uuid,
"confidence": min(
obligation.confidence if obligation else 0,
pattern.confidence if pattern else 0,
),
"source": "auto",
},
)
# 3. Update canonical_controls with pattern_id + obligation_ids
if result.control.pattern_id or result.control.obligation_ids:
self.db.execute(
text("""
UPDATE canonical_controls
SET pattern_id = COALESCE(:pattern_id, pattern_id),
obligation_ids = COALESCE(:obligation_ids, obligation_ids)
WHERE id = CAST(:control_uuid AS uuid)
"""),
{
"pattern_id": result.control.pattern_id,
"obligation_ids": json.dumps(result.control.obligation_ids),
"control_uuid": control_uuid,
},
)
self.db.commit()
result.crosswalk_written = True
return True
except Exception as e:
logger.error("Failed to write crosswalk: %s", e)
self.db.rollback()
return False
def stats(self) -> dict:
"""Return component statistics."""
return {
"extractor": self._extractor.stats(),
"matcher": self._matcher.stats(),
"initialized": self._initialized,
}
# ---------------------------------------------------------------------------
# Migration Passes — Backfill existing 4,800+ controls
# ---------------------------------------------------------------------------
class MigrationPasses:
"""Non-destructive migration passes for existing controls.
Pass 1: Obligation Linkage (deterministic, article→obligation lookup)
Pass 2: Pattern Classification (keyword-based matching)
Pass 3: Quality Triage (categorize by linkage completeness)
Pass 4: Crosswalk Backfill (write crosswalk rows for linked controls)
Pass 5: Deduplication (mark duplicate controls)
Usage::
migration = MigrationPasses(db)
await migration.initialize()
result = await migration.run_pass1_obligation_linkage(limit=100)
result = await migration.run_pass2_pattern_classification(limit=100)
result = migration.run_pass3_quality_triage()
result = migration.run_pass4_crosswalk_backfill()
result = migration.run_pass5_deduplication()
"""
def __init__(self, db: Session):
self.db = db
self._extractor = ObligationExtractor()
self._matcher = PatternMatcher()
self._initialized = False
async def initialize(self) -> None:
"""Initialize extractors (loads obligations + patterns)."""
if self._initialized:
return
self._extractor._load_obligations()
self._matcher._load_patterns()
self._matcher._build_keyword_index()
self._initialized = True
# -------------------------------------------------------------------
# Pass 1: Obligation Linkage (deterministic)
# -------------------------------------------------------------------
async def run_pass1_obligation_linkage(self, limit: int = 0) -> dict:
"""Link existing controls to obligations via source_citation article.
For each control with source_citation → extract regulation + article
→ look up in obligation framework → set obligation_ids.
"""
if not self._initialized:
await self.initialize()
query = """
SELECT id, control_id, source_citation, generation_metadata
FROM canonical_controls
WHERE release_state NOT IN ('deprecated')
AND (obligation_ids IS NULL OR obligation_ids = '[]')
"""
if limit > 0:
query += f" LIMIT {limit}"
rows = self.db.execute(text(query)).fetchall()
stats = {"total": len(rows), "linked": 0, "no_match": 0, "no_citation": 0}
for row in rows:
control_uuid = str(row[0])
control_id = row[1]
citation = row[2]
metadata = row[3]
# Extract regulation + article from citation or metadata
reg_code, article = _extract_regulation_article(citation, metadata)
if not reg_code:
stats["no_citation"] += 1
continue
# Tier 1: Exact match
match = self._extractor._tier1_exact(reg_code, article or "")
if match and match.obligation_id:
self.db.execute(
text("""
UPDATE canonical_controls
SET obligation_ids = :obl_ids
WHERE id = CAST(:uuid AS uuid)
"""),
{
"obl_ids": json.dumps([match.obligation_id]),
"uuid": control_uuid,
},
)
stats["linked"] += 1
else:
stats["no_match"] += 1
self.db.commit()
logger.info("Pass 1: %s", stats)
return stats
# -------------------------------------------------------------------
# Pass 2: Pattern Classification (keyword-based)
# -------------------------------------------------------------------
async def run_pass2_pattern_classification(self, limit: int = 0) -> dict:
"""Classify existing controls into patterns via keyword matching.
For each control without pattern_id → keyword-match title+objective
against pattern library → assign best match.
"""
if not self._initialized:
await self.initialize()
query = """
SELECT id, control_id, title, objective
FROM canonical_controls
WHERE release_state NOT IN ('deprecated')
AND (pattern_id IS NULL OR pattern_id = '')
"""
if limit > 0:
query += f" LIMIT {limit}"
rows = self.db.execute(text(query)).fetchall()
stats = {"total": len(rows), "classified": 0, "no_match": 0}
for row in rows:
control_uuid = str(row[0])
title = row[2] or ""
objective = row[3] or ""
# Keyword match
match_text = f"{title} {objective}"
result = self._matcher._tier1_keyword(match_text, None)
if result and result.pattern_id and result.keyword_hits >= 2:
self.db.execute(
text("""
UPDATE canonical_controls
SET pattern_id = :pattern_id
WHERE id = CAST(:uuid AS uuid)
"""),
{
"pattern_id": result.pattern_id,
"uuid": control_uuid,
},
)
stats["classified"] += 1
else:
stats["no_match"] += 1
self.db.commit()
logger.info("Pass 2: %s", stats)
return stats
# -------------------------------------------------------------------
# Pass 3: Quality Triage
# -------------------------------------------------------------------
def run_pass3_quality_triage(self) -> dict:
"""Categorize controls by linkage completeness.
Sets generation_metadata.triage_status:
- "review": has both obligation_id + pattern_id
- "needs_obligation": has pattern_id but no obligation_id
- "needs_pattern": has obligation_id but no pattern_id
- "legacy_unlinked": has neither
"""
categories = {
"review": """
UPDATE canonical_controls
SET generation_metadata = jsonb_set(
COALESCE(generation_metadata::jsonb, '{}'::jsonb),
'{triage_status}', '"review"'
)
WHERE release_state NOT IN ('deprecated')
AND obligation_ids IS NOT NULL AND obligation_ids != '[]'
AND pattern_id IS NOT NULL AND pattern_id != ''
""",
"needs_obligation": """
UPDATE canonical_controls
SET generation_metadata = jsonb_set(
COALESCE(generation_metadata::jsonb, '{}'::jsonb),
'{triage_status}', '"needs_obligation"'
)
WHERE release_state NOT IN ('deprecated')
AND (obligation_ids IS NULL OR obligation_ids = '[]')
AND pattern_id IS NOT NULL AND pattern_id != ''
""",
"needs_pattern": """
UPDATE canonical_controls
SET generation_metadata = jsonb_set(
COALESCE(generation_metadata::jsonb, '{}'::jsonb),
'{triage_status}', '"needs_pattern"'
)
WHERE release_state NOT IN ('deprecated')
AND obligation_ids IS NOT NULL AND obligation_ids != '[]'
AND (pattern_id IS NULL OR pattern_id = '')
""",
"legacy_unlinked": """
UPDATE canonical_controls
SET generation_metadata = jsonb_set(
COALESCE(generation_metadata::jsonb, '{}'::jsonb),
'{triage_status}', '"legacy_unlinked"'
)
WHERE release_state NOT IN ('deprecated')
AND (obligation_ids IS NULL OR obligation_ids = '[]')
AND (pattern_id IS NULL OR pattern_id = '')
""",
}
stats = {}
for category, sql in categories.items():
result = self.db.execute(text(sql))
stats[category] = result.rowcount
self.db.commit()
logger.info("Pass 3: %s", stats)
return stats
# -------------------------------------------------------------------
# Pass 4: Crosswalk Backfill
# -------------------------------------------------------------------
def run_pass4_crosswalk_backfill(self) -> dict:
"""Create crosswalk_matrix rows for controls with obligation + pattern.
Only creates rows that don't already exist.
"""
result = self.db.execute(text("""
INSERT INTO crosswalk_matrix (
regulation_code, obligation_id, pattern_id,
master_control_id, master_control_uuid,
confidence, source
)
SELECT
COALESCE(
(generation_metadata::jsonb->>'source_regulation'),
''
) AS regulation_code,
obl.value::text AS obligation_id,
cc.pattern_id,
cc.control_id,
cc.id,
0.80,
'migrated'
FROM canonical_controls cc,
jsonb_array_elements_text(
COALESCE(cc.obligation_ids::jsonb, '[]'::jsonb)
) AS obl(value)
WHERE cc.release_state NOT IN ('deprecated')
AND cc.pattern_id IS NOT NULL AND cc.pattern_id != ''
AND cc.obligation_ids IS NOT NULL AND cc.obligation_ids != '[]'
AND NOT EXISTS (
SELECT 1 FROM crosswalk_matrix cw
WHERE cw.master_control_uuid = cc.id
AND cw.obligation_id = obl.value::text
)
"""))
rows_inserted = result.rowcount
self.db.commit()
logger.info("Pass 4: %d crosswalk rows inserted", rows_inserted)
return {"rows_inserted": rows_inserted}
# -------------------------------------------------------------------
# Pass 5: Deduplication
# -------------------------------------------------------------------
def run_pass5_deduplication(self) -> dict:
"""Mark duplicate controls (same obligation + same pattern).
Groups controls by (obligation_id, pattern_id), keeps the one with
highest evidence_confidence (or newest), marks rest as deprecated.
"""
# Find groups with duplicates
groups = self.db.execute(text("""
SELECT cc.pattern_id,
obl.value::text AS obligation_id,
array_agg(cc.id ORDER BY cc.evidence_confidence DESC NULLS LAST, cc.created_at DESC) AS ids,
count(*) AS cnt
FROM canonical_controls cc,
jsonb_array_elements_text(
COALESCE(cc.obligation_ids::jsonb, '[]'::jsonb)
) AS obl(value)
WHERE cc.release_state NOT IN ('deprecated')
AND cc.pattern_id IS NOT NULL AND cc.pattern_id != ''
GROUP BY cc.pattern_id, obl.value::text
HAVING count(*) > 1
""")).fetchall()
stats = {"groups_found": len(groups), "controls_deprecated": 0}
for group in groups:
ids = group[2] # Array of UUIDs, first is the keeper
if len(ids) <= 1:
continue
# Keep first (highest confidence), deprecate rest
deprecate_ids = ids[1:]
for dep_id in deprecate_ids:
self.db.execute(
text("""
UPDATE canonical_controls
SET release_state = 'deprecated',
generation_metadata = jsonb_set(
COALESCE(generation_metadata::jsonb, '{}'::jsonb),
'{deprecated_reason}', '"duplicate_same_obligation_pattern"'
)
WHERE id = CAST(:uuid AS uuid)
AND release_state != 'deprecated'
"""),
{"uuid": str(dep_id)},
)
stats["controls_deprecated"] += 1
self.db.commit()
logger.info("Pass 5: %s", stats)
return stats
def migration_status(self) -> dict:
"""Return overall migration progress."""
row = self.db.execute(text("""
SELECT
count(*) AS total,
count(*) FILTER (WHERE obligation_ids IS NOT NULL AND obligation_ids != '[]') AS has_obligation,
count(*) FILTER (WHERE pattern_id IS NOT NULL AND pattern_id != '') AS has_pattern,
count(*) FILTER (
WHERE obligation_ids IS NOT NULL AND obligation_ids != '[]'
AND pattern_id IS NOT NULL AND pattern_id != ''
) AS fully_linked,
count(*) FILTER (WHERE release_state = 'deprecated') AS deprecated
FROM canonical_controls
""")).fetchone()
return {
"total_controls": row[0],
"has_obligation": row[1],
"has_pattern": row[2],
"fully_linked": row[3],
"deprecated": row[4],
"coverage_obligation_pct": round(row[1] / max(row[0], 1) * 100, 1),
"coverage_pattern_pct": round(row[2] / max(row[0], 1) * 100, 1),
"coverage_full_pct": round(row[3] / max(row[0], 1) * 100, 1),
}
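The coverage figures in `migration_status` come down to one guarded ratio; the `max(total, 1)` guard keeps the percentage defined for an empty table:

```python
# Coverage percentage as computed in migration_status; numbers below
# are illustrative, not real table counts.
def coverage_pct(part: int, total: int) -> float:
    return round(part / max(total, 1) * 100, 1)

print(coverage_pct(3120, 4800))  # 65.0
print(coverage_pct(0, 0))        # 0.0 (no division by zero)
```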
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _extract_regulation_article(
citation: Optional[str], metadata: Optional[str]
) -> tuple[Optional[str], Optional[str]]:
"""Extract regulation_code and article from control's citation/metadata."""
from services.obligation_extractor import _normalize_regulation
reg_code = None
article = None
# Try citation first (JSON string or dict)
if citation:
try:
c = json.loads(citation) if isinstance(citation, str) else citation
if isinstance(c, dict):
article = c.get("article") or c.get("source_article")
# Try to get regulation from source field
source = c.get("source", "")
if source:
reg_code = _normalize_regulation(source)
except (json.JSONDecodeError, TypeError):
pass
# Try metadata
if metadata and not reg_code:
try:
m = json.loads(metadata) if isinstance(metadata, str) else metadata
if isinstance(m, dict):
src_reg = m.get("source_regulation", "")
if src_reg:
reg_code = _normalize_regulation(src_reg)
if not article:
article = m.get("source_article")
except (json.JSONDecodeError, TypeError):
pass
return reg_code, article
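The citation branch above can be exercised standalone; `normalize_regulation` here is a simplified stand-in for `services.obligation_extractor._normalize_regulation`, whose real mapping is project-specific:

```python
import json

# Stand-in for _normalize_regulation; the real mapping is project-specific,
# this one just slugifies the source string.
def normalize_regulation(source: str) -> str:
    return source.strip().lower().replace(" ", "_")

def extract_regulation_article(citation):
    # Mirrors the citation branch of _extract_regulation_article.
    reg_code = article = None
    try:
        c = json.loads(citation) if isinstance(citation, str) else citation
        if isinstance(c, dict):
            article = c.get("article") or c.get("source_article")
            if c.get("source"):
                reg_code = normalize_regulation(c["source"])
    except (json.JSONDecodeError, TypeError):
        pass
    return reg_code, article

print(extract_regulation_article('{"source": "EU 2016 679", "article": "Art. 30"}'))
# ('eu_2016_679', 'Art. 30')
```

Malformed JSON falls through the `except` and yields `(None, None)`, matching the error-tolerant behaviour of the original helper.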



@@ -0,0 +1,213 @@
"""
Compliance RAG Client — Proxy to Go SDK RAG Search.
Lightweight HTTP client that queries the Go AI Compliance SDK's
POST /sdk/v1/rag/search endpoint. This avoids needing embedding
models or direct Qdrant access in Python.
Error-tolerant: RAG failures never break the calling function.
"""
import logging
import os
from dataclasses import dataclass
from typing import List, Optional
import httpx
logger = logging.getLogger(__name__)
SDK_URL = os.getenv("SDK_URL", "http://ai-compliance-sdk:8090")
RAG_SEARCH_TIMEOUT = 15.0 # seconds
@dataclass
class RAGSearchResult:
"""A single search result from the compliance corpus."""
text: str
regulation_code: str
regulation_name: str
regulation_short: str
category: str
article: str
paragraph: str
source_url: str
score: float
collection: str = ""
class ComplianceRAGClient:
"""
RAG client that proxies search requests to the Go SDK.
Usage:
client = get_rag_client()
results = await client.search("DSGVO Art. 35", collection="bp_compliance_recht")
context_str = client.format_for_prompt(results)
"""
def __init__(self, base_url: str = SDK_URL):
self._search_url = f"{base_url}/sdk/v1/rag/search"
async def search(
self,
query: str,
collection: str = "bp_compliance_ce",
regulations: Optional[List[str]] = None,
top_k: int = 5,
) -> List[RAGSearchResult]:
"""
Search the RAG corpus via Go SDK.
Returns an empty list on any error (never raises).
"""
payload = {
"query": query,
"collection": collection,
"top_k": top_k,
}
if regulations:
payload["regulations"] = regulations
try:
async with httpx.AsyncClient(timeout=RAG_SEARCH_TIMEOUT) as client:
resp = await client.post(self._search_url, json=payload)
if resp.status_code != 200:
logger.warning(
"RAG search returned %d: %s", resp.status_code, resp.text[:200]
)
return []
data = resp.json()
results = []
for r in data.get("results", []):
results.append(RAGSearchResult(
text=r.get("text", ""),
regulation_code=r.get("regulation_code", ""),
regulation_name=r.get("regulation_name", ""),
regulation_short=r.get("regulation_short", ""),
category=r.get("category", ""),
article=r.get("article", ""),
paragraph=r.get("paragraph", ""),
source_url=r.get("source_url", ""),
score=r.get("score", 0.0),
collection=collection,
))
return results
except Exception as e:
logger.warning("RAG search failed: %s", e)
return []
async def search_with_rerank(
self,
query: str,
collection: str = "bp_compliance_ce",
regulations: Optional[List[str]] = None,
top_k: int = 5,
) -> List[RAGSearchResult]:
"""
Search with optional cross-encoder re-ranking.
Fetches top_k*4 results from RAG, then re-ranks with cross-encoder
and returns top_k. Falls back to regular search if reranker is disabled.
"""
from .reranker import get_reranker
reranker = get_reranker()
if reranker is None:
return await self.search(query, collection, regulations, top_k)
# Fetch more candidates for re-ranking
candidates = await self.search(
query, collection, regulations, top_k=max(top_k * 4, 20)
)
if not candidates:
return []
texts = [c.text for c in candidates]
try:
ranked_indices = reranker.rerank(query, texts, top_k=top_k)
return [candidates[i] for i in ranked_indices]
except Exception as e:
logger.warning("Reranking failed, returning unranked: %s", e)
return candidates[:top_k]
async def scroll(
self,
collection: str,
offset: Optional[str] = None,
limit: int = 100,
) -> tuple[List[RAGSearchResult], Optional[str]]:
"""
Scroll through ALL chunks in a collection (paginated).
Returns (chunks, next_offset). next_offset is None when done.
"""
scroll_url = self._search_url.replace("/search", "/scroll")
params = {"collection": collection, "limit": str(limit)}
if offset:
params["offset"] = offset
try:
async with httpx.AsyncClient(timeout=30.0) as client:
resp = await client.get(scroll_url, params=params)
if resp.status_code != 200:
logger.warning(
"RAG scroll returned %d: %s", resp.status_code, resp.text[:200]
)
return [], None
data = resp.json()
results = []
for r in data.get("chunks", []):
results.append(RAGSearchResult(
text=r.get("text", ""),
regulation_code=r.get("regulation_code", ""),
regulation_name=r.get("regulation_name", ""),
regulation_short=r.get("regulation_short", ""),
category=r.get("category", ""),
article=r.get("article", ""),
paragraph=r.get("paragraph", ""),
source_url=r.get("source_url", ""),
score=0.0,
collection=collection,
))
next_offset = data.get("next_offset") or None
return results, next_offset
except Exception as e:
logger.warning("RAG scroll failed: %s", e)
return [], None
def format_for_prompt(
self, results: List[RAGSearchResult], max_results: int = 5
) -> str:
"""Format search results as Markdown for inclusion in an LLM prompt."""
if not results:
return ""
lines = ["## Relevanter Rechtskontext\n"]
for i, r in enumerate(results[:max_results]):
header = f"{i + 1}. **{r.regulation_short}** ({r.regulation_code})"
if r.article:
header += f", {r.article}"
lines.append(header)
text = r.text[:400] + "..." if len(r.text) > 400 else r.text
lines.append(f" > {text}\n")
return "\n".join(lines)
# Singleton
_rag_client: Optional[ComplianceRAGClient] = None
def get_rag_client() -> ComplianceRAGClient:
"""Get the shared RAG client instance."""
global _rag_client
if _rag_client is None:
_rag_client = ComplianceRAGClient()
return _rag_client
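The Markdown layout `format_for_prompt` produces can be previewed with a minimal stand-in for `RAGSearchResult` (sample data is illustrative):

```python
from dataclasses import dataclass

# Minimal stand-in for RAGSearchResult: just enough fields to show the
# Markdown that format_for_prompt emits for an LLM prompt.
@dataclass
class Result:
    text: str
    regulation_code: str
    regulation_short: str
    article: str

def format_for_prompt(results, max_results=5):
    if not results:
        return ""
    lines = ["## Relevanter Rechtskontext\n"]
    for i, r in enumerate(results[:max_results]):
        header = f"{i + 1}. **{r.regulation_short}** ({r.regulation_code})"
        if r.article:
            header += f", {r.article}"
        lines.append(header)
        text = r.text[:400] + "..." if len(r.text) > 400 else r.text
        lines.append(f"  > {text}\n")
    return "\n".join(lines)

out = format_for_prompt([
    Result("Verzeichnis von Verarbeitungstätigkeiten ...", "eu_2016_679", "DSGVO", "Art. 30"),
])
print(out)
```

Each result becomes a numbered header plus a blockquoted excerpt, truncated at 400 characters so long chunks do not dominate the prompt budget.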


@@ -0,0 +1,85 @@
"""
Cross-Encoder Re-Ranking for RAG Search Results.
Uses BGE Reranker v2 (BAAI/bge-reranker-v2-m3, MIT license) to re-rank
search results from Qdrant for improved retrieval quality.
Lazy-loads the model on first use. Disabled by default (RERANK_ENABLED=false).
"""
import logging
import os
from typing import Optional
logger = logging.getLogger(__name__)
RERANK_ENABLED = os.getenv("RERANK_ENABLED", "false").lower() == "true"
RERANK_MODEL = os.getenv("RERANK_MODEL", "BAAI/bge-reranker-v2-m3")
class Reranker:
"""Cross-encoder reranker using sentence-transformers."""
def __init__(self, model_name: str = RERANK_MODEL):
self._model = None # Lazy init
self._model_name = model_name
def _ensure_model(self) -> None:
"""Load model on first use."""
if self._model is not None:
return
try:
from sentence_transformers import CrossEncoder
logger.info("Loading reranker model: %s", self._model_name)
self._model = CrossEncoder(self._model_name)
logger.info("Reranker model loaded successfully")
except ImportError:
logger.error(
"sentence-transformers not installed. "
"Install with: pip install sentence-transformers"
)
raise
except Exception as e:
logger.error("Failed to load reranker model: %s", e)
raise
def rerank(
self, query: str, texts: list[str], top_k: int = 5
) -> list[int]:
"""
Return indices of top_k texts sorted by relevance (highest first).
Args:
query: The search query.
texts: List of candidate texts to re-rank.
top_k: Number of top results to return.
Returns:
List of indices into the original texts list, sorted by relevance.
"""
if not texts:
return []
self._ensure_model()
pairs = [[query, text] for text in texts]
scores = self._model.predict(pairs)
# Sort by score descending, return indices
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
return ranked[:top_k]
# Module-level singleton
_reranker: Optional[Reranker] = None
def get_reranker() -> Optional[Reranker]:
"""Get the shared reranker instance. Returns None if disabled."""
global _reranker
if not RERANK_ENABLED:
return None
if _reranker is None:
_reranker = Reranker()
return _reranker
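The core of `rerank` is just an argsort over cross-encoder scores; with fixed scores in place of a `CrossEncoder.predict` call it reduces to:

```python
# The index-sorting step of Reranker.rerank, shown with fixed scores
# instead of a model call: the highest score ranks first.
scores = [0.12, 0.91, 0.40, 0.88]
top_k = 2
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranked[:top_k])  # [1, 3]
```

Returning indices rather than texts lets the caller map the ranking back onto its full `RAGSearchResult` objects.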


@@ -0,0 +1,223 @@
"""
Too-Close Similarity Detector — checks whether a candidate text is too similar
to a protected source text (copyright / license compliance).
Five metrics:
1. Exact-phrase — longest identical token sequence
2. Token overlap — Jaccard similarity of token sets
3. 3-gram Jaccard — Jaccard similarity of character 3-grams
4. Embedding cosine — via bge-m3 (Ollama or embedding-service)
5. LCS ratio — Longest Common Subsequence / max(len_a, len_b)
Decision:
PASS — no fail, at most 1 warn
WARN — exactly 2 warns, no fail → human review
FAIL — any fail threshold, or more than 2 warns → block, rewrite required
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass
from typing import Optional
import httpx
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Thresholds
# ---------------------------------------------------------------------------
THRESHOLDS = {
"max_exact_run": {"warn": 8, "fail": 12},
"token_overlap": {"warn": 0.20, "fail": 0.30},
"ngram_jaccard": {"warn": 0.10, "fail": 0.18},
"embedding_cosine": {"warn": 0.86, "fail": 0.92},
"lcs_ratio": {"warn": 0.35, "fail": 0.50},
}
# ---------------------------------------------------------------------------
# Tokenisation helpers
# ---------------------------------------------------------------------------
_WORD_RE = re.compile(r"\w+", re.UNICODE)
def _tokenize(text: str) -> list[str]:
return [t.lower() for t in _WORD_RE.findall(text)]
def _char_ngrams(text: str, n: int = 3) -> set[str]:
text = text.lower()
return {text[i : i + n] for i in range(len(text) - n + 1)} if len(text) >= n else set()
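The two tokenisation helpers behave as follows on short samples; they are reproduced here without the module-private underscore names:

```python
import re

_WORD_RE = re.compile(r"\w+", re.UNICODE)

# Lowercased word tokens; punctuation is dropped by the \w+ pattern.
def tokenize(text):
    return [t.lower() for t in _WORD_RE.findall(text)]

# Set of lowercased character n-grams; empty for inputs shorter than n.
def char_ngrams(text, n=3):
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)} if len(text) >= n else set()

print(tokenize("Access Control!"))  # ['access', 'control']
print(sorted(char_ngrams("abcd")))  # ['abc', 'bcd']
```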
# ---------------------------------------------------------------------------
# Metric implementations
# ---------------------------------------------------------------------------
def max_exact_run(tokens_a: list[str], tokens_b: list[str]) -> int:
"""Longest contiguous identical token sequence between a and b."""
if not tokens_a or not tokens_b:
return 0
best = 0
set_b = set(tokens_b)
for i in range(len(tokens_a)):
if tokens_a[i] not in set_b:
continue
for j in range(len(tokens_b)):
if tokens_a[i] != tokens_b[j]:
continue
run = 0
ii, jj = i, j
while ii < len(tokens_a) and jj < len(tokens_b) and tokens_a[ii] == tokens_b[jj]:
run += 1
ii += 1
jj += 1
if run > best:
best = run
return best
def token_overlap_jaccard(tokens_a: list[str], tokens_b: list[str]) -> float:
"""Jaccard similarity of token sets."""
set_a, set_b = set(tokens_a), set(tokens_b)
if not set_a and not set_b:
return 0.0
return len(set_a & set_b) / len(set_a | set_b)
def ngram_jaccard(text_a: str, text_b: str, n: int = 3) -> float:
"""Jaccard similarity of character n-grams."""
grams_a = _char_ngrams(text_a, n)
grams_b = _char_ngrams(text_b, n)
if not grams_a and not grams_b:
return 0.0
return len(grams_a & grams_b) / len(grams_a | grams_b)
def lcs_ratio(tokens_a: list[str], tokens_b: list[str]) -> float:
"""LCS length / max(len_a, len_b)."""
m, n = len(tokens_a), len(tokens_b)
if m == 0 or n == 0:
return 0.0
# Space-optimised LCS (two rows)
prev = [0] * (n + 1)
curr = [0] * (n + 1)
for i in range(1, m + 1):
for j in range(1, n + 1):
if tokens_a[i - 1] == tokens_b[j - 1]:
curr[j] = prev[j - 1] + 1
else:
curr[j] = max(prev[j], curr[j - 1])
prev, curr = curr, [0] * (n + 1)
return prev[n] / max(m, n)
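A worked example of the LCS-ratio metric; the DP below mirrors the two-row space optimisation in `lcs_ratio`, applied to toy token lists:

```python
# LCS length divided by the longer input, via a two-row DP table.
def lcs_ratio(a, b):
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        return 0.0
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n] / max(m, n)

print(lcs_ratio(["the", "controller", "keeps", "records"],
                ["the", "processor", "keeps", "logs"]))  # 0.5
```

Here the LCS is `["the", "keeps"]` (length 2) against a longer input of 4 tokens, hence 0.5, comfortably above the 0.35 warn threshold.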
async def embedding_cosine(text_a: str, text_b: str, embedding_url: str | None = None) -> float:
"""Cosine similarity via embedding service (bge-m3).
Falls back to 0.0 if the service is unreachable.
"""
url = embedding_url or "http://embedding-service:8087"
try:
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.post(
f"{url}/embed",
json={"texts": [text_a, text_b]},
)
resp.raise_for_status()
embeddings = resp.json().get("embeddings", [])
if len(embeddings) < 2:
return 0.0
return _cosine(embeddings[0], embeddings[1])
except Exception as e:
logger.warning("Embedding service unreachable, skipping cosine check: %s", e)
return 0.0
def _cosine(a: list[float], b: list[float]) -> float:
dot = sum(x * y for x, y in zip(a, b))
norm_a = sum(x * x for x in a) ** 0.5
norm_b = sum(x * x for x in b) ** 0.5
if norm_a == 0 or norm_b == 0:
return 0.0
return dot / (norm_a * norm_b)
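`_cosine` can be spot-checked in isolation (copied here so the snippet runs standalone):

```python
def cosine(a: list[float], b: list[float]) -> float:
    # Plain-Python cosine similarity, identical to _cosine above.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0  # degenerate (zero) vectors: report no similarity
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))            # same direction -> 1.0
print(round(cosine([1.0, 0.0], [1.0, 1.0]), 4))  # 45 degrees apart -> 0.7071
print(cosine([1.0, 0.0], [0.0, 0.0]))            # zero vector -> 0.0
```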
# ---------------------------------------------------------------------------
# Decision engine
# ---------------------------------------------------------------------------
@dataclass
class SimilarityReport:
max_exact_run: int
token_overlap: float
ngram_jaccard: float
embedding_cosine: float
lcs_ratio: float
status: str # PASS, WARN, FAIL
details: dict # per-metric status
def _classify(value: float | int, metric: str) -> str:
t = THRESHOLDS[metric]
if value >= t["fail"]:
return "FAIL"
if value >= t["warn"]:
return "WARN"
return "PASS"
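To make the banding in `_classify` concrete, here is a standalone sketch with illustrative threshold values; the module's real `THRESHOLDS` table is defined elsewhere and its numbers may differ:

```python
THRESHOLDS = {
    # Illustrative values only; the real module-level THRESHOLDS may differ.
    "token_overlap": {"warn": 0.5, "fail": 0.8},
}

def classify(value: float, metric: str) -> str:
    # Same banding as _classify: fail band wins, then warn band, else pass.
    t = THRESHOLDS[metric]
    if value >= t["fail"]:
        return "FAIL"
    if value >= t["warn"]:
        return "WARN"
    return "PASS"

print(classify(0.3, "token_overlap"))  # below warn -> PASS
print(classify(0.6, "token_overlap"))  # warn <= value < fail -> WARN
print(classify(0.9, "token_overlap"))  # at or above fail -> FAIL
```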
async def check_similarity(
source_text: str,
candidate_text: str,
embedding_url: str | None = None,
) -> SimilarityReport:
"""Run all 5 metrics and return an aggregate report."""
tok_src = _tokenize(source_text)
tok_cand = _tokenize(candidate_text)
m_exact = max_exact_run(tok_src, tok_cand)
m_token = token_overlap_jaccard(tok_src, tok_cand)
m_ngram = ngram_jaccard(source_text, candidate_text)
m_embed = await embedding_cosine(source_text, candidate_text, embedding_url)
m_lcs = lcs_ratio(tok_src, tok_cand)
details = {
"max_exact_run": _classify(m_exact, "max_exact_run"),
"token_overlap": _classify(m_token, "token_overlap"),
"ngram_jaccard": _classify(m_ngram, "ngram_jaccard"),
"embedding_cosine": _classify(m_embed, "embedding_cosine"),
"lcs_ratio": _classify(m_lcs, "lcs_ratio"),
}
fail_count = sum(1 for v in details.values() if v == "FAIL")
warn_count = sum(1 for v in details.values() if v == "WARN")
if fail_count > 0:
status = "FAIL"
elif warn_count > 2:
status = "FAIL"
elif warn_count > 1:
status = "WARN"
    else:
        status = "PASS"
return SimilarityReport(
max_exact_run=m_exact,
token_overlap=round(m_token, 4),
ngram_jaccard=round(m_ngram, 4),
embedding_cosine=round(m_embed, 4),
lcs_ratio=round(m_lcs, 4),
status=status,
details=details,
)
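The aggregation rule at the end of `check_similarity` (any FAIL or more than two WARNs fails the candidate; exactly two WARNs downgrade to WARN; a single WARN is tolerated) can be exercised on its own:

```python
def aggregate(details: dict[str, str]) -> str:
    # Condensed form of the status logic in check_similarity above.
    fail_count = sum(1 for v in details.values() if v == "FAIL")
    warn_count = sum(1 for v in details.values() if v == "WARN")
    if fail_count > 0 or warn_count > 2:
        return "FAIL"
    if warn_count > 1:
        return "WARN"
    return "PASS"

print(aggregate({"a": "PASS", "b": "PASS"}))               # PASS
print(aggregate({"a": "WARN", "b": "PASS"}))               # PASS (one warning tolerated)
print(aggregate({"a": "WARN", "b": "WARN"}))               # WARN
print(aggregate({"a": "WARN", "b": "WARN", "c": "WARN"}))  # FAIL
```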


@@ -0,0 +1,331 @@
"""V1 Control Enrichment Service — Match Eigenentwicklung controls to regulations.
Finds regulatory coverage for v1 controls (generation_strategy='ungrouped',
pipeline_version=1, no source_citation) by embedding similarity search.
Reuses embedding + Qdrant helpers from control_dedup.py.
"""
import logging
from typing import Optional
from sqlalchemy import text
from db.session import SessionLocal
from services.control_dedup import (
get_embedding,
qdrant_search_cross_regulation,
)
logger = logging.getLogger(__name__)
# Similarity threshold — lower than dedup (0.85) since we want informational matches
# Typical top scores for v1 controls are 0.70-0.77
V1_MATCH_THRESHOLD = 0.70
V1_MAX_MATCHES = 5
def _is_eigenentwicklung_query() -> str:
"""SQL WHERE clause identifying v1 Eigenentwicklung controls."""
return """
generation_strategy = 'ungrouped'
AND (pipeline_version = '1' OR pipeline_version IS NULL)
AND source_citation IS NULL
AND parent_control_uuid IS NULL
AND release_state NOT IN ('rejected', 'merged', 'deprecated')
"""
async def count_v1_controls() -> int:
"""Count how many v1 Eigenentwicklung controls exist."""
with SessionLocal() as db:
row = db.execute(text(f"""
SELECT COUNT(*) AS cnt
FROM canonical_controls
WHERE {_is_eigenentwicklung_query()}
""")).fetchone()
return row.cnt if row else 0
async def enrich_v1_matches(
dry_run: bool = True,
batch_size: int = 100,
offset: int = 0,
) -> dict:
"""Find regulatory matches for v1 Eigenentwicklung controls.
Args:
dry_run: If True, only count — don't write matches.
batch_size: Number of v1 controls to process per call.
offset: Pagination offset (v1 control index).
Returns:
Stats dict with counts, sample matches, and pagination info.
"""
with SessionLocal() as db:
# 1. Load v1 controls (paginated)
v1_controls = db.execute(text(f"""
SELECT id, control_id, title, objective, category
FROM canonical_controls
WHERE {_is_eigenentwicklung_query()}
ORDER BY control_id
LIMIT :limit OFFSET :offset
"""), {"limit": batch_size, "offset": offset}).fetchall()
# Count total for pagination
total_row = db.execute(text(f"""
SELECT COUNT(*) AS cnt
FROM canonical_controls
WHERE {_is_eigenentwicklung_query()}
""")).fetchone()
total_v1 = total_row.cnt if total_row else 0
if not v1_controls:
return {
"dry_run": dry_run,
"processed": 0,
"total_v1": total_v1,
"message": "Kein weiterer Batch — alle v1 Controls verarbeitet.",
}
if dry_run:
return {
"dry_run": True,
"total_v1": total_v1,
"offset": offset,
"batch_size": batch_size,
"sample_controls": [
{
"control_id": r.control_id,
"title": r.title,
"category": r.category,
}
for r in v1_controls[:20]
],
}
# 2. Process each v1 control
processed = 0
matches_inserted = 0
errors = []
sample_matches = []
for v1 in v1_controls:
try:
# Build search text
                search_text = f"{v1.title} {v1.objective or ''}".strip()
# Get embedding
embedding = await get_embedding(search_text)
if not embedding:
errors.append({
"control_id": v1.control_id,
"error": "Embedding fehlgeschlagen",
})
continue
# Search Qdrant (cross-regulation, no pattern filter)
                # Collection is atomic_controls_dedup (contains ~51k atomic controls)
results = await qdrant_search_cross_regulation(
embedding, top_k=20,
collection="atomic_controls_dedup",
)
# For each hit: resolve to a regulatory parent with source_citation.
# Atomic controls in Qdrant usually have parent_control_uuid → parent
# has the source_citation. We deduplicate by parent to avoid
# listing the same regulation multiple times.
rank = 0
seen_parents: set[str] = set()
for hit in results:
score = hit.get("score", 0)
if score < V1_MATCH_THRESHOLD:
continue
payload = hit.get("payload", {})
matched_uuid = payload.get("control_uuid")
if not matched_uuid or matched_uuid == str(v1.id):
continue
# Try the matched control itself first, then its parent
matched_row = db.execute(text("""
SELECT c.id, c.control_id, c.title, c.source_citation,
c.severity, c.category, c.parent_control_uuid
FROM canonical_controls c
WHERE c.id = CAST(:uuid AS uuid)
"""), {"uuid": matched_uuid}).fetchone()
if not matched_row:
continue
# Resolve to regulatory control (one with source_citation)
reg_row = matched_row
if not reg_row.source_citation and reg_row.parent_control_uuid:
# Look up parent — the parent has the source_citation
parent_row = db.execute(text("""
SELECT id, control_id, title, source_citation,
severity, category, parent_control_uuid
FROM canonical_controls
WHERE id = CAST(:uuid AS uuid)
AND source_citation IS NOT NULL
"""), {"uuid": str(reg_row.parent_control_uuid)}).fetchone()
if parent_row:
reg_row = parent_row
if not reg_row.source_citation:
continue
# Deduplicate by parent UUID
parent_key = str(reg_row.id)
if parent_key in seen_parents:
continue
seen_parents.add(parent_key)
rank += 1
if rank > V1_MAX_MATCHES:
break
# Extract source info
source_citation = reg_row.source_citation or {}
matched_source = source_citation.get("source") if isinstance(source_citation, dict) else None
matched_article = source_citation.get("article") if isinstance(source_citation, dict) else None
# Insert match — link to the regulatory parent (not the atomic child)
db.execute(text("""
INSERT INTO v1_control_matches
(v1_control_uuid, matched_control_uuid, similarity_score,
match_rank, matched_source, matched_article, match_method)
VALUES
(CAST(:v1_uuid AS uuid), CAST(:matched_uuid AS uuid), :score,
:rank, :source, :article, 'embedding')
ON CONFLICT (v1_control_uuid, matched_control_uuid) DO UPDATE
SET similarity_score = EXCLUDED.similarity_score,
match_rank = EXCLUDED.match_rank
"""), {
"v1_uuid": str(v1.id),
"matched_uuid": str(reg_row.id),
"score": round(score, 3),
"rank": rank,
"source": matched_source,
"article": matched_article,
})
matches_inserted += 1
# Collect sample
if len(sample_matches) < 20:
sample_matches.append({
"v1_control_id": v1.control_id,
"v1_title": v1.title,
"matched_control_id": reg_row.control_id,
"matched_title": reg_row.title,
"matched_source": matched_source,
"matched_article": matched_article,
"similarity_score": round(score, 3),
"match_rank": rank,
})
processed += 1
except Exception as e:
logger.warning("V1 enrichment error for %s: %s", v1.control_id, e)
errors.append({
"control_id": v1.control_id,
"error": str(e),
})
db.commit()
# Pagination
next_offset = offset + batch_size if len(v1_controls) == batch_size else None
return {
"dry_run": False,
"offset": offset,
"batch_size": batch_size,
"next_offset": next_offset,
"total_v1": total_v1,
"processed": processed,
"matches_inserted": matches_inserted,
"errors": errors[:10],
"sample_matches": sample_matches,
}
async def get_v1_matches(control_uuid: str) -> list[dict]:
"""Get all regulatory matches for a specific v1 control.
Args:
control_uuid: The UUID of the v1 control.
Returns:
List of match dicts with control details.
"""
with SessionLocal() as db:
rows = db.execute(text("""
SELECT
m.similarity_score,
m.match_rank,
m.matched_source,
m.matched_article,
m.match_method,
c.control_id AS matched_control_id,
c.title AS matched_title,
c.objective AS matched_objective,
c.severity AS matched_severity,
c.category AS matched_category,
c.source_citation AS matched_source_citation
FROM v1_control_matches m
JOIN canonical_controls c ON c.id = m.matched_control_uuid
WHERE m.v1_control_uuid = CAST(:uuid AS uuid)
ORDER BY m.match_rank
"""), {"uuid": control_uuid}).fetchall()
return [
{
"matched_control_id": r.matched_control_id,
"matched_title": r.matched_title,
"matched_objective": r.matched_objective,
"matched_severity": r.matched_severity,
"matched_category": r.matched_category,
"matched_source": r.matched_source,
"matched_article": r.matched_article,
"matched_source_citation": r.matched_source_citation,
"similarity_score": float(r.similarity_score),
"match_rank": r.match_rank,
"match_method": r.match_method,
}
for r in rows
]
async def get_v1_enrichment_stats() -> dict:
"""Get overview stats for v1 enrichment."""
with SessionLocal() as db:
total_v1 = db.execute(text(f"""
SELECT COUNT(*) AS cnt FROM canonical_controls
WHERE {_is_eigenentwicklung_query()}
""")).fetchone()
        # Same predicate as _is_eigenentwicklung_query(), written out with the c. alias
        matched_v1 = db.execute(text("""
            SELECT COUNT(DISTINCT m.v1_control_uuid) AS cnt
            FROM v1_control_matches m
            JOIN canonical_controls c ON c.id = m.v1_control_uuid
            WHERE c.generation_strategy = 'ungrouped'
              AND (c.pipeline_version = '1' OR c.pipeline_version IS NULL)
              AND c.source_citation IS NULL
              AND c.parent_control_uuid IS NULL
              AND c.release_state NOT IN ('rejected', 'merged', 'deprecated')
        """)).fetchone()
total_matches = db.execute(text("""
SELECT COUNT(*) AS cnt FROM v1_control_matches
""")).fetchone()
avg_score = db.execute(text("""
SELECT AVG(similarity_score) AS avg_score FROM v1_control_matches
""")).fetchone()
return {
"total_v1_controls": total_v1.cnt if total_v1 else 0,
"v1_with_matches": matched_v1.cnt if matched_v1 else 0,
"v1_without_matches": (total_v1.cnt if total_v1 else 0) - (matched_v1.cnt if matched_v1 else 0),
"total_matches": total_matches.cnt if total_matches else 0,
"avg_similarity_score": round(float(avg_score.avg_score), 3) if avg_score and avg_score.avg_score else None,
}


@@ -0,0 +1,229 @@
"""
Tests for the Applicability Engine (Phase C2).
Tests the deterministic filtering logic for industry, company size,
and scope signals without requiring a database connection.
"""
import pytest
from services.applicability_engine import (
_matches_company_size,
_matches_industry,
_matches_scope_signals,
_parse_json_text,
)
# =============================================================================
# _parse_json_text
# =============================================================================
class TestParseJsonText:
def test_none_returns_none(self):
assert _parse_json_text(None) is None
def test_valid_json_list(self):
assert _parse_json_text('["all"]') == ["all"]
def test_valid_json_list_multiple(self):
result = _parse_json_text('["Telekommunikation", "Energie"]')
assert result == ["Telekommunikation", "Energie"]
def test_valid_json_dict(self):
result = _parse_json_text('{"requires_any": ["uses_ai"]}')
assert result == {"requires_any": ["uses_ai"]}
def test_invalid_json_returns_none(self):
assert _parse_json_text("not json") is None
def test_empty_string_returns_none(self):
assert _parse_json_text("") is None
def test_already_list_passthrough(self):
val = ["all"]
assert _parse_json_text(val) == ["all"]
def test_already_dict_passthrough(self):
val = {"requires_any": ["uses_ai"]}
assert _parse_json_text(val) == val
def test_integer_returns_none(self):
assert _parse_json_text(42) is None
# =============================================================================
# _matches_industry
# =============================================================================
class TestMatchesIndustry:
def test_null_matches_any_industry(self):
assert _matches_industry(None, "Telekommunikation") is True
def test_all_matches_any_industry(self):
assert _matches_industry('["all"]', "Telekommunikation") is True
assert _matches_industry('["all"]', "Energie") is True
def test_specific_industry_matches(self):
assert _matches_industry(
'["Telekommunikation", "Energie"]', "Telekommunikation"
) is True
def test_specific_industry_no_match(self):
assert _matches_industry(
'["Telekommunikation", "Energie"]', "Gesundheitswesen"
) is False
def test_malformed_json_matches(self):
"""Malformed data should be treated as 'applies to everyone'."""
assert _matches_industry("not json", "anything") is True
def test_all_with_other_industries(self):
assert _matches_industry(
'["all", "Telekommunikation"]', "Gesundheitswesen"
) is True
# =============================================================================
# _matches_company_size
# =============================================================================
class TestMatchesCompanySize:
def test_null_matches_any_size(self):
assert _matches_company_size(None, "medium") is True
def test_all_matches_any_size(self):
assert _matches_company_size('["all"]', "micro") is True
assert _matches_company_size('["all"]', "enterprise") is True
def test_specific_size_matches(self):
assert _matches_company_size(
'["medium", "large", "enterprise"]', "large"
) is True
def test_specific_size_no_match(self):
assert _matches_company_size(
'["medium", "large", "enterprise"]', "small"
) is False
def test_micro_excluded_from_nis2(self):
"""NIS2 typically requires medium+."""
assert _matches_company_size(
'["medium", "large", "enterprise"]', "micro"
) is False
def test_malformed_json_matches(self):
assert _matches_company_size("broken", "medium") is True
# =============================================================================
# _matches_scope_signals
# =============================================================================
class TestMatchesScopeSignals:
def test_null_conditions_always_match(self):
assert _matches_scope_signals(None, ["uses_ai"]) is True
assert _matches_scope_signals(None, []) is True
def test_empty_requires_any_matches(self):
assert _matches_scope_signals('{"requires_any": []}', ["uses_ai"]) is True
def test_no_requires_any_key_matches(self):
assert _matches_scope_signals(
'{"description": "some text"}', ["uses_ai"]
) is True
def test_requires_any_with_matching_signal(self):
conditions = '{"requires_any": ["uses_ai"], "description": "AI Act"}'
assert _matches_scope_signals(conditions, ["uses_ai"]) is True
def test_requires_any_with_no_matching_signal(self):
conditions = '{"requires_any": ["uses_ai"], "description": "AI Act"}'
assert _matches_scope_signals(
conditions, ["third_country_transfer"]
) is False
def test_requires_any_with_one_of_multiple_matching(self):
conditions = '{"requires_any": ["uses_ai", "processes_health_data"]}'
assert _matches_scope_signals(
conditions, ["processes_health_data", "financial_data"]
) is True
def test_requires_any_with_no_signals_provided(self):
conditions = '{"requires_any": ["uses_ai"]}'
assert _matches_scope_signals(conditions, []) is False
def test_malformed_json_matches(self):
assert _matches_scope_signals("broken", ["uses_ai"]) is True
def test_multiple_required_signals_any_match(self):
"""requires_any means at least ONE must match."""
conditions = (
'{"requires_any": ["uses_ai", "third_country_transfer", '
'"processes_health_data"]}'
)
assert _matches_scope_signals(
conditions, ["third_country_transfer"]
) is True
def test_multiple_required_signals_none_match(self):
conditions = (
'{"requires_any": ["uses_ai", "third_country_transfer"]}'
)
assert _matches_scope_signals(
conditions, ["financial_data", "employee_monitoring"]
) is False
# =============================================================================
# Integration-style: combined filtering scenarios
# =============================================================================
class TestCombinedFiltering:
"""Test typical real-world filtering scenarios."""
def test_dsgvo_art5_applies_to_everyone(self):
"""DSGVO Art. 5 = all industries, all sizes, no scope conditions."""
assert _matches_industry('["all"]', "Telekommunikation") is True
assert _matches_company_size('["all"]', "micro") is True
assert _matches_scope_signals(None, []) is True
def test_nis2_art21_kritis_medium_plus(self):
"""NIS2 Art. 21 = KRITIS sectors, medium+."""
industries = '["Energie", "Gesundheitswesen", "Digitale Infrastruktur", "Logistik / Transport"]'
sizes = '["medium", "large", "enterprise"]'
# Matches: Energie + large
assert _matches_industry(industries, "Energie") is True
assert _matches_company_size(sizes, "large") is True
# No match: IT company
assert _matches_industry(industries, "Technologie / IT") is False
# No match: small company
assert _matches_company_size(sizes, "small") is False
def test_ai_act_scope_condition(self):
"""AI Act = all industries, all sizes, but only if uses_ai."""
conditions = '{"requires_any": ["uses_ai"], "description": "Nur bei KI-Einsatz"}'
# Company uses AI
assert _matches_scope_signals(conditions, ["uses_ai"]) is True
# Company does not use AI
assert _matches_scope_signals(conditions, []) is False
assert _matches_scope_signals(
conditions, ["third_country_transfer"]
) is False
def test_tkg_telekom_only(self):
"""TKG = only Telekommunikation, all sizes."""
industries = '["Telekommunikation"]'
assert _matches_industry(industries, "Telekommunikation") is True
assert _matches_industry(industries, "Energie") is False
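The behaviour pinned down by the tests above can be summarised in a small reimplementation. This is inferred from the assertions, not the actual `applicability_engine` source:

```python
import json

def parse_json_text(value):
    # Pass through decoded lists/dicts, decode JSON strings, otherwise None.
    if isinstance(value, (list, dict)):
        return value
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return None
        return parsed if isinstance(parsed, (list, dict)) else None
    return None

def matches_industry(allowed, industry: str) -> bool:
    values = parse_json_text(allowed)
    # NULL or malformed metadata is treated as "applies to everyone".
    if not isinstance(values, list):
        return True
    return "all" in values or industry in values

assert matches_industry(None, "Energie")
assert matches_industry('["all"]', "Energie")
assert not matches_industry('["Telekommunikation"]', "Energie")
assert matches_industry("not json", "Energie")  # malformed -> applies
```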


@@ -143,24 +143,26 @@ services:
- breakpilot-network
# =========================================================
# OCR SERVICE (PaddleOCR PP-OCRv5 Latin)
# OCR SERVICE (PaddleOCR PP-OCRv5)
# =========================================================
paddleocr-service:
build:
context: ./paddleocr-service
dockerfile: Dockerfile
container_name: bp-core-paddleocr
ports:
- "8095:8095"
expose:
- "8095"
environment:
PADDLEOCR_API_KEY: ${PADDLEOCR_API_KEY:-}
FLAGS_use_mkldnn: "0"
volumes:
- paddleocr_models:/root/.paddleocr
labels:
- "traefik.http.services.paddleocr.loadbalancer.server.port=8095"
deploy:
resources:
limits:
memory: 4G
memory: 6G
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8095/health"]
interval: 30s
@@ -171,6 +173,43 @@ services:
networks:
- breakpilot-network
# =========================================================
# PITCH DECK
# =========================================================
pitch-deck:
build:
context: ./pitch-deck
dockerfile: Dockerfile
container_name: bp-core-pitch-deck
expose:
- "3000"
environment:
DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT:-5432}/${POSTGRES_DB}
PITCH_JWT_SECRET: ${PITCH_JWT_SECRET}
PITCH_ADMIN_SECRET: ${PITCH_ADMIN_SECRET}
PITCH_BASE_URL: ${PITCH_BASE_URL:-https://pitch.breakpilot.ai}
MAGIC_LINK_TTL_HOURS: ${MAGIC_LINK_TTL_HOURS:-72}
# Optional: bootstrap first admin via `npm run admin:create` inside the container.
PITCH_ADMIN_BOOTSTRAP_EMAIL: ${PITCH_ADMIN_BOOTSTRAP_EMAIL:-}
PITCH_ADMIN_BOOTSTRAP_NAME: ${PITCH_ADMIN_BOOTSTRAP_NAME:-}
PITCH_ADMIN_BOOTSTRAP_PASSWORD: ${PITCH_ADMIN_BOOTSTRAP_PASSWORD:-}
SMTP_HOST: ${SMTP_HOST}
SMTP_PORT: ${SMTP_PORT:-587}
SMTP_USERNAME: ${SMTP_USERNAME}
SMTP_PASSWORD: ${SMTP_PASSWORD}
SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.ai}
NODE_ENV: production
healthcheck:
test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:3000/api/health"]
interval: 30s
timeout: 10s
start_period: 15s
retries: 3
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# HEALTH AGGREGATOR
# =========================================================
@@ -183,7 +222,7 @@ services:
- "8099"
environment:
PORT: 8099
CHECK_SERVICES: "valkey:6379,consent-service:8081,rag-service:8097,embedding-service:8087,paddleocr-service:8095"
CHECK_SERVICES: "valkey:6379,consent-service:8081,rag-service:8097,embedding-service:8087,paddleocr-service:8095,pitch-deck:3000"
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8099/health"]
interval: 30s

docker-compose.hetzner.yml

@@ -0,0 +1,175 @@
# =========================================================
# BreakPilot Core — Hetzner Override (x86_64)
# =========================================================
# Usage:
#   docker compose -f docker-compose.yml -f docker-compose.hetzner.yml up -d \
#     postgres valkey qdrant ollama embedding-service rag-service \
#     backend-core consent-service health-aggregator
#
# Changes relative to the base (docker-compose.yml):
#   - platform: linux/amd64 (instead of arm64)
#   - Ollama container for CPU embeddings (bge-m3)
#   - Mailpit replaced by a lightweight dummy (no mail dev server needed)
#   - Vault, Nginx, Gitea etc. disabled via profiles
#   - Network: auto-create (not external)
# =========================================================
networks:
breakpilot-network:
external: true
name: breakpilot-network
services:
# =========================================================
# NEW SERVICES
# =========================================================
# Ollama for embeddings (CPU-only, bge-m3)
ollama:
image: ollama/ollama:latest
container_name: bp-core-ollama
platform: linux/amd64
volumes:
- ollama_models:/root/.ollama
healthcheck:
test: ["CMD-SHELL", "curl -sf http://127.0.0.1:11434/api/tags || exit 1"]
interval: 15s
timeout: 10s
retries: 5
start_period: 30s
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# PLATFORM OVERRIDES (arm64 → amd64)
# =========================================================
backend-core:
platform: linux/amd64
build:
context: ./backend-core
dockerfile: Dockerfile
args:
TARGETARCH: amd64
ports:
- "8000:8000"
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}?options=-csearch_path%3Dcore,public
JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
ENVIRONMENT: ${ENVIRONMENT:-production}
VALKEY_URL: redis://valkey:6379/0
SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
CONSENT_SERVICE_URL: http://consent-service:8081
USE_VAULT_SECRETS: "false"
SMTP_HOST: ${SMTP_HOST:-smtp.example.com}
SMTP_PORT: ${SMTP_PORT:-587}
SMTP_USERNAME: ${SMTP_USERNAME:-}
SMTP_PASSWORD: ${SMTP_PASSWORD:-}
SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.app}
consent-service:
platform: linux/amd64
environment:
DATABASE_URL: postgres://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
JWT_REFRESH_SECRET: ${JWT_REFRESH_SECRET:-your-refresh-secret}
PORT: 8081
ENVIRONMENT: ${ENVIRONMENT:-production}
ALLOWED_ORIGINS: "*"
VALKEY_URL: redis://valkey:6379/0
SESSION_TTL_HOURS: ${SESSION_TTL_HOURS:-24}
SMTP_HOST: ${SMTP_HOST:-smtp.example.com}
SMTP_PORT: ${SMTP_PORT:-587}
SMTP_USERNAME: ${SMTP_USERNAME:-}
SMTP_PASSWORD: ${SMTP_PASSWORD:-}
SMTP_FROM_NAME: ${SMTP_FROM_NAME:-BreakPilot}
SMTP_FROM_ADDR: ${SMTP_FROM_ADDR:-noreply@breakpilot.app}
FRONTEND_URL: ${FRONTEND_URL:-https://admin-dev.breakpilot.ai}
billing-service:
platform: linux/amd64
rag-service:
platform: linux/amd64
ports:
- "8097:8097"
environment:
PORT: 8097
QDRANT_URL: http://qdrant:6333
MINIO_ENDPOINT: nbg1.your-objectstorage.com
MINIO_ACCESS_KEY: ${MINIO_ACCESS_KEY:-T18RGFVXXG2ZHQ5404TP}
MINIO_SECRET_KEY: ${MINIO_SECRET_KEY:-KOUU4WO6wh07cQjNgh0IZHkeKQrVfBz6hnIGpNss}
MINIO_BUCKET: ${MINIO_BUCKET:-breakpilot-rag}
MINIO_SECURE: "true"
EMBEDDING_SERVICE_URL: http://embedding-service:8087
OLLAMA_URL: http://ollama:11434
OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-bge-m3}
JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
ENVIRONMENT: ${ENVIRONMENT:-production}
embedding-service:
platform: linux/amd64
ports:
- "8087:8087"
health-aggregator:
platform: linux/amd64
environment:
PORT: 8099
CHECK_SERVICES: "postgres:5432,valkey:6379,qdrant:6333,backend-core:8000,rag-service:8097,embedding-service:8087"
# =========================================================
# DUMMY REPLACEMENTS FOR DEPENDENCIES
# =========================================================
# backend-core + consent-service depend on mailpit
# (depends_on gets merged in compose overrides and cannot be removed)
# → replace mailpit with a lightweight dummy
mailpit:
image: alpine:3.19
entrypoint: ["sh", "-c", "echo 'Mailpit dummy on Hetzner' && tail -f /dev/null"]
volumes: []
ports: []
environment: {}
# Qdrant: RocksDB needs more open files
qdrant:
ulimits:
nofile:
soft: 65536
hard: 65536
# minio: rag-service depends on it (depends_on)
# Keep it running locally, but rag-service uses external Hetzner Object Storage
# minio stays unchanged (small, ~50MB RAM)
# =========================================================
# DISABLED SERVICES (via profiles)
# =========================================================
nginx:
profiles: ["disabled"]
vault:
profiles: ["disabled"]
vault-init:
profiles: ["disabled"]
vault-agent:
profiles: ["disabled"]
gitea:
profiles: ["disabled"]
gitea-runner:
profiles: ["disabled"]
night-scheduler:
profiles: ["disabled"]
admin-core:
profiles: ["disabled"]
pitch-deck:
profiles: ["disabled"]
levis-holzbau:
profiles: ["disabled"]
volumes:
ollama_models:


@@ -56,10 +56,12 @@ services:
- "8091:8091" # Voice Service (WSS)
- "8093:8093" # AI Compliance SDK
- "8097:8097" # RAG Service (NEU)
#- "8098:8098" # Control Pipeline (internal only, no nginx port needed)
- "8443:8443" # Jitsi Meet
- "3008:3008" # Admin Core
- "3010:3010" # Portal Dashboard
- "8011:8011" # Compliance Docs (MkDocs)
- "3012:3012" # Pitch Deck
volumes:
- ./nginx/conf.d:/etc/nginx/conf.d:ro
- vault_certs:/etc/nginx/certs:ro
@@ -347,11 +349,11 @@ services:
environment:
PORT: 8097
QDRANT_URL: http://qdrant:6333
MINIO_ENDPOINT: minio:9000
MINIO_ACCESS_KEY: ${MINIO_ROOT_USER:-breakpilot}
MINIO_SECRET_KEY: ${MINIO_ROOT_PASSWORD:-breakpilot123}
MINIO_ENDPOINT: nbg1.your-objectstorage.com
MINIO_ACCESS_KEY: T18RGFVXXG2ZHQ5404TP
MINIO_SECRET_KEY: KOUU4WO6wh07cQjNgh0IZHkeKQrVfBz6hnIGpNss
MINIO_BUCKET: ${MINIO_BUCKET:-breakpilot-rag}
MINIO_SECURE: "false"
MINIO_SECURE: "true"
EMBEDDING_SERVICE_URL: http://embedding-service:8087
OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
OLLAMA_EMBED_MODEL: ${OLLAMA_EMBED_MODEL:-bge-m3}
@@ -376,6 +378,50 @@ services:
networks:
- breakpilot-network
# =========================================================
# CONTROL PIPELINE (developer-only, not customer-facing)
# =========================================================
control-pipeline:
build:
context: ./control-pipeline
dockerfile: Dockerfile
container_name: bp-core-control-pipeline
platform: linux/arm64
expose:
- "8098"
environment:
PORT: 8098
DATABASE_URL: postgresql://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
SCHEMA_SEARCH_PATH: compliance,core,public
QDRANT_URL: http://qdrant:6333
EMBEDDING_SERVICE_URL: http://embedding-service:8087
ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
CONTROL_GEN_ANTHROPIC_MODEL: ${CONTROL_GEN_ANTHROPIC_MODEL:-claude-sonnet-4-6}
DECOMPOSITION_LLM_MODEL: ${DECOMPOSITION_LLM_MODEL:-claude-haiku-4-5-20251001}
OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
CONTROL_GEN_OLLAMA_MODEL: ${CONTROL_GEN_OLLAMA_MODEL:-qwen3.5:35b-a3b}
SDK_URL: http://ai-compliance-sdk:8090
JWT_SECRET: ${JWT_SECRET:-your-super-secret-jwt-key-change-in-production}
ENVIRONMENT: ${ENVIRONMENT:-development}
extra_hosts:
- "host.docker.internal:host-gateway"
depends_on:
postgres:
condition: service_healthy
qdrant:
condition: service_healthy
embedding-service:
condition: service_healthy
healthcheck:
test: ["CMD", "curl", "-f", "http://127.0.0.1:8098/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 10s
restart: unless-stopped
networks:
- breakpilot-network
embedding-service:
build:
context: ./embedding-service
@@ -828,13 +874,17 @@ services:
dockerfile: Dockerfile
container_name: bp-core-pitch-deck
platform: linux/arm64
ports:
- "3012:3000"
expose:
- "3000"
environment:
NODE_ENV: production
DATABASE_URL: postgres://${POSTGRES_USER:-breakpilot}:${POSTGRES_PASSWORD:-breakpilot123}@postgres:5432/${POSTGRES_DB:-breakpilot_db}
OLLAMA_URL: ${OLLAMA_URL:-http://host.docker.internal:11434}
OLLAMA_MODEL: ${OLLAMA_MODEL:-qwen3.5:35b-a3b}
PITCH_JWT_SECRET: ${PITCH_JWT_SECRET:-7025f5da6d2ea384353ea6debddae0ea9e2dbca151a1df4b65be8cb80a5cf002}
PITCH_ADMIN_SECRET: ${PITCH_ADMIN_SECRET:-40df9e6f2ca2e90729030af37bf79199710b09c898cac9df}
LITELLM_URL: ${LITELLM_URL:-https://llm-dev.meghsakha.com}
LITELLM_MODEL: ${LITELLM_MODEL:-gpt-oss-120b}
LITELLM_API_KEY: ${LITELLM_API_KEY:-sk-0nAyxaMVbIqmz_ntnndzag}
TTS_SERVICE_URL: http://bp-compliance-tts:8095
extra_hosts:
- "host.docker.internal:host-gateway"
depends_on:
@@ -843,3 +893,20 @@ services:
restart: unless-stopped
networks:
- breakpilot-network
# =========================================================
# LEVIS HOLZBAU - children's woodworking website
# =========================================================
levis-holzbau:
build:
context: ./levis-holzbau
dockerfile: Dockerfile
container_name: bp-core-levis-holzbau
platform: linux/arm64
ports:
- "3013:3000"
environment:
NODE_ENV: production
restart: unless-stopped
networks:
- breakpilot-network


@@ -1,194 +1,77 @@
# Environment Architecture
## Overview
## Overview
BreakPilot uses a three-environment strategy for safe development and deployment:
BreakPilot uses two environments:
```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Development    │────▶│    Staging      │────▶│   Production    │
│   (develop)            (staging)               (main)
└─────────────────┘     └─────────────────┘     └─────────────────┘
   Daily                  Tested code             Production-ready
   development
┌─────────────────┐                      ┌─────────────────┐
│  Development    │───── git push ────▶│   Production    │
│  (Mac Mini)                              (Orca)
└─────────────────┘                      └─────────────────┘
   Local                                 Automatic
   development                           via Orca
```
## Environments
### Development (Dev)
### Development (local: Mac Mini)
**Purpose:** Daily development work
**Purpose:** Local development and testing
| Property | Value |
|-------------|------|
| Git Branch | `develop` |
| Compose File | `docker-compose.yml` + `docker-compose.override.yml` (auto) |
| Env File | `.env.dev` |
| Database | `breakpilot_dev` |
| Git Branch | `main` |
| Compose File | `docker-compose.yml` |
| Database | Local PostgreSQL |
| Debug | Enabled |
| Hot-Reload | Enabled |
**Start:**
```bash
./scripts/start.sh dev
# or simply:
docker compose up -d
ssh macmini "cd ~/Projekte/breakpilot-core && /usr/local/bin/docker compose up -d"
```
### Staging
### Production (Orca)
**Purpose:** Tested, approved code before production
| Property | Value |
|-------------|------|
| Git Branch | `staging` |
| Compose File | `docker-compose.yml` + `docker-compose.staging.yml` |
| Env File | `.env.staging` |
| Database | `breakpilot_staging` (separate volume) |
| Debug | Disabled |
| Hot-Reload | Disabled |
**Start:**
```bash
./scripts/start.sh staging
# or:
docker compose -f docker-compose.yml -f docker-compose.staging.yml up -d
```
### Production (Prod)
**Purpose:** Live system for end users (from launch)
**Purpose:** Live system
| Eigenschaft | Wert |
|-------------|------|
| Git Branch | `main` |
| Compose File | `docker-compose.yml` + `docker-compose.prod.yml` |
| Env File | `.env.prod` (NICHT im Repository!) |
| Database | `breakpilot_prod` (separates Volume) |
| Deployment | Orca (automatisch bei Push auf gitea) |
| Database | Externe PostgreSQL (TLS) |
| Debug | Deaktiviert |
| Vault | Pflicht (keine Env-Fallbacks) |
## Datenbank-Trennung
Jede Umgebung verwendet separate Docker Volumes für vollständige Datenisolierung:
```
┌─────────────────────────────────────────────────────────────┐
│ PostgreSQL Volumes │
├─────────────────────────────────────────────────────────────┤
│ breakpilot-dev_postgres_data │ Development Database │
│ breakpilot_staging_postgres │ Staging Database │
│ breakpilot_prod_postgres │ Production Database │
└─────────────────────────────────────────────────────────────┘
```
## Port-Mapping
Um mehrere Umgebungen gleichzeitig laufen zu lassen, verwenden sie unterschiedliche Ports:
| Service | Dev Port | Staging Port | Prod Port |
|---------|----------|--------------|-----------|
| Backend | 8000 | 8001 | 8000 |
| PostgreSQL | 5432 | 5433 | - (intern) |
| MinIO | 9000/9001 | 9002/9003 | - (intern) |
| Qdrant | 6333/6334 | 6335/6336 | - (intern) |
| Mailpit | 8025/1025 | 8026/1026 | - (deaktiviert) |
## Git Branching Strategie
```
main (Prod) ← Nur Release-Merges, geschützt
staging ← Getesteter Code, Review erforderlich
develop (Dev) ← Tägliche Arbeit, Default-Branch
feature/* ← Feature-Branches (optional)
```
### Workflow
1. **Entwicklung:** Arbeite auf `develop`
2. **Code-Review:** Erstelle PR von Feature-Branch → `develop`
3. **Staging:** Promote `develop``staging` mit Tests
4. **Release:** Promote `staging``main` nach Freigabe
### Promotion-Befehle
**Deploy:**
```bash
# develop → staging
./scripts/promote.sh dev-to-staging
# staging → main (Production)
./scripts/promote.sh staging-to-prod
git push origin main && git push gitea main
# Orca baut und deployt automatisch
```
## Secrets Management
### Development
- `.env.dev` enthält Entwicklungs-Credentials
- Vault optional (Dev-Token)
- Mailpit für E-Mail-Tests
### Staging
- `.env.staging` enthält Test-Credentials
- Vault empfohlen
- Mailpit für E-Mail-Sicherheit
### Production
- `.env.prod` NICHT im Repository
- Vault PFLICHT
- Echte SMTP-Konfiguration
Siehe auch: [Secrets Management](./secrets-management.md)
## Docker Compose Architektur
```
docker-compose.yml ← Basis-Konfiguration
docker-compose.yml ← Basis-Konfiguration (lokal, arm64)
── docker-compose.override.yml ← Dev (auto-geladen)
├── docker-compose.staging.yml ← Staging (explizit)
└── docker-compose.prod.yml ← Production (explizit)
── docker-compose.orca.yml Production Override (amd64)
```
### Automatisches Laden
Orca verwendet automatisch beide Compose-Files fuer den Production-Build.
Docker Compose lädt automatisch:
1. `docker-compose.yml`
2. `docker-compose.override.yml` (falls vorhanden)
## Secrets Management
Daher startet `docker compose up` automatisch die Dev-Umgebung.
### Development
- `.env` enthält Entwicklungs-Credentials
- Vault optional (Dev-Token)
- Mailpit für E-Mail-Tests
## Helper Scripts
### Production
- `.env` auf dem Server (nicht im Repository)
- Vault PFLICHT
- Echte SMTP-Konfiguration
| Script | Beschreibung |
|--------|--------------|
| `scripts/env-switch.sh` | Wechselt zwischen Umgebungen |
| `scripts/start.sh` | Startet Services für Umgebung |
| `scripts/stop.sh` | Stoppt Services |
| `scripts/promote.sh` | Promotet Code zwischen Branches |
| `scripts/status.sh` | Zeigt aktuellen Status |
## Verifikation
Nach Setup prüfen:
```bash
# Status anzeigen
./scripts/status.sh
# Branches prüfen
git branch -v
# Volumes prüfen
docker volume ls | grep breakpilot
```
Siehe auch: [Secrets Management](./secrets-management.md)
## Verwandte Dokumentation


# CI/CD Pipeline
Overview of the deployment process for BreakPilot.

## Overview

| Repo | Deployment | Trigger | Compose file |
|------|-----------|---------|--------------|
| **breakpilot-core** | Orca (automatic) | Push to `orca` branch | `docker-compose.orca.yml` |
| **breakpilot-compliance** | Orca (automatic) | Push to `main` branch | `docker-compose.yml` + `docker-compose.orca.yml` |
| **breakpilot-lehrer** | Mac Mini (local) | Manual `docker compose` | `docker-compose.yml` |
## Deployment Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                      Developer MacBook                           │
│                                                                  │
│  breakpilot-core/       → git push gitea orca                    │
│  breakpilot-compliance/ → git push gitea main                    │
│  breakpilot-lehrer/     → git push + ssh macmini docker ...      │
│                                                                  │
└───────────────────────────────┬─────────────────────────────────┘
                    ┌───────────┴───────────┐
┌───────────────────────────┐   ┌───────────────────────────┐
│    Orca (Production)      │   │   Mac Mini (Local/Dev)    │
│  Gitea Actions            │   │  breakpilot-lehrer        │
│  ├── Tests                │   │  ├── studio-v2            │
│  └── Orca API Deploy      │   │  ├── klausur-service      │
│                           │   │  ├── backend-lehrer       │
│  Core Services:           │   │  └── voice-service        │
│  ├── consent-service      │   │                           │
│  ├── rag-service          │   │  Core Services (local):   │
│  ├── embedding-service    │   │  ├── postgres             │
│  ├── paddleocr-service    │   │  ├── valkey, vault        │
│  └── health-aggregator    │   │  ├── nginx, gitea         │
│                           │   │  └── ...                  │
│  Compliance Services:     │   │                           │
│  ├── admin-compliance     │   │                           │
│  ├── backend-compliance   │   │                           │
│  ├── ai-compliance-sdk    │   │                           │
│  └── developer-portal     │   │                           │
└───────────────────────────┘   └───────────────────────────┘
```
## breakpilot-core → Orca

### Pipeline

```yaml
# .gitea/workflows/deploy-orca.yml
on:
  push:
    branches: [orca]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy via Orca API
        # Triggers the Orca build + deploy via the API
        # Secrets: ORCA_API_TOKEN, ORCA_RESOURCE_UUID, ORCA_BASE_URL
```

### Workflow

```bash
# 1. Edit the code on the MacBook
# 2. Commit and push:
git push origin main && git push gitea main
# 3. For a production deploy:
git push gitea orca
# 4. Check the status:
# https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-core/actions
```
### Orca-deployed Services

| Service | Container | Description |
|---------|-----------|--------------|
| valkey | bp-core-valkey | Session cache |
| consent-service | bp-core-consent-service | Consent management (Go) |
| rag-service | bp-core-rag-service | Semantic search |
| embedding-service | bp-core-embedding-service | Text embeddings |
| paddleocr-service | bp-core-paddleocr | OCR engine (x86_64) |
| health-aggregator | bp-core-health | Health-check aggregator |
## breakpilot-compliance → Orca
### Pipeline
```yaml
# .gitea/workflows/ci.yaml
on:
  push:
    branches: [main, develop]
jobs:
  # Lint (PRs only)
  # Tests (Go, Python, Node.js)
  # Validate canonical controls
  # Deploy (main only, after all tests)
```
### Workflow
```bash
# Commit and push → Orca deploys automatically:
git push origin main && git push gitea main
# Check the CI status:
# https://gitea.meghsakha.com/Benjamin_Boenisch/breakpilot-compliance/actions
# Health checks:
curl -sf https://api-dev.breakpilot.ai/health
curl -sf https://sdk-dev.breakpilot.ai/health
```
## breakpilot-lehrer → Mac Mini (local)
### Workflow
```bash
# 1. Edit the code on the MacBook
# 2. Commit and push:
git push origin main && git push gitea main
# 3. Pull on the Mac Mini and rebuild the container:
ssh macmini "git -C /Users/benjaminadmin/Projekte/breakpilot-lehrer pull --no-rebase origin main"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml build --no-cache <service>"
ssh macmini "/usr/local/bin/docker compose -f /Users/benjaminadmin/Projekte/breakpilot-lehrer/docker-compose.yml up -d <service>"
```
## Gitea Actions
### Overview

BreakPilot uses **Gitea Actions** (GitHub Actions-compatible) as its CI/CD system. The `act_runner` runs as a container on the Mac Mini and executes the pipelines.

| Component | Container | Description |
|------------|-----------|--------------|
| Gitea | `bp-core-gitea` (port 3003) | Git server + Actions trigger |
| Gitea Runner | `bp-core-gitea-runner` | Executes Actions workflows |

### Pipeline Configuration

Workflows live in each repo under `.gitea/workflows/`:
| Repo | Workflow | Branch | Action |
|------|----------|--------|--------|
| breakpilot-core | `deploy-orca.yml` | `orca` | Orca API deploy |
| breakpilot-compliance | `ci.yaml` | `main` | Tests + Orca deploy |
### Renew the Runner Token

```bash
ssh macmini "/usr/local/bin/docker compose \
  up -d --force-recreate gitea-runner"
```

### Check Pipeline Status

```bash
# Runner logs
ssh macmini "/usr/local/bin/docker logs -f bp-core-gitea-runner"
# Running jobs
ssh macmini "/usr/local/bin/docker exec bp-core-gitea-runner act_runner list"
```
## Health Checks

### Production (Orca)

```bash
# Core PaddleOCR
curl -sf https://ocr.breakpilot.com/health
# Compliance
curl -sf https://api-dev.breakpilot.ai/health
curl -sf https://sdk-dev.breakpilot.ai/health
```

### Local (Mac Mini)

```bash
# Core health aggregator
curl -sf http://macmini:8099/health
# Lehrer backend
curl -sf https://macmini:8001/health
# Klausur service
curl -sf https://macmini:8086/health
```
## Troubleshooting

### Container does not start

```bash
# Check the logs (local)
ssh macmini "/usr/local/bin/docker logs bp-core-<service>"
# Open a shell inside the container
ssh macmini "/usr/local/bin/docker exec -it bp-core-<service> /bin/sh"
```

### Build errors

```bash
# Clear the build cache completely
ssh macmini "docker builder prune -a"
# Build without cache
ssh macmini "docker compose build --no-cache <service>"
```

## Rollback

### Orca

A redeploy of an older commit can be triggered by resetting the branch:

```bash
# Reset the branch to the previous commit and force-push
git reset --hard <previous-commit>
git push gitea orca --force
```

### Local (Mac Mini)

```bash
# Tag the current image as a backup
ssh macmini "docker tag breakpilot-lehrer-klausur-service:latest breakpilot-lehrer-klausur-service:backup"
# If problems occur: restore the backup
ssh macmini "docker tag breakpilot-lehrer-klausur-service:backup breakpilot-lehrer-klausur-service:latest"
```


BreakPilot consists of three independent projects:

| **breakpilot-lehrer** | Education stack (Team A) | `bp-lehrer-*` | Blue |
| **breakpilot-compliance** | GDPR/compliance stack (Team B) | `bp-compliance-*` | Purple |

### Deployment Model

| Repo | Deployment | Trigger |
|------|-----------|---------|
| **breakpilot-core** | Orca (automatic) | Push to gitea main |
| **breakpilot-compliance** | Orca (automatic) | Push to gitea main |
| **breakpilot-lehrer** | Mac Mini (local) | Manual docker compose |
## Core Services
| Service | Container | Port | Description |
|---------|-----------|------|--------------|
| Admin Core | bp-core-admin | 3008 | Admin dashboard (Next.js) |
| Health Aggregator | bp-core-health | 8099 | Service health monitoring |
| Night Scheduler | bp-core-night-scheduler | 8096 | Night shutdown |
| Pitch Deck | bp-core-pitch-deck | 3012 | Investor presentation |
| Mailpit | bp-core-mailpit | 8025 | E-mail (development) |
| Gitea | bp-core-gitea | 3003 | Git server |
| Gitea Runner | bp-core-gitea-runner | - | CI/CD (Gitea Actions) |
| Jitsi (5 containers) | bp-core-jitsi-* | 8443 | Video conferencing |
## Nginx Routing Table

| Port | Upstream | Project |
|------|----------|---------|
| 443 | bp-lehrer-studio-v2:3001 | Lehrer |
| 3000 | bp-lehrer-website:3000 | Lehrer |
| 3002 | bp-lehrer-admin:3000 | Lehrer |
| 3006 | bp-compliance-developer-portal:3000 | Compliance |
| 3007 | bp-compliance-admin:3000 | Compliance |
| 3008 | bp-core-admin:3000 | Core |
| 8000 | bp-core-backend:8000 | Core |
| 8001 | bp-lehrer-backend:8001 | Lehrer |
| 8002 | bp-compliance-backend:8002 | Compliance |
| 8086 | bp-lehrer-klausur-service:8086 | Lehrer |
| 8087 | bp-core-embedding-service:8087 | Core |
| 8091 | bp-lehrer-voice-service:8091 | Lehrer |
| 8093 | bp-compliance-ai-sdk:8090 | Compliance |
| 8097 | bp-core-rag-service:8097 | Core |
| 8443 | bp-core-jitsi-web:80 | Core |
## Architecture

- [System Architecture](architecture/system-architecture.md)


# Document Templates V2

Extended compliance templates (DSFA, TOM, VVT, AVV, BV, FRIA) for the BreakPilot document generator.

**Branch:** `feature/betriebsrat-compliance-module`
**Target integration:** breakpilot-compliance (after the refactoring is complete)
**Database:** `compliance.compliance_legal_templates` (shared PostgreSQL)

## Contents

### SQL migrations (`migrations/`)

| File | Type | Description |
|-------|-----|--------------|
| `001_dsfa_template_v2.sql` | DSFA | Threshold analysis (WP248), SDM TOM, AI module, ~60 placeholders |
| `002_tom_sdm_template.sql` | TOM | 7 SDM assurance goals, sector blocks, compliance assessment |
| `003_vvt_sector_templates.sql` | VVT | 6 sector templates (IT/SaaS, healthcare, retail, skilled trades, education, consulting) |
| `004_avv_template.sql` | AVV | Data processing agreement under Art. 28, 12 sections, TOM annex |
| `005_additional_templates.sql` | Misc. | Confidentiality commitment + Art. 13/14 information obligations |
| `006_betriebsvereinbarung_template.sql` | BV | Works agreement under §87 BetrVG, 13 sections (A-M), AI/IT systems |
| `007_fria_template.sql` | FRIA | Fundamental rights impact assessment under Art. 27 AI Act, 8 sections |

### Python generators (`generators/`)

| File | Description |
|-------|--------------|
| `dsfa_template.py` | DSFA generator with threshold analysis, Bundesland mapping, SDM TOM, Art. 36, domain risks (HR/Edu/HC/Finance) |
| `tom_template.py` | TOM generator with SDM structure, NIS2/ISO 27001/AI Act extensions, sectors |
| `vvt_template.py` | VVT generator with 6 sector catalogs, Art. 30 validation |
| `betriebsvereinbarung_template.py` | BV generator with TOM population, conflict-score-based protection clauses |
| `fria_template.py` | FRIA generator with domain→fundamental-rights mapping (6 domains), risk matrix |
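The threshold analysis (Schwellwertanalyse) mentioned for `dsfa_template.py` boils down to checking which WP248 criteria a given context triggers; under the WP248 guidelines, two or more triggered criteria indicate that a DSFA is likely required. A self-contained sketch under those assumptions (the trimmed criteria table and the `triggered_criteria` helper are illustrative, not the generator's actual code):

```python
# Illustrative subset of the WP248 criteria table used by dsfa_template.py
WP248_CRITERIA = [
    {"id": "K1", "ctx_keys": ["has_profiling", "has_scoring"]},
    {"id": "K3", "ctx_keys": ["has_surveillance", "has_employee_monitoring"]},
    {"id": "K8", "ctx_keys": ["uses_ai", "uses_biometrics"]},
]

def triggered_criteria(ctx: dict) -> list:
    # A criterion fires when any of its context flags is truthy
    return [c["id"] for c in WP248_CRITERIA if any(ctx.get(k) for k in c["ctx_keys"])]

ctx = {"has_profiling": True, "uses_ai": True}
hits = triggered_criteria(ctx)
print(hits)            # ['K1', 'K8']
print(len(hits) >= 2)  # True: two or more criteria suggest a DSFA is required
```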
### Scripts (`scripts/`)

| File | Description |
|-------|--------------|
| `cleanup_temp_vorlagen.py` | Deletes temporary DPA templates from Qdrant (`temp_vorlagen=true`) |

## Integration into breakpilot-compliance

### 1. Run the SQL migrations

```bash
# Run the migrations against the shared DB
# On the Mac Mini:
ssh macmini "docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db -f -" < migrations/001_dsfa_template_v2.sql
ssh macmini "docker exec bp-core-postgres psql -U breakpilot -d breakpilot_db -f -" < migrations/002_tom_sdm_template.sql
# ... and so on
```

### 2. Copy the Python generators (during the compliance integration)

```bash
cp generators/*.py /path/to/breakpilot-compliance/backend-compliance/compliance/api/document_templates/
```

### 3. Register the new document_types

In `breakpilot-compliance/backend-compliance/compliance/api/legal_template_routes.py`,
extend `VALID_DOCUMENT_TYPES` with:

- `verpflichtungserklaerung`
- `informationspflichten`

### 4. Run the Qdrant cleanup

```bash
# Preview
ssh macmini "python3 /path/to/cleanup_temp_vorlagen.py --dry-run"
# Execute
ssh macmini "python3 /path/to/cleanup_temp_vorlagen.py"
```

## Template syntax

- `{{PLACEHOLDER}}` — replaced with the context value
- `{{#IF FELD}}...{{/IF}}` — conditional block (rendered only when the field is set)
- `{{#IF_NOT FELD}}...{{/IF_NOT}}` — inverted conditional block
- `[BLOCK:ID]...[/BLOCK:ID]` — block that the rule engine can remove
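The placeholder and conditional forms above can be rendered with a few regex passes. A minimal sketch, assuming a hypothetical `render` helper (the actual rule engine, including `[BLOCK:ID]` handling, is not shown here):

```python
import re

def render(template: str, ctx: dict) -> str:
    """Minimal renderer for {{PLACEHOLDER}} and {{#IF}}/{{#IF_NOT}} blocks (illustrative)."""
    # {{#IF FELD}}...{{/IF}}: keep the block only when the field is truthy
    template = re.sub(
        r"\{\{#IF (\w+)\}\}(.*?)\{\{/IF\}\}",
        lambda m: m.group(2) if ctx.get(m.group(1)) else "",
        template, flags=re.S,
    )
    # {{#IF_NOT FELD}}...{{/IF_NOT}}: inverted conditional
    template = re.sub(
        r"\{\{#IF_NOT (\w+)\}\}(.*?)\{\{/IF_NOT\}\}",
        lambda m: "" if ctx.get(m.group(1)) else m.group(2),
        template, flags=re.S,
    )
    # {{PLACEHOLDER}}: substitute context values; unknown placeholders stay intact
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), m.group(0))), template)

print(render("Hallo {{NAME}}{{#IF AI_SYSTEM}} (KI){{/IF}}", {"NAME": "ACME", "AI_SYSTEM": True}))
# Hallo ACME (KI)
```

Leaving unknown placeholders intact (rather than erroring) matches the generators' convention of emitting literal `{{...}}` markers for missing context values.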
## License

All templates: MIT License, BreakPilot Compliance.
No wording was taken from DPA documents — all text was written from scratch.


"""Betriebsvereinbarung template generator — creates BV draft from UCCA assessment.
Generates a modular works council agreement (Betriebsvereinbarung) based on:
- UCCA Assessment result (triggered rules, risk score, obligations)
- Company profile (name, location, works council)
- System details (name, type, modules)
Sections A-M follow the template in migration 006.
"""
from typing import Optional
# -- Default prohibited uses, per BAG (Federal Labour Court) case law --------
DEFAULT_VERBOTENE_NUTZUNGEN = [
    "Verdeckte Leistungs- oder Verhaltenskontrolle einzelner Beschaeftigter",
    "Erstellung individueller Persoenlichkeitsprofile oder Verhaltensanalysen",
    "Nutzung von Nutzungshistorien zu disziplinarischen Zwecken",
    "Automatisierte Personalentscheidungen ohne menschliche Ueberpruefung (Art. 22 DSGVO)",
    "Personenbezogene Rankings oder Leistungsvergleiche ohne gesonderte Mitbestimmung",
    "Korrelation von Systemnutzungsdaten mit Leistungsbeurteilungen",
]

AI_VERBOTENE_NUTZUNGEN = [
    "Einsatz von KI-Funktionen zur biometrischen Echtzeit-Identifizierung am Arbeitsplatz",
    "KI-gestuetztes Social Scoring von Beschaeftigten",
    "Nutzung von KI-generierten Bewertungen als alleinige Grundlage fuer Personalentscheidungen",
]

# -- Standard TOM measures ----------------------------------------------------
DEFAULT_TOM = [
    "Rollen- und Rechtekonzept mit Least-Privilege-Prinzip",
    "Verschluesselung der Daten bei Uebertragung (TLS 1.2+) und Speicherung (AES-256)",
    "Protokollierung aller administrativen Zugriffe",
    "Pseudonymisierung personenbezogener Daten, wo technisch moeglich",
    "Deaktivierung nicht benoetigter Telemetrie- und Diagnosefunktionen",
    "Getrennte Umgebungen fuer Test und Produktion",
    "Regelmaessige Sicherheitsupdates und Patch-Management",
    "Zugangsschutz durch Multi-Faktor-Authentifizierung fuer Administratoren",
]

# -- Standard permitted reports -----------------------------------------------
DEFAULT_ERLAUBTE_REPORTS = [
    "Systemgesundheit und Verfuegbarkeit (ohne Personenbezug)",
    "Lizenznutzung auf aggregierter Ebene (Abteilung/Standort, nicht Person)",
    "Sicherheitsereignisse und Anomalien",
    "Speicherplatznutzung (ohne Personenbezug)",
    "Fehlerstatistiken (technisch, nicht personenbezogen)",
]

# -- Standard data categories for IT/AI systems --------------------------------
DATENARTEN_MAP = {
    "email": "E-Mail-Metadaten (Absender, Empfaenger, Zeitstempel — NICHT Inhalte)",
    "chat": "Chat-/Messaging-Metadaten (Teilnehmer, Zeitstempel)",
    "document": "Dokumentenmetadaten (Ersteller, Aenderungsdatum, Dateiname)",
    "login": "Anmeldedaten (Benutzername, Zeitstempel, IP-Adresse)",
    "usage": "Nutzungsdaten (aufgerufene Funktionen, Nutzungsdauer — aggregiert)",
    "prompt": "KI-Eingaben und -Ausgaben (Prompts, Antworten)",
    "calendar": "Kalendereintraege (Betreff, Teilnehmer, Zeiten)",
    "hr": "Personalstammdaten (Name, Abteilung, Position, Eintrittsdatum)",
    "performance": "Leistungsdaten (Kennzahlen, Bewertungen, Zielvereinbarungen)",
    "video": "Videoaufnahmen (Arbeitsplatz, Zugangsbereiche)",
    "location": "Standortdaten (GPS, WLAN-basierte Ortung, Gebaeudezutritt)",
}
def generate_betriebsvereinbarung_draft(ctx: dict) -> dict:
    """Generate a Betriebsvereinbarung draft from company + assessment context.

    Args:
        ctx: Dict with keys:
            Required:
            - company_name: str
            - system_name: str
            - system_description: str
            Optional:
            - company_address: str
            - employer_representative: str
            - works_council_chair: str
            - system_vendor: str
            - locations: list[str]
            - departments: list[str]
            - modules: list[str]
            - purposes: list[str]
            - data_types: list[str] — keys from DATENARTEN_MAP
            - is_ai_system: bool
            - has_employee_monitoring: bool
            - has_hr_features: bool
            - has_video: bool
            - dpo_name: str
            - dpo_contact: str
            - audit_interval: str — e.g. "12 Monate"
            - duration: str — e.g. "unbefristet"
            - notice_period: str — e.g. "3 Monate"
            - retention_audit_logs: str — e.g. "90 Tage"
            - retention_usage_data: str — e.g. "30 Tage"
            - retention_prompts: str — e.g. "deaktiviert"
            - additional_forbidden: list[str]
            - additional_tom: list[str]
            - additional_reports: list[str]
            - betrvg_conflict_score: int — 0-100

    Returns:
        Dict with placeholder values ready for template substitution.
    """
    result = {}

    # Basic info
    result["UNTERNEHMEN_NAME"] = ctx.get("company_name", "{{UNTERNEHMEN_NAME}}")
    result["UNTERNEHMEN_SITZ"] = ctx.get("company_address", "{{UNTERNEHMEN_SITZ}}")
    result["ARBEITGEBER_VERTRETER"] = ctx.get("employer_representative", "{{ARBEITGEBER_VERTRETER}}")
    result["BETRIEBSRAT_VORSITZ"] = ctx.get("works_council_chair", "{{BETRIEBSRAT_VORSITZ}}")
    result["SYSTEM_NAME"] = ctx.get("system_name", "{{SYSTEM_NAME}}")
    result["SYSTEM_BESCHREIBUNG"] = ctx.get("system_description", "{{SYSTEM_BESCHREIBUNG}}")
    result["SYSTEM_HERSTELLER"] = ctx.get("system_vendor", "")
    result["DSB_NAME"] = ctx.get("dpo_name", "{{DSB_NAME}}")
    result["DSB_KONTAKT"] = ctx.get("dpo_contact", "{{DSB_KONTAKT}}")

    # B. Scope (Geltungsbereich)
    locations = ctx.get("locations", [])
    result["GELTUNGSBEREICH_STANDORTE"] = _bullet_list(locations) if locations else "Alle Standorte der {{UNTERNEHMEN_NAME}}"
    departments = ctx.get("departments", [])
    result["GELTUNGSBEREICH_BEREICHE"] = _bullet_list(departments) if departments else "Alle Beschaeftigten"
    modules = ctx.get("modules", [])
    result["GELTUNGSBEREICH_MODULE"] = _bullet_list(modules) if modules else "Alle Module und Dienste von {{SYSTEM_NAME}}"

    # C. Purpose
    purposes = ctx.get("purposes", [])
    result["ZWECK_BESCHREIBUNG"] = _bullet_list(purposes) if purposes else "{{ZWECK_BESCHREIBUNG}}"

    # C.2 Prohibited uses
    forbidden = list(DEFAULT_VERBOTENE_NUTZUNGEN)
    if ctx.get("is_ai_system"):
        forbidden.extend(AI_VERBOTENE_NUTZUNGEN)
    forbidden.extend(ctx.get("additional_forbidden", []))
    result["VERBOTENE_NUTZUNGEN"] = _bullet_list(forbidden)

    # D. Data categories
    data_type_keys = ctx.get("data_types", [])
    datenarten = []
    for key in data_type_keys:
        if key in DATENARTEN_MAP:
            datenarten.append(DATENARTEN_MAP[key])
        else:
            datenarten.append(key)
    result["DATENARTEN_LISTE"] = _bullet_list(datenarten) if datenarten else "{{DATENARTEN_LISTE}}"

    # E. Roles
    result["ROLLEN_ADMIN"] = ctx.get("roles_admin", "IT-Administration: Systemkonfiguration, Benutzerverwaltung, Sicherheitsupdates")
    result["ROLLEN_FUEHRUNGSKRAFT"] = ctx.get("roles_manager", "Fuehrungskraefte: Nur aggregierte, nicht-personenbezogene Reports")
    result["ROLLEN_REPORTING"] = ctx.get("roles_reporting", "Controlling/Reporting: Nur freigegebene Standardreports (siehe Abschnitt G)")

    # F. Transparency
    result["TRANSPARENZ_INFO"] = ctx.get("transparency_info",
        "Die Information erfolgt schriftlich und in einer Informationsveranstaltung vor Einfuehrung des Systems.")

    # G. Reports
    reports = list(DEFAULT_ERLAUBTE_REPORTS)
    reports.extend(ctx.get("additional_reports", []))
    result["ERLAUBTE_REPORTS"] = _bullet_list(reports)

    # H. Retention periods
    result["SPEICHERFRIST_AUDIT_LOGS"] = ctx.get("retention_audit_logs", "90 Tage")
    result["SPEICHERFRIST_NUTZUNGSDATEN"] = ctx.get("retention_usage_data", "30 Tage")
    result["SPEICHERFRIST_CHAT_PROMPTS"] = ctx.get("retention_prompts", "deaktiviert")

    # I. TOM
    tom = list(DEFAULT_TOM)
    tom.extend(ctx.get("additional_tom", []))
    # Stricter protective measures for a high conflict score
    conflict_score = ctx.get("betrvg_conflict_score", 0)
    if conflict_score >= 50:
        tom.append("Automatische Anomalie-Erkennung bei ungewoehnlichen Admin-Zugriffen")
        tom.append("Quartalsweise Datenschutz-Audit durch externen Pruefer")
    if conflict_score >= 75:
        tom.append("Betriebsrat erhaelt Leserechte auf Audit-Log-Dashboard")
        tom.append("Jede Sonderauswertung wird dem Betriebsrat innerhalb von 24h gemeldet")
    result["TOM_MASSNAHMEN"] = _bullet_list(tom)

    # J. Change management
    result["CHANGE_MANAGEMENT_PROZESS"] = ctx.get("change_process",
        "Die Arbeitgeberin informiert den Betriebsrat schriftlich ueber geplante Aenderungen "
        "mindestens 14 Kalendertage vor Umsetzung. Bei sicherheitskritischen Updates kann die "
        "Frist auf 3 Werktage verkuerzt werden.")

    # K. Audit
    result["AUDIT_INTERVALL"] = ctx.get("audit_interval", "12 Monate")

    # L. Complaints
    result["BESCHWERDE_ANSPRECHPARTNER"] = ctx.get("complaint_contacts",
        "- Direkter Vorgesetzter\n- Betriebsrat ({{BETRIEBSRAT_VORSITZ}})\n"
        "- Datenschutzbeauftragter ({{DSB_NAME}}, {{DSB_KONTAKT}})")

    # M. Final provisions
    result["LAUFZEIT"] = ctx.get("duration", "unbefristet")
    result["KUENDIGUNGSFRIST"] = ctx.get("notice_period", "3 Monate")
    result["DATUM_UNTERZEICHNUNG"] = ctx.get("signing_date", "{{DATUM_UNTERZEICHNUNG}}")

    # Conditional flags
    result["AI_SYSTEM"] = ctx.get("is_ai_system", False)
    result["VIDEO_UEBERWACHUNG"] = ctx.get("has_video", False)
    result["HR_SYSTEM"] = ctx.get("has_hr_features", False)

    return result


def _bullet_list(items: list) -> str:
    """Format a list as markdown bullet points."""
    return "\n".join(f"- {item}" for item in items)
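A quick illustration of the generator's fallback pattern: missing required keys stay as literal `{{...}}` placeholders, so an incomplete draft remains template-substitutable. A self-contained sketch (it re-declares `_bullet_list` locally so the snippet runs on its own, rather than importing the module above):

```python
def _bullet_list(items: list) -> str:
    # Same helper as in the generator: one markdown bullet per item
    return "\n".join(f"- {item}" for item in items)

ctx = {"company_name": "ACME GmbH", "locations": ["Berlin", "Koeln"]}

# The generator's pattern: use the ctx value if present, else keep the placeholder
result = {
    "UNTERNEHMEN_NAME": ctx.get("company_name", "{{UNTERNEHMEN_NAME}}"),
    "SYSTEM_NAME": ctx.get("system_name", "{{SYSTEM_NAME}}"),
    "GELTUNGSBEREICH_STANDORTE": _bullet_list(ctx["locations"]),
}

print(result["UNTERNEHMEN_NAME"])  # ACME GmbH
print(result["SYSTEM_NAME"])       # {{SYSTEM_NAME}}
print(result["GELTUNGSBEREICH_STANDORTE"])
```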


"""DSFA template generator V2 — creates DSFA skeleton from company profile.
Enhanced with:
- Schwellwertanalyse (9 WP248 criteria)
- Bundesland-specific Muss-Listen references
- SDM-based TOM structure (7 Gewaehrleistungsziele)
- Structured risk assessment (ISO 29134 methodology)
- AI Act module (Section 8)
- Art. 36 consultation assessment
"""
from typing import Optional
# -- WP248 criteria -----------------------------------------------------------
WP248_CRITERIA = [
    {"id": "K1", "label": "Bewertung oder Scoring (einschl. Profiling und Prognose)",
     "ctx_keys": ["has_profiling", "has_scoring"]},
    {"id": "K2", "label": "Automatisierte Entscheidungsfindung mit Rechtswirkung",
     "ctx_keys": ["has_automated_decisions"]},
    {"id": "K3", "label": "Systematische Ueberwachung von Personen",
     "ctx_keys": ["has_surveillance", "has_employee_monitoring", "has_video_surveillance"]},
    {"id": "K4", "label": "Verarbeitung sensibler Daten (Art. 9/10 DS-GVO)",
     "ctx_keys": ["processes_health_data", "processes_biometric_data", "processes_criminal_data"]},
    {"id": "K5", "label": "Datenverarbeitung in grossem Umfang",
     "ctx_keys": ["large_scale_processing"]},
    {"id": "K6", "label": "Verknuepfung oder Zusammenfuehrung von Datenbestaenden",
     "ctx_keys": ["data_matching", "data_combining"]},
    {"id": "K7", "label": "Daten zu schutzbeduerftigen Betroffenen",
     "ctx_keys": ["processes_minors_data", "processes_employee_data", "processes_patient_data"]},
    {"id": "K8", "label": "Innovative Nutzung neuer technologischer Loesungen",
     "ctx_keys": ["uses_ai", "uses_biometrics", "uses_iot"]},
    {"id": "K9", "label": "Verarbeitung hindert Betroffene an Rechtsausuebung",
     "ctx_keys": ["blocks_service_access", "blocks_contract"]},
]
# -- Bundesland -> Aufsichtsbehoerde Mapping --------------------------------
BUNDESLAND_AUFSICHT = {
"baden-wuerttemberg": ("LfDI Baden-Wuerttemberg", "DSK Muss-Liste + BW-spezifische Liste (Art. 35 Abs. 4)"),
"bayern": ("BayLDA (nicht-oeffentlicher Bereich)", "BayLDA Muss-Liste (17.10.2018) + Fallbeispiel ISO 29134"),
"berlin": ("BlnBDI", "BlnBDI Muss-Liste nicht-oeffentlich / oeffentlich"),
"brandenburg": ("LDA Brandenburg", "LDA BB Muss-Liste allgemein / oeffentlich"),
"bremen": ("LfDI Bremen", "LfDI HB Muss-Liste"),
"hamburg": ("HmbBfDI", "HmbBfDI Muss-Liste nicht-oeffentlich / oeffentlich"),
"hessen": ("HBDI", "DSK Muss-Liste (HBDI uebernimmt DSK-Liste)"),
"mecklenburg-vorpommern": ("LfDI M-V", "LfDI M-V Muss-Liste"),
"niedersachsen": ("LfD Niedersachsen", "LfD NI Muss-Liste + Pruefschema"),
"nordrhein-westfalen": ("LDI NRW", "LDI NRW Muss-Liste nicht-oeffentlich / oeffentlich"),
"rheinland-pfalz": ("LfDI RLP", "LfDI RLP Muss-Liste allgemein / oeffentlich"),
"saarland": ("UDS Saarland", "DSK Muss-Liste (UDS uebernimmt DSK-Liste)"),
"sachsen": ("SDB Sachsen", "SDB Sachsen Muss-Liste"),
"sachsen-anhalt": ("LfD Sachsen-Anhalt", "LfD SA Muss-Liste allgemein / oeffentlich"),
"schleswig-holstein": ("ULD Schleswig-Holstein", "ULD Muss-Liste + Planspiel-DSFA"),
"thueringen": ("TLfDI", "TLfDI Muss-Liste (04.07.2018)"),
"bund": ("BfDI", "BfDI Muss-Liste / DSFA-Hinweise"),
}
# -- SDM Gewaehrleistungsziele -----------------------------------------------
SDM_GOALS = [
{
"id": "verfuegbarkeit",
"label": "Verfuegbarkeit",
"description": "Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.",
"default_measures": [
"Redundante Datenhaltung und regelmaessige Backups",
"Disaster-Recovery-Plan mit definierten RTO/RPO-Werten",
"USV und Notstromversorgung fuer kritische Systeme",
],
},
{
"id": "integritaet",
"label": "Integritaet",
"description": "Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.",
"default_measures": [
"Pruefsummen und digitale Signaturen fuer Datenuebertragungen",
"Eingabevalidierung und Plausibilitaetspruefungen",
"Versionierung und Change-Management-Verfahren",
],
},
{
"id": "vertraulichkeit",
"label": "Vertraulichkeit",
"description": "Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.",
"default_measures": [
"Verschluesselung: TLS 1.3 im Transit, AES-256 at Rest",
"Rollenbasiertes Zugriffskonzept (RBAC) mit Least-Privilege-Prinzip",
"Multi-Faktor-Authentifizierung fuer administrative Zugaenge",
],
},
{
"id": "nichtverkettung",
"label": "Nichtverkettung",
"description": "Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.",
"default_measures": [
"Technische Zweckbindung durch Mandantentrennung",
"Pseudonymisierung wo fachlich moeglich",
"Getrennte Datenbanken / Schemata je Verarbeitungszweck",
],
},
{
"id": "transparenz",
"label": "Transparenz",
"description": "Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.",
"default_measures": [
"Vollstaendiges Audit-Log aller Datenzugriffe und -aenderungen",
"Verzeichnis der Verarbeitungstaetigkeiten (Art. 30 DS-GVO)",
"Informationspflichten gemaess Art. 13/14 DS-GVO umgesetzt",
],
},
{
"id": "intervenierbarkeit",
"label": "Intervenierbarkeit",
"description": "Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.",
"default_measures": [
"Self-Service-Portal oder dokumentierter Prozess fuer Betroffenenanfragen",
"Technische Loeschfaehigkeit mit Nachweis (Loeschprotokoll)",
"Datenexport in maschinenlesbarem Format (Art. 20 DS-GVO)",
],
},
{
"id": "datenminimierung",
"label": "Datenminimierung",
"description": "Die Verarbeitung beschraenkt sich auf das erforderliche Mass.",
"default_measures": [
"Regelmaessige Pruefung der Erforderlichkeit erhobener Datenfelder",
"Automatisierte Loeschung nach Ablauf der Aufbewahrungsfrist",
"Anonymisierung / Aggregation fuer statistische Zwecke",
],
},
]
def generate_dsfa_draft(ctx: dict) -> dict:
"""Generate a DSFA draft document from template context.
Args:
ctx: Flat dict from company-profile/template-context endpoint.
Returns:
Dict with DSFA fields ready for creation via POST /dsfa.
"""
company = ctx.get("company_name", "Unbekannt")
dpo = ctx.get("dpo_name", "")
dpo_email = ctx.get("dpo_email", "")
federal_state = ctx.get("federal_state", "").lower().replace(" ", "-")
# --- Section 0: Schwellwertanalyse ---
schwellwert = _generate_schwellwertanalyse(ctx)
# --- Section 1: Verarbeitungsbeschreibung ---
section_1 = _generate_section_1(ctx, company, dpo, dpo_email)
# --- Section 2: Notwendigkeit ---
section_2 = _generate_section_2(ctx)
# --- Section 3: Risikobewertung ---
section_3 = _generate_risk_assessment(ctx)
# --- Section 4: Stakeholder-Konsultation ---
section_4 = _generate_section_4(ctx)
# --- Section 5: TOM nach SDM ---
section_5 = _generate_sdm_tom_section(ctx)
# --- Section 6: DSB-Stellungnahme ---
section_6 = _generate_section_6(ctx, dpo)
# --- Section 7: Ergebnis ---
section_7 = _generate_section_7(ctx)
# --- Section 8: KI-Modul ---
ai_systems = ctx.get("ai_systems", [])
involves_ai = len(ai_systems) > 0
section_8 = _generate_ai_module(ctx) if involves_ai else None
sections = {
"section_0": {"title": "Schwellwertanalyse", "content": schwellwert["content"]},
"section_1": {"title": "Allgemeine Informationen und Verarbeitungsbeschreibung", "content": section_1},
"section_2": {"title": "Notwendigkeit und Verhaeltnismaessigkeit", "content": section_2},
"section_3": {"title": "Risikobewertung", "content": section_3},
"section_4": {"title": "Konsultation der Betroffenen", "content": section_4},
"section_5": {"title": "Technische und organisatorische Massnahmen (SDM)", "content": section_5},
"section_6": {"title": "Stellungnahme des DSB", "content": section_6},
"section_7": {"title": "Ergebnis und Ueberpruefungsplan", "content": section_7},
}
if section_8:
sections["section_8"] = {"title": "KI-spezifisches Modul (EU AI Act)", "content": section_8}
# Assess Art. 36 consultation requirement
art36_required = _assess_art36_consultation(ctx, schwellwert)
return {
"title": f"DSFA — {company}",
"description": f"Datenschutz-Folgenabschaetzung fuer {company}",
"status": "draft",
"risk_level": "high" if involves_ai or schwellwert["criteria_met"] >= 3 else "medium",
"involves_ai": involves_ai,
"dpo_name": dpo,
"federal_state": ctx.get("federal_state", ""),
"sections": sections,
"wp248_criteria_met": schwellwert["criteria_details"],
"art35_abs3_triggered": schwellwert["art35_abs3"],
"threshold_analysis": {
"criteria_met_count": schwellwert["criteria_met"],
"dsfa_required": schwellwert["dsfa_required"],
"muss_liste_ref": schwellwert.get("muss_liste_ref", ""),
},
"consultation_requirement": {
"art36_required": art36_required,
"reason": "Restrisiko bleibt nach Massnahmen hoch" if art36_required else "Restrisiko akzeptabel",
},
"processing_systems": [s.get("name", "") for s in ctx.get("processing_systems", [])],
"ai_systems_summary": [
{"name": s.get("name"), "risk": s.get("risk_category", "unknown")}
for s in ai_systems
],
}
# -- Internal helpers --------------------------------------------------------
def _generate_schwellwertanalyse(ctx: dict) -> dict:
"""Evaluate 9 WP248 criteria against company profile."""
criteria_details = []
criteria_met = 0
for criterion in WP248_CRITERIA:
met = any(ctx.get(key) for key in criterion["ctx_keys"])
criteria_details.append({
"id": criterion["id"],
"label": criterion["label"],
"met": met,
})
if met:
criteria_met += 1
# Art. 35 Abs. 3 specific triggers
art35_abs3 = []
if ctx.get("has_profiling") and ctx.get("has_automated_decisions"):
art35_abs3.append("Art. 35 Abs. 3 lit. a: Profiling mit Rechtswirkung")
if any(ctx.get(k) for k in ["processes_health_data", "processes_biometric_data", "processes_criminal_data"]):
if ctx.get("large_scale_processing"):
art35_abs3.append("Art. 35 Abs. 3 lit. b: Umfangreiche Verarbeitung besonderer Kategorien")
if ctx.get("has_surveillance"):
art35_abs3.append("Art. 35 Abs. 3 lit. c: Systematische Ueberwachung oeffentlicher Bereiche")
dsfa_required = criteria_met >= 2 or len(art35_abs3) > 0
# Bundesland reference
federal_state = ctx.get("federal_state", "").lower().replace(" ", "-")
aufsicht_info = BUNDESLAND_AUFSICHT.get(federal_state, ("Nicht zugeordnet", "DSK Muss-Liste (allgemein)"))
met_labels = [c["label"] for c in criteria_details if c["met"]]
content_lines = [
f"**Anzahl erfuellter WP248-Kriterien:** {criteria_met} von 9\n",
f"**Erfuellte Kriterien:** {', '.join(met_labels) if met_labels else 'Keine'}\n",
]
if art35_abs3:
content_lines.append(f"**Art. 35 Abs. 3 DS-GVO direkt ausgeloest:** {'; '.join(art35_abs3)}\n")
content_lines.append(
f"\n**Ergebnis:** DSFA ist {'**erforderlich**' if dsfa_required else '**nicht erforderlich**'}."
)
if dsfa_required and criteria_met < 2:
content_lines.append(" (Ausgeloest durch Art. 35 Abs. 3 DS-GVO)")
return {
"content": "\n".join(content_lines),
"criteria_met": criteria_met,
"criteria_details": criteria_details,
"art35_abs3": art35_abs3,
"dsfa_required": dsfa_required,
"muss_liste_ref": aufsicht_info[1],
}
def _generate_section_1(ctx: dict, company: str, dpo: str, dpo_email: str) -> str:
federal_state = ctx.get("federal_state", "")
aufsicht = BUNDESLAND_AUFSICHT.get(
federal_state.lower().replace(" ", "-"), ("Nicht zugeordnet",)
)[0]
lines = [
f"**Verantwortlicher:** {company}",
f"**Datenschutzbeauftragter:** {dpo}" + (f" ({dpo_email})" if dpo_email else ""),
f"**Zustaendige Aufsichtsbehoerde:** {aufsicht}",
]
systems = ctx.get("processing_systems", [])
if systems:
lines.append("\n**Eingesetzte Verarbeitungssysteme:**")
for s in systems:
hosting = s.get("hosting", "")
lines.append(f"- {s.get('name', 'N/A')}" + (f" ({hosting})" if hosting else ""))
return "\n".join(lines)
def _generate_section_2(ctx: dict) -> str:
lines = [
"### Notwendigkeit\n",
"Die Verarbeitung ist zur Erreichung des beschriebenen Zwecks erforderlich. ",
"Alternative, weniger eingriffsintensive Massnahmen wurden geprueft.\n",
"### Datenminimierung\n",
"Die verarbeiteten Datenkategorien beschraenken sich auf das fuer den ",
"Verarbeitungszweck erforderliche Minimum (Art. 5 Abs. 1 lit. c DS-GVO).\n",
]
return "".join(lines)
def _generate_risk_assessment(ctx: dict) -> str:
lines = ["## Risikoanalyse\n"]
# Standard risks
risks = [
("Unbefugter Zugriff auf personenbezogene Daten", "mittel", "hoch", "hoch"),
("Datenverlust durch technischen Ausfall", "niedrig", "hoch", "mittel"),
("Fehlerhafte Verarbeitung / Datenqualitaet", "niedrig", "mittel", "niedrig"),
("Zweckentfremdung erhobener Daten", "niedrig", "hoch", "mittel"),
]
if ctx.get("has_ai_systems") or ctx.get("uses_ai"):
risks.append(("Diskriminierung durch algorithmische Entscheidungen", "mittel", "hoch", "hoch"))
risks.append(("Mangelnde Erklaerbarkeit von KI-Entscheidungen", "mittel", "mittel", "mittel"))
if ctx.get("processes_health_data"):
risks.append(("Offenlegung von Gesundheitsdaten", "niedrig", "gross", "hoch"))
if any(ctx.get(k) for k in ["third_country_transfer", "processes_in_third_country"]):
risks.append(("Zugriff durch Behoerden in Drittlaendern", "mittel", "hoch", "hoch"))
# FISA 702 Risiko bei US-Cloud-Providern
hosting = (ctx.get("hosting_provider") or "").lower()
us_providers = ("aws", "azure", "google", "microsoft", "amazon", "openai", "anthropic", "oracle")
if any(p in hosting for p in us_providers):
risks.append(("FISA 702: Zugriff durch US-Behoerden auf EU-Daten nicht ausschliessbar", "mittel", "hoch", "hoch"))
risks.append(("EU-Serverstandort schuetzt nicht gegen US-Rechtszugriff (Cloud Act + FISA)", "mittel", "hoch", "hoch"))
risks.append(("Fehlende effektive Rechtsbehelfe fuer EU-Betroffene gegen US-Ueberwachung", "mittel", "hoch", "hoch"))
# Domain-spezifische Risiken (AI Act Annex III)
domain = ctx.get("domain", "")
if domain in ("hr", "recruiting") or ctx.get("has_hr_context"):
risks.append(("AGG-Verstoss: Diskriminierung bei Bewerberauswahl (§ 1 AGG)", "mittel", "hoch", "hoch"))
risks.append(("Beweislastumkehr bei Diskriminierungsklagen (§ 22 AGG)", "mittel", "hoch", "hoch"))
risks.append(("Art. 22 DSGVO: Unzulaessige automatisierte Einzelentscheidung", "mittel", "hoch", "hoch"))
risks.append(("Proxy-Diskriminierung durch Name/Foto/Alter-Erkennung", "mittel", "hoch", "hoch"))
if domain in ("education", "higher_education", "vocational_training"):
risks.append(("Chancenungleichheit durch KI-gestuetzte Bewertung", "mittel", "hoch", "hoch"))
risks.append(("Benachteiligung Minderjaehriger ohne Lehrkraft-Kontrolle", "niedrig", "gross", "hoch"))
risks.append(("Fehlbewertung mit Auswirkung auf Bildungschancen", "mittel", "hoch", "hoch"))
if domain in ("healthcare", "medical_devices", "pharma", "elderly_care"):
risks.append(("Fehldiagnose durch KI mit gesundheitlichen Folgen", "niedrig", "gross", "hoch"))
risks.append(("Falsche Triage-Priorisierung (lebenskritisch)", "niedrig", "gross", "hoch"))
risks.append(("Verletzung der Patientenautonomie", "mittel", "hoch", "hoch"))
if domain in ("finance", "banking", "insurance", "investment"):
risks.append(("Diskriminierendes Kredit-Scoring", "mittel", "hoch", "hoch"))
risks.append(("Ungerechtfertigte Verweigerung von Finanzdienstleistungen", "mittel", "hoch", "hoch"))
lines.append("| Risiko | Eintrittswahrscheinlichkeit | Schwere | Gesamt |")
lines.append("|--------|----------------------------|---------|--------|")
for risk_name, likelihood, severity, overall in risks:
lines.append(f"| {risk_name} | {likelihood} | {severity} | **{overall}** |")
lines.append("")
high_risks = sum(1 for _, _, _, o in risks if o == "hoch")
if high_risks > 0:
lines.append(f"\n**{high_risks} Risiken mit Stufe 'hoch' identifiziert.** "
"Massnahmen gemaess Abschnitt 5 reduzieren das Restrisiko.")
return "\n".join(lines)
def _generate_section_4(ctx: dict) -> str:
lines = []
if ctx.get("has_works_council"):
lines.append("Der Betriebsrat wurde informiert und angehoert.")
lines.append(
"Eine Konsultation der Betroffenen gemaess Art. 35 Abs. 9 DS-GVO "
"wird empfohlen, soweit verhaeltnismaessig und praktikabel."
)
return "\n".join(lines)
def _generate_sdm_tom_section(ctx: dict) -> str:
"""Generate TOM section structured by 7 SDM Gewaehrleistungsziele."""
lines = []
for goal in SDM_GOALS:
lines.append(f"**{goal['label']}** — {goal['description']}\n")
lines.append("| Massnahme | Typ | Status |")
lines.append("|-----------|-----|--------|")
for measure in goal["default_measures"]:
mtype = "technisch" if any(
kw in measure.lower()
for kw in ["verschluesselung", "backup", "redundanz", "tls", "aes", "rbac", "mfa",
"pruefsumm", "validierung", "loeschfaehigkeit", "export", "automatisiert"]
) else "organisatorisch"
lines.append(f"| {measure} | {mtype} | geplant |")
lines.append("")
return "\n".join(lines)
def _generate_section_6(ctx: dict, dpo: str) -> str:
if dpo:
return (
f"Der Datenschutzbeauftragte ({dpo}) wurde konsultiert. "
"Die Stellungnahme liegt bei bzw. wird nachgereicht."
)
return (
"Ein Datenschutzbeauftragter wurde noch nicht benannt. "
"Sofern eine Benennungspflicht besteht (Art. 37 DS-GVO), "
"ist dies vor Abschluss der DSFA nachzuholen."
)
def _generate_section_7(ctx: dict) -> str:
review_months = ctx.get("review_cycle_months", 12)
lines = [
"### Ergebnis\n",
"Die DSFA wurde gemaess Art. 35 DS-GVO durchgefuehrt. Die identifizierten Risiken ",
"wurden bewertet und durch geeignete Massnahmen auf ein akzeptables Niveau reduziert.\n",
"### Ueberpruefungsplan\n",
f"- **Regelmaessige Ueberpruefung:** alle {review_months} Monate\n",
"- **Trigger fuer ausserplanmaessige Ueberpruefung:**\n",
" - Wesentliche Aenderung der Verarbeitungstaetigkeit\n",
" - Neue oder geaenderte Rechtsgrundlage\n",
" - Sicherheitsvorfall mit Bezug zur Verarbeitung\n",
" - Aenderung der eingesetzten Technologie oder Auftragsverarbeiter\n",
" - Neue Erkenntnisse zu Risiken oder Bedrohungen\n",
]
return "".join(lines)
def _generate_ai_module(ctx: dict) -> str:
"""Generate Section 8 for AI systems (EU AI Act)."""
lines = ["### Eingesetzte KI-Systeme\n"]
ai_systems = ctx.get("ai_systems", [])
if ai_systems:
lines.append("| System | Zweck | Risikokategorie | Human Oversight |")
lines.append("|--------|-------|-----------------|-----------------|")
for s in ai_systems:
risk = s.get("risk_category", "unbekannt")
oversight = "Ja" if s.get("has_human_oversight") else "Nein"
lines.append(f"| {s.get('name', 'N/A')} | {s.get('purpose', 'N/A')} | {risk} | {oversight} |")
lines.append("")
if ctx.get("subject_to_ai_act"):
lines.append(
"**Hinweis:** Das Unternehmen unterliegt dem EU AI Act (Verordnung (EU) 2024/1689). "
"Fuer Hochrisiko-KI-Systeme ist eine grundrechtliche Folgenabschaetzung "
"gemaess Art. 27 KI-VO durchzufuehren.\n"
)
high_risk = [s for s in ai_systems if s.get("risk_category") in ("high", "hoch")]
if high_risk:
lines.append("### Hochrisiko-KI-Systeme — Zusatzanforderungen\n")
lines.append("Fuer die folgenden Systeme gelten die Anforderungen aus Kapitel III KI-VO:\n")
for s in high_risk:
lines.append(f"- **{s.get('name', 'N/A')}**: Risikomanagement (Art. 9), "
f"Daten-Governance (Art. 10), Transparenz (Art. 13), "
f"Human Oversight (Art. 14)\n")
return "\n".join(lines)
def _assess_art36_consultation(ctx: dict, schwellwert: dict) -> bool:
"""Determine if Art. 36 DSGVO consultation with supervisory authority is required.
Art. 36 requires prior consultation when the DSFA indicates that the processing
would result in a HIGH residual risk despite mitigation measures.
"""
if schwellwert["criteria_met"] >= 4:
return True
if len(schwellwert.get("art35_abs3", [])) >= 2:
return True
ai_systems = ctx.get("ai_systems", [])
high_risk_ai = [s for s in ai_systems if s.get("risk_category") in ("high", "hoch", "unacceptable")]
if len(high_risk_ai) >= 2:
return True
return False
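A hypothetical mini-version (names simplified, not part of the repository) of the threshold rule in `_generate_schwellwertanalyse`: a DSFA becomes mandatory once at least two WP248 criteria are met, or any Art. 35 Abs. 3 trigger applies on its own.

```python
# two WP248 criteria with their context keys (subset of WP248_CRITERIA above)
criteria = {
    "K3": ["has_surveillance", "has_employee_monitoring"],
    "K7": ["processes_employee_data"],
}
ctx = {"has_employee_monitoring": True, "processes_employee_data": True}

# count criteria where any mapped context key is truthy
met = sum(1 for keys in criteria.values() if any(ctx.get(k) for k in keys))
# Art. 35 Abs. 3 triggers force a DSFA even below the criteria threshold
art35_abs3 = ["Art. 35 Abs. 3 lit. c"] if ctx.get("has_surveillance") else []
dsfa_required = met >= 2 or len(art35_abs3) > 0

assert (met, dsfa_required) == (2, True)
```

Note that `generate_dsfa_draft` also feeds this count into the risk level (`"high"` at three or more criteria) and into the Art. 36 consultation assessment (required at four or more).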

View File

@@ -0,0 +1,227 @@
"""FRIA template generator — creates Fundamental Rights Impact Assessment from UCCA context.
Generates a FRIA (Art. 27 AI Act) based on:
- UCCA Assessment result (risk level, triggered rules, domain)
- AI Act Decision Tree classification
- Company profile
Automatically maps domains to affected fundamental rights.
"""
from typing import Optional
# -- Domain → Fundamental Rights Mapping ------------------------------------
DOMAIN_RIGHTS_MAP = {
"education": [
{"right": "Recht auf Bildung", "charter": "Art. 14", "gg": "Art. 12",
"risk": "Chancengleichheit bei KI-gestuetzter Bewertung oder Auswahl"},
{"right": "Nicht-Diskriminierung", "charter": "Art. 21", "gg": "Art. 3",
"risk": "Bias bei Leistungsbewertung nach Herkunft, Sprache oder Geschlecht"},
{"right": "Rechte des Kindes", "charter": "Art. 24", "gg": "Art. 6 Abs. 2",
"risk": "Besonderer Schutz Minderjaehriger vor automatisierten Bewertungen"},
],
"hr": [
{"right": "Berufsfreiheit / Recht zu arbeiten", "charter": "Art. 15", "gg": "Art. 12",
"risk": "KI-gestuetzte Auswahl kann Zugang zum Arbeitsmarkt einschraenken"},
{"right": "Nicht-Diskriminierung", "charter": "Art. 21", "gg": "Art. 3",
"risk": "Bias bei Recruiting, Befoerderung oder Kuendigung"},
{"right": "Schutz personenbezogener Daten", "charter": "Art. 8", "gg": "Art. 2 Abs. 1",
"risk": "Umfangreiche Verarbeitung von Beschaeftigtendaten"},
],
"healthcare": [
{"right": "Menschenwuerde", "charter": "Art. 1", "gg": "Art. 1",
"risk": "KI-Diagnosen koennen existenzielle Auswirkungen haben"},
{"right": "Schutz personenbezogener Daten", "charter": "Art. 8", "gg": "Art. 2 Abs. 1",
"risk": "Gesundheitsdaten sind besondere Kategorien (Art. 9 DSGVO)"},
{"right": "Nicht-Diskriminierung", "charter": "Art. 21", "gg": "Art. 3",
"risk": "Bias bei Behandlungsempfehlungen nach Alter, Geschlecht oder Ethnie"},
],
"finance": [
{"right": "Recht auf soziale Sicherheit", "charter": "Art. 34", "gg": "Art. 20",
"risk": "Zugang zu Finanzdienstleistungen und Versicherungen"},
{"right": "Nicht-Diskriminierung", "charter": "Art. 21", "gg": "Art. 3",
"risk": "Scoring-Bias bei Kreditvergabe oder Versicherungspraemien"},
{"right": "Recht auf wirksamen Rechtsbehelf", "charter": "Art. 47", "gg": "Art. 19 Abs. 4",
"risk": "Anfechtbarkeit automatisierter Finanzentscheidungen"},
],
"law_enforcement": [
{"right": "Recht auf Freiheit und Sicherheit", "charter": "Art. 6", "gg": "Art. 2 Abs. 2",
"risk": "KI-gestuetzte Ueberwachung oder Vorhersage"},
{"right": "Unschuldsvermutung", "charter": "Art. 48", "gg": "Art. 20 Abs. 3",
"risk": "Predictive Policing kann Vorverurteilung erzeugen"},
{"right": "Recht auf Privatsphaere", "charter": "Art. 7", "gg": "Art. 2 Abs. 1",
"risk": "Biometrische Identifizierung im oeffentlichen Raum"},
],
"public_sector": [
{"right": "Recht auf eine gute Verwaltung", "charter": "Art. 41", "gg": "Art. 20 Abs. 3",
"risk": "Automatisierte Verwaltungsentscheidungen muessen nachvollziehbar sein"},
{"right": "Nicht-Diskriminierung", "charter": "Art. 21", "gg": "Art. 3",
"risk": "Gleichbehandlung aller Buerger bei KI-gestuetzten Verwaltungsakten"},
{"right": "Recht auf wirksamen Rechtsbehelf", "charter": "Art. 47", "gg": "Art. 19 Abs. 4",
"risk": "Widerspruchsmoeglichkeit gegen KI-gestuetzte Bescheide"},
],
}
# Universal rights (always relevant for High-Risk AI)
UNIVERSAL_RIGHTS = [
{"right": "Schutz personenbezogener Daten", "charter": "Art. 8", "gg": "Art. 2 Abs. 1 i.V.m. Art. 1 Abs. 1",
"risk": "Datenverarbeitung durch KI-System"},
{"right": "Menschenwuerde", "charter": "Art. 1", "gg": "Art. 1",
"risk": "KI darf Menschen nicht auf Datenpunkte reduzieren"},
]
# -- Default measures -------------------------------------------------------
DEFAULT_MEASURES = [
"Human-in-the-Loop: Menschliche Ueberpruefung aller KI-Empfehlungen vor Umsetzung",
"Transparenz: Betroffene werden ueber den Einsatz von KI informiert",
"Erklaerbarkeit: KI-Ergebnisse koennen nachvollzogen und begruendet werden",
"Beschwerdemechanismus: Betroffene koennen KI-Entscheidungen anfechten",
"Logging: Alle Eingaben und Ausgaben werden fuer Audit-Zwecke protokolliert",
"Regelmaessige Bias-Audits: Systematische Pruefung auf Diskriminierung",
]
HR_MEASURES = [
"AGG-konforme Gestaltung: Kein Bias bei Geschlecht, Alter, Herkunft, Behinderung",
"Betriebsrat gemaess § 87 Abs. 1 Nr. 6 und § 95 BetrVG beteiligt",
"Keine automatisierte Endentscheidung bei Personalangelegenheiten",
]
EDUCATION_MEASURES = [
"Lehrkraft ueberprueft und verantwortet alle KI-generierten Bewertungen",
"Chancengleichheit unabhaengig von sozioekonomischem Hintergrund",
"Schueler/Eltern koennen KI-gestuetzte Bewertungen anfechten",
]
def generate_fria_draft(ctx: dict) -> dict:
"""Generate a FRIA draft from UCCA assessment context.
Args:
ctx: Dict with keys:
Required:
- organisation_name: str
- system_name: str
- system_description: str
- einsatzzweck: str
Optional:
- organisation_address: str
- system_version: str
- system_provider: str
- domain: str (education, hr, healthcare, finance, etc.)
- affected_groups: list[str]
- affected_count: str
- ai_act_classification: str (high_risk, limited_risk, etc.)
- annex_iii_category: str
- is_public_entity: bool
- has_hr_context: bool
- has_education_context: bool
- risk_score: int
- dpo_name: str
- dpo_contact: str
- review_interval: str
Returns:
Dict with placeholder values for template substitution.
"""
result = {}
# Section 1: Basic info
result["ORGANISATION_NAME"] = ctx.get("organisation_name", "{{ORGANISATION_NAME}}")
result["ORGANISATION_ADRESSE"] = ctx.get("organisation_address", "{{ORGANISATION_ADRESSE}}")
result["VERANTWORTLICHER"] = ctx.get("responsible_person", "{{VERANTWORTLICHER}}")
result["ERSTELLT_VON"] = ctx.get("created_by", "{{ERSTELLT_VON}}")
result["ERSTELLT_AM"] = ctx.get("created_at", "{{ERSTELLT_AM}}")
result["SYSTEM_NAME"] = ctx.get("system_name", "{{SYSTEM_NAME}}")
result["SYSTEM_VERSION"] = ctx.get("system_version", "1.0")
result["SYSTEM_BESCHREIBUNG"] = ctx.get("system_description", "{{SYSTEM_BESCHREIBUNG}}")
result["SYSTEM_ANBIETER"] = ctx.get("system_provider", "{{SYSTEM_ANBIETER}}")
result["EINSATZZWECK"] = ctx.get("einsatzzweck", "{{EINSATZZWECK}}")
result["EINSATZKONTEXT"] = ctx.get("einsatzkontext", "{{EINSATZKONTEXT}}")
result["AI_ACT_KLASSIFIKATION"] = ctx.get("ai_act_classification", "High-Risk")
result["ANNEX_III_KATEGORIE"] = ctx.get("annex_iii_category", "")
result["DSB_NAME"] = ctx.get("dpo_name", "{{DSB_NAME}}")
result["DSB_KONTAKT"] = ctx.get("dpo_contact", "{{DSB_KONTAKT}}")
# Section 1.5: Affected groups
groups = ctx.get("affected_groups", [])
result["BETROFFENE_GRUPPEN"] = _bullet_list(groups) if groups else "{{BETROFFENE_GRUPPEN}}"
result["BETROFFENE_ANZAHL"] = ctx.get("affected_count", "{{BETROFFENE_ANZAHL}}")
# Section 2: Fundamental rights mapping
domain = ctx.get("domain", "")
rights = list(UNIVERSAL_RIGHTS)
if domain in DOMAIN_RIGHTS_MAP:
rights.extend(DOMAIN_RIGHTS_MAP[domain])
rights_table = []
for i, r in enumerate(rights, 1):
rights_table.append(
f"| {i} | {r['right']} | {r['charter']} | {r['gg']} | Ja | {r['risk']} |"
)
result["GRUNDRECHTE_ANALYSE"] = "\n".join(rights_table) if rights_table else "{{GRUNDRECHTE_ANALYSE}}"
# Section 3: Risk matrix
risk_rows = []
risk_score = ctx.get("risk_score", 0)
base_likelihood = min(3, 1 + risk_score // 30)
for r in rights:
severity = 3 if "Diskriminierung" in r["risk"] or "existenz" in r["risk"].lower() else 2
likelihood = base_likelihood
level = _risk_level(likelihood * severity)
risk_rows.append(
f"| {r['right']} | {r['risk']} | {likelihood} | {severity} | {level} | Basierend auf Systemanalyse |"
)
result["RISIKOMATRIX"] = "\n".join(risk_rows) if risk_rows else "{{RISIKOMATRIX}}"
# Section 4: Measures
measures = list(DEFAULT_MEASURES)
if ctx.get("has_hr_context") or domain == "hr":
measures.extend(HR_MEASURES)
if ctx.get("has_education_context") or domain == "education":
measures.extend(EDUCATION_MEASURES)
result["MASSNAHMEN_LISTE"] = _bullet_list(measures)
result["HUMAN_OVERSIGHT_BESCHREIBUNG"] = ctx.get("human_oversight",
"Das System unterstuetzt menschliche Entscheidungen, trifft jedoch keine eigenstaendigen Entscheidungen. "
"Alle KI-generierten Empfehlungen werden von qualifiziertem Personal geprueft.")
result["TRANSPARENZ_MASSNAHMEN"] = ctx.get("transparency_measures",
"Betroffene Personen werden ueber den Einsatz des KI-Systems informiert. "
"KI-generierte Ergebnisse werden als solche gekennzeichnet.")
# Section 5: Consultation
result["KONSULTATION_ERGEBNISSE"] = ctx.get("consultation_results",
"Konsultation steht aus — bitte vor Freigabe durchfuehren.")
# Section 6: Approval
result["GENEHMIGT_VON"] = ctx.get("approved_by", "{{GENEHMIGT_VON}}")
result["GENEHMIGT_AM"] = ctx.get("approved_at", "{{GENEHMIGT_AM}}")
# Section 7: Monitoring
result["NAECHSTE_UEBERPRUEFUNG"] = ctx.get("review_interval", "12 Monate nach Inbetriebnahme")
# Conditional flags
result["BILDUNGSKONTEXT"] = ctx.get("has_education_context", False) or domain == "education"
result["HR_KONTEXT"] = ctx.get("has_hr_context", False) or domain == "hr"
result["OEFFENTLICHE_STELLE"] = ctx.get("is_public_entity", False)
return result
def _risk_level(score: int) -> str:
"""Map risk score to level label."""
if score <= 6:
return "Niedrig"
elif score <= 12:
return "Mittel"
elif score <= 19:
return "Hoch"
else:
return "Kritisch"
def _bullet_list(items: list) -> str:
"""Format a list as markdown bullet points."""
return "\n".join(f"- {item}" for item in items)
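A short sketch (hypothetical, restating logic from this file) of the likelihood-times-severity scoring in `generate_fria_draft`, using the same `_risk_level` thresholds: at most 6 is Niedrig, at most 12 Mittel, at most 19 Hoch, above that Kritisch.

```python
def risk_level(score: int) -> str:
    # same thresholds as _risk_level above
    if score <= 6:
        return "Niedrig"
    elif score <= 12:
        return "Mittel"
    elif score <= 19:
        return "Hoch"
    return "Kritisch"

# likelihood is derived from the UCCA risk score, capped at 3
base_likelihood = min(3, 1 + 75 // 30)  # risk_score=75 -> likelihood 3
assert base_likelihood == 3
assert risk_level(3 * 2) == "Niedrig"   # severity 2 (default)
assert risk_level(3 * 3) == "Mittel"    # severity 3 (discrimination / existential risks)
```

With likelihood capped at 3 and severity at 3, the product never exceeds 9, so the Hoch and Kritisch bands are only reachable if the scale is later widened (e.g. to a 5x5 matrix).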

View File

@@ -0,0 +1,158 @@
"""Tests for Betriebsvereinbarung template generator."""
import pytest
from betriebsvereinbarung_template import (
generate_betriebsvereinbarung_draft,
DEFAULT_VERBOTENE_NUTZUNGEN,
AI_VERBOTENE_NUTZUNGEN,
DEFAULT_TOM,
DATENARTEN_MAP,
)
class TestGenerateBetriebsvereinbarung:
"""Tests for generate_betriebsvereinbarung_draft()."""
def test_minimal_context(self):
"""Minimal context should produce valid output with placeholders."""
ctx = {
"company_name": "Test GmbH",
"system_name": "Microsoft 365",
"system_description": "Office-Suite mit KI-Funktionen",
}
result = generate_betriebsvereinbarung_draft(ctx)
assert result["UNTERNEHMEN_NAME"] == "Test GmbH"
assert result["SYSTEM_NAME"] == "Microsoft 365"
assert "{{BETRIEBSRAT_VORSITZ}}" in result["BETRIEBSRAT_VORSITZ"]
def test_full_context(self):
"""Full context should fill all placeholders."""
ctx = {
"company_name": "Acme Corp",
"company_address": "Hamburg",
"employer_representative": "Dr. Schmidt",
"works_council_chair": "Fr. Mueller",
"system_name": "Copilot",
"system_description": "KI-Assistent",
"system_vendor": "Microsoft",
"locations": ["Hamburg", "Berlin"],
"departments": ["IT", "HR"],
"modules": ["Teams", "Outlook", "Word"],
"purposes": ["Texterstellung", "Zusammenfassung"],
"data_types": ["email", "chat", "login"],
"is_ai_system": True,
"dpo_name": "Dr. Datenschutz",
"dpo_contact": "dsb@acme.de",
"audit_interval": "6 Monate",
"duration": "2 Jahre",
"notice_period": "6 Monate",
}
        result = generate_betriebsvereinbarung_draft(ctx)
        assert result["ARBEITGEBER_VERTRETER"] == "Dr. Schmidt"
        assert result["BETRIEBSRAT_VORSITZ"] == "Fr. Mueller"
        assert "Hamburg" in result["GELTUNGSBEREICH_STANDORTE"]
        assert "Berlin" in result["GELTUNGSBEREICH_STANDORTE"]
        assert "Teams" in result["GELTUNGSBEREICH_MODULE"]
        assert result["AUDIT_INTERVALL"] == "6 Monate"
        assert result["LAUFZEIT"] == "2 Jahre"
        assert result["AI_SYSTEM"] is True

    def test_verbotene_nutzungen_default(self):
        """Default forbidden uses should always be included."""
        ctx = {"company_name": "Test", "system_name": "Tool", "system_description": "x"}
        result = generate_betriebsvereinbarung_draft(ctx)
        for nutzung in DEFAULT_VERBOTENE_NUTZUNGEN:
            assert nutzung in result["VERBOTENE_NUTZUNGEN"]

    def test_verbotene_nutzungen_ai_system(self):
        """AI-specific forbidden uses should be added for AI systems."""
        ctx = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "is_ai_system": True,
        }
        result = generate_betriebsvereinbarung_draft(ctx)
        for nutzung in AI_VERBOTENE_NUTZUNGEN:
            assert nutzung in result["VERBOTENE_NUTZUNGEN"]

    def test_verbotene_nutzungen_no_ai(self):
        """AI-specific forbidden uses should NOT be added for non-AI systems."""
        ctx = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "is_ai_system": False,
        }
        result = generate_betriebsvereinbarung_draft(ctx)
        for nutzung in AI_VERBOTENE_NUTZUNGEN:
            assert nutzung not in result["VERBOTENE_NUTZUNGEN"]

    def test_datenarten_mapping(self):
        """Data types should be resolved from DATENARTEN_MAP."""
        ctx = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "data_types": ["email", "prompt", "hr"],
        }
        result = generate_betriebsvereinbarung_draft(ctx)
        assert DATENARTEN_MAP["email"] in result["DATENARTEN_LISTE"]
        assert DATENARTEN_MAP["prompt"] in result["DATENARTEN_LISTE"]
        assert DATENARTEN_MAP["hr"] in result["DATENARTEN_LISTE"]

    def test_tom_high_conflict_score(self):
        """High conflict score should add extra TOM measures."""
        ctx_low = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "betrvg_conflict_score": 20,
        }
        ctx_high = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "betrvg_conflict_score": 80,
        }
        result_low = generate_betriebsvereinbarung_draft(ctx_low)
        result_high = generate_betriebsvereinbarung_draft(ctx_high)
        # High score should have more TOM items
        low_count = result_low["TOM_MASSNAHMEN"].count("- ")
        high_count = result_high["TOM_MASSNAHMEN"].count("- ")
        assert high_count > low_count, f"High conflict ({high_count} TOMs) should have more than low ({low_count})"

    def test_speicherfristen_defaults(self):
        """Default retention periods should be set."""
        ctx = {"company_name": "Test", "system_name": "Tool", "system_description": "x"}
        result = generate_betriebsvereinbarung_draft(ctx)
        assert result["SPEICHERFRIST_AUDIT_LOGS"] == "90 Tage"
        assert result["SPEICHERFRIST_NUTZUNGSDATEN"] == "30 Tage"
        assert result["SPEICHERFRIST_CHAT_PROMPTS"] == "deaktiviert"

    def test_custom_retention(self):
        """Custom retention periods should override defaults."""
        ctx = {
            "company_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "retention_audit_logs": "180 Tage",
            "retention_prompts": "7 Tage",
        }
        result = generate_betriebsvereinbarung_draft(ctx)
        assert result["SPEICHERFRIST_AUDIT_LOGS"] == "180 Tage"
        assert result["SPEICHERFRIST_CHAT_PROMPTS"] == "7 Tage"


if __name__ == "__main__":
    pytest.main([__file__, "-v"])


@@ -0,0 +1,198 @@
"""Tests for FRIA (Fundamental Rights Impact Assessment) template generator."""
import pytest

from fria_template import (
    generate_fria_draft,
    DOMAIN_RIGHTS_MAP,
    UNIVERSAL_RIGHTS,
    DEFAULT_MEASURES,
    HR_MEASURES,
    EDUCATION_MEASURES,
)


class TestGenerateFRIA:
    """Tests for generate_fria_draft()."""

    def test_minimal_context(self):
        ctx = {
            "organisation_name": "Test GmbH",
            "system_name": "AI Tool",
            "system_description": "KI-Assistenz",
            "einsatzzweck": "Automatisierung",
        }
        result = generate_fria_draft(ctx)
        assert result["ORGANISATION_NAME"] == "Test GmbH"
        assert result["SYSTEM_NAME"] == "AI Tool"
        assert result["AI_ACT_KLASSIFIKATION"] == "High-Risk"

    def test_hr_domain_rights(self):
        ctx = {
            "organisation_name": "HR Corp",
            "system_name": "Recruiting AI",
            "system_description": "Bewerber-Screening",
            "einsatzzweck": "Personalauswahl",
            "domain": "hr",
        }
        result = generate_fria_draft(ctx)
        # HR domain should include employment rights
        assert "Berufsfreiheit" in result["GRUNDRECHTE_ANALYSE"]
        assert "Nicht-Diskriminierung" in result["GRUNDRECHTE_ANALYSE"]
        assert result["HR_KONTEXT"] is True
        assert result["BILDUNGSKONTEXT"] is False

    def test_education_domain_rights(self):
        ctx = {
            "organisation_name": "Schule",
            "system_name": "Bewertungs-KI",
            "system_description": "Notenunterstuetzung",
            "einsatzzweck": "Leistungsbewertung",
            "domain": "education",
        }
        result = generate_fria_draft(ctx)
        assert "Recht auf Bildung" in result["GRUNDRECHTE_ANALYSE"]
        assert "Rechte des Kindes" in result["GRUNDRECHTE_ANALYSE"]
        assert result["BILDUNGSKONTEXT"] is True

    def test_healthcare_domain_rights(self):
        ctx = {
            "organisation_name": "Klinik",
            "system_name": "Diagnose-KI",
            "system_description": "Diagnoseunterstuetzung",
            "einsatzzweck": "Diagnostik",
            "domain": "healthcare",
        }
        result = generate_fria_draft(ctx)
        assert "Menschenwuerde" in result["GRUNDRECHTE_ANALYSE"]
        assert "Schutz personenbezogener Daten" in result["GRUNDRECHTE_ANALYSE"]

    def test_universal_rights_always_present(self):
        for domain in ["hr", "education", "healthcare", "finance", ""]:
            ctx = {
                "organisation_name": "Test",
                "system_name": "Tool",
                "system_description": "x",
                "einsatzzweck": "y",
                "domain": domain,
            }
            result = generate_fria_draft(ctx)
            assert "Schutz personenbezogener Daten" in result["GRUNDRECHTE_ANALYSE"]

    def test_hr_measures_included(self):
        ctx = {
            "organisation_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "einsatzzweck": "y",
            "domain": "hr",
        }
        result = generate_fria_draft(ctx)
        for measure in HR_MEASURES:
            assert measure in result["MASSNAHMEN_LISTE"]

    def test_education_measures_included(self):
        ctx = {
            "organisation_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "einsatzzweck": "y",
            "domain": "education",
        }
        result = generate_fria_draft(ctx)
        for measure in EDUCATION_MEASURES:
            assert measure in result["MASSNAHMEN_LISTE"]

    def test_public_entity_flag(self):
        ctx = {
            "organisation_name": "Behoerde",
            "system_name": "Tool",
            "system_description": "x",
            "einsatzzweck": "y",
            "is_public_entity": True,
        }
        result = generate_fria_draft(ctx)
        assert result["OEFFENTLICHE_STELLE"] is True

    def test_risk_matrix_generated(self):
        ctx = {
            "organisation_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "einsatzzweck": "y",
            "domain": "hr",
            "risk_score": 60,
        }
        result = generate_fria_draft(ctx)
        assert result["RISIKOMATRIX"] != "{{RISIKOMATRIX}}"
        assert "Nicht-Diskriminierung" in result["RISIKOMATRIX"]

    def test_affected_groups(self):
        ctx = {
            "organisation_name": "Test",
            "system_name": "Tool",
            "system_description": "x",
            "einsatzzweck": "y",
            "affected_groups": ["Bewerber", "Beschaeftigte"],
            "affected_count": "~500 pro Jahr",
        }
        result = generate_fria_draft(ctx)
        assert "Bewerber" in result["BETROFFENE_GRUPPEN"]
        assert result["BETROFFENE_ANZAHL"] == "~500 pro Jahr"


class TestDSFADomainRisks:
    """Tests for domain-specific risks in DSFA generator."""

    def test_hr_domain_adds_agg_risks(self):
        # Import from dsfa_template
        from dsfa_template import _generate_risk_assessment

        ctx = {"has_ai_systems": True, "domain": "hr"}
        output = _generate_risk_assessment(ctx)
        assert "AGG-Verstoss" in output
        assert "Beweislastumkehr" in output

    def test_education_domain_adds_risks(self):
        from dsfa_template import _generate_risk_assessment

        ctx = {"has_ai_systems": True, "domain": "education"}
        output = _generate_risk_assessment(ctx)
        assert "Chancenungleichheit" in output

    def test_healthcare_domain_adds_risks(self):
        from dsfa_template import _generate_risk_assessment

        ctx = {"has_ai_systems": True, "domain": "healthcare"}
        output = _generate_risk_assessment(ctx)
        assert "Fehldiagnose" in output

    def test_finance_domain_adds_risks(self):
        from dsfa_template import _generate_risk_assessment

        ctx = {"has_ai_systems": True, "domain": "finance"}
        output = _generate_risk_assessment(ctx)
        assert "Kredit-Scoring" in output

    def test_no_domain_no_extra_risks(self):
        from dsfa_template import _generate_risk_assessment

        ctx = {"has_ai_systems": True}
        output = _generate_risk_assessment(ctx)
        assert "AGG-Verstoss" not in output
        assert "Fehldiagnose" not in output


if __name__ == "__main__":
    pytest.main([__file__, "-v"])


@@ -0,0 +1,285 @@
"""TOM template generator V2 — SDM-structured TOM catalog.

Replaces the flat 17-measure list with a hierarchical structure based on
the 7 SDM Gewaehrleistungsziele (Standard-Datenschutzmodell V3.1a).
"""

# -- SDM-structured TOM catalog ---------------------------------------------

SDM_TOM_CATALOG = {
    "verfuegbarkeit": {
        "label": "Verfuegbarkeit",
        "sdm_baustein": "SDM-B11 (Aufbewahren)",
        "measures": [
            {"name": "Redundante Datenhaltung", "description": "RAID, Replikation, Geo-Redundanz", "type": "technical"},
            {"name": "Backup-Strategie", "description": "Taeglich inkrementell, woechentlich voll, verschluesselt", "type": "technical"},
            {"name": "Disaster-Recovery-Plan", "description": "Dokumentierte RTO/RPO-Werte, jaehrliche Tests", "type": "organizational"},
            {"name": "USV / Notstromversorgung", "description": "Unterbrechungsfreie Stromversorgung fuer kritische Systeme", "type": "technical"},
        ],
    },
    "integritaet": {
        "label": "Integritaet",
        "sdm_baustein": "SDM-B61 (Berichtigen)",
        "measures": [
            {"name": "Pruefsummen und Signaturen", "description": "Digitale Signaturen fuer Datenuebertragungen", "type": "technical"},
            {"name": "Eingabevalidierung", "description": "Plausibilitaetspruefungen auf allen Eingabeschnittstellen", "type": "technical"},
            {"name": "Change Management", "description": "Dokumentierte Aenderungsverfahren mit Freigabeprozess", "type": "organizational"},
            {"name": "Versionierung", "description": "Versionierung von Datensaetzen und Konfigurationen", "type": "technical"},
        ],
    },
    "vertraulichkeit": {
        "label": "Vertraulichkeit",
        "sdm_baustein": "SDM-B51 (Zugriffe regeln)",
        "measures": [
            {"name": "Verschluesselung im Transit", "description": "TLS 1.3 fuer alle Verbindungen", "type": "technical"},
            {"name": "Verschluesselung at Rest", "description": "AES-256 fuer gespeicherte Daten", "type": "technical"},
            {"name": "Zugriffskonzept (RBAC)", "description": "Rollenbasiert, Least-Privilege-Prinzip, regelmaessige Reviews", "type": "technical"},
            {"name": "Multi-Faktor-Authentifizierung", "description": "MFA fuer alle administrativen Zugaenge", "type": "technical"},
            {"name": "Physische Zutrittskontrolle", "description": "Schluessel, Kartenleser, Besucherprotokoll", "type": "technical"},
            {"name": "Vertraulichkeitsverpflichtung", "description": "Schriftliche Verpflichtung aller Mitarbeitenden", "type": "organizational"},
        ],
    },
    "nichtverkettung": {
        "label": "Nichtverkettung",
        "sdm_baustein": "SDM-B50 (Trennen)",
        "measures": [
            {"name": "Mandantentrennung", "description": "Logische Datentrennung nach Mandanten/Zweck", "type": "technical"},
            {"name": "Pseudonymisierung", "description": "Wo fachlich moeglich, Einsatz von Pseudonymen", "type": "technical"},
            {"name": "Zweckbindungspruefung", "description": "Pruefung bei jeder neuen Datennutzung", "type": "organizational"},
        ],
    },
    "transparenz": {
        "label": "Transparenz",
        "sdm_baustein": "SDM-B42 (Dokumentieren), SDM-B43 (Protokollieren)",
        "measures": [
            {"name": "Verarbeitungsverzeichnis", "description": "Art. 30 DS-GVO konformes VVT", "type": "organizational"},
            {"name": "Audit-Logging", "description": "Vollstaendige Protokollierung aller Datenzugriffe", "type": "technical"},
            {"name": "Informationspflichten", "description": "Art. 13/14 DS-GVO Datenschutzerklaerung", "type": "organizational"},
            {"name": "Datenpannen-Prozess", "description": "Dokumentierter Meldeprozess Art. 33/34 DS-GVO", "type": "organizational"},
        ],
    },
    "intervenierbarkeit": {
        "label": "Intervenierbarkeit",
        "sdm_baustein": "SDM-B60 (Loeschen), SDM-B61 (Berichtigen), SDM-B62 (Einschraenken)",
        "measures": [
            {"name": "Betroffenenanfragen-Prozess", "description": "Auskunft, Loeschung, Berichtigung, Widerspruch", "type": "organizational"},
            {"name": "Technische Loeschfaehigkeit", "description": "Loeschung mit Nachweis (Loeschprotokoll)", "type": "technical"},
            {"name": "Datenportabilitaet", "description": "Export in maschinenlesbarem Format (Art. 20)", "type": "technical"},
            {"name": "Sperrfunktion", "description": "Einschraenkung der Verarbeitung moeglich", "type": "technical"},
        ],
    },
    "datenminimierung": {
        "label": "Datenminimierung",
        "sdm_baustein": "SDM-B41 (Planen und Spezifizieren)",
        "measures": [
            {"name": "Erforderlichkeitspruefung", "description": "Regelmaessige Pruefung der erhobenen Datenfelder", "type": "organizational"},
            {"name": "Automatisierte Loeschung", "description": "Fristgerechte Loeschung nach Aufbewahrungsfrist", "type": "technical"},
            {"name": "Anonymisierung", "description": "Anonymisierung/Aggregation fuer Statistik", "type": "technical"},
            {"name": "Privacy by Design", "description": "Datenschutz ab Entwurfsphase neuer Verarbeitungen", "type": "organizational"},
        ],
    },
}

# -- Sector-specific extensions ----------------------------------------------

SECTOR_TOMS = {
    "it_saas": {
        "label": "IT / SaaS",
        "measures": [
            {"name": "Container-Isolation", "description": "Workload-Isolation zwischen Mandanten (Kubernetes Namespaces)", "type": "technical", "sdm_goal": "nichtverkettung"},
            {"name": "API-Security", "description": "Rate Limiting, OAuth 2.0, API-Key-Rotation", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "DevSecOps Pipeline", "description": "SAST/DAST in CI/CD, Dependency Scanning", "type": "technical", "sdm_goal": "integritaet"},
            {"name": "Secrets Management", "description": "Vault/KMS fuer Credentials, keine Hardcoded Secrets", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "gesundheitswesen": {
        "label": "Gesundheitswesen",
        "measures": [
            {"name": "Patientenakten-Verschluesselung", "description": "Ende-zu-Ende-Verschluesselung fuer Gesundheitsdaten (Art. 9)", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "Notfallzugriff", "description": "Break-the-Glass-Verfahren fuer medizinische Notfaelle", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
            {"name": "Forschungsdaten-Anonymisierung", "description": "Vollstaendige Anonymisierung vor Forschungsnutzung", "type": "technical", "sdm_goal": "datenminimierung"},
        ],
    },
    "finanzdienstleistungen": {
        "label": "Finanzdienstleistungen",
        "measures": [
            {"name": "Transaktions-Monitoring", "description": "Echtzeit-Ueberwachung auf Unregelmaessigkeiten (GwG)", "type": "technical", "sdm_goal": "integritaet"},
            {"name": "Aufbewahrungspflichten", "description": "10 Jahre Aufbewahrung gemaess AO/HGB, danach Loeschung", "type": "organizational", "sdm_goal": "datenminimierung"},
            {"name": "PCI-DSS Compliance", "description": "Payment Card Industry Standards fuer Kartendaten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "handel": {
        "label": "Handel / E-Commerce",
        "measures": [
            {"name": "Cookie-Consent-Management", "description": "TDDDG-konformes Einwilligungsmanagement", "type": "technical", "sdm_goal": "transparenz"},
            {"name": "Gastzugang-Option", "description": "Bestellung ohne Pflicht-Kundenkonto (Datenminimierung)", "type": "organizational", "sdm_goal": "datenminimierung"},
            {"name": "Zahlungsdaten-Tokenisierung", "description": "Keine direkte Speicherung von Zahlungsdaten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
    "handwerk": {
        "label": "Handwerk",
        "measures": [
            {"name": "Mobile-Device-Management", "description": "Absicherung mobiler Endgeraete auf Baustellen", "type": "technical", "sdm_goal": "vertraulichkeit"},
            {"name": "Papierakten-Sicherung", "description": "Verschlossene Schraenke fuer physische Kundenakten", "type": "technical", "sdm_goal": "vertraulichkeit"},
        ],
    },
}

# -- NIS2 / ISO 27001 / AI Act extensions -----------------------------------

NIS2_TOMS = [
    {"name": "Incident-Response-Plan", "description": "NIS2-konformer Vorfallreaktionsplan (72h Meldepflicht an BSI)", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
    {"name": "Supply-Chain-Security", "description": "Bewertung der Lieferkettensicherheit (BSIG 2025)", "type": "organizational", "sdm_goal": "integritaet"},
    {"name": "Vulnerability Management", "description": "Regelmaessige Schwachstellenscans, Patch-Management", "type": "technical", "sdm_goal": "integritaet"},
    {"name": "Security Awareness", "description": "Pflicht-Schulungen Cybersicherheit fuer Geschaeftsleitung", "type": "organizational", "sdm_goal": "vertraulichkeit"},
]

ISO27001_TOMS = [
    {"name": "ISMS Risikomanagement", "description": "ISO 27001 Anhang A — Informationssicherheits-Risikobewertung", "type": "organizational", "sdm_goal": "verfuegbarkeit"},
    {"name": "Dokumentenlenkung", "description": "Versionierte Sicherheitsrichtlinien und -verfahren", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "Management Review", "description": "Jaehrliche Ueberpruefung des ISMS durch Geschaeftsleitung", "type": "organizational", "sdm_goal": "transparenz"},
]

AI_ACT_TOMS = [
    {"name": "KI-Risikoklassifizierung", "description": "Bewertung aller KI-Systeme nach EU AI Act Risikokategorien", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "Human Oversight", "description": "Menschliche Aufsicht fuer Hochrisiko-KI-Systeme (Art. 14 KI-VO)", "type": "organizational", "sdm_goal": "intervenierbarkeit"},
    {"name": "KI-Transparenz", "description": "Transparenzpflichten bei KI-Einsatz gegenueber Betroffenen (Art. 13 KI-VO)", "type": "organizational", "sdm_goal": "transparenz"},
    {"name": "KI-Bias-Monitoring", "description": "Ueberwachung auf diskriminierende Ergebnisse", "type": "technical", "sdm_goal": "integritaet"},
]


def generate_tom_drafts(ctx: dict) -> list[dict]:
    """Generate TOM measure drafts structured by SDM Gewaehrleistungsziele.

    Args:
        ctx: Flat dict from company-profile/template-context.

    Returns:
        List of TOM measure dicts with SDM goal assignment.
    """
    measures = []
    control_counter = 0

    # Base SDM measures
    for goal_key, goal_data in SDM_TOM_CATALOG.items():
        for m in goal_data["measures"]:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=goal_key,
                sdm_baustein=goal_data["sdm_baustein"],
                category=goal_data["label"],
                ctx=ctx,
            ))

    # Regulatory extensions
    if ctx.get("subject_to_nis2"):
        for m in NIS2_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="NIS2 / BSIG 2025",
                category="Cybersicherheit (NIS2)",
                ctx=ctx,
            ))
    if ctx.get("subject_to_iso27001"):
        for m in ISO27001_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="ISO 27001 Anhang A",
                category="ISMS (ISO 27001)",
                ctx=ctx,
            ))
    if ctx.get("subject_to_ai_act") or ctx.get("has_ai_systems"):
        for m in AI_ACT_TOMS:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m["sdm_goal"],
                sdm_baustein="EU AI Act (2024/1689)",
                category="KI-Compliance",
                ctx=ctx,
            ))

    # Sector-specific extensions
    sector = _detect_sector(ctx)
    if sector and sector in SECTOR_TOMS:
        sector_data = SECTOR_TOMS[sector]
        for m in sector_data["measures"]:
            control_counter += 1
            measures.append(_build_measure(
                counter=control_counter,
                measure=m,
                sdm_goal=m.get("sdm_goal", "vertraulichkeit"),
                sdm_baustein=f"Sektor: {sector_data['label']}",
                category=f"Sektor ({sector_data['label']})",
                ctx=ctx,
            ))
    return measures


def sdm_coverage_summary(measures: list[dict]) -> dict:
    """Return coverage matrix: SDM goal -> measure count."""
    summary = {}
    for goal_key in SDM_TOM_CATALOG:
        count = sum(1 for m in measures if m.get("sdm_goal") == goal_key)
        summary[goal_key] = {
            "label": SDM_TOM_CATALOG[goal_key]["label"],
            "count": count,
        }
    return summary
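The coverage matrix is a simple tally of measures per Gewaehrleistungsziel. A minimal, self-contained sketch of the same logic, with the catalog trimmed to two goals for brevity (`CATALOG` and `coverage` are stand-in names, not the module's API):

```python
# Self-contained sketch of sdm_coverage_summary with a trimmed catalog.
CATALOG = {
    "vertraulichkeit": {"label": "Vertraulichkeit"},
    "transparenz": {"label": "Transparenz"},
}

def coverage(measures: list[dict]) -> dict:
    # Count how many generated measures map to each SDM goal.
    return {
        goal: {
            "label": CATALOG[goal]["label"],
            "count": sum(1 for m in measures if m.get("sdm_goal") == goal),
        }
        for goal in CATALOG
    }

demo = [
    {"name": "TLS", "sdm_goal": "vertraulichkeit"},
    {"name": "MFA", "sdm_goal": "vertraulichkeit"},
    {"name": "Audit-Logging", "sdm_goal": "transparenz"},
]
print(coverage(demo)["vertraulichkeit"]["count"])  # 2
print(coverage(demo)["transparenz"]["count"])      # 1
```

Goals with a count of 0 still appear in the summary, which is what makes it usable as a gap analysis over all seven Gewaehrleistungsziele.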
# -- Internal helpers --------------------------------------------------------

def _build_measure(counter: int, measure: dict, sdm_goal: str,
                   sdm_baustein: str, category: str, ctx: dict) -> dict:
    return {
        "control_id": f"TOM-SDM-{counter:03d}",
        "name": measure["name"],
        "description": measure["description"],
        "category": category,
        "type": measure.get("type", "organizational"),
        "sdm_goal": sdm_goal,
        "sdm_baustein_ref": sdm_baustein,
        "implementation_status": "not_implemented",
        "effectiveness_rating": "not_assessed",
        "responsible_department": "IT-Sicherheit",
        "priority": _assess_priority(measure, ctx),
        "review_frequency": f"{ctx.get('review_cycle_months', 12)} Monate",
    }


def _assess_priority(measure: dict, ctx: dict) -> str:
    name_lower = measure.get("name", "").lower()
    if any(kw in name_lower for kw in ["verschluesselung", "mfa", "incident", "ki-risiko"]):
        return "high"
    if any(kw in name_lower for kw in ["backup", "zugriff", "logging", "loeschung"]):
        return "high"
    return "medium"


def _detect_sector(ctx: dict) -> str | None:
    """Map company industry to sector key."""
    industry = (ctx.get("industry") or "").lower()
    mapping = {
        "technologie": "it_saas", "saas": "it_saas", "software": "it_saas",
        "gesundheit": "gesundheitswesen", "pharma": "gesundheitswesen", "medizin": "gesundheitswesen",
        "finanz": "finanzdienstleistungen", "bank": "finanzdienstleistungen", "versicherung": "finanzdienstleistungen",
        "handel": "handel", "e-commerce": "handel", "einzelhandel": "handel", "shop": "handel",
        "handwerk": "handwerk", "bau": "handwerk", "kfz": "handwerk",
        # Bare "it" last: it is a substring of e.g. "Gesundheit" and would
        # otherwise shadow the more specific keywords above.
        "it": "it_saas",
    }
    for keyword, sector in mapping.items():
        if keyword in industry:
            return sector
    return None


@@ -0,0 +1,393 @@
"""VVT template generator V2 — sector-specific VVT activity drafts.

Generates Art. 30 DS-GVO compliant VVT entries with sector-specific
standard processing activities inspired by BayLDA patterns.
"""
from typing import Optional

# -- Sector activity catalogs ------------------------------------------------

SECTOR_ACTIVITIES = {
    "it_saas": [
        {
            "name": "SaaS-Plattformbetrieb",
            "purposes": ["Bereitstellung und Betrieb der SaaS-Plattform"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung)"],
            "data_subject_categories": ["Kunden", "Endnutzer"],
            "personal_data_categories": ["Stammdaten", "Nutzungsdaten", "Inhaltsdaten", "Logdaten"],
            "recipient_categories": ["Hosting-Anbieter (AVV)", "Support-Dienstleister (AVV)"],
            "retention_period": "90 Tage nach Vertragsende + gesetzl. Aufbewahrung",
            "tom_description": "Mandantentrennung, Verschluesselung, RBAC",
            "dpia_required": True,
        },
        {
            "name": "Kundenverwaltung / CRM",
            "purposes": ["Verwaltung von Kundenbeziehungen, Vertragsmanagement"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden", "Ansprechpartner", "Interessenten"],
            "personal_data_categories": ["Kontaktdaten", "Vertragsdaten", "Kommunikationshistorie"],
            "recipient_categories": ["CRM-Anbieter (AVV)"],
            "retention_period": "3 Jahre nach letztem Kontakt, 10 Jahre Rechnungsdaten",
            "tom_description": "Zugriffsbeschraenkung Vertrieb/Support, Protokollierung",
        },
        {
            "name": "E-Mail-Marketing / Newsletter",
            "purposes": ["Versand von Produkt-Updates und Marketing-Newsletter"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung)", "UWG §7"],
            "data_subject_categories": ["Newsletter-Abonnenten"],
            "personal_data_categories": ["E-Mail-Adresse", "Name", "Oeffnungs-/Klickverhalten"],
            "recipient_categories": ["E-Mail-Dienstleister (AVV)"],
            "retention_period": "Unverzueglich nach Widerruf",
            "tom_description": "Double-Opt-In, einfache Abmeldefunktion",
        },
        {
            "name": "Webanalyse",
            "purposes": ["Analyse der Website-Nutzung zur Verbesserung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung via Cookie-Banner)"],
            "data_subject_categories": ["Website-Besucher"],
            "personal_data_categories": ["IP-Adresse (anonymisiert)", "Seitenaufrufe", "Geraeteinformationen"],
            "recipient_categories": ["Analyse-Anbieter (AVV)"],
            "retention_period": "14 Monate",
            "tom_description": "IP-Anonymisierung, Cookie-Consent (TDDDG §25)",
        },
        {
            "name": "Bewerbermanagement",
            "purposes": ["Bearbeitung von Bewerbungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO i.V.m. §26 BDSG"],
            "data_subject_categories": ["Bewerber"],
            "personal_data_categories": ["Kontaktdaten", "Lebenslauf", "Qualifikationen"],
            "recipient_categories": ["Fachabteilung"],
            "retention_period": "6 Monate nach Verfahrensabschluss (AGG)",
            "tom_description": "Zugriffsschutz Bewerbungsportal, verschluesselte Uebertragung",
        },
        {
            "name": "Mitarbeiterverwaltung / HR",
            "purposes": ["Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG"],
            "data_subject_categories": ["Beschaeftigte"],
            "personal_data_categories": ["Stammdaten", "Vertragsdaten", "Bankverbindung", "Arbeitszeiten"],
            "recipient_categories": ["Lohnbuero (AVV)", "Finanzamt", "Sozialversicherungstraeger"],
            "retention_period": "10 Jahre nach Austritt",
            "tom_description": "Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung",
        },
        {
            "name": "Support-Ticketing",
            "purposes": ["Bearbeitung von Kundenanfragen und Stoerungsmeldungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden", "Endnutzer"],
            "personal_data_categories": ["Kontaktdaten", "Ticket-Inhalt", "Systemlogs"],
            "recipient_categories": ["Support-Tool-Anbieter (AVV)"],
            "retention_period": "2 Jahre nach Ticket-Schliessung",
            "tom_description": "Rollenbasierter Zugriff, Pseudonymisierung in Reports",
        },
        {
            "name": "Logging und Monitoring",
            "purposes": ["Sicherheitsueberwachung, Fehleranalyse"],
            "legal_bases": ["Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: IT-Sicherheit)"],
            "data_subject_categories": ["Plattform-Nutzer", "Administratoren"],
            "personal_data_categories": ["IP-Adressen", "Zugriffszeitpunkte", "Fehlerprotokolle"],
            "recipient_categories": ["Log-Management-Anbieter (AVV)"],
            "retention_period": "30 Tage Anwendungslogs, 90 Tage Sicherheitslogs",
            "tom_description": "Zugriffsschutz Logdaten, automatische Rotation",
        },
    ],
    "gesundheitswesen": [
        {
            "name": "Patientenverwaltung",
            "purposes": ["Patientenakte, Behandlungsdokumentation"],
            "legal_bases": ["Art. 9 Abs. 2 lit. h DS-GVO i.V.m. §630f BGB"],
            "data_subject_categories": ["Patienten"],
            "personal_data_categories": ["Stammdaten", "Versicherung", "Diagnosen", "Befunde (Art. 9)"],
            "recipient_categories": ["PVS-Anbieter (AVV)", "Labor (AVV)", "ueberweisende Aerzte"],
            "retention_period": "10 Jahre nach letzter Behandlung (§630f BGB)",
            "tom_description": "Verschluesselung Patientenakte, Notfallzugriff",
            "dpia_required": True,
        },
        {
            "name": "Abrechnung (KV/PKV)",
            "purposes": ["Abrechnung aerztlicher Leistungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c DS-GVO", "Art. 9 Abs. 2 lit. h"],
            "data_subject_categories": ["Patienten"],
            "personal_data_categories": ["Stammdaten", "Versicherung", "Diagnosen (ICD)", "Leistungsziffern"],
            "recipient_categories": ["KV", "PKV", "Abrechnungsstelle (AVV)"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "Verschluesselte Uebermittlung (KV-Connect/KIM)",
        },
    ],
    "handel": [
        {
            "name": "Bestellabwicklung",
            "purposes": ["Bestellannahme, Versand, Rechnungsstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden (Besteller)"],
            "personal_data_categories": ["Kontaktdaten", "Lieferadresse", "Bestelldaten", "Rechnungsdaten"],
            "recipient_categories": ["Versanddienstleister", "Zahlungsanbieter (AVV)"],
            "retention_period": "10 Jahre Rechnungen, 3 Jahre Bestelldaten",
            "tom_description": "Verschluesselte Uebertragung, Zugriffsschutz",
        },
        {
            "name": "Kundenkonto",
            "purposes": ["Bereitstellung Kundenkonto (optional)"],
            "legal_bases": ["Art. 6 Abs. 1 lit. a/b DS-GVO"],
            "data_subject_categories": ["Registrierte Kunden"],
            "personal_data_categories": ["Stammdaten", "Passwort (gehasht)", "Bestellhistorie"],
            "recipient_categories": ["Shop-Plattform (AVV)"],
            "retention_period": "Sofort nach Kontoloesch-Anfrage, Rechnungen 10 Jahre",
            "tom_description": "MFA-Option, bcrypt Passwortspeicherung, Gastzugang-Alternative",
        },
        {
            "name": "Zahlungsabwicklung",
            "purposes": ["Abwicklung von Zahlungsvorgaengen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Zahlende Kunden"],
            "personal_data_categories": ["Zahlungsart", "Transaktionsdaten"],
            "recipient_categories": ["Payment-Service-Provider"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "PCI-DSS, Tokenisierung, keine direkte Kartenspeicherung",
        },
    ],
    "handwerk": [
        {
            "name": "Kundenauftraege und Angebotserstellung",
            "purposes": ["Angebotserstellung, Auftragsabwicklung, Rechnungsstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Kunden (Privat/Gewerbe)"],
            "personal_data_categories": ["Kontaktdaten", "Objektadresse", "Auftrag", "Rechnungsdaten"],
            "recipient_categories": ["Steuerberater", "ggf. Subunternehmer"],
            "retention_period": "10 Jahre Rechnungen, 5 Jahre Gewaehrleistung",
            "tom_description": "Zugriffskontrolle Auftragssystem",
        },
        {
            "name": "Baustellendokumentation",
            "purposes": ["Dokumentation Baufortschritt, Maengelprotokoll"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/f DS-GVO"],
            "data_subject_categories": ["Kunden", "Mitarbeitende"],
            "personal_data_categories": ["Fotos", "Protokolle", "Abnahmedokumente"],
            "recipient_categories": ["Auftraggeber", "Architekten"],
            "retention_period": "5 Jahre nach Abnahme",
            "tom_description": "Projektordner mit Zugriffsbeschraenkung",
        },
    ],
    "bildung": [
        {
            "name": "Schueler-/Studierendenverwaltung",
            "purposes": ["Verwaltung von Schueler-/Studierendendaten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz"],
            "data_subject_categories": ["Schueler/Studierende (ggf. Minderjaehrige)", "Erziehungsberechtigte"],
            "personal_data_categories": ["Stammdaten", "Kontaktdaten Erziehungsberechtigte"],
            "recipient_categories": ["Schulverwaltungssoftware (AVV)", "Schulbehoerde"],
            "retention_period": "Gemaess Schulgesetz (i.d.R. 5 Jahre nach Abgang)",
            "tom_description": "Besonderer Zugriffsschutz, Einwilligung Erziehungsberechtigte",
            "dpia_required": True,
        },
        {
            "name": "Notenverarbeitung",
            "purposes": ["Leistungsbewertung, Zeugniserstellung"],
            "legal_bases": ["Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz"],
            "data_subject_categories": ["Schueler/Studierende"],
            "personal_data_categories": ["Noten", "Leistungsbewertungen", "Pruefungsergebnisse"],
            "recipient_categories": ["Lehrkraefte", "Schulleitung"],
            "retention_period": "Zeugniskopien 50 Jahre, Einzelnoten 2 Jahre",
            "tom_description": "Zugriffsbeschraenkung auf Fachlehrkraft, verschluesselt",
        },
    ],
    "beratung": [
        {
            "name": "Mandantenverwaltung",
            "purposes": ["Verwaltung von Mandantenbeziehungen"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Mandanten", "Ansprechpartner"],
            "personal_data_categories": ["Kontaktdaten", "Vertragsdaten", "Korrespondenz"],
            "recipient_categories": ["Kanzleisoftware (AVV)", "Steuerberater"],
            "retention_period": "10 Jahre Rechnungen, 5 Jahre Handakten",
            "tom_description": "Mandantengeheimnis, Need-to-know-Prinzip",
        },
        {
            "name": "Projektmanagement",
            "purposes": ["Planung und Steuerung von Beratungsprojekten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b/f DS-GVO"],
            "data_subject_categories": ["Projektbeteiligte"],
            "personal_data_categories": ["Projektdaten", "Aufgaben", "Zeiterfassung"],
            "recipient_categories": ["PM-Tool (AVV)", "Mandant"],
            "retention_period": "2 Jahre nach Projektabschluss",
            "tom_description": "Projektspezifische Zugriffsrechte, Mandantentrennung",
        },
        {
            "name": "Zeiterfassung und Abrechnung",
            "purposes": ["Stundenerfassung, Abrechnung gegenueber Mandanten"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO"],
            "data_subject_categories": ["Berater/Mitarbeitende", "Mandanten"],
            "personal_data_categories": ["Arbeitszeiten", "Taetigkeitsbeschreibungen", "Stundensaetze"],
            "recipient_categories": ["Abrechnungssystem (AVV)", "Buchhaltung"],
            "retention_period": "10 Jahre (AO)",
            "tom_description": "Zugriff nur eigene Zeiten + Projektleitung",
        },
    ],
}

# Industry -> Sector mapping
INDUSTRY_SECTOR_MAP = {
    "technologie": "it_saas", "saas": "it_saas", "software": "it_saas",
    "it dienstleistungen": "it_saas",
    "gesundheit": "gesundheitswesen", "pharma": "gesundheitswesen",
    "e-commerce": "handel", "handel": "handel", "einzelhandel": "handel",
    "handwerk": "handwerk", "bau": "handwerk", "kfz": "handwerk",
    "bildung": "bildung", "schule": "bildung", "hochschule": "bildung",
    "beratung": "beratung", "consulting": "beratung", "kanzlei": "beratung",
    "recht": "beratung",
    # Bare "it" last: it is a substring of e.g. "Gesundheit" and would
    # otherwise shadow the more specific keywords above.
    "it": "it_saas",
}
def generate_vvt_drafts(ctx: dict) -> list[dict]:
    """Generate VVT activity drafts, sector-specific if possible.

    Args:
        ctx: Flat dict from company-profile/template-context.

    Returns:
        List of VVT activity dicts ready for creation.
    """
    company = ctx.get("company_name", "Unbekannt")
    dpo = ctx.get("dpo_name", "")
    sector = _detect_sector(ctx)

    # Use sector-specific activities if available, else generate from systems
    if sector and sector in SECTOR_ACTIVITIES:
        activities = _generate_sector_vvt(ctx, sector, company, dpo)
    else:
        activities = _generate_system_vvt(ctx, company, dpo)

    # Always add standard HR activity if not already present
    has_hr = any("mitarbeiter" in a.get("name", "").lower() or "hr" in a.get("name", "").lower()
                 for a in activities)
    if not has_hr and len(activities) > 0:
        activities.append(_build_hr_activity(len(activities) + 1, company, dpo))
    return activities
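The HR fallback at the end of `generate_vvt_drafts` can be isolated as a small, self-contained sketch (`ensure_hr_activity` is a stand-in name, not part of the module): a standard HR entry is appended only when the draft list is non-empty and no activity name already mentions HR.

```python
# Sketch of the HR fallback: append a standard "Mitarbeiterverwaltung / HR"
# entry unless an activity name already contains "mitarbeiter" or "hr".
def ensure_hr_activity(activities: list[dict]) -> list[dict]:
    has_hr = any(
        "mitarbeiter" in a.get("name", "").lower() or "hr" in a.get("name", "").lower()
        for a in activities
    )
    if not has_hr and activities:
        activities.append({"name": "Mitarbeiterverwaltung / HR"})
    return activities

print(len(ensure_hr_activity([{"name": "Bestellabwicklung"}])))           # 2
print(len(ensure_hr_activity([{"name": "Mitarbeiterverwaltung / HR"}])))  # 1
print(len(ensure_hr_activity([])))                                        # 0
```

Skipping the empty-list case avoids producing a VVT that consists of nothing but the generic HR entry when neither a sector catalog nor any processing systems were available.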
def _detect_sector(ctx: dict) -> Optional[str]:
    industry = (ctx.get("industry") or "").lower().strip()
    for keyword, sector in INDUSTRY_SECTOR_MAP.items():
        if keyword in industry:
            return sector
    return None


def _generate_sector_vvt(ctx: dict, sector: str, company: str, dpo: str) -> list[dict]:
    activities = []
    sector_data = SECTOR_ACTIVITIES[sector]
    for i, template in enumerate(sector_data, 1):
        activity = {
            "vvt_id": f"VVT-{sector.upper()[:3]}-{i:03d}",
            "name": template["name"],
            "description": f"Automatisch generierter VVT-Eintrag: {template['name']}",
            "purposes": template["purposes"],
            "legal_bases": template["legal_bases"],
            "data_subject_categories": template["data_subject_categories"],
            "personal_data_categories": template["personal_data_categories"],
            "recipient_categories": template["recipient_categories"],
            "third_country_transfers": _assess_third_country(ctx),
            "retention_period": {"default": template["retention_period"]},
            "tom_description": template["tom_description"],
            "business_function": _infer_business_function(template["name"]),
            "systems": [],
            "protection_level": "HIGH" if template.get("dpia_required") else "MEDIUM",
            "dpia_required": template.get("dpia_required", False),
            "status": "DRAFT",
            "responsible": dpo or company,
            "source_sector": sector,
        }
        activities.append(activity)
    return activities


def _generate_system_vvt(ctx: dict, company: str, dpo: str) -> list[dict]:
    """Fallback: generate VVT per processing system (original approach)."""
    systems = ctx.get("processing_systems", [])
    activities = []
    for i, system in enumerate(systems, 1):
        name = system.get("name", f"System {i}")
        vendor = system.get("vendor", "")
        hosting = system.get("hosting", "on-premise")
        categories = system.get("personal_data_categories", [])
        activity = {
            "vvt_id": f"VVT-SYS-{i:03d}",
            "name": f"Verarbeitung in {name}",
            "description": f"VVT-Eintrag fuer System '{name}'"
            + (f" (Anbieter: {vendor})" if vendor else ""),
            "purposes": [f"Datenverarbeitung via {name}"],
            "legal_bases": ["Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung)"],
            "data_subject_categories": [],
            "personal_data_categories": categories,
            "recipient_categories": [vendor] if vendor else [],
            "third_country_transfers": _assess_third_country_hosting(hosting),
            "retention_period": {"default": "Gemaess Loeschfristenkatalog"},
            "tom_description": f"Siehe TOM-Katalog fuer {name}",
            "business_function": "IT",
            "systems": [name],
            "deployment_model": hosting,
            "protection_level": "HIGH" if len(categories) > 3 else "MEDIUM",
"dpia_required": len(categories) > 3,
"status": "DRAFT",
"responsible": dpo or company,
}
activities.append(activity)
return activities
def _build_hr_activity(index: int, company: str, dpo: str) -> dict:
return {
"vvt_id": f"VVT-STD-{index:03d}",
"name": "Mitarbeiterverwaltung / HR",
"description": "Standard-Verarbeitungstaetigkeit Personalverwaltung",
"purposes": ["Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung"],
"legal_bases": ["Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG"],
"data_subject_categories": ["Beschaeftigte"],
"personal_data_categories": ["Stammdaten", "Vertragsdaten", "Bankverbindung", "Arbeitszeiten"],
"recipient_categories": ["Lohnbuero (AVV)", "Finanzamt", "Sozialversicherungstraeger"],
"third_country_transfers": [],
"retention_period": {"default": "10 Jahre nach Austritt"},
"tom_description": "Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung",
"business_function": "HR",
"systems": [],
"protection_level": "HIGH",
"dpia_required": False,
"status": "DRAFT",
"responsible": dpo or company,
}
def _assess_third_country(ctx: dict) -> list:
if ctx.get("third_country_transfer"):
return [{"country": "Abhaengig von Dienstleister", "mechanism": "Pruefung erforderlich"}]
return []
def _assess_third_country_hosting(hosting: str) -> list:
if hosting in ("us-cloud", "international"):
return [{"country": "USA", "mechanism": "EU-US Data Privacy Framework"}]
return []
def _infer_business_function(name: str) -> str:
name_lower = name.lower()
if any(kw in name_lower for kw in ["mitarbeiter", "hr", "personal", "bewerbung"]):
return "HR"
if any(kw in name_lower for kw in ["abrechnung", "rechnung", "zahlung", "buchhaltung"]):
return "Finanzen"
if any(kw in name_lower for kw in ["marketing", "newsletter", "webanalyse", "crm", "akquise"]):
return "Marketing/Vertrieb"
if any(kw in name_lower for kw in ["support", "ticket", "kundenservice"]):
return "Support"
if any(kw in name_lower for kw in ["patient", "befund", "labor", "termin"]):
return "Medizin"
if any(kw in name_lower for kw in ["schueler", "noten", "lernplattform"]):
return "Paedagogik"
return "IT"


@@ -0,0 +1,405 @@
-- Migration 001: DSFA Template V2 — Datenschutz-Folgenabschaetzung
-- Archiviert V1 (aus Migration 025) und fuegt erweiterte V2 ein.
-- Zielrepo: breakpilot-compliance (spaetere Integration)
-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'dsfa'
AND status = 'published';
-- 2. DSFA V2 einfuegen
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'dsfa',
'Datenschutz-Folgenabschaetzung (DSFA) gemaess Art. 35 DSGVO — V2',
'Erweiterte Vorlage fuer eine Datenschutz-Folgenabschaetzung mit Schwellwertanalyse (WP248), SDM-basierter TOM-Struktur, strukturierter Risikobewertung nach ISO 29134 und KI-Modul (EU AI Act). Geeignet fuer alle Verarbeitungen, die einer DSFA beduerfen.',
'de',
'EU/DSGVO',
'2.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{ORGANISATION_NAME}}",
"{{ORGANISATION_ADRESSE}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{BUNDESLAND}}",
"{{AUFSICHTSBEHOERDE}}",
"{{ERSTELLT_VON}}",
"{{ERSTELLT_AM}}",
"{{GENEHMIGT_VON}}",
"{{GENEHMIGT_AM}}",
"{{WP248_K1_BEWERTUNG_SCORING}}",
"{{WP248_K2_AUTOMATISIERTE_ENTSCHEIDUNG}}",
"{{WP248_K3_SYSTEMATISCHE_UEBERWACHUNG}}",
"{{WP248_K4_SENSIBLE_DATEN}}",
"{{WP248_K5_GROSSER_UMFANG}}",
"{{WP248_K6_DATENVERKNUEPFUNG}}",
"{{WP248_K7_SCHUTZBEDUERFTIGE_BETROFFENE}}",
"{{WP248_K8_INNOVATIVE_TECHNOLOGIE}}",
"{{WP248_K9_RECHTSAUSUEBUNG_HINDERT}}",
"{{SCHWELLWERT_ERGEBNIS}}",
"{{MUSS_LISTEN_REFERENZ}}",
"{{VERARBEITUNG_TITEL}}",
"{{VERARBEITUNG_BESCHREIBUNG}}",
"{{VERARBEITUNG_UMFANG}}",
"{{VERARBEITUNG_KONTEXT}}",
"{{VERARBEITUNGSMITTEL}}",
"{{ZWECK_VERARBEITUNG}}",
"{{RECHTSGRUNDLAGE}}",
"{{RECHTSGRUNDLAGE_DETAILS}}",
"{{DATENKATEGORIEN}}",
"{{BETROFFENENGRUPPEN}}",
"{{EMPFAENGER}}",
"{{DRITTLANDTRANSFER}}",
"{{SPEICHERDAUER}}",
"{{GEMEINSAME_VERANTWORTUNG_DETAILS}}",
"{{AUFTRAGSVERARBEITER_DETAILS}}",
"{{NOTWENDIGKEIT_BEWERTUNG}}",
"{{VERHAELTNISMAESSIGKEIT_BEWERTUNG}}",
"{{DATENMINIMIERUNG_NACHWEIS}}",
"{{ALTERNATIVEN_GEPRUEFT}}",
"{{SPEICHERBEGRENZUNG_NACHWEIS}}",
"{{RISIKO_METHODIK}}",
"{{RISIKEN_TABELLE}}",
"{{GESAMT_RISIKO_NIVEAU}}",
"{{KONSULTATION_BETROFFENE}}",
"{{KONSULTATION_BETRIEBSRAT}}",
"{{TOM_VERFUEGBARKEIT}}",
"{{TOM_INTEGRITAET}}",
"{{TOM_VERTRAULICHKEIT}}",
"{{TOM_NICHTVERKETTUNG}}",
"{{TOM_TRANSPARENZ}}",
"{{TOM_INTERVENIERBARKEIT}}",
"{{TOM_DATENMINIMIERUNG}}",
"{{DSB_STELLUNGNAHME}}",
"{{DSB_DATUM}}",
"{{ART36_BEGRUENDUNG}}",
"{{DSFA_ERGEBNIS}}",
"{{RESTRISIKO_BEWERTUNG}}",
"{{UEBERPRUFUNGSINTERVALL}}",
"{{NAECHSTE_UEBERPRUFUNG}}",
"{{AENDERUNGSTRIGGER}}",
"{{KI_SYSTEME_DETAILS}}",
"{{KI_GRUNDRECHTSPRUEFUNG}}"
]' AS jsonb),
$template$# Datenschutz-Folgenabschaetzung (DSFA)
**gemaess Art. 35 DS-GVO**
---
## 0. Schwellwertanalyse
Vor Durchfuehrung einer vollstaendigen DSFA ist zu pruefen, ob die geplante Verarbeitung eine solche erfordert. Die Pruefung erfolgt anhand der neun Kriterien der WP29/EDPB-Leitlinien (WP 248 rev.01) sowie der Muss-Liste der zustaendigen Aufsichtsbehoerde.
### 0.1 WP248-Kriterien (Art. 29-Datenschutzgruppe)
Sobald mindestens **zwei** der folgenden Kriterien zutreffen, ist eine DSFA in der Regel erforderlich.
| Nr. | Kriterium | Zutreffend? | Begruendung |
|-----|-----------|-------------|-------------|
| K1 | Bewertung oder Scoring (einschl. Profiling und Prognose) | {{WP248_K1_BEWERTUNG_SCORING}} | |
| K2 | Automatisierte Entscheidungsfindung mit Rechtswirkung oder aehnlich erheblicher Wirkung | {{WP248_K2_AUTOMATISIERTE_ENTSCHEIDUNG}} | |
| K3 | Systematische Ueberwachung von Personen | {{WP248_K3_SYSTEMATISCHE_UEBERWACHUNG}} | |
| K4 | Verarbeitung sensibler Daten oder hoechst persoenlicher Daten (Art. 9, 10 DS-GVO) | {{WP248_K4_SENSIBLE_DATEN}} | |
| K5 | Datenverarbeitung in grossem Umfang | {{WP248_K5_GROSSER_UMFANG}} | |
| K6 | Verknuepfung oder Zusammenfuehrung von Datenbestaenden | {{WP248_K6_DATENVERKNUEPFUNG}} | |
| K7 | Daten zu schutzbeduerftigen Betroffenen (Kinder, Beschaeftigte, Patienten) | {{WP248_K7_SCHUTZBEDUERFTIGE_BETROFFENE}} | |
| K8 | Innovative Nutzung oder Anwendung neuer technologischer Loesungen | {{WP248_K8_INNOVATIVE_TECHNOLOGIE}} | |
| K9 | Verarbeitung, die Betroffene an der Ausuebung eines Rechts oder der Nutzung einer Dienstleistung hindert | {{WP248_K9_RECHTSAUSUEBUNG_HINDERT}} | |
### 0.2 Muss-Liste der Aufsichtsbehoerde
**Bundesland:** {{BUNDESLAND}}
**Zustaendige Aufsichtsbehoerde:** {{AUFSICHTSBEHOERDE}}
**Referenz:** {{MUSS_LISTEN_REFERENZ}}
### 0.3 Ergebnis der Schwellwertanalyse
{{SCHWELLWERT_ERGEBNIS}}
---
## 1. Allgemeine Informationen und Verarbeitungsbeschreibung
| Feld | Inhalt |
|------|--------|
| **Organisation** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} |
| **DSB-Kontakt** | {{DSB_KONTAKT}} |
| **Erstellt von** | {{ERSTELLT_VON}} |
| **Erstellt am** | {{ERSTELLT_AM}} |
{{#IF GENEHMIGT_VON}}| **Genehmigt von** | {{GENEHMIGT_VON}} |
| **Genehmigt am** | {{GENEHMIGT_AM}} |
{{/IF}}
### 1.1 Bezeichnung der Verarbeitungstaetigkeit
**{{VERARBEITUNG_TITEL}}**
### 1.2 Beschreibung der Verarbeitung
{{VERARBEITUNG_BESCHREIBUNG}}
### 1.3 Umfang und Kontext
| Aspekt | Beschreibung |
|--------|--------------|
| **Umfang** | {{VERARBEITUNG_UMFANG}} |
| **Kontext** | {{VERARBEITUNG_KONTEXT}} |
| **Eingesetzte Verarbeitungsmittel** | {{VERARBEITUNGSMITTEL}} |
### 1.4 Zweck der Verarbeitung
{{ZWECK_VERARBEITUNG}}
### 1.5 Rechtsgrundlage
**Rechtsgrundlage:** {{RECHTSGRUNDLAGE}}
{{#IF RECHTSGRUNDLAGE_DETAILS}}
**Erlaeuterung:** {{RECHTSGRUNDLAGE_DETAILS}}
{{/IF}}
### 1.6 Verarbeitete Datenkategorien
{{DATENKATEGORIEN}}
### 1.7 Betroffene Personengruppen
{{BETROFFENENGRUPPEN}}
### 1.8 Empfaenger und Auftragsverarbeiter
{{EMPFAENGER}}
{{#IF DRITTLANDTRANSFER}}
### 1.9 Uebermittlung in Drittlaender
{{DRITTLANDTRANSFER}}
{{/IF}}
### 1.10 Speicherdauer und Loeschfristen
{{SPEICHERDAUER}}
{{#IF GEMEINSAME_VERANTWORTUNG_DETAILS}}
### 1.11 Gemeinsame Verantwortlichkeit (Art. 26 DS-GVO)
{{GEMEINSAME_VERANTWORTUNG_DETAILS}}
{{/IF}}
{{#IF AUFTRAGSVERARBEITER_DETAILS}}
### 1.12 Auftragsverarbeitung (Art. 28 DS-GVO)
{{AUFTRAGSVERARBEITER_DETAILS}}
{{/IF}}
---
## 2. Notwendigkeit und Verhaeltnismaessigkeit
### 2.1 Notwendigkeit der Verarbeitung
{{NOTWENDIGKEIT_BEWERTUNG}}
### 2.2 Verhaeltnismaessigkeit
{{VERHAELTNISMAESSIGKEIT_BEWERTUNG}}
### 2.3 Pruefung der Grundsaetze (Art. 5 DS-GVO)
| Grundsatz | Einhaltung | Nachweis |
|-----------|------------|----------|
| **Zweckbindung** (Art. 5 Abs. 1 lit. b) | Die Verarbeitung erfolgt ausschliesslich fuer die angegebenen Zwecke. | Siehe Abschnitt 1.4 |
| **Datenminimierung** (Art. 5 Abs. 1 lit. c) | {{DATENMINIMIERUNG_NACHWEIS}} | |
| **Richtigkeit** (Art. 5 Abs. 1 lit. d) | Verfahren zur Sicherstellung der Datenqualitaet sind implementiert. | |
| **Speicherbegrenzung** (Art. 5 Abs. 1 lit. e) | {{SPEICHERBEGRENZUNG_NACHWEIS}} | |
| **Integritaet und Vertraulichkeit** (Art. 5 Abs. 1 lit. f) | Technische und organisatorische Massnahmen gemaess Abschnitt 5 umgesetzt. | Siehe Abschnitt 5 |
### 2.4 Pruefung alternativer Verarbeitungsmoeglichkeiten
{{ALTERNATIVEN_GEPRUEFT}}
---
## 3. Risikobewertung
### 3.1 Methodik
{{RISIKO_METHODIK}}
Die Risikobewertung erfolgt anhand zweier Dimensionen:
- **Schwere des Schadens** fuer die Betroffenen (gering / ueberschaubar / substanziell / gross)
- **Eintrittswahrscheinlichkeit** (gering / mittel / hoch / sehr hoch)
| | Schwere: Gering | Schwere: Ueberschaubar | Schwere: Substanziell | Schwere: Gross |
|---|---|---|---|---|
| **Wahrscheinlichkeit: Sehr hoch** | Mittel | Hoch | Sehr hoch | Sehr hoch |
| **Wahrscheinlichkeit: Hoch** | Niedrig | Mittel | Hoch | Sehr hoch |
| **Wahrscheinlichkeit: Mittel** | Niedrig | Niedrig | Mittel | Hoch |
| **Wahrscheinlichkeit: Gering** | Niedrig | Niedrig | Niedrig | Mittel |
### 3.2 Identifizierte Risiken
{{RISIKEN_TABELLE}}
### 3.3 Gesamtrisikobewertung
{{GESAMT_RISIKO_NIVEAU}}
---
## 4. Konsultation der Betroffenen und Interessentraeger
### 4.1 Konsultation der Betroffenen (Art. 35 Abs. 9 DS-GVO)
{{#IF KONSULTATION_BETROFFENE}}
{{KONSULTATION_BETROFFENE}}
{{/IF}}
{{#IF_NOT KONSULTATION_BETROFFENE}}
Eine Konsultation der Betroffenen wurde nicht durchgefuehrt. Begruendung: [Bitte ergaenzen, z. B. Unverhaeltnismaessigkeit, Geheimhaltungsinteressen, fehlende Praktikabilitaet].
{{/IF_NOT}}
{{#IF KONSULTATION_BETRIEBSRAT}}
### 4.2 Beteiligung der Arbeitnehmervertretung
{{KONSULTATION_BETRIEBSRAT}}
{{/IF}}
---
## 5. Technische und organisatorische Massnahmen (TOM)
Die Massnahmen sind nach den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a) strukturiert.
### 5.1 Verfuegbarkeit
Ziel: Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.
{{TOM_VERFUEGBARKEIT}}
### 5.2 Integritaet
Ziel: Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.
{{TOM_INTEGRITAET}}
### 5.3 Vertraulichkeit
Ziel: Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.
{{TOM_VERTRAULICHKEIT}}
### 5.4 Nichtverkettung
Ziel: Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.
{{TOM_NICHTVERKETTUNG}}
### 5.5 Transparenz
Ziel: Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.
{{TOM_TRANSPARENZ}}
### 5.6 Intervenierbarkeit
Ziel: Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.
{{TOM_INTERVENIERBARKEIT}}
### 5.7 Datenminimierung
Ziel: Die Verarbeitung beschraenkt sich auf das erforderliche Mass.
{{TOM_DATENMINIMIERUNG}}
---
## 6. Stellungnahme des Datenschutzbeauftragten
### 6.1 Konsultation des DSB
{{DSB_STELLUNGNAHME}}
{{#IF DSB_DATUM}}
**Datum der Stellungnahme:** {{DSB_DATUM}}
{{/IF}}
### 6.2 Pruefung der Konsultationspflicht (Art. 36 DS-GVO)
Sofern das Restrisiko nach Umsetzung aller Massnahmen **hoch** bleibt, ist vor Beginn der Verarbeitung die zustaendige Aufsichtsbehoerde zu konsultieren (Art. 36 Abs. 1 DS-GVO).
{{#IF ART36_BEGRUENDUNG}}
{{ART36_BEGRUENDUNG}}
{{/IF}}
{{#IF_NOT ART36_BEGRUENDUNG}}
Nach Umsetzung der beschriebenen Massnahmen wird das Restrisiko als akzeptabel eingestuft. Eine Konsultation der Aufsichtsbehoerde ist nicht erforderlich.
{{/IF_NOT}}
---
## 7. Ergebnis und Ueberpruefungsplan
### 7.1 Ergebnis der DSFA
{{DSFA_ERGEBNIS}}
### 7.2 Restrisikobewertung
{{RESTRISIKO_BEWERTUNG}}
### 7.3 Ueberpruefungsplan
| Aspekt | Festlegung |
|--------|------------|
| **Regelmaessiges Ueberpruefungsintervall** | {{UEBERPRUFUNGSINTERVALL}} |
| **Naechste geplante Ueberpruefung** | {{NAECHSTE_UEBERPRUFUNG}} |
### 7.4 Trigger fuer ausserplanmaessige Ueberpruefung
{{AENDERUNGSTRIGGER}}
---
{{#IF KI_SYSTEME_DETAILS}}
## 8. KI-spezifisches Modul (EU AI Act)
Dieses Kapitel ist relevant, da KI-Systeme in der beschriebenen Verarbeitung eingesetzt werden.
### 8.1 Eingesetzte KI-Systeme
{{KI_SYSTEME_DETAILS}}
### 8.2 Grundrechtliche Folgenabschaetzung (Art. 27 KI-VO)
{{KI_GRUNDRECHTSPRUEFUNG}}
{{/IF}}
---
## Unterschriften
| Rolle | Name | Datum | Unterschrift |
|-------|------|-------|--------------|
| Erstellt von | {{ERSTELLT_VON}} | {{ERSTELLT_AM}} | _________________ |
{{#IF GENEHMIGT_VON}}| Datenschutzbeauftragter | {{GENEHMIGT_VON}} | {{GENEHMIGT_AM}} | _________________ |
{{/IF}}
| Verantwortlicher | | | _________________ |
---
*Erstellt mit BreakPilot Compliance. Dieses Dokument ist vertraulich und nur fuer den internen Gebrauch bestimmt.*
$template$
) ON CONFLICT DO NOTHING;
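The 4x4 matrix in section 3.1 of the template maps Eintrittswahrscheinlichkeit and Schwere to an overall risk level. If that lookup is ever automated, a plain nested dict suffices; a hypothetical sketch (the values are taken verbatim from the table above, the function name is illustrative):

```python
# Risk matrix from section 3.1 of the DSFA template:
# outer key = likelihood (Eintrittswahrscheinlichkeit),
# inner key = severity (Schwere des Schadens).
RISK_MATRIX = {
    "sehr hoch": {"gering": "Mittel",  "ueberschaubar": "Hoch",    "substanziell": "Sehr hoch", "gross": "Sehr hoch"},
    "hoch":      {"gering": "Niedrig", "ueberschaubar": "Mittel",  "substanziell": "Hoch",      "gross": "Sehr hoch"},
    "mittel":    {"gering": "Niedrig", "ueberschaubar": "Niedrig", "substanziell": "Mittel",    "gross": "Hoch"},
    "gering":    {"gering": "Niedrig", "ueberschaubar": "Niedrig", "substanziell": "Niedrig",   "gross": "Mittel"},
}

def risk_level(likelihood: str, severity: str) -> str:
    """Look up the overall risk level; raises KeyError on unknown inputs."""
    return RISK_MATRIX[likelihood.lower().strip()][severity.lower().strip()]

print(risk_level("hoch", "substanziell"))  # Hoch
```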


@@ -0,0 +1,247 @@
-- Migration 002: TOM Template V2 — nach SDM-Gewaehrleistungszielen
-- Archiviert V1 und fuegt SDM-strukturierte TOM-Dokumentation ein.
-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'tom_documentation'
AND status = 'published';
-- 2. TOM V2 einfuegen
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'tom_documentation',
'Technische und Organisatorische Massnahmen (TOM) nach SDM V3.1a',
'TOM-Dokumentation strukturiert nach den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a). Mit sektorspezifischen Ergaenzungen und Compliance-Bewertung.',
'de',
'EU/DSGVO',
'2.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{ORGANISATION_NAME}}",
"{{ORGANISATION_ADRESSE}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{ERSTELLT_VON}}",
"{{ERSTELLT_AM}}",
"{{VERSION}}",
"{{GELTUNGSBEREICH}}",
"{{SCHUTZBEDARF_VERTRAULICHKEIT}}",
"{{SCHUTZBEDARF_INTEGRITAET}}",
"{{SCHUTZBEDARF_VERFUEGBARKEIT}}",
"{{GESAMTSCHUTZNIVEAU}}",
"{{TOM_VERFUEGBARKEIT}}",
"{{TOM_INTEGRITAET}}",
"{{TOM_VERTRAULICHKEIT}}",
"{{TOM_NICHTVERKETTUNG}}",
"{{TOM_TRANSPARENZ}}",
"{{TOM_INTERVENIERBARKEIT}}",
"{{TOM_DATENMINIMIERUNG}}",
"{{TOM_SEKTOR_ERGAENZUNGEN}}",
"{{COMPLIANCE_BEWERTUNG}}",
"{{NAECHSTE_UEBERPRUFUNG}}",
"{{UEBERPRUFUNGSINTERVALL}}"
]' AS jsonb),
$template$# Technische und Organisatorische Massnahmen (TOM)
**gemaess Art. 32 DS-GVO strukturiert nach SDM V3.1a**
---
## 1. Allgemeine Informationen
| Feld | Inhalt |
|------|--------|
| **Organisation** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} ({{DSB_KONTAKT}}) |
| **Erstellt von** | {{ERSTELLT_VON}} |
| **Erstellt am** | {{ERSTELLT_AM}} |
| **Version** | {{VERSION}} |
### 1.1 Geltungsbereich
{{GELTUNGSBEREICH}}
---
## 2. Schutzbedarfsanalyse
Die Schutzbedarfsanalyse bildet die Grundlage fuer die Auswahl angemessener Massnahmen. Der Schutzbedarf wird fuer die drei klassischen Schutzziele bewertet.
| Schutzziel | Schutzbedarf | Begruendung |
|------------|-------------|-------------|
| **Vertraulichkeit** | {{SCHUTZBEDARF_VERTRAULICHKEIT}} | |
| **Integritaet** | {{SCHUTZBEDARF_INTEGRITAET}} | |
| **Verfuegbarkeit** | {{SCHUTZBEDARF_VERFUEGBARKEIT}} | |
**Gesamtschutzniveau:** {{GESAMTSCHUTZNIVEAU}}
*Bewertungsskala: normal / hoch / sehr hoch*
---
## 3. Massnahmen nach SDM-Gewaehrleistungszielen
Die folgende Struktur folgt den sieben Gewaehrleistungszielen des Standard-Datenschutzmodells (SDM V3.1a) der Datenschutzkonferenz.
### 3.1 Verfuegbarkeit
**Ziel:** Personenbezogene Daten stehen zeitgerecht zur Verfuegung und koennen ordnungsgemaess verarbeitet werden.
**Referenz:** SDM-Baustein 11 (Aufbewahren)
{{TOM_VERFUEGBARKEIT}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Redundante Datenhaltung (RAID, Replikation) | technisch | | IT-Betrieb | 12 Monate |
| Regelmaessige Backups (taeglich inkrementell, woechentlich voll) | technisch | | IT-Betrieb | 6 Monate |
| Disaster-Recovery-Plan mit dokumentierten RTO/RPO | organisatorisch | | IT-Sicherheit | 12 Monate |
| USV und Notstromversorgung | technisch | | Facility Mgmt | 12 Monate |
| Wiederherstellungstests (mind. jaehrlich) | organisatorisch | | IT-Betrieb | 12 Monate |
### 3.2 Integritaet
**Ziel:** Personenbezogene Daten bleiben waehrend der Verarbeitung unversehrt, vollstaendig und aktuell.
**Referenz:** SDM-Baustein 61 (Berichtigen)
{{TOM_INTEGRITAET}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Pruefsummen und digitale Signaturen | technisch | | IT-Entwicklung | 12 Monate |
| Eingabevalidierung und Plausibilitaetspruefungen | technisch | | IT-Entwicklung | bei Release |
| Change-Management-Verfahren | organisatorisch | | IT-Betrieb | 12 Monate |
| Versionierung von Datensaetzen | technisch | | IT-Entwicklung | 12 Monate |
### 3.3 Vertraulichkeit
**Ziel:** Nur befugte Personen koennen personenbezogene Daten zur Kenntnis nehmen.
**Referenz:** SDM-Baustein 51 (Zugriffe regeln)
{{TOM_VERTRAULICHKEIT}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Verschluesselung im Transit (TLS 1.3) | technisch | | IT-Sicherheit | 12 Monate |
| Verschluesselung at Rest (AES-256) | technisch | | IT-Sicherheit | 12 Monate |
| Rollenbasiertes Zugriffskonzept (RBAC, Least Privilege) | technisch | | IT-Sicherheit | 6 Monate |
| Multi-Faktor-Authentifizierung (MFA) | technisch | | IT-Sicherheit | 12 Monate |
| Physische Zutrittskontrolle (Schluessel, Kartenleser) | technisch | | Facility Mgmt | 12 Monate |
| Vertraulichkeitsverpflichtung Mitarbeitende | organisatorisch | | HR / DSB | bei Eintritt |
| Passwortrichtlinie (Komplexitaet, Ablauf, Historie) | organisatorisch | | IT-Sicherheit | 12 Monate |
### 3.4 Nichtverkettung
**Ziel:** Personenbezogene Daten werden nur fuer den Zweck verarbeitet, zu dem sie erhoben wurden.
**Referenz:** SDM-Baustein 50 (Trennen)
{{TOM_NICHTVERKETTUNG}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Mandantentrennung (logisch oder physisch) | technisch | | IT-Architektur | 12 Monate |
| Pseudonymisierung wo fachlich moeglich | technisch | | IT-Entwicklung | 12 Monate |
| Zweckbindungspruefung bei neuen Datennutzungen | organisatorisch | | DSB | bei Bedarf |
| Getrennte Datenbanken je Verarbeitungszweck | technisch | | IT-Architektur | 12 Monate |
### 3.5 Transparenz
**Ziel:** Betroffene, der Verantwortliche und die Aufsichtsbehoerde koennen die Verarbeitung nachvollziehen.
**Referenz:** SDM-Baustein 42 (Dokumentieren), SDM-Baustein 43 (Protokollieren)
{{TOM_TRANSPARENZ}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Verzeichnis der Verarbeitungstaetigkeiten (Art. 30) | organisatorisch | | DSB | 12 Monate |
| Vollstaendiges Audit-Log aller Datenzugriffe | technisch | | IT-Betrieb | 6 Monate |
| Datenschutzerklaerung (Art. 13/14 DS-GVO) | organisatorisch | | DSB / Recht | bei Aenderung |
| Dokumentierte Prozesse fuer Datenpannen-Meldung | organisatorisch | | DSB | 12 Monate |
### 3.6 Intervenierbarkeit
**Ziel:** Betroffenenrechte (Auskunft, Berichtigung, Loeschung, Widerspruch) koennen wirksam ausgeuebt werden.
**Referenz:** SDM-Baustein 60 (Loeschen), SDM-Baustein 61 (Berichtigen), SDM-Baustein 62 (Einschraenken)
{{TOM_INTERVENIERBARKEIT}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Prozess fuer Betroffenenanfragen (Auskunft, Loeschung, Berichtigung) | organisatorisch | | DSB | 12 Monate |
| Technische Loeschfaehigkeit mit Nachweis | technisch | | IT-Entwicklung | 12 Monate |
| Datenexport in maschinenlesbarem Format (Art. 20) | technisch | | IT-Entwicklung | 12 Monate |
| Sperrfunktion (Einschraenkung der Verarbeitung) | technisch | | IT-Entwicklung | 12 Monate |
| Widerspruchsmoeglichkeit gegen Verarbeitung | organisatorisch | | DSB | 12 Monate |
### 3.7 Datenminimierung
**Ziel:** Die Verarbeitung beschraenkt sich auf das erforderliche Mass.
**Referenz:** SDM-Baustein 41 (Planen und Spezifizieren)
{{TOM_DATENMINIMIERUNG}}
| Massnahme | Typ | Status | Verantwortlich | Pruefintervall |
|-----------|-----|--------|----------------|----------------|
| Regelmaessige Pruefung der Erforderlichkeit | organisatorisch | | DSB | 12 Monate |
| Automatisierte Loeschung nach Fristablauf | technisch | | IT-Entwicklung | 6 Monate |
| Anonymisierung fuer statistische Zwecke | technisch | | IT-Entwicklung | bei Bedarf |
| Privacy by Design bei neuen Verarbeitungen | organisatorisch | | IT-Architektur / DSB | bei Bedarf |
| Loeschfristenkatalog (dokumentiert) | organisatorisch | | DSB / Recht | 12 Monate |
---
## 4. Sektorspezifische Ergaenzungen
{{#IF TOM_SEKTOR_ERGAENZUNGEN}}
{{TOM_SEKTOR_ERGAENZUNGEN}}
{{/IF}}
{{#IF_NOT TOM_SEKTOR_ERGAENZUNGEN}}
Keine sektorspezifischen Ergaenzungen erforderlich.
{{/IF_NOT}}
---
## 5. Compliance-Bewertung
{{#IF COMPLIANCE_BEWERTUNG}}
{{COMPLIANCE_BEWERTUNG}}
{{/IF}}
{{#IF_NOT COMPLIANCE_BEWERTUNG}}
Die Compliance-Bewertung erfolgt nach erstmaliger Implementierung aller Massnahmen.
{{/IF_NOT}}
---
## 6. Ueberpruefungsplan
| Aspekt | Festlegung |
|--------|------------|
| **Regelmaessige Ueberpruefung** | {{UEBERPRUFUNGSINTERVALL}} |
| **Naechste geplante Ueberpruefung** | {{NAECHSTE_UEBERPRUFUNG}} |
**Trigger fuer ausserplanmaessige Ueberpruefung:**
- Sicherheitsvorfall oder Datenpanne
- Wesentliche Aenderung der Verarbeitungssysteme
- Neue regulatorische Anforderungen (z. B. NIS2, AI Act)
- Ergebnisse interner oder externer Audits
---
*Erstellt mit BreakPilot Compliance. Struktur basiert auf dem Standard-Datenschutzmodell (SDM V3.1a) der Datenschutzkonferenz.*
$template$
) ON CONFLICT DO NOTHING;
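Both the DSFA and TOM templates above rely on `{{KEY}}` substitution plus non-nested `{{#IF KEY}}…{{/IF}}` and `{{#IF_NOT KEY}}…{{/IF_NOT}}` blocks. The rendering engine itself is not part of these migrations; a minimal sketch of how such conditionals could be resolved with regular expressions, under the assumption that a key counts as set when its value is a non-empty string:

```python
import re

def render(template: str, values: dict) -> str:
    """Hypothetical renderer: resolve non-nested {{#IF}}/{{#IF_NOT}}
    blocks first, then substitute plain {{KEY}} placeholders."""
    def is_set(key: str) -> bool:
        return bool(str(values.get(key, "")).strip())

    # {{#IF KEY}}...{{/IF}} -- kept only when the key is set.
    template = re.sub(
        r"\{\{#IF (\w+)\}\}(.*?)\{\{/IF\}\}",
        lambda m: m.group(2) if is_set(m.group(1)) else "",
        template, flags=re.DOTALL)
    # {{#IF_NOT KEY}}...{{/IF_NOT}} -- kept only when the key is empty.
    template = re.sub(
        r"\{\{#IF_NOT (\w+)\}\}(.*?)\{\{/IF_NOT\}\}",
        lambda m: "" if is_set(m.group(1)) else m.group(2),
        template, flags=re.DOTALL)
    # Unknown keys stay visible as {{KEY}} so gaps are easy to spot in review.
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)

tpl = ("# TOM {{ORGANISATION_NAME}}\n"
       "{{#IF COMPLIANCE_BEWERTUNG}}{{COMPLIANCE_BEWERTUNG}}{{/IF}}"
       "{{#IF_NOT COMPLIANCE_BEWERTUNG}}Bewertung folgt nach Implementierung.{{/IF_NOT}}")
print(render(tpl, {"ORGANISATION_NAME": "ACME GmbH"}))
```

Because conditionals are resolved before placeholder substitution, a `{{KEY}}` inside a kept `{{#IF}}` block is still filled in by the final pass.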


@@ -0,0 +1,663 @@
-- Migration 003: VVT Sector Templates — Branchenspezifische Verarbeitungsverzeichnisse
-- 6 Branchen-Muster + 1 allgemeine V2-Vorlage
-- 1. Bestehende V1 archivieren
UPDATE compliance.compliance_legal_templates
SET status = 'archived', updated_at = NOW()
WHERE document_type = 'vvt_register'
AND status = 'published';
-- 2. Allgemeine VVT V2 Vorlage (branchenuebergreifend)
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'Verzeichnis von Verarbeitungstaetigkeiten (VVT) gemaess Art. 30 DS-GVO — V2',
'Erweiterte VVT-Vorlage mit vollstaendiger Art. 30 Struktur, Loeschfristen-Integration und DSFA-Verweis. Branchenuebergreifend einsetzbar.',
'de',
'EU/DSGVO',
'2.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{ORGANISATION_NAME}}",
"{{ORGANISATION_ADRESSE}}",
"{{VERTRETER_NAME}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{ERSTELLT_AM}}",
"{{VERSION}}",
"{{VVT_NR}}",
"{{VERARBEITUNG_NAME}}",
"{{VERARBEITUNG_BESCHREIBUNG}}",
"{{ZWECKE}}",
"{{RECHTSGRUNDLAGEN}}",
"{{BETROFFENE}}",
"{{DATENKATEGORIEN}}",
"{{EMPFAENGER}}",
"{{DRITTLAND}}",
"{{DRITTLAND_GARANTIEN}}",
"{{LOESCHFRISTEN}}",
"{{TOM_REFERENZ}}",
"{{SYSTEME}}",
"{{VERANTWORTLICHER}}",
"{{RISIKOBEWERTUNG}}",
"{{DSFA_ERFORDERLICH}}",
"{{LETZTE_PRUEFUNG}}",
"{{NAECHSTE_PRUEFUNG}}",
"{{STATUS}}"
]' AS jsonb),
$template$# Verzeichnis von Verarbeitungstaetigkeiten (VVT)
**gemaess Art. 30 DS-GVO**
---
## Angaben zum Verantwortlichen
| Feld | Inhalt |
|------|--------|
| **Name / Firma** | {{ORGANISATION_NAME}} |
| **Adresse** | {{ORGANISATION_ADRESSE}} |
| **Vertreter des Verantwortlichen** | {{VERTRETER_NAME}} |
| **Datenschutzbeauftragter** | {{DSB_NAME}} ({{DSB_KONTAKT}}) |
| **Stand** | {{ERSTELLT_AM}} |
| **Version** | {{VERSION}} |
---
## Verarbeitungstaetigkeit
### Stammdaten
| Pflichtfeld (Art. 30) | Inhalt |
|------------------------|--------|
| **VVT-Nr.** | {{VVT_NR}} |
| **Bezeichnung** | {{VERARBEITUNG_NAME}} |
| **Beschreibung** | {{VERARBEITUNG_BESCHREIBUNG}} |
### Zweck und Rechtsgrundlage
| Pflichtfeld | Inhalt |
|-------------|--------|
| **Zweck(e) der Verarbeitung** | {{ZWECKE}} |
| **Rechtsgrundlage(n)** | {{RECHTSGRUNDLAGEN}} |
### Betroffene und Daten
| Pflichtfeld | Inhalt |
|-------------|--------|
| **Kategorien betroffener Personen** | {{BETROFFENE}} |
| **Kategorien personenbezogener Daten** | {{DATENKATEGORIEN}} |
### Empfaenger und Uebermittlung
| Pflichtfeld | Inhalt |
|-------------|--------|
| **Kategorien von Empfaengern** | {{EMPFAENGER}} |
{{#IF DRITTLAND}}
| **Uebermittlung in Drittlaender** | {{DRITTLAND}} |
| **Geeignete Garantien (Art. 46)** | {{DRITTLAND_GARANTIEN}} |
{{/IF}}
### Fristen und Schutzmassnahmen
| Pflichtfeld | Inhalt |
|-------------|--------|
| **Loeschfristen** | {{LOESCHFRISTEN}} |
| **TOM-Beschreibung (Art. 32)** | {{TOM_REFERENZ}} |
### Zusaetzliche Angaben (empfohlen)
| Feld | Inhalt |
|------|--------|
| **Eingesetzte Systeme** | {{SYSTEME}} |
| **Verantwortliche Abteilung** | {{VERANTWORTLICHER}} |
| **Risikobewertung** | {{RISIKOBEWERTUNG}} |
| **DSFA erforderlich?** | {{DSFA_ERFORDERLICH}} |
| **Letzte Pruefung** | {{LETZTE_PRUEFUNG}} |
| **Naechste Pruefung** | {{NAECHSTE_PRUEFUNG}} |
| **Status** | {{STATUS}} |
---
*Erstellt mit BreakPilot Compliance. Struktur entspricht Art. 30 Abs. 1 DS-GVO.*
$template$
) ON CONFLICT DO NOTHING;
-- 3. VVT Branchenvorlage: IT / SaaS
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: IT / SaaS-Unternehmen',
'Vorbefuelltes Verarbeitungsverzeichnis mit typischen Verarbeitungstaetigkeiten eines IT- oder SaaS-Unternehmens. Enthaelt 8 Standard-Verarbeitungen.',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: IT / SaaS-Unternehmen
Die folgenden Verarbeitungstaetigkeiten sind typisch fuer IT- und SaaS-Unternehmen. Bitte pruefen und an Ihre konkrete Situation anpassen.
---
## VVT-001: SaaS-Plattformbetrieb
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bereitstellung und Betrieb der SaaS-Plattform fuer Kunden |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Endnutzer der Plattform |
| **Datenkategorien** | Stammdaten, Nutzungsdaten, Inhaltsdaten, technische Logdaten |
| **Empfaenger** | Hosting-Anbieter (AVV), Support-Dienstleister (AVV) |
| **Loeschfrist** | 90 Tage nach Vertragsende + gesetzliche Aufbewahrungsfristen |
| **TOM** | Siehe TOM-Dokumentation: Mandantentrennung, Verschluesselung, RBAC |
| **DSFA erforderlich?** | Abhaengig von Art und Umfang der verarbeiteten Daten |
## VVT-002: Kundenverwaltung / CRM
| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Kundenbeziehungen, Vertragsmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Ansprechpartner, Interessenten |
| **Datenkategorien** | Kontaktdaten, Vertragsdaten, Kommunikationshistorie |
| **Empfaenger** | CRM-Anbieter (AVV), ggf. Vertriebspartner |
| **Loeschfrist** | 3 Jahre nach letztem Kontakt (Verjaehrung), 10 Jahre Rechnungsdaten (HGB/AO) |
| **TOM** | Zugriffsbeschraenkung auf Vertrieb/Support, Protokollierung |
## VVT-003: E-Mail-Marketing / Newsletter
| Feld | Inhalt |
|------|--------|
| **Zweck** | Versand von Produkt-Updates, Marketing-Newsletter |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung) + UWG §7 |
| **Betroffene** | Newsletter-Abonnenten |
| **Datenkategorien** | E-Mail-Adresse, Name, Oeffnungs-/Klickverhalten |
| **Empfaenger** | E-Mail-Dienstleister (AVV) |
| **Loeschfrist** | Unverzueglich nach Widerruf der Einwilligung |
| **TOM** | Double-Opt-In, einfache Abmeldefunktion |
## VVT-004: Webanalyse
| Feld | Inhalt |
|------|--------|
| **Zweck** | Analyse der Website-Nutzung zur Verbesserung des Angebots |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung via Cookie-Banner) |
| **Betroffene** | Website-Besucher |
| **Datenkategorien** | IP-Adresse (anonymisiert), Seitenaufrufe, Verweildauer, Geraeteinformationen |
| **Empfaenger** | Analyse-Anbieter (AVV) |
| **Loeschfrist** | 14 Monate (max. Cookie-Laufzeit) |
| **TOM** | IP-Anonymisierung, Cookie-Consent-Management (TDDDG §25) |
## VVT-005: Bewerbermanagement
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Bewerbungen, Auswahlverfahren |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO i.V.m. §26 BDSG (Beschaeftigungsverhaeltnis) |
| **Betroffene** | Bewerberinnen und Bewerber |
| **Datenkategorien** | Kontaktdaten, Lebenslauf, Qualifikationen, Bewerbungsunterlagen |
| **Empfaenger** | Fachabteilung, ggf. Personaldienstleister (AVV) |
| **Loeschfrist** | 6 Monate nach Abschluss des Verfahrens (AGG-Frist) |
| **TOM** | Zugriffsschutz auf Bewerbungsportal, verschluesselte Uebertragung |
## VVT-006: Mitarbeiterverwaltung / HR
| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Lohn-/Gehaltsabrechnung, Arbeitszeiterfassung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Sozialversicherung, Arbeitszeitdaten |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Sozialversicherungstraeger |
| **Loeschfrist** | 10 Jahre nach Austritt (steuerliche Aufbewahrung), Personalakte 3 Jahre |
| **TOM** | Besonderer Zugriffsschutz (nur HR), verschluesselte Speicherung |
## VVT-007: Support-Ticketing
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Kundenanfragen und Stoerungsmeldungen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden, Endnutzer |
| **Datenkategorien** | Kontaktdaten, Ticket-Inhalt, Screenshots, Systemlogs |
| **Empfaenger** | Support-Tool-Anbieter (AVV), ggf. Entwicklungsteam |
| **Loeschfrist** | 2 Jahre nach Ticket-Schliessung |
| **TOM** | Rollenbasierter Zugriff, Pseudonymisierung in internen Reports |
## VVT-008: Logging und Monitoring
| Feld | Inhalt |
|------|--------|
| **Zweck** | Sicherheitsueberwachung, Fehleranalyse, Leistungsoptimierung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: IT-Sicherheit) |
| **Betroffene** | Nutzer der Plattform, Administratoren |
| **Datenkategorien** | IP-Adressen, Zugriffszeitpunkte, Fehlerprotokolle, Performance-Metriken |
| **Empfaenger** | Log-Management-Anbieter (AVV) |
| **Loeschfrist** | 30 Tage Anwendungslogs, 90 Tage Sicherheitslogs |
| **TOM** | Zugriffsschutz auf Logdaten, automatische Rotation |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage IT / SaaS.*
$template$
) ON CONFLICT DO NOTHING;
-- 4. VVT Branchenvorlage: Gesundheitswesen
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: Gesundheitswesen',
'Vorbefuelltes Verarbeitungsverzeichnis mit typischen Verarbeitungen im Gesundheitswesen (Arztpraxis, MVZ, Klinik). Beruecksichtigt Art. 9 DS-GVO besondere Kategorien.',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: Gesundheitswesen
Typische Verarbeitungstaetigkeiten fuer Arztpraxen, MVZ und Kliniken. **Besonderheit:** Verarbeitung besonderer Kategorien personenbezogener Daten (Art. 9 DS-GVO Gesundheitsdaten).
---
## VVT-G01: Patientenverwaltung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Fuehrung der Patientenakte, Behandlungsdokumentation |
| **Rechtsgrundlage** | Art. 9 Abs. 2 lit. h DS-GVO i.V.m. §630f BGB (Dokumentationspflicht) |
| **Betroffene** | Patienten |
| **Datenkategorien** | Stammdaten, Versicherungsdaten, Diagnosen, Befunde, Behandlungsverlaeufe (Art. 9) |
| **Empfaenger** | Praxisverwaltungssystem-Anbieter (AVV), Labor (AVV), ueberweisende Aerzte |
| **Loeschfrist** | 10 Jahre nach letzter Behandlung (§630f Abs. 3 BGB), Strahlenpass 30 Jahre |
| **TOM** | Verschluesselung Patientenakte, Zugriffsschutz (nur behandelnde Aerzte), Notfallzugriff |
| **DSFA erforderlich?** | Ja (umfangreiche Verarbeitung Art. 9 Daten) |
## VVT-G02: Terminmanagement
| Feld | Inhalt |
|------|--------|
| **Zweck** | Organisation und Verwaltung von Patienten-Terminen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Behandlungsvertrag) |
| **Betroffene** | Patienten |
| **Datenkategorien** | Name, Kontaktdaten, Terminwunsch, ggf. Behandlungsgrund |
| **Empfaenger** | Online-Terminbuchungs-Anbieter (AVV) |
| **Loeschfrist** | 6 Monate nach Termin (sofern nicht zur Patientenakte) |
| **TOM** | Verschluesselte Uebertragung, Zugriffsschutz Terminkalender |
## VVT-G03: Abrechnung (KV / PKV)
| Feld | Inhalt |
|------|--------|
| **Zweck** | Abrechnung aerztlicher Leistungen gegenueber Krankenkassen / Privatpatienten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c DS-GVO (gesetzliche Pflicht), Art. 9 Abs. 2 lit. h |
| **Betroffene** | Patienten |
| **Datenkategorien** | Stammdaten, Versicherungsdaten, Diagnosen (ICD), Leistungsziffern (EBM/GOAe) |
| **Empfaenger** | KV (Kassenaerztliche Vereinigung), PKV, Abrechnungsstelle (AVV) |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung AO) |
| **TOM** | Verschluesselte Datenuebermittlung (KV-Connect/KIM), Zugriffskontrolle |
## VVT-G04: Laborbefunde
| Feld | Inhalt |
|------|--------|
| **Zweck** | Beauftragung und Empfang von Laboruntersuchungen |
| **Rechtsgrundlage** | Art. 9 Abs. 2 lit. h DS-GVO |
| **Betroffene** | Patienten |
| **Datenkategorien** | Proben-ID, Untersuchungsparameter, Befundergebnisse (Art. 9) |
| **Empfaenger** | Labordienstleister (AVV) |
| **Loeschfrist** | 10 Jahre (Dokumentationspflicht) |
| **TOM** | Pseudonymisierung der Proben, verschluesselte Uebertragung |
## VVT-G05: Mitarbeiterverwaltung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Dienstplanung, Lohnabrechnung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte (Aerzte, MFA, Verwaltung) |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Dienstzeiten |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Sozialversicherungstraeger |
| **Loeschfrist** | 10 Jahre nach Austritt |
| **TOM** | Zugriffsschutz (nur HR/Praxisleitung) |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage Gesundheitswesen.*
$template$
) ON CONFLICT DO NOTHING;
-- 5. VVT Branchenvorlage: Handel / E-Commerce
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: Handel / E-Commerce',
'Vorbefuelltes Verarbeitungsverzeichnis fuer Online-Shops und Einzelhaendler. Beruecksichtigt TDDDG, Fernabsatzrecht und Zahlungsdienste.',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: Handel / E-Commerce
Typische Verarbeitungstaetigkeiten fuer Online-Shops und Einzelhandel.
---
## VVT-H01: Bestellabwicklung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bestellannahme, Versand, Rechnungsstellung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden (Besteller) |
| **Datenkategorien** | Kontaktdaten, Lieferadresse, Bestelldaten, Rechnungsdaten |
| **Empfaenger** | Versanddienstleister, Zahlungsanbieter (AVV), Warenwirtschaft |
| **Loeschfrist** | 10 Jahre Rechnungsdaten (AO/HGB), 3 Jahre Bestelldaten (Verjaehrung) |
| **TOM** | Verschluesselte Uebertragung, Zugriffsschutz Bestellsystem |
## VVT-H02: Kundenkonto
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bereitstellung eines Kundenkontos (optional, nicht Pflicht) |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a/b DS-GVO |
| **Betroffene** | Registrierte Kunden |
| **Datenkategorien** | Stammdaten, Passwort (gehasht), Bestellhistorie, Wunschliste |
| **Empfaenger** | Shop-Plattform-Anbieter (AVV) |
| **Loeschfrist** | Unverzueglich nach Kontoloesch-Anfrage, Rechnungsdaten 10 Jahre |
| **TOM** | MFA-Option, sichere Passwortspeicherung (bcrypt), Gastzugang-Alternative |
## VVT-H03: Zahlungsabwicklung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Abwicklung von Zahlungsvorgaengen |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Zahlende Kunden |
| **Datenkategorien** | Zahlungsart, Transaktionsdaten (keine Kartennummern bei Tokenisierung) |
| **Empfaenger** | Payment-Service-Provider (eigene Verantwortung oder AVV) |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung) |
| **TOM** | PCI-DSS Compliance, Tokenisierung, keine direkte Kartenspeicherung |
## VVT-H04: Newsletter / E-Mail-Marketing
| Feld | Inhalt |
|------|--------|
| **Zweck** | Versand von Angeboten und Produktneuheiten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung) + UWG §7 Abs. 3 (Bestandskunden) |
| **Betroffene** | Newsletter-Abonnenten |
| **Datenkategorien** | E-Mail-Adresse, Name, Kaufhistorie (Bestandskunden), Oeffnungsraten |
| **Empfaenger** | Newsletter-Dienstleister (AVV) |
| **Loeschfrist** | Sofort nach Abmeldung |
| **TOM** | Double-Opt-In, Abmeldelink in jeder E-Mail |
## VVT-H05: Webanalyse und Tracking
| Feld | Inhalt |
|------|--------|
| **Zweck** | Analyse des Nutzerverhaltens im Shop, Conversion-Optimierung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. a DS-GVO (Einwilligung, TDDDG §25) |
| **Betroffene** | Website-Besucher |
| **Datenkategorien** | Anonymisierte IP, Seitenaufrufe, Klickpfade, Warenkorbdaten |
| **Empfaenger** | Analyse-Anbieter (AVV) |
| **Loeschfrist** | 14 Monate |
| **TOM** | IP-Anonymisierung, Cookie-Consent-Management, Opt-Out |
## VVT-H06: Retouren und Widerruf
| Feld | Inhalt |
|------|--------|
| **Zweck** | Bearbeitung von Retouren und Widerrufen (Fernabsatzrecht) |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO |
| **Betroffene** | Kunden (Verbraucher) |
| **Datenkategorien** | Bestelldaten, Retourengrund, Erstattungsdaten |
| **Empfaenger** | Logistikdienstleister, Zahlungsanbieter |
| **Loeschfrist** | 3 Jahre (Verjaehrung), Buchhaltung 10 Jahre |
| **TOM** | Nachvollziehbare Retourenprozesse, Zugriffsbeschraenkung |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage Handel / E-Commerce.*
$template$
) ON CONFLICT DO NOTHING;
-- 6. VVT Branchenvorlage: Handwerk
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: Handwerksbetrieb',
'Vorbefuelltes Verarbeitungsverzeichnis fuer Handwerksbetriebe (Bau, Kfz, Elektro, etc.).',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: Handwerksbetrieb
Typische Verarbeitungstaetigkeiten fuer Handwerksbetriebe.
---
## VVT-HW01: Kundenauftraege und Angebotserstellung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Angebotserstellung, Auftragsabwicklung, Rechnungsstellung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Kunden (Privat und Gewerbe) |
| **Datenkategorien** | Kontaktdaten, Objektadresse, Auftragsbeschreibung, Rechnungsdaten |
| **Empfaenger** | Buchhaltung, Steuerberater, ggf. Subunternehmer |
| **Loeschfrist** | 10 Jahre Rechnungen (AO/HGB), 5 Jahre Gewaehrleistung (BGB) |
| **TOM** | Zugriffskontrolle Auftragssystem, verschluesselte Speicherung |
## VVT-HW02: Mitarbeiterverwaltung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Personalverwaltung, Lohnabrechnung, Arbeitszeiterfassung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO i.V.m. §26 BDSG |
| **Betroffene** | Beschaeftigte, Auszubildende |
| **Datenkategorien** | Stammdaten, Vertragsdaten, Bankverbindung, Arbeitszeiten, Gesundheitszeugnisse |
| **Empfaenger** | Lohnbuero (AVV), Finanzamt, Berufsgenossenschaft |
| **Loeschfrist** | 10 Jahre nach Austritt |
| **TOM** | Verschlossene Personalakte, Zugriffsschutz |
## VVT-HW03: Baustellendokumentation
| Feld | Inhalt |
|------|--------|
| **Zweck** | Dokumentation von Baufortschritt, Maengelprotokoll |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/f DS-GVO (Vertrag + berechtigtes Interesse) |
| **Betroffene** | Kunden, Mitarbeitende auf der Baustelle |
| **Datenkategorien** | Fotos (ggf. mit Personen), Protokolle, Abnahmedokumente |
| **Empfaenger** | Auftraggeber, Architekten, Baugutachter |
| **Loeschfrist** | 5 Jahre nach Abnahme (Verjaehrung), Fotos nach Projektabschluss |
| **TOM** | Beschraenkter Zugriff auf Projektordner, keine oeffentliche Cloud ohne AVV |
## VVT-HW04: Materialwirtschaft
| Feld | Inhalt |
|------|--------|
| **Zweck** | Materialbeschaffung, Lagerverwaltung, Lieferantenmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Lieferanten (Ansprechpartner) |
| **Datenkategorien** | Firmendaten, Ansprechpartner, Bestellhistorie, Konditionen |
| **Empfaenger** | Grosshandel, Buchhaltung |
| **Loeschfrist** | 6 Jahre (Handelsbriefe HGB), 10 Jahre Rechnungen |
| **TOM** | Zugriffskontrolle ERP/Warenwirtschaft |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage Handwerksbetrieb.*
$template$
) ON CONFLICT DO NOTHING;
-- 7. VVT Branchenvorlage: Bildung
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: Bildungseinrichtung',
'Vorbefuelltes Verarbeitungsverzeichnis fuer Schulen, Hochschulen und Bildungstraeger. Beruecksichtigt Schueler-/Studentendaten als schutzbeduerftige Betroffene.',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: Bildungseinrichtung
Typische Verarbeitungstaetigkeiten fuer Schulen, Hochschulen und Bildungstraeger.
---
## VVT-B01: Schueler-/Studierendenverwaltung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Schueler-/Studierendendaten, Anmeldung, Klassenzuordnung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Landesschulgesetz |
| **Betroffene** | Schueler/Studierende (ggf. Minderjaehrige besonders schutzbeduerftig), Erziehungsberechtigte |
| **Datenkategorien** | Stammdaten, Kontaktdaten Erziehungsberechtigte, Klassenzuordnung |
| **Empfaenger** | Schulverwaltungssoftware-Anbieter (AVV), Schulbehoerde |
| **Loeschfrist** | Gemaess Landesschulgesetz (i.d.R. 5 Jahre nach Abgang) |
| **TOM** | Besonderer Zugriffsschutz, Altersverifizierung, Einwilligung Erziehungsberechtigte |
| **DSFA erforderlich?** | Ja (schutzbeduerftige Betroffene, ggf. grosser Umfang) |
## VVT-B02: Notenverarbeitung und Zeugniserstellung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Leistungsbewertung, Zeugnis- und Notenverwaltung |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. c/e DS-GVO i.V.m. Schulgesetz |
| **Betroffene** | Schueler/Studierende |
| **Datenkategorien** | Noten, Leistungsbewertungen, Pruefungsergebnisse |
| **Empfaenger** | Lehrkraefte, Schulleitung, Pruefungsamt |
| **Loeschfrist** | Zeugniskopien: 50 Jahre (Nachweispflicht), Einzelnoten: 2 Jahre |
| **TOM** | Zugriffsbeschraenkung auf Fachlehrkraft, verschluesselte Speicherung |
## VVT-B03: Lernplattform / LMS
| Feld | Inhalt |
|------|--------|
| **Zweck** | Digitaler Unterricht, Aufgabenverteilung, Kommunikation |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. e DS-GVO (oeffentliches Interesse) / lit. a (Einwilligung bei Minderjaehrigen) |
| **Betroffene** | Schueler/Studierende, Lehrkraefte |
| **Datenkategorien** | Nutzungsdaten, eingereichte Aufgaben, Chat-Nachrichten |
| **Empfaenger** | LMS-Anbieter (AVV), Hosting-Provider (AVV) |
| **Loeschfrist** | Kursende + 1 Schuljahr |
| **TOM** | Datensparsamkeit, keine Lernanalytics ohne Einwilligung, Hosting in EU |
## VVT-B04: Elternkommunikation
| Feld | Inhalt |
|------|--------|
| **Zweck** | Information und Kommunikation mit Erziehungsberechtigten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. e DS-GVO |
| **Betroffene** | Erziehungsberechtigte |
| **Datenkategorien** | Kontaktdaten, Nachrichteninhalt |
| **Empfaenger** | Kommunikationsplattform-Anbieter (AVV) |
| **Loeschfrist** | Ende des Schuljahres bzw. Abgang des Kindes |
| **TOM** | Verschluesselte Kommunikation, kein WhatsApp/Social Media |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage Bildungseinrichtung.*
$template$
) ON CONFLICT DO NOTHING;
-- 8. VVT Branchenvorlage: Beratung / Dienstleistung
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'vvt_register',
'VVT Branchenvorlage: Beratung / Dienstleistung',
'Vorbefuelltes Verarbeitungsverzeichnis fuer Beratungsunternehmen, Kanzleien und Dienstleister.',
'de', 'EU/DSGVO', '2.0', 'published', 'MIT', 'BreakPilot Compliance', false, true,
'[]'::jsonb,
$template$# VVT Branchenvorlage: Beratung / Dienstleistung
Typische Verarbeitungstaetigkeiten fuer Beratungsunternehmen, Kanzleien und professionelle Dienstleister.
---
## VVT-D01: Mandantenverwaltung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung von Mandanten-/Kundenbeziehungen, Vertragsdokumentation |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO (Vertragserfuellung) |
| **Betroffene** | Mandanten, Ansprechpartner |
| **Datenkategorien** | Kontaktdaten, Vertragsdaten, Korrespondenz, Rechnungsdaten |
| **Empfaenger** | Kanzleisoftware-Anbieter (AVV), Steuerberater |
| **Loeschfrist** | 10 Jahre Rechnungen, 5 Jahre Handakten (Berufsrecht), 3 Jahre sonstige |
| **TOM** | Mandantengeheimnis, verschluesselte Speicherung, Need-to-know-Prinzip |
## VVT-D02: Projektmanagement
| Feld | Inhalt |
|------|--------|
| **Zweck** | Planung und Steuerung von Beratungsprojekten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/f DS-GVO |
| **Betroffene** | Projektbeteiligte (Mandant + intern) |
| **Datenkategorien** | Projektdaten, Aufgaben, Zeiterfassung, Ergebnisdokumente |
| **Empfaenger** | Projektmanagement-Tool (AVV), Mandant |
| **Loeschfrist** | 2 Jahre nach Projektabschluss |
| **TOM** | Projektspezifische Zugriffsrechte, Mandantentrennung |
## VVT-D03: Zeiterfassung und Abrechnung
| Feld | Inhalt |
|------|--------|
| **Zweck** | Erfassung geleisteter Stunden, Abrechnung gegenueber Mandanten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b DS-GVO |
| **Betroffene** | Berater/Mitarbeitende, Mandanten |
| **Datenkategorien** | Arbeitszeiten, Taetigkeitsbeschreibungen, Stundensaetze |
| **Empfaenger** | Abrechnungssystem (AVV), Buchhaltung |
| **Loeschfrist** | 10 Jahre (steuerliche Aufbewahrung) |
| **TOM** | Zugriffsbeschraenkung (nur eigene Zeiten + Projektleitung) |
## VVT-D04: Dokumentenmanagement
| Feld | Inhalt |
|------|--------|
| **Zweck** | Verwaltung und Archivierung von Mandantendokumenten |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. b/c DS-GVO |
| **Betroffene** | Mandanten, ggf. Dritte in Dokumenten |
| **Datenkategorien** | Vertraege, Gutachten, Korrespondenz, Berichte |
| **Empfaenger** | DMS-Anbieter (AVV), Cloud-Speicher (AVV) |
| **Loeschfrist** | Gemaess Berufsrecht und Mandatsvereinbarung |
| **TOM** | Dokumentenklassifizierung, Versionierung, Zugriffsprotokollierung |
## VVT-D05: CRM und Akquise
| Feld | Inhalt |
|------|--------|
| **Zweck** | Kontaktpflege, Akquise, Beziehungsmanagement |
| **Rechtsgrundlage** | Art. 6 Abs. 1 lit. f DS-GVO (berechtigtes Interesse: Geschaeftsanbahnung) |
| **Betroffene** | Interessenten, Geschaeftskontakte |
| **Datenkategorien** | Kontaktdaten, Firma, Branche, Gespraechsnotizen |
| **Empfaenger** | CRM-Anbieter (AVV) |
| **Loeschfrist** | 3 Jahre nach letztem Kontakt |
| **TOM** | Widerspruchsmoeglichkeit, Datenminimierung |
---
*Erstellt mit BreakPilot Compliance. Branchenvorlage Beratung / Dienstleistung.*
$template$
) ON CONFLICT DO NOTHING;
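All six industry templates above are seeded idempotently via `ON CONFLICT DO NOTHING`, so rerunning the migration never duplicates rows. A minimal verification query against the same table and tenant — a sketch, assuming read access to the `compliance` schema:

```sql
-- Sketch: confirm which VVT industry templates landed for this tenant.
-- Table, columns and tenant UUID are taken from the INSERTs above.
SELECT title, version, status
FROM compliance.compliance_legal_templates
WHERE document_type = 'vvt_register'
  AND tenant_id = '9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid
ORDER BY title;
```

If a rerun reports fewer rows than expected, a template was likely skipped by the conflict clause rather than inserted anew.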

-- Migration 004: AVV Template — Auftragsverarbeitungsvertrag (Art. 28 DS-GVO)
-- Deutsche AVV-Vorlage mit allen Pflichtinhalten.
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'dpa',
'Auftragsverarbeitungsvertrag (AVV) gemaess Art. 28 DS-GVO',
'Vollstaendiger Auftragsverarbeitungsvertrag mit allen Pflichtinhalten nach Art. 28 Abs. 3 DS-GVO. Inkl. TOM-Anlage und Drittlandtransfer-Klausel.',
'de',
'EU/DSGVO',
'2.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{VERANTWORTLICHER_NAME}}",
"{{VERANTWORTLICHER_ADRESSE}}",
"{{VERANTWORTLICHER_VERTRETER}}",
"{{AUFTRAGSVERARBEITER_NAME}}",
"{{AUFTRAGSVERARBEITER_ADRESSE}}",
"{{AUFTRAGSVERARBEITER_VERTRETER}}",
"{{VERTRAGSGEGENSTAND}}",
"{{VERTRAGSDAUER}}",
"{{VERARBEITUNGSZWECK}}",
"{{ART_DER_VERARBEITUNG}}",
"{{DATENKATEGORIEN}}",
"{{BETROFFENE}}",
"{{UNTERAUFTRAGSVERARBEITER_LISTE}}",
"{{TOM_ANLAGE}}",
"{{DRITTLANDTRANSFER_DETAILS}}",
"{{ORT_DATUM}}",
"{{WEISUNGSBERECHTIGTER}}",
"{{KONTAKT_DATENSCHUTZ_AV}}"
]' AS jsonb),
$template$# Auftragsverarbeitungsvertrag (AVV)
**gemaess Art. 28 Abs. 3 DS-GVO**
---
## Vertragsparteien
**Verantwortlicher (Auftraggeber):**
{{VERANTWORTLICHER_NAME}}
{{VERANTWORTLICHER_ADRESSE}}
Vertreten durch: {{VERANTWORTLICHER_VERTRETER}}
**Auftragsverarbeiter (Auftragnehmer):**
{{AUFTRAGSVERARBEITER_NAME}}
{{AUFTRAGSVERARBEITER_ADRESSE}}
Vertreten durch: {{AUFTRAGSVERARBEITER_VERTRETER}}
---
## §1 Gegenstand und Dauer
(1) Der Auftragsverarbeiter verarbeitet personenbezogene Daten im Auftrag des Verantwortlichen. Gegenstand der Auftragsverarbeitung ist:
{{VERTRAGSGEGENSTAND}}
(2) Die Dauer der Verarbeitung entspricht der Laufzeit des Hauptvertrags: {{VERTRAGSDAUER}}.
---
## §2 Art und Zweck der Verarbeitung
(1) **Zweck:** {{VERARBEITUNGSZWECK}}
(2) **Art der Verarbeitung:** {{ART_DER_VERARBEITUNG}}
---
## §3 Art der personenbezogenen Daten
{{DATENKATEGORIEN}}
---
## §4 Kategorien betroffener Personen
{{BETROFFENE}}
---
## §5 Pflichten des Verantwortlichen
(1) Der Verantwortliche ist fuer die Rechtmaessigkeit der Datenverarbeitung verantwortlich.
(2) Der Verantwortliche erteilt Weisungen zur Datenverarbeitung. Weisungsberechtigt ist: {{WEISUNGSBERECHTIGTER}}.
(3) Der Verantwortliche informiert den Auftragsverarbeiter unverzueglich, wenn er Fehler oder Unregelmaessigkeiten feststellt.
(4) Der Verantwortliche ist verpflichtet, alle im Rahmen des Vertragsverhaeltnisses erlangten Kenntnisse vertraulich zu behandeln.
---
## §6 Pflichten des Auftragsverarbeiters
(1) Der Auftragsverarbeiter verarbeitet die Daten ausschliesslich auf dokumentierte Weisung des Verantwortlichen (Art. 28 Abs. 3 lit. a DS-GVO), es sei denn, er ist durch Unionsrecht oder nationales Recht hierzu verpflichtet.
(2) Der Auftragsverarbeiter gewaehrleistet, dass sich die zur Verarbeitung befugten Personen zur Vertraulichkeit verpflichtet haben oder einer angemessenen gesetzlichen Verschwiegenheitspflicht unterliegen (Art. 28 Abs. 3 lit. b).
(3) Der Auftragsverarbeiter trifft alle erforderlichen technischen und organisatorischen Massnahmen gemaess Art. 32 DS-GVO (siehe Anlage 1: TOM).
(4) Der Auftragsverarbeiter beachtet die Bedingungen fuer die Inanspruchnahme von Unterauftragsverarbeitern (§7 dieses Vertrags).
(5) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Erfuellung der Betroffenenrechte (Art. 15-22 DS-GVO) durch geeignete technische und organisatorische Massnahmen (Art. 28 Abs. 3 lit. e).
(6) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Einhaltung der Pflichten aus Art. 32-36 DS-GVO (Sicherheit, Meldepflichten, DSFA, Konsultation).
(7) Der Auftragsverarbeiter loescht oder gibt nach Wahl des Verantwortlichen alle personenbezogenen Daten nach Beendigung der Auftragsverarbeitung zurueck und loescht vorhandene Kopien, es sei denn, eine Aufbewahrungspflicht besteht (Art. 28 Abs. 3 lit. g).
(8) Der Auftragsverarbeiter stellt dem Verantwortlichen alle erforderlichen Informationen zum Nachweis der Einhaltung der Pflichten zur Verfuegung und ermoeglicht Ueberpruefungen/Audits (Art. 28 Abs. 3 lit. h).
(9) Der Auftragsverarbeiter informiert den Verantwortlichen unverzueglich, wenn eine Weisung nach seiner Auffassung gegen datenschutzrechtliche Vorschriften verstoesst.
(10) Der Auftragsverarbeiter benennt einen Ansprechpartner fuer den Datenschutz: {{KONTAKT_DATENSCHUTZ_AV}}.
---
## §7 Unterauftragsverarbeitung
(1) Der Auftragsverarbeiter darf Unterauftragsverarbeiter nur mit vorheriger schriftlicher Genehmigung des Verantwortlichen einsetzen. Es wird eine allgemeine Genehmigung erteilt, wobei der Auftragsverarbeiter den Verantwortlichen ueber beabsichtigte Aenderungen mindestens 14 Tage im Voraus informiert. Der Verantwortliche kann Einspruch erheben.
(2) Aktuelle Unterauftragsverarbeiter:
{{UNTERAUFTRAGSVERARBEITER_LISTE}}
(3) Der Auftragsverarbeiter stellt vertraglich sicher, dass die Unterauftragsverarbeiter dieselben Datenschutzpflichten einhalten.
{{#IF DRITTLANDTRANSFER_DETAILS}}
---
## §8 Uebermittlung in Drittlaender
(1) Eine Uebermittlung personenbezogener Daten in Drittlaender erfolgt nur unter Einhaltung der Voraussetzungen der Art. 44-49 DS-GVO.
(2) Details:
{{DRITTLANDTRANSFER_DETAILS}}
{{/IF}}
---
## §9 Kontrollrechte und Audits
(1) Der Verantwortliche hat das Recht, die Einhaltung der Vorschriften durch den Auftragsverarbeiter zu ueberpruefen. Dies umfasst Inspektionen vor Ort, Dokumentenpruefungen und die Einholung von Auskuenften.
(2) Der Auftragsverarbeiter unterstuetzt den Verantwortlichen bei der Durchfuehrung und gewaehrt Zugang zu relevanten Raeumlichkeiten und Systemen mit angemessener Vorankuendigung (in der Regel 14 Tage).
(3) Alternativ kann der Auftragsverarbeiter aktuelle Zertifizierungen (z. B. ISO 27001, SOC 2) oder Auditberichte unabhaengiger Pruefer vorlegen.
---
## §10 Meldung von Datenpannen
(1) Der Auftragsverarbeiter informiert den Verantwortlichen unverzueglich (in der Regel innerhalb von 24 Stunden) nach Kenntniserlangung ueber eine Verletzung des Schutzes personenbezogener Daten (Art. 33 Abs. 2 DS-GVO).
(2) Die Meldung umfasst mindestens die Art der Datenpanne, die betroffenen Kategorien und ungefaehre Anzahl der Betroffenen, die wahrscheinlichen Folgen und die ergriffenen Gegenmassnahmen.
---
## §11 Haftung
Die Haftung richtet sich nach Art. 82 DS-GVO. Der Auftragsverarbeiter haftet fuer Schaeden, die durch eine nicht den Vorgaben der DS-GVO entsprechende Verarbeitung oder durch Handeln entgegen den Weisungen des Verantwortlichen verursacht wurden.
---
## §12 Laufzeit und Kuendigung
(1) Dieser AVV tritt mit Unterzeichnung in Kraft und endet automatisch mit Beendigung des Hauptvertrags.
(2) Eine ausserordentliche Kuendigung ist bei schwerem Verstoss gegen diesen Vertrag oder datenschutzrechtliche Vorschriften moeglich.
(3) Nach Vertragsende hat der Auftragsverarbeiter alle personenbezogenen Daten gemaess §6 Abs. 7 zu loeschen oder zurueckzugeben.
---
## §13 Schlussbestimmungen
(1) Aenderungen dieses Vertrags beduerfen der Schriftform.
(2) Sollten einzelne Bestimmungen unwirksam sein, bleibt die Wirksamkeit des uebrigen Vertrags unberuehrt.
(3) Es gilt das Recht der Bundesrepublik Deutschland.
---
## Anlage 1: Technische und Organisatorische Massnahmen (TOM)
{{TOM_ANLAGE}}
---
## Unterschriften
| | Verantwortlicher | Auftragsverarbeiter |
|---|---|---|
| **Ort, Datum** | {{ORT_DATUM}} | {{ORT_DATUM}} |
| **Name** | {{VERANTWORTLICHER_VERTRETER}} | {{AUFTRAGSVERARBEITER_VERTRETER}} |
| **Unterschrift** | _________________ | _________________ |
---
*Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;
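The AVV template declares its placeholders as a jsonb array of strings (`CAST('[...]' AS jsonb)`), which keeps the fill-in fields machine-readable. A sketch of unpacking that array with `jsonb_array_elements_text` — the `LIKE` filter on the title is an assumption for selecting this one row:

```sql
-- Sketch: list the declared {{...}} placeholders of the AVV template,
-- one per row, straight from the jsonb "placeholders" column.
SELECT jsonb_array_elements_text(placeholders) AS placeholder
FROM compliance.compliance_legal_templates
WHERE document_type = 'dpa'
  AND title LIKE 'Auftragsverarbeitungsvertrag%';
```

A frontend can diff this list against the values a user has supplied to flag unfilled fields before rendering.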

-- Migration 005: Zusaetzliche Templates — Verpflichtungserklaerung + Art. 13/14
-- 1. Verpflichtungserklaerung (Vertraulichkeit Mitarbeitende)
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'verpflichtungserklaerung',
'Verpflichtungserklaerung auf das Datengeheimnis',
'Vorlage zur Verpflichtung von Mitarbeitenden auf die Vertraulichkeit und das Datengeheimnis gemaess DS-GVO. Fuer Onboarding-Prozesse.',
'de',
'DE',
'1.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{UNTERNEHMEN_NAME}}",
"{{UNTERNEHMEN_ADRESSE}}",
"{{MITARBEITER_NAME}}",
"{{MITARBEITER_ABTEILUNG}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{ORT_DATUM}}",
"{{SCHULUNGSDATUM}}"
]' AS jsonb),
$template$# Verpflichtung auf das Datengeheimnis
**gemaess Art. 28 Abs. 3 lit. b, Art. 29, Art. 32 Abs. 4 DS-GVO**
---
## 1. Verpflichtung
Ich, **{{MITARBEITER_NAME}}**, Abteilung **{{MITARBEITER_ABTEILUNG}}**, werde hiermit auf die Vertraulichkeit im Umgang mit personenbezogenen Daten verpflichtet.
**Arbeitgeber:** {{UNTERNEHMEN_NAME}}, {{UNTERNEHMEN_ADRESSE}}
Ich verpflichte mich, personenbezogene Daten, die mir im Rahmen meiner Taetigkeit bekannt werden, nur gemaess den erteilten Weisungen zu verarbeiten. Diese Verpflichtung gilt auch nach Beendigung des Beschaeftigungsverhaeltnisses fort.
---
## 2. Pflichten im Einzelnen
Mir ist bekannt, dass ich verpflichtet bin:
- Personenbezogene Daten nur im Rahmen meiner Aufgaben und nach Weisung des Verantwortlichen zu verarbeiten.
- Die Vertraulichkeit personenbezogener Daten zu wahren und diese nicht unbefugt an Dritte weiterzugeben.
- Personenbezogene Daten vor unbefugtem Zugriff, Verlust und Missbrauch zu schuetzen.
- Den Datenschutzbeauftragten unverzueglich ueber Datenschutzvorfaelle oder -verletzungen zu informieren.
- Keine personenbezogenen Daten fuer private Zwecke zu verwenden.
- Mobile Datentraeger und Zugangsmedien sorgfaeltig aufzubewahren.
- Passwoerter nicht weiterzugeben und regelmaessig zu aendern.
---
## 3. Rechtsfolgen bei Verstoss
Ein Verstoss gegen das Datengeheimnis kann folgende Konsequenzen haben:
- **Arbeitsrechtliche Massnahmen** bis hin zur fristlosen Kuendigung
- **Schadensersatzansprueche** des Arbeitgebers oder der Betroffenen (Art. 82 DS-GVO)
- **Ordnungswidrigkeiten oder Straftaten** nach BDSG und StGB (§§ 42, 43 BDSG; §§ 201-206 StGB)
---
## 4. Datenschutzschulung
{{#IF SCHULUNGSDATUM}}
Ich habe am **{{SCHULUNGSDATUM}}** eine Datenschutzschulung erhalten und wurde ueber die wesentlichen Grundsaetze der DS-GVO unterrichtet.
{{/IF}}
{{#IF_NOT SCHULUNGSDATUM}}
Eine Datenschutzschulung wird im Rahmen des Onboarding durchgefuehrt.
{{/IF_NOT}}
---
## 5. Ansprechpartner
Bei Fragen zum Datenschutz wende ich mich an den Datenschutzbeauftragten:
**{{DSB_NAME}}** {{DSB_KONTAKT}}
---
## 6. Bestaetigung
Ich habe diese Verpflichtungserklaerung gelesen und verstanden. Ich bin mir meiner Pflichten bewusst.
| | Mitarbeitende/r | Arbeitgeber |
|---|---|---|
| **Ort, Datum** | {{ORT_DATUM}} | {{ORT_DATUM}} |
| **Name** | {{MITARBEITER_NAME}} | |
| **Unterschrift** | _________________ | _________________ |
---
*Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;
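Simple `{{PLATZHALTER}}` substitution can even be done in SQL; the conditional `{{#IF}}`/`{{#IF_NOT}}` blocks used above need application-side rendering. A naive sketch with nested `replace()` calls — the sample values ("Max Mustermann", "Example GmbH") are hypothetical, not part of the seed data:

```sql
-- Sketch: naive in-database placeholder substitution for a preview.
-- Conditional blocks ({{#IF ...}} ... {{/IF}}) are NOT handled here
-- and would pass through unrendered.
SELECT replace(replace(content,
         '{{MITARBEITER_NAME}}', 'Max Mustermann'),
         '{{UNTERNEHMEN_NAME}}', 'Example GmbH') AS rendered
FROM compliance.compliance_legal_templates
WHERE document_type = 'verpflichtungserklaerung';
```

For production rendering, a template engine that understands the conditional-block syntax is the safer route.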
-- 2. Art. 13/14 Informationspflichten-Muster
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'informationspflichten',
'Informationspflichten gemaess Art. 13/14 DS-GVO',
'Mustertext fuer Datenschutzhinweise nach Art. 13 (Direkterhebung) und Art. 14 (Dritterhebung) DS-GVO. Mit bedingten Bloecken fuer beide Varianten.',
'de',
'EU/DSGVO',
'1.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{VERANTWORTLICHER_NAME}}",
"{{VERANTWORTLICHER_ADRESSE}}",
"{{VERANTWORTLICHER_KONTAKT}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{VERARBEITUNGSZWECK}}",
"{{RECHTSGRUNDLAGE}}",
"{{BERECHTIGTES_INTERESSE}}",
"{{DATENKATEGORIEN}}",
"{{DATENQUELLE}}",
"{{EMPFAENGER}}",
"{{DRITTLANDTRANSFER}}",
"{{SPEICHERDAUER}}",
"{{AUFSICHTSBEHOERDE}}",
"{{AUTOMATISIERTE_ENTSCHEIDUNG}}",
"{{PFLICHT_ODER_FREIWILLIG}}"
]' AS jsonb),
$template$# Datenschutzhinweise
**gemaess Art. 13 und Art. 14 der Datenschutz-Grundverordnung (DS-GVO)**
---
## 1. Verantwortlicher
{{VERANTWORTLICHER_NAME}}
{{VERANTWORTLICHER_ADRESSE}}
Kontakt: {{VERANTWORTLICHER_KONTAKT}}
{{#IF DSB_NAME}}
## 2. Datenschutzbeauftragter
{{DSB_NAME}}
{{DSB_KONTAKT}}
{{/IF}}
---
## 3. Zweck und Rechtsgrundlage der Verarbeitung
Wir verarbeiten Ihre personenbezogenen Daten zu folgenden Zwecken:
{{VERARBEITUNGSZWECK}}
**Rechtsgrundlage:** {{RECHTSGRUNDLAGE}}
{{#IF BERECHTIGTES_INTERESSE}}
**Berechtigtes Interesse (Art. 6 Abs. 1 lit. f DS-GVO):** {{BERECHTIGTES_INTERESSE}}
{{/IF}}
---
## 4. Kategorien personenbezogener Daten
{{DATENKATEGORIEN}}
{{#IF DATENQUELLE}}
## 5. Herkunft der Daten (Art. 14 DS-GVO)
Die Daten wurden nicht bei Ihnen direkt erhoben, sondern stammen aus folgender Quelle:
{{DATENQUELLE}}
{{/IF}}
---
## 6. Empfaenger und Uebermittlung
Ihre Daten werden an folgende Empfaenger bzw. Kategorien von Empfaengern uebermittelt:
{{EMPFAENGER}}
{{#IF DRITTLANDTRANSFER}}
### Uebermittlung in Drittlaender
{{DRITTLANDTRANSFER}}
{{/IF}}
---
## 7. Speicherdauer
{{SPEICHERDAUER}}
---
## 8. Ihre Rechte
Sie haben gegenueber dem Verantwortlichen folgende Rechte hinsichtlich Ihrer personenbezogenen Daten:
- **Auskunftsrecht** (Art. 15 DS-GVO): Sie koennen Auskunft ueber die gespeicherten Daten verlangen.
- **Berichtigungsrecht** (Art. 16 DS-GVO): Sie koennen die Berichtigung unrichtiger Daten verlangen.
- **Loeschungsrecht** (Art. 17 DS-GVO): Sie koennen die Loeschung Ihrer Daten verlangen, sofern keine Aufbewahrungspflicht besteht.
- **Einschraenkung** (Art. 18 DS-GVO): Sie koennen die Einschraenkung der Verarbeitung verlangen.
- **Datenuebertragbarkeit** (Art. 20 DS-GVO): Sie koennen Ihre Daten in einem strukturierten, maschinenlesbaren Format erhalten.
- **Widerspruchsrecht** (Art. 21 DS-GVO): Sie koennen der Verarbeitung widersprechen, insbesondere bei Direktwerbung.
{{#IF RECHTSGRUNDLAGE}}
- **Widerrufsrecht** (Art. 7 Abs. 3 DS-GVO): Sofern die Verarbeitung auf Einwilligung beruht, koennen Sie diese jederzeit widerrufen, ohne dass die Rechtmaessigkeit der bis dahin erfolgten Verarbeitung beruehrt wird.
{{/IF}}
---
## 9. Beschwerderecht
Sie haben das Recht, sich bei einer Aufsichtsbehoerde zu beschweren:
{{AUFSICHTSBEHOERDE}}
---
{{#IF AUTOMATISIERTE_ENTSCHEIDUNG}}
## 10. Automatisierte Entscheidungsfindung (Art. 22 DS-GVO)
{{AUTOMATISIERTE_ENTSCHEIDUNG}}
{{/IF}}
{{#IF PFLICHT_ODER_FREIWILLIG}}
## 11. Bereitstellung der Daten
{{PFLICHT_ODER_FREIWILLIG}}
{{/IF}}
---
*Stand: Siehe Versionsdatum des Dokuments. Erstellt mit BreakPilot Compliance. Lizenz: MIT.*
$template$
) ON CONFLICT DO NOTHING;


@@ -0,0 +1,350 @@
-- Migration 006: Betriebsvereinbarung Template V1
-- Modulare Vorlage fuer Betriebsvereinbarungen zu KI/IT-Systemen
-- Rechtsgrundlage: §87 Abs.1 Nr.6 BetrVG, DSGVO, BDSG
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'betriebsvereinbarung',
'Betriebsvereinbarung — Einfuehrung und Nutzung von KI-/IT-Systemen',
'Modulare Vorlage fuer eine Betriebsvereinbarung gemaess §87 Abs.1 Nr.6 BetrVG zur Einfuehrung und Nutzung von IT-Systemen und KI-Anwendungen. Umfasst Datenschutz, Ueberwachungsschutz, Change-Management und Kontrollrechte des Betriebsrats. Basiert auf BAG-Rechtsprechung zu Microsoft 365, SAP ERP und Standardsoftware.',
'de',
'DE',
'1.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{UNTERNEHMEN_NAME}}",
"{{UNTERNEHMEN_SITZ}}",
"{{ARBEITGEBER_VERTRETER}}",
"{{BETRIEBSRAT_VORSITZ}}",
"{{SYSTEM_NAME}}",
"{{SYSTEM_BESCHREIBUNG}}",
"{{SYSTEM_HERSTELLER}}",
"{{GELTUNGSBEREICH_STANDORTE}}",
"{{GELTUNGSBEREICH_BEREICHE}}",
"{{GELTUNGSBEREICH_MODULE}}",
"{{ZWECK_BESCHREIBUNG}}",
"{{DATENARTEN_LISTE}}",
"{{VERBOTENE_NUTZUNGEN}}",
"{{ROLLEN_ADMIN}}",
"{{ROLLEN_FUEHRUNGSKRAFT}}",
"{{ROLLEN_REPORTING}}",
"{{TRANSPARENZ_INFO}}",
"{{ERLAUBTE_REPORTS}}",
"{{SPEICHERFRIST_AUDIT_LOGS}}",
"{{SPEICHERFRIST_NUTZUNGSDATEN}}",
"{{SPEICHERFRIST_CHAT_PROMPTS}}",
"{{TOM_MASSNAHMEN}}",
"{{CHANGE_MANAGEMENT_PROZESS}}",
"{{AUDIT_INTERVALL}}",
"{{BESCHWERDE_ANSPRECHPARTNER}}",
"{{LAUFZEIT}}",
"{{KUENDIGUNGSFRIST}}",
"{{DATUM_UNTERZEICHNUNG}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}"
]' AS jsonb),
$template$# Betriebsvereinbarung
**ueber die Einfuehrung und Nutzung von {{SYSTEM_NAME}}**
zwischen
**{{UNTERNEHMEN_NAME}}**, {{UNTERNEHMEN_SITZ}},
vertreten durch {{ARBEITGEBER_VERTRETER}}
(nachfolgend "Arbeitgeberin")
und dem
**Betriebsrat** der {{UNTERNEHMEN_NAME}},
vertreten durch den/die Vorsitzende/n {{BETRIEBSRAT_VORSITZ}}
(nachfolgend "Betriebsrat")
---
## A. Praeambel und Rechtsgrundlagen
Diese Betriebsvereinbarung regelt die Einfuehrung und Nutzung von **{{SYSTEM_NAME}}** ({{SYSTEM_BESCHREIBUNG}}) im Betrieb der {{UNTERNEHMEN_NAME}}.
**Rechtsgrundlagen:**
- §87 Abs.1 Nr.6 BetrVG (Mitbestimmung bei technischen Ueberwachungseinrichtungen)
- §90 BetrVG (Unterrichtung bei Planung technischer Anlagen)
- Art. 5, 6, 32 DSGVO (Datenschutzgrundsaetze, Rechtsgrundlage, TOM)
- §26 BDSG (Beschaeftigtendatenschutz)
{{#IF AI_SYSTEM}}
- Verordnung (EU) 2024/1689 (KI-Verordnung / AI Act)
{{/IF}}
Die Parteien sind sich einig, dass {{SYSTEM_NAME}} eine technische Einrichtung im Sinne des §87 Abs.1 Nr.6 BetrVG darstellt, die geeignet ist, das Verhalten oder die Leistung der Beschaeftigten zu ueberwachen. Die Einigung erfolgt in Kenntnis der Rechtsprechung des Bundesarbeitsgerichts (vgl. BAG 1 ABR 20/21 Microsoft Office 365; BAG 1 ABN 36/18 Standardsoftware).
---
## B. Geltungsbereich
### B.1 Raeumlicher Geltungsbereich
Diese Betriebsvereinbarung gilt fuer folgende Standorte:
{{GELTUNGSBEREICH_STANDORTE}}
### B.2 Persoenlicher Geltungsbereich
Die Betriebsvereinbarung gilt fuer alle Beschaeftigten der folgenden Bereiche:
{{GELTUNGSBEREICH_BEREICHE}}
### B.3 Sachlicher Geltungsbereich
Die Betriebsvereinbarung umfasst folgende Module und Dienste des Systems:
{{GELTUNGSBEREICH_MODULE}}
{{#IF SYSTEM_HERSTELLER}}
**Systemhersteller/-anbieter:** {{SYSTEM_HERSTELLER}}
{{/IF}}
---
## C. Zweckbestimmung
### C.1 Erlaubte Nutzungszwecke
{{SYSTEM_NAME}} darf ausschliesslich zu folgenden Zwecken eingesetzt werden:
{{ZWECK_BESCHREIBUNG}}
### C.2 Verbotene Nutzungen
Folgende Nutzungen sind ausdruecklich untersagt:
{{VERBOTENE_NUTZUNGEN}}
Darueber hinaus ist generell untersagt:
- Verdeckte Leistungs- oder Verhaltenskontrolle einzelner Beschaeftigter
- Erstellung individueller Persoenlichkeitsprofile
- Nutzung von Prompt-, Chat- oder Nutzungshistorien zu disziplinarischen Zwecken
- Automatisierte Personalentscheidungen ohne menschliche Ueberpruefung
- Personenbezogene Rankings oder Leistungsvergleiche ohne gesonderte Mitbestimmung
{{#IF AI_SYSTEM}}
- Einsatz von KI-Funktionen zur biometrischen Echtzeit-Identifizierung
- KI-gestuetztes Social Scoring von Beschaeftigten
{{/IF}}
---
## D. Datenarten und Verarbeitungszwecke
### D.1 Verarbeitete Datenarten
Im Rahmen der Nutzung von {{SYSTEM_NAME}} werden folgende Datenarten verarbeitet:
{{DATENARTEN_LISTE}}
### D.2 Rechtsgrundlage
Die Verarbeitung der Beschaeftigtendaten erfolgt auf Grundlage von:
- §26 Abs.1 BDSG i.V.m. Art. 6 Abs.1 lit. b DSGVO (Durchfuehrung des Arbeitsverhaeltnisses)
- §26 Abs.4 BDSG i.V.m. Art. 88 DSGVO (diese Betriebsvereinbarung als Kollektivvereinbarung)
### D.3 Keine Verarbeitung besonderer Kategorien
Daten gemaess Art. 9 DSGVO (Gesundheitsdaten, Gewerkschaftszugehoerigkeit, biometrische Daten etc.) werden nicht verarbeitet, es sei denn, dies ist in einem gesonderten Anhang zu dieser Betriebsvereinbarung ausdruecklich geregelt.
---
## E. Rollen- und Zugriffskonzept
### E.1 Administratoren
{{ROLLEN_ADMIN}}
### E.2 Fuehrungskraefte
{{ROLLEN_FUEHRUNGSKRAFT}}
Fuehrungskraefte erhalten **keinen** Zugriff auf:
- individuelle Nutzungsprotokolle
- Prompt-/Chat-Historien einzelner Beschaeftigter
- Produktivitaetskennzahlen auf Personenebene
### E.3 Reporting-Zugriff
{{ROLLEN_REPORTING}}
### E.4 Vier-Augen-Prinzip
Sonderauswertungen mit Personenbezug beduerfen:
- der Zustimmung des Betriebsrats
- der Beteiligung des Datenschutzbeauftragten ({{DSB_NAME}}, {{DSB_KONTAKT}})
- einer dokumentierten Begruendung
---
## F. Transparenz gegenueber Beschaeftigten
Die Arbeitgeberin informiert alle Beschaeftigten vor Einfuehrung von {{SYSTEM_NAME}} ueber:
{{TRANSPARENZ_INFO}}
Insbesondere:
- Welche Daten verarbeitet werden
- Welche KI-Funktionen aktiviert sind
- Welche Protokollierung stattfindet
- Wer Zugriff auf welche Daten hat
- Wie lange Daten gespeichert werden
- An wen sich Beschaeftigte bei Fragen oder Beschwerden wenden koennen
{{#IF AI_SYSTEM}}
Bei KI-gestuetzten Funktionen wird zusaetzlich transparent gemacht:
- Ob und wie KI-generierte Inhalte gekennzeichnet werden
- Ob Eingaben fuer Modelltraining verwendet werden (Standard: Nein)
- Welche Entscheidungsunterstuetzung die KI leistet
{{/IF}}
---
## G. Auswertungen und Reports
### G.1 Erlaubte Reports
Folgende Auswertungen sind ohne gesonderte Zustimmung zulaessig:
{{ERLAUBTE_REPORTS}}
### G.2 Unzulaessige Reports
Ohne ausdrueckliche, vorherige Zustimmung des Betriebsrats sind unzulaessig:
- individuelle Produktivitaetsreports
- Teamvergleiche mit Personenbezug
- Verhaltensprofile oder Nutzungsmuster einzelner Beschaeftigter
- Rankinglisten (auch anonymisierte, wenn Re-Identifikation moeglich)
- Korrelation von Nutzungsdaten mit Leistungsbeurteilungen
### G.3 Neue Reporttypen
Die Einfuehrung neuer Reporttypen bedarf der vorherigen Zustimmung des Betriebsrats.
---
## H. Speicher- und Loeschfristen
| Datenkategorie | Speicherfrist | Loeschverfahren |
|----------------|---------------|-----------------|
| Audit-/Admin-Logs | {{SPEICHERFRIST_AUDIT_LOGS}} | Automatische Loeschung |
| Nutzungsdaten (aggregiert) | {{SPEICHERFRIST_NUTZUNGSDATEN}} | Automatische Loeschung |
| Prompt-/Chat-Historien | {{SPEICHERFRIST_CHAT_PROMPTS}} | Automatische Loeschung oder deaktiviert |
| Exportdateien | 30 Tage | Automatische Loeschung |
Die Speicherdauer der Audit-Logs orientiert sich am berechtigten Interesse der Arbeitgeberin an der Systemsicherheit und wird auf das erforderliche Minimum begrenzt.
{{#IF AI_SYSTEM}}
**KI-spezifisch:**
- Trainingsdaten aus Beschaeftigten-Interaktionen: **nicht zulaessig** ohne gesonderte Vereinbarung
- Feedback-Daten zur Modellverbesserung: nur anonymisiert und aggregiert
{{/IF}}
---
## I. Technische und organisatorische Massnahmen (TOM)
Zum Schutz der Beschaeftigtendaten werden folgende Massnahmen umgesetzt:
{{TOM_MASSNAHMEN}}
Ergaenzend gelten mindestens:
- Rollen- und Rechtekonzept mit Least-Privilege-Prinzip
- Verschluesselung der Daten bei Uebertragung und Speicherung
- Protokollierung aller administrativen Zugriffe
- Pseudonymisierung, wo technisch moeglich
- Deaktivierung nicht benoetigter Telemetrie- und Diagnosefunktionen
- Getrennte Umgebungen fuer Test und Produktion
---
## J. Change-Management
### J.1 Aenderungspflicht
Folgende Aenderungen an {{SYSTEM_NAME}} beduerfen der vorherigen Information und ggf. erneuten Mitbestimmung des Betriebsrats:
{{CHANGE_MANAGEMENT_PROZESS}}
Insbesondere:
- Aktivierung neuer Module oder Funktionen
- Anbindung neuer Datenquellen oder Konnektoren
- Aenderung der Reporting-Funktionalitaet
- Updates mit neuen KI-Modellen oder -Funktionen
- Aenderung der Datenverarbeitungsstandorte
- Erweiterung des Nutzerkreises
### J.2 Informationsfrist
Die Arbeitgeberin informiert den Betriebsrat mindestens **14 Kalendertage** vor geplanten Aenderungen schriftlich. Bei sicherheitskritischen Updates kann die Frist auf 3 Werktage verkuerzt werden.
### J.3 Bewertungsverfahren
Jede Aenderung wird anhand folgender Kriterien bewertet:
- Aendert sich die Ueberwachungseignung?
- Werden neue Datenarten verarbeitet?
- Aendert sich der Personenbezug?
Bei positiver Beantwortung einer dieser Fragen ist eine erneute Mitbestimmung erforderlich.
---
## K. Kontroll- und Audit-Rechte des Betriebsrats
### K.1 Laufende Kontrolle
Der Betriebsrat hat das Recht auf:
- Einsicht in die Systemdokumentation
- Einsicht in den Katalog aktiver Reports und Auswertungen
- Information ueber alle Administrationszugriffe
- Teilnahme an Schulungen zum System
### K.2 Regelmaessige Reviews
Arbeitgeberin und Betriebsrat fuehren alle **{{AUDIT_INTERVALL}}** einen gemeinsamen Review durch. Gegenstand:
- Aktuelle Nutzung und Funktionsumfang
- Eingehaltene/verletzte Regelungen
- Eingegangene Beschwerden
- Geplante Aenderungen
- Aktualitaet der TOM
### K.3 Anlassbezogene Pruefung
Bei begruendetem Verdacht auf Verstoss gegen diese Betriebsvereinbarung kann der Betriebsrat jederzeit eine Sonderpruefung verlangen. Die Arbeitgeberin stellt innerhalb von 5 Werktagen die angeforderten Informationen bereit.
---
## L. Beschwerden und Eskalation
### L.1 Beschwerderecht
Beschaeftigte koennen sich bei Bedenken hinsichtlich der Datenverarbeitung wenden an:
{{BESCHWERDE_ANSPRECHPARTNER}}
### L.2 Eskalation
Bei Meinungsverschiedenheiten ueber die Auslegung oder Anwendung dieser Betriebsvereinbarung gilt:
1. Gespraech zwischen Arbeitgeberin und Betriebsrat (Frist: 2 Wochen)
2. Hinzuziehung des Datenschutzbeauftragten
3. Einigungsstelle gemaess §76 BetrVG
### L.3 Sofortmassnahmen
Bei schwerwiegenden Verstoessen (insbesondere unzulaessige Ueberwachung, Datenmissbrauch) kann der Betriebsrat die sofortige Aussetzung der betroffenen Funktion verlangen. Die Arbeitgeberin setzt die Funktion bis zur Klaerung aus.
---
## M. Schlussbestimmungen
### M.1 Inkrafttreten und Laufzeit
Diese Betriebsvereinbarung tritt am {{DATUM_UNTERZEICHNUNG}} in Kraft und gilt fuer die Dauer von {{LAUFZEIT}}.
### M.2 Kuendigung
Die Betriebsvereinbarung kann von jeder Seite mit einer Frist von {{KUENDIGUNGSFRIST}} zum Monatsende schriftlich gekuendigt werden.
### M.3 Nachwirkung
Die Betriebsvereinbarung wirkt nach Kuendigung bis zum Abschluss einer neuen Vereinbarung nach (§77 Abs.6 BetrVG).
### M.4 Salvatorische Klausel
Sollten einzelne Bestimmungen unwirksam sein, bleibt die Wirksamkeit der uebrigen Bestimmungen unberuehrt. Die Parteien verpflichten sich, unwirksame Bestimmungen durch wirksame zu ersetzen, die dem wirtschaftlichen Zweck am naechsten kommen.
### M.5 Anlagen
Folgende Anlagen sind Bestandteil dieser Betriebsvereinbarung:
- Anlage 1: Detaillierte Systemdokumentation
- Anlage 2: Rollen- und Rechtekonzept
- Anlage 3: TOM-Dokumentation
- Anlage 4: Reportkatalog
{{#IF AI_SYSTEM}}
- Anlage 5: KI-Transparenzbericht
{{/IF}}
---
**{{UNTERNEHMEN_SITZ}}, den {{DATUM_UNTERZEICHNUNG}}**
| | |
|---|---|
| _________________________ | _________________________ |
| {{ARBEITGEBER_VERTRETER}} | {{BETRIEBSRAT_VORSITZ}} |
| fuer die Arbeitgeberin | fuer den Betriebsrat |
$template$
) ON CONFLICT DO NOTHING;


@@ -0,0 +1,330 @@
-- Migration 007: FRIA Template V1 — Grundrechte-Folgenabschaetzung (Art. 27 KI-VO)
-- Fundamental Rights Impact Assessment fuer Hochrisiko-KI-Systeme
-- Rechtsgrundlage: Art. 27 Verordnung (EU) 2024/1689 (KI-Verordnung / AI Act)
INSERT INTO compliance.compliance_legal_templates (
tenant_id, document_type, title, description, language, jurisdiction,
version, status, license_name, source_name, attribution_required,
is_complete_document, placeholders, content
) VALUES (
'9282a473-5c95-4b3a-bf78-0ecc0ec71d3e'::uuid,
'fria',
'Grundrechte-Folgenabschaetzung (FRIA) gemaess Art. 27 KI-Verordnung',
'Vorlage fuer eine Grundrechte-Folgenabschaetzung (Fundamental Rights Impact Assessment) gemaess Art. 27 der Verordnung (EU) 2024/1689 (KI-Verordnung). Erforderlich fuer Hochrisiko-KI-Systeme, insbesondere bei oeffentlichen Stellen und in den Bereichen Beschaeftigung, Bildung und Zugang zu wesentlichen Dienstleistungen.',
'de',
'EU/KI-VO',
'1.0',
'published',
'MIT',
'BreakPilot Compliance',
false,
true,
CAST('[
"{{ORGANISATION_NAME}}",
"{{ORGANISATION_ADRESSE}}",
"{{VERANTWORTLICHER}}",
"{{ERSTELLT_VON}}",
"{{ERSTELLT_AM}}",
"{{SYSTEM_NAME}}",
"{{SYSTEM_VERSION}}",
"{{SYSTEM_BESCHREIBUNG}}",
"{{SYSTEM_ANBIETER}}",
"{{EINSATZZWECK}}",
"{{EINSATZKONTEXT}}",
"{{BETROFFENE_GRUPPEN}}",
"{{BETROFFENE_ANZAHL}}",
"{{GRUNDRECHTE_ANALYSE}}",
"{{RISIKOMATRIX}}",
"{{MASSNAHMEN_LISTE}}",
"{{HUMAN_OVERSIGHT_BESCHREIBUNG}}",
"{{TRANSPARENZ_MASSNAHMEN}}",
"{{KONSULTATION_ERGEBNISSE}}",
"{{GENEHMIGT_VON}}",
"{{GENEHMIGT_AM}}",
"{{NAECHSTE_UEBERPRUEFUNG}}",
"{{DSB_NAME}}",
"{{DSB_KONTAKT}}",
"{{AI_ACT_KLASSIFIKATION}}",
"{{ANNEX_III_KATEGORIE}}"
]' AS jsonb),
$template$# Grundrechte-Folgenabschaetzung (FRIA)
**gemaess Art. 27 der Verordnung (EU) 2024/1689 (KI-Verordnung)**
---
| Feld | Wert |
|------|------|
| Organisation | {{ORGANISATION_NAME}} |
| Adresse | {{ORGANISATION_ADRESSE}} |
| KI-System | {{SYSTEM_NAME}} (Version {{SYSTEM_VERSION}}) |
| Erstellt von | {{ERSTELLT_VON}} |
| Erstellt am | {{ERSTELLT_AM}} |
| Status | Entwurf |
---
## 1. Systembeschreibung und Einsatzkontext
### 1.1 KI-System
**Systemname:** {{SYSTEM_NAME}}
**Version:** {{SYSTEM_VERSION}}
**Anbieter:** {{SYSTEM_ANBIETER}}
**Beschreibung:** {{SYSTEM_BESCHREIBUNG}}
### 1.2 AI Act Klassifikation
**Risikoklasse:** {{AI_ACT_KLASSIFIKATION}}
{{#IF ANNEX_III_KATEGORIE}}
**Annex III Kategorie:** {{ANNEX_III_KATEGORIE}}
{{/IF}}
### 1.3 Einsatzzweck
{{EINSATZZWECK}}
### 1.4 Einsatzkontext
{{EINSATZKONTEXT}}
Folgende Fragen sind zu beantworten:
- In welchem organisatorischen Kontext wird das System eingesetzt?
- Welche Entscheidungen werden durch das System unterstuetzt oder automatisiert?
- Wie haeufig wird das System eingesetzt?
- Welche Rolle spielt das System im Gesamtprozess?
### 1.5 Betroffene Personengruppen
{{BETROFFENE_GRUPPEN}}
**Geschaetzte Anzahl betroffener Personen:** {{BETROFFENE_ANZAHL}}
{{#IF BILDUNGSKONTEXT}}
**Besonderer Schutz:** Schueler, Studierende und Auszubildende geniessen als besonders schutzbeduerftiger Personenkreis erhoehten Schutz.
{{/IF}}
{{#IF HR_KONTEXT}}
**Besonderer Schutz:** Beschaeftigte und Bewerber befinden sich in einem Abhaengigkeitsverhaeltnis und beduerfen besonderen Schutzes vor diskriminierenden KI-Entscheidungen.
{{/IF}}
---
## 2. Grundrechte-Mapping
### 2.1 Betroffene Grundrechte
Die folgenden Grundrechte der EU-Grundrechtecharta und des Grundgesetzes wurden auf Betroffenheit geprueft:
{{GRUNDRECHTE_ANALYSE}}
### 2.2 Referenz-Grundrechte
| Nr. | Grundrecht | EU-Charta | GG | Betroffen | Begruendung |
|-----|-----------|-----------|-----|-----------|-------------|
| 1 | Menschenwuerde | Art. 1 | Art. 1 | | |
| 2 | Recht auf Privatsphaere | Art. 7 | Art. 2 Abs. 1 | | |
| 3 | Schutz personenbezogener Daten | Art. 8 | Art. 2 Abs. 1 i.V.m. Art. 1 Abs. 1 | | |
| 4 | Nicht-Diskriminierung | Art. 21 | Art. 3 | | |
| 5 | Gleichheit von Frauen und Maennern | Art. 23 | Art. 3 Abs. 2 | | |
| 6 | Rechte des Kindes | Art. 24 | Art. 6 Abs. 2 | | |
| 7 | Recht auf Bildung | Art. 14 | Art. 12 | | |
| 8 | Berufsfreiheit / Recht zu arbeiten | Art. 15 | Art. 12 | | |
| 9 | Recht auf wirksamen Rechtsbehelf | Art. 47 | Art. 19 Abs. 4 | | |
| 10 | Meinungs- und Informationsfreiheit | Art. 11 | Art. 5 | | |
| 11 | Versammlungs- und Vereinigungsfreiheit | Art. 12 | Art. 8, 9 | | |
| 12 | Recht auf soziale Sicherheit | Art. 34 | Art. 20 | | |
{{#IF OEFFENTLICHE_STELLE}}
### 2.3 Besondere Pflichten oeffentlicher Stellen
Als oeffentliche Stelle gelten zusaetzliche Anforderungen:
- Erweiterte Transparenzpflicht gegenueber Buergern
- Pflicht zur Barrierefreiheit des Systems
- Beruecksichtigung des Gleichheitsgrundsatzes (Art. 3 GG)
- Demokratische Kontrolle und Rechenschaftspflicht
{{/IF}}
---
## 3. Risikoanalyse
### 3.1 Risikobewertung pro Grundrecht
Fuer jedes betroffene Grundrecht wird das Risiko bewertet:
**Eintrittswahrscheinlichkeit:**
- 1 = Sehr unwahrscheinlich
- 2 = Unwahrscheinlich
- 3 = Moeglich
- 4 = Wahrscheinlich
- 5 = Sehr wahrscheinlich
**Schadensausmass:**
- 1 = Geringfuegig
- 2 = Begrenzt
- 3 = Erheblich
- 4 = Schwerwiegend
- 5 = Katastrophal
### 3.2 Risikomatrix
{{RISIKOMATRIX}}
| Grundrecht | Risikoszenario | Wahrscheinlichkeit | Schwere | Risiko-Level | Begruendung |
|-----------|----------------|--------------------:|--------:|:------------:|-------------|
| | | | | | |
**Risiko-Level Berechnung:** Wahrscheinlichkeit × Schwere
| Risiko-Level | Punktzahl | Bedeutung |
|:------------:|:---------:|-----------|
| Niedrig | 1-6 | Akzeptables Risiko, Standardmassnahmen |
| Mittel | 7-12 | Erhoehte Aufmerksamkeit, zusaetzliche Massnahmen |
| Hoch | 13-19 | Erhebliches Risiko, umfassende Massnahmen erforderlich |
| Kritisch | 20-25 | Nicht akzeptabel ohne fundamentale Aenderungen |
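The score bands above (Wahrscheinlichkeit × Schwere, each 1-5) can be expressed as a small classifier. This is a sketch; the helper name `risk_level` is ours and not part of the migration:

```python
def risk_level(likelihood: int, severity: int) -> str:
    """Map 1-5 likelihood and 1-5 severity to the FRIA risk level band."""
    if not (1 <= likelihood <= 5 and 1 <= severity <= 5):
        raise ValueError("scores must be in 1..5")
    score = likelihood * severity
    if score <= 6:
        return "Niedrig"
    if score <= 12:
        return "Mittel"
    if score <= 19:
        return "Hoch"
    return "Kritisch"  # 20-25
```

For example, "Moeglich" (3) combined with "Schwerwiegend" (4) yields 12, still "Mittel"; one step up in severity (3 × 5 = 15) crosses into "Hoch".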
---
## 4. Massnahmen zur Risikominderung
### 4.1 Uebersicht der Massnahmen
{{MASSNAHMEN_LISTE}}
### 4.2 Human Oversight (Art. 14 KI-VO)
{{HUMAN_OVERSIGHT_BESCHREIBUNG}}
Folgende Massnahmen zur menschlichen Aufsicht werden umgesetzt:
- [ ] Mensch kann KI-Entscheidung jederzeit uebersteuern
- [ ] Mensch versteht KI-Output vollstaendig
- [ ] Keine automatisierten Entscheidungen ohne menschliche Ueberpruefung
- [ ] Schulung der Nutzer zu Systemgrenzen und Risiken
- [ ] Eingriffsprotokolle werden gefuehrt
### 4.3 Transparenz (Art. 13 KI-VO)
{{TRANSPARENZ_MASSNAHMEN}}
Folgende Transparenzmassnahmen werden umgesetzt:
- [ ] Betroffene werden ueber KI-Nutzung informiert
- [ ] KI-generierte Outputs sind als solche gekennzeichnet
- [ ] Erklaerbarkeit der Entscheidungslogik sichergestellt
- [ ] Kontaktmoeglichkeit fuer Betroffene vorhanden
- [ ] Informationen sind verstaendlich und zugaenglich
### 4.4 Logging und Audit (Art. 12 KI-VO)
- [ ] Alle Eingaben und Ausgaben werden protokolliert
- [ ] Logs sind manipulationssicher
- [ ] Aufbewahrungsfristen definiert
- [ ] Audit-Trail fuer Entscheidungsnachvollziehbarkeit
### 4.5 Bias-Pruefung und Nicht-Diskriminierung
- [ ] Trainingsdaten auf Bias geprueft
- [ ] Regelmaessige Bias-Audits geplant
- [ ] Beschwerdemechanismus fuer Diskriminierungsfaelle
{{#IF HR_KONTEXT}}
- [ ] AGG-konforme Gestaltung (kein Bias bei Geschlecht, Alter, Herkunft, Behinderung)
- [ ] Betriebsrat gemaess §95 BetrVG beteiligt (bei Auswahlrichtlinien)
{{/IF}}
{{#IF BILDUNGSKONTEXT}}
- [ ] Chancengleichheit unabhaengig von sozioekonomischem Hintergrund
- [ ] Keine Benachteiligung aufgrund von Sprachkenntnissen oder Behinderung
{{/IF}}
---
## 5. Konsultation
### 5.1 Einbeziehung Betroffener
{{KONSULTATION_ERGEBNISSE}}
Folgende Stakeholder wurden konsultiert:
- [ ] Datenschutzbeauftragter ({{DSB_NAME}}, {{DSB_KONTAKT}})
- [ ] Betroffene Personengruppen oder deren Vertreter
{{#IF HR_KONTEXT}}
- [ ] Betriebsrat / Personalrat
{{/IF}}
{{#IF OEFFENTLICHE_STELLE}}
- [ ] Buergervertreter / Ombudsstelle
- [ ] Zustaendige Aufsichtsbehoerde
{{/IF}}
- [ ] Fachexperten fuer betroffene Grundrechte
### 5.2 Ergebnisse der Konsultation
| Stakeholder | Datum | Ergebnis | Massnahme |
|------------|-------|----------|-----------|
| | | | |
---
## 6. Gesamtbewertung und Freigabe
### 6.1 Gesamtrisiko-Bewertung
| Kriterium | Bewertung |
|-----------|-----------|
| Hoechstes Einzelrisiko | |
| Anzahl betroffene Grundrechte | |
| Anzahl betroffene Personen | {{BETROFFENE_ANZAHL}} |
| Massnahmen ausreichend | Ja / Nein / Teilweise |
| Restrisiko akzeptabel | Ja / Nein |
### 6.2 Entscheidung
- [ ] **Freigabe** Restrisiko akzeptabel, Massnahmen ausreichend
- [ ] **Freigabe mit Auflagen** Zusaetzliche Massnahmen erforderlich (siehe unten)
- [ ] **Ablehnung** Grundrechtsrisiken nicht akzeptabel mitigierbar
### 6.3 Auflagen (falls zutreffend)
| Nr. | Auflage | Frist | Verantwortlich |
|-----|---------|-------|----------------|
| | | | |
---
## 7. Laufende Ueberwachung
### 7.1 Naechste Ueberpruefung
**Geplante Ueberpruefung:** {{NAECHSTE_UEBERPRUEFUNG}}
### 7.2 Trigger fuer ausserplanmaessige Ueberpruefung
Eine erneute FRIA ist durchzufuehren bei:
- Wesentlicher Aenderung des KI-Systems oder seines Einsatzzwecks
- Erweiterung auf neue Personengruppen oder Anwendungsbereiche
- Beschwerden oder Vorfaellen mit Grundrechtsbezug
- Aenderung der Rechtsgrundlage oder Risikoklassifikation
- Neuen wissenschaftlichen Erkenntnissen zu Risiken
- Aenderung des KI-Modells oder der Trainingsdaten
### 7.3 Dokumentation und Archivierung
Diese FRIA wird mindestens fuer die Dauer des Einsatzes des KI-Systems und darueber hinaus fuer 10 Jahre archiviert (Art. 18 KI-VO).
---
## 8. Unterschriften
| | |
|---|---|
| _________________________ | _________________________ |
| {{ERSTELLT_VON}} | {{GENEHMIGT_VON}} |
| Erstellt am {{ERSTELLT_AM}} | Genehmigt am {{GENEHMIGT_AM}} |
---
**Anhang A:** Vollstaendige Systemdokumentation (Art. 11 KI-VO)
**Anhang B:** AI Act Decision Tree Ergebnis
**Anhang C:** Verknuepfte DSFA (falls vorhanden)
**Anhang D:** Konsultationsprotokolle
$template$
) ON CONFLICT DO NOTHING;


@@ -0,0 +1,137 @@
#!/usr/bin/env python3
"""Cleanup script: Delete temporary DPA template documents from Qdrant.
Removes all points with payload field `temp_vorlagen=true` from
the bp_compliance_datenschutz collection.
Usage:
python cleanup_temp_vorlagen.py --dry-run # Preview only
python cleanup_temp_vorlagen.py # Execute deletion
python cleanup_temp_vorlagen.py --qdrant-url http://localhost:6333
"""
import argparse
import json
import sys
from typing import Optional
from urllib.request import Request, urlopen
from urllib.error import URLError
def qdrant_request(base_url: str, method: str, path: str, body: Optional[dict] = None) -> dict:
url = f"{base_url}{path}"
data = json.dumps(body).encode() if body else None
headers = {"Content-Type": "application/json"} if data else {}
req = Request(url, data=data, headers=headers, method=method)
with urlopen(req, timeout=30) as resp:
return json.loads(resp.read())
def count_temp_vorlagen(base_url: str, collection: str) -> int:
"""Count points with temp_vorlagen=true."""
body = {
"filter": {
"must": [
{"key": "temp_vorlagen", "match": {"value": True}}
]
},
"exact": True,
}
result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/count", body)
return result.get("result", {}).get("count", 0)
def list_temp_regulation_ids(base_url: str, collection: str) -> list[str]:
    """Collect distinct temp documents (regulation_id, title, source) from matching points."""
body = {
"filter": {
"must": [
{"key": "temp_vorlagen", "match": {"value": True}}
]
},
        "limit": 500,  # preview listing only; scroll pagination omitted for brevity
"with_payload": ["regulation_id", "title", "source"],
}
result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/scroll", body)
points = result.get("result", {}).get("points", [])
seen = {}
for p in points:
payload = p.get("payload", {})
rid = payload.get("regulation_id", "unknown")
if rid not in seen:
seen[rid] = {
"regulation_id": rid,
"title": payload.get("title", ""),
"source": payload.get("source", ""),
}
return list(seen.values())
def delete_temp_vorlagen(base_url: str, collection: str) -> str:
    """Delete all points with temp_vorlagen=true and return the operation status."""
body = {
"filter": {
"must": [
{"key": "temp_vorlagen", "match": {"value": True}}
]
}
}
result = qdrant_request(base_url, "POST", f"/collections/{collection}/points/delete", body)
status = result.get("status", "unknown")
return status
def main():
parser = argparse.ArgumentParser(description="Delete temp DPA templates from Qdrant")
parser.add_argument("--qdrant-url", default="http://localhost:6333",
help="Qdrant URL (default: http://localhost:6333)")
parser.add_argument("--collection", default="bp_compliance_datenschutz",
help="Qdrant collection name")
parser.add_argument("--dry-run", action="store_true",
help="Only count and list, do not delete")
args = parser.parse_args()
print(f"Qdrant URL: {args.qdrant_url}")
print(f"Collection: {args.collection}")
print()
try:
count = count_temp_vorlagen(args.qdrant_url, args.collection)
except URLError as e:
print(f"ERROR: Cannot connect to Qdrant at {args.qdrant_url}: {e}")
sys.exit(1)
print(f"Gefundene Punkte mit temp_vorlagen=true: {count}")
if count == 0:
print("Nichts zu loeschen.")
return
docs = list_temp_regulation_ids(args.qdrant_url, args.collection)
print(f"\nBetroffene Dokumente ({len(docs)}):")
for doc in sorted(docs, key=lambda d: d["regulation_id"]):
source = f" [{doc['source']}]" if doc.get("source") else ""
        title = f" - {doc['title']}" if doc.get("title") else ""
print(f" - {doc['regulation_id']}{title}{source}")
if args.dry_run:
print(f"\n[DRY-RUN] Wuerde {count} Punkte loeschen. Keine Aenderung durchgefuehrt.")
return
print(f"\nLoesche {count} Punkte ...")
status = delete_temp_vorlagen(args.qdrant_url, args.collection)
print(f"Status: {status}")
remaining = count_temp_vorlagen(args.qdrant_url, args.collection)
print(f"Verbleibende temp_vorlagen Punkte: {remaining}")
if remaining == 0:
print("Cleanup erfolgreich abgeschlossen.")
else:
print(f"WARNUNG: {remaining} Punkte konnten nicht geloescht werden.")
if __name__ == "__main__":
main()
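The same `temp_vorlagen=true` filter literal is repeated in `count_temp_vorlagen`, `list_temp_regulation_ids` and `delete_temp_vorlagen`. It could be factored into a shared builder; a sketch under the assumption that the filter shape stays identical across the three endpoints (the helper name `temp_vorlagen_filter` is ours):

```python
def temp_vorlagen_filter() -> dict:
    """Base request body matching all points flagged temp_vorlagen=true."""
    return {
        "filter": {
            "must": [
                {"key": "temp_vorlagen", "match": {"value": True}}
            ]
        }
    }
```

Callers would then merge endpoint-specific keys, e.g. `{**temp_vorlagen_filter(), "exact": True}` for the count request.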


@@ -251,14 +251,251 @@ async def rerank_cohere(query: str, documents: List[str], top_k: int = 5) -> Lis
GERMAN_ABBREVIATIONS = {
'bzw', 'ca', 'chr', 'd.h', 'dr', 'etc', 'evtl', 'ggf', 'inkl', 'max',
'min', 'mio', 'mrd', 'nr', 'prof', 's', 'sog', 'u.a', 'u.ä', 'usw',
'v.a', 'vgl', 'vs', 'z.b', 'z.t', 'zzgl', 'abs', 'art', 'abschn',
'anh', 'anl', 'aufl', 'bd', 'bes', 'bzgl', 'dgl', 'einschl', 'entspr',
'erg', 'erl', 'gem', 'grds', 'hrsg', 'insb', 'ivm', 'kap', 'lit',
'nachf', 'rdnr', 'rn', 'rz', 'ua', 'uvm', 'vorst', 'ziff'
}
# English abbreviations that don't end sentences
ENGLISH_ABBREVIATIONS = {
'e.g', 'i.e', 'etc', 'vs', 'al', 'approx', 'avg', 'dept', 'dr', 'ed',
'est', 'fig', 'govt', 'inc', 'jr', 'ltd', 'max', 'min', 'mr', 'mrs',
'ms', 'no', 'prof', 'pt', 'ref', 'rev', 'sec', 'sgt', 'sr', 'st',
'vol', 'cf', 'ch', 'cl', 'col', 'corp', 'cpl', 'def', 'dist', 'div',
'gen', 'hon', 'illus', 'intl', 'natl', 'org', 'para', 'pp', 'repr',
'resp', 'supp', 'tech', 'temp', 'treas', 'univ'
}
# Combined abbreviations for both languages
ALL_ABBREVIATIONS = GERMAN_ABBREVIATIONS | ENGLISH_ABBREVIATIONS
# Regex pattern for legal section headers (§, Art., Article, Section, etc.)
import re
_LEGAL_SECTION_RE = re.compile(
r'^(?:'
r'§\s*\d+' # § 25, § 5a
r'|Art(?:ikel|icle|\.)\s*\d+' # Artikel 5, Article 12, Art. 3
r'|Section\s+\d+' # Section 4.2
r'|Abschnitt\s+\d+' # Abschnitt III
r'|Kapitel\s+\d+' # Kapitel 2
r'|Chapter\s+\d+' # Chapter 3
r'|Anhang\s+[IVXLC\d]+' # Anhang III
r'|Annex\s+[IVXLC\d]+' # Annex XII
r'|TEIL\s+[IVXLC\d]+' # TEIL II
r'|Part\s+[IVXLC\d]+' # Part III
r'|Recital\s+\d+' # Recital 42
r'|Erwaegungsgrund\s+\d+' # Erwaegungsgrund 26
r')',
re.IGNORECASE | re.MULTILINE
)
# Regex for any heading-like line (Markdown ## or ALL-CAPS line)
_HEADING_RE = re.compile(
r'^(?:'
r'#{1,6}\s+.+' # Markdown headings
r'|[A-ZÄÖÜ][A-ZÄÖÜ\s\-]{5,}$' # ALL-CAPS lines (>5 chars)
r')',
re.MULTILINE
)
def _detect_language(text: str) -> str:
"""Simple heuristic: count German vs English marker words."""
sample = text[:5000].lower()
de_markers = sum(1 for w in ['der', 'die', 'das', 'und', 'ist', 'für', 'von',
'werden', 'nach', 'gemäß', 'sowie', 'durch']
if f' {w} ' in sample)
en_markers = sum(1 for w in ['the', 'and', 'for', 'that', 'with', 'shall',
'must', 'should', 'which', 'from', 'this']
if f' {w} ' in sample)
return 'de' if de_markers > en_markers else 'en'
def _protect_abbreviations(text: str) -> str:
"""Replace dots in abbreviations with placeholders to prevent false sentence splits."""
protected = text
for abbrev in ALL_ABBREVIATIONS:
pattern = re.compile(r'\b(' + re.escape(abbrev) + r')\.', re.IGNORECASE)
# Use lambda to preserve original case of the matched abbreviation
protected = pattern.sub(lambda m: m.group(1).replace('.', '<DOT>') + '<ABBR>', protected)
# Protect decimals (3.14) and ordinals (1. Absatz)
protected = re.sub(r'(\d)\.(\d)', r'\1<DECIMAL>\2', protected)
protected = re.sub(r'(\d+)\.\s', r'\1<ORD> ', protected)
return protected
def _restore_abbreviations(text: str) -> str:
"""Restore placeholders back to dots."""
return (text
.replace('<DOT>', '.')
.replace('<ABBR>', '.')
.replace('<DECIMAL>', '.')
.replace('<ORD>', '.'))
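# --- Illustration (standalone sketch, not part of the module) ---
# The placeholder scheme above is meant to be a lossless round-trip:
# protecting decimals/ordinals and then restoring them must reproduce the
# input exactly. A minimal self-contained version of that property check
# (demo names are hypothetical, logic mirrors the two number rules above):
def _demo_protect_numbers(text: str) -> str:
    import re
    text = re.sub(r'(\d)\.(\d)', r'\1<DECIMAL>\2', text)  # 3.14 -> 3<DECIMAL>14
    text = re.sub(r'(\d+)\.\s', r'\1<ORD> ', text)        # "1. " -> "1<ORD> "
    return text

def _demo_restore_numbers(text: str) -> str:
    return text.replace('<DECIMAL>', '.').replace('<ORD>', '.')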
def _split_sentences(text: str) -> List[str]:
"""Split text into sentences, respecting abbreviations in DE and EN."""
protected = _protect_abbreviations(text)
# Split after sentence-ending punctuation followed by uppercase or newline
sentence_pattern = r'(?<=[.!?])\s+(?=[A-ZÄÖÜÀ-Ý])|(?<=[.!?])\s*\n'
raw = re.split(sentence_pattern, protected)
sentences = []
for s in raw:
s = _restore_abbreviations(s).strip()
if s:
sentences.append(s)
return sentences
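# --- Illustration (standalone sketch, not part of the module) ---
# The splitter above only works because abbreviations are masked before the
# sentence regex runs. A minimal self-contained version of the same idea,
# with a hard-coded three-entry set standing in for ALL_ABBREVIATIONS:
def _demo_split(text: str) -> list:
    import re
    for a in ('art', 'abs', 'e.g'):  # stand-in abbreviation set
        pattern = re.compile(r'\b(' + re.escape(a) + r')\.', re.IGNORECASE)
        text = pattern.sub(lambda m: m.group(1) + '<DOT>', text)
    parts = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
    # Unmask the protected dots and drop empty fragments
    return [p.replace('<DOT>', '.').strip() for p in parts if p.strip()]
# _demo_split("See e.g. Art. 5 for details. Second sentence.") keeps
# "e.g." and "Art." inside the first sentence instead of splitting there.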
def _extract_section_header(line: str) -> Optional[str]:
"""Extract a legal section header from a line, or None."""
m = _LEGAL_SECTION_RE.match(line.strip())
if m:
return line.strip()
m = _HEADING_RE.match(line.strip())
if m:
return line.strip()
return None
def chunk_text_legal(text: str, chunk_size: int, overlap: int) -> List[str]:
"""
Legal-document-aware chunking.
Strategy:
1. Split on legal section boundaries (§, Art., Section, Chapter, etc.)
2. Within each section, split on paragraph boundaries (double newline)
3. Within each paragraph, split on sentence boundaries
4. Prepend section header as context prefix to every chunk
5. Add overlap from previous chunk
Works for both German (DSGVO, BGB, AI Act DE) and English (NIST, SLSA, CRA EN) texts.
"""
if not text or len(text) <= chunk_size:
return [text.strip()] if text and text.strip() else []
# --- Phase 1: Split into sections by legal headers ---
lines = text.split('\n')
sections = [] # list of (header, content)
current_header = None
current_lines = []
for line in lines:
header = _extract_section_header(line)
if header and current_lines:
sections.append((current_header, '\n'.join(current_lines)))
current_header = header
current_lines = [line]
elif header and not current_lines:
current_header = header
current_lines = [line]
else:
current_lines.append(line)
if current_lines:
sections.append((current_header, '\n'.join(current_lines)))
# --- Phase 2: Within each section, split on paragraphs, then sentences ---
raw_chunks = []
for section_header, section_text in sections:
# Build context prefix (max 120 chars to leave room for content)
prefix = ""
if section_header:
truncated = section_header[:120]
prefix = f"[{truncated}] "
paragraphs = re.split(r'\n\s*\n', section_text)
current_chunk = prefix
current_length = len(prefix)
for para in paragraphs:
para = para.strip()
if not para:
continue
# If paragraph fits in remaining space, append
if current_length + len(para) + 1 <= chunk_size:
if current_chunk and not current_chunk.endswith(' '):
current_chunk += '\n\n'
current_chunk += para
current_length = len(current_chunk)
continue
# Paragraph doesn't fit — flush current chunk if non-empty
if current_chunk.strip() and current_chunk.strip() != prefix.strip():
raw_chunks.append(current_chunk.strip())
# If entire paragraph fits in a fresh chunk, start new chunk
if len(prefix) + len(para) <= chunk_size:
current_chunk = prefix + para
current_length = len(current_chunk)
continue
# Paragraph too long — split by sentences
sentences = _split_sentences(para)
current_chunk = prefix
current_length = len(prefix)
for sentence in sentences:
sentence_len = len(sentence)
# Single sentence exceeds chunk_size — force-split
if len(prefix) + sentence_len > chunk_size:
if current_chunk.strip() and current_chunk.strip() != prefix.strip():
raw_chunks.append(current_chunk.strip())
# Hard split the long sentence
remaining = sentence
while remaining:
take = chunk_size - len(prefix)
chunk_part = prefix + remaining[:take]
raw_chunks.append(chunk_part.strip())
remaining = remaining[take:]
current_chunk = prefix
current_length = len(prefix)
continue
if current_length + sentence_len + 1 > chunk_size:
if current_chunk.strip() and current_chunk.strip() != prefix.strip():
raw_chunks.append(current_chunk.strip())
current_chunk = prefix + sentence
current_length = len(current_chunk)
else:
if current_chunk and not current_chunk.endswith(' '):
current_chunk += ' '
current_chunk += sentence
current_length = len(current_chunk)
# Flush remaining content for this section
if current_chunk.strip() and current_chunk.strip() != prefix.strip():
raw_chunks.append(current_chunk.strip())
if not raw_chunks:
return [text.strip()] if text.strip() else []
# --- Phase 3: Add overlap ---
final_chunks = []
for i, chunk in enumerate(raw_chunks):
if i > 0 and overlap > 0:
prev = raw_chunks[i - 1]
# Take overlap from end of previous chunk (but not the prefix)
overlap_text = prev[-min(overlap, len(prev)):]
# Only add overlap if it doesn't start mid-word
space_idx = overlap_text.find(' ')
if space_idx > 0:
overlap_text = overlap_text[space_idx + 1:]
if overlap_text:
chunk = overlap_text + ' ' + chunk
final_chunks.append(chunk.strip())
return [c for c in final_chunks if c]
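# --- Illustration (standalone sketch, not part of the module) ---
# Phase 3 above takes a tail of the previous chunk and drops a leading
# partial word before prepending it to the next chunk. The same trimming
# heuristic, isolated into a self-contained helper for demonstration:
def _demo_overlap_tail(prev: str, overlap: int) -> str:
    tail = prev[-min(overlap, len(prev)):]
    idx = tail.find(' ')
    # idx > 0: the tail may start mid-word, so keep only what follows the
    # first space; otherwise (no space, or space at position 0) keep as-is.
    return tail[idx + 1:] if idx > 0 else tail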
def chunk_text_recursive(text: str, chunk_size: int, overlap: int) -> List[str]:
"""Recursive character-based chunking."""
import re
"""Recursive character-based chunking (legacy, use legal_recursive for legal docs)."""
if not text or len(text) <= chunk_size:
return [text] if text else []
@@ -315,36 +552,23 @@ def chunk_text_recursive(text: str, chunk_size: int, overlap: int) -> List[str]:
def chunk_text_semantic(text: str, chunk_size: int, overlap_sentences: int = 1) -> List[str]:
"""Semantic sentence-aware chunking."""
import re
if not text:
return []
if len(text) <= chunk_size:
return [text.strip()]
    # Normalize whitespace, then split into sentences (DE + EN aware)
    text = re.sub(r'\s+', ' ', text).strip()
    # Protect abbreviations, decimals and ordinals
    protected = _protect_abbreviations(text)
    # Split on sentence endings
    sentence_pattern = r'(?<=[.!?])\s+(?=[A-ZÄÖÜÀ-Ý])|(?<=[.!?])$'
    raw_sentences = re.split(sentence_pattern, protected)
    # Restore protected characters
    sentences = []
    for s in raw_sentences:
        s = _restore_abbreviations(s).strip()
        if s:
            sentences.append(s)
@@ -638,7 +862,16 @@ async def rerank_documents(request: RerankRequest):
@app.post("/chunk", response_model=ChunkResponse)
async def chunk_text(request: ChunkRequest):
"""Chunk text into smaller pieces."""
"""Chunk text into smaller pieces.
Strategies:
- "recursive" (default): Legal-document-aware chunking with §/Art./Section
boundary detection, section context headers, paragraph-level splitting,
and sentence-level splitting respecting DE + EN abbreviations.
- "semantic": Sentence-aware chunking with overlap by sentence count.
The old plain recursive chunker has been retired and is no longer available.
"""
if not request.text:
return ChunkResponse(chunks=[], count=0, strategy=request.strategy)
@@ -647,7 +880,9 @@ async def chunk_text(request: ChunkRequest):
overlap_sentences = max(1, request.overlap // 100)
chunks = chunk_text_semantic(request.text, request.chunk_size, overlap_sentences)
else:
        # All strategies (recursive, legal_recursive, etc.) use the legal-aware chunker.
        # The old plain recursive chunker is no longer exposed via the API.
        chunks = chunk_text_legal(request.text, request.chunk_size, request.overlap)
return ChunkResponse(
chunks=chunks,


@@ -0,0 +1,288 @@
"""
Tests for the legal-aware chunking pipeline.
Covers:
- Legal section header detection (§, Art., Section, Chapter, Annex)
- Section context prefix in every chunk
- Paragraph boundary splitting
- Sentence splitting with DE and EN abbreviation protection
- Overlap between chunks
- Fallback for non-legal text
- Long sentence force-splitting
"""
import pytest
from main import (
chunk_text_legal,
chunk_text_recursive,
chunk_text_semantic,
_extract_section_header,
_split_sentences,
_detect_language,
_protect_abbreviations,
_restore_abbreviations,
)
# =========================================================================
# Section header detection
# =========================================================================
class TestSectionHeaderDetection:
def test_german_paragraph(self):
assert _extract_section_header("§ 25 Informationspflichten") is not None
def test_german_paragraph_with_letter(self):
assert _extract_section_header("§ 5a Elektronischer Geschaeftsverkehr") is not None
def test_german_artikel(self):
assert _extract_section_header("Artikel 5 Grundsaetze") is not None
def test_english_article(self):
assert _extract_section_header("Article 12 Transparency") is not None
def test_article_abbreviated(self):
assert _extract_section_header("Art. 3 Definitions") is not None
def test_english_section(self):
assert _extract_section_header("Section 4.2 Risk Assessment") is not None
def test_german_abschnitt(self):
assert _extract_section_header("Abschnitt 3 Pflichten") is not None
def test_chapter(self):
assert _extract_section_header("Chapter 5 Obligations") is not None
def test_german_kapitel(self):
assert _extract_section_header("Kapitel 2 Anwendungsbereich") is not None
def test_annex_roman(self):
assert _extract_section_header("Annex XII Technical Documentation") is not None
def test_german_anhang(self):
assert _extract_section_header("Anhang III Hochrisiko-KI") is not None
def test_part(self):
assert _extract_section_header("Part III Requirements") is not None
def test_markdown_heading(self):
assert _extract_section_header("## 3.1 Overview") is not None
def test_normal_text_not_header(self):
assert _extract_section_header("This is a normal sentence.") is None
def test_short_caps_not_header(self):
assert _extract_section_header("OK") is None
# =========================================================================
# Language detection
# =========================================================================
class TestLanguageDetection:
def test_german_text(self):
text = "Die Verordnung ist für alle Mitgliedstaaten verbindlich und gilt nach dem Grundsatz der unmittelbaren Anwendbarkeit."
assert _detect_language(text) == 'de'
def test_english_text(self):
text = "This regulation shall be binding in its entirety and directly applicable in all Member States."
assert _detect_language(text) == 'en'
# =========================================================================
# Abbreviation protection
# =========================================================================
class TestAbbreviationProtection:
def test_german_abbreviations(self):
text = "gem. § 5 Abs. 1 bzw. § 6 Abs. 2 z.B. die Pflicht"
protected = _protect_abbreviations(text)
assert "." not in protected.replace("<DOT>", "").replace("<DECIMAL>", "").replace("<ORD>", "").replace("<ABBR>", "")
restored = _restore_abbreviations(protected)
assert "gem." in restored
assert "z.B." in restored.replace("z.b.", "z.B.") or "z.b." in restored
def test_english_abbreviations(self):
text = "e.g. section 4.2, i.e. the requirements in vol. 1 ref. NIST SP 800-30."
protected = _protect_abbreviations(text)
# "e.g" and "i.e" should be protected
restored = _restore_abbreviations(protected)
assert "e.g." in restored
def test_decimals_protected(self):
text = "Version 3.14 of the specification requires 2.5 GB."
protected = _protect_abbreviations(text)
assert "<DECIMAL>" in protected
restored = _restore_abbreviations(protected)
assert "3.14" in restored
# =========================================================================
# Sentence splitting
# =========================================================================
class TestSentenceSplitting:
def test_simple_german(self):
text = "Erster Satz. Zweiter Satz. Dritter Satz."
sentences = _split_sentences(text)
assert len(sentences) >= 2
def test_simple_english(self):
text = "First sentence. Second sentence. Third sentence."
sentences = _split_sentences(text)
assert len(sentences) >= 2
def test_german_abbreviation_not_split(self):
text = "Gem. Art. 5 Abs. 1 DSGVO ist die Verarbeitung rechtmaessig. Der Verantwortliche muss dies nachweisen."
sentences = _split_sentences(text)
# Should NOT split at "Gem." or "Art." or "Abs."
assert any("Gem" in s and "DSGVO" in s for s in sentences)
def test_english_abbreviation_not_split(self):
text = "See e.g. Section 4.2 for details. The standard also references vol. 1 of the NIST SP series."
sentences = _split_sentences(text)
assert any("e.g" in s and "Section" in s for s in sentences)
def test_exclamation_and_question(self):
text = "Is this valid? Yes it is! Continue processing."
sentences = _split_sentences(text)
assert len(sentences) >= 2
# =========================================================================
# Legal chunking
# =========================================================================
class TestChunkTextLegal:
def test_small_text_single_chunk(self):
text = "Short text."
chunks = chunk_text_legal(text, chunk_size=1024, overlap=128)
assert len(chunks) == 1
assert chunks[0] == "Short text."
def test_section_header_as_prefix(self):
text = "§ 25 Informationspflichten\n\nDer Betreiber muss den Nutzer informieren. " * 20
chunks = chunk_text_legal(text, chunk_size=200, overlap=0)
assert len(chunks) > 1
# Every chunk should have the section prefix
for chunk in chunks:
assert "[§ 25" in chunk or "§ 25" in chunk
def test_article_prefix_english(self):
text = "Article 12 Transparency\n\n" + "The provider shall ensure transparency of AI systems. " * 30
chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
assert len(chunks) > 1
for chunk in chunks:
assert "Article 12" in chunk
def test_multiple_sections(self):
text = (
"§ 1 Anwendungsbereich\n\nDieses Gesetz gilt fuer alle Betreiber.\n\n"
"§ 2 Begriffsbestimmungen\n\nIm Sinne dieses Gesetzes ist Betreiber, wer eine Anlage betreibt.\n\n"
"§ 3 Pflichten\n\nDer Betreiber hat die Pflicht, die Anlage sicher zu betreiben."
)
chunks = chunk_text_legal(text, chunk_size=200, overlap=0)
# Should have chunks from different sections
section_headers = set()
for chunk in chunks:
if "[§ 1" in chunk:
section_headers.add("§ 1")
if "[§ 2" in chunk:
section_headers.add("§ 2")
if "[§ 3" in chunk:
section_headers.add("§ 3")
assert len(section_headers) >= 2
def test_paragraph_boundaries_respected(self):
para1 = "First paragraph with enough text to matter. " * 5
para2 = "Second paragraph also with content. " * 5
text = para1.strip() + "\n\n" + para2.strip()
chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
# Paragraphs should not be merged mid-sentence across chunk boundary
assert len(chunks) >= 2
def test_overlap_present(self):
text = "Sentence one about topic A. " * 10 + "\n\n" + "Sentence two about topic B. " * 10
chunks = chunk_text_legal(text, chunk_size=200, overlap=50)
if len(chunks) > 1:
# Second chunk should contain some text from end of first chunk
end_of_first = chunks[0][-30:]
# At least some overlap words should appear
overlap_words = set(end_of_first.split())
second_start_words = set(chunks[1][:80].split())
assert len(overlap_words & second_start_words) > 0
def test_nist_style_sections(self):
text = (
"Section 2.1 Risk Framing\n\n"
"Risk framing establishes the context for risk-based decisions. "
"Organizations must define their risk tolerance. " * 10 + "\n\n"
"Section 2.2 Risk Assessment\n\n"
"Risk assessment identifies threats and vulnerabilities. " * 10
)
chunks = chunk_text_legal(text, chunk_size=400, overlap=0)
has_21 = any("Section 2.1" in c for c in chunks)
has_22 = any("Section 2.2" in c for c in chunks)
assert has_21 and has_22
def test_markdown_heading_as_context(self):
text = (
"## 3.1 Overview\n\n"
"This section provides an overview of the specification. " * 15
)
chunks = chunk_text_legal(text, chunk_size=300, overlap=0)
assert len(chunks) > 1
for chunk in chunks:
assert "3.1 Overview" in chunk
def test_empty_text(self):
assert chunk_text_legal("", 1024, 128) == []
def test_whitespace_only(self):
assert chunk_text_legal(" \n\n ", 1024, 128) == []
def test_long_sentence_force_split(self):
long_sentence = "A" * 2000
chunks = chunk_text_legal(long_sentence, chunk_size=500, overlap=0)
assert len(chunks) >= 4
for chunk in chunks:
assert len(chunk) <= 500 + 20 # small margin for prefix
# =========================================================================
# Legacy recursive chunking still works
# =========================================================================
class TestChunkTextRecursive:
def test_basic_split(self):
text = "Hello world. " * 200
chunks = chunk_text_recursive(text, chunk_size=500, overlap=50)
assert len(chunks) > 1
for chunk in chunks:
assert len(chunk) <= 600 # some margin for overlap
def test_small_text(self):
chunks = chunk_text_recursive("Short.", chunk_size=1024, overlap=128)
assert chunks == ["Short."]
# =========================================================================
# Semantic chunking still works
# =========================================================================
class TestChunkTextSemantic:
def test_basic_split(self):
text = "First sentence. Second sentence. Third sentence. Fourth sentence. Fifth sentence."
chunks = chunk_text_semantic(text, chunk_size=50, overlap_sentences=1)
assert len(chunks) >= 2
def test_small_text(self):
chunks = chunk_text_semantic("Short.", chunk_size=1024, overlap_sentences=1)
assert chunks == ["Short."]


@@ -0,0 +1,5 @@
node_modules
.next
.git
Dockerfile
.dockerignore

levis-holzbau/Dockerfile

@@ -0,0 +1,27 @@
FROM node:20-alpine AS base
FROM base AS deps
WORKDIR /app
COPY package.json package-lock.json* ./
RUN npm ci
FROM base AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN mkdir -p public
RUN npm run build
FROM base AS runner
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"
CMD ["node", "server.js"]


@@ -0,0 +1,25 @@
@import url('https://fonts.googleapis.com/css2?family=Quicksand:wght@500;600;700&family=Nunito:wght@400;600;700&display=swap');
@tailwind base;
@tailwind components;
@tailwind utilities;
html {
scroll-behavior: smooth;
}
body {
font-family: 'Nunito', sans-serif;
background-color: #FDF8F0;
color: #2C2C2C;
}
h1, h2, h3, h4, h5, h6 {
font-family: 'Quicksand', sans-serif;
}
@layer utilities {
.text-balance {
text-wrap: balance;
}
}


@@ -0,0 +1,21 @@
import type { Metadata } from 'next'
import './globals.css'
import { Navbar } from '@/components/Navbar'
import { Footer } from '@/components/Footer'
export const metadata: Metadata = {
title: 'LEVIS Holzbau — Kinder-Holzwerkstatt',
description: 'Lerne Holzfiguren schnitzen und kleine Holzprojekte bauen! Kindgerechte Anleitungen fuer junge Holzwerker.',
}
export default function RootLayout({ children }: { children: React.ReactNode }) {
return (
<html lang="de">
<body className="min-h-screen flex flex-col">
<Navbar />
<main className="flex-1">{children}</main>
<Footer />
</body>
</html>
)
}


@@ -0,0 +1,71 @@
'use client'
import { motion } from 'framer-motion'
import { Hammer, TreePine, ShieldCheck } from 'lucide-react'
import { HeroSection } from '@/components/HeroSection'
import { ProjectCard } from '@/components/ProjectCard'
import { projects } from '@/lib/projects'
const features = [
{
icon: Hammer,
title: 'Schnitzen',
description: 'Lerne mit Schnitzmesser und Holz umzugehen und forme eigene Figuren.',
color: 'bg-primary/10 text-primary',
},
{
icon: TreePine,
title: 'Bauen',
description: 'Saege, leime und nagle — baue nuetzliche Dinge aus Holz!',
color: 'bg-secondary/10 text-secondary',
},
{
icon: ShieldCheck,
title: 'Sicherheit',
description: 'Jedes Projekt zeigt dir, wie du sicher mit Werkzeug arbeitest.',
color: 'bg-accent/10 text-accent',
},
]
export default function HomePage() {
const featured = projects.slice(0, 4)
return (
<>
<HeroSection />
{/* Features */}
<section className="max-w-6xl mx-auto px-4 py-16">
<div className="grid grid-cols-1 sm:grid-cols-3 gap-6">
{features.map((f, i) => (
<motion.div
key={f.title}
className="bg-white rounded-2xl p-6 shadow-sm border border-primary/5 text-center"
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: i * 0.1 }}
>
<div className={`w-14 h-14 rounded-xl ${f.color} flex items-center justify-center mx-auto mb-4`}>
<f.icon className="w-7 h-7" />
</div>
<h3 className="font-heading font-bold text-lg mb-2">{f.title}</h3>
<p className="text-sm text-dark/60">{f.description}</p>
</motion.div>
))}
</div>
</section>
{/* Popular Projects */}
<section className="max-w-6xl mx-auto px-4 pb-16">
<h2 className="font-heading font-bold text-3xl text-center mb-8">
Beliebte Projekte
</h2>
<div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-4 gap-6">
{featured.map((p) => (
<ProjectCard key={p.slug} project={p} />
))}
</div>
</section>
</>
)
}


@@ -0,0 +1,120 @@
import { notFound } from 'next/navigation'
import Link from 'next/link'
import { ArrowLeft, Clock, Wrench, Package } from 'lucide-react'
import { projects, getProject, getRelatedProjects } from '@/lib/projects'
import { DifficultyBadge } from '@/components/DifficultyBadge'
import { AgeBadge } from '@/components/AgeBadge'
import { StepCard } from '@/components/StepCard'
import { SafetyTip } from '@/components/SafetyTip'
import { ToolIcon } from '@/components/ToolIcon'
import { ProjectIllustration } from '@/components/ProjectIllustration'
import { ProjectCard } from '@/components/ProjectCard'
export function generateStaticParams() {
return projects.map((p) => ({ slug: p.slug }))
}
export default async function ProjectPage({ params }: { params: Promise<{ slug: string }> }) {
const { slug } = await params
const project = getProject(slug)
if (!project) notFound()
const related = getRelatedProjects(slug)
return (
<div className="max-w-4xl mx-auto px-4 py-8">
{/* Back */}
<Link href="/projekte" className="inline-flex items-center gap-1 text-accent hover:underline mb-6 text-sm font-semibold">
<ArrowLeft className="w-4 h-4" /> Alle Projekte
</Link>
{/* Hero */}
<div className="bg-white rounded-2xl shadow-sm border border-primary/5 overflow-hidden mb-8">
<div className="bg-cream p-10 flex items-center justify-center">
<ProjectIllustration slug={project.slug} size={180} />
</div>
<div className="p-6 sm:p-8">
<div className="flex flex-wrap items-center gap-3 mb-3">
<AgeBadge range={project.ageRange} />
<DifficultyBadge level={project.difficulty} />
<span className="flex items-center gap-1 text-sm text-dark/50">
<Clock className="w-4 h-4" /> {project.duration}
</span>
</div>
<h1 className="font-heading font-bold text-3xl sm:text-4xl mb-3">{project.name}</h1>
<p className="text-dark/70 text-lg leading-relaxed">{project.description}</p>
</div>
</div>
{/* Tools & Materials */}
<div className="grid grid-cols-1 sm:grid-cols-2 gap-4 mb-8">
<div className="bg-white rounded-2xl p-6 border border-primary/5">
<h2 className="font-heading font-bold text-lg flex items-center gap-2 mb-4">
<Wrench className="w-5 h-5 text-primary" /> Werkzeuge
</h2>
<ul className="space-y-2">
{project.tools.map((t) => (
<li key={t} className="flex items-center gap-2 text-sm">
<ToolIcon name={t} />
{t}
</li>
))}
</ul>
</div>
<div className="bg-white rounded-2xl p-6 border border-primary/5">
<h2 className="font-heading font-bold text-lg flex items-center gap-2 mb-4">
<Package className="w-5 h-5 text-secondary" /> Material
</h2>
<ul className="space-y-2">
{project.materials.map((m) => (
<li key={m} className="flex items-center gap-2 text-sm">
<span className="w-2 h-2 rounded-full bg-secondary flex-shrink-0" />
{m}
</li>
))}
</ul>
</div>
</div>
{/* Safety */}
<div className="space-y-3 mb-10">
<h2 className="font-heading font-bold text-xl mb-2">Sicherheitshinweise</h2>
{project.safetyTips.map((tip) => (
<SafetyTip key={tip}>{tip}</SafetyTip>
))}
</div>
{/* Steps */}
<div className="mb-10">
<h2 className="font-heading font-bold text-xl mb-6">Schritt fuer Schritt</h2>
<div className="space-y-0">
{project.steps.map((step, i) => (
<StepCard key={i} step={step} index={i} />
))}
</div>
</div>
{/* Skills */}
<div className="bg-secondary/5 rounded-2xl p-6 mb-12">
<h2 className="font-heading font-bold text-xl mb-3">Was du lernst</h2>
<div className="flex flex-wrap gap-2">
{project.skills.map((s) => (
<span key={s} className="px-3 py-1.5 bg-secondary/10 text-secondary rounded-full text-sm font-semibold">
{s}
</span>
))}
</div>
</div>
{/* Related */}
<div>
<h2 className="font-heading font-bold text-xl mb-6">Aehnliche Projekte</h2>
<div className="grid grid-cols-1 sm:grid-cols-3 gap-4">
{related.map((p) => (
<ProjectCard key={p.slug} project={p} />
))}
</div>
</div>
</div>
)
}


@@ -0,0 +1,59 @@
'use client'
import { useState } from 'react'
import { motion } from 'framer-motion'
import { ProjectCard } from '@/components/ProjectCard'
import { projects } from '@/lib/projects'
const filters = [
{ label: 'Alle', value: 0 },
{ label: 'Anfaenger', value: 1 },
{ label: 'Fortgeschritten', value: 2 },
{ label: 'Profi', value: 3 },
]
export default function ProjektePage() {
const [filter, setFilter] = useState(0)
const filtered = filter === 0 ? projects : projects.filter((p) => p.difficulty === filter)
return (
<div className="max-w-6xl mx-auto px-4 py-12">
<motion.div
initial={{ opacity: 0, y: -10 }}
animate={{ opacity: 1, y: 0 }}
className="text-center mb-10"
>
<h1 className="font-heading font-bold text-4xl mb-3">Alle Projekte</h1>
<p className="text-dark/60 text-lg">Waehle ein Projekt und leg los!</p>
</motion.div>
{/* Filter */}
<div className="flex justify-center gap-2 mb-10">
{filters.map((f) => (
<button
key={f.value}
onClick={() => setFilter(f.value as 0 | 1 | 2 | 3)}
className={`px-4 py-2 rounded-xl font-semibold text-sm transition-colors ${
filter === f.value
? 'bg-primary text-white'
: 'bg-white text-dark/60 hover:bg-primary/5'
}`}
>
{f.label}
</button>
))}
</div>
{/* Grid */}
<div className="grid grid-cols-1 sm:grid-cols-2 lg:grid-cols-3 gap-6">
{filtered.map((p) => (
<ProjectCard key={p.slug} project={p} />
))}
</div>
{filtered.length === 0 && (
<p className="text-center text-dark/40 mt-12">Keine Projekte in dieser Kategorie.</p>
)}
</div>
)
}


@@ -0,0 +1,101 @@
'use client'
import { motion } from 'framer-motion'
import { ShieldCheck, Eye, Hand, Scissors, AlertTriangle, Users } from 'lucide-react'
import { SafetyTip } from '@/components/SafetyTip'
const rules = [
{ icon: Users, title: 'Immer mit Erwachsenen', text: 'Bei Saegen, Bohren und Schnitzen muss immer ein Erwachsener dabei sein.' },
{ icon: Hand, title: 'Vom Koerper weg', text: 'Schnitze, saege und schneide immer vom Koerper weg. So kannst du dich nicht verletzen.' },
{ icon: Eye, title: 'Schutzbrille tragen', text: 'Beim Saegen und Schleifen fliegen Spaene — eine Schutzbrille schuetzt deine Augen.' },
{ icon: Scissors, title: 'Werkzeug richtig halten', text: 'Greife Werkzeuge immer am Griff. Trage Messer und Saegen mit der Spitze nach unten.' },
{ icon: AlertTriangle, title: 'Aufgeraeumter Arbeitsplatz', text: 'Raeume Werkzeug nach dem Benutzen weg. Ein ordentlicher Platz ist ein sicherer Platz!' },
{ icon: ShieldCheck, title: 'Scharfes Werkzeug', text: 'Klingt komisch, aber: Scharfe Messer sind sicherer als stumpfe, weil du weniger Kraft brauchst.' },
]
const toolGuides = [
{ name: 'Schnitzmesser', age: 'Ab 6 Jahren (mit Hilfe)', tips: ['Immer vom Koerper weg schnitzen', 'Nach dem Benutzen zuklappen', 'Weiches Holz (Linde) verwenden'] },
{ name: 'Handsaege', age: 'Ab 7 Jahren (mit Hilfe)', tips: ['Holz immer fest einspannen', 'Langsam und gleichmaessig saegen', 'Nicht auf die Klinge druecken'] },
{ name: 'Hammer', age: 'Ab 5 Jahren', tips: ['Leichten Kinderhammer verwenden', 'Naegel mit Zange halten, nie mit Fingern', 'Auf stabile Unterlage achten'] },
{ name: 'Schleifpapier', age: 'Ab 5 Jahren', tips: ['Immer in eine Richtung schleifen', 'Staub nicht einatmen', 'Erst grob, dann fein'] },
{ name: 'Holzleim', age: 'Ab 5 Jahren', tips: ['Nicht giftig, aber nicht essen', 'Duenn auftragen reicht', 'Mindestens 1 Stunde trocknen lassen'] },
]
export default function SicherheitPage() {
return (
<div className="max-w-4xl mx-auto px-4 py-12">
<motion.div
initial={{ opacity: 0, y: -10 }}
animate={{ opacity: 1, y: 0 }}
className="text-center mb-12"
>
<div className="w-16 h-16 bg-warning/10 rounded-2xl flex items-center justify-center mx-auto mb-4">
<ShieldCheck className="w-8 h-8 text-warning" />
</div>
<h1 className="font-heading font-bold text-4xl mb-3">Sicherheit geht vor!</h1>
<p className="text-dark/60 text-lg max-w-2xl mx-auto">
Holzarbeiten macht riesig Spass, aber nur, wenn du sicher arbeitest.
Hier findest du die wichtigsten Regeln.
</p>
</motion.div>
{/* Rules Grid */}
<section className="mb-16">
<h2 className="font-heading font-bold text-2xl mb-6">Die goldenen Regeln</h2>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-4">
{rules.map((r, i) => (
<motion.div
key={r.title}
className="bg-white rounded-2xl p-5 border border-primary/5 flex gap-4"
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: i * 0.05 }}
>
<div className="w-10 h-10 bg-warning/10 rounded-xl flex items-center justify-center flex-shrink-0">
<r.icon className="w-5 h-5 text-warning" />
</div>
<div>
<h3 className="font-heading font-bold mb-1">{r.title}</h3>
<p className="text-sm text-dark/60">{r.text}</p>
</div>
</motion.div>
))}
</div>
</section>
{/* Tool Guides */}
<section className="mb-16">
<h2 className="font-heading font-bold text-2xl mb-6">Werkzeug-Guide</h2>
<div className="space-y-4">
{toolGuides.map((tool) => (
<div key={tool.name} className="bg-white rounded-2xl p-5 border border-primary/5">
<div className="flex items-center justify-between mb-3">
<h3 className="font-heading font-bold text-lg">{tool.name}</h3>
<span className="text-xs font-semibold bg-accent/10 text-accent px-2.5 py-1 rounded-full">{tool.age}</span>
</div>
<ul className="space-y-1.5">
{tool.tips.map((tip) => (
<li key={tip} className="flex items-center gap-2 text-sm text-dark/70">
<span className="w-1.5 h-1.5 rounded-full bg-primary flex-shrink-0" />
{tip}
</li>
))}
</ul>
</div>
))}
</div>
</section>
{/* Parents */}
<section>
<h2 className="font-heading font-bold text-2xl mb-4">Hinweise fuer Eltern</h2>
<div className="space-y-3">
<SafetyTip>Beaufsichtigen Sie Ihr Kind bei allen Projekten, besonders beim Umgang mit Schneidwerkzeugen.</SafetyTip>
<SafetyTip>Stellen Sie altersgerechtes Werkzeug bereit. Kinderschnitzmesser haben abgerundete Spitzen.</SafetyTip>
<SafetyTip>Richten Sie einen festen Arbeitsplatz ein, idealerweise auf einer stabilen Werkbank oder einem alten Tisch.</SafetyTip>
<SafetyTip>Leinoel und Acrylfarben sind fuer Kinder unbedenklich. Vermeiden Sie Lacke mit Loesungsmitteln.</SafetyTip>
</div>
</section>
</div>
)
}


@@ -0,0 +1,83 @@
'use client'
import { motion } from 'framer-motion'
import { TreePine, Heart, Sparkles, Users } from 'lucide-react'
import Link from 'next/link'
const reasons = [
{ icon: Sparkles, title: 'Kreativitaet', text: 'Du kannst dir selbst ausdenken, was du baust — und es dann wirklich machen!' },
{ icon: Heart, title: 'Stolz', text: 'Wenn du etwas mit deinen eigenen Haenden baust, macht dich das richtig stolz.' },
{ icon: TreePine, title: 'Natur', text: 'Holz ist ein natuerliches Material. Du lernst die Natur besser kennen.' },
{ icon: Users, title: 'Zusammen', text: 'Holzarbeiten macht zusammen mit Freunden oder der Familie am meisten Spass!' },
]
export default function UeberPage() {
return (
<div className="max-w-4xl mx-auto px-4 py-12">
<motion.div
initial={{ opacity: 0, y: -10 }}
animate={{ opacity: 1, y: 0 }}
className="text-center mb-12"
>
<h1 className="font-heading font-bold text-4xl mb-3">Ueber LEVIS Holzbau</h1>
<p className="text-dark/60 text-lg max-w-2xl mx-auto">
Wir zeigen dir, wie du aus einem einfachen Stueck Holz etwas Tolles machen kannst!
</p>
</motion.div>
{/* Story */}
<div className="bg-white rounded-2xl p-6 sm:p-8 border border-primary/5 mb-12">
<h2 className="font-heading font-bold text-2xl mb-4">Was ist LEVIS Holzbau?</h2>
<div className="space-y-4 text-dark/70 leading-relaxed">
<p>
LEVIS Holzbau ist deine Online-Holzwerkstatt! Hier findest du Anleitungen fuer tolle Projekte
aus Holz — vom einfachen Zauberstab bis zum echten Vogelhaus.
</p>
<p>
Jedes Projekt erklaert dir Schritt fuer Schritt, was du tun musst. Du siehst, welches Werkzeug
und Material du brauchst, und wir zeigen dir immer, worauf du bei der Sicherheit achten musst.
</p>
<p>
Egal ob du 6 oder 12 Jahre alt bist — fuer jedes Alter gibt es passende Projekte.
Faengst du gerade erst an? Dann probier den Zauberstab oder die Nagelbilder. Bist du
schon ein Profi? Dann trau dich an den Fliegenpilz!
</p>
</div>
</div>
{/* Why woodworking */}
<h2 className="font-heading font-bold text-2xl mb-6 text-center">Warum Holzarbeiten Spass macht</h2>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-4 mb-12">
{reasons.map((r, i) => (
<motion.div
key={r.title}
className="bg-white rounded-2xl p-5 border border-primary/5 flex gap-4"
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
transition={{ delay: i * 0.1 }}
>
<div className="w-10 h-10 bg-secondary/10 rounded-xl flex items-center justify-center flex-shrink-0">
<r.icon className="w-5 h-5 text-secondary" />
</div>
<div>
<h3 className="font-heading font-bold mb-1">{r.title}</h3>
<p className="text-sm text-dark/60">{r.text}</p>
</div>
</motion.div>
))}
</div>
{/* CTA */}
<div className="text-center bg-gradient-to-br from-primary/5 to-secondary/5 rounded-2xl p-8">
<h2 className="font-heading font-bold text-2xl mb-3">Bereit loszulegen?</h2>
<p className="text-dark/60 mb-6">Schau dir unsere Projekte an und such dir eins aus!</p>
<Link
href="/projekte"
className="inline-flex items-center gap-2 bg-primary hover:bg-primary/90 text-white font-bold px-8 py-3 rounded-2xl transition-colors"
>
Zu den Projekten
</Link>
</div>
</div>
)
}


@@ -0,0 +1,7 @@
export function AgeBadge({ range }: { range: string }) {
return (
<span className="inline-flex items-center px-2.5 py-0.5 rounded-full text-xs font-semibold bg-accent/10 text-accent">
{range} Jahre
</span>
)
}


@@ -0,0 +1,15 @@
import { Hammer } from 'lucide-react'
export function DifficultyBadge({ level }: { level: 1 | 2 | 3 }) {
const labels = ['Anfaenger', 'Fortgeschritten', 'Profi']
return (
<div className="flex items-center gap-1" title={labels[level - 1]}>
{Array.from({ length: 3 }).map((_, i) => (
<Hammer
key={i}
className={`w-4 h-4 ${i < level ? 'text-primary' : 'text-gray-300'}`}
/>
))}
</div>
)
}


@@ -0,0 +1,17 @@
import { Heart } from 'lucide-react'
import { Logo } from './Logo'
export function Footer() {
return (
<footer className="bg-white border-t border-primary/10 mt-16">
<div className="max-w-6xl mx-auto px-4 py-8">
<div className="flex flex-col sm:flex-row items-center justify-between gap-4">
<Logo size={32} />
<p className="text-sm text-dark/50 flex items-center gap-1">
Gemacht mit <Heart className="w-4 h-4 text-red-400 fill-red-400" /> fuer junge Holzwerker
</p>
</div>
</div>
</footer>
)
}


@@ -0,0 +1,95 @@
'use client'
import { motion } from 'framer-motion'
import Link from 'next/link'
import { ArrowRight } from 'lucide-react'
import { Logo } from './Logo'
export function HeroSection() {
return (
<section className="relative overflow-hidden bg-gradient-to-br from-cream via-white to-primary/5 py-16 sm:py-24">
<div className="max-w-6xl mx-auto px-4 flex flex-col lg:flex-row items-center gap-12">
<motion.div
className="flex-1 text-center lg:text-left"
initial={{ opacity: 0, x: -30 }}
animate={{ opacity: 1, x: 0 }}
transition={{ duration: 0.6 }}
>
<div className="flex justify-center lg:justify-start mb-6">
<Logo size={64} />
</div>
<h1 className="font-heading font-bold text-4xl sm:text-5xl text-dark mb-4 text-balance">
Willkommen in der{' '}
<span className="text-primary">Holzwerkstatt</span>!
</h1>
<p className="text-lg text-dark/70 mb-8 max-w-lg mx-auto lg:mx-0">
Hier lernst du, wie man aus Holz tolle Sachen baut und schnitzt.
Vom Zauberstab bis zum Vogelhaus — fuer jeden ist etwas dabei!
</p>
<Link
href="/projekte"
className="inline-flex items-center gap-2 bg-primary hover:bg-primary/90 text-white font-bold px-8 py-4 rounded-2xl text-lg transition-colors shadow-lg shadow-primary/20"
>
Entdecke Projekte <ArrowRight className="w-5 h-5" />
</Link>
</motion.div>
<motion.div
className="flex-1 flex justify-center"
initial={{ opacity: 0, scale: 0.8 }}
animate={{ opacity: 1, scale: 1 }}
transition={{ duration: 0.6, delay: 0.2 }}
>
<HeroIllustration />
</motion.div>
</div>
</section>
)
}
function HeroIllustration() {
return (
<svg width="320" height="280" viewBox="0 0 320 280" fill="none" xmlns="http://www.w3.org/2000/svg">
{/* Workbench */}
<rect x="40" y="180" width="240" height="12" rx="4" fill="#D4915C" />
<rect x="60" y="192" width="12" height="60" rx="2" fill="#C4814C" />
<rect x="248" y="192" width="12" height="60" rx="2" fill="#C4814C" />
<rect x="50" y="248" width="32" height="8" rx="2" fill="#C4814C" />
<rect x="238" y="248" width="32" height="8" rx="2" fill="#C4814C" />
{/* Wood pieces on bench */}
<rect x="80" y="164" width="60" height="16" rx="3" fill="#E8A96C" />
<rect x="85" y="168" width="50" height="2" rx="1" fill="#D4915C" opacity="0.3" />
{/* Small boat */}
<path d="M180 170 Q200 155 220 170 Q200 178 180 170Z" fill="#E8A96C" />
<line x1="200" y1="148" x2="200" y2="170" stroke="#8B6F47" strokeWidth="2" />
<path d="M200 148 L215 158 L200 165Z" fill="#FF6B6B" opacity="0.8" />
{/* Hammer */}
<rect x="240" y="155" width="4" height="25" rx="1" fill="#8B6F47" transform="rotate(-20 240 155)" />
<rect x="232" y="148" width="20" height="10" rx="2" fill="#888" transform="rotate(-20 240 155)" />
{/* Tree background */}
<circle cx="60" cy="100" r="35" fill="#4CAF50" opacity="0.3" />
<circle cx="50" cy="85" r="25" fill="#4CAF50" opacity="0.4" />
<circle cx="70" cy="90" r="28" fill="#4CAF50" opacity="0.35" />
<rect x="56" y="120" width="8" height="60" rx="2" fill="#8B6F47" opacity="0.4" />
{/* Tree right */}
<circle cx="270" cy="110" r="30" fill="#4CAF50" opacity="0.25" />
<circle cx="280" cy="95" r="22" fill="#4CAF50" opacity="0.35" />
<rect x="268" y="130" width="6" height="50" rx="2" fill="#8B6F47" opacity="0.3" />
{/* Sun */}
<circle cx="280" cy="40" r="20" fill="#F5A623" opacity="0.3" />
<circle cx="280" cy="40" r="14" fill="#F5A623" opacity="0.5" />
{/* Sawdust particles */}
<circle cx="120" cy="175" r="1.5" fill="#D4915C" opacity="0.5" />
<circle cx="130" cy="172" r="1" fill="#D4915C" opacity="0.4" />
<circle cx="115" cy="178" r="1.2" fill="#D4915C" opacity="0.3" />
<circle cx="135" cy="176" r="0.8" fill="#D4915C" opacity="0.6" />
</svg>
)
}


@@ -0,0 +1,35 @@
'use client'
export function Logo({ size = 40 }: { size?: number }) {
return (
<div className="flex items-center gap-2">
<svg width={size} height={size} viewBox="0 0 48 48" fill="none" xmlns="http://www.w3.org/2000/svg">
{/* Wood log */}
<ellipse cx="24" cy="30" rx="16" ry="10" fill="#D4915C" />
<ellipse cx="24" cy="30" rx="16" ry="10" fill="url(#wood-grain)" opacity="0.3" />
<ellipse cx="24" cy="27" rx="16" ry="10" fill="#E8A96C" />
{/* Tree rings */}
<ellipse cx="24" cy="27" rx="10" ry="6" fill="none" stroke="#D4915C" strokeWidth="1" />
<ellipse cx="24" cy="27" rx="6" ry="3.5" fill="none" stroke="#D4915C" strokeWidth="0.8" />
<ellipse cx="24" cy="27" rx="2.5" ry="1.5" fill="#D4915C" />
{/* Saw */}
<rect x="30" y="6" width="3" height="18" rx="1" fill="#888" transform="rotate(15 30 6)" />
<rect x="29" y="4" width="5" height="5" rx="1" fill="#F5A623" transform="rotate(15 30 6)" />
{/* Saw teeth */}
<path d="M31 10 L34 11 L31 12 L34 13 L31 14 L34 15 L31 16 L34 17 L31 18 L34 19 L31 20" stroke="#666" strokeWidth="0.5" fill="none" transform="rotate(15 30 6)" />
{/* Leaf */}
<path d="M12 8 Q16 2 20 8 Q16 10 12 8Z" fill="#4CAF50" />
<line x1="16" y1="5" x2="16" y2="9" stroke="#388E3C" strokeWidth="0.5" />
<defs>
<pattern id="wood-grain" x="0" y="0" width="4" height="4" patternUnits="userSpaceOnUse">
<line x1="0" y1="0" x2="4" y2="4" stroke="#C4814C" strokeWidth="0.3" />
</pattern>
</defs>
</svg>
<div className="flex flex-col leading-tight">
<span className="font-heading font-bold text-xl text-primary">LEVIS</span>
<span className="font-heading text-sm text-dark/70 -mt-1">Holzbau</span>
</div>
</div>
)
}


@@ -0,0 +1,44 @@
'use client'
import Link from 'next/link'
import { usePathname } from 'next/navigation'
import { Logo } from './Logo'
const links = [
{ href: '/', label: 'Start' },
{ href: '/projekte', label: 'Projekte' },
{ href: '/sicherheit', label: 'Sicherheit' },
{ href: '/ueber', label: 'Ueber LEVIS' },
]
export function Navbar() {
const pathname = usePathname()
return (
<nav className="bg-white/80 backdrop-blur-sm border-b border-primary/10 sticky top-0 z-50">
<div className="max-w-6xl mx-auto px-4 py-3 flex items-center justify-between">
<Link href="/">
<Logo />
</Link>
<div className="flex items-center gap-1 sm:gap-4">
{links.map(({ href, label }) => {
const isActive = href === '/' ? pathname === '/' : pathname.startsWith(href)
return (
<Link
key={href}
href={href}
className={`px-3 py-2 rounded-xl text-sm sm:text-base font-semibold transition-colors ${
isActive
? 'bg-primary/10 text-primary'
: 'text-dark/70 hover:text-primary hover:bg-primary/5'
}`}
>
{label}
</Link>
)
})}
</div>
</div>
</nav>
)
}
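The `isActive` ternary above encodes one subtlety: `/` is a prefix of every path, so the Start link must be compared with strict equality, while all other links also highlight on their sub-routes via `startsWith`. A minimal sketch of that rule as a standalone function (the name `isActiveLink` is hypothetical, not part of the component):

```typescript
// Active-link rule from the Navbar, pulled out as a plain function.
// '/' only matches exactly; every other href matches itself and its sub-routes.
export function isActiveLink(href: string, pathname: string): boolean {
  return href === '/' ? pathname === '/' : pathname.startsWith(href)
}
```

With this rule, visiting `/projekte/zauberstab` keeps the Projekte link highlighted, while the Start link only lights up on the home page itself.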


@@ -0,0 +1,42 @@
'use client'
import Link from 'next/link'
import { motion } from 'framer-motion'
import { Clock } from 'lucide-react'
import { Project } from '@/lib/types'
import { DifficultyBadge } from './DifficultyBadge'
import { AgeBadge } from './AgeBadge'
import { ProjectIllustration } from './ProjectIllustration'
export function ProjectCard({ project }: { project: Project }) {
return (
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
whileHover={{ y: -4 }}
transition={{ duration: 0.3 }}
>
<Link href={`/projekte/${project.slug}`} className="block">
<div className="bg-white rounded-2xl shadow-sm hover:shadow-md transition-shadow overflow-hidden border border-primary/5">
<div className="bg-cream p-6 flex items-center justify-center h-44">
<ProjectIllustration slug={project.slug} size={120} />
</div>
<div className="p-5">
<h3 className="font-heading font-bold text-lg mb-2">{project.name}</h3>
<p className="text-sm text-dark/60 mb-3 line-clamp-2">{project.description}</p>
<div className="flex items-center justify-between">
<div className="flex items-center gap-2">
<AgeBadge range={project.ageRange} />
<DifficultyBadge level={project.difficulty} />
</div>
<div className="flex items-center gap-1 text-xs text-dark/40">
<Clock className="w-3.5 h-3.5" />
{project.duration}
</div>
</div>
</div>
</div>
</Link>
</motion.div>
)
}


@@ -0,0 +1,132 @@
'use client'
export function ProjectIllustration({ slug, size = 100 }: { slug: string; size?: number }) {
const illustrations: Record<string, React.ReactNode> = {
zauberstab: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
<rect x="20" y="80" width="60" height="4" rx="2" fill="#D4915C" transform="rotate(-45 50 50)" />
<circle cx="28" cy="28" r="4" fill="#F5A623" opacity="0.6" />
<circle cx="22" cy="35" r="2.5" fill="#FFC107" opacity="0.5" />
<circle cx="35" cy="22" r="2" fill="#FFC107" opacity="0.4" />
<path d="M25 25 L20 18 M25 25 L32 20 M25 25 L22 32" stroke="#F5A623" strokeWidth="1.5" strokeLinecap="round" />
<circle cx="26" cy="26" r="6" fill="none" stroke="#F5A623" strokeWidth="0.5" opacity="0.3" />
</svg>
),
untersetzer: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
<ellipse cx="50" cy="55" rx="32" ry="8" fill="#C4814C" />
<ellipse cx="50" cy="50" rx="32" ry="8" fill="#E8A96C" />
<ellipse cx="50" cy="50" rx="22" ry="5" fill="none" stroke="#D4915C" strokeWidth="0.8" />
<ellipse cx="50" cy="50" rx="12" ry="2.8" fill="none" stroke="#D4915C" strokeWidth="0.6" />
<circle cx="42" cy="48" r="3" fill="#FF6B6B" opacity="0.5" />
<circle cx="55" cy="46" r="2" fill="#4CAF50" opacity="0.5" />
<circle cx="48" cy="53" r="2.5" fill="#2196F3" opacity="0.4" />
</svg>
),
nagelbilder: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
<rect x="20" y="20" width="60" height="60" rx="4" fill="#E8A96C" />
{/* Nails forming a star */}
<circle cx="50" cy="30" r="2" fill="#888" />
<circle cx="35" cy="45" r="2" fill="#888" />
<circle cx="65" cy="45" r="2" fill="#888" />
<circle cx="40" cy="65" r="2" fill="#888" />
<circle cx="60" cy="65" r="2" fill="#888" />
{/* String */}
<path d="M50 30 L35 45 L60 65 L40 65 L65 45 Z" stroke="#FF6B6B" strokeWidth="1.5" fill="none" />
<path d="M50 30 L40 65 M50 30 L60 65 M35 45 L65 45" stroke="#4CAF50" strokeWidth="1" fill="none" opacity="0.6" />
</svg>
),
bleistiftbox: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
<path d="M25 75 L25 35 L75 35 L75 75 Z" fill="#E8A96C" />
<path d="M25 35 L30 30 L80 30 L75 35 Z" fill="#D4915C" />
<path d="M75 35 L80 30 L80 70 L75 75 Z" fill="#C4814C" />
{/* Pencils */}
<rect x="35" y="20" width="4" height="30" rx="1" fill="#FFC107" />
<polygon points="35,50 39,50 37,55" fill="#2C2C2C" />
<rect x="45" y="15" width="4" height="32" rx="1" fill="#2196F3" />
<polygon points="45,47 49,47 47,52" fill="#2C2C2C" />
<rect x="55" y="22" width="4" height="28" rx="1" fill="#FF6B6B" />
<polygon points="55,50 59,50 57,55" fill="#2C2C2C" />
</svg>
),
segelboot: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
<path d="M20 65 Q50 55 80 65 Q50 72 20 65Z" fill="#E8A96C" />
<line x1="50" y1="25" x2="50" y2="62" stroke="#8B6F47" strokeWidth="2.5" />
<path d="M50 25 L70 50 L50 58Z" fill="white" stroke="#ddd" strokeWidth="0.5" />
<path d="M50 30 L38 52 L50 58Z" fill="#FF6B6B" opacity="0.8" />
{/* Water */}
<path d="M10 72 Q25 68 40 72 Q55 76 70 72 Q85 68 100 72" stroke="#2196F3" strokeWidth="1.5" fill="none" opacity="0.4" />
<path d="M5 78 Q20 74 35 78 Q50 82 65 78 Q80 74 95 78" stroke="#2196F3" strokeWidth="1" fill="none" opacity="0.3" />
</svg>
),
vogelhaus: (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
{/* Roof */}
<path d="M25 45 L50 25 L75 45 Z" fill="#C4814C" />
{/* Body */}
<rect x="30" y="45" width="40" height="35" fill="#E8A96C" />
{/* Entrance hole */}
<circle cx="50" cy="58" r="6" fill="#5D4037" />
{/* Perch */}
<rect x="47" y="65" width="6" height="2" rx="1" fill="#8B6F47" />
<rect x="48" y="67" width="4" height="6" rx="1" fill="#8B6F47" />
{/* Post */}
<rect x="46" y="80" width="8" height="15" rx="1" fill="#8B6F47" />
{/* Bird */}
<ellipse cx="68" cy="40" rx="5" ry="4" fill="#FF6B6B" />
<circle cx="71" cy="38" r="1.5" fill="#2C2C2C" />
<path d="M73 39 L77 38.5" stroke="#F5A623" strokeWidth="1.5" strokeLinecap="round" />
</svg>
),
'holztier-igel': (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
{/* Body */}
<ellipse cx="50" cy="60" rx="25" ry="18" fill="#C4814C" />
{/* Head */}
<ellipse cx="28" cy="58" rx="10" ry="9" fill="#D4915C" />
{/* Nose */}
<circle cx="20" cy="57" r="2" fill="#2C2C2C" />
{/* Eye */}
<circle cx="25" cy="54" r="1.5" fill="#2C2C2C" />
<circle cx="25.5" cy="53.5" r="0.5" fill="white" />
{/* Spines */}
{[0, 15, 30, 45, 60, 75, 90, 105, 120, 135, 150].map((angle, i) => {
const rad = (angle - 30) * Math.PI / 180
const x1 = 55 + Math.cos(rad) * 20
const y1 = 52 + Math.sin(rad) * 14
const x2 = 55 + Math.cos(rad) * 30
const y2 = 52 + Math.sin(rad) * 22
return <line key={i} x1={x1} y1={y1} x2={x2} y2={y2} stroke="#8B6F47" strokeWidth="2" strokeLinecap="round" />
})}
{/* Feet */}
<ellipse cx="35" cy="75" rx="4" ry="2" fill="#D4915C" />
<ellipse cx="60" cy="75" rx="4" ry="2" fill="#D4915C" />
</svg>
),
'schnitzfigur-pilz': (
<svg width={size} height={size} viewBox="0 0 100 100" fill="none">
{/* Stem */}
<path d="M40 55 Q38 75 42 85 L58 85 Q62 75 60 55 Z" fill="#F5F5DC" />
<ellipse cx="50" cy="85" rx="10" ry="3" fill="#E8E0C8" />
{/* Cap */}
<ellipse cx="50" cy="48" rx="28" ry="18" fill="#D32F2F" />
<ellipse cx="50" cy="55" rx="22" ry="5" fill="#E8A96C" />
{/* White dots */}
<circle cx="38" cy="40" r="3" fill="white" opacity="0.9" />
<circle cx="55" cy="35" r="2.5" fill="white" opacity="0.9" />
<circle cx="48" cy="45" r="2" fill="white" opacity="0.8" />
<circle cx="62" cy="42" r="2.5" fill="white" opacity="0.85" />
<circle cx="42" cy="50" r="1.8" fill="white" opacity="0.7" />
{/* Grass */}
<path d="M30 85 Q32 78 34 85" stroke="#4CAF50" strokeWidth="1.5" fill="none" />
<path d="M65 85 Q67 79 69 85" stroke="#4CAF50" strokeWidth="1.5" fill="none" />
<path d="M72 85 Q73 80 75 85" stroke="#4CAF50" strokeWidth="1" fill="none" opacity="0.6" />
</svg>
),
}
return <>{illustrations[slug] || illustrations.zauberstab}</>
}
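The hedgehog illustration above places its spines with a small trig computation: each spine is a radial segment around the point (55, 52), running from an inner ellipse (radii 20 × 14) to an outer one (radii 30 × 22), with the whole fan shifted by −30 degrees. A standalone sketch of that math (`spineEndpoints` is a hypothetical helper name, extracted only so the geometry can be checked in isolation):

```typescript
// Spine-placement math from the 'holztier-igel' illustration.
export function spineEndpoints(angleDeg: number) {
  const rad = ((angleDeg - 30) * Math.PI) / 180
  return {
    x1: 55 + Math.cos(rad) * 20, // inner ellipse, radii 20 x 14
    y1: 52 + Math.sin(rad) * 14,
    x2: 55 + Math.cos(rad) * 30, // outer ellipse, radii 30 x 22
    y2: 52 + Math.sin(rad) * 22,
  }
}
```

Using two ellipses instead of circles keeps the spine fan flatter than it is wide, matching the hedgehog's body proportions.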


@@ -0,0 +1,10 @@
import { AlertTriangle } from 'lucide-react'
export function SafetyTip({ children }: { children: React.ReactNode }) {
return (
<div className="flex items-start gap-3 bg-warning/10 border border-warning/30 rounded-xl p-4">
<AlertTriangle className="w-5 h-5 text-warning flex-shrink-0 mt-0.5" />
<p className="text-sm font-medium">{children}</p>
</div>
)
}


@@ -0,0 +1,15 @@
import { Step } from '@/lib/types'
export function StepCard({ step, index }: { step: Step; index: number }) {
return (
<div className="flex gap-4">
<div className="flex-shrink-0 w-10 h-10 rounded-full bg-primary flex items-center justify-center text-white font-bold text-lg">
{index + 1}
</div>
<div className="flex-1 pb-8 border-l-2 border-primary/20 pl-6 -ml-5 mt-5">
<h3 className="font-heading font-bold text-lg mb-1">{step.title}</h3>
<p className="text-dark/70 leading-relaxed">{step.description}</p>
</div>
</div>
)
}


@@ -0,0 +1,14 @@
import { Hammer, Scissors, Ruler, Paintbrush, Wrench } from 'lucide-react'
const iconMap: Record<string, React.ElementType> = {
hammer: Hammer,
schnitzmesser: Scissors,
lineal: Ruler,
pinsel: Paintbrush,
}
export function ToolIcon({ name }: { name: string }) {
const key = name.toLowerCase()
const Icon = Object.entries(iconMap).find(([k]) => key.includes(k))?.[1] || Wrench
return <Icon className="w-5 h-5 text-primary" />
}
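`ToolIcon` above matches icons by checking whether the lower-cased tool name contains one of the map keys, falling back to `Wrench` for anything unknown (e.g. `'Handsaege'`). The lookup can be sketched without the icon components (`resolveIconKey` and `iconKeys` are hypothetical names for illustration):

```typescript
// Key-resolution logic from ToolIcon, decoupled from lucide-react so the
// matching behaviour is visible on its own.
const iconKeys = ['hammer', 'schnitzmesser', 'lineal', 'pinsel']

export function resolveIconKey(name: string): string {
  const key = name.toLowerCase()
  // First map key contained in the tool name wins; 'wrench' is the fallback.
  return iconKeys.find((k) => key.includes(k)) ?? 'wrench'
}
```

Note the match is a substring test, so `'Hammer (leicht, kindgerecht)'` still resolves to the hammer icon; for names containing several keys, the order of `iconKeys` decides.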


@@ -0,0 +1,15 @@
export const fadeInUp = {
initial: { opacity: 0, y: 20 },
animate: { opacity: 1, y: 0 },
transition: { duration: 0.5 },
}
export const staggerContainer = {
animate: { transition: { staggerChildren: 0.1 } },
}
export const scaleIn = {
initial: { opacity: 0, scale: 0.9 },
animate: { opacity: 1, scale: 1 },
transition: { duration: 0.4 },
}


@@ -0,0 +1,214 @@
import { Project } from './types'
export const projects: Project[] = [
{
slug: 'zauberstab',
name: 'Zauberstab',
description: 'Schnitze deinen eigenen magischen Zauberstab aus einem Ast! Mit Schleifpapier und etwas Farbe wird daraus ein echtes Zauberwerkzeug.',
ageRange: '6-8',
difficulty: 1,
duration: '45 Minuten',
tools: ['Schnitzmesser (kindersicher)', 'Schleifpapier (fein)', 'Pinsel'],
materials: ['1 gerader Ast (ca. 30cm, daumendick)', 'Acrylfarben', 'Klarlack'],
steps: [
{ title: 'Ast aussuchen', description: 'Such dir einen geraden, trockenen Ast. Er sollte ungefaehr so lang sein wie dein Unterarm und gut in deiner Hand liegen.' },
{ title: 'Rinde entfernen', description: 'Zieh vorsichtig die Rinde ab. Wenn sie nicht leicht abgeht, hilft ein Erwachsener mit dem Schnitzmesser.' },
{ title: 'Schleifen', description: 'Schleife den Ast mit dem Schleifpapier glatt. Immer in eine Richtung schleifen — wie beim Streicheln einer Katze!' },
{ title: 'Spitze formen', description: 'Ein Ende kannst du mit dem Schleifpapier etwas spitzer machen. Nicht zu spitz — es soll ein Zauberstab sein, kein Speer!' },
{ title: 'Bemalen', description: 'Jetzt wird es bunt! Male Spiralen, Sterne oder Streifen auf deinen Stab. Lass jede Farbe trocknen, bevor du die naechste nimmst.' },
{ title: 'Trocknen lassen', description: 'Stell den Stab zum Trocknen aufrecht in ein Glas. Wenn die Farbe trocken ist, kann ein Erwachsener Klarlack auftragen.' },
],
safetyTips: [
'Ein Erwachsener sollte beim Schnitzen immer dabei sein.',
'Immer vom Koerper weg schnitzen!',
'Frische Aeste sind weicher — trockene Aeste koennen splittern.',
],
skills: ['Feinmotorik', 'Schleifen', 'Kreatives Gestalten'],
},
{
slug: 'untersetzer',
name: 'Holz-Untersetzer',
description: 'Bastle praktische Untersetzer aus Holzscheiben! Eine tolle Geschenkidee fuer die ganze Familie.',
ageRange: '6+',
difficulty: 1,
duration: '30 Minuten',
tools: ['Schleifpapier (mittel + fein)', 'Pinsel'],
materials: ['Holzscheiben (ca. 10cm Durchmesser)', 'Acrylfarben', 'Klarlack', 'Filzgleiter'],
steps: [
{ title: 'Holzscheiben vorbereiten', description: 'Nimm eine Holzscheibe und pruefe, ob sie flach auf dem Tisch liegt. Wackelt sie? Dann such dir eine andere aus.' },
{ title: 'Oberflaeche schleifen', description: 'Schleife beide Seiten der Holzscheibe glatt. Erst mit dem groben, dann mit dem feinen Schleifpapier.' },
{ title: 'Staub abwischen', description: 'Wisch den Schleifstaub mit einem feuchten Tuch ab. Die Scheibe muss sauber sein, damit die Farbe haelt.' },
{ title: 'Muster malen', description: 'Bemale die Oberseite mit einem schoenen Muster: Blumen, Tiere, Punkte oder Streifen — alles ist erlaubt!' },
{ title: 'Versiegeln', description: 'Wenn die Farbe trocken ist, traegt ein Erwachsener Klarlack auf. So wird der Untersetzer wasserfest.' },
{ title: 'Filzgleiter aufkleben', description: 'Klebe 3-4 kleine Filzgleiter auf die Unterseite. So rutscht der Untersetzer nicht und zerkratzt den Tisch nicht.' },
],
safetyTips: [
'Beim Schleifen Staub nicht einatmen — am besten draussen arbeiten.',
'Klarlack nur von Erwachsenen auftragen lassen (gut lueften!).',
],
skills: ['Schleifen', 'Malen', 'Sorgfaeltiges Arbeiten'],
},
{
slug: 'nagelbilder',
name: 'Nagelbilder',
description: 'Schlage Naegel in ein Brett und spanne bunte Faeden dazwischen — so entstehen tolle Kunstwerke!',
ageRange: '5-7',
difficulty: 1,
duration: '40 Minuten',
tools: ['Hammer (leicht, kindgerecht)', 'Bleistift'],
materials: ['Holzbrett (ca. 20x20cm)', 'Kleine Naegel (ca. 20 Stueck)', 'Bunte Wollfaeden', 'Vorlage auf Papier'],
steps: [
{ title: 'Vorlage waehlen', description: 'Such dir eine einfache Form aus: ein Herz, einen Stern oder ein Haus. Zeichne die Form auf Papier und lege es auf das Brett.' },
{ title: 'Punkte markieren', description: 'Druecke mit dem Bleistift entlang der Form Punkte ins Holz. Ein Punkt alle 2cm reicht aus.' },
{ title: 'Papier entfernen', description: 'Nimm das Papier vorsichtig ab. Du siehst jetzt die Bleistiftpunkte auf dem Holz.' },
{ title: 'Naegel einschlagen', description: 'Schlage an jedem Punkt einen Nagel ein. Der Nagel sollte ungefaehr 1cm aus dem Holz schauen. Halt den Nagel mit einer Zange, nicht mit den Fingern!' },
{ title: 'Faeden spannen', description: 'Knote einen Faden an einen Nagel und spanne ihn kreuz und quer zu den anderen Naegeln. Experimentiere mit verschiedenen Farben!' },
{ title: 'Aufhaengen', description: 'Schraube eine kleine Oese auf die Rueckseite — fertig ist dein Kunstwerk zum Aufhaengen!' },
],
safetyTips: [
'Naegel immer mit einer Zange festhalten, niemals mit den Fingern!',
'Einen leichten Kinderhammer verwenden.',
'Beim Haemmern auf eine stabile Unterlage achten.',
],
skills: ['Haemmern', 'Feinmotorik', 'Kreativitaet'],
},
{
slug: 'bleistiftbox',
name: 'Bleistiftbox',
description: 'Baue eine praktische Box fuer deine Stifte und Pinsel! Aus duennen Holzbrettchen entsteht ein nuetzlicher Schreibtischhelfer.',
ageRange: '7-9',
difficulty: 2,
duration: '1 Stunde',
tools: ['Handsaege (kindersicher)', 'Schleifpapier', 'Holzleim', 'Schraubzwinge', 'Lineal', 'Bleistift'],
materials: ['Duennes Sperrholz (4mm)', 'Holzleim', 'Acrylfarbe', 'Klarlack'],
steps: [
{ title: 'Teile anzeichnen', description: 'Zeichne die 5 Teile auf das Sperrholz: 1 Boden (8x8cm), 4 Seitenwaende (8x10cm). Miss genau mit dem Lineal!' },
{ title: 'Aussaegen', description: 'Saege die Teile vorsichtig aus. Ein Erwachsener hilft beim Festhalten. Immer langsam und gleichmaessig saegen.' },
{ title: 'Kanten schleifen', description: 'Schleife alle Kanten glatt. Besonders die Saegekanten muessen schoen eben werden.' },
{ title: 'Zusammenleimen', description: 'Trage Holzleim auf die Kanten auf und druecke die Teile zusammen. Erst zwei Seiten an den Boden, dann die anderen zwei.' },
{ title: 'Trocknen lassen', description: 'Fixiere alles mit Schraubzwingen oder Klebeband. Der Leim braucht mindestens 1 Stunde zum Trocknen.' },
{ title: 'Dekorieren', description: 'Bemale deine Box mit Acrylfarben. Du kannst deinen Namen draufschreiben oder Muster malen.' },
{ title: 'Versiegeln', description: 'Nach dem Trocknen der Farbe traegt ein Erwachsener Klarlack auf. Fertig ist deine Bleistiftbox!' },
],
safetyTips: [
'Beim Saegen immer das Holz fest einspannen!',
'Die Saege vom Koerper weg fuehren.',
'Holzleim ist nicht giftig, aber trotzdem nicht in den Mund nehmen.',
],
skills: ['Messen und Anzeichnen', 'Saegen', 'Leimen', 'Geduld'],
},
{
slug: 'segelboot',
name: 'Segelboot',
description: 'Baue ein kleines Segelboot, das wirklich schwimmt! Perfekt fuer die Badewanne oder den Bach im Park.',
ageRange: '8-10',
difficulty: 2,
duration: '1.5 Stunden',
tools: ['Handsaege', 'Schleifpapier', 'Bohrer (Handbohrer)', 'Schnitzmesser'],
materials: ['Holzklotz (ca. 20x8x4cm)', 'Rundstab (ca. 20cm)', 'Stoffrest fuer Segel', 'Holzleim', 'Wasserfarbe + Klarlack'],
steps: [
{ title: 'Rumpf anzeichnen', description: 'Zeichne die Bootsform von oben auf den Holzklotz: Vorne spitz, hinten breit. Die typische Bootsform kennst du bestimmt!' },
{ title: 'Rumpf aussaegen', description: 'Saege die Bootsform aus. Ein Erwachsener hilft beim Festhalten. Die Kurven langsam und vorsichtig saegen.' },
{ title: 'Rumpf schleifen', description: 'Schleife den Rumpf schoen rund. Die Unterseite sollte leicht gewoelbt sein wie bei einem echten Boot.' },
{ title: 'Mastloch bohren', description: 'Ein Erwachsener bohrt in der Mitte ein Loch fuer den Mast. Es muss so gross sein, dass der Rundstab genau reinpasst.' },
{ title: 'Segel basteln', description: 'Schneide aus dem Stoff ein Dreieck aus (ca. 15cm hoch). Klebe oder naehe es am Rundstab fest.' },
{ title: 'Zusammenbauen', description: 'Stecke den Mast mit etwas Holzleim ins Loch. Lass alles gut trocknen.' },
{ title: 'Wasserfest machen', description: 'Bemale dein Boot und lass es trocknen. Dann traegt ein Erwachsener mehrere Schichten Klarlack auf — so bleibt dein Boot wasserdicht!' },
],
safetyTips: [
'Bohren ist Erwachsenensache — hilf beim Festhalten!',
'Beim Schnitzen immer vom Koerper weg arbeiten.',
'Boot nur unter Aufsicht im Wasser testen.',
],
skills: ['Saegen', 'Formen', 'Zusammenbauen', 'Wasserdicht machen'],
},
{
slug: 'vogelhaus',
name: 'Vogelhaus',
description: 'Baue ein kuscheliges Vogelhaus fuer die Voegel in deinem Garten! Im Winter freuen sie sich besonders ueber ein Futterhaus.',
ageRange: '8-10',
difficulty: 2,
duration: '2 Stunden',
tools: ['Handsaege', 'Hammer', 'Schleifpapier', 'Bohrer', 'Lineal', 'Bleistift'],
materials: ['Holzbretter (1cm dick)', 'Kleine Naegel oder Schrauben', 'Holzleim', 'Dachpappe oder Rinde', 'Leinoel (ungiftig)'],
steps: [
{ title: 'Teile anzeichnen', description: 'Zeichne alle Teile auf: Boden (18x18cm), 2 Seitenwaende, 2 Giebel (mit Spitze fuer das Dach), 2 Dachhaelften. Ein Erwachsener hilft beim Ausmessen.' },
{ title: 'Aussaegen', description: 'Saege alle Teile vorsichtig aus. Bei den Giebeln mit der Spitze besonders aufpassen. Immer mit Hilfe eines Erwachsenen!' },
{ title: 'Einflugsloch', description: 'Ein Erwachsener bohrt in eine Giebelseite ein rundes Loch (ca. 3cm). Das ist die Tuer fuer die Voegel!' },
{ title: 'Schleifen', description: 'Schleife alle Teile glatt, besonders die Kanten. Voegel sollen sich nicht verletzen.' },
{ title: 'Zusammenbauen', description: 'Leime und nagle die Teile zusammen: Erst die Seitenwaende am Boden, dann die Giebel, zum Schluss das Dach.' },
{ title: 'Dach schuetzen', description: 'Klebe Dachpappe oder Rindenstuecke auf das Dach. So bleibt das Innere trocken bei Regen.' },
{ title: 'Behandeln', description: 'Reibe das Haeuschen von aussen mit Leinoel ein. KEINE Farbe verwenden — die Chemikalien koennten den Voegeln schaden!' },
],
safetyTips: [
'Naegel mit der Zange halten beim Einschlagen.',
'Saegen und Bohren nur mit Erwachsenen zusammen.',
'Kein giftiges Holzschutzmittel verwenden — nur Leinoel!',
],
skills: ['Messen', 'Saegen', 'Naegeln', 'Zusammenbauen', 'Tierschutz'],
},
{
slug: 'holztier-igel',
name: 'Holztier — Igel',
description: 'Schnitze einen niedlichen Igel aus Holz! Die Stacheln werden aus kurzen Naegeln oder Zahnstochern gemacht.',
ageRange: '8-10',
difficulty: 2,
duration: '1 Stunde',
tools: ['Schnitzmesser (kindersicher)', 'Schleifpapier', 'Bohrer (duenn)', 'Hammer (leicht)'],
materials: ['Holzklotz (ca. 10x6x5cm, weiches Holz)', 'Zahnstocher oder kurze Naegel', 'Schwarzer Filzstift', 'Holzleim'],
steps: [
{ title: 'Form anzeichnen', description: 'Zeichne die Igelform von der Seite auf den Holzklotz: Vorne eine kleine Spitznase, hinten rund. Von oben tropfenfoermig.' },
{ title: 'Grob schnitzen', description: 'Schnitze mit dem Schnitzmesser die grobe Form. Ein Erwachsener hilft bei harten Stellen. Immer vom Koerper weg schnitzen!' },
{ title: 'Form verfeinern', description: 'Schnitze die Nase spitzer und den Koerper runder. Der Igel soll von hinten huebsch rund aussehen.' },
{ title: 'Schleifen', description: 'Schleife den ganzen Igel glatt. Besonders das Gesicht soll weich und glatt sein.' },
{ title: 'Stacheln vorbereiten', description: 'Ein Erwachsener bohrt viele kleine Loecher in den Ruecken (nicht zu tief!). Die Loecher sollten leicht schraeg nach hinten zeigen.' },
{ title: 'Stacheln einsetzen', description: 'Stecke Zahnstocher in die Loecher und kuerze sie auf 1-2cm. Ein Tropfen Holzleim in jedes Loch haelt die Stacheln fest.' },
{ title: 'Gesicht malen', description: 'Male mit dem schwarzen Filzstift zwei Augen und eine kleine Nase. Fertig ist dein Igel!' },
],
safetyTips: [
'Schnitzmesser immer geschlossen ablegen.',
'Vom Koerper weg schnitzen — das ist die wichtigste Regel!',
'Weiches Holz wie Linde oder Pappel verwenden.',
],
skills: ['Schnitzen', 'Feinarbeit', 'Raeumliches Denken'],
},
{
slug: 'schnitzfigur-pilz',
name: 'Schnitzfigur — Pilz',
description: 'Schnitze einen huebschen Fliegenpilz aus Holz! Ein anspruchsvolles Projekt fuer erfahrene junge Holzwerker.',
ageRange: '10-12',
difficulty: 3,
duration: '2 Stunden',
tools: ['Schnitzmesser-Set (3 Messer)', 'Schleifpapier (fein + sehr fein)', 'Schraubstock'],
materials: ['Holzklotz (ca. 12x8x8cm, Linde)', 'Acrylfarben (rot, weiss, braun)', 'Klarlack', 'Pinsel (duenn + mittel)'],
steps: [
{ title: 'Entwurf zeichnen', description: 'Zeichne deinen Pilz von vorne und von der Seite auf Papier. Uebertrage die Form mit Bleistift auf den Holzklotz.' },
{ title: 'Grobe Form', description: 'Spanne den Klotz im Schraubstock ein. Schnitze mit dem groessten Messer die Grundform: oben die runde Kappe, unten den Stiel.' },
{ title: 'Kappe formen', description: 'Schnitze die Pilzkappe rund und leicht gewoelbt. Die Unterseite der Kappe ist leicht nach innen gewoelbt (hohl).' },
{ title: 'Stiel formen', description: 'Der Stiel wird nach unten etwas breiter. Schnitze ihn schoen rund und gleichmaessig.' },
{ title: 'Details schnitzen', description: 'Schnitze mit dem kleinsten Messer feine Details: Die Lamellen unter der Kappe (feine Rillen) und einen kleinen Ring am Stiel.' },
{ title: 'Feinschliff', description: 'Schleife den ganzen Pilz erst mit feinem, dann mit sehr feinem Schleifpapier. Je glatter, desto schoener die Bemalung!' },
{ title: 'Bemalen', description: 'Male die Kappe rot mit weissen Punkten (Fliegenpilz!). Der Stiel wird weiss oder hellbraun. Lass jede Schicht gut trocknen.' },
],
safetyTips: [
'Dieses Projekt nur mit Schnitz-Erfahrung beginnen!',
'Schraubstock verwenden — niemals das Holz in der Hand halten beim Schnitzen!',
'Scharfe Messer sind sicherer als stumpfe — ein Erwachsener schaerft die Messer.',
'Immer konzentriert arbeiten, nicht ablenken lassen.',
],
skills: ['Fortgeschrittenes Schnitzen', 'Detailarbeit', 'Geduld', 'Dreidimensionales Denken'],
},
]
export function getProject(slug: string): Project | undefined {
return projects.find((p) => p.slug === slug)
}
export function getRelatedProjects(slug: string, count = 3): Project[] {
const current = getProject(slug)
if (!current) return projects.slice(0, count)
return projects
.filter((p) => p.slug !== slug)
.sort((a, b) => Math.abs(a.difficulty - current.difficulty) - Math.abs(b.difficulty - current.difficulty))
.slice(0, count)
}
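The difficulty-distance ranking in `getRelatedProjects` can be sketched in isolation. The reduced `MiniProject` shape and the sample entries below are hypothetical, kept only to the fields the sort needs:

```typescript
// Minimal sketch of the ranking used by getRelatedProjects:
// projects closest in difficulty to the current one come first.
interface MiniProject {
  slug: string
  difficulty: 1 | 2 | 3
}

function relatedByDifficulty(
  projects: MiniProject[],
  slug: string,
  count = 3,
): MiniProject[] {
  const current = projects.find((p) => p.slug === slug)
  if (!current) return projects.slice(0, count)
  return projects
    .filter((p) => p.slug !== slug)
    // sort() mutates, but filter() already returned a fresh array,
    // so the source list is left untouched
    .sort(
      (a, b) =>
        Math.abs(a.difficulty - current.difficulty) -
        Math.abs(b.difficulty - current.difficulty),
    )
    .slice(0, count)
}

// Hypothetical sample data for illustration only.
const sample: MiniProject[] = [
  { slug: 'vogelhaus', difficulty: 2 },
  { slug: 'igel', difficulty: 3 },
  { slug: 'pilz', difficulty: 3 },
  { slug: 'schluesselbrett', difficulty: 1 },
]

// For 'pilz' (difficulty 3): 'igel' has distance 0, 'vogelhaus' 1,
// 'schluesselbrett' 2; sort() is stable, so ties keep array order.
console.log(relatedByDifficulty(sample, 'pilz', 2).map((p) => p.slug))
```

Because `Array.prototype.sort` is stable in modern runtimes, projects with equal difficulty distance keep their original order in the list.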


@@ -0,0 +1,18 @@
export interface Project {
slug: string
name: string
description: string
ageRange: string
difficulty: 1 | 2 | 3
duration: string
tools: string[]
materials: string[]
steps: Step[]
safetyTips: string[]
skills: string[]
}
export interface Step {
title: string
description: string
}
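A minimal object satisfying the `Project` interface might look like the sketch below. The entry is hypothetical and not taken from the real `projects` array; the interfaces are repeated so the snippet stands alone:

```typescript
interface Step {
  title: string
  description: string
}

interface Project {
  slug: string
  name: string
  description: string
  ageRange: string
  difficulty: 1 | 2 | 3
  duration: string
  tools: string[]
  materials: string[]
  steps: Step[]
  safetyTips: string[]
  skills: string[]
}

// Hypothetical placeholder entry; the real data lives in lib/projects.
const example: Project = {
  slug: 'beispiel-projekt',
  name: 'Beispiel-Projekt',
  description: 'A placeholder entry illustrating the shape.',
  ageRange: '8-10',
  difficulty: 1, // must be one of the literal values 1 | 2 | 3
  duration: '1 Stunde',
  tools: ['Hammer'],
  materials: ['Holzbrett'],
  steps: [{ title: 'Start', description: 'First step.' }],
  safetyTips: ['Work with an adult.'],
  skills: ['Hammering'],
}

console.log(example.slug)
```

Note that `difficulty` is a union of number literals rather than `number`, so the compiler rejects values like `4` at the call site.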

levis-holzbau/next-env.d.ts vendored Normal file

@@ -0,0 +1,6 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />
/// <reference path="./.next/types/routes.d.ts" />
// NOTE: This file should not be edited
// see https://nextjs.org/docs/app/api-reference/config/typescript for more information.


@@ -0,0 +1,6 @@
/** @type {import('next').NextConfig} */
const nextConfig = {
output: 'standalone',
}
module.exports = nextConfig

levis-holzbau/package-lock.json generated Normal file

File diff suppressed because it is too large


@@ -0,0 +1,25 @@
{
"name": "levis-holzbau",
"version": "1.0.0",
"private": true,
"scripts": {
"dev": "next dev -p 3013",
"build": "next build",
"start": "next start -p 3013"
},
"dependencies": {
"framer-motion": "^11.15.0",
"lucide-react": "^0.468.0",
"next": "^15.1.0",
"react": "^18.3.1",
"react-dom": "^18.3.1"
},
"devDependencies": {
"@types/node": "^22.10.2",
"@types/react": "^18.3.14",
"@types/react-dom": "^18.3.5",
"postcss": "^8.4.49",
"tailwindcss": "^3.4.16",
"typescript": "^5.7.2"
}
}


@@ -0,0 +1,8 @@
/** @type {import('postcss-load-config').Config} */
const config = {
plugins: {
tailwindcss: {},
},
}
export default config


@@ -0,0 +1,26 @@
import type { Config } from 'tailwindcss'
const config: Config = {
content: [
'./app/**/*.{js,ts,jsx,tsx,mdx}',
'./components/**/*.{js,ts,jsx,tsx,mdx}',
],
theme: {
extend: {
colors: {
primary: '#F5A623',
secondary: '#4CAF50',
accent: '#2196F3',
warning: '#FFC107',
cream: '#FDF8F0',
dark: '#2C2C2C',
},
fontFamily: {
heading: ['Quicksand', 'sans-serif'],
body: ['Nunito', 'sans-serif'],
},
},
},
plugins: [],
}
export default config


@@ -0,0 +1,40 @@
{
"compilerOptions": {
"lib": [
"dom",
"dom.iterable",
"esnext"
],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": [
"./*"
]
},
"target": "ES2017"
},
"include": [
"next-env.d.ts",
"**/*.ts",
"**/*.tsx",
".next/types/**/*.ts"
],
"exclude": [
"node_modules"
]
}


@@ -578,6 +578,33 @@ server {
}
}
# =========================================================
# CORE: Control Pipeline on port 8098 (Entwickler-only)
# =========================================================
server {
listen 8098 ssl;
http2 on;
server_name macmini localhost;
ssl_certificate /etc/nginx/certs/macmini.crt;
ssl_certificate_key /etc/nginx/certs/macmini.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;
location / {
set $upstream_pipeline bp-core-control-pipeline:8098;
proxy_pass http://$upstream_pipeline;
proxy_http_version 1.1;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto https;
proxy_read_timeout 1800s;
proxy_send_timeout 1800s;
}
}
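Because `proxy_pass` uses a variable here, nginx defers DNS resolution of the container name to request time instead of resolving it once at startup. In a Docker network that usually requires an explicit `resolver` directive; a sketch, assuming Docker's embedded DNS at its standard address:

```nginx
# Needed when proxy_pass contains a variable: nginx resolves the
# upstream hostname per request via this resolver. 127.0.0.11 is
# Docker's embedded DNS; valid=30s re-resolves periodically so a
# restarted container's new address is picked up.
resolver 127.0.0.11 valid=30s;
```

The directive can live at `http` or `server` level; without it, runtime resolution of `bp-core-control-pipeline` fails unless the name is covered by an `upstream` block.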
# =========================================================
# CORE: Edu-Search on port 8089
# =========================================================
@@ -733,3 +760,33 @@ server {
try_files $uri $uri/ /index.html;
}
}
# =========================================================
# PITCH DECK: Investor Presentation on port 3012
# =========================================================
server {
listen 3012 ssl;
http2 on;
server_name macmini localhost;
ssl_certificate /etc/nginx/certs/macmini.crt;
ssl_certificate_key /etc/nginx/certs/macmini.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
ssl_prefer_server_ciphers off;
location / {
set $upstream_pitch bp-core-pitch-deck:3000;
proxy_pass http://$upstream_pitch;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto https;
proxy_read_timeout 300s;
proxy_connect_timeout 60s;
proxy_send_timeout 300s;
}
}

Some files were not shown because too many files have changed in this diff.