Commit Graph

735 Commits

Author SHA1 Message Date
Benjamin Admin ba9558384f feat: Normen-Bibliothek auf 620+ erweitert + wave3 fixes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 13:13:08 +02:00
Benjamin Admin 2e1e18d853 feat: Normen-Bibliothek auf 617 erweitert (Ziel: 700)
Wave 3: +161 Normen (456 → 617)
- Serien-Lücken geschlossen (EN 1870, EN 474, EN 1034, EN 81, ISO 4254)
- Glas, Leder, Backwaren, Tabak, Medizin (IEC 60601), Labor, Feuerwehr
- Spielplatz, Fitness, Schwimmbad, HVAC, Kältetechnik
- PSA (Schuhe, Handschuhe, Augenschutz, Gehörschutz, Atemschutz)
- Leitern, Gerüste, Drahtseile, Gasgeräte, Messtechnik

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 13:04:22 +02:00
Benjamin Admin 9bc0f321e0 feat: Normen-Bibliothek auf 456 erweitert + UX-Verbesserungen
- Normen: 215 → 456 (Werkzeugmaschinen, Förder/AGV, Verfahrenstechnik,
  Bau/Bergbau, Holz/Papier, Airport, Wäscherei, B2-Erweiterung)
- Maßnahmen: Accordion-Tabellenansicht mit Batch-Verifizierung
- Hazards: Risikobewertung als Default-View, KI-Button entfernt
- Normenrecherche: Pflicht-Erklärung, + Norm hinzufügen Feld
- Produktionslinien: Inline-Erstellungsformular mit Projekt-Zuordnung
- Playwright Tests angepasst

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 12:45:15 +02:00
Benjamin Admin 97a52533a8 Merge remote gitea/main — resolve conflicts keeping local (origin) state
Build + Deploy / build-admin-compliance (push) Successful in 2m29s
Build + Deploy / build-backend-compliance (push) Successful in 3m23s
Build + Deploy / build-ai-sdk (push) Failing after 47s
Build + Deploy / build-developer-portal (push) Successful in 1m19s
Build + Deploy / build-tts (push) Failing after 1m29s
Build + Deploy / build-document-crawler (push) Successful in 43s
Build + Deploy / build-dsms-gateway (push) Successful in 25s
Build + Deploy / build-dsms-node (push) Successful in 11s
CI / branch-name (push) Has been skipped
Build + Deploy / trigger-orca (push) Has been skipped
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / nodejs-build (push) Successful in 3m17s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
CI / test-go (push) Failing after 48s
CI / test-python-backend (push) Successful in 42s
CI / test-python-document-crawler (push) Successful in 31s
CI / test-python-dsms-gateway (push) Successful in 26s
CI / validate-canonical-controls (push) Successful in 18s
Local origin is 20+ commits ahead of remote gitea. All conflicts
resolved by keeping HEAD (our version) which includes the full
56→138 check expansion and doc_checks package split.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 12:40:23 +02:00
Benjamin Admin b363c28539 feat: Add 76 Level-2 regex checks for document correctness verification
Split dsi_document_checker.py (466 LOC) into doc_checks/ package (9 files).
Two-pass L1→L2 logic: L1 checks "Is it mentioned?", L2 checks "Is it correct?"
(e.g. controller has full address, specific Art. 6 lit., concrete time periods).

138 total checks (62 L1 + 76 L2) across 7 doc types:
- DSE Art. 13: 31, Impressum §5 TMG: 16, Cookie §25 TDDDG: 15
- Widerruf §355: 15, AGB §305ff: 21, Social Media Art. 26: 20, DSFA Art. 35: 18

Frontend: hierarchical L1→L2 display with dual progress bars
(green=completeness, blue=correctness).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 12:37:03 +02:00
Benjamin Admin 3c12e06faf feat: Fix DSFA dedup + expand all checklists to 56 total checks
Fixes:
- 'Risikoabwaegung' is sub-section of DSFA → added to SKIP_HEADINGS
- 'Social Media' standalone heading → recognized as social_media DSE
- Removed 'risikobew' from DSFA pattern (was too broad)

Expanded checklists:
- Widerruf: 4→7 checks (+Empfaenger, kein Grund, §312k Button)
- AGB: 4→9 checks (+Zahlung, Lieferung, Gewaehrleistung, Kuendigung, Datenschutz)
- Social Media: +1 (Social Bookmarks)
- DSFA: +1 (LFDI Richtlinie)

Total: 47→56 Regex-Checks across 7 document types:
DSI=9, Cookie=5, Social Media=10, DSFA=8, Impressum=6, Widerruf=7, AGB=9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 11:55:29 +02:00
Benjamin Admin 58234ac18b fix: DSFA must be matched before social_media in SECTION_TYPE_MAP
'Datenschutzfolgeabschätzung...Social Media' was matching as social_media
(Art. 26) instead of dsfa (Art. 35) because the social_media pattern
'datenschutz.*social media' matched first.

Fixed: DSFA patterns checked before social_media patterns.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 11:35:10 +02:00
Benjamin Admin 4642abba23 feat: Expand Social Media (10 checks) + DSFA (8 checks) checklists
Art. 26 Joint Controller (10 checks, was 7):
+ Auflistung der genutzten Plattformen
+ Rechtsgrundlage (Art. 6)
+ Social Bookmarks vs. Plugins Hinweis
Improved: broader patterns for joint parties, contact point, data types

DSFA Art. 35 (8 checks, was 5):
+ Schwellwertanalyse / Auslösepruefung
+ Beruecksichtigung Landesbehörden-Richtlinie (LFDI)
+ Dokumentation der Ergebnisse
Improved: IHK-specific patterns (Kanäle, systematische Beobachtung,
geringer Umfang, sensitive Daten)

Total: 40 → 47 Regex-Checks across all document types.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 11:17:16 +02:00
Benjamin Admin e7f2f98da3 feat: IACE CE-Compliance Module — Normen, Risikobewertung, Production Lines
Major features:
- 215 norms library with section references + Beuth URLs (A/B1/B2/C norms)
- 173 hazard patterns with detail fields (scenario, trigger, harm, zone)
- Deterministic pattern matching: Component × Lifecycle × Pattern cross-product
- SIL/PL auto-calculation from S×E×P risk graph
- Risk assessment table with editable S/E/P dropdowns
- Production Line Dashboard with animated station flow (Running Dots)
- IACE process flow + norms coverage on start page
- Non-blocking cookie banner, ProcessFlow SSR fix
- 104 Playwright E2E tests passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 10:53:26 +02:00
Benjamin Admin 3853a0838a feat: Art. 26 Joint Controller + DSFA checklists for Social Media sections
New checklists:
- JOINT_CONTROLLER_CHECKLIST (Art. 26 DSGVO, 7 checks):
  Joint parties, arrangement, contact point, processing split,
  data categories, third-country transfer (USA), rights
- DSFA_CHECKLIST (Art. 35 DSGVO, 5 checks):
  Description, necessity, risk assessment, measures, DSB involvement

Section detection: 'Datenschutzerklaerung fuer Social Media' → social_media,
'Datenschutzfolgeabschaetzung/Risikoanalyse' → dsfa

classify_document_type: DSFA and social_media detected before generic DSE

Frontend: DOC_TYPES dropdown + ChecklistView labels updated

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 10:49:32 +02:00
Benjamin Admin 5188411828 disable: Control Library checks until doc-check Master Controls are ready
8 false positives from generic canonical_controls. Regex checks (9+5)
are accurate. Re-enable when ~80 specific doc-check controls exist.
See INSTRUCTION-master-controls-for-doc-check.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 10:28:25 +02:00
Benjamin Admin 45446aef16 fix: 8 quality + UX improvements
1. Cookie 'Zwecke' false positive: added 'um...zu', 'dienen', 'helfen',
   'ermöglichen' patterns — catches purpose descriptions without 'Zweck'
2. Kurzhinweis: added empty all_checks for short documents (<200 words)
3. Bezeichnungsfeld: placeholder shows 'Version / Stand' for typed docs,
   'Dokumentname' for 'Sonstiges'
4. DocCheckTab state persistence: entries + results survive navigation
5. DocCheck history: saves each check with date, doc count, findings
6. History display: 'Letzte Pruefungen' section at bottom of tab
7. ChecklistView: shows 'X von Y Pruefpunkten bestanden' per document
8. Results persist in localStorage across page navigation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 09:37:47 +02:00
Benjamin Admin e19d9ca532 docs: Master Controls spec for document checker — 80-100 specific check criteria
Detailed requirements for the pipeline session:
- Binary yes/no check_question per control
- Concrete pass_criteria + fail_criteria (not 'check completeness')
- correction_template from our Template Generator
- 8 document types: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept
- ~80-100 total controls (not 25K generic ones)
- Examples for DSI, Cookie, Impressum with exact field expectations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-07 07:53:36 +02:00
Benjamin Admin a680276c86 fix: Filter controls by test_procedure content — eliminates governance false positives
Only use controls whose test_procedure mentions document-type-specific terms:
- DSI: test_procedure must contain 'datenschutzerkl' or 'art. 13/14'
- Cookie: must contain 'cookie', 'einwilligung', 'consent'
- Impressum: must contain 'impressum'

This filters out internal governance controls (Datenmodelle, Infrastruktur)
that are irrelevant for public document checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:42:35 +02:00
Benjamin Admin fa45b5793c feat: Control Library check via SQL (canonical_controls) instead of Qdrant
Complete rewrite of rag_document_checker.py:
- Queries canonical_controls table (294K controls, 10K data_protection)
- Filters by category + title keywords per document type
- Uses test_procedure field as actual check instructions
- Regex pre-check extracts key terms from procedure → fast match
- LLM fallback only for regex misses (saves tokens)
- /no_think prefix for direct JSON output

SQL approach advantages:
- Structured data with test_procedure, pass_criteria, fail_criteria
- Category filtering (data_protection, compliance, governance)
- No Qdrant API key issues
- Controls are actual check criteria, not general legal texts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 20:26:56 +02:00
Benjamin Admin 7e7f31c344 disable: RAG checks until Master Controls (G1 Decision Trace) are ready
Current 144K controls are general legal texts, not specific check criteria.
RAG integration code stays (rag_document_checker.py), just disabled in
the doc-check endpoint. Re-enable when G1-G4 block is complete and
25K Master Controls with Decision Trace are available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 17:11:58 +02:00
Benjamin Admin 6da36d87c2 fix: Robust JSON parsing for LLM responses — handles unquoted keys, fallback extraction
LLM returns {fulfilled: true} instead of {"fulfilled": true}.
Now fixes unquoted keys, True→true, and falls back to text-based
boolean extraction when JSON parsing fails entirely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:18:52 +02:00
Benjamin Admin e50c4d659e fix: Disable Qwen thinking mode for RAG checks (/no_think prefix)
Qwen 3.5 uses all tokens for thinking, leaving response empty.
Using /no_think prefix to get direct JSON output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:12:51 +02:00
Benjamin Admin 9f16e6d535 fix: Read Qwen response from 'thinking' field when 'response' is empty
Qwen 3.5 with latest Ollama returns structured thinking in separate
'thinking' field, leaving 'response' empty. Now checks both fields.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 15:07:09 +02:00
Benjamin Admin 1ff34227bf debug: Add logging to RAG check integration
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:57:30 +02:00
Benjamin Admin f4374cfe8d feat: Semantic Qdrant search — embed query via bge-m3, vector search in local Qdrant
Replaces scroll+filter approach with proper semantic search:
1. Embed query via bp-core-embedding-service (bge-m3, 1024 dim)
2. Vector search in Qdrant (bp_compliance_datenschutz + bp_compliance_gesetze)
3. Sort by cosine similarity score
4. No API key needed — local Qdrant on Mac Mini

Falls back gracefully: SDK first, then semantic Qdrant, then empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:46:06 +02:00
Benjamin Admin 7b8440191e fix: Better error logging + increase LLM timeout to 120s for RAG check
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:33:58 +02:00
Benjamin Admin 510f513811 fix: Qdrant search uses chunk_text + section/category filter
Payload structure: chunk_text (not text), section (Article 13),
category, regulation_id. Scrolls 100 points per collection,
filters client-side against regulation keywords.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:28:32 +02:00
Benjamin Admin b50c4ec940 fix: RAG checker falls back to local Qdrant when Go SDK returns 401
Go SDK points to external Qdrant (qdrant-dev.breakpilot.ai) with expired API key.
Fallback: search directly in local Qdrant (bp-core-qdrant:6333) which has
all collections: bp_compliance_datenschutz, bp_compliance_gesetze, atomic_controls_dedup.

Search strategy:
1. Try Go SDK RAG endpoint (preferred, has embedding-based search)
2. Fallback: Qdrant scroll with text-based regulation filter

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 14:23:52 +02:00
Benjamin Admin 090da0f71b feat: RAG-based document verification against 144K Control Library
New module: rag_document_checker.py
- Searches RAG (Qdrant) for controls relevant to document type
- Filters by regulation (DSGVO Art.13, TDDDG §25, BGB §355 etc.)
- LLM (Qwen 3.5:35b) verifies each control against document text
- Returns fulfilled/missing with evidence text + severity
- Supports: DSI, Cookie, Impressum, Widerruf, AGB, DSFA, AVV, Loeschkonzept

Integration in doc-check endpoint:
- Regex checklist runs first (fast, deterministic)
- RAG checks run after (semantic, catches what regex misses)
- Both results combined in single response

LLM prompt returns JSON: {fulfilled, evidence, issue, severity}
Think-tags stripped, JSON extracted from response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 13:19:15 +02:00
Benjamin Admin 13c5880f51 fix: Restrict sub-section detection to genuinely separate document types
Only Cookie and Widerruf sections are checked as separate documents.
Social Media, DSFA, Betroffenenrechte, Dienste von Drittanbietern are
part of the parent DSI and no longer generate false findings.

Added PLAN-rag-document-check.md for Phase 2:
- RAG-based checks with document-type-specific Controls
- DSFA checklist (Art. 35 + Landes-Listen)
- AVV checklist (Art. 28)
- Reference detection (sub-doc → parent doc)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 11:02:36 +02:00
Benjamin Admin 0416bb5d04 fix: Checklist expand — use index instead of URL (prevents all opening at once)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 10:56:44 +02:00
Benjamin Admin 539bc824fd feat: Auto-detect sub-sections within a page and check each separately
When a single URL contains multiple document sections (e.g. IHK DSI page
with Cookies, Social Media, Dienste von Drittanbietern), the system now:

1. Extracts full page text (main document check as before)
2. Splits text at heading boundaries (short uppercase lines)
3. Classifies each section: Cookie→cookie checklist, Social Media→DSI etc.
4. Runs type-specific checklist per section
5. Returns all results: main doc + sub-sections

Section type detection via SECTION_TYPE_MAP patterns:
- 'Cookie*' → §25 TDDDG checklist
- 'Dienste von Drittanbietern' → DSI checklist
- 'Social Media' → DSI checklist (Art. 26 joint controllership)
- 'Widerrufsrecht' → §355 BGB checklist
- 'Impressum' → §5 TMG checklist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 10:44:42 +02:00
Benjamin Admin 4c68caac4e feat: Multi-URL Document Check with full checklist visibility
New "Dokumenten-Pruefung" tab in Compliance Agent:
- User adds multiple URLs with document type (DSI, AGB, Impressum, Cookie, Widerruf)
- Each document loaded via Playwright, accordions expanded, text extracted
- Checked against type-specific legal checklist
- Optional: Cookie banner check via checkbox

Checklisten-UX (solves "100% looks like nothing was checked"):
- All checks shown per document: green checkmark + matched text excerpt
- Red X for missing fields with legal reference
- Builds user trust: "9 Punkte geprueft, alle bestanden"
- Expandable per document with completeness bar

New checklists:
- Impressum: §5 TMG (6 fields: name, address, contact, register, VAT, representative)
- Cookie-Richtlinie: §25 TDDDG (5 fields: types, purposes, retention, third-party, opt-out)

Backend:
- POST /agent/doc-check — async with polling (same pattern as /scan)
- DocCheckResult includes checks[] with passed/failed + matched_text
- dsi_document_checker returns all_checks in SCORE finding
- Email report shows per-document checklist

Files: agent_doc_check_routes.py (280 LOC), DocCheckTab.tsx (248 LOC),
ChecklistView.tsx (130 LOC), dsi_document_checker.py (+70 LOC)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 10:08:40 +02:00
Benjamin Admin 254dbab566 fix: Keep every scan in history (no dedup by URL)
Each scan is a separate entry so users can track changes over time.
Increased max entries from 20 to 50.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 18:31:17 +02:00
Benjamin Admin ef8e7e599f feat: IACE +40 DGUV-extended patterns (HP094-HP133) — 133 total
Mechanical extended (HP094-HP103): Cutting, impact, friction, high-pressure
jet, ejection of fragments, tripping, gear/chain entanglement, clothing
winding, pendulating loads, tool kickback

Electrical extended (HP104-HP109): Arc flash, capacitor residual charge,
static discharge, grounding fault, induced voltage, overcurrent fire

Hazardous substances (HP110-HP117): Dust explosion, solvent vapors,
cutting fluid irritation, welding fumes, chemical burns, suffocation
in confined spaces, biological contamination, asbestos release

Radiation (HP118-HP123): Laser eye injury, UV from welding, infrared
heat, EMF induction, ionizing radiation, glare

Fire/Explosion (HP124-HP130): Electrical overheating, gas/vapor explosion,
hydraulic oil fire, metal dust fire, pressure vessel burst, oxygen
enrichment, spontaneous combustion

Ergonomic extended (HP131-HP133): RSI, whole-body vibration, hand-arm vibration

Total pattern library: 133 patterns (44 builtin + 14 press + 7 cobot +
28 operational + 40 DGUV) + ~58 extended rule library

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 18:22:57 +02:00
Benjamin Admin 8fb2061e9b fix: Eliminate GA false positive + handle short DSI documents
Service detection:
- Only search script tags + src/href attributes for service patterns
- Prevents false positives from DSE text mentioning services
  (e.g. IHK DSE describes etracker, 'google analytics' in text)
- Technical patterns (with regex chars) still checked in full HTML

Short documents:
- Documents with < 200 words flagged as 'Kurzhinweis' instead of
  'MANGELHAFT' — too short for Art. 13 completeness check
- Prevents 96-word navigation pages from showing 8 missing fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 18:21:37 +02:00
Benjamin Admin 8d6959e8b2 fix: Expand Art. 13 patterns for generic matching across all websites
Complaint (Art. 13(2)(d)):
+ 'recht auf beschwerde', 'art. 77', 'beschwerde...wenden/einlegen',
  'zuständige behörde' — IHK uses 'Recht auf Beschwerde gem. Art. 77'

Legal basis (Art. 13(1)(c)):
+ 'gemäß Art.', '§ X IHKG/BDSG/LDSG/BBiG/TDDDG', 'einwilligung gem',
  'verarbeitung auf grundlage' — catches statutory references

Third country (Art. 13(1)(f)):
+ 'Übermittlung ausserhalb', 'EWR/EEA', 'Data Privacy Framework'

Retention (Art. 13(2)(a)):
+ 'Dauer der Speicherung', 'Aufbewahrungsdauer/-pflicht/-zeit',
  'gesetzliche Aufbewahrung' — common German DSE headings

All patterns are generic, not IHK-specific.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:45:02 +02:00
Benjamin Admin 85e82d0dfa feat: IACE 28 operational hazard patterns (HP066-HP093)
Fault Clearing (HP066-HP072): Jammed parts releasing, hose bursts,
unexpected restart, stored energy, intervention in running machine,
material jam, falling parts during fault clearing

Maintenance (HP073-HP079): Missing LOTO, falls from platforms,
hot parts contact, hazardous substances, electric shock, ergonomic
access, uncontrolled hydraulic lowering

Setup/Changeover (HP080-HP085): Crushing during tool change, burns
from hot tools, heavy tool drops, unintended stroke in setup mode,
wrong parameters, test cycle hits personnel

Transport/Install/Decommission (HP086-HP090): Machine tipping,
crushing during installation, uncontrolled commissioning movement,
residual media, sharp edges

Cleaning (HP091-HP093): Slipping, chemical exposure, draw-in

Lifecycle keywords expanded: werkzeugwechsel, stoerung, fehlersuche,
klemm, blockier, stau → trigger fault_clearing phase patterns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 17:42:38 +02:00
Benjamin Admin a349111a01 fix: Raise full_text limit 10K→50K + combine all DSI texts for checks
Two fixes:
1. consent-tester: full_text truncation raised from 10,000 to 50,000 chars
   (IHK Internetangebot has ~50K chars, Beschwerderecht was after 10K cutoff)
2. Backend: dse_text now combines Playwright HTML + ALL DSI discovery texts
   for mandatory content checking. Previously only used first 8K chars from
   one source, missing Verantwortlicher/DSB that were in DSI documents.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 16:03:56 +02:00
Benjamin Admin 3ac8d0cba8 fix: IACE mitigations page — remove broken 'm.' prefix + accept 'protective' type
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:52:10 +02:00
Benjamin Admin e3ae35891f fix: 0% completeness bug — SCORE finding was not generated at 100%
Root cause: When all 9 Art. 13 checks passed (100%), no SCORE finding
was created (line: 'if pct < 100'). The backend then defaulted to
completeness=0 because it looked for the SCORE finding to extract the %.

Fix: Always generate SCORE finding, even at 100%. Added 'OK' severity
for fully compliant documents.

This was the cause of 8 documents showing '0% MANGELHAFT' despite
containing all required information.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 15:34:04 +02:00
Benjamin Admin 72761d6066 debug: Log DSI text lengths to diagnose 0% completeness bug
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 14:08:04 +02:00
Benjamin Admin e494cf62bb fix: Increase page load timeouts — IHK site needs >30s for networkidle
- Initial page.goto timeout: 30s → 60s (IHK loads many JS resources)
- Per-page navigation timeout: 20s → 45s (heavy JS sites)
- Reduced extra wait from 3s+1s back to 2s+0.5s (goto timeout handles slow loads)
- Playwright scanner page timeout: 20s → 45s

Root cause: IHK website has heavy JavaScript that takes >30s to reach
'networkidle' state, causing DSI discovery to fail immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 13:10:59 +02:00
Benjamin Admin d547e63663 fix: DSI dedup prefers 'Datenschutzinformation*' titles + better JS content extraction
Bug 1 fix: When merging documents with identical word_count, prefer
titles starting with 'Datenschutzinformation' over generic section
headings like 'Zweck und Rechtsgrundlage'. This restores the main
'Datenschutzinformationen zum Internetangebot' document.

Bug 2 fix: After navigating to a document page, wait 3s (was 2s) for
JS content loading, then try 10+ content selectors before falling back
to body text (with nav/header/footer removed). Handles IHK-style JS
navigation where content loads after page.goto() completes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 12:26:42 +02:00
Benjamin Admin b4f90ed113 fix: IACE components page — remove broken 'c.' prefix from refactor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 12:20:09 +02:00
Benjamin Admin daa47bb7ab feat: Scan history — shows last 20 scans with URL, date, findings count
- localStorage-based scan history (persists across sessions)
- Each completed scan adds entry: URL, timestamp, findings count, docs count
- 'Letzte Scans' section below results shows clickable history entries
- Click loads URL into form (and shows cached result if same URL)
- Max 20 entries, deduplicates by URL (latest scan wins)
- History visible in 'Website-Scan' tab

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:52:35 +02:00
Benjamin Admin 6c5e086356 fix: DSI dedup — skip anchor links, filter noise, merge duplicates + fix false positives
Dedup fixes:
- Anchor links (#cookies, #betroffenenrechte) on same page are skipped entirely
- Noise titles filtered: 'drucken', 'nach oben', 'Datenschutz' (too generic)
- Documents with < 50 words filtered (navigation snippets)
- Documents with identical word_count merged (same page, different title)
- URL-only titles filtered

False positive fixes (dsi_document_checker.py):
- 'Kontaktdaten des Verantwortlichen' pattern for controller check
- 'Zweck und Rechtsgrundlage' combined heading pattern
- 'Welche Daten werden verarbeitet' question-style headings
- 'Betroffenenrechte' as standalone heading
- 'Welche Rechte hat der Betroffene' question pattern
- 'Daten werden geloescht' retention pattern
- 'Auftragsverarbeiter' as recipient indicator

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 11:41:07 +02:00
Benjamin Admin 8e40155459 feat: Scan state persists across navigation — resume polling on return
- URL, mode, tab, scan result persisted in localStorage
- Active scan_id stored — polling resumes when returning to page
- Scan results survive navigation to other SDK modules
- 'Scan laeuft noch...' shown when returning to in-progress scan
- Cleans up localStorage when scan completes or fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 10:47:39 +02:00
Benjamin Admin b5cf25f6ab fix: IACE overview null-check for risk_summary (empty projects)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 10:44:16 +02:00
Benjamin Admin 7c7513525e feat: Document-centric scan results + DSI deduplication
DSI Dedup (consent-tester):
- Only H1/H2 headings count as documents (not H3/H4 sub-sections)
- Sub-sections (Cookies, Betroffenenrechte, Social Media) are part of
  parent document's full text, not separate documents
- Reduces IHK result from 30 to ~11 real documents

Backend (agent_scan_routes):
- ScanFinding gets doc_title field linking each finding to its document
- doc_title set when creating DSI findings for document attribution

Frontend (ScanResult.tsx):
- 3 sections: Services table, Document cards, General findings
- Documents: expandable cards with completeness bar (green/yellow/red)
- Findings grouped under their parent document
- Each card shows: title, word count, findings count, % completeness
- Findings without doc_title go to "Allgemeine Findings" section

Email Summary (agent_scan_helpers):
- Findings listed under their parent document
- General findings in separate section
- No more flat mixed list

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:56:29 +02:00
Benjamin Admin d816cf8d3a fix: missing closing brace in GetBuiltinHazardPatterns()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:36:23 +02:00
Benjamin Admin 8dd1581fae feat: IACE SIL/PL calculator + Cobot patterns + library extensions
SIL/PL Calculator: Deterministic S×E×P → PL (a-e) → SIL (1-3) mapping
Cobot Patterns (HP059-HP065): Human-robot collision, afterrun, misprogramming
Press Patterns split into separate file (500-line guardrail)
5 new components (C136-C140), 5 new tags, 18 keyword entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 09:29:03 +02:00
Benjamin Admin ea8353f1a0 fix: Scan progress display — separate progress state, guard ScanResult render
- scanProgress state tracks live progress (not mixed into scanData)
- ScanResult only renders when scanData.services exists (prevents crash)
- Purple progress bar with spinner shows current step during scan
- Fixes: TypeError 's.services.filter' when progress data set as scanData

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:03 +02:00
Benjamin Admin d80cb9c8e4 feat: IACE Interview Frontend — 3 Modi (Interview/Wizard/Formular)
CE-Risikobeurteilung Datenerfassung mit 3 wählbaren Eingabe-Modi:

1. Interview-Modus (Chat-artig): Fragen werden nacheinander gestellt
   wie im Kundengespräch. Antwort-Historie sichtbar.
2. Wizard-Modus: Schritt-für-Schritt durch 8 Sektionen.
3. Formular-Modus: Alle Sektionen als Accordion auf einer Seite.

20 strukturierte Fragen in 8 Abschnitten:
- Maschinenbeschreibung (Name, Typ, Baugruppen)
- Lebensphasen (Betrieb, Einrichten, Wartung)
- Bestimmungsgemäße Verwendung
- Vorhersehbare Fehlanwendung
- Qualifikation der Benutzer
- Räumliche/Zeitliche Grenzen
- Technische Daten (Kräfte, Spannungen, Temperaturen, Drehzahlen)
- Umgebungsbedingungen

answersToNarrativeText() konvertiert alle Antworten in den Freitext
der an POST /parse-narrative gesendet wird.
Ergebnis-Panel zeigt: Komponenten, Gefahren, Patterns, Energiequellen.

URL: /sdk/iace/[projectId]/interview

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 08:22:59 +02:00