The inline DSI_KEYWORDS in dsi_discovery.py was missing 'impressum'.
This caused self-extraction to skip impressum pages, returning
datenschutz text instead. Added: impressum, anbieterkennzeichnung,
imprint, legal notice, site notice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
9 files had conflict markers from the branch merge. All resolved keeping
the feature branch version. Also split agent_scan_routes.py (534→367 LOC)
by extracting Pydantic models to agent_scan_models.py.
[guardrail-change]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New page /sdk/master-controls with sortable, searchable MC list
- Click MC → expandable detail panel with atomic controls
- Shows L1 token, L2 subtopic, phase, severity, regulation source
- API proxy via pg directly to compliance.master_controls
- Sidebar entry added
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- FAB container gets pointer-events-none; only the button and panel are clickable
(fixes: buttons on the right side of the page were not clickable)
- Moved the Initialize + Re-initialize buttons from the Interview page to the
Betriebszustaende (operational states) page (natural flow: limits → states → init)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- POST /initialize?force=true deletes existing hazards + mitigations
and recreates them using the current operational states
- Orange "Neu initialisieren" (re-initialize) button on the Interview page (with confirm dialog)
- DeleteHazard store method (cascades to risk assessments)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hazards now show colored badges for the operational states that
triggered them (e.g. "Wartung", "Not-Halt"). Mitigations inherit the
states of their linked hazards.
Backend: OperationalStates are encoded in the Function field (no DB schema change)
and returned on read as an operational_states[] JSON field.
Frontend: indigo badges in HazardTable + MitigationCard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Connect three previously siloed modules to the contextBridge:
- CookieBanner → CONSENT (analytics tools, marketing partners) + FEATURES (CMP_NAME, HAS_FUNCTIONAL_COOKIES)
- RetentionPolicies → PRIVACY.ANALYTICS_RETENTION_MONTHS (from actual Loeschfristen data)
- UseCases → FEATURES flags (HAS_ACCOUNT, HAS_PAYMENTS, HAS_NEWSLETTER, HAS_SOCIAL_MEDIA)
Previously all FEATURES were hardcoded false/empty in EMPTY_CONTEXT.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Betriebszustand-UI saved states to metadata.operational_states but
the initialize handler only read states from the parsed narrative text.
Now merges both sources so the UI selection actually affects which
patterns fire during initialization.
Added integration E2E test that verifies: 2 states → fewer patterns,
9 states → more patterns.
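The merge described above can be sketched roughly as follows. This is an illustration in Python, not the handler's actual code; the function name merge_operational_states and the case-insensitive de-duplication are assumptions.

```python
from typing import Iterable, List, Optional


def merge_operational_states(
    metadata_states: Optional[Iterable[str]],
    narrative_states: Optional[Iterable[str]],
) -> List[str]:
    """Union of UI-selected and narrative-parsed states (hypothetical helper).

    Order is preserved (UI selection first) and duplicates are dropped
    case-insensitively, so "Wartung" and "wartung" count once.
    """
    seen = set()
    merged = []
    for state in list(metadata_states or []) + list(narrative_states or []):
        cleaned = state.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged
```

With this shape, a UI selection that adds states can only widen the set the pattern engine sees, which matches the "more states → more patterns" expectation in the E2E test.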
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Impressum-Check: Toggle activates 75 Impressum MCs via agent
- Banner-Check: Toggle runs additional cookie doc-check (381 MCs)
after the Playwright banner test completes
- Both use the same use_agent flag through doc-check endpoint
Green pill button consistent across all tabs:
'KI-Agent aus' / 'KI-Agent aktiv (X MCs)'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Project list view with saved projects
- Create + analyze in one flow (saves to DB)
- Re-open saved projects for re-analysis
- 3 views: projects list → wizard → dashboard
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Green pill button: 'KI-Agent aus' / 'KI-Agent aktiv (1.874 MCs)'
Toggles use_agent flag which is passed through the full chain:
Frontend → DocCheckRequest → _run_doc_check → _check_single_document
→ check_document_with_controls(use_agent=True)
→ ComplianceAgent with tool calling
Default: OFF (deterministic regex). User can enable per scan.
Also works via env var COMPLIANCE_USE_AGENT=true for always-on.
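A minimal sketch of how the per-scan flag and the env var could combine. The helper name resolve_use_agent is hypothetical, and the precedence (env var forces the agent on, otherwise the request flag decides) is an assumption derived from the "always-on" wording above.

```python
import os
from typing import Mapping, Optional


def resolve_use_agent(
    request_flag: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether a scan runs the LLM agent (hypothetical helper).

    Default is OFF (deterministic regex). The user can enable it per scan,
    and COMPLIANCE_USE_AGENT=true switches it on for every scan.
    """
    if request_flag:
        return True
    return env.get("COMPLIANCE_USE_AGENT", "false").strip().lower() == "true"
```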
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ProductWizard: Product type, technologies, data processing, certifications
- GapDashboard: Summary cards, regulation overview, prioritized gap table
- Expandable rows with recommendations
- Filter by severity and status
- Route: /sdk/gap-analysis
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend banner consent records with consent_method, banner_version,
banner_config_hash, geo, page_url, referrer, device info, session_id
and consent_scope for full Art. 7 DSGVO proof with any tracking vendor.
Migration 107, backward-compatible (all fields nullable).
Admin detail modal shows tracking context, device info and technical data.
Fix pre-existing str|None → Optional[str] for Python 3.9 compat.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New agent architecture for intelligent MC evaluation:
agent_tools.py (367 LOC):
- 5 tools in OpenAI function-calling format
- query_controls: async DB query for MCs by doc_type
- evaluate_controls_batch: deterministic keyword matching
- search_document: text search with context
- get_document_stats: word count, sections, language
- submit_results: finalize check results
compliance_agent.py (398 LOC):
- ComplianceAgent class with agent loop
- 3 LLM providers: Ollama, OpenAI-compatible (OVH), Anthropic
- Tool call dispatch + result collection
- System prompt for systematic compliance analysis
- run_compliance_check() convenience function
Hybrid mode:
- COMPLIANCE_USE_AGENT=false (default): deterministic regex
- COMPLIANCE_USE_AGENT=true: LLM agent with tool calling
- Agent fallback to regex if LLM unavailable
Works with Qwen 35B (Ollama), Qwen 120B (OVH vLLM), Claude.
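For illustration, one tool definition in the OpenAI function-calling format plus a small dispatcher might look like the sketch below. The parameter names (query, context_chars), the search logic and dispatch_tool_call are assumptions, not the contents of agent_tools.py.

```python
import json

# Tool schema in the OpenAI function-calling format. Parameter names are assumed.
SEARCH_DOCUMENT_TOOL = {
    "type": "function",
    "function": {
        "name": "search_document",
        "description": "Search the document text and return matches with context.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "context_chars": {"type": "integer"},
            },
            "required": ["query"],
        },
    },
}


def search_document(document, query, context_chars=40):
    """Return every case-insensitive hit with surrounding characters."""
    hits = []
    lower, needle = document.lower(), query.lower()
    start = lower.find(needle) if needle else -1
    while start != -1:
        end = start + len(needle) + context_chars
        hits.append(document[max(0, start - context_chars):end])
        start = lower.find(needle, start + 1)
    return hits


def dispatch_tool_call(name, arguments_json, document):
    """Route one tool call from the LLM to its Python handler."""
    handlers = {"search_document": lambda args: search_document(document, **args)}
    if name not in handlers:
        return {"error": f"unknown tool: {name}"}
    return {"result": handlers[name](json.loads(arguments_json))}
```

Returning an error dict for unknown tool names, rather than raising, keeps the agent loop alive when a model hallucinates a tool.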
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merges two separate consent views into one unified page at /sdk/einwilligungen:
- Tab "Website-Besucher": device-based banner consents with site selector
- Tab "Login-Nutzer": user-based DSGVO consents (existing, unchanged)
Backend:
- New endpoint GET /admin/consents for paginated banner consent records
- Fix: categories JSON string parsing (was iterating chars instead of array)
CMP Dashboard:
- Dynamic site selector replacing hardcoded "preview-test-site"
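The categories parsing fix in the backend can be pictured like this. It is a sketch only: parse_categories is a hypothetical name, and returning an empty list on malformed input is an assumption.

```python
import json


def parse_categories(raw):
    """Normalize a categories value to a list of strings (hypothetical helper).

    The value may arrive as a JSON-encoded string such as
    '["necessary", "analytics"]'. Iterating that string directly yields
    single characters, which is the bug class described above.
    """
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return []
    if isinstance(raw, list):
        return [str(item) for item in raw]
    return []
```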
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Deterministic pass/fail stays unchanged. After keyword checking,
ONE batched LLM call enriches the 10 highest-severity FAILs with
context-specific recommendations based on the actual document.
Example: If document uses Google Analytics but lacks transfer
mechanism → LLM generates: "Sie nutzen Google Analytics (USA).
Ergaenzen Sie einen Verweis auf das EU-US Data Privacy Framework
und pruefen Sie die DPF-Zertifizierung unter dataprivacyframework.gov."
- Pass/fail: deterministic (keyword matching, reproducible)
- Hint enrichment: LLM (contextual, one call for all fails)
- Temperature 0.3 for consistency
- Graceful fallback if Ollama unavailable
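Selecting the FAILs that go into the single batched call could look like the sketch below. The severity labels, the result dict shape and the function name are assumptions for illustration.

```python
# Assumed severity labels; lower rank means more severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def select_fails_for_enrichment(results, limit=10):
    """Pick the most severe FAILs for one batched LLM hint request.

    `results` is assumed to be a list of dicts with "status" and "severity".
    The sort is stable, so equal severities keep their original order and
    the selection stays reproducible.
    """
    fails = [r for r in results if r.get("status") == "FAIL"]
    fails.sort(key=lambda r: SEVERITY_RANK.get(r.get("severity"), len(SEVERITY_RANK)))
    return fails[:limit]
```

Because only the hints come from the LLM, a failed or skipped call leaves the pass/fail results untouched.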
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaced LLM-based MC verification with deterministic keyword matching:
- Extracts keywords from pass_criteria/fail_criteria
- Matches against document text via regex (case-insensitive)
- PASS if >= 60% of criteria keywords found AND no fail_criteria triggered
- Same text + same MCs = same result every time
Checks ALL MCs for the doc_type (max_controls=0):
- DSE: all 571 controls checked in <1 second
- Impressum: all 75 controls
- Cookie: all 381 controls
No LLM calls needed — purely deterministic keyword matching.
Bigram extraction for compound terms (e.g. "standardvertragsklauseln").
Stop word filtering for German legal text.
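A simplified sketch of the pass/fail rule above. The stop word list, the minimum keyword length and the rule that a fail criterion "triggers" when all of its keywords occur are assumptions; bigram extraction is omitted for brevity.

```python
import re

# Tiny illustrative stop word list; the real list is assumed to be longer.
STOP_WORDS = {"der", "die", "das", "und", "oder", "eine", "einer", "wird", "werden"}


def extract_keywords(criterion):
    """Lowercased words longer than 3 characters, minus stop words."""
    words = re.findall(r"[a-zäöüß]+", criterion.lower())
    return [w for w in words if len(w) > 3 and w not in STOP_WORDS]


def contains(text, keyword):
    return re.search(re.escape(keyword), text, re.IGNORECASE) is not None


def criterion_triggered(text, criterion):
    keywords = extract_keywords(criterion)
    return bool(keywords) and all(contains(text, kw) for kw in keywords)


def evaluate_control(text, pass_criteria, fail_criteria, threshold=0.6):
    """PASS if >= threshold of pass keywords occur and no fail criterion triggers."""
    keywords = sorted({kw for c in pass_criteria for kw in extract_keywords(c)})
    if not keywords:
        return "FAIL"
    found = sum(1 for kw in keywords if contains(text, kw))
    if any(criterion_triggered(text, c) for c in fail_criteria):
        return "FAIL"
    return "PASS" if found / len(keywords) >= threshold else "FAIL"
```

Since no randomness or model call is involved, the same text and the same controls always yield the same result.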
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrote rag_document_checker.py to use the doc_check_controls table
instead of the generic canonical_controls. Each MC has:
- check_question: binary YES/NO for LLM
- pass_criteria: JSONB list of concrete requirements
- fail_criteria: JSONB list of common mistakes
Flow: Regex checks (fast) → LLM verify FAILs → MC deep check (15 per doc)
MC results appear as additional L2 checks in the report.
Coverage: 571 DSE, 381 Cookie, 309 Loeschkonzept, 153 Widerruf,
147 DSFA, 125 AVV, 113 AGB, 75 Impressum = 1,874 total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All elements exist twice on the preview page (desktop + mobile or
banner + page content). Using .first() avoids strict mode violations.
Also extracted goToPreview() and acceptAll() helpers for DRY.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
An extra waitForTimeout(3000) per test doubled the runtime and caused
more timeouts. Reverted to the working approach: goTo waits for the h1
+ 2s, then a 20s toBeVisible timeout per assertion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The last 3 Schwingarm failures occur because the overview page needs 2
parallel API fetches (project + risk-summary) before the
content renders. goTo waits for the h1, but the h2 sections
(Risikozusammenfassung, Schnellzugriff) only render after that.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Root cause of the 16 overview failures: goTo returned too early because
nav is visible immediately (SSR), while the main content (Projektstatus etc.)
only renders after the API fetch. goTo now waits for the h1 (which only
appears after the project fetch) + a 1s buffer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
networkidle times out on CMP pages that poll API endpoints.
domcontentloaded + 1s wait is sufficient for page rendering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
"impressum" was missing from DSI_KEYWORDS despite being listed in
the docstring. This caused /impressum URLs to skip self-extraction
and return linked datenschutz text instead.
Added: DE: impressum, anbieterkennzeichnung, kontakt
EN: imprint, legal notice, site notice, legal information
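As a rough illustration of the keyword check, a URL matcher could look like the sketch below. The constant only lists the keywords added in this commit (the real DSI_KEYWORDS is assumed to contain more), and url_matches_keywords is a hypothetical helper.

```python
from urllib.parse import unquote, urlparse

# Only the keywords named above; the real DSI_KEYWORDS is assumed to be larger.
IMPRESSUM_KEYWORDS = (
    "impressum", "anbieterkennzeichnung", "kontakt",
    "imprint", "legal notice", "site notice", "legal information",
)


def url_matches_keywords(url, keywords=IMPRESSUM_KEYWORDS):
    """True if any keyword occurs in the URL path (case-insensitive).

    Hyphens and underscores are treated as spaces so that
    /legal-notice matches the keyword "legal notice".
    """
    path = unquote(urlparse(url).path).lower()
    path = path.replace("-", " ").replace("_", " ")
    return any(keyword in path for keyword in keywords)
```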
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When checking impressum/agb/widerruf, the DSI discovery would follow
links away from the page and return the wrong document (e.g.
/impressum → finds link to /datenschutz → returns datenschutz text).
Now: for non-DSE doc_types, prefer the html_full_page document
(self-extracted from the actual URL the user provided) over linked
pages found by the crawler.
Fixes safetykon.de/impressum returning datenschutz text.
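The preference rule can be sketched as follows. The document dict shape (a "source" key), the "dse" doc_type label and the function name are assumptions made for the example.

```python
def pick_document(doc_type, documents):
    """Choose which discovered document to check (hypothetical helper).

    For non-DSE doc_types, the page self-extracted from the URL the user
    provided (source == "html_full_page") wins over linked pages found by
    the crawler. Otherwise the first discovered document is used.
    """
    if doc_type != "dse":
        for doc in documents:
            if doc.get("source") == "html_full_page":
                return doc
    return documents[0] if documents else None
```

Example: for doc_type "impressum" with a crawled /datenschutz link and the self-extracted /impressum page, the self-extracted page is returned.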
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
5 verification layers added to the 3-phase banner test:
1. DataLayer/GTM Interception: Proxy on window.dataLayer captures
all push() events. Distinguishes safe lifecycle events (gtm.js,
gtm.dom) from tracking events (page_view, conversion, purchase).
Flags tracking events before consent as violations.
2. localStorage/sessionStorage Monitoring: Intercepts setItem() to
detect tracking keys (_ga, _fbp, amplitude, mixpanel, etc.)
written before consent.
3. Google Consent Mode v2 Runtime Verification: Reads actual GCM
state (analytics_storage, ad_storage) per phase. Verifies
default=denied before consent, stays denied after reject,
switches to granted after accept.
4. TCF v2.2 State: Reads __tcfapi('getTCData') if available.
Verifies consent purpose states match user choice.
5. Cookie Attribute Analysis: Domain (1st vs 3rd party), expires
(>13 months), secure flag for tracking cookies.
10 new L2 checks with expert hints (EDPB, CNIL, §25 TDDDG).
All interceptor calls wrapped in try/except for graceful fallback.
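Layer 5 (cookie attribute analysis) might be approximated like this. The cookie dict shape (domain, expires as Unix seconds with -1 for session cookies, secure) follows what Playwright reports; the finding labels, the 395-day approximation of 13 months and the naive suffix comparison for first vs third party are assumptions. The caller is assumed to pass tracking cookies only.

```python
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=395)  # roughly 13 months (assumed approximation)


def analyze_cookie(cookie, page_domain, now):
    """Return a list of finding labels for one tracking cookie."""
    findings = []
    expires = cookie.get("expires", -1)
    if expires > 0:
        lifetime = datetime.fromtimestamp(expires, tz=timezone.utc) - now
        if lifetime > MAX_LIFETIME:
            findings.append("lifetime_over_13_months")
    if not cookie.get("secure", False):
        findings.append("missing_secure_flag")
    # Naive first-party test: no public suffix list is consulted here.
    domain = cookie.get("domain", "").lstrip(".").lower()
    page = page_domain.lower()
    if not (page == domain or page.endswith("." + domain)):
        findings.append("third_party")
    return findings
```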
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
20 checks were defaulting to PASS when no violation was found,
even if the scanner couldn't actually test them. Now:
- Phase-based checks (tracking/cookies): absence = PASS (correct)
- UI checks: only PASS if banner_checks actually ran
- If banner not detected: everything except banner_detected = FAIL
This prevents false 100% scores when violations exist but the
text→code mapping doesn't cover them.
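The new defaulting rules can be summarized in a small decision function. This is a sketch under assumptions: the check_kind labels ("phase", "ui"), the function name, and returning FAIL for UI checks that never ran are not taken from the scanner code.

```python
def resolve_status(check_kind, violation_found, banner_detected, banner_checks_ran):
    """Status for any check other than banner_detected itself (hypothetical)."""
    if violation_found:
        return "FAIL"
    if not banner_detected:
        # Without a detected banner nothing else can be trusted.
        return "FAIL"
    if check_kind == "phase":
        # Tracking/cookie checks: no observed violation is a real PASS.
        return "PASS"
    # UI checks only pass when the banner checks were actually executed.
    return "PASS" if banner_checks_ran else "FAIL"
```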
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>