Compare commits

5 Commits

Author SHA1 Message Date
Benjamin Admin e536247c20 feat(quaidal): backend API + frontend tab for BSI QUAIDAL data-quality controls
Wire the 195 Clean-Room QUAIDAL controls (from breakpilot-core migration 011)
into the compliance SaaS UI.

Backend:
- GET /api/v1/quaidal/stats           - counts by kind + source provenance
- GET /api/v1/quaidal/controls        - list, optional kind= filter
- GET /api/v1/quaidal/controls/{id}   - single derived control
- GET /api/v1/quaidal/criteria        - 10 QKB criteria
- GET /api/v1/quaidal/criteria/{id}   - QKB with QB/MA/QM tree

Frontend:
- /sdk/quality: new "Trainingsdaten-Qualität (BSI QUAIDAL)" tab with
  10 QKB cards and a drill-down modal showing the full QB→MA→QM tree
  plus original BSI source link and license note.
- /sdk/ai-act: Art. 10 tile on each high-risk/unacceptable result,
  linking to /sdk/quality?category=data_quality.

Pattern matches existing IACE module DIN-reference handling:
own wording, source section + URL preserved for due diligence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 13:03:54 +02:00
Benjamin Admin 313982c6f1 feat(profile+report): P17 - 4 polish items
CI (push), successful: detect-changes (10s), validate-canonical-controls (16s), loc-budget (19s), test-python-backend (39s)
CI (push), skipped: branch-name, guardrail-integrity, secret-scan, dep-audit, sbom-scan, go-lint, python-lint, nodejs-lint, nodejs-build, test-go, iace-gt-coverage, test-python-document-crawler, test-python-dsms-gateway
A) Cookie-policy architecture block: falls back to the DSE (privacy policy) text when the
   cookie document was deduplicated by P15. Now also detects single-document sites (Safetykon pattern).

B) Concrete task list: per-document cap (3) removed and global cap raised from 10 to 20.
   Safetykon now shows 7 tasks instead of 4.

C) business_type classifier: the B2B service cluster from P14 now acts as a boost.
   With 2+ service indicators (CE certification / compliance / auditing)
   b2b_score is raised. Safetykon: "B2C consulting" → "B2B (consulting)".

D) Vendor extraction: falls back to the DSE text when the cookie document was deduplicated
   and there are no CMP payloads. The LLM then extracts vendors from the DSE text.
   Safetykon: 0 → 1 vendor (Google Analytics detected in the DSE text).

Smoke test Safetykon: all 4 polish items take effect, no regression.
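
The fallback in items A/D and the cap change in item B can be sketched roughly as follows. This is a minimal illustration, not the project's code: the keyword tuple is the one used in the diff, but `pick_cookie_text`, `collect_actions`, and the exact place where the global cap of 20 is applied are assumptions made for the example.

```python
# Sketch only: function names and the point where the cap is applied are assumptions.
COOKIE_KEYWORDS = ("cookie", "tracking", "google analytics", "consent")
GLOBAL_ACTION_CAP = 20  # P17-B: raised from 10


def pick_cookie_text(cookie_text: str, dse_text: str) -> str:
    """Return the cookie text, or the DSE text if it mentions cookie terms."""
    if cookie_text:
        return cookie_text
    if dse_text and any(k in dse_text.lower() for k in COOKIE_KEYWORDS):
        return dse_text
    return ""


def collect_actions(failed_by_doc: dict[str, list[str]]) -> list[str]:
    """Collect every failed check per document, then apply only the global cap."""
    actions = []
    for doc_label, failed_checks in failed_by_doc.items():
        for check in failed_checks:  # no per-document cap any more
            actions.append(f"{doc_label}: {check}")
    return actions[:GLOBAL_ACTION_CAP]
```

With a single document that has seven failed checks, all seven survive, whereas a per-document cap of 3 would have dropped four of them.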

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 12:22:05 +02:00
Benjamin Admin f30a3ce471 Merge branch 'main' of ssh://gitea.meghsakha.com:22222/Benjamin_Boenisch/breakpilot-compliance
CI (push), successful: nodejs-build (3m23s), test-go (1m1s), detect-changes (9s), validate-canonical-controls (18s), loc-budget (19s), iace-gt-coverage (28s), test-python-backend (45s)
CI (push), skipped: branch-name, guardrail-integrity, secret-scan, dep-audit, sbom-scan, go-lint, python-lint, nodejs-lint, test-python-document-crawler, test-python-dsms-gateway
2026-05-19 11:47:45 +02:00
Benjamin Admin 479ce2225b feat(profile): P14+P15+P16 - B2B heuristic + doc-URL dedup + homepage profile
P14: _detect_no_direct_sales extended to 3 clusters:
  A) OEM configurator (BMW/Audi/Mercedes/VW/Porsche brand names + authorized-dealer pattern)
  B) B2B service provider (CE certification, compliance consulting, training, auditing, TISAX, ISO standards, occupational safety, ...)
  C) NGO/association/public sector (donation account, register of associations, non-profit status, ...)
Threshold: pos >= 2 per cluster AND pos > neg. Previously: OEM only.
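
The threshold rule can be illustrated with a small Python sketch. The commit does not show the body of `_detect_no_direct_sales`, so everything here is hypothetical: the keyword lists, the shared negative list, and the names `Cluster`, `count_hits`, and `matching_clusters` are assumptions. Only the rule itself (at least 2 positive hits per cluster and more positive than negative hits) comes from the message above.

```python
from dataclasses import dataclass

# Hypothetical negative indicators (signs of direct sales); not from the commit.
NEGATIVE = ("warenkorb", "zur kasse", "versandkosten")


@dataclass(frozen=True)
class Cluster:
    name: str
    positive: tuple[str, ...]


# Hypothetical, heavily shortened keyword lists for the three clusters.
CLUSTERS = (
    Cluster("oem_configurator", ("bmw", "porsche", "vertragshaendler")),
    Cluster("b2b_service", ("ce-zertifizierung", "auditierung", "schulungen", "tisax")),
    Cluster("ngo_public", ("spendenkonto", "vereinsregister", "gemeinnuetzig")),
)


def count_hits(text: str, keywords: tuple[str, ...]) -> int:
    """Count how many keywords occur in the text (case-insensitive substring match)."""
    lowered = text.lower()
    return sum(1 for kw in keywords if kw in lowered)


def matching_clusters(text: str) -> list[str]:
    """Return clusters with pos >= 2 and pos > neg."""
    neg = count_hits(text, NEGATIVE)
    matches = []
    for cluster in CLUSTERS:
        pos = count_hits(text, cluster.positive)
        if pos >= 2 and pos > neg:
            matches.append(cluster.name)
    return matches
```

A text with three B2B service indicators and no shop vocabulary matches `b2b_service`, while the same indicators next to three shop terms match nothing, because `pos > neg` fails.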

P15: doc-URL dedup in the worker: when several doc types reference THE SAME document
(Safetykon pattern: the user submits /datenschutz for dse, cookie
AND widerruf), only the primary doc type (priority: dse > impressum
> cookie > widerruf > agb > nutzungsbedingungen) receives the text. The others
are reported as "Nicht separat vorhanden — wird im Dokument 'X' mit-geprueft."
(not present separately, checked as part of document 'X').
This eliminates the 8+8 systematic widerruf/cookie false positives.
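
A self-contained sketch of this priority-based dedup is shown below. It follows the description above but is not the worker code: `dedupe_by_text` is a hypothetical name, the entries are plain dicts, and SHA-256 is swapped in for the fingerprint purely for illustration.

```python
import hashlib

# Priority order as listed in the commit message.
DOC_PRIORITY = ["dse", "impressum", "cookie", "widerruf", "agb", "nutzungsbedingungen"]


def dedupe_by_text(entries: list[dict]) -> list[dict]:
    """Keep the text only on the highest-priority doc type per distinct document.

    Lower-priority duplicates get an empty text and a 'dup_of' marker.
    The list is modified in place and returned for convenience.
    """
    seen: dict[str, str] = {}  # text fingerprint -> primary doc type
    for doc_type in DOC_PRIORITY:
        entry = next(
            (e for e in entries if e.get("doc_type") == doc_type and e.get("text")),
            None,
        )
        if entry is None:
            continue
        # Fingerprint the first 1000 characters of the stripped text.
        prefix = entry["text"].strip()[:1000]
        digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if digest in seen:
            entry["text"] = ""
            entry["dup_of"] = seen[digest]
        else:
            seen[digest] = doc_type
    return entries
```

Because the loop walks the priority list rather than the submission order, `dse` wins even if the user entered the cookie URL first.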

P16: profile detection also uses the homepage text: the homepage HTML is fetched with a
short request (8s timeout), stripped of markup and merged into profile_input. Previously
P14 only took effect when B2B indicators appeared in the mandatory DSE/Impressum text;
for Safetykon they appear only in the homepage menu.
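
The stripping and merging step might look like the following sketch. It deliberately omits the HTTP fetch, and `html_to_text` and `merge_homepage` are hypothetical helper names. The regular expressions, the `__homepage` key, the 20000-character limit, and the 30-word minimum mirror the worker diff.

```python
import re


def html_to_text(html: str, max_chars: int = 20000) -> str:
    """Crudely reduce HTML to plain text: drop scripts, styles and tags."""
    html = re.sub(r"<script[^>]*>.*?</script>", " ", html,
                  flags=re.DOTALL | re.IGNORECASE)
    html = re.sub(r"<style[^>]*>.*?</style>", " ", html,
                  flags=re.DOTALL | re.IGNORECASE)
    html = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", html).strip()[:max_chars]


def merge_homepage(doc_texts: dict[str, str], homepage_html: str,
                   min_words: int = 30) -> dict[str, str]:
    """Return a copy of doc_texts, extended by the homepage text if it is long enough."""
    profile_input = dict(doc_texts)
    text = html_to_text(homepage_html)
    if len(text.split()) > min_words:
        profile_input["__homepage"] = text
    return profile_input
```

Copying `doc_texts` first keeps the per-document checks unaffected; only the profile detection sees the extra `__homepage` entry.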

Bonus: the TDM override submit button is disabled when the reason is shorter than 10
characters. This prevents users from clicking into the bug, as happened today.

Smoke test Safetykon (B2B compliance service provider):
  dse                  checked (no error)
  impressum            checked (no error)
  cookie               "Nicht separat vorhanden — wird in DSE mit-geprueft" (not present separately, checked as part of the DSE)
  agb                  "Nicht anwendbar — kein Direkt-Kaufvertrag" (not applicable, no direct purchase contract)
  widerruf             "Nicht anwendbar — kein Direkt-Kaufvertrag"
  nutzungsbedingungen  "Nicht anwendbar — kein Direkt-Kaufvertrag"
Before: 16 false positives. Now: 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:46:58 +02:00
Benjamin Admin a1b380e211 fix(iace): getProject scan missed &p.CustomerName — single-project GET 500ed
Migration 031 added customer_name to the SELECT statement in three places
(GetProject, ListProjects, ListVariants), and the per-row Scan needed the
matching destination. The replace_all caught ListProjects + ListVariants
but missed GetProject because of an indentation difference (single tab
vs row-scope indentation). Result: GET /projects/:id returned
  "get project: number of field descriptions must equal number of
   destinations, got 18 and 17"
which the frontend interpreted as "project has no data" and surfaced an
empty UI even though hazards/mitigations/components were intact (118/282/16
on Bremsscheibe).

Single-line fix: add &p.CustomerName to the GetProject scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 11:46:34 +02:00
18 changed files with 993 additions and 32 deletions
@@ -0,0 +1,27 @@
import { NextRequest, NextResponse } from 'next/server'
const BACKEND_URL = process.env.BACKEND_URL || 'http://backend-compliance:8002'
function tenantHeader(request: NextRequest): string {
return request.headers.get('x-tenant-id') || '00000000-0000-0000-0000-000000000001'
}
export async function GET(
request: NextRequest,
{ params }: { params: Promise<{ derived_id: string }> }
) {
const { derived_id } = await params
try {
const resp = await fetch(
`${BACKEND_URL}/api/v1/quaidal/controls/${encodeURIComponent(derived_id)}`,
{ headers: { 'X-Tenant-ID': tenantHeader(request) }, cache: 'no-store' }
)
const body = await resp.text()
return new NextResponse(body, {
status: resp.status,
headers: { 'Content-Type': resp.headers.get('Content-Type') || 'application/json' },
})
} catch (err) {
return NextResponse.json({ error: 'Backend unreachable', details: String(err) }, { status: 502 })
}
}
@@ -0,0 +1,25 @@
import { NextRequest, NextResponse } from 'next/server'
const BACKEND_URL = process.env.BACKEND_URL || 'http://backend-compliance:8002'
function tenantHeader(request: NextRequest): string {
return request.headers.get('x-tenant-id') || '00000000-0000-0000-0000-000000000001'
}
export async function GET(request: NextRequest) {
const { searchParams } = new URL(request.url)
const qs = searchParams.toString()
try {
const resp = await fetch(
`${BACKEND_URL}/api/v1/quaidal/controls${qs ? `?${qs}` : ''}`,
{ headers: { 'X-Tenant-ID': tenantHeader(request) }, cache: 'no-store' }
)
const body = await resp.text()
return new NextResponse(body, {
status: resp.status,
headers: { 'Content-Type': resp.headers.get('Content-Type') || 'application/json' },
})
} catch (err) {
return NextResponse.json({ error: 'Backend unreachable', details: String(err) }, { status: 502 })
}
}
@@ -0,0 +1,27 @@
import { NextRequest, NextResponse } from 'next/server'
const BACKEND_URL = process.env.BACKEND_URL || 'http://backend-compliance:8002'
function tenantHeader(request: NextRequest): string {
return request.headers.get('x-tenant-id') || '00000000-0000-0000-0000-000000000001'
}
export async function GET(
request: NextRequest,
{ params }: { params: Promise<{ section_id: string }> }
) {
const { section_id } = await params
try {
const resp = await fetch(
`${BACKEND_URL}/api/v1/quaidal/criteria/${encodeURIComponent(section_id)}`,
{ headers: { 'X-Tenant-ID': tenantHeader(request) }, cache: 'no-store' }
)
const body = await resp.text()
return new NextResponse(body, {
status: resp.status,
headers: { 'Content-Type': resp.headers.get('Content-Type') || 'application/json' },
})
} catch (err) {
return NextResponse.json({ error: 'Backend unreachable', details: String(err) }, { status: 502 })
}
}
@@ -0,0 +1,23 @@
import { NextRequest, NextResponse } from 'next/server'
const BACKEND_URL = process.env.BACKEND_URL || 'http://backend-compliance:8002'
function tenantHeader(request: NextRequest): string {
return request.headers.get('x-tenant-id') || '00000000-0000-0000-0000-000000000001'
}
export async function GET(request: NextRequest) {
try {
const resp = await fetch(`${BACKEND_URL}/api/v1/quaidal/criteria`, {
headers: { 'X-Tenant-ID': tenantHeader(request) },
cache: 'no-store',
})
const body = await resp.text()
return new NextResponse(body, {
status: resp.status,
headers: { 'Content-Type': resp.headers.get('Content-Type') || 'application/json' },
})
} catch (err) {
return NextResponse.json({ error: 'Backend unreachable', details: String(err) }, { status: 502 })
}
}
@@ -0,0 +1,23 @@
import { NextRequest, NextResponse } from 'next/server'
const BACKEND_URL = process.env.BACKEND_URL || 'http://backend-compliance:8002'
function tenantHeader(request: NextRequest): string {
return request.headers.get('x-tenant-id') || '00000000-0000-0000-0000-000000000001'
}
export async function GET(request: NextRequest) {
try {
const resp = await fetch(`${BACKEND_URL}/api/v1/quaidal/stats`, {
headers: { 'X-Tenant-ID': tenantHeader(request) },
cache: 'no-store',
})
const body = await resp.text()
return new NextResponse(body, {
status: resp.status,
headers: { 'Content-Type': resp.headers.get('Content-Type') || 'application/json' },
})
} catch (err) {
return NextResponse.json({ error: 'Backend unreachable', details: String(err) }, { status: 502 })
}
}
@@ -331,7 +331,7 @@ export function ComplianceCheckTab() {
{/* Submit button */}
<button
onClick={handleSubmit}
- disabled={loading || filledCount === 0}
+ disabled={loading || filledCount === 0 || (tdmOverride && tdmOverrideReason.trim().length < 10)}
className="w-full px-4 py-3 bg-purple-600 text-white rounded-lg font-medium hover:bg-purple-700 disabled:opacity-50 transition-colors text-sm flex items-center justify-center gap-2"
>
{loading ? (
@@ -0,0 +1,45 @@
'use client'
import Link from 'next/link'
interface Props {
/** Risk classification of the AI system. Tile is only rendered for high_risk / unacceptable. */
riskLevel: string
}
/**
* Renders a tile pointing to the BSI QUAIDAL-based data-quality control tab.
* AI Act Article 10 obligations (training-data quality) apply only to high-risk
* systems, so the tile is skipped for limited / minimal / not-applicable classes.
*/
export function Art10Tile({ riskLevel }: Props) {
if (riskLevel !== 'high_risk' && riskLevel !== 'unacceptable') return null
return (
<Link
href="/sdk/quality?category=data_quality"
className="block mt-3 p-3 rounded-lg border border-purple-200 bg-purple-50 hover:bg-purple-100 transition-colors"
>
<div className="flex items-start gap-3">
<div className="w-9 h-9 rounded-full bg-purple-200 text-purple-700 flex items-center justify-center shrink-0">
<svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2}
d="M3 7v10a2 2 0 002 2h14a2 2 0 002-2V7M3 7l9 6 9-6M3 7l9-4 9 4" />
</svg>
</div>
<div className="flex-1 min-w-0">
<div className="text-sm font-semibold text-purple-900">
Art. 10 Datenqualität (Hochrisiko-KI)
</div>
<div className="text-xs text-purple-700 mt-0.5">
BSI QUAIDAL Controls: 10 Kriterien, 15 Bausteine, 30 Maßnahmen, 140 Metriken.
Klicken zum Öffnen des Trainingsdaten-Qualität-Moduls.
</div>
</div>
<svg className="w-4 h-4 text-purple-500 shrink-0 mt-1" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 5l7 7-7 7" />
</svg>
</div>
</Link>
)
}
@@ -9,6 +9,7 @@ import { RiskPyramid } from './_components/RiskPyramid'
import { AddSystemForm } from './_components/AddSystemForm'
import { AISystemCard } from './_components/AISystemCard'
import DecisionTreeWizard from '@/components/sdk/ai-act/DecisionTreeWizard'
import { Art10Tile } from './_components/Art10Tile'
type TabId = 'overview' | 'decision-tree' | 'results'
@@ -136,6 +137,7 @@ function SavedResultsTab() {
Löschen
</button>
</div>
<Art10Tile riskLevel={r.high_risk_result} />
</div>
))}
</div>
@@ -0,0 +1,152 @@
'use client'
import { useEffect, useState } from 'react'
import { fetchCriterionTree, type QuaidalControl, type QuaidalCriterionTree } from '../_hooks/useQuaidalData'
interface Props {
sectionId: string
onClose: () => void
}
function ControlBlock({ ctrl, badgeColor }: { ctrl: QuaidalControl; badgeColor: string }) {
return (
<div className="border border-gray-200 rounded-lg p-4 bg-white">
<div className="flex items-start justify-between gap-3 mb-2">
<h4 className="font-semibold text-gray-900">{ctrl.canonical_name}</h4>
<span className={`px-2 py-0.5 text-xs rounded-full ${badgeColor} shrink-0`}>{ctrl.source.section}</span>
</div>
<p className="text-sm text-gray-600 mb-3 whitespace-pre-line">{ctrl.description}</p>
{ctrl.source.url && (
<a
href={ctrl.source.url}
target="_blank"
rel="noreferrer noopener"
className="text-xs text-purple-600 hover:text-purple-800 underline"
>
BSI-Quelle ansehen ({ctrl.source.framework})
</a>
)}
</div>
)
}
export function QuaidalCriterionDetail({ sectionId, onClose }: Props) {
const [tree, setTree] = useState<QuaidalCriterionTree | null>(null)
const [loading, setLoading] = useState(true)
useEffect(() => {
let active = true
setLoading(true)
fetchCriterionTree(sectionId).then(t => {
if (active) {
setTree(t)
setLoading(false)
}
})
return () => { active = false }
}, [sectionId])
return (
<div className="fixed inset-0 z-50 flex items-center justify-center bg-black/40 p-4">
<div className="bg-white rounded-2xl shadow-xl w-full max-w-4xl max-h-[90vh] overflow-hidden flex flex-col">
<div className="flex items-center justify-between px-6 py-4 border-b border-gray-200">
<div>
<div className="text-xs text-gray-500 uppercase tracking-wide">QUAIDAL Kriterium</div>
<h2 className="text-xl font-bold text-gray-900">
{tree?.criterion.canonical_name || sectionId}
</h2>
</div>
<button
onClick={onClose}
className="w-8 h-8 rounded-full hover:bg-gray-100 flex items-center justify-center text-gray-500"
aria-label="Schliessen"
>×</button>
</div>
<div className="overflow-y-auto p-6 space-y-6">
{loading && <div className="text-center text-gray-400 py-12">Lade...</div>}
{tree && (
<>
<div>
<h3 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-2">
Anforderung (eigene Formulierung)
</h3>
<div className="bg-purple-50 border border-purple-200 rounded-lg p-4">
<p className="text-gray-800 whitespace-pre-line">{tree.criterion.description}</p>
</div>
<div className="mt-3 flex flex-wrap items-center gap-3 text-xs text-gray-500">
<span>Regulierung: <span className="font-medium text-gray-700">{tree.criterion.regulation_anchor || '—'}</span></span>
<span>Quelle: <span className="font-medium text-gray-700">{tree.criterion.source.framework} {tree.criterion.source.section}</span></span>
{tree.criterion.source.url && (
<a href={tree.criterion.source.url} target="_blank" rel="noreferrer noopener" className="text-purple-600 hover:text-purple-800 underline">
Originalquelle
</a>
)}
</div>
</div>
{tree.criterion.external_refs.length > 0 && (
<div>
<h3 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-2">
Externe Referenzen (nicht ingestiert, nur Verweis)
</h3>
<div className="flex flex-wrap gap-2">
{tree.criterion.external_refs.map((ref, i) => (
<span key={i} className="px-2 py-1 text-xs bg-gray-100 text-gray-700 rounded">
{ref.framework}{ref.citation ? `${ref.citation}` : ''}
</span>
))}
</div>
</div>
)}
{tree.building_blocks.length > 0 && (
<div>
<h3 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-3">
Bausteine ({tree.building_blocks.length})
</h3>
<div className="grid grid-cols-1 md:grid-cols-2 gap-3">
{tree.building_blocks.map(qb => (
<ControlBlock key={qb.derived_id} ctrl={qb} badgeColor="bg-blue-100 text-blue-700" />
))}
</div>
</div>
)}
{tree.measures.length > 0 && (
<div>
<h3 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-3">
Maßnahmen ({tree.measures.length})
</h3>
<div className="grid grid-cols-1 md:grid-cols-2 gap-3">
{tree.measures.map(m => (
<ControlBlock key={m.derived_id} ctrl={m} badgeColor="bg-green-100 text-green-700" />
))}
</div>
</div>
)}
{tree.metrics.length > 0 && (
<div>
<h3 className="text-sm font-semibold text-gray-500 uppercase tracking-wide mb-3">
Metriken & Methoden ({tree.metrics.length})
</h3>
<div className="grid grid-cols-1 md:grid-cols-2 gap-3">
{tree.metrics.map(qm => (
<ControlBlock key={qm.derived_id} ctrl={qm} badgeColor="bg-amber-100 text-amber-700" />
))}
</div>
</div>
)}
</>
)}
</div>
<div className="px-6 py-3 border-t border-gray-200 bg-gray-50 text-xs text-gray-500">
Eigene Clean-Room-Ableitung von BSI QUAIDAL. Quellverweis und Lizenz-Note pro Eintrag.
</div>
</div>
</div>
)
}
@@ -0,0 +1,109 @@
'use client'
import { useState } from 'react'
import { useQuaidalData, type QuaidalControl } from '../_hooks/useQuaidalData'
import { QuaidalCriterionDetail } from './QuaidalCriterionDetail'
function CriterionCard({ ctrl, onOpen }: { ctrl: QuaidalControl; onOpen: () => void }) {
return (
<button
onClick={onOpen}
className="text-left bg-white rounded-xl border border-gray-200 p-5 hover:border-purple-400 hover:shadow-sm transition-all"
>
<div className="flex items-start justify-between mb-2">
<h3 className="font-semibold text-gray-900">{ctrl.canonical_name}</h3>
<span className="px-2 py-0.5 text-xs rounded-full bg-purple-100 text-purple-700">
{ctrl.source.section}
</span>
</div>
<p className="text-sm text-gray-600 line-clamp-3">{ctrl.description}</p>
<div className="mt-3 flex flex-wrap items-center gap-2 text-xs">
<span className="text-gray-500">Bausteine: <span className="font-medium text-gray-700">{ctrl.related_quaidal_ids.length}</span></span>
{ctrl.external_refs.slice(0, 2).map((r, i) => (
<span key={i} className="px-1.5 py-0.5 bg-gray-100 text-gray-600 rounded">
{r.framework}
</span>
))}
</div>
</button>
)
}
export function TrainingDataQualityTab() {
const { criteria, stats, loading, error } = useQuaidalData()
const [openSection, setOpenSection] = useState<string | null>(null)
if (loading) {
return <div className="text-center text-gray-400 py-12">Lade QUAIDAL-Katalog...</div>
}
if (error) {
return (
<div className="bg-red-50 border border-red-200 rounded-lg p-4 text-red-700">
QUAIDAL-Daten konnten nicht geladen werden: {error}
</div>
)
}
return (
<div className="space-y-6">
<div className="bg-purple-50 border border-purple-200 rounded-xl p-5">
<h2 className="text-lg font-semibold text-gray-900">Trainingsdaten-Qualität nach BSI QUAIDAL</h2>
<p className="text-sm text-gray-600 mt-1">
Operative Umsetzung von EU AI Act Art. 10 (Datenqualität für Hochrisiko-KI) auf Basis des
BSI-Katalogs QUAIDAL. Alle Controls sind eigenständig formuliert (Clean-Room) und verweisen
auf die jeweilige QUAIDAL-Sektion.
</p>
{stats && (
<div className="mt-4 grid grid-cols-2 md:grid-cols-4 gap-3 text-sm">
<div>
<div className="text-xs text-gray-500">Qualitätskriterien</div>
<div className="text-xl font-semibold text-gray-900">{stats.counts_by_kind.criterion ?? 0}</div>
</div>
<div>
<div className="text-xs text-gray-500">Bausteine</div>
<div className="text-xl font-semibold text-gray-900">{stats.counts_by_kind.building_block ?? 0}</div>
</div>
<div>
<div className="text-xs text-gray-500">Maßnahmen</div>
<div className="text-xl font-semibold text-gray-900">{stats.counts_by_kind.measure ?? 0}</div>
</div>
<div>
<div className="text-xs text-gray-500">Metriken & Methoden</div>
<div className="text-xl font-semibold text-gray-900">{stats.counts_by_kind.metric ?? 0}</div>
</div>
</div>
)}
</div>
<div>
<h3 className="text-lg font-semibold text-gray-900 mb-4">10 Qualitätskriterien</h3>
{criteria.length === 0 ? (
<div className="bg-white rounded-xl border border-gray-200 p-8 text-center text-gray-400">
Keine Kriterien gefunden. Bitte Backend-Ingest prüfen.
</div>
) : (
<div className="grid grid-cols-1 md:grid-cols-2 lg:grid-cols-3 gap-4">
{criteria.map(c => (
<CriterionCard
key={c.derived_id}
ctrl={c}
onOpen={() => setOpenSection(c.source.section)}
/>
))}
</div>
)}
</div>
{stats?.license_note && (
<div className="text-xs text-gray-500 italic">{stats.license_note}</div>
)}
{openSection && (
<QuaidalCriterionDetail
sectionId={openSection}
onClose={() => setOpenSection(null)}
/>
)}
</div>
)
}
@@ -0,0 +1,86 @@
'use client'
import { useCallback, useEffect, useState } from 'react'
export interface QuaidalExternalRef {
framework: string
citation: string | null
}
export interface QuaidalSource {
framework: string
section: string
url: string | null
commit_sha: string | null
title_original: string | null
license_note: string | null
}
export interface QuaidalControl {
derived_id: string
kind: 'criterion' | 'building_block' | 'measure' | 'metric'
canonical_name: string
description: string
regulation_anchor: string | null
related_quaidal_ids: string[]
external_refs: QuaidalExternalRef[]
source: QuaidalSource
plagiarism_score: number | null
}
export interface QuaidalStats {
counts_by_kind: Record<string, number>
source_framework: string
source_commit_sha: string | null
license_note: string | null
}
export interface QuaidalCriterionTree {
criterion: QuaidalControl
building_blocks: QuaidalControl[]
measures: QuaidalControl[]
metrics: QuaidalControl[]
}
const API_BASE = '/api/sdk/v1/quaidal'
export function useQuaidalData() {
const [criteria, setCriteria] = useState<QuaidalControl[]>([])
const [stats, setStats] = useState<QuaidalStats | null>(null)
const [loading, setLoading] = useState(true)
const [error, setError] = useState<string | null>(null)
const loadAll = useCallback(async () => {
setLoading(true)
setError(null)
try {
const [criteriaRes, statsRes] = await Promise.all([
fetch(`${API_BASE}/criteria`, { cache: 'no-store' }),
fetch(`${API_BASE}/stats`, { cache: 'no-store' }),
])
if (criteriaRes.ok) {
const data = (await criteriaRes.json()) as QuaidalControl[]
setCriteria(Array.isArray(data) ? data : [])
} else {
setError(`Criteria endpoint returned ${criteriaRes.status}`)
}
if (statsRes.ok) {
setStats(await statsRes.json())
}
} catch (err) {
setError(String(err))
} finally {
setLoading(false)
}
}, [])
useEffect(() => { loadAll() }, [loadAll])
return { criteria, stats, loading, error, reload: loadAll }
}
export async function fetchCriterionTree(sectionId: string): Promise<QuaidalCriterionTree | null> {
const res = await fetch(`${API_BASE}/criteria/${encodeURIComponent(sectionId)}`, { cache: 'no-store' })
if (!res.ok) return null
return (await res.json()) as QuaidalCriterionTree
}
@@ -1,15 +1,23 @@
'use client'
import { useState, useEffect } from 'react'
import { useSearchParams } from 'next/navigation'
import { useSDK } from '@/lib/sdk'
import { useQualityData } from './_hooks/useQualityData'
import { MetricCard, type QualityMetric } from './_components/MetricCard'
import { TestRow } from './_components/TestRow'
import { MetricModal } from './_components/MetricModal'
import { TestModal } from './_components/TestModal'
import { TrainingDataQualityTab } from './_components/TrainingDataQualityTab'
type TabId = 'model_quality' | 'data_quality'
export default function QualityPage() {
const { state } = useSDK()
const searchParams = useSearchParams()
const initialTab: TabId = searchParams?.get('category') === 'data_quality' ? 'data_quality' : 'model_quality'
const [tab, setTab] = useState<TabId>(initialTab)
const {
metrics,
tests,
@@ -41,24 +49,54 @@ export default function QualityPage() {
<h1 className="text-2xl font-bold text-gray-900">AI Quality Dashboard</h1>
<p className="mt-1 text-gray-500">Ueberwachen Sie die Qualitaet und Fairness Ihrer KI-Systeme</p>
</div>
- <div className="flex items-center gap-2">
- <button
- onClick={() => setShowTestModal(true)}
- className="flex items-center gap-2 px-4 py-2 border border-purple-300 text-purple-700 rounded-lg hover:bg-purple-50 transition-colors"
- >
- <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" /></svg>
- Test hinzufuegen
- </button>
- <button
- onClick={() => { setEditMetric(undefined); setShowMetricModal(true) }}
- className="flex items-center gap-2 px-4 py-2 bg-purple-600 text-white rounded-lg hover:bg-purple-700 transition-colors"
- >
- <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" /></svg>
- Messung hinzufuegen
- </button>
- </div>
+ {tab === 'model_quality' && (
+ <div className="flex items-center gap-2">
+ <button
+ onClick={() => setShowTestModal(true)}
+ className="flex items-center gap-2 px-4 py-2 border border-purple-300 text-purple-700 rounded-lg hover:bg-purple-50 transition-colors"
+ >
+ <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M4 4v5h.582m15.356 2A8.001 8.001 0 004.582 9m0 0H9m11 11v-5h-.581m0 0a8.003 8.003 0 01-15.357-2m15.357 2H15" /></svg>
+ Test hinzufuegen
+ </button>
+ <button
+ onClick={() => { setEditMetric(undefined); setShowMetricModal(true) }}
+ className="flex items-center gap-2 px-4 py-2 bg-purple-600 text-white rounded-lg hover:bg-purple-700 transition-colors"
+ >
+ <svg className="w-5 h-5" fill="none" stroke="currentColor" viewBox="0 0 24 24"><path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 6v6m0 0v6m0-6h6m-6 0H6" /></svg>
+ Messung hinzufuegen
+ </button>
+ </div>
+ )}
</div>
<div className="border-b border-gray-200">
<nav className="-mb-px flex gap-6">
<button
onClick={() => setTab('model_quality')}
className={`pb-3 px-1 text-sm font-medium border-b-2 transition-colors ${
tab === 'model_quality'
? 'border-purple-500 text-purple-600'
: 'border-transparent text-gray-500 hover:text-gray-700'
}`}
>
Modell-Qualität
</button>
<button
onClick={() => setTab('data_quality')}
className={`pb-3 px-1 text-sm font-medium border-b-2 transition-colors ${
tab === 'data_quality'
? 'border-purple-500 text-purple-600'
: 'border-transparent text-gray-500 hover:text-gray-700'
}`}
>
Trainingsdaten-Qualität (BSI QUAIDAL)
</button>
</nav>
</div>
{tab === 'data_quality' && <TrainingDataQualityTab />}
{tab === 'model_quality' && (
<>
<div className="grid grid-cols-1 md:grid-cols-4 gap-4">
<div className="bg-white rounded-xl border border-gray-200 p-6">
<div className="text-sm text-gray-500">Durchschnittlicher Score</div>
@@ -141,6 +179,8 @@ export default function QualityPage() {
</div>
</div>
</div>
</>
)}
{showMetricModal && (
<MetricModal
@@ -74,7 +74,7 @@ func (s *Store) GetProject(ctx context.Context, id uuid.UUID) (*Project, error)
FROM iace_projects WHERE id = $1
`, id).Scan(
&p.ID, &p.TenantID, &p.ParentProjectID, &p.MachineName, &p.MachineType, &p.Manufacturer,
- &p.Description, &p.NarrativeText, &status, &p.CEMarkingTarget,
+ &p.CustomerName, &p.Description, &p.NarrativeText, &status, &p.CEMarkingTarget,
&p.CompletenessScore, &riskSummary, &triggeredRegulations, &metadata,
&p.CreatedAt, &p.UpdatedAt, &p.ArchivedAt,
)
@@ -275,9 +275,73 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
             if entry.get("text"):
                 doc_texts[entry["doc_type"]] = entry["text"]
+        # P15: dedupe. When several doc types reference THE SAME document
+        # (e.g. Safetykon: the user submits /datenschutz for dse + cookie + widerruf),
+        # keep only the primary doc type. Others: clear the text and add a note.
+        # Priority: dse > impressum > cookie > widerruf > agb > nutzungsbedingungen
+        _DOC_PRIORITY = ["dse", "impressum", "cookie", "widerruf", "agb",
+                         "nutzungsbedingungen", "social_media", "dsb"]
+        seen_text_hash: dict[int, str] = {}
+        for dt in _DOC_PRIORITY:
+            entry = next((e for e in doc_entries if e.get("doc_type") == dt
+                          and e.get("text")), None)
+            if not entry:
+                continue
+            text_hash = hash((entry.get("text") or "").strip()[:1000])
+            if text_hash in seen_text_hash:
+                primary = seen_text_hash[text_hash]
+                logger.info(
+                    "P15 dedup: doc_type=%s referenziert dasselbe Dokument "
+                    "wie %s (URL=%s) -> als Duplikat markiert.",
+                    dt, primary, entry.get("url", "")[:60],
+                )
+                entry["text"] = ""
+                entry["word_count"] = 0
+                entry["url"] = ""
+                entry["dup_of"] = primary
+                doc_texts.pop(dt, None)
+            else:
+                seen_text_hash[text_hash] = dt
         # Step 2: Detect business profile (35-40%)
         _update(check_id, "Geschaeftsmodell wird erkannt...", 37)
-        profile = await detect_business_profile(doc_texts)
+        # P16: also feed the homepage text into profile detection (no_direct_sales
+        # B2B indicators such as "CE-Zertifizierung" / "Schulungen" often appear
+        # only in the homepage menu, not in the mandatory legal text).
+        profile_input = dict(doc_texts)
+        try:
+            base_url = ""
+            for e in doc_entries:
+                if e.get("url"):
+                    from urllib.parse import urlparse
+                    p = urlparse(e["url"])
+                    if p.scheme and p.netloc:
+                        base_url = f"{p.scheme}://{p.netloc}/"
+                        break
+            if base_url:
+                import re as _re
+                async with httpx.AsyncClient(
+                    timeout=8.0, follow_redirects=True,
+                    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) "
+                                           "AppleWebKit/537.36 HeadlessChrome/120.0.0.0"},
+                ) as _hc:
+                    _hr = await _hc.get(base_url)
+                    if _hr.status_code == 200 and "text/html" in _hr.headers.get(
+                            "content-type", ""):
+                        _html = _hr.text[:60000]
+                        _html = _re.sub(r"<script[^>]*>.*?</script>", " ",
+                                        _html, flags=_re.DOTALL | _re.IGNORECASE)
+                        _html = _re.sub(r"<style[^>]*>.*?</style>", " ",
+                                        _html, flags=_re.DOTALL | _re.IGNORECASE)
+                        _html = _re.sub(r"<[^>]+>", " ", _html)
+                        _html = _re.sub(r"\s+", " ", _html).strip()
+                        if len(_html.split()) > 30:
+                            profile_input["__homepage"] = _html[:20000]
+                            logger.info("P16 homepage merged for profile: %d words",
+                                        len(_html.split()))
+        except Exception as e:
+            logger.debug("homepage fetch for profile failed: %s", e)
+        profile = await detect_business_profile(profile_input)
         profile_dict = asdict(profile)
         # Step 3: Check each document
@@ -323,6 +387,15 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
             _update(check_id, f"Pruefen {i+1}/{n_entries}: {label}...", pct)
             if not text or len(text) < 50:
+                # P15: duplicate doc that was deduped against a primary doc
+                if entry.get("dup_of"):
+                    results.append(DocCheckResult(
+                        label=label, url="", doc_type=doc_type,
+                        error=f"Nicht separat vorhanden — wird im Dokument "
+                              f"'{_doc_type_label(entry['dup_of'])}' "
+                              f"mit-geprueft.",
+                    ))
+                    continue
                 # Empty entry — either from auto-discovery padding (no URL
                 # to fetch) or from a fetch that returned nothing. If there
                 # was a URL we keep the error so the user knows the fetch
@@ -463,6 +536,15 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
cookie_payloads.extend(e["cmp_payloads"])
if e.get("text"):
cookie_text = e["text"]
# P17-D: fallback when the cookie doc was deduped via P15: use the DSE
# (privacy policy) text if it contains cookie terms, so the LLM vendor
# extraction can still run.
if not cookie_text and not cookie_payloads:
dse_t = doc_texts.get("dse", "")
if dse_t and any(w in dse_t.lower() for w in
("cookie", "tracking", "google analytics", "consent")):
cookie_text = dse_t
logger.info("P17-D: vendor-extract Fallback auf DSE (Cookie deduped)")
# Site-owner derived from the submitted URLs — drives the
# INTERNAL/GROUP_COMPANY classification of vendor records.
owner_name = _company_name_from_url(doc_entries) or ""
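The P17-D fallback above and the P17-A fallback in the next hunk apply the same keyword test to the DSE text. A small helper along these lines could express that shared rule. This is only a sketch: `cookie_text_with_fallback` and `COOKIE_HINTS` are hypothetical names, and the sketch leaves out the additional `cookie_payloads` check that P17-D performs.

```python
from __future__ import annotations

# Keywords copied from the diff; a DSE text containing any of them is
# treated as a usable stand-in for a missing cookie policy.
COOKIE_HINTS = ("cookie", "tracking", "google analytics", "consent")


def cookie_text_with_fallback(cookie_text: str,
                              doc_texts: dict[str, str]) -> tuple[str, bool]:
    """Return (text, used_fallback) for cookie analysis."""
    if cookie_text:
        return cookie_text, False          # a real cookie doc always wins
    dse = doc_texts.get("dse", "")
    if dse and any(hint in dse.lower() for hint in COOKIE_HINTS):
        return dse, True                   # fall back to the privacy policy
    return "", False                       # nothing usable


text, used = cookie_text_with_fallback(
    "", {"dse": "Wir nutzen Cookies und Google Analytics."})
```

Returning the flag alongside the text lets the caller decide whether to log the fallback, as both hunks do.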
@@ -608,6 +690,19 @@ async def _run_compliance_check(check_id: str, req: ComplianceCheckRequest):
cookie_doc_url = e.get("url", "")
cookie_cmp_payloads = e.get("cmp_payloads") or []
break
# P17-A: fallback when the cookie doc was deduped via P15: use the
# DSE text if it contains cookie keywords.
if not cookie_doc_text:
dse_text = doc_texts.get("dse", "")
if dse_text and any(w in dse_text.lower() for w in
("cookie", "tracking", "google analytics",
"consent")):
cookie_doc_text = dse_text
dse_entry = next((e for e in doc_entries
if e.get("doc_type") == "dse"), {})
cookie_doc_url = dse_entry.get("url", "")
cookie_cmp_payloads = dse_entry.get("cmp_payloads") or []
logger.info("P17-A: cookie-arch fallback auf DSE (Cookie-Doc deduped)")
if cookie_doc_text:
arch = detect_architecture(
doc_url=cookie_doc_url,
@@ -182,7 +182,7 @@ def build_management_summary(results: list[DocCheckResult]) -> str:
if c.level == 1 and not c.passed and not c.skipped
and c.severity != "INFO"
]
for c in failed_checks[:3]: # Max 3 per document
for c in failed_checks: # P17-B: no per-document cap
action = _check_to_action(r.label, c.label, c.hint)
if action:
actions.append(action)
@@ -193,7 +193,7 @@ def build_management_summary(results: list[DocCheckResult]) -> str:
'Konkrete Aufgaben:</h3>'
'<ol style="font-size:13px;color:#475569;padding-left:20px;margin:0">'
)
for a in actions[:10]: # Max 10 actions
for a in actions[:20]: # P17-B: 10 -> 20
html.append(f'<li style="margin-bottom:6px">{a}</li>')
html.append('</ol>')
@@ -0,0 +1,244 @@
"""FastAPI routes for QUAIDAL-derived Controls (AI Trainingsdaten-Qualität).
Endpoints:
- GET /v1/quaidal/stats - Counts by kind + source provenance
- GET /v1/quaidal/controls - List all controls, optional kind= filter
- GET /v1/quaidal/controls/{id} - Single derived control by derived_id
- GET /v1/quaidal/criteria - The 10 QKB criteria with linked QB/MA IDs
- GET /v1/quaidal/criteria/{id} - Single QKB with full child tree (QB → MA → QM)
The controls are Clean-Room derived from BSI QUAIDAL. See
control-pipeline/scripts/derive_quaidal_mcs.py and migration 011.
"""
from __future__ import annotations
import logging
from typing import Optional
from fastapi import APIRouter, HTTPException, Query
from pydantic import BaseModel
from sqlalchemy import text
from database import SessionLocal
logger = logging.getLogger(__name__)
router = APIRouter(prefix="/v1/quaidal", tags=["quaidal"])
# ---------------------------------------------------------------------------
# Response shapes
# ---------------------------------------------------------------------------
class ExternalRef(BaseModel):
framework: str
citation: Optional[str] = None
class SourceProvenance(BaseModel):
framework: str
section: str
url: Optional[str] = None
commit_sha: Optional[str] = None
title_original: Optional[str] = None
license_note: Optional[str] = None
class DerivedControl(BaseModel):
derived_id: str
kind: str
canonical_name: str
description: str
regulation_anchor: Optional[str] = None
related_quaidal_ids: list[str]
external_refs: list[ExternalRef]
source: SourceProvenance
plagiarism_score: Optional[float] = None
class ControlsListResponse(BaseModel):
total: int
controls: list[DerivedControl]
class CriterionWithChildren(BaseModel):
"""A QKB criterion together with its linked building blocks, measures and metrics as full records."""
criterion: DerivedControl
building_blocks: list[DerivedControl]
measures: list[DerivedControl]
metrics: list[DerivedControl]
class StatsResponse(BaseModel):
counts_by_kind: dict[str, int]
source_framework: str
source_commit_sha: Optional[str]
license_note: Optional[str]
# ---------------------------------------------------------------------------
# DB helpers
# ---------------------------------------------------------------------------
def _row_to_control(row) -> DerivedControl:
return DerivedControl(
derived_id=row.derived_id,
kind=row.kind,
canonical_name=row.canonical_name,
description=row.description,
regulation_anchor=row.regulation_anchor,
related_quaidal_ids=row.related_quaidal_ids or [],
external_refs=[ExternalRef(**r) for r in (row.external_refs or [])],
source=SourceProvenance(
framework=row.source_framework,
section=row.source_section,
url=row.source_url,
commit_sha=row.source_commit_sha,
title_original=row.source_title_original,
license_note=row.source_license_note,
),
plagiarism_score=float(row.plagiarism_score_at_generation) if row.plagiarism_score_at_generation is not None else None,
)
_SELECT_COLUMNS = """
derived_id, kind, canonical_name, description, regulation_anchor,
related_quaidal_ids, external_refs,
source_framework, source_section, source_url, source_commit_sha,
source_title_original, source_license_note,
plagiarism_score_at_generation
"""
# ---------------------------------------------------------------------------
# Endpoints
# ---------------------------------------------------------------------------
@router.get("/stats", response_model=StatsResponse)
def get_stats() -> StatsResponse:
"""Counts by kind + the QUAIDAL source provenance (single source today)."""
with SessionLocal() as db:
counts = db.execute(text(
"SELECT kind, COUNT(*) AS n FROM compliance.derived_controls "
"WHERE source_framework = :fw GROUP BY kind"
), {"fw": "BSI QUAIDAL"}).all()
meta = db.execute(text(
"SELECT source_commit_sha, source_license_note FROM compliance.derived_controls "
"WHERE source_framework = :fw LIMIT 1"
), {"fw": "BSI QUAIDAL"}).first()
return StatsResponse(
counts_by_kind={r.kind: r.n for r in counts},
source_framework="BSI QUAIDAL",
source_commit_sha=meta.source_commit_sha if meta else None,
license_note=meta.source_license_note if meta else None,
)
@router.get("/controls", response_model=ControlsListResponse)
def list_controls(
kind: Optional[str] = Query(None, description="criterion | building_block | measure | metric"),
limit: int = Query(500, ge=1, le=2000),
offset: int = Query(0, ge=0),
) -> ControlsListResponse:
"""List QUAIDAL-derived controls, optionally filtered by kind."""
where = ["source_framework = :fw"]
params: dict = {"fw": "BSI QUAIDAL", "limit": limit, "offset": offset}
if kind:
where.append("kind = :kind")
params["kind"] = kind
sql = (
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
f"WHERE {' AND '.join(where)} "
"ORDER BY source_section LIMIT :limit OFFSET :offset"
)
count_sql = f"SELECT COUNT(*) FROM compliance.derived_controls WHERE {' AND '.join(where)}"
with SessionLocal() as db:
rows = db.execute(text(sql), params).all()
total = db.execute(text(count_sql), {k: v for k, v in params.items() if k not in ("limit", "offset")}).scalar() or 0
return ControlsListResponse(total=int(total), controls=[_row_to_control(r) for r in rows])
@router.get("/controls/{derived_id}", response_model=DerivedControl)
def get_control(derived_id: str) -> DerivedControl:
with SessionLocal() as db:
row = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls WHERE derived_id = :id"
), {"id": derived_id}).first()
if not row:
raise HTTPException(status_code=404, detail=f"Control {derived_id} not found")
return _row_to_control(row)
@router.get("/criteria", response_model=list[DerivedControl])
def list_criteria() -> list[DerivedControl]:
"""Returns the 10 QKB criteria. Use /criteria/{section_id} for the full child tree."""
with SessionLocal() as db:
rows = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
"WHERE source_framework = :fw AND kind = 'criterion' ORDER BY source_section"
), {"fw": "BSI QUAIDAL"}).all()
return [_row_to_control(r) for r in rows]
@router.get("/criteria/{section_id}", response_model=CriterionWithChildren)
def get_criterion_tree(section_id: str) -> CriterionWithChildren:
"""Single QKB with the building blocks it references and the measures/metrics those reference.
`section_id` is the canonical QUAIDAL ID, e.g. `QKB-01`.
"""
section_id_upper = section_id.upper()
with SessionLocal() as db:
criterion_row = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
"WHERE source_framework = :fw AND source_section = :sid AND kind = 'criterion'"
), {"fw": "BSI QUAIDAL", "sid": section_id_upper}).first()
if not criterion_row:
raise HTTPException(status_code=404, detail=f"Criterion {section_id_upper} not found")
building_block_ids = criterion_row.related_quaidal_ids or []
building_blocks = []
if building_block_ids:
qb_rows = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
"WHERE source_framework = :fw AND kind = 'building_block' "
"AND source_section = ANY(:ids) ORDER BY source_section"
), {"fw": "BSI QUAIDAL", "ids": building_block_ids}).all()
building_blocks = [_row_to_control(r) for r in qb_rows]
# Collect measure IDs from each building block, then fetch them
measure_ids: list[str] = []
for qb in building_blocks:
measure_ids.extend(mid for mid in qb.related_quaidal_ids if mid.startswith("MA-"))
measures = []
if measure_ids:
ma_rows = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
"WHERE source_framework = :fw AND kind = 'measure' "
"AND source_section = ANY(:ids) ORDER BY source_section"
), {"fw": "BSI QUAIDAL", "ids": list(set(measure_ids))}).all()
measures = [_row_to_control(r) for r in ma_rows]
# Collect metric IDs from each measure
metric_ids: list[str] = []
for ma in measures:
metric_ids.extend(mid for mid in ma.related_quaidal_ids if mid.startswith("QM-"))
metrics = []
if metric_ids:
qm_rows = db.execute(text(
f"SELECT {_SELECT_COLUMNS} FROM compliance.derived_controls "
"WHERE source_framework = :fw AND kind = 'metric' "
"AND source_section = ANY(:ids) ORDER BY source_section"
), {"fw": "BSI QUAIDAL", "ids": list(set(metric_ids))}).all()
metrics = [_row_to_control(r) for r in qm_rows]
return CriterionWithChildren(
criterion=_row_to_control(criterion_row),
building_blocks=building_blocks,
measures=measures,
metrics=metrics,
)
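The traversal in `get_criterion_tree` follows `related_quaidal_ids` level by level: criterion to building blocks, building blocks to `MA-` measures, measures to `QM-` metrics. The sketch below reproduces that walk on an in-memory dict instead of the database, which may help when reasoning about deduplication and ordering. The function names, the dict layout, and all sample IDs other than `QKB-01` are hypothetical.

```python
from __future__ import annotations


def _fetch(rows: dict[str, dict], kind: str, ids: list[str]) -> list[str]:
    """Mimic `kind = ... AND source_section = ANY(:ids) ORDER BY source_section`."""
    wanted = set(ids)                      # duplicates collapse, like list(set(...))
    return sorted(sid for sid, row in rows.items()
                  if row["kind"] == kind and sid in wanted)


def criterion_tree(rows: dict[str, dict], section_id: str) -> dict:
    sid = section_id.upper()               # the endpoint upper-cases the path parameter
    crit = rows.get(sid)
    if crit is None or crit["kind"] != "criterion":
        raise KeyError(sid)                # the route answers with HTTP 404 here
    qbs = _fetch(rows, "building_block", crit["related"])
    ma_ids = [m for qb in qbs for m in rows[qb]["related"] if m.startswith("MA-")]
    mas = _fetch(rows, "measure", ma_ids)
    qm_ids = [m for ma in mas for m in rows[ma]["related"] if m.startswith("QM-")]
    qms = _fetch(rows, "metric", qm_ids)
    return {"criterion": sid, "building_blocks": qbs,
            "measures": mas, "metrics": qms}


# Hypothetical sample data; only the ID prefixes follow the diff.
ROWS = {
    "QKB-01": {"kind": "criterion", "related": ["QB-01", "QB-02"]},
    "QB-01": {"kind": "building_block", "related": ["MA-01", "MA-02"]},
    "QB-02": {"kind": "building_block", "related": ["MA-02"]},
    "MA-01": {"kind": "measure", "related": ["QM-01"]},
    "MA-02": {"kind": "measure", "related": []},
    "QM-01": {"kind": "metric", "related": []},
}
tree = criterion_tree(ROWS, "qkb-01")
```

A measure shared by two building blocks (`MA-02` in the sample) appears once in the result, matching the `list(set(measure_ids))` step in the route.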
@@ -237,6 +237,13 @@ async def detect_business_profile(documents: dict[str, str]) -> BusinessProfile:
b2g_score = _count_hits(full_text, _B2G_KEYWORDS)
nonprofit_score = _count_hits(full_text, _NONPROFIT_KEYWORDS)
# P17-C: B2B service-provider cluster (P14) as a boost: if a company offers
# CE certification / compliance consulting / auditing / training,
# it is usually B2B even when the strict B2B keywords do not match.
b2b_service_boost = _count_hits(full_text, _B2B_SERVICE_POSITIVE)
if b2b_service_boost >= 2:
b2b_score += min(3, b2b_service_boost - 1)
# Missing documents as signal
has_agb = "agb" in documents
has_widerruf = "widerruf" in documents
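The P17-C boost in this hunk is easiest to check with a few numbers: below two service hits nothing changes, and from two hits upward the bonus is `hits - 1`, capped at 3. The sketch isolates that arithmetic; `boosted_b2b_score` is a hypothetical name, not a function in the commit.

```python
def boosted_b2b_score(b2b_score: int, service_hits: int) -> int:
    """Apply the P17-C rule: +min(3, hits - 1) once at least 2 hits exist."""
    if service_hits >= 2:
        return b2b_score + min(3, service_hits - 1)
    return b2b_score


# One hit is ignored, two hits add 1, and the bonus saturates at 3.
examples = [boosted_b2b_score(0, hits) for hits in (1, 2, 3, 4, 10)]
```

The cap keeps a long service menu from overwhelming the other score signals.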
@@ -335,9 +342,10 @@ async def detect_business_profile(documents: dict[str, str]) -> BusinessProfile:
return profile
# Indicators: the site primarily points to authorized dealers/branches
# instead of offering its own checkout for concluding a contract.
_NO_DIRECT_SALES_POSITIVE = [
# P14: three clusters, each of which independently triggers no_direct_sales=True.
# Cluster A: OEM configurator pattern (car manufacturers with an authorized-dealer network)
_OEM_POSITIVE = [
"vertragshaendler", "vertragshändler", "vertragspartner",
"vertragswerkstatt", "haendlersuche", "händlersuche",
"niederlassung", "vertretung", "autorisierter haendler",
@@ -347,27 +355,80 @@ _NO_DIRECT_SALES_POSITIVE = [
"anfrage an haendler", "anfrage an händler",
"konfigurator", "fahrzeug konfigurieren",
"ihre individuelle anfrage",
# OEM brand names: manufacturer brands that usually sell via
# dealers.
"bmw vertriebs", "audi vertriebs", "mercedes-benz vertriebs",
"volkswagen vertriebs", "porsche zentrum",
# OEM brand names in the mandatory legal text (the privacy policy mentions the manufacturer)
"bmw ag", "audi ag", "mercedes-benz ag", "volkswagen ag",
"porsche ag", "opel automobile gmbh",
]
# Cluster B: B2B service providers (consulting / compliance / training / CE)
_B2B_SERVICE_POSITIVE = [
"ce-zertifizierung", "ce zertifizierung",
"ce-konformitaet", "ce-konformität",
"ce-kennzeichnung", "ce kennzeichnung",
"compliance-beratung", "compliance beratung",
"arbeitssicherheit", "product compliance",
"produktsicherheit", "produkthaftung",
"auditierung", "auditor", "auditierungen",
"schulungen", "workshops", "akademie",
"beratungsleistungen", "consultingleistungen",
"consulting services", "managementsystem",
"datenschutzbeauftragter (extern)",
"externer datenschutzbeauftragter",
"datenschutz-audit", "tisax", "iso 27001",
"iso 9001", "iso 14001", "iso 45001",
"gefaehrdungsbeurteilung", "gefährdungsbeurteilung",
"betriebsbeauftragter", "fachkraft fuer arbeitssicherheit",
"fachkraft für arbeitssicherheit",
]
# Cluster C: NGO / association / public administration
_NONPROFIT_PUBLIC_POSITIVE = [
"spendenkonto", "vereinsregister", "gemeinnuetzig",
"gemeinnützig", "ehrenamtlich", "foerderverein",
"förderverein", "stiftung", "buergeramt", "bürgeramt",
"landratsamt", "kommunalverwaltung",
]
# Backwards-compat
_NO_DIRECT_SALES_POSITIVE = (
_OEM_POSITIVE + _B2B_SERVICE_POSITIVE + _NONPROFIT_PUBLIC_POSITIVE
)
# Indicators AGAINST no_direct_sales: real online-shop functions.
_DIRECT_SALES_NEGATIVE = [
"in den warenkorb", "warenkorb hinzu", "zur kasse",
"jetzt kaufen", "kostenpflichtig bestellen",
"zahlungspflichtig bestellen", "sofort-kauf",
"online bestellen", "lieferadresse", "rechnungsadresse",
"versandkosten", "lieferzeit", "lieferbedingungen",
"checkout", "stueckpreis", "stückpreis",
]
def _detect_no_direct_sales(full_text: str) -> bool:
"""Heuristic: detects OEM configurator sites that do not sell directly."""
"""Heuristic: True if the site has no direct sales to B2C customers.
Applies to 3 clusters (at least 2 hits within the cluster each):
A) OEM configurator (car manufacturers)
B) B2B service providers (consulting/compliance/training)
C) NGO / public administration
Negative signals (real shop functions) count against the cluster:
only True if the cluster's hits exceed the negative hits.
"""
text = full_text.lower()
pos = sum(1 for k in _NO_DIRECT_SALES_POSITIVE if k in text)
oem = sum(1 for k in _OEM_POSITIVE if k in text)
b2b = sum(1 for k in _B2B_SERVICE_POSITIVE if k in text)
npg = sum(1 for k in _NONPROFIT_PUBLIC_POSITIVE if k in text)
neg = sum(1 for k in _DIRECT_SALES_NEGATIVE if k in text)
# At least 3 dealer indicators AND fewer shop indicators than
# dealer indicators. Avoids false positives for shops that
# additionally offer a "Haendlersuche" as a store finder.
return pos >= 3 and pos > neg
# Each cluster stands on its own: 2 hits plus fewer negative signals
# than cluster hits.
if oem >= 2 and oem > neg:
return True
if b2b >= 2 and b2b > neg:
return True
if npg >= 2 and npg > neg:
return True
return False
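The three explicit `if` blocks in the new `_detect_no_direct_sales` apply one rule per cluster: at least 2 hits, and more hits than negative shop signals. A table-driven sketch of the same rule is shown below. The function name, the `CLUSTERS` mapping, and the shortened keyword lists are assumptions for illustration; the real lists in the diff are much longer.

```python
from __future__ import annotations

# Shortened, illustrative keyword subsets taken from the lists in the diff.
CLUSTERS: dict[str, tuple[str, ...]] = {
    "oem": ("händlersuche", "niederlassung", "konfigurator"),
    "b2b_service": ("ce-kennzeichnung", "schulungen", "auditierung"),
    "nonprofit_public": ("spendenkonto", "vereinsregister", "stiftung"),
}
SHOP_NEGATIVES = ("in den warenkorb", "zur kasse", "versandkosten")


def detect_no_direct_sales(full_text: str, min_hits: int = 2) -> bool:
    """True if any single cluster has >= min_hits and outweighs shop signals."""
    text = full_text.lower()
    neg = sum(1 for k in SHOP_NEGATIVES if k in text)
    for keywords in CLUSTERS.values():
        hits = sum(1 for k in keywords if k in text)
        if hits >= min_hits and hits > neg:
            return True
    return False


consultancy = detect_no_direct_sales(
    "Wir bieten CE-Kennzeichnung, Schulungen und Auditierung.")
shop = detect_no_direct_sales(
    "In den Warenkorb, zur Kasse, Versandkosten. Händlersuche und Niederlassung.")
```

In the second call the OEM cluster reaches 2 hits, but 3 shop signals outweigh it, so the store-finder case the old comment warns about still resolves to a shop.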
@@ -55,6 +55,7 @@ from compliance.api.saving_scan_routes import router as saving_scan_router
from compliance.api.agent_migration_routes import router as agent_migration_router
from compliance.api.vendor_assessment_routes import router as vendor_assessment_router
from compliance.api.cra_routes import router as cra_router
from compliance.api.quaidal_routes import router as quaidal_router
# Middleware
from middleware import (
@@ -168,6 +169,7 @@ app.include_router(vendor_assessment_router, prefix="/api")
# CRA (Cyber Resilience Act) Compliance
app.include_router(cra_router, prefix="/api")
app.include_router(quaidal_router, prefix="/api")
if __name__ == "__main__":