feat(coverage): corpus documents grouped by type + issuer family

The "Korpus-Dokumente" (corpus documents) table is now ordered by document
type (Laws & Regulations → Authority Guidance → Standards & Best Practice →
Case Law) with subheadings, and collapsed per issuer family (all DSK
documents together, likewise all EDPB, OWASP, NIST, and ENISA documents).
Deterministic categorizer (categorizeCorpusDoc) + grouper
(groupCorpusDocs), both pure and unit-tested.
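
For orientation, here is a minimal sketch of how such a categorizer and grouper could look. It is not the code from this commit: the rule table, the fallback to 'law', the court prefixes, the `atoms` field, and sorting families by summed `atom_count` are all assumptions chosen to be consistent with the unit tests in the diff.

```typescript
// Hypothetical sketch; only the function names, the category keys and the
// CorpusDoc fields are taken from the tests. Everything else is assumed.
type CategoryKey = 'law' | 'guidance' | 'standard' | 'court'

interface CorpusDoc {
  source_regulation: string
  license_rule: number
  license_tier: string
  atom_count: number
  use_case: string | null
}

interface FamilyGroup { family: string; docs: CorpusDoc[]; atoms: number }
interface CategoryGroup { key: CategoryKey; families: FamilyGroup[] }

// Fixed display order of the document types.
const CATEGORY_ORDER: CategoryKey[] = ['law', 'guidance', 'standard', 'court']

// Ordered prefix rules; the first matching rule wins, so the result is deterministic.
const FAMILY_RULES: Array<{ pattern: RegExp; key: CategoryKey; family: string }> = [
  { pattern: /^DSK\b/, key: 'guidance', family: 'DSK (Datenschutzkonferenz)' },
  { pattern: /^EDPB\b/, key: 'guidance', family: 'EDPB' },
  { pattern: /^OWASP\b/, key: 'standard', family: 'OWASP' },
  { pattern: /^NIST\b/, key: 'standard', family: 'NIST' },
  { pattern: /^ENISA\b/, key: 'standard', family: 'ENISA' },
  { pattern: /^BGH\b/, key: 'court', family: 'BGH' }, // assumed court prefix
]

function categorizeCorpusDoc(source: string): { cat: { key: CategoryKey }; family: string } {
  const trimmed = source.trim()
  for (const rule of FAMILY_RULES) {
    if (rule.pattern.test(trimmed)) {
      return { cat: { key: rule.key }, family: rule.family }
    }
  }
  // Assumed fallback: anything unrecognized is a law and forms its own family.
  return { cat: { key: 'law' }, family: trimmed }
}

function groupCorpusDocs(docs: CorpusDoc[]): CategoryGroup[] {
  const byCategory = new Map<CategoryKey, Map<string, FamilyGroup>>()
  for (const d of docs) {
    const { cat, family } = categorizeCorpusDoc(d.source_regulation)
    let families = byCategory.get(cat.key)
    if (!families) {
      families = new Map<string, FamilyGroup>()
      byCategory.set(cat.key, families)
    }
    let group = families.get(family)
    if (!group) {
      group = { family, docs: [], atoms: 0 }
      families.set(family, group)
    }
    group.docs.push(d)
    group.atoms += d.atom_count
  }

  const result: CategoryGroup[] = []
  for (const key of CATEGORY_ORDER) {
    const families = byCategory.get(key)
    if (!families) continue // empty categories are omitted
    // Largest family first (by summed atom_count, an assumption); ties break by name.
    const sorted = Array.from(families.values()).sort(
      (a, b) => b.atoms - a.atoms || a.family.localeCompare(b.family),
    )
    result.push({ key, families: sorted })
  }
  return result
}
```

Keeping both functions free of I/O and shared state is what makes them straightforward to unit-test: the same input list always yields the same grouping.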

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Benjamin Admin
2026-06-16 12:20:10 +02:00
parent 9e9d780902
commit 8a0097f5da
3 changed files with 193 additions and 15 deletions
@@ -8,10 +8,21 @@ import {
  splitByTier,
  severityBadgeClass,
  addresseeLabel,
  categorizeCorpusDoc,
  groupCorpusDocs,
  type UseCaseRow,
  type ControlItem,
  type CorpusDoc,
} from './_helpers'
const doc = (src: string, n = 1): CorpusDoc => ({
  source_regulation: src,
  license_rule: 1,
  license_tier: 't',
  atom_count: n,
  use_case: null,
})
const ctrl = (over: Partial<ControlItem>): ControlItem => ({
  id: 'id',
  title: 'T',
@@ -108,6 +119,46 @@ describe('coverage helpers', () => {
    expect(addresseeLabel('unbekannt_neu')).toBe('unbekannt_neu')
  })
  it('categorizes corpus docs by type + issuer family', () => {
    expect(categorizeCorpusDoc('DSGVO (EU) 2016/679').cat.key).toBe('law')
    expect(categorizeCorpusDoc('Medizinprodukteverordnung (EU) 2017/745 (MDR)').cat.key).toBe('law')
    expect(categorizeCorpusDoc('DSK OH Telemedien')).toMatchObject({
      cat: { key: 'guidance' },
      family: 'DSK (Datenschutzkonferenz)',
    })
    expect(categorizeCorpusDoc('EDPB Fines Calculation')).toMatchObject({
      cat: { key: 'guidance' },
      family: 'EDPB',
    })
    expect(categorizeCorpusDoc('OWASP Top 10 (2021)')).toMatchObject({
      cat: { key: 'standard' },
      family: 'OWASP',
    })
    expect(categorizeCorpusDoc('NIST SP 800-53 Rev. 5').family).toBe('NIST')
    expect(categorizeCorpusDoc('ENISA NIS2 Security Measures').family).toBe('ENISA')
    expect(categorizeCorpusDoc('BGH I ZR 7/16').cat.key).toBe('court')
  })
  it('groups corpus docs: laws → guidance → standards → court, families clustered', () => {
    const groups = groupCorpusDocs([
      doc('OWASP Top 10', 10),
      doc('DSGVO (EU) 2016/679', 50),
      doc('DSK OH Telemedien', 5),
      doc('EDPB Fines', 8),
      doc('NIST SP 800-53', 20),
      doc('DSK OH Direktwerbung', 3),
      doc('BGH I ZR 7/16', 1),
    ])
    expect(groups.map((g) => g.key)).toEqual(['law', 'guidance', 'standard', 'court'])
    const guidance = groups.find((g) => g.key === 'guidance')!
    // two DSK docs collapse into one family
    const dsk = guidance.families.find((f) => f.family.startsWith('DSK'))!
    expect(dsk.docs.length).toBe(2)
    const std = groups.find((g) => g.key === 'standard')!
    // NIST (20) before OWASP (10): families sorted by size desc
    expect(std.families.map((f) => f.family)).toEqual(['NIST', 'OWASP'])
  })
  it('splitByTier separates core (relevant) from review', () => {
    const { core, review } = splitByTier([
      ctrl({ id: 'a', relevant: true }),