breakpilot-compliance/breakpilot-compliance-sdk/legal-corpus/README.md
Benjamin Boenisch 4435e7ea0a Initial commit: breakpilot-compliance - Compliance SDK Platform
Services: Admin-Compliance, Backend-Compliance,
AI-Compliance-SDK, Consent-SDK, Developer-Portal,
PCA-Platform, DSMS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:47:28 +01:00


# BreakPilot Compliance SDK - Legal Corpus
Pre-indexed legal documents for the retrieval-augmented generation (RAG) system. Chunk counts in the tables below are approximate.
## EU Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
| DSGVO (GDPR) | ~99 | EU General Data Protection Regulation |
| AI Act | ~85 | EU Artificial Intelligence Act |
| NIS2 | ~46 | Network and Information Security Directive |
| ePrivacy | ~32 | ePrivacy Directive (Cookie Directive) |
| CRA | ~41 | Cyber Resilience Act |
| EUCSA | ~28 | EU Cybersecurity Act |
| Data Act | ~35 | EU Data Act |
| DGA | ~25 | Data Governance Act |
| DSA | ~52 | Digital Services Act |
| DMA | ~38 | Digital Markets Act |
| EAA | ~22 | European Accessibility Act |
| SCC | ~18 | Standard Contractual Clauses |
| DPF | ~15 | EU-US Data Privacy Framework |
## German Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
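| TDDDG | ~28 | Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz (Telecommunications and Digital Services Data Protection Act) |
| TTDSG | ~24 | Telekommunikation-Telemedien-Datenschutz-Gesetz (Telecommunications and Telemedia Data Protection Act; renamed TDDDG in 2024) |
| BDSG | ~45 | Bundesdatenschutzgesetz (Federal Data Protection Act) |
| IT-SiG | ~32 | IT-Sicherheitsgesetz (IT Security Act) |
| BSI-KritisV | ~28 | BSI-Kritisverordnung (BSI Critical Infrastructure Regulation) |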
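For sizing the vector index, it can help to total the approximate chunk counts from the two tables. The following is a minimal sketch that copies the numbers from this README; the dictionary and function names are illustrative and not part of the SDK.

```python
# Approximate chunk counts copied from the EU and German tables in this README.
EU_CHUNKS = {
    "DSGVO": 99, "AI Act": 85, "NIS2": 46, "ePrivacy": 32, "CRA": 41,
    "EUCSA": 28, "Data Act": 35, "DGA": 25, "DSA": 52, "DMA": 38,
    "EAA": 22, "SCC": 18, "DPF": 15,
}
DE_CHUNKS = {
    "TDDDG": 28, "TTDSG": 24, "BDSG": 45, "IT-SiG": 32, "BSI-KritisV": 28,
}


def total_chunks(*tables):
    """Sum the chunk counts across any number of {document: chunks} tables."""
    return sum(sum(table.values()) for table in tables)


if __name__ == "__main__":
    print("EU:", total_chunks(EU_CHUNKS))                 # EU: 536
    print("DE:", total_chunks(DE_CHUNKS))                 # DE: 157
    print("All:", total_chunks(EU_CHUNKS, DE_CHUNKS))     # All: 693
```

Because the source counts are approximate, the totals are estimates as well.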
## Directory Structure
```
legal-corpus/
├── eu/
│   ├── dsgvo/
│   │   ├── articles/
│   │   ├── recitals/
│   │   └── metadata.json
│   ├── ai-act/
│   ├── nis2/
│   ├── eprivacy/
│   ├── cra/
│   ├── eucsa/
│   ├── data-act/
│   ├── dga/
│   ├── dsa/
│   ├── dma/
│   ├── eaa/
│   ├── scc/
│   └── dpf/
├── de/
│   ├── tdddg/
│   ├── ttdsg/
│   ├── bdsg/
│   ├── it-sig/
│   └── bsi-kritisv/
└── embeddings/
    └── (generated vector embeddings)
```
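The layout above can be inspected programmatically. The following is a minimal sketch that lists the document directories per jurisdiction, assuming only the `eu/` and `de/` folders shown in the tree; the function name `list_corpus_documents` is hypothetical and not part of the SDK.

```python
from pathlib import Path


def list_corpus_documents(corpus_root):
    """Return {jurisdiction: [document directory names]} for eu/ and de/.

    The embeddings/ directory is skipped because it holds generated
    vectors rather than source documents.
    """
    root = Path(corpus_root)
    documents = {}
    for jurisdiction in ("eu", "de"):
        base = root / jurisdiction
        if not base.is_dir():
            continue  # tolerate a partial checkout
        documents[jurisdiction] = sorted(
            entry.name for entry in base.iterdir() if entry.is_dir()
        )
    return documents


if __name__ == "__main__":
    for jurisdiction, names in list_corpus_documents("legal-corpus").items():
        print(jurisdiction, names)
```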
## Indexing
Documents are indexed automatically the first time the RAG service starts.
To re-index manually:
```bash
# Via CLI
breakpilot-cli index --all
# Via API (BREAKPILOT_URL is a placeholder for the base URL of your RAG service)
curl -X POST "$BREAKPILOT_URL/api/v1/rag/index"
```
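The same API call can be made from a script. The following is a minimal sketch using only the Python standard library; the base URL `http://localhost:8000` is an assumption, and this README does not say whether the endpoint expects authentication or a request body, so neither is included.

```python
import urllib.request

# Assumption: base URL of the RAG service; replace with your deployment's address.
BASE_URL = "http://localhost:8000"


def build_reindex_request(base_url=BASE_URL):
    """Build (but do not send) the POST request that triggers a re-index."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/index",
        method="POST",
    )


if __name__ == "__main__":
    request = build_reindex_request()
    print(request.get_method(), request.full_url)
    # To actually send it against a running service:
    # urllib.request.urlopen(request)
```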
## Adding Custom Documents
Organizations can add their own internal documents:
```bash
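# Upload via CLI
breakpilot-cli upload --file policy.pdf --category internal
# Via API (BREAKPILOT_URL is a placeholder for the service base URL;
# the form field names "file" and "category" are assumed to mirror the CLI flags)
curl -X POST "$BREAKPILOT_URL/api/v1/rag/documents" \
  -F "file=@policy.pdf" \
  -F "category=internal"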
```
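For uploads from a script, the `multipart/form-data` body can be assembled with the standard library. This is a sketch under stated assumptions: the form field names `file` and `category` are guesses modelled on the CLI flags, and the helper name `build_upload_request` is hypothetical. Check the API reference for the actual field names before relying on it.

```python
import urllib.request
import uuid


def build_upload_request(base_url, filename, content, category="internal"):
    """Build (but do not send) a multipart/form-data upload request.

    Assumption: the endpoint accepts form fields named "file" and "category".
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="category"\r\n\r\n'
        f"{category}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/documents",
        data=head + content + tail,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


if __name__ == "__main__":
    with open("policy.pdf", "rb") as handle:
        request = build_upload_request(
            "http://localhost:8000", "policy.pdf", handle.read()
        )
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```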
## Embedding Model
Default: `bge-m3`, served via Ollama.
The model supports:
- German legal terminology
- Multilingual text (DE/EN)
- Semantic search over the indexed chunks
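
To illustrate how an embedding model supports semantic search, the sketch below builds a request for Ollama's `/api/embed` endpoint and compares two vectors by cosine similarity. The address `http://localhost:11434` is Ollama's default. Whether the RAG service uses this endpoint or cosine similarity internally is an assumption, and the helper names are illustrative.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embedding_request(text, model="bge-m3", ollama_url=OLLAMA_URL):
    """Build (but do not send) an embedding request for Ollama's /api/embed."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ollama_url.rstrip('/')}/api/embed",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    request = build_embedding_request("Recht auf Löschung")
    print(request.get_method(), request.full_url)
    # With Ollama running and bge-m3 pulled, the vectors could be fetched with:
    # response = json.load(urllib.request.urlopen(request))
    # vector = response["embeddings"][0]
```

A query is then matched against the corpus by embedding it and ranking the stored chunk vectors by their similarity to the query vector.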