Initial commit: breakpilot-compliance - Compliance SDK Platform
Services: Admin-Compliance, Backend-Compliance, AI-Compliance-SDK, Consent-SDK, Developer-Portal, PCA-Platform, DSMS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
98
breakpilot-compliance-sdk/legal-corpus/README.md
Normal file
@@ -0,0 +1,98 @@
# BreakPilot Compliance SDK - Legal Corpus

Pre-indexed legal documents for the RAG system.

## EU Regulations

| Document | Chunks | Description |
|----------|--------|-------------|
| DSGVO (GDPR) | ~99 | EU General Data Protection Regulation |
| AI Act | ~85 | EU Artificial Intelligence Act |
| NIS2 | ~46 | Network and Information Security Directive |
| ePrivacy | ~32 | ePrivacy Directive (Cookie Directive) |
| CRA | ~41 | Cyber Resilience Act |
| EUCSA | ~28 | EU Cybersecurity Act |
| Data Act | ~35 | EU Data Act |
| DGA | ~25 | Data Governance Act |
| DSA | ~52 | Digital Services Act |
| DMA | ~38 | Digital Markets Act |
| EAA | ~22 | European Accessibility Act |
| SCC | ~18 | Standard Contractual Clauses |
| DPF | ~15 | EU-US Data Privacy Framework |

## German Regulations

| Document | Chunks | Description |
|----------|--------|-------------|
| TDDDG | ~28 | Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz (Telecommunications and Digital Services Data Protection Act) |
| TTDSG | ~24 | Telekommunikation-Telemedien-Datenschutz-Gesetz (Telecommunications and Telemedia Data Protection Act) |
| BDSG | ~45 | Bundesdatenschutzgesetz (Federal Data Protection Act) |
| IT-SiG | ~32 | IT-Sicherheitsgesetz (IT Security Act) |
| BSI-KritisV | ~28 | BSI-Kritisverordnung (BSI Critical Infrastructure Ordinance) |

## Directory Structure

```
legal-corpus/
├── eu/
│   ├── dsgvo/
│   │   ├── articles/
│   │   ├── recitals/
│   │   └── metadata.json
│   ├── ai-act/
│   ├── nis2/
│   ├── eprivacy/
│   ├── cra/
│   ├── eucsa/
│   ├── data-act/
│   ├── dga/
│   ├── dsa/
│   ├── dma/
│   ├── eaa/
│   ├── scc/
│   └── dpf/
├── de/
│   ├── tdddg/
│   ├── ttdsg/
│   ├── bdsg/
│   ├── it-sig/
│   └── bsi-kritisv/
└── embeddings/
    └── (generated vector embeddings)
```
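
The layout above groups documents by jurisdiction, with `embeddings/` holding generated output rather than source text. As a rough illustration only, the following Python sketch walks such a tree and lists the document folders per jurisdiction. The function name `list_corpus_documents` is hypothetical and not part of the SDK, and the sketch assumes every non-`embeddings` top-level folder is a jurisdiction.

```python
from pathlib import Path


def list_corpus_documents(root: Path) -> dict[str, list[str]]:
    """Map each jurisdiction folder (e.g. "eu", "de") to its document folders.

    Hypothetical helper: it only inspects directory names and does not read
    metadata.json, whose schema is not documented here.
    """
    documents: dict[str, list[str]] = {}
    for jurisdiction in sorted(p for p in root.iterdir() if p.is_dir()):
        if jurisdiction.name == "embeddings":
            continue  # generated vectors, not source documents
        documents[jurisdiction.name] = sorted(
            d.name for d in jurisdiction.iterdir() if d.is_dir()
        )
    return documents
```

Calling `list_corpus_documents(Path("legal-corpus"))` from the SDK root would return a dictionary keyed by `eu` and `de`, which can be handy for checking that a checkout contains every folder the tables list.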

## Indexing

Documents are indexed automatically the first time the RAG service starts.

To re-index manually:

```bash
# Via CLI
breakpilot-cli index --all

# Via API (BASE_URL is a placeholder for the address of your RAG service)
curl -X POST "$BASE_URL/api/v1/rag/index"
```
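
For scripted re-indexing, the API call can also be issued from Python. This is a minimal sketch using only the standard library. The base URL passed in is whatever address your deployment uses, the empty request body is an assumption (no payload is documented here), and both function names are hypothetical.

```python
import urllib.request


def build_reindex_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers re-indexing."""
    url = base_url.rstrip("/") + "/api/v1/rag/index"
    # Assumption: the endpoint accepts an empty body.
    return urllib.request.Request(url, data=b"", method="POST")


def trigger_reindex(base_url: str, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_reindex_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the URL logic easy to check without a running service. Any authentication headers your installation requires would need to be added to the request.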

## Adding Custom Documents

Organizations can add their own internal documents:

```bash
# Upload via CLI
breakpilot-cli upload --file policy.pdf --category internal

# Via API (Content-Type: multipart/form-data)
# BASE_URL is a placeholder; the form field names "file" and "category"
# are assumed to mirror the CLI flags
curl -X POST "$BASE_URL/api/v1/rag/documents" \
  -F "file=@policy.pdf" \
  -F "category=internal"
```
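
The upload endpoint expects `multipart/form-data`. The sketch below shows one way to assemble such a request with the Python standard library. The form field names `file` and `category` are assumptions modeled on the CLI flags, and the helper names are hypothetical, so check them against the actual API before relying on this.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], file_field: str,
                     filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        (f'Content-Disposition: form-data; name="{file_field}"; '
         f'filename="{filename}"').encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_upload_request(base_url: str, filename: str, content: bytes,
                         category: str) -> urllib.request.Request:
    """Build (but do not send) the document upload request."""
    # Assumed field names: "category" for the label, "file" for the document.
    body, content_type = encode_multipart(
        {"category": category}, "file", filename, content
    )
    url = base_url.rstrip("/") + "/api/v1/rag/documents"
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send the request, pass the result to `urllib.request.urlopen`, for example with `content` read from `policy.pdf` and `category` set to `internal`.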

## Embedding Model

Default: `bge-m3` via Ollama

Supports:
- German legal terminology
- Multi-lingual (DE/EN)
- High-quality semantic search
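
To illustrate what semantic search over embeddings involves, here is a small sketch that requests a vector from a local Ollama instance and ranks chunks by cosine similarity. It is not the RAG service's actual implementation. It assumes Ollama is listening on its default port 11434 with `bge-m3` already pulled, and the function names are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def embed(text: str, model: str = "bge-m3") -> list[float]:
    """Request an embedding vector for the text from Ollama."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"}, method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_chunks(query_vec: list[float],
                chunk_vecs: dict[str, list[float]]) -> list[str]:
    """Return chunk ids ordered from most to least similar to the query."""
    return sorted(
        chunk_vecs,
        key=lambda chunk_id: cosine_similarity(query_vec, chunk_vecs[chunk_id]),
        reverse=True,
    )
```

A query such as "Einwilligung für Cookies" would be passed through `embed` and compared against stored chunk vectors with `rank_chunks`. A production system would typically use a vector index instead of this linear scan.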