This repository has been archived on 2026-02-15. You can view files and clone it. You cannot open issues or pull requests or push a commit.
BreakPilot Dev 19855efacc
feat: BreakPilot PWA - Full codebase (clean push without large binaries)
All services: admin-v2, studio-v2, website, ai-compliance-sdk,
consent-service, klausur-service, voice-service, and infrastructure.
Large PDFs and compiled binaries excluded via .gitignore.
2026-02-11 13:25:58 +01:00


# BreakPilot Compliance SDK - Legal Corpus
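This directory contains the pre-indexed legal documents used by the RAG (retrieval-augmented generation) system. Chunk counts in the tables below are approximate.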
## EU Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
| DSGVO (GDPR) | ~99 | EU General Data Protection Regulation |
| AI Act | ~85 | EU Artificial Intelligence Act |
| NIS2 | ~46 | Network and Information Security Directive |
| ePrivacy | ~32 | ePrivacy Directive (Cookie Directive) |
| CRA | ~41 | Cyber Resilience Act |
| EUCSA | ~28 | EU Cybersecurity Act |
| Data Act | ~35 | EU Data Act |
| DGA | ~25 | Data Governance Act |
| DSA | ~52 | Digital Services Act |
| DMA | ~38 | Digital Markets Act |
| EAA | ~22 | European Accessibility Act |
| SCC | ~18 | Standard Contractual Clauses |
| DPF | ~15 | EU-US Data Privacy Framework |
## German Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
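| TDDDG | ~28 | Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz (Telecommunications and Digital Services Data Protection Act) |
| TTDSG | ~24 | Telekommunikation-Telemedien-Datenschutz-Gesetz (Telecommunications and Telemedia Data Protection Act, predecessor of the TDDDG) |
| BDSG | ~45 | Bundesdatenschutzgesetz (Federal Data Protection Act) |
| IT-SiG | ~32 | IT-Sicherheitsgesetz (IT Security Act) |
| BSI-KritisV | ~28 | BSI-Kritisverordnung (BSI Critical Infrastructure Ordinance) |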
## Directory Structure
```
legal-corpus/
├── eu/
│   ├── dsgvo/
│   │   ├── articles/
│   │   ├── recitals/
│   │   └── metadata.json
│   ├── ai-act/
│   ├── nis2/
│   ├── eprivacy/
│   ├── cra/
│   ├── eucsa/
│   ├── data-act/
│   ├── dga/
│   ├── dsa/
│   ├── dma/
│   ├── eaa/
│   ├── scc/
│   └── dpf/
├── de/
│   ├── tdddg/
│   ├── ttdsg/
│   ├── bdsg/
│   ├── it-sig/
│   └── bsi-kritisv/
└── embeddings/
    └── (generated vector embeddings)
```
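
The layout above can be inspected programmatically. The following is a minimal sketch, assuming the corpus is available locally under the folder names shown in the tree; `list_corpus` is a hypothetical helper written for illustration and is not part of the SDK.

```python
from pathlib import Path


def list_corpus(root: str) -> dict[str, list[str]]:
    """Map each jurisdiction folder (e.g. "eu", "de") to its document folders.

    Assumes the layout shown in the tree above. The `embeddings/` folder is
    skipped because it holds generated vectors, not source documents.
    """
    corpus: dict[str, list[str]] = {}
    for jurisdiction in sorted(Path(root).iterdir()):
        if not jurisdiction.is_dir() or jurisdiction.name == "embeddings":
            continue
        # Each sub-folder of a jurisdiction is treated as one legal document.
        corpus[jurisdiction.name] = sorted(
            doc.name for doc in jurisdiction.iterdir() if doc.is_dir()
        )
    return corpus
```

Calling `list_corpus("legal-corpus")` on a checkout that matches the tree should return one entry for `eu` and one for `de`, each listing its document folders.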
## Indexing
Documents are automatically indexed on first startup of the RAG service.
To manually re-index:
```bash
# Via CLI
breakpilot-cli index --all
# Via API (HTTP request, not a shell command):
#   POST /api/v1/rag/index
```
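
For scripted re-indexing, the API call can be issued from Python. This is a sketch under stated assumptions: the base URL `http://localhost:8080` is a placeholder (this README does not give the service address), the request body is assumed to be empty, and `build_reindex_request` is a hypothetical helper. Any authentication the service requires is not covered here.

```python
import urllib.request

# Assumption: placeholder address; the README does not state where the
# RAG service listens.
BASE_URL = "http://localhost:8080"


def build_reindex_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers re-indexing."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/index",
        data=b"",  # assumption: no payload is documented for this endpoint
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and check the response status.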
## Adding Custom Documents
Organizations can add their own internal documents:
```bash
# Upload via CLI
breakpilot-cli upload --file policy.pdf --category internal
# Via API (HTTP request, not a shell command):
#   POST /api/v1/rag/documents
#   Content-Type: multipart/form-data
```
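
The API variant expects a `multipart/form-data` body. The sketch below shows how such a body can be assembled with the Python standard library. The form field names `file` and `category` are assumptions that mirror the CLI flags; they are not confirmed by this README, and `encode_multipart` is a hypothetical helper.

```python
import uuid


def encode_multipart(filename: str, content: bytes, category: str) -> tuple[bytes, str]:
    """Encode one file plus a category field as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    The field names "file" and "category" are assumptions.
    """
    boundary = uuid.uuid4().hex
    category_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="category"\r\n'
        "\r\n"
        f"{category}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # The closing delimiter ends the body with two trailing dashes.
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = category_part + file_header + content + closing
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can then be passed to `urllib.request.Request(url, data=body, headers={"Content-Type": content_type}, method="POST")`, with `url` pointing at `/api/v1/rag/documents` on the service host.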
## Embedding Model
Default: `bge-m3`, served via Ollama.

The model supports:
- German legal terminology
- Multi-lingual (DE/EN)
- High-quality semantic search
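
To illustrate how embeddings enable semantic search, the sketch below builds a request for Ollama's `/api/embed` endpoint and compares two vectors by cosine similarity. It assumes Ollama runs at its default local address; whether the RAG service itself ranks results by cosine similarity is an assumption, and both helper functions are hypothetical.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embed_request(texts: list[str], model: str = "bge-m3") -> urllib.request.Request:
    """Build (but do not send) a request to Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Sending the request with `urllib.request.urlopen` returns JSON whose `embeddings` field holds one vector per input text. A German query and an English passage on the same topic would be expected to score closer to each other than unrelated texts.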