This repository has been archived on 2026-02-15. You can view files and clone it. You cannot open issues or pull requests or push a commit.

BreakPilot Compliance SDK - Legal Corpus

Pre-indexed legal documents for the RAG system.

EU Regulations

Document       Chunks   Description
DSGVO (GDPR)   ~99      EU General Data Protection Regulation
AI Act         ~85      EU Artificial Intelligence Act
NIS2           ~46      Network and Information Security Directive
ePrivacy       ~32      ePrivacy Directive (Cookie Directive)
CRA            ~41      Cyber Resilience Act
EUCSA          ~28      EU Cybersecurity Act
Data Act       ~35      EU Data Act
DGA            ~25      Data Governance Act
DSA            ~52      Digital Services Act
DMA            ~38      Digital Markets Act
EAA            ~22      European Accessibility Act
SCC            ~18      Standard Contractual Clauses
DPF            ~15      EU-US Data Privacy Framework

German Regulations

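Document       Chunks   Description
TDDDG          ~28      Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz (Telecommunications and Digital Services Data Protection Act; successor to the TTDSG since May 2024)
TTDSG          ~24      Telekommunikation-Telemedien-Datenschutz-Gesetz (Telecommunications and Telemedia Data Protection Act)
BDSG           ~45      Bundesdatenschutzgesetz (Federal Data Protection Act)
IT-SiG         ~32      IT-Sicherheitsgesetz (IT Security Act)
BSI-KritisV    ~28      BSI-Kritisverordnung (BSI Critical Infrastructure Ordinance)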

Directory Structure

legal-corpus/
├── eu/
│   ├── dsgvo/
│   │   ├── articles/
│   │   ├── recitals/
│   │   └── metadata.json
│   ├── ai-act/
│   ├── nis2/
│   ├── eprivacy/
│   ├── cra/
│   ├── eucsa/
│   ├── data-act/
│   ├── dga/
│   ├── dsa/
│   ├── dma/
│   ├── eaa/
│   ├── scc/
│   └── dpf/
├── de/
│   ├── tdddg/
│   ├── ttdsg/
│   ├── bdsg/
│   ├── it-sig/
│   └── bsi-kritisv/
└── embeddings/
    └── (generated vector embeddings)
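
The layout above can be inspected programmatically. The following is a minimal sketch, assuming only the folder names shown in the tree; the function name `list_corpus_documents` is hypothetical and not part of the SDK. Because the schema of `metadata.json` is not documented here, the sketch lists regulation folders only and does not parse metadata.

```python
from pathlib import Path


def list_corpus_documents(corpus_root: Path) -> dict[str, list[str]]:
    """Map each jurisdiction folder (eu, de) to its sorted regulation folder names.

    Hypothetical helper: it relies only on the directory layout shown above.
    """
    documents: dict[str, list[str]] = {}
    for jurisdiction in ("eu", "de"):
        base = corpus_root / jurisdiction
        if not base.is_dir():
            continue  # tolerate a partially populated corpus
        # Every sub-directory is treated as one regulation (dsgvo, bdsg, ...).
        documents[jurisdiction] = sorted(
            entry.name for entry in base.iterdir() if entry.is_dir()
        )
    return documents
```

Calling `list_corpus_documents(Path("legal-corpus"))` on a complete checkout should return the thirteen EU folders and five German folders listed in the tree. The `embeddings/` folder is skipped on purpose because it holds generated data rather than source documents.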

Indexing

Documents are automatically indexed on first startup of the RAG service.

To manually re-index:

# Via CLI
breakpilot-cli index --all

# Via API
POST /api/v1/rag/index
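
As an illustration of the API route, the re-index call could be issued from Python roughly as follows. This is a sketch under stated assumptions: the base URL `http://localhost:8000` is a placeholder, the request body is left empty because no payload is documented, and any authentication the service may require is omitted. Only the path `/api/v1/rag/index` and the `POST` method come from this README.

```python
import urllib.request


def build_reindex_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a re-index.

    Assumption: the endpoint accepts an empty body and needs no auth header.
    """
    url = base_url.rstrip("/") + "/api/v1/rag/index"
    return urllib.request.Request(url, data=b"", method="POST")


# Sending the request (requires a running RAG service; host and port are assumed):
#
#   request = build_reindex_request("http://localhost:8000")
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Separating request construction from sending keeps the URL handling easy to check without a running service.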

Adding Custom Documents

Organizations can add their own internal documents:

# Upload via CLI
breakpilot-cli upload --file policy.pdf --category internal

# Via API
POST /api/v1/rag/documents
Content-Type: multipart/form-data
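
The multipart upload could be assembled with the Python standard library as sketched below. The form field names `file` and `category`, the base URL, and both helper names are assumptions made for illustration; the README documents only the path, the method, and the `multipart/form-data` content type.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     content: bytes, boundary: str) -> bytes:
    """Encode text fields plus one file as a multipart/form-data body."""
    lines: list[bytes] = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        content,
        f"--{boundary}--".encode(),  # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines)


def build_upload_request(base_url: str, filename: str, content: bytes,
                         category: str) -> urllib.request.Request:
    """Build (but do not send) the document upload request.

    Assumption: the service reads the fields "file" and "category".
    """
    boundary = uuid.uuid4().hex
    body = encode_multipart({"category": category}, "file", filename,
                            content, boundary)
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/rag/documents",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

To mirror the CLI example, read `policy.pdf` in binary mode, pass its bytes with `category="internal"`, and send the result with `urllib.request.urlopen`.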

Embedding Model

Default: bge-m3 via Ollama

Supports:

  • German legal terminology
  • Multi-lingual (DE/EN)
  • High-quality semantic search
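
To make the semantic-search claim concrete, the sketch below requests a bge-m3 embedding from a local Ollama instance and compares two vectors by cosine similarity. It assumes Ollama is listening on its default port 11434 and that the model has been pulled, and it uses Ollama's `/api/embeddings` endpoint. How the BreakPilot RAG service calls the model internally is not documented here, so treat this as an illustration rather than the service's implementation.

```python
import json
import math
import urllib.request

# Assumption: a local Ollama instance on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_embedding_request(text: str, model: str = "bge-m3") -> urllib.request.Request:
    """Build the JSON POST request that asks Ollama for one embedding."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def embed(text: str, model: str = "bge-m3") -> list[float]:
    """Send the request and return the embedding vector (needs a running Ollama)."""
    with urllib.request.urlopen(build_embedding_request(text, model)) as response:
        return json.load(response)["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With a multilingual model, a German query such as `embed("Auftragsverarbeitung")` and an English passage about data processors would be expected to score higher with `cosine_similarity` than an unrelated pair, which is what lets one index serve both DE and EN queries.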