fix: Restore all files lost during destructive rebase

A previous `git pull --rebase origin main` dropped 177 local commits,
losing 3400+ files across admin-v2, backend, studio-v2, website,
klausur-service, and many other services. The partial restore attempt
(660295e2) only recovered some files.

This commit restores all missing files from the pre-rebase ref 98933f5e
while preserving post-rebase additions (night-scheduler, night-mode UI,
NightModeWidget dashboard integration).

Restored features include:
- AI Module Sidebar (FAB), OCR Labeling, OCR Compare
- GPU Dashboard, RAG Pipeline, Magic Help
- Klausur-Korrektur (8 files), Abitur-Archiv (5+ files)
- Companion, Zeugnisse-Crawler, Screen Flow
- Full backend, studio-v2, website, klausur-service
- All compliance SDKs, agent-core, voice-service
- CI/CD configs, documentation, scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Benjamin Admin
2026-02-09 09:51:32 +01:00
parent f7487ee240
commit bfdaf63ba9
2009 changed files with 749983 additions and 1731 deletions


@@ -0,0 +1,98 @@
# BreakPilot Compliance SDK - Legal Corpus
Pre-indexed legal documents for the RAG system.
## EU Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
| DSGVO (GDPR) | ~99 | EU General Data Protection Regulation |
| AI Act | ~85 | EU Artificial Intelligence Act |
| NIS2 | ~46 | Network and Information Security Directive |
| ePrivacy | ~32 | ePrivacy Directive (Cookie Directive) |
| CRA | ~41 | Cyber Resilience Act |
| EUCSA | ~28 | EU Cybersecurity Act |
| Data Act | ~35 | EU Data Act |
| DGA | ~25 | Data Governance Act |
| DSA | ~52 | Digital Services Act |
| DMA | ~38 | Digital Markets Act |
| EAA | ~22 | European Accessibility Act |
| SCC | ~18 | Standard Contractual Clauses |
| DPF | ~15 | EU-US Data Privacy Framework |
## German Regulations
| Document | Chunks | Description |
|----------|--------|-------------|
| TDDDG | ~28 | Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz |
| TTDSG | ~24 | Telekommunikation-Telemedien-Datenschutz-Gesetz (predecessor of the TDDDG) |
| BDSG | ~45 | Bundesdatenschutzgesetz |
| IT-SiG | ~32 | IT-Sicherheitsgesetz |
| BSI-KritisV | ~28 | BSI-Kritisverordnung |
## Directory Structure
```
legal-corpus/
├── eu/
│ ├── dsgvo/
│ │ ├── articles/
│ │ ├── recitals/
│ │ └── metadata.json
│ ├── ai-act/
│ ├── nis2/
│ ├── eprivacy/
│ ├── cra/
│ ├── eucsa/
│ ├── data-act/
│ ├── dga/
│ ├── dsa/
│ ├── dma/
│ ├── eaa/
│ ├── scc/
│ └── dpf/
├── de/
│ ├── tdddg/
│ ├── ttdsg/
│ ├── bdsg/
│ ├── it-sig/
│ └── bsi-kritisv/
└── embeddings/
  └── (generated vector embeddings)
```
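As a rough illustration of how this layout could be traversed, the sketch below maps each jurisdiction directory to the document directories beneath it and skips `embeddings/`. The function name `list_corpus_documents` is hypothetical and not part of the SDK; it only assumes the directory structure shown above.

```python
from pathlib import Path


def list_corpus_documents(root: Path) -> dict[str, list[str]]:
    """Map each jurisdiction directory (e.g. 'eu', 'de') to its document directories.

    Hypothetical helper for illustration; it relies only on the layout
    documented in this README.
    """
    documents: dict[str, list[str]] = {}
    for jurisdiction in sorted(p for p in root.iterdir() if p.is_dir()):
        # 'embeddings/' holds generated vectors, not source documents.
        if jurisdiction.name == "embeddings":
            continue
        documents[jurisdiction.name] = sorted(
            child.name for child in jurisdiction.iterdir() if child.is_dir()
        )
    return documents
```

Calling `list_corpus_documents(Path("legal-corpus"))` on a checkout would return one entry per jurisdiction, with the regulation directory names sorted alphabetically.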
## Indexing
Documents are automatically indexed on first startup of the RAG service.
To manually re-index:
```bash
# Via CLI
breakpilot-cli index --all
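# Via API (BREAKPILOT_API_URL is a placeholder for your deployment's base URL)
curl -X POST "${BREAKPILOT_API_URL}/api/v1/rag/index"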
```
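The same re-index call can be issued from a script. The following is a minimal sketch using only the Python standard library; the base URL `http://localhost:8000` is an assumption (the README does not state where the RAG service listens), and the request is sent with an empty body because no payload is documented.

```python
import urllib.request

# Assumption: the RAG service is reachable here; adjust to your deployment.
BASE_URL = "http://localhost:8000"


def build_reindex_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that triggers a re-index."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/index",
        data=b"",  # empty body: the README documents no payload for this endpoint
        method="POST",
    )
```

To actually send it, pass the result to `urllib.request.urlopen(...)`. Any authentication the service requires would need to be added as headers.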
## Adding Custom Documents
Organizations can add their own internal documents:
```bash
# Upload via CLI
breakpilot-cli upload --file policy.pdf --category internal
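# Via API (BREAKPILOT_API_URL is a placeholder; the form field names "file" and "category" are assumptions)
curl -X POST "${BREAKPILOT_API_URL}/api/v1/rag/documents" \
  -F "file=@policy.pdf" \
  -F "category=internal"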
```
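For a programmatic upload, a `multipart/form-data` body can be assembled by hand with the standard library. This is a sketch under stated assumptions: the form field names `file` and `category` mirror the CLI flags but are not confirmed by this README, and the base URL is a placeholder.

```python
import urllib.request
import uuid


def build_upload_request(
    base_url: str, filename: str, content: bytes, category: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request.

    Assumption: the endpoint accepts form fields named 'file' and 'category'.
    """
    boundary = uuid.uuid4().hex
    category_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="category"\r\n\r\n'
        f"{category}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/documents",
        data=category_part + file_header + content + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

A caller would read the file with `Path("policy.pdf").read_bytes()`, build the request, and send it with `urllib.request.urlopen(...)`.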
## Embedding Model
Default: `bge-m3` via Ollama
Supports:
- German legal terminology
- Multi-lingual (DE/EN)
- High-quality semantic search
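To make the semantic-search claim concrete, the sketch below shows how a text could be embedded with `bge-m3` through Ollama's `/api/embed` endpoint and how two embedding vectors can be compared by cosine similarity. It assumes Ollama runs at its default local address; how the RAG service itself calls Ollama is not described in this README, so treat this as an illustration rather than the SDK's implementation.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address


def build_embedding_request(
    text: str, model: str = "bge-m3", base_url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build (but do not send) an embedding request for Ollama's /api/embed."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Sending the request with `urllib.request.urlopen(...)` returns JSON whose `embeddings` field holds one vector per input. Ranking stored chunks by `cosine_similarity` against the query vector is one common way to implement semantic search.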