
BreakPilot Compliance SDK - Legal Corpus

Pre-indexed EU and German legal documents for the Compliance SDK's RAG (retrieval-augmented generation) system.

EU Regulations

| Document | Indexed chunks | Description |
|----------|----------------|-------------|
| DSGVO (GDPR) | ~99 | EU General Data Protection Regulation |
| AI Act | ~85 | EU Artificial Intelligence Act |
| NIS2 | ~46 | Network and Information Security Directive |
| ePrivacy | ~32 | ePrivacy Directive (Cookie Directive) |
| CRA | ~41 | Cyber Resilience Act |
| EUCSA | ~28 | EU Cybersecurity Act |
| Data Act | ~35 | EU Data Act |
| DGA | ~25 | Data Governance Act |
| DSA | ~52 | Digital Services Act |
| DMA | ~38 | Digital Markets Act |
| EAA | ~22 | European Accessibility Act |
| SCC | ~18 | Standard Contractual Clauses |
| DPF | ~15 | EU-US Data Privacy Framework |

German Regulations

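| Document | Indexed chunks | Description |
|----------|----------------|-------------|
| TDDDG | ~28 | Telekommunikation-Digitale-Dienste-Datenschutz-Gesetz (Telecommunications Digital Services Data Protection Act) |
| TTDSG | ~24 | Telekommunikation-Telemedien-Datenschutz-Gesetz (Telecommunications and Telemedia Data Protection Act; renamed TDDDG in 2024) |
| BDSG | ~45 | Bundesdatenschutzgesetz (Federal Data Protection Act) |
| IT-SiG | ~32 | IT-Sicherheitsgesetz (IT Security Act) |
| BSI-KritisV | ~28 | BSI-Kritisverordnung (BSI Critical Infrastructure Ordinance) |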

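The chunk counts above give the number of retrieval units stored per document, but this README does not say how documents are split. As an illustration only, the Python sketch below assumes one chunk per article or section, detected by line-initial headings such as `Artikel 5`, `Art. 5` or `§ 5`. The regular expression and the `chunk_by_article` helper are hypothetical, not the RAG service's actual chunker.

```python
import re

# Hypothetical heuristic: a chunk starts at a line that begins with an article or
# section heading such as "Artikel 5", "Art. 5" or "§ 5" (optionally "5a").
HEADING = re.compile(r"^(?:Artikel|Art\.|§)\s*(\d+[a-z]?)\b", re.MULTILINE)


def chunk_by_article(text: str) -> list[dict]:
    """Split a legal text into one chunk per heading; text before the first heading is dropped."""
    matches = list(HEADING.finditer(text))
    chunks = []
    for i, match in enumerate(matches):
        # Each chunk runs from its heading to the next heading (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({"number": match.group(1), "text": text[match.start():end].strip()})
    return chunks


sample = """Artikel 1
Gegenstand und Ziele
Artikel 2
Sachlicher Anwendungsbereich
"""

for chunk in chunk_by_article(sample):
    print(chunk["number"], "|", chunk["text"].splitlines()[0])
```

Real corpora would need more care (for example, cross-references like "Art. 6" at the start of a wrapped line would trigger a false split), which is why a production chunker typically works from structured sources rather than plain text.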
Directory Structure

```
legal-corpus/
├── eu/
│   ├── dsgvo/
│   │   ├── articles/
│   │   ├── recitals/
│   │   └── metadata.json
│   ├── ai-act/
│   ├── nis2/
│   ├── eprivacy/
│   ├── cra/
│   ├── eucsa/
│   ├── data-act/
│   ├── dga/
│   ├── dsa/
│   ├── dma/
│   ├── eaa/
│   ├── scc/
│   └── dpf/
├── de/
│   ├── tdddg/
│   ├── ttdsg/
│   ├── bdsg/
│   ├── it-sig/
│   └── bsi-kritisv/
└── embeddings/
    └── (generated vector embeddings)
```
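To inspect the corpus programmatically, you can walk the `eu/` and `de/` directories; `embeddings/` is skipped. The Python sketch below assumes every document directory follows the `dsgvo/` layout (`articles/`, `recitals/`, `metadata.json`). The README does not document the `metadata.json` schema, so the `title` key it reads is a hypothetical field, and `list_documents` is an illustrative helper, not part of the SDK.

```python
import json
import tempfile
from pathlib import Path


def list_documents(corpus_root: Path) -> list[dict]:
    """Return one record per document directory under eu/ and de/."""
    records = []
    for jurisdiction in ("eu", "de"):
        base = corpus_root / jurisdiction
        if not base.is_dir():
            continue
        for doc_dir in sorted(p for p in base.iterdir() if p.is_dir()):
            meta_path = doc_dir / "metadata.json"
            # Hypothetical schema: only an optional "title" key is read.
            meta = json.loads(meta_path.read_text(encoding="utf-8")) if meta_path.is_file() else {}
            articles_dir = doc_dir / "articles"
            article_count = sum(1 for p in articles_dir.iterdir() if p.is_file()) if articles_dir.is_dir() else 0
            records.append({
                "jurisdiction": jurisdiction,
                "id": doc_dir.name,
                "title": meta.get("title", doc_dir.name),
                "article_files": article_count,
            })
    return records


if __name__ == "__main__":
    # Build a tiny throwaway corpus that mimics the layout above.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "legal-corpus"
        articles = root / "eu" / "dsgvo" / "articles"
        articles.mkdir(parents=True)
        (articles / "art-001.txt").write_text("Artikel 1 ...", encoding="utf-8")
        (root / "eu" / "dsgvo" / "metadata.json").write_text(json.dumps({"title": "DSGVO"}), encoding="utf-8")
        (root / "de" / "bdsg").mkdir(parents=True)
        for record in list_documents(root):
            print(record)
```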

Indexing

Documents are automatically indexed on first startup of the RAG service.
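The README only states that indexing happens on first startup. One plausible way to make that idempotent is a marker file in `embeddings/`, sketched below in Python. The `.indexed` marker, `ensure_indexed` and the `index_fn` callback are hypothetical names, not part of the documented service; the `force` flag stands in for a manual re-index.

```python
import tempfile
from pathlib import Path
from typing import Callable

MARKER = ".indexed"  # hypothetical marker file written after a successful run


def ensure_indexed(corpus_root: Path, index_fn: Callable[[Path], None], force: bool = False) -> bool:
    """Run index_fn unless the corpus was already indexed; return True if it ran.

    force=True plays the role of a manual re-index.
    """
    embeddings_dir = corpus_root / "embeddings"
    marker = embeddings_dir / MARKER
    if marker.exists() and not force:
        return False  # already indexed on an earlier startup
    embeddings_dir.mkdir(parents=True, exist_ok=True)
    index_fn(corpus_root)
    # Write the marker only after index_fn returns, so a crash mid-indexing
    # leads to a retry on the next startup.
    marker.write_text("ok", encoding="utf-8")
    return True


if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        print(ensure_indexed(root, calls.append))              # first startup: indexes
        print(ensure_indexed(root, calls.append))              # later startup: skipped
        print(ensure_indexed(root, calls.append, force=True))  # manual re-index
    print(len(calls))
```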

To manually re-index:

Via CLI:

```bash
breakpilot-cli index --all
```

Via API:

```http
POST /api/v1/rag/index
```
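From Python, the re-index endpoint can be called with the standard library alone. This is a sketch under assumptions: the base URL `http://localhost:8080` is a placeholder (the README does not give the service address), and the empty request body and lack of authentication are guesses to verify against the API reference.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # placeholder; replace with your RAG service address


def build_reindex_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to the re-index endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/rag/index",
        data=b"",  # assumption: the endpoint needs no request body
        method="POST",
    )


def reindex(base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for error responses (4xx/5xx).
    """
    with urllib.request.urlopen(build_reindex_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending keeps the URL logic testable without a running service; call `reindex()` only against a live instance.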

Adding Custom Documents

Organizations can add their own internal documents:

Via CLI:

```bash
breakpilot-cli upload --file policy.pdf --category internal
```

Via API:

```http
POST /api/v1/rag/documents
Content-Type: multipart/form-data
```
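The upload endpoint expects `multipart/form-data`. The Python standard library has no multipart encoder, so the sketch below builds the body by hand. The form field names `file` and `category` are assumptions that mirror the CLI flags `--file` and `--category`; check the API reference for the actual names. The base URL is likewise a placeholder.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8080"  # placeholder; the README does not give the service address


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     content: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode text fields plus one file part as multipart/form-data (RFC 7578)."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the payload
    parts: list[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",  # trailing CRLF after the closing boundary
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_upload_request(path: Path, category: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the upload request for one local file."""
    file_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body, content_type = encode_multipart(
        {"category": category},  # hypothetical field name
        "file",                  # hypothetical field name
        path.name,
        path.read_bytes(),
        file_type,
    )
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/rag/documents",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Send the result with `urllib.request.urlopen(build_upload_request(Path("policy.pdf"), "internal"))`. If a third-party HTTP client such as `requests` is available, its `files=` parameter handles the encoding for you.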

Embedding Model

Default: bge-m3 via Ollama

Supports:

  • German legal terminology
  • Multilingual retrieval (DE/EN)
  • Dense-vector semantic search
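The sketch below illustrates semantic search with the default model. It requests `bge-m3` embeddings from a local Ollama server through Ollama's `/api/embed` endpoint (default port 11434) and ranks passages by cosine similarity. It assumes the model has been pulled (`ollama pull bge-m3`). The in-memory `rank` helper is a simplification for illustration, not how the RAG service's vector store works.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default listen address


def embed(texts: list[str], model: str = "bge-m3", base_url: str = OLLAMA_URL) -> list[list[float]]:
    """Return one embedding vector per input text via Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], passages: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k (passage id, score) pairs by descending cosine similarity."""
    scored = [(pid, cosine(query_vec, vec)) for pid, vec in passages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Example (needs a running Ollama server with bge-m3 pulled):
#   docs = {"dsgvo-art-17": "Recht auf Löschung", "nis2-art-23": "Reporting obligations"}
#   index = dict(zip(docs, embed(list(docs.values()))))
#   query = embed(["Wann muss ich personenbezogene Daten löschen?"])[0]
#   print(rank(query, index))
```

Because one model embeds both German and English text into the same vector space, a German query can retrieve an English passage and vice versa.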