Benjamin Admin
519cc274bb
docs: session handover — MC Quality + Gap Engine + RAG Ingestion (5 days)
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-11 21:47:22 +02:00
Benjamin Admin
f022b489e2
docs: comprehensive session handover — Blocks F+G complete, next: MC quality refinement
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-06 21:06:01 +02:00
Benjamin Admin
0bad74a3bd
docs: session handover — Block F complete, pipeline done, G-pre1 analysis
...
Session 2026-05-03 to 2026-05-05:
- Block F1-F5 complete (DB migration of hardcoded dicts)
- Control Generation: 1,599 controls + 11,522 obligations + 1,147 atomics
- Production sync: 2,625 controls + 11,522 obligations synced
- G-pre1 analysis: 183k objects → 144k after normalize (needs hierarchical clustering)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-05 18:02:10 +02:00
Benjamin Admin
e869cabc81
docs: session handover — F1-F3 done, control generation running
...
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 07:21:24 +02:00
Benjamin Admin
4fd2bfefcd
docs: session handover updated for Block F start
...
Next: F1 Regulation Registry (DB + API + Frontend + Auto-Create)
Frontend at /sdk/regulation-registry in breakpilot-compliance admin
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 22:51:23 +02:00
Benjamin Admin
118be3540d
feat(pipeline): D6 citation backfill + E2/E3 law ingestion scripts
...
- d6_citation_backfill.py: 3-tier matching (hash/prefix/overlap),
archives old citations, updated 3,651 controls (93.6% coverage)
- ingest_de_laws.py: 8 German laws ingested (ArbZG, MuSchG, NachwG,
MiLoG, GmbHG, AktG, InsO, BUrlG — 1,629 chunks)
- ingest_eu_regulations.py: EUR-Lex ingestion (needs manual HTML due
to AWS WAF). CSRD, CSDDD, EU Taxonomy, eIDAS 2.0, Pay Transparency
manually ingested (1,057 chunks)
- Updated session handover with current state
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-03 13:19:27 +02:00
Benjamin Admin
97a7f6f264
docs: comprehensive session handover with full roadmap (Blocks A-G)
...
Complete instructions for next session including:
- Current quality metrics per document type
- Prioritized action items (NIST fix, citation backfill, missing laws)
- Full Block E-G roadmap with details
- All critical files, DB state, test commands
- Known issues (3 lost NIST PDFs, frontend 500s, D5 script safety)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 22:30:50 +02:00
Benjamin Admin
ff21bc258a
docs: session handover — D2-D5 complete, quality report, NIST plan
...
Major session achievements:
- Structural metadata end-to-end (D2-D4)
- 430 docs re-ingested with new chunking
- HTML stripping + charset detection (0% → 97.6%)
- 20 EU regulations from EUR-Lex HTML (GDPR/DSGVO: 0% → 92%)
- Quality report script (500 controls: 13% fully correct)
- Frontend requirements.map fix
Open: NIST/ENISA text normalization, citation backfill,
D5 script safety (upload-before-delete), BEG IV ingestion.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 22:22:55 +02:00
Benjamin Admin
5a6e588641
docs: update session handover — D2-D5 complete, EU PDF issue documented
...
Session achieved: structural metadata end-to-end (D2-D4), overlap bug
fix, HTML stripping with charset detection, 430/436 docs re-ingested.
Remaining: ~40 EU Official Journal PDFs need HTML from EUR-Lex (broken
multi-column PDF extraction), 3 missing EDPB PDFs, 1 corrupt PDF.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 17:34:34 +02:00
Benjamin Admin
da21339e76
docs: add session handover instructions for next session
...
Covers: completed blocks A-D1, remaining D2-G, critical files,
DB state, memory files, test commands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 15:33:05 +02:00