diff --git a/control-pipeline/INSTRUCTION-session-handover.md b/control-pipeline/INSTRUCTION-session-handover.md
index 3bceeff..a1a2b9f 100644
--- a/control-pipeline/INSTRUCTION-session-handover.md
+++ b/control-pipeline/INSTRUCTION-session-handover.md
@@ -1,127 +1,108 @@
-# Session instructions: Control Generation + Block F remainder
+# Session instructions: G-pre1 object normalization
 
-**Date:** 2026-05-04
+**Date:** 2026-05-05
 **For:** Next Claude session
 **Repo:** breakpilot-core (~/Projekte/breakpilot-core)
 
 ---
 
-## RUNNING JOB (check before this session!)
+## NEXT STEP: G-pre1 (hierarchical topic clustering)
 
-### Control Generation Job `60190756-b660-4b03-869a-fa1076394cca`
+### Analysis result (2026-05-05)
 
-Started on 2026-05-04 at ~00:30. Processes new DE/CH/AT laws from `bp_compliance_gesetze`.
-
-```bash
-# Check status:
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/status/60190756-b660-4b03-869a-fa1076394cca"
-
-# Processed Stats:
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf http://127.0.0.1:8002/api/compliance/v1/canonical/generate/processed-stats"
+```
+Unique raw objects:       183,058
+After normalize_object(): 144,151 (only 21% reduction)
+Singletons:               144,117 (99.98% are unique!)
+Groups with 2+ members:   34
 ```
 
-**IMPORTANT:** Access the API only via docker exec (the nginx HTTPS proxy is slow and times out):
-```bash
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate ..."
-```
+**Finding:** The problem is NOT "the same objects under different names" but "144k granular objects that need to be grouped under higher-level topics."
+
+### New approach: hierarchical topic clustering
+
+Instead of 1:1 synonym matching, we need:
+1. A defined **topic hierarchy** (e.g. "Authentication & Access" → password, mfa, session, rbac)
+2. **Embedding-based assignment** of each object to a topic
+3. A **Qdrant-based** search (no full distance matrix needed in RAM)
+4. If needed, sampling + Mini-Batch K-Means instead of DBSCAN
+
+### Memory problem
+- 144k × 144k distance matrix = ~83 GB RAM (assuming float32) → not feasible
+- Alternative: Qdrant nearest-neighbor search per object (O(n) queries instead of an O(n²) distance matrix)
+- Or: Mini-Batch K-Means with k=20,000 on the 144k × 1024 matrix (~600 MB as float32, feasible)
+
+### Analysis script available
+- `control-pipeline/scripts/gpre1_analyze.py` (local, not committed)
 
 ---
 
-## NEXT STEPS (in this order!)
+## DONE IN SESSION 2026-05-03 TO 2026-05-05
 
-### 1. Control generation for the remaining collections
+### Block F (Hardcoded Knowledge → DB): COMPLETE ✅
+- F1: regulation_registry (223 entries)
+- F2: action_types (34) + action_synonyms (368)
+- F3: object_synonyms (320)
+- F4: LLM enrichment (+468 new synonyms via Ollama)
+- F5: Validation (8 tests) + dicts kept as fallback
+- 454 pipeline tests pass, 0 regressions
 
-After Job 1 (bp_compliance_gesetze) finishes, start:
+### Control Generation Pipeline: COMPLETE ✅
+- 1,599 rich controls generated from E-block chunks (~$17 Anthropic)
+- 11,522 obligations extracted (Pass 0a, ~$4 Anthropic)
+- 1,147 atomic controls composed (Pass 0b, ~$4.60 Anthropic)
+- **Total cost: ~$25.60**
 
-**Job 2: bp_compliance_ce** (EU regulations, ~20k chunks)
-```bash
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
-  -H 'Content-Type: application/json' \
-  -d '{
-    \"collections\": [\"bp_compliance_ce\"],
-    \"max_chunks\": 2000,
-    \"max_controls\": 500,
-    \"batch_size\": 5,
-    \"skip_web_search\": true,
-    \"regulation_filter\": [\"dsgvo_2016\",\"nis2_2022\",\"cra_2024\",\"ai_act_2024\",\"dsa_2022\",\"dma_2022\",\"dga_2022\",\"dora_2022\",\"dataact_2023\",\"dpf_2023\",\"dsm_2019\",\"gpsr_2023\",\"eprivacy_2002\",\"ecommerce_2000\",\"machinery_2023\",\"eu_mdr_2017\",\"ifrs_2023\",\"amlr_2024\",\"digital_content_2019\",\"omnibus_2019\",\"csrd_2022\",\"csddd_2024\",\"eu_taxonomy_2020\",\"eidas_2_0_2024\",\"pay_transparency_2023\",\"fda_human_factors\",\"eu_machinery_guide_2006_42\"]
-  }'"
-```
+### Production Sync: COMPLETE ✅
+- 2,625 new controls synced to production (ON CONFLICT DO NOTHING)
+- 11,522 obligations synced to production
+- Production: 294,027 controls total (previously 291,402)
+- Backups on the MacBook: compressed (30 MB) + plain SQL (1.3 GB)
 
-**Job 3: bp_compliance_datenschutz** (~4k Chunks)
-```bash
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate \
-  -H 'Content-Type: application/json' \
-  -d '{
-    \"collections\": [\"bp_compliance_datenschutz\"],
-    \"max_chunks\": 2000,
-    \"max_controls\": 500,
-    \"batch_size\": 5,
-    \"skip_web_search\": true,
-    \"regulation_filter\": [\"dsk_oh_telemedien_2022\",\"edpb_gl_7_2020\",\"bverfg_1bvr1547_19_datenanalyse\",\"eugh_c_252_21_meta\",\"eugh_c_300_21_schadenersatz\",\"eugh_c_311_18_schrems_ii\",\"eugh_c_634_21_schufa\",\"eugh_c_673_17_planet49\",\"lg_muc_google_fonts\",\"bgh_art82_2024_218\",\"bgh_i_zr_7_16\",\"bgh_vi_zr_396_24\",\"bvge_2024_iv_2\",\"ogh_6ob102_24d\",\"ogh_6ob70_24y\"]
-  }'"
-```
-
-**CAUTION:** `regulation_filter` is MANDATORY so that already-processed regulations (bgb_komplett etc.) are not processed twice! Old chunks were re-ingested → new hashes → the pipeline would see them as "unprocessed".
-
-### 2. Pass 0b (Anthropic Batch API, ~$50)
-
-Only AFTER all 3 generation jobs have finished:
-```bash
-ssh macmini "/usr/local/bin/docker exec bp-compliance-backend curl -sf --max-time 60 -X POST http://127.0.0.1:8002/api/compliance/v1/canonical/generate/submit-pass0b \
-  -H 'Content-Type: application/json' \
-  -d '{\"limit\": 500, \"batch_size\": 10}'"
-```
-
-### 3. Block F remainder (after the pipeline)
-
-| Phase | What | Status |
-|-------|-----|--------|
-| F1 | Regulation Registry → DB | ✅ 162 entries |
-| F2 | ACTION_TYPES + synonyms → DB | ✅ 34 types + 145 synonyms |
-| F3 | Object Synonyms → DB | ✅ 75 synonyms |
-| F4 | LLM Synonym-Enrichment | Pending |
-| F5 | Validation + Cleanup | Pending |
+### Infrastructure
+- Vault CPU fix committed (marker file + idempotent checks)
+- Pass 0a endpoint registered in the core control pipeline
+- 61 new regulation_ids inserted into regulation_registry
+- Containers bp-core-vault, bp-lehrer-opensearch, fewo-finance-agent stopped (to save CPU)
 
 ---
 
-## DONE IN SESSION 2026-05-03 TO 2026-05-04
-
-### Block F (Hardcoded Knowledge Migration)
-- F1: regulation_registry table + 162 entries migrated + 34 tests
-- F2: action_types (34) + action_synonyms (145) + tests
-- F3: object_synonyms (75) + tests
-- All 3 with a DB-backed cache (5 min TTL) + dict fallback
-- 446 tests pass, 0 regressions
-
-### D5 + E1c verification
-- D5 re-ingestion: COMPLETE (419/423 docs, 4 NIST "errors" = duplicates of .txt)
-- E1c BGB: § 312k PRESENT, 93% section coverage, 3053 chunks
-- Section metadata: gesetze=83%, ce=52%, datenschutz=50%
-
-### Control generation started
-- Job 1 (bp_compliance_gesetze, DE/CH/AT laws) running since ~00:30
-- 61 new regulation_ids identified (not in canonical_processed_chunks)
-- regulation_filter prevents double processing of re-ingested documents
-
----
-
-## DB tables (Block F)
+## DB tables (all blocks)
 
 | Table | Rows | Migration |
 |---------|------|-----------|
-| compliance.regulation_registry | 162 | 002_regulation_registry.sql |
+| compliance.regulation_registry | 223 | 002_regulation_registry.sql |
 | compliance.action_types | 34 | 003_action_object_ontology.sql |
-| compliance.action_synonyms | 145 | 003_action_object_ontology.sql |
-| compliance.object_synonyms | 75 | 003_action_object_ontology.sql |
+| compliance.action_synonyms | 368 | 003_action_object_ontology.sql |
+| compliance.object_synonyms | 320 | 003_action_object_ontology.sql |
+
+---
+
+## STOPPED CONTAINERS (restart if needed)
+
+```bash
+ssh macmini "/usr/local/bin/docker start bp-core-vault bp-lehrer-opensearch"
+# fewo-finance-agent: not our container, do not start it
+```
+
+**Vault:** Start it only after the fix (marker file) has been deployed; otherwise it goes into a CPU loop.
 
 ---
 
 ## TESTS
 
 ```bash
-# Pipeline (446 tests)
+# Pipeline (454 tests)
 PYTHONPATH=control-pipeline python3 -m pytest control-pipeline/tests/ -v
-
-# Embedding service (99 tests)
-cd embedding-service && python3 -m pytest test_chunking.py test_d4_bgb.py test_nist_normalization.py -v
 ```
+
+---
+
+## API access (IMPORTANT)
+
+- **Control pipeline:** Reachable only via docker exec (port 8098 is blocked by document-crawler)
+  ```bash
+  ssh macmini "/usr/local/bin/docker exec bp-core-control-pipeline curl -sf http://127.0.0.1:8098/..."
+  ```
+- **Compliance backend:** Points at the PRODUCTION DB (not local!)
+- **Pass 0a endpoint:** `/v1/canonical/generate/run-pass0a` (on the core pipeline, port 8098)
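
Review note on the memory figures in the handover document: the two estimates (dense distance matrix versus embedding matrix) can be checked with a few lines of arithmetic. This is a rough sketch that assumes float32 storage (4 bytes per value), which the note does not state explicitly; the object count and the 1024 width are taken from the note.

```python
# Back-of-the-envelope memory check for the G-pre1 clustering options.
# Assumption: values are stored as float32 (4 bytes); the note does not say so.
N_OBJECTS = 144_151      # objects left after normalize_object(), per the analysis result
EMBED_DIM = 1024         # matrix width, from the "144k × 1024" figure in the note
BYTES_PER_VALUE = 4      # assumed float32


def pairwise_matrix_bytes(n: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Size in bytes of a dense n × n distance matrix."""
    return n * n * bytes_per_value


def embedding_matrix_bytes(n: int, dim: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Size in bytes of a dense n × dim embedding matrix."""
    return n * dim * bytes_per_value


pairwise_gb = pairwise_matrix_bytes(N_OBJECTS) / 1e9
embeddings_mb = embedding_matrix_bytes(N_OBJECTS, EMBED_DIM) / 1e6

print(f"dense distance matrix: {pairwise_gb:.1f} GB")
print(f"embedding matrix:      {embeddings_mb:.0f} MB")
```

Under these assumptions the dense matrix comes out at about 83 GB and the embedding matrix at about 590 MB, which is in line with the figures quoted in the note.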
diff --git a/control-pipeline/scripts/gpre1_analyze.py b/control-pipeline/scripts/gpre1_analyze.py
new file mode 100644
index 0000000..c144b37
--- /dev/null
+++ b/control-pipeline/scripts/gpre1_analyze.py
@@ -0,0 +1,37 @@
+#!/usr/bin/env python3
+"""G-pre1: Analyze unique objects and test normalization reduction."""
+from collections import Counter
+from sqlalchemy import create_engine, text
+from services.control_dedup import normalize_object
+
+engine = create_engine(
+    "postgresql://breakpilot:breakpilot123@postgres:5432/breakpilot_db",
+    connect_args={"options": "-c search_path=compliance,public"},
+)
+
+with engine.connect() as conn:
+    rows = conn.execute(text("""
+        SELECT DISTINCT
+            split_part(generation_metadata->>'merge_group_hint', ':', 2) AS obj
+        FROM canonical_controls
+        WHERE generation_metadata->>'merge_group_hint' IS NOT NULL
+          AND generation_metadata->>'merge_group_hint' != ''
+    """)).fetchall()
+
+objects = [r[0] for r in rows if r[0] and r[0].strip()]
+print("Unique raw objects: %d" % len(objects))
+
+# Count how many raw objects collapse onto each normalized form.
+norm_counts: Counter = Counter()
+for obj in objects:
+    norm_counts[normalize_object(obj)] += 1
+
+print("After normalize_object(): %d unique" % len(norm_counts))
+print("Reduction: %.1f%%" % ((1 - len(norm_counts) / len(objects)) * 100))
+print()
+print("Top 20 normalized objects:")
+for token, count in norm_counts.most_common(20):
+    print("  %5d  %s" % (count, token))
+print()
+print("Singletons (only 1 raw object): %d" % sum(1 for c in norm_counts.values() if c == 1))
+print("Groups with 2+ members: %d" % sum(1 for c in norm_counts.values() if c >= 2))
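
Review note: the script above only measures how little `normalize_object()` merges. The embedding-based topic assignment proposed in the handover note could look roughly like the nearest-centroid sketch below. It is a minimal pure-Python illustration on toy data, not code from the repository: the topic names, the 3-dimensional vectors, the `assign_topic` helper, and the `min_similarity` threshold are all hypothetical. A real run would use the actual 1024-dimensional embeddings and most likely a vector search in Qdrant instead of a Python loop.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_topic(
    vec: Sequence[float],
    topic_centroids: Dict[str, Sequence[float]],
    min_similarity: float = 0.5,  # hypothetical cut-off, would need tuning
) -> Tuple[str, float]:
    """Return the most similar topic, or "unassigned" if nothing is close enough."""
    best_topic, best_sim = max(
        ((name, cosine(vec, centroid)) for name, centroid in topic_centroids.items()),
        key=lambda pair: pair[1],
    )
    if best_sim < min_similarity:
        return "unassigned", best_sim
    return best_topic, best_sim


# Toy data standing in for real embeddings (all names and values are made up).
TOPICS = {
    "authentication_access": [1.0, 0.1, 0.0],
    "logging_monitoring": [0.0, 1.0, 0.1],
}
OBJECTS = {
    "password_policy": [0.9, 0.2, 0.0],
    "audit_log": [0.1, 0.8, 0.2],
    "fire_extinguisher": [0.0, 0.0, 1.0],
}

assignments = {name: assign_topic(vec, TOPICS)[0] for name, vec in OBJECTS.items()}
for name, topic in assignments.items():
    print(f"{name:20s} -> {topic}")
```

The explicit "unassigned" bucket is a design choice for this sketch: with 144k mostly unique objects, forcing every object into some topic would hide the ones that need a new topic or manual review.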