Commit Graph

23 Commits

Author SHA1 Message Date
Benjamin Admin
491df4e1b0 feat: add Chunk-Browser tab to RAG page
- New 'Chunk-Browser' tab for sequential chunk browsing
- Qdrant scroll API proxy (scroll + collection-count actions)
- Pagination with prev/next through all chunks in a collection
- Text search filter with highlighting
- Click to expand chunk and see all metadata
- 'In Chunks suchen' button now navigates to Chunk-Browser with correct collection

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 09:35:52 +01:00
Benjamin Admin
954103cdf2 feat(ocr-pipeline): add Step 5 word recognition (grid from columns × rows)
Backend: build_word_grid() intersects column regions with content rows,
OCRs each cell with language-specific Tesseract, and returns vocabulary
entries with percent-based bounding boxes. New endpoints: POST /words,
GET /image/words-overlay, ground-truth save/retrieve for words.
Frontend: StepWordRecognition with overview + step-through labeling modes,
goToStep callback for row correction feedback loop.
MkDocs: OCR Pipeline documentation added.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 02:18:29 +01:00
Benjamin Admin
47dc2e6f7a feat(rag): source URLs, low-chunk warnings & IFRS/EFRAG entries
- Add REGULATION_SOURCES map with 88 original document URLs for all
  regulations (EUR-Lex, gesetze-im-internet.de, RIS, Fedlex, etc.)
- Render "Originalquelle →" link in regulation detail panel
- Add amber warning indicator for suspiciously low chunk counts (<10)
- Add EU_IFRS_DE, EU_IFRS_EN, EFRAG_ENDORSEMENT to RAG tracking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:56:09 +01:00
Benjamin Admin
b58aecd081 feat(ocr-pipeline): add Step 4 row detection UI in admin frontend
Insert rows step between columns and words in the pipeline wizard.
Shows overlay image, row list with type badges, and ground truth controls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:28:05 +01:00
Benjamin Admin
c7ae44ff17 feat(rag): add 42 new regulations to RAG overview + update collection totals
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 33s
CI / test-go-edu-search (push) Successful in 25s
CI / test-python-klausur (push) Failing after 1m46s
CI / test-python-agent-core (push) Successful in 15s
CI / test-nodejs-website (push) Successful in 23s
New regulations across bp_compliance_ce (11), bp_compliance_gesetze (31),
and bp_compliance_datenschutz (1). Collection totals updated:
gesetze 58304, ce 18183, datenschutz 2448, total 103912.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 01:04:27 +01:00
Benjamin Admin
b03cb0a1e6 Fix Landkarte tab crash: variable name shadowed isInRag function
Local variables named 'isInRag' shadowed the outer function, causing
"isInRag is not a function" error. Renamed to regInRag/codeInRag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-28 00:01:01 +01:00
Benjamin Admin
5a45cbf605 Update RAG page: Chunks/Status columns use hardcoded data, Key Intersections show RAG status
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 24s
CI / test-python-klausur (push) Failing after 1m36s
CI / test-python-agent-core (push) Successful in 16s
CI / test-nodejs-website (push) Successful in 15s
- Chunks column now uses getKnownChunks() instead of API-based getRegulationChunks()
- Status column uses isInRag() check (green/red) instead of ratio-based calculation
- Key Intersections chips show green/red with checkmark/cross based on RAG status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:53:21 +01:00
Benjamin Admin
2297f66edb feat(rag): Add RAG status indicators and 4 new EU regulations
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 26s
CI / test-go-edu-search (push) Successful in 27s
CI / test-python-klausur (push) Failing after 1m39s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 23s
- Add REGULATIONS_IN_RAG Set tracking all 42 regulations currently in Qdrant
- Add 4 new regulation entries: E-Commerce-RL, Verbraucherrechte-RL,
  Digitale-Inhalte-RL, DMA (all ingested Feb 2026)
- Add RAG column to regulations table with green check/red x indicators
- Update Landkarte tab: green/x on industry cards, thematic clusters,
  and regulation matrix
- Replace old "Integrated Regulations" section with full RAG coverage overview
- Update hardcoded chunk counts (Templates: 7689, NiBiS: 7996)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 23:23:52 +01:00
Benjamin Admin
587b066a40 feat(ocr-pipeline): ground-truth comparison tool for column detection
Side-by-side view: auto result (readonly) vs GT editor where teacher
draws correct columns. Diff table shows Auto vs GT with IoU matching.
GT data persisted per session for algorithm tuning.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 22:48:37 +01:00
Benjamin Admin
bb879a03a8 feat(ocr-pipeline): add column_ignore type for margins/empty areas
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:51:56 +01:00
Benjamin Admin
7a3570fe46 feat(ocr-pipeline): manual column editor for Step 3
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 08:27:54 +01:00
Benjamin Admin
1393a994f9 Flexible inhaltsbasierte Spaltenerkennung (2-Phasen)
Ersetzt hardcodierte Positionsregeln durch ein zweistufiges System:
Phase A erkennt Spaltengeometrie (Clustering), Phase B klassifiziert
Typen per Inhalt (Sprache/Rolle) mit 3-stufiger Fallback-Kette.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:33:35 +01:00
Benjamin Admin
cf27a95308 feat(ocr-pipeline): word-based 5-column detection for vocabulary pages
Replace projection-profile layout analysis with Tesseract word bounding
box clustering to detect 5-column vocabulary layouts (page_ref, EN, DE,
markers, examples). Falls back to projection profiles when < 3 clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:08:14 +01:00
Benjamin Admin
aa06ae0f61 feat: Persistente Sessions (PostgreSQL) + Spaltenerkennung (Step 3)
Sessions werden jetzt in PostgreSQL gespeichert statt in-memory.
Neue Session-Liste mit Name, Datum, Schritt. Sessions ueberleben
Browser-Refresh und Container-Neustart. Step 3 nutzt analyze_layout()
fuer automatische Spaltenerkennung mit farbigem Overlay.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 22:16:37 +01:00
Benjamin Admin
09b820efbe refactor(dewarp): replace displacement map with affine shear correction
The old displacement-map approach shifted entire rows by a parabolic
profile, creating a circle/barrel distortion. The actual problem is
a linear vertical shear: after deskew aligns horizontal lines, the
vertical column edges are still tilted by ~0.5°.

New approach:
- Detect shear angle from strongest vertical edge slope (not curvature)
- Apply cv2.warpAffine shear to straighten vertical features
- Manual slider: -2.0° to +2.0° in 0.05° steps
- Slider initializes to auto-detected shear angle
- Ground truth question: "Spalten vertikal ausgerichtet?"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:23:04 +01:00
Benjamin Admin
9df745574b fix(ocr-pipeline): dewarp visibility, grid on both sides, session persistence
- Fix dewarp method selection: prefer methods with >5px curvature over
  higher confidence (vertical_edge 79px was being ignored for text_baseline 2px)
- Add grid overlay on left image in Dewarp step for side-by-side comparison
- Add GET /sessions/{id} endpoint to reload session data
- StepDeskew accepts sessionId prop to restore state when navigating back
- SessionInfo type extended with optional deskew_result and dewarp_result

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 17:29:53 +01:00
Benjamin Admin
589d2f811a feat: Dewarp-Korrektur als Schritt 2 in OCR Pipeline (7 Schritte)
Implementiert Buchwoelbungs-Entzerrung mit zwei Methoden:
- Methode A: Vertikale-Kanten-Analyse (Sobel + Polynom 2. Grades)
- Methode B: Textzeilen-Baseline (Tesseract + Baseline-Kruemmung)
Beste Methode wird automatisch gewaehlt, manueller Slider (-3 bis +3).

Backend: 3 neue Endpoints (auto/manual dewarp, ground truth)
Frontend: StepDewarp + DewarpControls, Pipeline von 6 auf 7 Schritte

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:46:41 +01:00
Benjamin Admin
d552fd8b6b feat: OCR Pipeline mit 6-Schritt-Wizard fuer Seitenrekonstruktion
All checks were successful
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 38s
CI / test-go-edu-search (push) Successful in 29s
CI / test-python-klausur (push) Successful in 1m46s
CI / test-python-agent-core (push) Successful in 17s
CI / test-nodejs-website (push) Successful in 22s
Neue Route /ai/ocr-pipeline mit schrittweiser Begradigung (Deskew),
Raster-Overlay und Ground Truth. Schritte 2-6 als Platzhalter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 15:38:08 +01:00
Benjamin Boenisch
6a53f8d79c refactor: Remove all SDK/compliance pages and API routes from admin-lehrer
SDK/compliance content belongs exclusively in admin-compliance (port 3007).
Removed:
- All (sdk)/ pages (document-crawler, dsb-portal, industry-templates, multi-tenant, sso)
- All api/sdk/ proxy routes
- All developers/sdk/ documentation pages
- Unused lib/sdk/ modules (kept: catalog-manager + its deps for dashboard)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-14 09:24:36 +01:00
Benjamin Boenisch
27f1899428 feat: Sync SDK modules, API routes, blog and docs from admin-v2
- DSB Portal, Industry Templates, Multi-Tenant, SSO frontend pages
- All SDK API proxy routes (academy, crawler, incidents, vendors, whistleblower, etc.)
- Blog section with compliance articles
- BYOEH system documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 21:12:30 +01:00
Benjamin Boenisch
1bd1a0002a feat: Sync Document Crawler frontend page and API proxy (Phase 1.4)
Synced from admin-v2: 4-tab crawler page and catch-all API proxy route.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-13 20:35:29 +01:00
Benjamin Boenisch
8c1f7a783f refactor(admin-lehrer): Rename to Admin Lehrer KI and remove migrated categories
Remove communication, infrastructure, and development categories from
navigation (now in Admin Core on port 3008). Rename Admin v2 to
Admin Lehrer KI in sidebar, header, and browser title.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-12 20:23:16 +01:00
Benjamin Boenisch
5a31f52310 Initial commit: breakpilot-lehrer - Lehrer KI Platform
Services: Admin-Lehrer, Backend-Lehrer, Studio v2, Website,
Klausur-Service, School-Service, Voice-Service, Geo-Service,
BreakPilot Drive, Agent-Core

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:47:26 +01:00