Phase 1 of the clean architecture refactor: Replaces the 751-line ocr-overlay monolith with a modular pipeline. Each step gets its own component file.

Frontend: /ai/ocr-kombi route with 11 steps (Upload, Orientation, PageSplit, Deskew, Dewarp, ContentCrop, OCR, Structure, GridBuild, GridReview, GroundTruth). Session list supports document grouping for multi-page uploads.

Backend: New ocr_kombi/ module with multi-page PDF upload (splits PDF into N sessions with shared document_group_id). DB migration adds document_group_id and page_number columns.

Old /ai/ocr-overlay remains fully functional for A/B testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
13 lines
556 B
SQL
-- Migration: Add document_group_id and page_number for multi-page document grouping.
-- A document_group_id groups multiple sessions that belong to the same scanned document.
-- page_number is the 1-based page index within the group.

ALTER TABLE ocr_pipeline_sessions
    ADD COLUMN IF NOT EXISTS document_group_id UUID,
    ADD COLUMN IF NOT EXISTS page_number INT;

-- Index for efficient group lookups
CREATE INDEX IF NOT EXISTS idx_ocr_sessions_document_group
    ON ocr_pipeline_sessions (document_group_id)
    WHERE document_group_id IS NOT NULL;
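
The lookups this migration is meant to support might look roughly like the sketch below. It assumes PostgreSQL and uses only the table and the two columns added above. The UUID literal is a placeholder, not a real group ID, and the actual queries in the ocr_kombi/ backend are not shown in this commit, so they may differ.

```sql
-- Sketch only: illustrative queries against the columns added by the migration.
-- The UUID below is a placeholder value, not an ID from the real database.

-- Fetch every page (session) of one uploaded document in reading order.
-- An equality filter on document_group_id rules out NULLs, so the planner
-- can typically satisfy it from the partial index idx_ocr_sessions_document_group.
SELECT *
FROM ocr_pipeline_sessions
WHERE document_group_id = '00000000-0000-0000-0000-000000000000'
ORDER BY page_number;

-- Count pages per document group. Sessions without a group
-- (document_group_id IS NULL) are left out of the result.
SELECT document_group_id, COUNT(*) AS page_count
FROM ocr_pipeline_sessions
WHERE document_group_id IS NOT NULL
GROUP BY document_group_id
ORDER BY page_count DESC;
```

Because the index is partial, sessions that carry no document_group_id add no index entries. That keeps the index small if many sessions are ungrouped.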