feat(ocr): Add Grid Detection v4 tests, docs, and SBOM update
- Add comprehensive tests for grid_detection_service.py (31 tests)
  - mm coordinate conversion tests
  - Deskew calculation tests
  - Column detection tests
  - Integration tests for vocabulary tables
- Add OCR-Compare documentation (OCR-Compare.md)
  - mm coordinate system documentation
  - Deskew correction documentation
  - Worksheet Editor integration guide
  - API endpoints documentation
- Add TypeScript tests for ocr-integration.ts
  - mm to pixel conversion tests
  - OCR export format tests
  - localStorage operations tests
- Update SBOM to v1.5.0
  - Add OCR Grid Detection System section
  - Document Fabric.js (MIT) for Worksheet Editor
  - Document NumPy and OpenCV usage

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs-src/services/klausur-service/OCR-Compare.md (Normal file, +366)
@@ -0,0 +1,366 @@
# OCR Compare Tool - Documentation

**Status:** Production
**Version:** 4.0
**Last updated:** 2026-02-08
**URL:** https://macmini:3002/ai/ocr-compare

---

## Overview

The OCR Compare Tool automatically analyzes scanned vocabulary tables using:

- Grid-based OCR recognition
- Automatic column detection (English/German/Example)
- A mm coordinate system for precise positioning
- Deskew correction for skewed scans
- Export to the Worksheet Editor

---

## Architecture

```
┌───────────────────────────────────────────────────────────────┐
│ Frontend (admin-v2)                                           │
│ /admin-v2/app/(admin)/ai/ocr-compare/page.tsx                 │
│ - Image upload                                                │
│ - Grid overlay visualization                                  │
│ - Cell edit popup                                             │
│ - Export to the Worksheet Editor                              │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ klausur-service (FastAPI)                                     │
│ Port 8086 - /klausur-service/backend/                         │
│ - /api/v1/ocr/analyze-grid (grid analysis)                    │
│ - services/grid_detection_service.py (v4)                     │
└───────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────┐
│ PaddleOCR Service                                             │
│ Port 8088 - OCR recognition                                   │
└───────────────────────────────────────────────────────────────┘
```

---

## Features (Version 4)

### 1. mm Coordinate System

All coordinates are returned for the A4 format (210 × 297 mm):

| Field | Description |
|-------|-------------|
| `x_mm` | X position in mm (0-210) |
| `y_mm` | Y position in mm (0-297) |
| `width_mm` | Width in mm |
| `height_mm` | Height in mm |

**Conversion:**
```typescript
// Percent to mm (A4: 210 × 297 mm)
const x_mm = (x_percent / 100) * 210
const y_mm = (y_percent / 100) * 297

// mm to pixels (canvas at 96 DPI: 96 px per inch / 25.4 mm per inch)
const MM_TO_PX = 3.7795275591
const x_px = x_mm * MM_TO_PX
const y_px = y_mm * MM_TO_PX
```
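
On the backend, `OCRRegion` and `GridCell` expose these values as `x_mm`, `y_mm`, `width_mm` and `height_mm` (checked in `tests/test_grid_detection.py`). The following is a minimal sketch of such properties, assuming the stored `x`, `y`, `width` and `height` are percentages of the page; the class name `RegionSketch` is hypothetical and this is not the service's actual implementation.

```python
from dataclasses import dataclass

A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0


@dataclass
class RegionSketch:
    """Hypothetical stand-in for OCRRegion; x, y, width, height are percent of the page."""
    text: str
    confidence: float
    x: float
    y: float
    width: float
    height: float

    # Multiplying before dividing makes whole-number percentages produce the same
    # floats as the decimal literals used in the docs (e.g. 44.55).
    @property
    def x_mm(self) -> float:
        return self.x * A4_WIDTH_MM / 100

    @property
    def y_mm(self) -> float:
        return self.y * A4_HEIGHT_MM / 100

    @property
    def width_mm(self) -> float:
        return self.width * A4_WIDTH_MM / 100

    @property
    def height_mm(self) -> float:
        return self.height * A4_HEIGHT_MM / 100

    # Edges and centers stay in percent, like the OCRRegion properties in the tests.
    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def bottom(self) -> float:
        return self.y + self.height

    @property
    def center_x(self) -> float:
        return self.x + self.width / 2

    @property
    def center_y(self) -> float:
        return self.y + self.height / 2


region = RegionSketch("house", 0.95, x=10.0, y=15.0, width=25.0, height=3.0)
print(region.x_mm, region.y_mm, region.width_mm, round(region.height_mm, 2))
# 21.0 44.55 52.5 8.91 (the mm values of the example cell in the API response)
```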

### 2. Deskew Correction

Skewed scans are straightened automatically, based on the first column:

1. **Detection:** All words in the first column (x < 33%) are analyzed
2. **Calculation:** Linear regression over their left edges
3. **Correction:** All coordinates are rotated by the computed angle
4. **Limit:** The correction is capped at ±5°

The response contains the computed angle (other fields omitted here); a negative value means the scan is tilted to the left:

```json
{
  "deskew_angle_deg": -1.2
}
```
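
The four steps above can be sketched as follows. This is a dependency-free approximation, not the service's code: the service uses NumPy's `polyfit` (see the library table), while this sketch computes the same degree-1 least-squares fit by hand. The function names, the minimum of three words, and the choice of the page center as rotation pivot are assumptions; the sign convention (positive for a rightward drift) matches the unit tests.

```python
import math
from dataclasses import dataclass, replace

A4_WIDTH_MM = 210.0
A4_HEIGHT_MM = 297.0
FIRST_COLUMN_MAX_X_PCT = 33.0  # step 1: words left of 33 % count as the first column
MAX_DESKEW_DEG = 5.0           # step 4: correction is clamped to ±5°
MIN_WORDS = 3                  # assumption: fewer points give no reliable fit


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge in percent of page width
    y: float  # top edge in percent of page height


def calculate_deskew_angle(words: list[Word]) -> float:
    """Skew angle in degrees; positive when left edges drift right further down the page."""
    column = [w for w in words if w.x < FIRST_COLUMN_MAX_X_PCT]
    if len(column) < MIN_WORDS:
        return 0.0
    # Convert to mm first: percentages of a non-square page would distort the slope.
    xs = [w.x * A4_WIDTH_MM / 100 for w in column]
    ys = [w.y * A4_HEIGHT_MM / 100 for w in column]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_y == 0.0:
        return 0.0  # all words on one line: nothing to fit against
    # Step 2: least-squares slope of x over y, i.e. np.polyfit(ys, xs, 1)[0].
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_y
    angle = math.degrees(math.atan(slope))
    return max(-MAX_DESKEW_DEG, min(MAX_DESKEW_DEG, angle))


def apply_deskew(words: list[Word], angle_deg: float) -> list[Word]:
    """Step 3: rotate every word position about the page center to undo the skew."""
    if angle_deg == 0.0:
        return list(words)
    theta = math.radians(angle_deg)
    cx, cy = A4_WIDTH_MM / 2, A4_HEIGHT_MM / 2
    corrected = []
    for w in words:
        dx = w.x * A4_WIDTH_MM / 100 - cx
        dy = w.y * A4_HEIGHT_MM / 100 - cy
        # For points on the line dx = a + tan(theta) * dy this gives x' = cx + a * cos(theta),
        # so the fitted left edge becomes vertical.
        x_mm = cx + dx * math.cos(theta) - dy * math.sin(theta)
        y_mm = cy + dx * math.sin(theta) + dy * math.cos(theta)
        corrected.append(replace(w, x=x_mm / A4_WIDTH_MM * 100, y=y_mm / A4_HEIGHT_MM * 100))
    return corrected
```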

### 3. Column Detection with a 1 mm Margin

Columns are detected automatically and start 1 mm before their first word:

```jsonc
{
  "detected_columns": [
    {
      "column_type": "english",
      "x_start": 9.52,     // percent
      "x_end": 35.0,
      "x_start_mm": 20.0,  // mm (1 mm before the first word)
      "x_end_mm": 73.5,
      "word_count": 15
    },
    {
      "column_type": "german",
      "x_start_mm": 74.0,
      "x_end_mm": 140.0,
      "word_count": 15
    },
    {
      "column_type": "example",
      "x_start_mm": 141.0,
      "x_end_mm": 200.0,
      "word_count": 12
    }
  ]
}
```
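
A minimal sketch of the 1 mm margin, assuming the words have already been grouped into columns and that columns are labeled english/german/example from left to right. The grouping input, the left-to-right labeling and the names `build_columns`/`ColumnSketch` are assumptions for illustration, not necessarily how `grid_detection_service.py` classifies columns; the margin constants mirror `COLUMN_MARGIN_MM` and `COLUMN_MARGIN_PCT` from the service.

```python
from dataclasses import dataclass

A4_WIDTH_MM = 210.0
COLUMN_MARGIN_MM = 1.0
COLUMN_MARGIN_PCT = COLUMN_MARGIN_MM / A4_WIDTH_MM * 100  # about 0.476 % of the page width
COLUMN_ORDER = ("english", "german", "example")  # assumption: left-to-right layout


@dataclass
class ColumnSketch:
    column_type: str
    x_start_mm: float
    x_end_mm: float
    word_count: int


def build_columns(groups: list[list[tuple[float, float]]]) -> list[ColumnSketch]:
    """Each non-empty group holds the (left_mm, right_mm) extents of one column's words."""
    ordered = sorted(groups, key=lambda group: min(left for left, _ in group))
    columns = []
    for index, group in enumerate(ordered):
        first_left = min(left for left, _ in group)
        last_right = max(right for _, right in group)
        column_type = COLUMN_ORDER[index] if index < len(COLUMN_ORDER) else "unknown"
        columns.append(ColumnSketch(
            column_type=column_type,
            # The column starts 1 mm before its first word, clamped to the page edge.
            x_start_mm=max(0.0, first_left - COLUMN_MARGIN_MM),
            x_end_mm=last_right,
            word_count=len(group),
        ))
    return columns
```

For example, a first word at 21.0 mm (10 %) yields `x_start_mm = 20.0`, as in the English column above.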

### 4. Cell Status

| Status | Description |
|--------|-------------|
| `empty` | No OCR text detected in this cell |
| `recognized` | Text recognized with confidence ≥ 50% |
| `problematic` | Text recognized with confidence < 50% |
| `manual` | Manually corrected |
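
The table maps onto a small decision function. A minimal sketch, assuming that a manual edit takes precedence and that whitespace-only text counts as empty; the function name `classify_cell` and the `manually_edited` flag are hypothetical, and the service may assign statuses differently. The 0.5 threshold is the one from the table.

```python
from enum import Enum


class CellStatus(str, Enum):
    EMPTY = "empty"
    RECOGNIZED = "recognized"
    PROBLEMATIC = "problematic"
    MANUAL = "manual"


CONFIDENCE_THRESHOLD = 0.5  # confidence >= 50 % counts as recognized


def classify_cell(text: str, confidence: float, manually_edited: bool = False) -> CellStatus:
    """Hypothetical helper mirroring the status table above."""
    if manually_edited:
        return CellStatus.MANUAL
    if not text.strip():
        return CellStatus.EMPTY
    if confidence >= CONFIDENCE_THRESHOLD:
        return CellStatus.RECOGNIZED
    return CellStatus.PROBLEMATIC
```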

---

## API Endpoints

### POST /api/v1/ocr/analyze-grid

Analyzes an image and detects the structure of the vocabulary table.

**Request:**
```json
{
  "image_base64": "data:image/jpeg;base64,...",
  "min_confidence": 0.5,
  "padding": 2.0
}
```

**Response:**
```jsonc
{
  "cells": [
    [
      {
        "row": 0,
        "col": 0,
        "x": 10.0,
        "y": 15.0,
        "width": 25.0,
        "height": 3.0,
        "x_mm": 21.0,
        "y_mm": 44.55,
        "width_mm": 52.5,
        "height_mm": 8.91,
        "text": "house",
        "confidence": 0.95,
        "status": "recognized",
        "column_type": "english",
        "logical_row": 0,
        "logical_col": 0
      }
    ]
  ],
  "detected_columns": [ /* abbreviated, see "Column Detection" above */ ],
  "page_dimensions": {
    "width_mm": 210.0,
    "height_mm": 297.0,
    "format": "A4"
  },
  "deskew_angle_deg": -0.5,
  "statistics": {
    "total_cells": 45,
    "recognized_cells": 42,
    "problematic_cells": 3,
    "empty_cells": 0
  }
}
```
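
A minimal client sketch using only the Python standard library. The base URL `http://macmini:8086` is an assumption derived from the port in the architecture diagram (in your setup the endpoint may only be reachable through a proxy), the JPEG data-URL prefix is hard-coded, and `build_request`, `analyze_grid` and `summarize_cells` are hypothetical helpers. `summarize_cells` recounts the statuses in the nested `cells` grid, which is a quick cross-check against `statistics`.

```python
import base64
import json
import urllib.request
from collections import Counter

BASE_URL = "http://macmini:8086"  # assumption: klausur-service port from the diagram


def build_request(image_bytes: bytes, min_confidence: float = 0.5, padding: float = 2.0) -> dict:
    """Request body as documented: a data URL plus the two tuning parameters."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "image_base64": f"data:image/jpeg;base64,{encoded}",
        "min_confidence": min_confidence,
        "padding": padding,
    }


def analyze_grid(image_bytes: bytes) -> dict:
    """POST the image and return the parsed JSON response (performs a network call)."""
    body = json.dumps(build_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/ocr/analyze-grid",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def summarize_cells(response: dict) -> dict:
    """Recount cell statuses across the nested `cells` grid."""
    counts = Counter(cell["status"] for row in response["cells"] for cell in row)
    return {
        "total_cells": sum(counts.values()),
        "recognized_cells": counts["recognized"],
        "problematic_cells": counts["problematic"],
        "empty_cells": counts["empty"],
    }
```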

---

## Frontend Components

### GridOverlay.tsx

Renders the detected cells as a colored overlay on top of the image.

**Props:**
```typescript
interface GridOverlayProps {
  cells: GridCell[][]
  imageWidth: number
  imageHeight: number
  showLabels?: boolean
  onCellClick?: (cell: GridCell) => void
}
```

**Color coding:**
- Green: `recognized` (reliably recognized)
- Yellow: `problematic` (low confidence)
- Gray: `empty`
- Blue: `manual` (manually corrected)
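
Because cell positions are percentages, the overlay geometry scales with the displayed image regardless of the rendering framework. A language-neutral sketch in Python, assuming `imageWidth`/`imageHeight` are the displayed image size in pixels and that cells carry the percent fields from the API response; `cell_to_rect`, `OverlayRect` and the gray fallback for unexpected statuses are hypothetical, not taken from `GridOverlay.tsx`.

```python
from dataclasses import dataclass

# Colors from the list above; the fallback for unexpected statuses is an assumption.
STATUS_COLORS = {
    "recognized": "green",
    "problematic": "yellow",
    "empty": "gray",
    "manual": "blue",
}


@dataclass
class OverlayRect:
    left: float
    top: float
    width: float
    height: float
    color: str


def cell_to_rect(cell: dict, image_width: int, image_height: int) -> OverlayRect:
    """Scale a cell's percent box to the displayed image size in pixels."""
    return OverlayRect(
        left=cell["x"] * image_width / 100,
        top=cell["y"] * image_height / 100,
        width=cell["width"] * image_width / 100,
        height=cell["height"] * image_height / 100,
        color=STATUS_COLORS.get(cell["status"], "gray"),
    )
```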

### CellEditPopup.tsx

Popup for editing a single cell.

**Features:**
- Edit the text
- Change the column type (English/German/Example)
- Show the confidence
- Show the mm coordinates
- Keyboard shortcuts: Ctrl+Enter (save), Esc (cancel)

---

## Worksheet Editor Integration

### Export

The **Zum Editor exportieren** ("Export to editor") button stores the OCR data in `localStorage`:

```typescript
interface OCRExportData {
  version: '1.0'
  source: 'ocr-compare'
  exported_at: string
  session_id: string
  page_number: number
  page_dimensions: {
    width_mm: number
    height_mm: number
    format: string
  }
  words: OCRWord[]
  detected_columns: DetectedColumn[]
}
```

**localStorage keys:**
- `ocr_export_{session_id}_{page_number}`: export data
- `ocr_export_latest`: reference to the most recent export
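
A sketch of the key scheme, with a plain `dict` standing in for `localStorage`. It assumes that `ocr_export_latest` stores the key of the newest export (the text above only calls it a reference); `export_key`, `save_export` and `load_latest` are hypothetical names, and the real logic lives in `saveOCRExportToStorage` / `loadLatestOCRExport` in `ocr-integration.ts`.

```python
import json
from datetime import datetime, timezone
from typing import Optional

LATEST_KEY = "ocr_export_latest"


def export_key(session_id: str, page_number: int) -> str:
    return f"ocr_export_{session_id}_{page_number}"


def save_export(storage: dict, session_id: str, page_number: int,
                words: list, detected_columns: list) -> str:
    """Store one page's export as a JSON string and point the 'latest' entry at it."""
    payload = {
        "version": "1.0",
        "source": "ocr-compare",
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "page_number": page_number,
        "page_dimensions": {"width_mm": 210.0, "height_mm": 297.0, "format": "A4"},
        "words": words,
        "detected_columns": detected_columns,
    }
    key = export_key(session_id, page_number)
    storage[key] = json.dumps(payload)  # localStorage only stores strings
    storage[LATEST_KEY] = key           # assumption: the reference is the export's key
    return key


def load_latest(storage: dict) -> Optional[dict]:
    """Follow the 'latest' reference; None if nothing has been exported yet."""
    key = storage.get(LATEST_KEY)
    raw = storage.get(key) if key else None
    return json.loads(raw) if raw else None
```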

### Import in the Worksheet Editor

1. Open the Worksheet Editor: https://macmini/worksheet-editor
2. Click the OCR import button (green icon)
3. The words are placed on the canvas

**Conversion mm → pixels:**
```typescript
const MM_TO_PX = 3.7795275591 // 96 DPI
const x_px = word.x_mm * MM_TO_PX
const y_px = word.y_mm * MM_TO_PX
```
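
The reverse direction and the A4 canvas size follow from the same factor (96 px per inch divided by 25.4 mm per inch). A minimal sketch; the names mirror the exports of `ocr-integration.ts` (`mmToPixel`, `pixelToMm`, `A4_WIDTH_PX`, `A4_HEIGHT_PX`), but this Python version is illustrative only.

```python
MM_TO_PX = 96 / 25.4  # about 3.7795275591 px per mm at 96 DPI
A4_WIDTH_MM, A4_HEIGHT_MM = 210.0, 297.0

# Canvas size in whole pixels (the frontend tests round with Math.round).
A4_WIDTH_PX = round(A4_WIDTH_MM * MM_TO_PX)
A4_HEIGHT_PX = round(A4_HEIGHT_MM * MM_TO_PX)


def mm_to_pixel(mm: float) -> float:
    return mm * MM_TO_PX


def pixel_to_mm(px: float) -> float:
    return px / MM_TO_PX


print(A4_WIDTH_PX, A4_HEIGHT_PX)  # 794 1123
```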

---

## Files

### Backend (klausur-service)

| File | Description |
|------|-------------|
| `services/grid_detection_service.py` | Grid detection v4 with deskew |
| `tests/test_grid_detection.py` | Unit tests |

### Frontend (admin-v2)

| File | Description |
|------|-------------|
| `app/(admin)/ai/ocr-compare/page.tsx` | Main UI |
| `components/ocr/GridOverlay.tsx` | Grid visualization |
| `components/ocr/CellEditPopup.tsx` | Cell editor |

### Frontend (studio-v2)

| File | Description |
|------|-------------|
| `lib/worksheet-editor/ocr-integration.ts` | OCR import/export utility |
| `app/worksheet-editor/page.tsx` | Editor with OCR import |
| `components/worksheet-editor/EditorToolbar.tsx` | Toolbar with OCR button |

---

## Deployment

```bash
# 1. Sync the backend
scp grid_detection_service.py macmini:.../klausur-service/backend/services/

# 2. Sync the tests
scp test_grid_detection.py macmini:.../klausur-service/backend/tests/

# 3. Rebuild klausur-service
ssh macmini "docker compose build --no-cache klausur-service"

# 4. Start the container
ssh macmini "docker compose up -d klausur-service"

# 5. Deploy the frontend (admin-v2)
ssh macmini "docker compose build --no-cache admin-v2 && docker compose up -d admin-v2"
```

---

## Open-Source Libraries Used

| Library | Version | License | Usage |
|---------|---------|---------|-------|
| NumPy | ≥1.24 | BSD-3-Clause | Deskew calculation (polyfit) |
| OpenCV | ≥4.8 | Apache-2.0 | Image processing (optional) |
| PaddleOCR | 2.7 | Apache-2.0 | OCR recognition |
| Fabric.js | 6.x | MIT | Canvas rendering (frontend) |

---

## Error Handling

### Common Problems

| Problem | Solution |
|---------|----------|
| "Grid analysieren" (Analyze grid) does not load | Check the klausur-service container |
| No cells detected | Lower the minimum confidence |
| Wrong column assignment | Correct it manually in the CellEditPopup |
| Export does not work | Check the browser console for errors |

### Logging

```bash
# klausur-service logs
docker logs --tail=100 breakpilot-pwa-klausur-service

# Grid detection only
docker logs breakpilot-pwa-klausur-service 2>&1 | grep "grid_detection"
```

---

## Change History

| Version | Date | Changes |
|---------|------|---------|
| 4.0 | 2026-02-08 | Deskew correction, 1 mm column margin |
| 3.0 | 2026-02-07 | mm coordinate system |
| 2.0 | 2026-02-06 | Column detection |
| 1.0 | 2026-02-05 | Initial implementation |

---

## References

- [Worksheet Editor Architecture](Worksheet-Editor-Architecture.md)
- [OCR Labeling Spec](OCR-Labeling-Spec.md)
- [SBOM](/infrastructure/sbom)

klausur-service/backend/tests/test_grid_detection.py (Normal file, +385)
@@ -0,0 +1,385 @@
"""
Tests for Grid Detection Service v4

Tests cover:
- mm coordinate conversion
- Deskew calculation
- Column detection with 1mm margin
- Data class functionality

License: Apache 2.0 (commercially usable)
"""

import pytest

# Import the service under test
import sys
sys.path.insert(0, '/app')

from services.grid_detection_service import (
    GridDetectionService,
    OCRRegion,
    GridCell,
    CellStatus,
    ColumnType,
    A4_WIDTH_MM,
    A4_HEIGHT_MM,
    COLUMN_MARGIN_MM,
    COLUMN_MARGIN_PCT
)


class TestOCRRegionMMConversion:
    """Test mm coordinate conversion for OCR regions."""

    def test_x_mm_conversion(self):
        """Test X coordinate conversion from percent to mm."""
        # 50% of A4 width = 105mm
        region = OCRRegion(text="test", confidence=0.9, x=50.0, y=0.0, width=10.0, height=5.0)
        assert region.x_mm == pytest.approx(105.0)

    def test_y_mm_conversion(self):
        """Test Y coordinate conversion from percent to mm."""
        # 33.33% of A4 height = 99mm (approx)
        region = OCRRegion(text="test", confidence=0.9, x=0.0, y=33.33, width=10.0, height=5.0)
        assert abs(region.y_mm - 99.0) < 0.5

    def test_width_mm_conversion(self):
        """Test width conversion from percent to mm."""
        # 10% of A4 width = 21mm
        region = OCRRegion(text="test", confidence=0.9, x=0.0, y=0.0, width=10.0, height=5.0)
        assert region.width_mm == pytest.approx(21.0)

    def test_height_mm_conversion(self):
        """Test height conversion from percent to mm."""
        # 5% of A4 height = 14.85mm
        region = OCRRegion(text="test", confidence=0.9, x=0.0, y=0.0, width=10.0, height=5.0)
        assert abs(region.height_mm - 14.85) < 0.01

    def test_center_coordinates(self):
        """Test center coordinate calculation."""
        region = OCRRegion(text="test", confidence=0.9, x=10.0, y=20.0, width=20.0, height=10.0)
        assert region.center_x == 20.0
        assert region.center_y == 25.0

    def test_right_bottom_edges(self):
        """Test right and bottom edge calculation."""
        region = OCRRegion(text="test", confidence=0.9, x=10.0, y=20.0, width=30.0, height=15.0)
        assert region.right == 40.0
        assert region.bottom == 35.0


class TestGridCellMMConversion:
    """Test mm coordinate conversion for grid cells."""

    def test_cell_to_dict_includes_mm(self):
        """Test that to_dict includes mm coordinates."""
        cell = GridCell(row=0, col=0, x=10.0, y=20.0, width=30.0, height=5.0, text="hello")
        result = cell.to_dict()

        assert "x_mm" in result
        assert "y_mm" in result
        assert "width_mm" in result
        assert "height_mm" in result

        # 10% of 210mm = 21mm
        assert result["x_mm"] == pytest.approx(21.0)
        # 20% of 297mm = 59.4mm
        assert result["y_mm"] == pytest.approx(59.4)

    def test_cell_mm_coordinates(self):
        """Test direct mm property access."""
        cell = GridCell(row=0, col=0, x=50.0, y=50.0, width=20.0, height=3.0)

        assert cell.x_mm == pytest.approx(105.0)  # 50% of 210mm
        assert cell.y_mm == pytest.approx(148.5)  # 50% of 297mm
        assert cell.width_mm == pytest.approx(42.0)  # 20% of 210mm
        assert abs(cell.height_mm - 8.91) < 0.01  # 3% of 297mm

    def test_cell_to_dict_includes_all_fields(self):
        """Test that to_dict includes all expected fields."""
        cell = GridCell(
            row=1, col=2, x=10.0, y=20.0, width=30.0, height=5.0,
            text="test", confidence=0.95, status=CellStatus.RECOGNIZED,
            column_type=ColumnType.ENGLISH, logical_row=0, logical_col=0,
            is_continuation=False
        )
        result = cell.to_dict()

        assert result["row"] == 1
        assert result["col"] == 2
        assert result["text"] == "test"
        assert result["confidence"] == 0.95
        assert result["status"] == "recognized"
        assert result["column_type"] == "english"
        assert result["logical_row"] == 0
        assert result["logical_col"] == 0
        assert result["is_continuation"] is False


class TestA4Constants:
    """Test A4 dimension constants."""

    def test_a4_width_mm(self):
        """Verify A4 width is 210mm."""
        assert A4_WIDTH_MM == 210.0

    def test_a4_height_mm(self):
        """Verify A4 height is 297mm."""
        assert A4_HEIGHT_MM == 297.0

    def test_column_margin_mm(self):
        """Verify column margin is 1mm."""
        assert COLUMN_MARGIN_MM == 1.0

    def test_column_margin_percent(self):
        """Verify column margin percentage calculation."""
        expected = (1.0 / 210.0) * 100
        assert abs(COLUMN_MARGIN_PCT - expected) < 0.001


class TestGridDetectionServiceInit:
    """Test GridDetectionService initialization."""

    def test_init_with_defaults(self):
        """Test service initializes with default parameters."""
        service = GridDetectionService()
        assert service.y_tolerance_pct == 1.5
        assert service.padding_pct == 0.3
        assert service.column_margin_mm == COLUMN_MARGIN_MM

    def test_init_with_custom_params(self):
        """Test service initializes with custom parameters."""
        service = GridDetectionService(
            y_tolerance_pct=2.0,
            padding_pct=0.5,
            column_margin_mm=2.0
        )
        assert service.y_tolerance_pct == 2.0
        assert service.padding_pct == 0.5
        assert service.column_margin_mm == 2.0


class TestDeskewCalculation:
    """Test deskew angle calculation."""

    def test_calculate_deskew_no_regions(self):
        """Test deskew returns 0 for empty regions."""
        service = GridDetectionService()
        angle = service.calculate_deskew_angle([])
        assert angle == 0.0

    def test_calculate_deskew_few_regions(self):
        """Test deskew returns 0 for too few regions."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="a", confidence=0.9, x=10.0, y=10.0, width=5.0, height=2.0),
        ]
        angle = service.calculate_deskew_angle(regions)
        assert angle == 0.0

    def test_calculate_deskew_perfectly_aligned(self):
        """Test deskew returns near-zero for perfectly aligned text."""
        service = GridDetectionService()
        # Perfectly vertical alignment at x=10%
        regions = [
            OCRRegion(text="a", confidence=0.9, x=10.0, y=10.0, width=5.0, height=2.0),
            OCRRegion(text="b", confidence=0.9, x=10.0, y=20.0, width=5.0, height=2.0),
            OCRRegion(text="c", confidence=0.9, x=10.0, y=30.0, width=5.0, height=2.0),
            OCRRegion(text="d", confidence=0.9, x=10.0, y=40.0, width=5.0, height=2.0),
            OCRRegion(text="e", confidence=0.9, x=10.0, y=50.0, width=5.0, height=2.0),
        ]
        angle = service.calculate_deskew_angle(regions)
        assert abs(angle) < 0.5  # Should be very close to 0

    def test_calculate_deskew_tilted_right(self):
        """Test deskew detects right tilt."""
        service = GridDetectionService()
        # Text tilts right as we go down (x increases with y)
        regions = [
            OCRRegion(text="a", confidence=0.9, x=10.0, y=10.0, width=5.0, height=2.0),
            OCRRegion(text="b", confidence=0.9, x=11.0, y=20.0, width=5.0, height=2.0),
            OCRRegion(text="c", confidence=0.9, x=12.0, y=30.0, width=5.0, height=2.0),
            OCRRegion(text="d", confidence=0.9, x=13.0, y=40.0, width=5.0, height=2.0),
            OCRRegion(text="e", confidence=0.9, x=14.0, y=50.0, width=5.0, height=2.0),
        ]
        angle = service.calculate_deskew_angle(regions)
        assert angle > 0  # Positive angle for right tilt

    def test_calculate_deskew_max_angle(self):
        """Test deskew is clamped to max 5 degrees."""
        service = GridDetectionService()
        # Extreme tilt
        regions = [
            OCRRegion(text="a", confidence=0.9, x=5.0, y=10.0, width=5.0, height=2.0),
            OCRRegion(text="b", confidence=0.9, x=15.0, y=20.0, width=5.0, height=2.0),
            OCRRegion(text="c", confidence=0.9, x=25.0, y=30.0, width=5.0, height=2.0),
            OCRRegion(text="d", confidence=0.9, x=35.0, y=40.0, width=5.0, height=2.0),
            OCRRegion(text="e", confidence=0.9, x=45.0, y=50.0, width=5.0, height=2.0),
        ]
        angle = service.calculate_deskew_angle(regions)
        assert abs(angle) <= 5.0  # Clamped to ±5°


class TestDeskewApplication:
    """Test deskew coordinate transformation."""

    def test_apply_deskew_zero_angle(self):
        """Test no transformation for zero angle."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="a", confidence=0.9, x=10.0, y=20.0, width=5.0, height=2.0),
        ]
        result = service.apply_deskew_to_regions(regions, 0.0)

        assert len(result) == 1
        assert result[0].x == 10.0
        assert result[0].y == 20.0

    def test_apply_deskew_preserves_text(self):
        """Test deskew preserves text and confidence."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="hello", confidence=0.95, x=10.0, y=20.0, width=5.0, height=2.0),
        ]
        result = service.apply_deskew_to_regions(regions, 2.0)

        assert result[0].text == "hello"
        assert result[0].confidence == 0.95


class TestCellStatus:
    """Test cell status classification."""

    def test_cell_status_empty(self):
        """Test empty cell status."""
        cell = GridCell(row=0, col=0, x=0, y=0, width=10, height=5, text="")
        assert cell.status == CellStatus.EMPTY

    def test_cell_status_recognized(self):
        """Test recognized cell status."""
        cell = GridCell(
            row=0, col=0, x=0, y=0, width=10, height=5,
            text="hello", confidence=0.9, status=CellStatus.RECOGNIZED
        )
        assert cell.status == CellStatus.RECOGNIZED

    def test_cell_status_problematic(self):
        """Test problematic cell (low confidence)."""
        cell = GridCell(
            row=0, col=0, x=0, y=0, width=10, height=5,
            text="hello", confidence=0.3, status=CellStatus.PROBLEMATIC
        )
        assert cell.status == CellStatus.PROBLEMATIC


class TestColumnType:
    """Test column type enum."""

    def test_column_type_values(self):
        """Test column type enum values."""
        assert ColumnType.ENGLISH.value == "english"
        assert ColumnType.GERMAN.value == "german"
        assert ColumnType.EXAMPLE.value == "example"
        assert ColumnType.UNKNOWN.value == "unknown"


class TestDetectGrid:
    """Test grid detection functionality."""

    def test_detect_grid_empty_regions(self):
        """Test grid detection with empty regions."""
        service = GridDetectionService()
        result = service.detect_grid([])

        assert result.rows == 0
        assert result.columns == 0
        assert len(result.cells) == 0

    def test_detect_grid_single_word(self):
        """Test grid detection with single word."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="house", confidence=0.9, x=10.0, y=10.0, width=10.0, height=2.0),
        ]
        result = service.detect_grid(regions)

        assert result.rows >= 1
        assert result.columns >= 1

    def test_detect_grid_result_has_page_dimensions(self):
        """Test that result includes page dimensions."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="house", confidence=0.9, x=10.0, y=10.0, width=10.0, height=2.0),
        ]
        result = service.detect_grid(regions)
        result_dict = result.to_dict()

        assert "page_dimensions" in result_dict
        assert result_dict["page_dimensions"]["width_mm"] == 210.0
        assert result_dict["page_dimensions"]["height_mm"] == 297.0
        assert result_dict["page_dimensions"]["format"] == "A4"

    def test_detect_grid_result_has_stats(self):
        """Test that result includes stats."""
        service = GridDetectionService()
        regions = [
            OCRRegion(text="house", confidence=0.9, x=10.0, y=10.0, width=10.0, height=2.0),
            OCRRegion(text="Haus", confidence=0.8, x=50.0, y=10.0, width=8.0, height=2.0),
        ]
        result = service.detect_grid(regions)
        result_dict = result.to_dict()

        assert "stats" in result_dict
        assert "recognized" in result_dict["stats"]
        assert "coverage" in result_dict["stats"]


class TestIntegration:
    """Integration tests for full analysis pipeline."""

    def test_full_vocabulary_table_analysis(self):
        """Test analysis of a typical vocabulary table."""
        service = GridDetectionService()

        # Simulate a vocabulary table with 3 columns
        regions = [
            # Row 1
            OCRRegion(text="house", confidence=0.95, x=10.0, y=15.0, width=12.0, height=2.5),
            OCRRegion(text="Haus", confidence=0.92, x=45.0, y=15.0, width=8.0, height=2.5),
            OCRRegion(text="This is a house.", confidence=0.88, x=70.0, y=15.0, width=25.0, height=2.5),
            # Row 2
            OCRRegion(text="car", confidence=0.94, x=10.0, y=22.0, width=8.0, height=2.5),
            OCRRegion(text="Auto", confidence=0.91, x=45.0, y=22.0, width=9.0, height=2.5),
            OCRRegion(text="I drive a car.", confidence=0.85, x=70.0, y=22.0, width=22.0, height=2.5),
            # Row 3
            OCRRegion(text="tree", confidence=0.96, x=10.0, y=29.0, width=9.0, height=2.5),
            OCRRegion(text="Baum", confidence=0.93, x=45.0, y=29.0, width=10.0, height=2.5),
            OCRRegion(text="The tree is tall.", confidence=0.87, x=70.0, y=29.0, width=24.0, height=2.5),
        ]

        result = service.detect_grid(regions)
        result_dict = result.to_dict()

        # Verify structure
        assert "cells" in result_dict
        assert "page_dimensions" in result_dict
        assert "stats" in result_dict

        # Verify page dimensions
        assert result_dict["page_dimensions"]["format"] == "A4"

        # Verify cells have mm coordinates
        if len(result_dict["cells"]) > 0 and len(result_dict["cells"][0]) > 0:
            cell = result_dict["cells"][0][0]
            assert "x_mm" in cell
            assert "y_mm" in cell
            assert "width_mm" in cell
            assert "height_mm" in cell


if __name__ == "__main__":
    pytest.main([__file__, "-v"])
466
studio-v2/lib/worksheet-editor/ocr-integration.test.ts
Normal file
466
studio-v2/lib/worksheet-editor/ocr-integration.test.ts
Normal file
@@ -0,0 +1,466 @@
|
|||||||
|
/**
|
||||||
|
* Tests for OCR Integration Utility
|
||||||
|
*
|
||||||
|
* Tests cover:
|
||||||
|
* - mm to pixel conversion
|
||||||
|
* - OCR data export format
|
||||||
|
* - LocalStorage operations
|
||||||
|
* - Canvas integration
|
||||||
|
*/
|
||||||
|
|
||||||
|
import {
|
||||||
|
MM_TO_PX,
|
||||||
|
A4_WIDTH_MM,
|
||||||
|
A4_HEIGHT_MM,
|
||||||
|
A4_WIDTH_PX,
|
||||||
|
A4_HEIGHT_PX,
|
||||||
|
mmToPixel,
|
||||||
|
pixelToMm,
|
||||||
|
getColumnColor,
|
||||||
|
createTextProps,
|
||||||
|
exportOCRData,
|
||||||
|
saveOCRExportToStorage,
|
||||||
|
loadLatestOCRExport,
|
||||||
|
loadOCRExport,
|
||||||
|
clearOCRExports,
|
||||||
|
type OCRWord,
|
||||||
|
type OCRExportData,
|
||||||
|
type ColumnType,
|
||||||
|
} from './ocr-integration'
|
||||||
|
|
||||||
|
// Mock localStorage
|
||||||
|
const localStorageMock = (() => {
|
||||||
|
let store: Record<string, string> = {}
|
||||||
|
return {
|
||||||
|
getItem: jest.fn((key: string) => store[key] || null),
|
||||||
|
setItem: jest.fn((key: string, value: string) => {
|
||||||
|
store[key] = value
|
||||||
|
}),
|
||||||
|
removeItem: jest.fn((key: string) => {
|
||||||
|
delete store[key]
|
||||||
|
}),
|
||||||
|
clear: jest.fn(() => {
|
||||||
|
store = {}
|
||||||
|
}),
|
||||||
|
keys: () => Object.keys(store),
|
||||||
|
}
|
||||||
|
})()
|
||||||
|
|
||||||
|
Object.defineProperty(window, 'localStorage', { value: localStorageMock })
|
||||||
|
|
||||||
|
describe('Constants', () => {
|
||||||
|
test('MM_TO_PX is correct for 96 DPI', () => {
|
||||||
|
// 1 inch = 25.4mm, 96 DPI = 96 pixels per inch
|
||||||
|
// 96 / 25.4 = 3.7795275591
|
||||||
|
expect(MM_TO_PX).toBeCloseTo(3.7795275591, 8)
|
||||||
|
})
|
||||||
|
|
||||||
|
test('A4 dimensions in mm are correct', () => {
|
||||||
|
expect(A4_WIDTH_MM).toBe(210)
|
||||||
|
expect(A4_HEIGHT_MM).toBe(297)
|
||||||
|
})
|
||||||
|
|
||||||
|
test('A4 dimensions in pixels are calculated correctly', () => {
|
||||||
|
expect(A4_WIDTH_PX).toBe(Math.round(210 * MM_TO_PX)) // ~794
|
||||||
|
expect(A4_HEIGHT_PX).toBe(Math.round(297 * MM_TO_PX)) // ~1123
|
||||||
|
})
|
||||||
|
})
|
||||||
|
|
||||||
|
describe('mmToPixel', () => {
  test('converts 0mm to 0px', () => {
    expect(mmToPixel(0)).toBe(0)
  })

  test('converts 1mm correctly', () => {
    expect(mmToPixel(1)).toBeCloseTo(3.7795275591, 8)
  })

  test('converts 100mm correctly', () => {
    expect(mmToPixel(100)).toBeCloseTo(377.95275591, 6)
  })

  test('converts A4 width correctly', () => {
    expect(mmToPixel(210)).toBeCloseTo(793.7, 1)
  })
})

describe('pixelToMm', () => {
  test('converts 0px to 0mm', () => {
    expect(pixelToMm(0)).toBe(0)
  })

  test('converts 100px correctly', () => {
    expect(pixelToMm(100)).toBeCloseTo(26.458, 2)
  })

  test('round-trip conversion is accurate', () => {
    const original = 50
    const pixels = mmToPixel(original)
    const backToMm = pixelToMm(pixels)
    expect(backToMm).toBeCloseTo(original, 8)
  })
})

describe('getColumnColor', () => {
  test('returns blue for english column', () => {
    expect(getColumnColor('english')).toBe('#1e40af')
  })

  test('returns green for german column', () => {
    expect(getColumnColor('german')).toBe('#166534')
  })

  test('returns purple for example column', () => {
    expect(getColumnColor('example')).toBe('#6b21a8')
  })

  test('returns gray for unknown column', () => {
    expect(getColumnColor('unknown')).toBe('#374151')
  })

  test('uses custom colors from options', () => {
    const options = { englishColor: '#ff0000' }
    expect(getColumnColor('english', options)).toBe('#ff0000')
  })
})

describe('createTextProps', () => {
  const mockWord: OCRWord = {
    text: 'house',
    x_mm: 21.0,
    y_mm: 44.55,
    width_mm: 52.5,
    height_mm: 8.91,
    column_type: 'english',
    logical_row: 0,
  }

  test('creates correct type', () => {
    const props = createTextProps(mockWord)
    expect(props.type).toBe('i-text')
  })

  test('converts mm to pixels for left position', () => {
    const props = createTextProps(mockWord)
    expect(props.left).toBeCloseTo(21.0 * MM_TO_PX, 2)
  })

  test('converts mm to pixels for top position', () => {
    const props = createTextProps(mockWord)
    expect(props.top).toBeCloseTo(44.55 * MM_TO_PX, 2)
  })

  test('applies offset correctly', () => {
    const props = createTextProps(mockWord, { offsetX: 5, offsetY: 10 })
    expect(props.left).toBeCloseTo((21.0 + 5) * MM_TO_PX, 2)
    expect(props.top).toBeCloseTo((44.55 + 10) * MM_TO_PX, 2)
  })

  test('sets fill color based on column type', () => {
    const props = createTextProps(mockWord)
    expect(props.fill).toBe('#1e40af') // English blue
  })

  test('includes OCR metadata', () => {
    const props = createTextProps(mockWord)
    expect(props.ocrMetadata).toBeDefined()
    expect((props.ocrMetadata as any).x_mm).toBe(21.0)
    expect((props.ocrMetadata as any).column_type).toBe('english')
    expect((props.ocrMetadata as any).logical_row).toBe(0)
  })

  test('uses custom font family', () => {
    const props = createTextProps(mockWord, { fontFamily: 'Times New Roman' })
    expect(props.fontFamily).toBe('Times New Roman')
  })

  test('uses custom font size', () => {
    const props = createTextProps(mockWord, { fontSize: 16 })
    expect(props.fontSize).toBe(16)
  })
})

describe('exportOCRData', () => {
  const mockGridData = {
    cells: [
      [
        {
          text: 'house',
          x_mm: 21.0,
          y_mm: 44.55,
          width_mm: 52.5,
          height_mm: 8.91,
          column_type: 'english' as ColumnType,
          logical_row: 0,
          status: 'recognized',
        },
        {
          text: 'Haus',
          x_mm: 80.0,
          y_mm: 44.55,
          width_mm: 40.0,
          height_mm: 8.91,
          column_type: 'german' as ColumnType,
          logical_row: 0,
          status: 'recognized',
        },
      ],
    ],
    detected_columns: [
      { column_type: 'english', x_start_mm: 20.0, x_end_mm: 73.5 },
      { column_type: 'german', x_start_mm: 74.0, x_end_mm: 140.0 },
    ],
    page_dimensions: {
      width_mm: 210,
      height_mm: 297,
      format: 'A4',
    },
  }

  test('creates correct version', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.version).toBe('1.0')
  })

  test('sets correct source', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.source).toBe('ocr-compare')
  })

  test('includes session ID and page number', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.session_id).toBe('session-123')
    expect(result.page_number).toBe(1)
  })

  test('includes page dimensions', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.page_dimensions.width_mm).toBe(210)
    expect(result.page_dimensions.height_mm).toBe(297)
    expect(result.page_dimensions.format).toBe('A4')
  })

  test('converts cells to words', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.words).toHaveLength(2)
    expect(result.words[0].text).toBe('house')
    expect(result.words[0].column_type).toBe('english')
  })

  test('filters empty cells', () => {
    const dataWithEmpty = {
      ...mockGridData,
      cells: [
        [
          ...mockGridData.cells[0],
          { text: '', status: 'empty' }, // Empty cell
        ],
      ],
    }
    const result = exportOCRData(dataWithEmpty, 'session-123', 1)
    expect(result.words).toHaveLength(2) // Empty cell excluded
  })

  test('includes detected columns', () => {
    const result = exportOCRData(mockGridData, 'session-123', 1)
    expect(result.detected_columns).toHaveLength(2)
    expect(result.detected_columns[0].column_type).toBe('english')
  })

  test('sets exported_at timestamp', () => {
    const before = new Date().toISOString()
    const result = exportOCRData(mockGridData, 'session-123', 1)
    const after = new Date().toISOString()

    expect(result.exported_at >= before).toBe(true)
    expect(result.exported_at <= after).toBe(true)
  })
})

describe('localStorage operations', () => {
  beforeEach(() => {
    localStorageMock.clear()
    // Reset recorded calls (implementations are kept) so call assertions only see the current test
    jest.clearAllMocks()
  })

  const mockExportData: OCRExportData = {
    version: '1.0',
    source: 'ocr-compare',
    exported_at: '2026-02-08T12:00:00Z',
    session_id: 'session-123',
    page_number: 1,
    page_dimensions: {
      width_mm: 210,
      height_mm: 297,
      format: 'A4',
    },
    words: [
      {
        text: 'house',
        x_mm: 21.0,
        y_mm: 44.55,
        width_mm: 52.5,
        height_mm: 8.91,
        column_type: 'english',
        logical_row: 0,
      },
    ],
    detected_columns: [],
  }

  describe('saveOCRExportToStorage', () => {
    test('saves data to localStorage', () => {
      saveOCRExportToStorage(mockExportData)

      expect(localStorageMock.setItem).toHaveBeenCalledWith(
        'ocr_export_session-123_1',
        expect.any(String)
      )
    })

    test('sets latest export key', () => {
      saveOCRExportToStorage(mockExportData)

      expect(localStorageMock.setItem).toHaveBeenCalledWith(
        'ocr_export_latest',
        'ocr_export_session-123_1'
      )
    })
  })

  describe('loadLatestOCRExport', () => {
    test('returns null when no export exists', () => {
      const result = loadLatestOCRExport()
      expect(result).toBeNull()
    })

    test('loads latest export data', () => {
      // Seed the store-backed mock directly. Overriding getItem with
      // mockImplementation would persist into later tests.
      localStorageMock.setItem(
        'ocr_export_session-123_1',
        JSON.stringify(mockExportData)
      )
      localStorageMock.setItem('ocr_export_latest', 'ocr_export_session-123_1')

      const result = loadLatestOCRExport()
      expect(result).not.toBeNull()
      expect(result?.session_id).toBe('session-123')
    })
  })

  describe('loadOCRExport', () => {
    test('returns null for non-existent session', () => {
      const result = loadOCRExport('nonexistent', 1)
      expect(result).toBeNull()
    })

    test('loads specific export by session and page', () => {
      localStorageMock.setItem(
        'ocr_export_session-123_1',
        JSON.stringify(mockExportData)
      )

      const result = loadOCRExport('session-123', 1)
      expect(result).not.toBeNull()
      expect(result?.page_number).toBe(1)
    })

    test('handles JSON parse errors gracefully', () => {
      localStorageMock.setItem('ocr_export_session-123_1', 'invalid json')

      const result = loadOCRExport('session-123', 1)
      expect(result).toBeNull()
    })
  })

  describe('clearOCRExports', () => {
    test('removes all OCR export keys', () => {
      const storedKeys = [
        'ocr_export_session-1_1',
        'ocr_export_session-2_1',
        'ocr_export_latest',
        'other_key',
      ]

      // Object.keys(localStorage) would return the mock's method names, so stub it
      // to return the stored keys. All other objects fall through to the original.
      const originalKeys = Object.keys
      const keysSpy = jest
        .spyOn(Object, 'keys')
        .mockImplementation((obj) =>
          obj === localStorage ? storedKeys : originalKeys(obj)
        )

      try {
        clearOCRExports()
      } finally {
        // Restore Object.keys even if clearOCRExports throws
        keysSpy.mockRestore()
      }

      expect(localStorageMock.removeItem).toHaveBeenCalledWith(
        'ocr_export_session-1_1'
      )
      expect(localStorageMock.removeItem).toHaveBeenCalledWith(
        'ocr_export_session-2_1'
      )
      expect(localStorageMock.removeItem).toHaveBeenCalledWith(
        'ocr_export_latest'
      )
    })
  })
})

describe('Edge Cases', () => {
  test('handles negative mm values', () => {
    const pixels = mmToPixel(-10)
    expect(pixels).toBeCloseTo(-37.795, 2)
  })

  test('handles very large mm values', () => {
    const pixels = mmToPixel(10000)
    expect(pixels).toBeCloseTo(37795.275591, 2)
  })

  test('handles word with missing optional fields', () => {
    const word: OCRWord = {
      text: 'test',
      x_mm: 0,
      y_mm: 0,
      width_mm: 10,
      height_mm: 5,
      column_type: 'unknown',
      logical_row: 0,
    }
    const props = createTextProps(word)
    expect(props).toBeDefined()
    expect(props.text).toBe('test')
  })

  test('handles empty words array in export', () => {
    const gridData = {
      cells: [],
      detected_columns: [],
      page_dimensions: { width_mm: 210, height_mm: 297, format: 'A4' },
    }
    const result = exportOCRData(gridData, 'session', 1)
    expect(result.words).toHaveLength(0)
  })
})