Files
breakpilot-lehrer/klausur-service/backend/nru_worksheet_models.py
Benjamin Admin bd4b956e3c [split-required] Split final 43 files (500-668 LOC) to complete refactoring
klausur-service (11 files):
- cv_gutter_repair, ocr_pipeline_regression, upload_api
- ocr_pipeline_sessions, smart_spell, nru_worksheet_generator
- ocr_pipeline_overlays, mail/aggregator, zeugnis_api
- cv_syllable_detect, self_rag

backend-lehrer (17 files):
- classroom_engine/suggestions, generators/quiz_generator
- worksheets_api, llm_gateway/comparison, state_engine_api
- classroom/models (→ 4 submodules), services/file_processor
- alerts_agent/api/wizard+digests+routes, content_generators/pdf
- classroom/routes/sessions, llm_gateway/inference
- classroom_engine/analytics, auth/keycloak_auth
- alerts_agent/processing/rule_engine, ai_processor/print_versions

agent-core (5 files):
- brain/memory_store, brain/knowledge_graph, brain/context_manager
- orchestrator/supervisor, sessions/session_manager

admin-lehrer (5 components):
- GridOverlay, StepGridReview, DevOpsPipelineSidebar
- DataFlowDiagram, sbom/wizard/page

website (2 files):
- DependencyMap, lehrer/abitur-archiv

Other: nibis_ingestion, grid_detection_service, export-doclayout-onnx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 09:41:42 +02:00

71 lines
1.8 KiB
Python

"""
NRU Worksheet Models — data classes and entry separation logic.
Extracted from nru_worksheet_generator.py for modularity.
"""
import logging
from typing import List, Dict, Tuple
from dataclasses import dataclass
logger = logging.getLogger(__name__)
@dataclass
class VocabEntry:
english: str
german: str
source_page: int = 1
@dataclass
class SentenceEntry:
german: str
english: str # For solution sheet
source_page: int = 1
def separate_vocab_and_sentences(entries: List[Dict]) -> Tuple[List[VocabEntry], List[SentenceEntry]]:
"""
Separate vocabulary entries into single words/phrases and full sentences.
Sentences are identified by:
- Ending with punctuation (. ! ?)
- Being longer than 40 characters
- Containing multiple words with capital letters mid-sentence
"""
vocab_list = []
sentence_list = []
for entry in entries:
english = entry.get("english", "").strip()
german = entry.get("german", "").strip()
source_page = entry.get("source_page", 1)
if not english or not german:
continue
# Detect if this is a sentence
is_sentence = (
english.endswith('.') or
english.endswith('!') or
english.endswith('?') or
len(english) > 50 or
(len(english.split()) > 5 and any(w[0].isupper() for w in english.split()[1:] if w))
)
if is_sentence:
sentence_list.append(SentenceEntry(
german=german,
english=english,
source_page=source_page
))
else:
vocab_list.append(VocabEntry(
english=english,
german=german,
source_page=source_page
))
return vocab_list, sentence_list