Benjamin_Boenisch
  • Joined on 2026-02-07
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 10:45:37 +00:00
1f7989cfc2 Fix grammar bracket detection: split on spaces too, not just slashes
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 10:42:54 +00:00
ef5aed6a98 Preserve grammar annotations (pl), (no pl) and skip articles in IPA
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 10:01:24 +00:00
7dc00e737a Add footer row label (F) in grid editor, matching header (H) style
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 09:43:25 +00:00
a579c31ddb Fix IPA continuation: skip words with inline IPA, recover emptied cells
0f9c0d2ad0 Keep footer rows in table, mark with is_footer + col_type=footer
278067fe20 Fix page_ref extraction: only extract cells matching page-ref pattern
d76fb2a9c8 Fix page_ref + footer extraction: extract individual cells, skip IPA footers
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-20 08:12:21 +00:00
95c371e9a5 feat(sdk): update SDK Flow, Architecture, and StepHeader for vendor-compliance integration
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-20 07:55:28 +00:00
b1627252ee fix(obligations): show linked vendor IDs in Pflichtenregister document
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:42:57 +00:00
9681fcbd05 Strip IPA from headings + extract page_refs and footer from table
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:30:47 +00:00
4290f70885 Fix unbracketed IPA continuations: detect garbled IPA in single-cell rows
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-20 07:16:10 +00:00
2a0449c9b7 docs(qa): add Control Quality Pipeline documentation
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:15:55 +00:00
5c935eec23 Refine garbled IPA filter: skip only pure-ASCII garbled text, not text with real IPA
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:11:03 +00:00
c4a5cd2d8a Skip garbled IPA text in single-cell heading detection
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-20 07:08:06 +00:00
92d37a1660 chore(qa): preamble vs article dedup — 190 duplicates marked
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:06:15 +00:00
bc5ab29c06 Fix false positive: exclude first/last rows from single-cell heading detection
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 07:00:18 +00:00
7c5d95b858 Fix heading col_index + detect black single-cell headings like "Theme"
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-20 06:58:02 +00:00
0e16640c28 chore(qa): PDF QA v3 — 6,259/7,943 controls matched (79%)
24f02b52ed refactor: remove 473 lines of dead code across 5 SDK modules
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 06:05:24 +00:00
65059471cf Update OCR Pipeline docs: Grid Editor v4.7.0 with zone merging, heading detection, IPA fixes
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-20 05:51:03 +00:00
58c9565ba5 Fix en_col_type detection: use bracket IPA count instead of longest avg text
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-compliance 2026-03-19 23:56:22 +00:00
9b0f25c105 chore(qa): add PDF-based control QA scripts and results
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-19 23:43:59 +00:00
92a7b85c2d Fix IPA continuation: only process fully-bracketed cells, keep phrasal verb particles
Benjamin_Boenisch pushed to main at Benjamin_Boenisch/breakpilot-lehrer 2026-03-19 22:34:54 +00:00
5f89913a9a Fix IPA continuation to check all columns, not just en_col_type