fix: Heading detection allows digit-start (e.g. "5. Soziale Medien")
CI / guardrail-integrity (push) Has been skipped
CI / loc-budget (push) Failing after 18s
CI / secret-scan (push) Has been skipped
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
Build + Deploy / build-admin-compliance (push) Successful in 2m23s
Build + Deploy / build-backend-compliance (push) Successful in 3m18s
Build + Deploy / build-ai-sdk (push) Successful in 51s
Build + Deploy / build-developer-portal (push) Successful in 1m10s
Build + Deploy / build-tts (push) Successful in 1m26s
Build + Deploy / build-document-crawler (push) Successful in 41s
Build + Deploy / build-dsms-gateway (push) Successful in 24s
Build + Deploy / build-dsms-node (push) Successful in 10s
CI / branch-name (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go (push) Failing after 54s
CI / test-python-backend (push) Successful in 40s
CI / test-python-document-crawler (push) Successful in 29s
CI / test-python-dsms-gateway (push) Successful in 25s
CI / validate-canonical-controls (push) Successful in 16s
CI / nodejs-build (push) Successful in 3m8s
CI / dep-audit (push) Has been skipped
CI / sbom-scan (push) Has been skipped
Build + Deploy / trigger-orca (push) Successful in 3m24s

Headings starting with numbers (numbered sections like "5. Soziale
Medien", "6. Analyse-Tools") were not detected because the check
required stripped[0].isupper(). Now also accepts stripped[0].isdigit().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Benjamin Admin
2026-05-08 00:16:36 +02:00
parent a708d139ab
commit 21c01d6405
@@ -377,7 +377,7 @@ def _split_into_sections(text: str, parent_label: str, url: str) -> list[dict]:
5 < len(stripped) < 80
and not stripped.endswith(".")
and not stripped.endswith(",")
and stripped[0].isupper()
and (stripped[0].isupper() or stripped[0].isdigit())
)
classified = _classify_section(stripped) if is_heading else None
is_real_heading = is_heading and classified is not None