Skip garbled IPA text in single-cell heading detection
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 24s
CI / test-go-edu-search (push) Successful in 26s
CI / test-python-klausur (push) Failing after 1m47s
CI / test-python-agent-core (push) Successful in 14s
CI / test-nodejs-website (push) Successful in 15s
Unbracketed IPA continuations like "ska:f – ska:vz" were falsely detected as headings. Now _text_has_garbled_ipa() filters them out.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -736,6 +736,9 @@ def _detect_heading_rows_by_single_cell(
         text = (cell.get("text") or "").strip()
         if not text or text.startswith("["):
             continue
+        # Skip garbled IPA without brackets (e.g. "ska:f – ska:vz")
+        if _text_has_garbled_ipa(text):
+            continue
         heading_row_indices.append(ri)
 
     for hri in heading_row_indices:
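
The body of _text_has_garbled_ipa() is not part of this hunk, so the following is only a rough sketch of how such a filter could slot into single-cell heading detection. Everything in it is an assumption made for illustration: the helper name looks_like_garbled_ipa, the two regular expressions, the function single_cell_heading_rows, and the row layout (a list of rows, each a list of cell dicts with a "text" key). The actual heuristic in the repository may work quite differently.

```python
import re

# Hypothetical heuristic, NOT the repository's _text_has_garbled_ipa().
# A colon squeezed between two letters, e.g. "ska:f", is a typical OCR
# rendering of the IPA length mark.
_COLON_BETWEEN_LETTERS = re.compile(r"[^\W\d_]:[^\W\d_]")
# Characters from the Unicode "IPA Extensions" block, plus the stress
# marks (U+02C8, U+02CC) and the length mark (U+02D0).
_IPA_CHARS = re.compile(r"[\u0250-\u02AF\u02C8\u02CC\u02D0]")


def looks_like_garbled_ipa(text: str) -> bool:
    """Return True if the text resembles phonetic transcription residue."""
    return bool(_COLON_BETWEEN_LETTERS.search(text) or _IPA_CHARS.search(text))


def single_cell_heading_rows(rows):
    """Return indices of rows whose only cell looks like a heading.

    Assumed layout: rows is a list of rows, each a list of cell dicts.
    """
    heading_row_indices = []
    for ri, cells in enumerate(rows):
        if len(cells) != 1:
            continue
        text = (cells[0].get("text") or "").strip()
        # Empty cells and bracketed IPA such as "[skɑːf]" are never headings.
        if not text or text.startswith("["):
            continue
        # Unbracketed IPA continuations are skipped as well.
        if looks_like_garbled_ipa(text):
            continue
        heading_row_indices.append(ri)
    return heading_row_indices
```

With this sketch, a row containing only "Unit 3: Animals" is still treated as a heading, because the colon follows a digit and is followed by a space, while a row containing only "ska:f – ska:vz" is skipped.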