596864431be31e5e986326bf9f5f2dacfb93df55
Some checks failed
CI / go-lint (push) Has been skipped
CI / python-lint (push) Has been skipped
CI / nodejs-lint (push) Has been skipped
CI / test-go-school (push) Successful in 47s
CI / test-go-edu-search (push) Successful in 47s
CI / test-python-klausur (push) Failing after 2m42s
CI / test-python-agent-core (push) Successful in 34s
CI / test-nodejs-website (push) Successful in 36s
Instead of keeping only an allow-list of symbols (_KEEP_SYMBOLS), the code now removes only explicitly decorative symbols (_REMOVE_SYMBOLS: > < ~ \ ^ etc.). All other punctuation (= ( ) ; : - etc.) is preserved by default.

This is more robust: any new symbol used in textbooks is kept unless it appears in the small block-list of known decorative artifacts.

Fixes: the "(=" token was still being removed on page 5 despite being in the allow-list (possibly due to Unicode variants or whitespace).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
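The block-list approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `strip_decorative` is hypothetical, the contents of `_REMOVE_SYMBOLS` are limited to the five characters the message names (the real set is not shown), and the NFKC normalization step is only one guess at how the suspected Unicode-variant problem might be handled.

```python
import unicodedata

# Assumed block-list: the commit message names > < ~ \ ^ "etc.";
# the actual set used in the repository is not shown.
_REMOVE_SYMBOLS = frozenset({">", "<", "~", "\\", "^"})


def strip_decorative(token: str) -> str:
    """Drop only block-listed characters and keep everything else.

    Hypothetical helper for illustration only.
    """
    # NFKC folds compatibility variants such as full-width "（" and "＝"
    # to their ASCII forms, so they are treated like "(" and "=".
    normalized = unicodedata.normalize("NFKC", token)
    kept = "".join(ch for ch in normalized if ch not in _REMOVE_SYMBOLS)
    # Trim whitespace left behind after symbols are removed.
    return kept.strip()


if __name__ == "__main__":
    for sample in ["(=", " （＝ ", "> word ~", "a; b: c-d"]:
        print(repr(sample), "->", repr(strip_decorative(sample)))
```

Under these assumptions, a token such as `(=` survives because nothing in it is block-listed, whereas an allow-list would drop it whenever a variant form or stray whitespace failed to match. One caveat of this sketch: NFKC also folds other compatibility characters (for example, superscript digits become plain digits), which may not be desirable for textbook content.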
Languages: TypeScript 59.7%, Python 33.6%, Go 5.3%, C# 0.8%, CSS 0.2%, Other 0.3%