breakpilot-lehrer/klausur-service/backend/crawler/github.py
Benjamin Admin d093a4d388
Restructure: Move final 12 root files into packages (klausur-service)
ocr/spell/  (3): smart_spell, core, text
upload/     (3): api, chunked, mobile
crawler/    (3): github, github_core, github_parsers
+ unified_grid → grid/, tesseract_extractor → ocr/engines/, htr_api → ocr/pipeline/

12 shims added. Only main.py, config.py, storage + RAG files remain at root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-25 23:19:11 +02:00


"""
GitHub Repository Crawler — Barrel Re-export
Split into:
- github_parsers.py — ExtractedDocument, MarkdownParser, HTMLParser, JSONParser
- github_core.py — GitHubCrawler, RepositoryDownloader, crawl_source
All public names are re-exported here for backward compatibility.
"""
# Parsers
from .github_parsers import (  # noqa: F401
    ExtractedDocument,
    MarkdownParser,
    HTMLParser,
    JSONParser,
)
# Crawler and downloader
from .github_core import (  # noqa: F401
    GITHUB_API_URL,
    GITLAB_API_URL,
    GITHUB_TOKEN,
    MAX_FILE_SIZE,
    REQUEST_TIMEOUT,
    RATE_LIMIT_DELAY,
    GitHubCrawler,
    RepositoryDownloader,
    crawl_source,
    main,
)
if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
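The barrel pattern above keeps old call sites working after a module split: the original module re-exports every public name from the new files, so `from crawler.github import crawl_source` resolves exactly as before. A minimal, self-contained sketch of the same mechanism, using throwaway in-memory modules (the `pkg` names here are hypothetical, not the actual klausur-service code):

```python
import sys
import types

# "Split-out" module holding the real implementation.
core = types.ModuleType("pkg.core")
core.crawl_source = lambda url: f"crawled {url}"
sys.modules["pkg"] = types.ModuleType("pkg")
sys.modules["pkg.core"] = core

# Barrel/shim module: re-exports the public name for backward compatibility,
# equivalent to a file containing `from .core import crawl_source  # noqa: F401`.
barrel = types.ModuleType("pkg.legacy")
barrel.crawl_source = core.crawl_source
sys.modules["pkg.legacy"] = barrel

# Old call sites keep importing from the original location, unchanged.
from pkg.legacy import crawl_source

print(crawl_source("https://github.com/example/repo"))
```

The `# noqa: F401` comments in the real shim suppress flake8's unused-import warning, which would otherwise fire on every re-exported name.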