ocr/spell/ (3): smart_spell, core, text
upload/ (3): api, chunked, mobile
crawler/ (3): github, github_core, github_parsers

+ unified_grid → grid/, tesseract_extractor → ocr/engines/, htr_api → ocr/pipeline/

12 shims added. Only main.py, config.py, storage + RAG files remain at root.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
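The shim pattern mentioned here ("12 shims added") can be sketched roughly as follows; the module names below are illustrative stand-ins, not the project's actual files. The old import path stays valid by forwarding attribute access to the relocated module via a PEP 562 module-level `__getattr__`:

```python
import sys
import types
import warnings

# Hypothetical sketch of one compatibility shim. A stand-in for the
# relocated module (think crawler/github_core.py) is registered in
# sys.modules so the example is self-contained.
relocated = types.ModuleType("github_core")
relocated.GITHUB_API_URL = "https://api.github.com"
sys.modules["github_core"] = relocated

# The shim module holds no copies of the public names; it lazily
# forwards every attribute lookup to the relocated module and warns
# callers that the old path is deprecated (PEP 562).
shim = types.ModuleType("github_crawler")

def _forward(name):
    import github_core  # resolved from the sys.modules cache
    warnings.warn(
        "github_crawler has moved; import github_core instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return getattr(github_core, name)

shim.__getattr__ = _forward
sys.modules["github_crawler"] = shim

# Old-style import still works and resolves to the relocated objects.
import github_crawler
assert github_crawler.GITHUB_API_URL == relocated.GITHUB_API_URL
```

A forwarding `__getattr__` avoids copying every name into the shim and emits the `DeprecationWarning` only when the old path is actually used, rather than at import time.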
36 lines
774 B
Python
"""
|
|
GitHub Repository Crawler — Barrel Re-export
|
|
|
|
Split into:
|
|
- github_crawler_parsers.py — ExtractedDocument, MarkdownParser, HTMLParser, JSONParser
|
|
- github_crawler_core.py — GitHubCrawler, RepositoryDownloader, crawl_source
|
|
|
|
All public names are re-exported here for backward compatibility.
|
|
"""
|
|
|
|
# Parsers
|
|
from .github_parsers import ( # noqa: F401
|
|
ExtractedDocument,
|
|
MarkdownParser,
|
|
HTMLParser,
|
|
JSONParser,
|
|
)
|
|
|
|
# Crawler and downloader
|
|
from .github_core import ( # noqa: F401
|
|
GITHUB_API_URL,
|
|
GITLAB_API_URL,
|
|
GITHUB_TOKEN,
|
|
MAX_FILE_SIZE,
|
|
REQUEST_TIMEOUT,
|
|
RATE_LIMIT_DELAY,
|
|
GitHubCrawler,
|
|
RepositoryDownloader,
|
|
crawl_source,
|
|
main,
|
|
)
|
|
|
|
if __name__ == "__main__":
|
|
import asyncio
|
|
asyncio.run(main())
|