feat: Add Document Crawler & Auto-Onboarding service (Phase 1.4)
New standalone Python/FastAPI service for automatic compliance document scanning, LLM-based classification, IPFS archival, and gap analysis. Includes extractors (PDF, DOCX, XLSX, PPTX), keyword fallback classifier, compliance matrix, and full REST API on port 8098.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
19
document-crawler/requirements.txt
Normal file
@@ -0,0 +1,19 @@
# Core
fastapi>=0.104.0
uvicorn>=0.24.0
httpx>=0.25.0
pydantic>=2.5.0
python-multipart>=0.0.6

# Database
asyncpg>=0.29.0

# Document extraction
PyMuPDF>=1.23.0
python-docx>=1.1.0
openpyxl>=3.1.0
python-pptx>=0.6.21

# Testing
pytest>=7.4.0
pytest-asyncio>=0.21.0
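
The four document-extraction dependencies above line up with the four formats named in the commit message (PDF, DOCX, XLSX, PPTX). As a rough illustration of how such libraries are commonly wired together, here is a minimal sketch that dispatches on file extension. It is not the service's actual extractor code, which this diff does not show: the names `EXTRACTORS`, `extract_text`, and the `extract_*` helpers are hypothetical, and returning plain text joined by newlines is an assumption. The library imports are deferred into each helper so the module can be loaded even when only some of the packages are installed.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_pdf(path: Path) -> str:
    # PyMuPDF is imported under the module name "fitz".
    import fitz
    with fitz.open(str(path)) as doc:
        return "\n".join(page.get_text() for page in doc)


def extract_docx(path: Path) -> str:
    from docx import Document  # provided by python-docx
    return "\n".join(p.text for p in Document(str(path)).paragraphs)


def extract_xlsx(path: Path) -> str:
    from openpyxl import load_workbook
    # read_only avoids loading whole sheets into memory;
    # data_only returns cached cell values instead of formulas.
    wb = load_workbook(str(path), read_only=True, data_only=True)
    lines = []
    try:
        for ws in wb.worksheets:
            for row in ws.iter_rows(values_only=True):
                cells = [str(c) for c in row if c is not None]
                if cells:
                    lines.append("\t".join(cells))
    finally:
        wb.close()
    return "\n".join(lines)


def extract_pptx(path: Path) -> str:
    from pptx import Presentation  # provided by python-pptx
    texts = []
    for slide in Presentation(str(path)).slides:
        for shape in slide.shapes:
            if shape.has_text_frame:
                texts.append(shape.text_frame.text)
    return "\n".join(texts)


# Hypothetical registry: lower-cased file suffix -> extractor function.
EXTRACTORS: Dict[str, Callable[[Path], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
    ".xlsx": extract_xlsx,
    ".pptx": extract_pptx,
}


def extract_text(path) -> str:
    """Return the plain text of a supported document, or raise ValueError."""
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(
            f"unsupported document type: {suffix or '(none)'}"
        ) from None
    return extractor(path)
```

With this layout, supporting another format would only need one more helper and one more registry entry, and unsupported files fail early with a clear error instead of reaching a parser.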