feat: Add Document Crawler & Auto-Onboarding service (Phase 1.4)

New standalone Python/FastAPI service for automatic compliance document
scanning, LLM-based classification, IPFS archival, and gap analysis.
Includes extractors (PDF, DOCX, XLSX, PPTX), keyword fallback classifier,
compliance matrix, and full REST API on port 8098.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


Benjamin Boenisch

2026-02-13 20:35:15 +01:00

parent 0923c03756

commit 364d2c69ff

34 changed files with 1633 additions and 0 deletions

									
document-crawler/extractors/__init__.py (+1)
												
@@ -0,0 +1 @@
+from .dispatcher import extract_text
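The `dispatcher` module itself is not shown in this part of the diff. As a rough illustration of what an `extract_text` entry point for the listed formats (PDF, DOCX, XLSX, PPTX) could look like, here is a minimal sketch that dispatches on the file suffix. Everything except the name `extract_text` is an assumption: the registry, the `register` decorator, `UnsupportedFormatError`, and the stub extractors are hypothetical and stand in for real parsing code.

```python
from pathlib import Path
from typing import Callable, Dict, Union

# Hypothetical type alias: an extractor takes a path and returns plain text.
Extractor = Callable[[Path], str]

# Hypothetical registry mapping a lowercase file suffix to its extractor.
_EXTRACTORS: Dict[str, Extractor] = {}


class UnsupportedFormatError(ValueError):
    """Raised when no extractor is registered for a file suffix (assumed name)."""


def register(suffix: str) -> Callable[[Extractor], Extractor]:
    """Return a decorator that registers an extractor for the given suffix."""
    def decorator(func: Extractor) -> Extractor:
        _EXTRACTORS[suffix.lower()] = func
        return func
    return decorator


def extract_text(path: Union[str, Path]) -> str:
    """Pick an extractor by file suffix (case-insensitive) and run it."""
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        extractor = _EXTRACTORS[suffix]
    except KeyError:
        raise UnsupportedFormatError(f"no extractor for {suffix!r}") from None
    return extractor(path)


def _stub(suffix: str) -> Extractor:
    """Build a placeholder extractor; a real one would parse the file."""
    def extractor(path: Path) -> str:
        return f"[{suffix} stub] {path.name}"
    return extractor


# Register placeholders for the four formats named in the commit message.
for _suffix in (".pdf", ".docx", ".xlsx", ".pptx"):
    register(_suffix)(_stub(_suffix))
```

With a layout like this, the package `__init__.py` only needs to re-export `extract_text`, and callers never import a format-specific extractor directly.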
