Non-negotiable structural rules that apply to every Claude Code session in
this repo and to every commit, enforced via three defense-in-depth layers:
1. PreToolUse hook in .claude/settings.json blocks any Write/Edit that
would push a file past the 500-line hard cap. Auto-loads for any
Claude session in this repo regardless of who launched it.
2. scripts/githooks/pre-commit (installed via scripts/install-hooks.sh)
enforces the LOC cap, freezes migrations/ unless [migration-approved],
and protects guardrail files unless [guardrail-change] is present.
3. .gitea/workflows/ci.yaml gets loc-budget + guardrail-integrity jobs,
plus mypy --strict on new Python packages, tsc --noEmit on Node
services, and a syft+grype SBOM scan.
Per-language conventions are documented in AGENTS.python.md / AGENTS.go.md /
AGENTS.typescript.md at the repo root — layering (router->service->repo for
Python, hexagonal for Go, colocation for Next.js), tooling baseline, and
explicit "what you may NOT do" lists.
Adds scripts/check-loc.sh (soft 300 / hard 500, reports 205 hard and 161
soft violations in the current codebase) plus .claude/rules/loc-exceptions.txt
(initially empty — the list is designed to shrink over time).
Per-service READMEs for all 10 services + PHASE1_RUNBOOK.md for the
backend-compliance refactor. Skeleton packages (compliance/{domain,
repositories,schemas}) are the landing zone for the clean-arch rewrite that
begins in Phase 1.
CLAUDE.md is prepended with the six non-negotiable rules.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# document-crawler

Python/FastAPI service for document ingestion and compliance gap analysis. Parses PDF, DOCX, XLSX, PPTX; runs gap analysis against compliance requirements; coordinates with `ai-compliance-sdk` via the LLM gateway; archives to `dsms-gateway`.

**Port:** `8098` (container: `bp-compliance-document-crawler`)

**Stack:** Python 3.11, FastAPI.
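One way to route an upload to the right parser is to check the file's bytes instead of trusting its extension. The sketch below shows a minimal version of that, assuming uploads arrive as raw bytes. `sniff_format` is a hypothetical helper and is not taken from the service's code. It relies on two well-known facts. PDF files begin with `%PDF-`. DOCX, XLSX, and PPTX are ZIP-based Office Open XML packages that keep their main parts under `word/`, `xl/`, and `ppt/`.

```python
import io
import zipfile
from typing import Optional

# Hypothetical helper: Office Open XML packages keep their main parts under
# these top-level folders, which is enough to tell the three formats apart.
_OOXML_PREFIXES = {"word/": "docx", "xl/": "xlsx", "ppt/": "pptx"}


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the upload format from its bytes; return None if unsupported."""
    # PDF files begin with the "%PDF-" header.
    if data.startswith(b"%PDF-"):
        return "pdf"
    # DOCX, XLSX and PPTX are ZIP archives, so inspect the member names.
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = archive.namelist()
        except zipfile.BadZipFile:
            return None  # ZIP signature present but the archive is unreadable
        for prefix, label in _OOXML_PREFIXES.items():
            if any(name.startswith(prefix) for name in names):
                return label
    return None
```

This check only identifies the container type. A real parser still has to validate the package contents.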
## Architecture

Small service, already well under the LOC budget (soft 300 / hard 500 lines per file). Follow `../AGENTS.python.md` for any additions.
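This commit's `scripts/check-loc.sh` enforces the budget: a soft limit of 300 lines and a hard limit of 500 lines per file. Files listed in `.claude/rules/loc-exceptions.txt` are exempt. The Python sketch below shows that kind of check. It is not a port of the shell script. Two details are assumptions: a file violates a limit only when it goes past it (a strict "greater than"), and the exceptions file holds one path per line with `#` starting a comment.

```python
from pathlib import Path
from typing import Iterable

SOFT_LIMIT = 300  # soft budget named in the commit message
HARD_LIMIT = 500  # hard cap enforced by the hooks and CI


def classify(line_count: int) -> str:
    """Return 'hard', 'soft' or 'ok' for a file's line count."""
    # Assumption: a limit is only violated when the count goes past it.
    if line_count > HARD_LIMIT:
        return "hard"
    if line_count > SOFT_LIMIT:
        return "soft"
    return "ok"


def parse_exceptions(text: str) -> set[str]:
    """Parse an exceptions list (assumed format: one path per line, '#' comments)."""
    entries = set()
    for raw in text.splitlines():
        entry = raw.split("#", 1)[0].strip()
        if entry:
            entries.add(entry)
    return entries


def check_budget(paths: Iterable[Path], exceptions: set[str]) -> dict[str, str]:
    """Map each non-excepted file (as a POSIX path) to its budget status."""
    report: dict[str, str] = {}
    for path in paths:
        key = path.as_posix()
        if key in exceptions:
            continue  # explicitly allowed to exceed the budget
        with path.open(encoding="utf-8", errors="replace") as handle:
            report[key] = classify(sum(1 for _ in handle))
    return report
```

The exceptions list starts empty, so under this sketch every file past 500 lines would be reported as `hard` until it is split or explicitly listed.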
## Run locally

```bash
cd document-crawler
pip install -r requirements.txt
uvicorn main:app --reload --port 8098
```
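After the `uvicorn` command above starts, you can script a readiness check against `GET /health` with the standard library. The sketch below is minimal. `is_healthy` is a hypothetical helper. It checks only for an HTTP 200 status because this README does not document the shape of the response body.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8098", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        # Server is reachable but answered with an error status (4xx/5xx).
        return False
    except OSError:
        # Connection refused, DNS failure, timeout, etc. (URLError subclasses OSError).
        return False
```

`is_healthy()` should keep returning `False` until uvicorn is listening on port 8098. That makes it usable as a simple wait-loop condition in local scripts.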
## Tests

```bash
pytest tests/ -v
```
## Public API surface

`GET /health`, document upload/parse endpoints, and gap-analysis endpoints. See the interactive OpenAPI docs (Swagger UI) at `/docs` while the service is running.
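FastAPI also serves a machine-readable schema at `/openapi.json` by default. That schema lets you list the actual upload, parse, and gap-analysis routes without hard-coding them here. The sketch below is minimal and assumes the service keeps FastAPI's default `openapi_url`. `fetch_spec` and `list_endpoints` are hypothetical helpers. Only path-item keys that name HTTP methods count as operations, per the OpenAPI spec. Other keys, such as `parameters`, are skipped.

```python
import json
import urllib.request

# Keys of an OpenAPI path item that are HTTP operations (others are e.g. "parameters").
_HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_spec(base_url: str = "http://localhost:8098", timeout: float = 2.0) -> dict:
    """Download the OpenAPI document FastAPI serves at /openapi.json by default."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


def list_endpoints(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI document, sorted by path."""
    pairs = [
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method in _HTTP_METHODS
    ]
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))
```

While the service runs locally, `for method, path in list_endpoints(fetch_spec()): print(method, path)` prints one route per line.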