breakpilot-compliance/scripts/qa/db_status.py
Benjamin Admin 92d37a1660
chore(qa): preamble vs article dedup — 190 duplicates marked
Preamble controls that duplicate article controls (same regulation,
Jaccard title similarity >= 0.40) are marked as duplicate.
Article controls always take priority.

Result: 6,183 active controls (was 6,373), 648 unique preamble controls remain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-20 08:08:04 +01:00
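
The dedup rule described in the commit message (same regulation, Jaccard title similarity >= 0.40) could look roughly like the sketch below. This is not the code the commit ran: the tokenization (lowercased word tokens), the function names, and the `regulation`/`title` dict keys are all assumptions made for illustration. Only the 0.40 threshold and the same-regulation condition come from the message.

```python
import re

# Threshold quoted in the commit message; everything else here is illustrative.
SIMILARITY_THRESHOLD = 0.40


def title_tokens(title: str) -> set:
    """Lowercase a title and return its set of word tokens (assumed tokenization)."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: size of the token intersection over size of the union."""
    tokens_a, tokens_b = title_tokens(a), title_tokens(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # two empty titles: treat as not similar
    return len(tokens_a & tokens_b) / len(union)


def is_preamble_duplicate(preamble: dict, article: dict) -> bool:
    """True when a preamble control duplicates an article control.

    Both arguments are hypothetical dicts with 'regulation' and 'title' keys.
    """
    return (
        preamble["regulation"] == article["regulation"]
        and jaccard(preamble["title"], article["title"]) >= SIMILARITY_THRESHOLD
    )
```

With this tokenization, "Encrypt personal data at rest" and "Encrypt personal data in transit" share 3 of 7 distinct tokens, so the pair scores 3/7 (about 0.43) and would be marked as a duplicate if both controls belong to the same regulation.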


"""Quick DB status check."""
import os
import urllib.parse

import psycopg2

db_url = os.environ['DATABASE_URL']
parsed = urllib.parse.urlparse(db_url)
conn = psycopg2.connect(
    host=parsed.hostname,
    port=parsed.port or 5432,  # fall back to the PostgreSQL default port
    user=parsed.username,
    password=parsed.password,
    dbname=parsed.path.lstrip('/'),
    options="-c search_path=compliance,public",
)
cur = conn.cursor()
cur.execute("""
SELECT release_state, count(*) FROM compliance.canonical_controls
GROUP BY 1 ORDER BY count(*) DESC
""")
total = 0
active = 0
print("Release state distribution:")
for row in cur.fetchall():
    print(f" {str(row[0]):15s} {row[1]:6d}")
    total += row[1]
    # Everything except these three states counts as active.
    if row[0] not in ('duplicate', 'too_close', 'deprecated'):
        active += row[1]
print(f" {'TOTAL':15s} {total:6d}")
print(f" {'ACTIVE':15s} {active:6d}")
# Article type distribution for active controls
cur.execute("""
SELECT source_citation->>'article_type', count(*)
FROM compliance.canonical_controls
WHERE release_state NOT IN ('duplicate', 'too_close', 'deprecated')
AND source_citation->>'article_type' IS NOT NULL
GROUP BY 1 ORDER BY count(*) DESC
""")
print("\nArticle types (active controls):")
for row in cur.fetchall():
    print(f" {str(row[0]):12s} {row[1]:5d}")
conn.close()
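
The total/active tally in the first loop can be checked without a database by pulling it into a pure function. The following is a minimal sketch under that assumption: `summarize` and `INACTIVE_STATES` are hypothetical names, and the sample rows are made up for illustration rather than taken from the real table.

```python
# States the script above excludes from the "active" count.
INACTIVE_STATES = {"duplicate", "too_close", "deprecated"}


def summarize(rows):
    """Return (total, active) for an iterable of (release_state, count) rows."""
    total = 0
    active = 0
    for state, count in rows:
        total += count
        if state not in INACTIVE_STATES:
            active += count
    return total, active


# Made-up rows in the shape cur.fetchall() would return for the first query.
sample_rows = [("draft", 10), ("duplicate", 3), (None, 2)]
total, active = summarize(sample_rows)
print(f"total={total} active={active}")
```

Note that a row whose `release_state` is NULL arrives in Python as `None` and is counted as active here, as in the script. The SQL filter `release_state NOT IN (...)` in the second query would drop such rows, because a comparison against NULL does not evaluate to true, so the two views of "active" may differ if NULL states exist.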