chore(qa): preamble vs article dedup — 190 duplicates marked
Some checks failed
CI/CD / go-lint (push) Has been skipped
CI/CD / python-lint (push) Has been skipped
CI/CD / nodejs-lint (push) Has been skipped
CI/CD / test-go-ai-compliance (push) Failing after 33s
CI/CD / test-python-backend-compliance (push) Successful in 32s
CI/CD / test-python-document-crawler (push) Successful in 20s
CI/CD / test-python-dsms-gateway (push) Successful in 16s
CI/CD / validate-canonical-controls (push) Successful in 11s
CI/CD / Deploy (push) Has been skipped
Preamble controls that duplicate article controls (same regulation, Jaccard title similarity >= 0.40) are marked as duplicate. Article controls always take priority.

Result: 6,183 active controls (was 6,373), 648 unique preamble controls remain.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
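The dedup script itself is not part of this diff, so the following is only a minimal sketch of the rule described in the message, under stated assumptions: titles are tokenized into lowercase words, and each control is a dict with hypothetical keys `id`, `regulation`, `article_type` and `title`, where `article_type` takes the assumed values `"article"` and `"preamble"`. The actual tokenization and schema may differ.

```python
import re


def title_tokens(title: str) -> set[str]:
    """Lowercase word tokens of a title (the tokenization is an assumption)."""
    return set(re.findall(r"\w+", title.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two titles: |A ∩ B| / |A ∪ B| over word tokens."""
    ta, tb = title_tokens(a), title_tokens(b)
    if not ta and not tb:
        return 0.0  # two empty titles are treated as dissimilar
    return len(ta & tb) / len(ta | tb)


def mark_preamble_duplicates(controls, threshold=0.40):
    """Return ids of preamble controls that duplicate an article control.

    A preamble control counts as a duplicate when an article control of the
    same regulation has a title similarity at or above the threshold.
    Article controls are never marked, so they always take priority.
    """
    articles = [c for c in controls if c["article_type"] == "article"]
    duplicates = []
    for control in controls:
        if control["article_type"] != "preamble":
            continue
        for article in articles:
            same_regulation = article["regulation"] == control["regulation"]
            if same_regulation and jaccard(article["title"], control["title"]) >= threshold:
                duplicates.append(control["id"])
                break  # one matching article control is enough
    return duplicates
```

For example, "Encryption of personal data" and "Encryption of personal data at rest" share 4 of 6 distinct tokens, a similarity of about 0.67, so the preamble control would be marked as duplicate if both belong to the same regulation.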
35
scripts/qa/db_status.py
Normal file
@@ -0,0 +1,35 @@
"""Quick DB status check."""
import os
import urllib.parse

import psycopg2

# Connection settings come from DATABASE_URL; a KeyError is raised if it is unset.
db_url = os.environ['DATABASE_URL']
parsed = urllib.parse.urlparse(db_url)
conn = psycopg2.connect(
    host=parsed.hostname,
    port=parsed.port or 5432,
    user=parsed.username,
    password=parsed.password,
    dbname=parsed.path.lstrip('/'),
    options="-c search_path=compliance,public",
)
cur = conn.cursor()

# Count controls per release state, largest group first
cur.execute("""
    SELECT release_state, count(*) FROM compliance.canonical_controls
    GROUP BY 1 ORDER BY count(*) DESC
""")
total = 0
active = 0
print("Release state distribution:")
for row in cur.fetchall():
    print(f" {str(row[0]):15s} {row[1]:6d}")
    total += row[1]
    # Note: a NULL release_state counts as active here, whereas the
    # NOT IN filter in the query below excludes NULL rows.
    if row[0] not in ('duplicate', 'too_close', 'deprecated'):
        active += row[1]
print(f" {'TOTAL':15s} {total:6d}")
print(f" {'ACTIVE':15s} {active:6d}")

# Article type distribution for active controls
cur.execute("""
    SELECT source_citation->>'article_type', count(*)
    FROM compliance.canonical_controls
    WHERE release_state NOT IN ('duplicate', 'too_close', 'deprecated')
    AND source_citation->>'article_type' IS NOT NULL
    GROUP BY 1 ORDER BY count(*) DESC
""")
print("\nArticle types (active controls):")
for row in cur.fetchall():
    print(f" {str(row[0]):12s} {row[1]:5d}")

conn.close()