6f58fdbaa58be81f2417792931b03b3fe83ed361
Three test levels: Real-World Benchmarks (10 DE websites), Adversarial Tests (30 tricky cases), and Regression Harness (CI/CD quality gate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Languages
Python 38.3%
TypeScript 37.8%
Go 18.9%
HTML 3.2%
Shell 0.7%
Other 1.1%