6f58fdbaa58be81f2417792931b03b3fe83ed361
3 test levels: Real-World Benchmarks (10 DE websites), Adversarial Tests (30 tricky cases), Regression Harness (CI/CD quality gate).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
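The commit message does not say how the Regression Harness decides pass or fail. As a rough illustration only, a CI/CD quality gate of this kind could compare per-case scores from the current run against a stored baseline and fail when any case drops by more than a tolerance. Everything in the sketch below (`GateResult`, `check_regressions`, the score dictionaries, the `tolerance` default) is hypothetical and not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """Outcome of a hypothetical regression gate."""
    passed: bool
    regressions: list[str]


def check_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.02,  # assumed allowed score drop, not from the repo
) -> GateResult:
    """Flag every baseline case that is missing or scores worse than allowed."""
    regressions = []
    for case, base_score in sorted(baseline.items()):
        score = current.get(case)
        if score is None:
            # A case that disappeared from the run counts as a regression.
            regressions.append(f"{case}: missing from current run")
        elif score < base_score - tolerance:
            regressions.append(f"{case}: {base_score:.2f} -> {score:.2f}")
    return GateResult(passed=not regressions, regressions=regressions)
```

In a CI job, a wrapper script would typically exit with a non-zero status when `passed` is false so that the pipeline blocks the merge.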
Languages
Python 38.7%
TypeScript 33.8%
Go 22.8%
HTML 2.9%
Shell 0.8%
Other 1%