Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops
Paper • 2606.08960 • Published • 1
K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts