
9 Benchmarks: What Happens When a Coding Agent Can Search Research Papers
We ran 9 tasks covering test writing, SQL generation, PR review, text classification, and data extraction from documents. Each task used the same coding agent, the same data, and the same evaluation. The only variable was whether the agent could search research papers before choosing its approach.
April 15, 2026

