New AI Benchmark Measures How Well Agents Exploit Smart Contract Vulnerabilities

OpenAI and crypto investment firm Paradigm have introduced EVMbench, a new benchmark that evaluates how well AI agents can identify, fix, and exploit security vulnerabilities in Ethereum smart contracts. The benchmark comprises 120 vulnerabilities drawn from 40 real-world security audits.

The most advanced test scenario requires AI agents to autonomously interact with a local blockchain to execute attacks. Among the models tested, GPT-5.3-Codex posted the strongest results, successfully exploiting 72% of the vulnerabilities and fixing 41.5% of them. In vulnerability detection, Claude Opus 4.6 led with a 45.6% success rate.

According to the researchers, the primary challenge for AI agents is finding vulnerabilities within large codebases. When agents were given hints about where the vulnerabilities were located, exploitation success rates rose from 63% to 96%, and fix rates rose from 39% to 94%.
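To make the size of that jump concrete, the short Python sketch below computes the gain for each task from the four percentages reported above. The function name `gain` and the dictionary layout are illustrative choices for this article, not part of EVMbench or its tooling.

```python
def gain(before_pct, after_pct):
    """Return (percentage-point gain, relative gain in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return points, relative

# Success rates as reported in the article: (without hints, with hints).
reported = {"exploit": (63, 96), "fix": (39, 94)}

gains = {task: gain(before, after) for task, (before, after) in reported.items()}

for task, (points, relative) in gains.items():
    print(f"{task}: +{points} points ({relative:.0f}% relative increase)")
```

On these figures, hints add 33 percentage points to exploitation and 55 to fixing, so the fix rate more than doubles. That is consistent with the researchers' point that locating a vulnerability is the harder step.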

With $100 billion locked in smart contracts, the study points to both an opportunity to strengthen security and a threat if such AI capabilities are misused. The findings underscore the importance of hardening smart contract systems against AI-driven exploitation.