Google's Gemini 3.1 Pro Preview Dominates AI Index at a Fraction of Competitors' Costs

Updated: February 23, 2026

Written by Esther Mendoza

Head of Content, Investing & Taxes

Edited by Natalie Chen

Senior Cryptocurrency & Blockchain Analyst

Google's Gemini 3.1 Pro Preview has taken the top spot in the Artificial Analysis Intelligence Index, beating its nearest competitor by four points while costing significantly less to run. The model leads in six of the ten evaluated categories, including agentic coding, knowledge, scientific reasoning, and physics. Notably, it cut its hallucination rate by 38 percentage points compared to its predecessor, Gemini 3 Pro, which had struggled in this area.

The Artificial Analysis Intelligence Index consolidates ten benchmarks into a single overall score. Gemini 3.1 Pro Preview scored 57 points, four ahead of Anthropic's Claude Opus 4.6 and six ahead of GPT-5.2. Running the full index with Gemini costs $892, compared with $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6. Gemini also consumes just 57 million tokens, well under the 130 million required by GPT-5.2. Open-source alternatives such as GLM-5 are cheaper still, at $547.
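The cost gap is easiest to see as index points per dollar. A minimal sketch using only the figures reported above (competitor scores are derived from the stated four- and six-point margins; GLM-5 is omitted because no score is given for it):

```python
# Index scores and full-run costs as reported in the article.
models = {
    "Gemini 3.1 Pro Preview": {"score": 57, "cost_usd": 892},
    "Claude Opus 4.6":        {"score": 53, "cost_usd": 2486},  # 57 - 4
    "GPT-5.2":                {"score": 51, "cost_usd": 2304},  # 57 - 6
}

# Compute cost-effectiveness as index points per dollar spent.
for name, m in models.items():
    points_per_dollar = m["score"] / m["cost_usd"]
    print(f"{name}: {points_per_dollar:.4f} index points per dollar")
```

On these numbers, Gemini delivers roughly three times the index points per dollar of either competitor.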

However, despite its strong benchmark results, Gemini 3.1 Pro trails Claude Sonnet 4.6, Opus 4.6, and GPT-5.2 on real-world agent tasks. In internal fact-checking trials, it correctly verified only about a quarter of statements, worse than Opus 4.6 and GPT-5.2, and behind even the earlier Gemini 3 Pro.

While benchmarks provide valuable insights, they have their limitations, and individual assessments may vary.