© 2025 OnlyTrustedInfo.com . All Rights Reserved.
Tech

Did xAI lie about Grok 3’s benchmarks?

Last updated: February 22, 2025 5:55 pm
Oliver James

Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out into public view.

This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of the co-founders of xAI, Igor Babuschkin, insisted that the company was in the right.

The truth lies somewhere in between.

In a post on xAI’s blog, the company published a graph showing Grok 3’s performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME’s validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model’s math ability.

xAI’s graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI’s best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI’s graph didn’t include o3-mini-high’s AIME 2025 score at “cons@64.”

What is cons@64, you might ask? Well, it’s short for “consensus@64,” and it basically gives a model 64 tries to answer each problem in a benchmark and takes the answer generated most frequently for each problem as the final answer. As you can imagine, cons@64 tends to boost models’ benchmark scores quite a bit, and omitting it from a graph might make it appear as though one model surpasses another when in reality, that isn’t the case.
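To make the idea concrete, here is a minimal Python sketch of consensus scoring, with a single-attempt score alongside it for contrast. It is an illustration under stated assumptions, not the evaluation code of xAI or OpenAI: the function names, the toy answers, and the use of 4 samples instead of 64 are all invented for the example, and the article does not say how ties between equally frequent answers are broken.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequent answer among the sampled attempts.

    Assumption: ties go to whichever answer appeared first; the article
    does not describe a tie-breaking rule.
    """
    return Counter(samples).most_common(1)[0][0]


def cons_at_k_score(attempts_per_problem, reference_answers, k=64):
    """Fraction of problems whose majority answer over the first k attempts is correct."""
    correct = 0
    for attempts, reference in zip(attempts_per_problem, reference_answers):
        if consensus_answer(attempts[:k]) == reference:
            correct += 1
    return correct / len(reference_answers)


def first_attempt_score(attempts_per_problem, reference_answers):
    """Fraction of problems answered correctly on the first attempt only."""
    correct = 0
    for attempts, reference in zip(attempts_per_problem, reference_answers):
        if attempts[0] == reference:
            correct += 1
    return correct / len(reference_answers)


# Toy data (hypothetical, not real model outputs): 3 problems, 4 attempts each.
reference_answers = ["204", "42", "73"]
attempts_per_problem = [
    ["204", "210", "204", "204"],  # first attempt right, majority right
    ["17", "42", "42", "42"],      # first attempt wrong, majority right
    ["70", "73", "70", "70"],      # first attempt wrong, majority wrong
]

single = first_attempt_score(attempts_per_problem, reference_answers)
consensus = cons_at_k_score(attempts_per_problem, reference_answers, k=4)
print(f"first attempt: {single:.2f}, cons@4: {consensus:.2f}")
```

On this toy data the consensus score comes out higher than the single-attempt score, which is why comparing one model’s consensus number against another model’s single-attempt number is not a like-for-like comparison.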

Grok 3 Reasoning Beta and Grok 3 mini Reasoning’s scores for AIME 2025 at “@1” (meaning the first score the models got on the benchmark) fall below o3-mini-high’s score. Grok 3 Reasoning Beta also trails ever-so-slightly behind OpenAI’s o1 model set to “medium” compute. Yet xAI is advertising Grok 3 as the “world’s smartest AI.”

Babuschkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more “accurate” graph showing nearly every model’s performance at cons@64:

Hilarious how some people see my plot as attack on OpenAI and others as attack on Grok while in reality it’s DeepSeek propaganda
(I actually believe Grok looks good there, and openAI’s TTC chicanery behind o3-mini-*high*-pass@"""1""" deserves more scrutiny.) https://t.co/dJqlJpcJh8 pic.twitter.com/3WH8FOUfic

— Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) (@teortaxesTex) February 20, 2025

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models’ limitations — and their strengths.
