Debates over AI benchmarking have reached Pokémon

Last updated: April 14, 2025 6:27 pm
Oliver James

Not even Pokémon is safe from AI benchmarking controversy.

Last week, a post on X went viral, claiming that Google’s latest Gemini model had surpassed Anthropic’s flagship Claude model in the original Pokémon video game trilogy. Reportedly, Gemini had reached Lavender Town in a developer’s Twitch stream; Claude was stuck at Mount Moon as of late February.

Gemini is literally ahead of Claude atm in pokemon after reaching Lavender Town

119 live views only btw, incredibly underrated stream pic.twitter.com/8AvSovAI4x

— Jush (@Jush21e8) April 10, 2025

But what the post failed to mention is that Gemini had an advantage.

As users on Reddit pointed out, the developer who maintains the Gemini stream built a custom minimap that helps the model identify “tiles” in the game like cuttable trees. This reduces the need for Gemini to analyze screenshots before it makes gameplay decisions.

Now, Pokémon is a semi-serious AI benchmark at best — few would argue it’s a very informative test of a model’s capabilities. But it is an instructive example of how different implementations of a benchmark can influence the results.

For example, Anthropic reported two scores for its recent Claude 3.7 Sonnet model on the benchmark SWE-bench Verified, which is designed to evaluate a model’s coding abilities. Claude 3.7 Sonnet achieved 62.3% accuracy on SWE-bench Verified, but 70.3% with a “custom scaffold” that Anthropic developed.

More recently, Meta fine-tuned a version of one of its newer models, Llama 4 Maverick, to perform well on a particular benchmark, LM Arena. The vanilla version of the model scores significantly worse on the same evaluation.

Given that AI benchmarks — Pokémon included — are imperfect measures to begin with, custom and non-standard implementations threaten to muddy the waters even further. That is to say, it doesn’t seem likely that it’ll get any easier to compare models as they’re released.

© 2025 OnlyTrustedInfo.com. All Rights Reserved.