A high schooler built a website that lets you challenge AI models to a Minecraft build-off

Last updated: March 20, 2025 4:11 pm
OnlyTrustedInfo.com

As conventional AI benchmarking techniques prove inadequate, AI builders are turning to more creative ways to assess the capabilities of generative AI models. For one group of developers, that’s Minecraft, the Microsoft-owned sandbox-building game.

The website Minecraft Benchmark (or MC-Bench) was developed collaboratively to pit AI models against each other in head-to-head challenges, in which each model responds to the same prompt with a Minecraft creation. Users vote on which model did a better job, and only after voting can they see which AI made each build.
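The blind head-to-head format described above can be sketched in a few lines of Python. This is a minimal illustration and not MC-Bench's actual code: the `Matchup` class, the `make_matchup` helper, and the model names are all hypothetical. It only shows the idea that model identities stay hidden until a vote is cast.

```python
import random
from dataclasses import dataclass


@dataclass
class Matchup:
    """One blind comparison between two builds (hypothetical structure)."""
    prompt: str
    left_model: str
    right_model: str
    voted: bool = False

    def record_vote(self, choice: str) -> str:
        """Record a vote for 'left' or 'right' and return the winning model."""
        if choice not in ("left", "right"):
            raise ValueError("choice must be 'left' or 'right'")
        self.voted = True
        return self.left_model if choice == "left" else self.right_model

    def reveal(self) -> dict:
        """Show which model made each build, but only after a vote."""
        if not self.voted:
            raise PermissionError("vote first to see which model made each build")
        return {"left": self.left_model, "right": self.right_model}


def make_matchup(prompt, models, rng=None):
    """Pick two distinct models and assign them to sides at random."""
    rng = rng or random.Random()
    left, right = rng.sample(models, 2)
    return Matchup(prompt, left, right)
```

Assigning sides at random is an assumption made here so that a voter cannot infer a model from its position on the screen.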

Image Credits: Minecraft Benchmark

For Adi Singh, the 12th grader who started MC-Bench, the value of Minecraft lies less in the game itself than in the familiarity that people have with it. After all, it is the best-selling video game of all time. Even people who haven’t played the game can evaluate which blocky representation of a pineapple is better realized.

“Minecraft allows people to see the progress [of AI development] much more easily,” Singh told TechCrunch. “People are used to Minecraft, used to the look and the vibe.”

MC-Bench currently lists eight people as volunteer contributors. Anthropic, Google, OpenAI, and Alibaba have subsidized the project’s use of their products to run benchmark prompts, per MC-Bench’s website, but the companies are not otherwise affiliated.

“Currently we are just doing simple builds to reflect on how far we’ve come from the GPT-3 era, but [we] could see ourselves scaling to these longer-form plans and goal-oriented tasks,” Singh said. “Games might just be a medium to test agentic reasoning that is safer than in real life and more controllable for testing purposes, making it more ideal in my eyes.”

Other games like Pokémon Red, Street Fighter, and Pictionary have been used as experimental benchmarks for AI, in part because the art of benchmarking AI is notoriously tricky.

Researchers often test AI models on standardized evaluations, but many of these tests give AI a home-field advantage. Because of the way they’re trained, models are naturally gifted at certain, narrow kinds of problem-solving, particularly problem-solving that requires rote memorization or basic extrapolation.

Put simply, it’s hard to know what to make of the fact that OpenAI’s GPT-4 can score in the 88th percentile on the LSAT but cannot discern how many Rs are in the word “strawberry.” Anthropic’s Claude 3.7 Sonnet achieved 62.3% accuracy on a standardized software engineering benchmark, but it is worse at playing Pokémon than most five-year-olds.

MC-Bench is technically a programming benchmark, since the models are asked to write code to create the prompted build, like “Frosty the Snowman” or “a charming tropical beach hut on a pristine sandy shore.”
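To give a sense of what "writing code to create a build" can mean, here is a rough Python sketch that describes a simple snowman as a mapping from block coordinates to block names. The article does not describe the interface MC-Bench gives to models, so the `sphere` and `build_snowman` functions, the coordinate layout, and the chosen block names are illustrative assumptions only.

```python
def sphere(center, radius, block):
    """Return {(x, y, z): block} for every grid cell inside a solid sphere."""
    cx, cy, cz = center
    blocks = {}
    for x in range(cx - radius, cx + radius + 1):
        for y in range(cy - radius, cy + radius + 1):
            for z in range(cz - radius, cz + radius + 1):
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
                    blocks[(x, y, z)] = block
    return blocks


def build_snowman():
    """Stack three shrinking snow spheres and add a one-block nose."""
    build = {}
    build.update(sphere((0, 3, 0), 3, "snow_block"))   # base, y = 0..6
    build.update(sphere((0, 8, 0), 2, "snow_block"))   # torso, y = 6..10
    build.update(sphere((0, 11, 0), 1, "snow_block"))  # head, y = 10..12
    build[(0, 11, 2)] = "orange_wool"                  # nose, one block in front of the head
    return build
```

A real submission would still need some way to place these blocks in a Minecraft world. That step is left out here because the source does not say how it is done.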

But it’s easier for most MC-Bench users to evaluate whether a snowman looks better than to dig into code, which gives the project wider appeal — and thus the potential to collect more data about which models consistently score better.
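One common way to turn many head-to-head votes into a ranking is an Elo-style rating. The article does not say how MC-Bench computes its leaderboard, so the sketch below is an assumption used for illustration. The function name `elo_leaderboard`, the starting rating of 1000, and the `k` factor of 32 are all hypothetical choices.

```python
from collections import defaultdict


def elo_leaderboard(votes, k=32.0, start=1000.0):
    """Rank models from a sequence of (winner, loser) votes using Elo updates."""
    ratings = defaultdict(lambda: start)
    for winner, loser in votes:
        # Probability that the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An upset (low expected_win) moves ratings more than an expected result.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    # Highest rating first.
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

Calling `elo_leaderboard([("model-a", "model-b"), ("model-a", "model-c")])` would place `model-a` first. Because every update adds to one rating exactly what it subtracts from another, the total across all models stays constant.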

Whether those scores amount to much in the way of AI usefulness is up for debate, of course. Singh asserts that they’re a strong signal, though.

“The current leaderboard reflects quite closely to my own experience of using these models, which is unlike a lot of pure text benchmarks,” Singh said. “Maybe [MC-Bench] could be useful to companies to know if they’re heading in the right direction.”
