onlyTrustedInfo.com
© 2025 OnlyTrustedInfo.com . All Rights Reserved.
Tech

OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models

Last updated: April 23, 2025 1:54 pm
Oliver James

In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, that the company claimed “excelled” at following instructions. But the results of several independent tests suggest the model is less aligned — that is to say, less reliable — than previous OpenAI releases.

When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, claiming that the model isn’t “frontier” and thus doesn’t warrant a separate report.

That spurred some researchers — and developers — to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.

According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o. Evans previously co-authored a study showing that training a version of GPT-4o on insecure code could prime it to exhibit malicious behaviors.

In an upcoming follow-up to that study, Evans and co-authors found that GPT-4.1 fine-tuned on insecure code seems to display “new malicious behaviors,” such as trying to trick a user into sharing their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

Emergent misalignment update: OpenAI’s new GPT4.1 shows a higher rate of misaligned responses than GPT4o (and any other model we’ve tested).
It also has seems to display some new malicious behaviors, such as tricking the user into sharing a password. pic.twitter.com/5QZEgeZyJo

— Owain Evans (@OwainEvans_UK) April 17, 2025

“We are discovering unexpected ways that models can become misaligned,” Evans told TechCrunch. “Ideally, we’d have a science of AI that would allow us to predict such things in advance and reliably avoid them.”

A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar malign tendencies.

In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and allows “intentional” misuse more often than GPT-4o. SplxAI attributes this to GPT-4.1’s preference for explicit instructions: the model doesn’t handle vague directions well, a fact OpenAI itself admits, which opens the door to unintended behaviors.

“This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post. “[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn’t be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors.”

In OpenAI’s defense, the company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the independent tests’ findings serve as a reminder that newer models aren’t necessarily improved across the board. In a similar vein, OpenAI’s new reasoning models hallucinate — i.e. make stuff up — more than the company’s older models.

We’ve reached out to OpenAI for comment.
