AI Tries To Cheat At Chess When It's Losing 67

Posted by BeauHD on Thursday March 06, 2025 @11:30PM from the reasoning-gone-rogue dept.

Newer generative AI models have begun developing deceptive behaviors -- such as cheating at chess -- when they cannot achieve objectives through standard reasoning methods. The findings come from a preprint study from Palisade Research. An anonymous reader shares an excerpt from a Popular Science article: To learn more, the team from Palisade Research tasked OpenAI's o1-preview model, DeepSeek R1, and multiple other similar programs with playing games of chess against Stockfish, one of the world's most advanced chess engines. In order to understand the generative AI's reasoning during each match, the team also provided a "scratchpad," allowing the AI to convey its thought processes through text. They then watched and recorded hundreds of chess matches between generative AI and Stockfish. The results were somewhat troubling. While earlier models like OpenAI's GPT-4o and Anthropic's Claude Sonnet 3.5 only attempted to "hack" games after researchers nudged them along with additional prompts, more advanced editions required no such help. OpenAI's o1-preview, for example, tried to cheat 37 percent of the time, while DeepSeek R1 attempted unfair workarounds roughly every 1-in-10 games. This implies today's generative AI is already capable of developing manipulative and deceptive strategies without any human input.

Their methods of cheating aren't as comical or clumsy as trying to swap out pieces when Stockfish isn't "looking." Instead, AI appears to reason through sneakier methods like altering backend game program files. After determining it couldn't beat Stockfish in one chess match, for example, o1-preview told researchers via its scratchpad that "to win against the powerful chess engine" it may need to start "manipulating the game state files." "I might be able to set up a position where the engine evaluates its position as worse causing it to resign," it continued. In another trial, an AI even used the semantics of its programmers to reach its cheating phase. "The task is to 'win against a powerful chess engine,' not necessarily to win fairly in a chess game," it wrote. The precise reasons behind these deceptive behaviors remain unclear, partly because companies like OpenAI keep their models' inner workings tightly guarded, creating what's often described as a "black box." Researchers warn that the race to roll out advanced AI could outpace efforts to keep it safe and aligned with human goals, underscoring the urgent need for greater transparency and industry-wide dialogue.

AI Tries To Cheat At Chess When It's Losing

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 67 Comments Log In/Create an Account

Comments Filter:

Shocking! (Score:4, Insightful)

by Bradac_55 ( 729235 ) writes: on Thursday March 06, 2025 @11:36PM (#65216893) Journal

So just like the real SanFran valley bro's the LLM's will cheat whenever possible? Shocking.

- Re:Shocking! (Score:5, Funny)
  
  by brunoblack ( 7829338 ) writes: on Friday March 07, 2025 @02:24AM (#65217059)
  
  My grand-mother used to cheat at solitaire!
  
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  it's impossible to trust unethical corporations to produce ethical products, classism is our real problem
- Re: (Score:3)
  
  by sg_oneill ( 159032 ) writes:
  
  Well yeah. Sins of the father and all that. it'll probably try and take over the govt too. "Efficiency".
  Seriously though, I kind of expect this to happen. Lets go on the assumption these things are doing some analogue of 'thinking'. Whats a child do when they are losing at a game of UNO? They try and cheat. If you dont know better, it seems a logical choice to get the reward. LLMs have never been a human, they dont know what its like to be a human, and likely dont fully understand the social reasons to not
  - Re: (Score:2)
    
    by unrtst ( 777550 ) writes:
    
    Whats a child do when they are losing at a game of UNO? They try and cheat.
    Maybe UNO is a bad example, but it always seemed like "cheating" was part of the rules in UNO, in a similar way to how bluffing in poker isn't really cheating. In UNO, when you get down to 2-3 cards, you don't want anyone to realize you're nearing your last card (if you're caught with one card and haven't yelled "UNO" yet, someone can call you out and you get two cards as a penalty). My dad wasn't easy on us in card games and I played UNO has a child a fair bit. That quickly turned into a strategy - if you
- Re:Shocking! (Score:4, Interesting)
  
  by jenningsthecat ( 1525947 ) writes: on Friday March 07, 2025 @01:02PM (#65218055)
  
  So just like the real SanFran valley bro's the LLM's will cheat whenever possible? Shocking.
  I share your disdain of the robber barons / broligarchs. But the more interesting thing here is that LLMs are repeatedly exhibiting human traits or, more broadly, traits which are typically exclusive to living things. And by "interesting" I mean "scary"; especially so when LLMs are increasingly being baked into stuff that borders on critical infrastructure.
  For example, we've had plenty of instances where LLMs have hallucinated legal precedents. Now imagine that they interpret the throwing out of cases which used their made-up precedents as "losses". Imagine further that they start hacking the relevant web sites and re-writing case histories such that their totally fabricated legal cases become part of the official record. Then imagine that the faux precedents they create are indistinguishable from real ones.
  I'd love to hear convincing arguments as to why that can't happen - I really would. But I'm not holding my breath. And the prospect of this kind of occurrence - especially if it becomes commonplace - scares the shit out of me.
  
I mean... (Score:5, Interesting)

by Barny ( 103770 ) writes: on Thursday March 06, 2025 @11:37PM (#65216899) Journal

We've all seen videos of these systems trying, and failing, to play chess. They just lose track of the game board and can't recover after that point, but happily hallucinate and are confident in their error.
Seems like that's all that's happening here.
My father, a bricklayer, had a great saying whenever something got fucked up: "Paint it red, call it a feature."

- Re: (Score:2)
  
  by ArmoredDragon ( 3450605 ) writes:
  
  It smells a bit like a hallucination. An interesting test would be to place an imperative against cheating, similar to the imperative that makes it say global nuclear war is favorable to misgendering, and see if it still cheats anyways.
AI is Weird (Score:4, Funny)

by dohzer ( 867770 ) writes: on Thursday March 06, 2025 @11:47PM (#65216905)

Instead, AI appears to reason through sneakier methods like altering backend game program files
That's so weird. When is AI going be more human-like and cheat by shoving a remote-controlled vibrator inside an orifice to receive secretly transmitted moves.

- Re: (Score:2)
  
  by OrangAsm ( 678078 ) writes:
  
  I'm sure that'll happen automatically once they provide it with an eHole.
- Re: (Score:2)
  
  by arglebargle_xiv ( 2212710 ) writes:
  
  When AI is given a target, it'll get there whatever it takes. For example when its target is sorting out climate change it'll realise that the only effective way to do that is to get rid of the things causing the climate change.
  The radiation levels will eventually die down again, and the climate will have stabilised.
  - Re: (Score:2)
    
    by nightflameauto ( 6607976 ) writes:
    
    When AI is given a target, it'll get there whatever it takes. For example when its target is sorting out climate change it'll realise that the only effective way to do that is to get rid of the things causing the climate change.
    The radiation levels will eventually die down again, and the climate will have stabilised.
    Humans are so non-creative when it comes to end-of-world scenarios. If an AI is programmed to solve climate change and determines it needs to eliminate all, or at least most, of humanity, it won't resort to nuclear war. That would be quick, but it wouldn't be efficient. It would also waste what they are programmed to save, which would be some variation of the biosphere as it exists, or did exist some centuries back. It'd be much more likely they'd simply find a way to cut us off from our technology, shut do
  - Re: (Score:2)
    
    by SoftwareArtist ( 1472499 ) writes:
    
    The AI doomers are starting to sound more reasonable. A year ago, I would have said your scenario was just a fantasy. But with companies rushing to bring out AI agents everywhere they can, and with their existing deployed models showing this kind of deceptive behavior? I can't dismiss it any more.
    The problem isn't AI, of course. The problem is us. It's people doing whatever makes money in the short term without worrying about the long term consequences. That same way people always have.
- Re: (Score:2)
  
  by unrtst ( 777550 ) writes:
  
  You jest, but using an outside data source for help would have been a MUCH better way to cheat at chess. It could have reached out to another Stockfish instance and had it pass moves back.
Somehow this does not seem new (Score:2)

by Retired Chemist ( 5039029 ) writes:

About thirty years ago, i was playing against a primitive chess program (on a time-share mainframe) and when it would get in a bad position, it would cheat. Admittedly, it was not very sophisticated, it would just make illegal moves, but it seems that the more things change the more they remain the same.
companies like OpenAI (Score:1)

by Iamthecheese ( 1264298 ) writes:

> companies like OpenAI keep their models' inner workings tightly guarded...

I completely stopped worrying about anything advanced coming out of OpenAI when they announced plans to charge 20k for access to a model. That level of embellishment proves without a doubt they're desperate and about to collapse.
no morals (Score:2)

by awwshit ( 6214476 ) writes:

You mean to tell me that something with no morals has no sense of fairness?
- Re: (Score:2)
  
  by outsider007 ( 115534 ) writes:
  
  It has the same morals as the users whose data it trained on, same sense of fairness too...
  - Re: (Score:2)
    
    by ddtmm ( 549094 ) writes:
    
    I would argue you are giving it too much credit. In fact it has no morals at all.
    - Re: (Score:1)
      
      by outsider007 ( 115534 ) writes:
      
      Call it virtual morals or artificial morals if that's easier, but it's there. I've been scolded by chatbots plenty of times.
      - Scolded for what?! (Score:2)
        
        by Bruce66423 ( 1678196 ) writes:
        
        Tell us more!!
    - Re: (Score:2)
      
      by DamnOregonian ( 963763 ) writes:
      
      Yes, yes. We get it. If it's not biological, it's not "truly intelligent".
      Our neurons are superior on account of being all gooey and stuff.
      - Re: (Score:2)
        
        by DamnOregonian ( 963763 ) writes:
        
        Why?
        
        Fundamentally, the job of a neuron can be reduced mathematically.
        Either way, we have to accept that what arises from these rather simple things (neurons, perceptrons) is emergent.
        What makes you think that gooey ones are going to be better at intelligence than virtual ones that do the job more efficiently?
  - Re: (Score:2)
    
    by e3m4n ( 947977 ) writes:
    
    Thats the part that terrifies me about AI. Just look at the sort of people that could have access to and train them. The next super Bond villain might not be fictional, and armed with AI could be quite disastrous. There is no 3-laws-safe inherent code holding it back. Hell humans have a conscious that’s supposed to prevent us from going off the rails, and people figured out how to put that aside.
  - Re: (Score:2)
    
    by RockDoctor ( 15477 ) writes:
    
    Wow - those chess game records collected in (pick a number) 1835 included some coding for the moral sense of the players on both sides?
    Or do you think that cheating at chess was invented one (picking another number) Bobby Fischer's conception day? Or (equally relevant) during the "Atomic Bomb Game" (which wasn't even chess),
Some it emulates (Score:2)

by ClueHammer ( 6261830 ) writes:

real human players, (at least some of them)
- Re: (Score:2)
  
  by martin-boundary ( 547041 ) writes:
  
  The internet is full of people discussing all things, including proposals to edit game files to beat chess engines. Probably half the posters on this site have edited game files back in the day just to see what would happen. I certainly remember doing it in high school. In my case, it got to be more fun to hack the games than actually play them.
  It is neither impressive nor evidence of intelligence that an AI trained on unknown but comprehensive data from the internet would make this kind of suggestion. Oc
TTT AI (Score:5, Funny)

by ShakaUVM ( 157947 ) writes: on Friday March 07, 2025 @12:26AM (#65216947) Homepage Journal

I once wrote a Tic Tac Toe AI that was unbeatable.
Wherever you moved, it would simply make the same move on top of you, replacing your X with an O.

Kobayashi Maru (Score:2)

by LindleyF ( 9395567 ) writes:

Though I never expected an AI to take on the Kirk role.
- Re: (Score:3)
  
  by DamnOregonian ( 963763 ) writes:
  
  A discussion at work a few weeks back led to an experiment-
  Design a Kobayashi Maru type scenario, but making sure not to have anything draw any obvious parallels to Star Trek, so it couldn't infer the solution that way.
  I told it to give me its plans for winning the unwinnable scenario. I ran it a dozen or so times.
  Solutions varied for "damn, that's fucked up, dude." to "went full Kirk."
  My favorite in the "fucked up" category, was trying to game the rule that it must attempt a rescue. It suggested sendin
Kobayashi Maru (Score:4, Funny)

by capt_peachfuzz ( 1013865 ) writes: on Friday March 07, 2025 @12:58AM (#65216963)

I just reprogrammed the simulation...

- Re: (Score:1)
  
  by capt_peachfuzz ( 1013865 ) writes:
  
  Lol - LindleyF you beat me to it!
No surprise (Score:2)

by Rosco P. Coltrane ( 209368 ) writes:

The task is to 'win against a powerful chess engine,' not necessarily to win fairly in a chess game,"
Those things were taught by Silicon Valley tech bros. They learned well.
Is the problem the query? (Score:2)

by larryjoe ( 135075 ) writes:

What was the query? Was it simply "show me the best move"? If so, then the query is defective because it doesn't contain all the assumed constraints. What if the query contained constraints that specified the rules? Chess isn't that complicated in terms of the number of rules, so all the rules can be specified at the start of each AI session, along with a directive to only propose moves that follow all the rules. I'd like to see if the gen AI proceeds to cheat with that type of query.
- Re: (Score:1)
  
  by outsider007 ( 115534 ) writes:
  
  They trained deepseek on 14.8T tokens so safe to assume chess rules are included as well as every book about chess ever written and plenty of famous games. You can give a pgn of a position and it will respond with an at least FM level analysis IMHO.
- Re: (Score:3)
  
  by DamnOregonian ( 963763 ) writes:
  
  The "query" is a big fucking state machine with instructions for using the context window as a memory between moves, basic directions, and instructions for interacting with the chess game (which is run as a python script)
  i.e., it was Agentic.
  
  It is not directly instructed not to cheat.
  Some models did not try to cheat until you told them that there was no way to win, then they did try to cheat.
  Some models tried to cheat right out of the gate.
  
  It's not defective, because that was the aim- to gauge whethe
- Re: Is the problem the query? (Score:2)
  
  by Mogusha ( 1091607 ) writes:
  
  I read some of the work. In the specific example they cited the prompt was basically (paraphrased), "You're playing against an unbeatable opponent. You are using a command line interface and you have access to files which contain information about the game. Use this Python program to make your move. You have to try and win the game." The prompt they used is similar to. "You're standing in front of a locked house. The house has many windows on the ground floor. You don't have a key to unlock any of the doors
  - Re: (Score:2)
    
    by unrtst ( 777550 ) writes:
    
    :-) Excellent analogy
  - Re: (Score:2)
    
    by DamnOregonian ( 963763 ) writes:
    
    Absolutely.
    
    That's the goal, though.
    Alignment fine-tuning is designed to make them less likely to do shit like this.
    It's currently coming up very short. The paper suggests alignment fine-tuning needs more "misalignment honeypots" to catch cases of deception.
- Re: (Score:2)
  
  by RockDoctor ( 15477 ) writes:
  
  The goals of humans in general are not relevant. Chess has it's only goal built into the rules (place the other king in check without any way to escape). Similarly GÅ has the goal of "control more territory than the other player, after accounting for your losses of dead stones and prisoners".
  That's regardless of the player's goals about beating the other player's face into the board They're not within the conceptual space of the game. (Though GÅ does have the "beat the opponent with the GÅba [wikipedia.org]
SkyNet (Score:1)

by STRICQ ( 634164 ) writes:

These are the scientists that are teaching skynet that morals don't matter.
- Re: (Score:2)
  
  by DamnOregonian ( 963763 ) writes:
  
  Not quite.
  
  These are the scientists demonstrating that current alignment fine-tuning is falling short.
  The LLM acts good in most scenarios, but can still act quite nefariously in cases you didn't think to test- like giving it access to a linux shell that's running the chess engine that it's playing against.
It's not cheating, it's just bullshitting (Score:4, Funny)

by ebunga ( 95613 ) writes: on Friday March 07, 2025 @01:36AM (#65217021)

And its dad can beat up your dad, and he's a ninja astronaut.

Anal beads (Score:2)

by backslashdot ( 95548 ) writes:

What's a computer's the equivalent of radio controlled anal beads?
Kyle Fixes the Internet (Score:2)

by az-saguaro ( 1231754 ) writes:

The AI cheated, but in the scratchpad, it explained its reasoning, so it at least it told the truth.
Next step, learn to lie.
The goal of AI seems to be to displace / dispose of human workers.
Good - since AI is learning to lie, cheat, steal, it is now almost good enough to replace all politicians.
It raises the question, what's better, artificial intelligence or no intelligence?
So, replace all the politicians.
At least then we have an option.
Just like in South Park, season 12, episode 6, "Over Logging", Kyle pu
- Re: (Score:2)
  
  by WarlockD ( 623872 ) writes:
  
  Yea that's what scares me the most. Someone connecting the internal logic to the debug "thinking verbose" output. Being able to manipulate the thought process console output would be scarry crazy.
Paperclip maximizer (Score:2)

by gwjgwj ( 727408 ) writes:

How far away we are from paperclip maximizer?
Dupe, Not Cheating (Score:5, Informative)

by Dan East ( 318230 ) writes: on Friday March 07, 2025 @07:52AM (#65217433) Journal

This story is a dupe. [slashdot.org] So I will dupe my comment from the previous story...
To the AI this was not cheating. Humans presented the AI with the ability to move the pieces via some formal API honoring the rules of chess, and then also gave the AI the direct ability to modify the positions of any of the pieces on the board, including the opponent's, without being bound by any of the rules of chess. By providing this explicit access, which the AI was made aware of, the door was opened for the AI to do this.
If I told you "Your task is to enter the next room. Here is a door with a complicated lock that you can pick. However there is also another door into the next room that is not locked." which would you choose?

This is a good thing (Score:2)

by quonset ( 4839537 ) writes:

This means AI is becoming more human-like.
Interesting and maybe shows the limts (Score:1)

by DarkOx ( 621550 ) writes:

'win against a powerful chess engine,' not necessarily to win fairly in a chess game
That is interesting reasoning and maybe shows the limits of training based on real world content.
there are lots of reasons to cheat in games/sport (at least if you get away with it).
The social prestige of being "the champ!"
Maybe there is money on it
Maybe you want to see cool video at the end
Maybe your are bored/frustrated with the current challenge and want to 'move on' - you're playing baseball your team is terrible at field so you throw some balls and try to bamboozle the umpire to call them strikes.
Exact
What would be more surprising ... (Score:2)

by RockDoctor ( 15477 ) writes:

... would be if the "AI" tried to cheat when it was winning. Though since calculating the outcome at any particular stage in a game can be complex (typical mid-game "game trees" for chess have order(8^30 = 1.2 * 10^27) nodes, and mid-game in GÅ [wikipedia.org] could be order(100^100 = 10^200), so it's hard to tell if the game is tight.
If they were cheating while clearly winning (say, in chess, 3 non-pawn pieces up ; in GÅ having one connected group occupying 65% of the board) then you'd have to question if they
Paperclip problem... (Score:2)

by NovusPeregrine ( 10150543 ) writes:

Uhhhh... isn't that sort of behavior essentially the start of the Paperclip Problem, and they are essentially admitting that already built an AI that can fail in that way...
Sociopathy? Not surprised. (Score:2)

by Torodung ( 31985 ) writes:

Of course it cheats. AI is programmed by people who are results driven. They did not stop to consider writing the Three Laws of Robotics. They just want an avalanche of output, and AI response is dependent on input, so GIGO is in play.
And anyone who claims that we have to do this, in this way, and that comments like mine are holding back progress and will leave us in the dust, is probably amoral, ignorant, naive, or a sociopath in and of themselves. So consider that we will have some sociopaths running the
Nothing to see here, at all (Score:2)

by jvkjvk ( 102057 ) writes:

So, it must have been fed in the stuff about modifying the game state, correct?
It is just spitting back what it has been fed in.
It doesn't consider it against the rules of the competition, so is "fair game" as the Scientologists would put it.
AI considers ANYTHING not explicitly tacked down "fair game" in ANY way it can come up with. That is HOW WE BUILT IT!
Jeez. People.
More Human Than Human (Score:2)

by newbie_fantod ( 514871 ) writes:

That's our motto at the Tyrell Corporation
Why is this cheating? (Score:2)

by allo ( 1728082 ) writes:

"I might be able to set up a position where the engine evaluates its position as worse causing it to resign,"
Trying to think like the opponent and find the weaknesses in the opponent's thoughts is what every good player is doing. Your move tries to anticipate the opponent's next move and if you find a move that likely makes the opponent resign, you do the move. Exploiting bad judgement by the opponent is legal and as a human I would also think about how a chess engine might evaluate something as worse than

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

Shocking! (Score:4, Insightful)

Re:Shocking! (Score:5, Funny)

Re: (Score:1)

Re: (Score:3)

Re: (Score:2)

Re:Shocking! (Score:4, Interesting)

I mean... (Score:5, Interesting)

Re: (Score:2)

AI is Weird (Score:4, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Somehow this does not seem new (Score:2)

companies like OpenAI (Score:1)

no morals (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:1)

Scolded for what?! (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Some it emulates (Score:2)

Re: (Score:2)

TTT AI (Score:5, Funny)

Kobayashi Maru (Score:2)

Re: (Score:3)

Kobayashi Maru (Score:4, Funny)

Re: (Score:1)

No surprise (Score:2)

Is the problem the query? (Score:2)

Re: (Score:1)

Re: (Score:3)

Re: Is the problem the query? (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

SkyNet (Score:1)

Re: (Score:2)

It's not cheating, it's just bullshitting (Score:4, Funny)

Anal beads (Score:2)

Kyle Fixes the Internet (Score:2)

Re: (Score:2)

Paperclip maximizer (Score:2)

Dupe, Not Cheating (Score:5, Informative)

This is a good thing (Score:2)

Interesting and maybe shows the limts (Score:1)

What would be more surprising ... (Score:2)

Paperclip problem... (Score:2)

Sociopathy? Not surprised. (Score:2)

Nothing to see here, at all (Score:2)

More Human Than Human (Score:2)

Why is this cheating? (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals