
ChatGPT Loses in a Game of Chess Against Magnus Carlsen (time.com) 58

The world's best human chess player beat ChatGPT, reports Time magazine. Magnus Carlsen posted on X.com earlier this month that "I sometimes get bored while travelling," and shared screenshots of his conversations with ChatGPT after he beat the AI chatbot "without losing a single piece." ChatGPT lost all its pawns, screenshots the Norwegian grandmaster shared on X on July 10 showed. ChatGPT resigned the match... "That was methodical, clean, and sharp. Well played!" ChatGPT said to him, according to the screenshots Carlsen posted.

Carlsen told the AI bot that he thought it "played really well in the opening," but ultimately "failed to follow it up correctly." He went on to ask ChatGPT for feedback on his performance. "Your play showed several strong traits," ChatGPT told him...

About a week after Carlsen posted that he beat ChatGPT in the online chess match, he lost the Freestyle Chess Grand Slam Tour in Las Vegas to teenage Indian grandmaster Rameshbabu Praggnanandhaa.


Comments Filter:
  • Umm... (Score:4, Insightful)

    by Valgrus Thunderaxe ( 8769977 ) on Saturday July 26, 2025 @03:37PM (#65547304)
    LLMs aren't chess playing computers. This is a surprise to anyone?
    • by drnb ( 2434720 )

      LLMs aren't chess playing computers. This is a surprise to anyone?

      And specialized chess software is "playing"? Like a human it's analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old-school programs that were more brute force with limited pruning; maybe more so with ML-based series pruning?

      • Yeah, that's not how LLMs work.

        Something like Stockfish absolutely does do something like that. Stockfish is not an LLM; its entire theory of operation is vastly different from how LLMs work.

        For the most part, LLMs play worse chess than children.

        • We can't compare an AI that is essentially a one-move "advice" tool, one that doesn't look ahead 10 or 20 moves, with a dedicated chess-playing program trained on long chains of chess moves.

          Noted: tree pruning, the minimax algorithm, etc. (rough sketch of the latter below).
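
          For anyone who hasn't seen it, here's a minimal minimax-with-alpha-beta sketch over a toy game tree. Plain Python, not tied to any real engine; a real program would also stop at a depth limit and call a static evaluator at the cutoff.

```python
# Minimax with alpha-beta pruning over a toy game tree.
# Leaves are ints (static scores); inner nodes are lists of child nodes.

def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):          # leaf: static evaluation of the position
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:          # beta cutoff: the opponent will avoid this line
                break
        return best
    else:
        best = float("inf")
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)
            if beta <= alpha:          # alpha cutoff
                break
        return best

tree = [[[5, 6], [7, 4, 5]], [[3]], [[6], [6, 9]]]
print(alphabeta(tree, float("-inf"), float("inf"), True))   # prints 6
```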

    • Re:Umm... (Score:4, Insightful)

      by LainTouko ( 926420 ) on Saturday July 26, 2025 @06:50PM (#65547594)
      It's unlikely to be a surprise to anyone here. To someone not particularly technical who has only been listening to LLM marketing, it's likely to be quite a surprise.
      • The interesting part of this is how it played, why it lost and how that might be fixed.

        1. In all these kinds of examples they use models that are bad at this type of task. Here, they used ChatGPT-4o, which is tuned for speed and not for reasoning. So the obvious 'improvement' here would be to use o3 or any of the other competing 'reasoning' models designed for these kinds of things.
        2. These examples 'play' chess in the form of a textual representation of a sequence of moves, not 2D representations of the board. That means that the AI needs to 'mentally' replay the moves to determine the state of the board. It's like asking it to do 20 consecutive math operations on a matrix of 8x8 and then asking it to come up with an operation that leads to a specific type of state of the matrix. (See the sketch below.)
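
        To make point 2 concrete, here's roughly what a dedicated library does for free and what the model has to do "in its head". This assumes the python-chess package; the move list is just an example.

```python
# Replay a textual move list to recover the current board state.
import chess

moves = ["e4", "e5", "Nf3", "Nc6", "Bb5", "a6", "Ba4", "Nf6", "O-O"]

board = chess.Board()
for san in moves:
    board.push_san(san)    # raises ValueError on an illegal or unparseable move

print(board)               # ASCII diagram of the current position
print(board.fen())         # compact one-line description of the same state
```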

        • 2. These examples 'play' chess in the form of a textual representation of a sequence of moves, not 2D representations of the board. That means that the AI needs to 'mentally' replay the moves to determine the state of the board. It's like asking it to do 20 consecutive math operations on a matrix of 8x8 and then asking it to come up with an operation that leads to a specific type of state of the matrix.

          What you are describing is a good way to play blindfolded chess. Every time someone makes a move, repeat in your mind all previous moves from the beginning until the present. I've taught several people to play blindfolded chess this way (and also learned it myself, from George Koltanowski).

          • The fact that this is hard for humans, and requires teaching and a fair amount of practice, should make us very skeptical of LLM performance on tasks that require it.

            I mean, of course Magnus Carlsen can play like that, but given how exotic and hard it is I'd say that AGI could exist without being able to do it.

    • by Ossifer ( 703813 )

      Why would anyone expect a linguistic parlor trick to be good at chess?

    • Exactly, it always loses at Rock, Paper, Scissors to me, no fingers. :-)

  • by drnb ( 2434720 ) on Saturday July 26, 2025 @03:43PM (#65547310)

    Carlsen told the AI bot that he thought it "played really well in the opening," but ultimately "failed to follow it up correctly."

    So ChatGPT is a magnificent cut and paste machine? For chess, its training probably included discussions of opening moves. For coding, it includes discussions on algorithms. No real reasoning going on here. Just pattern matching to training materials. Which is useful. As a human coder I sometimes look up algorithms in a reference book or online to look at a sample implementation. And a chess enthusiast friend reads books and articles on opening moves.

    • There are dedicated chess engines that are a lot stronger than LLM chatbots. That being said, an LLM chatbot should be able to instantiate a chess engine and have it make the actual moves.

      • by drnb ( 2434720 )

        There are dedicated chess engines that are a lot stronger than LLM chatbots. That being said, an LLM chatbot should be able to instantiate a chess engine and have it make the actual moves.

        Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old-school programs that were more brute force with limited pruning; maybe more so with ML-based series pruning?

        • Yes, but are they really "playing"?

          Yes, and prepare to have your mind blown: planes can fly.

          • by drnb ( 2434720 )

            Yes, but are they really "playing"?

            Yes, and prepare to have your mind blown: planes can fly.

            I can strap wings onto a brick and drop it from 10K ft. It moves through the air, but it's not "flying" in any reasonable, controlled-flight sense.

        • by PDXNerd ( 654900 )

          Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old-school programs that were more brute force with limited pruning; maybe more so with ML-based series pruning?

          You do not understand how LLMs work. It's not playing or analyzing anything at all. It's a glorified random number generator that is literally predicting characters, in this case chess move sequences. It's not judging those sequences by rules, or using lookup tables or probability matrices for movement. It's literally predicting, character by character, what it's been trained with.

          The fact that it plays a 'real' game and tells a master it played a good game is meaningless - it would tell a child its ideas abo

          • by drnb ( 2434720 )

            Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old-school programs that were more brute force with limited pruning; maybe more so with ML-based series pruning?

            You do not understand how LLMs work. It's not playing or analyzing anything at all.

            Guess again. From my first post in this thread: "So ChatGPT is a magnificent cut and paste machine? For chess, its training probably included discussions of opening moves. For coding, it includes discussions on algorithms. No real reasoning going on here. Just pattern matching to training materials."

            A thread which is titled "So ChatGPT is a magnificent cut-and-paste machine"

            • by PDXNerd ( 654900 )
              An LLM is not a "cut and paste machine" unless you're talking about your usage of it, and your discussion devolved into nonsense about an LLM pruning paths. If that's not what you meant, perhaps you should re-read the post *you* responded to. You said:

              Probably not with the old-school programs that were more brute force with limited pruning; maybe more so with ML-based series pruning?

              Explain to me, with your knowledge of machine learning, how an ML model 'prunes paths'. Or, if you don't know, name one that does so for chess that isn't a dedicated chess engine.

          • An AI is now more than a bare LLM. The 'deliberative' models now can and do think steps ahead. And an LLM-based chess engine can analyze a potential move for validity by examining the rules. But it's still grossly inefficient and not very effective compared to a special-purpose chess engine. The point of doing it is to find the minimal single type of "AI" that can do anything and everything well. And so far that goal is not met.
            • Where I said 'think,' it might be less contentious to say 'evaluate.'
            • by allo ( 1728082 )

              If you add a module checking validity, you've taken the first step away from the LLM and toward just using a chess engine (a rough sketch of such a check follows below).
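
              Something like this, assuming the python-chess package; the candidate moves are made up, and a real setup would feed rejections back to the model.

```python
# Check LLM-proposed moves against the actual rules before accepting them.
import chess

board = chess.Board()                        # starting position
for suggestion in ["e4", "Ke2", "Nf3"]:      # pretend these came from the chat model
    try:
        move = board.parse_san(suggestion)   # only parses if legal in this position
        print(f"{suggestion}: legal ({move.uci()})")
    except ValueError:
        print(f"{suggestion}: illegal here, ask the model for another move")
```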

            • In future iterations the LLM should be able to recognize that it needs help to optimize its chess play and deploy/use the best chess-playing software when facing a master. Assuming it has the goal of defeating the opponent and has "reasoning" and agentic abilities, it's not much of a jump for it to pull that off.
              • Well yes, that would be easy to do. If somebody uses an LLM to play chess directly, they're doing it to push the envelope of the 'LLM', or let's call it "an AI using LLM techniques with a generalized mechanism", one that might work in a more basic modality to provide language generation and more. But telling it the rules of chess should still be done using language, I would hope.
      • by allo ( 1728082 )

        That would be simple: just provide a chess engine as an MCP or UTCP service (see the sketch below). But it would also just be boring, because you're using the LLM as an inefficient interface to the chess engine. That said, letting the LLM play the moves itself is also stupid, because it is not a chess engine and will always lose to good players or chess engines.
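
        As a rough illustration of that kind of tool, assuming the python-chess package and a locally installed Stockfish binary (the path is a placeholder; a real MCP wrapper would expose this function as a tool the model can call):

```python
# Hand a position to Stockfish and relay its reply, instead of letting the LLM guess.
import chess
import chess.engine

STOCKFISH_PATH = "/usr/local/bin/stockfish"   # adjust for your system

def best_move(fen: str, think_time: float = 0.5) -> str:
    board = chess.Board(fen)
    engine = chess.engine.SimpleEngine.popen_uci(STOCKFISH_PATH)
    try:
        result = engine.play(board, chess.engine.Limit(time=think_time))
        return board.san(result.move)
    finally:
        engine.quit()

print(best_move(chess.STARTING_FEN))          # e.g. "e4" or "Nf3"
```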

    • by allo ( 1728082 )

      Not copy & paste, but it is a thing that works on text.

      You wouldn't use a string library to do matrix multiplication, would you? So why would you use a LANGUAGE model to play chess?

    • Yes, well, just like Deep Blue wasn't an "AI", and wasn't even an LLM... it was just a database with a guy at the keyboard who entered Kasparov's moves and waited for the server rack to spit out an answer... the guy at the keyboard 'was' adjusting parameters of the computer's engine as the games went on.

      A human, while maybe able to memorize all the popular openings and their defenses, and popular endgame strategies, functions on a different set of principles than a computer wi

  • by VaccinesCauseAdults ( 7114361 ) on Saturday July 26, 2025 @04:01PM (#65547338)
    In other news, William Shakespeare won in a language fluency, comprehension and poetry composition contest against Stockfish.
    • In other news, William Shakespeare won in a language fluency, comprehension and poetry composition contest against Stockfish.

      You're somehow overlooking how impressive an accomplishment that is, for a guy that's been dead 400 years...

      • True. I should probably have used someone contemporary. But the point is it is utterly ridiculous talking about beating ChatGPT at chess. I thought even quite ordinary people could do that, let alone the world chess champion. LLMs and AIs can do many things but excelling at chess is not one of them. Unless it just does external calls to dedicated engines like Stockfish?
        • I did get your point, and do agree with you. But I couldn't resist the joke!

          One other thing this illustrates, at least in my opinion - companies like OpenAI and Anthropic have done a remarkable PR job, convincing people that their products are significantly more general-purpose (and significantly more advanced) than they really are. The general public (represented by Magnus Carlsen, in this instance) basically sees them as AGIs, rather than the hallucinating-and-still-simplistic tools they actually are.

          • True. I should probably have used someone contemporary.

            Perhaps not. Quite a bit of what we attribute to the Bard didn't come from him. Romeo and Juliet for example is a regurgitation of an Italian work from thirty years prior.

  • It can't even remember the moves played and frequently has to be corrected about where the pieces are. This headline implies that Carlsen did something other than being funny. He knows ChatGPT is not a chess computer. Proper chess software, even running on a shitty old CPU, could beat him easily.
    • by allo ( 1728082 )

      Because nobody does these benchmarks correctly. Anyone who knows LLMs knows how to help them. You would, for example, prompt them to print the board (probably in some concise form, not as ASCII art) after each turn. Then they don't have to invest much "thought" into reconstructing the board from previous moves and can plan the next moves instead. (Rough illustration below.)

      The sad thing about these benchmarks is that people don't say "We optimized it as well as possible and that's the result" but rather "We did something and it didn't work
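
      Roughly what that kind of prompting might look like, assuming the python-chess package; the prompt wording is just an example, not anyone's actual benchmark setup.

```python
# Give the model the current position in compact form (FEN) every turn,
# instead of making it reconstruct the board from the full move list.
import chess

def make_prompt(board: chess.Board) -> str:
    side = "White" if board.turn == chess.WHITE else "Black"
    legal = ", ".join(board.san(m) for m in board.legal_moves)
    return (
        f"You are playing chess as {side}.\n"
        f"Current position (FEN): {board.fen()}\n"
        f"Legal moves: {legal}\n"
        "Reply with exactly one of the legal moves above."
    )

board = chess.Board()
board.push_san("e4")
print(make_prompt(board))   # this text would be sent to the chat model
```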

  • ChatGPT and its siblings *seem* to be amazingly good at everything. But when it's necessary to do anything that requires a specialized skill, you need specialized AI.

    For example, LLMs can interact with patients in a hospital, but it takes specialized AI to properly read an X-ray or an MRI. An LLM can tell you what medications are commonly used to treat specific illnesses, but if you blindly follow the advice, you're likely to miss important clues that a doctor wouldn't miss.

    The LLM appears to understand how chess is played, but when it gets beyond the opening gambits that are well-documented, it falls apart.

    In any kind of business, LLMs can provide output that appears sound, but when you get into the weeds, falls apart. We're going to need people to do the heavy lifting for some time to come. The death of white-collar employment is greatly exaggerated.

    • They merely have to make a plug-in for a chess bot which takes over chess game questions. Sure, it will not be reasoning but operating another heuristic bot; but Turing's test is proving to not be such a bad one after all.

      Once you've trained the bot to fool a person for everything they ask it's on to the next person until only some experts are not fooled... There are only so many tests most people can dream up and you only need to cover that large problem space. So, if it fools most people, is it alive?

      • I never said anything about passing a test. Turing tests have always been laughable.

        They merely have to make a plug-in for a chess bot which takes over chess game questions

        Exactly my point. AI isn't going to magically know everything about everything. You're going to have to make "plug-ins" for a million different specialized tasks, in this case, playing chess. And that's what will slow down the progress of AI "taking over white collar jobs." There are simply too many specialized jobs that an LLM won't be good at, on its own. And all those specialized plug-ins are going to be very, very expens

    • by allo ( 1728082 )

      But did you consider that such a high-end LLM might be able to write the code for a chess engine that outdoes itself at playing chess? If I have to beat a chess master, I wouldn't play myself either. I think I can code a chess engine that plays much better than me - not a huge challenge given my chess skills, but as I'm not that good at chess and know some of the algorithms needed for chess engines, I would just use what would help me, as long as I am allowed. And your doomsday AI wouldn't care if you th

      • The idea that AI could write code that creates a "really good" chess engine misunderstands how AI works. AI doesn't create anything; it just takes all the existing code it's seen and synthesizes/summarizes it to form its output. It's not *actually* writing code, it's just regurgitating code that's a plausible answer to the provided prompt.

        So, if your LLM has been trained on source code for an excellent chess engine, it might be able to produce portions of code that fit. But that's very different from bu

  • by Luscious868 ( 679143 ) on Saturday July 26, 2025 @07:43PM (#65547686)
    All three people who follow chess are super excited
  • It's a bit like asking the dumbest kid in class for his notes.

  • While ChatGPT may be smarter than most Slashdot posters (based on the responses here), and probably knows chess better than most of them (it's trained on millions of games), unfortunately it's not geared to show that. I'm too lazy to dig up the results from the person who researched this, but apparently there's a particular version of ChatGPT 3.5 that is decent at chess out of the box, while others require specific prompting to get on the right track.

    ChatGPT would likely have still lost to Carlsen even with

  • I can beat a tortoise in a sprint, and I'm not even a runner. ChatGPT is the equivalent of a tortoise without legs when it comes to chess. It is an LLM, not a chess-playing computer.
