AI Classic Games (Games)

ChatGPT Just Got 'Absolutely Wrecked' at Chess, Losing to a 1970s-Era Atari 2600 (cnet.com) 139

An anonymous reader shared this report from CNET: By using a software emulator to run Atari's 1979 game Video Chess, Citrix engineer Robert Caruso said he was able to set up a match between ChatGPT and the 46-year-old game. The matchup did not go well for ChatGPT. "ChatGPT confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations," Caruso wrote in a LinkedIn post.

"It made enough blunders to get laughed out of a 3rd-grade chess club," Caruso said. "ChatGPT got absolutely wrecked at the beginner level."

"Caruso wrote that the 90-minute match continued badly and that the AI chatbot repeatedly requested that the match start over..." CNET reports.

"A representative for OpenAI did not immediately return a request for comment."
This discussion has been archived. No new comments can be posted.

  • by JoshuaZ ( 1134087 ) on Saturday June 14, 2025 @12:43PM (#65449311) Homepage
    ChatGPT is not a chess engine. Comparing it to an actual chess system is missing the point. The impressive thing about systems like ChatGPT is not that they are better than specialized programs or expert humans, but that they are often much better at many tasks than a random human. I'm reasonably confident that if you asked a random person off the street to play chess this way, they'd perform similarly. And it shouldn't be that surprising: the text-based training data that corresponds to legal chess games is a small fraction of the total, and since nearly identical chess positions can have radically different outcomes, this is precisely the sort of thing an LLM is bad at (they are really bad at abstract math for similar reasons). This also has a clickbait element, given that substantially better LLMs than ChatGPT are now out there, including GPT-4o and Claude. Overall, this comes across as people moving the goalposts while not recognizing how these systems keep getting better and better.
    • by OrangeTide ( 124937 ) on Saturday June 14, 2025 @12:46PM (#65449319) Homepage Journal

      ChatGPT has flexibility, but it is inferior to both humans and specialized algorithms in nearly all cases.
      The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

      • by gweihir ( 88907 ) on Saturday June 14, 2025 @01:05PM (#65449355)

        The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

        With the little problem that you have to feed it so much electricity that paying that wage might well turn out to be cheaper, even at Western standards. At the moment LLMs burn money like crazy, and it is unclear whether that can be fixed.

        • The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

          With the little problem that you have to feed it so much electricity that paying that wage might well turn out to be cheaper, even at Western standards. At the moment LLMs burn money like crazy, and it is unclear whether that can be fixed.

          We're going to need several Kashiwazaki-Kariwa-sized or larger reactors to perform what a web search by a random person can do.

          • by gweihir ( 88907 )

            Remember how expensive electricity from nuclear is? That will not solve things...

            Also remember that most uranium comes from Kazakhstan (43%), which borders China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...

            • Remember how expensive electricity from nuclear is? That will not solve things...

              Also remember that most uranium comes from Kazakhstan (43%), which borders China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...

              I don't disagree with any of that. And if we do decide to put ourselves in that position, is this glorified search engine going to be worth it? I don't think so.

              That said, I think that before too long we aren't going to need an entire nuclear generating facility to feed the tech bro wet dream. A guess, but a half-educated one, given the way innovation tends to work.

        • Businesses often prefer to minimize labor costs even when there's an overall increase to operating costs. Replacing humans with ChatGPT at a 20% markup over labor costs is still going to be an attractive prospect to many MBAs.

      • ChatGPT is a language model, and it excels in the production of language. In fact, its capabilities in that regime are far above those of even 80th-percentile humans.

        Whoever thought a language model would be remotely good at chess clearly doesn't understand the technology they're working with.
        • by war4peace ( 1628283 ) on Saturday June 14, 2025 @04:35PM (#65449679)

          I wanted to reply to GP with "Now ask the Atari chess program to summarize a 10-page PDF".
          Cherry-picking goes both ways.

          • by taustin ( 171655 ) on Saturday June 14, 2025 @09:31PM (#65450043) Homepage Journal

            Nobody claimed the Atari chess program was capable of anything else.

            ChatGPT is supposed to be able to do anything, including walk the dog.

            • ChatGPT is supposed to be able to do anything, including walk the dog.

              Says who?

              Why are you not able to keep your argument grounded in reality?

            • ChatGPT is supposed to be able to do anything, including walk the dog.

              No, it is not. While Marketing tends to exaggerate its capabilities, I have never seen such claims.

          • I wanted to reply to GP with "Now ask the Atari chess program to summarize a 10-page PDF".

            I pulled out an Atari and did that.

            The Atari won because it didn't make any mistakes in the summary.

            • Can you show me on the doll where the LLM touched you, son?

              1) No, you didn't.
              2) You should always review an LLM-generated summary, but in most cases, it is perfectly accurate.

              Where LLMs critically fail is when you ask them to generate something for which they have no ground truth. Because they'll fucking invent it.
        • by ceoyoyo ( 59147 )

          ChatGPT is advertised as AI, approaching human level. AI is building machines that exhibit human behaviour and capabilities.

          So they made the thing play a computer chess algorithm and it made excuses and demanded a rematch. Sounds like what most humans with no chess experience would do. It didn't flip the board and stomp off though.

          • ChatGPT is advertised as AI, approaching human level.

            Have a citation for such an advertisement?

            AI is building machines that exhibit human behaviour and capabilities.

            Sure. LLMs are widely regarded as AI- no disagreement, there.

            So they made the thing play a computer chess algorithm and it made excuses and demanded a rematch. Sounds like what most humans with no chess experience would do. It didn't flip the board and stomp off though.

            Absolutely.
            It's a language model. It has no chess training. It has probably picked up a good bit of information about chess, but the model is still trained on language, not on playing chess.

            It's like someone who knows the rules of chess but has no real experience with the game.

            • It's like the clock problem [monochrome-watches.com] on steroids. Not all the required positions are in the training set of images. With the clock, you can easily just increase the number of images in your training set. With the chess problem, you can do that to some degree, but there are more positions than planets in the universe, so you won't have enough disk space.

              Chess is a problem where you need to be able to tell the machine "these are the rules" and have it follow them. Humans can do that, the LLM can't.
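              The "tell the machine the rules and have it follow them" point can be made concrete: a conventional program is simply given the rule once and then follows it every single time, with no statistics involved. A minimal sketch (toy coordinates, empty board, invented here purely for illustration, not a real chess implementation):

```python
# A rook moves any number of squares along one rank or one file.
# Coordinates are (file, rank) pairs in 0..7; the board is assumed empty,
# so blocking pieces are ignored in this toy example.
def rook_move_legal(src, dst):
    return src != dst and (src[0] == dst[0] or src[1] == dst[1])

print(rook_move_legal((0, 0), (0, 5)))  # True: slides along a file
print(rook_move_legal((0, 0), (3, 4)))  # False: neither same rank nor file
```

              Given the rule once, the function never "loses track" of it, which is exactly the guarantee a statistical text predictor cannot make.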
              • by ceoyoyo ( 59147 )

                LLMs are mostly composed of regular old fully connected ANNs, and the remainder, the transformers, are also ANNs. ANNs certainly can learn the rules of chess, and you can train one to play chess at a level that is generally regarded as superhuman. There's also a proof that any 2+ layer ANN of sufficient size can learn any IO function.

                So there's nothing about the structure of an LLM that would make it unable to learn and follow the rules of chess. The fact that they don't, or don't do so very well, means that the way they are trained is an inefficient way to learn chess.

                • LLMs are mostly composed of regular old fully connected ANNs, and the remainder, the transformers, are also ANNs.

                  You're talking theoretical things here, not practical reality.

                  ANNs certainly can learn the rules of chess, and you can train one to play chess at a level that is generally regarded as superhuman.

                  No one has ever made an ANN that plays chess at a level that is superhuman. AlphaZero is still primarily a tree searching algorithm with an ANN used to evaluate every node of the tree.

                  There's also a proof that any 2+ layer ANN of sufficient size can learn any IO function.

                  To remove the tree search, the ANN would need to be VERY big.

                  The fact that they don't, or don't do so very well, means that the way they are trained is an inefficient way to learn chess.

                  LLMs are trained in an inefficient way to learn anything. How many billions of pages did you need to be trained how to read?

              • This is absurd.
                The LLM absolutely can.
                AlphaZero can beat any human alive at Chess.

                You're not 100% wrong though. It is similar to the clock problem, though I don't think you fully understand it.
                "AI" can indeed produce clocks in any position.
                The problem is that they are over-trained/over-fit for one position, which means there is a likelihood that they end up there without careful prompting.
                I'm familiar with the clock problem, so I didn't read that article, but I'm certain if you read it, you'll find the w
            • by ceoyoyo ( 59147 )

              Have a citation for such an advertisement?

              https://blog.samaltman.com/the... [samaltman.com]
              "Humanity is close to building digital superintelligence"
              "we have recently built systems that are smarter than people in many ways"
              "In some big sense, ChatGPT is already more powerful than any human who has ever lived."
              etc.

              • You're not a stupid person. Why do you act stupid?
                Your quote was:

                ChatGPT is advertised as AI, approaching human level.

                From your above blog, no such claim exists.
                The most sus claim he makes is your last cited one, but even the context makes it clear he's not talking about ChatGPT's intelligence.

      • but it is inferior to both humans and specialized algorithms in nearly all cases.

        In what way? The OP postulated pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves. That's the OP's point. ChatGPT is that weird friend of yours who somehow is a pub quiz ace, a true walking encyclopedia, yet someone who has no practical skills.

        • by rossdee ( 243626 ) on Saturday June 14, 2025 @05:44PM (#65449773)

          "pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves."

          It depends where you are. In Russia everyone is taught chess.
          (Of course there are no average persons in the street, they are all in Ukraine.)

    • by devslash0 ( 4203435 ) on Saturday June 14, 2025 @12:53PM (#65449333)

      Well, the way I look at it is that AI models were trained on unchecked data, and they just reheat mistakes made during training because, statistically, mistakes are more common than good moves.

      Garbage in. Garbage out.

      • by allo ( 1728082 )

        For LLMs, yes. Chess engines are more often trained with methods like self-play.

      • This is a badly conducted experiment by some random fuck on LinkedIn. Talking about unchecked data and garbage. Apparently everybody on Slashdot is now so hellbent on disparaging anything AI that they'll take any bit of ragebait at face value.

        The LinkedIn post: https://www.linkedin.com/posts... [linkedin.com]

        Relevant quotes by the author:
        - "Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations"

    • A lot of the 'headline' announcements, pro and con, are basically useless; but this sort of thing does seem like a useful cautionary tale in the current environment where we've got hype-driven ramming of largely unspecialized LLMs as 'AI features' into basically everything with a sales team; along with a steady drumbeat of reports of things like legal filings with hallucinated references; despite a post-processing layer that just slams your references into a conventional legal search engine to see if they r
    • Actually this is a very important result, because it highlights ChatGPT's strength and weakness. It's very good at dredging through vast amounts of text and forming principles of prediction, so that it can fake a human being's speech.

      But it doesn't have any intellectual power at all - which is exactly what chess tests.

      "On the chessboard, lies and hypocrisy do not survive long. The creative combination lays bare the presumption of a lie; the merciless fact, culminating in the checkmate, contradicts the hypocrite."

      • That LLM AIs are bad at abstract reasoning of this sort is not a new thing. People have seen that very early on with these systems, such as their inability to prove theorems. If someone thought that an LLM would be good at chess by itself in this situation they haven't been paying attention.
      • But it doesn't have any intellectual power at all - which is exactly what chess tests.

        All hail the Atari 2600, our intellectual power overlord! Right?

    • by fuzzyf ( 1129635 ) on Saturday June 14, 2025 @04:47PM (#65449701)
      Replace ChatGPT or AI with "autocomplete" and all these AI headlines explain themselves.

      Autocomplete loses in Chess!
      Autocomplete makes up references!
      Autocomplete said something stupid!
    • And this is a great illustration of why LLMs aren't going to be decimating white-collar jobs.

      Just as ChatGPT is terrible at chess (I'm surprised it could even try to play the game)...LLMs are terrible at doing people's jobs.

      They're great at making up text (often literally making stuff up), but that's a lot different from actually *doing a job.*

      • You shouldn't be surprised that it will try. All of the major LLMs are wildly overconfident in their abilities. I'm not sure if this is more because they've got human reinforcement to be "helpful" or if because they are trained on the internet where there's very rarely a response in the training data of "That's an interesting question, I've got no idea."
        • What helps me understand why LLMs are so "confident" is to visualize what AI does when it "erases" unwanted people or things from a photo. It essentially makes up a background of pixels that could plausibly be behind the "erased" object. Those made-up pixels have nothing to do with what was _actually_ behind those unwanted objects; it just uses a fancy extrapolation engine to predict what those pixels might be.

          LLMs do the same thing, but instead of pixels, they use language tokens. When you provide a prompt or q
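          The inpainting analogy can be sketched in a few lines: a toy bigram model that, like an LLM, emits whichever continuation was most frequent in its training text, with no notion of whether the result is true. The corpus below is invented purely for illustration:

```python
# Toy next-token predictor: fills the gap with the statistically most
# plausible continuation, with no concept of ground truth.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1  # count how often `nxt` follows `prev`

def predict(prev_word):
    """Return the continuation seen most often in training."""
    return bigrams[prev_word].most_common(1)[0][0]

print(predict("the"))  # "cat": the commonest follower, true or not
```

          Scale the corpus up to the internet and the counts up to billions of parameters and the principle is the same: plausibility, not truth.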

    • If ChatGPT (or at least, GPT-4o) can ingest and execute code, why wouldn't it just go online, search for a FOSS chess engine in a language it "understands" (like Python), download it, recognize it as being more adept at solving problems in this specific domain, and execute that chess engine *directly* & present the output as its own?

      The only thing I can think of offhand is that gpt-4o's "firewall" might limit its ability to execute code.

    • I'm sorry but you're only projecting your wishes here.

      As you say, ChatGPT >= random human, but give a random human a day of instruction with a chess teacher (whereas ChatGPT got access to the entire internet's worth of chess discussions for years) and that human >= 3rd grade chess club. But we've just seen now that ChatGPT In other words, this news proves (to those who are rational wishful thinkers) that ChatGPT claims about >= random human are full of shit.

      TL;DR. YW. YHL. HAND.

      ;-)

      • Murphy's law strikes again. I forgot that the less-than sign must be escaped in HTML. Here's the corrected comment:

        I'm sorry but you're only projecting your wishes here.

        As you say, ChatGPT >= random human, but give a random human a day of instruction with a chess teacher (whereas ChatGPT got access to the entire internet's worth of chess discussions for years) and that human >= 3rd grade chess club. But we've just seen now that ChatGPT < 3rd grade chess club. Contradiction!

        In other words, this news

        • Obnoxious snark aside, it appears that you are missing the point. Yes, ChatGPT is trained on a large fraction of the internet. That's why it can do this at all. What is impressive is that it can do that even without the sort of specialized training you envision. Also, speaking as someone who has actually taught people how to play chess, you are, to be blunt, substantially overestimating how fast people learn.
        • That's why it can do this at all. What is impressive is that it can do that even without the sort of specialized training you envision.

          How many pages of chess instruction do you think ChatGPT has been trained on? How many pages do you think it would take for it to play a decent game of chess?

          • Pages of instruction are not the only thing that matters. Lots of humans don't learn well from simply reading instruction sets. And since ChatGPT doesn't have a good visual representation of the board, this is equivalent to trying to teach a human who has never learned to play chess to learn to play without a visual board and only able to keep track of moves based on the move notation. Even some strong chess players have trouble playing chess in their heads this way.
            • ok lol. You claim you are a mathematician, but here you are using motivated reasoning.
          • My job doesn't have much to do with this at all. All humans engage in motivated reasoning and other cognitive biases. But it is also very easy to think someone one disagrees with is engaging in some sort of cognitive error even when they are not. So instead of just labeling this as motivated reasoning, maybe you could explain what is wrong with the point I made?
    • I'll confess to having faked the whole thing.

      I used a different chatbot, which shall remain anonymous, and told it; "Submit an article to slashdot about what would happen if chatgpt played against the Atari 2600 chess program."

    • by sjames ( 1099 )

      Given the way so many people vastly over-estimate ChatGPT as an actual intelligence, I think it's quite fair to put it up against an old and tiny chess engine on easy level. This is basically "Are You Smarter Than a 5th Grader" for AIs. And it is NOT.

    • People have tried LLMs on chess and have much more interesting things to say than this clickbait. See https://nicholas.carlini.com/w... [carlini.com] for a good example from almost two years ago.
  • AI (Score:5, Insightful)

    by LainTouko ( 926420 ) on Saturday June 14, 2025 @12:45PM (#65449317)

    This is only news for the kind of people who refer to large language models as "AI".

    Unfortunately, that's quite a lot of people.

    .

    • by Tablizer ( 95088 )

      Stop the vocab fight! It's pointless and useless! Every known definition of "AI" and even "intelligence" has big flaws. I've been in hundreds of such debates. No Human nor Bot Has Ever Proposed A Hole-Free Definition of "Intelligence", so go home and shuddup already!

      • "An argument about the world is interesting. An argument about a word is not." I can't find the origin of that quote.
    • Re:AI (Score:4, Insightful)

      by ThomasBHardy ( 827616 ) on Saturday June 14, 2025 @01:25PM (#65449413)

      This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.
      The term is now corrupted beyond all hope of recovery.
      I'm distressed at how much tools like ChatGPT favor seeming intelligent and capable as an illusion, even when lying to you. I've even caught it making a mistake and then blaming me for the mistake, or pretending it meant to do it wrong as a test step. The conman element is real, even down to the tool itself.

      • That are not the traditional definition of AI.

        What IS the traditional definition of AI?

        It's been all over the place for years. Back when I was a student, in the very early 2000s, I had a course on AI in the same module as the neural nets lectures. It contained such topics as alpha/beta pruning, A* search, decision trees, expert systems, that kind of thing.

        Further in the past, neural networks were definitely considered AI, but by 2000 they were regarded as "ML", which was generally treated as something separate

      • by ceoyoyo ( 59147 )

        This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.

        How so? What is the traditional definition of AI? Are you sure you're using the correct one?

    • This is only news for the kind of people who refer to large language models as "AI".

      Unfortunately, that's quite a lot of people.

      .

      Old MacDonald had a LLM farm -

      AI, AI, Oh!,

      And on that farm he had a nuclear plant,

      AI AI Oh!

      With a hallucination here, a wrong answer there, here a fault there a fault, everywhere a bad answer.

      Old MacDonald had a LLM farm

      AI AI Oh!

    • This is only news for the kind of people who refer to large language models as "AI".

      So, ... everyone including people working in the field of AI?

    • by allo ( 1728082 )

      AI is a category that even includes ELIZA. You're thinking of AGI. AI is the category that includes the simplest algorithms, not the category that only includes what you see in sci-fi movies that talk about AI.

    • by Ossifer ( 703813 )

      Eventually people recognize cheap parlor tricks for what they are. Or in this case, massively expensive ones.

    • by taustin ( 171655 )

      This is only news for the kind of people who refer to large language models as "AI".

      Unfortunately, that's quite a lot of people.

      .

      Starting with the marketing droids at the A"I" companies.

  • Some people so want to believe that a useful information retrieval system is a superintelligence.

    The rest of us aren't surprised that an interesting search engine isn't good at chess.

    • by gweihir ( 88907 )

      Some people so want to believe that a useful information retrieval system is a superintelligence.

      The rest of us aren't surprised that an interesting search engine isn't good at chess.

      That very nicely sums it up. Obviously, you have to be something like a sub-intelligence to think that LLMs are superintelligent. To be fair, something like 80% of the human race cannot fact-check for shit and may well qualify as sub-intelligent. Especially as most of them do not know about their limitations, due to the Dunning-Kruger effect.

    • Hmm:
      - confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were
      - first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations
      - repeatedly requested that the match start over
      That all rings a bell somewhere - confusion, blaming everything else for the errors, repeatedly requesting a mulligan. That seems familiar.

  • No surprise (Score:5, Insightful)

    by gweihir ( 88907 ) on Saturday June 14, 2025 @01:01PM (#65449339)

    To anybody who wants to know, it is already clear that LLMs, including the "reasoning" variants, have zero reasoning abilities. All they can do is make statistical predictions based on their training data. Hence any task that requires actual reasoning, like chess (because chess is subject to state-space explosion and cannot be solved by "training" alone), is completely out of reach of an LLM.

    The only thing surprising to me is that it took so long to come up with demonstrations of this well-known fact. Of course, the usual hallucinators believe (!) that LLMs are thinking machines/God/the singularity and other such crap, but these people are simply delulu and have nothing to contribute except confusing the issue. Refer to the little pathetic fact that about 80% of the human race is "religious" and the scope of _that_ problem becomes clear. It also becomes clear why a rather non-impressive technology like LLMs is seen as more than just better search and better crap, when that is essentially all it has delivered. Not worthless, but not a revolution either, and the extreme cost of running general (!) LLMs may still kill the whole idea in practice.

  • Would the average Slashdot reader beat the Atari 2600?
    • by haruchai ( 17472 )

      Probably not.
      I didn't know about the Atari chess game until a couple of weeks ago, when an old colleague posted on FB that he was struggling with it on the lowest level.
      But I did pretty well against Fritz a long time ago, running on a Compaq Armada 7800.

  • ...for fucking up like ChatGPT can. Take that Atari!

    [ChatGPT] first blaming the Atari icons as too abstract...continued badly and that the AI chatbot repeatedly requested that the match start over

  • by 50000BTU_barbecue ( 588132 ) on Saturday June 14, 2025 @01:19PM (#65449397) Journal

    And her algorithm

  • An LLM is one of the worst AIs to play chess. I wouldn't be surprised if you'd do better with some greedy algorithm (which is not a good idea in general).
    Not all AIs are the same. LLMs are text generators, not chess players.
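    For contrast, even a trivially greedy chess "AI" never makes an illegal move or forgets the board, though it plays badly for a different reason: it only looks one capture ahead. A minimal sketch (the candidate moves and piece values below are invented for illustration; this is not a real engine):

```python
# One-ply greedy chooser: grab the most valuable capture available right
# now, ignoring all future consequences (e.g. walking into a trap).
PIECE_VALUE = {None: 0, "pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def greedy_move(candidates):
    """candidates: list of (move, captured_piece_or_None); returns the move."""
    return max(candidates, key=lambda mc: PIECE_VALUE[mc[1]])[0]

moves = [("Nxd5", "pawn"), ("Qxb7", "queen"), ("O-O", None)]
print(greedy_move(moves))  # "Qxb7", even if the queen is then lost
```

    Weak, but deterministic and rule-abiding, which is already more than an LLM guarantees.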

  • Firstly, ChatGPT is NOT a gaming or chess engine; secondly, LLMs are not made or designed for the reasoning required to even play chess effectively.
  • Your boss is OK with this. They would rather have a 3rd grader without intelligence or ambition than give you a raise.
  • "AI" is a marketing term for these types of tools. They are just specialized for a specific type of task; it's just that the task can be generating images, video, audio, or text covering as wide a variety of topics as you can train it on. Ultimately, text-generating AI is only concerned with putting the "most right" words together in sequence to follow up your prompt. These can be wrong if there are no really right words (e.g. it doesn't "know" the answer, so you get "hallucinations"), and it certain
  • It's like comparing making toast in a toaster from 1970 and in the newest Tesla, and then saying that the toaster from 1970 beat the Tesla.
