AI Classic Games (Games)

ChatGPT Just Got 'Absolutely Wrecked' at Chess, Losing to a 1970s-Era Atari 2600 (cnet.com) 84

An anonymous reader shared this report from CNET: By using a software emulator to run Atari's 1979 game Video Chess, Citrix engineer Robert Caruso said he was able to set up a match between ChatGPT and the 46-year-old game. The matchup did not go well for ChatGPT. "ChatGPT confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations," Caruso wrote in a LinkedIn post.

"It made enough blunders to get laughed out of a 3rd-grade chess club," Caruso said. "ChatGPT got absolutely wrecked at the beginner level."

"Caruso wrote that the 90-minute match continued badly and that the AI chatbot repeatedly requested that the match start over..." CNET reports.

"A representative for OpenAI did not immediately return a request for comment."


Comments Filter:
  • by JoshuaZ ( 1134087 ) on Saturday June 14, 2025 @12:43PM (#65449311) Homepage
    ChatGPT is not a chess engine. Comparing it to an actual chess system is missing the point. The thing that's impressive about systems like ChatGPT is not that they are better than specialized programs, or that it is better than expert humans, but that it is often much better at many tasks than a random human. I'm reasonably confident that if you asked a random person off the street to play chess this way, they'd likely have a similar performance. And it shouldn't be that surprising, since the actual set of text-based training data that corresponds to a lot of legal chess games is going to be a small fraction of the training data, and since nearly identical chess positions can have radically different outcomes, this is precisely the sort of thing that an LLM is bad at (they are really bad at abstract math for similar reasons). This also has a clickbait element given that substantially better LLM AIs than ChatGPT are now out there, including GPT 4o and Claude. Overall, this comes across as people just moving the goalposts while not recognizing how these systems keep getting better and better.
    • by OrangeTide ( 124937 ) on Saturday June 14, 2025 @12:46PM (#65449319) Homepage Journal

      ChatGPT has flexibility, but it is inferior to both humans and specialized algorithms in nearly all cases.
      The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

      • by gweihir ( 88907 ) on Saturday June 14, 2025 @01:05PM (#65449355)

        The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at western standards. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.

        • The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.

With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at western standards. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.

We're going to need several Kashiwazaki-Kariwa-sized or larger reactors to perform what a web search by a random person can do.

          • by gweihir ( 88907 )

            Remember how expensive electricity from nuclear is? That will not solve things...

Also remember that most uranium comes from Kazakhstan (43%), and they border on China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...

            • Remember how expensive electricity from nuclear is? That will not solve things...

Also remember that most uranium comes from Kazakhstan (43%), and they border on China and Russia. Not a critical dependency you want. Place 2 is Canada (15%), which the US has just mightily pissed off by sheer leadership stupidity. US domestic? A whopping 0.15%...

              I don't disagree with any of that. And if we do decide to put ourselves in that position, is this glorified search engine going to be worth it? I don't think so.

That said, I think that before too long, we aren't going to need an entire nuclear generating facility to generate power to feed the tech bro wet dream. A guess, but a half-educated one, with the way innovation tends to work.

        • Businesses often prefer to minimize labor costs even when there's an overall increase to operating costs. Replacing humans with ChatGPT at a 20% markup over labor costs is still going to be an attractive prospect to many MBAs.

        • Alternating Current?
ChatGPT is a language model, and it excels in the production of language. In fact, its capabilities in that regime are far above those of even 80th-percentile humans.

        Whoever thought a language model would be remotely good at chess clearly doesn't understand the technology they're working with.
        • by war4peace ( 1628283 ) on Saturday June 14, 2025 @04:35PM (#65449679)

I wanted to reply to GP with "Now ask the Atari chess program to summarize a 10-page PDF".
          Cherry-picking goes both ways.

          • by taustin ( 171655 )

            Nobody claimed the Atari chess program was capable of anything else.

            ChatGPT is supposed to be able to do anything, including walk the dog.

        • by ceoyoyo ( 59147 )

          ChatGPT is advertised as AI, approaching human level. AI is building machines that exhibit human behaviour and capabilities.

          So they made the thing play a computer chess algorithm and it made excuses and demanded a rematch. Sounds like what most humans with no chess experience would do. It didn't flip the board and stomp off though.

      • but it is inferior to both humans and specialized algorithms in nearly all cases.

        In what way? The OP postulated pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves. That's the OP's point. ChatGPT is that weird friend of yours who somehow is a pub quiz ace, a true walking encyclopedia, yet someone who has no practical skills.

    • by devslash0 ( 4203435 ) on Saturday June 14, 2025 @12:53PM (#65449333)

Well, the way I look at it is that AI models were trained on unchecked data, and they just reheat mistakes made during training because, statistically, mistakes are more common than good moves.

      Garbage in. Garbage out.

A lot of the 'headline' announcements, pro and con, are basically useless; but this sort of thing does seem like a useful cautionary tale in the current environment, where we've got hype-driven ramming of largely unspecialized LLMs as 'AI features' into basically everything with a sales team, along with a steady drumbeat of reports of things like legal filings with hallucinated references, despite a post-processing layer that just slams your references into a conventional legal search engine to see if they r
    • Actually this is a very important result, because it highlights ChatGPT's strength and weakness. It's very good at dredging through vast amounts of text and forming principles of prediction, so that it can fake a human being's speech.

      But it doesn't have any intellectual power at all - which is exactly what chess tests.

"On the chessboard, lies and hypocrisy do not survive long. The creative combination lays bare the presumption of a lie; the merciless fact, culminating in the checkmate, contradicts the hypocrite."

      • That LLM AIs are bad at abstract reasoning of this sort is not a new thing. People have seen that very early on with these systems, such as their inability to prove theorems. If someone thought that an LLM would be good at chess by itself in this situation they haven't been paying attention.
      • But it doesn't have any intellectual power at all - which is exactly what chess tests.

        All hail the Atari 2600, our intellectual power overlord! Right?

Replace ChatGPT or AI with "autocomplete" and all these AI headlines explain themselves.

      Autocomplete loses in Chess!
      Autocomplete makes up references!
      Autocomplete said something stupid!
The difference is that ChatGPT said it was good at chess.
      • by allo ( 1728082 )

        "ChatGPT said"

        I can create a textfile that says it is good at chess. Presented with a chess program, the file will still ... do nothing at all.
Don't assume a program can tell you what it is able to do just because its primary interface is presented to you as a dialogue. It is convenient to use that way, but you're not actually communicating with something; you're only using an interface made understandable to you so you can instruct a neural network to do certain tasks. There is no magic and no

    • And this is a great illustration of why LLMs aren't going to be decimating white-collar jobs.

      Just as ChatGPT is terrible at chess (I'm surprised it could even try to play the game)...LLMs are terrible at doing people's jobs.

      They're great at making up text (often literally making stuff up), but that's a lot different from actually *doing a job.*

      • You shouldn't be surprised that it will try. All of the major LLMs are wildly overconfident in their abilities. I'm not sure if this is more because they've got human reinforcement to be "helpful" or if because they are trained on the internet where there's very rarely a response in the training data of "That's an interesting question, I've got no idea."
    • If ChatGPT (or at least, GPT-4o) can ingest and execute code, why wouldn't it just go online, search for a FOSS chess engine in a language it "understands" (like Python), download it, recognize it as being more adept at solving problems in this specific domain, and execute that chess engine *directly* & present the output as its own?

      The only thing I can think of offhand is that gpt-4o's "firewall" might limit its ability to execute code.
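What the parent describes, recognizing a chess task and handing it to a specialist program instead of answering from the language model itself, is roughly the "tool use" pattern. A minimal sketch of that dispatch idea (all names are hypothetical; a real deployment would call an actual engine such as Stockfish over UCI rather than the stand-in stub here):

```python
# Hypothetical sketch of tool dispatch: route domain-specific tasks to a
# specialist solver instead of generating an answer token-by-token.

def stub_chess_engine(fen: str) -> str:
    """Stand-in for a real engine (e.g. Stockfish via UCI); returns a canned move."""
    return "e2e4"

def llm_answer(prompt: str) -> str:
    """Stand-in for the language model's own free-text generation."""
    return "(free-text answer from the model)"

TOOLS = {"chess": stub_chess_engine}

def respond(task_kind: str, payload: str) -> str:
    # Delegate when a specialist tool exists; otherwise fall back to the
    # general-purpose model.
    tool = TOOLS.get(task_kind)
    if tool is not None:
        return tool(payload)
    return llm_answer(payload)

print(respond("chess", "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))
print(respond("smalltalk", "How are you?"))
```

Whether a hosted chatbot is allowed to download and run an arbitrary engine is a sandboxing question, as the parent suspects, not a capability question.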

    • I'm sorry but you're only projecting your wishes here.

As you say, ChatGPT >= random human, but give a random human a day of instruction with a chess teacher (whereas ChatGPT got access to the entire internet's worth of chess discussions for years) and that human >= 3rd grade chess club. But we've just seen now that ChatGPT < 3rd grade chess club. Contradiction! In other words, this news proves (to those who are rational wishful thinkers) that ChatGPT claims about >= random human are full of shit.

      TL;DR. YW. YHL. HAND.

      ;-)

      • Murphy's law strikes again. I forgot that less-than sign must be escaped in HTML. Here's the corrected comment

        I'm sorry but you're only projecting your wishes here.

        As you say, ChatGPT >= random human, but give a random human a day of instruction with a chess teacher (whereas ChatGPT got access to the entire internet's worth of chess discussions for years) and that human >= 3rd grade chess club. But we've just seen now that ChatGPT < 3rd grade chess club. Contradiction!

        In other words, this news

Obnoxious snark aside, it appears that you are missing the point. Yes, ChatGPT is trained on a large fraction of the internet. That's why it can do this at all. What is impressive is that it can do that even without the sort of specialized training you envision. Also, speaking as someone who has actually taught people how to play chess, you are, to be blunt, substantially overestimating how fast people learn.
    • I'll confess to having faked the whole thing.

I used a different chatbot, which shall remain anonymous, and told it: "Submit an article to slashdot about what would happen if chatgpt played against the Atari 2600 chess program."

  • AI (Score:4, Insightful)

    by LainTouko ( 926420 ) on Saturday June 14, 2025 @12:45PM (#65449317)

    This is only news for the kind of people who refer to large language models as "AI".

    Unfortunately, that's quite a lot of people.


    • by Tablizer ( 95088 )

Stop the vocab fight! It's pointless and useless! Every known definition of "AI" and even "intelligence" has big flaws. I've been in hundreds of such debates. No human nor bot has ever proposed a hole-free definition of "intelligence", so go home and shuddup already!

      • Re: (Score:1, Interesting)

        It isn't that we know and can readily define intelligence in a clear and precise manner.

        It is that we know when we're looking at something that clearly isn't intelligent and they call it that anyway.

LLMs are clearly not intelligent, and it is inappropriate to apply any phrase containing the word "intelligence" or its variations when describing such systems.

        • I guess defining intelligence is like what we used to say about porn. "I know it when I see it" ....tee hee....
          Actually, I argue that this is the problem with language. It's vague. Ideas usually start vague, and then only after do you drill down and add details. Like writing pseudo code or specifications for code. This function is called XYZ. It does (blah blah blah...blah blah blah....etc, etc, etc, ad infinitum).

          It is hard to be precise. For example how do you define "art". How about "good?". What is "goo
    • Re:AI (Score:4, Insightful)

      by ThomasBHardy ( 827616 ) on Saturday June 14, 2025 @01:25PM (#65449413)

      This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.
      The term is now corrupted beyond all hope of recovery.
I'm distressed at how much tools like ChatGPT favor seeming intelligent and capable as an illusion, even when lying to you. I've even caught it making a mistake and then blaming me for the mistake, or pretending it meant to do it wrong as a test step. The conman element is real, even down to the tool itself.

      • That are not the traditional definition of AI.

        What IS the traditional definition of AI?

It's been all over the place for years. Back when I was a student, in the very early 2000s, I had a course on AI in the same module as the neural nets lectures. It contained such topics as alpha/beta pruning, A* search, decision trees, expert systems, that kind of thing.

        Further in the past neural networks were definitely considered AI, but by 2000 they were considered as "ML" which was generally treated as something separa
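For anyone who missed that era of AI courses: alpha/beta pruning is ordinary minimax search that skips branches which provably cannot change the result. A minimal sketch on a hand-made toy game tree (the tree and its values are invented purely for illustration):

```python
# Minimax with alpha/beta pruning over a toy game tree.
# Leaves are ints (scores for the maximizing player); internal nodes are lists.

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if isinstance(node, int):          # leaf: return its static score
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # remaining siblings cannot matter: prune
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Depth-3 tree: root maximizes, its children minimize, grandchildren maximize.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]
print(alphabeta(tree, True))  # -> 5
```

The same value comes out as plain minimax would give; the pruning only saves work, which is why it was a staple of classic chess engines.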

      • AI: algorithm implemented.

      • by ceoyoyo ( 59147 )

        This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.

        How so? What is the traditional definition of AI? Are you sure you're using the correct one?

      • It's called semantic drift and it's not going back to the old meaning, so I would suggest finding a new pet peeve. Kids on your lawn, perhaps.

    • This is only news for the kind of people who refer to large language models as "AI".

      Unfortunately, that's quite a lot of people.


      Old MacDonald had a LLM farm -

      AI, AI, Oh!,

      And on that farm he had a nuclear plant,

      AI AI Oh!

      With a hallucination here, a wrong answer there, here a fault there a fault, everywhere a bad answer.

      Old MacDonald had a LLM farm

      AI AI Oh!

    • This is only news for the kind of people who refer to large language models as "AI".

      So, ... everyone including people working in the field of AI?

    • by allo ( 1728082 )

      AI is a category that even includes ELIZA. You're thinking of AGI. AI is the category that includes the simplest algorithms, not the category that only includes what you see in sci-fi movies that talk about AI.

    • by Ossifer ( 703813 )

      Eventually people recognize cheap parlor tricks for what they are. Or in this case, massively expensive ones.

    • by taustin ( 171655 )

      This is only news for the kind of people who refer to large language models as "AI".

      Unfortunately, that's quite a lot of people.


      Starting with the marketing droids at the A"I" companies.

  • Some people so want to believe that a useful information retrieval system is a superintelligence.

    The rest of us aren't surprised that an interesting search engine isn't good at chess.

    • by gweihir ( 88907 )

      Some people so want to believe that a useful information retrieval system is a superintelligence.

      The rest of us aren't surprised that an interesting search engine isn't good at chess.

That very nicely sums it up. Obviously, you have to be something like a sub-intelligence to think that LLMs are superintelligent. To be fair, something like 80% of the human race cannot fact-check for shit and may well qualify as sub-intelligence. Especially as most of these do not know about their limitations due to the Dunning-Kruger effect.

    • Hmm:
      - confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were
      - first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations
      - repeatedly requested that the match start over
      That all rings a bell somewhere - confusion, blaming everything else for the errors, repeatedly requesting a mulligan. That seems familiar.

  • And that beat a state of the art AI? So much for intelligence!
  • No surprise (Score:5, Insightful)

    by gweihir ( 88907 ) on Saturday June 14, 2025 @01:01PM (#65449339)

    To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities. All they can do is statistical predictions based on their training data. Hence any task that requires actual reasoning like chess (because chess is subject to state-space explosion and cannot be solved by "training" alone), is completely out of reach of an LLM.

The only thing surprising to me is that it took so long to come up with demonstrations of this well-known fact. Of course, the usual hallucinators believe (!) that LLMs are thinking machines/God/the singularity and other such crap, but these people are simply delulu and have nothing to contribute except confusing the issue. Refer to the little pathetic fact that about 80% of the human race is "religious" and the scope of _that_ problem becomes clear. It also becomes clear why a rather non-impressive technology like LLMs is seen as more than just better search and better crap, when that is essentially all it has delivered. Not worthless, but not a revolution either, and the extreme cost of running general (!) LLMs may still kill the whole idea in practice.
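The state-space explosion the parent leans on is easy to make concrete. Using Shannon's rough figures of about 35 legal moves per position and about 80 plies per game, a back-of-envelope sketch:

```python
import math

branching = 35   # rough average number of legal moves per chess position
plies = 80       # rough length of a game in half-moves (Shannon's estimates)

# Size of the game tree is branching**plies; compute its order of magnitude.
magnitude = plies * math.log10(branching)
print(f"game tree ~10^{int(magnitude)} lines of play")  # -> ~10^123
```

Whatever the exact constants, the order of magnitude dwarfs any conceivable training corpus, which is the point: chess positions cannot simply be memorized from data.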

    • by Tablizer ( 95088 )

      To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities

      A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.

      • Re:No surprise (Score:5, Interesting)

        by gweihir ( 88907 ) on Saturday June 14, 2025 @02:16PM (#65449493)

        To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities

        A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.

Indeed. There are a few facts from sociology. Apparently only 10-15% of all humans can fact-check, and apparently only around 20% (including the fact-checkers) can be convinced by rational argument when the question matters to them (goes up to 30% when it does not). Unfortunately, these numbers seem to be so well established that there are no current publications I can find. It may also be hard to publish about this. This is from interviews with experts, personal observations, and observations from friends who also teach at the academic level. ChatGPT at least confirmed the 30% number but sadly failed to find a reference.

        Anyway, that would mean only about 10-15% of the human race has active reasoning ability (can come up with rational arguments) and only about 20-30% has passive reasoning ability (can verify rational arguments). And that nicely explains some things, including why so many people mistake generative AI and in particular LLMs for something they are very much not and ascribe capabilities to them that they do not have and cannot have.

        • To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities

          A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.

Indeed. There are a few facts from sociology. Apparently only 10-15% of all humans can fact-check, and apparently only around 20% (including the fact-checkers) can be convinced by rational argument when the question matters to them (goes up to 30% when it does not). Unfortunately, these numbers seem to be so well established that there are no current publications I can find. It may also be hard to publish about this. This is from interviews with experts, personal observations, and observations from friends who also teach at the academic level. ChatGPT at least confirmed the 30% number but sadly failed to find a reference.

          Anyway, that would mean only about 10-15% of the human race has active reasoning ability (can come up with rational arguments) and only about 20-30% has passive reasoning ability (can verify rational arguments). And that nicely explains some things, including why so many people mistake generative AI and in particular LLMs for something they are very much not and ascribe capabilities to them that they do not have and cannot have.

          Thus proving the point by example.

          Most people have faith in something. Since they didn't arrive at that faith by reason how would you expect to get them to change their mind using reason? You are really demanding they give priority to your faith in reason over their other faith.

You have a plate of fruit that includes oranges and grapes. Someone says there are more oranges than grapes. You count the grapes and the oranges and demonstrate that there are, by count, more grapes than oranges. The only way that i

          • by gweihir ( 88907 )

            Thus proving the point by example.

            Most people have faith in something. Since they didn't arrive at that faith by reason how would you expect to get them to change their mind using reason? You are really demanding they give priority to your faith in reason over their other faith.

            And there I can stop reading, because you do not get it. Your simplistic and, frankly, stupid claim is that relying on rational reasoning is "faith". That is, obviously, a direct lie. Now, it is quite possible you are not smart enough to see that.

            • Now, it is quite possible you are not smart enough to see that.

Why don't you (try to) use reason to defend your belief, instead of ad hominems? Of course, like every true believer, anyone who questions your belief is a heretic. There is no rational defense, because your belief isn't rational.

  • Would the average Slashdot reader beat the Atari 2600?
    • by haruchai ( 17472 )

Probably not.
I didn't know about the Atari chess game until a couple of weeks ago, when an old colleague posted on FB that he was struggling with it on the lowest level.
But I did pretty well against Fritz a long time ago, running on a Compaq Armada 7800.

Of course, digging into the details, you find he used GPT-4o for this, which is years behind frontier models like o3 or Gemini 2.5, which use reasoning to think through their responses and can even write Python as part of this process that likely would compete with the Atari system despite not being designed to do so. AI can be criticized, but this ain't it. It's a whole article written up just to cover some guy's LinkedIn post chasing clout and likes. Maybe the reasoning models wouldn't
  • ...for fucking up like ChatGPT can. Take that Atari!

    [ChatGPT] first blaming the Atari icons as too abstract...continued badly and that the AI chatbot repeatedly requested that the match start over

  • by 50000BTU_barbecue ( 588132 ) on Saturday June 14, 2025 @01:19PM (#65449397) Journal

    And her algorithm

An LLM is one of the worst AIs to play chess. I wouldn't be surprised if you'd do better with some greedy algorithm (which is not a good idea in general).
Not all AIs are the same. LLMs are text generators, not chess players.
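To make the parent's "greedy algorithm" remark concrete: even a one-ply greedy rule ("grab the most material right now") is a coherent chess policy, unlike free-text generation, though without lookahead it walks into every trap. A hedged sketch with invented names and a toy move list (real code would get legal moves from an actual board representation):

```python
# One-ply greedy move choice: score each legal move by the immediate
# material gain and pick the maximum. No lookahead, so it is easily
# trapped -- but it never hallucinates an illegal move.

PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def greedy_move(legal_moves):
    """legal_moves: dict mapping a move to the captured piece letter (or None)."""
    def gain(move):
        captured = legal_moves[move]
        return PIECE_VALUE.get(captured, 0) if captured else 0
    return max(legal_moves, key=gain)

# Toy position: capturing the queen beats a quiet move or winning a pawn.
moves = {"Nxd5": "P", "Bxf7": None, "Rxe8": "Q"}
print(greedy_move(moves))  # -> Rxe8
```

The contrast with an LLM is the point: the greedy player has a tiny but real model of the game state, while a text generator has only a statistical model of chess talk.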

Firstly, ChatGPT is NOT a gaming or chess engine; secondly, LLMs are not made or designed for the reasoning required to even play chess effectively.
  • Your boss is OK with this. They would rather have a 3rd grader without intelligence or ambition, than give you a raise.
