

ChatGPT Just Got 'Absolutely Wrecked' at Chess, Losing to a 1970s-Era Atari 2600 (cnet.com) 109
An anonymous reader shared this report from CNET:
By using a software emulator to run Atari's 1979 game Video Chess, Citrix engineer Robert Caruso said he was able to set up a match between ChatGPT and the 46-year-old game. The matchup did not go well for ChatGPT. "ChatGPT confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations," Caruso wrote in a LinkedIn post.
"It made enough blunders to get laughed out of a 3rd-grade chess club," Caruso said. "ChatGPT got absolutely wrecked at the beginner level."
"Caruso wrote that the 90-minute match continued badly and that the AI chatbot repeatedly requested that the match start over..." CNET reports.
"A representative for OpenAI did not immediately return a request for comment."
"It made enough blunders to get laughed out of a 3rd-grade chess club," Caruso said. "ChatGPT got absolutely wrecked at the beginner level."
"Caruso wrote that the 90-minute match continued badly and that the AI chatbot repeatedly requested that the match start over..." CNET reports.
"A representative for OpenAI did not immediately return a request for comment."
ChatGPT is not a chess engine (Score:4, Insightful)
Re:ChatGPT is not a chess engine (Score:5, Insightful)
ChatGPT has flexibility, but it is inferior to both humans and specialized algorithms in nearly all cases.
The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
Re:ChatGPT is not a chess engine (Score:5, Insightful)
The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at Western wage levels. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.
Re: (Score:2)
The main advantage of ChatGPT is that you only have to feed it electricity instead of a living wage.
With the little problem that you have to feed it so much electricity that paying that wage might still well turn out to be cheaper, even at Western wage levels. At the moment LLMs burn money like crazy and it is unclear whether that can be fixed.
We're going to need several Kashiwazaki-Kariwa-sized or larger reactors to perform what a web search by a random person can do.
Re: (Score:3)
Remember how expensive electricity from nuclear is? That will not solve things...
Also remember that most uranium comes from Kazakhstan (43%), and they border China and Russia. Not a critical dependency you want. Second place is Canada (15%), which the US has just mightily pissed off through sheer leadership stupidity. US domestic? A whopping 0.15%...
Re: (Score:2)
Remember how expensive electricity from nuclear is? That will not solve things...
Also remember that most uranium comes from Kazakhstan (43%), and they border China and Russia. Not a critical dependency you want. Second place is Canada (15%), which the US has just mightily pissed off through sheer leadership stupidity. US domestic? A whopping 0.15%...
I don't disagree with any of that. And if we do decide to put ourselves in that position, is this glorified search engine going to be worth it? I don't think so.
That said, I think that before too long, we aren't going to need an entire nuclear generating facility to feed the tech bro wet dream. A guess, but a half-educated one, given the way innovation tends to work.
Re: ChatGPT is not a chess engine (Score:1)
The bulk of uranium reserves are Australia, fwiw.
Re: (Score:2)
Actually, the bulk of the uranium reserves are dissolved in seawater. But, as it turns out, extraction ability and technology matter.
Re: (Score:3)
I disagree. Generative AI cannot really do "automation". Far too unreliable. But we will see. Your argument definitely has some merit.
Re: (Score:2)
That's ridiculous. The same unions that existed four years ago are still here. Also worthy of note:
https://www.npr.org/2025/06/11... [npr.org]
Re: (Score:2)
Businesses often prefer to minimize labor costs even when there's an overall increase to operating costs. Replacing humans with ChatGPT at a 20% markup over labor costs is still going to be an attractive prospect to many MBAs.
Re: (Score:2)
I don't disagree. But 20% is a very, very low estimate.
Re: (Score:1)
Re: (Score:2)
Whoever thought a language model would be remotely good at chess clearly doesn't understand the technology they're working with.
Re:ChatGPT is not a chess engine (Score:4, Insightful)
I wanted to reply to the GP with "Now ask the Atari chess program to summarize a 10-page PDF".
Cherry-picking goes both ways.
Re: (Score:3)
Nobody claimed the Atari chess program was capable of anything else.
ChatGPT is supposed to be able to do anything, including walk the dog.
Re: (Score:2)
ChatGPT is supposed to be able to do anything, including walk the dog.
Says who?
Why are you not able to keep your argument grounded in reality?
Re: (Score:2)
ChatGPT is supposed to be able to do anything, including walk the dog.
No, it is not. While Marketing tends to exaggerate its capabilities, I have never seen such claims.
Re: (Score:2)
I wanted to reply to the GP with "Now ask the Atari chess program to summarize a 10-page PDF".
I pulled out an Atari and did that.
The Atari won because it didn't make any mistakes in the summary.
Re: (Score:2)
1) No, you didn't.
2) You should always review an LLM-generated summary, but in most cases, it is perfectly accurate.
Where LLMs critically fail is when you ask them to generate something for which they have no ground truth. Because they'll fucking invent it.
Re: (Score:2)
ChatGPT is advertised as AI, approaching human level. AI is building machines that exhibit human behaviour and capabilities.
So they made the thing play a computer chess algorithm and it made excuses and demanded a rematch. Sounds like what most humans with no chess experience would do. It didn't flip the board and stomp off though.
Re: (Score:2)
ChatGPT is advertised as AI, approaching human level.
Have a citation for such an advertisement?
AI is building machines that exhibit human behaviour and capabilities.
Sure. LLMs are widely regarded as AI- no disagreement, there.
So they made the thing play a computer chess algorithm and it made excuses and demanded a rematch. Sounds like what most humans with no chess experience would do. It didn't flip the board and stomp off though.
Absolutely.
It's a language model. It has no chess training. It has probably picked up a good bit of information about chess, but still the model is trained in language, not playing Chess.
It's like someone who knows the rules of chess but has no real experience with the game.
Re: (Score:2)
Chess is a problem where you need to be able to tell the machine "these are the rules" and have it follow them. Humans can do that, the LLM can't.
Re: (Score:2)
LLMs are mostly composed of regular old fully connected ANNs, and the remainder, the transformers, are also ANNs. ANNs certainly can learn the rules of chess, and you can train one to play chess at a level that is generally regarded as superhuman. There's also a proof that any 2+ layer ANN of sufficient size can learn any IO function.
So there's nothing about the structure of an LLM that would make it unable to learn and follow the rules of chess. The fact that they don't, or don't do so very well, means tha
Re: (Score:2)
The LLM absolutely can.
AlphaZero can beat any human alive at Chess.
You're not 100% wrong though. It is similar to the clock problem, though I don't think you fully understand it.
"AI" can indeed produce clocks in any position.
The problem is that they are over-trained/overfit for one position, which means there is a likelihood that they end up there without careful prompting.
I'm familiar with the clock problem, so I didn't read that article, but I'm certain if you read it, you'll find the w
Re: (Score:2)
https://blog.samaltman.com/the... [samaltman.com]
"Humanity is close to building digital superintelligence"
"we have recently built systems that are smarter than people in many ways"
"In some big sense, ChatGPT is already more powerful than any human who has ever lived."
etc.
Re: (Score:2)
Your quote was:
ChatGPT is advertised as AI, approaching human level.
From your above blog, no such claim exists.
The most sus claim he makes is your last cited one- but even the context makes it clear he's not talking about ChatGPT's intelligence.
Re: (Score:2)
Surprisingly good is something different from good.
If you want to play chess, use one of the planning algorithm based engines. They are fast, easy to parallelize, easy to run with a time budget (i.e. they get better the more time you give them, but you can stop them any time and let them do the best move) and actually built to play chess.
Having an LLM play a game is a good way to show they generalize. It is not a good way to build a chess AI.
Many people don't get science. Someone shows they can find Waldo w
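To make the "time budget" point concrete, here is a minimal sketch (my own, not from the article) of iterative deepening, the usual way planning engines get that anytime behaviour. It uses a toy take-away game (remove 1-3 stones, whoever takes the last stone wins) instead of chess, and the 0.5-second budget is an arbitrary assumption; the point is only that you can interrupt it whenever you like and still get the best move found so far.

    import time

    MOVES = (1, 2, 3)   # toy take-away game: remove 1-3 stones, last stone taken wins

    def negamax(stones, depth):
        """Best achievable score for the player to move: +1 win, -1 loss, 0 unknown."""
        if stones == 0:
            return -1                      # the opponent just took the last stone
        if depth == 0:
            return 0                       # search horizon reached
        return max(-negamax(stones - m, depth - 1) for m in MOVES if m <= stones)

    def best_move(stones, time_budget_s=0.5):
        """Iterative deepening: search depth 1, 2, 3, ... until the budget runs out.
        Stopping early still returns the best move from the last completed depth,
        which is the 'anytime' property mentioned above."""
        deadline = time.monotonic() + time_budget_s
        choice, depth = 1, 1
        while time.monotonic() < deadline and depth <= stones:
            scored = [(-negamax(stones - m, depth - 1), m) for m in MOVES if m <= stones]
            choice = max(scored)[1]
            depth += 1
        return choice

    print(best_move(17))   # prints 1: leaving 16 (a multiple of 4) is a lost position for the opponent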
Re: (Score:2)
What's funny is that ChatGPT wasn't able to spawn an instance of a chess engine for its own benefit.
Re: (Score:2)
How should it spawn it?
But I wonder if it could code one and how that would line up against the Atari. I bet there is enough chess engine code online that larger models would know how to code an engine, especially the reasoning ones.
And if you look at the MCTS algorithm (one of the algorithms that finally helped to beat human players) it is quite simple to implement. You only need rules to generate valid moves and determine win/loss and then it explores promising and less promising games until interrupted a
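For what it's worth, a bare-bones MCTS sketch looks roughly like this. It plays a toy take-away game (take 1-3 stones, last stone wins) rather than chess because, as said above, all the algorithm needs from the game is legal-move generation and a win/loss test; the iteration count and exploration constant are arbitrary choices, and a real engine would add many refinements on top.

    import math, random

    MOVES = (1, 2, 3)          # legal moves: take 1, 2 or 3 stones; last stone taken wins

    class Node:
        def __init__(self, stones, parent=None, move=None):
            self.stones, self.parent, self.move = stones, parent, move
            self.children, self.visits, self.wins = [], 0, 0.0

        def untried(self):
            tried = {c.move for c in self.children}
            return [m for m in MOVES if m <= self.stones and m not in tried]

    def ucb1(child, parent_visits, c=1.4):
        # Trade off exploitation (win rate) against exploration (rarely visited moves).
        return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

    def rollout(stones):
        """Random playout; return 1 if the player about to move from here wins, else 0."""
        turn = 0
        while stones > 0:
            stones -= random.choice([m for m in MOVES if m <= stones])
            if stones == 0:
                return 1 if turn == 0 else 0
            turn ^= 1
        return 0

    def mcts(stones, iterations=2000):
        root = Node(stones)
        for _ in range(iterations):          # or loop until a time budget is exhausted
            node = root
            # 1. Selection: descend through fully expanded nodes via UCB1.
            while not node.untried() and node.children:
                node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
            # 2. Expansion: try one new move from this node.
            if node.untried():
                m = random.choice(node.untried())
                node.children.append(Node(node.stones - m, parent=node, move=m))
                node = node.children[-1]
            # 3. Simulation: random play to the end, scored for whoever moved into `node`.
            result = 1 - rollout(node.stones)
            # 4. Backpropagation: alternate the result while walking back up (zero-sum game).
            while node is not None:
                node.visits += 1
                node.wins += result
                result = 1 - result
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move

    print(mcts(5))   # typically 1: leaving 4 stones puts the opponent in a lost position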
Re: (Score:2)
There are some AI tools that can, when prompted, actually undertake fairly complex steps to accomplish a goal, including finding ways to avoid shutting down:
https://www.livescience.com/te... [livescience.com]
If it can go that far, it can certainly download a common chess engine and run it. Assuming it was given network access and permissions necessary.
Re: (Score:2)
Importantly- they need to be instructed to do so, so it needs to be part of the test.
Re: (Score:2)
How should it spawn it?
It's called "Agentic AI".
The LLM is trained to call tools using a specific format, and the code running the LLM executes the tools on behalf of the LLM. This is my primary use for them.
More advanced systems will also spin up a sandbox where the LLM can run code it generates.
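A stripped-down sketch of that loop is below. The JSON tool-call format and the call_llm() stub are hypothetical stand-ins (every real system defines its own schema and talks to an actual model); what it shows is the control flow: the harness parses the model's tool request, runs the tool itself, and feeds the result back until the model produces a plain-text answer.

    import json

    def start_chess_engine(level: int) -> str:
        """Hypothetical tool: pretend to spawn an engine and report back."""
        return f"engine started at level {level}"

    TOOLS = {"start_chess_engine": start_chess_engine}

    def call_llm(messages):
        """Stand-in for a real model call. It hard-codes one tool request, then
        a final answer, so the loop below runs without any API access."""
        if not any(m["role"] == "tool" for m in messages):
            return json.dumps({"tool": "start_chess_engine", "args": {"level": 1}})
        return "Done: the engine is running."

    def agent_loop(user_prompt, max_steps=5):
        messages = [{"role": "user", "content": user_prompt}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            try:
                request = json.loads(reply)          # did the model ask for a tool?
            except ValueError:
                return reply                         # plain text: treat as the final answer
            result = TOOLS[request["tool"]](**request["args"])
            messages.append({"role": "tool", "content": result})
        return "step limit reached"

    print(agent_loop("Play chess against the Atari."))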
Re: (Score:2)
Shall I link to you saying that ChatGPT is an RNN, and that it is Turing Complete?
Get fucked, poser.
Re: (Score:2)
Re: (Score:2)
but it is inferior to both humans and specialized algorithms in nearly all cases.
In what way? The OP postulated pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves. That's the OP's point. ChatGPT is that weird friend of yours who somehow is a pub quiz ace, a true walking encyclopedia, yet someone who has no practical skills.
Re:ChatGPT is not a chess engine (Score:4, Funny)
"pulling a random person off the street - a generalised average person. There's a good chance that they don't even know the basic rules of chess or how to make legal moves."
It depends where you are. In Russia everyone is taught chess.
(Of course there are no average persons in the street, they are all in Ukraine.)
Re: ChatGPT is not a chess engine (Score:4, Interesting)
Well, the way I look at it is that AI models were trained on unchecked data and they just reheat mistakes made during training because, statistically, mistakes are more common than good moves.
Garbage in. Garbage out.
Re: (Score:2)
LLM yes. Chess engines are more often trained with methods like self-play.
Re: (Score:2)
This is a badly conducted experiment by some random fuck on LinkedIn. Talking about unchecked data and garbage. Apparently everybody on Slashdot is now so hellbent on disparaging anything AI that they'll take any bit of ragebait at face value.
The LinkedIn post: https://www.linkedin.com/posts... [linkedin.com]
Relevant quotes by the author:
- "Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were — first blam
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Actually this is a very important result, because it highlights ChatGPT's strength and weakness. It's very good at dredging through vast amounts of text and forming principles of prediction, so that it can fake a human being's speech.
But it doesn't have any intellectual power at all - which is exactly what chess tests.
"On the chessboard, lies and hypocrisy do not survive long. The creative combination lays bare the presumption of a lie; the merciless fact, culminating in the checkmate, contradicts the hypoc
Re: (Score:2)
Re: (Score:2)
But it doesn't have any intellectual power at all - which is exactly what chess tests.
All hail the Atari 2600, our intellectual power overlord! Right?
Replace ChatGPT with "autocomplete" (Score:3)
Autocomplete loses in Chess!
Autocomplete makes up references!
Autocomplete said something stupid!
Re: (Score:1)
Re: (Score:2)
"ChatGPT said"
I can create a textfile that says it is good at chess. Presented with a chess program, the file will still ... do nothing at all.
Don't think a program can tell you what it is able to do just because its primary interface is presented to you in dialogue form. It is convenient to use that way, but you're not actually communicating with something; you're only using an interface that is made to be understandable by you, to instruct a neural network for some tasks it can do. There is no magic and no
Re: (Score:2)
And this is a great illustration of why LLMs aren't going to be decimating white-collar jobs.
Just as ChatGPT is terrible at chess (I'm surprised it could even try to play the game)...LLMs are terrible at doing people's jobs.
They're great at making up text (often literally making stuff up), but that's a lot different from actually *doing a job.*
Re: (Score:2)
Re: (Score:2)
What helps me understand why LLMs are so "confident" is to visualize what AI does when it "erases" unwanted people or things from a photo. It essentially makes up a background of pixels that could plausibly be behind the "erased" object. Those made-up pixels have nothing to do with what was _actually_ behind those unwanted objects; it just uses a fancy extrapolation engine to predict what those pixels might be.
LLMs do the same thing, but instead of pixels, they use language tokens. When you provide a prompt or q
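A toy way to see that mechanism is a hand-written bigram table sampled token by token. A real LLM conditions on far more context with a learned transformer rather than a lookup table, and the probabilities below are invented for the example, but the output is still just "a plausible next token", with no check that the continuation is true.

    import random

    BIGRAMS = {                       # made-up probabilities for a handful of tokens
        "the": {"knight": 0.5, "pawn": 0.3, "queen": 0.2},
        "knight": {"takes": 0.6, "moves": 0.4},
        "pawn": {"takes": 0.5, "advances": 0.5},
        "queen": {"takes": 0.7, "moves": 0.3},
        "takes": {"the": 1.0},
        "moves": {"to": 1.0},
        "advances": {"to": 1.0},
        "to": {"e4": 1.0},
    }

    def generate(token, steps=6):
        out = [token]
        for _ in range(steps):
            nxt = BIGRAMS.get(token)
            if not nxt:
                break
            # Sample the next token from the conditional distribution, plausibility only.
            token = random.choices(list(nxt), weights=nxt.values())[0]
            out.append(token)
        return " ".join(out)

    print(generate("the"))   # e.g. "the knight takes the pawn advances to"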
Re: ChatGPT is not a chess engine (Score:2)
If ChatGPT (or at least, GPT-4o) can ingest and execute code, why wouldn't it just go online, search for a FOSS chess engine in a language it "understands" (like Python), download it, recognize it as being more adept at solving problems in this specific domain, and execute that chess engine *directly* & present the output as its own?
The only thing I can think of offhand is that gpt-4o's "firewall" might limit its ability to execute code.
Re: (Score:2)
I'm sorry but you're only projecting your wishes here.
As you say, ChatGPT >= random human, but give a random human a day of instruction with a chess teacher (whereas ChatGPT got access to the entire internet's worth of chess discussions for years) and that human >= 3rd grade chess club. But we've just seen now that ChatGPT < 3rd grade chess club. In other words, this news proves (to those who are rational wishful thinkers) that ChatGPT claims about >= random human are full of shit.
TL;DR. YW. YHL. HAND.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
That's why it can do this at all. What is impressive is that it can do that even without the sort of specialized training you envision.
How many pages of chess instruction do you think ChatGPT has been trained on? How many pages do you think it would take for it to play a decent game of chess?
Re: (Score:2)
I confess. It's a fake. (Score:2)
I'll confess to having faked the whole thing.
I used a different chatbot, which shall remain anonymous, and told it: "Submit an article to slashdot about what would happen if chatgpt played against the Atari 2600 chess program."
AI (Score:5, Insightful)
This is only news for the kind of people who refer to large language models as "AI".
Unfortunately, that's quite a lot of people.
.
Re: (Score:2)
Stop the vocab fight! It's pointless and useless! Every known definition of "AI" and even "intelligence" has big flaws. I've been in hundreds of such debates; No Human nor Bot Has Ever Proposed A Hole-Free Definition of "Intelligence", so go home and shuddup already!
Re: (Score:2)
Actually, I argue that this is the problem with language. It's vague. Ideas usually start vague, and then only after do you drill down and add details. Like writing pseudo code or specifications for code. This function is called XYZ. It does (blah blah blah...blah blah blah....etc, etc, etc, ad infinitum).
It is hard to be precise. For example how do you define "art". How about "good?". What is "goo
Re: (Score:2)
Re:AI (Score:4, Insightful)
This is my pet peeve. AI has been turned into a marketing term for things that are not the traditional definition of AI.
The term is now corrupted beyond all hope of recovery.
I'm distressed at how much tools like ChatGPT favor seeming intelligent and capable as an illusion, even when lying to you. I've even caught it making a mistake and then blaming me for the mistake, or pretending it meant to do it wrong as a test step. The conman element is real, even down to the tool itself.
Re: (Score:2)
That are not the traditional definition of AI.
What IS the traditional definition of AI?
It's been all over the place for years. Back when I was a student, in the very early 2000s, I had a course on AI in the same module as the neural nets lectures. It contained such topics as alpha/beta pruning, A* search, decision trees, expert systems, that kind of thing.
Further in the past neural networks were definitely considered AI, but by 2000 they were considered as "ML" which was generally treated as something separa
Re: (Score:1)
AI: algorithm implemented.
Re: (Score:2)
How so? What is the traditional definition of AI? Are you sure you're using the correct one?
Re: (Score:1)
It's called semantic drift and it's not going back to the old meaning, so I would suggest finding a new pet peeve. Kids on your lawn, perhaps.
Re: (Score:3)
This is only news for the kind of people who refer to large language models as "AI".
Unfortunately, that's quite a lot of people.
.
Old MacDonald had an LLM farm -
AI, AI, Oh!,
And on that farm he had a nuclear plant,
AI AI Oh!
With a hallucination here, a wrong answer there, here a fault there a fault, everywhere a bad answer.
Old MacDonald had an LLM farm
AI AI Oh!
Re: (Score:2)
This is only news for the kind of people who refer to large language models as "AI".
So, ... everyone including people working in the field of AI?
Re: (Score:2)
AI is a category that even includes ELIZA. You're thinking of AGI. AI is the category that includes the simplest algorithms, not the category that only includes what you see in sci-fi movies that talk about AI.
Re: (Score:2)
Eventually people recognize cheap parlor tricks for what they are. Or in this case, massively expensive ones.
Re: (Score:2)
This is only news for the kind of people who refer to large language models as "AI".
Unfortunately, that's quite a lot of people.
.
Starting with the marketing droids at the A"I" companies.
Mocking their God (Score:2)
Some people so want to believe that a useful information retrieval system is a superintelligence.
The rest of us aren't surprised that an interesting search engine isn't good at chess.
Re: (Score:3)
Some people so want to believe that a useful information retrieval system is a superintelligence.
The rest of us aren't surprised that an interesting search engine isn't good at chess.
That very nicely sums it up. Obviously, you have to be something like a sub-intelligence to think that LLMs are superintelligent. To be fair, something like 80% of the human race cannot fact-check for shit and may well qualify as sub-intelligence. Especially as most of these do not know about their limitations due to the Dunning-Kruger effect.
Re: (Score:2)
Hmm:
- confused rooks for bishops, missed pawn forks and repeatedly lost track of where pieces were
- first blaming the Atari icons as too abstract, then faring no better even after switching to standard chess notations
- repeatedly requested that the match start over
That all rings a bell somewhere - confusion, blaming everything else for the errors, repeatedly requesting a mulligan. That seems familiar.
An Atari 2600 uses a MOS 6507 with BYTES of RAM (Score:1)
No surprise (Score:5, Insightful)
To anybody that wants to know, it is already clear that LLMs, including the "reasoning" variant, have zero reasoning abilities. All they can do is statistical predictions based on their training data. Hence any task that requires actual reasoning like chess (because chess is subject to state-space explosion and cannot be solved by "training" alone), is completely out of reach of an LLM.
The only thing surprising to me is that it took so long to come up with demonstrations of this well-known fact. Of course, the usual hallucinators believe (!) that LLMs are thinking machines/God/the singularity and other such crap, but these people are simply delulu and have nothing to contribute except confusing the issue. Refer to the little pathetic fact that about 80% of the human race is "religious" and the scope of _that_ problem becomes clear. It also becomes clear why a rather non-impressive technology like LLMs is seen as more than just better search and better crap, when that is essentially all it has delivered. Not worthless, but not a revolution either, and the extreme cost of running general (!) LLMs may still kill the whole idea in practice.
Re: (Score:1)
A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.
Re:No surprise (Score:5, Interesting)
A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.
Indeed. There are a few facts from sociology. Apparently only 10-15% of all humans can fact-check, and apparently only around 20% (including the fact-checkers) can be convinced by rational argument when the question matters to them (this goes up to 30% when it does not). Unfortunately, these numbers seem to be so well established that there are no current publications I can find. It may also be hard to publish about this. This is from interviews with experts and personal observations, plus observations from friends that also teach at academic levels. ChatGPT at least confirmed the 30% number but sadly failed to find a reference.
Anyway, that would mean only about 10-15% of the human race has active reasoning ability (can come up with rational arguments) and only about 20-30% has passive reasoning ability (can verify rational arguments). And that nicely explains some things, including why so many people mistake generative AI and in particular LLMs for something they are very much not and ascribe capabilities to them that they do not have and cannot have.
Re: (Score:1)
A good many humans don't either. They memorize patterns, rituals, slogans, etc. but can't think logically.
Indeed. There are a few facts from sociology. Apparently only 10-15% of all humans can fact-check, and apparently only around 20% (including the fact-checkers) can be convinced by rational argument when the question matters to them (this goes up to 30% when it does not). Unfortunately, these numbers seem to be so well established that there are no current publications I can find. It may also be hard to publish about this. This is from interviews with experts and personal observations, plus observations from friends that also teach at academic levels. ChatGPT at least confirmed the 30% number but sadly failed to find a reference.
Anyway, that would mean only about 10-15% of the human race has active reasoning ability (can come up with rational arguments) and only about 20-30% has passive reasoning ability (can verify rational arguments). And that nicely explains some things, including why so many people mistake generative AI and in particular LLMs for something they are very much not and ascribe capabilities to them that they do not have and cannot have.
Thus proving the point by example.
Most people have faith in something. Since they didn't arrive at that faith by reason how would you expect to get them to change their mind using reason? You are really demanding they give priority to your faith in reason over their other faith.
You have a plate of fruit that includes oranges and grapes. Someone says there are more oranges than grapes. You count the grapes and the oranges and demonstrate that there are by count more grapes than oranges. The only way that i
Re: (Score:2)
Thus proving the point by example.
Most people have faith in something. Since they didn't arrive at that faith by reason how would you expect to get them to change their mind using reason? You are really demanding they give priority to your faith in reason over their other faith.
And there I can stop reading, because you do not get it. Your simplistic and, frankly, stupid claim is that relying on rational reasoning is "faith". That is, obviously, a direct lie. Now, it is quite possible you are not smart enough to see that.
Re: (Score:2)
Now, it is quite possible you are not smart enough to see that.
Why don't you (try to) use reason to defend your belief in it, instead of ad hominems? Of course, like every true believer, anyone who questions your belief is a heretic. There is no rational defense, because your belief isn't rational.
Re: (Score:1)
Rationality justifies rationality recursively: Rationality works because, by the rules of Rationality, it works. So, in a sense, yes, everyone has to choose their axioms.
The first missing piece of the picture is The Lens That Sees Its Flaws ( https://www.lesswrong.com/post... [lesswrong.com] ). A system can be self-improving and continue to approach accuracy, even if it starts in an imperfect state.
The second missing piece is that if you pick any sort of axiom like "believe in things that have been shown to work in the pas
Re: (Score:2)
If you want to communicate, you have to play the same game everyone else is playing.
If you don't believe in rationality and reason, why go on a forum and try making reasonable arguments? Wouldn't "squid purple smiley-emoji" be just as convincing?
Yep, I have been wondering about that person. Maybe some mental disability at play here that prevents them from seeing this obvious thing? Or maybe "I can use reason but when you do it, it is just wrong"? There are enough assholes around that do not believe others should have the same rights as they do.
The long and the short of it is, if you deny reason, everything breaks down. First, you lose all technology, because STEM is completely dependent on reason. Second, you lose society. Maybe you can keep a smal
Shall we play a game? (Score:2)
Re: (Score:2)
Probably not.
I didn't know about the Atari chess game until a couple of weeks ago, when an old colleague posted on FB that he was struggling with it on the lowest level.
But I did pretty well against Fritz a long time ago, running on a Compaq Armada 7800.
Uses poorly suited 4o model (Score:1)
But the Atari can't make excuses (Score:2)
...for fucking up like ChatGPT can. Take that Atari!
Score one for Kathe Spracklen (Score:4, Informative)
And her algorithm
Re: (Score:1)
Even more impressive when you realize that it didn't even have one kilobyte of RAM to play with. The Atari 2600 used a "cost optimized" version of the 6502 processor, the 6507, with a total 8KB address space (13 bits) and only had 128 bytes of RAM built in. While some cartridges supplemented this with additional RAM of their own the Video Chess cartridge did not.
Who came up with the idea to let a LLM play chess? (Score:2)
An LLM is one of the worst AIs to play chess. I wouldn't be surprised if you'd do better with some greedy algorithm (which is not a good idea in general).
Not all AI is the same. LLMs are text generators, not chess players.
yeah my mechanic fails at brain surgery too (Score:2)
This is what your boss wants (Score:2)
Not Surprising (Score:2)
I don't understand why this story gets shared at a (Score:2)