

ChatGPT Loses in a Game of Chess Against Magnus Carlsen (time.com)
The world's best human chess player beat ChatGPT, reports Time magazine. Magnus Carlsen posted on X.com earlier this month that "I sometimes get bored while travelling," and shared screenshots of his conversations with ChatGPT after he beat the AI chatbot "without losing a single piece."
ChatGPT lost all its pawns, screenshots the Norwegian grandmaster shared on X on July 10 showed. ChatGPT resigned the match... "That was methodical, clean, and sharp. Well played!" ChatGPT said to him, according to the screenshots Carlsen posted.
Carlsen told the AI bot that he thought it "played really well in the opening," but ultimately "failed to follow it up correctly." He went on to ask ChatGPT for feedback on his performance. "Your play showed several strong traits," ChatGPT told him...
About a week after Carlsen posted that he beat ChatGPT in the online chess match, he lost the Freestyle Chess Grand Slam Tour in Las Vegas to teenage Indian grandmaster Rameshbabu Praggnanandhaa.
Umm... (Score:4, Insightful)
Re: (Score:2)
LLMs aren't chess playing computers. This is a surprise to anyone?
And specialized chess software is "playing"? Like a human it's analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old school programs that were more brute force with limited pruning, maybe more so with ML-based series pruning?
Re: (Score:2)
Yeah, that's not how LLMs work.
Something like Stockfish absolutely does do something like that. Stockfish is not an LLM; its entire theory of operation is vastly different from how LLMs work.
For the most part LLMs play worse chess than children.
Billions of 1 move advice (Score:2)
We cannot compare an AI that is essentially a one-move "advice" tool, which does not look ahead 10 or 20 moves, to a dedicated chess-playing program that has been trained on long chains of chess moves.
Noted: tree pruning, minimax algorithm, etc.
Re:Umm... (Score:4, Insightful)
Re: (Score:2)
The interesting part of this is how it played, why it lost and how that might be fixed.
1. In all these kinds of examples they use models bad at this type of stuff. Here, they used ChatGPT-4o, which is tuned for speed and not for reasoning. So the obvious 'improvement' here would be to use o3 or any of the other competing 'reasoning' models designed for these kinds of things.
2. These examples 'play' chess in the form of a textual representation of a sequence of moves, not 2D representations of the board. Tha
Re: (Score:2)
1. That's an example of the "nuh uh, you didn't try this other model, that one for sure will work" fallacy
This is not how fallacies work. You can't just make something up and call it a fallacy. My point was very specific. The implication of the story is something like "I raced a car on my e-bike and won!" and I point out that the car in question was a tractor. This analogy still holds. I don't have to prove shit to make that analogy and my resulting criticism on the story work.
So "reasoning" doesn't solve the representation issue you've identified.
Except we have actual benchmarks for that and models where that issue is much much less of a problem. You're just talking out of your as
Re: (Score:2)
2. These examples 'play' chess in the form of a textual representation of a sequence of moves, not 2D representations of the board. That means that the AI needs to 'mentally' replay the moves to determine the state of the board. It's like asking it to do 20 consecutive math operations on a matrix of 8x8 and then asking it to come up with an operation that leads to a specific type of state of the matrix.
What you are describing is a good way to play blindfolded chess. Every time someone makes a move, repeat in your mind all previous moves from the beginning until the present. I've taught several people to play blindfolded chess this way (and also learned it myself, from George Koltanowski).
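The "replay every move from the beginning" idea can be sketched in a few lines. This is a simplified illustration under loud assumptions: moves are given in plain coordinate form such as `e2e4`, and castling, en passant, promotion, and legality checks are all ignored. The helper names `starting_board` and `replay` are made up for this example.

```python
def starting_board():
    """Map squares like 'e2' to piece letters (uppercase = White)."""
    back_rank = "rnbqkbnr"
    files = "abcdefgh"
    board = {}
    for f, p in zip(files, back_rank):
        board[f + "1"] = p.upper()
        board[f + "2"] = "P"
        board[f + "7"] = "p"
        board[f + "8"] = p
    return board

def replay(moves):
    """Rebuild the position by replaying every move from the start."""
    board = starting_board()
    for move in moves:
        src, dst = move[:2], move[2:4]
        # Moving onto an occupied square overwrites it, i.e. a capture.
        board[dst] = board.pop(src)
    return board

board = replay(["e2e4", "e7e5", "g1f3", "b8c6"])
print(board["e4"], board["c6"], "g1" in board)
```

The cost grows with the length of the game, since the whole move list is walked again each time the position is needed.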
Re: (Score:2)
The fact that that is hard to do for humans and requires teaching and a bunch of practice should make it incredibly suspect when looking at LLM performance in tasks that require it.
I mean, of course Magnus Carlsen can play like that, but given how exotic and hard it is I'd say that AGI could exist without being able to do it.
Re: (Score:3)
Why would anyone expect a linguistic parlor trick to be good at chess?
Re: (Score:2)
Exactly, it always loses at Rock, Paper, Scissors to me, no fingers. :-)
So ChatGPT is a magnificent cut-and-paste machine? (Score:3)
Carlsen told the AI bot that he thought it "played really well in the opening," but ultimately "failed to follow it up correctly."
So ChatGPT is a magnificent cut and paste machine? For chess, its training probably included discussions of opening moves. For coding, it includes discussions on algorithms. No real reasoning going on here. Just pattern matching to training materials. Which is useful. As a human coder I sometimes look up algorithms in a reference book or online to look at a sample implementation. And a chess enthusiast friend reads books and articles on opening moves.
Re: (Score:2)
There are dedicated chess engines that are a lot stronger than LLM chatbots. That being said, an LLM chatbot should be able to instantiate a chess engine and have it make the actual moves.
Re: (Score:2)
There are dedicated chess engines that are a lot stronger than LLM chatbots. That being said, an LLM chatbot should be able to instantiate a chess engine and have it make the actual moves.
Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old school programs that were more brute force with limited pruning, maybe more so with ML-based series pruning?
Re: So ChatGPT is a magnificent cut-and-paste mach (Score:2)
Yes, but are they really "playing"?
Yes, and prepare to have your mind blown: planes can fly.
Re: (Score:2)
Yes, but are they really "playing"?
Yes, and prepare to have your mind blown: planes can fly.
I can strap wings onto a brick and drop it from 10K ft. It moves through the air, but it's not "flying" in any reasoned, controlled-flight sense.
Re: (Score:2)
Re: (Score:2)
Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old school programs that were more brute force with limited pruning, maybe more so with ML-based series pruning?
You do not understand how LLMs work. It's not playing or analyzing anything at all. It's a glorified random number generator that is literally predicting characters, in this case chess move sequences. It's not judging those sequences by rules, or using lookup tables or probability matrices for movement. It's literally predicting, character by character, what it's been trained on.
The fact that it plays a 'real' game and tells a master it played a good game is meaningless - it would tell a child its ideas abo
Re: (Score:2)
Yes, but are they really "playing"? Like a human they are analyzing a series of moves and countermoves, perhaps a longer series than a human, but is it "instinctively" pruning those paths to explore like a human? Probably not with the old school programs that were more brute force with limited pruning, maybe more so with ML-based series pruning?
You do not understand how LLMs work. It's not playing or analyzing anything at all.
Guess again. From my first post in this thread: "So ChatGPT is a magnificent cut and paste machine? For chess, its training probably included discussions of opening moves. For coding, it includes discussions on algorithms. No real reasoning going on here. Just pattern matching to training materials."
A thread which is titled "So ChatGPT is a magnificent cut-and-paste machine"
Re: (Score:2)
Probably not with the old school programs that were more brute force with limited pruning, maybe more so with ML-based series pruning?
Explain to me with your knowledge of machine learning, how does an ML model 'prune paths'? Or if you don't know, name one that does for chess that isn't a dedicated chess engine.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
If you add a module checking validity, you have taken the first step away from the LLM toward just using a chess engine.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
This is vacuous nonsense. LLMs have the ability to generalize. They can actually do shit. They can for example translate languages, base64 decode, solve simple ciphers, double recipes, apply knowledge learned via ICL. All with varying degrees of success. While their behavior is generally rather rote, blanket dismissal as a glorified random number generator or a next character predictor fails to speak in any useful way to demonstrated capabilities.
No they can't generalize. They can produce random numbers and characters based on training data. If you're talking about 'summarize this paper' it is just producing random characters that fit the training data. If you think it 'thinks' or 'does shit', other than producing random characters that fit training data (or context data if you're loading it that way for MCP or tooling shit) then you have a fundamental misunderstanding of how an LLM is both 'trained' and functions. (we use the word training but what we
Re: (Score:2)
That would be simple. Just provide a chess engine as an MCP or UTCP service. But it would also just be boring, because you use the LLM as an inefficient interface to the chess engine. That said, letting the LLM play is also stupid, because it is not a chess engine and will always lose to good players or chess engines.
Re: (Score:2)
Not copy&paste, but it's a thing working on text.
You wouldn't use string libraries to do matrix multiplication, would you? So why would you use a LANGUAGE model to play chess?
Re: (Score:2)
Yes, well, just like Deep Blue wasn't an "AI", wasn't even a(n) (which is correct... 'a' or 'an') LLM... it was just a database with a guy at the keyboard who entered Kasparov's moves and waited for the server rack to spit out an answer... the guy at the keyboard 'was' adjusting parameters of the computer's engine as the games went on.
A human, while maybe able to memorize all the popular openings and their defenses, and popular endgame strategies, functions on a different set of principles than a computer wi
William Shakespeare vs Stockfish (Score:3, Insightful)
Re: (Score:2)
In other news, William Shakespeare won in a language fluency, comprehension and poetry composition contest against Stockfish.
You're somehow overlooking how impressive an accomplishment that is, for a guy that's been dead 400 years...
Re: (Score:2)
Re: (Score:2)
I did get your point, and do agree with you. But I couldn't resist the joke!
One other thing this illustrates, at least in my opinion - companies like OpenAI and Anthropic have done a remarkable PR job, convincing people that their products are significantly more general-purpose (and significantly more advanced) than they really are. The general public - represented by Magnus Carlsen, in this instance - basically sees them as AGIs, rather than the hallucinating-and-still-simplistic tools they actually are.
Re: (Score:2)
True. I should probably have used someone contemporary.
Perhaps not. Quite a bit of what we attribute to the Bard didn't come from him. Romeo and Juliet for example is a regurgitation of an Italian work from thirty years prior.
chatGPT is actually terrible at chess (Score:1)
Re: (Score:2)
Because nobody does these benchmarks correctly. Anyone who knows LLMs knows how to help them. You would for example prompt them to print the board (probably in some concise form and not as ASCII art) after each turn. Then they don't have to invest much "thought" into reconstructing the board from previous moves and can better plan next moves instead.
The sad thing about these benchmarks is that people didn't say "We optimized it as well as possible and that's the result" but say "We did something and it didn't work
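One concise, non-ASCII-art way to print the board each turn is the piece-placement field of FEN (Forsyth-Edwards Notation). Below is a rough sketch, assuming the position is held as a dict from square names to piece letters; the function name `to_fen_placement` and the sample position are made up for illustration, and only the first FEN field is produced (no side to move, castling rights, or move counters).

```python
def to_fen_placement(board):
    """Serialize a position to the piece-placement field of FEN.

    `board` maps squares like 'e4' to piece letters (uppercase = White).
    Ranks are written from 8 down to 1; runs of empty squares become digits.
    """
    ranks = []
    for rank in "87654321":
        row, empty = "", 0
        for file in "abcdefgh":
            piece = board.get(file + rank)
            if piece is None:
                empty += 1
                continue
            if empty:
                row += str(empty)
                empty = 0
            row += piece
        if empty:
            row += str(empty)
        ranks.append(row)
    return "/".join(ranks)

# Hypothetical endgame position, purely for illustration.
position = {"e1": "K", "e8": "k", "a2": "P", "h7": "p", "d4": "Q"}
fen = to_fen_placement(position)
print(fen)  # 4k3/7p/8/8/3Q4/8/P7/4K3
```

A string like this could be pasted back into the prompt after each move so the board does not have to be reconstructed from the move list.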
This illustrates why AI isn't a doomsday threat (Score:3)
ChatGPT and its siblings *seem* to be amazingly good at everything. But when it's necessary to do anything that requires a specialized skill, you need specialized AI.
For example, LLMs can interact with patients in a hospital, but it takes specialized AI to properly read an X-ray or an MRI. An LLM can tell you what medications are commonly used to treat specific illnesses, but if you blindly follow the advice, you're likely to miss important clues that a doctor wouldn't miss.
The LLM appears to understand how chess is played, but when it gets beyond the opening gambits that are well-documented, it falls apart.
In any kind of business, LLMs can provide output that appears sound, but when you get into the weeds, falls apart. We're going to need people to do the heavy lifting for some time to come. The death of white-collar employment is greatly exaggerated.
Re: (Score:2)
They merely have to make a plug-in for a chess bot which takes over chess game questions. Sure, it will not be reasoning but operating another heuristic bot; but Turing's test is proving to not be such a bad one after all.
Once you've trained the bot to fool a person for everything they ask it's on to the next person until only some experts are not fooled... There are only so many tests most people can dream up and you only need to cover that large problem space. So, if it fools most people, is it alive?
Re: (Score:2)
I never said anything about passing a test. Turing tests have always been laughable.
They merely have to make a plug-in for a chess bot which takes over chess game questions
Exactly my point. AI isn't going to magically know everything about everything. You're going to have to make "plug-ins" for a million different specialized tasks, in this case, playing chess. And that's what will slow down the progress of AI "taking over white collar jobs." There are simply too many specialized jobs that an LLM won't be good at, on its own. And all those specialized plug-ins are going to be very, very expens
Re: (Score:2)
But did you consider that such a high-end LLM might be able to write the code for a chess engine that outdoes itself at playing chess? If I have to beat a chess master, I wouldn't play myself either. I think I can code a chess engine that plays much better than me - not a huge challenge given my chess skills, but as I am not that good at chess and know some of the algorithms needed for chess engines, I would just use what would help me, as long as I am allowed. And your doomsday AI wouldn't care if you th
Re: (Score:2)
The idea that AI could write code that creates a "really good" chess engine misunderstands how AI works. AI doesn't create anything, it just takes all the existing code it's seen, and synthesizes / summarizes it to form its output. It's not *actually* writing code, it's just regurgitating code that's a plausible answer to the provided prompt.
So, if your LLM has been trained on source code for an excellent chess engine, it might be able to produce portions of code that fit. But that's very different from bu
Super Excited (Score:3)
Re: (Score:2)
Posting this for the lulz.
https://www.businessinsider.co... [businessinsider.com]
asking ChatGPT for critique (Score:2)
It's a bit like asking the dumbest kid in class for his notes.
Magnus Carlsen failed at prompting (Score:2)
ChatGPT may be smarter than most Slashdot posters (based on the responses here), and it probably also knows chess better than most (it's trained on millions of games), but unfortunately it's not geared to show that. I'm too lazy to dig up the results from the person who researched this, but apparently there's a particular version of ChatGPT 3.5 which is decent at chess out of the box, and others require specific prompting to get on the right track.
ChatGPT would likely have still lost to Carlsen even with
why is this a story? (Score:2)