Chess Ratings — Move Over Elo
databuff writes "Less than 24 hours ago, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo rating system. The competition requires entrants to build their rating systems based on the results of more than 65,000 historical chess games. Entrants then test their algorithms by predicting the results of another 7,809 games. Already three teams have managed to create systems that make more accurate predictions than the official Elo approach. It's not a surprise that Elo has been outdone — after all, the system was invented half a century ago before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been bettered so quickly!"
Indeed (Score:5, Funny)
However, it is a big surprise that Elo has been bettered done so quickly!
Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.
Re: (Score:1)
Does Timothy even glance at the stories he approves or is it pure pin the tail on the donkey?
Re:Indeed (Score:5, Funny)
Timothy is the bettered done editor of Slashdot!
Re: (Score:1, Offtopic)
# He's a bettered done kid
[bettered done baby]
Battered dome kid
[battered dome baby]
ooh ooooh ooh ooooh oo oo ooh ohh a hooway hooway hoowah hoowah/#
Fuck me, I'd forgotten what a pile of shite Deacon Park South Texas were. Thanks a bastarding bunch for reminding me, you heiferflap.
Surreal Gone Kid (Score:2)
Fuck me, I'd forgotten what a pile of shite Deacon Park South Texas were. Thanks a bastarding bunch for reminding me, you heiferflap.
WTF? Is this what happens when some late-1980s Scottish bands get mixed up in a transporter with a popular animation series?
:-/
If something that tenuous links to "Real Gone Kid" in your head, you must have some major trauma
Re: (Score:1, Offtopic)
Hopefully next time he will bettered posted checked done more carefully.
Re: (Score:3, Funny)
Indubitably. It filled with hope the one that no one thought Elo would have been bettered done so quickly.
Re:Indeed (Score:5, Funny)
Re: (Score:3, Insightful)
The first time I heard Bev Bevan had joined Sabbath I kind of went "WTF?". But they're all Brummies, along with a lot of heavy metal bands around that time. Priest, Magnum ... they probably all played in pubs together when they were 15.
Similarly you couldn't be a serious goth in the 80s unless you were from Leeds, or a flare-wearing floppy-mopped tossbag in the 90s if you weren't a Manc.
Mx-doctor (Score:3, Funny)
Absolutely. I can almost guarantee no one thought that Elo would have been bettered done so quickly.
Is it because elo would have been bettered done so quickly that you came to me?
What we need now.... (Score:2)
What we need now is a chess rating system rating system. Then chess rating systems can compete with each other and be rated as to how well they rate chess.
First Chess then the BCS! (Score:2)
Re: (Score:2)
been bettered done THAT quickly??? (Score:5, Funny)
umm (Score:5, Informative)
Not really. Jeff Sagarin has had two systems of rating sports teams for a while now. One, ELO_CHESS, is based purely on win-loss, while the other, PURE POINTS, takes into account margin of victory. According to him, the latter is better at predicting future results. From his analysis:
how are victory margins relevant to chess? (Score:5, Insightful)
Re: (Score:2)
Re:how are victory margins relevant to chess? (Score:5, Insightful)
There are definite merits to a sacrificial strategy- it's all about board control. As long as there's more than one or two legal moves available to your opponent, you can't really predict where he'll send his pieces. A queen in the middle of the board can cover a lot of distance and do some impressive maneuvers, but any given piece only occupies one spot. Control where your opponent moves, control the game. Not to mention that fewer pieces on the board gives you more options for where to move with your remaining pieces, and by allowing your pieces to be taken, you have a measure of control over where the free space on the board is.
Indeed, given the rules of the game, I would say a strategy that goes to great lengths to preserve as many of one's own pieces as possible is flawed...
Re: (Score:1)
Re:how are victory margins relevant to chess? (Score:4, Insightful)
Sorry, but... You can't checkmate with only a king and a bishop.
The hell you can't. It turns out, your opponent has pieces too! Have you ever even played chess?
Re: (Score:2)
When chess nerds talk about end game strategies it is implied that "a king and a __ " ending is one where the other player has just a king.
Re:how are victory margins relevant to chess? (Score:4, Informative)
If you ever find yourself in a game where you can sacrifice all your pieces to get to that position, DO IT!
Re: (Score:2)
Unless 1.Bh4 b3
I get your point, but you need rid of that queen.
Re: (Score:2)
Re: (Score:2)
Oh, I did solve it, but I thought black was playing the wrong move! I realised what I had wrong later last night as I was mixing up yet another lemsip.
In my defense, I'm loaded with the cold at the moment and can't think straight. Someone posted an old school picture on facebook yesterday, and it took me about 3 minutes to figure out which one was me!
Re: (Score:3, Insightful)
If some metric X is a statistically reliable method of predicting future success, then X can be defined as a margin of victory. Whether X is a function of the "values" of remaining pieces, or their positions on the board, or the number of moves, or whatever, is immaterial.
Re: (Score:2)
In chess, you win or lose. If players started "grinding" just to raise their ratings - ick.
Re: (Score:2, Interesting)
Winning with only a king and a bishop remaining is no "better" than winning with all your pieces remaining. A win is a win. That said, winning a game while having many more pieces remaining than one's opponent may imply that the difference between your skill and your opponent's is greater than if you won with only a king and bishop left. There may be some merit t
Re: (Score:2)
So...what you're saying is it takes more skill to win the game with more of your pieces. Which means you'd be a better player than someone who needs to get rid of those pieces first. Which means the margin of victory would be a good predictor of future outcomes? Am I right?
Or, to put it another way. If a model is derived that accurately represents previous behaviour, and accurately predicts future behaviour, then the model is reasonably accurate. Your not liking it doesn't mean it's wrong.
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
But this has nothing to do with applying the Elo method to its original setting of chess, where the outcome of the game is only "win/draw/loss" and there is no margin of victory.
You can easily keep track of a "margin" by assigning point values for the pieces that have been taken.
http://en.wikipedia.org/wiki/Chess_piece_relative_value [wikipedia.org]
That metric loses some relevance since someone behind on points can easily have a strategic victory,
but there may still be some information of value gained from crunching the numbers.
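A minimal sketch of the material "margin" idea above, in Python. The point values are the conventional ones (pawn 1, knight 3, bishop 3, rook 5, queen 9); the function names and the piece-string input format are made up for illustration, and nothing here says this margin actually predicts results.

```python
# Conventional relative piece values; the king is not given a value.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(pieces: str) -> int:
    """Sum the point values of one side's remaining pieces.

    `pieces` is a hypothetical input format: one letter per piece still on
    the board, e.g. "KQRPP". Kings (and unknown letters) count as 0.
    """
    return sum(PIECE_VALUES.get(p, 0) for p in pieces.upper())


def material_margin(winner_pieces: str, loser_pieces: str) -> int:
    """Material difference at the end of the game, from the winner's side."""
    return material(winner_pieces) - material(loser_pieces)


print(material_margin("KB", "K"))        # 3
print(material_margin("KQRPP", "KRP"))   # 10
```

As the comment says, a player who is behind on points can still win, so the margin can come out negative for the winner.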
Submission error (Score:3, Informative)
Already three teams [kaggle.com] have managed to create systems that make more accurate predictions than the official Elo approach.
1 EdR* 0.729125
2 whiteknight* 0.731656
3 Elo Benchmark* 0.738107 <-- The "official Elo approach"
Maybe we're counting from zero and they forgot to put it on the leaderboard?
More like commenter error (Score:3, Informative)
That number is "Root Mean Square Error", so lower is better
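For anyone unsure why lower is better, here is a rough Python sketch of a root mean square error over predicted game scores. It assumes each game is scored 1 for a White win, 0.5 for a draw and 0 for a loss, and that the error is taken per game; the contest's exact aggregation may differ, and the function name is made up.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Predicting 0.5 for two decisive games is off by 0.5 each time.
print(rmse([0.5, 0.5], [1.0, 0.0]))  # 0.5
# Perfect predictions give zero error.
print(rmse([1.0, 0.0], [1.0, 0.0]))  # 0.0
```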
Re:More like commenter error (Score:4, Insightful)
Re:Submission error (Score:4, Informative)
Re: (Score:2)
Re: (Score:3, Informative)
1 Elo BenchmarkOpen 0.723834
2 EdROpen 0.729125
3 whiteknightOpen 0.731656
so at this moment elo is back on top.
Could it be that people have been done some quickly jumpening to conclusions?
I guess george [nanc.com] is working at /. now.
When I see the word Elo (Score:1, Offtopic)
I can't think of anything other than 70's cheese and largest white afro up until the release of Bobobo-bo Bo-bobo.
Less than 24 hours ago (Score:5, Funny)
Re: (Score:2)
so what you are saying is that /. editing algorithm has been bettered done quickly?
In other news... (Score:3, Funny)
We all know that's not true though. They totally would have done it.
Re: (Score:3, Funny)
differences are minute (Score:5, Interesting)
Re: (Score:2)
Perhaps there is enough inherent randomness in Chess that even simple predictive models can extract most of the systematics so that what remains after Elo is mostly noise?
No. Chess has no random elements to it. You play against an opponent, with a very strict set of rules.
Now sometimes the rules differ from game to game (such as timing, whether they use something like 3/5 Fischer or 20 moves an hour sort of thing), which can have drastic changes to the outcome. For example if you do something like 20 moves an hour, sometimes Chess players will be running short on time, and they'll deliberately try to speed up their 18th, 19th and 20th moves to get that extra hour of time.
The o
Re: (Score:3, Insightful)
I don't think you understand what the discussion in this post is about. The game of chess has no element of randomness -- but the players do, and it's the players we are trying to model. Just because, on average, player A is better than player B, doesn't mean that player A will win every game. The fact is that the same player will play at different levels of ability on different days, and that is the ran
Re: (Score:2, Informative)
Re: (Score:2)
You clearly don't understand the point I'm trying to make then.
A mistake is an element in my performance and can happen at any time - and it will affect my ranking.
What Elo does is put me up against everyone else who is JUST as affected by these events as I am - There is nothing to say it's an unfair battle. I do not have fewer pieces, I do not have a weaker position to start. Now there will be stronger players, and they will have higher rankings, weaker players will have lower rankings.
What the GP was saying
Re: (Score:2)
Re: (Score:2)
Why is it considered a random phenomenon when a player makes decisions in chess?
When it comes to something like Poker, you don't know that you will ever get a good hand, you are stuck trying to play against your opponent. You can go through the entire game without getting a solid winning hand compared to your opponent, and if your opponent pushes you at every turn - you've already lost and no matter how "skillful" you are, your bad luck would cause a loss even if your opponent plays stupidly calling bluffs
Re: (Score:2)
The poker analogy was talking about a *single hand.* When you shuffle a deck of cards, they will come out in an order which is precisely determined by the actions taken to shuffle them, yet we treat the order of the cards after shuffl
Re: (Score:2)
and whether you had too much coffee that morning, failed to see that move 10 steps ahead, etc. In high level chess, it seems that these kind of things have enormous effects on the outcome of the game and are not things that can be easily modeled except as random effects. Thus there is definitely a random element in the outcome of the game; Kasparov vs. Deep Blue was a mix of wins and losses; definitely not a deterministic outcome.
Re:differences are minute (Score:4, Interesting)
Tal: - "Yes. For example, I will never forget my game with GM Vasiukov on a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.
Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."
Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".
You will find that lots of chess players have reported making similarly intuitive moves.
Re: (Score:2)
That's not random though, and that kind of intuition is what makes the rankings.
What I mean is - if you were to take something like WoW, put 2 identical players against each other, have them perform the exact same moves at the exact same time - one will likely lose before the other. Because there is too much random generation in the game, like crit chances and things like that.
Chess does not have any of those elements. Yes, you may have tons of moves available to you with far reaching implications but ultim
Re: (Score:2)
Re: (Score:2)
Which is part of your strategy - which would reflect on how well you play chess, no? If you are actively trying to seem more random - and it works, that will make your chess rating go up.
Re: (Score:2)
Re: (Score:2)
Welcome to the world of probability theory. In particular, get started with Bayes [wikipedia.org] and work your way from there.
Small but maybe significant differences? (Score:2)
The differences are indeed quite small, but it seems obvious that you should be able to do better than ELO by splitting it into two parts:
Games played as White and games played as Black.
In fact, this seems so obvious that I suspect there's something I have overlooked! :-)
As the contest site mentions, there's a very significant advantage to White, enough so that in their training data set White has 30+% win vs 20+% for Black.
I suggest that taking the normal ELO-predicted outcome and then biasing it according
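One possible reading of the suggestion above, sketched in Python: treat White's advantage as a fixed rating bonus added before computing the usual Elo expected score. The expected-score formula is the standard logistic one; the `white_bonus` default of 35 points is only a placeholder that would have to be fitted to the training games, and the function names are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def expected_white_score(white_rating: float, black_rating: float,
                         white_bonus: float = 35.0) -> float:
    """Expected score for White, with a hypothetical first-move bonus.

    `white_bonus` is an assumed parameter, not a value from the contest.
    """
    return expected_score(white_rating + white_bonus, black_rating)


# Equal ratings: plain Elo says 0.5, the biased version favours White.
print(expected_score(1500, 1500))        # 0.5
print(expected_white_score(1500, 1500))  # a bit above 0.5
```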
Re: (Score:2)
Pointless. Every official ELO rating is (and any rating system that replaces will be) calculated off 50% games as White and 50% games as Black because officially rated games are played in tournaments and matches in which each player is assigned an equal number of games as each. Since every ELO rating has the same White/Black ratio, there is no "bias" from one rating to the next to be corrected for.
Re: (Score:2)
Not pointless at all!
Tournament results are what ELO really tries to predict, and there you are absolutely correct, i.e. everyone plays both White and Black equally often.
However, the current challenge is NOT to predict how well each player is going to do in the aggregate, but to minimize the error for each individual game. THIS IS CRUCIAL!
I.e. the error term is the RMS of the difference between the predicted and actual result for each individual game, not the sum of the normal pair of games against each com
Well, everyone knows (Score:2)
So they've got better... (Score:5, Interesting)
Are the better entries as transparent? ELO's a pretty simple way to do this - add or subtract a few points from the rating based on a win or a loss based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
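For reference, the basic Elo update described above fits in a few lines of Python. This is a sketch of the textbook formula only: the K-factor of 32 is an assumed value (federations use different K-factors and extra rules), and the function names are made up.

```python
def elo_expected(rating: float, opponent_rating: float) -> float:
    """Expected score (between 0 and 1) from the rating difference."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def elo_update(rating: float, opponent_rating: float, score: float,
               k: float = 32.0) -> float:
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss).

    k=32 is an assumption for illustration, not an official setting.
    """
    return rating + k * (score - elo_expected(rating, opponent_rating))


# Two equally rated players: the winner gains what the loser drops.
print(elo_update(1500, 1500, 1.0))  # 1516.0
print(elo_update(1500, 1500, 0.0))  # 1484.0
```

An upset moves the ratings further than an expected result, because the gap between the actual and the expected score is larger.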
Also, are the new models' improvements in prediction statistically relevant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.
Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? Fairness of tournaments having small numbers of games has a lot to do with rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).
All-in-all, there's a lot of problems coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and looking at a single scorecard based solely on predictability is nowhere near sufficient.
Re: (Score:3, Interesting)
Development of stock trading systems, which also try to rank things based on historical data, has this persistent problem, and there's been waaay more research into it there than into chess rankings. If you train them on a bunch of historical data, you will discover the best system is invariably one that essentially does a giant curve fitting job on that exact data. One thing trading system developers do to address this is use techniques like walk forward testing [automated-...system.com], where the system gets trained on one set of data bu
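A rough Python sketch of one common walk-forward scheme, assuming a rolling window: fit on one stretch of chronologically ordered data, evaluate on the stretch right after it, then slide forward. The function name and the window sizes are made up for illustration, and the linked article may describe a different variant.

```python
def walk_forward_splits(n_samples: int, train_size: int, test_size: int):
    """Yield (train_indices, test_indices) over time-ordered samples.

    Each test window starts right after its training window, so the model
    is never scored on data from before or inside its training period.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size,
                          start + train_size + test_size))
        yield train, test
        start += test_size  # slide forward by one test window


for train, test in walk_forward_splits(10, train_size=4, test_size=2):
    print(train, test)
# [0, 1, 2, 3] [4, 5]
# [2, 3, 4, 5] [6, 7]
# [4, 5, 6, 7] [8, 9]
```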
Re: (Score:2)
In general, it's good to parametrize a range of plausible models, test the assumptions of the model, and conservati
Re: (Score:2)
Are the better entries as transparent? ELO's a pretty simple way to do this - add or subtract a few points from the rating based on a win or a loss based on the relative difference of the ratings. Would anyone understand (other than "It's a neural net") the ratings produced by these competitors? Would anything human be able to calculate them?
Take a look at the formulae used - Elo, particularly for tournament play, is already complicated enough that it's beyond the reach of a "back-of-the-napkin" calculation to work out your rating change. That's seen as one of the big advantages of the English Chess Federation's rating system; it's very simple, so you can just work out the change yourself.
Apples and oranges? (Score:2)
Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?
Re: (Score:3, Informative)
Since the Elo system is not designed to predict future performance (it's designed to capture current relative rankings), then is it really surprising that programs designed to predict future performance are better at it?
And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?
Re: (Score:3, Funny)
And if my current relative rank is higher than yours, doesn't that imply that if we play each other I should win? If not, what purpose does the rank serve?
Historical achievement, the glory of the grind. Much as my lower UID implies this comment should be more valuable than your high UID comment.
Re: (Score:3, Interesting)
Much as my lower UID implies this comment should be more valuable than your high UID comment.
I used to think of myself as having a particularly high UID... until I realized that mine is actually lower than a majority of the total UIDs. Weirded me out a little. There are UIDs that are farther from the 1,000,000 mark than I am from Taco.
Re: (Score:2)
Re: (Score:3, Informative)
No we don't. This is not the crawler you're looking for.
OG.
Re: (Score:3, Funny)
Sometimes I suspect low UID users have a crawler that looks for people referencing low UIDs...
I had no idea COBOL was so powerful.
Re: (Score:2)
That depends on the relative difference between the ranks. A narrow difference implies you might win, a wider difference implies you will win - and between the two lies a spectru
Re: (Score:2)
You're picking nits. His point still remains: The ranking system *should* provide a prediction of future performance, as it's supposed to be an indicator of relative skill. Of course, if two ranks are close together, that means your error bars will be wider, but that doesn't change the basic fact that a higher rank should fundamentally translate to a higher likelihood of winning.
Re: (Score:2)
My main point though was that Elo is actually predictive. Not that it's perfect.
Re: (Score:2)
Probably too late but ...
Never too late for a good discussion! Who cares if we have no audience?
Elo is predictive in terms of tournament standings (as long as we're talking established ratings. Any kind of provisional rating and ... well it's better than nothing, but Elo felt that provisional ratings were only accurate to within 20 or so points). My point though was that when you're talking specifically head to head they are much less so.
Styles make fights in boxing and the same seems to be largely true in chess
I will happily concede that Elo is imperfect and that there are factors such as style that it won't adequately account for. Given that, I believe that if you take a pool of non-provisionally Elo ranked players and randomly pit them against each other, picking the winner based on who has the higher rank will perform better than a coin toss with statistical significance. On this basis I submit that Elo is predictive, albeit with flaws.
Re: (Score:1)
Re: (Score:2)
That's a damn good question, and one I don't know the answer to.
For a $50 voucher? (Score:2)
I don't think so. The time I'd spend on this project is worth a bit more than $50...
Re: (Score:1)
I can see why it's such a surprise... (Score:2)
Elo in non-chess games (Score:5, Insightful)
Ah man, no matter how inadequate the Elo system may be for chess, it's much worse seeing it applied to other games where it doesn't belong, which happens regrettably often. The trouble is that the Elo system depends on the premise that nothing affects the outcome of a game other than the skill of each player (and who gets the white pieces).
In chess, that assumption is a pretty good approximation to reality, since every tournament game is run the same way. But many games do have variations in rules or format across different events, such as different maps or races in a real-time strategy game, or different card pools in Magic: The Gathering. Then Elo ratings are biased by how often a player has the chance to play to his strong areas. Players in turn are compelled to game the system: "I should avoid this event because they're using Format X and my rating will stay stronger if I stick to Format Y." The Elo system is meant precisely to obviate that kind of gamesmanship: chess players should need to think only about the strengths of their opponents, which (in principle) will be weighted fairly when calculating rating adjustments. But if there are other competitive factors, which is true for most any popular game invented in the last 30 years, Elo ratings become that much less meaningful.
Re: (Score:2)
Yes, linear ranking systems fail hard at anything as complex as, let alone more complex than, rock-paper-scissors.
Re: (Score:2)
Elo is a single number. The real number line is linear.
Re: (Score:2)
The Elo system does not depend on the premise "nothing affects the outcome of a game other than the skill of each player".
Sure it is modelled according to that, but in practise it is very untrue even for chess. There are a lot of examples where player A has beaten player B N out of M times although, according to the rating difference, a very different outcome should have happened.
The chess events are not similar, I have played a few and they do vary considerably (number of games per day, travel, lighting, temperature,
Allow me to clarify (Score:4, Funny)
He sounds like Lady Macbeth on crack.
Cheese! (Score:1)
Microsoft's TrueSkill beat Elo before this comp (Score:1)
I believe the algorithm used by Microsoft to match players for X-Box games was already beating Elo before this competition. They have a description of their algorithm at http://research.microsoft.com/en-us/projects/trueskill/ [microsoft.com]
Re: (Score:3, Informative)
Here’s the problem with Battle.net 2.0: 2002’s Warcraft III: Reign of Chaos is one of the most underrated video games ever created. And that’s before you learn its online apparatus is the foundation for modern matchmaking, where Blizzard Entertainment should get royalties every time you brag about your X-Box Live Trueskill rating. (Then again, I shouldn’t be giving Blizzard ideas right now.)
Here’s how Warcraft III matchmaking worked: Everyone starts at level one. The maximum level is fifty. You play players within six levels of your own. Win five games, gain a level. Lose five games, lose a level. The penalty for losing is reduced during levels one to nine. Thus, players who win half their games will become level ten.
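The ladder rules in that paragraph can be sketched in Python roughly like this. The quote does not give the details, so the sketch assumes "win five games" means five net wins since the last level change, and that the reduced penalty below level ten is half a game per loss; both are guesses, and all names are made up.

```python
MAX_LEVEL = 50       # from the quote
MATCH_RANGE = 6      # "within six levels of your own"
GAMES_PER_LEVEL = 5  # "win five games, gain a level"


def can_match(level_a: int, level_b: int) -> bool:
    """Players may be paired if their levels are at most six apart."""
    return abs(level_a - level_b) <= MATCH_RANGE


def apply_result(level: int, progress: float, won: bool,
                 low_level_loss: float = 0.5):
    """Return (level, progress) after one game.

    `progress` is net wins since the last level change (an assumption).
    `low_level_loss` is a guessed reduced penalty for levels 1 to 9.
    """
    if won:
        progress += 1.0
    else:
        progress -= low_level_loss if level < 10 else 1.0
    if progress >= GAMES_PER_LEVEL and level < MAX_LEVEL:
        return level + 1, 0.0
    if progress <= -GAMES_PER_LEVEL and level > 1:
        return level - 1, 0.0
    # At level 1 or 50, keep progress within bounds instead of overflowing.
    progress = max(-GAMES_PER_LEVEL, min(GAMES_PER_LEVEL, progress))
    return level, progress


# A player who alternates wins and losses drifts up, then stalls.
level, progress = 1, 0.0
for game in range(1000):
    level, progress = apply_result(level, progress, won=(game % 2 == 0))
print(level)  # 10
```

Under these assumptions the simulation matches the quote's claim that a player winning half their games ends up at level ten.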
It was simple and transparent. That was the hook, and people choked on it. It turned Warcraft III ladder play into what ICCUP serves for Starcraft players, a stomping ground so competitive that climbing the food chain gave you a shot at the guys who played for a living. That’s what a good online gaming system does.
The quote comes from Battle.net 2.0: The Antithesis of Consumer Confidence [the-ghetto.org]. I would encourage you to read the entire thing, but for reasons completely unrelated to this thread.
Re: (Score:2)
Re: (Score:3, Funny)
Pleased to say I jumped straight into the money at #7 with my first submission :-)
Where AM I going to spend a whole 50 Euros ? Maybe I'll donate it to Greece, seems like they need it.
Re: (Score:2)
Damnit ... $50 USD ... that's only 38.50 Euros.
Elo Anecdote (Score:5, Informative)
Not relevant specifically to this story, but I always laugh at the story of how a prisoner manipulated the Elo system via closed pool ratings inflation [wikipedia.org].
Short summary: said prisoner only played against other prisoners, who he'd trained. Due to careful scheduling of the games, he rose from his true strength (probably sub-master) to being the second-highest rated player in the U.S. in 1996.
System Feedback (Score:2)
The problem with this kind of modeling is that many "good fitting" algorithms would, if implemented, change the system itself. There's more to competition chess than just the rules on how to move pieces. For example, while a game in isolation would almost always be played to win, there are many times that because of information from ratings (or due to the method of the tournament) you would start the game being equally happy to draw, which will affect how you play.
Now, even if the difference in the numb
Complexity vs performance (Score:2)
Re: (Score:2)
bettered done --> bested
Re: (Score:2, Funny)
Re: (Score:2)
I was JUST ranting about how we shouldn't care about trivial things like spacing after periods, but this is just a sad excuse for journalism.
"beaten"
"surpassed"
"out done"
"blown out of the water"
And while it seems a little old-fashioned, "bested" would work.
Come on people, read it twice before submitting for millions to read.
Git 'er bettered done'ed!