Chess Ratings — Move Over Elo
databuff writes "Less than 24 hours ago, Jeff Sonas, the creator of the Chessmetrics rating system, launched a competition to find a chess rating algorithm that performs better than the official Elo rating system. The competition requires entrants to build their rating systems based on the results of more than 65,000 historical chess games. Entrants then test their algorithms by predicting the results of another 7,809 games. Already three teams have managed to create systems that make more accurate predictions than the official Elo approach. It's not a surprise that Elo has been outdone — after all, the system was invented half a century ago, before we could easily crunch large amounts of historical data. However, it is a big surprise that Elo has been bettered so quickly!"
differences are minute (Score:5, Interesting)
So they've got better... (Score:5, Interesting)
Are the better entries as transparent? Elo is a pretty simple way to do this: add or subtract a few points from the rating after a win or a loss, scaled by the difference between the two players' ratings. Would anyone understand the ratings produced by these competitors, beyond "it's a neural net"? Would any human be able to calculate them?
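For comparison, here's how simple the standard Elo update is — a minimal sketch, with K=32 chosen purely for illustration (real federations use different K-factors for different player pools):

```python
def expected_score(r_a, r_b):
    """Expected score for player A against player B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """Return both players' new ratings; score_a is 1 (win), 0.5 (draw), or 0 (loss)."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: a 1500-rated player upsets a 1700-rated player.
print(elo_update(1500, 1700, 1.0))  # roughly (1524.3, 1675.7)
```

Note that the update conserves total rating points: whatever the winner gains, the loser loses. That's the kind of property that's easy to verify by hand for Elo but may be hard to even state for a black-box competitor.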
Also, are the new models' improvements in prediction statistically significant? Or are they just fitting the noise? Both the training dataset and the test dataset seem rather small to me.
Finally, and most importantly, how stable are the ratings? If I'm drunk and lose to a "patzer", do I go down to his level? The fairness of tournaments with small numbers of games depends a lot on rating stability (unless we're assuming a population periodically beset by huge random shifts in ability).
All in all, there are a lot of problems in coming up with a good rating system. Opening the dataset to the world, saying "Have at it!", and judging by a single scorecard based solely on predictive accuracy is nowhere near sufficient.
Re:Apples and oranges? (Score:3, Interesting)
Much as my lower UID implies this comment should be more valuable than your high UID comment.
I used to think of myself as having a particularly high UID... until I realized that mine is actually lower than a majority of the total UIDs. Weirded me out a little. There are UIDs that are farther from the 1,000,000 mark than I am from Taco.
Re:So they've got better... (Score:3, Interesting)
Development of stock trading systems, which also try to rank things based on historical data, has this same persistent problem, and there's been waaay more research into it than into chess rankings. If you train a system on a bunch of historical data, you will discover that the best system is invariably one that essentially does a giant curve-fitting job on that exact data. One thing trading-system developers do to address this is use techniques like walk forward testing [automated-...system.com], where the system gets trained on one set of data but is only evaluated on a second set.
Luckily, this chess rating competition is using that sort of technique: "Competitors train their rating systems using a training dataset of over 65,000 recent results for 8,631 top players. Participants then use their method to predict the outcome of a further 7,809 games." In fact, the current leaderboard reflects results on only 1/10 of the test set. So long as the real ranking is ultimately based on the unseen dataset, not the training one, there's little risk of someone fitting the noise in the training set and still winning.
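The holdout idea is simple enough to sketch. Everything below is invented for illustration — the player names, the quadratic loss, and the always-predict-a-draw baseline are assumptions, not the competition's actual metric or data:

```python
def evaluate(predict, holdout_games):
    """Mean squared error of predicted vs. actual scores on games the model never saw.

    Each game is (white, black, actual_score), with actual_score in {1.0, 0.5, 0.0}.
    """
    errors = [(predict(white, black) - actual) ** 2
              for (white, black, actual) in holdout_games]
    return sum(errors) / len(errors)

# A naive baseline that ignores the players entirely and always predicts a draw.
naive = lambda white, black: 0.5

# Tiny hypothetical holdout set.
holdout = [("A", "B", 1.0), ("B", "C", 0.5), ("A", "C", 1.0)]
print(evaluate(naive, holdout))  # (0.25 + 0.0 + 0.25) / 3 ≈ 0.1667
```

The point is that `evaluate` only ever sees games that played no part in training, so a system that merely memorized the training data gets no credit here.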
Re:how are victory margins relevant to chess? (Score:2, Interesting)
Winning with only a king and a bishop remaining is no "better" than winning with all your pieces remaining. A win is a win. That said, winning a game while having many more pieces remaining than one's opponent may imply that the difference between your skill and your opponent's is greater than if you won with only a king and bishop left. There may be some merit to working that into an algorithm if the goal is to predict the outcome of future matches.
Another data point that might be valuable is simply "how many moves did the game take before checkmate?" Without any other knowledge, the guy who beats me in 10 moves is likely to be a better player than the guy who takes 50 moves to beat me.
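One way such a signal could feed into a rating system is as a weight on the update: quick, decisive wins move ratings more than long grinds. This is purely a sketch — the function, its name, and the 40-move pivot are invented, and no real rating system is claimed to work this way:

```python
def length_weight(moves, pivot=40.0):
    """Hypothetical multiplier on a rating update based on game length.

    Games shorter than `pivot` moves get a weight above 1 (the win says more
    about the skill gap); longer games get a weight below 1.
    """
    return pivot / max(moves, 1)

print(length_weight(10))  # 4.0 — a 10-move crush counts for a lot
print(length_weight(50))  # 0.8 — a 50-move grind counts for a bit less
```

Whether this actually improves predictions is exactly the kind of question the competition's holdout set would settle.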
Re:differences are minute (Score:4, Interesting)
Tal: - "Yes. For example, I will never forget my game with GM Vasiukov at a USSR Championship. We reached a very complicated position where I was intending to sacrifice a knight. The sacrifice was not obvious; there was a large number of possible variations; but when I began to study hard and work through them, I found to my horror that nothing would come of it. Ideas piled up one after another. I would transport a subtle reply by my opponent, which worked in one case, to another situation where it would naturally prove to be quite useless. As a result my head became filled with a completely chaotic pile of all sorts of moves, and the infamous "tree of variations", from which the chess trainers recommend that you cut off the small branches, in this case spread with unbelievable rapidity.
Now I somehow realized that it was not possible to calculate all the variations, and that the knight sacrifice was, by its very nature, purely intuitive. And since it promised an interesting game, I could not refrain from making it."
Journalist: - "And the following day, it was with pleasure that I read in the paper how Mikhail Tal, after carefully thinking over the position for 40 minutes, made an accurately-calculated piece sacrifice".
You will find that lots of chess players have reported making similarly intuitive moves.