Elo Chess Rating System Topped By Proposed Replacements
databuff writes "About six weeks ago, Slashdot reported a competition to find a chess rating algorithm that performed better than the official Elo rating system. The competition has just reached the halfway mark and the best entries have outperformed Elo by over 8 per cent. The leader is a Portuguese physicist, followed by an Israeli mathematician and then a pair of American computer scientists."
Sweet (Score:2, Funny)
Re: (Score:2)
The leader is after all a portRugese person from portRugal.....
Re:what now? (Score:5, Interesting)
Re: (Score:3, Funny)
Long live Physicists and they physicisteries!
Re:what now? (Score:5, Funny)
Re: (Score:2)
How many people in a playground wear bifocals?! (Teachers don't count.)
Re: (Score:3, Informative)
Re:what now? (Score:4, Insightful)
Re: (Score:3, Informative)
"But do they have sharks on which to mount them?"
We must avoid them teaming up with biologists at all costs!
Re: (Score:1)
Re: (Score:2)
Written like an engineer. To the mathematician the magnitude does not mean a thing, the ordering does.
Re: (Score:3, Informative)
Yeah, and his name is Él(Lowercase O-double acute), not Elo, but I understand that "Hungarian umlauts" cause significant cognitive stress :)
Even for Slashdot it seems...
Re: (Score:2, Informative)
Re: (Score:2)
It is funny that Slashdot swallows hungarian characters: "Él" is certainly not what you wanted to write :)
Re: (Score:2)
Wake me up when a biologist puts in a credible challenge.
Re:Interesting (Score:5, Informative)
Re: (Score:2)
This is a chess rating algorithm. The goal is to predict, given a matchup between two players with known histories, how they will likely fare in a game or series of games against each other. Elo is the standard rating system and has been for some time. These algorithms are improvements on that. So they predict better who will win. They have nothing to do with playing actual chess. So the Turk is irrelevant to this discussion (aside from the not minor issue that the operator has been dead for some time.)
You don't understand, the winning system is using a midget to guess the outcomes.
Re: (Score:2)
So the Turk is irrelevant to this discussion (aside from the not minor issue that the operator has been dead for some time.)
So now we'll never know the answer to the Istanbul - Constantinople naming question!
Errata (Score:1)
"Portrugese"? :-)
Did you mean "Portuguese"?
Re:Errata (Score:5, Funny)
Re: (Score:1)
It's Portuguese ;-)
http://translate.google.com/translate_t?hl=&ie=UTF-8&text=errata&sl=pt&tl=en [google.com]
Re: (Score:2, Funny)
No, Portrugal. Between Spairn and the Atlantirc.
Re: (Score:2)
Can't be so (Score:5, Funny)
A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sunk in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.
Re: (Score:2)
A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sunk in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.
I wonder how many people on Slashdot are old enough to get this... at least 4, apparently!
Re:Can't be so (Score:4, Funny)
I don't get it.
REVEAL YOUR SECRETS!
Wow, Slashdot won't allow me to post with that ratio of non-caps to caps. So I need to write all of this to correct the ratio. The error says "Filter error: Don't use so many caps. It's like YELLING.".
Dear robotic automated moderating overlord,
I know it's like yelling, that's the effect I was going for. Obviously your algorithm is shit, because you don't seem to understand context... or love.
Sincerely,
definate
Re: (Score:1)
Anyway, I suspect the answer is...
http://en.wikipedia.org/wiki/Electric_Light_Orchestra [wikipedia.org]
Re: (Score:2, Informative)
His post is chock full o' snippets of ELO [wikipedia.org] songs.
Electric Light Orchestra reference....no? (Score:1)
Re: (Score:2)
I found it funny without realising it's a reference. In fact, finding out it's a reference to something makes it a little less funny.. though not as bad as when I make a joke and people are like "what's that from?".
Re: (Score:2)
To be fair it's not like he took it from one source and changed a couple words around. Stringing together song and album titles:
A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!". But the news slowly sunk in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.
There
Re: (Score:2)
Ah, I guess I'm just too young and have never been exposed to ELO's music. That's definitely a worthy set of references, though the post would still be funny even if the band ELO never existed.
Re: (Score:2, Funny)
Ole! (Score:2)
Re: (Score:1)
With milk?
Ah confusion... such a terrible shame... (Score:2)
Confusion. You don't know what you're sayin'.
You've lost your love and you just can't carry on.
You know there's no-one for you to lean on.
To le-ee-an on.
-- ELO
Re: (Score:2)
Is that you, Bruce?
No, my name is actually "Grroosss".
Re: (Score:1)
Not surprising at all (Score:5, Insightful)
This is entirely unsurprising. The Elo system was, in a sense, designed to be easily calculable in a time before things like computers or databases or data mining were especially common (after all, it was adopted by the US Chess Federation in 1960!), and it hasn't been revised much, if at all, since then. Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it by a lot - this isn't like the Netflix prize, where a bunch of teams were competing to improve something that had been in active development up until that very year.
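To make "easily calculable" concrete, here is a minimal sketch of the usual textbook form of the Elo calculation. The function names and the K-factor of 32 are assumptions for illustration; federations use their own K values and rounding rules.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0) -> float:
    """Return A's new rating after one game. k=32 is an assumed K-factor."""
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


if __name__ == "__main__":
    # A 1600 player facing a 1400 player, and what a loss would cost them.
    print(expected_score(1600, 1400))
    print(update(1600, 1400, 0.0))
```

Only the two current ratings and the game result go in, which is why it could be worked out by hand or from a lookup table.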
Re: (Score:2)
Re:More people interested (Score:2)
But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?
Re: (Score:2)
But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?
Probably because people here have more interest in algorithms than in chess itself?
Re: (Score:2)
Re:Chess Championship (Score:2)
My point exactly... *The* World Chess Championship - the classical time control match with Topalov.
Our ever-friendly Wiki link:
https://secure.wikimedia.org/wikipedia/en/wiki/World_Chess_Championship_2010 [wikimedia.org]
"Arctic Securities Chess Stars" is, to quote Chessbase,
"This rapid chess tournament is taking place in Kristiansund from Saturday, August 28th to Monday, August 30th 2010. It is a double round robin with four players: Magnus Carlsen, Viswanathan Anand, Judit Polgar and Jon Ludvig Hammer. On Monday there fo
Re: (Score:2)
Re: (Score:2)
There have been attempts to improve Elo over the years as well. Glicko and TrueSkill (from Microsoft Research, used on the Xbox) are the most commonly mentioned. Also, a lot of game sites have developed variants on decayed history Elo by trial and error. The one at KGS, for instance, is pretty impressive. There's also lesser-known academic research, such as Remi Coulom's paper on Whole History Rating.
Deciding which is the better chess player from what they've won in the past is also a far simpler problem than
Re: (Score:2)
Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it
You mean data miners can predict the database they built their algorithms on? Wow!
A true test would be to accurately predict results in the next ten years.
Re: (Score:2)
You mean data miners can predict the database they built their algorithms on?
As far as I can tell, the principle of the test works similarly to the following: Take a database with multiple years of results, train the algorithm on all but the final year, and predict the final year. Someone who cares enough about chess skill rankings to have read the article carefully could fill in more details.
Re: (Score:2)
I'm participating in the contest. The training set is 100 months; the test set is months 101-105.
Re: (Score:2)
There's not much choice but to start there. I'm sure they are interested in seeing how it does over the next 10 years of results, but unless you have a technology I don't know about, that'll take 10 years.
Re: (Score:2)
That's not how prediction competitions work, obviously.
Everyone is given a "training" dataset, which contains the results. The contestants mine this dataset to determine their algorithm, which is then applied to a "test" dataset that has hidden results (i.e. who won the game). The contestants are judged by how well they do on the test set.
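A rough sketch of that train/test setup, using the month split mentioned above (train on months 1-100, test on 101-105). Everything else is an assumption for illustration: the `(month, white, black, score)` tuple layout, plain Elo as the model being scored, and mean squared error per game as the metric. The contest's actual scoring rule is not described in this thread.

```python
from collections import defaultdict


def expected(ra, rb):
    """Elo expected score for the first player."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


def train_elo(games, k=32.0, start=1500.0):
    """Fit ratings by replaying the training games in month order."""
    ratings = defaultdict(lambda: start)
    for _, white, black, score in sorted(games, key=lambda g: g[0]):
        e = expected(ratings[white], ratings[black])
        ratings[white] += k * (score - e)
        ratings[black] -= k * (score - e)
    return ratings


def holdout_error(games, last_train_month=100):
    """Train on months <= last_train_month, score predictions on the rest."""
    train = [g for g in games if g[0] <= last_train_month]
    test = [g for g in games if g[0] > last_train_month]
    if not test:
        raise ValueError("no games in the test months")
    ratings = train_elo(train)
    errors = [(s - expected(ratings[w], ratings[b])) ** 2 for _, w, b, s in test]
    return sum(errors) / len(errors)
```

The point is that the model never sees the results it is scored on; always predicting 0.5 gives a baseline of 0.25 on decisive games under this assumed metric.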
What is surprising... (Re:Not surprising at all) (Score:2)
Is that with the best tech (both machines and math techniques) ELO has only been bested by 8%. You'd think it would be at least in the low 20's. Whether ELO is retained, it's a testament to its genius.
Incidentally folks, chess is only the most well-known user of ELO ratings. Many other competitive games make use of them as well.
Whole History Rating (Score:5, Interesting)
The French computer scientist Remi Coulom, well-known for the pioneering computer Go program Crazy Stone, has published some very interesting research on this issue. He claims not only to beat Elo, but also Glicko, Microsoft's TrueSkill and decayed-history approaches.
I was going to see if I could implement his ideas for the competition, since he's not going to participate himself. But it doesn't look like I have time for it.
Here's the paper [coulom.free.fr] in case anyone wants to give it a try. I suspect the approach is a bit more solid than the ad-hoc approaches of the quants.
Re: (Score:1)
Re:Whole History Rating (Score:4, Informative)
Glicko isn't designed to take advantage of all the information that's available in this competition. To calculate your new Glicko rating, you just need the Glicko ratings of both players + the result. I bet all serious contenders in the competition use the whole history somehow. (I talked with one who uses a decayed history scheme; he beats Glicko).
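For reference, a sketch of one Glicko-1 update step following Glickman's published formulas. Note that each player's state is a rating plus a rating deviation (RD), so "the Glicko ratings" here means both numbers. The function names are made up for illustration, and the step that inflates RD between rating periods is left out.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Weight that discounts games against opponents with uncertain ratings."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)


def glicko_update(r, rd, opponents):
    """One rating period. opponents is a list of (opp_rating, opp_rd, score)."""
    d2_inv = 0.0  # 1 / d^2, the information gained from the games
    delta = 0.0   # weighted sum of (actual - expected) scores
    for r_j, rd_j, s in opponents:
        e = 1.0 / (1.0 + 10.0 ** (-g(rd_j) * (r - r_j) / 400.0))
        d2_inv += Q * Q * g(rd_j) ** 2 * e * (1.0 - e)
        delta += g(rd_j) * (s - e)
    denom = 1.0 / rd ** 2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)
```

Nothing about earlier games enters except through the current rating and RD, which is the limitation being described: no use of the full game history.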
As to the leaderboard, it's really not so clear. Almost certainly, some of the contenders are accidentally overfitting to the leaderboard test data.
Obvious question (Score:4, Funny)
What is the punchline? (Score:2)
The bartender says:
Re: (Score:1, Funny)
Elo, elo, elo, what's going on 'ere then?
(He's a part time policeman as well)
Re: (Score:1, Funny)
A Portuguese physicist, an Israeli mathematician and two American programmers walk into a bar.
The bartender says:
Sorry lads, read the sign: "Cheques are NOT accepted."
*rim-shot*
Re: (Score:2)
"Whoa, is this some kind of a joke?!"
Re: (Score:2)
That's Jesus fucking Christ to you, FFSMS!
Portrugese (Score:5, Funny)
Re:Portrugese (Score:5, Funny)
Hery trhe Navgatror, a Portugese prirnce, was in lrge partr resposible for Portugese effortrs durirng the age of explorartion.
Wait just a second! you cannot go changing the subject suddenly like that... focus!, we are talking about Portrugese here!
Re:Portrugese (Score:5, Funny)
That's just a typo! Don't be such a grammar nazi!
Re: (Score:1)
and what about rock/paper/scissors (Score:2, Interesting)
Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors, no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.
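A small sketch of why a single number per player cannot capture a rock/paper/scissors cycle: a scalar rating implies a strict ordering, and no ordering of three players agrees with a cyclic "who is favoured over whom" relation. The helper name and the toy data are assumptions for illustration.

```python
from itertools import permutations


def consistent_orderings(players, beats):
    """Return every ranking in which each higher-ranked player is favoured
    over every lower-ranked one. beats holds (favoured, underdog) pairs."""
    result = []
    for order in permutations(players):
        if all((order[i], order[j]) in beats
               for i in range(len(order))
               for j in range(i + 1, len(order))):
            result.append(order)
    return result


cycle = {("R", "S"), ("S", "P"), ("P", "R")}  # intransitive matchups
chain = {("A", "B"), ("B", "C"), ("A", "C")}  # transitive matchups

print(consistent_orderings(["R", "P", "S"], cycle))
print(consistent_orderings(["A", "B", "C"], chain))
```

The cyclic case admits no consistent ranking at all, while the transitive case admits exactly one.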
Re: (Score:2)
Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors, no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.
In other words, styles make fights.
Re: (Score:2)
That's true, but that's not because of intransitivities in the game, it's because there's so little difference between human players. Just because there's intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.
Re: (Score:1)
That's true, but that's not because of intransitivities in the game, it's because there's so little difference between human players. Just because there's intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.
That's true, I should have said intransitivities in the player vs player outcomes rather than the game mechanics.
I am certain that the SC2 story is FAR more complicated than the rankings, even though every race setup in 1vs1 should be balanced in theory.
Re: (Score:2)
I am not quite certain that I follow. Depending on tournament type R/P/S is (mostly) a game of chance, chess isn't. The only way I can see R/P/S applying is if they represent the players themselves, not the game. In other words, instead of having individual strength ratings that can be measured in isolation, player R's style of play might be naturally stronger against player S's style than that of player P, in which case we could only express the relative strengths of any pairing. This would allow a scenari
Re: (Score:2)
R/P/S is an interesting game for precisely this reason. A pure random player will win as often as it loses, but no one actually plays completely randomly (even if they try to, humans are terrible random number generators). Both players are trying to model the other player's strategy - if you can predict what the other player will do in the next round, you can win it. For example, if he always follows rock with scissors, you can follow his rock with rock and win. Of course, if he realises that you are modelli
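The "model the other player" idea can be sketched as a first-order transition counter: tally what the opponent tends to play after each move, predict the most frequent follow-up, and play whatever beats it. This is only one simple way to do it, and the names below are hypothetical.

```python
from collections import Counter, defaultdict

BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}  # key beats value
COUNTER = {loser: winner for winner, loser in BEATS.items()}        # value beats key


def next_move(opponent_history):
    """Pick a move from the opponent's past moves (oldest first)."""
    if len(opponent_history) < 2:
        return "rock"  # arbitrary opening move, nothing to model yet
    transitions = defaultdict(Counter)
    for prev, cur in zip(opponent_history, opponent_history[1:]):
        transitions[prev][cur] += 1
    last = opponent_history[-1]
    if not transitions[last]:
        return "rock"  # never seen a follow-up to this move
    predicted = transitions[last].most_common(1)[0][0]
    return COUNTER[predicted]


# Opponent who follows rock with scissors: after his rock, we answer with rock.
print(next_move(["rock", "scissors", "paper", "rock", "scissors", "rock"]))
```

An opponent who notices this can exploit the predictor in turn, which is where the second-guessing starts.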
Re: (Score:1)
You are of course correct, I meant the players and not the game.
ELO isn't just for chess (Score:2)
The ELO rating system isn't just used for chess, but many other competitive games (including video games). Therefore, this new 'improvement' may not apply to other games so well, if they've only used chess win/loss data. Sometimes, the simplest formulas are the best/most general.
Even within the ELO system, tweaks can be made [wikipedia.org], though FIDE still uses the original system for whatever reasons.
Emo Chess (Score:2)
Re: (Score:1)
Don't Bring Me Down (Score:2)
ELO ?
I didn't know the Electric Light Orchestra was still around
And once again in dead last... (Score:2)
Fuck chess (Score:1, Flamebait)
Octopus (Score:1)
now... (Score:2)