Classic Games (Games) Programming

Elo Chess Rating System Topped By Proposed Replacements 102

databuff writes "About six weeks ago, Slashdot reported a competition to find a chess rating algorithm that performed better than the official Elo rating system. The competition has just reached the halfway mark and the best entries have outperformed Elo by over 8 per cent. The leader is a Portuguese physicist, followed by an Israeli mathematician and then a pair of American computer scientists."
This discussion has been archived. No new comments can be posted.

  • Sweet (Score:2, Funny)

    by Anonymous Coward
    Castle this.
  • "Portrugese"?
    Did you mean "Portuguese"? :-)

  • Can't be so (Score:5, Funny)

    by Waffle Iron ( 339739 ) on Wednesday September 22, 2010 @12:50AM (#33659282)

    A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!" But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

    • A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!" But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

      I wonder how many people on Slashdot are old enough to get this... at least 4, apparently!

      • by definate ( 876684 ) on Wednesday September 22, 2010 @01:49AM (#33659470)

        I don't get it.

        REVEAL YOUR SECRETS!

        Wow, Slashdot won't allow me to post with that ratio of non-caps to caps. So I need to write all of this to correct the ratio. The error says "Filter error: Don't use so many caps. It's like YELLING.".

        Dear robotic automated moderating overlord,
        I know it's like yelling, that's the effect I was going for. Obviously your algorithm is shit, because you don't seem to understand context... or love.
        Sincerely,
        definate

      • Well, that popped into my head as soon as I saw Elo in the story headline. And I'm only 30 and 2 days, and I actually have one of their 8-tracks and a couple of CDs.
      • I found it funny without realising it's a reference. In fact, finding out it's a reference to something makes it a little less funny.. though not as bad as when I make a joke and people are like "what's that from?".

        • To be fair it's not like he took it from one source and changed a couple words around. Stringing together song and album titles:

          A friend called me on my telephone line and told me out of the blue that the Elo rating system had been bested. I was so stunned I almost turned to stone. I said, "Dude, don't bring me down!" But the news slowly sank in, and now I can't get it out of my head. But I'll tell you what, the jury is still out. I think there's gonna be a showdown, and then Elo will be back on top.

          There

          • Ah, I guess I'm just too young and have never been exposed to ELO's music. That's definitely a worthy set of references, though the post would still be funny even if the band ELO never existed.

    • Re: (Score:2, Funny)

      by halestock ( 1750226 )
      I dunno, I heard the new system has an IQ of 1001, has a jumpsuit on, and is also a telephone.
    • Had to be said...
    • Confusion. It's such a terrible shame.
      Confusion. You don't know what you're sayin'.
      You've lost your love and you just can't carry on.
      You know there's no-one for you to lean on.
      To le-ee-an on.

      -- ELO
    • by Dabido ( 802599 )
      Mister Blue Sky told me. What a Discovery.
  • by IICV ( 652597 ) on Wednesday September 22, 2010 @01:11AM (#33659336)

    This is entirely unsurprising. The Elo system was, in a sense, designed to be easily calculable in a time before things like computers or databases or data mining were especially common (after all, it was adopted by US Chess Federation in 1960!), and it hasn't been revised much if at all since then. Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it by a lot - this isn't like the Netflix prize, where a bunch of teams were competing to improve something that had been in active development up until that very year.
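    For anyone who hasn't seen how little arithmetic Elo actually needs, here is a minimal sketch in Python of the logistic form commonly used today. The function names and the K-factor of 32 are illustrative assumptions, not anything taken from the contest or from a federation's rulebook.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both players' new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k is the step size; 32 is only an illustrative choice.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total number of points is conserved.
    return rating_a + delta, rating_b - delta
```

    The whole update is one exponentiation and a multiply per game, which is why it could be done by hand from a lookup table.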

    • But the point of the story is to get more people interested in their contest by putting it on the front page of Slashdot. Which it probably will do.
      • But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?

        • But notice that a ratings squabble gets prime coverage and Anand's championship win was ignored?

          Probably because people here have more interest in algorithms than in chess itself?

        • Uh, which championship? Last I can tell he took second to Carlsen in August's Arctic Securities Chess Stars championship. Besides that, it shouldn't be newsworthy that the current world champion wins a tournament.....
          • My point exactly... *The* World Chess Championship - the classical time control match with Topalov.

            Our ever-friendly wiki link -
            https://secure.wikimedia.org/wikipedia/en/wiki/World_Chess_Championship_2010 [wikimedia.org]

            "Arctic Securities Chess Stars" is, to quote Chessbase,
            "This rapid chess tournament is taking place in Kristiansund from Saturday, August 28th to Monday, August 30th 2010. It is a double round robin with four players: Magnus Carlsen, Viswanathan Anand, Judit Polgar and Jon Ludvig Hammer. On Monday there fo

            • Dude, the world championship ended in May. Why would you expect anyone to post in September about a tournament that ended in May? And even in May the result wasn't very interesting.
    • There have been attempts to improve Elo over the years as well. Glicko and TrueSkill (from Microsoft Research, used on the Xbox) are the most commonly mentioned. Also, a lot of game sites have developed variants on decayed-history Elo by trial and error. The one at KGS, for instance, is pretty impressive. There's also lesser-known academic research, such as Remi Coulom's paper on Whole History Rating.

      Deciding which is the better chess player from what they've won in the past is also a far simpler problem than

    • by Jurily ( 900488 )

      Of course statisticians using modern methods and number crunching capabilities and huge databases of both game results and game moves are going to be able to beat it

      You mean data miners can predict the database they built their algorithms on? Wow!

      A true test would be to accurately predict results in the next ten years.

      • by tepples ( 727027 )

        You mean data miners can predict the database they built their algorithms on?

        As far as I can tell, the principle of the test works similarly to the following: Take a database with multiple years of results, train the algorithm on all but the final year, and predict the final year. Someone who cares enough about chess skill rankings to have read the article carefully could fill in more details.

        • I'm participating in the contest. The training set is 100 months; the test set is months 101-105.

      • by sjames ( 1099 )

        There's not much choice but to start there. I'm sure they are interested in seeing how it does over the next 10 years of results, but unless you have a technology I don't know about, that'll take 10 years.

      • by rm999 ( 775449 )

        That's not how prediction competitions work, obviously.

        Everyone is given a "training" dataset, which contains the results. The contestants mine this dataset to determine their algorithm, which is then applied to a "test" dataset that has hidden results (i.e. who won the game). The contestants are judged by how well they do on the test set.
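    The setup described in the replies above (fit on the first 100 months, get scored on the later months) can be sketched in a few lines of Python. Everything here is an assumption for illustration: the (month, white, black, score) tuple layout, the toy games, plain Elo as the model, and per-game RMSE as the error measure. The contest's real data format and scoring rule are not given in this thread.

```python
import math
from collections import defaultdict


def expected(ra, rb):
    """Elo expected score for the first player."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


def split_by_month(games, last_train_month=100):
    """Games up to last_train_month are for fitting; later ones are held out."""
    train = [g for g in games if g[0] <= last_train_month]
    test = [g for g in games if g[0] > last_train_month]
    return train, test


def fit_elo(train, k=32.0, start=1500.0):
    """Replay the training games in month order and return the final ratings."""
    ratings = defaultdict(lambda: start)
    for _, white, black, score in sorted(train, key=lambda g: g[0]):
        delta = k * (score - expected(ratings[white], ratings[black]))
        ratings[white] += delta
        ratings[black] -= delta
    return ratings


def rmse(ratings, test):
    """Root-mean-square error of predicted vs. actual score on held-out games."""
    errs = [(s - expected(ratings[w], ratings[b])) ** 2 for _, w, b, s in test]
    return math.sqrt(sum(errs) / len(errs))


# Hypothetical toy data: (month, white, black, score for white).
GAMES = [(1, "a", "b", 1.0), (2, "a", "b", 1.0),
         (101, "a", "b", 1.0), (102, "b", "a", 0.0)]
train, test = split_by_month(GAMES)
model_error = rmse(fit_elo(train), test)
baseline_error = rmse(defaultdict(lambda: 1500.0), test)  # everyone rated equal
```

    The point is only that the held-out results never touch the fitting step, so the model can't simply memorise them.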

    • Is that with the best tech (both machines and math techniques) ELO has only been bested by 8%. You'd think it would be at least in the low 20's. Whether ELO is retained, it's a testament to its genius.

      Incidentally folks, Chess is only the most well known user of ELO ratings. Many other competitive games make use of them as well.

  • Whole History Rating (Score:5, Interesting)

    by Vintermann ( 400722 ) on Wednesday September 22, 2010 @01:20AM (#33659370) Homepage

    The French computer scientist Remi Coulom, well-known for the pioneering computer Go program Crazy Stone, has published some very interesting research on this issue. He claims not only to beat Elo, but also Glicko, Microsoft's TrueSkill and decayed-history approaches.

    I was going to see if I could implement his ideas for the competition, since he's not going to participate himself. But it doesn't look like I have time for it.

    Here's the paper [coulom.free.fr] in case anyone wants to give it a try. I suspect the approach is a bit more solid than the ad-hoc approaches of the quants.

    • According to the leaderboard, Glicko is being beaten by ~5 per cent. Coulom's system better be pretty good!
      • by Vintermann ( 400722 ) on Wednesday September 22, 2010 @04:11AM (#33659868) Homepage

        Glicko isn't designed to take advantage of all the information that's available in this competition. To calculate your new Glicko rating, you just need the Glicko ratings of both players + the result. I bet all serious contenders in the competition use the whole history somehow. (I talked with one who uses a decayed history scheme; he beats Glicko).

        As to the leaderboard, it's really not so clear. Almost certainly, some of the contenders are accidentally overfitting to the leaderboard test data.
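        To make the "ratings of both players + the result" point concrete, here is a rough single-game sketch of the Glicko-1 update in Python. Note that a Glicko rating is a pair: the rating itself plus a rating deviation (RD) expressing how uncertain it is. The function names are mine, and the sketch deliberately leaves out the step that grows RD while a player is inactive, so treat it as an illustration rather than a full implementation.

```python
import math

Q = math.log(10) / 400.0


def g(rd):
    """Discount factor: the less certain the opponent's rating, the less a result counts."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q ** 2 * rd ** 2 / math.pi ** 2)


def glicko_update(r, rd, r_opp, rd_opp, score):
    """New (rating, RD) for one player after a single game.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    """
    g_opp = g(rd_opp)
    e = 1.0 / (1.0 + 10.0 ** (-g_opp * (r - r_opp) / 400.0))
    inv_d2 = Q ** 2 * g_opp ** 2 * e * (1.0 - e)  # 1 / d^2 for a one-game period
    denom = 1.0 / rd ** 2 + inv_d2
    new_r = r + (Q / denom) * g_opp * (score - e)
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd
```

        Nothing in the inputs refers to earlier games, which is the limitation being discussed: all of the history has to be squeezed into those two numbers per player.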

  • by glwtta ( 532858 ) on Wednesday September 22, 2010 @01:23AM (#33659378) Homepage
    So, how did they rank the entries?
  • A Portuguese physicist, an Israeli mathematician and two American programmers walk into the bar.

    The bartender says:

    • Re: (Score:1, Funny)

      by Anonymous Coward

      Elo, elo, elo, what's going on 'ere then?

      (He's a part time policeman as well)

    • Re: (Score:1, Funny)

      by Anonymous Coward

      A Portuguese physicist, an Israeli mathematician and two American programmers walk into the bar.

      The bartender says:

      Sorry lads, read the sign: "Cheques are NOT accepted."

      *rim-shot*

    • by JamesP ( 688957 )

      "Whoa, is this some kind of a joke?!"

  • Portrugese (Score:5, Funny)

    by Anonymous Coward on Wednesday September 22, 2010 @02:09AM (#33659536)
    True facts about Portrugese:
    1. More than 250 million peoprle spek Portruguese, making it the firfth most sproken language in the wrorld.
    2. Portrugese is an adjective describing thrings relatd to Portrugal.
    3. Christropher Colurmbus spoke Portrugese.
    4. Portrugese is the officiral langurage of ther Repulic rof Angorlra.
    5. Hery trhe Navgatror, a Portugese prirnce, was in lrge partr resposible for Portugese effortrs durirng the age of explorartion.
  • Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors, no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.

    • by srussia ( 884021 )

      Many rating systems seem to assume transitive dominance structures. If you are playing rock/paper/scissors, no rating would be sufficient to predict the outcome of a tournament. Many games (using Battle.net, TrueSkill...) probably are not interested in finding nontransitive structures, since players want to be the best and fans want to know who is the best, which is kind of pointless with r/p/s.

      In other words, styles make fights.

    • If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament.

      That's true, but that's not because of intransitivities in the game, it's because there's so little difference between human players. Just because there's intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.

      • If you are playing rock/paper/scissors no rating would be sufficient to predict the outcome of a tournament.

        That's true, but that's not because of intransitivities in the game, it's because there's so little difference between human players. Just because there's intransitivities in the game doesn't mean there's intransitivity in the rankings - Starcraft is built around intransitivities, but the rankings work just fine.

        That's true, I should have said intransitivities in the player-vs-player outcomes rather than the game mechanics.

        I am certain that the SC2 story is FAR more complicated than the rankings, even though every race setup in 1vs1 should be balanced in theory.

    • I am not quite certain that I follow. Depending on tournament type R/P/S is (mostly) a game of chance, chess isn't. The only way I can see R/P/S applying is if they represent the players themselves, not the game. In other words, instead of having individual strength ratings that can be measured in isolation, player R's style of play might be naturally stronger against player S's style than that of player P, in which case we could only express the relative strengths of any pairing. This would allow a scenari

      • R/P/S is an interesting game for precisely this reason. A purely random player will win as often as he loses (a third of the time each, with the rest ties), but no one actually plays completely randomly (even if they try to, humans are terrible random number generators). Both players are trying to model the other player's strategy - if you can predict what the other player will do in the next round, you can win it. For example, if he always follows rock with scissors, you can follow his rock with rock and win. Of course, if he realises that you are modelli

      • You are of course correct; I meant the players and not the game.
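      A small Python sketch of the player-level intransitivity discussed in this subthread. It assumes three hypothetical players whose results form a perfect cycle (A always beats B, B always beats C, C always beats A) and feeds those results to plain Elo with an illustrative K of 32. Since one number per player cannot encode a cycle, the ratings stay bunched together and Elo predicts close to a coin flip for pairings that are in fact completely one-sided.

```python
def expected(ra, rb):
    """Elo expected score for the first player."""
    return 1.0 / (1.0 + 10.0 ** ((rb - ra) / 400.0))


def play(ratings, winner, loser, k=32.0):
    """Apply one decisive result to the ratings dict in place."""
    delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
for _ in range(200):
    # One full cycle per round: every player wins once and loses once.
    play(ratings, "A", "B")
    play(ratings, "B", "C")
    play(ratings, "C", "A")

spread = max(ratings.values()) - min(ratings.values())
predicted_a_beats_b = expected(ratings["A"], ratings["B"])
```

      Here A beats B every single time, yet the rating gap never grows large enough to say so. Capturing that needs a model with something pairwise in it, not just one scalar per player.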

  • The ELO rating system isn't just used for chess, but many other competitive games (including video games). Therefore, this new 'improvement' may not apply to other games so well, if they've only used chess win/loss data. Sometimes, the simplest formulas are the best/most general.

    Even within the ELO system, tweaks can be made [wikipedia.org], though FIDE still uses the original system for whatever reasons.

  • I could have sworn it said "emo chess". I was going to ask what the goal of the game was, to decide who gets to play black ?
    • Fighting over who plays black in emo chess would be far too much effort. Who really cares, anyway, about the game or its outcome. It's all just a pointless metaphor for the pointless struggle that is life. There might be, I suppose, some small momentary fascination with the inexplicable passions people seem to hold to in chess or life. sighhhhhhhhhhhh. /emo-mode
  • ELO ?

    I didn't know the Electric Light Orchestra was still around

  • A Saudi Arabia mathematician who insists that Allah will guide his way to victory and a Liberty University physicist who insists that the universe revolves around the earth.
  • Fuck chess (Score:1, Flamebait)

    Computer beats world's greatest chess player. Good job. Play Go instead, it's not solved.
  • I say use the Soccer Octopus.
  • If Sagarin would just replace his ELO rating with the eventual winner of this contest. It would be interesting to see how much closer the "ELO replacement" performance is to what he gets from his PREDICTOR method (that takes into account point differentials).
