The Problem With Metacritic
Metacritic has risen to a position of prominence in the gaming community — but is it given more credit than it's due? This article delves into some of the problems with using Metacritic as a measure of quality or success. Quoting:
"The scores used to calculate the Metascore have issues before they are even averaged. Metacritic operates on a 0-100 scale. While it's simple to convert some scores into this scale (if it's necessary at all), others are not so easy. 1UP, for example, uses letter grades. The manner in which these scores should be converted into Metacritic scores is a matter of some debate; Metacritic says a B- is equal to a 67 because the grades A+ through F- have to be mapped to the full range of its scale, when in reality most people would view a B- as being more positive than a 67. This also doesn't account for the different interpretations of scores that outlets have -- some treat 7 as an average score, which I see as a problem in and of itself, while others see 5 as average. Trying to compensate for these variations is a nigh-impossible task and, lest we forget, Metacritic will assign scores to reviews that do not provide them. ... The act of simplifying reviews into a single Metascore also feeds into a misconception some hold about reviews. If you browse the comments of a review anywhere on the web (particularly those of especially big games), you're likely to come across those criticizing the reviewer for his or her take on a game. People seem to mistake reviews for something which should be 'objective.' 'Stop giving your opinion and tell us about the game' is a notion you'll see expressed from time to time, as if it is the job of a reviewer to go down a list of items that need to be addressed -- objectively! -- and nothing else."
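To see the distortion the quote describes, here is a minimal sketch of a linear grade-to-score mapping. The grade list and even spacing are assumptions for illustration; Metacritic's exact conversion table isn't given in the article.

```python
# Hypothetical letter grades, worst to best, spread evenly over 0-100.
GRADES = ["F-", "F", "F+", "D-", "D", "D+", "C-", "C", "C+",
          "B-", "B", "B+", "A-", "A", "A+"]

def grade_to_score(grade):
    """Map a letter grade onto 0-100 by its position in the list."""
    i = GRADES.index(grade)
    return round(100 * i / (len(GRADES) - 1))
```

Under this assumed spacing a B- converts to 64, in the same neighborhood as Metacritic's reported 67, even though a conventional grading scale would put a B- around 80.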
But it's all subjective anyway. (Score:4, Interesting)
Depends on what you mean by using the range (Score:4, Interesting)
In most US schools, the scale is:
A: 100-90
B: 89-80
C: 79-70
D: 69-60
F (or sometimes E): 59-0
So while, percentage-wise, you can score anywhere from 0-100 on an assignment or on the final grade, 59% or below is failing. In terms of the grades, an A means (or is supposed to mean) an excellent grasp of the material, a B a good grasp, a C an acceptable grasp, a D a below-average grasp that still passes, and an F an unsatisfactory grasp.
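That scale can be sketched as a straightforward conversion function (a minimal illustration, nothing more):

```python
def letter_grade(pct):
    """Convert a 0-100 percentage to a US letter grade
    using the scale listed above (59% or below is failing)."""
    if pct >= 90:
        return "A"
    if pct >= 80:
        return "B"
    if pct >= 70:
        return "C"
    if pct >= 60:
        return "D"
    return "F"
```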
So translate that to reviews and you get the same system. Also it can be useful to have a range of bad. Anything under 60% is bad in grade terms but looking at the percentages can tell you how bad. A 55% means you failed, but were close to passing. A 10% means you probably didn't even try.
So games could be looked at the same way. The ratings do seem to get used that way too. When you see sites hand out ratings in the 60s (or 6/10) they usually are giving it a marginal rating, like "We wouldn't really recommend this, but it isn't horrible so maybe if you really like this kind of game." A rating in the 50s is pretty much a no recommendation but for a game that is just bad not truly horrible. When a real piece of shit comes along, it will get things in the 30s or 20s (maybe lower).
A "grade style" rating system does make some sense, particularly since we are not rating relative to an average. I don't think anyone gives a shit if the game is "average" or not, they care if it is good. The "average" game could be good or bad, that really isn't relevant. What is relevant is whether you want to play a specific game.
Re:Solve it with Machine Learning (Score:5, Interesting)
Your description is not so much machine learning as basic math. If you just want each scoring system to have equal weight on the results then compensating for the variation is trivial.
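As a sketch of that "basic math": standardization is one common choice for giving each outlet equal weight, though the poster doesn't name a specific method.

```python
from statistics import mean, stdev

def standardize(scores):
    """Rescale one outlet's scores to zero mean and unit variance,
    so outlets with inflated or compressed scales count equally
    when their scores are later combined."""
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]
```

A "generous" outlet whose scores cluster in the 70-90 range and a harsher one centered on 50 both end up on the same footing after rescaling.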
Where machine learning would come in is to find underlying patterns in the data.
This could be used to weed out reviewers who lazily copy scores, or who are subject to influence.
It would also let you test your own scores for some games against the population of reviewers, to find reviewers with similar tastes.
You could also use clustering algorithms to find niche games which got a few really strong scores but whose average was really pulled down because they don't have wide appeal.
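A toy version of that idea follows. This is not real clustering, just a dispersion heuristic with made-up thresholds: flag games that collected a couple of very strong scores but whose average was dragged down by a wide spread.

```python
from statistics import mean, stdev

def niche_candidates(game_scores, high=85, spread=20, avg_cap=75):
    """Flag games with at least two very strong scores but a wide
    spread and a mediocre average -- a crude stand-in for clustering.
    All three thresholds are illustrative assumptions."""
    flagged = []
    for game, scores in game_scores.items():
        strong = sum(1 for s in scores if s >= high)
        if strong >= 2 and stdev(scores) >= spread and mean(scores) < avg_cap:
            flagged.append(game)
    return flagged
```

A polarizing niche title scored [95, 90, 40, 35, 50] gets flagged; a broadly liked game scored [80, 82, 78, 85, 79] does not.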
Re:Is that so? (Score:5, Interesting)
I work for a review platform. We have decided that you only really need four ratings, Bad, Poor, Good, Excellent. We don't have a neutral option because really neutral tends to mean bad.
Of course, quite a lot of our users (and our marketing department) seem to prefer stars. Because an arbitrary scale is so much more useful than simply saying what you think of something. Apparently.
Rottentomatoes (Score:5, Interesting)
By that logic, Rottentomatoes (which averages reviews using only a binary fresh/rotten scale) should be utterly useless. Except it isn't. It's IMHO the most dependable rating site on the net.
It seems the magic lies not in the rating resolution, but in the quality and size of the reviewer pool (100+ for Rottentomatoes). In other words, make the law of averages work for you.
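The aggregation being described is simple to sketch. The 60-point fresh threshold below is an assumption for this illustration, not a claim about how Rotten Tomatoes classifies reviews.

```python
def tomatometer(reviews, fresh_threshold=60):
    """Collapse each numeric review to a binary fresh/rotten verdict,
    then report the percentage fresh -- the binary-averaging approach
    described above. The threshold is an illustrative assumption."""
    fresh = sum(1 for r in reviews if r >= fresh_threshold)
    return round(100 * fresh / len(reviews))
```

With 100+ reviewers in the pool, the coarse per-review signal averages out into a surprisingly stable aggregate.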
Re:Rottentomatoes (Score:4, Interesting)