Monday, April 23, 2012

Online rating systems need an overhaul

As Brad observed quite nicely in this post, eBay has a problem with its rating system. Actually, not only eBay: every website that uses one does. The root of the problem lies in users behaving differently and giving ratings for different reasons. And then there's the question of what the rated parties intend to achieve. On eBay it's mostly a question of money: the better the score, the better the revenue. On IMDb or YouTube it's all about attention: the more votes you get, the better.

In my opinion, it would be better not to hand out negative votes on a quorum basis, as Brad suggested, but to overhaul the whole rating system. eBay has a scale of 1-5, other websites have 1-10, 1-3, 0-10, and so on. Most systems don't even describe what the scores mean, and users interpret these scales differently. There's the question of defaults: some sites preselect the highest rating, some the average, some the lowest. Some users rate only at the extremes, others are very careful and avoid the ends of the scale; neither group uses the full range. So far, companies have assumed that with large numbers these effects all even out into a nice bell-shaped curve.

For example, to avoid bad results, IMDb uses a special weighted rating for its Top 250 list:

weighted rating (WR) = (v ÷ (v + m)) × R + (m ÷ (v + m)) × C

where R is the movie's average rating, v its number of votes, m the minimum number of votes required for the list, and C the mean vote across all movies. I don't think this is very understandable, and they also have to exclude non-regular voters.
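To make the formula concrete, here is a minimal sketch in Python. The variable roles (R, v, m, C) follow IMDb's published description; the threshold m = 25000 is an assumption for illustration, since IMDb has changed it over the years.

```python
def imdb_weighted_rating(R, v, C, m=25000):
    """Weighted rating in the style of the IMDb Top 250.

    R: the movie's average rating
    v: number of votes for the movie
    C: mean vote across all movies
    m: minimum votes required for the list (25000 is an assumed value)
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# A film with few votes is pulled towards the global mean C;
# with many votes, its own average R dominates:
print(imdb_weighted_rating(R=9.0, v=5000, C=6.9))    # 7.25
print(imdb_weighted_rating(R=9.0, v=500000, C=6.9))  # 8.9
```

The effect is a Bayesian-style shrinkage: a handful of enthusiastic voters can't push an obscure title into the Top 250, because with small v the result stays close to C.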

Another problem is that under the old rating system, many users are influenced by the existing score. For example, they look at a movie's IMDb rating, say 60,000 votes averaging 7.6, and think, "hey, this movie doesn't deserve a 7.6, it should be lower". So instead of rating the movie at the score they actually think it deserves, they vote 1, in order to drag the aggregated result as far as possible towards their target score. Every rating system where users can see other voters' choices before casting their own has this problem, and on websites that is almost always the case.

So this is my new system:

I think it would be best if every vote were contextualized with all your other votes, and thus normalized, comparable, and aggregatable into a combined rating. If, on a scale from 1-5, one user casts only the votes 1 and 5, and another user only the votes 2 and 4, they actually express the same opinion. Normalizing the data makes this visible: the first user has an average vote of 3 and an average deviation of 2, so their votes become -1.000 and +1.000 relative to their own average. The second user ends up with the same normalized votes, because they have an average vote of 3 and an average deviation of 1.
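The normalization described above can be sketched in a few lines of Python. The function name is my own, and I've taken "average variation" to mean the mean absolute deviation, which reproduces the numbers in the example:

```python
def normalize_votes(votes):
    """Normalize one user's raw votes by their own mean and
    mean absolute deviation (a sketch of the scheme above)."""
    mean = sum(votes) / len(votes)
    mad = sum(abs(v - mean) for v in votes) / len(votes)
    if mad == 0:  # user always gives the same score: no signal
        return [0.0] * len(votes)
    return [(v - mean) / mad for v in votes]

# Two users with different raw votes express the same opinion:
print(normalize_votes([1, 5]))  # [-1.0, 1.0]
print(normalize_votes([2, 4]))  # [-1.0, 1.0]
```

Since every user's votes are rescaled against their own history, the site's choice of 1-5, 1-10, or 0-10 stops mattering for the aggregate.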

If you've ever taken a course on analysis of variance, this comes naturally to mind. Apparently it doesn't for the creators of these voting systems; they just calculate the average vote.

What will it change? First of all, you won't have to puzzle over different rating scales anymore: you'll have a good intuition for what -0.612 means in contrast to +0.997. Second, seeing the current result before you vote won't distort the outcome as much, because you can no longer overexpress your opinion by voting more extremely. Extreme votes only carry more weight if you have cast a lot of more moderate votes, and that takes time and consideration.
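To see why strategic down-voting loses its power, here is a small sketch on a 1-10 scale (the function and the example histories are hypothetical). The same tactical vote of 1 is weighted against each user's own voting history, so it counts far less coming from a habitual extremist than from a user with a genuinely moderate record:

```python
def normalized(vote, history):
    """Weight a single vote against the user's voting history,
    using the mean absolute deviation as the spread measure."""
    mean = sum(history) / len(history)
    mad = sum(abs(v - mean) for v in history) / len(history)
    return (vote - mean) / mad

extremist = [1, 10, 1, 10]  # always votes the ends of the scale
moderate  = [5, 6, 7, 6]    # usually votes mid-range

# The same strategic "1" from each user:
print(normalized(1, extremist))  # -1.0
print(normalized(1, moderate))   # -10.0
```

An extreme vote only becomes heavy after you have built up a moderate history, which is exactly the time-and-consideration cost the scheme relies on.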

So, the math is easy and is already used heavily in analysing surveys and the like. The voting wouldn't change from the user's perspective, so they would hardly notice a difference. Only the results would be much more interesting, because fake votes are stripped of their power.

Creative Commons License picture by Sara and Mike

This post originally appeared here
