If a website, say BBC News (news.bbc.co.uk), links to another site, say Fred’s Auto Shop (bestautoshop.com), because it’s relevant to a story, then Google will think more highly of Fred’s Auto Shop: if the BBC is linking to it, it must be important. So a little Google love is passed from the BBC page to Fred’s site. This is the principle behind Google’s PageRank algorithm: a site’s reputation is based on the number and quality of the inbound links to it.
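To make the idea concrete, here is a toy sketch in Python of how reputation could flow along links. It assumes a simplified PageRank iteration with a damping factor of 0.85; the function name, the tiny link graph and the `.example` domains are all made up for illustration, and this is certainly not Google's real implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: two blogs link to the BBC, the BBC links to Fred.
links = {
    "news.bbc.co.uk": ["bestautoshop.com"],
    "someblog.example": ["news.bbc.co.uk"],
    "otherblog.example": ["news.bbc.co.uk"],
    "bestautoshop.com": [],
    "unlinkedshop.example": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:25s} {score:.3f}")
```

In this toy graph Fred's shop ends up ranked above the otherwise identical shop that nobody links to, purely because the BBC page passes some of its own score along.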
Unfortunately, spammers realized that if they could get their own links onto these high-ranking pages, they too could get some Google love and promote their sites full of spyware, cheap medication or other dubious items.
So Google came up with the NoFollow attribute (rel="nofollow" on a link) and encouraged people to use it in public areas such as blog comments, so that spammers gain no benefit. Wikipedia uses NoFollow on all of its external links, and Digg is about to use it on links to sites it doesn’t trust.
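For anyone who hasn't seen it in the wild, NoFollow is just the value `nofollow` in a link's `rel` attribute. The sketch below uses Python's standard `html.parser` module to show how a crawler might separate the two kinds of link; the `LinkCollector` class, the sample markup and the spam domain are invented for illustration, and real crawlers are far more involved.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects link targets, split by whether rel contains 'nofollow'."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list of tokens, e.g. rel="external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


# Hypothetical page: an editorial link in the story, a spammy link in a comment.
page = """
<p>Story: <a href="http://bestautoshop.com/">Fred's Auto Shop</a></p>
<p>Comment: <a href="http://cheap-pills.example/" rel="nofollow">great post!</a></p>
"""

collector = LinkCollector()
collector.feed(page)
print("followed:  ", collector.followed)
print("nofollowed:", collector.nofollowed)
```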
For Wikipedia, I find that disturbing. As an encyclopedia it’s supposed to be an authoritative source of information, so anything one of its pages links to is likely to be a valuable source too. Google should therefore be using those links to measure the reputation of the linked page.
As for Digg, how will it know which sites to trust? Why would it be any better than Google at sifting the good from the bad? Ultimately, it will only have one set of data to base its judgement upon: the database behind Digg, i.e. a user’s activity on the site, how many domains they’ve linked to, the number of comments containing a single link, and so on. Spam detection works better when you have a larger pool of data, one that crosses multiple business sectors, languages and domains. Therefore, Digg should just use NoFollow everywhere … but only if we don’t interpret the NoFollow attribute strictly.
If NoFollow is strictly interpreted, it’s wrong, as you’re penalizing the good links because of a few bad ones. That approach has never worked in any aspect of life.
If it’s interpreted as ‘this link is user submitted and may be spam’, that’s better. It means we can be wary of the link: maybe we don’t award all of the Google love the moment we see it, and maybe we cross-reference other sources to see if it fits into a pattern of abuse. If high-ranking sites are linking to a page without NoFollow and a few are linking with NoFollow, then it’s probably safe to ignore the NoFollow attribute for that page. However, if only NoFollow links point to a page, it’s more likely to be spam. Factoring in details about the domain, and the timespan between the first and the most recent sighting of the link, should also allow a more accurate prediction of whether it is spam.
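The cross-referencing rule above can be sketched in a few lines of Python. Everything here is hypothetical: the `InboundLink` type, the 0-to-1 reputation scale, the `trusted_rank` threshold and the three labels are my own placeholders, and the domain and timespan signals are left out to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class InboundLink:
    source_rank: float  # reputation of the linking page, on an assumed 0..1 scale
    nofollow: bool      # True if the link carries rel="nofollow"


def assess(inbound, trusted_rank=0.5):
    """Treat NoFollow as a 'may be spam' hint instead of a hard rule."""
    if not inbound:
        return "unknown"
    trusted_followed = [
        link for link in inbound
        if not link.nofollow and link.source_rank >= trusted_rank
    ]
    if trusted_followed:
        # High-ranking sites vouch for the page, so ignore the NoFollow hints.
        return "probably safe"
    if all(link.nofollow for link in inbound):
        # Only user-submitted links point here: more likely to be spam.
        return "suspicious"
    return "unknown"


# Fred's shop: one editorial link from a high-ranking page plus a blog comment.
print(assess([InboundLink(0.9, False), InboundLink(0.4, True)]))
# A page that is only ever linked from NoFollow comment areas.
print(assess([InboundLink(0.9, True), InboundLink(0.3, True)]))
```

A real system would presumably produce a score instead of a label and weigh many more signals, but the shape of the decision is the same.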
With the amount of data Google has, it should be able to use this information wisely and more accurately assess the SpamRank (my idea!) of the linked page.
I think the ‘NoFollow’ name is a bad choice; ‘UserGenerated’ or something similar would have been better. I see the value, and I trust Google to use the information in a logical manner, but if Digg adds its own ‘trust’ algorithm to the site, it’s basically nothing more than a PageRank sculpting mechanism, however noble it seems. Out of interest, I see Reddit doesn’t use NoFollow at all for user comments, which again is probably a bad idea.