Skip to main content



PageRank is vulnerable to Sybil Attacks

One of the first things that struck me while learning about the PageRank algorithm is the following problem: What happens if you just make a lot of fake pages to boost your ranking? It seemed clear that this would be an effective strategy, and indeed it is [1].

The creation of dummy nodes in a network in order to boost a reputation score is known as the Sybil attack. One sort of case you might be familiar with is the creation of fake accounts to boost reviews of a product or the use of bots to upvote content on Reddit. Sybil attacks are an open problem[2], and are famously considered impossible to beat without the presence of a centralized verification system[3].

One possible solution would be to introduce a time component to PageRank calculations, such that rather than being calculated from scratch periodically; PageRank “flows” from pages at a certain rate, with new pages starting at 0. If a Sybil cluster only had a handful inlinks from the rest of the nodes, the entire cluster would only receive a handful of link’s worth of PageRank each tick, no matter how large the cluster.

As far as I’m able to tell, Google has not disclosed any information about how they might be combatting such an attack. After all, it’s not even precisely clear how PageRank fits into Google’s purported 200+ other ranking factors. I also have not been able to find estimates on the degree to which Sybil attacks are currently being used to thwart PageRank– I would not be surprised if it was extremely large.

[1] https://www.cs.duke.edu/nicl/netecon06/papers/ne06-sybil.pdf

[2] http://forensics.umass.edu/pubs/levine.sybil.tr.2006.pdf

[2] http://freehaven.net/anonbib/cache/sybil.pdf

Comments

Leave a Reply

Blogging Calendar

October 2016
M T W T F S S
 12
3456789
10111213141516
17181920212223
24252627282930
31  

Archives