


Spam-Resilient SourceRank

PageRank, the dominant ranking system, brings with it certain weaknesses. The primary one, as explored in class, is that it is highly susceptible to spamming. For example, when creating a single new legitimate site, spammers may create extra dummy sites that link to one another and to that site, pooling their links to inflate one another's authority and place the legitimate site at an artificially high rank. To counter this, the authors of this paper have developed a new ranking system called Spam-Resilient SourceRank. Its major departures from PageRank are a “hierarchical source view of the Web”, a “source-based influence flow”, and Influence Throttling.

The hierarchical view organizes pages into groups called sources. Once pages have been grouped, the sources are linked by directed edges, just as pages are in PageRank. This helps catch duplicate sites made by one developer, as they will be grouped within a single source. The influence flow modifies the edge strengths in the graph: each edge is a source consensus edge, weighted by the number and distribution of unique pages within a source that contribute to it. This helps prevent hijacking, the embedding of fake links into honest pages, because a few hijacked pages have little effect on the consensus edge. Finally, Influence Throttling prevents spammers from creating multiple sources the way they created multiple pages under PageRank. It does so by adding self-edges, which effectively reduce the weight a source passes on to other sources. The throttling vector, with each entry between 0 and 1, determines how strongly each source is throttled, and depends on a number of variables such as the size of the dataset, link density, and spam proximity. In the comparison with PageRank, SourceRank reduced the rank inflation caused by spam from 80 percent to just 4 percent.
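The three ideas above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' implementation: the function names (`source_edges`, `throttle`, `rank`), the choice to weight an edge by the count of unique linking pages, the rule that a throttled source keeps at least its throttling value on its self-edge and rescales its other edges, and the damping value of 0.85 are all assumptions made for illustration.

```python
from collections import defaultdict


def source_edges(page_links, page_to_source):
    """Collapse page-level links into normalized source-level edge weights.

    Assumed consensus rule: the raw weight of edge s -> t is the number of
    unique pages in s that link to any page in t.
    """
    voters = defaultdict(set)
    for src_page, dst_page in page_links:
        s, t = page_to_source[src_page], page_to_source[dst_page]
        voters[(s, t)].add(src_page)

    weights = {s: {} for s in set(page_to_source.values())}
    for (s, t), pages in voters.items():
        weights[s][t] = float(len(pages))

    for s, row in weights.items():
        total = sum(row.values())
        if total == 0:
            # A source with no outgoing links keeps its influence on a self-edge.
            weights[s] = {s: 1.0}
        else:
            weights[s] = {t: w / total for t, w in row.items()}
    return weights


def throttle(weights, kappa):
    """Force each source's self-edge weight to be at least kappa[source].

    kappa maps a source to a value in [0, 1]; missing sources are unthrottled.
    The remaining outgoing weight is rescaled so each row still sums to 1.
    """
    throttled = {}
    for s, row in weights.items():
        k = kappa.get(s, 0.0)
        self_weight = row.get(s, 0.0)
        if self_weight >= k:
            throttled[s] = dict(row)
            continue
        scale = (1.0 - k) / (1.0 - self_weight)
        new_row = {t: w * scale for t, w in row.items() if t != s}
        new_row[s] = k
        throttled[s] = new_row
    return throttled


def rank(weights, damping=0.85, iterations=100):
    """PageRank-style power iteration over the source graph."""
    sources = list(weights)
    n = len(sources)
    score = {s: 1.0 / n for s in sources}
    for _ in range(iterations):
        nxt = {s: (1.0 - damping) / n for s in sources}
        for s, row in weights.items():
            for t, w in row.items():
                nxt[t] += damping * score[s] * w
        score = nxt
    return score
```

In this sketch, calling `throttle` with a high value for a suspected spam source leaves most of that source's score circulating on its own self-edge, so less of it reaches the sources it links to. Pages that belong to the same source never produce more than one source-level edge toward a given target, no matter how many of them there are.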

This builds directly on the search engine material presented in class. When exploring PageRank in homework problems, we were asked to act, to some extent, as spammers and manipulate a small set of pages in our favor. If SourceRank were applied to our solution to that homework problem, our group of colluding sites would be represented as a single source, effectively canceling the benefit of having a page pool. This is a simple demonstration of the continuing struggle between offensive web techniques (spamming) and defensive countermeasure development.

