Rocchio Algorithm vs. PageRank

https://nlp.stanford.edu/IR-book/html/htmledition/relevance-feedback-and-pseudo-relevance-feedback-1.html

The link above leads to an overview of the Rocchio relevance feedback algorithm, which I was immediately reminded of when we delved into PageRank. Both deal with processing and updating information. PageRank came out of Stanford, and the linked overview is hosted there too, although Rocchio itself dates back to the SMART retrieval system of around 1970. Where PageRank scores the pages in a search engine's index by importance, the Rocchio algorithm takes user feedback on an initial set of results and revises the query so that the next round of results fits better. I took CS4300 last semester, where we learned that Rocchio is a powerful tool to layer on top of an existing similarity score to give more tailored results.
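To make the feedback step concrete, here is a minimal sketch of a Rocchio update in plain Python. It assumes the query and documents are already represented as equal-length lists of term weights (for example tf-idf), which is my assumption rather than anything the course prescribed. The function name `rocchio` and the default weights `alpha=1.0`, `beta=0.75`, `gamma=0.15` are illustrative; the linked overview suggests values in this range as reasonable, but they are tunable.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return alpha*query + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    All vectors are assumed to be lists of floats of the same length.
    """
    def centroid(docs):
        # Mean vector of the documents; a zero vector if no feedback was given.
        if not docs:
            return [0.0] * len(query)
        return [sum(column) / len(docs) for column in zip(*docs)]

    rel_centroid = centroid(relevant)
    non_centroid = centroid(nonrelevant)

    updated = [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_centroid, non_centroid)
    ]
    # Negative term weights are commonly clipped to zero.
    return [max(0.0, weight) for weight in updated]


# Toy vocabulary of three terms; the user marked two results relevant, one not.
query = [1.0, 0.0, 0.0]
relevant_docs = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
nonrelevant_docs = [[0.0, 0.0, 1.0]]

updated_query = rocchio(query, relevant_docs, nonrelevant_docs)
print(updated_query)  # [1.375, 0.75, 0.0]
```

The second term, which the original query did not contain at all, picks up weight because both relevant documents use it. The third term would go negative and is clipped to zero. The updated query is then fed back into whatever similarity score the system already uses.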

Both algorithms work in a similar spirit. In PageRank, webpages are assigned scores based on their importance relative to one another, as measured by the links between them, and those scores mark their importance as search results. For Rocchio, instead of inherent values dictating which search results carry more weight, the user assigns the importance by labelling each result as either relevant or irrelevant. On the next search, the system retrieves documents using a query that has been shifted toward the relevant results and away from the irrelevant ones, so some potential results are weighted higher and some lower (just as in the PageRank algorithm). What I find interesting is that even though the algorithm represents the bare-bones idea of tailoring information based on user input, this idea has now become inextricably linked with search engines. Searches today are often based heavily on personalized information, including location and previous searches. It's cool to see one of the algorithms that underpins a concept so ubiquitous and ingrained in our daily lives. Both algorithms harness the staggering amount of information available on the web or in a database and make it easier to return useful results by promoting some documents and demoting others. PageRank does this using the importance of each page relative to the others, and Rocchio does it using input from the user.
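For comparison, here is a rough sketch of the PageRank side as a simple power iteration. It is only an illustration under a few assumptions of mine: the graph is a small dictionary mapping each page to the pages it links to, every linked page also appears as a key, the damping factor is the commonly quoted 0.85, and pages with no outgoing links spread their score evenly. The function name `pagerank` and the toy graph are made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores for a dict of page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform score

    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy web of three pages: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph C ends up with the highest score because both other pages link to it, and B the lowest because it only receives half of A's vote. No user is involved anywhere: the scores come entirely from the link structure, which is the contrast with Rocchio.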
