Google News Personalization: The Challenges of Scale and Item Churn
When you are looking for something but don't even know what it is, who can give you the answer? Search engines like Google invest enormous amounts of money in search engine optimization, or SEO. For users like us, that is good news, because it means we get recommendations that interest us more while surfing the web.
Many past and ongoing studies of SEO try to find new and better ways to produce search recommendations and personalization. One such study, designed and carried out by Google, presents a method that combines several different algorithms. Their goal is to create a scalable online collaborative filtering system, one that draws on the click behavior of many users at once to decide what to recommend to each of them, hence "collaborative" filtering. Their study is set apart from others by two things: immense real-life data sets, on the order of millions of click-throughs from millions of users, and a very fast item churn rate, since news stories appear and go stale quickly. These two factors are also the main challenges they face in building the system.
To give a crude summary of their system, they use a mix of model-based and memory-based algorithms to generate recommendations: PLSI and MinHash clustering for the former, and item covisitation for the latter.
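To make two of these building blocks more concrete, here is a minimal Python sketch of MinHash clustering and covisitation counting. It is an illustration under stated assumptions, not the system from the study: the function names, the seeded MD5 hash standing in for a random permutation of stories, and the toy click data are all hypothetical, and the covisitation count here ignores any time window or decay.

```python
import hashlib
from collections import defaultdict
from itertools import permutations


def min_hash(items, seed):
    """Smallest hash value among a user's clicked stories for one seeded hash.

    The seeded hash is a stand-in (assumption) for a random permutation
    of the story catalogue.
    """
    return min(
        hashlib.md5(f"{seed}:{item}".encode()).hexdigest()
        for item in items
    )


def cluster_id(items, num_hashes=2):
    """Concatenate several min-hashes into one cluster key.

    Users with heavily overlapping click histories are likely to share a key.
    """
    return tuple(min_hash(items, seed) for seed in range(num_hashes))


def covisitation_counts(histories):
    """Count how often two stories occur in the same user's click history."""
    counts = defaultdict(int)
    for items in histories.values():
        for a, b in permutations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical click histories: user -> set of clicked story ids.
clicks = {
    "alice": {"s1", "s2", "s3"},
    "bob": {"s1", "s2", "s3"},
    "carol": {"s7", "s8"},
}

same_cluster = cluster_id(clicks["alice"]) == cluster_id(clicks["bob"])
covisits = covisitation_counts(clicks)
```

In this toy data, alice and bob have identical histories and therefore land in the same cluster, while the pair (s1, s2) is covisited by two users. Using more hash functions per key makes clusters stricter; using fewer makes them looser.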
Statistics and other information are passed back and forth among multiple system components.
Using this information, each algorithm assigns a score to every candidate story.
For the clustering approaches, the score is proportional to the number of clicks the story has received from users in the same clusters as the target user. The scores from the different algorithms are then combined to form a ranked list of stories, and the highest-ranked stories are recommended to the user.
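The scoring and ranking step can be sketched roughly as follows. This is a simplified illustration, assuming that the cluster score is a membership-weighted sum of clicks and that the per-algorithm scores are combined linearly; the function names, weights, and toy numbers are hypothetical and are not taken from the study.

```python
def cluster_score(user, story, membership, cluster_clicks):
    """Membership-weighted sum of the clicks a story received in the user's clusters."""
    return sum(
        weight * cluster_clicks.get(cluster, {}).get(story, 0)
        for cluster, weight in membership.get(user, {}).items()
    )


def rank_stories(scores_by_algorithm, weights):
    """Combine per-algorithm scores linearly and return stories, best first."""
    combined = {}
    for algorithm, scores in scores_by_algorithm.items():
        for story, score in scores.items():
            combined[story] = combined.get(story, 0.0) + weights[algorithm] * score
    # Sort by descending score; break ties by story id for a stable order.
    return sorted(combined, key=lambda s: (-combined[s], s))


# Hypothetical data: fractional cluster membership and per-cluster click counts.
membership = {"alice": {"c1": 0.75, "c2": 0.25}}
cluster_clicks = {"c1": {"s1": 4, "s2": 0}, "c2": {"s1": 0, "s2": 8}}

scores = {
    "cluster": {
        s: cluster_score("alice", s, membership, cluster_clicks)
        for s in ("s1", "s2")
    },
    # Assumed covisitation scores, made up for this example.
    "covisit": {"s1": 1.0, "s2": 4.0},
}
ranking = rank_stories(scores, weights={"cluster": 1.0, "covisit": 0.5})
```

Here story s1 wins on the cluster score (3.0 versus 2.0), but the covisitation score pushes s2 ahead once the two are combined, so the ranking is s2 followed by s1. The weights control how much each algorithm contributes and would need to be tuned on real traffic.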
The link-analysis models we were exposed to in lecture, such as PageRank and hub and authority scores, are very basic models that take only the link structure into account. Even at this level, things get much more complicated once a real-life data set with millions of nodes comes into the picture. Yet people take for granted the brain-power that goes into the top-notch technology, bordering on artificial intelligence, that serves them while they surf the web.
It is a true challenge to keep up with the dynamics of the internet, and without the guiding hand of SEO, people would be lost in the flood of information online.
Source:
http://delivery.acm.org/10.1145/1250000/1242610/p271-das.pdf?ip=128.84.124.70&acc=ACTIVE%20SERVICE&CFID=188767310&CFTOKEN=90557010&__acm__=1351614362_a9adc22ad0a7898098a6ef7c854addaf