
Google Image Search: using PageRank

We are very familiar with the PageRank algorithm that Google Search uses to rank web pages in its search results. What is less well known is that Google has also tried to apply the idea behind PageRank to the next generation of image search.

Most search engines, such as Yahoo and Bing, offer image search. However, they simply index the text associated with an image to determine what the picture shows. This approach has clear drawbacks: it relies heavily on the text, and the text can be misleading. We have no way of ensuring that every image is correctly described by the text of the web page in which it is embedded. At the same time, there is no efficient way to identify the content of a picture directly. So how should we approach the problem?

Google has proposed a solution that looks beyond the text: ranking images by their similarity to one another, based on the visual characteristics of popular images. Just as web pages link to each other, images have “visual-hyperlinks”, and these links are used to build an image search graph. The assumption is that if a user is viewing an image, other similar images are likely to be of interest as well, and this allows the PageRank algorithm to be applied. As with web pages, images that are visited often are deemed more important.
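To make the idea of “visual-hyperlinks” concrete, here is a minimal sketch in Python. It assumes each image has already been reduced to a small numeric feature vector and that two images are linked when their cosine similarity passes a threshold. The feature values, the image names, the `visual_hyperlinks` function and the 0.8 threshold are all made up for illustration; the paper may measure similarity quite differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def visual_hyperlinks(features, threshold=0.8):
    """Link every pair of images whose similarity is at least `threshold`.

    `features` maps an image name to its feature vector. The result maps
    each image to a dict of {neighbour: similarity}. Links are symmetric.
    """
    names = sorted(features)
    links = {name: {} for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = cosine_similarity(features[a], features[b])
            if sim >= threshold:
                links[a][b] = sim
                links[b][a] = sim
    return links


# Toy, hand-written feature vectors (assumption: not real image data).
features = {
    "cat_1": [0.9, 0.1, 0.0],
    "cat_2": [0.8, 0.2, 0.1],
    "dog_1": [0.1, 0.9, 0.2],
}
links = visual_hyperlinks(features)
print(links)
```

In this toy data the two cat images end up linked to each other, while the dog image has no link at all, which is the image counterpart of a page that nobody links to.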

This is related to the PageRank algorithm taught in class. A random walk over pages gives a probability distribution describing how likely each page is to be visited, so pages with more links pointing to them tend to have higher importance. Likewise, images that are similar to many other images are of higher importance. However, there may be questions about how closely the two really relate.
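The random walk can be sketched as a short power iteration over a weighted similarity graph. This is only an illustration of the PageRank idea applied to images, not the authors' implementation: the `rank_images` function, the damping factor of 0.85, the number of iterations and the similarity weights below are all assumptions chosen for the example.

```python
def rank_images(similarity, damping=0.85, iterations=100):
    """Approximate the stationary distribution of a random walk on images.

    `similarity` maps each image to {neighbour: weight}. At each step the
    walker follows a link with probability `damping`, choosing a neighbour
    in proportion to its similarity weight, and otherwise jumps to a
    uniformly random image.
    """
    names = sorted(similarity)
    n = len(names)
    rank = {name: 1.0 / n for name in names}
    for _ in range(iterations):
        # Every image receives the random-jump share first.
        new_rank = {name: (1.0 - damping) / n for name in names}
        for src in names:
            total = sum(similarity[src].values())
            if total == 0:
                # An image with no links spreads its rank uniformly.
                for dst in names:
                    new_rank[dst] += damping * rank[src] / n
                continue
            for dst, weight in similarity[src].items():
                new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


# Hypothetical similarity weights between four images.
similarity = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.8, "b": 0.7, "d": 0.2},
    "d": {"c": 0.2},
}
ranks = rank_images(similarity)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(name, round(ranks[name], 3))
```

Image `d` is only weakly similar to one other image, so it ends up with the lowest score, while the three images that strongly resemble each other share most of the probability.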

No explicit links exist in the image search graph, but the paper attached to the article describes a method for inferring similarities. Replacing user-created hyperlinks with automatically inferred “visual-hyperlinks” is open to question, since it deviates from the PageRank algorithm, in which the large number of manually created links is an important factor. Fortunately, the authors have taken this into consideration. They describe two ways of recapturing a significant amount of human-coded information. One is to make the approach query dependent: selecting an initial set of image results returned by the search, where the linking of relevant images to web pages comes from the current Google ranking system. The other is to rely on the intelligence of crowds: generating the image search graph from common features between images.
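The two ideas can be combined in a rough sketch: start from the images an existing text-based search returned for a query, then connect only those candidates by the number of features they share. Everything here is hypothetical. The feature identifiers (`f1`, `f2`, ...) stand in for whatever visual features the paper extracts, and `shared_feature_graph` is a name invented for this example.

```python
def shared_feature_graph(candidates, local_features):
    """Build a graph over the query's candidate images only.

    `candidates` is the initial result list for one query, and
    `local_features` maps each image to a set of feature identifiers.
    The edge weight is the number of features two images have in common.
    """
    graph = {image: {} for image in candidates}
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            common = len(local_features[a] & local_features[b])
            if common > 0:
                graph[a][b] = common
                graph[b][a] = common
    return graph


# Hypothetical feature sets; img_4 exists but was not returned for the query.
local_features = {
    "img_1": {"f1", "f2", "f3"},
    "img_2": {"f2", "f3", "f4"},
    "img_3": {"f3", "f9"},
    "img_4": {"f7"},
}
candidates = ["img_1", "img_2", "img_3"]
graph = shared_feature_graph(candidates, local_features)
print(graph)
```

Restricting the graph to the candidate set is what makes the sketch query dependent: images the existing ranking did not return for the query never enter the graph, so the human-made links behind that ranking still shape the result.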

In conclusion, we see the proposed approach as an advanced image search mechanism that utilizes the wisdom behind links and network analysis for web-document search.



