
Images. Images everywhere! But how to rank them?

We’ve seen that page rankings depend on the analysis of links in the network. However, we search for more than web pages on a given topic: image search is the natural extension of what people look for on the web. The question then arises: how do search engines rank images? The difficulty in doing so in a manner similar to PageRank stems from the fact that queries are presented as words and that images don’t have explicit links to other images.

One way to rank images is to make use of the environment in which they are presented. For example, say one searches for images of ‘laptops’. A search engine could reuse the rankings produced by an ordinary web search for the same query, simply pulling out the images present on those top pages to return as results.
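As a toy illustration of this piggy-backing approach (the names `Page` and `naive_image_search` are my own, not taken from any real engine), every image simply inherits the rank of the page it sits on:

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    score: float                      # rank score from the ordinary web search
    image_urls: list = field(default_factory=list)  # images embedded in the page

def naive_image_search(ranked_pages, k=10):
    """Return (image_url, score) pairs harvested from the top-ranked pages.

    Every image inherits the score of its host page, whether or not
    it actually depicts the query topic.
    """
    images = []
    for page in sorted(ranked_pages, key=lambda p: p.score, reverse=True):
        for img in page.image_urls:
            images.append((img, page.score))
    return images[:k]
```

Note how the sketch exposes the flaw discussed below: an accessory photo on a high-ranking laptop sales page scores exactly as well as the laptop photo itself.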

Naturally, this technique has a number of flaws. First, images on a highly ranked webpage may have absolutely nothing to do with the query itself. Suppose the search term throws up HP’s laptop sales page as one of the top results, but that page is full of pictures of accessories that can be bought alongside a laptop. A search engine would have to find a way to ignore those, which this primitive technique clearly cannot do. Second, images relevant to the query may not appear on any highly ranked web page at all.

Google’s VisualRank algorithm was first described in 2008 (see the link below). Not much has been said about it since, not least because of the amount of competition in the market. Clearly, Google’s engineers were aware of the issues described above and set about altering their ranking mechanism, as described in the linked article. Unfortunately, at that point much of the focus seems to have been on information gathered from the placement and structure of an image, which, while reasonable, did not do a sufficiently good job of ranking images by their content.

Because it has been 8 years since that paper was published, it seems increasingly likely that Google and other search engines use some form of artificial intelligence to optimize their image ranking algorithms. Google already uses an algorithm nicknamed RankBrain to help interpret queries, and it should, if it isn’t already, use computer vision to improve image search. Instead of depending only on the ‘links’ between images, which manifest themselves as similar images on the same page or structurally similar images on different websites, it makes more sense to first process and classify images by content before doing some form of ranking.
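The core idea behind VisualRank, as described in the 2008 paper, is to run a PageRank-style computation over a graph whose edges are visual similarities rather than hyperlinks: images that many other images resemble float to the top. A minimal sketch of that idea, assuming a precomputed similarity matrix (the function name and details here are illustrative, not Google’s actual implementation):

```python
def visual_rank(sim, d=0.85, iters=50):
    """PageRank-style power iteration over an image-similarity matrix.

    sim[i][j] is the visual similarity between images i and j
    (e.g. derived from matched local features). Each row is
    normalised so an image distributes its score among its
    look-alikes, exactly as a page distributes score to its links.
    """
    n = len(sim)
    # Row-normalise similarities into transition probabilities.
    p = []
    for row in sim:
        total = sum(row)
        p.append([x / total if total else 1.0 / n for x in row])
    # Standard damped power iteration, starting from a uniform score.
    rank = [1.0 / n] * n
    for _ in range(iters):
        rank = [(1 - d) / n + d * sum(rank[j] * p[j][i] for j in range(n))
                for i in range(n)]
    return rank
```

With three images where image 0 resembles both of the others, image 0 ends up with the highest score: it is the “canonical” view of the query, which is precisely the behaviour VisualRank is after.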

Of course, these techniques bring challenges of their own. Training a model to classify images adds a layer of complexity to image search that is absent from ordinary web search. Furthermore, classifying millions of images is computationally expensive and may not be completely feasible right now.

We already know that search engines can outperform human experts at ranking images thanks to the sheer volume of images available, but this holds for relatively straightforward searches (such as ‘Eiffel Tower’, ‘Cornell’, or ‘hamburgers’). Their performance on more ambiguous queries is weaker: a search like ‘apple transformation’ should ideally rank actual fruit transformations and/or the evolution of the Apple logo highest, yet it throws up a mix of relevant and irrelevant results. I eagerly await the day when search engines manage to combine different forms of artificial intelligence to effectively rank the different results people expect from the most ambiguous of queries.


How Do Images Get Ranked in Image Search?

