Comments on: Google Similar Images: Only 20%?!

By: dinesh vadhia

dinesh vadhia — Tue, 28 Apr 2009 11:43:42 +0000

As it is still in Google Labs, it is probably still a 20% project. I suspect that image (or content) search scalability is being worked on by most of the mainstream search engines as well as IBM. Let’s use images as the example: extracting features (also called the image signature) from images is not a problem these days, though processing the images to extract those features does require considerable resources and time.

Next, what search model (i.e. algorithm) is going to be used to deliver the search results? From this follows the scalability question.
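
To make the two steps concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not how Google or anyone else does it: the "signature" is a toy colour histogram, the "search model" is a plain brute-force comparison, and the names `signature`, `distance`, `search` and the tiny sample images are all made up for the example.

```python
import math

def signature(pixels, bins=4):
    """Toy image signature: a normalized histogram over quantized RGB values.

    `pixels` is a list of (r, g, b) tuples with channels in 0..255.
    This is an assumed stand-in for a real feature extractor.
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        # Clamp so the bucket index stays valid when bins does not divide 256.
        ri = min(r // step, bins - 1)
        gi = min(g // step, bins - 1)
        bi = min(b // step, bins - 1)
        hist[ri * bins * bins + gi * bins + bi] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def distance(a, b):
    """Euclidean distance between two signatures of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(query_sig, index, k=1):
    """Brute-force search: compares the query against every stored signature."""
    ranked = sorted(index.items(), key=lambda item: distance(query_sig, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical three-image "collection", each image a flat list of pixels.
red = [(250, 10, 10)] * 16
dark_red = [(200, 20, 20)] * 16
blue = [(10, 10, 250)] * 16

index = {"red": signature(red), "blue": signature(blue)}
print(search(signature(dark_red), index))
```

Note that `search` touches every stored signature for every query, so the cost grows linearly with the size of the collection. That is fine for a handful of images and hopeless at web scale, which is where the choice of search model comes in.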

This 2008 paper provides an almost up-to-date survey of the state of the art in image search:

Image Retrieval: Ideas, Influences, and Trends of the New Age – http://infolab.stanford.edu/~wangz/project/imsearch/review/JOUR/datta.pdf

Btw, did you have a look at our beta image search at http://www.xyggy.com/image.php ?

Dinesh

By: jeremy

jeremy — Mon, 27 Apr 2009 18:12:06 +0000

I agree that scalability makes the problem difficult. But that still doesn’t address why the project languished as only a 20% project for however many years, rather than being promoted to a full 80% project. If anything, one would think that scalability is one of those core competencies where Google would rise to the challenge; i.e. Google’s competitive advantage is scalability, and so they’d want to apply 80% effort to it, rather than 20% effort.

By: dinesh vadhia

dinesh vadhia — Sun, 26 Apr 2009 16:34:09 +0000

“… why was this not somebody’s 80% project, rather than a 20% time project?”

I’ll hazard a guess – scalability.

Last year, Google estimated that there were approximately one trillion images on the web.

Using image labels and associated data around the image on a web page gets you only so far.

Content-based image retrieval on the web poses a different set of problems than text-based search, of which scalability is the most significant.