Google Similar Images: Only 20%?!

A few days ago, Google launched “similar image search” functionality.  From TechCrunch:

A new 20% time Google project has just launched called Google Similar Images. It’s pretty self-explanatory — when you search for an image and find one close to what you’re looking for, Google can now find ones that it believes to be the same, or similar.

Much has been written and discussed about this, around the web.  I am not going to add my reviews or my opinions about the service itself, right now.  Rather, I have one overarching, confounding question that I would like answered: Why was this only a 20% time project?

Search, information retrieval, and information organization are all key to Google’s primary mission statement.  Image search via similarity is a well-known, long-studied problem. Why is it only now that such solutions are being offered by the Web giant, and not years ago?  More importantly, why was this not somebody’s 80% project, rather than a 20% time project?  One would think that such a mission-core piece of functionality would be something that Google would pursue 4 of 5 days a week, not only 1 of 5 days a week.

In related work, Microsoft’s web image search engine has facets.  Innovation all around.


3 Responses to Google Similar Images: Only 20%?!

  1. “… why was this not somebody’s 80% project, rather than a 20% time project?”

    I’ll hazard a guess – scalability.

    Last year, Google estimated that the web contained roughly one trillion unique URLs, which gives a sense of how many images are out there.

    Using image labels and associated data around the image on a web page gets you only so far.

    Content-based image retrieval on the web poses a different set of problems than text-based search, of which scalability is the most significant (a back-of-the-envelope sketch follows the comments below).

  2. jeremy says:

    I agree that scalability makes the problem difficult. But that still doesn’t address why the project languished as a 20% project for however many years, rather than being promoted to a full 80% project. If anything, one would think that scalability is exactly the sort of core competency where Google would rise to the challenge; i.e., Google’s competitive advantage is scalability, so they’d want to apply 80% effort to it rather than 20% effort.

  3. As it is still from Google Labs, it is probably still a 20% project. I suspect that image (or content) search scalability is being worked on by most of the mainstream search engines, as well as IBM. Let’s use images as the example: extracting features (also called the image signature) from images is not a problem these days, though it does require considerable resources and time to process the images.

    Next comes the question of which search model (i.e., algorithm) will be used to deliver the search results; from this follows the scalability question. (A toy sketch of both steps follows the comments below.)

    This 2008 paper provides an almost up-to-date survey of the state of the art in image search:

    Image Retrieval: Ideas, Influences, and Trends of the New Age – http://infolab.stanford.edu/~wangz/project/imsearch/review/JOUR/datta.pdf

    Btw, did you have a look at our beta image search at http://www.xyggy.com/image.php?

    Dinesh
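
To make Dinesh’s first step concrete, here is a minimal sketch of signature extraction: a plain global color histogram built with NumPy and Pillow. This is an illustrative toy, not a claim about how Google Similar Images actually computes its features; production systems use far richer descriptors.

    # Toy image "signature": a global color histogram. Illustrative only;
    # production CBIR systems use much richer features than this.
    import numpy as np
    from PIL import Image

    def color_histogram_signature(path: str, bins_per_channel: int = 8) -> np.ndarray:
        """Extract a fixed-length (bins_per_channel**3) color-histogram signature."""
        img = Image.open(path).convert("RGB").resize((128, 128))
        pixels = np.asarray(img).reshape(-1, 3).astype(np.int64)
        # Quantize each RGB channel into coarse bins, then form a joint bin id.
        q = pixels // (256 // bins_per_channel)
        bin_ids = (q[:, 0] * bins_per_channel + q[:, 1]) * bins_per_channel + q[:, 2]
        hist = np.bincount(bin_ids, minlength=bins_per_channel ** 3).astype(np.float64)
        return hist / hist.sum()  # normalize so image dimensions don't matter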
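His second step, the search model, looks like this in its most naive form: a brute-force linear scan ranked by cosine similarity. The closing comment is where the scalability objection from the first response bites.

    # Naive search model: linear scan over all stored signatures, ranked by
    # cosine similarity. Fine for thousands of images, hopeless at web scale.
    import numpy as np

    def most_similar(query_sig: np.ndarray, index_sigs: np.ndarray, k: int = 10) -> np.ndarray:
        """Return the row indices of the k signatures most similar to the query."""
        q = query_sig / np.linalg.norm(query_sig)
        X = index_sigs / np.linalg.norm(index_sigs, axis=1, keepdims=True)
        scores = X @ q  # O(N * d) work per query
        return np.argsort(scores)[::-1][:k]

    # Back of the envelope: 10**12 images x 512-dim float32 signatures is
    # roughly 2 petabytes of index, and every query touches all of it. That
    # is why web-scale CBIR leans on approximate methods (inverted files,
    # locality-sensitive hashing, partitioned indexes) rather than a scan.

Putting the two together, most_similar(color_histogram_signature("query.jpg"), index) would rank a corpus whose signatures are stacked row by row in the hypothetical array index.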
