As a researcher, I occasionally find it quite interesting to reread thoughts and positions that I’ve taken in years and works past. Sometimes I can observe a marked shift from my previous thinking: avenues or approaches that I once considered fruitful, I no longer do. And sometimes I can observe hints and seeds of my current research: avenues of which I had only a vague inkling have blossomed into larger pursuits.
In April of 2006 I had the good fortune to attend a Dagstuhl Seminar on Content-Based Multimedia Information Retrieval (I am toward the upper left corner of the seminar group photo). Ramesh Jain has a good write-up of what Dagstuhl Seminars are and how they work. In the abstract of my Seminar presentation I wrote:
The classic problem of ad hoc information retrieval involves a user with an information need, a representation or expression of that information need (the query), and a system or retrieval engine that compares the query against a collection of items in order to return the items most relevant to the user’s information need. Despite numerous and obvious exceptions, in general, text information retrieval exhibits a fairly high correlation between the syntax of a query as expressed in language and the semantics of the information need. Textual similarity is highly correlated with relevance. On the other hand, in content-based multimedia retrieval (images, video, music, 3D models), objects encompass multitudinous semantics in many different dimensions. In music, for example, there are properties of pitch, tempo, rhythm, timbre, singer characteristics, genre, instrumentation, year of production, and so on. The correlation between similarity and relevance is much lower. Two pieces of music might be similar because they both use similar instruments, timbres, tempos and singers, but they are not necessarily both relevant to my information need if I am looking for waltzes and one piece is in 3/4 while the other is in 4/4.
The current popular solution to this problem, characterized by buzzwords such as “collective intelligence”, “wisdom of crowds” and “Web 2.0”, is to bypass content altogether. By instead aggregating the media interactions (playlists, tags, click behavior, etc.) of massive numbers of people, the collective intelligence approach hopes to determine relevance directly, without the need for content-based methods. If people are not only the ultimate consumers but also the ultimate producers of relevance, why waste any effort on a problem as difficult as content-based retrieval? In our presentation we reject this notion of complete reliance on collective intelligence methods and argue that content-based methods are necessary. Aggregate crowd relevance information may be able to tell us what should be retrieved, but it still will not tell us why something was retrieved. For that, we still need to rely on the explanatory power of content. Therefore, we propose the “cognitive disclosure” paradigm, in which semantic representations are chosen a priori by the designers of a content retrieval system, e.g. the content features necessary to call a piece of music a “waltz” or to call an image a “landscape”. These semantic categories are then revealed to users at retrieval time, allowing them to select more intelligently the types of information that are relevant to them. This problem is still very difficult, and there are no easy solutions. However, our purpose is simply to explain why “wisdom of crowds” approaches will inevitably fall short, and why content-based methods will still be necessary.
At the time, I was having a visceral reaction against all the hype surrounding “collective intelligence” and “wisdom of the crowds” as a primary basis for doing information retrieval. I was not interested in collaboration as massive data crunching on top of an anonymous crowd, nor even on top of a known crowd of friends. I wasn’t interested in crowds of any kind. Rather, I thought (and still think) that there is a lot more value left to be extracted from content-based search methods.
However, these thoughts soon led me to a notion of collaboration in search quite different from “wisdom of crowds” methods. With Maribeth Back and Gene Golovchinsky, I envisioned collaboration as a sort of “musical jam session”, where a small set of searchers with a common goal got together and “played” their search “tunes” together over a content-based retrieval back end. The purpose of this jamming wasn’t to repeat each other’s notes (“people who play this note also play that note”) but to play melodies and basslines that were different, yet worked together toward the larger goal of creating a full “song”: a commonly constructed set of relevant information. To date, this notion of collaboration in search has proven quite fruitful, and it continues to be. There are all sorts of research “melodies” left to be played, all sorts of songs left to be sung, by all sorts of researchers. I continue to be excited about this notion of “search jamming”, and I look forward to the solutions that the community will continue to invent.