I have more questions than I have answers. One of the topics that I know very little about, and on which I often seek clarification and wisdom, is A/B testing in the context of rapid-iteration, rapid-deployment online systems. So I’d like to ask a question of my readership (all four of you [...]
What sort of information retrieval system would you build if you knew that all the users of your system would be expert or highly-motivated amateur searchers? What sort of system would you build when you have a very large collection of unstructured information, and the goal in searching that information is not to find one document (e.g. navigate to a home page), but to find (a) relationships between documents, or (b) large sets of documents that all pertain to a single topic? How would your algorithms be different? How would your interfaces be different? How would the process itself (that middle layer in between algorithms and interfaces) be different?
Prompted by Daniel Tunkelang’s recent post, I think that government information might be a perfect domain in which to ask (and answer) these sorts of questions. The U.S. Open Government Initiative has as its goal the release of loads of raw government data for use by any individual or organization. How are people going to use this data? What types of questions will they ask? What types of questions could they ask, if given the proper tools (i.e. what might they not know that they want to ask, until it becomes possible)?
Two types of information retrieval might be perfect for this domain: Exploratory Search and (Explicitly) Collaborative Search. Continue reading…
Via Xavier Amatriain: The Dirty Little Secret About the “Wisdom of the Crowds” – There is No Crowd:
This is hardly the first time that the so-called “wisdom of the crowds” has been called into question. The term, which implies that a diverse collection of individuals makes more accurate decisions and predications than individuals or even experts, has been used in the past to describe how everything from Wikipedia to user-generated news sites like Digg.com offer better services than anything created by a smaller group could do.
Of course, we now know that simply isn’t true. For one thing, Wikipedia isn’t written and edited by the “crowd” at all. In fact, 1% of Wikipedia users are responsible for half of the site’s edits. Even Wikipedia’s founder, Jimmy Wales, has been quoted as saying that the site is really written by a community, “a dedicated group of a few hundred volunteers.”
<snip>
Still, there [has] yet to be a perfect solution to the problem. Perhaps it’s time we give up the idea that the “wisdom of the crowds” was ever a driving force behind any socialized, user-generated anything and realize that, just like in life, there will always be active participants as well as the passive passerbys.
I have never quite liked the notion of “wisdom of crowds”, and the hype behind it even less, so I’m glad to see signs that the hype cycle is finally starting to wind down. However, by having to confront exactly what it was that I didn’t like about the notion, I was intellectually forced to propose an alternative: Explicit Collaboration in Search. As I wrote half a year ago: Continue reading…
…via Ask and SearchMe, that is? Let me explain. Google announced a new bit of interface design into its News search results today: Fast Flip:
Google Fast Flip is a web application that lets users…”flip” through pages online as quickly as flipping through a magazine…We capture images of the articles on our partners’ websites and then display them in an easy-to-read way…Readers can flip through stories quickly by simply pressing the left- and right-arrow keys until they find one that catches their interest. Clicking on the story takes them directly to the publisher’s website.
Funny, it reminds me a lot of Searchme.com (see this writeup by Danny Sullivan) from 2008, which itself was largely a continuation of Ask’s visual previews (binoculars) from 2006. Funny thing is, visual search interfaces such as these have been pretty universally panned for quite some time now. And panned by Google as well, if I remember correctly — I’m fairly sure I read something fairly official about it, though darned if I can find that post because Google’s search doesn’t allow “sort by least recent” relevant results, only “sort by most recent”. Personally, I love interfaces like this and find them much easier to deal with. But Google disagrees, and has (presumably) done all sorts of A/B testing to conclude that users don’t want to see their search results visually. Because otherwise they would have rolled out these changes years ago, at the same time as, if not ahead of, Ask and SearchMe. Right?
Or are Bing’s innovations in the interface domain finally spurring Google on, finally providing the competition to improve search that A/B testing cannot? Continue reading…
Half a year ago I wrote a blog post about an easy change that Google could make to its interface, one that would sacrifice only the least bit of simplicity while enticing and encouraging the user to enter longer queries, thus improving retrieval effectiveness. In particular, I wrote:
So even though research has found that longer queries lead to more satisfied users, and that larger query input boxes lead to longer queries, Google is unable to take an evolutionary step in that direction. That step violates their current locally-maximum hill principle of simplicity. They seem fundamentally incapable of passing through the valley of complexity to reach an even higher effectiveness peak because evolutionary thinking does not allow them to take that large leap necessary. They can only follow their current gradient. In ten years of using Google, I don’t think that I have ever seen, even for brief experimental time periods, a query input box that was taller than one line. Thus, evolutionary thinking conflicts with long-term goals.
Well, it’s time for me to eat those words. For today, the search box grew in size. From the official Google blog: Continue reading…
A few days ago I posted a question about why modern web retrieval systems offer no explicit relevance feedback mechanisms. I wonder if it has anything to do with the following attitude, explained by one of my favorite bloggers, Nick Carr:
The problem with the Web, as I see it, is that it imposes, with its imperialistic iron fist, the “ecstatic surfing” behavior on everything and to the exclusion of other modes of experience (not just for how we listen to music, but for how we interact with all media once they’ve been digitized). In the pre-Web world, we not only enjoyed the thrill of the overnight sensation – the 45 that became the center of your waking hours for a week only to be replaced by the new song – but also the deeper thrill of the favorite band in whose work we deeply immersed ourselves, often following its progression over many records and many years. Continue reading…
As a researcher, I have more questions than answers. And one of the questions that I have concerns the widely-accepted maxim that users are too lazy to give explicit relevance feedback to the search engine. See Danny Sullivan’s take, here.
Perhaps I am stuck back in a view of Information Retrieval that is 10-15 years old, but I tend to find my views heavily shaped or influenced by things like the following bit from Marti Hearst’s chapter in Modern Information Retrieval:
An important part of the information access process is query reformulation, and a proven effective technique for query reformulation is relevance feedback. In its original form, relevance feedback refers to an interaction cycle in which the user selects a small set of documents that appear to be relevant to the query, and the system then uses features derived from these selected relevant documents to revise the original query. This revised query is then executed and a new set of documents is returned. Documents from the original set can appear in the new results list, although they are likely to appear in a different rank order. Relevance feedback in its original form has been shown to be an effective mechanism for improving retrieval results in a variety of studies and settings [salton90a][harman92c][buckley94b]. In recent years the scope of ideas that can be classified under this term has widened greatly.
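To make that interaction cycle concrete, here is a minimal sketch of one round of explicit relevance feedback. Hearst's passage does not commit to a particular formula, so the Rocchio-style update below is my own choice of illustration, and the weights (`alpha`, `beta`, `gamma`), the function names, and the toy documents are all assumptions rather than anything taken from the chapter.

```python
from collections import Counter

def tf(text):
    """Toy document/query representation: raw term frequencies."""
    return Counter(text.lower().split())

def score(query_vec, doc_vec):
    """Dot product between a weighted query and a term-frequency vector."""
    return sum(w * doc_vec.get(t, 0) for t, w in query_vec.items())

def search(query_vec, docs):
    """Return document indices ranked by descending score (ties keep input order)."""
    return sorted(range(len(docs)),
                  key=lambda i: score(query_vec, docs[i]),
                  reverse=True)

def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Revise the query using documents the user judged (Rocchio-style).

    alpha, beta, gamma are illustrative values, not tuned ones.
    """
    revised = Counter()
    for term, weight in query.items():
        revised[term] += alpha * weight
    # Move toward the centroid of relevant docs, away from non-relevant ones.
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                revised[term] += coeff * weight / len(docs)
    # Keep only terms that still carry positive weight.
    return {t: w for t, w in revised.items() if w > 0}

# Hypothetical four-document collection and an ambiguous one-word query.
docs = [
    tf("jaguar car engine speed"),
    tf("jaguar cat jungle habitat"),
    tf("cat jungle predator habitat"),
    tf("car engine repair"),
]
query = tf("jaguar")

first = search(query, docs)            # initial ranking
revised = rocchio(query, [docs[1]])    # user marks document 1 as relevant
second = search(revised, docs)         # ranking after one feedback round
```

In this toy run, a single explicit judgment is enough to pull the other animal-related document above the car-related one, even though it never mentions the original query term. That is the reordering effect the passage describes.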
Given that explicit relevance feedback works, why is it essentially non-existent on the web? A bird in the hand (an explicit relevance judgment) is worth two in the bush (two implied or inferred relevance judgments). Continue reading…
On my drive to work this morning, as I mentally began preparing for all the research I wanted to accomplish today, I started thinking about the relationship between information retrieval, machine learning, probability, and statistics. And I found myself wondering how most of us think about machine learning when we use it as a [...]
A number of people have already written about the Sue Dumais “Salton Award” talk at SIGIR. I encourage you to read their posts, and in particular pay attention to the emphasis that she put on her work at the intersection of HCI and IR. I see this area as only continuing to grow over [...]
I just finished reading a thought-provoking post from Anil Dash, about how Google’s recent Chrome OS announcement signifies an important moment:
This is, for lack of a better term, Google’s “Microsoft Moment”. This is the point when the difference between their internal conception of the company starts to diverge just a bit too far from the public perception of the company, and even starts to diverge from reality. At this inflection point, the reasons for doing new things at Google start to change.
Dash gives a number of explanations for why he believes this moment has arrived. The first observation that struck me was about Google’s attitude toward self-promotion. For its entire company history, Google has proudly and vocally called attention to the fact that it does not advertise its own services; its products speak for themselves and are spread by word-of-mouth and by reputation alone. That is the self-declared “Googly” way. This was not just early-days rhetoric, spoken only when the company was young. As recently as last year’s SIGIR 2008 conference in Singapore, Googler Kai-Fu Lee explicitly stated during his keynote speech that Google does not self-advertise. But this, Dash says, is changing. Now there are slick television ads for Chrome. There are highly promoted developer conferences for Android. And just two days before Kai-Fu Lee gave his SIGIR talk, categorically declaring that Google never self-advertises, I was at a San Francisco Giants game and saw a large LED banner advertisement for Google Transit. This change has an effect on the public perception of the company. Dash writes:
This would be okay, except that I doubt Google’s internal self-image as an organization has changed to reflect this new reality. “We’re not like some giant company with flashy TV ads — we’re just a bunch of geeks in Mountain View!” And while that might be true for the vast number of engineers who define the company’s internal culture, the external impression of Google being just another tech titan like Microsoft will gain footing, making the audience for Google’s messages less tolerant of ambiguity and less forgiving of mistakes…Google has made commendable steps towards communicating with those outside of its sphere of influence in the tech world. But the messages will be incomplete or insufficient as long as Google doesn’t truly internalize and accept that its public perception is about to change radically. The era of Google as a trusted, “non-evil” startup whose actions are automatically assumed to be benevolent is over.
Now, you might ask: What does all this have to do with this Information Retrieval blog? Continue reading…