Speed Matters. So Does the Metric.

Via Greg Linden, I came across the following experimental result from Google as to the importance of quickly returning results to users.  The gist of the experiment is summed up in the abstract:

Experiments demonstrate that increasing web search latency 100 to 400 ms reduces the daily number of searches per user by 0.2% to 0.6%. Furthermore, users do fewer searches the longer they are exposed. For longer delays, the loss of searches persists for a time even after latency returns to previous levels.

Google therefore concludes that speed matters: it is of utmost importance to return results as fast as possible, or users will be less satisfied.  And less satisfied users, the metric assumes, mean fewer queries.

I am not as immediately convinced.  Sure, I have no doubt that the number of queries issued did drop as a result of latency increase.  But can we immediately conclude, from the information contained within this report, that users were less satisfied with their overall search processes?  The author writes: Continue reading

Posted in Information Retrieval Foundations | Leave a comment

Semantic Technology Search Panel

On Wednesday I attended the Executive Round Table on Semantic Search, at the 2009 Semantic Technology Conference.  Researchers from Ask, Hakia, Yahoo, Google, Powerset/Bing, and True Knowledge were on the panel.  In the next few days I hope to give a longer write-up of the session over on the FXPAL blog.  In the meantime I wanted to quickly point out one nugget, and one related Tweet.

The panel covered a large number of topics.  But it was inevitable that the moderator would turn to the Google panelist (Peter Norvig) and ask him what he thought about Bing; there has been too much buzz lately for that question not to be asked.  I was pleasantly surprised by his answer.  I won’t risk quoting him directly, only paraphrasing, and if I misrepresent anything, the mistakes are mine and not intentional.

[Paraphrase] Norvig’s first answer to the Bing question was to say that he likes the idea of innovation in the user interface.  He thinks that there is a lot of room for more such innovation, and for a lot of different reasons.  Historically, there has been too much emphasis on getting the ranking right, at the expense of all else.  Of course (he added) a quality ranking is something that you absolutely must have.  But for too long it has been the only thing that has been worked on, and that needs to change.  He thinks Bing has made some good steps, and that there are a lot more that can be made as well.

Wow!  This is not the Google that I’ve known for a decade, the Google that has actively shunned most forms of interactivity, feedback, and exploration other than spelling correction.  Continue reading

Posted in Exploratory Search | 7 Comments

Exploratory Food Search

I came across an interesting article today in the New Scientist on the topic of mass-scale food annotation.  The idea is that we can instrument our food, so that we know much more about its origin and manner of production:

WHERE does your food come from? A few years ago, most consumers were satisfied with a sticker showing the country of origin. But concerns about fair trade and the environment, as well as food safety, are now driving a wave of projects aimed at tracking food from farm to shopping basket. Though price is still the main factor determining the food that people buy, many are demanding to know more about its source. This is partly due to a series of recent food safety scandals, from major outbreaks of salmonella and E. coli to melamine showing up in baby formula and pet food. “The public want to know where their food and other products come from, how they are made, and whether they contain any ‘unhealthful’ contaminants,” says Dara O’Rourke, an environmental policy expert at the University of California, Berkeley. Ethical and environmental concerns figure prominently, too. In the US, for example, “a small but rapidly growing percentage of the population – perhaps 8 to 10 per cent – are deeply interested in these issues,” says food policy expert Marion Nestle of New York University. “Interest in where food comes from is part of a growing social movement.”

Most manufacturers already use barcodes or RFID chips to track their products. But with the help of cheap cellphone and internet access it is becoming possible to collate data from remote locations around the world and make it available to the people who are actually going to eat the food. In many cases manufacturers are alive to the notion that transparency about the source of their food is good for business. Sime Darby, a large palm oil supplier in Indonesia and Malaysia, is working with FoodReg, a firm based in Barcelona, Spain, that develops food-tracking software. The idea is to develop a system to prove to customers that its crops are not grown on land recently occupied by tropical rainforest. In remote regions where farmers don’t have access to computers, they can use cellphones to record onto FoodReg’s online database the time and place the crop was harvested. Tracking systems like this should also make it easy to calculate the distance that goods travel to reach stores, allowing consumers to estimate the greenhouse gas emissions racked up by the transport of their food. “The calculation of food miles and carbon footprint could be the killer application for traceability,” says Heiner Lehr of FoodReg. “The technology is there. If a big retailer puts itself behind this, it could happen very fast.”

Projects like this are interesting to me because I can imagine my future self making decisions about how and what I buy, based on the information I am able to obtain about my various choices.  In fact, it would be nice to walk into the grocery store with the information-seeking intent of finding a good source of protein (whether chicken or beef, or maybe even just beans) for the evening’s meal, and come out of the store with a product that not only fits my budget, but that I feel good about buying.  But in order to make this information useful to consumers, there has to be some sort of search or information retrieval layer built on top of the data.
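That retrieval layer could start as something as simple as structured filtering over per-product provenance records.  Here is a minimal sketch; the products, field names, and numbers are all invented for illustration, not drawn from any real tracking system:

```python
# Hypothetical sketch of a retrieval layer over food-provenance data.
# Every product, field, and number here is invented for illustration.
products = [
    {"name": "chicken breast", "protein_g": 31, "price": 6.50, "food_miles": 120, "fair_trade": True},
    {"name": "ground beef",    "protein_g": 26, "price": 5.00, "food_miles": 900, "fair_trade": False},
    {"name": "black beans",    "protein_g": 21, "price": 1.80, "food_miles": 300, "fair_trade": True},
]

def find_protein(products, max_price, max_miles):
    """Return protein sources within budget, sorted by lowest food miles."""
    matches = [p for p in products
               if p["protein_g"] >= 20
               and p["price"] <= max_price
               and p["food_miles"] <= max_miles]
    return sorted(matches, key=lambda p: p["food_miles"])

# "A good source of protein, within budget, that I feel good about buying":
for p in find_protein(products, max_price=7.00, max_miles=500):
    print(p["name"])
```

The point of the sketch is that once provenance fields exist on every record, the consumer’s fuzzy intent ("something I feel good about") becomes an ordinary filtered, ranked query.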

This is where the “I’m feeling lucky” model of simply trying to give the consumer an “answer” breaks down.  Continue reading

Posted in Exploratory Search, Social Implications | 3 Comments

200 Signals, Still Only One Route

Via Paul Lamere, I came across this recent Google blogpost on large scale graph computing.  I started reading, and quickly became excited by what I was hearing:

A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs.

“Yes!” I thought.  “Yes!  I am finally starting to see a growing acknowledgement from one of the Search Majors that when you have a goal-oriented topic, to get from Point A to Point B, there isn’t just a single, most effective, most efficient route.  A user might actually want to choose — explicitly choose via input tools — different pathways through all the potential waypoints.   Continue reading
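The idea in the quoted passage, that one graph can answer many different routing questions, can be sketched with a toy example: the same edges carry several attributes, and swapping the cost function hands Dijkstra’s algorithm a different notion of "best."  The nodes and numbers below are invented:

```python
import heapq

# Toy road graph: each edge carries several attributes.
# Node names and attribute values are invented for illustration.
EDGES = {
    "A": [("B", {"km": 10, "fuel": 3.0}), ("C", {"km": 4, "fuel": 1.5})],
    "B": [("D", {"km": 5, "fuel": 4.0})],
    "C": [("D", {"km": 12, "fuel": 2.0})],
    "D": [],
}

def best_route(start, goal, cost_key):
    """Dijkstra's algorithm; the edge attribute named by cost_key is the weight."""
    frontier = [(0.0, start, [start])]
    seen = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, attrs in EDGES[node]:
            if nxt not in seen:
                heapq.heappush(frontier, (cost + attrs[cost_key], nxt, path + [nxt]))
    return None

# The same graph answers different questions depending on the weight:
print(best_route("A", "D", "km"))    # shortest by distance: (15.0, ['A', 'B', 'D'])
print(best_route("A", "D", "fuel"))  # most fuel-efficient: (3.5, ['A', 'C', 'D'])
```

Nothing about the graph changes between the two calls; only the cost function does.  That is exactly the kind of user-selectable routing the quote gestures at.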

Posted in Information Retrieval Foundations, Social Implications | 6 Comments

Compare Google Yahoo Bing

I would like to point to a post worth reading, over at Blogoscoped, about personal, blind, side-by-side comparisons of the various contending search engines.  I have seen studies like this for years, both on the web and in published academic papers (see my earlier post).  And this current, informal study continues to confirm what all the other studies have shown: when you strip away branding information, there is no clear winner among the top-contending search engines.  Maybe years ago Google was leaps and bounds better than all the others; today, that no longer appears to be the case.

The reason I point out this informal study is not only to continue to raise awareness of the essential parity among the engines, but to point out something interesting that the author of the post (Philipp Lenssen) says: Continue reading

Posted in Exploratory Search, Information Retrieval Foundations, Social Implications | 1 Comment

Bing = Bing Is Not Google

As reported by Scoble: http://friendfeed.com/scobleizer/01bbb409/i-love-that-bing-means-is-not-google-very

I agree — very clever and quite funny.  A recursive acronym that all computer scientists should appreciate, tipping its hat to similarly constructed names such as GNU (GNU’s Not Unix) and Pine (Pine Is Not Elm).

It’s interesting, too, how the new generation references the older generation, i.e. the final character in Bing is the G.  Lest we forget, the very first Google logo had an exclamation point at the end — a not-so-subtle reference to Yahoo!

http://www.flickr.com/photos/umax46/2978899299/

Posted in General | 1 Comment

Wired Article on Bing

I just came across a Wired article today on a new search push from Microsoft, which will supposedly be named Bing. It touches on some of the issues that we were discussing in yesterday’s comment thread, in particular: 

People thought online e-mail was just fine and more or less converged on the same specific set of features — until Google came along and gave people gigs of disk space, organized e-mails by conversations and let people send big attachments. Soon Yahoo and Microsoft were forced to follow. So too with search. Google appears to have created the staple recipe, but there is a clear hunger for something more. Unfortunately people may not know what that something extra is until they see it — and that’s something not even Google has been able to figure out. So what do we know about what web searchers want? Weitz gave Wired.com a look at some of what Microsoft found when it when “back to the data” — namely Live.com search results — in a bid to make a qualitative leap in search performance. The data shows rampant clicking by many on the back button, while others get desperate enough to look to the second page of results. And when that doesn’t work, the users try again, coming up with slightly different terms. That’s about half of the searches. Only a quarter of searches return a good result — meaning an answer to a question (think a stock price), a satisfying search engine result or a happy ad click.

While this is a good start, it’s still not clear to me that the interpretations of the measurements are correct.  Just because someone doesn’t click something, does that mean the search was a failure?  Just because someone did click something, does that mean the search was a success?  It is not too difficult to come up with reasonable and abundant counterexamples.  And it’s still not clear how to differentiate task failure from process failure.

On a slightly different note, I found the following excerpt from the article particularly interesting: Continue reading

Posted in Exploratory Search, General | 2 Comments

Machine Learning and Search: Action or Reaction?

I have a question that has been bothering me, kicking around in my head, for at least half a decade now.  And I can’t seem to come to any solid conclusion on it. I suppose it can’t hurt to throw it out here onto the web, and see if one of my 3 readers has any thoughts on the matter.

A large amount of web search effort goes into statistics and probabilistic modeling of user queries and behaviors.  There seems to be a generally-accepted, widely-held belief that the way to go about web search is to look at vast quantities of data, and modify search engine parameters based on that observable data. With enough users, and enough data, the search engine can be made better.

The approach seems very reasonable. Where I get concerned is with the iterative feedback loop:

  1. The designers of the search engine provide specific functionality
  2. The user utilizes that functionality
  3. The search engine collects data/logs about the use of that functionality
  4. The designers use machine learning to analyze those logs for evidence of success and failure, and finally
  5. The designers modify the search engine to provide improved functionality that better satisfies these user behaviors
  6. Goto 2.
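The loop above can be made concrete with a toy sketch.  Every function and field here is a hypothetical stand-in I invented to illustrate the shape of the loop, not any real system:

```python
# Hypothetical sketch of the designer/user feedback loop described above.
# All functions, fields, and numbers are invented stand-ins.

def collect_usage_logs(engine):
    # Step 3: record which existing features were used and whether they "worked".
    return [{"feature": f, "success": engine["quality"] > 0.5}
            for f in engine["features"]]

def learn_from_logs(logs):
    # Step 4: "machine learning" reduced to a success rate over the logs.
    return sum(entry["success"] for entry in logs) / len(logs)

def tune_parameters(engine, success_rate):
    # Step 5: nudge the parameters of the *existing* functionality.
    engine["quality"] = min(1.0, engine["quality"] + 0.1 * (1 - success_rate))
    return engine

def iterate(engine, rounds=3):
    # Steps 2-6: note that the loop only ever refines features the engine
    # already has; nothing in it can surface a need outside engine["features"].
    for _ in range(rounds):
        success_rate = learn_from_logs(collect_usage_logs(engine))
        engine = tune_parameters(engine, success_rate)
    return engine

engine = iterate({"features": ["ranking", "spelling"], "quality": 0.4})
print(engine["quality"])
```

The sketch makes the blind spot visible: the feature list is fixed at step 1, and every later step conditions on it.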

What I do not see in this whole process is a way for users to tell the engine designers that they need anything other than what the search engine already provides.  Put another way: the analysis in steps 3 and 4 only really tells you how well users did with the existing functionality.  It does not tell you that a user had an information need whose satisfaction falls outside that existing functionality.

So how do (or how should) the developers of a search engine learn to recognize user needs and behaviors that fall outside of what users are able to express through log analysis?  I am assuming that user interviews are too small-scale to gather the necessary information at web scale.  And A/B testing only works when you have a B that you have designed in reaction to a known behavior.  But how do you develop that B if you have never observed the behavior that would make use of it?  Not because that behavior doesn’t exist, but because you cannot observe it until B exists.  In short, when adding interactive features and capabilities to a search engine, how does one take proactive action in the development of those features, rather than merely reacting?

Posted in General, Information Retrieval Foundations | 13 Comments

Week Links, Volume 1

This was a particularly busy week, and I did not get a chance to post many thoughts.  Instead, I’ll do a quick roundup of articles that I enjoyed reading over the past week or so.

First, a tongue-in-cheek post from Nick Carr entitled For Whom the Google Tolls:

It’s amazing that, before Google came along, any of us was able to survive beyond childhood. At the company’s Zeitgeist conference in London yesterday, cofounder Larry Page warned that privacy-protecting restrictions on Google’s ability to store personal data were hindering the company from tracking the spread of diseases and hence increasing the risk of mankind’s extinction. The less data Google is allowed to store, said Page, the “more likely we all are to die.”

Continue reading

Posted in General | 2 Comments

Google Search Options and the Paradox of Choice

Google finally acquiesces, and starts exposing more advanced, user-controllable search result refinement options.  See here, here, and here:

But as people get more sophisticated at search they are coming to us to solve more complex problems. To stay on top of this, we have spent a lot of time looking at how we can better understand the wide range of information that’s on the web and quickly connect people to just the nuggets they need at that moment. We want to help our users find more useful information, and do more useful things with it. Our first announcement today is a new set of features that we call Search Options, which are a collection of tools that let you slice and dice your results and generate different views to find what you need faster and easier. Search Options helps solve a problem that can be vexing: what query should I ask? Let’s say you are looking for forum discussions about a specific product, but are most interested in ones that have taken place more recently. That’s not an easy query to formulate, but with Search Options you can search for the product’s name, apply the option to filter out anything but forum sites, and then apply an option to only see results from the past week.
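The "slice and dice" workflow in the quoted example, filter to forum sites, then to the past week, amounts to composing narrowing filters over a result list.  A minimal sketch, with invented results and field names (this is not Google’s implementation):

```python
from datetime import date, timedelta

# Hypothetical sketch of composable "search options" over a result list.
# The results, sources, and dates are invented for illustration.
results = [
    {"title": "Forum: is product X any good?",  "source": "forum", "published": date(2009, 5, 13)},
    {"title": "Product X launch announcement",  "source": "news",  "published": date(2009, 5, 14)},
    {"title": "Forum: product X, a year later", "source": "forum", "published": date(2009, 2, 1)},
]

def apply_options(results, source=None, within_days=None, today=date(2009, 5, 15)):
    """Each option, if given, narrows the result set further."""
    out = results
    if source is not None:
        out = [r for r in out if r["source"] == source]
    if within_days is not None:
        cutoff = today - timedelta(days=within_days)
        out = [r for r in out if r["published"] >= cutoff]
    return out

# "Forum discussions about the product, from the past week":
for r in apply_options(results, source="forum", within_days=7):
    print(r["title"])
```

The appeal is exactly what the announcement describes: the user never has to formulate the awkward combined query; each option is applied as a separate, visible refinement.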

I’m pleased to see that it is finally happening.  For years I’ve clamored about how frustrating it is that Google not only hasn’t given users these sorts of options, but has actively campaigned against such functionality: They have often said that exposing advanced tools is “too complex” for users and that it would clutter the famously clean Google interface.  Perhaps the long-held belief that simplicity trumps all other considerations is finally being let go, with the understanding that functionality is sometimes more important than bare and minimal interfaces.  This is a good thing.

Willingness to expose these tools helps topple the oft-perpetuated myth that HCIR interfaces offer the user too much choice and therefore do more harm than good. Continue reading

Posted in Exploratory Search | 1 Comment