Speed Matters. So Does the Metric.

Via Greg Linden, I came across the following experimental result from Google as to the importance of quickly returning results to users.  The gist of the experiment is summed up in the abstract:

Experiments demonstrate that increasing web search latency 100 to 400 ms reduces the daily number of searches per user by 0.2% to 0.6%. Furthermore, users do fewer searches the longer they are exposed. For longer delays, the loss of searches persists for a time even after latency returns to previous levels.

Google therefore concludes that speed matters: it is of utmost importance to return results as fast as possible, because otherwise users will be less satisfied.  Less satisfied users, the metric assumes, issue fewer queries.

I am not as immediately convinced.  Sure, I have no doubt that the number of queries issued did drop as a result of latency increase.  But can we immediately conclude, from the information contained within this report, that users were less satisfied with their overall search processes?  The author writes: Continue reading…

200 Signals, Still Only One Route

Via Paul Lamere, I came across this recent Google blogpost on large scale graph computing.  I started reading, and quickly became excited by what I was hearing:

A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs.

“Yes!” I thought.  “Yes!  I am finally starting to see a growing acknowledgement from one of the Search Majors that when you have a goal-oriented topic, to get from Point A to Point B, there isn’t just a single, most effective, most efficient route.  A user might actually want to choose, explicitly choose via input tools, different pathways through all the potential waypoints.   Continue reading…
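
To make the idea in the quoted passage concrete, here is a minimal sketch of how one graph can yield different "best" routes depending on which edge attribute the user chooses to minimize. Everything in it is an assumption for illustration: the toy graph `ROADS`, the attribute names `km` and `fuel`, and the helper `best_route` are hypothetical, and the search is plain Dijkstra, not anything Google has described.

```python
import heapq

# Hypothetical toy road graph. Each edge carries several attributes.
# Place names and numbers are invented purely for illustration.
ROADS = {
    "A": {"B": {"km": 10, "fuel": 12}, "C": {"km": 4, "fuel": 9}},
    "B": {"D": {"km": 3, "fuel": 2}},
    "C": {"D": {"km": 12, "fuel": 3}},
    "D": {},
}


def best_route(graph, start, goal, attribute):
    """Dijkstra's algorithm, minimizing the sum of one chosen edge attribute.

    Returns (total_cost, path) or None if the goal is unreachable.
    Assumes all attribute values are non-negative.
    """
    queue = [(0, start, [start])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, attrs in graph[node].items():
            if neighbor not in settled:
                heapq.heappush(
                    queue, (cost + attrs[attribute], neighbor, path + [neighbor])
                )
    return None


# The same graph, two different user-selected notions of "best".
print(best_route(ROADS, "A", "D", "km"))    # shortest by distance
print(best_route(ROADS, "A", "D", "fuel"))  # cheapest by fuel
```

On this toy data the distance-minimizing route goes through B while the fuel-minimizing route goes through C, which is the point: the "right" answer depends on an input the user supplies, not on the graph alone.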

Compare Google Yahoo Bing

I would like to point to a post worth reading, over at Blogoscoped, about personal, blind side-by-side comparisons of the various contending search engines.  I have seen studies like this for years, both on the web and in published, academic papers (see my earlier post).  And this current, informal study continues to confirm what all the other studies have shown: When you strip away branding information, there is no clear winner from among the top-contending search engines.  Maybe years ago, Google was leaps and bounds better than all the others.  Today, that no longer appears to be the case.

The reason I point out this informal study is not only to continue to raise awareness of the essential parity among the engines, but to point out something interesting that the author of the post (Philipp Lenssen) says: Continue reading…

Machine Learning and Search: Action or Reaction?

I have a question that has been bothering me, kicking around in my head, for at least half a decade now.  And I can’t seem to come to any solid conclusion on it. I suppose it can’t hurt to throw it out here onto the web, and see if one of my 3 readers [...]

Search Engine Rotation: Wolfram Alpha vs. Google

Apropos of my post yesterday, Technology Review has a short comparison of Wolfram Alpha and Google.  Here are a few samples:

Here’s what I entered, and what I found.

SEARCH TERM: Microsoft Apple

WOLFRAM ALPHA: I got side-by-side tables and graphics on the stock prices and data on the two companies, plus a chart plotting the price of both stocks over time.

GOOGLE: The top hits were mostly news stories, from major and minor publications, containing both words.

And.. Continue reading…

The Tyranny of Simplicity

One of my ongoing frustrations with modern, consumer-facing information organization and retrieval systems is the way in which functionality is often sacrificed in the name of simplicity.

Full functionality under the rubric of simplicity is a laudable goal, and I would agree that this is where we all eventually want to end up in the information systems, interfaces and algorithms that we are designing.  Simplicity without full functionality, but with alternative complex interfaces that let an advanced user specify greater functionality, is a satisfactory stepping stone along the path to this goal.  But simplicity with obstructed or stunted functionality, with no possibility for the user to improve that functionality, is too often what we end up with.

Case in point: Apple’s iTunes/iPod. Continue reading…

Retrievability and Prague Cafes

A week or two ago I began writing a few thoughts about large-data based algorithms and retrievability.  It was spawned by the Unreasonable Effectiveness of Data position paper by a couple of notable Googlers, which then led to a brief discussion.

My main contention was that by relying too heavily on algorithms that are based solely on accumulations of large data, and by not offering users exploratory search options to turn off the large-data popularity bias, searchers would be unable to ever find certain pieces of relevant information. This is not even a matter of knowing the correct query terms to use; I argued (backed up by published research) that even if you knew the correct terms, you still could not find certain pieces of information.

Well, now I want to write about the other half of the equation: What do you do when the information is retrievable under some term, but you just do not know that term?  Why do search engines not give you more help with finding information which does exist if you know exactly the right word to use, but for which no reasonable person would ever know the correct word?

Let me give an example: Hidden Cafes in Prague.

Continue reading…

Google Similar Images: Only 20%?!

A few days ago, Google launched “similar image search” functionality.  From TechCrunch:

A new 20% time Google project has just launched called Google Similar Images. It’s pretty self-explanatory — when you search for an image and find one close to what you’re looking for, Google can now find ones that it believes to be [...]

Dagstuhl Seminar on Content-Based Retrieval

As a researcher, it is occasionally quite interesting to reread thoughts and positions that I’ve taken in years and works past. Sometimes I can observe a marked shift from my previous thinking; avenues or approaches that I once considered fruitful I now no longer do. And sometimes I can observe hints and seeds of my current research; avenues of which I only had a vague inkling have blossomed into larger pursuits.

In April of 2006 I had the good fortune to attend a Dagstuhl Seminar on Content-Based Multimedia Information Retrieval (I am toward the upper left corner of the seminar group photo).  Ramesh Jain has a good writeup of Dagstuhl Seminars, what they are and how they work.  In the abstract of my Seminar presentation I wrote:

Continue reading…

Retrievability

In my previous post I talked a little about the notion that big data alone cannot solve many of our problems.  I would like to give a more concrete example of this by discussing a paper published at CIKM 2008: “Retrievability: An Evaluation Measure for Higher Order Information Access Tasks” by Azzopardi and Vinay.  In large part, my desire to discuss this paper comes from a few of Peter Norvig’s comments in the aforementioned thread on big data: 

Continue reading…