Speed Matters. So Does the Metric.

Via Greg Linden, I came across the following experimental result from Google on the importance of returning results to users quickly. The gist of the experiment is summed up in the abstract:

Experiments demonstrate that increasing web search latency 100 to 400 ms reduces the daily number of searches per user by 0.2% to 0.6%. Furthermore, users do fewer searches the longer they are exposed. For longer delays, the loss of searches persists for a time even after latency returns to previous levels.

Google therefore concludes that speed matters: it is of utmost importance to return results as fast as possible, because otherwise users will be less satisfied. And less satisfied users, the metric assumes, issue fewer queries.

I am not as immediately convinced.  Sure, I have no doubt that the number of queries issued did drop as a result of latency increase.  But can we immediately conclude, from the information contained within this report, that users were less satisfied with their overall search processes?  The author writes:

All other things being equal, more usage, as measured by number of searches, reflects more satisfied users.

The problem I see here is not with the experiment. It is with the metric. To begin with, all other things were not equal; they are simply assumed to be. Nowhere in this experimental setup do I see any evidence of whether the number of clicks went up or down between the two test conditions. Users may have been doing fewer searches, but were they also clicking on fewer results in the searches they did do? Hand in hand with that is the question of where in the results list the clicked items sat when users finally did click. Without this information, there is not a whole lot that one can conclude from these experiments.
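To make the missing measurements concrete, here is a minimal sketch in Python of the kind of per-condition summary I would want to see next to the raw search counts. The log format, the names (Search, summarize), and the toy data are all hypothetical, invented purely for illustration; none of it comes from the Google report.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Search:
    """One query issued by one user (hypothetical log record)."""
    user: str
    clicked_ranks: list  # ranks of the results clicked; empty means no click


def summarize(searches):
    """Summarize one test condition with three complementary metrics."""
    users = {s.user for s in searches}
    all_clicks = [rank for s in searches for rank in s.clicked_ranks]
    return {
        "searches_per_user": len(searches) / len(users),
        "clicks_per_search": len(all_clicks) / len(searches),
        # How deep in the list users went before clicking.
        "mean_click_rank": mean(all_clicks) if all_clicks else None,
    }


# Toy data, made up for illustration only.
control_log = [Search("u1", []), Search("u1", [3])]
delayed_log = [Search("u2", [6])]

print("control:", summarize(control_log))
print("delayed:", summarize(delayed_log))
```

In this toy data the delayed condition shows fewer searches per user, but also more clicks per search and deeper clicks. Searches per user alone would hide both of those facts.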

Let me illustrate what I mean with the following two scenarios:

(Scenario A) In this scenario, let us suppose that there is no delay in the system. Query results come back fast.  User #1 types query#1, doesn’t find what they need in the top 4 results above the fold, then types query#2, and finds the desired result at rank 3, thereby concluding the search.

(Scenario B) In this scenario, let us suppose that there is a 400 ms delay. Query results come back perceptibly slower. User #2 types the same query #1, and again doesn’t find what they need in the top 4 results above the fold. However, because searches don’t seem as instantaneous as they used to, User #2 decides to scroll down a little further in the ranked list, rather than impatiently hammering out another query. So the user scrolls down, finds their result at rank 6, and concludes the search.

So even though results came back slower in Scenario B, it could very well be that the pace of the search engine subconsciously encouraged a little less “attention deficit disorder” in the user’s behavior. Instead of hammering out another query, the user learns to take the time to look patiently through the existing results to find what they need. Not only are both users completely satisfied with their search results, but it could very well be the case that User #2 in Scenario B actually found the result in less total time. Why? Because User #2 had no additional overhead of formulating, typing, and then executing a second query. User #1 may be wasting more time than User #2 because of User #1’s speed-induced “jackrabbit driving” style of search engine usage. By encouraging a “fast acceleration, fast braking” style of search, rather than a more contemplative, investigative, nay even exploratory pattern, the search engine may be doing more harm than good. It may be that fewer total searches are indicative of higher overall user satisfaction!
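A back-of-the-envelope calculation shows how this could happen. The sketch below, in Python, adds up the time each user spends under a very simple model: every query costs some typing time plus the engine's latency, and every result looked at costs some scanning time. All of the numbers (5 seconds to formulate and type a query, 1.5 seconds to scan a result, 200 ms baseline latency) are assumptions I made up for illustration. Only the 400 ms added delay comes from the experiment.

```python
def session_seconds(results_scanned_per_query, latency_s, typing_s, scan_s):
    """Total time for one search session under a simple additive model.

    results_scanned_per_query: one entry per query issued, giving how many
    results the user looked at before clicking or giving up on that query.
    """
    return sum(typing_s + latency_s + scan_s * n
               for n in results_scanned_per_query)


# Hypothetical per-action costs, in seconds (assumptions, not measurements).
TYPING_S = 5.0        # formulate and type one query
SCAN_S = 1.5          # read one result snippet
BASE_LATENCY_S = 0.2  # assumed baseline response time
ADDED_DELAY_S = 0.4   # the 400 ms delay from the experiment

# Scenario A: two queries; scans 4 results, then finds the answer at rank 3.
time_a = session_seconds([4, 3], BASE_LATENCY_S, TYPING_S, SCAN_S)

# Scenario B: one slower query; scans down to rank 6 and finds the answer.
time_b = session_seconds([6], BASE_LATENCY_S + ADDED_DELAY_S, TYPING_S, SCAN_S)

print(f"Scenario A: {time_a:.1f} s, Scenario B: {time_b:.1f} s")
```

Under these made-up costs, User #2 finishes sooner despite the delay, because the second query's typing overhead dwarfs 400 ms. Different assumed costs could reverse the outcome, which is exactly why the search count alone cannot settle the question.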

I say may, because I do not know, either. The report does not cite any of the information that we, the readers, would need to make that determination. Rather, it introduces what I find to be a somewhat unjustified metric, and immediately jumps to conclusions based on it. It is automatically assumed, without offering any ancillary evidence or hypothesis, that “more searches = more satisfied users” is the proper, correct metric. I can’t dismiss the conclusions reached in this experiment, but I can’t quite believe them, either. I would be interested in learning if there are better metrics out there.

Update 1: This need to choose a proper metric, and know what it is that you’re really measuring, is in part what I was getting at with one of my other posts from a month ago.

Update 2: I should point out the concluding paragraph in the aforementioned paper; I find it interesting:

Furthermore, observing these users for the 400 ms delay after we stop subjecting them to the delay, the rate of daily searches per user for the experiment is still −0.21% relative to the control (averaged over the 5 weeks after removal of the delay). For longer delays, the loss of searches persists for a time even after latency returns to previous levels.

Obviously the author is interpreting this observation as evidence that the unfavorable impression users develop as a result of 400 ms delays persists even after the delay has been removed, and that it is therefore of utmost importance to keep users satisfied by keeping things fast. However, I find that this observation equally supports the hypothesis that users have learned to slow down their jackrabbit, non-thinking, query-hammering, brute-force searching behavior in favor of an equally satisfying, more patient approach, and that this patience persists even after the delay has been removed. Which hypothesis is correct? I don’t know. The point is that I can’t quite tell, either way, from the metric that was used and the information that was provided.

