Daniel Tunkelang pointed me to a NYT article on the growing power of Google, and the growing public wariness that comes with it. The article makes a number of points, but what struck me most was the pseudo-repartee between Google’s chief economist, Hal Varian, and long-time search watcher Danny Sullivan:
“You buy a car, use it for four years, and then you’ll look around at your choices,” Mr. Varian said. “But for search, we’re competing on a click-by-click basis.” If more users are going to Google, he said, it’s because they are concluding that Google’s product is superior.
Mr. Sullivan, who has been studying search engines since 1995, said that similar surveys have been done for many years — and that they always fail to reflect that most people have a primary attachment to a single search engine. When users try an alternative, he said, they “don’t go into active taste-testing mode”; afterward, they revert to their favorite. “Google is a habit,” he said, “and habits are very hard to break.”
It’s almost painful to admit, but Google’s Varian sounds more like a pundit than an economist or a scientist. Is it really true that Google is competing on a click-by-click basis? In the user studies that Google does, which of the following happens more often when a user types a query into Google and sees that Google has failed to produce the information they sought:
1. Does the user reformulate his or her query, and click “Search Google” again (one click)? Or,
2. Does the user leave Google (one click), and try his or her query on Yahoo or Ask or MSN (a second click), instead?
If what Hal Varian is saying is true, then I would expect there to be maximum entropy in the system: 50% of the time users would take action (1), and 50% of the time they would take action (2). But my guess is that there isn’t that much entropy. My guess is that action (1) happens a lot more often than action (2). And not just because it’s Google; the same pattern is likely observable no matter the search engine. If a user on Yahoo doesn’t get what they want, I’ll bet they reformulate their query on Yahoo before trying a different engine. Same with MSN, even.
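The entropy claim can be made concrete. For a two-way choice, entropy is maximized at exactly one bit when the split is 50/50, and falls off sharply as behavior skews toward one action. A minimal sketch, with illustrative probabilities:

```python
import math

def binary_entropy(p):
    """Entropy (in bits) of a two-outcome choice, where p is the probability
    of one outcome (here: reformulating on the same engine)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit: reformulate vs. leave are equally likely
print(binary_entropy(0.9))  # ~0.47 bits: a strong habit of reformulating in place
```

The further the observed split sits from 50/50, the less it looks like click-by-click competition and the more it looks like habit.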
It therefore seems highly dubious to say that Google really is competing on every single click. As any economist knows, there are other factors, such as branding, loyalty, and laziness, that keep users reformulating their queries on the same site rather than clicking to better results on other sites. Distribution matters, too. Google is the default engine on Safari. It is the default engine on Firefox. And Google reportedly paid Dell around $1 billion to be pre-bundled and set as the default search engine on all new Dell machines. All of that matters. Does Varian seriously not know this?
Google could put its punditry where its mouth is by one simple, innovative user interface improvement: They could have a set of links at the bottom of the page that say “Try your search on Yahoo” and “Try your search on Clusty”. That would turn a failed search from a two-click effort into a one-click effort, because the link would not only go to Yahoo, it would run the user’s same query on Yahoo.
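Mechanically, such a link just needs to carry the user’s query in the target engine’s query-string parameter. A sketch, with the caveat that the URL bases and parameter names below are illustrative assumptions, not taken from this post:

```python
from urllib.parse import urlencode

# Illustrative (base URL, query parameter) conventions per engine;
# the real formats may differ.
ENGINES = {
    "Yahoo":  ("https://search.yahoo.com/search", "p"),
    "Clusty": ("https://clusty.com/search", "query"),
}

def try_elsewhere_link(engine, query):
    """Build a one-click 'try your search on ...' URL carrying the same query."""
    base, param = ENGINES[engine]
    return f"{base}?{urlencode({param: query})}"

print(try_elsewhere_link("Yahoo", "click-by-click competition"))
```

That is the entire technical cost of making the competition literally one click away.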
In the early days of the web (1995, 1996) there were a lot more search engines that did this. If the user was truly dissatisfied with the Google results, it would be a one-click experience to go to another search engine and get immediate feedback about which engine is better. Rather than keeping the user locked inside the Google experience, where it is easier (one click) to re-search on Google than to search on another engine (two clicks), Google should make it seamless, so that competition truly is only one click away. Then Varian could honestly claim that Google is competing on a click-by-click basis.
Or, another option would be for Google to relax its “you may not meta-search us” rule. Relaxing that rule would allow all sorts of innovation around the display of search results, so that users could more easily compare the results of Google and another search engine side by side. Users would get real feedback on the engine, and Google would have to compete for every click. Instead, by keeping the search results locked inside the Google walls, and not letting users make better use of them, even for personal use only, Google introduces a barrier to competition and plugs up the free, open flow of information that is the web.
Update: I was just reminded that Google itself used to have the “try your search on another engine” links in its very early days. Here are screenshots that show it. Now, those links are gone, and the user has to make twice as much effort to leave Google as they used to. That doesn’t seem very competitive.
Your points about actions 1 versus 2 are very astute. I’d guess that #2 happens a LOT on the #2-10 search engines: people give that engine a try, maybe attempt a reformulation, then abandon that engine and try on Google. And I’m betting that people abandon Google at a far lower rate than other engines, i.e., an asymmetry of abandonment.
I’d love to do the following analysis given a browser log of search behavior:
Form a graph where the major search engines are the nodes.
For each pair of searches found in the log at time t and time t+1 for a given user, increment the counter on the edge SearchEngine(t) -> SearchEngine(t+1). Once the entire log is processed, normalize the weights on all edges leaving a particular node.
We now have a Markov chain of engine-usage behavior. The directional edges in the graph represent the probability of transferring to another engine; self-loops are the probability of sticking with the current engine.
If we calculate the stationary distribution of this matrix of transition probabilities, we should get a probability distribution that closely matches the market shares of the major engines. (FYI, this is what PageRank version 1.0 is: the stationary distribution of the link graph of the entire web.)
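The steps above can be sketched end to end. The session log and engine names below are a made-up toy example, not real data:

```python
from collections import defaultdict

def transition_matrix(sessions):
    """sessions: one ordered list of engines per user.
    Returns P[a][b] = Pr(next engine = b | current engine = a)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

def stationary(P, iters=1000):
    """Power-iterate a uniform start vector toward the chain's stationary
    distribution. Assumes every engine appearing as a target also appears
    as a source in P."""
    states = list(P)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iters):
        nxt = {s: 0.0 for s in states}
        for a in states:
            for b, p in P[a].items():
                nxt[b] += pi[a] * p
        pi = nxt
    return pi

# Toy log: each inner list is one user's sequence of engines over time.
sessions = [["Yahoo", "Google", "Google"],
            ["MSN", "Google", "Google", "Google"],
            ["Google", "Google"],
            ["Ask", "Yahoo", "Google"]]
print(stationary(transition_matrix(sessions)))
```

On this toy log every user eventually reaches Google and stays, so the stationary distribution puts essentially all of its mass on Google, which is exactly the “near-absorbing state” prediction below.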
What else can we do? We can analyze it as a random walk and calculate the expected number of searches until a given user of any internet search engine ends up using Google. If the probabilities on the graph are highly asymmetric, which we think they are, this is a measure of the monopolistic power of people’s Google habit.
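That expected-hitting-time calculation can be sketched as a fixed-point iteration on h(s) = 1 + Σ_b P(s,b)·h(b), with h(Google) = 0. The transition probabilities below are invented for illustration:

```python
def expected_searches_until(P, target="Google", iters=5000):
    """Expected number of searches before a user currently on each engine
    first lands on `target`, treating `target` as absorbing. Solved by
    fixed-point iteration of h(s) = 1 + sum_b P(s, b) * h(b)."""
    h = {s: 0.0 for s in P}
    for _ in range(iters):
        h = {s: 0.0 if s == target else
                1.0 + sum(p * h.get(b, 0.0) for b, p in P[s].items())
             for s in P}
    return h

# Invented numbers: MSN users mostly retry MSN, Yahoo users sometimes
# hop to MSN, and Google users stay put (Google is absorbing).
P = {"Google": {"Google": 1.0},
     "MSN":    {"MSN": 0.8, "Google": 0.2},
     "Yahoo":  {"Yahoo": 0.5, "MSN": 0.2, "Google": 0.3}}
print(expected_searches_until(P))
```

On these invented numbers an MSN user takes five searches on average to end up on Google and a Yahoo user four; the size of each self-loop relative to the outbound edges is exactly the “habit” being measured.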
This should also predict the lifetime of a given ‘new’ MSN Live or Ask.com user, meaning the number of searches they do before abandoning it for some other engine.
Predicted end result: Google is the near-absorbing state of the graph, meaning that all other engines are transient states on the route to Google sucking up market share. Of course this is patently obvious unless one of the big players changes the game.
That’s an interesting idea! Could I suggest an augmentation? Each time an edge in the graph is traversed, we should compute (query for) the ranked lists that come from all available search engines, #1-10. That is, the top 100 pages and their ranks for all top 10 search engines in the graph should be noted. Then, when the user finally lands on their end-destination engine and clicks a result, we record not only the fact that the user traversed that particular edge, but also the position of that same result in all the other engines’ lists.
That way, we can also compute an opportunity cost. If another engine had the result ranked higher, then the user was, in a sense, “wrong” to have traversed that particular edge; they should have traversed another edge instead. And we, and the user, can either (1) be made aware of which edge they should have traversed, or (2) receive confirmation that their chosen edge traversal was indeed the optimal one.
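A sketch of that opportunity-cost bookkeeping, where the `rankings` dict stands in for the hypothetical per-engine ranked lists fetched at traversal time:

```python
def opportunity_cost(clicked_url, rankings):
    """rankings: engine -> ordered list of result URLs, as fetched when the
    edge was traversed. Returns each engine's 1-based rank of the clicked
    result (None if the engine missed it entirely), so we can see which
    engine the user 'should' have chosen."""
    return {engine: results.index(clicked_url) + 1 if clicked_url in results else None
            for engine, results in rankings.items()}

# Toy ranked lists; URLs are placeholders.
rankings = {"Google": ["a.com", "b.com", "c.com"],
            "Yahoo":  ["b.com", "a.com", "d.com"],
            "Ask":    ["c.com", "d.com"]}
print(opportunity_cost("b.com", rankings))  # Yahoo had it at #1, Google at #2, Ask missed it
```

Aggregating these per-click rank comparisons over the whole log is what would let the user see whether their edge traversals were optimal.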
My guess is that if this information were made available to the user, through some standardized set of APIs across all search engines, Google would not be as near-absorbing as you might think. But this will never happen, because Google will never open up its results-API to this sort of side-by-side comparison. As mentioned in the post above, it’s specifically against Google’s terms of service to “meta-search” Google, which is essentially what this is.
I would think that if Google really were the all-around best engine, it would have no problem releasing this information, because that openness would actually accelerate the shift to Google, a further increase in market share. Receiving confirmation that Google produced the best result, or feedback that your choice to switch to something other than Google was sub-optimal, can be a powerful incentive. But Google keeps this information closed and locked down, and is not open to this sort of real-time, user based comparison.
I bemoan the lack of transparency.
One could do part of this in a Firefox extension. Of course, Firefox users who install extensions are savvy and more likely to switch around to different engines.
There are people out there doing similar things with search logs and ‘learning to rank’ problems. Something might be learned from that work for what you propose, or we might get at some usable data.
Looks like Greg Linden found some interesting related research on click-graphs in search:
Note that MetaCrawler, Dogpile, WebCrawler, and WebFetch (all owned by InfoSpace) do a meta-search of Google, Yahoo, Ask, and MSN and blend the results together. You can more or less figure out the approximate ordering on the source engines by looking at the text annotating each link.
MetaCrawler was Oren Etzioni’s baby back in the day:
I see this as more of a social or political problem than a technological one. Yes, you’re correct that this could be done with a Firefox extension. But if you’ve read the Google terms of service, you’ll see that it’s against their terms of service to do this. Will they catch you? Could they catch you? I don’t know. But the point is, their political will is that you don’t do it. They explicitly forbid it.
And that needs to change.
I think all these things are great: learning to rank, metasearch, click-graphs, etc. Daniel Dreilinger had some interesting work in 1996-1997 on learning which search engine to automatically select on a query-by-query basis. That’s different from results aggregation à la Dogpile. We’ve still not caught up to the things that were envisioned back then.
But again, this is less of a technological failure than it is a political failure, a competitiveness failure, and an openness failure. If the search engines were to actually open up and give any end user sanctioned access to their deep rankings, not just the top 10 or 20, there are all sorts of innovations that the community could and would build on top of that.
For that matter, I still want to see Google put back at the bottom of the first query results page the “Try your query at Yahoo, at AltaVista” etc. links. They used to have those links, a decade ago. They took them away. Why?
The work that Greg cites uses two fundamental graph types: query-click bipartite graphs and the static web link graph. I’m talking about creating a third type of graph: a between-engine query graph. It’s the dynamically-created link that appears at the bottom of search results pages, which users then traverse manually whenever that particular search engine does not meet their needs.
Google used to provide edges on this third graph. They no longer do. Their goal used to be to get users off of Google as quickly as possible. Now, rather than admit that another engine might do better on a particular query than Google itself (which was one of Dreilinger’s hypotheses, if I’m not mistaken), they would rather you stay within Google’s garden walls.
How is that open? How is that competitive? It is not, as Hal Varian claims, competing for every single click. Rather, by removing the ability to go to another engine with a single click, as Google used to let you do in the days before it started making so much advertising money, Google de-levels the playing field and slopes it in its own favor.
http://www.Yandex.ru, the most popular Russian search engine (60% market share), still does it: you can click and search across the top 5 search engines from the bottom of a results page.