Opposite Day

Two pieces of recent news have my head spinning. Both are instances of technology companies acting in exactly the opposite manner from their ideals (and public statements). The first is Microsoft’s announcement of an open-source version of BigTable:

Instead of creating a proprietary copy of these pieces of infrastructure, Powerset decided instead to turn to Hadoop, a Lucene subproject that is a framework for running data-intensive applications on large clusters of commodity hardware…Unfortunately, there was no Hadoop equivalent to Google’s BigTable storage engine.  Because we have benefited greatly by leveraging the available Hadoop technology, Powerset decided to give back to the community by developing an open-source analog to BigTable that is built on top of HDFS (Hadoop Distributed File System). After all, we need to develop it, anyway, it isn’t part of the Powerset “secret sauce,” and we, in turn, could benefit from contributions from other members of the community.

After years of FUD about how open source is buggy, insecure, and costlier than proprietary software, this MS announcement had me doing a double-take. I then read that Google is advertising its Chrome browser on television:

…we designed a Google TV Ads campaign which we hope will raise awareness of our browser, and also help us better understand how television can supplement our other online media campaigns. So today, we’re pleased to announce that we’re using Google TV Ads to run our Chrome ad on various television networks starting this weekend. We’re excited to see how this test goes and what impact television might have on creating more awareness of Google Chrome.

Now my head was spinning. Google has crooned for years about how it purposely does not advertise any of its own services, mandating that engineers’ projects must spread by word of mouth alone. I’ve been hearing this since 2001 (as far back as I can remember, though maybe they’ve been saying it for even longer), and I heard it again as recently as nine months ago when Googler Kai-Fu Lee repeated the talking point in front of an audience of hundreds during his SIGIR 2008 keynote.

So now Microsoft is releasing open source code, and Google is marketing itself on television.  It must be opposite day.

Update: It may be worth noting the actual technologies involved, through which opposite day is occurring. Microsoft (Powerset) is going open-source via a search-based technology, while Google is going old-school television marketing through what essentially amounts to an operating system (Chrome as an application programming layer / virtual web OS).  It must be double opposite day.

Posted in General, Social Implications | Leave a comment

Personal Branding and Search Results Integrity

Google is an information retrieval company that prides itself on the purity of its results.  It does not allow the integrity of its ranked list ordering to be tampered with by sponsored results. It also has claimed for years that it does not engage in hand-coding (aka hand-crafting or hard-coding) of results. Everything that it returns in the non-sponsored, organic list is purely algorithmic, or at least only indirectly influenced by the hand of humans (e.g. relevance assessors and quality raters).  The order in which a result is ranked will not be — as far as I’ve always understood Google’s position — hand-picked.

So I was quite surprised recently to learn about a new initiative from Google that allows you to create a Google profile for yourself, which Google places into the 10th slot in the organic results when someone searches for your name!  From the official Google blog:

To give you greater control over what people find when they search for your name, we’ve begun to show Google profile results at the bottom of U.S. name-query search pages…Don’t have a Google profile? Just search for [me] and follow the instructions at the top of the page to create one. In just a few minutes, you can create a public profile that represents you and that appears when people search for your name on Google.

How is this not hard-coding of results?  Continue reading

Posted in General, Social Implications | Leave a comment

Universal Search is not Exploratory Search

In a recent response article, Danny Sullivan takes Forbes CEO Spanfeller to task on the whole Google vs. The Newspapers issue.  There are a lot of things I agree with Danny about, and an equal number of things that I disagree with.  But I feel compelled to propagate one nugget from Spanfeller:

Spanfeller: Search is not really all that great at the moment, a comment repeated time and again by much more astute folks then me. This is especially true when looking for high-quality professionally created content. This is not to say that user-generated content or ecommerce options or product specs should not be returned in search results, simply that there is clearly a better way to showcase the different paths an end user might be pursuing. The idea that everyone is forced into trying to “game” the system so that they get their “fair” (or sometimes not so fair) share is testament to how terribly wrong this entire process has become.

This excites me because I see in this statement an acknowledgment and realization that Exploratory Search and HCIR (“showcasing”) are necessary.  Sullivan, however, completely misses the point: Continue reading

Posted in Exploratory Search, Social Implications | Leave a comment

Search Engine Rotation: Wolfram Alpha vs. Google

Apropos of my post yesterday, Technology Review has a short comparison of Wolfram Alpha and Google.  Here are a few samples:

Here’s what I entered, and what I found.

SEARCH TERM: Microsoft Apple

WOLFRAM ALPHA: I got side-by-side tables and graphics on the stock prices and data on the two companies, plus a chart plotting the price of both stocks over time.

GOOGLE: The top hits were mostly news stories, from major and minor publications, containing both words.

And.. Continue reading

Posted in Information Retrieval Foundations, Social Implications | Leave a comment

Do You Rotate Your Search Engine Usage?

It is good practice to rotate the mattress on your bed, to prevent lopsided wear-and-tear from shortening its useful life.  The same thing applies to car tires; they need rotating.  Smart travelers know to rotate the airlines from which they purchase tickets, as the accumulation over time of better per-ticket prices often outweighs the rewards or miles that come from a single airline’s loyalty perks. Even the internet itself works by allowing packets of information to dynamically rotate across different routes, based on traffic congestion, rather than tying up a full end-to-end circuit.

So why wouldn’t you rotate your search engine usage? Continue reading

Posted in General, Social Implications | 1 Comment

More and Faster versus Smarter and More Effective

Last month, in reaction to the “Unreasonable Effectiveness of Data” paper that made the rounds, Stephen Few from the Business Intelligence community wrote an interesting post:

The notion that “we need more data” seems to have always served as a fundamental assumption and driver of the data warehousing and business intelligence industries. It is true that a missing piece of information can at times make the difference between a good or bad decision, but there is another truth that we must take more seriously today: most poor decisions are caused by lack of understanding, not lack of data. The way that data warehousing and business intelligence resources are typically allocated fails to reflect this fact. The more and faster emphasis of these efforts must shift to smarter and more effective. Although current efforts to build bigger and faster data repositories and better production reporting systems should continue, they should take a back seat to efforts to increase the data sense-making skills of workers and to improve the tools that support these skills.

This is a point that I wholly subscribe to, and an aspect of which I encountered the other day when attempting to use web search engines to satisfy my “hidden cafes in prague” information need.  Continue reading

Posted in Explanatory Search, Exploratory Search | 2 Comments

The Tyranny of Simplicity

One of my ongoing frustrations with modern, consumer-facing information organization and retrieval systems is the way in which functionality is often sacrificed in the name of simplicity.

Full functionality under the rubric of simplicity is a laudable goal, and I would agree that this is where we all eventually want to end up in the information systems, interfaces and algorithms that we are designing.  Simplicity without full functionality, but with alternative complex interfaces for an advanced user to specify greater functionality, is a satisfactory stepping stone along the path to this goal.  But simplicity with obstructed or stunted functionality, with no possibility for the user to improve that functionality, is too often what we end up with.

Case in point: Apple’s iTunes/iPod. Continue reading

Posted in General, Information Retrieval Foundations | 12 Comments

World Pinhole Photography Day

While the focus of this blog is the retrieval of existing information, from music to images to videos to text, every once in a while it is nice to create new information as well.  In that spirit I decided to participate in World Pinhole Photography Day, which is today, Sunday, April 26, 2009.  While I do not own a true pinhole camera (a cardboard box and some black tape for the shutter), I do own a pinhole lens (f/177) for my digital camera.  So I went out and took some pinhole photos today.  Not my best work... but hey, it’s pinhole day!

b/w pinhole

Posted in General | Leave a comment

Retrievability and Prague Cafes

A week or two ago I began writing a few thoughts about large-data based algorithms and retrievability.  They were spawned by the Unreasonable Effectiveness of Data position paper by a couple of notable Googlers, which then led to a brief discussion.

My main contention was that by relying too heavily on algorithms that are based solely on accumulations of large data, and by not offering users exploratory search options to turn off the large-data, popularity bias, searchers would be unable to ever find certain pieces of relevant information. This is not even a matter of knowing the correct query terms to use; I argued (backed up by published research) that even if you knew the correct terms, you still could not find certain pieces of information.

Well, now I want to write about the other half of the equation: What do you do when the information is retrievable under some term, but you just do not know that term?  Why do search engines not give you more help with finding information which does exist if you know exactly the right word to use, but for which no reasonable person would ever know the correct word?

Let me give an example: Hidden Cafes in Prague.

Continue reading

Posted in Information Retrieval Foundations | 4 Comments

Google Similar Images: Only 20%?!

A few days ago, Google launched “similar image search” functionality.  From TechCrunch:

A new 20% time Google project has just launched called Google Similar Images. It’s pretty self-explanatory — when you search for an image and find one close to what you’re looking for, Google can now find ones that it believes to be the same, or similar.

Much has been written about this around the web.  I am not going to add my reviews or my opinions about the service itself right now.  Rather, I have one overarching, confounding question that I would like answered: Why was this only a 20% time project?

Search, information retrieval, and information organization are all key to Google’s primary mission statement.  Image search via similarity is a well-known, long-studied problem. Why is it only now that such solutions are being offered by the Web giant, and not years ago?  More importantly, why was this not somebody’s 80% project, rather than a 20% time project?  One would think that such a mission-core piece of functionality would be something that Google would pursue 4 of 5 days a week, not only 1 of 5 days a week.

In related work, Microsoft’s web image search engine has facets.  Innovation all around.

Posted in General, Information Retrieval Foundations | 3 Comments