Exploration, Collaboration, and Open Government

What sort of information retrieval system would you build if you knew that all the users of your system would be expert or highly-motivated amateur searchers?  What sort of system would you build when you have a very large collection of unstructured information, and the goal in searching that information is not to find one document (e.g. navigate to a home page), but to find (a) relationships between documents, or (b) large sets of documents that all pertain to a single topic?  How would your algorithms be different?  How would your interfaces be different?  How would the process itself (that middle layer in between algorithms and interfaces) be different?

Via Daniel Tunkelang’s recent post, I think that Government information might be a perfect domain in which to ask (and answer) these sorts of questions.  The U.S. Open Government Initiative has as its goal the release of loads of raw government data for use by any individual or organization.  How are people going to use this data?  What types of questions will they ask?  What types of questions could they ask, if given the proper tools (i.e. what might they not know that they want to ask, until it becomes possible?)

Two types of information retrieval might be perfect for this domain: Exploratory Search and (Explicitly) Collaborative Search.  Continue reading

Posted in Collaborative Information Seeking, Exploratory Search, Information Retrieval Foundations, Social Implications | Leave a comment

Google Music China: Unprecedented?

There is an interesting recent post about the history of Google Music in China from an article by Michael Zhang in Global Entrepreneur Magazine.  Some excerpts:

“It will play the right music without you having to give it any thought.” You get the music you want at the right time, for the right environment, and in the right mood.  Well, it sounds unrealistic that one’s thoughts can actually control music. Yet like the famous science fiction writer Arthur C. Clarke said, “Any sufficiently advanced technology is indistinguishable from magic.”  Revolutionary technologies like electrical power, airplanes and search engines have all changed the world in ways that were out of expectation. To some extent, Google’s second edition of their music search product, launched on March 30 by Hong Feng and his team, fulfilled the criteria in some way.  When Google China’s president Li Kaifu [who by now quit Google] and executives from hundreds of record companies posed for a photograph at the media conference, people might have ignored the fact that this product transcended reality on at least two levels.

They’re calling Google Music, an extremely late entrant into the music search business, a revolutionary, magical technology that transcends reality?  Are you kidding me?  Have they forgotten about Moodlogic (1998)?  Shazam (2000)?  Pandora (2000 or 2004, depending on whether you’re talking B2B or B2C)?  Last.fm (2004)?  Echo Nest (2005-ish)?  How does Google Music China compare with the solutions provided by these other companies?

Google Music unprecedentedly enriched the way people find music. You can find a song through the name of artist, titles of the song, albums, or even a sentence of lyric, and you can also play the hottest songs from the charts. However, the most impressive breakthroughs are these two functions – one can have music recommendations according to difference of the tempo, tone, and timber; similar songs are recommended according to the timber of specific songs. Fresh experience it offers and the technical complexity in its realization makes it the most ambitious and imaginative work of Google after it entered China.

“Unprecedentedly”?  Come on.  It’s totally “precedented”.  Continue reading

Posted in Music IR | 1 Comment

Data Liberation and Ownership

I split my blogging between this and the FXPAL blog.  This morning I have a post on the latter site that asks an (imho) important question about data ownership and data liberation with respect to one’s web search history: not just the queries, but the results produced by a mashup between those queries and the back-end algorithms.  Here is the key point:

Here is an analogy by way of Adobe Photoshop.  Suppose you open one of your images in the online (webapp) version of Photoshop, apply the Gaussian Blur (soft focus) filter to the image, and then save that result out again.  It’s clear that you own the input (it’s your photo), that Adobe owns the Gaussian Blur algorithm (or at least the implementation of it), and that you own the resulting image.  Adobe doesn’t lay ownership claim to the output of the algorithm, even though it was their algorithm that produced the output.

So how is this different from a web search?  You own the input (the query string that you type).  Google owns the algorithm that transforms that input into a list of results.  So wouldn’t you also then own the output of that transformation?  Not the algorithm, but the output of the algorithm, i.e. the results set.  Just like you own the output in Photoshop.

It will be interesting to see whether or not Google will be open enough to allow you to extract this particular form of your data.  Currently, they do not.

I would invite you to visit the FXPAL blog, and read the post in full.  And comment/disagree, where necessary.

Posted in General | Leave a comment

There is No Crowd

Via Xavier Amatriain: The Dirty Little Secret About the “Wisdom of the Crowds” – There is No Crowd:

This is hardly the first time that the so-called “wisdom of the crowds” has been called into question. The term, which implies that a diverse collection of individuals makes more accurate decisions and predications than individuals or even experts, has been used in the past to describe how everything from Wikipedia to user-generated news sites like Digg.com offer better services than anything created by a smaller group could do.

Of course, we now know that simply isn’t true. For one thing, Wikipedia isn’t written and edited by the “crowd” at all. In fact, 1% of Wikipedia users are responsible for half of the site’s edits. Even Wikipedia’s founder, Jimmy Wales, has been quoted as saying that the site is really written by a community, “a dedicated group of a few hundred volunteers.”


Still, there [has] yet to be a perfect solution to the problem. Perhaps it’s time we give up the idea that the “wisdom of the crowds” was ever a driving force behind any socialized, user-generated anything and realize that, just like in life, there will always be active participants as well as the passive passerbys.

I have never quite liked the notion of “wisdom of crowds”, and the hype behind it even less, so I’m glad to see signs that the hype cycle is finally starting to wind down.  However, by having to confront exactly what it was that I didn’t like about the notion, I was intellectually forced to propose an alternative: Explicit Collaboration in Search.  As I wrote half a year ago: Continue reading

Posted in Collaborative Information Seeking, Information Retrieval Foundations | 2 Comments

Fast Flip: Is Bing Affecting Google?

…via Ask and SearchMe, that is?  Let me explain.  Google introduced a new bit of interface design into its News search results today: Fast Flip:

Google Fast Flip is a web application that lets users…”flip” through pages online as quickly as flipping through a magazine…We capture images of the articles on our partners’ websites and then display them in an easy-to-read way…Readers can flip through stories quickly by simply pressing the left- and right-arrow keys until they find one that catches their interest. Clicking on the story takes them directly to the publisher’s website.

Funny, it reminds me a lot of Searchme.com (see this writeup by Danny Sullivan) from 2008, which itself was largely a continuation of Ask’s visual previews (binoculars) from 2006.  Funny thing is, visual search interfaces such as these have been pretty universally panned for quite some time now.  And panned by Google as well, if I remember correctly — I’m fairly sure I read something fairly official about it, though darned if I can find that post because Google’s search doesn’t allow “sort by least recent” relevant results, only “sort by most recent”.  Personally, I love interfaces like this and find them much easier to deal with.  But Google disagrees, and has (presumably) done all sorts of A/B testing to conclude that users don’t want to see their search results visually.  Because otherwise they would have rolled out these changes years ago, at the same time as, if not ahead of, Ask and SearchMe.  Right?

Or are Bing’s innovations in the interface domain finally spurring Google on, finally providing the competition to improve search that A/B testing cannot?  Continue reading

Posted in General, Information Retrieval Foundations | 1 Comment

Time to Eat My Words: The Search Box Grows

Half a year ago I wrote a blogpost about an easy change that Google could make to its interface, one that would sacrifice only the least bit of simplicity while enticing and encouraging the user to enter longer queries, thus improving retrieval effectiveness.  In particular, I wrote:

So even though research has found that longer queries lead to more satisfied users, and that larger query input boxes lead to longer queries, Google is unable to take an evolutionary step in that direction.  That step violates their current locally-maximum hill principle of simplicity.  They seem fundamentally incapable of passing through the valley of complexity to reach an even higher effectiveness peak because evolutionary thinking does not allow them to take that large leap necessary.  They can only follow their current gradient.  In ten years of using Google, I don’t think that I have ever seen, even for brief experimental time periods, a query input box that was taller than one line.  Thus, evolutionary thinking conflicts with long-term goals.

Well, it’s time for me to eat those words.  For today, the search box grew in size.  From the official Google blog: Continue reading

Posted in Information Retrieval Foundations | 4 Comments

Apple 2009 = ISMIR 2000

If you haven’t heard (probably not likely), Apple announced a number of upgrades to its iPod/iTunes product line today.  It is interesting to me because I see more and more Music Information Retrieval making it into consumer products. Genius added smart playlisting a year or two ago (though from the reviews it doesn’t perform too well). And just today, the new iPod nano with a built-in FM receiver allows you to mark a song that is currently playing on the radio, and iTunes will identify that song for you the next time you sync.

Granted, this musical audio fingerprinting is one of the oldest forms of Music Information Retrieval out there.  Fraunhofer had a working version at ISMIR 2000, nine years ago, and Avery Wang (the fellow behind Shazam) had also finished an implementation around that time.  Countless implementations have been done by numerous companies since then.  Apple isn’t even doing the “hard” version of the problem, i.e. in a crowded, noisy bar with lots of audio interference.  They’re instead using the pure radio signal.  This new Nano has a built-in microphone; you’d think that they’d take advantage of that to do the noisy pub song ID thing, too.  Nevertheless, it’s nice to see more Music IR work being integrated into consumer products.

If you’re interested in research on creating better playlists, and doing all sorts of interesting search and exploration of music information, check out this year’s ISMIR in Kobe, Japan. It will be the 10th one.  Amazing!

Posted in Music IR | 5 Comments

Google’s Long Term Goals: More of the Same

A few days ago there was a TechCrunch interview with Google’s Eric Schmidt. Here’s the bit that struck me:

[TC] The long term goal of Google search, he says, is to give the user one exactly right answer to a query:

[Schmidt] So I don’t know how to characterize the next 10 years except to say that we’ll get to the point – the long-term goal is to be able to give you one answer, which is exactly the right answer over time.

The one answer?  Are you kidding me?  That’s Google’s long term goal?  That’s the extent of their imagination when it comes to information seeking?  Granted, even that narrow problem isn’t solved yet, so I’m not against more work being done in that area.  But where is the love for Exploratory Search?  Search is so many things, including but not limited to learning, comparing and contrasting, synthesizing, discovery, planning and forecasting, and so on.

All of these information needs go beyond finding the one answer.  Sometimes there is no one answer, and the goal of the search is the discovery that multiple answers exist.  Sometimes you don’t even know what question to ask, and the goal of the search is the accretion of enough knowledge so as to be able to ask the right questions in the first place.  (See Belkin’s ASK model of information seeking, in which it is explicitly acknowledged that a single text is likely not sufficient for satisfying a user’s anomalous state of knowledge.)

Yet Google’s long term goals do not include supporting anything other than finding “the one right answer”?  Unbelievable.

Daniel Tunkelang read this same interview and sees a bit more hope than I do.  He sees some of Schmidt’s comments as suggestive of an increased willingness to engage in HCIR, which itself is more exploratory by nature.  I don’t see it, though.  The CEO of the corporation has spoken clearly: “The long-term goal is to be able to give you one answer.”

That sounds no different from what Google is trying to do right now.  The long-term goal does not include any growth into or acknowledgment of other forms of user information seeking behavior.  Rather, it is more of the same of what they already do now.  Just piled higher and deeper.


Posted in General | 4 Comments

My Luddite Summer

I came across an interesting post today entitled “My Luddite Summer” by NYT blogger Timothy Egan.  Just wanted to share this tidbit:

Came back to the city. Took part in a three-day experiment with other writers to see who was better informed — readers of newspapers only, or readers of the Web who had to stay away from aggregator sites built on newspaper stories.  The dead-tree team kicked a little digital butt, in my humble opinion, during Slate’s first annual News Junkie Smackdown. Didn’t matter. All the things we thought were important for a democracy to stay informed — Boeing’s inability to get its latest plane off the ground, Hillary Clinton’s overseas tour, Nancy Pelosi on her health care bottom line — merited not even a mention among the Webbies.

No commentary today.  I’ll just let that one stand on its own.

Posted in General | Leave a comment

Breadth Destroys Depth

A few days ago I posted a question about why modern web retrieval systems offer no explicit relevance feedback mechanisms.  I wonder if it has anything to do with the following attitude, explained by one of my favorite bloggers, Nick Carr:

The problem with the Web, as I see it, is that it imposes, with its imperialistic iron fist, the “ecstatic surfing” behavior on everything and to the exclusion of other modes of experience (not just for how we listen to music, but for how we interact with all media once they’ve been digitized). In the pre-Web world, we not only enjoyed the thrill of the overnight sensation – the 45 that became the center of your waking hours for a week only to be replaced by the new song – but also the deeper thrill of the favorite band in whose work we deeply immersed ourselves, often following its progression over many records and many years. Continue reading

Posted in Exploratory Search, Information Retrieval Foundations, Social Implications | 2 Comments