Google Claim: Make Algorithms Smart through Data, not Complexity

Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira have an article in IEEE Computer magazine entitled “The Unreasonable Effectiveness of Data”.  The article continues a theme that has been running strong within Google circles for the past half decade, namely that training a simple algorithm with larger amounts of data is more effective than having a smart algorithm that tries to generalize or draw inferences from smaller amounts of data:

Continue reading

Posted in Information Retrieval Foundations | 3 Comments

Controversial Views and Web Search

Daniel Tunkelang continues to raise provocative and interesting questions over on his blog.  I would like to point readers to the comments section of a recent post.  In one of my own comments there, I raise a question about ad-supported web search engines (as typified by, though by no means limited to, Google) and their willingness and ability to switch business models.  In particular, I express the following consternation:

Continue reading

Posted in Information Retrieval Foundations, Social Implications | Leave a comment

Media Gatekeepers and Transparency

PBS has an interesting article on the new media gatekeepers and the need for transparency in the process by which they promote media.  Here is an excerpt:

The problem for these new gatekeepers is that they are providing the old editorial functions, but there’s a key difference between the way they operate and the way that movie critics, music reviewers and video store clerks operate: They are making editorial decisions without telling us who they are, what they like and how they are making those decisions. Otherwise, we will be left to wonder, left to come up with our own conspiracy theories, and we will lose trust in these services.

I believe this need for transparency is true not only for Twitter, Apple and YouTube, but for all types of search, including general web search.  Search engines need to get better at explaining why results were retrieved, lest users begin losing trust in those engines, or find themselves ultimately unable to locate the information they desire because they cannot correctly express their information needs.
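To make “explaining why results were retrieved” concrete, here is a minimal sketch of what such an explanation could look like: a toy TF-IDF ranker that returns, alongside each score, the per-term contributions that produced it. The corpus, query, and weighting are all invented for illustration; a production engine obviously combines far more signals than this.

```python
# Sketch: attach a per-term "why did this match?" breakdown to each result.
# Toy TF-IDF scoring over an in-memory corpus; purely illustrative.
import math
from collections import Counter

docs = {
    "d1": "exploratory search interfaces for music discovery",
    "d2": "collaborative filtering for music recommendation",
    "d3": "transparency in search result ranking",
}

def idf(term):
    df = sum(term in text.split() for text in docs.values())
    return math.log((1 + len(docs)) / (1 + df)) + 1

def explain(query):
    for doc_id, text in docs.items():
        tf = Counter(text.split())
        contributions = {t: tf[t] * idf(t) for t in query.split() if tf[t]}
        score = sum(contributions.values())
        if score:
            yield doc_id, score, contributions  # expose *why*, not just the rank

for doc_id, score, why in sorted(explain("music search"), key=lambda r: -r[1]):
    print(doc_id, round(score, 2), {t: round(w, 2) for t, w in why.items()})
```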

Posted in Explanatory Search, Exploratory Search, Social Implications | 1 Comment

Music Retrieval: Algorithms or Explanatory Context?

At SXSW this year, Paul Lamere of The Echo Nest and Anthony Volodkin of Hype Machine engaged in a head-to-head panel about the utility of:

  1. Using computer algorithms (e.g., collaborative filtering, tag-based, or content-based techniques) to automatically recommend music, versus
  2. Using computers to (a) connect people who can directly recommend music to each other and (b) provide contextually relevant information around any shared songs

Perhaps I don’t fully appreciate the subtlety of the conflict, but I find myself wondering: Why can’t you do both?
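As a rough illustration of what “both” could mean in code, here is a toy sketch that blends an algorithmic similarity score with recommendations from people you follow, and keeps the surrounding context attached to each track. Every name, weight, and data structure below is hypothetical; the point is only that the two approaches compose rather than compete.

```python
# Sketch: why not both? Blend an algorithmic score with recommendations
# from people you follow, and carry the explanatory context along.
# All names and weights are made up for illustration.
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    track: str
    algo_score: float = 0.0                            # e.g. from collaborative filtering
    recommenders: list = field(default_factory=list)   # people who shared it with you
    context: str = ""                                   # blog post, review, scene, etc.

    def blended_score(self, w_algo=0.6, w_social=0.4):
        # Cap the social signal at five recommenders so it stays in [0, 1].
        return w_algo * self.algo_score + w_social * min(len(self.recommenders), 5) / 5

recs = [
    Recommendation("Track A", algo_score=0.92,
                   context="similar listeners also played this"),
    Recommendation("Track B", algo_score=0.40,
                   recommenders=["anthony", "paul"],
                   context="shared on a music blog you follow"),
]

for r in sorted(recs, key=lambda r: -r.blended_score()):
    print(f"{r.track}: {r.blended_score():.2f} -- {r.context}")
```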

Continue reading

Posted in Explanatory Search, Exploratory Search, Music IR | Leave a comment

Good Interaction Design Trumps Smart Algorithms

Over on the new CACM blog, researcher Tessa Lau has an interesting post on three common misconceptions that folks have about HCI.  I recommend reading the full article, but I would like to call attention to her provocative opening statement (emphasis mine):

I come to the field of HCI via a background in AI, having learned the hard way that good interaction design trumps smart algorithms in the quest to deploy software that has an impact on millions of users. Currently a researcher at IBM’s Almaden Research Center, I lead a team that is exploring new ways of capturing and sharing knowledge about how people interact with the web.  We conduct HCI research in designing and developing new interaction paradigms for end-user programming.

One of my biggest grievances with web-scale search engines is that they have made the assumption that smart algorithms (or, at least, simple algorithms trained with enough data to be made smart) are more important than good interaction design.

Continue reading

Posted in Information Retrieval Foundations | 4 Comments

Content-Based Audio Search

Long-time Music Information Retrieval researcher Pedro Cano has a new book out, based on his dissertation: “Content-based Audio Search: From Audio Fingerprinting to Semantic Audio Retrieval”.  From the review:

Music search sound engines rely on metadata, mostly human generated, to manage collections of audio assets. Even though time-consuming and error-prone, human labeling is a common practice. Audio content-based methods, algorithms that automatically extract descriptions from audio files, are generally not mature enough to provide the user-friendly representation that users demand when interacting with audio content. This dissertation has two parts. In the first part we explore the strengths and limitations of a pure low-level audio description technique: audio fingerprinting. In the second part, we hypothesize that one of the problems that hinders closing the semantic gap is the lack of intelligence that encodes common sense knowledge, and that such a knowledge base is a primary step toward bridging the semantic gap. We present a sound effects retrieval system which leverages both low-level and semantic technologies.
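For readers unfamiliar with the low-level end of that spectrum, here is a deliberately crude sketch of what an audio fingerprint can look like: hash the strongest spectral peaks of each frame into coarse time/frequency landmarks, then compare sets of landmarks. This is not Cano’s method, just a toy in that spirit; real systems add robust peak picking, peak pairing with time offsets, and an inverted index for matching.

```python
# Sketch: a crude audio fingerprint in the spirit of spectral-peak hashing.
# Illustrative only -- not the book's method.
import numpy as np

def fingerprint(samples, frame=2048, hop=1024, peaks_per_frame=3):
    """Hash the strongest spectral peaks of each frame into a set of landmarks."""
    hashes = set()
    window = np.hanning(frame)
    for i, start in enumerate(range(0, len(samples) - frame, hop)):
        spectrum = np.abs(np.fft.rfft(samples[start:start + frame] * window))
        top_bins = np.argsort(spectrum)[-peaks_per_frame:]   # strongest frequency bins
        for b in top_bins:
            hashes.add((i // 4, int(b) // 8))                 # coarse time/frequency buckets
    return hashes

def similarity(a, b):
    """Jaccard overlap of two fingerprints as a toy matching score."""
    return len(a & b) / max(1, len(a | b))

# Toy usage: a tone and a slightly noisy copy of itself should match well.
t = np.linspace(0, 2.0, 44100)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.05 * np.random.default_rng(0).standard_normal(len(t))
print(similarity(fingerprint(clean), fingerprint(noisy)))
```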

Continue reading

Posted in Music IR | 3 Comments

Evolutionary Thinking and IR Design

Just the other day I observed that Google, by thinking only evolutionarily and being unable to make leap-based changes, long ago fell into a local maximum trap.  The following blog post from a designer who is leaving Google appears to reinforce this conjecture:

When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. Data in your favor? Ok, launch it. Data shows negative effects? Back to the drawing board. And that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.  Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such minuscule design decisions. There are more exciting design problems in this world to tackle.
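The local-maximum trap I have in mind is easy to reproduce in miniature: greedy, metric-driven steps always climb the nearest hill, and a better hill farther away never even gets tested. The landscape and step size below are entirely made up; the sketch is about the shape of the process, not about any particular Google decision.

```python
# Sketch of the local-maximum trap: greedy, measurement-driven steps
# climb the nearest hill, which may not be the highest one.
import numpy as np

def quality(x):
    # A made-up quality landscape: a local peak near x=1, a higher peak near x=4.
    return np.exp(-(x - 1) ** 2) + 2 * np.exp(-(x - 4) ** 2)

def greedy_climb(x, step=0.1, iterations=100):
    for _ in range(iterations):
        candidates = [x - step, x, x + step]   # small evolutionary tweaks only
        x = max(candidates, key=quality)       # keep whichever "tests better"
        # A leap (say, x + 2.5) is never considered, so distant peaks stay invisible.
    return x

end = greedy_climb(0.0)
print(f"converged at x={end:.2f}, quality={quality(end):.2f} "
      f"(the global best sits near x=4, quality 2.0)")
```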

Continue reading

Posted in Information Retrieval Foundations | Leave a comment

Social ?= Collaborative

There is an interesting comment thread happening over on the FXPAL blog, about the differences between social search and collaborative search:

http://palblog.fxpal.com/?p=350#comments

Posted in Collaborative Information Seeking, Social Implications | Leave a comment

Long Term versus Evolutionary Thinking (Part 2 of 2)

Continued from Part 1.

Now that I’ve fully (perhaps too much so) explained the analogy that I will be using, I’d like to ground this discussion in the subject of information retrieval.  And I’ll start with an example that O’Reilly used in his talk: Google. (This is an Information Retrieval blog, after all, and Google was the example that Tim used.)  The company, he says, successfully exhibits both long term and evolutionary thinking.  It takes the long term view through its very mission statement: “To organize the world’s information”.  What could be more long term, more global, than that?  At the same time, Google has a very evolutionary approach in that it starts with simple, elegant solutions and couples them with ongoing user measurements.  If and when changes to Google’s engine are made, they are made based on small evolutionary steps that become apparent through the actions of the user.  It’s a point of pride within the Google organization that every change to the engine is scrupulously measured and A/B tested so as to be able to tell whether the change was better or worse.  The user provides the fitness function, the arrow that points in the uphill direction, toward which the search engine evolves. 
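For concreteness, the kind of measurement that provides that fitness function looks roughly like the sketch below: compare a control and a variant on some user metric and keep whichever tests better. The numbers and the two-proportion z-test are generic illustrations, not a description of Google’s actual experimentation tooling.

```python
# Sketch: the kind of A/B measurement that supplies the "fitness function."
# Generic two-proportion z-test on click-through rates; numbers are hypothetical.
from math import sqrt
from statistics import NormalDist

def ab_test(clicks_a, users_a, clicks_b, users_b):
    """Return the z statistic and two-sided p-value for CTR(A) vs CTR(B)."""
    p_a, p_b = clicks_a / users_a, clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: variant B nudges click-through rate from 10.0% to 10.4%.
z, p = ab_test(clicks_a=10_000, users_a=100_000, clicks_b=10_400, users_b=100_000)
print(f"z={z:.2f}, p={p:.4f}  ->  {'ship B' if p < 0.05 else 'keep A'}")
```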

So the question is, does Google suffer from this conflict between long term and evolutionary thinking?  My contention is that it does.

Continue reading

Posted in General, Information Retrieval Foundations, Social Implications | 8 Comments

Long Term versus Evolutionary Thinking (Part 1 of 2)

Last week I attended the O’Reilly eTech conference.  The first night, Tim O’Reilly gave his annual Radar talk, in which he surveys the technology landscape and comments on upcoming and interesting trends. I have heard this Radar talk for years, via the IT Conversations podcast network, but this was the first time I’d seen it in person. O’Reilly always has challenging, thought-provoking things to say, and this year was no different.  He did, however, mention two emerging trends or patterns that I thought contradicted each other, and I want to specifically comment on those.

Continue reading

Posted in General, Information Retrieval Foundations, Social Implications | 1 Comment