Researcher on Fire

Over the past month and a half, computer science researcher and UQAM professor Daniel Lemire has been on fire.  He has written a series of blog posts on what it means to do research and to be part of a research community.  I’ve thoroughly enjoyed the whole series and want to pass along pointers to his last eight posts:

Collaborative Information Seeking (Ongoing Recap)

Now seems as good a time as any to post a quick recap of the series of collaborative information seeking posts that Gene and I have been writing over on Palblog.  We’re about halfway through the series.

  1. Communicating about Collaboration
  2. Communicating about Collaboration: Intent
  3. Communicating about Collaboration: Synchronization
  4. Social Search
  5. Social Search Redux

I [...]

Google Music China launches

Well, the move comes nine years after I suggested it to ’em, but Google has finally launched a music service, in China:

http://www.wired.com/techbiz/media/news/2009/03/reuters_us_google_china

Now, my only question is whether they have simultaneously been researching and implementing intelligent search algorithms to go with the free music downloads, or whether they have been too busy moving Microsoft Office into [...]

Google Claim: Make Algorithms Smart through Data, not Complexity

Google researchers Alon Halevy, Peter Norvig, and Fernando Pereira have an article in IEEE Intelligent Systems entitled “The Unreasonable Effectiveness of Data”.  The article continues a theme that has run strong in Google circles for the past half decade: that a simple algorithm trained on large amounts of data is more effective than a smart algorithm that tries to generalize or draw inferences from smaller amounts of data:

Controversial Views and Web Search

Daniel Tunkelang continues to raise provocative and interesting questions over on his blog.  I would like to point readers to the comments section of a recent post.  In one of my own comments there, I raise a question about ad-supported web search engines (as typified by, though by no means limited to, Google) and their willingness and ability to switch business models.  In particular, I express the following consternation:

Media Gatekeepers and Transparency

PBS has an interesting article on the new media gatekeepers and the need for transparency in the process by which they promote media.  Here is an excerpt:

The problem for these new gatekeepers is that they are providing the old editorial functions, but there’s a key difference between the way they operate and the [...]

Music Retrieval: Algorithms or Explanatory Context?

At SXSW this year, Paul Lamere of The Echo Nest and Anthony Volodkin of Hype Machine went head to head in a panel debating the merits of:

  1. Using computer algorithms (e.g., collaborative filtering, tag-based, or content-based methods) to recommend music automatically, versus
  2. Using computers to (a) connect people who can directly recommend music to each other and (b) provide contextually relevant information around any shared songs

Perhaps I don’t grasp the full subtlety of the conflict, but I find myself wondering: why can’t you do both?

Good Interaction Design Trumps Smart Algorithms

Over on the new CACM blog, researcher Tessa Lau has an interesting post on three common misconceptions that folks have about HCI.  I recommend reading the full article, but I would like to call attention to her provocative opening statement (emphasis mine):

I come to the field of HCI via a background in AI, having learned the hard way that good interaction design trumps smart algorithms in the quest to deploy software that has an impact on millions of users. Currently a researcher at IBM’s Almaden Research Center, I lead a team that is exploring new ways of capturing and sharing knowledge about how people interact with the web.  We conduct HCI research in designing and developing new interaction paradigms for end-user programming.

One of my biggest grievances with web-scale search engines is that they assume smart algorithms (or, at least, simple algorithms made smart by training on enough data) matter more than good interaction design.

Content-Based Audio Search

Longtime Music Information Retrieval researcher Pedro Cano has a new book out, based on his dissertation: “Content-based Audio Search: From Audio Fingerprinting to Semantic Audio Retrieval”.  From the review:

Music search sound engines rely on metadata, mostly human generated, to manage collections of audio assets. Even though time-consuming and error-prone, human labeling is a common practice. Audio content-based methods, algorithms that automatically extract description from audio files, are generally not mature enough to provide the user friendly representation that users demand when interacting with audio content. This dissertation has two parts. In a first part we explore the strengths and limitation of a pure low-level audio description technique: audio fingerprinting. In the second part, we hypothesize that one of the problems that hinders the closing the semantic gap is the lack of intelligence that encodes common sense knowledge and that such a knowledge base is a primary step toward bridging the semantic gap. We present a sound effects retrieval system which leverages both low-level and semantic technologies.

Evolutionary Thinking and IR Design

Just the other day I argued that Google, by thinking only evolutionarily and being unable to make discontinuous leaps, long ago fell into a local-maximum trap.  The following blog post from a designer who is leaving Google appears to support this conjecture:

When a company is filled with engineers, it turns to engineering to solve problems. Reduce each decision to a simple logic problem. Remove all subjectivity and just look at the data. Data in your favor? Ok, launch it. Data shows negative effects? Back to the drawing board. And that data eventually becomes a crutch for every decision, paralyzing the company and preventing it from making any daring design decisions.  Yes, it’s true that a team at Google couldn’t decide between two blues, so they’re testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can’t operate in an environment like that. I’ve grown tired of debating such miniscule design decisions. There are more exciting design problems in this world to tackle.
