Last March I pointed out a short piece by Tessa Lau about how good interaction design trumps smart algorithms. Today I have a followup. In particular, Xavier Amatriain has a good writeup of the recently concluded Netflix contest. The lessons he draws from going through the process concern the importance of good evaluation metrics, the effect of elapsed time, matrix factorization, algorithm combination, and the value of data.
Data is always important, but what struck me in the writeup was his discovery that the biggest advances came not from the accumulation of massive amounts of data: log files, clicks, and so on. Rather, while dozens and dozens of researchers around the world were struggling to reach that coveted 10% improvement by eking out every last drop of value from data-only methods, Amatriain blew past that ceiling with comparative ease and hit 14%.
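For context, both figures are relative reductions in RMSE, the contest's evaluation metric. A quick sketch of the arithmetic in Python (the Cinematch baseline figure is from memory, so treat it as illustrative):

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Cinematch, Netflix's own system, scored an RMSE of roughly 0.9525 on the
# contest's test set; the grand prize required a 10% relative reduction.
baseline = 0.9525
target = baseline * (1 - 0.10)
print(f"10% relative improvement means RMSE <= {target:.4f}")  # about 0.8572
```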
How? He simply asked users to denoise their existing data by re-rating a few items. In short, Amatriain resorted to HCIR:
Well, that is exactly what we have done in a paper that has been accepted for publication in Recsys09 (NY). The paper is entitled “Rate It Again” and you can access a preprint copy here. The basic idea in our approach is to ask users to re-rate items that they already rated in the past. We can then denoise ratings that prove to be inconsistent by minimizing their contribution to the recommendation process…We measured relative improvement in terms of RMSE up to 14% and we verified that this is consistent regardless of the particular recommendation algorithm (item and user-based CF, SVD, etc…).
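I haven't reimplemented their method, but the gist is easy to sketch. Here is a minimal Python illustration; the averaging, tolerance, and down-weighting scheme are my own stand-ins, not the paper's exact formulation:

```python
def denoise_ratings(original, rerated, tolerance=1, noisy_weight=0.2):
    """Down-weight ratings that prove inconsistent when re-rated.

    original, rerated: dicts mapping (user, item) -> rating on a 1-5 scale.
    Returns (user, item) -> (rating, weight); inconsistent ratings get a
    reduced weight so they contribute less to the recommender.
    """
    weighted = {}
    for key, r1 in original.items():
        r2 = rerated.get(key)
        if r2 is None:                     # never re-rated: keep as-is
            weighted[key] = (r1, 1.0)
        elif abs(r1 - r2) <= tolerance:    # consistent: trust it fully
            weighted[key] = ((r1 + r2) / 2, 1.0)
        else:                              # inconsistent: minimize its contribution
            weighted[key] = ((r1 + r2) / 2, noisy_weight)
    return weighted
```

Any of the algorithms they mention (user- and item-based CF, SVD) can then fold these weights into its rating aggregation.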
Amatriain chalks this up as another victory for data-driven methods. And it's true: collecting a little more data yielded vast improvements. But my interpretation of Amatriain's work is not that of typical data-driven methodology. This isn't the collection of mass- or web-scale amounts of unlabeled data, as recommended by Halevy, Norvig, and Pereira. This is interactive, small-scale, dialogue-oriented data. This is a small amount of human intelligence injected back into an otherwise faceless algorithm. This is what HCIR is all about:
Marchionini’s main thesis is that “HCIR aims to empower people to explore large-scale information bases but demands that people also take responsibility for this control by expending cognitive and physical energy.”
How best to elicit this interaction from users, convincing them of its value while minimizing the amount of effort (or the perceived amount of effort) required of them, is precisely the kind of interaction design Lau writes about above. Much more work needs to be done in this area. But I do find it interesting that this work shows that putting users back into the process yields much larger improvements than the combination of hundreds of purely data-driven algorithms.
See also my previous posts:
This is actually why I went to work for SpeechWorks. I'd just seen Mike Phillips, CTO and co-founder, give a talk at ASRU emphasizing that spoken dialogue systems present mainly an HCI problem, not an electrical engineering problem. It was a breath of fresh air compared to EE-heavy Bell Labs research (being in the Rockies rather than NYC may also have helped).
But we have to be careful. The feature most highly correlated with task completion rate is speech recognition accuracy. You might think that'd motivate a garage full of electrical engineers toiling away on accuracy. Well, it does. But it also motivates a team of user interface designers toiling away on dialogue strategies: strategies that elicit understandable responses from users, present data back to them in an understandable way, and deal with the inevitable recognition errors.
For instance, asking for a zip code before an address is a huge help. Zip codes are five digits, which are relatively easy to recognize. Then comes the state, then the city. The state cross-checks the zip, and together they restrict the possible cities, as sketched below. The open research problem is to let callers just say their whole address and transcribe it in one shot; better speech recognition alone hasn't solved that.
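Here's a toy version of that constraint propagation in Python; the zip-to-place table and the dialogue interface are made up, just to show the idea:

```python
# Hypothetical lookup table; a real system would use a full postal database.
ZIP_DB = {
    "80302": ("CO", {"Boulder"}),
    "07974": ("NJ", {"New Providence", "Murray Hill"}),
}

def constrain_city_vocabulary(zip_hyp, state_hyp):
    """Cross-check the recognized zip against the recognized state, then
    return the restricted city vocabulary for the next recognition step."""
    entry = ZIP_DB.get(zip_hyp)
    if entry is None:
        return None          # unknown zip: re-prompt the caller
    state, cities = entry
    if state != state_hyp:
        return None          # zip and state disagree: one was misrecognized
    return cities            # the recognizer now picks among a handful of cities

print(constrain_city_vocabulary("80302", "CO"))  # {'Boulder'}
```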
Design built into the data itself also helps. Credit card numbers carry a checksum, which can be used to aid recognition. Who knows how many errors that has saved, and not just in speech reco.
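The checksum in question is the Luhn algorithm; a recognizer can use it to throw out invalid hypotheses from its n-best list. A minimal sketch (the candidate numbers are made up):

```python
def luhn_ok(number: str) -> bool:
    """Standard Luhn checksum used by credit card numbers."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Keep only the checksum-valid hypotheses from a recognizer's n-best list.
nbest = ["4539148803436467", "4539148803436461"]
print([n for n in nbest if luhn_ok(n)])  # ['4539148803436467']
```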
Bob: Good points. You're saying, if I understand correctly, that how you phrase and sequence the questions you ask can actually make the algorithm's job much easier?