A New York Review of Books article by Kasparov about chess and the relationship between humans, machines, and decision processes is making the rounds on Twitter today. I don’t have time at the moment to write a long comment about it, but I do want to point out that it supports a position that I’ve been taking on this blog for some time now:
This experiment goes unmentioned by Rasskin-Gutman, a major omission since it relates so closely to his subject. Even more notable was how the advanced chess experiment continued. In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. Normally, “anti-cheating” algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less “intelligent” than the playing programs they detect.)
Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.
The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.
This result seems awfully similar to some of the other results I’ve reported on in the past. For example, see this paper by Amatriain:
Data is always important, but what struck me in the writeup was his discovery that the biggest advances came not from the accumulation of massive amounts of data, log files, clicks, etc. Rather, while dozens and dozens of researchers around the world were struggling to reach that coveted 10% improvement by eking out every last drop of value from large data-only methods, Amatriain comparatively easily blew past that ceiling and hit 14%. How? He simply asked users to denoise their existing data by rerating a few items. In short, Amatriain resorted to HCIR:
See also Tessa Lau’s post about how good interaction design trumps smart algorithms:
I come to the field of HCI via a background in AI, having learned the hard way that good interaction design trumps smart algorithms in the quest to deploy software that has an impact on millions of users. Currently a researcher at IBM’s Almaden Research Center, I lead a team that is exploring new ways of capturing and sharing knowledge about how people interact with the web. We conduct HCI research in designing and developing new interaction paradigms for end-user programming.
The theme that I see is that, while big data approaches do work well, what works even better is a small amount of user interaction. With big data methods (even ones that incorporate human interaction in the form of massive log data) all you can do is make inferences about what is good and what is not good. The more historical user data you have, the more correct your inference about the current scenario is likely to be. But none of it is as correct as receiving explicit feedback from the user, and turning a probability into a certainty.
And that’s where I see good interaction design coming into play. By turning a probability into a certainty, your back-end algorithms can stop spending their CPU cycles on the inferential heavy lifting of guessing what the user is actually trying to say or do, and can start using those cycles to explore a wider range of consequences of that informational certainty.
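To make the "probability into a certainty" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption of mine rather than anything taken from the articles above: the function names (`infer_intent`, `apply_explicit_feedback`, `entropy_bits`), the add-one smoothing, and the toy "jaguar" query log are all hypothetical. It only shows the shape of the argument: inference from historical logs leaves residual uncertainty, while one explicit answer from the user removes it.

```python
import math
from collections import Counter


def infer_intent(history, candidates, smoothing=1.0):
    """Estimate P(intent) from historical log data, using additive smoothing.

    `history` is a list of past observed intents (a hypothetical click log).
    """
    counts = Counter(history)
    total = sum(counts[c] for c in candidates) + smoothing * len(candidates)
    return {c: (counts[c] + smoothing) / total for c in candidates}


def apply_explicit_feedback(candidates, chosen):
    """The user states the intent directly, so the distribution collapses."""
    if chosen not in candidates:
        raise ValueError(f"unknown intent: {chosen!r}")
    return {c: 1.0 if c == chosen else 0.0 for c in candidates}


def entropy_bits(dist):
    """Shannon entropy in bits: how much uncertainty is left in `dist`."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)


# Toy data (made up): what past users meant when they typed "jaguar".
candidates = ["jaguar car", "jaguar animal", "jaguar os"]
history = ["jaguar car"] * 6 + ["jaguar animal"] * 3 + ["jaguar os"] * 1

inferred = infer_intent(history, candidates)
confirmed = apply_explicit_feedback(candidates, "jaguar animal")

print("inferred from logs:", inferred, "entropy:", entropy_bits(inferred))
print("after one question:", confirmed, "entropy:", entropy_bits(confirmed))
```

In this toy setup the log-based guess would have picked the car, and it would have been wrong for this particular user. More log data would sharpen the population-level estimate, but only the explicit answer drives the remaining uncertainty to zero, which is the point at which the back end can spend its effort on what to do with that answer instead of on guessing it.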