The web is abuzz this week with talk of the Google Books Ngram Viewer. It’s a great tool, and leads to some very interesting exploration and trend visualization. So does this tool fly in the face of my rant from a few days ago, about how Google’s improvements to search are all automated improvements, with no opportunity for the user to learn and grow?
The first problem is that because (most of) those 550 changes happen while the users are still “asleep”, users don’t actually notice them. Google doesn’t exactly go out of its way to make many of its search improvements visible to the user, and so it’s often difficult to tell whether or not something has happened. As a user, I personally don’t like that approach, because a change that is invisible or purposely hidden is a change that I as a user have no control over, and am not able to change back or alter further. And as I argued in an earlier post, the way to create passionate search users is not to give them luxury seats without waking them up. Instead, the way to create passionate search users is to give them search tools that offer a path along which they can grow, improve, and get better at searching. Do users get better at flying, or at seeing and comprehending an information landscape from 30,000 feet, if they’ve got luxury chairs? Arguably not. If anything, the luxury chairs make it harder for users to sit upright, to have a “leaning forward”, engaged experience. Users are less inclined, pun intended, to be active participants in the experience. All the decisions are being made for them.
On the surface, it would appear that users now have such a tool, a way to explore, compare, and learn. A way to lean forward at the edge of their seats (rather than lean back, asleep, in luxury chairs) and make search work for them, rather than the other way around. However, the problem is that this tool is still not connected back into an actual search. One can visualize trends, but one cannot actually find books that best exemplify these trends.
Take for example the Ngram Viewer query [science, religion]. Over the past two centuries, the relative frequency of the word religion has decreased, and that of the word science has increased. Around the 1930s, the two lines cross. Now, as a user, what I would like to be able to do is run an actual Google Books query that looks something like this: [find books that contain both science and religion, and rank in ascending order by absoluteValue(1 – science/religion)]. Put clearly, books in which the number of occurrences of science is about equal to the number of occurrences of religion should be ranked at the top of the results list. What this does is let me not only visualize the trend, but also find the best (most relevant!) books that exemplify what I think is the most interesting aspect of the trend: the point at which the lines cross.
And to be clear: I don’t just want the list of books from that decade. I want the books from any decade, but in which the relative frequencies of the terms [science] and [religion] are at their closest. The most relevant book might be from 1931. The next most relevant from 1942, and the third most relevant from 1919. I don’t know. But this is more than just selecting books by decade. It is selecting books that best exemplify a pattern that I am seeing in the data.
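To make that ranking concrete, here is a minimal sketch of the scoring I have in mind. It is only an illustration: the function names and the tiny `toy_books` collection are made up for this example, and nothing here reflects how Google Books actually indexes or ranks anything. It assumes we have the full text of each book and counts whole-word matches.

```python
import re
from collections import Counter


def term_counts(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {term: counts[term] for term in terms}


def balance_score(text, a, b):
    """Return abs(1 - count(a)/count(b)), or None if either term is missing."""
    counts = term_counts(text, (a, b))
    if counts[a] == 0 or counts[b] == 0:
        return None  # the query requires books that contain BOTH terms
    return abs(1 - counts[a] / counts[b])


def rank_by_balance(books, a, b):
    """Rank titles in ascending order of score: most balanced books first."""
    scored = []
    for title, text in books.items():
        score = balance_score(text, a, b)
        if score is not None:
            scored.append((score, title))
    return [title for score, title in sorted(scored)]


# Hypothetical toy collection, standing in for real book texts.
toy_books = {
    "Book A": "science science religion religion",
    "Book B": "science science science religion",
    "Book C": "religion religion religion science science",
    "Book D": "science and nothing else",
}

ranking = rank_by_balance(toy_books, "science", "religion")
print(ranking)
```

One thing the sketch makes visible is that absoluteValue(1 – science/religion) is not symmetric: a book with twice as much science as religion scores 1, while a book with twice as much religion as science scores 0.5. A real implementation might prefer a symmetric measure, but the idea is the same. Note also that publication year plays no part in the score at all, which is exactly the point.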
As another example, check out the Ngram Viewer query [car, pollution]. Generally, car seems to diverge from pollution, even though they both trend upwards after 1960. During the mid 1960s to mid 1970s, their divergence narrowed. Something happened then. What was it? Was it the beginnings of the environmental movement, maybe? I don’t know. What I would like to do is be able to see some of the actual books. So again, I want to query Google Books for books that contain both car and pollution, and again rank by the narrowness of the gap between the two term frequencies, but only for books published after 1920. There is a narrow gap in the 1800s, but only because cars were not as prevalent. So I am not interested in those books; I’m only interested in the modern era books. For a different query, I might prefer the older books, instead. Either way, I should be able to specify it.
And again, this is more than just selecting books by decade. It is about selecting books by the narrowness of the gap. The gap narrows once in the mid 1960s, when the relative frequency of [car] dips. And it narrows again in the mid 1970s, when the relative frequency of [pollution] increases. So the most relevant books might be, in order, from 1964, 1973, 1966, etc. Not just from any one decade.
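Again as a rough sketch, the [car, pollution] query could look something like the following. The `Book` record, the `rank_by_gap` function, its `min_year` parameter, and the sample numbers are all hypothetical, invented for illustration. I am assuming each book comes with a publication year, a total word count, and per-term counts, and I am measuring the gap as the absolute difference between the two relative frequencies.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    year: int
    total_words: int
    counts: dict  # term -> number of occurrences in this book


def gap(book, a, b):
    """Absolute difference between the relative frequencies of a and b."""
    freq_a = book.counts.get(a, 0) / book.total_words
    freq_b = book.counts.get(b, 0) / book.total_words
    return abs(freq_a - freq_b)


def rank_by_gap(books, a, b, min_year=None):
    """Keep books containing both terms (and published after min_year,
    if given), then sort by gap, narrowest first."""
    eligible = [
        book for book in books
        if book.counts.get(a, 0) > 0
        and book.counts.get(b, 0) > 0
        and (min_year is None or book.year > min_year)
    ]
    return sorted(eligible, key=lambda book: gap(book, a, b))


# Made-up sample data; the titles and counts are not real books.
sample_books = [
    Book("Toy 1890", 1890, 1000, {"car": 1, "pollution": 1}),
    Book("Toy 1950", 1950, 1000, {"car": 30, "pollution": 2}),
    Book("Toy 1964", 1964, 1000, {"car": 12, "pollution": 10}),
    Book("Toy 1973", 1973, 1000, {"car": 20, "pollution": 15}),
]

modern = [book.title for book in rank_by_gap(sample_books, "car", "pollution", min_year=1920)]
everything = [book.title for book in rank_by_gap(sample_books, "car", "pollution")]
print(modern)
print(everything)
```

The year cutoff is a filter that I, the user, choose to apply; it is not part of the relevance score. Drop `min_year` and the 1800s book with the accidentally narrow gap comes back to the top of the list, which is what I would want for a different query where the older books are the interesting ones.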
This is what I mean by leaning forward, actively engaged users. If that were enabled it would be a great example of exploratory search (see my earlier posts: What you can find out, Universal search is not exploratory search, and “Improving findability” falls short of the mark). But it is not enabled. Not yet.
What Google needs to do is close the loop to give us an exciting form of exploratory search. Book search exists. Book ngram data visualization exists. Now, connect the two and let one select patterns of interest from the visualization and use those patterns as queries into the book collection. Let me, the user, take an active part in determining and creating the query that best exemplifies my information need, the ranking algorithm that best exemplifies what is relevant to me. Don’t just give me a cushier seat by ranking books by their CitationRank or some other universal signal. Let me have a hand in specifying what is interesting, and therefore relevant!
Close the loop.