HOW IS THE INTERNET CHANGING THE WAY YOU THINK?
As an Information Retrieval research scientist, I was, of course, quite interested in what the search folks had to say. I found this blurb from Marissa Mayer intriguing:
It’s not what you know, it’s what you can find out. The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people. The Internet empowers better decision-making and a more efficient use of time…
The Web has also enabled amazing dynamic visualizations, where an ideal presentation of information is constructed — a table of comparisons or a data-enhanced map, for example. These visualizations — be it news from around the world displayed on a globe or a sortable table of airfares — can greatly enhance our understanding of the world or our sense of opportunity. We can understand in an instant what would have taken months to create just a few short years ago. Yet, the Internet’s lack of structure means that it is not possible to construct these types of visualizations over any or all data. To achieve true automated, general understanding and visualization, we will need much better machine learning, entity extraction, and semantics capable of operating at vast scale.
It sounds like there is an increased awareness of (and respect for) Exploratory Search. I’ve heard this via private channels, but this is the first time I’ve seen an acknowledgment of the need for more exploratory search from such an official channel.
I do want to point out, however, that to make this work at web scale we will need more than better automated methods. That is, we cannot rely solely on machine learning, entity extraction, or web-scale semantics. What is also desperately needed is a way for the user him- or herself to inject personal semantics and structure into the search, visualization, and comparison process. The search engine itself needs to respond to the structure the user gives it and rearrange itself around that information.
I'm afraid I am not being very clear about the vision I'm trying to lay out, so let me draw an analogy to parametric and non-parametric statistical modeling. In parametric modeling, you assume that your data is distributed according to some fixed functional form (say, a Gaussian) and then try to find the parameters of that form that best fit the data. With non-parametric modeling, on the other hand, you make no such assumption. You simply let the data describe itself through its own correlations and patterns.
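To make the analogy concrete, here is a minimal Python sketch (standard library only) that fits the same toy data both ways. The two-cluster data set and the kernel bandwidth of 0.5 are illustrative assumptions, not anything taken from Mayer's piece. The parametric fit assumes a single Gaussian and estimates only its mean and standard deviation; the non-parametric fit is a Gaussian kernel density estimate, in which every data point contributes its own small bump.

```python
import math
import statistics

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit_parametric(data):
    """Parametric: assume one Gaussian and estimate its two parameters (MLE)."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)  # population std dev is the maximum-likelihood estimate
    return lambda x: gaussian_pdf(x, mu, sigma)

def fit_nonparametric(data, bandwidth=0.5):
    """Non-parametric: kernel density estimate; the data itself defines the shape."""
    n = len(data)
    return lambda x: sum(gaussian_pdf(x, xi, bandwidth) for xi in data) / n

# Illustrative data: two well-separated clusters that a single Gaussian cannot represent.
data = [-3.1, -2.9, -3.0, -2.8, -3.2, 2.9, 3.0, 3.1, 2.8, 3.2]

parametric = fit_parametric(data)
nonparametric = fit_nonparametric(data)

# Compare the two estimates at the left cluster, the empty middle, and the right cluster.
for x in (-3.0, 0.0, 3.0):
    print(f"x={x:+.1f}  gaussian={parametric(x):.3f}  kde={nonparametric(x):.3f}")
```

Because the data sit in two clusters, the assumed Gaussian places its peak at x = 0, where there is no data at all, while the kernel estimate peaks near the clusters themselves. Kernel methods still involve a few choices, such as the bandwidth, but they do not fix the shape of the distribution in advance.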
By analogy: assuming that the only way to visualize and compare information on the web (that is, to do exploratory search) is to rely on machine learning for entity extraction and web-scale semantics is like assuming that one has to have a parametric model. It helps, but it is not absolutely necessary. My vision is for another approach, one analogous to non-parametric methods: let the user give feedback on the relationships between items that he or she has examined during the search process, and then use that comparison information to build a personalized visualization or comparison tool for that user's specific information need, from the ground up. Don't rely on the parametric form of semantic categories or named entities. Use bottom-up patterns to facilitate organization and comparison, discovery and learning, decision making and exploration. More importantly, use the feedback provided by the user (e.g. "these two items are similar" and "these two items are not") to drive your online, bottom-up exploration.
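One way that kind of pairwise feedback could drive bottom-up organization is sketched below in Python. It is a minimal illustration under assumptions of my own: items are plain string IDs (`item_a` and the rest are hypothetical), "similar" judgments are treated as must-link constraints merged with a union-find structure, and "not similar" judgments are treated as cannot-link constraints that are only checked for conflicts. No categories or extracted entities are involved; the groups come entirely from the user's own judgments.

```python
from collections import defaultdict

class UnionFind:
    """Disjoint-set structure used to merge items the user says are similar."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def group_from_feedback(items, similar, dissimilar):
    """Build groups bottom-up from pairwise user judgments.

    items:      every item the user has examined
    similar:    pairs the user marked "these two items are similar"
    dissimilar: pairs the user marked "these two items are not"
    Returns (groups, conflicts): groups is a sorted list of sorted lists;
    conflicts lists "not similar" pairs that similarity links forced together.
    """
    uf = UnionFind()
    for item in items:
        uf.find(item)
    for a, b in similar:
        uf.union(a, b)
    conflicts = [(a, b) for a, b in dissimilar if uf.find(a) == uf.find(b)]
    buckets = defaultdict(list)
    for item in items:
        buckets[uf.find(item)].append(item)
    groups = sorted(sorted(g) for g in buckets.values())
    return groups, conflicts

# Hypothetical feedback collected during one exploratory session.
items = ["item_a", "item_b", "item_c", "item_d"]
similar = [("item_a", "item_c")]
dissimilar = [("item_a", "item_b")]
groups, conflicts = group_from_feedback(items, similar, dissimilar)
print(groups)     # → [['item_a', 'item_c'], ['item_b'], ['item_d']]
print(conflicts)  # → []
```

A real system would need to handle conflicts more gracefully (for example, by asking the user to resolve them) and would likely blend these judgments with content-based similarity; the sketch only shows the feedback-first, bottom-up shape of the idea.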
We have to get away from this attempt to solve the exploration problem ahead of time, off-line, before the user has ever issued a query. That’s the parametric way of thinking, the way that presumes that categories and labels and entities are the best way of tackling organization and discovery. Rather, we have to become better at involving the user, the person doing the exploration, in the feedback loop, and not rely solely on pre-computed, machine-learning-extracted entities.
Unlike navigational search, in which users are rarely willing to do any extra work themselves, users engaged in exploratory search by their very nature want to interact more with the system and put more of their own sweat and tears into the search process. They would not be exploring if they weren't. So why not make use of this willingness?
Computational resources are going to be a challenge. But that's where Google's new commitment to openness (and Yahoo!'s existing, earlier commitment) comes in handy. There should be a willingness to offload some of the computation (and therefore also the search data itself) to the user's own computer. Instead of SETI@Home, we could have SEARCH@Home: let the user's underutilized processing power take partial responsibility for computing the bottom-up patterns in his or her own search data that will help make dynamic visualization a reality.
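As a rough illustration of the kind of bottom-up pattern a SEARCH@Home client could compute locally, the Python sketch below scores how similar the documents a user has examined are to one another, using term-count vectors and cosine similarity, without sending anything to a server. The function names, the simple tokenizer, the sample documents, and the 0.3 threshold are all hypothetical choices for illustration, not part of any existing search API.

```python
import math
import re
from collections import Counter
from itertools import combinations

def term_vector(text):
    """Bag-of-words term counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def local_similarity_pairs(examined, threshold=0.3):
    """Compute, on the user's own machine, which examined documents resemble each other.

    examined:  dict mapping a document id to its text (the user's own search data)
    threshold: hypothetical cutoff below which pairs are ignored
    Returns (id_a, id_b, score) tuples at or above the threshold, highest score first.
    """
    vectors = {doc_id: term_vector(text) for doc_id, text in examined.items()}
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])

# Hypothetical pages examined during one session.
examined = {
    "d1": "cheap flights to boston",
    "d2": "cheap boston flights",
    "d3": "gaussian kernel density",
}
for a, b, score in local_similarity_pairs(examined):
    print(f"{a} ~ {b}: {score:.2f}")
```

Pairs found this way could seed groupings that the user then confirms or corrects, keeping both the data and the computation on the user's own machine.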
Mayer is correct: “The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people.” We should be developing systems that enable users to construct the right search. The user should be able to rely on her own resourcefulness to mash up and explore the data herself, and to shed light on patterns of information hitherto unknowable through single-line, input-box navigational search. Users should be able to apply critical thinking to their search process in a way that makes sense to them, not in a way that has been pre-computed through some semantic category and machine learning classifier. And a good search engine should be a valuable partner in this process by way of flexibility and openness, not by way of constraint and closedness.
Only then will we, the users of these systems, be able to find out what we previously could not find out. At least, that is how the Internet is changing the way I think.