What sort of information retrieval system would you build if you knew that all the users of your system would be expert or highly motivated amateur searchers? What sort of system would you build when you have a very large collection of unstructured information, and the goal in searching that information is not to find one document (e.g. navigate to a home page), but to find (a) relationships between documents, or (b) large sets of documents that all pertain to a single topic? How would your algorithms be different? How would your interfaces be different? How would the process itself (that middle layer in between algorithms and interfaces) be different?
Prompted by Daniel Tunkelang’s recent post, I think that government information might be a perfect domain in which to ask (and answer) these sorts of questions. The U.S. Open Government Initiative has as its goal the release of loads of raw government data for use by any individual or organization. How are people going to use this data? What types of questions will they ask? What types of questions could they ask if given the proper tools (i.e. what might they not know they want to ask until it becomes possible)?
Two types of information retrieval might be perfect for this domain: Exploratory Search and (Explicitly) Collaborative Search. In exploratory search, the goal of your information seeking is to learn, discover, compare, contrast, etc. In explicitly collaborative search, your goal is similar, but you pursue it with a set of like-minded partners working with you on the same task or topic. Each partner may have different expertise: one may be an expert in energy policy, another might understand trade and commerce, and another might have experience with the inner workings of Congress and understand how it works on a practical level. If you put all these people together right now, the only way they can work together on a shared task is to search separately and then email each other their results. What if, however, you could design a system that not only mediated between them on an interface level (immediate notification of marked documents and passages, shared highlighting of seen documents, etc.) but mediated between them on an algorithmic level as well? Algorithmic mediation of the collaborative process would mean that the retrieval system itself has a hand in both combining and partitioning the inputs and actions of the search team members, as necessary. The team might then be able to find important, valuable information that none of the searchers could have found working alone.
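To make "combining and partitioning" a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a description of any existing system: the searcher names and document IDs are made up, reciprocal rank fusion is swapped in as one simple way to combine each partner's ranked results, and a round-robin deal is one simple way to partition the documents nobody has looked at yet.

```python
from collections import defaultdict


def fuse_rankings(rankings, k=60):
    """Combine each searcher's ranked list using reciprocal rank fusion.

    `rankings` maps a (hypothetical) searcher name to a list of document
    IDs, best first. `k` is the usual RRF smoothing constant.
    """
    scores = defaultdict(float)
    for ranked_docs in rankings.values():
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def partition_unseen(fused, seen, searchers):
    """Deal documents nobody has seen yet to the searchers, round-robin."""
    assignments = {name: [] for name in searchers}
    unseen = [doc_id for doc_id in fused if doc_id not in seen]
    for i, doc_id in enumerate(unseen):
        assignments[searchers[i % len(searchers)]].append(doc_id)
    return assignments


# Made-up example: two partners, each with results from their own queries.
rankings = {
    "energy_expert": ["d1", "d2", "d3"],
    "trade_expert": ["d2", "d4", "d1"],
}
fused = fuse_rankings(rankings)
assignments = partition_unseen(fused, seen={"d1"}, searchers=list(rankings))
print(fused)
print(assignments)
```

A real mediating system would presumably do something smarter on both sides, for instance weighting each partner's input by expertise, or routing documents to the partner best suited to judge them rather than dealing them out evenly. The point of the sketch is only that the system, rather than email, sits between the searchers.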
It seems like an interesting domain, and one with real, potentially quite important consequences and societal implications. It will be interesting to watch as this develops.