Over the past month and a half, computer science researcher and UQAM Professor Daniel Lemire has been on fire. He’s written a series of blog posts on what it means to do research and be involved with a research community. I’ve thoroughly enjoyed the whole series, and want to pass along pointers to his last 8 posts:
- Are Solo Authors Less Cited?
- The Missing Research Tool
- On Academic Branding
- Are Your Research Papers Telling Original Stories?
- Gardening and Research (I love the analogy here!)
- Research Productivity: Some Paths Less Traveled
- Social Software…Toys or Productivity?
- A Taxonomy of Computer Science Researchers
My favorite line is from the “Original Stories” post:
Life is multidimensional. Research papers should be multidimensional too! We should ask several interesting questions. We should give several nuanced answers. We should expect more from the reviewers and the readers!
One of the things that I learned very early on as a researcher is that evaluation drives innovation. By that I do not mean that getting a 10% improvement on some metric is the key to research. That is of course important, even necessary. Even more important, however, is the metric that you have chosen to carry out the evaluation in the first place. Asking interesting questions means that you have likely been forced to struggle with choosing the appropriate evaluation metric, and in some cases have had to propose a new metric. That choice of metric is itself one of the more interesting questions you deal with in a paper, and the way you approach that metric yields those nuanced answers. "Evaluation drives innovation" means that there is a deep connection between the new question that you're trying to ask and where you think the answer to that question fits into the world.
Pingback: Information Retrieval Gupf » Is the Ad-Sponsored Web Search Market a Conversation?
I’ll focus on one of Daniel’s topics, “The missing research tool”. In simple terms, information retrieval is text-based: you enter one or more keywords (or a phrase) into a search box, and the search engine returns all documents matching the keywords, ranked according to some method.
What if you could enter a document as the search item, and the search engine returned all documents similar to the query document, in ranked order? We call this item-based search.
We don’t cut and paste the document’s contents into the search box; instead, we drag and drop the document (object or item) into the search box.
It is easy to see how academic papers are a natural fit for item-based search.
In our case, an item is not restricted to documents; it could also be an image, a piece of music, and so on.
Demos of item-based search using different data types can be found at http://www.xyggy.com.
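To make the idea concrete: one very simple way to realize item-based search is to represent every document as a TF-IDF vector and rank the corpus by cosine similarity to the query document’s vector. This is only a toy sketch under that assumption; the corpus, tokenization, and weighting here are made up for illustration and say nothing about how Xyggy actually works.

```python
# Toy item-based search: the query is a whole document, and corpus
# documents are ranked by cosine similarity of TF-IDF vectors.
# (Illustrative only -- not Xyggy's actual method.)
import math
from collections import Counter

corpus = [
    "information retrieval ranks documents for a keyword query",
    "item based search takes a whole document as the query",
    "gardening is a relaxing weekend hobby",
]

def tfidf_vector(text, corpus):
    """Map a text to a sparse {term: tf * idf} vector."""
    tf = Counter(text.lower().split())
    n = len(corpus)
    vec = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc.lower().split())
        idf = math.log((n + 1) / (df + 1)) + 1  # smoothed IDF
        vec[term] = count * idf
    return vec

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def item_search(query_doc, corpus):
    """Rank every corpus document by similarity to the query document."""
    qv = tfidf_vector(query_doc, corpus)
    scored = [(cosine(qv, tfidf_vector(d, corpus)), d) for d in corpus]
    return sorted(scored, reverse=True)

for score, doc in item_search("a document used as the search query", corpus):
    print(f"{score:.2f}  {doc}")
```

In practice the vectors would come from a real feature extractor (and for images or music, from signal-level features rather than word counts), but the drag-the-item-in, get-ranked-similar-items-out loop is the same.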
Dinesh — I think another term that folks have used, in addition to “item-based query”, is “query-by-example”. But either way you label it, I agree that it’s something that search engines need to get much better at. I look forward to checking out your demo.
Query-by-example is a very database-centric term and doesn’t quite do justice to what we are offering. Sure, you have to give the search engine an item to find similar items. In our case, you can provide multiple items per query, e.g., “given one or more query images, find similar images in ranked order”. At a later date, we will show that query items of different data types (e.g., one or more patents, documents, images, etc.) can be entered together to find similar items.
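One plausible way to handle a multi-item query like the one described above is to merge the query items into a single combined vector (a centroid of sorts) and rank the corpus against that. The sketch below does this with plain bag-of-words counts; the merging rule, corpus, and example queries are all assumptions for illustration, not a description of Xyggy’s system.

```python
# Toy multi-item query: sum the query items' bag-of-words vectors into
# a single "centroid" and rank documents by cosine similarity to it.
# (Illustrative assumption only -- not Xyggy's actual method.)
import math
from collections import Counter

def vec(text):
    """Bag-of-words vector for a text."""
    return Counter(text.lower().split())

def merge(vectors):
    """Combine several query-item vectors by summing their counts."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "renewable energy from solar and wind",
    "medieval history of france",
    "turbine and panel maintenance",
]

# Two query items entered together, as in the comment above.
query_items = ["solar panel efficiency", "wind turbine output"]
centroid = merge(vec(q) for q in query_items)
ranked = sorted(corpus, key=lambda d: cosine(centroid, vec(d)), reverse=True)
print(ranked)
```

Summing is only one choice; a real system might weight items, intersect them (as in the "common pattern" idea below), or work in a learned feature space, but the point is that several examples collapse into one query representation.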
Is it database-centric? I come from the music IR world (as you know) and people used the “query by example” phrase all the time to do things like find songs with similar rhythmic patterns. You give the system 1, 2, or even 10 songs that contain a certain pattern, and then the system extracts that pattern, looking for commonality amongst all the “examples” in the query to really know what the salient pattern is. Then the system uses this automatically-inferred pattern as the query. Is that similar to what you’re talking about? Or is that more database-y?