Last month, in reaction to the “Unreasonable Effectiveness of Data” paper that made the rounds, Stephen Few from the Business Intelligence community wrote an interesting post:
The notion that “we need more data” seems to have always served as a fundamental assumption and driver of the data warehousing and business intelligence industries. It is true that a missing piece of information can at times make the difference between a good or bad decision, but there is another truth that we must take more seriously today: most poor decisions are caused by lack of understanding, not lack of data. The way that data warehousing and business intelligence resources are typically allocated fails to reflect this fact. The more and faster emphasis of these efforts must shift to smarter and more effective. Although current efforts to build bigger and faster data repositories and better production reporting systems should continue, they should take a back seat to efforts to increase the data sense-making skills of workers and to improve the tools that support these skills.
This is a point that I wholly subscribe to, and an aspect of which I encountered the other day when attempting to use web search engines to satisfy my “hidden cafes in prague” information need. Continue reading…
One of my ongoing frustrations with modern, consumer-facing information organization and retrieval systems is the way in which functionality is often sacrificed in the name of simplicity.
Full functionality under the rubric of simplicity is a laudable goal, and I would agree that this is where we all eventually want to end up in the information systems, interfaces and algorithms that we are designing. Simplicity without full functionality, but with alternative complex interfaces for an advanced user to specify greater functionality, is a satisfactory stepping stone along the path to this goal. But simplicity with obstructed or stunted functionality, with no possibility for the user to improve that functionality, is too often what we end up with.
Case in point: Apple’s iTunes/iPod. Continue reading…
While the focus of this blog is the retrieval of existing information, from music to images to videos to text, every once in a while it is nice to create new information as well. In that spirit I decided to participate in World Pinhole Photography Day, which is today, Sunday April 26, 2009. While [...]
A week or two ago I began writing a few thoughts about large-data based algorithms and retrievability. It was spawned by the Unreasonable Effectiveness of Data position paper by a couple of notable Googlers, which then led to a brief discussion.
My main contention was that by relying too heavily on algorithms that are based solely on accumulations of large data, and by not offering users exploratory search options to turn off the large-data popularity bias, searchers would be unable to ever find certain pieces of relevant information. This is not even a matter of knowing the correct query terms to use; I argued (backed up by published research) that even if you knew the correct terms, you still could not find certain pieces of information.
Well, now I want to write about the other half of the equation: What do you do when the information is retrievable under some term, but you just do not know that term? Why do search engines not give you more help with finding information which does exist if you know exactly the right word to use, but for which no reasonable person would ever know the correct word?
Let me give an example: Hidden Cafes in Prague.
Continue reading…
A few days ago, Google launched “similar image search” functionality. From TechCrunch:
A new 20% time Google project has just launched called Google Similar Images. It’s pretty self-explanatory — when you search for an image and find one close to what you’re looking for, Google can now find ones that it believes to be [...]
As a researcher, it is occasionally quite interesting to reread thoughts and positions that I’ve taken in years and works past. Sometimes I can observe a marked shift from my previous thinking; avenues or approaches that I once considered fruitful I now no longer do. And sometimes I can observe hints and seeds of my current research; avenues of which I only had a vague inkling have blossomed into larger pursuits.
In April of 2006 I had the good fortune to attend a Dagstuhl Seminar on Content-Based Multimedia Information Retrieval (I am toward the upper left corner of the seminar group photo). Ramesh Jain has a good writeup of Dagstuhl Seminars, what they are and how they work. In the abstract of my Seminar presentation I wrote:
Continue reading…
Via Tim O’Reilly on Twitter, I came across this article by Vanessa Fox on how government can improve the findability of their web pages, and thereby allow citizens to become better informed and government to be more transparent. Fox writes:
Continue reading…
From Wired:
Vevo will launch later this year, a collaboration between Universal Music Group and Google the partners expect to be the leading music video service in the world from day one. Google confirmed to Wired.com Thursday that all of Universal Music Group’s video assets (music videos, interviews, concert footage and possibly Kyte-style backstage [...]
In my previous post I talked a little about the notion that big data alone cannot solve many of our problems. I would like to give a more concrete example of this by discussing a paper published at CIKM 2008: “Retrievability: An Evaluation Measure for Higher Order Information Access Tasks” by Azzopardi and Vinay. In large part, my desire to discuss this paper comes from a few of Peter Norvig’s comments in the aforementioned thread on big data:
Continue reading…
Large data can be extremely effective, but how widely applicable is it, really?
A week or two ago the blogosphere was abuzz with discussion about the Unreasonable Effectiveness of Data position paper by Googlers A. Halevy, P. Norvig, and F. Pereira. I had my own commentary, but some great discussion came when Peter Norvig jumped into the comments section of Daniel’s blog and clarified some of his points. I have decided now to write a few follow-up posts on this topic, as it touches all sorts of information seeking behaviors and domains, from music recommendation to web search to enterprise search to exploratory search.
Continue reading…