What is Social Search as opposed to Social Media? Social Search in Media? Search in Social Media?
Next week, Gene Golovchinsky and I are moderating a pair of panels at the SSM workshop. So we spent some time this week asking ourselves these definitional questions in preparation for the panel. We came up with [...]
A New York Review of Books article about Kasparov and chess, and the relationship between humans, machines, and decision processes is making the Twitter rounds today. I don’t have time at the moment to write a long comment about it, but I do want to point out that it supports a position that I’ve been taking on this blog for some time now:
This experiment goes unmentioned by Rasskin-Gutman, a major omission since it relates so closely to his subject. Even more notable was how the advanced chess experiment continued. In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. Normally, “anti-cheating” algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less “intelligent” than the playing programs they detect.)
Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.
The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.
This result seems awfully similar to some of the other results I’ve reported on in the past. Continue reading…
The Edge has published their annual question for 2010:
HOW IS THE INTERNET CHANGING THE WAY YOU THINK?
As an Information Retrieval research scientist, I of course was quite interested in what search folks had to say. I found this blurb from Marissa Mayer intriguing:
It’s not what you know, it’s what you can find out. The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people. The Internet empowers better decision-making and a more efficient use of time…
The Web has also enabled amazing dynamic visualizations, where an ideal presentation of information is constructed — a table of comparisons or a data-enhanced map, for example. These visualizations — be it news from around the world displayed on a globe or a sortable table of airfares — can greatly enhance our understanding of the world or our sense of opportunity. We can understand in an instant what would have taken months to create just a few short years ago. Yet, the Internet’s lack of structure means that it is not possible to construct these types of visualizations over any or all data. To achieve true automated, general understanding and visualization, we will need much better machine learning, entity extraction, and semantics capable of operating at vast scale.
It sounds like there is an increased awareness of (and respect for) Exploratory Search. I’ve heard this via private channels, but this is the first time I’ve seen an acknowledgment of the need for more exploratory search from such an official channel.
I do want to point out, however, that in order to make this work at web scale, we won’t just need better automated methods. I.e. we cannot rely solely on machine learning, entity extraction, or web-scale semantics. Rather, what is also desperately needed is a way for the user him- or herself to inject personal semantics and structure into the search, visualization, and comparison process. The search engine itself needs to be responsive to the structure that the user is giving to it, and rearrange itself around that information.
I am afraid that I am not being very clear in the vision that I’m attempting to lay out, so let me draw an analogy to parametric and non-parametric statistical modeling. Continue reading…
Greg Linden has an interesting post on Search on a domain like YouTube. I reproduce it here because I would like to elaborate on it:
The article focuses on YouTube’s “plans to rely more heavily on personalization and ties between users to refine recommendations” and “suggesting videos that users may want to watch based [...]
On Twitter today, Josh Young made an interesting observation to which I would like to call attention:
Ya, @jerepick, with “fauxpen” attached, google’s “nav. search as the top of the stack” is a fragile local maximum for the web.
This observation is a followup to the web-wide discussion that Google kicked off about the meaning of open. Essentially, Rosenberg says that all of Google’s products that are not at the search layer of the stack should work toward being open, but that the search layer itself should be closed. To protect it from spammers, you understand {cough}.
Earlier in the same post Rosenberg makes a distinction between open source and open data, calling for increased openness in both. However, when it comes to defending closed-search, this distinction gets lost. But this distinction between open source vs. open data is important. Here is how it translates to the search domain:
- Open Source = Open search algorithm is about letting the world know what features are used to rank pages and how those features interrelate (are weighted)
- Open Data = Open search results is about letting users refactor, remix, reuse, mashup, store and re-search locally any and all query results that the user issues. And about letting the user use any software that they want to accomplish this — not just Google software
The excuse given for why Google cannot open up is that spammers would be able to game the engine. But if we look closely, we’ll see that it is an excuse that is primarily, if not exclusively, related to the “open source” aspect of openness. Black hat SEO algorithmic gaming is not an issue when it comes to user results re-use and remixing.
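To make the distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the feature names, the weights, the `Result` type, and the example URLs are my own assumptions, not anything Google has published. The first half stands in for an "open algorithm" (the features and weights are visible), while the second half stands in for "open results" (the user re-sorts what came back, locally, without knowing anything about the weights).

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search result plus the (hypothetical) features describing it."""
    url: str
    features: dict


# "Open source" side: an openly published set of feature weights.
# Both the feature names and the values are invented for this sketch.
PUBLISHED_WEIGHTS = {"text_match": 0.6, "link_score": 0.4}


def engine_score(result, weights=PUBLISHED_WEIGHTS):
    """Weighted sum of features: the part a spammer could try to game."""
    return sum(weights.get(name, 0.0) * value
               for name, value in result.features.items())


def engine_rank(results):
    """Order results the way the (toy) engine would."""
    return sorted(results, key=engine_score, reverse=True)


# "Open data" side: the user reorders the returned results locally,
# using a personal criterion and no knowledge of the engine's weights.
def user_rerank(results, prefer):
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(results, key=prefer, reverse=True)


results = [
    Result("a.example", {"text_match": 0.9, "link_score": 0.1}),
    Result("b.example", {"text_match": 0.2, "link_score": 0.9}),
]
ranked = engine_rank(results)
remixed = user_rerank(ranked, prefer=lambda r: r.features["link_score"])
```

Note that `user_rerank` never touches `PUBLISHED_WEIGHTS`. In this toy setup, letting users remix results reveals nothing about how the engine scores pages, which is the sense in which the spam argument applies to the algorithm and not to the data.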
And so the point (I think) Josh is making is that by closing not only the algorithm, but also the results of that algorithm, Google has effectively declared a moratorium on Internet application stack progress along that vertical. Google is essentially saying to the Internet: Continue reading…
There is a fantastic Google blog post today by Jonathan Rosenberg on the meaning (and value) of openness. Whooo-boy.. where do we start with this can of worms? Guess I’ll jump right in. Warning: This is probably the longest post I’ve written, so if you are easily bored, understand that this is not required reading. It will not be on the test.
Here we go:
At Google we believe that open systems win. They lead to more innovation, value, and freedom of choice for consumers, and a vibrant, profitable, and competitive ecosystem for businesses.
Agreed! I’m fully on board with the spirit of this opening statement!
Many companies will claim roughly the same thing since they know that declaring themselves to be open is both good for their brand and completely without risk.
True. So the question arises: What happens when being open carries with it an amount of risk? Do you open up those areas of your business as well? Or do you forever keep your most valuable layer of the stack closed and proprietary, both in terms of closed source as well as not-fully-open information?
We run the company and make our product decisions based on these principles, so I encourage you to carefully read, review, and debate them. Then own them and try to incorporate them into your work. This is a complex subject and if there is debate (and I’m sure there will be) it should be in the open! Please feel free to comment.
I like the spirit of this discussion so far. I earnestly believe that Google is debating these things internally. But I also take them at their word that they would like this debate to be in the open. Consider this blog post part of my ongoing comment, and ongoing engagement in what I consider to be an extremely important area: The organization and dissemination of information. Continue reading…
Chris Dixon has a post yesterday about search and the social graph. An interesting read, but what struck me the most was a tangent about how current search engines make money:
Lost amid this discussion, however, is that the links people tend to share on social networks – news, blog posts, videos – are [...]
Daniel T. has an interesting bipartite use-case model for exploratory search:
- I know what I want, but I don’t know how to describe it.
- I don’t know what I want, but I hope to figure it out once I see what’s out there.
Perhaps this is a silly analogy, but framing the problem in [...]
One of my ongoing research interest areas is in retrieval interfaces that allow more expressive and powerful statements of a user information need. In that spirit, I wrote a minor rant last April about how the Apple iTunes smart playlist creation interface sacrifices functionality in the interest of simplicity. One could only create smart [...]
Last March I pointed out a short piece by Tessa Lau about how good interaction design trumps smart algorithms. Today I have a followup. In particular, Xavier Amatriain has a good writeup of the recently concluded Netflix contest. Some of the lessons learned by going through the process are related to the importance of good evaluation metrics, the effect of (lapsed) time, matrix factorization, algorithm combination, and the value of data.
Data is always important, but what struck me in the writeup was his discovery that the biggest advances came not from the accumulation of massive amounts of data, log files, clicks, etc. Rather, while dozens and dozens of researchers around the world were struggling to reach that coveted 10% improvement by eking out every last drop of value from large data-only methods, Amatriain comparatively easily blew past that ceiling and hit 14%.
How? Continue reading…