What is Social Search as opposed to Social Media? Social Search in Media? Search in Social Media?
Next week, Gene Golovchinsky and I are moderating a pair of panels at the SSM workshop, so we spent some time this week asking ourselves these definitional questions in preparation. We came up with a lightweight taxonomy and have classified a few existing systems into it as examples. Whether or not you are one of the 80 participants in the workshop, I invite you to take a look at our framework and comment or critique where necessary. Here’s the link to Gene’s writeup:
We think the phrase ‘search in social media’ has been used to refer to both the information being searched, and to the process for doing so. The information is standard user-generated content: tweets, blog posts, comment threads, tags, etc. The process, however, seems less well understood… It will be interesting to see how these ideas will be transformed by the discussion at the workshop. In any case, having a language with which to talk about phenomena is a prerequisite to articulating a research agenda, particularly in a young and multi-disciplinary field.
Please note, however, that one topic that will probably not be covered is the difference between social search (process) and collaborative search (process). A workshop on the latter will be held a few days later at CSCW. For an interesting thread on the distinction between the two, please see another FXPAL post from March of last year.
A NYT books article about Kasparov and chess, and the relationship between humans, machines, and decision processes is making the Twitter rounds today. I don’t have time at the moment to write a long comment about it, but I do want to point out that it supports a position that I’ve been taking on this blog for some time now:
This experiment goes unmentioned by Rasskin-Gutman, a major omission since it relates so closely to his subject. Even more notable was how the advanced chess experiment continued. In 2005, the online chess-playing site Playchess.com hosted what it called a “freestyle” chess tournament in which anyone could compete in teams with other players or computers. Normally, “anti-cheating” algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less “intelligent” than the playing programs they detect.)
Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.
The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.
This result seems awfully similar to some of the other results I’ve reported on in the past. Continue reading…
The Edge has published their annual question for 2010:
HOW IS THE INTERNET CHANGING THE WAY YOU THINK?
As an Information Retrieval research scientist, I of course was quite interested in what search folks had to say. I found this blurb from Marissa Mayer intriguing:
It’s not what you know, it’s what you can find out. The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people. The Internet empowers better decision-making and a more efficient use of time…
The Web has also enabled amazing dynamic visualizations, where an ideal presentation of information is constructed — a table of comparisons or a data-enhanced map, for example. These visualizations — be it news from around the world displayed on a globe or a sortable table of airfares — can greatly enhance our understanding of the world or our sense of opportunity. We can understand in an instant what would have taken months to create just a few short years ago. Yet, the Internet’s lack of structure means that it is not possible to construct these types of visualizations over any or all data. To achieve true automated, general understanding and visualization, we will need much better machine learning, entity extraction, and semantics capable of operating at vast scale.
It sounds like there is an increased awareness of (and respect for) Exploratory Search. I’ve heard this via private channels, but this is the first time I’ve seen an acknowledgment of the need for more exploratory search from such an official channel.
I do want to point out, however, that in order to make this work at web scale, we won’t just need better automated methods. That is, we cannot rely solely on machine learning, entity extraction, or web-scale semantics. What is also desperately needed is a way for users themselves to inject personal semantics and structure into the search, visualization, and comparison process. The search engine itself needs to be responsive to the structure that the user is giving to it, and rearrange itself around that information.
I am afraid that I am not being very clear in the vision that I’m attempting to lay out, so let me draw an analogy to parametric and non-parametric statistical modeling. Continue reading…
Greg Linden has an interesting post on Search on a domain like YouTube. I reproduce it here because I would like to elaborate on it:
The article focuses on YouTube’s “plans to rely more heavily on personalization and ties between users to refine recommendations” and “suggesting videos that users may want to watch based on what they have watched before, or on what others with similar tastes have enjoyed.” What is striking about this is how little this has to do with search. As described in the article, what YouTube needs to do is entertain people who are bored but do not entirely know what they want. YouTube wants to get from users spending “15 minutes a day on the site” closer to the “five hours in front of the television.” This is entertainment, not search. Passive discovery, playlists of content, deep classification hierarchies, well maintained catalogs, and recommendations of what to watch next will play a part; keyword search likely will play a lesser role.
My feeling is that the dichotomy that is being drawn does not exhaustively cover the space. I would characterize the space using the following two orthogonal dimensions: (1) Information Need Clarity and (2) User Engagement. The first dimension (clarity) is related to the degree to which the user understands his or her own information need, i.e. has something specific in mind that he or she is looking for and/or understands what needs to be done to find it. That need may either be well understood, or (to borrow Nick Belkin’s terminology) “anomalous”: The user doesn’t know what he or she doesn’t know. The second dimension (engagement) is related to the level at which the user applies himself or herself to the information seeking process. That level may be active or passive.
Greg points out two modes: “Active Understood” (typical navigational web search) and “Passive Anomalous” (entertainment/discovery/recommendation). But I believe that there are more than these two modes. A large, interesting design space opens up when one realizes that information seeking can be “Active Anomalous” and “Passive Understood”.
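As a rough sketch of the two-dimensional space described above, the four modes can be enumerated as the cross product of the two dimensions. The class and function names here (`Clarity`, `Engagement`, `mode_name`, `all_modes`) are made up for illustration; the example labels for each quadrant are the ones used in this post.

```python
from enum import Enum
from itertools import product


class Clarity(Enum):
    """How well the user understands his or her own information need."""
    UNDERSTOOD = "Understood"
    ANOMALOUS = "Anomalous"


class Engagement(Enum):
    """How much effort the user applies to the seeking process."""
    ACTIVE = "Active"
    PASSIVE = "Passive"


# Illustrative example for each quadrant, using the labels from the post.
EXAMPLES = {
    (Engagement.ACTIVE, Clarity.UNDERSTOOD): "navigational web search",
    (Engagement.PASSIVE, Clarity.ANOMALOUS): "entertainment / discovery / recommendation",
    (Engagement.ACTIVE, Clarity.ANOMALOUS): "exploratory search",
    (Engagement.PASSIVE, Clarity.UNDERSTOOD): "collaborative information seeking",
}


def mode_name(engagement, clarity):
    """Build a label such as 'Active Anomalous'."""
    return f"{engagement.value} {clarity.value}"


def all_modes():
    """Every combination of the two dimensions: four modes, not two."""
    return [mode_name(e, c) for e, c in product(Engagement, Clarity)]


if __name__ == "__main__":
    for (engagement, clarity), example in EXAMPLES.items():
        print(f"{mode_name(engagement, clarity)}: {example}")
```

Treating the dimensions as independent is what makes the two less-discussed quadrants visible; a single search-versus-recommendation axis only ever produces two modes.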

Exploratory Search is a good example of Active Anomalous seeking. One doesn’t yet fully know or understand what it is that one is looking for, but at the same time one is willing to engage with an information system in order to discover what it is that he or she does not yet know. And the system itself is designed not necessarily toward trying to answer a well understood need, but toward helping the user map out and better comprehend a space.
Collaborative Information Seeking (see here and here and here) is a good example of where a need may be well understood, but a user does not necessarily have to actively express every last query detail in order to get more information on a topic. Why not? Because when User #1 is explicitly collaborating with User #2, an algorithmic mediation engine can push some of User #2’s activity on to User #1 without requiring User #1 to make additional effort. Note that I am not implying that every aspect of collaborative information seeking is passive; quite the contrary, as it requires at least one co-collaborator to be active. I am only pointing out that it is a domain in which it becomes possible for a user to passively obtain specific information on a well understood need.
There is a lot of discussion in the Information Retrieval community on the similarities and differences between Search and Recommendation. A fruitful tension opens up as one travels back and forth along the diagonal from Active Understood to Passive Anomalous; the two approaches often end up complementing each other. Where I see much less discussion is on the tension that opens up along the other diagonal, between Passive Understood and Active Anomalous. When Exploratory Search meets Collaborative Information Seeking, it yields Collaborative Exploratory Search and a whole host of interesting possibilities. Over the coming year I will be blogging more about the tension along this alternative diagonal (both here and on the FXPAL blog) and what it means for the Information Retrieval systems I and others are designing. Happy 2010!
On Twitter today, Josh Young made an interesting observation to which I would like to call attention:
Ya, @jerepick, with “fauxpen” attached, google’s “nav. search as the top of the stack” is a fragile local maximum for the web.
This observation is a followup to the web-wide discussion that Google kicked off about the meaning of open. Essentially, Rosenberg says that all of Google’s products that are not at the search layer of the stack should work toward being open, but that the search layer itself should be closed. To protect it from spammers, you understand {cough}.
Earlier in the same post Rosenberg makes a distinction between open source and open data, calling for increased openness in both. However, when it comes to defending closed-search, this distinction gets lost. But this distinction between open source vs. open data is important. Here is how it translates to the search domain:
- Open Source = Open search algorithm is about letting the world know what features are used to rank pages and how those features interrelate (are weighted)
- Open Data = Open search results is about letting users refactor, remix, reuse, mashup, store and re-search locally any and all query results that the user issues. And about letting the user use any software that they want to accomplish this — not just Google software
The excuse given for why Google cannot open up is that spammers would be able to game the engine. But if we look closely, we’ll see that it is an excuse that is primarily, if not exclusively, related to the “open source” aspect of openness. Black hat SEO algorithmic gaming is not an issue when it comes to users re-using and remixing their results.
And so the point (I think) Josh is making is that by closing not only the algorithm, but also the results of that algorithm, Google has effectively declared a moratorium on Internet application stack progress along that vertical. Google is essentially saying to the Internet: Continue reading…
There is a fantastic Google blog post today by Jonathan Rosenberg on the meaning (and value) of openness. Whooo-boy.. where do we start with this can of worms? Guess I’ll jump right in. Warning: This is probably the longest post I’ve written, so if you are easily bored, understand that this is not required reading. It will not be on the test.
Here we go:
At Google we believe that open systems win. They lead to more innovation, value, and freedom of choice for consumers, and a vibrant, profitable, and competitive ecosystem for businesses.
Agreed! I’m fully on board the spirit of this opening statement!
Many companies will claim roughly the same thing since they know that declaring themselves to be open is both good for their brand and completely without risk.
True. So the question arises: What happens when being open carries with it an amount of risk? Do you open up those areas of your business as well? Or do you forever keep your most valuable layer of the stack closed and proprietary, both in terms of closed source as well as not-fully-open information?
We run the company and make our product decisions based on these principles, so I encourage you to carefully read, review, and debate them. Then own them and try to incorporate them into your work. This is a complex subject and if there is debate (and I’m sure there will be) it should be in the open! Please feel free to comment.
I like the spirit of this discussion so far. I earnestly believe that Google is debating these things internally. But I also take them at their word that they would like this debate to be in the open. Consider this blog post part of my ongoing comment, and ongoing engagement in what I consider to be an extremely important area: The organization and dissemination of information. Continue reading…
Chris Dixon has a post yesterday about search and the social graph. An interesting read, but what struck me the most was a tangent about how current search engines make money:
Lost amid this discussion, however, is that the links people tend to share on social networks – news, blog posts, videos – are in categories Google barely makes money on. (The same point also seems lost on Rupert Murdoch and news organizations who accuse Google of profiting off their misery).
Searches related to news, blog posts, funny videos, etc. are mostly loss leaders for Google. Google’s real business is selling ads for plane tickets, dvd players, and malpractice lawyers. (I realize this might be depressing to some internet idealists, but it’s a reality). Online advertising revenue is directly correlated with finding users who have purchasing intent. Google’s true primary competitive threats are product-related sites, especially Amazon. As it gets harder to find a washing machine on Google, people will skip search and go directly to Amazon and other product-related sites.
I’ll repeat the salient bit: “Google’s real business is selling ads for plane tickets, dvd players, and malpractice lawyers.” What struck me about this statement was not its veracity. What struck me was its relationship to exploratory search. It is when searching for a plane ticket, purchasing an expensive consumer good, or hiring a decent lawyer that my need for exploratory search is at its highest.
So my question is whether or not there is a tension here between getting the users off of the results page as quickly as possible — especially when the route off that page is typically via an advertisement on which the search engine makes money — versus enabling the user to remain on the results page in a process-oriented mode of sorting and filtering and playing around with the results in a myriad of different ways, so as to come up with a set of options that best satisfies the exploratory need.
Do these two goals conflict? Why or why not? It is an old question, but I am still searching for a satisfactory answer.
Update: Perhaps I should have been more clear as to what characterizes an exploratory search session. There are dozens of papers out there that tell the story much better than I can, so I will quote one of them. It’s by Michael Levi at the U.S. Bureau of Labor Statistics, published at the Information Seeking Support Systems (ISSS) workshop in June 2008. Title of the paper is “Musings on Information Seeking Support Systems”. (See http://ils.unc.edu/ISSS/ISSS_final_report.pdf) I quote:
Some characteristics of open-ended, discovery-oriented exploration emerge:
1) I may not know, at the beginning, whether a seemingly straightforward line of inquiry will expand beyond recognition. Sometimes it will, sometimes it won’t. A lot depends on my mood at any given moment.
2) I can’t predict when the exploration will end. It may be when I’m satisfied that I have learned enough (which also would vary from day to day and query to query.) It may be when I get tired or bored. It may be when I’ve run out of time. Or it may be when I get distracted by dinner or the allure of the swimming pool.
3) I can’t determine, objectively, whether the exploration has been a success. There is usually no “right answer” against which I can measure my progress.
4) My exploration is not a linear process. I could get interested in a tangent at any time from which I may not return. I am also likely to backtrack, possibly with some regularity, either because a tangent proved unfulfilling and I want to resume my original quest, or because I thought of a new question (or a new way of formulating a previous question) to direct at a resource I visited previously.
5) I am likely to want to combine, compare, or contrast information from multiple sources. One of those sources is my memory – which may or may not be reliable in any given circumstance.
Levi then makes a number of recommendations about what an information seeking support system should do, to enable this sort of exploratory search:
A useful information seeking support system, then, would require the following minimum functionality:
1) It should not interfere with my behavior as listed under Characteristics of Exploration above.
2) It should give me capabilities at least as good as those listed under Manual Tools above.
3) It should positively assist my explorations by making them easier or faster or more comprehensive or less error-prone or…
In addition, an ISSS might give me capabilities that I never employed before because they were not possible or because I didn’t think of them.
But, to be truly a leap forwards, an ISSS would need to exhibit at least elements of discernment, judgment, subject matter expertise, and research savvy.
Again the question: Is there a tension here between getting the users off of the results page as quickly as possible — especially when the route off that page is typically via an advertisement on which the search engine makes money — versus enabling the user to remain on the results page in a process-oriented mode of sorting and filtering and playing around with the results in a myriad of different ways, so as to come up with a set of options that best satisfies the exploratory need?
I’ve already heard certain search engines state that their goal is to get the user off the search page as quickly as possible. That in and of itself tells me that they’re specifically designing the system so as to interfere with behaviors listed under Characteristics of Exploration above (Levi’s first recommendation). Why does it interfere? Because my goal is to stick around in the results and compare and contrast, whereas their goal is to get me off of the page as quickly as possible. And so the whole system is designed to do the opposite of what I want it to do.
Additionally, I was pointing out that the information domains in which I usually have the largest exploratory-type information needs are very similar to the information domains in which the search engines make most, if not all, of their money. I’m still trying to figure out what to make of that.
Thoughts?
Daniel T. has an interesting bipartite use-case model for exploratory search:
- I know what I want, but I don’t know how to describe it.
- I don’t know what I want, but I hope to figure it out once I see what’s out there.
Perhaps this is a silly analogy, but framing the problem in this way reminded me abstractly of P vs. NP. Some problems can be both computed and verified in polynomial (P) time. Other problems can be verified in P time, but it is unknown whether a P-time solution to the problem exists. These belong to the non-deterministic polynomial (NP) set of problems. With the best known algorithms, it might take exponential time in the worst case to get the answer.
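To make the verify-versus-find gap concrete, here is a minimal sketch using subset sum, a standard NP problem. The choice of problem and the function names `verify` and `search` are assumptions made for illustration, not part of the original analogy. Checking a proposed answer is cheap, while the naive way of finding one may have to try every subset.

```python
from itertools import combinations


def verify(numbers, target, indices):
    """Check a proposed answer in polynomial time.

    `indices` is a candidate subset, given as distinct positions in `numbers`.
    """
    distinct = len(set(indices)) == len(indices)
    in_range = all(0 <= i < len(numbers) for i in indices)
    return distinct and in_range and sum(numbers[i] for i in indices) == target


def search(numbers, target):
    """Brute-force search: examines up to 2**n subsets in the worst case."""
    n = len(numbers)
    for size in range(n + 1):
        for indices in combinations(range(n), size):
            if verify(numbers, target, indices):
                return indices
    return None


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    answer = search(numbers, 9)
    print(answer, verify(numbers, 9, answer))
```

No algorithm that is polynomial in the size of the input is known for subset sum in general, yet `verify` runs in time proportional to the size of the candidate.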
Google and related web search engines are lookup, navigational, known item engines. You can both obtain and verify your answer in polynomial time. Linear even (hence the classic ranked list).
With an exploratory information need, the satisfaction of your information need can be verified in polynomial time. It doesn’t take too long to examine the assembled set of summarized/contrasted/accumulated information and tell whether or not your information need has been satisfied. Maybe it doesn’t take constant time, but it certainly can be accomplished in linear time. But accumulating that information in the first place? It is generally unknown how long that will take, as there is a bit of non-determinism in the information seeking pathways that you need to traverse. Exploratory search is, dare I say, NP.
So the big question: Is P = NP? That is to say, can one take a tool such as Yahoo!, Google, etc. which has been generally optimized for lookup, P-time problems and use it to satisfy one’s exploratory information seeking task? Certainly one can try to use these tools in this manner. Nothing stops a user from entering vast quantities of queries and accumulating the necessary set of information themselves. But the tool has not been designed for that purpose. So can it really be used to solve that problem? Are multiple iterations of lookup search capable of satisfying an exploratory information need? Does P = NP?
I don’t think that it does.
Question for the day: For certain classes of NP problems (e.g. the knapsack problem) there are often heuristics that yield good approximate (nearly optimal) solutions in P-time. What are the analogous classes of problems in the exploratory information seeking domain? And how would we, in general, recognize them?
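For readers who have not seen such a heuristic, here is a minimal sketch of one for the 0/1 knapsack problem: fill the knapsack greedily by value-per-weight, then fall back to the single most valuable item if that happens to be better. The function name `greedy_knapsack` and the sample items are made up for illustration. This particular combination is a textbook approximation that is known to reach at least half of the optimal value, and it runs in O(n log n) time.

```python
def greedy_knapsack(items, capacity):
    """Approximate 0/1 knapsack.

    items: list of (value, weight) pairs with positive weights.
    Returns (total_value, chosen_items).
    """
    # Items heavier than the whole knapsack can never be used.
    fitting = [(v, w) for v, w in items if w <= capacity]
    if not fitting:
        return 0, []

    # Greedy pass: best value-per-weight first, skipping what no longer fits.
    by_density = sorted(fitting, key=lambda item: item[0] / item[1], reverse=True)
    chosen, total_value, remaining = [], 0, capacity
    for value, weight in by_density:
        if weight <= remaining:
            chosen.append((value, weight))
            total_value += value
            remaining -= weight

    # Safeguard: a single valuable item can beat the greedy fill.
    best_single = max(fitting, key=lambda item: item[0])
    if best_single[0] > total_value:
        return best_single[0], [best_single]
    return total_value, chosen


if __name__ == "__main__":
    items = [(60, 10), (100, 20), (120, 30)]
    print(greedy_knapsack(items, 50))
```

On the sample items the heuristic returns a value of 160, while the true optimum (the second and third items) is 220: good, quick, and not exact, which is the trade-off the question above is asking about.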
One of my ongoing research interest areas is in retrieval interfaces that allow more expressive and powerful statements of a user information need. In that spirit, I wrote a minor rant last April about how the Apple iTunes smart playlist creation interface sacrifices functionality in the interest of simplicity. One could only create smart playlists using a flat conjunction or flat disjunction of expressions. See this screenshot:

Well, the times they are a’changing. I just noticed that the newest version of iTunes (9.0) allows arbitrarily-nested conjunctions and disjunctions. This ability to mix and match gives rise to much greater capability, and only adds the minimum of interface clutter and complexity, i.e. expression indentation and an additional (…) button:
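To illustrate why nesting matters, here is a minimal sketch of a rule tree that mixes conjunctions and disjunctions. The `Rule` and `Group` classes, the `matches` function, and the sample tracks are all hypothetical; this is not how iTunes represents smart playlists internally. The playlist below asks for Jazz tracks that are either highly rated or never played, which cannot be written as one flat "match all" or one flat "match any" list over these three conditions.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Rule:
    """A single condition on one field of a track."""
    field: str
    test: Callable[[object], bool]


@dataclass
class Group:
    """A nested conjunction ("all") or disjunction ("any") of rules and groups."""
    mode: str
    children: List[Union["Rule", "Group"]]


def matches(node, track):
    """Recursively evaluate a rule tree against one track (a plain dict)."""
    if isinstance(node, Rule):
        return node.test(track[node.field])
    if node.mode not in ("all", "any"):
        raise ValueError(f"unknown group mode: {node.mode!r}")
    results = (matches(child, track) for child in node.children)
    return all(results) if node.mode == "all" else any(results)


# genre is Jazz AND (rating >= 4 OR plays == 0)
playlist = Group("all", [
    Rule("genre", lambda g: g == "Jazz"),
    Group("any", [
        Rule("rating", lambda r: r >= 4),
        Rule("plays", lambda p: p == 0),
    ]),
])

tracks = [
    {"title": "A", "genre": "Jazz", "rating": 5, "plays": 12},
    {"title": "B", "genre": "Jazz", "rating": 2, "plays": 0},
    {"title": "C", "genre": "Jazz", "rating": 2, "plays": 7},
    {"title": "D", "genre": "Rock", "rating": 5, "plays": 0},
]

selected = [t["title"] for t in tracks if matches(playlist, t)]

if __name__ == "__main__":
    print(selected)
```

The only structural addition over a flat list is that a group may contain other groups, which mirrors the small interface cost described above: indentation plus one extra button.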

I laud the change and improvement, and I feel that it is another step in the ongoing attempt to raise consciousness about the value of moving beyond barren and crippled-functionality information organization interfaces. That is one of the core challenges of HCIR, and I see Apple now taking another step in this direction.
Via Greg Linden, I came across this interesting quote from Eric Schmidt about the obligation to help newspapers succeed:
Finally, Eric claimed Google has a moral duty to help newspapers succeed:
Google sees itself as trying to make the world a better place. And our values are that more information is positive — transparency. And the historic role of the press was to provide transparency, from Watergate on and so forth. So we really do have a moral responsibility to help solve this problem.
Well-funded, targeted professionally managed investigative journalism is a necessary precondition in my view to a functioning democracy … That’s what we worry about … There [must be] enough revenue that … the newspaper [can] fulfill its mission.
It is great that Google feels this professional responsibility. And I wholeheartedly agree with Schmidt that “more information is positive”. My only question is: Why don’t we see “more information” and transparency when it comes to other media companies, aka search engines? Newspapers engage in investigative journalism in order to bring stories from industry and politics to the citizens. Search engines engage in algorithmic retrieval in order to bring stories from the newspapers (and other sources) to the citizens. The historical role of the press has been to provide transparency. So also is the modern role of the retrieval engine to provide transparency. And just as a good reporter has to cite sources to make a story credible, so should a search engine provide explanatory interfaces, algorithms, and information to make its results credible.
Shouldn’t there be an expectation of as much information and transparency from our search interfaces and algorithms as we have from our press? It is no secret that I think there should be. It is a goal that I strive for in my own research; I can’t say that it’s not difficult, but it is worth striving for.