<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Information Retrieval Gupf &#187; Information Retrieval Foundations</title>
	<atom:link href="http://irgupf.com/category/information-retrieval-foundations/feed/" rel="self" type="application/rss+xml" />
	<link>http://irgupf.com</link>
	<description>Information Retrieval Research, Issues, and Discussion</description>
	<lastBuildDate>Fri, 25 Jun 2010 18:30:07 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Search User Wants a Story</title>
		<link>http://irgupf.com/2010/06/25/the-search-user-wants-a-story/</link>
		<comments>http://irgupf.com/2010/06/25/the-search-user-wants-a-story/#comments</comments>
		<pubDate>Fri, 25 Jun 2010 18:25:50 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Exploratory Search]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1219</guid>
		<description><![CDATA[I fired up reddit this morning and was completely flabbergasted by one of the top posts.  The title of the post was &#8220;This is Why I Use Google, Not Bing&#8221;.  And it linked straight to this screenshot (which I reproduce here, in case the target disappears at some point):

This blew my mind, not only that [...]]]></description>
			<content:encoded><![CDATA[<p>I fired up reddit this morning and was completely flabbergasted by one of the top posts.  The title of the post was &#8220;This is Why I Use Google, Not Bing&#8221;.  And it linked straight to <a href="http://imgur.com/cl8qo">this screenshot</a> (which I reproduce here, in case the target disappears at some point):</p>
<p><img class="aligncenter size-full wp-image-1220" title="cl8qo" src="http://irgupf.com/wp-content/uploads/2010/06/cl8qo.png" alt="cl8qo" width="877" height="343" /></p>
<p>This blew my mind, not only that an alphageek would prefer the (Google) interface on the left to the (Bing) interface on the right, but that the redditor alphageek community would so heavily upvote it.  The way I see it, this speaks directly to the issues of simplicity as storytelling vs. sparsity that I&#8217;ve talked about from time to time.  The interface on the left is anything but sparse.  In fact, it is extremely busy and filled with images, a tool belt of various verticals (news, video, images), query modification tools such as timelines and recency sorting, and query reformulation tools such as narrowly related searches (top middle) and broadly related searches (lower left).</p>
<p>In short, everything about it is &#8220;non-Googly&#8221;<span id="more-1219"></span>, i.e. non-sparse and non-clean.  Ironically, the Bing results page for this particular query &#8212; which is held up as the example of what not to do &#8212; is the cleaner one.</p>
<p>So why is it that thousands of Google-loving redditors prefer the interface that is, well, more Bing-like?  Could it be that the user is finally starting to understand that simplicity is not the same thing as sparsity?  That what matters is the story?  The Google results in this case tell a really good story.  They give a concise overview of the latest matches and scores.  They link directly to highlights.  They give a concise overview of upcoming matches and the time at which each occurs.  And they acknowledge that when you search for &#8220;World Cup&#8221;, you&#8217;re not just trying to navigate to a single page.  Instead, you are &#8220;exploratorily&#8221; looking for as much information as you can about what is happening at the event as a whole, and perhaps even with football (soccer) as a whole. This is not just a &#8220;one box&#8221; answer. This is a whole &#8220;cluttered&#8221; set of rich information and interaction options.</p>
<p>That&#8217;s the story.  And if it takes a non-sparse (complex or cluttered) interface to tell that story, then so be it.  The story is more important than the strict adherence to sparsity.  Which is something that I&#8217;ve been hammering on about for at least the past half decade now.  It is just encouraging to see users finally start to acknowledge it.</p>
<p>Now, all we need to do is let the redditor community know that even though Google beat Bing on this one particular query, overall Bing has been pushing more of this story-appropriate, non-sparse, information-rich (&#8220;cluttered&#8221;) interaction in their results.  What I wish users did more of is constantly rotate between the various engines, to know for themselves which queries work on which engines, and what each of the various engines is capable of.  Because the irony here is that the redditor who posted &#8220;This is Why I Use Google, Not Bing&#8221; has chosen an interface that is much more Bing-like, and less traditionally &#8220;Googly&#8221;.</p>
<p>See also my related post, about two Googlers (Norvig and an anonymous employee) and their <a href="http://irgupf.com/2009/06/19/semantic-technology-search-panel/">comments about Bing at the Semantic Technology conference</a> in June 2009.</p>
<p>Update: In the couple of minutes between when I saw the reddit link and when I finished writing this post, the Google vs. Bing image went from 4th on the reddit home page (with ~500 upvotes) to 2nd (with ~750 upvotes).  Clearly this has touched a nerve.  It&#8217;s very interesting to see this reaction, especially because the preferred interface, again, is so traditionally non-Googly and cluttered.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/06/25/the-search-user-wants-a-story/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>More on Simplicity and the Paradox of Choice</title>
		<link>http://irgupf.com/2010/06/23/more-on-simplicity-and-the-paradox-of-choice/</link>
		<comments>http://irgupf.com/2010/06/23/more-on-simplicity-and-the-paradox-of-choice/#comments</comments>
		<pubDate>Wed, 23 Jun 2010 16:12:48 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1217</guid>
		<description><![CDATA[I came across an interesting blogpost today, entitled &#8220;The Paradox of Choice is Not Robust&#8221;.  To requote their quote:
Benjamin Scheibehenne, a psychologist at the University of Basel, was  thinking along these lines when he decided (with Peter Todd and, later,  Rainer Greifeneder) to design a range of experiments to figure out when  [...]]]></description>
			<content:encoded><![CDATA[<p>I came across an interesting blogpost today, entitled &#8220;<a href="http://www.marginalrevolution.com/marginalrevolution/2009/11/the-paradox-of-choice-is-not-robust.html">The Paradox of Choice is Not Robust</a>&#8221;.  To requote their quote:</p>
<blockquote><p>Benjamin Scheibehenne, a psychologist at the University of Basel, was  thinking along these lines when he decided (with Peter Todd and, later,  Rainer Greifeneder) to design a range of experiments to figure out when  choice demotivates, and when it does not.</p>
<p>But a curious thing happened almost immediately. They began by trying  to replicate some classic experiments – such as the jam study, and a  similar one with luxury chocolates. They couldn’t find any sign of the  “choice is bad” effect. Neither the original Lepper-Iyengar experiments  nor the new study appears to be at fault: the results are just different  and we don’t know why.</p>
<p>After designing 10 different experiments in which participants were  asked to make a choice, and finding very little evidence that variety  caused any problems, Scheibehenne and his colleagues tried to assemble  all the studies, published and unpublished, of the effect.</p>
<p>The average of all these studies suggests that offering lots of extra  choices seems to make no important difference either way.</p></blockquote>
<p>I&#8217;ll let that speak for itself, and will note only a few of my related blog posts from a year+ ago: <a href="http://irgupf.com/2009/05/15/google-search-options-and-the-paradox-of-choice/">Google Search Options and the Paradox of Choice</a> and <a href="http://irgupf.com/2009/03/04/ranked-lists-and-the-paradox-of-choice/">Ranked Lists and the Paradox of Choice</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/06/23/more-on-simplicity-and-the-paradox-of-choice/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Simplicity: Sparsity or Storytelling?</title>
		<link>http://irgupf.com/2010/06/10/simplicity-sparsity-or-storytelling/</link>
		<comments>http://irgupf.com/2010/06/10/simplicity-sparsity-or-storytelling/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 17:39:00 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>
		<category><![CDATA[Social Implications]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1207</guid>
		<description><![CDATA[A tweet by @akumar prompted me to punch up this quick blogpost:
as with all controversial issues, there&#8217;s a positive in google trying bing/image &#8211; that they&#8217;re not afraid to learn from competition
What Amit is referring to is the recent addition of gorgeous photographic images as search page background.  See for example this writeup: http://blogs.abcnews.com/theworldnewser/2010/06/google-vs-bing-copycat-picture-on-prominent-page.html
He is [...]]]></description>
			<content:encoded><![CDATA[<p>A tweet by @akumar prompted me to punch up this quick blogpost:</p>
<blockquote><p>as with all controversial issues, there&#8217;s a positive in google trying bing/image &#8211; that they&#8217;re not afraid to learn from competition</p></blockquote>
<p>What Amit is referring to is the recent addition of gorgeous photographic images as search page background.  See for example this writeup: <a href="http://blogs.abcnews.com/theworldnewser/2010/06/google-vs-bing-copycat-picture-on-prominent-page.html">http://blogs.abcnews.com/theworldnewser/2010/06/google-vs-bing-copycat-picture-on-prominent-page.html</a></p>
<p>He is of course correct; Google is learning from the competition.  But there is another issue at play here, one that I don&#8217;t want to overlook because I feel it is very important.  It is the issue of simplicity.  What is simplicity?  How is it defined?  How is it measured? Conversely, what is complexity?  What is clutter?<span id="more-1207"></span></p>
<p>For over a decade now, Google has essentially defined simplicity as <em><strong>sparsity</strong></em>.  Sparse backgrounds, lots of negative space, sparse color schemes, sparse auxiliary information (e.g. query term suggestions on the SERP page have only started appearing in the last year or two, despite the fact that such features existed 15 years ago in search engines of old such as Infoseek and Altavista).  The reason given was that people didn&#8217;t like clutter, that people like simplicity.  And in Google&#8217;s definition, simplicity equals sparsity.</p>
<p>I agree.  People <em>do</em> like simplicity.  I don&#8217;t question the veracity of that general sentiment.  What has always bothered me, though, is the equating of simplicity with sparsity.  I think a much better definition of simplicity is not the amount of information or colors or negative space on a page, but the <em>story that a design, interface, interaction, or algorithm tells</em>.  Something with a lot of colors and links and words can still be simple&#8230;<em>if it tells a clear story</em>!  Conversely, something with fewer colors and links (sparser) can be more complex, if the story that it communicates is muddy and not as purposely focused.</p>
<p>This brings us to the Bing background image.  In my opinion, even though the inclusion of a background image is less sparse and more &#8220;cluttered&#8221; (more colors, more shapes, more textures), it actually assists in the telling of a clearer story.  Why?  Because it more cleanly separates foreground and background, subject and frame.  It provides compositional balance to the page.  The white query input box on white background (10+ years of Google design) is sparser, but the story that it tells is less clear because foreground and background are not as cleanly separated.  A white query input box on a richly colored and textured background tells a clearer, simpler story because the background image frames and separates the foreground query input box.  Furthermore, because you can now distinguish background and foreground, you can more clearly see that the query input box lies near the pleasing &#8220;rule of thirds&#8221; line, which aids further in the overall storytelling.</p>
<p>In short, I applaud this move by Google, just as I applaud it from Bing.  I never liked the white-on-white, because sparsity is not the same thing as simplicity.  Simplicity arises through good storytelling, not through minimalism.  No A/B testing will tell you this, though.  It&#8217;s a definitional issue that must be settled before you start your A/B tests.  Google has learned from the competition, as @akumar says.  But I hope that the lesson Google has learned is not just that users like pretty pictures.  I hope the lesson is that, when it comes to simplicity, there is a difference between sparsity and storytelling.</p>
<p>See also my posts: <a href="http://irgupf.com/2009/04/29/the-tyranny-of-simplicity/">The Tyranny of Simplicity</a>, <a href="http://irgupf.com/2009/11/16/the-tyranny-of-simplicity-ii/">The Tyranny of Simplicity, Redux</a>, and <a href="http://irgupf.com/2009/11/05/the-craft-of-storytelling/">The Craft of Storytelling</a>.  I also found this <a href="http://www.massively.com/2009/01/02/the-death-of-lively-and-some-lessons-about-complexity/">older discussion on Google&#8217;s Lively</a> to be a fascinating read.  In my understanding, the issue of &#8220;necessary complexity&#8221; that the author of that post hammers home is related to the issue of storytelling.  Too much sparsity (of interaction in Lively&#8217;s case) leads to an inability to tell a clear story.  Simplicity is storytelling, not sparsity.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/06/10/simplicity-sparsity-or-storytelling/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Seeing Stars</title>
		<link>http://irgupf.com/2010/04/28/seeing-stars/</link>
		<comments>http://irgupf.com/2010/04/28/seeing-stars/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 20:59:52 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Explanatory Search]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1197</guid>
		<description><![CDATA[There is an interesting blogpost on the Official Google blog today, about seeing stars:
We&#8217;ve long believed that personalization makes search more relevant and  fun. For nearly five years, we&#8217;ve been tailoring results with personalized  search. Today we&#8217;re announcing a new feature in search that makes  it easier for you to mark and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://googleblog.blogspot.com/2010/03/stars-make-search-more-personal.html">There is an interesting blogpost</a> on the Official Google blog today, about seeing stars:</p>
<blockquote><p>We&#8217;ve long believed that personalization makes search more relevant and  fun. For nearly five years, we&#8217;ve been tailoring results with <a href="http://googleblog.blogspot.com/2005/06/search-gets-personal.html">personalized  search</a>. Today we&#8217;re announcing a new feature in search that makes  it easier for you to mark and rediscover your favorite web content —  stars.  With stars, you can simply click the star marker on any  search result or map and the next time you perform a search, that item  will appear in a special list right at the top of your results when  relevant. That means if you star the official websites for your favorite  football teams, you might see those results right at the top of your  next search for [nfl].</p></blockquote>
<p>So it sounds to me like this is a sort of bookmarking.  What is not as obvious, however, is what this sentence means:<span id="more-1197"></span> &#8220;the next time you perform a search, that item  will appear in a special  list right at the top of your results when  relevant&#8221;.  Does that mean the next time you perform the same search (e.g. [nfl]) that starred item will appear at the top?  Or is it more dynamic than that?  I.e., if I happen to perform the search [new england patriots], and that same link that I&#8217;d previously starred after executing the [nfl] query happens to be ranked in the top k, will it again appear at the top of my list?  (And if so, what is the cutoff/threshold for k?)  Similarly, if Google&#8217;s ranking of my original [nfl] query changes, due to shifting PageRank calculations, changes in freshness, or any of the hundreds++ of other signals that go into the ranking algorithm, and my particular starred web page no longer appears in the top k because it is no longer relevant to the [nfl] query using the signal vector from the current state of the index, will the starred item not appear?  After all, Google says that the starred item will only appear if it is relevant, and if it is no longer relevant to the [nfl] query, as determined by Google&#8217;s relevance algorithm, then it won&#8217;t appear?  Even though I had previously starred it with respect to that exact query?</p>
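<p>To make the ambiguity concrete, here is a minimal sketch of the two readings.  Everything in it is hypothetical: the function names, the data structures, and the cutoff k are illustrative assumptions, not anything Google has described.</p>

```python
# Hypothetical sketch of two readings of "starred items appear at the top
# when relevant". Nothing here reflects Google's actual implementation;
# all names and data are illustrative assumptions.

def stars_query_bound(query, starred_by_query):
    """Reading 1: a star is tied to the exact query it was made under."""
    return list(starred_by_query.get(query, []))


def stars_top_k(ranked_results, starred_urls, k=10):
    """Reading 2: a starred URL surfaces whenever it ranks in the current top k."""
    return [url for url in ranked_results[:k] if url in starred_urls]


# A page starred after running the [nfl] query.
starred_by_query = {"nfl": ["patriots.com"]}
starred_urls = {"patriots.com"}

# An assumed current ranking for the query [new england patriots].
ranking_for_patriots = ["patriots.com", "espn.com/patriots", "nfl.com"]

# Reading 1: a different query shows no starred items at all.
print(stars_query_bound("new england patriots", starred_by_query))

# Reading 2: the starred page surfaces because it sits inside the top k.
print(stars_top_k(ranking_for_patriots, starred_urls, k=3))
```

<p>Under the first reading the star is remembered only for the query it was made under; under the second it follows the page, but only for as long as the page stays inside the current top k.</p>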
<p>The post continues:</p>
<blockquote><p>In our testing, we learned that people really liked the idea of marking a  website for future reference, but they didn&#8217;t like changing the order  of Google&#8217;s organic search results. With stars, we&#8217;ve created a  lightweight and flexible way for people to mark and rediscover web  content.</p></blockquote>
<p>Now I am thoroughly confused.  People didn&#8217;t like changing the order of Google&#8217;s organic search results, but at the same time, they claim earlier in the post that &#8220;For nearly five years, we&#8217;ve been tailoring results with <a href="http://googleblog.blogspot.com/2005/06/search-gets-personal.html">personalized   search</a>.&#8221;  What does it mean to personalize search results, if not to change the order of Google&#8217;s organic search results?  Quoting the earlier post:</p>
<blockquote><p>With the launch of <a href="http://www.google.com/psearch">Personalized  Search</a>,  you can use that <a href="http://googleblog.blogspot.com/2005/04/from-lost-to-found.html">search  history</a> you&#8217;ve been building to get better results. You probably  won&#8217;t notice much difference at first, but as your search history grows,  your personalized results will gradually improve.</p></blockquote>
<p>So if users didn&#8217;t like changing the order of the organic search results, does this mean that Google has turned off (or will be turning off) personalization completely for all signed-in users?  Or does personalization co-exist with explicit starring/bookmarks?  If so, how exactly does that work?  Will Google change the order (personalize) your organic results using only the signals of query history and implicit relevance (i.e. clickthrough), but not the signal of explicit starring?  That&#8217;s even more confusing&#8230;the amount of mental jazz involved is a bit overwhelming.  Sure, the interface jazz is kept to a minimum, but at the expense of making the user&#8217;s mental model of what the search engine is actually doing for him or her even more muddled.</p>
<p>Perhaps the best way to sort out this confusion is to dive in headfirst and start playing around with the system, seeing what it actually does and when.  But I personally have a difficult time generating the gumption to use a feature for which I have an unclear mental model, an unclear understanding of what it is trying to do for me, how it might change, when it might or might not magically appear.  Especially when some of my actions affect the state of the system and others do not.</p>
<p>One thing I do like about this feature, however, is that it uses out-of-band displays to show different types of information.  Rather than trying to mix global/non-personalized results, implicit personalized results, and starred results, it lets you know via a separate channel whether there is any information that you have previously starred.  This is an IR design principle that I would like to see more of &#8212; separate goals in separate channels.  Examples of different IR goals include navigation, re-finding, discovery, exploration, etc.  Rather than trying to mix results from all of these goals into a single channel (a single ranked list) it is quite useful to separate each goal from the other.  This new Google interface does that.  Exactly what goal is attached to that separate channel is, again, unclear.  But the existence of a separate channel is an interesting and exciting approach, one that I hope to see more of.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/04/28/seeing-stars/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Search in Social Media</title>
		<link>http://irgupf.com/2010/01/29/search-in-social-media/</link>
		<comments>http://irgupf.com/2010/01/29/search-in-social-media/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 16:22:41 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1175</guid>
		<description><![CDATA[What is Social Search as opposed to Social Media?  Social Search in Media?  Search in Social Media?
Next week, Gene Golovchinsky and I are moderating a pair of panels at the SSM workshop.  So we spent some time this week asking ourselves these definitional questions in preparation for the panel.  We came up with a lightweight [...]]]></description>
			<content:encoded><![CDATA[<p>What is Social Search as opposed to Social Media?  Social Search in Media?  Search in Social Media?</p>
<p>Next week, Gene Golovchinsky and I are moderating a pair of panels at the <a href="http://ir.mathcs.emory.edu/SSM2010/">SSM workshop</a>.  So we spent some time this week asking ourselves these definitional questions in preparation for the panels.  We came up with a lightweight taxonomy, and have classified a few existing systems into that taxonomy as examples.  Whether or not you are one of the 80 participants in the workshop, I would invite you to take a look at our framework and comment or critique where necessary.  Here&#8217;s the <a href="http://palblog.fxpal.com/?p=2814">link to Gene&#8217;s writeup</a>:</p>
<blockquote><p>We think the phrase ‘search in social media’ has been used to refer to both the information being searched, and to the process for doing so. The information is standard user-generated content — tweets, blog posts, comment threads, tags, etc. The process, however, seems less well understood&#8230;It will be interesting to see how these ideas will be transformed by the discussion at the workshop. In any case, having a language with which to talk about phenomena is a prerequisite to articulating a research agenda, particularly in a young and multi-disciplinary field.</p></blockquote>
<p>Please note, however, that one topic that will probably not be covered is the difference between social search (process) and collaborative search (process).  The <a href="http://workshops.fxpal.com/cscw2010cis/">latter workshop will be held a few days later at CSCW</a>.  For an interesting thread on the distinction between the two, <a href="http://palblog.fxpal.com/?p=350#comments">please see another FXPAL post from March of last year</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/01/29/search-in-social-media/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Kasparov and Good Interaction Design</title>
		<link>http://irgupf.com/2010/01/25/kasparov-and-good-interaction-design/</link>
		<comments>http://irgupf.com/2010/01/25/kasparov-and-good-interaction-design/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 22:59:14 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Explanatory Search]]></category>
		<category><![CDATA[Exploratory Search]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1170</guid>
		<description><![CDATA[A New York Review of Books article about Kasparov and chess, and the relationship between humans, machines, and decision processes is making the Twitter rounds today.  I don&#8217;t have time at the moment to write a long comment about it, but I do want to point out that it supports a position that I&#8217;ve been taking on this [...]]]></description>
			<content:encoded><![CDATA[<p>A <a href="http://www.nybooks.com/articles/23592">New York Review of Books article about Kasparov and chess, and the relationship between humans, machines, and decision processes</a> is making the Twitter rounds today.  I don&#8217;t have time at the moment to write a long comment about it, but I do want to point out that it supports a position that I&#8217;ve been taking on this blog for some time now:</p>
<blockquote><p>This experiment goes unmentioned by Rasskin-Gutman, a major omission since it relates so closely to his subject. Even more notable was how the advanced chess experiment continued. In 2005, the online chess-playing site Playchess.com hosted what it called a &#8220;freestyle&#8221; chess tournament in which anyone could compete in teams with other players or computers. Normally, &#8220;anti-cheating&#8221; algorithms are employed by online sites to prevent, or at least discourage, players from cheating with computer assistance. (I wonder if these detection algorithms, which employ diagnostic analysis of moves and calculate probabilities, are any less &#8220;intelligent&#8221; than the playing programs they detect.)</p>
<p>Lured by the substantial prize money, several groups of strong grandmasters working with several computers at the same time entered the competition. At first, the results seemed predictable. The teams of human plus machine dominated even the strongest computers. The chess machine Hydra, which is a chess-specific supercomputer like Deep Blue, was no match for a strong human player using a relatively weak laptop. Human strategic guidance combined with the tactical acuity of a computer was overwhelming.</p>
<p>The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and &#8220;coaching&#8221; their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process.</p></blockquote>
<p>This result seems awfully similar to some of the other results I&#8217;ve reported on in the past.  <span id="more-1170"></span>For example, see this <a href="http://irgupf.com/2009/11/04/good-interaction-ii-just-ask/">paper by Amatriain</a>:</p>
<blockquote><p>Data is always important, but what struck me in the writeup was his discovery that the biggest advances came not from accumulation of massive amount of data, log files, clicks, etc.  Rather, while dozens and dozens of researchers around the world were struggling to reach that coveted 10% improvement by eking out every last drop of value from large data-only methods, Amatriain comparatively easily blew past that ceiling and hit 14%.  How?  He simply asked users to <a href="http://technocalifornia.blogspot.com/2009/08/rate-it-again.html">denoise their existing data by rerating a few items</a>.  In short, Amatriain resorted to <a href="http://en.wikipedia.org/wiki/Human_Computer_Information_Retrieval">HCIR</a>:</p></blockquote>
<p>See also Tessa Lau&#8217;s post about how <a href="http://irgupf.com/2009/03/25/good-interaction-design-trumps-smart-algorithms/">good interaction design trumps smart algorithms</a>:</p>
<blockquote><p>I come to the field of HCI via a background in AI, <em>having learned the hard way that good interaction design trumps smart algorithms </em>in the quest to deploy software that has an impact on millions of users. Currently a researcher at IBM’s Almaden Research Center, I lead a team that is exploring new ways of capturing and sharing knowledge about how people interact with the web.  We conduct HCI research in <em>designing and developing new interaction paradigms</em> for end-user programming.</p></blockquote>
<p>See also two of my previous posts, <a href="http://irgupf.com/2009/04/30/more-and-faster-versus-smarter-and-more-effective/">More and Faster versus Smarter and More Effective</a> and <a href="http://irgupf.com/2009/08/24/a-bird-in-the-hand/">A Bird in the Hand</a>.</p>
<p>The theme that I see is that, while big data approaches do work well, what works even better is a small amount of user interaction.  With big data methods  (even ones that incorporate human interaction in the form of massive log data) all you can do is make inferences about what is good and what is not good.  The more historical user data you have, the more correct your inference about the current scenario is likely to be.  But none of it is as correct as receiving explicit feedback from the user, and turning a probability into a certainty.</p>
<p>And that&#8217;s where I see good interaction design coming into play.  By turning a probability into a certainty, your back end algorithms can stop wasting their CPU cycles doing all the inferential heavy lifting about what the user is actually trying to say or do, and can start using their CPU cycles to explore a wider range of consequences of that informational certainty.</p>
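<p>As a rough illustration of that last point, here is a minimal sketch contrasting an inference drawn from log data with an explicit answer from the user.  The function names, the click and impression counts, and the add-one smoothing are all hypothetical assumptions chosen for illustration; this is not how any particular engine works.</p>

```python
# Hypothetical sketch: relevance inferred from logs versus explicit feedback.
# Function names, counts, and the smoothing choice are illustrative assumptions.

def inferred_relevance(clicks, impressions):
    """Estimate P(relevant) from historical log data, with add-one smoothing."""
    return (clicks + 1) / (impressions + 2)


def relevance(clicks, impressions, explicit=None):
    """Explicit feedback, when present, replaces the estimate with a certainty."""
    if explicit is not None:
        return 1.0 if explicit else 0.0
    return inferred_relevance(clicks, impressions)


# With logs alone, all the system has is a probability.
print(relevance(clicks=8, impressions=18))

# Once the user answers directly, no inference is needed.
print(relevance(clicks=8, impressions=18, explicit=True))
```

<p>More log data only tightens the estimate in the first call; a single explicit answer in the second call removes the need to estimate at all.</p>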
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/01/25/kasparov-and-good-interaction-design/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What You Can Find Out</title>
		<link>http://irgupf.com/2010/01/12/what-you-can-find-out/</link>
		<comments>http://irgupf.com/2010/01/12/what-you-can-find-out/#comments</comments>
		<pubDate>Tue, 12 Jan 2010 11:53:59 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Exploratory Search]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1150</guid>
		<description><![CDATA[The Edge has published their annual question for 2010:
HOW IS THE INTERNET CHANGING THE WAY YOU THINK?
As an Information Retrieval research scientist, I of course was quite interested in what search folks had to say.  I found this blurb from Marissa Mayer intriguing:
It&#8217;s not what you know, it&#8217;s what you can find out. The Internet [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.edge.org/q2010/q10_index.html">The Edge has published their annual question for 2010</a>:</p>
<blockquote><p><strong>HOW IS THE INTERNET CHANGING THE WAY YOU THINK?</strong></p></blockquote>
<p>As an Information Retrieval research scientist, I of course was quite interested in what search folks had to say.  I found this blurb from Marissa Mayer intriguing:</p>
<blockquote><p>It&#8217;s not what you know, it&#8217;s what you can find out. The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people. The Internet empowers better decision-making and a more efficient use of time&#8230;</p>
<p>The Web has also enabled amazing dynamic visualizations, where an ideal presentation of information is constructed — a table of comparisons or a data-enhanced map, for example. These visualizations — be it news from around the world displayed on a globe or a sortable table of airfares — can greatly enhance our understanding of the world or our sense of opportunity. We can understand in an instant what would have taken months to create just a few short years ago. Yet, the Internet&#8217;s lack of structure means that it is not possible to construct these types of visualizations over any or all data. To achieve true automated, general understanding and visualization, we will need much better machine learning, entity extraction, and semantics capable of operating at vast scale.</p></blockquote>
<p>It sounds like there is an increased awareness of (and respect for) Exploratory Search.  I&#8217;ve heard this via private channels, but this is the first time I&#8217;ve seen an acknowledgment of the need for more exploratory search from such an official channel.</p>
<p>I do want to point out, however, that in order to make this work at web scale, we won&#8217;t just need better automated methods.  I.e. we cannot rely solely on machine learning, entity extraction, or web-scale semantics.  Rather, what is also desperately needed is a way for the user him- or herself to inject personal semantics and structure into the search, visualization, and comparison process.  The search engine itself needs to be responsive to the structure that the user is giving to it, and rearrange itself around that information.</p>
<p>I am afraid that I am not being very clear in the vision that I&#8217;m attempting to lay out, so let me draw an analogy to parametric and non-parametric statistical modeling.  <span id="more-1150"></span>In parametric modeling, you assume that your data is distributed according to some function (say, Gaussian) and then you try and find those parameters that best fit the data.  On the other hand, with non-parametric modeling you make no such assumption.  You simply let the data describe itself through its own correlations and patterns.</p>
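<p>To make the statistical side of the analogy concrete, here is a minimal sketch using only the Python standard library.  The sample data, the bandwidth value, and the function names are assumptions chosen for illustration.  The parametric version commits to a Gaussian and estimates its two parameters; the non-parametric version (a simple kernel density estimate) lets every data point contribute its own small bump.</p>

```python
from statistics import NormalDist, mean, stdev

def parametric_density(data):
    """Assume a Gaussian form and estimate only its two parameters."""
    return NormalDist(mean(data), stdev(data)).pdf

def nonparametric_density(data, bandwidth=0.5):
    """Kernel density estimate: average one small Gaussian bump per data point."""
    kernels = [NormalDist(x, bandwidth) for x in data]

    def density(value):
        return sum(k.pdf(value) for k in kernels) / len(kernels)

    return density

# Hypothetical sample with two clusters, one near 0 and one near 10.
data = [-0.4, -0.1, 0.0, 0.2, 0.3, 9.7, 9.9, 10.0, 10.2, 10.4]
gaussian = parametric_density(data)
kde = nonparametric_density(data)

# Compare the two estimates at each cluster and in the empty gap between them.
for point in (0.0, 5.0, 10.0):
    print(point, round(gaussian(point), 4), round(kde(point), 4))
```

<p>On data like this, the single Gaussian puts its peak in the gap between the two clusters, where there is no data at all, while the kernel estimate follows the clusters themselves.</p>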
<p>By analogy: Assuming that the only way to visualize and compare information (do exploratory search) on the web is to rely on machine learning to do entity extraction and web-scale semantics is like assuming that one has to have a parametric model.  It helps, but it is not absolutely necessary.  My vision is for another approach, one analogous to non-parametric methods: Let the user give feedback on the relationship between items that he or she has examined during the search process and then use that comparison information to build a personalized visualization or comparison tool for that user&#8217;s specific information need, from the ground up.  Don&#8217;t rely on the parametric form of semantic categories or named entities.  Use bottom-up patterns to facilitate organization and comparison, discovery and learning, decision making and exploration.  More importantly, use the feedback provided by the user (e.g. &#8220;these two items are similar&#8221;, and &#8220;these two items are not&#8221;) to drive your online, bottom-up exploration.</p>
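<p>As a rough sketch of what feedback-driven, bottom-up organization could look like, the Python below groups items using nothing but pairwise judgments from the user.  Everything here is hypothetical: the item names, the function names, and the choice of a simple union-find grouping are my own illustration, not a description of any existing system.</p>

```python
from collections import defaultdict

def group_by_feedback(items, similar_pairs):
    """Merge items into groups using only the user's 'similar' judgments."""
    parent = {item: item for item in items}

    def find(item):
        # Follow parent links until we reach the representative of the group.
        while parent[item] != item:
            item = parent[item]
        return item

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for item in items:
        groups[find(item)].add(item)
    return list(groups.values())

def conflicts(groups, dissimilar_pairs):
    """Return the 'not similar' judgments that the current grouping violates."""
    return [(a, b) for a, b in dissimilar_pairs
            if any(a in g and b in g for g in groups)]

# Hypothetical feedback gathered during one exploratory session.
items = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
similar = [("doc_a", "doc_b"), ("doc_b", "doc_c"), ("doc_d", "doc_e")]
dissimilar = [("doc_a", "doc_d")]

groups = group_by_feedback(items, similar)
print(groups)
print(conflicts(groups, dissimilar))
```

<p>No categories or entity types are defined ahead of time; the groups exist only because the user said certain items belong together, and the &#8220;not similar&#8221; judgments act as a check on the result.</p>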
<p>We have to get away from this attempt to solve the exploration problem ahead of time, off-line, before the user has ever issued a query.  That&#8217;s the parametric way of thinking, the way that presumes that categories and labels and entities are the best way of tackling organization and discovery.  Rather, we have to become better at involving the user, the person doing the exploration, in the feedback loop, and not rely solely on pre-computed, machine-learning-extracted entities.</p>
<p>Unlike navigational search, in which users are rarely willing to do any extra work themselves, users engaged in exploratory search by their very nature desire to interact more with the system and put more of their own sweat and tears into the search process.  They would not be exploring if they didn&#8217;t.  So why not make use of this user willingness?</p>
<p>Computational resources are going to be a challenge.  But that&#8217;s where <a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">Google&#8217;s new commitment to openness</a> (<a href="http://developer.yahoo.com/search/boss/">and Yahoo!&#8217;s initial, existing commitment</a>) <a href="http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/">comes in handy</a>.  There should be a willingness to offload some of the computation (and therefore also the search data itself) to the user&#8217;s own computer.  Instead of SETI@Home, we could have SEARCH@Home.  Let the user&#8217;s underutilized processing power be partially responsible for computing some of these bottom-up patterns in his or her own search data that will help make dynamic visualization a reality.  Make the user&#8217;s own computer partially responsible for the additional necessary processing.</p>
<p>Mayer is correct: &#8220;<em>The Internet has put at the forefront resourcefulness and critical-thinking and relegated memorization of rote facts to mental exercise or enjoyment. Because of the abundance of information and this new emphasis on resourcefulness, the Internet creates a sense that anything is knowable or findable — as long as you can construct the right search, find the right tool, or connect to the right people.</em>&#8221;  We should be developing systems that enable the users to construct the right search. The user should be able to rely on her resourcefulness to mash up and explore the data herself, to shed light on patterns of information hitherto unknowable by single-line input box navigational search.  Users should be able to apply critical thinking to their search process in a way that makes sense to the user, not in a way that has been pre-computed through some semantic category and machine learning classifier.  And a good search engine should be a valuable partner in this process, by way of flexibility and openness, not by way of constraint and closedness.</p>
<p>Only then will we, the users of these systems, be able to find out what we previously could not find out.  At least, that is how the Internet is changing the way I think.</p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/01/12/what-you-can-find-out/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Search versus Recommendation: Not The Only Tension</title>
		<link>http://irgupf.com/2010/01/05/more-tensions/</link>
		<comments>http://irgupf.com/2010/01/05/more-tensions/#comments</comments>
		<pubDate>Tue, 05 Jan 2010 11:14:59 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Collaborative Information Seeking]]></category>
		<category><![CDATA[Exploratory Search]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1131</guid>
		<description><![CDATA[Greg Linden has an interesting post on Search on a domain like YouTube.  I reproduce it here because I would like to elaborate on it:
The article focuses on YouTube&#8217;s &#8220;plans to rely more heavily on personalization and ties between users to refine recommendations&#8221; and &#8220;suggesting videos that users may want to watch based on what [...]]]></description>
			<content:encoded><![CDATA[<p>Greg Linden has an interesting post on Search on a domain like YouTube.  I reproduce it here because I would like to elaborate on it:</p>
<blockquote><p>The article focuses on YouTube&#8217;s &#8220;plans to rely more heavily on personalization and ties between users to refine recommendations&#8221; and &#8220;suggesting videos that users may want to watch based on what they have watched before, or on what others with similar tastes have enjoyed.&#8221;  What is striking about this is how little this has to do with search. As described in the article, what YouTube needs to do is entertain people who are bored but do not entirely know what they want. YouTube wants to get from users spending &#8220;15 minutes a day on the site&#8221; closer to the &#8220;five hours in front of the television.&#8221; This is entertainment, not search. Passive discovery, playlists of content, deep classification hierarchies, well maintained catalogs, and recommendations of what to watch next will play a part; keyword search likely will play a lesser role.</p></blockquote>
<p>My feeling is that the dichotomy that is being drawn does not exhaustively cover the space.  I would characterize the space using the following two orthogonal dimensions: (1) Information Need Clarity and (2) User Engagement.  The first dimension (clarity) is related to the degree to which the user understands his or her own information need, i.e. has something specific in mind that he is looking for and/or understands what he needs to do to find it.  That need may either be well understood, or (<a href="http://blog.codalism.com/?p=984">to borrow Nick Belkin&#8217;s terminology</a>) &#8220;anomalous&#8221;: The user doesn&#8217;t know what he or she doesn&#8217;t know.  The second dimension is related to the level at which the user applies himself to the information seeking process.  That level may be active or passive.</p>
<p>Greg points out two modes: &#8220;<em>Active Understood</em>&#8221; (typical navigational web search) and &#8220;<em>Passive Anomalous</em>&#8221; (entertainment/discovery/recommendation).  But I believe that there are more than these two modes.  A large, interesting design space opens up when one realizes that information seeking can be &#8220;<em>Active Anomalous</em>&#8221; and &#8220;<em>Passive Understood</em>&#8221;.</p>
<p><img class="aligncenter size-full wp-image-1136" title="dimensions" src="http://irgupf.com/wp-content/uploads/2010/01/dimensions2.png" alt="dimensions" width="966" height="119" /></p>
<p><a href="http://en.wikipedia.org/wiki/Exploratory_search">Exploratory Search</a> is a good example of Active Anomalous seeking.  One doesn&#8217;t yet fully know or understand what it is that one is looking for, but at the same time one is willing to engage with an information system in order to discover what it is that he or she does not yet know.  And the system itself is designed not necessarily toward trying to answer a well understood need, but toward helping the user map out and better comprehend a space.</p>
<p>Collaborative Information Seeking (see <a href="http://irgupf.com/2009/03/30/collaborative-information-seeking-ongoing-recap/">here</a> and <a href="http://palblog.fxpal.com/?p=272">here</a> and <a href="http://workshops.fxpal.com/cscw2010cis/CFP.aspx">here</a>) is a good example of where a need may be well understood, but a user does not necessarily have to actively express every last query detail in order to get more information on a topic.  Why not?  Because when User #1 is explicitly collaborating with User #2, an algorithmic mediation engine can push some of User #2&#8217;s activity on to User #1 without requiring User #1 to make additional effort.  Note that I am not implying that every aspect of collaborative information seeking is passive; quite the contrary, as it requires at least one co-collaborator to be active.  I am only pointing out that it is a domain in which it becomes possible for a user to passively obtain specific information on a well understood need.</p>
<p>There is a lot of discussion in the Information Retrieval Community on the similarities and differences between Search and Recommendation.  A fruitful tension opens up as one travels back and forth along the diagonal from Active Understood to Passive Anomalous; the two approaches often end up complementing each other.  Where I see much less discussion is on the tension that opens up along the other diagonal, between Passive Understood and Active Anomalous.  When Exploratory Search meets Collaborative Information Seeking, it yields <a href="http://www.fxpal.com/?p=abstract&amp;abstractID=460">Collaborative Exploratory Search</a> and a whole host of interesting possibilities.  Over the coming year I will be blogging more about the tension along this alternative diagonal (both here and on the <a href="http://palblog.fxpal.com/">FXPAL blog</a>) and what it means for the Information Retrieval systems I and others are designing.  Happy 2010!</p>
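<p>For readers who think in data structures, the two dimensions and the four modes they produce can be written down in a few lines of Python.  The class and variable names are my own shorthand for this illustration; the example attached to each mode is the one given in the text above.</p>

```python
from enum import Enum
from itertools import product

class Clarity(Enum):
    UNDERSTOOD = "Understood"
    ANOMALOUS = "Anomalous"

class Engagement(Enum):
    ACTIVE = "Active"
    PASSIVE = "Passive"

# One example from the discussion above for each of the four modes.
EXAMPLES = {
    (Engagement.ACTIVE, Clarity.UNDERSTOOD): "navigational web search",
    (Engagement.PASSIVE, Clarity.ANOMALOUS): "entertainment / discovery / recommendation",
    (Engagement.ACTIVE, Clarity.ANOMALOUS): "exploratory search",
    (Engagement.PASSIVE, Clarity.UNDERSTOOD): "collaborative information seeking",
}

def mode_name(engagement, clarity):
    """Build a label such as 'Active Anomalous' from the two dimensions."""
    return f"{engagement.value} {clarity.value}"

# Two independent binary dimensions give exactly four combinations.
for engagement, clarity in product(Engagement, Clarity):
    print(mode_name(engagement, clarity), "-", EXAMPLES[(engagement, clarity)])
```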
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2010/01/05/more-tensions/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>A Fragile Local Maximum for the Web</title>
		<link>http://irgupf.com/2009/12/23/a-fragile-local-maximum-for-the-web/</link>
		<comments>http://irgupf.com/2009/12/23/a-fragile-local-maximum-for-the-web/#comments</comments>
		<pubDate>Thu, 24 Dec 2009 00:23:10 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[Information Retrieval Foundations]]></category>
		<category><![CDATA[Social Implications]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1112</guid>
		<description><![CDATA[On Twitter today, Josh Young made an interesting observation to which I would like to call attention:
Ya, @jerepick, with &#8220;fauxpen&#8221; attached, google&#8217;s &#8220;nav. search as the top of the stack&#8221; is a fragile local maximum for the web.
This observation is a followup to the web-wide discussion that Google kicked off about the meaning of open.  [...]]]></description>
			<content:encoded><![CDATA[<p>On Twitter today, <a href="http://twitter.com/jny2/statuses/6972686099">Josh Young made an interesting observation</a> to which I would like to call attention:</p>
<blockquote><p>Ya, @jerepick, with &#8220;fauxpen&#8221; attached, google&#8217;s &#8220;nav. search as the top of the stack&#8221; is a fragile local maximum for the web.</p></blockquote>
<p>This observation is a followup to the web-wide discussion that <a href="http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/">Google kicked off about the meaning of open</a>.  Essentially, Rosenberg says that all of Google&#8217;s products that are not at the search layer of the stack should work toward being open, but that the search layer itself should be closed.  To protect it from spammers, you understand {cough}.</p>
<p>Earlier in the same post Rosenberg makes a distinction between open source  and open data, calling for increased openness in both.  However, when it comes to defending closed-search, this distinction gets lost.  But this distinction between open source vs. open data is important.  Here is how it translates to the search domain:</p>
<ul>
<li><strong>Open Source = Open search algorithm</strong> is about letting the world know what features are used to rank pages and how those features interrelate (are weighted)</li>
<li><strong>Open Data = Open search results</strong> is about letting users refactor, remix, reuse, mashup, store and re-search locally any and all query results that the user issues.  And about letting the user use any software that they want to accomplish this &#8212; not just Google software</li>
</ul>
<p>The excuse given about why Google cannot open up is that spammers would be able to game the engine.  But if we look closely, we&#8217;ll see that it is an excuse that is primarily, if not exclusively, related to the &#8220;open source&#8221; aspect of openness.  Black hat SEO algorithmic gaming is not an issue when it comes to user results re-use and remixing.</p>
<p>And so the point (I think) Josh is making is that by closing not only the algorithm, but also the results of that algorithm, Google has effectively declared a moratorium on Internet application stack progress along that vertical.  Google is essentially saying to the Internet: <em><span id="more-1112"></span>&#8220;You shall not pass.  If anyone wants to develop an application that makes use of search results as a &#8220;lower&#8221; stack layer, that person will have to write an entire search engine themselves.  We are in favor of any layer underneath us &#8212; or parallel to us such as gmail &#8212; growing the internet pie, but we will not directly participate in growing the pie ourselves by opening up our results so as to allow search itself to become a middling layer in someone else&#8217;s stack.&#8221;</em></p>
<p>This does not sit well with me. A search engine by nature is built on the stack layer of web page content, which is built on the stack layer of internet and transmission control protocols, and so on.  To say that users have no right to use whatever software they choose to build further on this search engine layer denies users the same basic open rights that people like Lawrence Lessig so passionately fight for.  In fact, <a href="http://www.windley.com/archives/2009/09/claiming_my_right_to_a_purposecentric_web_sidewiki.shtml">some have argued that these rights are not even Google&#8217;s to grant</a>, that fair use lets us re-use the search results that we, by our querying effort, had a hand in creating.  So in the spirit of openness Google should fully open up its results to programmatic, API access.  And allow users to remix and reuse, to metasearch and to share (e.g. social search and collaborative search).  That grows the pie, does it not?</p>
<p>I do understand why Google will not do this. It&#8217;s because by so doing, Google would effectively allow itself to be disintermediated, the same way they are currently disintermediating the newspaper industry.  By decoupling results from ads (where money is made), it makes it much more difficult for Google to monetize its traffic &#8212; a problem that all disintermediated layers of the stack face.  Naturally Google doesn&#8217;t want to put itself in this position.   But that is what makes their current stance on &#8220;open&#8221; all the more perplexing &#8212; they expect others (e.g. newspapers) to open up their revenue-stream stack layers, but refuse to do so themselves.  Why take such a strong position on openness and then give an unrelated (spammer) excuse about why you cannot be?</p>
<p>So why am I writing about this?  Again, let&#8217;s go back to <a href="http://twitter.com/jny2/statuses/6980852068">something Josh just retweeted</a>:</p>
<blockquote><p><span><span>RT @<a href="http://twitter.com/jonathanglick">jonathanglick</a> The Open debate matters because, right now, for the first time in a decade, the forces of Closed are on the march.</span></span></p></blockquote>
<p><span><span>This, combined with Google&#8217;s open call for earnest web-wide discussion and debate, has increased my desire to add to the conversation. And my point here is that spammers are not the issue when it comes to making Google search &#8220;open&#8221;.  You can open the data without opening the algorithm.  If anyone has pointers to refutations to this line of reasoning, I welcome them.<br />
</span></span></p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2009/12/23/a-fragile-local-maximum-for-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google and the Meaning of Open</title>
		<link>http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/</link>
		<comments>http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 11:44:43 +0000</pubDate>
		<dc:creator>jeremy</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Information Retrieval Foundations]]></category>
		<category><![CDATA[Social Implications]]></category>

		<guid isPermaLink="false">http://irgupf.com/?p=1093</guid>
		<description><![CDATA[There is a fantastic Google blog post today by Jonathan Rosenberg on the meaning (and value) of openness.  Whooo-boy.. where do we start with this can of worms?  Guess I&#8217;ll jump right in.  Warning: This is probably the longest post I&#8217;ve written, so if you are easily bored, understand that this is not required reading.  [...]]]></description>
			<content:encoded><![CDATA[<p>There is a fantastic Google blog post today by Jonathan Rosenberg on the <a href="http://googleblog.blogspot.com/2009/12/meaning-of-open.html">meaning (and value) of openness</a>.  Whooo-boy.. where do we start with this can of worms?  Guess I&#8217;ll jump right in.  Warning: This is probably the longest post I&#8217;ve written, so if you are easily bored, understand that this is not required reading.  It will not be on the test.</p>
<p>Here we go:</p>
<blockquote><p>At Google we believe that open systems win. They lead to more innovation, value, and freedom of choice for consumers, and a vibrant, profitable, and competitive ecosystem for businesses.</p></blockquote>
<p>Agreed!  I&#8217;m fully on board with the spirit of this opening statement!</p>
<blockquote><p>Many companies will claim roughly the same thing since they know that declaring themselves to be open is both good for their brand and completely without risk.</p></blockquote>
<p>True.  So the question arises: What happens when being open carries with it an amount of risk?  Do you open up those areas of your business as well?  Or do you forever keep your most valuable layer of the stack closed and proprietary, both in terms of closed source as well as not-fully-open information?</p>
<blockquote><p>We run the company and make our product decisions based on these principles, so I encourage you to carefully read, review, and debate them. Then own them and try to incorporate them into your work. This is a complex subject and if there is debate (and I&#8217;m sure there will be) it should be in the open! Please feel free to comment.</p></blockquote>
<p>I like the spirit of this discussion so far.  I earnestly believe that Google is debating these things internally.  But I also take them at their word that they would like this debate to be in the open.  Consider this blog post part of my ongoing comment, and ongoing engagement in what I consider to be an extremely important area: The organization and dissemination of information.<span id="more-1093"></span></p>
<blockquote><p>There are two components to our definition of open: open technology and open information. Open technology includes open source, meaning we release and actively support code that helps grow the Internet, and open standards, meaning we adhere to accepted standards and, if none exist, work to create standards that improve the entire Internet (and not just benefit Google). Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.</p></blockquote>
<p>Ok, first question: Why does open information only include the information that you collect about users?  Why does it not include information about what you are trying to do for users, and how, and why?  Why does it not include what metrics you are optimizing, so that the user can understand what the search ranking functions are trying to do for him or her?  For example, is Google open enough to enable me to know how much diversity is intentionally being injected into my search results?  Will they allow me to know to what extent my particular query is being optimized for precision or for recall?  Will they allow me to know what factors went into the decision to rank page X higher than page Y, and can I change those factors, so that I am able to instruct the search engine to give me different sets of results for the same exact query term, so as to maximize my value for my own understanding of my own particular information need?  Never mind whether or not most users would even want to do something like this.  Most probably do not, but many (more than 5%, I have little doubt) do.   Is Google open (transparent) enough, in principle, to ever allow a user to make use of the service in this manner? Or is this something that will forever be hidden from the user&#8217;s view?</p>
<p>More importantly: Why is the information shown to the user not symmetric?  Google stores (and uses) more about the user than the user is allowed to store (and use) about Google.  Let me give a concrete example: When I run a query, Google knows a number of pieces of information that arise out of my information-generating actions.  It knows:</p>
<ol>
<li>The text of the query itself, along with the timestamp</li>
<li>All the results that are mutually generated from the intersection of the query (my intellectual product) and the algorithm (Google&#8217;s intellectual product)</li>
<li>The link(s) that I clicked as a result of that query</li>
<li>The link(s) that I <em>didn&#8217;t</em> click as a result of that query</li>
</ol>
<p>Last I checked, Google allows me to access and export (1) and (3).  But they do not allow me to access and export (2) and (4).  It&#8217;s not that I didn&#8217;t interact with those results, or have a shared hand in their creation.  I created the query in the first place.  And then for a number of results, I viewed them and explicitly made a choice (judgment) not to visit certain pages.  Just because there was no click does not mean that there isn&#8217;t any information.  This idea is counterintuitive at first, but it&#8217;s true.  There was an interface, an option about whether or not to click, and a decision not to click.  I created that information during my search session.  So why can I not export that information?  Why can I not get a list of all the results that I viewed, that I decided not to click?</p>
<p>If this information were open, especially through an API, I could start to do all sorts of interesting things with it.  I could keep track of whether or not there were certain pages that kept coming up in the results, over and over.  And maybe that would allow me to reevaluate my initial non-relevance decision and look deeper into a piece of information that had initially not appeared fruitful (not had much of an information scent associated with it).  Being able to keep algorithmic track, via an API, of all the pages I didn&#8217;t click would also allow me to compare and contrast relevance and non-relevance information on Google with other search engines such as Yahoo! and Bing.  It would allow an ecosystem of service providers to grow up around Google and provide me with software solutions that allowed me to keep tabs on all the search engines simultaneously and understand the relative differences between and relative merits of each.  In other words, being able to API-download both my clicks and my non-clicks would allow me to <a href="http://en.wikipedia.org/wiki/Metasearch_engine">metasearch</a> Google!  (And there is a long history of academic literature on the value of metasearch; I don&#8217;t need to go into it here.)</p>
<p>I am not talking about some third party company scraping Google&#8217;s results and displaying them on their own website for their own purposes.  I&#8217;m talking about being able to use software (that I&#8217;ve licensed from a third party) to do it myself, for myself.  For more on this, see Phil Windley&#8217;s posts, <a href="http://www.windley.com/archives/2009/10/its_my_browser_and_ill_autoclick_if_i_want_to.shtml">It&#8217;s My Browser and I&#8217;ll Auto-Click If I Want To</a> and <a href="http://www.windley.com/archives/2009/09/claiming_my_right_to_a_purposecentric_web_sidewiki.shtml">Claiming My Right to a Purpose-Centric Web</a> and Jon Udell&#8217;s post <a href="http://blog.jonudell.net/2009/10/08/magic-glasses-and-magic-projectors-private-versus-public-augmentation-of-experience/">Magic Glasses and Magic Projectors: Private Versus Public Augmentation of Experience</a> (see also the comments/discussion in this latter post).  As Windley writes, the issue is this: <em>Do people have the right to control how Web content is displayed in their browser?</em> Openness of search data and information, including all the data associated with that mutually-interactive process (both clicked and non-clicked results) is the cornerstone of transparency and end-user value.  In order to claim openness, and to establish end-user value, I feel that there has to be a complete symmetry between everything that Google stores (and makes use of, internally) about the user, and everything that the user is allowed to store (and make use of, internally) about Google.</p>
<p>So is there a symmetry?  No.  Read the <a href="http://www.google.com/intl/en/mobile/xhtml/terms_of_service.html">Terms of Service</a>:</p>
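<p>To give a flavor of what becomes possible once clicks and non-clicks can be exported, here is a small Python sketch.  It is purely hypothetical: no such export exists today, and the record layout, field names, function names, and example URLs are all invented for illustration.  It finds the pages that kept being shown to me, query after query, that I never clicked.</p>

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class SearchRecord:
    """One query, the results that were shown, and the subset that was clicked."""
    query: str
    timestamp: str
    shown: list
    clicked: set = field(default_factory=set)

    def non_clicked(self):
        return [url for url in self.shown if url not in self.clicked]

def recurring_non_clicks(records, min_count=2):
    """URLs that were shown but skipped in at least `min_count` separate queries."""
    counts = Counter(url for r in records for url in set(r.non_clicked()))
    return sorted(url for url, n in counts.items() if n >= min_count)

# Hypothetical exported history for three related queries.
records = [
    SearchRecord("exploratory search", "2009-12-22T11:00",
                 ["http://example.org/a", "http://example.org/b", "http://example.org/c"],
                 {"http://example.org/a"}),
    SearchRecord("exploratory search tools", "2009-12-22T11:05",
                 ["http://example.org/b", "http://example.org/d"],
                 {"http://example.org/d"}),
    SearchRecord("faceted browsing", "2009-12-22T11:12",
                 ["http://example.org/c", "http://example.org/b"]),
]

# Pages worth a second look: repeatedly offered, never visited.
print(recurring_non_clicks(records))
```

<p>The same records, collected from several engines instead of one, would be the raw material for the kind of side-by-side comparison described above.</p>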
<blockquote><p>The Google Services are made available for your personal, non-commercial use only&#8230;{snip}&#8230;<em>You may not take the results from a Google search and reformat and display them</em>, or mirror the Google home page or results pages on your Web site. <em>You may not &#8220;meta-search&#8221; Google</em>&#8230;{snip}&#8230;You may not send automated queries of any sort to Google&#8217;s system without express permission in advance from Google. Note that &#8220;sending automated queries&#8221; includes, among other things: using any software which sends queries to Google to determine how a website or webpage &#8220;ranks&#8221; on Google for various queries; <em>&#8220;meta-searching&#8221; Google</em>; and <em>performing &#8220;offline&#8221; searches on Google</em>. Please do not write to Google to request permission to &#8220;meta-search&#8221; Google for a research project, as such requests will not be granted.</p></blockquote>
<p>Compare that to what was being said above:</p>
<blockquote><p>Open information means that when we have information about users we use it to provide something that is valuable to them, we are transparent about what information we have about them, and we give them ultimate control over their information.</p></blockquote>
<p>To me, that does not sound like I have ultimate control over my information&#8230;my queries, my clicks, and my non-clicks.  If I had ultimate control over those, then I could take those clicked and non-clicked links and use them in any way that I wanted.  I could re-search those links, offline, without having to return to Google (value to the user: this response time would be much quicker, and internet connectivity would not be required &#8212; privacy could also be enhanced!)  I could mash those links up with other search results, from both Google and non-Google sources (metasearch; value to the user: better results, more diverse results, more complete results).  As Google says:</p>
<blockquote><p>Another way to look at the difference between open and closed systems is that open systems allow innovation at all levels — from the operating system to the application layer — not just at the top. This means that one company doesn&#8217;t have to depend on another&#8217;s benevolence to ship a product.</p></blockquote>
<p>That is, if I want to install software that lets me metasearch Google, the creator of that software should not have to depend on Google&#8217;s benevolence in order to be able to create their product.  Openness at the system level is what allows a third-party company to develop the software, and Google-to-user openness on the data level is what lets me, the user, put my data into that software to mash up all of my queries, clicks, and non-clicks, essentially metasearching Google.  This is exactly the sort of scenario that openness is meant to address.</p>
<p>Continuing on:</p>
<blockquote><p>If we can embody a consistent commitment to open — which I believe we can — then we have a big opportunity to lead by example and encourage other companies and industries to adopt the same commitment. If they do, the world will be a better place.</p></blockquote>
<p>And:</p>
<blockquote><p>If they use our products and store content with us, it&#8217;s their content, not ours. They should be able to export it or delete it at any time, at no cost, and as easily as possible.</p></blockquote>
<p>Again, I agree.  When <em>all</em> search engines, Google included, allow me to store (and reuse!) my search interaction data (all queries, clicks, and non-clicks) then this data truly becomes valuable to me, the user.  Yahoo! has taken much more of a lead in this area with <a href="http://developer.yahoo.com/searchmonkey/">SearchMonkey</a>.  Not only do I not see the same openness from Google, I see Terms of Service that actively discriminate against these sorts of applications and usage of one&#8217;s own data.</p>
<blockquote><p>Open systems are just the opposite. They are competitive and far more dynamic. In an open system, a competitive advantage doesn&#8217;t derive from locking in customers, but rather from understanding the fast-moving system better than anyone else and using that knowledge to generate better, more innovative products. The successful company in an open system is both a fast innovator and a thought leader; the brand value of thought leadership attracts customers and then fast innovation keeps them. This isn&#8217;t easy — far from it — but fast companies have nothing to fear, and when they are successful they can generate great shareholder value.  Open systems have the potential to spawn industries. They harness the intellect of the general population and spur businesses to compete, innovate, and win based on the merits of their products and not just the brilliance of their business tactics.</p></blockquote>
<p>Make no mistake: I greatly admire the ideals being expressed here.  I cannot stop agreeing.  I just have a hard time reconciling this with what is written later on in the same post:</p>
<blockquote><p>While we are committed to opening the code for our developer tools, not all Google products are open source. Our goal is to keep the Internet open, which promotes choice and competition and keeps users and developers from getting locked in. In many cases, most notably our search and ads products, opening up the code would not contribute to these goals and would actually hurt users. The search and advertising markets are already highly competitive with very low switching costs, so users and advertisers already have plenty of choice and are not locked in.</p></blockquote>
<p>By making only part of one&#8217;s information available (queries and clicks) and not the other part (seen but non-clicked results that were created by the user+Google collaborative query+algorithm search session, as well as unseen and non-clicked results from that same session), Google makes it much more difficult for a user to migrate to another service.  For example, what if Bing were to start offering full-time, default personalization the same way Google now does?  Google undoubtedly trains one&#8217;s personalized algorithm using all of one&#8217;s own information: queries, clicks <strong><em>and</em></strong> non-clicks.  Abandoned searches (and knowledge of exactly which results were <em>not</em> clicked) are just as important a part of the overall algorithmic mixture, are they not?  They convey important information.  So suppose Bing now wanted to allow you, the user, to jump headfirst into personalized Bing results.  The best way to do that would be to upload your Google search data to Bing, so that Bing could start personalizing based on this same years-long history.  Google does not allow this.  That information is not available for you to use in any third-party application&#8230;whether metasearch or Bing-provided personalized search.  Saying that a user is only one click away from another search engine masks the full truth.  If the first search engine has learned something about a user and can therefore provide different, personalized results for that user based on months or years of history, then that search engine is stickier than others.  For search to be truly open, a user has to be able to &#8220;export&#8221; his or her profile, both the clicks and the non-clicks, and take that information to another search engine, or else there will be lock-in.</p>
<p>The Google post continues:</p>
<blockquote><p>Not to mention the fact that opening up these systems would allow people to &#8220;game&#8221; our algorithms to manipulate search and ads quality rankings, reducing our quality for everyone.</p></blockquote>
<p>Ok, here is where I have to take another strong, contrary stance.  Google just got finished saying the following, earlier in the post:</p>
<blockquote><p>Open systems are just the opposite. They are competitive and far more dynamic. In an open system, a competitive advantage doesn&#8217;t derive from locking in customers, but rather from understanding the fast-moving system better than anyone else and using that knowledge to generate better, more innovative products.</p></blockquote>
<p>So let me get this straight: Google is pro-openness in all the areas where they don&#8217;t make any money, where openness doesn&#8217;t actually affect the bottom line.  But when it comes to the moneymaker, that has to be closed and proprietary to protect it from spammers? But I thought that open systems are just the opposite.  Don&#8217;t competition and dynamism create an environment in which everyone can work on the problem of fighting spam, thereby coming up with a better and quicker solution than a closed environment can produce?  Isn&#8217;t one of the maxims of openness the idea that the smartness of the competitive environment as a whole can far outproduce any one company?  Isn&#8217;t that the whole idea behind the Netflix prize (for example): that as everyone shares their solutions with each other, everyone gets better than any one team could have done on their own?</p>
<p>Opening up Google&#8217;s algorithms might allow people to &#8220;game&#8221; them for a short time.  But in the long run, the benefits that come from openness would outpace the spammers&#8217; ability to game.  Right?  Isn&#8217;t that the technological optimism that this blog post is expressing?</p>
<blockquote><p>Our skills and our culture give us the opportunity and responsibility to prevent this from happening. We believe in the power of technology to deliver information. We believe in the power of information to do good. We believe that open is the only way for this to have the broadest impact for the most people. We are technology optimists who trust that the chaos of open benefits everyone. We will fight to promote it every chance we get.  Open will win. It will win on the Internet and will then cascade across many walks of life: The future of government is transparency. The future of commerce is information symmetry. The future of culture is freedom. The future of science and medicine is collaboration. The future of entertainment is participation. Each of these futures depends on an open Internet.</p></blockquote>
<p>Open will win.  Open will beat the spammers, will it not?  Hasn&#8217;t Google just expressed confidence that it will?  Simultaneously, open will better serve the users.  And open will allow third-parties to create exploratory- and recall-oriented and social and collaborative search systems that make use of Google algorithms and indices as just another layer in the overall information organization and dissemination stack.  This grows the overall pie.  Remember:</p>
<blockquote><p>Another way to look at the difference between open and closed systems is that open systems allow innovation at all levels — from the operating system to the application layer — not just at the top. This means that one company doesn&#8217;t have to depend on another&#8217;s benevolence to ship a product.</p></blockquote>
<p>Search is another layer in that stack, one that should be just as open as the other layers, and one that need not fear gaming, because openness will overcome.  That&#8217;s the call-to-arms that I see expressed in this blog post.  And it gets me excited.  At the same time, I see an elephant in the room.  And that is Google&#8217;s unwillingness to be open in the one (and mainly only) area where money is made.  And what really bothers me about that is that it seems no different from any other technology company: All technology companies want openness in those layers of the stack where they don&#8217;t make money, and closedness where they do. Google does not seem any different.</p>
<p>Even Google&#8217;s excuse for not being open is very similar to others&#8217;, like Microsoft&#8217;s.  Microsoft, which makes the bulk of its money on Windows, says that Windows can&#8217;t be open-sourced because hackers will see all the bugs in the code and be able to more easily exploit the OS and write viruses.  Companies like Google say &#8220;nonsense&#8221;, and point to open source OSes like Linux as an example of how openness can breed hardened code that is less hackable.  And artists and record labels and newspapers also worry about bootleggers and content-copiers depriving them of their stack-layer income, despite absolutist assurances from Google that increased traffic guarantees monetization.  So why does Google think that if we open-sourced search algorithms, the community would not be able to &#8220;harden&#8221; those algorithms against spammers, and simultaneously guarantee Google&#8217;s income?  There may be a little chaos at first, but the chaos of open benefits everyone.</p>
<p>Toward the end, Google wraps up:</p>
<blockquote><p>All of this is useless, however, if we fail when it comes to being open. So we need to constantly push ourselves. Are we contributing to open standards that better the industry? What&#8217;s stopping us from open sourcing our code? Are we giving our users value, transparency, and control? Open up as much as you can as often as you can, and if anyone questions whether this is a good approach, explain to them why it&#8217;s not just a good approach, but the best approach. It is an approach that will transform business and commerce in this still young century, and when we are successful we will effectively re-write the MBA curriculum for the next several decades!</p></blockquote>
<p>Make no mistake, I am on board with the stated goals.  However, in order to rewrite that MBA curriculum, I need to see Google be just as open at their moneymaking core stack layer (search and ads) as they want everyone else to be with operating systems, networks, and intellectual property (books, music, news articles, etc.)  It is just a little too convenient to have an excuse about why one&#8217;s own layer cannot be opened up (spammers), but that everyone else&#8217;s layers can, hackers and bootleggers be damned.</p>
<p>I feel very strongly about all this, but that does not mean that I am correct.  Rather, I am taking Google seriously at its word: &#8220;<em>I encourage you to carefully read, review, and debate [these principles]. Then own them and try to incorporate them into your work. This is a complex subject and if there is debate (and I&#8217;m sure there will be) it should be in the open! Please feel free to comment</em>.&#8221;  Consider this blog post my first of many comments.  Not for the purpose of tearing down, or argument for argument&#8217;s sake.  But for the purpose of getting these challenges and questions and comments out in the open, exactly as requested, in order to further the same end goal.  It would be fantastic if Google succeeded, and I agree with them: It&#8217;s not just a good approach, but the best approach.  But in order to start transforming business and commerce, Google needs to set the example by being open in its core area (search and ads) and trusting that the chaos of openness will defeat spammers in addition to hackers and bootleggers.  Right now, for all the openness in non-core business areas, and for all the talk, Google is unwilling to be open where it really matters.</p>
<p><em>Update</em>: <a href="http://www.techcrunch.com/2009/12/22/google-open-when-convenient/">TechCrunch makes almost exactly the same points</a>, without going into as much detail as I have above.  Still, I think the details that I mention add an important layer to the overall discussion.  First, I have pointed out how Google can be open about its end search results/data without having to open its algorithms (via allowing metasearch and via port-to-Bing personalized search).  The problem is that doing so would disintermediate Google, and push them down from the top of the stack.  Why? Because now users could build all sorts of applications on top of Google search results, instead of going through the ad-filled Google interface.  So openness is a problem when it comes to making money.  I also think there is value in pointing out that, were the algorithms themselves to be opened up, it&#8217;s not like search is the only industry that has to contend with miscreants.  OS developers have to deal with hackers, content creators (musicians, authors) have to deal with bootleggers. So if you ask the OS to go open-source, and the musician to go DRM-free, what&#8217;s so different about asking the search engine to go open-algorithm?</p>
<p>Update 2: Also check out Chris Dixon&#8217;s post (<a href="http://cdixon.org/2009/12/22/google-should-open-source-what-actually-matters-their-search-ranking-algorithm/">http://cdixon.org/2009/12/22/google-should-open-source-what-actually-matters-their-search-ranking-algorithm/</a>) and Danny Sullivan&#8217;s comments (<a href="http://cdixon.org/2009/12/22/google-should-open-source-what-actually-matters-their-search-ranking-algorithm/#comment-27024421">http://cdixon.org/2009/12/22/google-should-open-source-what-actually-matters-their-search-ranking-algorithm/#comment-27024421</a>), both of which are quite similar in spirit to where I am coming from on this matter.  Also interesting is Harvard Business School Prof. Tom Eisenmann&#8217;s take (<a href="http://platformsandnetworks.blogspot.com/2009/12/googles-svp-product-management-jonathan.html">http://platformsandnetworks.blogspot.com/2009/12/googles-svp-product-management-jonathan.html</a>).  If discussion is what Google wants, discussion is what Google gets <img src='http://irgupf.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://irgupf.com/2009/12/22/google-and-the-meaning-of-open/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>
