Search Engine Land has a short article on bias versus brands. The issue at hand is whether Google Instant has a brand bias. Google says it does not:
Singhal explains that when someone types in T, mathematically “most people typing T will go to Target. That’s the probability model. If you add R to it (“Tr”), most people are looking for a translation system. It’s actually just pure mathematical modeling.” It is just math, he says, not a bias.
Oh come on, now! What kind of explanation is that? There is no such thing as “just math”. There is always a conscious decision to use math in a particular way.
Let’s take as an example the classic information retrieval ranking function: tf * idf. Researchers have long known that it is important to rank documents by their “it’s just math” query term frequency within a document. However, it is just as important, if not more so, to correct that internal term frequency by using global frequency statistics, such as (inverse) document frequency. The reason is that if you have a query such as [the table] and you do not correct for the collection-wide ubiquity of the word [the], your document rankings will be dominated by the probabilities of [the] within a document. IDF values correct this bias and bring the term [table] to much higher prominence. Experimental results almost always show that tf * idf beats ranking by tf alone. (Aside: Even language models have an idf-like, global probability smoothing factor to correct for tf alone.)
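To make the contrast concrete, here is a minimal sketch (toy documents and invented counts of my own, not any production ranker) of scoring the query [the table] by tf alone versus tf * idf:

```python
# A minimal sketch (toy documents, not a production ranker) contrasting
# ranking by raw term frequency (tf) with ranking by tf * idf for the
# query [the table].
import math

docs = {
    "d1": "the cat sat on the mat near the table",
    "d2": "a wooden table and the table legs",
    "d3": "the the the the the quick brown fox",
}
query = ["the", "table"]

tokenized = {d: text.split() for d, text in docs.items()}
N = len(tokenized)

def idf(term):
    df = sum(1 for tokens in tokenized.values() if term in tokens)
    return math.log(N / df) if df else 0.0

def score(doc_tokens, use_idf):
    # tf alone when use_idf is False; tf * idf when it is True
    return sum(doc_tokens.count(t) * (idf(t) if use_idf else 1.0) for t in query)

for label, use_idf in [("tf only ", False), ("tf * idf", True)]:
    ranking = sorted(tokenized, key=lambda d: score(tokenized[d], use_idf), reverse=True)
    print(label, ranking)
# tf only  puts d3 first: it is stuffed with "the" but never mentions "table".
# tf * idf zeroes out the ubiquitous "the" and promotes d2 and d1 instead.
```

The document stuffed with [the] wins under tf alone and drops to the bottom once idf is applied, which is exactly the correction being described.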
Thus, to make a ranking algorithm truly useful, the mathematics has to be designed to correct for the raw “it’s just math” probabilities. Without such correction, the algorithms are biased away from relevance. So claiming that brands dominate because “it’s just math” masks the deeper issue: the existing bias within Google Instant isn’t being corrected, and is propagated to the user.
At the risk of getting a little too geeky, I am reminded of Asimov’s laws of robotics. The first and primary law is: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” Note that the “no harm” sword has two edges: no harm through commission, and no harm through omission. Both are required.
I think there is an analogy to brand bias within search algorithms. In the search engine domain, the robot is the algorithm. Just as it is not enough for a robot to avoid committing acts of harm against a human, it is not enough for the algorithm to throw up its hands and say “it’s just math, so I haven’t actively, explicitly committed any bias in my rankings.” No, the algorithm also has to be aware of the biases that arise through inaction, or omission.
Click-through probabilities are the tf here: the component of the ranking algorithm that commits no explicit act of bias. Where is the idf to balance out that math, to ensure that no acts of bias by omission slip through? Without it there is still bias. “It’s just math” is not a proper defense.
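To make the analogy concrete, here is a sketch of what an idf-like correction for query completion might look like. The prefixes, counts, and damping formula are all hypothetical, and this is certainly not Google’s algorithm; the local click probability plays the role of tf, and a penalty on completions that are ubiquitous across all prefixes plays the role of idf.

```python
# A hypothetical sketch of an idf-like correction for query completion.
# The prefixes, counts, and damping formula are invented; this is not
# Google's algorithm, just the tf/idf analogy made runnable.
import math

# clicks[prefix][completion] = how often users who typed `prefix`
# ended up going to `completion` (made-up numbers).
clicks = {
    "t":  {"target": 900, "translate": 400, "twitter": 700},
    "tr": {"translate": 800, "target": 100, "travelocity": 300},
}

# Global popularity of each completion across all prefixes,
# playing the role of document frequency.
global_count = {}
for prefix_counts in clicks.values():
    for completion, c in prefix_counts.items():
        global_count[completion] = global_count.get(completion, 0) + c
total = sum(global_count.values())

def suggest(prefix):
    local = clicks[prefix]
    local_total = sum(local.values())
    scores = {}
    for completion, c in local.items():
        p_local = c / local_total                 # the "tf": local click probability
        p_global = global_count[completion] / total
        scores[completion] = p_local * math.log(1.0 / p_global)  # the "idf": damp ubiquity
    return sorted(scores, key=scores.get, reverse=True)

print(suggest("t"))   # raw counts alone would put "target" first here
print(suggest("tr"))  # the global damping term pushes back on sheer ubiquity
```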
Agree? Disagree?
Surely. I agree that “just math” is a lame argument. Note, however, that in a TF * IDF formula, IDF corrects a local occurrence of the word using the global score. What is global and what is local in a query suggestion algorithm? Doesn’t it just select the most probable query completion? That is, if most people type Target after they have typed T, Google is just optimizing the average typing effort (averaged among users).
I don’t have access to all the statistics that Google does. But I would say that the IDF analogue in query suggestion would correct against average behavior. I’m not an average user. Don’t use probabilities estimated from average behavior. Use something else, a different signal.
Or, it could have a prior probability that weights against brands, so as not to encourage the feedback loop of one brand getting slightly more clicks, leading to a more prominent auto-complete rank, leading to more clicks, leading to a higher auto-complete rank, and so on.
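As a toy illustration of that loop (invented numbers and two made-up brands, not real click data), here is what happens when completions are ranked purely by accumulated clicks and the top suggestion gets a prominence boost:

```python
# A toy simulation of the feedback loop: rank suggestions purely by
# accumulated clicks, give the top suggestion a prominence boost, and
# a small initial edge snowballs. Numbers are invented.
import random

random.seed(1)
clicks = {"brand_a": 105, "brand_b": 100}     # brand_a starts slightly ahead
true_pref = {"brand_a": 0.5, "brand_b": 0.5}  # users actually like them equally

for _ in range(10000):
    top = max(clicks, key=clicks.get)          # most-clicked brand is suggested first
    other = "brand_b" if top == "brand_a" else "brand_a"
    # Being suggested first roughly triples the chance of being clicked,
    # regardless of true preference (a crude prominence effect).
    p_top = 3 * true_pref[top] / (3 * true_pref[top] + true_pref[other])
    clicked = top if random.random() < p_top else other
    clicks[clicked] += 1

print(clicks)  # whoever led early now holds roughly 75% of all clicks
```

Users like the two brands equally, yet whichever one happens to lead early ends up with roughly three quarters of all clicks; a prior or damping term is one way to keep that runaway from happening.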
I’d say it’s “just a probability model”. Let’s assume Google’s engineers can count (not always easy on the web with all the duplication and robo-spam), and that the most frequent query starting with “T” is “Target” and with “Tr” is “translate”.
Technically, doing anything other than simple counting adds bias, though asymptotically things might still converge to the correct parameter estimates (MAP estimates with Bayesian priors are one example, MLE estimates of normal variances another; both are biased for finite counts but asymptotically correct). So I doubt that what Google’s doing is biased in the statistical sense.
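To make “biased for finite counts but asymptotically correct” concrete, here is a small numerical illustration (my own, not part of the original comment) using the second example: the MLE of a normal variance divides by n rather than n - 1, so its expectation is ((n - 1) / n) * sigma^2, too small for small samples but converging as n grows.

```python
# A small numerical illustration of "biased for finite counts but
# asymptotically correct": the MLE of a normal variance divides by n,
# so its expectation is ((n - 1) / n) * sigma^2.
import random

random.seed(0)
sigma2 = 4.0  # true variance

def mle_variance(sample):
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)  # divide by n, not n - 1

for n in (2, 5, 20, 200):
    trials = 20000
    avg = sum(
        mle_variance([random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)])
        for _ in range(trials)
    ) / trials
    print(f"n={n:4d}  mean MLE estimate ≈ {avg:.2f}  theory: {(n - 1) / n * sigma2:.2f}")
# The estimator undershoots sigma^2 = 4.0 for small n and converges as n grows.
```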
So the question is: what should they be doing? Should they be trying to bias the results away from the ones that are most popular? Of course — they want to suggest searches that work, not searches that are popular. Then things get a lot trickier, because you have to measure what works to some approximation. And what works can mean “what makes the user’s search life happier” or “what makes us most money through ad click through” or some balance of those factors and others.
OK, perhaps not statistical bias. But there is a certain lack of i.i.d., wouldn’t you say? The suggestions beget actions beget suggestions. There is loopy feedback in the whole process, something that an MLE doesn’t account for. And because of that loopiness, there needs to be a correction, an attempt at delooping. If your mathematical model doesn’t take that into account, then you’ve got a problem. It appears that they don’t take it into account. Quoting again: “most people typing T will go to Target. That’s the probability model.” That’s the “more is more” approach to model estimation. So where’s the delooping?
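One concrete form such a delooping correction could take (an illustrative sketch, not anything Google has described) is inverse propensity weighting: rather than counting raw clicks, weight each click by the inverse of the probability that the completion was being shown at the time, so clicks earned purely through prominence do not inflate the estimate of what users actually want. The log format and numbers below are hypothetical:

```python
# A minimal sketch of one possible "delooping" correction: inverse
# propensity weighting of logged clicks. The log format and numbers are
# hypothetical; this is a standard way to correct for the fact that
# heavily-suggested completions get clicked more simply because they
# are shown more.

# Each logged event: (completion that was clicked,
#                     probability it was being shown at the time).
click_log = [
    ("target", 0.90), ("target", 0.90), ("target", 0.90), ("target", 0.90),
    ("translate", 0.10), ("translate", 0.10),
]

raw, delooped = {}, {}
for completion, p_shown in click_log:
    raw[completion] = raw.get(completion, 0) + 1
    # Weight each click by 1 / P(shown): clicks that happened despite low
    # exposure count for more than clicks handed out by prominence.
    delooped[completion] = delooped.get(completion, 0.0) + 1.0 / p_shown

print("raw counts:          ", raw)       # target dominates: 4 vs 2
print("propensity-weighted: ", delooped)  # target 4/0.9 ≈ 4.4, translate 2/0.1 = 20
```

The brand that was almost always on screen still wins the raw count, but once each click is discounted by how hard it was to avoid seeing that suggestion, the picture flips.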
I agree; the question now is to figure out what they should be doing. But if they don’t think there is a problem to begin with, if “it’s just math,” then they’ll never consider any suggestions we might come up with, because they’ll never acknowledge that there is anything to fix.