Search Engine Land has a short article on bias versus brands. The issue at hand is whether Google Instant has a brand bias. Google says it does not:
Singhal explains that when someone types in T, mathematically “most people typing T will go to Target. That’s the probability model. If you add R to it (“Tr”), most people are looking for a translation system. It’s actually just pure mathematical modeling.” It is just math, he says, not a bias.
Oh come on, now! What kind of explanation is that? There is no such thing as “just math”. There is always a conscious decision to use math in a particular way.
Let’s take as an example the classic information retrieval ranking function: tf * idf. Researchers have long known that it is important to rank documents by the “it’s just math” frequency of the query terms within a document. However, it is just as important, if not more so, to correct for that within-document term frequency using global frequency statistics, such as (inverse) document frequency. The reason is that if you have a query such as [the table] and you do not correct for the collection-wide ubiquity of the word [the], your document rankings will be dominated by the probability of [the] within a document. IDF values are used to correct this bias and bring the term [table] to much higher prominence. Experimental results almost always show that tf * idf beats ranking by tf alone. (Aside: Even language models have an idf-like, global probability smoothing factor to correct for tf alone.)
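To make the [the table] example concrete, here is a minimal sketch in Python. The three-document collection is invented for illustration, and idf is computed as log(N / df), which is only one of several common variants. All function names are my own, not from any particular library.

```python
import math
from collections import Counter

# Toy collection, invented purely for illustration.
docs = {
    "d1": "the the the the the chair",
    "d2": "the table is set",
    "d3": "the cat sat on the mat",
}

def tokenize(text):
    return text.lower().split()

# Per-document term counts (the raw tf values).
term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

def tf_score(query, doc_id):
    """Sum of raw within-document frequencies of the query terms."""
    counts = term_counts[doc_id]
    return sum(counts[t] for t in tokenize(query))

def idf(term):
    """log(N / df): one common idf variant, assumed here for simplicity."""
    df = sum(1 for counts in term_counts.values() if term in counts)
    return math.log(len(docs) / df) if df else 0.0

def tfidf_score(query, doc_id):
    """Within-document frequency, corrected by collection-wide rarity."""
    counts = term_counts[doc_id]
    return sum(counts[t] * idf(t) for t in tokenize(query))

def rank(score_fn, query):
    """Document ids sorted from highest to lowest score."""
    return sorted(docs, key=lambda d: score_fn(query, d), reverse=True)

print(rank(tf_score, "the table"))     # d1 leads, although it never mentions "table"
print(rank(tfidf_score, "the table"))  # d2 leads, the only document containing "table"
```

With tf alone, the document stuffed with [the] wins. Because [the] appears in every document of this toy collection, its idf is zero, and the corrected ranking is decided entirely by [table].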
Thus, to make a ranking algorithm truly useful, the mathematics have to be designed to account for the “it’s just math” probabilities. Without such correction, the algorithms are biased away from relevance. So to claim that brands dominate because “it’s just math” masks the deeper issue that existing bias within Google Instant isn’t being corrected and is propagated to the user.
At the risk of getting a little too geeky, I am reminded of the Asimov laws of robotics. The first and primary law is: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.” Note that there are two edges of the “no harm” sword: No harm through commission, and no harm through omission. Both are required.
I think that there is an analogy to brand bias within search algorithms. In the search engine domain, the robot is the algorithm. Just as it is not enough for a robot to avoid committing acts of harm against a human, it is not enough for the algorithm to throw up its hands and say, “It’s just math, and therefore I haven’t actively, explicitly committed any bias in my rankings.” No, the algorithm also has to be aware of the biases that arise through inaction, or omission.
Click-through probabilities are the tf, the component of the ranking algorithm that ensures true, unbiased, no-acts-of-commission probability. Where is the idf to balance out that math, to ensure that no biases of omission slip through? Without it there is still bias. “It’s just math” is not a proper defense.
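For the geeks, here is a rough sketch of what an idf-like correction could look like in this setting. To be clear, this is not how Google Instant works, and it is only one hypothetical way to do it. The click counts, the destination names ("megamart", "tea-shop" and so on) and the smoothed weight log(1 + N / pf) are all assumptions made up for illustration. Here pf is the number of prefixes under which a destination receives any clicks, playing the role that document frequency plays in idf.

```python
import math

# Invented click counts, clicks[prefix][destination]. Not real data.
clicks = {
    "t": {"megamart": 60, "tea-shop": 40},
    "s": {"megamart": 30, "shoe-store": 70},
    "c": {"megamart": 20, "cafe": 80},
}

def click_prob(prefix):
    """The 'just math' part: P(destination | prefix), the tf analogue."""
    total = sum(clicks[prefix].values())
    return {dest: n / total for dest, n in clicks[prefix].items()}

def prefix_frequency(dest):
    """Number of prefixes under which this destination gets any clicks."""
    return sum(1 for by_dest in clicks.values() if by_dest.get(dest, 0) > 0)

def inverse_prefix_weight(dest):
    """Hypothetical idf analogue, smoothed so ubiquitous destinations are
    down-weighted rather than zeroed out."""
    return math.log(1 + len(clicks) / prefix_frequency(dest))

def suggest_raw(prefix):
    """Suggestions ordered by click probability alone."""
    probs = click_prob(prefix)
    return sorted(probs, key=probs.get, reverse=True)

def suggest_corrected(prefix):
    """Suggestions ordered by click probability times the global correction."""
    probs = click_prob(prefix)
    return sorted(probs, key=lambda d: probs[d] * inverse_prefix_weight(d),
                  reverse=True)

print(suggest_raw("t"))        # the everywhere-clicked brand comes first
print(suggest_corrected("t"))  # the prefix-specific destination comes first
```

In this toy data the big brand picks up clicks under every prefix, much as [the] shows up in every document, so the correction pulls it down in favor of the destination that is specific to what was typed.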