A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs.
“Yes!” I thought. “Yes! I am finally starting to see a growing acknowledgement from one of the Search Majors that when you have a goal-oriented topic, to get from Point A to Point B, there isn’t just a single, most effective, most efficient route. A user might actually want to choose (explicitly choose, via input tools) different pathways through all the potential waypoints.”
And, by analogy to search results, a user might actually want to choose different ways of scrolling through a set of results other than the single-route, linear-path ranked list, a topic that I’ve gone into in great detail in the past. The author continues:
The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.
“Ok,” I thought, “this is it! Here’s where we’re finally going to get the Google philosophy for providing different ways to traverse from Point A to Point B! They’ve held off on discussing this publicly for years. One almost wonders if they even have a philosophy or strategy for opening up new routes! But they must, right? How very exciting!”
And then it hit:
In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges’ states, and mutate the graph’s topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).
Sigh. That’s not what the article was about, after all.
If you’re an infrastructure geek, it’s a good read and I would recommend it. The “thinking like a vertex” spoilers that they give make me believe that they’re taking a very “shallow Markov blanket” sort of approach to all of their machine learning, which is an approach I don’t completely disagree with, having found it interesting enough to dabble in a little. Or maybe they’re doing some sort of loopy belief propagation on a lot of their information structures. Not for me to speculate too much, I suppose.
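For anyone who wants a feel for what “thinking like a vertex” looks like in practice, here is a minimal sketch in Python. It is a toy, single-process imitation of the iteration loop described in the quote (receive last round's messages, update your own state, send messages along outgoing edges), applied to the road-map example from the start of the article. It is not Pregel's actual API: the function name `pregel_shortest_paths`, the `roads` data, and the `km` and `minutes` edge attributes are all invented for illustration, and the sketch assumes non-negative edge weights.

```python
import math
from collections import defaultdict


def pregel_shortest_paths(edges, source, weight_key):
    """Toy vertex-centric shortest paths (hypothetical helper, not Pregel's API).

    edges: dict mapping vertex -> list of (neighbor, attrs) pairs
    weight_key: which edge attribute to minimise, e.g. "km" or "minutes"
    """
    # Collect every vertex, including ones that only appear as a destination.
    vertices = set(edges)
    for outgoing in edges.values():
        vertices.update(neighbor for neighbor, _ in outgoing)

    best = {v: math.inf for v in vertices}  # each vertex's own state
    inbox = {source: [0.0]}                 # messages delivered this iteration

    # One pass of this loop is one iteration; stop when no messages remain.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            candidate = min(messages)
            # A vertex only looks at its own state and its own messages.
            if candidate < best[vertex]:
                best[vertex] = candidate
                for neighbor, attrs in edges.get(vertex, []):
                    outbox[neighbor].append(candidate + attrs[weight_key])
        inbox = outbox  # delivered at the start of the next iteration
    return best


# Invented road data: two ways from A to D, each with two "signals" per edge.
roads = {
    "A": [("B", {"km": 10, "minutes": 30}), ("C", {"km": 25, "minutes": 20})],
    "B": [("D", {"km": 10, "minutes": 30})],
    "C": [("D", {"km": 25, "minutes": 15})],
    "D": [],
}

by_km = pregel_shortest_paths(roads, "A", "km")
by_minutes = pregel_shortest_paths(roads, "A", "minutes")
print(by_km["D"], by_minutes["D"])
```

With this made-up data, the cheapest way to reach D goes through B when you minimise kilometres and through C when you minimise minutes. Same graph, same machinery, different signal, different "best" route.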
But if you think it’s just as important (if not more so) to discuss the philosophical underpinnings of something as socially, politically, and culturally important as information retrieval, this is not the beginning of that discussion. I was hoping to get more insight into why, with the myriad possibilities opened up by the 200 signals available to the researchers, the focus is essentially still on finding only one route from Point A to Point B. That dialogue will have to wait for another day.