I have more questions than I have answers. One of the topics that I know very little about, and on which I often seek clarification and wisdom, is A/B testing in the context of rapid-iteration, rapid-deployment online systems. So I’d like to ask a question of my readership (all four of you 😉)
Suppose Feature B tests significantly better than A. You therefore roll out B. Furthermore, suppose later on that Feature C tests significantly better than B. You again roll out C. Now, suppose you then do an A/C test, and find that your original Feature A tests significantly better than C.
What do you do? Do you pick A again, even though that’s where you started? Roll back to B, because that beats A? Stick with C, because that beats B?
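In case this seems purely hypothetical: a cycle like this does not require any statistical fluke. If the quantity being measured is inherently pairwise (say, an interleaving experiment where each user effectively votes for whichever of the two variants shown serves them better), a heterogeneous user population can produce a Condorcet-style cycle. Here’s a minimal simulation sketch; the segment sizes and preference orderings below are invented purely for illustration:

```python
import random

# Hypothetical user segments (shares and orderings are made up for
# illustration). Each segment has its own strict preference ordering
# over the three features, best first.
SEGMENTS = [
    (0.34, ["B", "A", "C"]),
    (0.33, ["A", "C", "B"]),
    (0.33, ["C", "B", "A"]),
]

def sample_segment():
    """Draw a random user's preference ordering according to segment shares."""
    r = random.random()
    cumulative = 0.0
    for share, ordering in SEGMENTS:
        cumulative += share
        if r <= cumulative:
            return ordering
    return SEGMENTS[-1][1]

def pairwise_test(x, y, n_users=100_000):
    """Simulate a pairwise test (e.g., an interleaving experiment):
    each user 'votes' for whichever of x, y they prefer.
    Returns the fraction of users preferring x over y."""
    wins = 0
    for _ in range(n_users):
        ordering = sample_segment()
        if ordering.index(x) < ordering.index(y):
            wins += 1
    return wins / n_users

if __name__ == "__main__":
    random.seed(0)
    for x, y in [("B", "A"), ("C", "B"), ("A", "C")]:
        frac = pairwise_test(x, y)
        print(f"{x} preferred over {y} by {frac:.1%} of users")
```

With those (made-up) segments, each pairwise comparison comes out decisively around 66–67%: B beats A, C beats B, and A beats C. Every individual test is a clear, significant win, yet there is no single best feature for the population as a whole.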
I’ve worked on enough low-level retrieval algorithms to have seen things like this. Improvement from one change to the next is not always monotonic. When you run into a cycle like this, what do you do? Sure, it would be nice to come up with a Feature D that beats all of them. And in an offline system, with no real users depending on you minute by minute, you can take the research time to find D. But in a heat-of-the-moment online system, one in which rapid iteration is a highly valued process, which of A, B, or C do you roll out to your end users?