The internet connectivity of client software (e.g., apps running on phones and PCs), web sites, and online services provides an unprecedented opportunity to evaluate ideas quickly using controlled experiments, also called A/B tests, split tests, randomized experiments, control/treatment tests, and online field experiments. Unlike most data mining techniques for finding correlational patterns, controlled experiments make it possible to establish a causal relationship with high probability. Experimenters can use the scientific method to form a hypothesis of the form “If a specific change is introduced, will it improve key metrics?” and evaluate it with real users. The theory of controlled experiments dates back to Sir Ronald A. Fisher’s experiments at the Rothamsted Agricultural Experimental Station in England in the 1920s, and the topic of offline experiments is well developed in statistics (Box 2005). Online controlled experiments started to be used in the late 1990s with the growth of the Internet. Today, many large sites, including Amazon, Bing, Facebook, Google, LinkedIn, and Yahoo!, run thousands to tens of thousands of experiments each year, testing user interface (UI) changes, enhancements to algorithms (search, ads, personalization, recommendation, etc.), changes to apps, content management systems, etc. Online controlled experiments are now considered an indispensable tool, and their use is growing among startups and smaller websites. Controlled experiments are especially useful in combination with Agile software development (Martin 2008, Rubin 2012), Steve Blank’s Customer Development process (Blank 2005), and MVPs (Minimum Viable Products) popularized by Eric Ries’s Lean Startup (Ries 2011).
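As a minimal sketch of evaluating such a hypothesis, assume the key metric is a binary per-user conversion and that users are independently randomized into control and treatment. A standard two-proportion z-test (a common choice, not a method named in the abstract) can then compare the two groups. The function name and the numbers in the usage line are hypothetical.

```python
import math


def two_proportion_z_test(conv_c: int, n_c: int, conv_t: int, n_t: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: control and treatment conversion rates are equal.

    Assumes independent users and samples large enough for the normal approximation.
    """
    p_c = conv_c / n_c
    p_t = conv_t / n_t
    # Pooled conversion rate under the null hypothesis of no treatment effect
    p_pool = (conv_c + conv_t) / (n_c + n_t)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
    if se == 0:
        # Every user converted, or none did: the groups cannot differ
        return 0.0, 1.0
    z = (p_t - p_c) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical data: 5.0% vs 5.5% conversion with 20,000 users per variant
    z, p = two_proportion_z_test(conv_c=1000, n_c=20000, conv_t=1100, n_t=20000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

With these hypothetical counts, z comes out at roughly 2.24 and the two-sided p-value at roughly 0.025, so the null hypothesis of equal rates would be rejected at the conventional 0.05 level.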
Web site owners, from small web sites to the largest properties that include Amazon, Facebook, Google, LinkedIn, Microsoft, and Yahoo, attempt to improve their web sites, optimizing for criteria ranging from repeat usage and time on site to revenue. Having been involved in running thousands of controlled experiments at Amazon, Booking.com, LinkedIn, and multiple Microsoft properties, we share seven rules of thumb for experimenters, which we have generalized from these experiments and their results. These are principles that we believe have broad applicability in web optimization and analytics outside of controlled experiments, yet they are not provably correct, and in some cases exceptions are known. To support these rules of thumb, we share multiple real examples, most of them shared in a public paper for the first time. Some rules of thumb have previously been stated, such as “speed matters,” but we describe the assumptions in the experimental design and share additional experiments ...
Organizations have used rigorous methodologies to identify the improvements necessary for remaining viable and competitive in today’s turbulent business environment. However, they have rarely used the same level of rigor in the implementation of the identified improvements. One organization benefited by using design of experiments to determine the best approach for implementing a difficult organization-wide change. Change is a reality of life. This is particularly true in the world of business, where globalization, advances in technology, and increased competition at home and abroad have created a hostile and turbulent environment for most organizations. Change has become a mantra: change (improve) or die. Motivations for change have been customer satisfaction, cost reduction, improved efficiency, improved quality, or, in extreme cases, survival. In the past two decades, change/improvement initiatives have been driven by a plethora of approaches: ISO 9000, business process engineerin...
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings, such as web sites and services, are still being developed. As the usage of controlled experiments grows in these online settings, it is becoming more important to understand the opportunities and pitfalls one might face when using them in practice. A survey of online controlled experiments and lessons learned were previously documented in Controlled Experiments on the Web: Survey and Practical Guide (Kohavi, et al., 2009). In this follow-on paper, we focus on pitfalls we have seen after running numerous experiments at Microsoft. The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate...
Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. Offline controlled experiments have been well studied and documented since Sir Ronald A. Fisher led the development of statistical experimental design while working at the Rothamsted Agricultural Experimental Station in England in the 1920s. With the growth of the World Wide Web and web services, online controlled experiments are being used frequently, utilizing software capabilities like ramp-up (exposure control) and running experiments on large server farms with millions of users. We share several real examples of unexpected results and lessons learned.
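The abstract mentions ramp-up (exposure control) without specifying a mechanism. One common approach, sketched here under that assumption, hashes a user ID together with a per-experiment salt into a fixed number of buckets and widens the bucket ranges as the ramp percentage grows, so users who are already exposed stay in the same variant. The function names, the SHA-256 hash, and the 1,000-bucket granularity are illustrative choices, not details from the paper.

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical granularity: each bucket is 0.1% of traffic


def bucket(user_id: str, experiment: str) -> int:
    """Deterministically map a user to a bucket in [0, NUM_BUCKETS) for one experiment."""
    # Salting with the experiment name keeps assignments independent across experiments
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def assign(user_id: str, experiment: str, exposure_pct: float) -> str:
    """Return 'treatment', 'control', or 'not_in_experiment' at the current ramp level.

    exposure_pct is the share of traffic (0 to 50) given to EACH variant. Treatment
    grows upward from bucket 0 and control grows downward from the last bucket, so
    raising exposure_pct only adds users; nobody switches variants during a ramp.
    """
    if not 0 <= exposure_pct <= 50:
        raise ValueError("exposure_pct must be between 0 and 50")
    slots = int(exposure_pct * NUM_BUCKETS / 100)
    b = bucket(user_id, experiment)
    if b < slots:
        return "treatment"
    if b >= NUM_BUCKETS - slots:
        return "control"
    return "not_in_experiment"
```

A ramp might then move `exposure_pct` from 1 to 5 to 50 while monitoring metrics at each step. Because the treatment range `[0, slots)` and the control range `[NUM_BUCKETS - slots, NUM_BUCKETS)` only grow, a user's variant never changes mid-experiment, which avoids contaminating the comparison.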
Controlled experiments have been widely used to support data-driven decision making for online businesses. By applying appropriate randomization of the experiment units, causal inference can be established. The choice of the experiment unit for randomization can vary; user and page view are the two most commonly used units. Moreover, the analysis unit is sometimes different from the experiment unit. Each choice of experiment unit has pros and cons, and the choice affects the downstream statistical analysis. Generally, for page-level metrics, randomization by page view has an edge in power due to variance reduction. In this paper, we compare the two experiment units and provide a method to correctly analyze a page-view randomization experiment in a two-layer randomization framework.
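When users are randomized but a page-level metric such as click-through rate (clicks per page view) is analyzed, the analysis unit is finer than the experiment unit, and page views from the same user are correlated. Treating page views as independent then understates the variance. The sketch below contrasts that naive binomial variance with the delta-method variance of a ratio of per-user means, a standard remedy for this mismatch. It illustrates the problem the abstract describes and is not the two-layer method the paper proposes; the function names are hypothetical.

```python
def naive_page_level_variance(clicks: list[float], views: list[float]) -> float:
    """Binomial variance of CTR that treats every page view as an independent trial."""
    total_views = sum(views)
    ctr = sum(clicks) / total_views
    return ctr * (1 - ctr) / total_views


def delta_method_ctr_variance(clicks: list[float], views: list[float]) -> float:
    """Delta-method variance of CTR = mean(clicks) / mean(views), with users as the independent unit.

    clicks[i] and views[i] are the totals for user i; at least two users are required.
    """
    n = len(clicks)
    mx = sum(clicks) / n
    my = sum(views) / n
    # Sample variances and covariance of the per-user totals
    var_x = sum((x - mx) ** 2 for x in clicks) / (n - 1)
    var_y = sum((y - my) ** 2 for y in views) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(clicks, views)) / (n - 1)
    # First-order Taylor expansion of the ratio mx / my around the true means
    return (var_x / my**2 - 2 * mx * cov_xy / my**3 + mx**2 * var_y / my**4) / n


if __name__ == "__main__":
    # Hypothetical extreme case: 100 users with 10 page views each;
    # half click on every page they see, half never click
    clicks = [10.0] * 50 + [0.0] * 50
    views = [10.0] * 100
    print(naive_page_level_variance(clicks, views), delta_method_ctr_variance(clicks, views))
```

In this hypothetical case both estimators see a CTR of 0.5, but the delta-method variance (0.25 / 99, about 0.0025) is roughly ten times the naive one (0.25 / 1000), because the 1,000 page views carry only about as much information as the 100 users who generated them. When every user has exactly one page view, the two estimates agree up to the n / (n - 1) sample-variance correction.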