Seven pitfalls to avoid when running controlled experiments on the web

Authors:

Roger Longbotham

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 1105 - 1114

https://doi.org/10.1145/1557019.1557139

Published: 28 June 2009

Abstract

Controlled experiments, also called randomized experiments and A/B tests, have had a profound influence on multiple fields, including medicine, agriculture, manufacturing, and advertising. While the theoretical aspects of offline controlled experiments have been well studied and documented, the practical aspects of running them in online settings, such as web sites and services, are still being developed. As the usage of controlled experiments grows in these online settings, it is becoming more important to understand the opportunities and pitfalls one might face when using them in practice. A survey of online controlled experiments and lessons learned were previously documented in Controlled Experiments on the Web: Survey and Practical Guide (Kohavi, et al., 2009). In this follow-on paper, we focus on pitfalls we have seen after running numerous experiments at Microsoft. The pitfalls include a wide range of topics, such as assuming that common statistical formulas used to calculate standard deviation and statistical power can be applied and ignoring robots in analysis (a problem unique to online settings). Online experiments allow for techniques like gradual ramp-up of treatments to avoid the possibility of exposing many customers to a bad (e.g., buggy) Treatment. With that ability, we discovered that it's easy to incorrectly identify the winning Treatment because of Simpson's paradox.
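
The ramp-up pitfall mentioned at the end of the abstract can be made concrete with a small sketch. The counts below are hypothetical and chosen only for illustration: the Treatment is assumed to receive 1% of users on the first day and 50% on the second, and the names `data`, `rate`, and `pooled` are not from the paper.

```python
from fractions import Fraction

# Hypothetical (conversions, users) per day and variant; illustrative only.
data = {
    "day 1 (Treatment at 1%)": {
        "control": (20_000, 990_000),
        "treatment": (230, 10_000),
    },
    "day 2 (Treatment at 50%)": {
        "control": (5_000, 500_000),
        "treatment": (6_000, 500_000),
    },
}


def rate(conversions, users):
    """Exact conversion rate, so comparisons are free of rounding error."""
    return Fraction(conversions, users)


def pooled(variant):
    """Conversion rate after naively summing counts across all days."""
    conversions = sum(day[variant][0] for day in data.values())
    users = sum(day[variant][1] for day in data.values())
    return rate(conversions, users)


def winner(control_rate, treatment_rate):
    return "treatment" if treatment_rate > control_rate else "control"


# Compare the variants within each day, then on the pooled counts.
daily_winners = [
    winner(rate(*day["control"]), rate(*day["treatment"]))
    for day in data.values()
]
pooled_winner = winner(pooled("control"), pooled("treatment"))

for name, day in data.items():
    c, t = rate(*day["control"]), rate(*day["treatment"])
    print(f"{name}: control={float(c):.2%} treatment={float(t):.2%}")
print(f"pooled: control={float(pooled('control')):.2%} "
      f"treatment={float(pooled('treatment')):.2%}")
print("daily winners:", daily_winners, "| pooled winner:", pooled_winner)
```

With these numbers the Treatment has the higher rate on each day (2.30% vs. 2.02%, then 1.20% vs. 1.00%), yet the pooled rate favors the Control (1.68% vs. 1.22%). The reversal arises because the day with the higher baseline rate contributes almost all of the Control's users but very few of the Treatment's, so pooling weights the two days differently for each variant.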

References

[1] Bacher, Paul, et al. 2005. Know your Enemy: Tracking Botnets. The Honeynet Project. [Online] March 13, 2005. http://www.honeynet.org/papers/bots/.

[2] Bomhardt, Christian, Gaul, Wolfgang and Schmidt-Thieme, Lars. 2005. Web Robot Detection - Preprocessing Web Logfiles for Robot Detection. In: Maurizio Vichi, et al. (eds.), New Developments in Classification and Data Analysis. s.l.: Springer, 2005.

[3] Box, George E.P., Hunter, J Stuart and Hunter, William G. 2005. Statistics for Experimenters: Design, Innovation, and Discovery. 2nd edition. s.l.: John Wiley & Sons, Inc, 2005. 0471718130.

[4] Claypool, Mark, et al. 2001. Inferring user interest. IEEE Internet Computing. 2001, Vol. 5, pp. 32–39. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.5967.

[5] Efron, Bradley and Tibshirani, Robert J. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall, 1993. 0-412-04231-2.

[6] Fieller, E C. 1940. The Biological Standardization of Insulin. Supplement to the Journal of the Royal Statistical Society. 1940, Vol. 7, 1, pp. 1–64.

[7] Fox, Steve, et al. 2005. Evaluating implicit measures to improve web search. ACM Transactions on Information Systems (TOIS). 2005, Vol. 23, 2, pp. 147–168. http://portal.acm.org/citation.cfm?id=1059981.1059982.

[8] Hill, Nigel, Roche, Greg and Allen, Rachel. 2007. Customer Satisfaction: The Customer Experience Through the Customer's Eyes. s.l.: Cogent Publishing, 2007.

[9] Hopkins, Claude. 1923. Scientific Advertising. New York City: Crown Publishers Inc., 1923.

[10] Keppel, Geoffrey, Saufley, William H and Tokunaga, Howard. 1992. Introduction to Design and Analysis. 2nd edition. s.l.: W.H. Freeman and Company, 1992.

[11] Kohavi, Ron, et al. 2009. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery. February 2009, Vol. 18, 1, pp. 140–181. http://exp-platform.com/hippo_long.aspx.

[12] Kohavi, Ron, et al. 2004. Lessons and Challenges from Mining Retail E-Commerce Data. Machine Learning. 2004, Vol. 57, 1-2, pp. 83–113. http://ai.stanford.edu/~ronnyk/lessonsInDM.pdf.

[13] Kohavi, Ron, Henne, Randal M and Sommerfield, Dan. 2007. Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO. The Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007). August 2007, pp. 959–967. http://exp-platform.com/hippo.aspx.

[14] Koselka, Rita. 1996. The New Mantra: MVT. Forbes. March 11, 1996, pp. 114–118.

[15] Malinas, Gary and Bigelow, John. 2004. Simpson's Paradox. Stanford Encyclopedia of Philosophy. [Online] 2004. [Cited: February 28, 2008.] http://plato.stanford.edu/entries/paradox-simpson/.

[16] Mason, Robert L, Gunst, Richard F and Hess, James L. 1989. Statistical Design and Analysis of Experiments With Applications to Engineering and Science. s.l.: John Wiley & Sons, 1989. 047185364X.

[17] Montgomery, Douglas C. 2005. Design and Analysis of Experiments. 6th edition. s.l.: John Wiley & Sons, Inc, 2005. 0-471-66159-7.

[18] Rao, C. Radhakrishna. 1973. Linear Statistical Inference and Its Applications. 2nd edition. s.l.: John Wiley & Sons, Inc., 1973.

[19] Roy, Ranjit K. 2001. Design of Experiments using the Taguchi Approach: 16 Steps to Product and Process Improvement. s.l.: John Wiley & Sons, Inc, 2001. 0-471-36101-1.

[20] Simpson, Edward H. 1951. The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Ser. B. 1951, Vol. 13, pp. 238–241.

[21] Spear, Steven J. 2004. Learning to Lead at Toyota. Harvard Business Review. May 2004, pp. 78–86.

[22] Tan, Pang-Ning and Kumar, Vipin. 2002. Discovery of Web Robot Sessions based on their Navigational Patterns. Data Mining and Knowledge Discovery. 2002, Vol. 6, 1, pp. 9–35. http://citeseer.ist.psu.edu/article/tan02discovery.html.

[23] Wikipedia: Botnet. 2008. Botnet. Wikipedia. [Online] 2008. [Cited: February 28, 2008.] http://en.wikipedia.org/wiki/Botnet.

[24] Wikipedia: Internet bot. 2008. Internet Bot. Wikipedia. [Online] 2008. [Cited: February 28, 2008.] http://en.wikipedia.org/wiki/Internet_bot.

[25] Wikipedia: Simpson's Paradox. 2008. Simpson's paradox. Wikipedia. [Online] 2008. [Cited: February 28, 2008.] http://en.wikipedia.org/wiki/Simpson%27s_paradox.

Index Terms

Seven pitfalls to avoid when running controlled experiments on the web
1. Computing methodologies
  1. Machine learning
    1. Learning settings
    2. Machine learning approaches
      1. Logical and relational learning
        Inductive logic learning
2. Mathematics of computing
  1. Probability and statistics
    1. Statistical paradigms
      1. Exploratory data analysis

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

June 2009

1426 pages

ISBN: 9781605584959

DOI: 10.1145/1557019

General Chairs:
John Elder (Elder Research, Inc., USA)
Françoise Soulié Fogelman (KXEN, France)

Program Chairs:
Peter Flach (University of Bristol, UK)
Mohammed Zaki (RPI, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

June 28 - July 1, 2009

Paris, France

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics

Article Metrics

Total Citations: 100
Total Downloads: 1,139
Downloads (Last 12 months): 43
Downloads (Last 6 weeks): 6

Reflects downloads up to 10 Aug 2024

Cited By

Mahajan P. (2024). Cost-Effective A/B Testing: Leveraging Go and Python for Efficient Experimentation in Hermes Testing Platform. 2024 10th International Conference on Communication and Signal Processing (ICCSP), pp. 1048-1050. Online publication date: 12-Apr-2024.
https://doi.org/10.1109/ICCSP60870.2024.10543437

Larsen N., Stallrich J., Sengupta S., Deng A., Kohavi R., Stevens N. (2023). Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology. The American Statistician, 78:2, pp. 135-149. Online publication date: 18-Oct-2023.
https://doi.org/10.1080/00031305.2023.2257237

Erthal V., de Souza B., dos Santos P., Travassos G. (2023). Characterization of continuous experimentation in software engineering: Expressions, models, and strategies. Science of Computer Programming, 229 (102961). Online publication date: Jul-2023.
https://doi.org/10.1016/j.scico.2023.102961

Kohavi R., Longbotham R. (2023). Online Controlled Experiments and A/B Tests. Encyclopedia of Machine Learning and Data Science, pp. 1-13. Online publication date: 8-Mar-2023.
https://doi.org/10.1007/978-1-4899-7502-7_891-2

Issa Mattos D., Dakkak A., Bosch J., Olsson H. (2023). The HURRIER process for experimentation in business‐to‐business mission‐critical systems. Journal of Software: Evolution and Process, 35:5. Online publication date: 25-Apr-2023.
https://dl.acm.org/doi/10.1002/smr.2390

Zangerle E., Bauer C. (2022). Evaluating Recommender Systems: Survey and Framework. ACM Computing Surveys, 55:8, pp. 1-38. Online publication date: 23-Dec-2022.
https://dl.acm.org/doi/10.1145/3556536

Kohavi R., Deng A., Vermeer L., Zhang A., Rangwala H. (2022). A/B Testing Intuition Busters. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3168-3177. Online publication date: 14-Aug-2022.
https://dl.acm.org/doi/10.1145/3534678.3539160

He Y., Yu L., Chen M., Choi W., Matheson D. (2022). A Cluster-Based Nearest Neighbor Matching Algorithm for Enhanced A/A Validation in Online Experimentation. Companion Proceedings of the Web Conference 2022, pp. 136-140. Online publication date: 25-Apr-2022.
https://dl.acm.org/doi/10.1145/3487553.3524220

Staron M. (2022). Introduction to the Metrics Theme. Accelerating Digital Transformation, pp. 155-161. Online publication date: 20-Oct-2022.
https://doi.org/10.1007/978-3-031-10873-0_9

Jongeling R., Ciccozzi F., Cicchetti A., Carlson J. (2022). Chapter 6 Lightweight Consistency Checking for Agile Model-Based Development in Practice. Accelerating Digital Transformation, pp. 131-151. Online publication date: 20-Oct-2022.
https://doi.org/10.1007/978-3-031-10873-0_8
