How A/B Tests Could Go Wrong: Automatic Diagnosis of Invalid Online Experiments

Published: 30 January 2019

Abstract

Online experimentation has grown massively at Internet companies. Although conceptually simple, A/B tests can easily go wrong in the hands of inexperienced users and on an A/B testing platform with little governance. An invalid A/B test hurts the business by leading to suboptimal decisions. It is therefore more important than ever to create an intelligent A/B platform that democratizes A/B testing and allows everyone to make quality decisions through built-in detection and diagnosis of invalid tests. In this paper, we share how we mined historical A/B tests and identified the most common causes of invalid tests, ranging from biased design and self-selection bias to attempts to generalize A/B test results beyond the experiment population and time frame. We also developed scalable algorithms to automatically detect invalid A/B tests and diagnose the root cause of invalidity. Surfacing invalidity not only improved decision quality but also served as user education and reduced problematic experiment designs in the long run.
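The abstract does not spell out the detection algorithms, so the following is only an illustrative sketch and not the paper's method. It shows one widely used validity check for A/B tests, a sample ratio mismatch test: a chi-squared goodness-of-fit test comparing the observed bucket sizes against the designed traffic split. The function names, the two-arm restriction, and the 0.001 threshold are assumptions made for this example.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """P-value of a two-arm chi-squared goodness-of-fit test on bucket sizes.

    Hypothetical helper for illustration; not taken from the paper.
    """
    total = n_control + n_treatment
    if total <= 0:
        raise ValueError("at least one unit must be assigned")
    if not 0.0 < expected_treatment_share < 1.0:
        raise ValueError("expected_treatment_share must be strictly between 0 and 1")

    # Bucket sizes we would expect under the designed split.
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment

    # Pearson chi-squared statistic with 1 degree of freedom (two buckets).
    chi2 = ((n_treatment - expected_treatment) ** 2 / expected_treatment
            + (n_control - expected_control) ** 2 / expected_control)

    # For 1 degree of freedom, the chi-squared survival function is
    # erfc(sqrt(x / 2)), so no external statistics library is needed.
    return math.erfc(math.sqrt(chi2 / 2.0))


def has_sample_ratio_mismatch(n_control, n_treatment,
                              expected_treatment_share=0.5, alpha=0.001):
    """Flag the test when the observed split is implausible under the design.

    The alpha value is an assumed, deliberately strict threshold.
    """
    return srm_p_value(n_control, n_treatment, expected_treatment_share) < alpha


if __name__ == "__main__":
    # A 50,000 vs 48,000 split under a 50/50 design is flagged.
    print(has_sample_ratio_mismatch(50_000, 48_000))
    # A 5,000 vs 5,050 split is within normal sampling noise.
    print(has_sample_ratio_mismatch(5_000, 5_050))
```

A flagged mismatch only says that assignment or logging differs from the design; finding the root cause, which the abstract names as a separate goal, requires further analysis.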

Published In

WSDM '19: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining
January 2019
874 pages
ISBN: 9781450359405
DOI: 10.1145/3289600
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. a/b testing
  2. algorithms
  3. automatic decision making
  4. causal inference
  5. controlled experiment
  6. experimentation

Qualifiers

  • Research-article

Conference

WSDM '19

Acceptance Rates

WSDM '19 Paper Acceptance Rate 84 of 511 submissions, 16%;
Overall Acceptance Rate 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 55
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 24 Jan 2025

Cited By

  • (2024) Automating Pipelines of A/B Tests with Population Split Using Self-Adaptation and Machine Learning. Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 84-97. DOI: 10.1145/3643915.3644087. Online publication date: 15-Apr-2024
  • (2024) When Search Engine Services meet Large Language Models: Visions and Challenges. IEEE Transactions on Services Computing, 1-23. DOI: 10.1109/TSC.2024.3451185. Online publication date: 2024
  • (2024) A/B testing. Journal of Systems and Software 211:C. DOI: 10.1016/j.jss.2024.112011. Online publication date: 2-Jul-2024
  • (2023) The Price is Right: Removing A/B Test Bias in a Marketplace of Expirable Goods. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 4681-4687. DOI: 10.1145/3583780.3615502. Online publication date: 21-Oct-2023
  • (2023) Online Controlled Experiments and A/B Tests. Encyclopedia of Machine Learning and Data Science, 1-13. DOI: 10.1007/978-1-4899-7502-7_891-2. Online publication date: 8-Mar-2023
  • (2022) Ensure A/B Test Quality at Scale with Automated Randomization Validation and Sample Ratio Mismatch Detection. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 3391-3399. DOI: 10.1145/3511808.3557087. Online publication date: 17-Oct-2022
  • (2022) Novelty and Primacy: A Long-Term Estimator for Online Experiments. Technometrics 64:4, 524-534. DOI: 10.1080/00401706.2022.2124309. Online publication date: 8-Nov-2022
  • (2020) Piranha. Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice, 221-230. DOI: 10.1145/3377813.3381350. Online publication date: 27-Jun-2020
  • (2020) Batch Mode Active Learning for Individual Treatment Effect Estimation. 2020 International Conference on Data Mining Workshops (ICDMW), 859-866. DOI: 10.1109/ICDMW51313.2020.00123. Online publication date: Nov-2020
  • (2020) Trustworthy Online Controlled Experiments. DOI: 10.1017/9781108653985. Online publication date: 13-Mar-2020
