DOI: 10.1145/3534678.3539325
Research article
Open access

Non-stationary A/B Tests

Published: 14 August 2022

Abstract

A/B tests, also known as online controlled experiments, have been used at scale by data-driven enterprises to guide decisions and test innovative ideas. Meanwhile, nonstationarity, such as the time-of-day effect, commonly arises in various business metrics. We show that inadequately addressing nonstationarity can cause A/B tests to be statistically inefficient or invalid, leading to wrong conclusions. To address these issues, we develop a new framework that provides appropriate modeling and adequate statistical analysis for nonstationary A/B tests. Without changing the infrastructure of any existing A/B test procedure, we propose a new estimator that views time as a continuous covariate and performs post-stratification with a sample-dependent number of stratification levels. We prove a central limit theorem in a natural limiting regime under nonstationarity, so that valid large-sample statistical inference is available. We show that the proposed estimator achieves the optimal asymptotic variance among all estimators. When the design phase of an A/B test allows, we propose a new time-grouped randomization approach that better balances treatment and control assignments in the presence of time nonstationarity. A brief account of numerical experiments is given to illustrate the theoretical analysis.
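To make the estimator concrete, the following is a minimal sketch (not the paper's implementation) of post-stratification over a normalized time axis, where the number of equal-width time strata grows with the sample size. The function name, the n**(1/3) growth rule, and the equal-width binning are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def post_stratified_ate(t, y, z, num_strata=None):
    """Post-stratified treatment-effect estimate using time as the
    stratifying covariate (illustrative sketch, not the paper's code).

    t : timestamps normalized to [0, 1)
    y : observed metric values
    z : 0/1 treatment assignments
    """
    n = len(y)
    if num_strata is None:
        # Sample-dependent number of strata: grows with n.
        # The n**(1/3) rule is an illustrative choice, not the paper's.
        num_strata = max(1, int(round(n ** (1.0 / 3.0))))
    edges = np.linspace(0.0, 1.0, num_strata + 1)
    strata = np.clip(np.digitize(t, edges) - 1, 0, num_strata - 1)

    ate, var = 0.0, 0.0
    for k in range(num_strata):
        in_k = strata == k
        yt, yc = y[in_k & (z == 1)], y[in_k & (z == 0)]
        if len(yt) < 2 or len(yc) < 2:
            continue  # skip strata without enough data in both arms
        w = in_k.mean()  # stratum weight = share of all observations
        ate += w * (yt.mean() - yc.mean())
        var += w**2 * (yt.var(ddof=1) / len(yt) + yc.var(ddof=1) / len(yc))
    return ate, np.sqrt(var)  # point estimate and standard error
```

Because the stratification happens after the data are collected, such an estimator can be computed from logs produced by an existing A/B test pipeline, which is what "without changing the infrastructure" refers to.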

Supplemental Material

MP4 File
A/B tests have been used at scale by data-driven enterprises to test innovative ideas for improving core business metrics. Non-stationarities, such as the time-of-day effect and the day-of-week effect, often arise nonparametrically in key business metrics involving purchases, revenue, conversions, customer experiences, etc. We show that ignoring or inadequately addressing non-stationarities can cause standard A/B test estimators to have sub-optimal variance and non-vanishing bias, leading to a loss of statistical efficiency and accuracy. We provide new estimators, prove central limit theorems for them, and show that they achieve optimal variance and asymptotically vanishing bias. A new time-grouped randomization design is proposed, under which simple estimators can achieve asymptotically optimal variance.
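As a rough illustration of the time-grouped randomization design mentioned above, the sketch below balances treatment and control counts exactly within each consecutive time group, so that slow drifts in the metric's mean affect both arms equally. The function name and the equal-sized groups are assumptions made for illustration, not the paper's exact design.

```python
import numpy as np

def time_grouped_assignment(num_groups, group_size, seed=None):
    """Assign treatment (1) / control (0) so that each consecutive
    time group contains exactly half of each arm, in random order.
    Illustrative sketch; group_size should be even for exact balance.
    """
    rng = np.random.default_rng(seed)
    groups = []
    for _ in range(num_groups):
        block = np.repeat([1, 0], group_size // 2)
        rng.shuffle(block)  # randomize order within the time group
        groups.append(block)
    return np.concatenate(groups)

# Example: 24 hourly groups of 100 users each, balanced within each hour.
z = time_grouped_assignment(num_groups=24, group_size=100, seed=7)
```

Under such a design, even a simple difference-in-means is protected against confounding from a time trend, since every time group contributes equally to both arms.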



Published In

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2022
5033 pages
ISBN:9781450393850
DOI:10.1145/3534678
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. a/b test
  2. bias correction
  3. central limit theorem
  4. non-stationarity
  5. statistical inference
  6. variance reduction


Conference

KDD '22

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


