DOI: 10.1145/2806416.2806496
Research article

Practical Aspects of Sensitivity in Online Experimentation with User Engagement Metrics

Published: 17 October 2015

Abstract

Online controlled experiments (e.g., A/B tests) are the state-of-the-art approach used by modern Internet companies to improve their services through data-driven decisions. The most challenging problem is to define an appropriate online metric of user behavior, the so-called Overall Evaluation Criterion (OEC), which is both interpretable and sensitive. A typical OEC consists of a key metric and an evaluation statistic, and the sensitivity of an OEC to the treatment effect of an A/B test is measured by a statistical significance test. We introduce the notion of an Overall Acceptance Criterion (OAC), which comprises both components of an OEC together with a statistical significance test. While existing studies of A/B tests concentrate mostly on the first component of an OAC, its key metric, we study the latter two in depth, comparing several statistics and several significance tests with respect to user engagement metrics over hundreds of A/B experiments run on real users of Yandex. We find that applying the state-of-the-art Student's t-test to several major user engagement metrics can underestimate the false-positive rate by an order of magnitude. We investigate both well-known and novel techniques to overcome this issue in practical settings. Finally, we propose the entropy and the quantiles as novel OECs that reflect the diversity and extreme cases of user engagement.
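The OAC described above combines three parts: a key metric, an evaluation statistic, and a statistical significance test. As a minimal illustrative sketch of how those parts fit together, the snippet below computes three candidate evaluation statistics (mean, 90th percentile, and entropy, the latter two being the OECs the abstract proposes) and two candidate significance tests on simulated per-user engagement data. All distributions, parameters, and names here are invented for illustration and are not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-user engagement metric (e.g., sessions per user over the
# experiment period), drawn heavy-tailed as engagement metrics typically are.
control = rng.lognormal(mean=1.0, sigma=1.0, size=5000)
treatment = rng.lognormal(mean=1.02, sigma=1.0, size=5000)

def shannon_entropy(x, bins=20):
    """Entropy of the empirical distribution of a metric: an evaluation
    statistic reflecting the diversity of user engagement."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

# Candidate evaluation statistics applied to the key metric.
for name, f in [("mean", np.mean),
                ("q90", lambda x: np.quantile(x, 0.9)),
                ("entropy", shannon_entropy)]:
    print(f"{name}: control={f(control):.3f} treatment={f(treatment):.3f}")

# Candidate significance tests: Welch's t-test vs. the Mann-Whitney U test.
t_p = stats.ttest_ind(treatment, control, equal_var=False).pvalue
u_p = stats.mannwhitneyu(treatment, control).pvalue
print(f"t-test p={t_p:.4f}, Mann-Whitney p={u_p:.4f}")
```

The paper's warning is precisely that the choice in the last step matters: for some engagement metrics the t-test's nominal significance level does not match its empirical false-positive rate, so the test itself must be validated (e.g., on A/A experiments) alongside the metric and statistic.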



Published In

CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
October 2015
1998 pages
ISBN:9781450337946
DOI:10.1145/2806416

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. a/b test
  2. evaluation statistic
  3. online controlled experiment
  4. overall acceptance criterion
  5. p-value
  6. quality metrics
  7. sensitivity
  8. significance level
  9. user engagement

Qualifiers

  • Research-article

Conference

CIKM'15
Acceptance Rates

CIKM '15 paper acceptance rate: 165 of 646 submissions (26%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 48
  • Downloads (last 6 weeks): 8

Reflects downloads up to 23 Dec 2024


Cited By

  • (2024) A/B testing. Journal of Systems and Software, 211(C). DOI: 10.1016/j.jss.2024.112011. Online: 2 Jul 2024.
  • (2023) Statistical Challenges in Online Controlled Experiments: A Review of A/B Testing Methodology. The American Statistician, 78(2), 135-149. DOI: 10.1080/00031305.2023.2257237. Online: 18 Oct 2023.
  • (2022) Using Survival Models to Estimate User Engagement in Online Experiments. Proceedings of the ACM Web Conference 2022, 3186-3195. DOI: 10.1145/3485447.3512038. Online: 25 Apr 2022.
  • (2020) Dealing with Ratio Metrics in A/B Testing at the Presence of Intra-user Correlation and Segments. Web Information Systems Engineering - WISE 2020, 563-577. DOI: 10.1007/978-3-030-62008-0_39. Online: 20 Oct 2020.
  • (2019) Effective Online Evaluation for Web Search. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1399-1400. DOI: 10.1145/3331184.3331378. Online: 18 Jul 2019.
  • (2018) Consistent Transformation of Ratio Metrics for Efficient Online Controlled Experiments. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 55-63. DOI: 10.1145/3159652.3159699. Online: 2 Feb 2018.
  • (2018) Das Experiment gestern und heute, oder: die normative Kraft des Faktischen [The experiment then and now, or: the normative power of the factual]. Qualität und Data Science in der Marktforschung, 217-241. DOI: 10.1007/978-3-658-19660-8_14. Online: 28 Apr 2018.
  • (2017) Beyond Success Rate. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 757-765. DOI: 10.1145/3132847.3132850. Online: 6 Nov 2017.
  • (2017) Characterizing and Predicting Supply-side Engagement on Video Sharing Platforms Using a Hawkes Process Model. Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 159-166. DOI: 10.1145/3121050.3121077. Online: 1 Oct 2017.
  • (2017) Using the Delay in a Treatment Effect to Improve Sensitivity and Preserve Directionality of Engagement Metrics in A/B Experiments. Proceedings of the 26th International Conference on World Wide Web, 1301-1310. DOI: 10.1145/3038912.3052664. Online: 3 Apr 2017.
