DOI: 10.1145/3485447.3512038
research-article

Using Survival Models to Estimate User Engagement in Online Experiments

Published: 25 April 2022

    Abstract

    Online controlled experiments, in which different variants of a product are compared based on an Overall Evaluation Criterion (OEC), have emerged as a gold standard for decision making in online services. It is vital that the OEC is aligned with the overall goal of stakeholders for effective decision making. However, this is a challenge when the overall goal is not immediately observable. For instance, we might want to understand the effect of deploying a feature on long-term retention, where the outcome (retention) is not observable at the end of an A/B test.
    In this work, we frame long-term user engagement outcomes as a time-to-event problem and demonstrate the use of survival models for estimating long-term effects. We then discuss the practical challenges of using time-to-event metrics for decision making in online experiments. We propose a simple churn-based time-to-inactivity metric and describe a framework for developing and validating modeled metrics that use survival models to predict long-term retention. We then present a case study and provide practical guidelines for developing and evaluating a time-to-churn metric on a large-scale, real-world dataset of online experiments. Finally, we compare the proposed approach to existing alternatives in terms of sensitivity and directionality.
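
    As a rough illustration of the time-to-event framing described in the abstract (not the authors' implementation), the sketch below computes a Kaplan-Meier survival curve for time to inactivity from right-censored observations. The function name `kaplan_meier`, the toy data, and the convention that users still active when the observation window closes are treated as censored are all assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Kaplan-Meier estimate of S(t), the probability of still being active after t.

    durations: days each user was observed before churning or being censored.
    churned:   True if the user was observed to churn at that time,
               False if censored (assumed: still active at window close).
    Returns a list of (t, S(t)) at each time where a churn was observed.
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # churned + censored users leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: six users, three of them censored.
durations = [3, 5, 5, 8, 10, 10]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(durations, churned)
for t, s in curve:
    print(f"day {t}: S(t) = {s:.3f}")
```

    Here S(t) is the estimated probability that a user is still active after day t. Comparing such curves between treatment and control is one simple way to look at retention when many users have not yet churned by the end of a test.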


    Cited By

    • Long-term Off-Policy Evaluation and Learning. 2024. Proceedings of the ACM on Web Conference 2024, 3432–3443. DOI: 10.1145/3589334.3645446. Online publication date: 13 May 2024.
    • Interpretable User Retention Modeling in Recommendation. 2023. Proceedings of the 17th ACM Conference on Recommender Systems, 702–708. DOI: 10.1145/3604915.3608818. Online publication date: 14 September 2023.
    • Quantifying and Leveraging User Fatigue for Interventions in Recommender Systems. 2023. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2293–2297. DOI: 10.1145/3539618.3592044. Online publication date: 19 July 2023.
    • Temporally-consistent survival analysis. 2022. Proceedings of the 36th International Conference on Neural Information Processing Systems, 10671–10683. DOI: 10.5555/3600270.3601045. Online publication date: 28 November 2022.

          Published In

          WWW '22: Proceedings of the ACM Web Conference 2022
          April 2022
          3764 pages
          ISBN:9781450390965
          DOI:10.1145/3485447
          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Experimentation
          2. Long-term metrics
          3. Surrogacy

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          WWW '22: The ACM Web Conference 2022
          April 25-29, 2022
          Virtual Event, Lyon, France

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
