DOI: 10.1145/3485447.3512103
research-article

LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm

Published: 25 April 2022

    Abstract

    Offering incentives (e.g., coupons at Amazon, discounts at Uber, and video bonuses at TikTok) to users is a common strategy online platforms use to increase user engagement and platform revenue. Despite their proven effectiveness, these marketing incentives incur an unavoidable cost and may yield a low ROI (return on investment) if not used properly. Moreover, users respond differently to the same incentive: some users never buy certain products without a coupon, while others buy them anyway. Selecting the right amount of incentive (i.e., the treatment) for each user under a budget constraint is therefore an important research problem with significant practical implications. In this paper, we call this the budget-constrained treatment selection (BTS) problem.
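Stated as an optimization problem, BTS can be written as a multiple-choice knapsack problem. The notation below is our own assumption, not necessarily the paper's: n users; treatments j = 0, ..., m, with j = 0 the no-incentive control; tau_ij the estimated incremental outcome (uplift) of giving treatment j to user i relative to control; c_ij its cost; and B the total budget. Relaxing the budget constraint with a multiplier lambda >= 0 (Lagrangian relaxation) gives the dual function g(lambda), which upper-bounds the optimal value for every lambda >= 0 and decouples across users: for fixed lambda, each user simply takes the treatment maximizing tau_ij - lambda c_ij.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{n}\sum_{j=0}^{m} \tau_{ij}\,x_{ij}\\
\text{s.t.}\quad & \sum_{i=1}^{n}\sum_{j=0}^{m} c_{ij}\,x_{ij} \le B,\\
& \sum_{j=0}^{m} x_{ij} = 1, \qquad i = 1,\dots,n,\\
& x_{ij} \in \{0,1\}.
\end{aligned}

g(\lambda) = \lambda B + \sum_{i=1}^{n} \max_{0 \le j \le m}\bigl(\tau_{ij} - \lambda\,c_{ij}\bigr), \qquad \lambda \ge 0.
```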
    The challenge is to solve the BTS problem efficiently on large-scale datasets while improving on existing techniques. We propose a novel tree-based treatment selection technique under budget constraints, the Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm, which is also an efficient treatment selection algorithm well suited to modern distributed computing systems. We also propose a novel offline evaluation method that overcomes an intrinsic challenge in assessing the performance of BTS solutions on randomized controlled trial (RCT) data. We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses to increase users' campaign engagement duration. Simulation analysis, offline experiments, and online experiments all show that our method outperforms various state-of-the-art tree-based baselines. The proposed approach currently serves hundreds of millions of users on the platform and has achieved one of the largest improvements over these months.
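One common way to turn a Lagrangian relaxation of the budget constraint into a concrete selection rule is to bisect on the multiplier lambda until the per-user choices fit the budget. The sketch below is a generic multiple-choice-knapsack heuristic, not the LBCF algorithm itself. It assumes per-user uplift and cost estimates are already available (for example, from a causal forest) and that column 0 of every row is a zero-cost control arm. The function names select_treatments, total_cost, and solve_bts are hypothetical. Because each user's choice depends only on lambda and that user's own row, the per-user step is the part that would parallelize naturally on a distributed system.

```python
# Hypothetical sketch: Lagrangian-relaxation heuristic for budget-constrained
# treatment selection posed as a multiple-choice knapsack. NOT the authors'
# LBCF code; assumes uplift/cost estimates per (user, arm) are given and that
# column 0 of each row is a zero-cost control arm.
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def select_treatments(lam: float, uplift: Matrix, cost: Matrix) -> List[int]:
    """For a fixed multiplier lam, each user independently picks the arm that
    maximizes uplift - lam * cost (ties go to the lower arm index)."""
    choices = []
    for u_row, c_row in zip(uplift, cost):
        scores = [u - lam * c for u, c in zip(u_row, c_row)]
        choices.append(max(range(len(scores)), key=scores.__getitem__))
    return choices


def total_cost(choices: Sequence[int], cost: Matrix) -> float:
    """Total spend of an allocation (one arm index per user)."""
    return sum(cost[i][j] for i, j in enumerate(choices))


def solve_bts(uplift: Matrix, cost: Matrix, budget: float,
              iters: int = 60) -> Tuple[List[int], float]:
    assert budget >= 0 and all(row[0] == 0 for row in cost)
    # Spend is non-increasing in lam, so bisect for the smallest lam whose
    # allocation fits the budget. First grow hi until it is feasible; this
    # terminates because the zero-cost control eventually wins every row.
    lo, hi = 0.0, 1.0
    while total_cost(select_treatments(hi, uplift, cost), cost) > budget:
        lo, hi = hi, hi * 2.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if total_cost(select_treatments(mid, uplift, cost), cost) > budget:
            lo = mid
        else:
            hi = mid
    # hi is always feasible, so the returned allocation respects the budget.
    return select_treatments(hi, uplift, cost), hi
```

For example, with uplift rows [0, 3, 5], [0, 1, 2], [0, 4, 4.5], cost 0, 1, and 2 for arms 0 to 2 in every row, and a budget of 2, solve_bts returns the choices [1, 0, 1] at lambda = 2.0. Because the choices are restricted to whole arms, this heuristic always returns a feasible allocation but can leave part of the budget unspent and is not guaranteed to be optimal in general.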


    Cited By

    • (2024) Treatment Effect Estimation for User Interest Exploration on Recommender Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1861–1871. DOI: 10.1145/3626772.3657736. Online publication date: 10 Jul 2024.
    • (2024) Stable Heterogeneous Treatment Effect Estimation across Out-of-Distribution Populations. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 1117–1130. DOI: 10.1109/ICDE60146.2024.00091. Online publication date: 13 May 2024.
    • (2024) Improve ROI with Causal Learning and Conformal Prediction. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 598–610. DOI: 10.1109/ICDE60146.2024.00052. Online publication date: 13 May 2024.


            Published In

            WWW '22: Proceedings of the ACM Web Conference 2022
            April 2022
            3764 pages
            ISBN:9781450390965
            DOI:10.1145/3485447
            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Author Tags

            1. Constraint optimization
            2. Distributed Computing
            3. Heterogeneous causal effects
            4. Personalization
            5. Treatment Selection

            Qualifiers

            • Research-article
            • Research
            • Refereed limited

            Conference

            WWW '22: The ACM Web Conference 2022
            April 25–29, 2022
            Virtual Event, Lyon, France

            Acceptance Rates

            Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
