DOI: 10.1145/3580305.3599254

Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning

Published: 04 August 2023
  • Abstract

    The proliferation of the Internet has led to the emergence of online advertising, driven by the mechanics of online auctions. In these repeated auctions, software agents participate on behalf of aggregated advertisers to optimize their long-term utility. To meet diverse advertiser demands, bidding strategies are employed to optimize advertising objectives subject to different spending constraints. Existing approaches to constrained bidding typically rely on i.i.d. train and test conditions, which contradicts the adversarial nature of online ad markets, where different parties hold potentially conflicting objectives. We therefore study constrained bidding in adversarial bidding environments, assuming no knowledge of the adversarial factors. Instead of relying on the i.i.d. assumption, our insight is to align the training distribution of environments with the potential test distribution while minimizing policy regret. Based on this insight, we propose a practical Minimax Regret Optimization (MiRO) approach that alternates between a teacher, which finds adversarial environments for tutoring, and a learner, which meta-learns its policy over the given distribution of environments. In addition, we are the first to incorporate expert demonstrations for learning bidding strategies. Through a causality-aware policy design, we improve upon MiRO by distilling knowledge from the experts. Extensive experiments on both industrial and synthetic data show that our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
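    The teacher–learner loop described above can be illustrated with a toy one-dimensional version. This is a minimal sketch, not the authors' MiRO implementation: the scalar policy, the set of environments, and the quadratic loss are all illustrative assumptions introduced here for exposition.

```python
# Toy sketch of a minimax-regret teacher/learner loop (illustrative
# assumptions only; NOT the paper's actual MiRO algorithm or bidding setup).

def regret(a, theta):
    """Regret of action a in environment theta: realized loss minus the best
    achievable loss. With loss (a - theta)^2 the best loss is 0, so the
    regret equals the loss itself."""
    return (a - theta) ** 2

def minimax_regret_train(envs, steps=200, lr=0.05):
    """Alternate between a teacher and a learner:
    - teacher: pick the environment where the current policy regrets most;
    - learner: take a gradient step to reduce regret in that environment."""
    a = 0.0  # learner's (scalar) policy parameter
    for _ in range(steps):
        theta = max(envs, key=lambda t: regret(a, t))  # adversarial environment
        a -= lr * 2 * (a - theta)                      # gradient of (a - theta)^2
    return a
```

    For environments {-1, 0, 2}, the iterates settle near the midpoint of the two extremes (0.5), the point that equalizes, and thereby minimizes, the worst-case regret.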

    Supplementary Material

    MP4 File (rtfp1167-2min-promo.mp4)
    Promotion video for the paper "Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning".


    Cited By

    • (2024) Bayesian reinforcement learning for navigation planning in unknown environments. Frontiers in Artificial Intelligence 7. DOI: 10.3389/frai.2024.1308031. Online publication date: 4-Jul-2024.


    Published In

    KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
    August 2023
    5996 pages
    ISBN:9798400701030
    DOI:10.1145/3580305
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. causality
    2. constrained bidding
    3. reinforcement learning

    Qualifiers

    • Research-article

    Conference

    KDD '23

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



    Bibliometrics & Citations


    Article Metrics

    • Downloads (last 12 months): 217
    • Downloads (last 6 weeks): 19
    Reflects downloads up to 26 Jul 2024

