DOI: 10.1145/3580305.3599877
research-article
Free access

Off-Policy Learning-to-Bid with AuctionGym

Published: 04 August 2023

Abstract

Online advertising opportunities are sold through auctions, billions of times every day across the web. Advertisers who participate in those auctions need to decide on a bidding strategy: how much they are willing to bid for a given impression opportunity. Deciding on such a strategy is not a straightforward task, because of the interactive and reactive nature of the repeated auction mechanism. Indeed, an advertiser does not observe counterfactual outcomes of bid amounts that were not submitted, and successful advertisers will adapt their own strategies based on bids placed by competitors. These characteristics complicate effective learning and evaluation of bidding strategies based on logged data alone.
The interactive and reactive nature of the bidding problem lends itself to a bandit or reinforcement learning formulation, where a bidding strategy can be optimised to maximise cumulative rewards. Several design choices then need to be made regarding parameterisation, model-based or model-free approaches, and the formulation of the objective function. This work provides a unified framework for such "learning to bid" methods, showing how many existing approaches fall under the value-based paradigm. We then introduce novel policy-based and doubly robust formulations of the bidding problem. To allow for reliable and reproducible offline validation of such methods without relying on sensitive proprietary data, we introduce AuctionGym: a simulation environment that enables the use of bandit learning for bidding strategies in online advertising auctions. We present results from a suite of experiments under varying environmental conditions, unveiling insights that can guide practitioners who need to decide on a model class. Empirical observations highlight the effectiveness of our newly proposed methods. AuctionGym is released under an open-source license, and we expect the research community to benefit from this tool.
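
As a rough illustration of the three estimator families named in the abstract, the sketch below compares a direct (value-based) estimate, an inverse-propensity (policy-based) estimate, and a doubly robust estimate on logged data from a toy first-price auction. Everything in it is an assumption made for illustration: the function names `collect_logs` and `off_policy_estimates`, the three discrete bid-shading factors, the uniform competitor bids, and the context-free reward model. It does not use AuctionGym's API and is not the paper's formulation.

```python
import numpy as np


def collect_logs(rng, shadings, logging_probs, n_rounds, n_competitors=3):
    """Simulate logged rounds of a toy first-price auction (hypothetical setup)."""
    values = rng.uniform(0.5, 1.0, size=n_rounds)  # private valuations
    actions = rng.choice(len(shadings), size=n_rounds, p=logging_probs)
    bids = shadings[actions] * values  # shaded bids actually submitted
    best_other = rng.uniform(0.0, 1.0, size=(n_rounds, n_competitors)).max(axis=1)
    # Utility is observed only for the submitted bid: value minus price on a win.
    utility = np.where(bids > best_other, values - bids, 0.0)
    return actions, utility


def off_policy_estimates(actions, rewards, logging_probs, target_probs):
    """Return (direct, IPS, doubly robust) estimates of the target policy's utility."""
    # Fit the reward model on the first half and evaluate on the second half,
    # so the doubly robust correction term is not zero by construction.
    half = len(actions) // 2
    fit_a, fit_r = actions[:half], rewards[:half]
    ev_a, ev_r = actions[half:], rewards[half:]

    # Direct method: per-action mean utility as a (context-free) reward model.
    q_hat = np.array([
        fit_r[fit_a == k].mean() if np.any(fit_a == k) else 0.0
        for k in range(len(target_probs))
    ])
    dm = float(target_probs @ q_hat)

    # Inverse propensity scoring: reweight logged utilities by the policy ratio.
    weights = target_probs[ev_a] / logging_probs[ev_a]
    ips = float(np.mean(weights * ev_r))

    # Doubly robust: model prediction plus an importance-weighted residual.
    dr = float(dm + np.mean(weights * (ev_r - q_hat[ev_a])))
    return dm, ips, dr


rng = np.random.default_rng(0)
shadings = np.array([0.5, 0.7, 0.9])       # assumed discrete bid-shading factors
logging_probs = np.full(3, 1.0 / 3.0)      # uniform logging policy
target_probs = np.array([0.1, 0.8, 0.1])   # policy we want to evaluate offline
actions, utility = collect_logs(rng, shadings, logging_probs, n_rounds=200_000)
dm, ips, dr = off_policy_estimates(actions, utility, logging_probs, target_probs)
print(f"DM={dm:.4f}  IPS={ips:.4f}  DR={dr:.4f}")
```

In this toy setting all three estimates should land close to one another, because the reward model is well specified and the logging policy covers every action. The estimators are expected to diverge when the reward model is misspecified or the logging policy rarely plays the actions the target policy prefers.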

Supplementary Material

MP4 File (adfp005-2min-promo.mp4)
Promotional video for "Off-Policy Learning-to-Bid with AuctionGym"

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN:9798400701030
DOI:10.1145/3580305
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. counterfactual inference
  2. off-policy learning
  3. online advertising

Qualifiers

  • Research-article

Conference

KDD '23
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 840
  • Downloads (Last 6 weeks): 62
Reflects downloads up to 28 Dec 2024

Cited By

  • (2024) Optimal Baseline Corrections for Off-Policy Contextual Bandits. Proceedings of the 18th ACM Conference on Recommender Systems, 722-732. DOI: 10.1145/3640457.3688105. Online publication date: 8-Oct-2024
  • (2024) Spending Programmed Bidding: Privacy-friendly Bid Optimization with ROI Constraint in Online Advertising. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 5731-5740. DOI: 10.1145/3637528.3671540. Online publication date: 25-Aug-2024
  • (2024) Practical Bandits: An Industry Perspective. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 1132-1135. DOI: 10.1145/3616855.3636449. Online publication date: 4-Mar-2024
  • (2024) Ad-load Balancing via Off-policy Learning in a Content Marketplace. Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 586-595. DOI: 10.1145/3616855.3635846. Online publication date: 4-Mar-2024
  • (2024) Bidder Selection Problem in Position Auctions: A Fast and Simple Algorithm via Poisson Approximation. Proceedings of the ACM Web Conference 2024, 89-98. DOI: 10.1145/3589334.3645418. Online publication date: 13-May-2024
  • (2023) Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2314-2325. DOI: 10.1145/3580305.3599254. Online publication date: 4-Aug-2023

The conference sponsors are committed to making content openly accessible in a timely manner.
This article is provided by ACM and the conference, through the ACM OpenTOC service.