Learning To Maximize Welfare with a Reusable Resource

Authors:

Orestis Papadigenopoulos,

Constantine Caramanis,

Sanjay Shakkottai

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 6, Issue 2

Article No.: 27, Pages 1 - 30

https://doi.org/10.1145/3530893

Published: 06 June 2022

Abstract

Considerable work has focused on optimal stopping problems in which random IID offers arrive sequentially for a single available resource controlled by the decision-maker. After viewing the realization of an offer, the decision-maker irrevocably either rejects it or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who knows all the realizations a priori. This policy has a particularly simple characterization as a thresholding rule that depends on the reward distribution and the blocking period d, and it arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it provides the key to extending the result to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementary minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
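As an illustration of the kind of thresholding rule the abstract describes, the following minimal sketch simulates a fixed-threshold policy under a blocking period d, together with a sample-based variant that sets its threshold from offers observed while the resource is free. The function names, the blocking convention (an acceptance at round t makes the resource available again at round t + d), and the use of an empirical quantile as the dynamic threshold are assumptions made for illustration only; the abstract does not state the paper's actual threshold or estimator.

```python
from typing import List, Sequence


def run_threshold_policy(rewards: Sequence[float], d: int, threshold: float) -> float:
    """Accept an offer iff the resource is free and the offer meets the threshold.

    Assumed convention: accepting at round t blocks rounds t+1, ..., t+d-1,
    so the resource is free again at round t+d.
    """
    total = 0.0
    free_at = 0  # first round at which the resource is available
    for t, x in enumerate(rewards):
        if t >= free_at and x >= threshold:
            total += x
            free_at = t + d
    return total


def empirical_quantile(samples: Sequence[float], q: float) -> float:
    """Return a simple order-statistic estimate of the q-quantile."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]


def run_learning_policy(rewards: Sequence[float], d: int, accept_prob: float) -> float:
    """Hypothetical sample-based variant of the threshold rule.

    Offers are observed only while the resource is free. The threshold at
    each free round is the empirical (1 - accept_prob)-quantile of the
    offers observed so far; accept_prob is an illustrative tuning knob,
    not a quantity taken from the paper.
    """
    total = 0.0
    free_at = 0
    seen: List[float] = []
    for t, x in enumerate(rewards):
        if t < free_at:
            continue  # blocked: this offer is not observed
        # With no samples yet, reject by using an infinite threshold.
        threshold = empirical_quantile(seen, 1.0 - accept_prob) if seen else float("inf")
        seen.append(x)
        if x >= threshold:
            total += x
            free_at = t + d
    return total


if __name__ == "__main__":
    offers = [5.0, 1.0, 9.0, 9.0, 2.0, 7.0]
    print(run_threshold_policy(offers, d=2, threshold=5.0))
    print(run_learning_policy(offers, d=2, accept_prob=0.5))
```

The fixed-threshold function needs the threshold as an input, which corresponds to the known-distribution setting; the learning variant replaces it with an estimate that is refreshed every time the resource is free, mirroring the abstract's remark that samples are collected only when the resource is not blocked.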

Cited By

Sun B, Yang L, Hajiesmaili M, Wierman A, Lui J, Towsley D, Tsang D. (2022). The Online Knapsack Problem with Departures. Proceedings of the ACM on Measurement and Analysis of Computing Systems 6:3 (1-32). Online publication date: 8-Dec-2022.
https://dl.acm.org/doi/10.1145/3570618

Index Terms

Learning To Maximize Welfare with a Reusable Resource
1. Theory of computation
  1. Design and analysis of algorithms
    1. Approximation algorithms analysis
    2. Online algorithms
      1. Online learning algorithms

Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 2, June 2022, 499 pages

EISSN: 2476-1249

DOI: 10.1145/3543145

Editors: Augustin Chaintreau (Columbia University), Leana Golubchik (University of Southern California), Zhi-Li Zhang (University of Minnesota)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NSF (National Science Foundation)
Wireless Networking and Communications Group (WNCG) Industrial Affiliates Program
Office of Naval Research
