research-article
Open access

Learning To Maximize Welfare with a Reusable Resource

Published: 06 June 2022

    Abstract

    Considerable work has focused on optimal stopping problems where random IID offers arrive sequentially for a single available resource which is controlled by the decision-maker. After viewing the realization of the offer, the decision-maker irrevocably rejects it, or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who a priori knows all the realizations. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it gives the key for extending to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementing minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
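    To make the setting concrete, the following is a minimal simulation sketch, not the paper's algorithm. It assumes that accepting an offer at step t blocks the resource until step t + d, and it treats the threshold `tau` as a free parameter; the paper derives its threshold from an LP relaxation that the abstract does not spell out. The function names `run_threshold_policy` and `prophet_value`, the uniform reward distribution, and the values of `n`, `d`, and `tau` are all illustrative assumptions.

```python
import random


def run_threshold_policy(rewards, d, tau):
    """Accept an offer iff the resource is free and the reward is at least tau.

    Assumed convention: accepting at step t makes the resource
    unavailable until step t + d (d >= 1).
    """
    total = 0.0
    free_at = 0  # first step at which the resource is available again
    for t, x in enumerate(rewards):
        if t >= free_at and x >= tau:
            total += x
            free_at = t + d
    return total


def prophet_value(rewards, d):
    """Offline optimum with full hindsight.

    Picks the best set of offers whose accepted steps are at least d apart,
    using a backward dynamic program over the time horizon.
    """
    n = len(rewards)
    best = [0.0] * (n + d + 1)  # best[t] = optimum using offers t, ..., n-1
    for t in range(n - 1, -1, -1):
        best[t] = max(best[t + 1], rewards[t] + best[t + d])
    return best[0]


def demo(n=1000, d=5, tau=0.8, seed=0):
    """Compare one fixed threshold against the prophet on IID uniform rewards."""
    rng = random.Random(seed)
    rewards = [rng.random() for _ in range(n)]
    policy = run_threshold_policy(rewards, d, tau)
    prophet = prophet_value(rewards, d)
    print(f"policy={policy:.2f} prophet={prophet:.2f} ratio={policy / prophet:.3f}")


if __name__ == "__main__":
    demo()
```

    Sweeping `tau` in `demo` gives a rough feel for how the ratio against the prophet depends on the threshold and on the blocking period d. A single run on one sample path is only an illustration and says nothing about the expected-value guarantee stated in the abstract.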



    Published In

    Proceedings of the ACM on Measurement and Analysis of Computing Systems  Volume 6, Issue 2
    POMACS
    June 2022
    499 pages
    EISSN:2476-1249
    DOI:10.1145/3543145
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. lower bounds
    2. online learning
    3. prophet inequalities
    4. regret

    Qualifiers

    • Research-article
