DOI: 10.1145/3618260.3649653
research-article
Open access

No-Regret Learning in Bilateral Trade via Global Budget Balance

Published: 11 June 2024
  • Abstract

    Bilateral trade models the problem of intermediating between two rational agents — a seller and a buyer — both characterized by a private valuation for an item they want to trade. We study the online learning version of the problem, in which at each time step a new seller and buyer arrive and the learner has to set prices for them without any knowledge about their (adversarially generated) valuations.
    In this setting, known impossibility results rule out the existence of no-regret algorithms when budget balance has to be enforced at each time step. In this paper, we introduce the notion of global budget balance, which only requires the learner to fulfill budget balance over the entire time horizon. Under this natural relaxation, we provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. First, we show that in the full-feedback model, the learner can guarantee Õ(√T) regret against the best fixed prices in hindsight, and that this bound is optimal up to poly-logarithmic terms. Second, we provide a learning algorithm guaranteeing an Õ(T^{3/4}) regret upper bound with one-bit feedback, which we complement with an Ω(T^{5/7}) lower bound that holds even in the two-bit feedback model. Finally, we introduce and analyze an alternative benchmark that is provably stronger than the best fixed prices in hindsight and is inspired by the literature on bandits with knapsacks.
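
To make the difference between per-step and global budget balance concrete, here is a minimal sketch. It assumes the usual posted-price formalization, which the abstract itself does not spell out: the learner posts a price p to the seller and a price q to the buyer, a trade occurs when the seller's valuation is at most p and the buyer's valuation is at least q, the gain from trade is the difference of the two valuations, and the learner's revenue is q - p. The names `Round`, `gain_from_trade`, `revenue`, and the two checker functions are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Round:
    """One time step: private valuations and the two posted prices (assumed model)."""
    seller_value: float  # seller's private valuation s
    buyer_value: float   # buyer's private valuation b
    seller_price: float  # price p paid to the seller if the trade happens
    buyer_price: float   # price q charged to the buyer if the trade happens


def trade_happens(r: Round) -> bool:
    # Assumption: both agents accept exactly when the posted price suits them.
    return r.seller_value <= r.seller_price and r.buyer_price <= r.buyer_value


def gain_from_trade(r: Round) -> float:
    # Social welfare increase generated by the trade, zero if no trade occurs.
    return r.buyer_value - r.seller_value if trade_happens(r) else 0.0


def revenue(r: Round) -> float:
    # Learner's net money flow in this round: charged to buyer minus paid to seller.
    return r.buyer_price - r.seller_price if trade_happens(r) else 0.0


def per_step_budget_balanced(rounds: Sequence[Round]) -> bool:
    # Budget balance enforced at each time step: the learner never subsidizes a trade.
    return all(revenue(r) >= 0 for r in rounds)


def globally_budget_balanced(rounds: Sequence[Round]) -> bool:
    # Global budget balance: only the total over the whole horizon must be non-negative.
    return sum(revenue(r) for r in rounds) >= 0


history = [
    Round(seller_value=0.25, buyer_value=1.0, seller_price=0.25, buyer_price=0.75),
    Round(seller_value=0.5, buyer_value=0.625, seller_price=0.625, buyer_price=0.5),
]
total_gain = sum(gain_from_trade(r) for r in history)
total_revenue = sum(revenue(r) for r in history)
```

In this toy history the second round runs a deficit that the surplus collected in the first round covers, so the sequence violates per-step budget balance while still satisfying the global constraint. This is the kind of slack the relaxation allows; it is not the paper's algorithm.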

    Published In

    STOC 2024: Proceedings of the 56th Annual ACM Symposium on Theory of Computing
    June 2024
    2049 pages
    ISBN:9798400703836
    DOI:10.1145/3618260
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bilateral Trade
    2. Budget Balance
    3. Online Learning
    4. Partial Feedback

    Qualifiers

    • Research-article

    Funding Sources

    • Ministero dell'Università e della Ricerca
    • European Research Council

    Conference

    STOC '24: 56th Annual ACM Symposium on Theory of Computing
    June 24 - 28, 2024
    Vancouver, BC, Canada

    Acceptance Rates

    Overall Acceptance Rate 1,469 of 4,586 submissions, 32%

    Bibliometrics

    • Total Citations: 0
    • Total Downloads: 65
    • Downloads (Last 12 months): 65
    • Downloads (Last 6 weeks): 52
    Reflects downloads up to 09 Aug 2024