Matching while Learning

Authors:

Yash Kanoria

EC '17: Proceedings of the 2017 ACM Conference on Economics and Computation

Page 119

https://doi.org/10.1145/3033274.3084095

Published: 20 June 2017

Abstract

We consider the problem faced by a service platform that needs to match supply with demand but also to learn attributes of new arrivals in order to match them better in the future. We introduce a benchmark model with heterogeneous workers and jobs that arrive over time. Job types are known to the platform, but worker types are unknown and must be learned by observing match outcomes. Workers depart after performing a certain number of jobs. The payoff from a match depends on the pair of types, and the goal is to maximize the steady-state rate at which payoff accumulates.
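The abstract does not fix a particular outcome model. As a minimal sketch, assuming a finite set of worker types, binary (success/failure) match outcomes, and a known matrix of success probabilities, the platform's belief about a single worker's type can be updated by Bayes' rule after each observed match. The function name `update_posterior`, the Bernoulli outcome model, and the example numbers are hypothetical illustrations, not the paper's model.

```python
from __future__ import annotations

from typing import Sequence


def update_posterior(
    prior: Sequence[float],
    success_prob: Sequence[Sequence[float]],
    job_type: int,
    outcome: bool,
) -> list[float]:
    """Bayes' rule over a finite set of worker types after one match outcome.

    Hypothetical model: success_prob[i][j] is the probability that a worker of
    type i succeeds on a job of type j, and each outcome is Bernoulli.
    """
    # Likelihood of the observed outcome under each candidate worker type.
    likelihood = [
        row[job_type] if outcome else 1.0 - row[job_type] for row in success_prob
    ]
    unnormalized = [p * lik for p, lik in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed outcome has zero probability under the prior")
    # Normalize so the posterior sums to one.
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Rows: worker types (expert, novice); columns: job types (easy, hard).
    success_prob = [[0.9, 0.9], [0.9, 0.3]]
    prior = [0.5, 0.5]
    print(update_posterior(prior, success_prob, job_type=0, outcome=True))
    print(update_posterior(prior, success_prob, job_type=1, outcome=True))
```

A success on an easy job leaves the belief at [0.5, 0.5], because both types succeed there with the same probability, while a success on a hard job moves it to about [0.75, 0.25]. In this toy setting the assignment that is most informative about a worker need not be the one with the highest immediate payoff, which is the exploration-exploitation tension described above.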

Our main contribution is a complete characterization of the structure of the optimal policy in the limit where each worker performs many jobs. For each worker, the platform faces a trade-off between myopically maximizing payoffs (exploitation) and learning the worker's type (exploration). This creates a multitude of multi-armed bandit problems, one for each worker, coupled together by the constraints on the availability of jobs of different types (capacity constraints). We find that the platform should estimate a shadow price for each job type and use the price-adjusted payoffs, first, to determine its learning goals and then, for each worker, (i) to balance learning against payoffs during the exploration phase and (ii) to match myopically during the exploitation phase, once it has achieved its learning goals for that worker.
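The abstract does not say how the shadow prices are estimated. A minimal sketch of one standard construction, offered as an illustration and not as the paper's procedure, prices each job type's capacity constraint in a static, full-information version of the problem: worker types arrive at hypothetical rates `rho`, job types at rates `mu`, and `payoff[i][j]` is the expected payoff of matching type `i` with type `j`. Projected subgradient descent on the dual of that assignment LP raises the price of over-demanded job types and lowers it (down to zero) for under-demanded ones. `myopic_choice` then illustrates an exploitation-phase rule: send a worker to the job type with the highest price-adjusted expected payoff under the current belief about its type. All function names and numbers are hypothetical.

```python
import math


def dual_value(prices, rho, mu, payoff):
    """D(p) = sum_j mu_j p_j + sum_i rho_i * max(0, max_j (payoff[i][j] - p_j))."""
    total = sum(m * p for m, p in zip(mu, prices))
    for r, row in zip(rho, payoff):
        total += r * max(0.0, max(a - p for a, p in zip(row, prices)))
    return total


def shadow_prices(rho, mu, payoff, iters=2000, step=0.5):
    """Projected subgradient descent on the dual of the static assignment LP.

    Returns the best prices found and their dual value, which is an upper
    bound on the static LP optimum by weak duality.
    """
    n_jobs = len(mu)
    prices = [0.0] * n_jobs
    best_prices, best_val = list(prices), dual_value(prices, rho, mu, payoff)
    for t in range(1, iters + 1):
        # Subgradient: capacity minus demand, where each worker type demands
        # its best price-adjusted job type if that adjusted payoff is positive.
        grad = list(mu)
        for r, row in zip(rho, payoff):
            adjusted = [a - p for a, p in zip(row, prices)]
            j = max(range(n_jobs), key=adjusted.__getitem__)
            if adjusted[j] > 0.0:
                grad[j] -= r
        eta = step / math.sqrt(t)  # diminishing step size
        # Over-demanded job types (grad < 0) get higher prices; keep p >= 0.
        prices = [max(0.0, p - eta * g) for p, g in zip(prices, grad)]
        val = dual_value(prices, rho, mu, payoff)
        if val < best_val:
            best_prices, best_val = list(prices), val
    return best_prices, best_val


def myopic_choice(posterior, payoff, prices):
    """Exploitation step: job type maximizing expected payoff minus its price."""
    adjusted = [
        sum(q * row[j] for q, row in zip(posterior, payoff)) - prices[j]
        for j in range(len(prices))
    ]
    return max(range(len(prices)), key=adjusted.__getitem__)


if __name__ == "__main__":
    rho = [0.5, 0.5]                   # arrival rates: expert, novice
    mu = [1.0, 0.3]                    # easy jobs plentiful, hard jobs scarce
    payoff = [[0.8, 1.0], [0.6, 0.2]]  # rows: expert, novice; cols: easy, hard
    prices, value = shadow_prices(rho, mu, payoff)
    print(prices, value)               # roughly [0.0, 0.2] and 0.76
    print(myopic_choice([0.0, 1.0], payoff, prices))  # novice: 0 (easy)
    print(myopic_choice([0.5, 0.5], payoff, prices))  # uncertain: 0 (easy)
```

On this toy instance the easy-job price stays at 0 (easy jobs are in excess supply), the hard-job price settles near 0.2, and the dual value approaches 0.76, the optimum of the static LP in which experts fill the 0.3 units of hard-job capacity and everyone else takes easy jobs. At these prices a novice, or a worker believed equally likely to be either type, is sent to easy jobs, while an expert is left almost exactly indifferent, as complementary slackness implies for a type whose mass is split across both job types. The diminishing step size and best-iterate tracking are standard choices for subgradient methods; an LP solver would give exact prices.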

Supplementary Material

MP4 File (02a_03johari.mp4), 416.01 MB

Index Terms

Matching while Learning
1. Theory of computation
  1. Design and analysis of algorithms
    1. Online algorithms
      1. Online learning algorithms
  2. Theory and algorithms for application domains
    1. Machine learning theory
      1. Regret bounds

Information

Published In

EC '17: Proceedings of the 2017 ACM Conference on Economics and Computation

June 2017

740 pages

ISBN:9781450345279

DOI:10.1145/3033274

General Chair: Constantinos Daskalakis, Massachusetts Institute of Technology, USA

Program Chairs: Moshe Babaioff, Microsoft Research, Israel; Hervé Moulin, University of Glasgow, UK

Copyright © 2017 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGecom: Special Interest Group on Economics and Computation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EC '17: ACM Conference on Economics and Computation, June 26-30, 2017, Cambridge, Massachusetts, USA

Sponsor: SIGecom

Acceptance Rates

EC '17 paper acceptance rate: 75 of 257 submissions (29%)

Overall acceptance rate: 664 of 2,389 submissions (28%)

Bibliometrics

Total citations: 13

Total downloads: 395

Downloads (last 12 months): 50

Downloads (last 6 weeks): 14

Reflects downloads up to 23 Dec 2024
