DOI: 10.1145/3033274.3084095

Matching while Learning

Published: 20 June 2017

Abstract

We consider the problem faced by a service platform that needs to match supply with demand but also to learn attributes of new arrivals in order to match them better in the future. We introduce a benchmark model with heterogeneous workers and jobs that arrive over time. Job types are known to the platform, but worker types are unknown and must be learned by observing match outcomes. Workers depart after performing a certain number of jobs. The payoff from a match depends on the pair of types and the goal is to maximize the steady-state rate of accumulation of payoff.
Our main contribution is a complete characterization of the structure of the optimal policy in the limit that each worker performs many jobs. The platform faces a trade-off for each worker between myopically maximizing payoffs (exploitation) and learning the type of the worker (exploration). This creates a multitude of multi-armed bandit problems, one for each worker, coupled together by the constraint on the availability of jobs of different types (capacity constraints). We find that the platform should estimate a shadow price for each job type, and use the payoffs adjusted by these prices, first, to determine its learning goals and then, for each worker, (i) to balance learning with payoffs during the exploration phase, and (ii) to myopically match after it has achieved its learning goals during the exploitation phase.
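The two ingredients named in the abstract, learning a worker's type from match outcomes and matching on price-adjusted payoffs, can be illustrated with a minimal sketch. It is not the paper's policy: the shadow prices are taken as given rather than estimated, match outcomes are assumed to be Bernoulli with success probability equal to the payoff, and the type names, job names, and all numbers (`PAYOFF`, `PRICES`) are hypothetical.

```python
from typing import Dict

# Assumed representation: worker type -> job type -> expected payoff,
# interpreted here as the success probability of a Bernoulli match outcome.
Payoffs = Dict[str, Dict[str, float]]


def update_posterior(
    posterior: Dict[str, float], payoff: Payoffs, job: str, success: bool
) -> Dict[str, float]:
    """Bayes update of the belief over worker types after one match outcome."""
    weights = {}
    for worker_type, prior in posterior.items():
        p = payoff[worker_type][job]
        weights[worker_type] = prior * (p if success else 1.0 - p)
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("outcome has zero probability under every worker type")
    return {worker_type: w / total for worker_type, w in weights.items()}


def price_adjusted_job(
    worker_type: str, payoff: Payoffs, prices: Dict[str, float]
) -> str:
    """Pick the job type maximizing payoff minus the job type's shadow price."""
    return max(prices, key=lambda job: payoff[worker_type][job] - prices[job])


# Hypothetical example: "easy" jobs are scarce, so they carry a positive price.
PAYOFF: Payoffs = {
    "expert": {"hard": 0.8, "easy": 0.9},
    "novice": {"hard": 0.2, "easy": 0.9},
}
PRICES = {"hard": 0.0, "easy": 0.3}
ZERO_PRICES = {"hard": 0.0, "easy": 0.0}

# Exploration: a success on a "hard" job shifts belief toward "expert".
belief = update_posterior({"expert": 0.5, "novice": 0.5}, PAYOFF, "hard", True)
print(belief)

# Exploitation: compare the myopic choice with the price-adjusted choice.
print(price_adjusted_job("expert", PAYOFF, ZERO_PRICES))
print(price_adjusted_job("expert", PAYOFF, PRICES))
print(price_adjusted_job("novice", PAYOFF, PRICES))
```

With zero prices an expert is myopically sent to an easy job. With the assumed price on easy jobs the expert is sent to a hard job instead, leaving the scarce easy jobs for novices. This is the kind of coupling across workers that the capacity constraints introduce.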

Supplementary Material

MP4 File (02a_03johari.mp4)


Information

Published In

EC '17: Proceedings of the 2017 ACM Conference on Economics and Computation
June 2017
740 pages
ISBN: 9781450345279
DOI: 10.1145/3033274
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. capacity constraints
  2. learning
  3. matching
  4. multi-armed bandits
  5. two-sided platforms

Qualifiers

  • Abstract

Conference

EC '17
EC '17: ACM Conference on Economics and Computation
June 26 - 30, 2017
Cambridge, Massachusetts, USA

Acceptance Rates

EC '17 Paper Acceptance Rate 75 of 257 submissions, 29%;
Overall Acceptance Rate 664 of 2,389 submissions, 28%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 50
  • Downloads (Last 6 weeks): 14
Reflects downloads up to 23 Dec 2024

Cited By

  • (2023) Exploration for free. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pages 2192-2202. DOI: 10.5555/3625834.3626039. Online publication date: 31-Jul-2023
  • (2022) Integrated Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs. Operations Research 70:2, pages 1166-1181. DOI: 10.1287/opre.2021.2100. Online publication date: 1-Mar-2022
  • (2022) Bandit Learning in Many-to-One Matching Markets. Proceedings of the 31st ACM International Conference on Information & Knowledge Management, pages 2088-2097. DOI: 10.1145/3511808.3557248. Online publication date: 17-Oct-2022
  • (2021) Designing Informative Rating Systems. Manufacturing & Service Operations Management 23:3, pages 589-605. DOI: 10.1287/msom.2020.0921. Online publication date: 1-May-2021
  • (2021) Experimenting in Equilibrium. Management Science 67:11, pages 6694-6715. DOI: 10.1287/mnsc.2020.3844. Online publication date: 1-Nov-2021
  • (2020) Online contextual learning with perishable resources allocation. IISE Transactions 52:12, pages 1343-1357. DOI: 10.1080/24725854.2020.1752958. Online publication date: 4-Jun-2020
  • (2019) Discrimination in online markets. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pages 2145-2155. DOI: 10.5555/3454287.3454479. Online publication date: 8-Dec-2019
  • (2018) Learning Proportionally Fair Allocations with Low Regret. Proceedings of the ACM on Measurement and Analysis of Computing Systems 2:2, pages 1-31. DOI: 10.1145/3224431. Online publication date: 13-Jun-2018
  • (2018) Integrating Online Learning and Adaptive Control in Queueing Systems with Uncertain Payoffs. 2018 Information Theory and Applications Workshop (ITA), pages 1-9. DOI: 10.1109/ITA.2018.8503124. Online publication date: Feb-2018
  • (2017) Adaptive matching for expert systems with uncertain task types. 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 753-760. DOI: 10.1109/ALLERTON.2017.8262814. Online publication date: Oct-2017
