research-article

To Interfere or Not To Interfere: Information Revelation and Price-Setting Incentives in a Multiagent Learning Environment

Published: 27 March 2024

Abstract

Demand uncertainty and seller competition are substantial challenges for online platforms. In “To Interfere or Not To Interfere: Information Revelation and Price-Setting Incentives in a Multiagent Learning Environment,” Birge, Chen, Keskin, and Ward analyze whether and how an online platform should offer demand information or price incentives to the sellers participating on the platform. The authors show that, when facing uncertain demand, the platform could be better off by doing nothing—that is, not providing any information or incentives to the sellers. They also develop a strategic reveal-and-incentivize policy for the platform to choose when to start sharing information and offering rewards to coordinate the sellers’ pricing. They prove that the strategic reveal-and-incentivize policy achieves near-optimal profit performance for the platform.

Abstract

We consider a platform in which multiple sellers offer their products for sale over a time horizon of T periods. Each seller sets its own price. The platform collects a fraction of the sales revenue and provides price-setting incentives to the sellers to maximize its own revenue. The demand for each seller’s product is a function of all sellers’ prices and some customer features. Initially, neither the platform nor the sellers know the demand function, but they can learn about it through sales observations: each seller observes its own sales, whereas the platform observes all sellers’ sales as well as the customer feature information. We measure the platform’s performance by comparing its expected revenue with the full-information optimal revenue, and we design policies that enable the platform to manage information revelation and price-setting incentives. Perhaps surprisingly, a simple “do-nothing” policy does not always exhibit poor revenue performance and can perform exceptionally well under certain conditions. With a more conservative policy that reveals information to make price-setting incentives more effective, the platform can always protect itself from large revenue losses caused by demand model uncertainty. We develop a strategic reveal-and-incentivize policy that combines the benefits of the aforementioned policies and thereby achieves asymptotically optimal revenue performance as T grows large.
Funding: This work was supported by Duke University Fuqua School of Business, University of Chicago Booth School of Business, and CUHK Business School.
Supplemental Material: The e-companion is available at https://doi.org/10.1287/opre.2023.0363.
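
The performance measure described in the abstract, the gap between the full-information optimal revenue and the revenue actually earned, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a noise-free linear demand model with intercept `a` and price-sensitivity matrix `B`, a commission fraction `gamma`, no customer features, a benchmark in which the platform could dictate the revenue-maximizing prices, and the hypothetical function names `platform_revenue`, `full_information_prices`, and `cumulative_regret`.

```python
import numpy as np


def platform_revenue(prices, a, B, gamma):
    """Platform's share of one period's sales revenue.

    Assumed demand model (illustrative only): demand = a - B @ prices,
    where the off-diagonal entries of B capture cross-seller price effects.
    """
    demand = a - B @ prices
    return gamma * float(prices @ demand)


def full_information_prices(a, B):
    """Prices maximizing total revenue p^T (a - B p) when a and B are known.

    The first-order condition is a - (B + B^T) p = 0, which identifies the
    maximizer provided B + B^T is positive definite.
    """
    return np.linalg.solve(B + B.T, a)


def cumulative_regret(price_path, a, B, gamma):
    """Sum over periods of (benchmark revenue - revenue at the posted prices)."""
    p_star = full_information_prices(a, B)
    best = platform_revenue(p_star, a, B, gamma)
    return sum(best - platform_revenue(p, a, B, gamma) for p in price_path)


if __name__ == "__main__":
    # Two sellers; each seller's demand falls in its own price and rises
    # in the competitor's price (negative off-diagonal entries of B).
    a = np.array([10.0, 8.0])
    B = np.array([[2.0, -0.5], [-0.5, 2.0]])
    gamma = 0.1  # assumed commission fraction

    # A price path that stays at fixed, suboptimal prices for T = 3 periods.
    path = [np.array([2.0, 2.0])] * 3
    print("benchmark prices:", full_information_prices(a, B))
    print("cumulative regret:", cumulative_regret(path, a, B, gamma))
```

In this toy setting the regret grows linearly in the number of periods because the price path never moves toward the benchmark prices; a policy with asymptotically optimal revenue performance would instead keep this gap small relative to the horizon T.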

Information & Contributors

Information

Published In

Operations Research, Volume 72, Issue 6
November-December 2024
519 pages
DOI:10.1287/opre.2024.72.issue-6

Publisher

INFORMS

Linthicum, MD, United States

Publication History

Published: 27 March 2024
Accepted: 07 April 2023
Received: 08 June 2021

Author Tag

  1. Market Analytics and Revenue Management

Author Tags

  1. sharing economy
  2. two-sided platforms
  3. revenue management
  4. pricing
  5. demand learning
  6. sequential estimation
  7. exploration-exploitation
  8. regret

Qualifiers

  • Research-article
