A Reinforcement Learning Approach to Optimize Discount and Reputation Tradeoffs in E-commerce Systems

Published: 27 October 2020

Abstract

Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that sellers in such systems may need a substantial amount of time to earn a reputable label, which reduces their profit. We propose to enhance sellers’ reputation via price discounts. This raises two challenges: (1) the demand from buyers depends on both the discount and the reputation, and (2) the demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem as a semi-Markov decision process to explore the optimal tradeoffs in selecting price discounts. We prove the monotonicity of the optimal profit and the optimal discount. Based on this monotonicity, we design a Q-learning with forward projection (QLFP) algorithm, which infers the optimal discount from historical transaction data. We prove that the QLFP algorithm converges to the optimal policy. We conduct trace-driven simulations using a dataset from eBay to evaluate the QLFP algorithm. Evaluation results show that QLFP improves the profit by up to 50% over both Q-learning and Speedy Q-learning. The QLFP algorithm also improves both the reputation and the profit by up to a factor of two over the scheme that offers no price discount.
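To make the learning setup in the abstract more concrete, the following is a minimal sketch of plain tabular Q-learning on a toy discount and reputation model. It is not the QLFP algorithm: the abstract does not specify QLFP's update rule, so only the standard Q-learning update is shown, together with a simple running-maximum projection as one hypothetical way a monotonicity property could be enforced. Everything else is an assumption made for illustration: the function names, the demand model in `simulate_step`, the price and cost values, the discount levels, and the use of fixed discrete time steps in place of a semi-Markov decision process.

```python
import random


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one standard tabular Q-learning update in place."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


def project_nondecreasing(values):
    """Running-maximum projection: return a nondecreasing copy of values.

    Hypothetical illustration only; the paper's forward projection
    may be defined differently.
    """
    out = []
    running = float("-inf")
    for v in values:
        running = max(running, v)
        out.append(running)
    return out


def simulate_step(rng, reputation, discount, n_levels):
    """Toy market step (assumed model, not taken from the paper).

    Purchase probability grows with reputation and discount.
    A sale earns (price - discount - cost) and raises reputation by one level.
    """
    buy_prob = min(1.0, 0.2 + 0.1 * reputation + 0.8 * discount)
    sold = rng.random() < buy_prob
    reward = (1.0 - discount) - 0.3 if sold else 0.0  # price 1.0, unit cost 0.3
    next_rep = min(n_levels - 1, reputation + 1) if sold else reputation
    return reward, next_rep


def train(episodes=2000, horizon=30, n_levels=5,
          discounts=(0.0, 0.2, 0.4), seed=0):
    """Learn a Q-table indexed by [reputation level][discount index]."""
    rng = random.Random(seed)
    q = [[0.0] * len(discounts) for _ in range(n_levels)]
    for _ in range(episodes):
        rep = 0  # every episode starts from the lowest reputation level
        for _ in range(horizon):
            if rng.random() < 0.1:  # epsilon-greedy exploration
                a = rng.randrange(len(discounts))
            else:
                a = max(range(len(discounts)), key=lambda i: q[rep][i])
            reward, nxt = simulate_step(rng, rep, discounts[a], n_levels)
            q_update(q, rep, a, reward, nxt, alpha=0.1, gamma=0.9)
            rep = nxt
    return q


if __name__ == "__main__":
    q = train()
    # Assumes, for illustration, that value is nondecreasing in reputation.
    values = project_nondecreasing([max(row) for row in q])
    for level, v in enumerate(values):
        print(level, round(v, 3))
```

The projection is applied here only to the per-state values after training, under the assumption that value does not decrease as reputation grows. The abstract states that monotonicity is proved but does not give its direction, so that choice should be checked against the paper before reuse.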

Cited By

  • (2024) Consumer evaluation mechanisms on e-commerce platforms: reputation management and analysis of influencing factors. Applied Mathematics and Nonlinear Sciences 9:1. DOI: 10.2478/amns-2024-1872. Online publication date: 5-Aug-2024
  • (2023) E-Commerce: Reach Customers and Drive Sales with Data Science and Big Data Analytics. 2023 2nd International Conference for Innovation in Technology (INOCON), 1-6. DOI: 10.1109/INOCON57975.2023.10101132. Online publication date: 3-Mar-2023
  • (2021) Chinese Emotional Dialogue Response Generation via Reinforcement Learning. ACM Transactions on Internet Technology 21:4, 1-17. DOI: 10.1145/3446390. Online publication date: 22-Jul-2021

Published In

ACM Transactions on Internet Technology, Volume 20, Issue 4
November 2020
391 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/3427795
Editor: Ling Liu

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 27 October 2020
Accepted: 01 May 2020
Revised: 01 April 2020
Received: 01 November 2019
Published in TOIT Volume 20, Issue 4

Author Tags

1. Reputation systems
2. discount
3. reinforcement learning

Qualifiers

• Research-article
• Research
• Refereed

Funding Sources

• GRF
• Chongqing High-Technology Innovation and Application Development Funds
• National Natural Science Foundation of China
