DOI: 10.1145/3357384.3357799

Multi-Agent Reinforcement Learning for Order-dispatching via Order-Vehicle Distribution Matching

Published: 03 November 2019

Abstract

Improving the efficiency of dispatching orders to vehicles is a research hotspot in online ride-hailing systems. Most existing order-dispatching solutions rely on centralized control, which requires considering all possible matches between available orders and vehicles. On large-scale ride-sharing platforms, thousands of vehicles and orders must be matched every second, which incurs a very high computational cost. In this paper, we propose a decentralized-execution order-dispatching method based on multi-agent reinforcement learning to address the large-scale order-dispatching problem. Unlike previous cooperative multi-agent reinforcement learning algorithms, in our method all agents work independently, guided by an evaluation of the joint policy, since there is no need for communication or explicit cooperation between agents. Furthermore, we use KL-divergence optimization at each time step to speed up the learning process and to balance the vehicles (supply) and orders (demand). Experiments on both the explanatory environment and a real-world simulator show that the proposed method outperforms the baselines in terms of accumulated driver income (ADI) and order response rate (ORR) in various traffic environments. In addition, with the support of the online platform of Didi Chuxing, we designed a hybrid system to deploy our model.
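
To make the supply-demand balancing idea concrete, the sketch below measures how far a vehicle distribution is from an order distribution using KL divergence. It is an illustration only, not the authors' implementation: the grid-cell counts, the function names `normalize` and `kl_divergence`, the smoothing constant `eps`, and the direction of the divergence (orders relative to vehicles) are all assumptions made for this example.

```python
import math


def normalize(counts):
    """Turn per-cell counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps
    (an assumed smoothing constant) to avoid division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical counts over four grid cells at one time step.
orders = [30, 10, 5, 5]       # demand per cell
vehicles = [10, 10, 15, 15]   # supply per cell

# A large value means vehicles are poorly matched to where orders are.
gap = kl_divergence(normalize(orders), normalize(vehicles))

# When supply mirrors demand exactly, the divergence is zero.
balanced = kl_divergence(normalize(orders), normalize(orders))
```

Under these assumptions, a dispatching policy that lowers `gap` over time moves the vehicle distribution closer to the order distribution, which is the balancing effect the abstract describes.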


Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN: 9781450369763
DOI: 10.1145/3357384
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep reinforcement learning
  2. multi-agent reinforcement learning
  3. order-dispatching
  4. ride-hailing

Qualifiers

  • Research-article

Funding Sources

  • NSFC

Conference

CIKM '19
Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 150
  • Downloads (last 6 weeks): 15
Reflects downloads up to 07 Mar 2025.

Cited By

  • (2025) Learning the on-demand adaptable matching range with a reinforcement learning. Transportation Research Part C: Emerging Technologies 172 (105018). DOI: 10.1016/j.trc.2025.105018. Online publication date: Mar-2025
  • (2025) Scalable order dispatching through Federated Multi-Agent Deep Reinforcement Learning. Expert Systems with Applications 264 (125792). DOI: 10.1016/j.eswa.2024.125792. Online publication date: Mar-2025
  • (2024) Multi-agent reinforcement learning with hierarchical coordination for emergency responder stationing. Proceedings of the 41st International Conference on Machine Learning (45813-45834). DOI: 10.5555/3692070.3693934. Online publication date: 21-Jul-2024
  • (2024) Population Game-Assisted Multi-Agent Reinforcement Learning Method for Dynamic Multi-Vehicle Route Selection. Electronics 13:8 (1555). DOI: 10.3390/electronics13081555. Online publication date: 19-Apr-2024
  • (2024) Reinforcement Learning-Driven Intelligent Truck Dispatching Algorithms for Freeway Logistics. Complex System Modeling and Simulation 4:4 (368-386). DOI: 10.23919/CSMS.2024.0016. Online publication date: Dec-2024
  • (2024) Comparing Multiagent Dynamic and Urgent Task Sharing Methods when the Cost of Unexecuted Task Is High (タスク未実行コストが高い状況下でのマルチエージェント間動的緊急タスクシェア手法の比較). Transactions of the Japanese Society for Artificial Intelligence 39:6 (AG24-F_1-13). DOI: 10.1527/tjsai.39-6_AG24-F. Online publication date: 1-Nov-2024
  • (2024) An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (5054-5061). DOI: 10.1145/3627673.3680013. Online publication date: 21-Oct-2024
  • (2024) LAMD2: Enabling Economical and Green Travel for Diversified Mobility on Demand Systems. IEEE Transactions on Mobile Computing 23:8 (8525-8540). DOI: 10.1109/TMC.2024.3353621. Online publication date: Aug-2024
  • (2024) Joint Optimization of Pricing, Dispatching and Repositioning in Ride-Hailing With Multiple Models Interplayed Reinforcement Learning. IEEE Transactions on Knowledge and Data Engineering 36:12 (8593-8606). DOI: 10.1109/TKDE.2024.3464563. Online publication date: Dec-2024
  • (2024) Optimizing Long-Term Efficiency and Fairness in Ride-Hailing under Budget Constraint via Joint Order Dispatching and Driver Repositioning. IEEE Transactions on Knowledge and Data Engineering (1-14). DOI: 10.1109/TKDE.2023.3348491. Online publication date: 2024
