
Exploring Deep Reinforcement Learning for Task Dispatching in Autonomous On-Demand Services

Published: 21 April 2021

Abstract

Autonomous on-demand services, such as GOGOX (formerly GoGoVan) in Hong Kong, provide a platform for users to request services and for suppliers to meet those demands. On such a platform, suppliers are free to accept or reject the demands dispatched to them, which makes online matching between demands and suppliers challenging. Existing methods dispatch demands in rounds, basing each decision on the predicted response patterns of suppliers to the demands in the current round. None of them considers the impact of future demands and suppliers on the current dispatching decision, so a decision that looks good in the current round can be suboptimal over a longer horizon. To solve this problem, we propose a novel demand dispatching model using deep reinforcement learning. In this model, we treat each demand as an agent. The action of each agent, i.e., the dispatching decision for each demand, is determined by a centralized algorithm in a coordinated way. The model works in two steps. (1) It learns each demand’s expected value in each spatiotemporal state from historical transition data. (2) Based on the learned values, it performs many-to-many dispatching with a combinatorial optimization algorithm that considers both the immediate rewards and the expected values of demands in the next round. To obtain a higher total reward, demands with a high expected value (short response time) in the future may be delayed to the next round. Conversely, demands with a low expected value (long response time) in the future are dispatched immediately. Through extensive experiments on real-world datasets, we show that the proposed model outperforms existing models in terms of Cancellation Rate and Average Response Time.
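
The two steps in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: tabular TD(0) stands in for the deep value network, an exhaustive search stands in for the combinatorial optimization algorithm, and matching is simplified to one supplier per demand rather than many-to-many. The state encoding, reward numbers, discount factor, and all function names (`learn_values`, `dispatch_round`) are assumptions made for illustration only.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor, not taken from the paper


def learn_values(transitions, gamma=GAMMA, alpha=0.1, sweeps=200):
    """Step 1 (sketch): tabular TD(0) over (state, reward, next_state) tuples.

    next_state is None when the demand leaves the system after this step.
    """
    values = {}
    for _ in range(sweeps):
        for state, reward, next_state in transitions:
            future = 0.0 if next_state is None else gamma * values.get(next_state, 0.0)
            current = values.get(state, 0.0)
            values[state] = current + alpha * (reward + future - current)
    return values


def dispatch_round(demands, suppliers, immediate_reward, wait_value, gamma=GAMMA):
    """Step 2 (sketch): exhaustive search over plans for one round.

    Each demand is either sent to one distinct supplier or delayed (None).
    A delayed demand contributes its discounted expected next-round value.
    """
    options = list(suppliers) + [None]
    best_score, best_plan = float("-inf"), None
    for plan in product(options, repeat=len(demands)):
        used = [s for s in plan if s is not None]
        if len(used) != len(set(used)):
            continue  # a supplier may serve at most one demand in this sketch
        score = sum(
            immediate_reward(d, s) if s is not None else gamma * wait_value[d]
            for d, s in zip(demands, plan)
        )
        if score > best_score:
            best_score, best_plan = score, plan
    return dict(zip(demands, best_plan)), best_score


# Hypothetical history: demands in the "busy" state were answered quickly
# (high reward), demands in the "quiet" state slowly (low reward).
history = [(("busy", 9), 2.0, None), (("quiet", 9), 0.2, None)]
values = learn_values(history)

# Two demands compete for one supplier; each maps to its next-round state.
next_state = {"d1": ("busy", 9), "d2": ("quiet", 9)}
wait_value = {d: values[s] for d, s in next_state.items()}
plan, score = dispatch_round(
    ["d1", "d2"], ["s1"], lambda d, s: 1.0, wait_value
)
print(plan, round(score, 3))
```

In this toy round the demand with the high expected future value (`d1`) is delayed and the one with the low expected value (`d2`) is dispatched immediately, which mirrors the behaviour the abstract describes. The exhaustive search is exponential in the number of demands and is only meant to make the objective explicit.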


Published In

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 3
June 2021
533 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/3454120
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 April 2021
Accepted: 01 December 2020
Revised: 01 November 2020
Received: 01 March 2020
Published in TKDD Volume 15, Issue 3

Author Tags

  1. Demand dispatching
  2. on-demand services
  3. deep reinforcement learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • Hong Kong RGC General Research Fund
  • Guangdong Basic and Applied Basic Research Foundation

Bibliometrics

Article Metrics

  • Total Citations: 0
  • Total Downloads: 288
  • Downloads (Last 12 months): 28
  • Downloads (Last 6 weeks): 6

Reflects downloads up to 21 Jan 2025
