research-article

Ghost: A General Framework for High-Performance Online Similarity Queries over Distributed Trajectory Streams

Published: 20 June 2023

Abstract

Trajectory similarity queries, including similarity search and similarity join, underpin many geo-spatial applications. With the rapid growth of streaming trajectory data, e.g., data from mobile phones, vessel monitoring, or traffic systems, many location-based services benefit from online similarity analytics over trajectory data streams, where moving objects continually emit real-time position data. However, most existing studies focus on offline settings, so several major challenges remain unaddressed in the online setting. To this end, we describe Ghost, a distributed stream processing framework that enables generic, efficient, and scalable online trajectory similarity search and join.
We propose a novel incremental online similarity computation (IOSC) mechanism that accelerates pair-wise streaming trajectory distance calculation and supports a broad range of trajectory distance metrics. Compared with previous studies, IOSC reduces the complexity from quadratic to linear in terms of trajectory length. Building on this foundation, we propose histogram-based algorithms that exploit histogram indexes and a series of pruning bounds to enable streaming trajectory similarity search and join. Finally, we extend our methods to the distributed platform Flink for scalability, where a CostPartitioner is developed to ensure parallel processing and workload balancing. An experimental study using two real-life datasets and one synthetic dataset shows that Ghost (i) achieves 6--20× efficiency/throughput gains and one order of magnitude memory overhead savings over state-of-the-art baselines, (ii) achieves 3--8× workload balancing gains on Flink, and (iii) exhibits low parameter sensitivity and high robustness.
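
To make the quadratic-to-linear claim concrete, the following is a minimal sketch of incremental distance maintenance. It is not the IOSC mechanism itself, which the abstract does not specify. It assumes dynamic time warping (DTW) as the metric, a fixed query trajectory, and a stream that only grows by appending points. The names `IncrementalDTW`, `append`, and `dtw_full` are hypothetical. Caching the last row of the DTW table lets each arriving point be absorbed in time linear in the query length, instead of rebuilding the whole table.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def dtw_full(a: List[Point], b: List[Point]) -> float:
    """Baseline: recompute the whole DTW table, O(len(a) * len(b))."""
    inf = float("inf")
    prev = [inf] * len(b)
    for i, p in enumerate(a):
        cur = [inf] * len(b)
        for j, q in enumerate(b):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    prev[j],                        # advance in a only
                    cur[j - 1] if j > 0 else inf,   # advance in b only
                    prev[j - 1] if j > 0 else inf,  # advance in both
                )
            cur[j] = euclidean(p, q) + best
        prev = cur
    return prev[-1]


class IncrementalDTW:
    """Hypothetical helper: keeps DTW(stream, query) current as the stream grows.

    Only the most recent table row is stored, so each append costs
    O(len(query)) time and memory.
    """

    def __init__(self, query: List[Point]) -> None:
        if not query:
            raise ValueError("query must contain at least one point")
        self.query = list(query)
        self.row: List[float] = []  # row[j] = DTW(stream so far, query[:j + 1])

    def append(self, p: Point) -> float:
        """Absorb one new stream point and return the updated DTW distance."""
        inf = float("inf")
        new_row = [inf] * len(self.query)
        for j, q in enumerate(self.query):
            if not self.row and j == 0:
                best = 0.0
            else:
                best = min(
                    self.row[j] if self.row else inf,
                    new_row[j - 1] if j > 0 else inf,
                    self.row[j - 1] if self.row and j > 0 else inf,
                )
            new_row[j] = euclidean(p, q) + best
        self.row = new_row
        return new_row[-1]


query = [(0.0, 0.0), (2.0, 0.0)]
stream = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tracker = IncrementalDTW(query)
distances = [tracker.append(p) for p in stream]
```

After each call to `append`, the returned value equals what `dtw_full` would compute on the stream prefix seen so far. A sliding window, a growing query, or another metric would need extra bookkeeping that this sketch leaves out.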

Supplemental Material

MP4 File
Presentation video for SIGMOD-Ghost 2023


Cited By

  • (2024) A Time-Identified R-Tree: A Workload-Controllable Dynamic Spatio-Temporal Index Scheme for Streaming Processing. ISPRS International Journal of Geo-Information 13:2 (49). DOI: 10.3390/ijgi13020049. Online publication date: 4-Feb-2024
  • (2024) An Efficient and Distributed Framework for Real-Time Trajectory Stream Clustering. IEEE Transactions on Knowledge and Data Engineering 36:5 (1857-1873). DOI: 10.1109/TKDE.2023.3312319. Online publication date: May-2024
  • (2024) Stream-aware indexing for distributed inequality join processing. Information Systems 125 (102425). DOI: 10.1016/j.is.2024.102425. Online publication date: Nov-2024

      Published In

      Proceedings of the ACM on Management of Data  Volume 1, Issue 2
      PACMMOD
      June 2023
      2310 pages
      EISSN:2836-6573
      DOI:10.1145/3605748

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. distributed processing
      2. flink
      3. similarity metrics
      4. trajectory streams

      Qualifiers

      • Research-article

      Article Metrics

      • Downloads (last 12 months): 261
      • Downloads (last 6 weeks): 18
      Reflects downloads up to 30 Aug 2024.
