
Efficient Mining of Outlying Sequence Patterns for Analyzing Outlierness of Sequence Data

Published: 05 August 2020

Abstract

Much recent work across different domains detects outliers in relational data and analyzes their outlierness. Sequence data is ubiquitous in real life, yet analyzing the outlierness of sequence data has not received enough attention. In this article, we study the problem of mining outlying sequence patterns in sequence data: given a query sequence s in a sequence dataset D, the objective is to discover the sequence patterns that best indicate the unusualness (i.e., outlierness) of s compared with the other sequences in D. Technically, we measure the outlierness of a sequence by its rank, which is defined by the average probabilistic strength (aps) of a sequence pattern in that sequence. A minimal sequence pattern for which the query sequence is ranked the highest is then defined as an outlying sequence pattern. To address this problem, we present OSPMiner, a heuristic method that incorporates several pruning techniques when computing aps. Our empirical study on both real and synthetic data demonstrates that OSPMiner is effective and efficient.
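The problem statement above can be illustrated with a small brute-force sketch. It is not OSPMiner and it does not use the article's aps definition, which the abstract does not give. The function `toy_strength` is a hypothetical stand-in score (the fraction of windows in a sequence that equal the pattern). Restricting candidates to contiguous substrings of the query, treating a higher score as a better rank, and defining minimality by substring containment are all assumptions made only for illustration.

```python
def toy_strength(pattern: str, seq: str) -> float:
    """Hypothetical stand-in for aps (NOT the article's definition):
    fraction of length-|pattern| windows in seq that equal pattern."""
    windows = len(seq) - len(pattern) + 1
    if windows <= 0:
        return 0.0
    hits = sum(seq[i:i + len(pattern)] == pattern for i in range(windows))
    return hits / windows


def rank_of_query(pattern: str, query: str, dataset: list[str]) -> int:
    """Rank 1 means no sequence in the dataset scores strictly higher."""
    q_score = toy_strength(pattern, query)
    return 1 + sum(toy_strength(pattern, t) > q_score for t in dataset)


def candidate_patterns(query: str, max_len: int) -> set[str]:
    """Assumption: candidates are contiguous substrings of the query."""
    return {
        query[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(query) - n + 1)
    }


def outlying_patterns(query: str, dataset: list[str], max_len: int = 3):
    """Return the best rank of the query and the minimal patterns reaching it."""
    ranks = {
        p: rank_of_query(p, query, dataset)
        for p in candidate_patterns(query, max_len)
    }
    best = min(ranks.values())
    top = [p for p, r in ranks.items() if r == best]
    # Keep a pattern only if no other best-ranked pattern is contained in it.
    minimal = [p for p in top if not any(q != p and q in p for q in top)]
    return best, sorted(minimal)


if __name__ == "__main__":
    D = ["abab", "abcab", "bcbc", "abxy"]  # toy dataset; the query is a member
    print(outlying_patterns("abxy", D, max_len=2))
```

This exhaustive enumeration grows quickly with sequence length and pattern length, which is the cost that pruning techniques such as those mentioned in the abstract are meant to reduce.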





    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 14, Issue 5
    Special Issue on KDD 2018, Regular Papers and Survey Paper
    October 2020
    376 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3407672

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 August 2020
    Accepted: 01 May 2020
    Revised: 01 January 2020
    Received: 01 May 2019
    Published in TKDD Volume 14, Issue 5


    Author Tags

    1. Outlying sequence pattern
    2. average probabilistic strength
    3. outlierness analysis
    4. sequence mining

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • Australian Research Council
    • Google Faculty Research Award


    Cited By

    • (2024) RNP-Miner: Repetitive Nonoverlapping Sequential Pattern Mining. IEEE Transactions on Knowledge and Data Engineering 36:9 (4874-4889). DOI: 10.1109/TKDE.2023.3334300. Online publication date: Sep-2024
    • (2024) Targeted mining of contiguous sequential patterns. Information Sciences 653 (119791). DOI: 10.1016/j.ins.2023.119791. Online publication date: Jan-2024
    • (2023) COPP-Miner: Top-k Contrast Order-Preserving Pattern Mining for Time Series Classification. IEEE Transactions on Knowledge and Data Engineering 36:6 (2372-2387). DOI: 10.1109/TKDE.2023.3321749. Online publication date: 19-Oct-2023
    • (2023) MCoR-Miner: Maximal Co-Occurrence Nonoverlapping Sequential Rule Mining. IEEE Transactions on Knowledge and Data Engineering 35:9 (9531-9546). DOI: 10.1109/TKDE.2023.3241213. Online publication date: 1-Sep-2023
    • (2023) OPR-Miner: Order-Preserving Rule Mining for Time Series. IEEE Transactions on Knowledge and Data Engineering 35:11 (11722-11735). DOI: 10.1109/TKDE.2022.3224963. Online publication date: 1-Nov-2023
    • (2023) OPP-Miner: Order-Preserving Sequential Pattern Mining for Time Series. IEEE Transactions on Cybernetics 53:5 (3288-3300). DOI: 10.1109/TCYB.2022.3169327. Online publication date: May-2023
    • (2022) ONP-Miner: One-off Negative Sequential Pattern Mining. ACM Transactions on Knowledge Discovery from Data. DOI: 10.1145/3549940. Online publication date: 4-Aug-2022
    • (2022) Top-k Self-Adaptive Contrast Sequential Pattern Mining. IEEE Transactions on Cybernetics 52:11 (11819-11833). DOI: 10.1109/TCYB.2021.3082114. Online publication date: Nov-2022
    • (2022) AOP-Miner: Approximate Order-Preserving Pattern Mining for Time Series. 2022 IEEE International Conference on Knowledge Graph (ICKG) (149-156). DOI: 10.1109/ICKG55886.2022.00026. Online publication date: Nov-2022
    • (2022) Efficient mining of concept-hierarchy aware distinguishing sequential patterns. Knowledge-Based Systems 255 (109710). DOI: 10.1016/j.knosys.2022.109710. Online publication date: Nov-2022
