Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article

Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data

Published: 03 December 2016 Publication History

Abstract

The growth in pervasive network infrastructure called the Internet of Things (IoT) enables a wide range of physical objects and environments to be monitored in fine spatial and temporal detail. The detailed, dynamic data that are collected in large quantities from sensor devices provide the basis for a variety of applications. Automatic interpretation of these evolving large data is required for timely detection of interesting events. This article develops and exemplifies two new relatives of the visual assessment of tendency (VAT) and improved visual assessment of tendency (iVAT) models, which uses cluster heat maps to visualize structure in static datasets. One new model is initialized with a static VAT/iVAT image, and then incrementally (hence inc-VAT/inc-iVAT) updates the current minimal spanning tree (MST) used by VAT with an efficient edge insertion scheme. Similarly, dec-VAT/dec-iVAT efficiently removes a node from the current VAT MST. A sequence of inc-iVAT/dec-iVAT images can be used for (visual) anomaly detection in evolving data streams and for sliding window based cluster assessment for time series data. The method is illustrated with four real datasets (three of them being smart city IoT data). The evaluation demonstrates the algorithms’ ability to successfully isolate anomalies and visualize changing cluster structure in the streaming data.

Supplementary Material

kumar (kumar.zip)
Supplemental movie, appendix, image and software files for, Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data

References

[1]
C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. 2003. A framework for clustering evolving data streams. In Proceedings of the International Conference on Very Large Data Bases (VLDB). 81--92.
[2]
E. Anderson. 1935. The irises of the Gaspe peninsula. Bulletin of American Iris Society 59 (1935), 2--5.
[3]
P. Angelov. 2010. Evolving Takagi-Sugeno fuzzy systems from streaming data (eTS+). In Evolving Intelligent Systems: Methodology and Applications. IEEE Press.
[4]
ARUP. 2015. http://www.arup.com/Global_locations/Australia.aspx.
[5]
B. Babcock, M. Datar, R. Motwani, and L. O’Callaghan. 2003. Maintaining variance and k-medians over data stream windows. In Proceedings of the ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). 234--243.
[6]
J. L. Bentley and A. C. Yao. 1976. An almost optimal algorithm for unbounded searching. Inform. Process. Lett. 5, 3 (1976), 82--87.
[7]
J. Bezdek and R. Hathaway. 2002. VAT: A tool for visual assessment of (cluster) tendency. In Proceedings of the International Joint Conference on Neural Networks (IJCNN) (2002), 2225--2230.
[8]
J. C. Bezdek, R. J. Hathaway, and J. M. Huband. 2007. Visual assessment of clustering tendency for rectangular dissimilarity matrices. IEEE Trans. Fuzzy Syst. 15(5) (2007), 890--903.
[9]
J. C. Bezdek, S. Rajasegarar, M. Moshtaghi, C. Leckie, M. Palaniswami, and T. C. Havens. 2011. Anomaly detection in environmental monitoring networks. Comput. Intell. Mag. 6, 2 (2011), 52--58.
[10]
B. Delaunay. 1934. Sur la sphère vide. A la mémoire de Georges Voronoï. Bulletin de l’Académie des Sciences de l’URSS, Classe des sciences mathématiques et naturelles 6 (1934), 793--800.
[11]
E. D. Demaine, A. López-Ortiz, and J. I. Munro. 2000. Adaptive set intersections, unions, and differences. In ACM-SIAM Symposium on Discrete Algorithms (SODA). 743--752.
[12]
E. W. Dijkstra. 1959. A note on two problems in connexion with graphs. Numer. Math. 1, 1 (1959), 269--271.
[13]
N. Elmqvist, P. Dragicevic, and J. D. Fekete. 2008. Rolling the dice: Multidimensional visual exploration using scatterplot matrix navigation. IEEE Trans. Vis. Comput. Graphics 14(6) (2008), 1539--1548.
[14]
V. Estivill-Castro and D. Wood. 1992. A survey of adaptive sorting algorithms. ACM Comput. Surv. 24, 4 (1992), 441--476.
[15]
D. Greene, D. Archambault, V. Belák, and P. Cunningham. 2015. TextLuas: Tracking and visualizing document and term clusters in dynamic text data. CoRR abs/1502.04609 (2015).
[16]
J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami. 2013. Internet of things (IoT): A vision, architectural elements, and future directions. Future Gener. Comput. Syst. 29(7) (2013), 1645--1660.
[17]
R. Hathaway, J. Bezdek, and J. Huband. 2006. Scalable visual assessment of cluster tendency for large data sets. Pattern Recognit. 39 (2006), 1315--1324.
[18]
T. Havens and J. Bezdek. 2012. An efficient formulation of the improved visual assessment of cluster tendency (iVAT) algorithm. IEEE Trans. Knowl. Data Eng. 24(5) (2012), 813--822.
[19]
T. Havens, J. Bezdek, J. Keller, M. Popescu, and J. Huband. 2009. Is VAT really single linkage in disguise? Ann. Math. Artif. Intell. 55(3--4) (2009), 237--251.
[20]
Z. He, X. Xu, and S. Deng. 2003. Discovering cluster-based local outliers. Pattern Recognit. Lett. 24, 910 (2003), 1641--1650.
[21]
Internet of Things: A pilot deployment in the city of Melbourne, Australia. 2015. Retrieved 15 Oct 2016 from http://issnip.unimelb.edu.au/research_program/Internet_of_Things/iot_deployment.
[22]
V. Jarník. 1930. O jistém problému minimálním. Práce Moravské Přírodovědecké Společnosti 6 (1930), 57--63.
[23]
D. G. Kirkpatrick and R. Seidel. 1986. The ultimate planar convex hull algorithm. SIAM J. Comput. 15, 1 (1986), 287--299.
[24]
J. Kruskal. 1956. On the shortest spanning subtree of a graph and the traveling salesman problem. In Proceedings of the American Mathematical Society, 7.
[25]
D. Kumar, J. C. Bezdek, M. Palaniswami, S. Rajasegarar, C. Leckie, and T. C. Havens. 2016. A hybrid approach to clustering in big data. IEEE Trans. Cybern. 46, 10 (2016), 2372--2385.
[26]
D. Kumar, J. C. Bezdek, S. Rajasegarar, C. Leckie, and M. Palaniswami. 2015. A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis. Comput. (2015), 1--17.
[27]
D. Kumar, M. Palaniswami, S. Rajasegarar, C. Leckie, J. Bezdek, and T. Havens. 2013. clusiVAT: A mixed visual/numerical clustering algorithm for big data. In Proceedings of the IEEE International Conference on Big Data. 112--117.
[28]
O. D. Lampe and H. Hauser. 2011. Interactive visualization of streaming data with kernel density estimation. In Proceedings of the IEEE Pacific Visualization Symposium (PACIFICVIS). 171--178.
[29]
W. B. March, P. Ram, and A. G. Gray. 2010. Fast Euclidean minimum spanning tree: Algorithm, analysis, and applications. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). 603--612.
[30]
M. Moshtaghi, J. C. Bezdek, C. Leckie, S. Karunasekera, and M. Palaniswami. 2015. Evolving fuzzy rules for anomaly detection in data streams. IEEE Trans. Fuzzy Syst. 23, 3 (2015), 688--700.
[31]
N. Murtagh, M. Nati, W. R. Headley, B. Gatersleben, A. Gluhak, M. A. Imran, and D. Uzzell. 2013. Individual energy use and feedback in an office setting: A field trial. Energy Policy 62 (2013), 717--728.
[32]
S. Pettie and V. Ramachandran. 2002. An optimal minimum spanning tree algorithm. J. ACM 49, 1 (2002), 16--34.
[33]
F. P. Preparata and M. I. Shamos. 1985. Computational Geometry: An Introduction. Springer-Verlag, New York, NY.
[34]
R. Prim. 1957. Shortest connection networks and some generalizations. Bell Syst. Techn. J. 36, 6 (1957), 1389--1401.
[35]
S. Rajasegarar, A. Gluhak, M. A. Imran, M. Nati, M. Moshtaghi, C. Leckie, and M. Palaniswami. 2014. Ellipsoidal neighbourhood outlier factor for distributed anomaly detection in resource constrained networks. Pattern Recognit. 47, 9 (2014), 2867--2879.
[36]
S. Rajasegarar, C. Leckie, M. Palaniswami, and J. Bezdek. 2006. Distributed anomaly detection in wireless sensor networks. In Proceedings of the IEEE International Conference on Communication Systems (ICCS) (2006), 1--5.
[37]
S. Rajasegarar, M. Palaniswami, C. Leckie, and J. Bezdek. 2007. Quarter sphere based distributed anomaly detection in wireless sensor networks. In Proceedings of the IEEE International Conference on Communications (ICC). 3864--3869.
[38]
K. Reda, C. Tantipathananandh, A. Johnson, J. Leigh, and T. Berger-Wolf. 2011. Visualizing the evolution of community structures in dynamic social networks. In Proceedings of the Eurographics/IEEE - VGTC Conference on Visualization (EuroVis). 1061--1070.
[39]
REDUCE: Reshaping Energy Demands Using ICT. 2013. Retrieved 15 Oct 2016 from http://info.ee.surrey.ac.uk/CCSR/REDUCE/.
[40]
M. I. Shamos and D. Hoey. 1975. Closest-point problems. In Proceedings of the Annual Symposium on Foundations of Computer Science (SFCS). 151--162.
[41]
A. B. Sharma, L. Golubchik, and R. Govindan. 2010. Sensor faults: Detection methods and prevalence in real-world datasets. ACM Trans. Sensor Netw. 6, 3, Article 23 (2010), 23:1--23:39 pages.
[42]
A. Shilton, S. Rajasegarar, C. Leckie, and M. Palaniswami. 2015. DP1SVM: A dynamic planar one-class support vector machine for internet of things environment. In Proceedings of the International Conference on Recent Advances in Internet of Things (RIoT). 1--6.
[43]
A. Tartakovsky and V. Veeravalli. 2008. Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Anal. 27 (2008), 441--475.
[44]
L. Wang, X. Geng, J. Bezdek, C. Leckie, and K. Ramamohanarao. 2010. Enhanced visual analysis for cluster tendency assessment and data partitioning. IEEE Trans. Knowl. Data Eng. 22(10) (2010), 1401--1414.
[45]
P. C. Wong, H. Foote, D. Adams, W. Cowley, and J. Thomas. 2003. Dynamic visualization of transient data streams. In Proceedings of the Annual IEEE Conference on Information Visualization (INFOVIS). 97--104.
[46]
XmdvTool. 2015. Retrieved 15 Oct 2016 from http://davis.wpi.edu/xmdv/.
[47]
D. Yang, E. A. Rundensteiner, and M. O. Ward. 2012. Shared execution strategy for neighbor-based pattern mining requests over streaming windows. ACM Trans. Database Syst. 37, 1, Article 5 (2012), 5:1--5:44 pages.
[48]
D. Yang, E. A. Rundensteiner, and M. O. Ward. 2013a. Mining neighbor-based patterns in data streams. Inf. Syst. 38, 3 (2013), 331--350.
[49]
D. Yang, K. Zhao, H. Lu, M. Hasan, E. A. Rundensteiner, and M. O. Ward. 2013b. Mining and linking patterns across live data streams and stream archives. Proc. VLDB Endowment 6, 12 (2013), 1346--1349.
[50]
Y. Yao, A. B. Sharma, L. Golubchik, and R. Govindan. 2010. Online anomaly detection for sensor systems: A simple and efficient approach. Perform. Eval. 67, 11 (2010), 1059--1075.
[51]
A. Zhou, F. Cao, W. Qian, and C. Jin. 2008. Tracking clusters in evolving data streams over sliding windows. Knowl. Inf. Syst. 15, 2 (2008), 181--214.

Cited By

View all
  • (2024)Biosensor-Based Drones Anomaly Detection Integration for Sustainable Agriculture DevelopmentAchieving Food Security Through Sustainable Agriculture10.4018/979-8-3693-4240-4.ch012(283-316)Online publication date: 30-Aug-2024
  • (2024)Time and Memory Scalable Algorithms for Clustering Tendency Assessment of Big DataInformation Sciences10.1016/j.ins.2024.120324(120324)Online publication date: Feb-2024
  • (2023)Colorized iVAT Images for Labeled DataAdvances in Science, Technology and Engineering Systems Journal10.25046/aj0804138:4(111-121)Online publication date: Aug-2023
  • Show More Cited By

Index Terms

  1. Adaptive Cluster Tendency Visualization and Anomaly Detection for Streaming Data

                      Recommendations

                      Comments

                      Information & Contributors

                      Information

                      Published In

                      cover image ACM Transactions on Knowledge Discovery from Data
                      ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 2
                      May 2017
                      419 pages
                      ISSN:1556-4681
                      EISSN:1556-472X
                      DOI:10.1145/3017677
                      Issue’s Table of Contents
                      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                      Publisher

                      Association for Computing Machinery

                      New York, NY, United States

                      Publication History

                      Published: 03 December 2016
                      Accepted: 01 September 2016
                      Revised: 01 May 2016
                      Received: 01 June 2015
                      Published in TKDD Volume 11, Issue 2

                      Permissions

                      Request permissions for this article.

                      Check for updates

                      Author Tags

                      1. Visual assessment of clusters in streaming data
                      2. cluster heat maps
                      3. internet of things (IoT)
                      4. online anomaly detection
                      5. sliding window based time series clustering
                      6. smart city streaming data analysis

                      Qualifiers

                      • Research-article
                      • Research
                      • Refereed

                      Funding Sources

                      • EU Horizon 2020 OrganiCity grant
                      • EU-FP7 SOCIOTAL grant
                      • Australian Research Council (ARC) Linkage Project
                      • ARC Linkage Infrastructure
                      • Equipment and Facilities scheme (LIEF)

                      Contributors

                      Other Metrics

                      Bibliometrics & Citations

                      Bibliometrics

                      Article Metrics

                      • Downloads (Last 12 months)29
                      • Downloads (Last 6 weeks)1
                      Reflects downloads up to 30 Aug 2024

                      Other Metrics

                      Citations

                      Cited By

                      View all
                      • (2024)Biosensor-Based Drones Anomaly Detection Integration for Sustainable Agriculture DevelopmentAchieving Food Security Through Sustainable Agriculture10.4018/979-8-3693-4240-4.ch012(283-316)Online publication date: 30-Aug-2024
                      • (2024)Time and Memory Scalable Algorithms for Clustering Tendency Assessment of Big DataInformation Sciences10.1016/j.ins.2024.120324(120324)Online publication date: Feb-2024
                      • (2023)Colorized iVAT Images for Labeled DataAdvances in Science, Technology and Engineering Systems Journal10.25046/aj0804138:4(111-121)Online publication date: Aug-2023
                      • (2023)State of the art on quality control for data streamsComputer Science Review10.1016/j.cosrev.2023.10055448:COnline publication date: 1-May-2023
                      • (2023)ML-aVAT: A Novel 2-Stage Machine-Learning Approach for Automatic Clustering Tendency AssessmentBig Data Research10.1016/j.bdr.2023.10041334(100413)Online publication date: Nov-2023
                      • (2023)An Improved Visual Assessment with Data-Dependent Kernel for Stream ClusteringAdvances in Knowledge Discovery and Data Mining10.1007/978-3-031-33374-3_16(197-209)Online publication date: 25-May-2023
                      • (2022)Method of Cumulative Anomaly Identification for Security Database Based on Discrete Markov chainSecurity and Communication Networks10.1155/2022/51137252022Online publication date: 1-Jan-2022
                      • (2022)A Trajectory Evaluator by Sub-tracks for Detecting VOT-based Anomalous TrajectoryACM Transactions on Knowledge Discovery from Data10.1145/349003216:4(1-19)Online publication date: 8-Jan-2022
                      • (2022)Scalable Cluster Tendency Assessment for Streaming Activity Data using Recurring Shapelets2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)10.1109/EMBC48229.2022.9871534(1036-1040)Online publication date: 11-Jul-2022
                      • (2022)P2OP—Plant Pathology on PalmsComputers and Electronics in Agriculture10.1016/j.compag.2022.107371202:COnline publication date: 1-Nov-2022
                      • Show More Cited By

                      View Options

                      Get Access

                      Login options

                      Full Access

                      View options

                      PDF

                      View or Download as a PDF file.

                      PDF

                      eReader

                      View online with eReader.

                      eReader

                      Media

                      Figures

                      Other

                      Tables

                      Share

                      Share

                      Share this Publication link

                      Share on social media