Abstract
The last decade has witnessed unprecedented growth in the availability of data with spatio-temporal characteristics. Given the scale and richness of such data, finding spatio-temporal patterns whose behavior differs significantly from that of their neighbors could be of interest in various application scenarios, such as weather modeling, analyzing the spread of disease outbreaks, and monitoring traffic congestion. In this paper, we propose an automated approach for exploring and discovering such anomalous patterns irrespective of the underlying domain from which the data is obtained. Our approach differs significantly from traditional methods of spatial outlier detection and employs two phases: (i) discovering homogeneous regions, and (ii) evaluating these regions as anomalies based on their statistical difference from a generalized neighborhood. Through an extensive experimental evaluation, we assess the quality of our approach and distinguish it from existing techniques.
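The two phases described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify how homogeneous regions are discovered or which statistical test is applied. The code therefore assumes seeded region growing on a 2-D grid for phase (i), and for phase (ii) it substitutes Welch's t-statistic computed against a ring of surrounding cells, which serves as a hypothetical stand-in for the "generalized neighborhood". All function names and parameters (`tol`, `radius`, `t_threshold`, `min_size`, `max_fraction`) are illustrative assumptions. `max_fraction` is a crude way of keeping the reported anomalies localized, so that a large background region is not itself flagged simply because it surrounds an anomaly.

```python
import math
from collections import deque
from statistics import mean, variance


def grow_homogeneous_regions(grid, tol):
    """Phase (i), sketched as seeded region growing: each region is a
    4-connected set of cells whose values lie within `tol` of its seed."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            seed = grid[r0][c0]
            seen[r0][c0] = True
            queue, region = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill from the seed cell
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and abs(grid[nr][nc] - seed) <= tol):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(region)
    return regions


def neighborhood(region, rows, cols, radius):
    """Cells outside `region` within Chebyshev distance `radius` of it
    (a hypothetical stand-in for the paper's generalized neighborhood)."""
    inside, ring = set(region), set()
    for r, c in region:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in inside:
                    ring.add((nr, nc))
    return sorted(ring)


def _sample_var(xs):
    # Sample variance; treated as 0 when there are fewer than two values.
    return variance(xs) if len(xs) > 1 else 0.0


def welch_t(a, b):
    """Welch's t-statistic comparing the means of samples a and b."""
    se = math.sqrt(_sample_var(a) / len(a) + _sample_var(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def detect_anomalies(grid, tol=0.5, radius=1, t_threshold=3.0,
                     min_size=2, max_fraction=0.5):
    """Phase (ii): keep localized regions whose values differ significantly
    from their surrounding neighborhood; returns (cells, t) pairs."""
    rows, cols = len(grid), len(grid[0])
    anomalies = []
    for region in grow_homogeneous_regions(grid, tol):
        # Skip tiny regions and regions too large to count as "localized".
        if not (min_size <= len(region) <= max_fraction * rows * cols):
            continue
        ring = neighborhood(region, rows, cols, radius)
        if not ring:
            continue
        t = welch_t([grid[r][c] for r, c in region],
                    [grid[r][c] for r, c in ring])
        if abs(t) >= t_threshold:
            anomalies.append((region, t))
    return anomalies


if __name__ == "__main__":
    grid = [
        [1.0, 1.1, 0.9, 1.0, 1.2],
        [1.0, 5.0, 5.1, 1.1, 0.9],
        [0.9, 5.2, 4.9, 1.0, 1.0],
        [1.1, 1.0, 1.0, 0.8, 1.1],
    ]
    for region, t in detect_anomalies(grid):
        print(sorted(region), round(t, 1))
```

On the toy grid, the background forms a single large homogeneous region that exceeds `max_fraction` and is skipped, so only the central 2x2 block of high values should be reported. A spatio-temporal variant would index cells as (t, x, y) and extend the neighbor offsets along the time axis; that extension is not shown here.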
Notes
This paper makes extensive use of color figures to illustrate anomalies. We therefore recommend that the reader refer to the electronic version or a color printout of the paper for better readability.
For the sake of clarity, we illustrate the formulation on a spatial grid; however, it extends to the temporal dimension.
We do not include outlier detection techniques in our comparative analysis, since it is not clear how techniques that estimate divergent behavior at the level of individual data objects could be fairly compared with techniques that discover groups of objects exhibiting divergent behavior.
Conducting user surveys is a difficult task; hence, we conducted the user survey only on \(Dataset_1\) and not on \(Dataset_2\).
Additional information
Responsible editor: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.
About this article
Cite this article
Telang, A., Deepak, P., Joshi, S. et al. Detecting localized homogeneous anomalies over spatio-temporal data. Data Min Knowl Disc 28, 1480–1502 (2014). https://doi.org/10.1007/s10618-014-0366-x
DOI: https://doi.org/10.1007/s10618-014-0366-x