Interactive outlier exploration in big data streams

Published: 01 August 2014

Abstract

    We demonstrate VSOutlier, our system for interactive exploration of outliers in big data streams. VSOutlier not only supports a rich variety of outlier types, each backed by innovative and efficient detection strategies, but also provides a rich set of interactive interfaces for exploring outliers in real time. Using the stock transactions dataset from the US stock market and the moving objects dataset from MITRE, we demonstrate that VSOutlier enables analysts to identify, understand, and respond to phenomena of interest more efficiently and in near real time, even when applied to high-volume streams.

      Published In

      Proceedings of the VLDB Endowment, Volume 7, Issue 13
      August 2014
      466 pages
      ISSN: 2150-8097

      Publisher

      VLDB Endowment

      Qualifiers

      • Research-article
