DOI: 10.1145/1557019.1557112

Category detection using hierarchical mean shift

Published: 28 June 2009

Abstract

Many applications in surveillance, monitoring, scientific discovery, and data cleaning require the identification of anomalies. Although many methods have been developed to identify statistically significant anomalies, a more difficult task is to identify anomalies that are both interesting and statistically significant. Category detection is an emerging area of machine learning that can help address this issue using a "human-in-the-loop" approach. In this interactive setting, the algorithm asks the user to label a query data point under an existing category or declare the query data point to belong to a previously undiscovered category. The goal of category detection is to bring to the user's attention a representative data point from each category in the data in as few queries as possible. In a data set with imbalanced categories, the main challenge is in identifying the rare categories or anomalies; hence, the task is often referred to as rare category detection. We present a new approach to rare category detection based on hierarchical mean shift. In our approach, a hierarchy is created by repeatedly applying mean shift with an increasing bandwidth on the data. This hierarchy allows us to identify anomalies in the data set at different scales, which are then posed as queries to the user. The main advantage of this methodology over existing approaches is that it does not require any knowledge of the dataset properties such as the total number of categories or the prior probabilities of the categories. Results on real-world data sets show that our hierarchical mean shift approach performs consistently better than previous techniques.
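
The hierarchy described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian kernel, a greedy merge of converged points, and a hypothetical query heuristic (smallest cluster first) that the abstract does not specify. The function names `mean_shift`, `merge_modes`, `hierarchical_mean_shift`, and `candidate_queries` are invented for this example.

```python
import numpy as np


def mean_shift(points, bandwidth, n_iter=50, tol=1e-4):
    """Move every point uphill on a Gaussian kernel density estimate of `points`."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Squared distance from each current position to every data point.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))
        # Mean shift update: kernel-weighted average of the data.
        shifted = (w @ points) / w.sum(axis=1, keepdims=True)
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    return modes


def merge_modes(modes, radius):
    """Greedily group converged positions lying within `radius` of a center (assumed rule)."""
    centers = []
    labels = np.empty(len(modes), dtype=int)
    for i, m in enumerate(modes):
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < radius:
                labels[i] = k
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return np.array(centers), labels


def hierarchical_mean_shift(points, bandwidths):
    """Apply mean shift to the data once per bandwidth, from smallest to largest."""
    points = np.asarray(points, dtype=float)
    levels = []
    for h in sorted(bandwidths):
        modes = mean_shift(points, h)
        centers, labels = merge_modes(modes, radius=h)
        levels.append((h, centers, labels))
    return levels


def candidate_queries(points, centers, labels):
    """Hypothetical heuristic: one representative per cluster, smallest cluster first."""
    points = np.asarray(points, dtype=float)
    sizes = np.bincount(labels, minlength=len(centers))
    queries = []
    for k in np.argsort(sizes, kind="stable"):
        members = np.flatnonzero(labels == k)
        dist = np.linalg.norm(points[members] - centers[k], axis=1)
        queries.append(int(members[np.argmin(dist)]))
    return queries
```

Calling `hierarchical_mean_shift(points, bandwidths)` returns one `(bandwidth, centers, labels)` tuple per level; small clusters that appear at the fine levels are natural candidates to show the user first. Note that the sketch needs only a list of bandwidths, not the number of categories or their prior probabilities, which is the property the abstract highlights.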

Supplementary Material

JPG File (p847-wong.jpg)
MP4 File (p847-wong.mp4)

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anomaly detection
  2. category detection
  3. clustering
  4. mean shift

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Cited By

  • (2023) Rare Category Analysis for Complex Data: A Review. ACM Computing Surveys 56:5 (1-35). DOI: 10.1145/3626520. Online publication date: 27-Nov-2023
  • (2022) HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques. Computer Graphics Forum 42:1 (135-154). DOI: 10.1111/cgf.14726. Online publication date: 27-Dec-2022
  • (2022) iNet: visual analysis of irregular transition in multivariate dynamic networks. Frontiers of Computer Science: Selected Publications from Chinese Universities 16:2. DOI: 10.1007/s11704-020-0013-1. Online publication date: 1-Apr-2022
  • (2022) Detecting Complex Intrusion Attempts Using Hybrid Machine Learning Techniques. Intelligent Systems and Applications (150-170). DOI: 10.1007/978-3-031-16075-2_10. Online publication date: 1-Sep-2022
  • (2021) RCDVis: interactive rare category detection on graph data. Journal of Visualization. DOI: 10.1007/s12650-021-00788-6. Online publication date: 2-Sep-2021
  • (2020) RCAnalyzer: visual analytics of rare categories in dynamic networks. Frontiers of Information Technology & Electronic Engineering 21:4 (491-506). DOI: 10.1631/FITEE.1900310. Online publication date: 30-Apr-2020
  • (2020) Discovering Anomalies by Incorporating Feedback from an Expert. ACM Transactions on Knowledge Discovery from Data 14:4 (1-32). DOI: 10.1145/3396608. Online publication date: 22-Jun-2020
  • (2020) LERI: Local Exploration for Rare-Category Identification. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2019.2911941. Online publication date: 2020
  • (2019) A Knowledge-Based Semisupervised Hierarchical Online Topic Detection Framework. IEEE Transactions on Cybernetics 49:9 (3307-3321). DOI: 10.1109/TCYB.2018.2841504. Online publication date: Sep-2019
  • (2018) RCLens: Interactive Rare Category Exploration and Identification. IEEE Transactions on Visualization and Computer Graphics 24:7 (2223-2237). DOI: 10.1109/TVCG.2017.2711030. Online publication date: 1-Jul-2018
