DOI: 10.1145/956750.956786

Classifying large data sets using SVMs with hierarchical clusters

Published: 24 August 2003

Abstract

Support vector machines (SVMs) have been promising methods for classification and regression analysis because of their solid mathematical foundations, which convey several salient properties that other methods hardly provide. However, despite these properties, SVMs are not as favored for large-scale data mining as for pattern recognition or machine learning because their training complexity is highly dependent on the size of the data set. Many real-world data mining applications involve millions or billions of data records, where even multiple scans of the entire data set are too expensive to perform. This paper presents a new method, Clustering-Based SVM (CB-SVM), which is specifically designed for handling very large data sets. CB-SVM applies a hierarchical micro-clustering algorithm that scans the entire data set only once to provide an SVM with high-quality samples that carry the statistical summaries of the data, such that the summaries maximize the benefit of learning the SVM. CB-SVM tries to generate the best SVM boundary for very large data sets given a limited amount of resources. Our experiments on synthetic and real data sets show that CB-SVM is highly scalable for very large data sets while also generating high classification accuracy.
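
To make the single-scan idea concrete, the following is a minimal sketch of one-pass micro-clustering that keeps only statistical summaries per cluster. It is not the paper's algorithm: the abstract does not specify the data structure, so the sketch assumes BIRCH-style clustering features (count, linear sum, squared sum), a flat rather than hierarchical layout, and a fixed distance threshold. The names `MicroCluster`, `summarize`, and `threshold` are hypothetical. The resulting centroids are the kind of reduced sample that could then be handed to any SVM trainer in place of the raw records; that training step is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Assumed BIRCH-style clustering feature; not taken from the abstract."""
    n: int      # number of points absorbed so far
    ls: list    # per-dimension linear sum of the points
    ss: float   # sum of squared norms of the points

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the members from the centroid,
        # computed from the summary alone: sqrt(SS/n - |centroid|^2).
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))

    def add(self, x):
        self.n += 1
        self.ls = [s + xi for s, xi in zip(self.ls, x)]
        self.ss += sum(xi * xi for xi in x)


def summarize(records, threshold):
    """Scan (point, label) records once and return {label: [MicroCluster, ...]}.

    Clusters are kept separate per label so that no summary mixes classes.
    A point joins the nearest same-label cluster if it lies within
    `threshold` of that centroid; otherwise it starts a new cluster.
    """
    clusters = {}
    for x, y in records:
        group = clusters.setdefault(y, [])
        nearest = min(group, key=lambda mc: math.dist(x, mc.centroid()), default=None)
        if nearest is not None and math.dist(x, nearest.centroid()) <= threshold:
            nearest.add(x)
        else:
            group.append(MicroCluster(1, list(x), sum(xi * xi for xi in x)))
    return clusters


# Toy data, invented for illustration only.
records = [
    ((0.0, 0.0), -1), ((0.2, 0.0), -1), ((5.0, 5.0), 1),
    ((5.0, 5.2), 1), ((0.1, 0.1), -1), ((9.0, 9.0), 1),
]
summary = summarize(records, threshold=1.0)
for label, group in summary.items():
    print(label, [mc.n for mc in group])
# prints "-1 [3]" and then "1 [2, 1]"
```

Six records collapse to three summaries here. Because each summary stores only a count and two sums, memory grows with the number of clusters rather than the number of records, which is the property that makes a single scan sufficient in this simplified setting.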



Published In

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. hierarchical cluster
  2. support vector machines

Qualifiers

  • Article

Conference

KDD03

Acceptance Rates

KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 37
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 09 Nov 2024

Cited By

  • (2024) A NOVEL COVID-19 CLASSIFICATION METHOD BASED ON CURE CLUSTERING. Scientific Journal of Mehmet Akif Ersoy University, 7:1 (25-35). DOI: 10.70030/sjmakeu.1460760. Online publication date: 30-Jun-2024
  • (2024) Instance Selection via Voronoi Neighbors for Binary Classification Tasks. IEEE Transactions on Knowledge and Data Engineering, 36:8 (3921-3933). DOI: 10.1109/TKDE.2023.3328952. Online publication date: Aug-2024
  • (2024) Unravelling incipient accidents: a machine learning prediction of incident risks in highway operations. Smart and Sustainable Built Environment. DOI: 10.1108/SASBE-08-2024-0316. Online publication date: 15-Oct-2024
  • (2024) A novel fusion Support Vector Machine integrating weak and sphere models for classification challenges with massive data. Decision Analytics Journal, 11 (100457). DOI: 10.1016/j.dajour.2024.100457. Online publication date: Jun-2024
  • (2024) Exploring AI models and applications within a system framework. Systems Research and Behavioral Science. DOI: 10.1002/sres.3036. Online publication date: 21-Jun-2024
  • (2023) Domain generated algorithms detection applying a combination of a deep feature selection and traditional machine learning models. Journal of Computer Security, 31:1 (85-105). DOI: 10.3233/JCS-210139. Online publication date: 1-Jan-2023
  • (2023) A New Sparse Data Clustering Method Based On Frequent Items. Proceedings of the ACM on Management of Data, 1:1 (1-28). DOI: 10.1145/3588685. Online publication date: 30-May-2023
  • (2023) A faster SVM classification technique for remote sensing images using reduced training samples. Journal of Ambient Intelligence and Humanized Computing, 14:12 (16807-16827). DOI: 10.1007/s12652-023-04689-4. Online publication date: 11-Oct-2023
  • (2022) Random Partition Based Adaptive Distributed Kernelized SVM for Big Data. IEEE Access, 10 (95623-95637). DOI: 10.1109/ACCESS.2022.3204114. Online publication date: 2022
  • (2022) New incremental SVM algorithms for human activity recognition in smart homes. Journal of Ambient Intelligence and Humanized Computing, 14:10 (13433-13450). DOI: 10.1007/s12652-022-03798-w. Online publication date: 24-Mar-2022
