
Linear separability and classification complexity

Authors:

David A. Elizondo,

Ralph Birkenhead,

Esteban Alfaro

Expert Systems with Applications: An International Journal, Volume 39, Issue 9

Pages 7796 - 7807

https://doi.org/10.1016/j.eswa.2012.01.090

Published: 01 July 2012

Abstract

We study the relationship between linear separability and the level of complexity of classification data sets. Linearly separable classification problems are generally easier to solve than non-linearly separable ones. This suggests a strong correlation between linear separability and classification complexity. We propose a novel and simple method for quantifying the complexity of a classification problem. The method reduces any two-class classification problem to a sequence of linearly separable steps. The number of such reduction steps can be viewed as measuring the degree of non-separability, and hence the complexity, of the problem. This quantity can in turn be used as a complexity measure for classification data sets. Results obtained using several benchmarks are provided.
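The idea of counting linearly separable reduction steps can be illustrated with a toy sketch. The code below is not the authors' algorithm. It assumes one-dimensional data with distinct feature values, where two classes are linearly separable exactly when a single threshold splits them. The function names `is_linearly_separable_1d` and `count_reduction_steps`, and the rule of peeling off the leftmost same-class run, are hypothetical choices made only for this illustration.

```python
from typing import List, Tuple

# A point is (feature value, class label), with label 0 or 1.
# Assumption for this sketch: all feature values are distinct.
Point = Tuple[float, int]


def is_linearly_separable_1d(points: List[Point]) -> bool:
    """In 1-D, two classes are linearly separable iff the labels
    change at most once when the points are sorted by value."""
    labels = [label for _, label in sorted(points)]
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1


def count_reduction_steps(points: List[Point]) -> int:
    """Toy peeling loop (illustrative only, not the paper's method).

    While the remaining points are not linearly separable, remove the
    leftmost run of same-class points. A single threshold separates
    that run from everything else, so each removal is one linearly
    separable step. The number of removals is returned.
    """
    remaining = sorted(points)
    steps = 0
    while not is_linearly_separable_1d(remaining):
        first_label = remaining[0][1]
        # Find the end of the leading run that shares the first label.
        i = 0
        while i < len(remaining) and remaining[i][1] == first_label:
            i += 1
        remaining = remaining[i:]
        steps += 1
    return steps


if __name__ == "__main__":
    separable = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]
    xor_like = [(0.0, 0), (1.0, 1), (2.0, 0)]
    alternating = [(0.0, 0), (1.0, 1), (2.0, 0), (3.0, 1)]
    for name, data in [("separable", separable),
                       ("xor_like", xor_like),
                       ("alternating", alternating)]:
        print(name, count_reduction_steps(data))
```

In this toy setting a separable set needs zero steps, and each additional alternation of class labels along the axis adds one step, which mirrors the intuition that more reduction steps indicate a harder problem.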

Published In

Expert Systems with Applications: An International Journal, Volume 39, Issue 9, July 2012, 920 pages

ISSN: 0957-4174

Copyright © 2012 Elsevier Ltd.

Publisher

Pergamon Press, Inc., United States
