
Linear separability and classification complexity

Published: 01 July 2012

Abstract

    We study the relationship between linear separability and the complexity of classification data sets. Linearly separable classification problems are generally easier to solve than non-linearly separable ones, which suggests a strong correlation between linear separability and classification complexity. We propose a novel and simple method for quantifying the complexity of a classification problem. The method reduces any two-class classification problem to a sequence of linearly separable steps. The number of such reduction steps can be viewed as a measure of the degree of non-separability, and hence of the complexity of the problem and of the data set it is defined on. Results obtained on several benchmarks are provided.
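The idea of counting linearly separable reduction steps can be illustrated with a toy sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes one-dimensional data, where a linear separator is just a threshold, and it peels one single-class block off the left end per step. The function name `count_reduction_steps` is hypothetical.

```python
def count_reduction_steps(values, labels):
    """Count threshold cuts needed to reduce 1-D two-class data to one class.

    Toy illustration only: each step places a threshold (a 1-D linear
    separator) just past the leftmost single-class block and discards it.
    """
    # Order the samples along the single feature axis.
    samples = sorted(zip(values, labels), key=lambda s: s[0])

    # Equal values carrying different labels cannot be split by any threshold.
    for (v1, l1), (v2, l2) in zip(samples, samples[1:]):
        if v1 == v2 and l1 != l2:
            raise ValueError("identical values with different labels")

    remaining = [label for _, label in samples]
    steps = 0
    # Keep cutting while both classes are still present.
    while len(set(remaining)) > 1:
        first = remaining[0]
        cut = 0
        # Find the end of the leftmost single-class block.
        while remaining[cut] == first:
            cut += 1
        # One linearly separable step: remove that block.
        remaining = remaining[cut:]
        steps += 1
    return steps


if __name__ == "__main__":
    # A separable problem needs a single threshold.
    print(count_reduction_steps([1, 2, 3, 4], [0, 0, 1, 1]))
    # Interleaved classes need more cuts.
    print(count_reduction_steps([1, 2, 3, 4], [0, 1, 0, 1]))
```

Under these assumptions a linearly separable set needs one step, and the count grows with the number of class alternations along the axis, which mirrors the abstract's reading of the step count as a degree of non-separability.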

        Published In

        Expert Systems with Applications: An International Journal, Volume 39, Issue 9
        July, 2012
        920 pages

        Publisher

        Pergamon Press, Inc.

        United States

        Author Tags

        1. Classification
        2. Complexity
        3. Linear separability
        4. Non linear separability
