research-article
Public Access

A Supervised Learning Model for High-Dimensional and Large-Scale Data

Published: 02 November 2016

Abstract

We introduce a new supervised learning model based on discriminative regression. The model estimates a regression vector that represents the similarity between a test example and the training examples, and it integrates class information directly into that similarity estimation. This distinguishes the model from standard regression models and from locally linear embedding approaches, and makes it suitable for supervised learning in high-dimensional settings. The model extends readily to nonlinear relationships and applies to general data, both high- and low-dimensional. Its objective function is convex, and we provide two optimization algorithms for it. These algorithms yield two scalable solvers with provably linear time complexity. Experimental results verify the effectiveness of the proposed method on various kinds of data: compared with several widely used classifiers, it performs comparably on low-dimensional data and better on high-dimensional data, and the linear solvers achieve promising performance on large-scale classification.
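
The abstract does not state the objective function, so the sketch below is only a generic illustration of regression-based classification, not the paper's discriminative regression model. It assumes a plain ridge penalty and a class-wise residual decision rule as stand-ins; the names `classify`, `solve_linear`, `lam`, `TRAIN`, and `LABELS` are hypothetical. It shows the general idea of regressing a test example on the training examples and reading a class decision from the resulting weight vector.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def classify(train, labels, x, lam=0.1):
    """Ridge-regress x on the training examples (assumed stand-in), then
    return the class whose examples reconstruct x with the smallest residual."""
    n = len(train)
    # Normal equations (G + lam * I) w = A^T x, where G is the Gram matrix.
    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    rhs = [dot(t, x) for t in train]
    w = solve_linear(gram, rhs)

    best_label, best_residual = None, float("inf")
    for c in sorted(set(labels)):
        # Reconstruct x using only the weights of training examples in class c.
        recon = [0.0] * len(x)
        for wi, t, y in zip(w, train, labels):
            if y == c:
                for k in range(len(x)):
                    recon[k] += wi * t[k]
        residual = sum((a - b) ** 2 for a, b in zip(x, recon))
        if residual < best_residual:
            best_label, best_residual = c, residual
    return best_label, w


# Toy data (made up for illustration): two classes in two dimensions.
TRAIN = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
LABELS = [0, 0, 1, 1]

if __name__ == "__main__":
    label, weights = classify(TRAIN, LABELS, [0.95, 0.05])
    print(label, weights)
```

In this stand-in, class information enters only at the decision step; the abstract describes a model that integrates class information into the similarity estimation itself, which this sketch does not attempt to reproduce.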

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 8, Issue 2
Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers
March 2017
407 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3004291
Editor: Yu Zheng
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2016
Accepted: 01 July 2016
Revised: 01 May 2016
Received: 01 February 2016
Published in TIST Volume 8, Issue 2

Author Tags

  1. Discriminative regression
  2. classification
  3. high dimension
  4. large-scale data
  5. supervised learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 98
  • Downloads (Last 6 weeks): 17

Reflects downloads up to 10 Nov 2024

Cited By

  • (2021) Savitzky–Golay filter energy features-based approach to face recognition using symbolic modeling. Pattern Analysis & Applications 24:4 (1451-1473). DOI: 10.1007/s10044-021-00991-z. Online publication date: 1-Nov-2021
  • (2020) A Large Size Image Classification Method Based on Semi-supervised Learning. Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) 13:5 (669-680). DOI: 10.2174/1874476105666190830110150. Online publication date: 22-Sep-2020
  • (2020) Self-weighted Robust LDA for Multiclass Classification with Edge Classes. ACM Transactions on Intelligent Systems and Technology 12:1 (1-19). DOI: 10.1145/3418284. Online publication date: 22-Dec-2020
  • (2020) Distributed Nonlinear Semiparametric Support Vector Machine for Big Data Applications on Spark Frameworks. IEEE Transactions on Systems, Man, and Cybernetics: Systems 50:11 (4664-4675). DOI: 10.1109/TSMC.2018.2858778. Online publication date: Nov-2020
  • (2020) Feature Selection Embedded Robust K-Means. IEEE Access 8 (166164-166175). DOI: 10.1109/ACCESS.2020.3022749. Online publication date: 2020
  • (2018) Image Denoising via Improved Dictionary Learning with Global Structure and Local Similarity Preservations. Symmetry 10:5 (167). DOI: 10.3390/sym10050167. Online publication date: 16-May-2018
  • (2018) Rotational Invariant Discriminant Subspace Learning For Image Classification. 2018 24th International Conference on Pattern Recognition (ICPR) (1217-1222). DOI: 10.1109/ICPR.2018.8545100. Online publication date: Aug-2018
  • (2018) Joint Linear Regression and Nonnegative Matrix Factorization Based on Self-Organized Graph for Image Clustering and Classification. IEEE Access 6 (38820-38834). DOI: 10.1109/ACCESS.2018.2854232. Online publication date: 2018
  • (2018) Handling data irregularities in classification: Foundations, trends, and future challenges. Pattern Recognition 81 (674-693). DOI: 10.1016/j.patcog.2018.03.008. Online publication date: Sep-2018
  • (2017) Large Earthquake Magnitude Prediction in Chile with Imbalanced Classifiers and Ensemble Learning. Applied Sciences 7:6 (625). DOI: 10.3390/app7060625. Online publication date: 16-Jun-2017
