A Supervised Learning Model for High-Dimensional and Large-Scale Data

Authors:

Qiang Cheng

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 8, Issue 2

Article No.: 30, Pages 1 - 23

https://doi.org/10.1145/2972957

Published: 02 November 2016

Abstract

We introduce a new supervised learning model based on discriminative regression. The model estimates a regression vector that represents the similarity between a test example and the training examples, and it integrates class information directly into the similarity estimation. This distinguishes our model from standard regression models and locally linear embedding approaches, and makes it suitable for supervised learning problems in high-dimensional settings. The model extends easily to nonlinear relationships and applies to general data, both high- and low-dimensional. Its objective function is convex, and we provide two optimization algorithms for it. These two optimization approaches yield two scalable solvers with mathematically provable linear time complexity. Experimental results verify the effectiveness of the proposed method on various kinds of data. For example, compared with several widely used classifiers, our method shows comparable performance on low-dimensional data and superior performance on high-dimensional data; the linear solvers also obtain promising performance on large-scale classification.
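The general idea of representing a test example by a regression vector over the training examples can be illustrated with a small sketch. The abstract does not state the paper's objective function, so the code below is not the proposed model: it swaps in a plain ridge regression followed by a class-wise reconstruction residual, which only uses class labels after the regression vector is estimated. The function name `ridge_representation_classify` and the regularization weight `lam` are hypothetical.

```python
import numpy as np

def ridge_representation_classify(X_train, y_train, x_test, lam=0.1):
    """Generic illustration only (NOT the paper's model).

    X_train: (n, d) training examples, y_train: (n,) class labels,
    x_test: (d,) test example, lam: assumed ridge penalty weight.
    Returns the predicted label and the regression vector w of length n.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_test = np.asarray(x_test, dtype=float)
    n = X_train.shape[0]

    # Closed-form minimizer of ||x_test - X_train^T w||^2 + lam * ||w||^2.
    gram = X_train @ X_train.T
    w = np.linalg.solve(gram + lam * np.eye(n), X_train @ x_test)

    # Pick the class whose training examples reconstruct x_test best.
    classes = np.unique(y_train)
    residuals = []
    for c in classes:
        mask = y_train == c
        reconstruction = X_train[mask].T @ w[mask]
        residuals.append(np.linalg.norm(x_test - reconstruction))
    return classes[int(np.argmin(residuals))], w
```

In this sketch each entry of `w` acts as a similarity weight between the test example and one training example. A discriminative formulation such as the one the abstract describes would instead build the class information into the estimation of `w` itself.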



Published In

ACM Transactions on Intelligent Systems and Technology Volume 8, Issue 2

Survey Paper, Special Issue: Intelligent Music Systems and Applications and Regular Papers

March 2017

407 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/3004291

Editor:
Yu Zheng
Microsoft Research, China

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 November 2016

Accepted: 01 July 2016

Revised: 01 May 2016

Received: 01 February 2016

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation
