Abstract
In conventional distributed machine learning, distributed support vector machine (SVM) algorithms are trained over pre-configured intranet or internet environments to find an optimal classifier. These methods are complicated and costly for large datasets. We therefore propose the Cloud SVM training mechanism (CloudSVM), which trains SVMs in a cloud computing environment using the MapReduce technique for distributed machine learning applications. The method proceeds as follows: (i) an SVM is trained on each of the distributed cloud storage servers, which work concurrently; (ii) the support vectors from every trained cloud node are merged; and (iii) these two steps are iterated until the SVM converges to the optimal classifier function. A single computer cannot train an SVM on large-scale datasets, so the results of this study are important for machine learning applications on such data. We show that iterative training of a split dataset with SVMs in a cloud computing environment converges to a globally optimal classifier in a finite number of iterations.
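As a rough illustration of the three-step loop the abstract outlines, the sketch below simulates the map step (per-node SVM training) and reduce step (support-vector merging) on a single machine. The use of scikit-learn's SVC as a stand-in for LIBSVM, the node count, and the convergence test on the support-vector set are assumptions for illustration only, not the authors' Hadoop/MapReduce implementation.

```python
# Minimal single-machine sketch of the CloudSVM iteration: train an SVM per
# data partition ("map"), merge the resulting support vectors ("reduce"),
# and repeat until the global support-vector set stops changing.
import numpy as np
from sklearn.svm import SVC

def cloud_svm(X, y, n_nodes=4, max_iter=10, C=1.0, kernel="linear"):
    rng = np.random.default_rng(0)
    parts = np.array_split(rng.permutation(len(X)), n_nodes)  # simulated cloud nodes
    global_sv_X = np.empty((0, X.shape[1]))
    global_sv_y = np.empty((0,), dtype=y.dtype)

    for _ in range(max_iter):
        sv_X, sv_y = [], []
        for p in parts:
            # "map": each node trains on its partition plus the current
            # global support vectors
            Xi = np.vstack([X[p], global_sv_X])
            yi = np.concatenate([y[p], global_sv_y])
            clf = SVC(C=C, kernel=kernel).fit(Xi, yi)
            sv_X.append(Xi[clf.support_])
            sv_y.append(yi[clf.support_])
        # "reduce": merge the support vectors from all nodes
        merged_X = np.vstack(sv_X)
        merged_y = np.concatenate(sv_y)
        new_keys = {tuple(row) for row in merged_X}
        old_keys = {tuple(row) for row in global_sv_X}
        global_sv_X, global_sv_y = merged_X, merged_y
        if new_keys == old_keys:   # support-vector set has stabilized
            break

    # final classifier trained on the converged global support vectors
    return SVC(C=C, kernel=kernel).fit(global_sv_X, global_sv_y)
```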
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Catak, F.O., Balaban, M.E. (2013). CloudSVM: Training an SVM Classifier in Cloud Computing Systems. In: Zu, Q., Hu, B., Elçi, A. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2012. Lecture Notes in Computer Science, vol 7719. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37015-1_6
DOI: https://doi.org/10.1007/978-3-642-37015-1_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37014-4
Online ISBN: 978-3-642-37015-1