Abstract
Feature selection (FS) plays an essential role in machine learning (ML), but it becomes demanding when applied to voluminous data. Conventional FS methods cannot handle big datasets efficiently, which creates a need for technology that processes the data in parallel. MapReduce is a programming framework for processing massive data using a “divide and conquer” approach. In this paper, a novel parallel BAT algorithm is proposed for feature selection on big datasets, and classification is then performed on the selected features with a set of known classifiers. The proposed parallel FS technique is highly scalable for big datasets. The experimental results show improved accuracy and shorter execution time as the number of parallel nodes increases.
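The “divide and conquer” pattern mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's parallel BAT algorithm: as a stand-in, it ranks features by variance, a simple filter criterion chosen here only because it decomposes cleanly into map and reduce steps. All function names (`split`, `map_partition`, `reduce_stats`, `select_top_k`) and the toy data are assumptions made for illustration, and the map calls run sequentially here, whereas a MapReduce cluster would run them on separate nodes.

```python
from functools import reduce


def split(rows, n_parts):
    """Divide the rows into n_parts interleaved partitions."""
    return [rows[i::n_parts] for i in range(n_parts)]


def map_partition(rows):
    """Map step: per-feature count, sum and sum of squares for one partition."""
    n_features = len(rows[0])
    sums = [0.0] * n_features
    squares = [0.0] * n_features
    for row in rows:
        for j, x in enumerate(row):
            sums[j] += x
            squares[j] += x * x
    return len(rows), sums, squares


def reduce_stats(a, b):
    """Reduce step: merge the partial statistics of two partitions."""
    return (
        a[0] + b[0],
        [x + y for x, y in zip(a[1], b[1])],
        [x + y for x, y in zip(a[2], b[2])],
    )


def select_top_k(rows, k, n_parts=2):
    """Return the indices of the k highest-variance features."""
    # Skip empty partitions (possible when n_parts exceeds the row count).
    parts = [p for p in split(rows, n_parts) if p]
    n, sums, squares = reduce(reduce_stats, map(map_partition, parts))
    # Population variance from the merged statistics: E[x^2] - (E[x])^2.
    variances = [sq / n - (s / n) ** 2 for s, sq in zip(sums, squares)]
    ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 is constant, features 1 and 2 vary.
data = [
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 10.0],
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 10.0],
]
selected = select_top_k(data, k=2)
print(selected)
```

Because each partition emits only fixed-size summaries, the reduce step stays cheap no matter how many rows a partition holds, which is the property that lets this style of computation scale with the number of nodes.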
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Renuka Devi, D., Sasikala, S. (2020). Feature Selection and Classification of Big Data Using MapReduce Framework. In: Pandian, A., Ntalianis, K., Palanisamy, R. (eds) Intelligent Computing, Information and Control Systems. ICICCS 2019. Advances in Intelligent Systems and Computing, vol 1039. Springer, Cham. https://doi.org/10.1007/978-3-030-30465-2_73
DOI: https://doi.org/10.1007/978-3-030-30465-2_73
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30464-5
Online ISBN: 978-3-030-30465-2
eBook Packages: Intelligent Technologies and Robotics (R0)