Abstract
Hadoop is an open-source Apache project comprising a distributed file system and the MapReduce distributed computing framework. Released under the Apache 2.0 license, it lets consumers pay on demand for cloud platform services and helps users provide cloud services on their own, often dissimilar, hardware. In a cloud environment, one must balance the resource requirements of workloads, optimize workload performance, and manage cloud computing costs. The processing power of cluster machines can vary widely, for example because hardware is aging or overloaded. For this case, Hadoop offers a speculative execution (SE) optimization strategy. It monitors task progress in real time, and when the tasks of a job do not run at the same speed, it launches identical backup tasks on different nodes. The result of whichever copy finishes first is used, which keeps the overall job progressing. At present, however, the SE strategy may select unsuitable backup nodes and is subject to resource constraints. This can degrade Hadoop performance, leave subsequent tasks unable to complete, and cause other problems. This paper proposes an SE optimization strategy based on near-data prediction. It analyzes real-time task execution information to predict the running time each task still requires. It then selects backup nodes according to actual requirements and data proximity so that the SE strategy achieves the best performance. Experiments show that, in a heterogeneous Hadoop environment, the optimization strategy effectively improves the effectiveness and accuracy of various tasks and enhances the performance of the cloud computing platform, benefiting consumers more than before.
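The abstract describes two steps. First, default SE flags tasks that lag behind their siblings and relaunches them elsewhere. Second, the proposed strategy predicts how much running time each task still needs and picks a backup node with data proximity in mind. The Python sketch below illustrates only that general idea under stated assumptions; it is not the authors' algorithm.

Several parts are swapped in or hypothetical and are not taken from the paper:

* The remaining-time estimate is the progress-rate heuristic used by the LATE scheduler, substituted here for illustration.
* The 1.5× straggler threshold is an assumption.
* The per-node `speed` and `bandwidth_mb_s` figures are assumptions, as is the baseline task duration `base_time`.
* All class and function names are made up for this example.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    progress: float   # fraction of work done, 0.0 to 1.0
    elapsed: float    # seconds since this attempt started
    data_node: str    # node storing the task's input split (assumed known)
    input_mb: float   # size of the input split in MB


@dataclass
class Node:
    name: str
    speed: float           # hypothetical relative processing speed (1.0 = baseline)
    free_slots: int        # idle slots available for a backup copy
    bandwidth_mb_s: float  # hypothetical bandwidth for reading a remote input split


def remaining_time(task: Task) -> float:
    """LATE-style estimate (an assumption here): time left = (1 - progress) / rate."""
    if task.progress <= 0.0 or task.elapsed <= 0.0:
        return math.inf  # no progress observed yet, so no finite estimate
    rate = task.progress / task.elapsed
    return (1.0 - task.progress) / rate


def find_stragglers(tasks: list[Task], slow_factor: float = 1.5) -> list[Task]:
    """Flag tasks whose estimated time left exceeds slow_factor x the job mean."""
    estimates = {t.task_id: remaining_time(t) for t in tasks}
    finite = [e for e in estimates.values() if math.isfinite(e)]
    if not finite:
        return []
    threshold = slow_factor * sum(finite) / len(finite)
    # Tasks with no observed progress have an infinite estimate and are flagged too.
    return [t for t in tasks if estimates[t.task_id] > threshold]


def backup_time(task: Task, node: Node, base_time: float) -> float:
    """Predicted time for a fresh copy on `node`: compute time plus a remote
    read of the input split when it is not stored locally (the near-data term)."""
    compute = base_time / node.speed
    transfer = 0.0 if node.name == task.data_node else task.input_mb / node.bandwidth_mb_s
    return compute + transfer


def choose_backup(task: Task, nodes: list[Node], base_time: float) -> str | None:
    """Return the free node with the lowest predicted backup time, or None if
    no backup is expected to finish before the original attempt."""
    candidates = [n for n in nodes if n.free_slots > 0]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: backup_time(task, n, base_time))
    if backup_time(task, best, base_time) < remaining_time(task):
        return best.name
    return None


def plan_speculation(tasks: list[Task], nodes: list[Node], base_time: float,
                     slow_factor: float = 1.5) -> dict[str, str]:
    """Map each straggler's task_id to the node chosen for its backup copy."""
    plan: dict[str, str] = {}
    for task in find_stragglers(tasks, slow_factor):
        target = choose_backup(task, nodes, base_time)
        if target is None:
            continue
        plan[task.task_id] = target
        for n in nodes:  # reserve the slot so later backups see it as taken
            if n.name == target:
                n.free_slots -= 1
    return plan


if __name__ == "__main__":
    tasks = [
        Task("t1", progress=0.9, elapsed=90.0, data_node="n2", input_mb=128.0),
        Task("t2", progress=0.8, elapsed=80.0, data_node="n3", input_mb=128.0),
        Task("t3", progress=0.2, elapsed=80.0, data_node="n1", input_mb=128.0),
    ]
    nodes = [
        Node("n1", speed=1.0, free_slots=1, bandwidth_mb_s=2.0),
        Node("n2", speed=1.25, free_slots=1, bandwidth_mb_s=2.0),
        Node("n3", speed=1.5, free_slots=0, bandwidth_mb_s=2.0),
    ]
    print(plan_speculation(tasks, nodes, base_time=100.0))  # {'t3': 'n1'}
```

The numbers in the usage example are invented. Only `t3` is flagged: its estimate is about 320 s, against a threshold of roughly 175 s. Its backup goes to `n1`, which already holds the input split, rather than to the faster `n2`. Reading 128 MB remotely adds 64 s to `n2`, which raises its predicted time to 144 s, compared with 100 s on `n1`. This is the kind of locality-aware choice the abstract argues for.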
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Nos. 41911530242 and 41975142), 5150 Spring Specialists (05492018012, 05762018039), the Major Program of the National Social Science Fund of China (Grant No. 17ZDA092), and the 333 High-Level Talent Cultivation Project of Jiangsu Province (BRA2018332). Further funding came from the Royal Society of Edinburgh, UK, and the China Natural Science Foundation Council (RSE Reference: 62967_Liu_2018_2) under their Joint International Projects funding scheme, and from the Basic Research Programs (Natural Science Foundation) of Jiangsu Province (BK20191398).
About this article
Cite this article
Liu, Q., Wu, X., Liu, X. et al. Near-data Prediction Based Speculative Optimization in a Distribution Environment. Mobile Netw Appl 27, 2339–2347 (2022). https://doi.org/10.1007/s11036-021-01793-7
DOI: https://doi.org/10.1007/s11036-021-01793-7