research-article

A frequency-aware and energy-saving strategy based on DVFS for Spark

Published: 01 October 2021

Abstract

The rapid growth of big data applications has brought about a huge increase in the energy consumed by big data processing in Cloud data centers. In this study, a frequency-aware and energy-saving strategy based on dynamic voltage and frequency scaling (abbreviated as FAESS-DVFS) is proposed to reduce the energy consumption of big data processing in Spark on YARN. Energy saving is designed and implemented in two layers of the proposed method: the YARN layer and the Spark layer. First, in the YARN layer, an optimal CPU frequency is determined based on the minimum energy efficiency ratio (EER), which is obtained from a status monitoring module. Then, in the Spark layer, a task scheduling method is constructed to optimize energy consumption by dynamically adjusting the CPU frequency of nodes over the life cycle of different stages. In tests on HiBench, the proposed method achieves energy savings of up to 29.5% for big data processing compared with the default algorithm in Spark on YARN, while satisfying SLA constraints.
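The two-layer idea in the abstract (pick a frequency with minimum EER from monitoring data, then choose frequencies per Spark stage under an SLA) can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes, hypothetically, that EER means energy per unit of work (average power divided by throughput, so lower is better), that the status monitoring module reports per-frequency samples in the `FreqSample` format, and that the SLA is a per-stage deadline. The names `FreqSample`, `eer`, `pick_frequency`, and `plan_stages` are illustrative, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreqSample:
    """One monitoring sample taken at a fixed CPU frequency (hypothetical format)."""
    freq_ghz: float    # CPU frequency the node ran at
    power_w: float     # average node power draw, in watts
    throughput: float  # work units completed per second (e.g. MB/s)


def eer(sample: FreqSample) -> float:
    """Assumed EER: energy per unit of work, i.e. watts / (units per second) = J/unit; lower is better."""
    return sample.power_w / sample.throughput


def pick_frequency(samples: list[FreqSample], remaining_work: float, deadline_s: float) -> float:
    """Return the frequency with the minimum EER among those predicted to meet the deadline.

    If no frequency can meet the deadline, fall back to the highest-throughput one.
    """
    feasible = [s for s in samples if remaining_work / s.throughput <= deadline_s]
    if not feasible:
        return max(samples, key=lambda s: s.throughput).freq_ghz
    return min(feasible, key=eer).freq_ghz


def plan_stages(stages: dict[str, tuple[list[FreqSample], float]],
                deadline_per_stage_s: float) -> dict[str, float]:
    """Choose a frequency for each stage independently (assumes an equal per-stage deadline)."""
    return {
        name: pick_frequency(samples, work, deadline_per_stage_s)
        for name, (samples, work) in stages.items()
    }


# Made-up profiles: the CPU-bound stage speeds up with frequency, the I/O-bound one barely does.
cpu_bound = [FreqSample(1.2, 60.0, 40.0), FreqSample(1.8, 75.0, 60.0), FreqSample(2.4, 110.0, 80.0)]
io_bound = [FreqSample(1.2, 55.0, 50.0), FreqSample(1.8, 68.0, 52.0), FreqSample(2.4, 90.0, 53.0)]

plan = plan_stages({"map": (cpu_bound, 1200.0), "shuffle_read": (io_bound, 1000.0)},
                   deadline_per_stage_s=25.0)
print(plan)  # {'map': 1.8, 'shuffle_read': 1.2}
```

In this toy setup the I/O-bound stage gains almost no throughput from a higher clock, so the lowest frequency has the smallest EER; with a tighter deadline (for example 16 s for the `map` stage) only 2.4 GHz remains feasible, so the SLA check overrides the EER minimum. Actually applying the chosen frequency on a node, for example through the Linux cpufreq `userspace` governor, is outside this sketch.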


Cited By

  • (2024) Reducing Energy Bloat in Large Model Training. Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pp. 144-159. DOI: 10.1145/3694715.3695970. Online publication date: 4-Nov-2024
  • (2023) EFTuner: A Bi-Objective Configuration Parameter Auto-Tuning Method Towards Energy-Efficient Big Data Processing. Proceedings of the 14th Asia-Pacific Symposium on Internetware, pp. 292-301. DOI: 10.1145/3609437.3609443. Online publication date: 4-Aug-2023
  • (2023) A dynamic energy conservation scheme with dual-rate adjustment and semi-sleep mode in cloud system. The Journal of Supercomputing 79(3), pp. 2451-2487. DOI: 10.1007/s11227-022-04715-w. Online publication date: 1-Feb-2023


Published In

The Journal of Supercomputing, Volume 77, Issue 10
Oct 2021
1466 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 01 October 2021
Accepted: 12 March 2021

Author Tags

  1. Big data
  2. Spark on YARN
  3. Energy efficiency
  4. Dynamic voltage and frequency scaling (DVFS)
  5. Frequency-aware

Qualifiers

  • Research-article


Article Metrics

  • Downloads (Last 12 months): 0
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 04 Feb 2025.


Citations

  • (2023) Cost-Aware Scheduling and Data Skew Alleviation for Big Data Processing in Heterogeneous Cloud Environment. Journal of Grid Computing 21(3). DOI: 10.1007/s10723-023-09661-2. Online publication date: 22-Jun-2023
  • (2023) Energy-aware parameter tuning for mixed workloads in cloud server. Cluster Computing 27(4), pp. 4805-4821. DOI: 10.1007/s10586-023-04212-6. Online publication date: 27-Dec-2023
  • (2023) A distributed and energy-efficient KNN for EEG classification with dynamic money-saving policy in heterogeneous clusters. Computing 105(11), pp. 2487-2510. DOI: 10.1007/s00607-023-01193-7. Online publication date: 1-Nov-2023
  • (2022) DRL-based and Bsld-Aware Job Scheduling for Apache Spark Cluster in Hybrid Cloud Computing Environments. Journal of Grid Computing 20(4). DOI: 10.1007/s10723-022-09630-1. Online publication date: 1-Dec-2022
