Green Data Analytics of Supercomputing from Massive Sensor Networks: Does Workload Distribution Matter?

Published: 01 December 2023

Abstract

Energy costs represent a significant share of the total cost of ownership in high-performance computing (HPC) systems. Using a unique data set collected by massive sensor networks in a petascale national supercomputing center, we first present an explanatory model to identify key factors that affect energy consumption in supercomputing. Our analytic results show that not only does computing node utilization significantly affect energy consumption, but workload distribution among the nodes also has significant effects and could be leveraged to improve energy efficiency. Next, we establish the model's high performance using in-sample and out-of-sample analyses. We then develop prescriptive models for energy-optimal runtime workload management and extend the models to consider energy consumption and job performance tradeoffs. Specifically, we present four dynamic resource management methodologies (packing, load balancing, threshold-based switching, and energy optimization) and model their application at two levels (purely within-rack and jointly cross-rack resource allocation). We also explore runtime resource redistribution policies for jobs under the emergent principle of computational steering, and we compare strategies that use computational steering with those that do not. Our experimental studies show that packing is preferred when the total workload of a rack is higher than a threshold and load balancing is preferred when it is lower. These results lead to a threshold strategy that yields near-optimal energy efficiency under all workload conditions. We further calibrate the energy-optimal resource allocations over the full range of workloads and present a bicriteria evaluation to consider energy consumption and job performance tradeoffs. We demonstrate significant energy savings of our proposed strategies under various workload conditions. We conclude with implementation guidelines and policy insights into energy-efficient computing resource management in large supercomputing data centers.
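
The threshold-based switching idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' model: the function names, the per-node capacity of 1.0, the workload units (fractions of one node's capacity), and the threshold value are all hypothetical, chosen only to show how a rack-level rule might choose between packing and load balancing. The abstract does not say what happens when the workload equals the threshold; the sketch assumes load balancing in that case.

```python
from typing import List


def check_feasible(total_load: float, n_nodes: int, capacity: float) -> None:
    """Reject workloads that cannot fit on the rack at all."""
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    if total_load < 0 or total_load > n_nodes * capacity:
        raise ValueError("total_load does not fit on the rack")


def pack(total_load: float, n_nodes: int, capacity: float = 1.0) -> List[float]:
    """Packing: fill nodes to capacity one at a time, leaving the rest idle."""
    check_feasible(total_load, n_nodes, capacity)
    allocation = []
    remaining = total_load
    for _ in range(n_nodes):
        share = min(capacity, remaining)
        allocation.append(share)
        remaining -= share
    return allocation


def balance(total_load: float, n_nodes: int, capacity: float = 1.0) -> List[float]:
    """Load balancing: spread the workload evenly over all nodes."""
    check_feasible(total_load, n_nodes, capacity)
    return [total_load / n_nodes] * n_nodes


def threshold_switch(total_load: float, n_nodes: int, threshold: float,
                     capacity: float = 1.0) -> List[float]:
    """Pack when the rack workload exceeds the threshold, otherwise balance.

    The threshold is a hypothetical input here; the abstract reports that
    such a threshold exists but does not give its value.
    """
    if total_load > threshold:
        return pack(total_load, n_nodes, capacity)
    return balance(total_load, n_nodes, capacity)


if __name__ == "__main__":
    # Heavy rack workload: packing concentrates work on as few nodes as possible.
    print(threshold_switch(2.5, n_nodes=4, threshold=2.0))
    # Light rack workload: load balancing spreads work evenly.
    print(threshold_switch(1.0, n_nodes=4, threshold=2.0))
```

With four nodes and an assumed threshold of 2.0, a rack workload of 2.5 is packed onto three nodes, while a workload of 1.0 is spread evenly at 0.25 per node.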
History: Olivia Liu Sheng, Senior Editor; Xiaobai Li, Associate Editor.
Funding: The authors thank the National Supercomputing Center of Singapore for enabling and supporting this research. J. Li received support from the National Natural Science Foundation of China [Grants 71901169, 72032006, and T2293774].
Supplemental Material: The online appendices are available at https://doi.org/10.1287/isre.2023.1208.

Published In

Information Systems Research  Volume 34, Issue 4
December 2023
502 pages
ISSN:1526-5536
DOI:10.1287/isre.2023.34.issue-4

Publisher

INFORMS

Linthicum, MD, United States

Publication History

Published: 01 December 2023
Accepted: 18 December 2022
Received: 03 August 2021

Author Tags

  1. high-performance computing
  2. data center
  3. energy-efficient operation
  4. data analytics
  5. autoregressive model
  6. dynamic panel data
  7. optimization

Qualifiers

  • Research-article
