research-article

An open-source simulation platform for benchmarking geo-distributed data center schedulers

Published: 01 November 2024

Abstract

To help meet the ever-increasing worldwide demand for cloud computing services and resources, while providing resilience and adequate resource utilization, cloud service providers have opted to distribute their data centers around the world. This trend has motivated researchers and practitioners in the data center management community to develop new job schedulers that take the geographical distribution of data centers into account. However, testing and benchmarking new schedulers for geo-distributed data centers is complicated by the lack of a common, easily extensible experimental platform. To fill this gap, we propose GDSim, an open-source, extensible job scheduling simulation environment for geo-distributed data centers that aims to facilitate the benchmarking of existing and new geo-distributed schedulers by subjecting them to a variety of data center features and conditions. We use our geo-distributed job scheduler simulation platform to reproduce experiments and results for recently proposed geo-distributed job schedulers, and to test those schedulers under new conditions, which can reveal trends that have not been previously uncovered.

Published In

Simulation Volume 100, Issue 11
Nov 2024
104 pages

Publisher

Society for Computer Simulation International

San Diego, CA, United States

Author Tags

  1. Job scheduling
  2. data centers
  3. geo-distribution
  4. simulation
