Abstract
Although major cloud providers have captured and published workload executions in the form of traces, it is not clear how to use these traces for workload generation on a wide range of existing platforms. A remaining methodological challenge is to generate and execute realistic datacenter workloads on any infrastructure, using information from available traces. In this paper, we propose Tracie, a methodology that addresses this challenge, and introduce the tool that implements it. We present all the necessary steps, from a trace to workload execution: analysis of datacenter traces, extraction of parameters, application selection, and scaling of a workload to match the capabilities of the underlying infrastructure. Our evaluation validates that Tracie can generate executable workloads that closely resemble their trace-based counterparts. For validation, we correlate the system metrics recorded in a trace against those of the actual execution. We find that the average system metrics of synthetic workloads differ by at most 5% from those of the trace, and that the two are highly correlated, at 70% on average.
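The validation step described above can be illustrated with a minimal sketch. It assumes the comparison uses the Pearson correlation coefficient and a relative difference of means; the abstract does not specify the exact formulas, so both are assumptions here. The function names and the sample CPU-utilization values are hypothetical, not taken from Tracie or from any trace.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def relative_mean_difference(trace, measured):
    """Absolute difference of the means, relative to the trace mean."""
    trace_mean = mean(trace)
    if trace_mean == 0:
        raise ValueError("trace mean is zero; relative difference undefined")
    return abs(mean(measured) - trace_mean) / abs(trace_mean)


if __name__ == "__main__":
    # Hypothetical CPU utilization samples (fraction of capacity),
    # one value per time window; not real trace data.
    trace_cpu = [0.20, 0.35, 0.50, 0.45, 0.30]
    measured_cpu = [0.22, 0.33, 0.52, 0.41, 0.31]

    print(f"correlation: {pearson(trace_cpu, measured_cpu):.2f}")
    print(f"mean difference: {relative_mean_difference(trace_cpu, measured_cpu):.1%}")
```

Under these assumptions, a synthetic workload would meet the reported figures if the relative mean difference stays at or below 0.05 and the correlation is around 0.7 or higher.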
Acknowledgments
We thankfully acknowledge the support of the European Commission under the Horizon 2020 Framework Programme for Research and Innovation through the EVOLVE H2020 project (Grant Agreement Nr 825061).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sfakianakis, Y., Kanellou, E., Marazakis, M., Bilas, A. (2021). Trace-Based Workload Generation and Execution. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_3
DOI: https://doi.org/10.1007/978-3-030-85665-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)