Abstract
Large-scale data centers account for a significant share of the energy consumption in many countries. Machine learning workloads are computationally intensive and thus drive demand for power and cooling capacity in data centers. It is time to explore green machine learning. The aim of this paper is to profile a machine learning algorithm with respect to its energy consumption and to determine the causes behind this consumption. The Very Fast Decision Tree (VFDT) is the first scalable machine learning algorithm able to handle large volumes of streaming data, and it produces results competitive with those of algorithms that analyze static datasets. Our objectives are to: (i) establish a methodology that profiles the energy consumption of decision trees at the function level, (ii) apply this methodology in an experiment to obtain the energy consumption of the VFDT, (iii) conduct a fine-grained analysis of the functions that consume most of the energy, providing an understanding of that consumption, and (iv) analyze how different parameter settings can significantly reduce the energy consumption. The results show that by addressing the most energy-intensive part of the VFDT, the energy consumption can be reduced by up to 74.3%.
Acknowledgments
This work is part of the research project “Scalable resource-efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Garcia-Martin, E., Lavesson, N., Grahn, H. (2017). Identification of Energy Hotspots: A Case Study of the Very Fast Decision Tree. In: Au, M., Castiglione, A., Choo, KK., Palmieri, F., Li, KC. (eds) Green, Pervasive, and Cloud Computing. GPC 2017. Lecture Notes in Computer Science, vol 10232. Springer, Cham. https://doi.org/10.1007/978-3-319-57186-7_21
DOI: https://doi.org/10.1007/978-3-319-57186-7_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57185-0
Online ISBN: 978-3-319-57186-7
eBook Packages: Computer Science, Computer Science (R0)