Abstract
Cloud service providers usually consolidate workloads to maximize server utilization. To eliminate the performance interference caused by contention for multiple shared resources, resource partitioning has become an important problem in commercial server scenarios. However, partitioning multiple critical resources in a coordinated manner is particularly challenging because of the complex contention behaviors and the large search space that must be explored to find the optimal solution.
In this paper, we propose GCNPart, which allocates partitions of the contended shared resources among colocated applications to optimize system performance. Existing resource partitioning frameworks lack analysis and accurate modeling of applications, which makes them inefficient or limits their generality. We formulate the resource partitioning problem as a sequential decision problem. GCNPart builds an accurate application performance model based on graph convolutional neural networks (GCN) to learn the mapping from multiple resources to applications, and then constructs a deep reinforcement learning (DRL) model that takes temporal information into account to make real-time resource partitioning decisions. Extensive experiments show that, compared with existing resource partitioning frameworks, GCNPart improves system throughput by 5.35% to 26.57%.
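As a rough illustration of the graph-convolution idea mentioned above, the following minimal sketch applies one generic GCN propagation step, ReLU(D^{-1/2}(A + I)D^{-1/2} X W), to a small graph. It is not GCNPart's actual model: the graph layout (two resource nodes connected to two application nodes), the feature matrix, the weights, and the names `matmul` and `gcn_layer` are all assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One generic graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees of the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    hidden = matmul(matmul(norm, feats), weight)
    return [[max(v, 0.0) for v in row] for row in hidden]


# Hypothetical graph: nodes 0-1 are shared resources, nodes 2-3 are applications.
adj = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
# Hypothetical per-node features (e.g. an allocation level and a usage counter).
feats = [[0.5, 0.2], [0.3, 0.9], [1.0, 0.1], [0.4, 0.6]]
# Hypothetical, untrained weights mapping 2 input features to 3 hidden units.
weight = [[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]

embeddings = gcn_layer(adj, feats, weight)
```

Each row of `embeddings` mixes a node's own features with those of its neighbors, which is how a model of this kind can relate resource allocations to the applications that share them; a real performance model would stack several such layers and train the weights on measured data.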
Acknowledgment
This work is supported by the Key-Area Research and Development Program of Guangdong Province (2021B0101310002); the State Key Laboratory of Computer Architecture, ICT, CAS, under Grant No. CARCHB202013; the National Science Foundation of China (62141412, 61872201); the Science and Technology Development Plan of Tianjin (20JCZDJC00610); and the Fundamental Research Funds for the Central Universities.
Copyright information
© 2023 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, R., Shi, H., Wu, J., Li, Y., Liu, X., Wang, G. (2023). GCNPart: Interference-Aware Resource Partitioning Framework with Graph Convolutional Neural Networks and Deep Reinforcement Learning. In: Meng, W., Lu, R., Min, G., Vaidya, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2022. Lecture Notes in Computer Science, vol 13777. Springer, Cham. https://doi.org/10.1007/978-3-031-22677-9_30
DOI: https://doi.org/10.1007/978-3-031-22677-9_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22676-2
Online ISBN: 978-3-031-22677-9
eBook Packages: Computer Science, Computer Science (R0)