Abstract
Understanding the characteristics of scientific computing programs is of great importance because these characteristics are closely tied to the design and implementation of program optimization methods. Scientific computing programs are generally divided into three categories according to their computing, memory access and communication characteristics: compute-intensive, memory-intensive and communication-intensive. Several classification methods are in common use, particularly for distinguishing compute-intensive from memory-intensive programs. In most cases these methods agree, but occasionally they produce different classifications. Where do such inconsistencies occur, and what causes them? We answer these questions by analyzing four representative program classification methods (IPC, MPKI, MEM/Uop and Roofline) on two platforms. First, we identify three kinds of occasional inconsistency: inconsistency across indicators, inconsistency arising from multi-phase program behavior, and inconsistency across platforms; for each we discuss possible reasons. Second, we explore the impact of threshold settings on classification inconsistencies. Our experimental and analytical results, together with data collected from other references, show that different classification methods give the same results in most cases but occasionally disagree, especially for in-between programs that lie between memory-intensive and compute-intensive; such disagreements can adversely affect some optimization algorithms.
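To make the idea of threshold-based classification concrete, the following is a minimal sketch of how the four indicators named above might each label the same program, and how an in-between program can receive conflicting labels. It is an illustration only: the `Counters` fields, the `classify` and `is_consistent` helpers, the reading of MEM/Uop as the fraction of micro-operations that access memory, and all threshold values are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical hardware-counter totals for one program run."""
    instructions: int
    cycles: int
    llc_misses: int      # last-level cache misses
    mem_uops: int        # micro-ops that access memory (assumed meaning)
    total_uops: int
    flops: int           # floating-point operations
    dram_bytes: int      # bytes moved to/from DRAM


# Illustrative thresholds only; real studies tune these per platform.
IPC_THRESHOLD = 1.0
MPKI_THRESHOLD = 10.0
MEM_UOP_THRESHOLD = 0.4


def classify(c: Counters, peak_gflops: float, peak_bw_gbs: float) -> dict:
    """Label a program 'compute' or 'memory' under each indicator."""
    ipc = c.instructions / c.cycles
    mpki = c.llc_misses * 1000 / c.instructions
    mem_per_uop = c.mem_uops / c.total_uops
    intensity = c.flops / c.dram_bytes      # FLOPs per byte
    ridge = peak_gflops / peak_bw_gbs       # Roofline ridge point
    return {
        "IPC": "compute" if ipc >= IPC_THRESHOLD else "memory",
        "MPKI": "memory" if mpki >= MPKI_THRESHOLD else "compute",
        "MEM/Uop": "memory" if mem_per_uop >= MEM_UOP_THRESHOLD else "compute",
        "Roofline": "compute" if intensity >= ridge else "memory",
    }


def is_consistent(labels: dict) -> bool:
    """True when every indicator assigns the same label."""
    return len(set(labels.values())) == 1


if __name__ == "__main__":
    # A made-up in-between program: each metric sits near its threshold.
    sample = Counters(
        instructions=2_000_000, cycles=1_600_000, llc_misses=24_000,
        mem_uops=900_000, total_uops=2_500_000,
        flops=800_000, dram_bytes=1_600_000,
    )
    labels = classify(sample, peak_gflops=500.0, peak_bw_gbs=100.0)
    print(labels, "consistent:", is_consistent(labels))
```

With these made-up numbers the indicators split two against two, which is the kind of disagreement the abstract attributes to in-between programs; shifting any one threshold slightly changes which indicators agree.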
This work is supported in part by the Advanced Research Project of China under grant number 31511010203 and by the Research Program of NUDT under grant number ZK18-03-10.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Qi, X., Yuan, Y., Chen, J., Dong, Y. (2020). How to Evaluate Various Commonly Used Program Classification Methods?. In: Dong, D., Gong, X., Li, C., Li, D., Wu, J. (eds) Advanced Computer Architecture. ACA 2020. Communications in Computer and Information Science, vol 1256. Springer, Singapore. https://doi.org/10.1007/978-981-15-8135-9_17
DOI: https://doi.org/10.1007/978-981-15-8135-9_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8134-2
Online ISBN: 978-981-15-8135-9
eBook Packages: Computer Science, Computer Science (R0)