Abstract
The influences of the operating system and system-specific effects on application performance are increasingly important considerations in high performance computing. OS kernel measurement is key to understanding these performance influences and the interrelationship of system-level and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel kernel performance measurement from both a kernel-wide and a process-centric perspective. The kernel-wide perspective characterizes aggregate kernel performance for the entire system. The process-centric perspective characterizes kernel performance when the kernel runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivations behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms that demonstrate the versatility of KTAU in integrated system/application monitoring.
Cite this article
Nataraj, A., Malony, A.D., Shende, S. et al. Integrated parallel performance views. Cluster Comput 11, 57–73 (2008). https://doi.org/10.1007/s10586-007-0051-6
DOI: https://doi.org/10.1007/s10586-007-0051-6