A Similarity Measure for GPU Kernel Subgraph Matching

  • Conference paper
Languages and Compilers for Parallel Computing (LCPC 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11882)

Abstract

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) operations in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across the CFGs of various kernels to gain insights into an application's resource requirements, based on the shape and traversal of the graph, the instruction operations executed, and the registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel control flow characteristics that help end users, autotuners, and compilers generate high-performing code.
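To make the idea of matching a control flow pattern against a kernel's CFG concrete, here is a minimal Python sketch. It is not CUDAflow's implementation or the paper's similarity measure: the toy graphs, the block names, the frequency counts, and the helper functions `find_subgraph` and `hot_fraction` are all hypothetical, and the brute-force search is only practical for very small graphs.

```python
from itertools import permutations

# Hypothetical toy CFGs: basic-block name -> list of successor blocks.
loop_pattern = {"head": ["body", "exit"], "body": ["head"], "exit": []}

kernel_cfg = {
    "entry": ["cond"],
    "cond": ["work", "done"],
    "work": ["cond"],
    "done": [],
}

# Hypothetical dynamic counts: how many times each basic block executed.
block_freq = {"entry": 1, "cond": 101, "work": 100, "done": 1}


def edges(cfg):
    """Return the set of directed edges (src, dst) in a CFG."""
    return {(src, dst) for src, succs in cfg.items() for dst in succs}


def find_subgraph(pattern, target):
    """Brute-force subgraph matching.

    Tries every injective assignment of pattern blocks to target blocks and
    returns the first one that preserves every pattern edge, or None.
    """
    p_nodes = list(pattern)
    p_edges = edges(pattern)
    t_edges = edges(target)
    for candidate in permutations(target, len(p_nodes)):
        mapping = dict(zip(p_nodes, candidate))
        if all((mapping[s], mapping[d]) in t_edges for s, d in p_edges):
            return mapping
    return None


def hot_fraction(mapping, freq):
    """Share of all dynamic block executions that fall inside the match."""
    return sum(freq[b] for b in mapping.values()) / sum(freq.values())


match = find_subgraph(loop_pattern, kernel_cfg)
if match is not None:
    print("loop pattern found at:", match)
    print("fraction of executions in match:", hot_fraction(match, block_freq))
```

In this toy case the loop pattern maps onto the `cond`/`work`/`done` blocks, and weighting the match by block frequency shows that nearly all dynamic executions fall inside it. A real tool would likely rely on a graph library and a pruned matching algorithm instead of enumerating permutations.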


Author information

Corresponding author

Correspondence to Robert Lim.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lim, R., Norris, B., Malony, A. (2019). A Similarity Measure for GPU Kernel Subgraph Matching. In: Hall, M., Sundar, H. (eds) Languages and Compilers for Parallel Computing. LCPC 2018. Lecture Notes in Computer Science, vol 11882. Springer, Cham. https://doi.org/10.1007/978-3-030-34627-0_3

  • DOI: https://doi.org/10.1007/978-3-030-34627-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34626-3

  • Online ISBN: 978-3-030-34627-0

  • eBook Packages: Computer Science, Computer Science (R0)
