Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

The Design and Implementation of a Verification Technique for GPU Kernels

Published: 22 May 2015 Publication History

Abstract

We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, delayed visibility (SDV)</i> semantics, which captures the execution of a GPU kernel by multiple groups of threads. The SDV semantics provides operational definitions for barrier divergence and for both inter- and intra-group data races. We build on the semantics to develop a method for reducing the task of verifying a massively parallel GPU kernel to that of verifying a sequential program. This completely avoids the need to reason about thread interleavings, and allows existing techniques for sequential program verification to be leveraged. We describe an efficient encoding of data race detection and propose a method for automatically inferring the loop invariants that are required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, that can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 162 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.

References

[1]
Jade Alglave, Mark Batty, Alastair F. Donaldson, Ganesh Gopalakrishnan, Jeroen Ketema, Daniel Poetzl, Tyler Sorensen, and John Wickerson. 2015. GPU concurrency: Weak behaviours and programming assumptions. In Proceedings of the 20th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’15). ACM, New York, 577--591.
[2]
AMD. 2013. OpenCL Programming Guide, Revision 2.7. Retrieved from http://developer.amd.com/wordpress/media/2013/07/AMD&lowbar;Accelerated&lowbar;Parallel&lowbar;Processing&lowbar;OpenCL&lowbar;Programming&lowbar;Guide-rev-2.7.pdf.
[3]
Ethel Bardsley, Adam Betts, Nathan Chong, Peter Collingbourne, Pantazis Deligiannis, Alastair F. Donaldson, Jeroen Ketema, Daniel Liew, and Shaz Qadeer. 2014a. Engineering a static verification tool for GPU kernels. In Proceedings of the 26th International Conference on Computer Aided Verification (CAV 2014). Lecture Notes in Computer Science, Vol. 8559. Springer, Berlin, 226--242.
[4]
Ethel Bardsley and Alastair F. Donaldson. 2014. Warps and atomics: Beyond barrier synchronization in the verification of GPU kernels. In Proceedings of the 6th NASA Formal Methods Symposium (NFM’14). Lecture Notes in Computer Science, Vol. 8430. Springer, Berlin, 230--245.
[5]
Ethel Bardsley, Alastair F. Donaldson, and John Wickerson. 2014b. KernelInterceptor: Automating GPU kernel verification by intercepting kernels and their parameters. In Proceedings of the 2014 International Workshop on OpenCL (IWOCL’14). ACM, New York, Article 7, 5 pages.
[6]
Michael Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. 2005. Boogie: A modular reusable verifier for object-oriented programs. In Revised Lectures of the 4th International Symposium on Formal Methods for Components and Objects (FMCO’05). Lecture Notes in Computer Science, Vol. 4111. Springer, Berlin, 364--387.
[7]
Clark Barrett, Christopher L. Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds, and Cesare Tinelli. 2011. CVC4. In Proceedings of the 23rd International Conference in Computer Aided Verification (CAV’11). Lecture Notes in Computer Science, Vol. 6806. Springer, Berlin, 171--177.
[8]
Al Bessey, Ken Block, Benjamin Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles-Henri Gros, Asya Kamsky, Scott McPeak, and Dawson R. Engler. 2010. A few billion lines of code later: Using static analysis to find bugs in the real world. Commun. ACM 53, 2 (2010), 66--75.
[9]
Adam Betts, Nathan Chong, Alastair F. Donaldson, Shaz Qadeer, and Paul Thomson. 2012. GPUVerify: A verifier for GPU kernels. In Proceedings of the 27th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’12). ACM, New York, 113--132.
[10]
Dirk Beyer, Thomas A. Henzinger, Rupak Majumdar, and Andrey Rybalchenko. 2007. Path invariants. In Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’07). ACM, New York, 300--309.
[11]
Stefan Blom, Marieke Huisman, and Matej Mihelčić. 2014. Specification and verification of GPGPU programs. Science of Computer Programming 95, 3 (2014), 376--388.
[12]
Michael Boyer, Kevin Skadron, and Westley Weimer. 2008. Automated dynamic analysis of CUDA programs. In Proceedings of the 3rd Workshop on Software Tools for MultiCore Systems (STMCS’08). Retrieved from http://people.csail.mit.edu/rabbah/conferences/08/cgo/stmcs/.
[13]
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. 2008. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI’08). USENIX Association, Berkeley, CA, 209--224.
[14]
Joshua E. Cates, Aaron E. Lefohn, and Ross T. Whitaker. 2004. GIST: An interactive, GPU-based level set segmentation tool for 3D medical images. Medical Image Analysis 8 (2004), 217--231. Issue 3.
[15]
Wei-Fan Chiang, Ganesh Gopalakrishnan, Guodong Li, and Zvonimir Rakamaric. 2013. Formal analysis of GPU programs with atomics via conflict-directed delay-bounding. In Proceedings of 5th International NASA Formal Methods Symposium (NFM’13). Lecture Notes in Computer Science, Vol. 7871. Springer, Berlin, 213--228.
[16]
Nathan Chong, Alastair F. Donaldson, Paul Kelly, Shaz Qadeer, and Jeroen Ketema. 2013. Barrier invariants: A shared state abstraction for the analysis of data-dependent GPU kernels. In Proceedings of the 28th ACM International Conference on Object Oriented Programming, Systems, Languages, and Applications (OOPSLA’13). ACM, New York, 605--622.
[17]
Nathan Chong, Alastair F. Donaldson, and Jeroen Ketema. 2014. A sound and complete abstraction for reasoning about parallel prefix sums. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’14). ACM, New York, 397--410.
[18]
Ching-Tsun Chou, Phanindra K. Mannava, and Seungjoon Park. 2004. A simple method for parameterized verification of cache coherence protocols. In Proceedings of the 5th International Conference on Formal Methods in Computer-Aided Design (FMCAD’04) Lecture Notes in Computer Science, Vol. 3312. Springer, Berlin, 382--398.
[19]
Peter Collingbourne, Cristian Cadar, and Paul H. J. Kelly. 2012. Symbolic testing of OpenCL code. In Revised Selected Papers of the 7th International Haifa Verification Conference (HVC’11) Lecture Notes in Computer Science, Vol. 7261. Springer, Berlin, 203--218.
[20]
Peter Collingbourne, Alastair F. Donaldson, Jeroen Ketema, and Shaz Qadeer. 2013. Interleaving and lock-step semantics for analysis and verification of GPU kernels. In Proceedings of the 22nd European Symposium on Programming (ESOP’13). Lecture Notes in Computer Science, Vol. 7792. Springer, Berlin, 270--289.
[21]
Patrick Cousot and Radhia Cousot. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL’77). ACM, New York, 238--252.
[22]
Leonardo Mendonça de Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’08). Lecture Notes in Computer Science, Vol. 4963. Springer, Berlin, 337--340.
[23]
Alastair F. Donaldson. 2014. The GPUVerify method: A tutorial overview. Electronic Communications of the EASST 70, Article 1 (2014), 16 pages. Retrieved from http://journal.ub.tu-berlin.de/eceasst/article/view/986.
[24]
Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2010. Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In Proceedings of the 16th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS’10). Lecture Notes in Computer Science, Vol. 6015. Springer, Berlin, 280--295.
[25]
Alastair F. Donaldson, Daniel Kroening, and Philipp Rümmer. 2011. Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39, 1 (2011), 83--113.
[26]
Michael Emmi, Shaz Qadeer, and Zvonimir Rakamaric. 2011. Delay-bounded scheduling. In Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2011). ACM, New York, 411--422.
[27]
Michael D. Ernst, Jeff H. Perkins, Philip J. Guo, Stephen McCamant, Carlos Pacheco, Matthew S. Tschantz, and Chen Xiao. 2007. The Daikon system for dynamic detection of likely invariants. Science of Computer Programming 69, 1--3 (2007), 35--45.
[28]
Cormac Flanagan and K. Rustan M. Leino. 2001. Houdini, an annotation assistant for ESC/Java. In Proceedings of the International Symposium of Formal Methods Europe (FME’01). Lecture Notes in Computer Science, Vol. 2021. Springer, Berlin, 500--517.
[29]
Susanne Graf and Hassen Saïdi. 1997. Construction of abstract state graphs with PVS. In Proceedings of the 9th International Conference on Computer Aided Verification (CAV’97). Lecture Notes in Computer Science, Vol. 1254. Springer, Berlin, 72--83.
[30]
Axel Habermaier and Alexander Knapp. 2012. On the correctness of the SIMT execution model of GPUs. In Proceedings of the 21st European Symposium on Programming (ESOP’12). Lecture Notes in Computer Science, Vol. 7211. Springer, Berlin, 316--335.
[31]
Mark J. Harris. 2004. Fast fluid dynamics simulation on the GPU. In GPU Gems. Addison-Wesley, Boston, MA, 637--665.
[32]
Temesghen Kahsai, Yeting Ge, and Cesare Tinelli. 2011. Instantiation-based invariant discovery. In Proceedings of 3rd International NASA Formal Methods Symposium (NFM’11). Lecture Notes in Computer Science, Vol. 6617. Springer, Berlin, 192--206.
[33]
Rajesh K. Karmani, P. Madhusudan, and Brandon M. Moore. 2011. Thread contracts for safe parallelism. In Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’11). ACM, New York, 125--134.
[34]
Khronos OpenCL Working Group. 2012. The OpenCL Specification, Version 1.2. Document revision 19. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-1.2.pdf.
[35]
Khronos OpenCL Working Group. 2014a. The OpenCL C Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-openclc.pdf.
[36]
Khronos OpenCL Working Group. 2014b. The OpenCL Extension Specification, Version 2.0. Document revision 26. Retrieved from https://www.khronos.org/registry/cl/specs/opencl-2.0-extensions.pdf.
[37]
Petr Klus, Simon Lam, Dag Lyberg, Ming Sin Cheung, Graham Pullan, Ian McFarlane, Giles S. H. Yeo, and Brian Y. H. Lam. 2012. BarraCUDA—A fast short read sequence aligner using graphics processing units. BMC Research Notes 5, Article 27 (2012), 7 pages.
[38]
Kensuke Kojima and Atsushi Igarashi. 2013. A Hoare logic for SIMT programs. In Proceeding of the 11th Asian Symposium on Programming Languages and Systems (APLAS’13). Lecture Notes in Computer Science, Vol. 8301. Springer, Berlin, 58--73.
[39]
Shuvendu K. Lahiri and Shaz Qadeer. 2009. Complexity and algorithms for monomial and clausal predicate abstraction. In Proceedings of the 22nd International Conference on Automated Deduction (CADE-22). Lecture Notes in Computer Science, Vol. 5663. Springer, Berlin, 214--229.
[40]
K. Rustan M. Leino, Greg Nelson, and James B. Saxe. 2000. ESC/Java User’s Manual. Technical Report SRC Technical Note 2000-002. Compaq Systems Research Center.
[41]
Alan Leung, Manish Gupta, Yuvraj Agarwal, Rajesh Gupta, Ranjit Jhala, and Sorin Lerner. 2012. Verifying GPU kernels by test amplification. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’12). ACM, New York, 383--394.
[42]
Guodong Li and Ganesh Gopalakrishnan. 2010. Scalable SMT-based verification of GPU kernel functions. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’10). ACM, New York, 187--196.
[43]
Guodong Li, Peng Li, Geoffrey Sawaya, Ganesh Gopalakrishnan, Indradeep Ghosh, and Sreeranga P. Rajan. 2012b. GKLEE: Concolic verification and test generation for GPUs. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP’12). ACM, New York, 215--224.
[44]
Heng Li and Richard Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 14 (2009), 1754--1760.
[45]
Peng Li, Guodong Li, and Ganesh Gopalakrishnan. 2012a. Parametric flows: Automated behavior equivalencing for symbolic analysis of races in CUDA programs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’12). IEEE, Los Alamitos, CA.
[46]
Anton Lokhmotov. 2011. Mobile and Embedded Computing on Mali GPUs. In Proceedings of the 2nd UK GPU Computing Conference.
[47]
Kenneth McMillan. 1999. Verification of infinite state systems by compositional model checking. In Proceedings of the 10th Conference on Correct Hardware Design and Verification Methods (CHARME’99). Lecture Notes in Computer Science, Vol. 1703. Springer, Berlin, 219--234.
[48]
Kenneth L. McMillan. 2006. Lazy abstraction with interpolants. In Proceedings of the 18th International Conference on Computer Aided Verification (CAV’06). Lecture Notes in Computer Science, Vol. 4144. Springer, Berlin, 123--136.
[49]
Microsoft Corporation. 2012. C&plus;&plus; AMP: Language and Programming Model, Version 1.0. Retrieved from http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf.
[50]
Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. 2002. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Lecture Notes in Computer Science, Vol. 2283. Springer, Berlin.
[51]
Nvidia. 2012a. CUDA C Programming Guide, Version 5.0. http://docs.nvidia.com/cuda/cuda-c-programming-guide/.
[52]
Nvidia. 2012b. Parallel Thread Execution ISA, Version 3.1.
[53]
Lars Nyland. 2012. Personal Communication.
[54]
Lars Nyland, Mark Harris, and Jan Prins. 2007. Fast N-Body Simulation with CUDA. Addison-Wesley, Upper Saddle River, NJ, 677--696.
[55]
Renato F. Salas-Moreno, Richard A. Newcombe, Hauke Strasdat, Paul H. J. Kelly, and Andrew J. Davison. 2013. SLAM&plus;&plus;: Simultaneous localisation and mapping at the level of objects. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13). IEEE, Los Alamitos, CA, 1352--1359.
[56]
Saurabh Srivastava and Sumit Gulwani. 2009. Program verification using templates over predicate abstraction. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’09). ACM, New York, 223--234.
[57]
Bjarne Steensgaard. 1996. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL’96). ACM, New York, 32--41.
[58]
Murali Talupur and Mark R. Tuttle. 2008. Going with the flow: Parameterized verification using message flows. In Proceedings of the 8th International Conference on Formal Methods in Computer-Aided Design (FMCAD’08). IEEE, Los Alamitos, CA.
[59]
Stavros Tripakis, Christos Stergiou, and Roberto Lublinerman. 2010. Checking equivalence of SPMD programs using non-interference. In Proceedings of the 2nd USENIX Workshop on Hot Topics in Parallelism (HotPar’10). Retrieved from http://static.usenix.org/events/hotpar10/.
[60]
Christian Urban and Julien Narboux. 2009. Formal SOS-proofs for the lambda-calculus. Electronic Notes in Theoretical Computer Science 247 (2009), 139--155.
[61]
John Wickerson. 2014. Syntax and semantics of a GPU kernel programming language. Archive of Formal Proofs. Retrieved from http://afp.sourceforge.net/entries/GPU_Kernel_PL.shtml.

Cited By

View all
  • (2024)Sound and Partially-Complete Static Analysis of Data-Races in GPU ProgramsProceedings of the ACM on Programming Languages10.1145/36897978:OOPSLA2(2434-2461)Online publication date: 8-Oct-2024
  • (2023)GPURepair: Automated Repair of GPU Kernels (Extended Version)Sādhanā10.1007/s12046-023-02291-049:1Online publication date: 22-Dec-2023
  • (2023)Memory access protocols: certified data-race freedom for GPU kernelsFormal Methods in System Design10.1007/s10703-023-00415-0Online publication date: 26-May-2023
  • Show More Cited By

Index Terms

  1. The Design and Implementation of a Verification Technique for GPU Kernels

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Transactions on Programming Languages and Systems
      ACM Transactions on Programming Languages and Systems  Volume 37, Issue 3
      June 2015
      134 pages
      ISSN:0164-0925
      EISSN:1558-4593
      DOI:10.1145/2785583
      Issue’s Table of Contents
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 22 May 2015
      Accepted: 01 March 2015
      Revised: 01 November 2014
      Received: 01 April 2014
      Published in TOPLAS Volume 37, Issue 3

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. GPUs
      2. Verification
      3. barrier synchronization
      4. concurrency
      5. data races

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • EU FP7 STREP project CARP (project number 287767)
      • EPSRC-funded PhD studentships
      • EPSRC project EP/G051100/2

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)136
      • Downloads (Last 6 weeks)16
      Reflects downloads up to 04 Oct 2024

      Other Metrics

      Citations

      Cited By

      View all
      • (2024)Sound and Partially-Complete Static Analysis of Data-Races in GPU ProgramsProceedings of the ACM on Programming Languages10.1145/36897978:OOPSLA2(2434-2461)Online publication date: 8-Oct-2024
      • (2023)GPURepair: Automated Repair of GPU Kernels (Extended Version)Sādhanā10.1007/s12046-023-02291-049:1Online publication date: 22-Dec-2023
      • (2023)Memory access protocols: certified data-race freedom for GPU kernelsFormal Methods in System Design10.1007/s10703-023-00415-0Online publication date: 26-May-2023
      • (2023)Model Checking Race-Freedom When “Sequential Consistency for Data-Race-Free Programs” is GuaranteedComputer Aided Verification10.1007/978-3-031-37703-7_13(265-287)Online publication date: 17-Jul-2023
      • (2022)Provable GPU Data-Races in Static Race DetectionElectronic Proceedings in Theoretical Computer Science10.4204/EPTCS.356.4356(36-45)Online publication date: 24-Mar-2022
      • (2022)Symbolic identification of shared memory based bank conflicts for GPUsJournal of Systems Architecture: the EUROMICRO Journal10.1016/j.sysarc.2022.102518127:COnline publication date: 1-Jun-2022
      • (2021)Specifying and testing GPU workgroup progress modelsProceedings of the ACM on Programming Languages10.1145/34855085:OOPSLA(1-30)Online publication date: 15-Oct-2021
      • (2021)iGUARDProceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles10.1145/3477132.3483545(49-65)Online publication date: 26-Oct-2021
      • (2021)Static detection of uncoalesced accesses in GPU programsFormal Methods in System Design10.1007/s10703-021-00362-860:1(1-32)Online publication date: 5-Mar-2021
      • (2021)Checking Data-Race Freedom of GPU Kernels, CompositionallyComputer Aided Verification10.1007/978-3-030-81685-8_19(403-426)Online publication date: 20-Jul-2021
      • Show More Cited By

      View Options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Get Access

      Login options

      Full Access

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media