CUDA by Example: An Introduction to General-Purpose GPU Programming
July 2010
Publisher: Addison-Wesley Professional
ISBN: 978-0-13-138768-3
Published: 29 July 2010
Pages: 312
Abstract

"This book is required reading for anyone working with accelerator-based computing systems." (From the Foreword by Jack Dongarra, University of Tennessee and Oak Ridge National Laboratory)

CUDA is a computing architecture designed to facilitate the development of parallel programs. In conjunction with a comprehensive software platform, the CUDA Architecture enables programmers to draw on the immense power of graphics processing units (GPUs) when building high-performance applications. GPUs, of course, have long been available for demanding graphics and game applications. CUDA now brings this valuable resource to programmers working on applications in other domains, including science, engineering, and finance. No knowledge of graphics programming is required, just the ability to program in a modestly extended version of C.

CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through working examples. After a concise introduction to the CUDA platform and architecture, as well as a quick-start guide to CUDA C, the book details the techniques and trade-offs associated with each key CUDA feature. You'll discover when to use each CUDA C extension and how to write CUDA software that delivers truly outstanding performance.

Major topics covered include:

  • Parallel programming
  • Thread cooperation
  • Constant memory and events
  • Texture memory
  • Graphics interoperability
  • Atomics
  • Streams
  • CUDA C on multiple GPUs
  • Advanced atomics
  • Additional CUDA resources

All the CUDA software tools you'll need are freely available for download from NVIDIA: http://developer.nvidia.com/object/cuda-by-example.html

Cited By

  1. Schor A and Kim T Into the Portal: Directable Fractal Self-Similarity ACM SIGGRAPH 2024 Conference Papers, (1-8)
  2. Park J, Shin Y and Shin W Turbo-CF: Matrix Decomposition-Free Graph Filtering for Fast Recommendation Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, (2672-2676)
  3. Tong V, Dao C, Tran H, Tran T and Souihi S (2024). Enhancing BERT-Based Language Model for Multi-label Vulnerability Detection of Smart Contract in Blockchain, Journal of Network and Systems Management, 32:3, Online publication date: 1-Jul-2024.
  4. Cao H, Xu J, Li D, Shangguan L, Liu Y and Yang Z (2023). Edge Assisted Mobile Semantic Visual SLAM, IEEE Transactions on Mobile Computing, 22:12, (6985-6999), Online publication date: 1-Dec-2023.
  5. Zhao Z, Ling N, Guan N and Xing G Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, (97-110)
  6. Fumero J, Blanaru F, Stratikopoulos A, Dohrmann S, Viswanathan S and Kotselidis C Unified Shared Memory: Friend or Foe? Understanding the Implications of Unified Memory on Managed Heaps Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, (143-157)
  7. Hu Y, Zhang F, Xia Y, Yao Z, Zeng L, Ding H, Wei Z, Zhang X, Zhai J, Du X and Ma S (2023). Enabling Efficient Random Access to Hierarchically Compressed Text Data on Diverse GPU Platforms, IEEE Transactions on Parallel and Distributed Systems, 34:10, (2699-2717), Online publication date: 1-Oct-2023.
  8. Zeng D, Zhu A, Gu L, Li P, Chen Q and Guo M (2023). Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization, IEEE Transactions on Computers, 72:10, (2963-2977), Online publication date: 1-Oct-2023.
  9. Xu W, Sun Y, Fan S, Yu H and Fu X (2023). Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs, ACM Transactions on Architecture and Code Optimization, 20:3, (1-26), Online publication date: 30-Sep-2023.
  10. Du Bois A and Cavalheiro G GPotion: An embedded DSL for GPU programming in Elixir Proceedings of the XXVII Brazilian Symposium on Programming Languages, (1-8)
  11. Zhuo Y, Zhang T, Du F and Liu R (2023). A parallel particle swarm optimization algorithm based on GPU/CUDA, Applied Soft Computing, 144:C, Online publication date: 1-Sep-2023.
  12. Zhang B, Tian J, Di S, Yu X, Feng Y, Liang X, Tao D and Cappello F FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, (129-142)
  13. Li P, Zhu W, Chen J, Yao S, Hsu C and Xiong G (2023). High-speed implementation of rainbow table method on heterogeneous multi-device architecture, Future Generation Computer Systems, 143:C, (293-304), Online publication date: 1-Jun-2023.
  14. Hu Y A Convolutional Neural Network Acceleration Method Based on 1-D Fast Fourier Transform Proceedings of the 2023 4th International Conference on Computing, Networks and Internet of Things, (811-815)
  15. Sequeiros C, Otero-Muras I, Vázquez C and Banga J (2023). Global Optimization Approach for Parameter Estimation in Stochastic Dynamic Models of Biosystems, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 20:3, (1971-1982), Online publication date: 1-May-2023.
  16. Pan Z, Gao X and Wu K (2023). First-order topology optimization via inexact Finite Element Analysis, Computer-Aided Design, 157:C, Online publication date: 1-Apr-2023.
  17. Zhang F, Hu Y, Ding H, Yao Z, Wei Z, Zhang X and Du X Optimizing random access to hierarchically-compressed data on GPU Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-15)
  18. Lin F, Xu Y, Zhang Z, Gao C and Yamada K (2022). Cosmos Propagation Network, Neurocomputing, 507:C, (221-234), Online publication date: 1-Oct-2022.
  19. Khalilov M, Timofeev A and Polyakov D Towards OpenUCX and GPUDirect Technology Support for the Angara Interconnect Supercomputing, (591-603)
  20. Gungon R, Hernandez K, Cabarle F, de la Cruz R, Adorna H, Martínez-del-Amor M, Orellana-Martín D and Pérez-Hurtado I (2022). GPU implementation of evolving spiking neural P systems, Neurocomputing, 503:C, (140-161), Online publication date: 7-Sep-2022.
  21. Sibai F, El-Moursy A, Asaduzzaman A and Majzoub S (2021). Hardware Acceleration of the STRIKE String Kernel Algorithm for Estimating Protein to Protein Interactions, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 19:4, (2272-2283), Online publication date: 1-Jul-2022.
  22. Tusor B, Gubo S and Várkonyi-Kóczy A Fuzzy Inference Speed Enhancement for Low Budget Computers using Hash Indices 2022 IEEE International Symposium on Medical Measurements and Applications (MeMeA), (1-2)
  23. Stokfiszewski K, Wieloch K and Yatsymirskyy M (2022). An efficient implementation of one-dimensional discrete wavelet transform algorithms for GPU architectures, The Journal of Supercomputing, 78:9, (11539-11563), Online publication date: 1-Jun-2022.
  24. Heiden E, Denniston C, Millard D, Ramos F and Sukhatme G Probabilistic Inference of Simulation Parameters via Parallel Differentiable Simulation 2022 International Conference on Robotics and Automation (ICRA), (3638-3645)
  25. Van Gendt M, Besard T, Vandenberghe S and De Sutter B (2022). Productively accelerating positron emission tomography image reconstruction on graphics processing units with Julia, International Journal of High Performance Computing Applications, 36:3, (320-336), Online publication date: 1-May-2022.
  26. Zhang D, Huda S, Songhori E, Prabhu K, Le Q, Goldie A and Mirhoseini A A full-stack search technique for domain optimized deep learning accelerators Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (27-42)
  27. Li J, Agung M and Takizawa H Evaluating the Performance and Conformance of a SYCL Implementation for SX-Aurora TSUBASA Parallel and Distributed Computing, Applications and Technologies, (36-47)
  28. Everett M Neural Network Verification in Control 2021 60th IEEE Conference on Decision and Control (CDC), (6326-6340)
  29. Wei R, Zheng F, Gao L, Dong J, Fan G, Wan L, Lin J and Wang Y Heterogeneous-PAKE: Bridging the Gap between PAKE Protocols and Their Real-World Deployment Proceedings of the 37th Annual Computer Security Applications Conference, (76-90)
  30. Wang L, Yang L, Yu Y, Wang W, Li B, Sun X, He J and Zhang L Morphling Proceedings of the ACM Symposium on Cloud Computing, (639-653)
  31. Rozemberczki B, Scherer P, He Y, Panagopoulos G, Riedel A, Astefanoaei M, Kiss O, Beres F, López G, Collignon N and Sarkar R PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models Proceedings of the 30th ACM International Conference on Information & Knowledge Management, (4564-4573)
  32. Kim M, Mandrà S, Venturelli D and Jamieson K Physics-inspired heuristics for soft MIMO detection in 5G new radio and beyond Proceedings of the 27th Annual International Conference on Mobile Computing and Networking, (42-55)
  33. Tine B, Yalamarthy K, Elsabbagh F and Kim H Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, (754-766)
  34. Pessanha Santos N, Lobo V and Bernardino A (2021). Unscented Particle Filters with Refinement Steps for UAV Pose Tracking, Journal of Intelligent and Robotic Systems, 102:2, Online publication date: 1-Jun-2021.
  35. Strubytska I and Strubytskyi P (2021). Efficiency of Parallelization Using GPU in Discrete Dynamic Models Construction Process, SN Computer Science, 2:3, Online publication date: 1-May-2021.
  36. Bashir J and Sarangi S (2020). GPUOPT, ACM Journal on Emerging Technologies in Computing Systems, 17:1, (1-26), Online publication date: 31-Jan-2021.
  37. Ordóñez Á, Argüello F, Heras D and Demir B (2020). GPU-accelerated registration of hyperspectral images using KAZE features, The Journal of Supercomputing, 76:12, (9478-9492), Online publication date: 1-Dec-2020.
  38. Shapira G and Hassner T (2018). Fast and accurate line detection with GPU-based least median of squares, Journal of Real-Time Image Processing, 17:4, (839-851), Online publication date: 1-Aug-2020.
  39. de Doncker E, Yuasa F, Olagbemi O and Ishikawa T Large Scale Automatic Computations for Feynman Diagrams with up to Five Loops Computational Science and Its Applications – ICCSA 2020, (145-162)
  40. Abdelatti M and Sodhi M An improved GPU-accelerated heuristic technique applied to the capacitated vehicle routing problem Proceedings of the 2020 Genetic and Evolutionary Computation Conference, (663-671)
  41. D’Ambrosio R, Di Giovacchino S and Pera D Parallel Numerical Solution of a 2D Chemotaxis-Stokes System on GPUs Technology Computational Science – ICCS 2020, (59-72)
  42. Li J, Deng G, Zhang W, Zhang C, Wang F and Liu Y (2019). Realization of CUDA-based real-time multi-camera visual SLAM in embedded systems, Journal of Real-Time Image Processing, 17:3, (713-727), Online publication date: 1-Jun-2020.
  43. Chuang H, Lyerly R, Lankes S and Ravindran B Scaling Shared Memory Multiprocessing Applications in Non-cache-coherent Domains Proceedings of the 13th ACM International Systems and Storage Conference, (13-24)
  44. Kamath A, George A and Basu A ScoRD Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, (1036-1049)
  45. Porcù R, Miglio E, Parolini N, Penati M and Vergopolan N (2020). HPC simulations of brownout, International Journal of High Performance Computing Applications, 34:3, (267-281), Online publication date: 1-May-2020.
  46. Shen Q, Sharp C, Davison R, Ushaw G, Ranjan R, Zomaya A and Morgan G (2020). A general purpose contention manager for software transactions on the GPU, Journal of Parallel and Distributed Computing, 139:C, (1-17), Online publication date: 1-May-2020.
  47. Kim T, Lee H and Kang S (2020). GPU-Based Redundancy Analysis Using Concurrent Evaluation, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 28:3, (805-817), Online publication date: 1-Mar-2020.
  48. Rodriguez-Borbon J, Ma X, Roy-Chowdhury A and Najjar W (2020). Heterogeneous Acceleration of HAR Applications, IEEE Transactions on Circuits and Systems for Video Technology, 30:3, (888-902), Online publication date: 1-Mar-2020.
  49. Wei Y, You X, Yang H, Luan Z and Qian D Towards GPU Acceleration of Phonon Computation with ShengBTE Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (32-42)
  50. Jiang W, Ma Y, Liu B, Liu H, Zhou B, Zhu J, Wu S and Jin H (2019). Layup, ACM Transactions on Architecture and Code Optimization, 16:4, (1-23), Online publication date: 31-Dec-2020.
  51. Bohacek J, Kharicha A, Ludwig A, Wu M, Holzmann T and Karimi-Sibaki E (2019). A GPU solver for symmetric positive-definite matrices vs. traditional codes, Computers & Mathematics with Applications, 78:9, (2933-2943), Online publication date: 1-Nov-2019.
  52. Zhang E, Tafreshian A and Masoud N Parallel computing algorithm for real-time mapping between large-scale networks 2019 IEEE Intelligent Transportation Systems Conference (ITSC), (4087-4092)
  53. Zhang C, Yu M, Wang W and Yan F MArk Proceedings of the 2019 USENIX Conference on Usenix Annual Technical Conference, (1049-1062)
  54. Chen Z, Zhang H, Guo D, Jia S, Fang X, Huang Z, Wang Y, Hu P, Wen L, Chen L, Li Z and Xiong R Champion Team Paper: Dynamic Passing-Shooting Algorithm of the RoboCup Soccer SSL 2019 Champion RoboCup 2019: Robot World Cup XXIII, (479-490)
  55. Munk D, Kipouros T and Vio G (2019). Multi-physics bi-directional evolutionary topology optimization on GPU-architecture, Engineering with Computers, 35:3, (1059-1079), Online publication date: 1-Jul-2019.
  56. Nie J, Zhang C, Zou D, Xia F, Lu L, Wang X and Zhao F Adaptive Sparse Matrix-Vector Multiplication on CPU-GPU Heterogeneous Architecture Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, (6-10)
  57. Yamato Y, Noguchi H, Kataoka M and Isoda T Proposal of Environment Adaptive Software Proceedings of the 2nd International Conference on Control and Computer Vision, (102-108)
  58. Bak D, Mazurek P and Oszutowska–Mazurek D Optimization of Demodulation for Air–Gap Data Transmission Based on Backlight Modulation of Screen Computational Science – ICCS 2019, (71-80)
  59. Mazurek P and Krupinski R Monte Carlo Analysis of Local Cross–Correlation ST–TBD Algorithm Computational Science – ICCS 2019, (60-70)
  60. Ivanovsky L, Khryashchev V, Pavlov V and Ostrovskaya A Building Detection on Aerial Images Using U-NET Neural Networks Proceedings of the 24th Conference of Open Innovations Association FRUCT, (116-122)
  61. Ramroach S, Dhanoo A, Cockburn B and Joshi A CUDA Optimized Neural Network Predicts Blood Glucose Control from Quantified Joint Mobility and Anthropometrics Proceedings of the 2019 3rd International Conference on Information System and Data Mining, (32-36)
  62. Vineyard C, Dellana R, Aimone J, Rothganger F and Severa W Low-Power Deep Learning Inference using the SpiNNaker Neuromorphic Platform Proceedings of the 7th Annual Neuro-inspired Computational Elements Workshop, (1-7)
  63. Llanos D and Vigo-Aguiar J (2019). Computational and mathematical models meet heterogeneous computing, The Journal of Supercomputing, 75:3, (999-1000), Online publication date: 1-Mar-2019.
  64. Terenin A, Dong S and Draper D (2019). GPU-accelerated Gibbs sampling, Statistics and Computing, 29:2, (301-310), Online publication date: 1-Mar-2019.
  65. Jararweh Y, Al-Ayyoub M, Fakirah M, Alawneh L and Gupta B (2019). Improving the performance of the needleman-wunsch algorithm using parallelization and vectorization techniques, Multimedia Tools and Applications, 78:4, (3961-3977), Online publication date: 1-Feb-2019.
  66. Zhang Z, Sun Y, Xie H, Teng Y and Wang J (2019). GMMA, Applied Intelligence, 49:1, (63-78), Online publication date: 1-Jan-2019.
  67. Manglayev T, Kizilirmak R, Kho Y and Hamid N (2018). GPU Accelerated Successive Interference Cancellation for NOMA Uplink with User Clustering, Wireless Personal Communications: An International Journal, 103:3, (2391-2400), Online publication date: 1-Dec-2018.
  68. Poon L GPU-Accelerated Clique Tree Propagation for Pouch Latent Tree Models Network and Parallel Computing, (90-102)
  69. Khryashchev V, Ivanovsky L, Pavlov V, Ostrovskaya A and Rubtsov A Comparison of Different Convolutional Neural Network Architectures for Satellite Image Segmentation Proceedings of the 23rd Conference of Open Innovations Association FRUCT, (172-179)
  70. He L, Bai H, Jiang Y, Ouyang D and Jiang S (2018). Revised simplex algorithm for linear programming on GPUs with CUDA, Multimedia Tools and Applications, 77:22, (30035-30050), Online publication date: 1-Nov-2018.
  71. Hernández D, Olague G, Hernández B and Clemente E (2018). CUDA-based parallelization of a bio-inspired model for fast object classification, Neural Computing and Applications, 30:10, (3007-3018), Online publication date: 1-Nov-2018.
  72. Zhou H, Ni F, Zhao L and Zheng F Parallel Optimization of Relion Proceedings of the 2nd International Conference on Algorithms, Computing and Systems, (48-52)
  73. León-Sandoval E and Barbosa-Santillán L Data intensive parallel tree algorithm patterns based on GPUs Proceedings of the 2018 International Conference on Data Science and Information Technology, (69-73)
  74. Reyes E, Gómez C, Norambuena E and Ruiz-del-Solar J Near Real-Time Object Recognition for Pepper Based on Deep Neural Networks Running on a Backpack RoboCup 2018: Robot World Cup XXII, (287-298)
  75. Hu Z, Shi J, Huang Y, Xiong J and Bu X GANFuzz Proceedings of the 15th ACM International Conference on Computing Frontiers, (138-145)
  76. Ben Boudaoud L, Solaiman B and Tari A (2018). A modified ZS thinning algorithm by a hybrid approach, The Visual Computer: International Journal of Computer Graphics, 34:5, (689-706), Online publication date: 1-May-2018.
  77. Yamato Y (2018). Server Selection, Configuration and Reconfiguration Technology for IaaS Cloud with Multiple Server Types, Journal of Network and Systems Management, 26:2, (339-360), Online publication date: 1-Apr-2018.
  78. Jiang B (2018). Real-time multi-resolution edge detection with pattern analysis on graphics processing unit, Journal of Real-Time Image Processing, 14:2, (293-321), Online publication date: 1-Feb-2018.
  79. Fukutomi D, Iida Y, Azumi T, Kato S and Nishio N GPUhd Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, (127-136)
  80. Klionskiy D, Kaplun D and Geppener V (2018). Empirical Mode Decomposition for Signal Preprocessing and Classification of Intrinsic Mode Functions, Pattern Recognition and Image Analysis, 28:1, (122-132), Online publication date: 1-Jan-2018.
  81. Fioretto F, Pontelli E, Yeoh W and Dechter R (2018). Accelerating exact and approximate inference for (distributed) discrete optimization with GPUs, Constraints, 23:1, (1-43), Online publication date: 1-Jan-2018.
  82. Yang X and Zhu P (2017). Stochastic seismic inversion based on an improved local gradual deformation method, Computers & Geosciences, 109:C, (75-86), Online publication date: 1-Dec-2017.
  83. Lan G, Shen Y, Chen T and Zhu H (2017). Parallel implementations of structural similarity based no-reference image quality assessment, Advances in Engineering Software, 114:C, (372-379), Online publication date: 1-Dec-2017.
  84. Jarząbek Ł and Czarnul P (2017). Performance evaluation of unified memory and dynamic parallelism for selected parallel CUDA applications, The Journal of Supercomputing, 73:12, (5378-5401), Online publication date: 1-Dec-2017.
  85. Gocławski J, Sekulska-Nalewajko J, Korzeniewska E and Piekarska A (2017). The use of optical coherence tomography for the evaluation of textural changes of grapes exposed to pulsed electric field, Computers and Electronics in Agriculture, 142:PA, (29-40), Online publication date: 1-Nov-2017.
  86. Abdolrashidi A, Tripathy D, Belviranli M, Bhuyan L and Wong D Wireframe Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, (600-611)
  87. Güler Z, Özkaynak F and Çınar A CUDA Implementation of DES Algorithm for Lightweight Platforms Proceedings of the 2017 International Conference on Biomedical Engineering and Bioinformatics, (49-52)
  88. Ota K, Dao M, Mezaris V and Natale F (2017). Deep Learning for Mobile Multimedia, ACM Transactions on Multimedia Computing, Communications, and Applications, 13:3s, (1-22), Online publication date: 31-Aug-2017.
  89. Eslami T, Awan M and Saeed F GPU-PCC Proceedings of the 8th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, (723-728)
  90. Wang Q, Chen D, Li S, Wu Q and Zhang Q (2017). An adaptive cartoon-like stylization for color video in real time, Multimedia Tools and Applications, 76:15, (16767-16782), Online publication date: 1-Aug-2017.
  91. Szőke M, Józsa T, Koleszár Á, Moulitsas I and Könözsy L (2017). Performance Evaluation of a Two-Dimensional Lattice Boltzmann Solver Using CUDA and PGAS UPC Based Parallelisation, ACM Transactions on Mathematical Software, 44:1, (1-22), Online publication date: 24-Jul-2017.
  92. Klionskiy D, Kaplun D, Kupriyanov M, Dorokhov A, Geppener V and Golubkov A (2017). Vibrational and hydroacoustic signal processing in the frequency domain and its software-hardware implementation, Pattern Recognition and Image Analysis, 27:3, (588-598), Online publication date: 1-Jul-2017.
  93. Aissa M, Verstraete T and Vuik C (2017). Toward a GPU-aware comparison of explicit and implicit CFD simulations on structured meshes, Computers & Mathematics with Applications, 74:1, (201-217), Online publication date: 1-Jul-2017.
  94. Wang H, Zhang N and Créput J (2017). A massively parallel neural network approach to large-scale Euclidean traveling salesman problems, Neurocomputing, 240:C, (137-151), Online publication date: 31-May-2017.
  95. Méndez-Jiménez I, Cárdenas-Montes M, Rodríguez-Vázquez J, Sevilla-Noarbe I, Álvaro E, Alonso D and Vega-Rodríguez M An Accuracy-Aware Implementation of Two-Point Three-Dimensional Correlation Function using Bin-Recycling Strategy on GPU Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, (913-920)
  96. Kruliš M, Osipyan H and Marchand-Maillet S (2017). Employing GPU architectures for permutation-based indexing, Multimedia Tools and Applications, 76:9, (11859-11887), Online publication date: 1-May-2017.
  97. Nikolaos T, Georgopoulos K and Papaefstathiou Y A novel way to efficiently simulate complex full systems incorporating hardware accelerators Proceedings of the Conference on Design, Automation & Test in Europe, (658-661)
  98. Filippone S, Cardellini V, Barbieri D and Fanfarillo A (2017). Sparse Matrix-Vector Multiplication on GPGPUs, ACM Transactions on Mathematical Software, 43:4, (1-49), Online publication date: 23-Mar-2017.
  99. Sato H and Usami T SkyCube-tree based query processing in OLAP skyline cubes Proceedings of the 11th International Conference on Ubiquitous Information Management and Communication, (1-8)
  100. Hu D, Shen Q, Zhou S, Liu X, Fan Y, Wang L and Qian Z (2017). Adaptive Steganalysis Based on Selection Region and Combined Convolutional Neural Networks, Security and Communication Networks, 2017, Online publication date: 1-Jan-2017.
  101. Martinasso M, Kwasniewski G, Alam S, Schulthess T and Hoefler T A PCIe congestion-aware performance model for densely populated accelerator servers Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  102. Kang S, Kim S, Won J and Kang Y (2016). GPU-based parallel genetic approach to large-scale travelling salesman problem, The Journal of Supercomputing, 72:11, (4399-4414), Online publication date: 1-Nov-2016.
  103. Varvello M, Laufer R, Zhang F and Lakshman T (2016). Multilayer Packet Classification With Graphics Processing Units, IEEE/ACM Transactions on Networking, 24:5, (2728-2741), Online publication date: 1-Oct-2016.
  104. Torun M, Yilmaz O and Akansu A (2016). FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis, Journal of Parallel and Distributed Computing, 96:C, (172-180), Online publication date: 1-Oct-2016.
  105. Chang C and Kehtarnavaz N (2016). Computationally efficient image deblurring using low rank image approximation and its GPU implementation, Journal of Real-Time Image Processing, 12:3, (567-573), Online publication date: 1-Oct-2016.
  106. Sorensen T and Donaldson A (2016). Exposing errors related to weak memory in GPU applications, ACM SIGPLAN Notices, 51:6, (100-113), Online publication date: 1-Aug-2016.
  107. Breß S, Funke H and Teubner J Robust Query Processing in Co-Processor-accelerated Databases Proceedings of the 2016 International Conference on Management of Data, (1891-1906)
  108. Shahvarani A and Jacobsen H A Hybrid B+-tree as Solution for In-Memory Indexing on CPU-GPU Heterogeneous Computing Platforms Proceedings of the 2016 International Conference on Management of Data, (1523-1538)
  109. Sorensen T and Donaldson A Exposing errors related to weak memory in GPU applications Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, (100-113)
  110. Zimmer A and Ghuman P (2016). CUDA Optimization of Non-local Means Extended to Wrapped Gaussian Distributions for Interferometric Phase Denoising, Procedia Computer Science, 80:C, (166-177), Online publication date: 1-Jun-2016.
  111. Dai Y, Fang Y, Yang L and Jeon G (2016). Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC, The Journal of Supercomputing, 72:6, (2351-2375), Online publication date: 1-Jun-2016.
  112. Liu Y, Zheng H, Zhao R and Jian L (2016). Design and evaluation of multi-GPU enabled Multiple Symbol Detection algorithm, The Journal of Supercomputing, 72:6, (2111-2131), Online publication date: 1-Jun-2016.
  113. Xu Y, Gao L, Wang R, Luan Z, Wu W and Qian D Lock-based synchronization for GPU architectures Proceedings of the ACM International Conference on Computing Frontiers, (205-213)
  114. Le T, Fioretto F, Yeoh W, Son T and Pontelli E ER-DCOPs Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, (606-614)
  115. Wang H, Zhang N, Créput J, Ruichek Y and Moreau J (2016). Massively parallel GPU computing for fast stereo correspondence algorithms, Journal of Systems Architecture: the EUROMICRO Journal, 65:C, (46-58), Online publication date: 1-Apr-2016.
  116. Mansouri F, Huet S and Houzet D (2016). A domain-specific high-level programming model, Concurrency and Computation: Practice & Experience, 28:3, (750-767), Online publication date: 10-Mar-2016.
  117. Wang F, Zhu L and Zhang C SSSP on GPU Without Atomic Operation Revised Selected Papers of the Second International Conference on Human Centered Computing - Volume 9567, (409-419)
  118. Ma Y, Chen L, Liu P and Lu K (2016). Parallel programing templates for remote sensing image processing on GPU architectures, Computing, 98:1-2, (7-33), Online publication date: 1-Jan-2016.
  119. Sengupta P, Nguyen J, Kwan J, Menon P, Heien E and Rundle J (2015). Accelerating earthquake simulations on general-purpose graphics processors, Concurrency and Computation: Practice & Experience, 27:17, (5460-5471), Online publication date: 10-Dec-2015.
  120. Khorasani F, Gupta R and Bhuyan L Efficient warp execution in presence of divergence with collaborative context collection Proceedings of the 48th International Symposium on Microarchitecture, (204-215)
  121. Harvey P, Hentschel K and Sventek J Parallel Programming in Actor-Based Applications via OpenCL Proceedings of the 16th Annual Middleware Conference, (162-172)
  122. Siegel S, Zheng M, Luo Z, Zirkel T, Marianiello A, Edenhofner J, Dwyer M and Rogers M CIVL Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-12)
  123. Li A, van den Braak G, Kumar A and Corporaal H Adaptive and transparent cache bypassing for GPUs Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (1-12)
  124. Nose H, Nakajyo S, Suzuki H and Fuwa Y Development and Evaluation of a High-Speed Simulator for Wireless Sensor Network Protocols using GPGPU Proceedings of the 12th ACM Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks, (109-115)
  125. Agosta G, Barenghi A, Di Federico A and Pelosi G (2015). OpenCL performance portability for general-purpose computation on graphics processor units, Concurrency and Computation: Practice & Experience, 27:14, (3633-3660), Online publication date: 25-Sep-2015.
  126. De Floriani L (2015). Editor's Note, IEEE Transactions on Visualization and Computer Graphics, 21:9, (994-995), Online publication date: 1-Sep-2015.
  127. Wang H, Zhang N, Créput J, Moreau J and Ruichek Y (2015). Parallel Structured Mesh Generation with Disparity Maps by GPU Implementation, IEEE Transactions on Visualization and Computer Graphics, 21:9, (1045-1057), Online publication date: 1-Sep-2015.
  128. Leonenko V, Pertsev N and Artzrouni M (2015). Using High Performance Algorithms for the Hybrid Simulation of Disease Dynamics on CPU and GPU, Procedia Computer Science, 51:C, (150-159), Online publication date: 1-Sep-2015.
  129. Borisenko A, Haidl M and Gorlatch S Parallelizing Branch-and-Bound on GPUs for Optimization of Multiproduct Batch Plants Proceedings of the 13th International Conference on Parallel Computing Technologies - Volume 9251, (324-337)
  130. Kim T Quaternion Julia set shape optimization Proceedings of the Eurographics Symposium on Geometry Processing, (167-176)
  131. Loh W and Yu H (2015). Fast density-based clustering through dataset partition using graphics processing units, Information Sciences: an International Journal, 308:C, (94-112), Online publication date: 1-Jul-2015.
  132. Sabne A, Sakdhnagool P and Eigenmann R HeteroDoop Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, (235-246)
  133. Li A, van den Braak G, Corporaal H and Kumar A Fine-Grained Synchronizations and Dataflow Programming on GPUs Proceedings of the 29th ACM on International Conference on Supercomputing, (109-118)
  134. Alglave J, Batty M, Donaldson A, Gopalakrishnan G, Ketema J, Poetzl D, Sorensen T and Wickerson J (2015). GPU Concurrency, ACM SIGARCH Computer Architecture News, 43:1, (577-591), Online publication date: 29-May-2015.
  135. Reza H, Aguilar M and Jalal S Regression testing of GPU/MIC systems for HPCC Proceedings of the 2015 International Workshop on Software Engineering for High Performance Computing in Science, (30-37)
  136. Alglave J, Batty M, Donaldson A, Gopalakrishnan G, Ketema J, Poetzl D, Sorensen T and Wickerson J (2015). GPU Concurrency, ACM SIGPLAN Notices, 50:4, (577-591), Online publication date: 12-May-2015.
  137. Camier J Improving Performance Portability and Exascale Software Productivity with the ∇ Numerical Programming Language Proceedings of the 3rd International Conference on Exascale Applications and Software, (126-131)
  138. Grigorian B and Reinman G (2015). Accelerating Divergent Applications on SIMD Architectures Using Neural Networks, ACM Transactions on Architecture and Code Optimization, 12:1, (1-23), Online publication date: 16-Apr-2015.
  139. Rodrigues A, Jorge A and Dutra I Accelerating recommender systems using GPUs Proceedings of the 30th Annual ACM Symposium on Applied Computing, (879-884)
  140. Alglave J, Batty M, Donaldson A, Gopalakrishnan G, Ketema J, Poetzl D, Sorensen T and Wickerson J GPU Concurrency Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, (577-591)
  141. Islam M, Kim C and Kim J (2015). A GPU-based (8, 4) Hamming decoder for secure transmission of watermarked medical images, Cluster Computing, 18:1, (333-341), Online publication date: 1-Mar-2015.
  142. Islam M and Kim J (2015). GPU-based fast error recovery for high speed data communication in media technology, Cluster Computing, 18:1, (93-101), Online publication date: 1-Mar-2015.
  143. Iuspa L, Fusco P and Ruocco E (2015). An improved GPU-oriented algorithm for elastostatic analysis with boundary element method, Computers and Structures, 146:C, (105-116), Online publication date: 1-Jan-2015.
  144. Abdullah T, Anjum A, Tariq M, Baltaci Y and Antonopoulos N Traffic Monitoring Using Video Analytics in Clouds Proceedings of the 2014 IEEE/ACM 7th International Conference on Utility and Cloud Computing, (39-48)
  145. Varvello M, Laufer R, Zhang F and Lakshman T Multi-Layer Packet Classification with Graphics Processing Units Proceedings of the 10th ACM International on Conference on emerging Networking Experiments and Technologies, (109-120)
  146. Priimak D (2014). Finite difference numerical method for the superlattice Boltzmann transport equation and case comparison of CPU(C) and GPU(CUDA) implementations, Journal of Computational Physics, 278:C, (182-192), Online publication date: 1-Dec-2014.
  147. Taskov B Optimizing large scale CUDA applications using input data specific optimizations Proceedings of the 11th European Conference on Visual Media Production, (1-6)
  148. De Rosis A, Barbaresi A, Torreggiani D, Benni S and Tassinari P (2014). Numerical simulations of the airflows in a wine-aging room, Computers and Electronics in Agriculture, 109:C, (261-270), Online publication date: 1-Nov-2014.
  149. Brædstrup C, Damsgaard A and Egholm D (2014). Ice-sheet modelling accelerated by graphics cards, Computers & Geosciences, 72:C, (210-220), Online publication date: 1-Nov-2014.
  150. Bellekens X, Tachtatzis C, Atkinson R, Renfrew C and Kirkham T A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems Proceedings of the 7th International Conference on Security of Information and Networks, (302-309)
  151. Breß S, Siegmund N, Heimel M, Saecker M, Lauer T, Bellatreche L and Saake G (2014). Load-aware inter-co-processor parallelism in database query processing, Data & Knowledge Engineering, 93:C, (60-79), Online publication date: 1-Sep-2014.
  152. Dai Y, He D, Fang Y and Yang L (2014). Accelerating 2D orthogonal matching pursuit algorithm on GPU, The Journal of Supercomputing, 69:3, (1363-1381), Online publication date: 1-Sep-2014.
  153. Mansouri F, Huet S and Houzet D A Visual Programming Model to Implement Coarse-Grained DSP Applications on Parallel and Heterogeneous Clusters Revised Selected Papers, Part I, of the Euro-Par 2014 International Workshops on Parallel Processing - Volume 8805, (141-152)
  154. Raba N and Stankova E Two Parallel Algorithms for Effective Calculation of the Precipitation Particle Spectra in Elaborated Numerical Models of Convective Clouds Proceedings of the 14th International Conference on Computational Science and Its Applications — ICCSA 2014 - Volume 8584, (289-299)
  155. Amighi A, Blom S, Darabi S, Huisman M, Mostowski W and Zaharieva-Stojanovski M Verification of Concurrent Systems with VerCors Advanced Lectures of the 14th International School on Formal Methods for Executable Software Models - Volume 8483, (172-216)
  156. Gilad E, Mackay E, Oskin M and Etsion Y O-structures Proceedings of the workshop on Memory Systems Performance and Correctness, (1-8)
  157. Llopard I, Cohen A, Fabre C and Hili N A parallel action language for embedded applications and its compilation flow Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, (118-127)
  158. Harvey P, Hameed S and Vanderbauwhede W Accelerating Lagrangian particle dispersion in the atmosphere with OpenCL across multiple platforms Proceedings of the International Workshop on OpenCL 2013 & 2014, (1-8)
  159. Xu Y, Wang R, Goswami N, Li T, Gao L and Qian D Software Transactional Memory for GPU Architectures Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (1-10)
  161. Campeotto F, Palù A, Dovier A, Fioretto F and Pontelli E Exploring the Use of GPUs in Constraint Solving Proceedings of the 16th International Symposium on Practical Aspects of Declarative Languages - Volume 8324, (152-167)
  162. Czarnul P and Rościszewski P Optimization of Execution Time under Power Consumption Constraints in a Heterogeneous Parallel System with GPUs and CPUs Proceedings of the 15th International Conference on Distributed Computing and Networking - Volume 8314, (66-80)
  163. Niemeyer K and Sung C (2014). Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs, Journal of Computational Physics, 256:C, (854-871), Online publication date: 1-Jan-2014.
  164. Miramontes-Jaramillo D, Kober V and Díaz-Ramírez V CWMA Proceedings, Part I, of the 18th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications - Volume 8258, (439-446)
  165. Prasad S, Shekhar S, McDermott M, Zhou X, Evans M and Puri S GPGPU-accelerated interesting interval discovery and other computations on GeoSpatial datasets Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, (65-72)
  166. Zheng R, Liu K, Jin H, Zhang Q and Feng X Accelerate MapReduce on GPUs with multi-level reduction Proceedings of the 5th Asia-Pacific Symposium on Internetware, (1-8)
  167. Gastineau M and Laskar J Highly Scalable Multiplication for Distributed Sparse Multivariate Polynomials on Many-Core Systems Proceedings of the 15th International Workshop on Computer Algebra in Scientific Computing - Volume 8136, (100-115)
  168. Breß S, Siegmund N, Bellatreche L and Saake G An Operator-Stream-Based Scheduling Engine for Effective GPU Coprocessing Proceedings of the 17th East European Conference on Advances in Databases and Information Systems - Volume 8133, (288-301)
  169. Fernández J, Ferreiro A, García-Rodríguez J, Leitao A, López-Salas J and Vázquez C (2013). Original article, Mathematics and Computers in Simulation, 94, (55-75), Online publication date: 1-Aug-2013.
  170. Castillo D, Ferreiro A, García-Rodríguez J and Vázquez C (2013). Numerical methods to solve PDE models for pricing business companies in different regimes and implementation in GPUs, Applied Mathematics and Computation, 219:24, (11233-11257), Online publication date: 1-Aug-2013.
  171. Maffioletti F, Reffato R, Farinelli A, Kleiner A, Ramchurn S and Shi B RMASBench Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1383-1384)
  172. Kleiner A, Farinelli A, Ramchurn S, Shi B, Maffioletti F and Reffato R RMASBench Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, (1195-1196)
  173. Perumalla K and Protopopescu V (2013). Reversible simulations of elastic collisions, ACM Transactions on Modeling and Computer Simulation, 23:2, (1-25), Online publication date: 1-May-2013.
  174. Dai Y, Fang Y, He D and Huang B (2013). Parallel design for error-resilient entropy coding algorithm on GPU, Journal of Parallel and Distributed Computing, 73:4, (411-419), Online publication date: 1-Apr-2013.
  175. Elbayoumi M, Hsiao M and ElNainay M A novel concurrent cache-friendly binary decision diagram construction for multi-core platforms Proceedings of the Conference on Design, Automation and Test in Europe, (1427-1430)
  176. Capuzzo-Dolcetta R, Spera M and Punzo D (2013). A fully parallel, high precision, N-body code running on hybrid computing platforms, Journal of Computational Physics, 236, (580-593), Online publication date: 1-Mar-2013.
  177. Lipscomb T, Zou A and Cho S Parallel verlet neighbor list algorithm for GPU-optimized MD simulations Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine, (321-328)
  178. Krömer P, Corchado E, Snášel V, Platoš J and García-Hernández L Neural PCA and maximum likelihood hebbian learning on the GPU Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part II, (132-139)
  179. Hulette G, Sottile M and Malony A A type-based approach to separating protocol from application logic Proceedings of the 18th international conference on Parallel Processing, (40-51)
  180. Calazan R, Nedjah N and de Macedo Mourelle L Swarm grid Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, (148-160)
  181. Beck P and Nehmeier M Parallel interval newton method on CUDA Proceedings of the 11th international conference on Applied Parallel and Scientific Computing, (454-464)
  182. Aggarwal V, Stitt G, George A and Yoon C (2012). SCF, ACM Transactions on Reconfigurable Technology and Systems, 5:2, (1-23), Online publication date: 1-Jun-2012.
  183. Rao V, Agrawal N and Maity S C-DAC's efforts Proceedings of the ATIP/A*CRC Workshop on Accelerator Technologies for High-Performance Computing: Does Asia Lead the Way?, (1-4)
  184. Burkitt M, Walker D, Romano D and Fazeli A (2012). Constructing Complex 3D Biological Environments from Medical Imaging Using High Performance Computing, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9:3, (643-654), Online publication date: 1-May-2012.
  185. Jaros J and Pospichal P A fair comparison of modern CPUs and GPUs running the genetic algorithm under the knapsack benchmark Proceedings of the 2012 European conference on Applications of Evolutionary Computation, (426-435)
  186. Keenan M, Komarov I, D'Souza R and Riolo R Novel graphics processing unit-based parallel algorithms for understanding species diversity in forests Proceedings of the 2012 Symposium on High Performance Computing, (1-9)
  187. Stromme A, Carlson R and Newhall T Chestnut Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (156-167)
  188. Jaros J, Treeby B and Rendell A Use of multiple GPUs on shared memory multiprocessors for ultrasound propagation simulations Proceedings of the Tenth Australasian Symposium on Parallel and Distributed Computing - Volume 127, (43-52)
  189. Kawanami K and Fujimoto N GPU accelerated computation of the longest common subsequence Facing the Multicore-Challenge II, (84-95)
  190. Ivanov L (2012). The right balance, Journal of Computing Sciences in Colleges, 27:3, (115-121), Online publication date: 1-Jan-2012.
  191. Bartocci E, Cherry E, Glimm J, Grosu R, Smolka S and Fenton F Toward real-time simulation of cardiac dynamics Proceedings of the 9th International Conference on Computational Methods in Systems Biology, (103-112)
  192. Catanzaro B, Garland M and Keutzer K (2011). Copperhead, ACM SIGPLAN Notices, 46:8, (47-56), Online publication date: 7-Sep-2011.
  193. Filelis-Papadopoulos C, Gravvanis G, Matskanidis P and Giannoutakis K (2011). On the GPGPU parallelization issues of finite element approximate inverse preconditioning, Journal of Computational and Applied Mathematics, 236:3, (294-307), Online publication date: 1-Sep-2011.
  194. Chentanez N and Müller M Real-time Eulerian water simulation using a restricted tall cell grid ACM SIGGRAPH 2011 papers, (1-10)
  195. Jung Y, Graf H, Behr J and Kuijper A Mesh deformations in X3D via CUDA with freeform deformation lattices Proceedings of the 2011 international conference on Virtual and mixed reality: systems and applications - Volume Part II, (343-351)
  196. Chentanez N and Müller M (2011). Real-time Eulerian water simulation using a restricted tall cell grid, ACM Transactions on Graphics, 30:4, (1-10), Online publication date: 1-Jul-2011.
  197. Cárdenas-Montes M, Vega-Rodríguez M, Rodríguez-Vázquez J and Gómez-Iglesias A Effect of the block occupancy in GPGPU over the performance of particle swarm algorithm Proceedings of the 10th international conference on Adaptive and natural computing algorithms - Volume Part I, (310-319)
  198. Thall A Fast Mersenne prime testing on the GPU Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, (1-8)
  199. Catanzaro B, Garland M and Keutzer K Copperhead Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, (47-56)
  200. Garland M and Kirk D (2010). Understanding throughput-oriented architectures, Communications of the ACM, 53:11, (58-66), Online publication date: 1-Nov-2010.
  201. Lin C, Chu W, Chang C, Liao H, Yang C, Lee J, You Y and Hsieh T The Support of MISRA C++ Analyzer for Reliability of Embedded Systems, ACM Transactions on Cyber-Physical Systems, 0:0