- article, February 2018
Out-of-core implementation for accelerator kernels on heterogeneous clouds
The Journal of Supercomputing (JSCO), Volume 74, Issue 2, Pages 551–568, https://doi.org/10.1007/s11227-017-2141-4
Cloud environments today are increasingly featuring hybrid nodes containing multicore CPU processors and a diverse mix of accelerators such as Graphics Processing Units (GPUs), Intel Xeon Phi co-processors, and Field-Programmable Gate Arrays (FPGAs) to ...
- article, August 2017
GPU parallelization of the sequential matrix diagonalization algorithm and its application to high-dimensional data
The Journal of Supercomputing (JSCO), Volume 73, Issue 8, Pages 3603–3634, https://doi.org/10.1007/s11227-017-1961-6
This paper presents the parallelization on a GPU of the sequential matrix diagonalization (SMD) algorithm, a method for diagonalizing polynomial covariance matrices, which is the most recent technique for polynomial eigenvalue decomposition. We first ...
- article, April 2014
From tile algorithm to stripe algorithm: a CUBLAS-based parallel implementation on GPUs of Gauss method for the resolution of extremely large dense linear systems stored on an array of solid state devices
The Journal of Supercomputing (JSCO), Volume 68, Issue 1, Pages 365–413, https://doi.org/10.1007/s11227-013-1043-3
This paper presents an efficient algorithmic approach to the GPU-based parallel resolution of dense linear systems of extremely large size. A formal transformation of the code of Gauss method allows us to develop for matrix calculations the concept of ...
- article, April 2014
A CUDA implementation of the Continuous Space Language Model
The Journal of Supercomputing (JSCO), Volume 68, Issue 1, Pages 65–86, https://doi.org/10.1007/s11227-013-1023-7
The training phase of the Continuous Space Language Model (CSLM) was implemented in the NVIDIA hardware/software architecture Compute Unified Device Architecture (CUDA). A detailed explanation of the CSLM algorithm is provided. Implementation was ...
- article, August 2012
Speeding up solving of differential matrix Riccati equations using GPGPU computing and MATLAB
Concurrency and Computation: Practice & Experience (CCOMP), Volume 24, Issue 12, Pages 1334–1348, https://doi.org/10.1002/cpe.1835
In this work, we developed a parallel algorithm to speed up the resolution of differential matrix Riccati equations using a backward differentiation formula algorithm based on a fixed-point method. The role and use of differential matrix Riccati ...
- Article, June 2012
Iterative Methods for Sparse Linear Systems on Graphics Processing Unit
HPCC '12: Proceedings of the 2012 IEEE 14th International Conference on High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems, Pages 836–842, https://doi.org/10.1109/HPCC.2012.118
Many engineering and science problems require a computational effort to solve large sparse linear systems. Krylov subspace based iterative solvers have been widely used in that direction. Iterative Krylov methods involve linear algebra operations such ...
- research-article, June 2012
A unified optimizing compiler framework for different GPGPU architectures
ACM Transactions on Architecture and Code Optimization (TACO), Volume 9, Issue 2, Article No.: 9, Pages 1–33, https://doi.org/10.1145/2207222.2207225
This article presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of GPU memory hierarchy and ...
- Article, May 2012
Forecasting High Dimensional Volatility Using Conditional Restricted Boltzmann Machine on GPU
IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Pages 1979–1986, https://doi.org/10.1109/IPDPSW.2012.258
Forecasting the volatility of multivariate asset return is an important issue in financial econometric analysis, where the volatility is represented by a conditional covariance matrix (CCM). Traditional models for predicting CCM such as GARCH(1, 1) ...
- article, November 2011
Neville elimination on multi- and many-core systems: OpenMP, MPI and CUDA
The Journal of Supercomputing (JSCO), Volume 58, Issue 2, Pages 215–225, https://doi.org/10.1007/s11227-009-0360-z
This paper describes several parallel algorithmic variations of the Neville elimination. This elimination solves a system of linear equations making zeros in a matrix column by adding to each row an adequate multiple of the preceding one. The parallel ...
- article, June 2011
CUDA-enabled implementation of a neural network algorithm for handwritten digit recognition
Optical Memory and Neural Networks (SPOMNN), Volume 20, Issue 2, Pages 98–106, https://doi.org/10.3103/S1060992X11020032
Using a convolutional neural network as an example, we discuss specific aspects of implementing a learning algorithm of pattern recognition on the GPU graphics card using NVIDIA CUDA architecture. The training time of the neural network on a video-...
- Article, August 2009
Accelerating Image Retrieval Using Factorial Correspondence Analysis on GPU
CAIP '09: Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns, Pages 565–572, https://doi.org/10.1007/978-3-642-03767-2_69
We are interested in the intensive use of Factorial Correspondence Analysis (FCA) for large-scale content-based image retrieval. Factorial Correspondence Analysis is a useful method for analyzing textual data, and we adapt it to images using the SIFT ...
- research-article, March 2009
Accelerating phase unwrapping and affine transformations for optical quadrature microscopy using CUDA
GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, Pages 28–37, https://doi.org/10.1145/1513895.1513899
Optical Quadrature Microscopy (OQM) is a process which uses phase data to capture information about the sample being studied. OQM is part of an imaging framework developed by the Optical Science Laboratory at Northeastern University. In one particular ...