- research-article, November 2024
AG-SpTRSV: An Automatic Framework to Optimize Sparse Triangular Solve on GPUs
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 70, Pages 1–25
https://doi.org/10.1145/3674911
Sparse Triangular Solve (SpTRSV) has long been an essential kernel in the field of scientific computing. Due to its low computational intensity and internal data dependencies, SpTRSV is hard to implement and optimize on graphics processing units (GPUs). ...
- research-article, September 2024
- research-article, June 2024
A mixed cell compressed sparse row for time domain boundary element method in elastodynamics
Advances in Engineering Software (ADES), Volume 192, Issue C
https://doi.org/10.1016/j.advengsoft.2024.103633
Highlights:
  - A novel scheme, termed mixed cell compressed sparse row (mCCSR) is proposed to address the sparsity, block structure, and partial sub-matrix symmetry in the matrix.
  - Time domain boundary element method (TDBEM) in elastodynamics.
  - The ...
Abstract: In this paper, a novel mixed cell compressed sparse row (mCCSR) scheme is proposed to address the sparsity, block structure, and partial sub-matrix symmetry inherent in the coefficient matrix of the time domain boundary element method (TDBEM) in ...
- research-article, June 2024
An Efficient and Scalable Approach to Build Co-occurrence Matrix for DNN's Embedding Layer
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 286–297
https://doi.org/10.1145/3650200.3656629
Embedding is a crucial step for deep neural networks. Datasets, from different applications, with different structures, can all be processed through an embedding layer and transformed into a dense matrix. The transformation must minimize both the loss ...
- research-article, May 2024
Optimizing sparse general matrix–matrix multiplication for DCUs
- Hengliang Guo,
- Haolei Wang,
- Wanting Chen,
- Congxiang Zhang,
- Yubo Han,
- Shengguang Zhu,
- Dujuan Zhang,
- Yang Guo,
- Jiandong Shang,
- Tao Wan,
- Qingyang Li,
- Gang Wu
The Journal of Supercomputing (JSCO), Volume 80, Issue 14, Pages 20176–20200
https://doi.org/10.1007/s11227-024-06234-2
Abstract: Sparse general matrix–matrix multiplication (SpGEMM) is a crucial and complex computational task in many practical applications. Improving the performance of SpGEMM on SIMT processors like modern GPUs is challenging due to the unpredictable ...
- research-article, November 2023
Fast density peaks clustering algorithm based on improved mutual K-nearest-neighbor and sub-cluster merging
Information Sciences: an International Journal (ISCI), Volume 647, Issue C
https://doi.org/10.1016/j.ins.2023.119470
Abstract: Density peaks clustering (DPC) has had an impact in many fields, as it can quickly select centers and effectively process complex data. However, it also has low operational efficiency and a “Domino” effect. To solve these defects, we propose a ...
- research-article, September 2023
A flexible sparse matrix data format and parallel algorithms for the assembly of finite element matrices on shared memory systems
Abstract: Finite element methods require the composition of the global stiffness matrix from local finite element contributions. The composition process combines the computation of element stiffness matrices and their assembly into the global stiffness ...
Highlights:
  - A new sparse matrix format, Compressed-Rows-Aligned-Columns (CRAC), ideal for hp-FEM.
  - Parallelization of the assembly procedure using atomic synchronization.
  - Local-to-global degrees of freedom maps for vectorization.
  - ...
- research-article, September 2023
Optimizing massively parallel sparse matrix computing on ARM many-core processor
Abstract: Sparse matrix multiplication is ubiquitous in many applications such as graph processing and numerical simulation. In recent years, numerous efficient sparse matrix multiplication algorithms and computational libraries have been proposed. However,...
- research-article, July 2023
Characterizing the performance of node-aware strategies for irregular point-to-point communication on heterogeneous architectures
Abstract: Supercomputer architectures are trending toward higher computational throughput due to the inclusion of heterogeneous compute nodes. These multi-GPU nodes increase on-node computational efficiency, while also increasing the amount of ...
- research-article, July 2023
Accelerating matrix-centric graph processing on GPUs through bit-level optimizations
Journal of Parallel and Distributed Computing (JPDC), Volume 177, Issue C, Pages 53–67
https://doi.org/10.1016/j.jpdc.2023.02.013
Abstract: Even though it is well known that binary values are common in graph applications (e.g., adjacency matrix), how to leverage the phenomenon for efficiency has not yet been adequately explored. This paper presents a systematic study on ...
Highlights:
  - The performance of GraphBLAS algorithms can be accelerated with sparse bit storage format and bit manipulation.
- research-article, March 2023
Design of an adaptive framework with compressive sensing for spatial data in wireless sensor networks
Wireless Networks (WIRE), Volume 29, Issue 5, Pages 2203–2216
https://doi.org/10.1007/s11276-023-03291-y
Abstract: Wireless Sensor Networks (WSNs) gather active sensor data within a specified period to the sink node. The data transmission in restricted resource utilization in wireless surroundings is a primary issue. Compressive sensing enables resource ...
- research-article, September 2022
Picture quality and compression analysis of multilevel legendre wavelet transformation based image compression technique
Multimedia Tools and Applications (MTAA), Volume 81, Issue 21, Pages 29799–29845
https://doi.org/10.1007/s11042-022-12675-9
Abstract: A novel lossy RGB (Red, Green, Blue) colour still image compression algorithm is proposed. The intended method introduces Legendre wavelet-based image transformation technique integrated with vector quantization and run length encoding. High ...
- research-article, January 2023
TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 9, Pages 1–11
https://doi.org/10.1145/3545008.3545028
Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph algorithms and machine learning applications. The sparsity of the input and output vectors makes its floating point efficiency in general lower than sparse matrix-...
- Article, July 2022
On Active-Set LP Algorithms Allowing Basis Deficiency
Computational Science and Its Applications – ICCSA 2022 Workshops, Pages 174–187
https://doi.org/10.1007/978-3-031-10562-3_13
Abstract: An interesting phenomenon in linear programming (LP) is how to deal with solutions in which the number of nonzero variables is less than the number of rows of the matrix in standard form. An interesting approach is that of basis-deficiency-...
- research-article, October 2021
Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations
International Journal of Parallel Programming (IJPP), Volume 49, Issue 5, Pages 732–744
https://doi.org/10.1007/s10766-021-00695-1
Abstract: Segmented operations, such as segmented sum, segmented scan and segmented sort, are important building blocks for parallel irregular algorithms. We in this work propose a new parallel primitive called segmented merge. Its function is in parallel ...
- research-article, September 2021
SortCache: Intelligent Cache Management for Accelerating Sparse Data Workloads
ACM Transactions on Architecture and Code Optimization (TACO), Volume 18, Issue 4, Article No.: 56, Pages 1–24
https://doi.org/10.1145/3473332
Sparse data applications have irregular access patterns that stymie modern memory architectures. Although hyper-sparse workloads have received considerable attention in the past, moderately-sparse workloads prevalent in machine learning applications, ...
- research-article, February 2021
A Survey of Personalized Recommendation Based on Machine Learning Algorithms
EITCE '20: Proceedings of the 2020 4th International Conference on Electronic Information Technology and Computer Engineering, Pages 602–610
https://doi.org/10.1145/3443467.3444711
Personalized recommendation is a key technology to effectively solve the overload of online information and eliminate information islands. It is widely known as an important way to improve the quality of information services. However, cold start, data ...
- Article, October 2020
RFQ-ANN: Artificial Neural Network Model for Predicting Protein-Protein Interaction Based on Sparse Matrix
Intelligent Computing Theories and Application, Pages 446–454
https://doi.org/10.1007/978-3-030-60802-6_39
Abstract: Protein is a complex organic substance with a spatial structure, which exists widely in the body of living things. Almost all living things rely on protein to form an important part of the body, perform many physiological function adjustments, and ...