- research-article, February 2025
Preparing for HPC on RISC-V: Examining Vectorization and Distributed Performance of an Astrophysics Application with HPX and Kokkos
- Patrick Diehl,
- Panagiotis Syskakis,
- Gregor Daiß,
- Steven R. Brandt,
- Alireza Kheirkhahan,
- Srinivas Yadav Singanaboina,
- Dominic Marcello,
- Chris Taylor,
- John Leidel,
- Hartmut Kaiser
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Pages 1656–1665, https://doi.org/10.1109/SCW63240.2024.00207
In recent years, interest in RISC-V computing architectures has moved from academic to mainstream, especially in the field of High Performance Computing where energy limitations are increasingly a concern. As of this year, the first single board RISC-V ...
- Article, October 2024
Language Equivalence from Nondeterministic to Weighted Automata—and Back
Leveraging Applications of Formal Methods, Verification and Validation. REoCAS Colloquium in Honor of Rocco De Nicola, Pages 75–93, https://doi.org/10.1007/978-3-031-73709-1_6
Language equivalence is closely related to Rocco De Nicola’s contributions to concurrency theory. Here we study language equivalence and state reduction for Nondeterministic Finite Automata (NFAs). We work in a linear algebraic setting based on ...
- research-article, July 2024
Deep Sketch Vectorization via Implicit Surface Extraction
ACM Transactions on Graphics (TOG), Volume 43, Issue 4, Article No. 37, Pages 1–13, https://doi.org/10.1145/3658197
We introduce an algorithm for sketch vectorization with state-of-the-art accuracy and capable of handling complex sketches. We approach sketch vectorization as a surface extraction task from an unsigned distance field, which is implemented using a two-...
- research-article, May 2024
Rethinking 'Complement' Recommendations at Scale with SIMD
ICPE '24: Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, Pages 25–36, https://doi.org/10.1145/3629526.3645041
Maximizing cart value by increasing the number of items in electronic carts is one of the key strategies adopted by e-commerce platforms for optimal conversion of positive user intent during an online shopping session. Recommender systems play a key role ...
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, Pages 19–34, https://doi.org/10.1145/3617232.3624873
Embedded applications extract the best power-performance trade-off from digital signal processors (DSPs) by making extensive use of vectorized execution. Rather than handwriting the many customized kernels these applications use, DSP engineers rely on ...
Algorithm 1041: HiPPIS—A High-order Positivity-preserving Mapping Software for Structured Meshes
ACM Transactions on Mathematical Software (TOMS), Volume 50, Issue 1, Article No. 8, Pages 1–31, https://doi.org/10.1145/3632291
Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative ...
- research-article, February 2024
GraphCube: Interconnection Hierarchy-aware Graph Processing
- Xinbiao Gan,
- Guang Wu,
- Shenghao Qiu,
- Feng Xiong,
- Jiaqi Si,
- Jianbin Fang,
- Dezun Dong,
- Chunye Gong,
- Tiejun Li,
- Zheng Wang
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 160–174, https://doi.org/10.1145/3627535.3638498
Processing large-scale graphs with billions to trillions of edges requires efficiently utilizing parallel systems. However, current graph processing engines do not scale well beyond a few tens of computing nodes because they are oblivious to the ...
- research-article, January 2024
A Vectorized Formulation of the Cell Transmission Model for Efficient Simulation of Large-Scale Freeway Networks
Procedia Computer Science (PROCS), Volume 238, Issue C, Pages 143–150, https://doi.org/10.1016/j.procs.2024.06.009
Macroscopic traffic flow models are powerful tools for assessing the level of service on freeway segments. One popular model is the Cell Transmission Model (CTM) proposed by Daganzo. The US Highway Capacity Manual (HCM) operationalizes this model ...
- research-article, November 2023
Performance Portability Evaluation of Blocked Stencil Computations on GPUs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1007–1018, https://doi.org/10.1145/3624062.3624177
In this new era where multiple GPU vendors are leading the supercomputing landscape, and multiple programming models are available to users, the drive to achieve performance portability across platforms faces new challenges. Consider stencil algorithms, ...
- research-article, December 2023
A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors
- Marco Siracusa,
- Víctor Soria-Pardos,
- Francesco Sgherzi,
- Joshua Randall,
- Douglas J. Joseph,
- Miquel Moretó Planas,
- Adrià Armejach
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1332–1346, https://doi.org/10.1145/3613424.3614284
This paper proposes the Tensor Marshaling Unit (TMU), a near-core programmable dataflow engine for multicore architectures that accelerates tensor traversals and merging, the most critical operations of sparse tensor workloads running on today’s ...
- research-article, September 2023
Fusion-based Representation Learning Model for Multimode User-generated Social Network Content
Journal of Data and Information Quality (JDIQ), Volume 15, Issue 3, Article No. 34, Pages 1–21, https://doi.org/10.1145/3603712
As mobile networks and APPs are developed, user-generated content (UGC), which includes multi-source heterogeneous data like user reviews, tags, scores, images, and videos, has become an essential basis for improving the quality of personalized services. ...
- research-article, July 2023
Image vectorization and editing via linear gradient layer decomposition
ACM Transactions on Graphics (TOG), Volume 42, Issue 4, Article No. 97, Pages 1–13, https://doi.org/10.1145/3592128
A key advantage of vector graphics over raster graphics is their editability. For example, linear gradients define a spatially varying color fill with a few intuitive parameters, which are ubiquitously supported in standard vector graphics formats and ...
- research-article, February 2023
Lifting Code Generation of Cardiac Physiology Simulation to Novel Compiler Technology
CGO '23: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, Pages 68–80, https://doi.org/10.1145/3579990.3580008
The study of numerical models for the human body has become a major focus of the research community in biology and medicine. For instance, numerical ionic models of a complex organ, such as the heart, must be able to represent individual cells and their ...
- research-article, January 2023
Disaster tweet classification: A majority voting approach using machine learning algorithms
Intelligent Decision Technologies (INTDTEC), Volume 17, Issue 2, Pages 343–355, https://doi.org/10.3233/IDT-220310
Nowadays, people share their opinions through social media. This information may be informative or non-informative. Filtering informative information from social media plays a challenging issue. Nevertheless, people will interact more with that ...
- research-article, November 2022
Vectorizing sparse matrix computations with partially-strided codelets
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No. 32, Pages 1–15
The compact data structures and irregular computation patterns in sparse matrix computations introduce challenges to vectorizing these codes. Available approaches primarily vectorize strided computation regions of a sparse code. In this work, we propose ...
Custom High-Performance Vector Code Generation for Data-Specific Sparse Computations
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 160–171, https://doi.org/10.1145/3559009.3569668
Sparse computations, such as sparse matrix-dense vector multiplication, are notoriously hard to optimize due to their irregularity and memory-boundedness. Solutions to improve the performance of sparse computations have been proposed, ranging from ...
- research-article, January 2023
Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization
PACT '22: Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Pages 439–450, https://doi.org/10.1145/3559009.3569663
SIMD (Single Instruction Multiple Data) instructions apply the same operation to multiple elements simultaneously. Compilers transform codes to exploit SIMD instructions through auto-vectorization. Control flow can lead to challenges for auto-...
- research-article, December 2023
Treebeard: An Optimizing Compiler for Decision Tree Based ML Inference
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 494–511, https://doi.org/10.1109/MICRO56248.2022.00043
Decision tree ensembles are among the most commonly used machine learning models. These models are used in a wide range of applications and are deployed at scale. Decision tree ensemble inference is usually performed with libraries such as XGBoost, ...
- research-article, August 2022
Optimizing half precision Winograd convolution on ARM many-core processors
APSys '22: Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, Pages 53–60, https://doi.org/10.1145/3546591.3547529
Convolutional Neural Networks (CNNs) are widely used in real world applications, e.g., computer vision. Winograd based convolution is usually applied due to its low computation complexity. For the underlying hardware, ARM many-core CPUs, by their price ...
- research-article, June 2022
Software-defined floating-point number formats and their application to graph processing
ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing, Article No. 8, Pages 1–17, https://doi.org/10.1145/3524059.3532360
This paper proposes software-defined floating-point number formats for graph processing workloads, which can improve performance in irregular workloads by reducing cache misses. Efficient arithmetic on software-defined number formats is challenging, ...