- research-article, July 2024, JUST ACCEPTED
SPARTA: High-Level Synthesis of Parallel Multi-Threaded Accelerators
- Giovanni Gozzi,
- Michele Fiorito,
- Serena Curzel,
- Claudio Barone,
- Vito Giovanni Castellana,
- Marco Minutoli,
- Antonino Tumeo,
- Fabrizio Ferrandi
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Just Accepted, https://doi.org/10.1145/3677035
This paper presents a methodology for the Synthesis of PARallel multi-Threaded Accelerators (SPARTA) from OpenMP annotated C/C++ specifications. SPARTA extends an open-source HLS tool, enabling the generation of accelerators that provide latency tolerance ...
- research-article, April 2024
A new family of fourth-order energy-preserving integrators
Numerical Algorithms (SPNA), Volume 96, Issue 3, Jul 2024, Pages 1269–1293, https://doi.org/10.1007/s11075-024-01824-w
Abstract: For Hamiltonian systems with non-canonical structure matrices, a new family of fourth-order energy-preserving integrators is presented. The integrators take a form of a combination of Runge–Kutta methods and continuous-stage Runge–Kutta methods ...
- research-article, April 2024
WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations
- Kezhao Huang,
- Jidong Zhai,
- Liyan Zheng,
- Haojie Wang,
- Yuyang Jin,
- Qihao Zhang,
- Runqing Zhang,
- Zhen Zheng,
- Youngmin Yi,
- Xipeng Shen
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, April 2024, Pages 1–17, https://doi.org/10.1145/3627703.3650063
Graph Neural Network (GNN) has emerged as an important workload for learning on graphs. With the size of graph data and the complexity of GNN model architectures increasing, developing an efficient GNN system grows more important. As GNN has heavy neural ...
- research-article, April 2024
ScaleCache: A Scalable Page Cache for Multiple Solid-State Drives
- Kiet Tuan Pham,
- Seokjoo Cho,
- Sangjin Lee,
- Lan Anh Nguyen,
- Hyeongi Yeo,
- Ipoom Jeong,
- Sungjin Lee,
- Nam Sung Kim,
- Yongseok Son
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, April 2024, Pages 641–656, https://doi.org/10.1145/3627703.3629588
This paper presents a scalable page cache called ScaleCache for improving SSD scalability. Specifically, we first propose a concurrent data structure of page cache based on XArray (ccXArray) to enable access and update the page cache concurrently. Second,...
- research-article, July 2024
An Energy-Efficient Parallelism Scheme for Deep Neural Network Training And Inferencing on Heterogeneous Cloud Resources
ICIIT '24: Proceedings of the 2024 9th International Conference on Intelligent Information Technology, February 2024, Pages 493–498, https://doi.org/10.1145/3654522.3654596
The emergence of Large Language Models (LLM) and generative AI has led to an explosive increase in computational demands across cloud computing data centers. The growing number of parameters in deep learning models results in significant power consumption ...
- Article, May 2024
HPX with Spack and Singularity Containers: Evaluating Overheads for HPX/Kokkos Using an Astrophysics Application
Asynchronous Many-Task Systems and Applications, Feb 2024, Pages 173–184, https://doi.org/10.1007/978-3-031-61763-8_17
Abstract: Cloud computing for high performance computing resources is an emerging topic. This service is of interest to researchers who care about reproducible computing, for software packages with complex installations, and for companies or researchers who ...
Explicit Effects and Effect Constraints in ReML
Proceedings of the ACM on Programming Languages (PACMPL), Volume 8, Issue POPL, Article No.: 79, Pages 2370–2394, https://doi.org/10.1145/3632921
An important aspect of building robust systems that execute on dedicated hardware and perhaps in constrained environments is to control and manage the effects performed by program code. We present ReML, a higher-order statically-typed functional ...
Discovering Parallelisms in Python Programs
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, November 2023, Pages 832–844, https://doi.org/10.1145/3611643.3616259
Parallelization is a promising way to improve the performance of Python programs. Unfortunately, developers may miss parallelization possibilities, because they usually do not concentrate on parallelization. Many approaches have been proposed to ...
- research-article, February 2024
CPU and GPU Parallelism of the A* Algorithm on solving N-Puzzle problems
PCI '23: Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics, November 2023, Pages 21–25, https://doi.org/10.1145/3635059.3635063
This paper discusses the implementation of parallelism on the A* algorithm, using both the central processing unit and the graphics processing unit, in order to increase its efficiency in terms of the necessary time to solve Sliding Puzzle problems. The ...
- Article, March 2024
A Pipelined AES and SM4 Hardware Implementation for Multi-tasking Virtualized Environments
Algorithms and Architectures for Parallel Processing, Oct 2023, Pages 275–291, https://doi.org/10.1007/978-981-97-0801-7_16
Abstract: Virtualization techniques are becoming increasingly prevalent and are driving trends in hardware development to offer parallelization support for multi-tasking. Existing works on hardware designs of the Advanced Encryption Standard (AES) and SM4 ...
- research-article, October 2023
Parallel Execution of Transactions Based on Dynamic and Self-Verifiable Conflict Analysis
LADC '23: Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, October 2023, Pages 110–119, https://doi.org/10.1145/3615366.3615425
In most blockchains, miners execute transactions sequentially, while validators reproduce the execution to validate its results. Although simple, this approach does not exploit modern multi-core resources efficiently, thus limiting performance and ...
- research-article, September 2023
Parallel approaches to extract multi-level high utility itemsets from hierarchical transaction databases
Knowledge-Based Systems (KNBS), Volume 276, Issue C, Sep 2023, https://doi.org/10.1016/j.knosys.2023.110733
Abstract: In the field of data mining, high utility itemset mining (HUIM) is a relevant mining task, with the aim of analyzing customer transaction databases. HUIM consists of exploiting the set of items that are often purchased together and ...
Highlights: Parallelism is applied at many parts of the algorithm to improve mining performance.
- Article, January 2024
Construction of Locality-Aware Algorithms to Optimize Performance of Stencil Codes on Heterogeneous Hardware
Abstract: Recently, an increase in code performance has been obtained mainly through parallelism. For codes that implement stencil schemes, parallel processing requires data-intensive exchange. When parallel threads need to communicate, memory bandwidth ...
- Article, August 2023
Scalable Random Forest with Data-Parallel Computing
Euro-Par 2023: Parallel Processing, Aug 2023, Pages 397–410, https://doi.org/10.1007/978-3-031-39698-4_27
Abstract: In the last years, there has been a significant increment in the quantity of data available and computational resources. This leads scientific and industry communities to pursue more accurate and efficient Machine Learning (ML) models. Random ...
- research-article, August 2023
Conjugate Gradients Acceleration of Coordinate Descent for Linear Systems
Journal of Scientific Computing (JSCI), Volume 96, Issue 3, Sep 2023, https://doi.org/10.1007/s10915-023-02307-1
Abstract: This paper introduces a conjugate gradients (CG) acceleration of the coordinate descent algorithm (CD) for linear systems. It is shown that the Kaczmarz algorithm (KACZ) can simulate CD exactly, so CD can be accelerated by CG similarly to the CG ...
- research-article, August 2023
FPGA Design of Transposed Convolutions for Deep Learning Using High-Level Synthesis
Journal of Signal Processing Systems (JSPS), Volume 95, Issue 10, Oct 2023, Pages 1245–1263, https://doi.org/10.1007/s11265-023-01883-7
Abstract: Deep Learning (DL) is pervasive across a wide variety of domains. Convolutional Neural Networks (CNNs) are often used for image processing DL applications. Modern CNN models are growing to meet the needs of more sophisticated tasks, e.g. using ...
- research-article, June 2023
Parallelism in a Region Inference Context
Proceedings of the ACM on Programming Languages (PACMPL), Volume 7, Issue PLDI, Article No.: 142, Pages 884–906, https://doi.org/10.1145/3591256
Region inference is a type-based program analysis that takes a non-annotated program as input and constructs a program that explicitly manages memory allocation and deallocation by dividing the heap into a stack of regions, each of which can grow and ...
- research-article, June 2023
Area-latency efficient floating point adder using interleaved alignment and normalization
Microprocessors & Microsystems (MSYS), Volume 99, Issue C, Jun 2023, https://doi.org/10.1016/j.micpro.2023.104842
Highlights: Bidirectional barrel shifter replaces the two barrel shifters in conventional FP adder.
The barrel shifter is an indispensable floating-point (FP) adder circuit. It performs the alignment on the mantissa of the smallest FP number and also normalizes the added mantissa in a conventional FP adder. Alignment and ...
- Article, May 2023
Constraint Propagation on GPU: A Case Study for the Cumulative Constraint
Integration of Constraint Programming, Artificial Intelligence, and Operations Research, May 2023, Pages 336–353, https://doi.org/10.1007/978-3-031-33271-5_22
Abstract: The Cumulative constraint is one of the most important global constraints, as it naturally arises in a variety of problems related to scheduling with limited resources. Devising fast propagation algorithms that run at every node of the search tree ...