- research-article, September 2024
A fast high resolution distributed hydrological model for forecasting, climate scenarios and digital twin applications using wflow_sbm
Environmental Modelling & Software (ENMS), Volume 179, Issue C
https://doi.org/10.1016/j.envsoft.2024.106099
Abstract: We investigated improvements to further speed up the multi-threaded scaling of the distributed hydrological model wflow_sbm. To gain insight in the speed improvements for operational applications, we connected the improved code to ECMWF’s Fields ...
Highlights:
  - Improved multi-threaded scaling of the distributed hydrological model wflow_sbm.
  - Two to eleven times faster run times on high-performance computing clusters.
  - Distributed hydrological model application for large-scale hydrological ...
- Article, September 2023
Rapid Prototyping of Complex Micro-architectures Through High-Level Synthesis
Applied Reconfigurable Computing. Architectures, Tools, and Applications, Pages 19–34
https://doi.org/10.1007/978-3-031-42921-7_2
Abstract: Register-Transfer Level (RTL) design has been a traditional approach in hardware design for several decades. However, with the growing complexity of designs and the need for fast time-to-market, the design and verification process at the RTL level ...
- research-article, February 2023
A novel ILU preconditioning method with a block structure suitable for SIMD vectorization
Journal of Computational and Applied Mathematics (JCAM), Volume 419, Issue C
https://doi.org/10.1016/j.cam.2022.114687
Abstract: Incomplete LU (ILU) preconditioning is typically used when an iterative solver is applied on an asymmetric system of linear equations. A fill-in selection policy significantly affects the ILU preconditioned iterative solver. In this ...
- research-article, November 2022
Studying error propagation on application data structure and hardware
The Journal of Supercomputing (JSCO), Volume 78, Issue 17, Pages 18691–18724
https://doi.org/10.1007/s11227-022-04625-x
Abstract: As technology scales, transistors become smaller and aggressive power optimization techniques combined with high operation frequencies and performance-enhancing microarchitectural techniques are employed to achieve increasingly higher performance ...
- research-article, September 2022
A novel hybrid multi-thread metaheuristic approach for fake news detection in social media
Applied Intelligence (KLU-APIN), Volume 53, Issue 9, Pages 11182–11202
https://doi.org/10.1007/s10489-022-03972-9
Abstract: In fake news detection, intelligent optimization seems to be a more effective and explainable solution methodology than the black-box methods that have been extensively used in the literature. This study takes the optimization-based method one ...
- research-article, September 2022
Fuzzing with automatically controlled interleavings to detect concurrency bugs
Journal of Systems and Software (JSSO), Volume 191, Issue C
https://doi.org/10.1016/j.jss.2022.111379
Abstract: Concurrency vulnerabilities are an irresistible threat to security, and detecting them is challenging. Triggering the concurrency vulnerabilities requires a specific thread interleaving and a bug-inducing input. Existing methods have ...
Highlights:
  - Grey-box fuzzing to detect concurrency vulnerabilities.
  - Adjusting the thread ...
- research-article, September 2022
Parallelizing filter-and-verification based exact set similarity joins on multicores
Abstract: Set similarity join (SSJ) is a well studied problem with many algorithms proposed to speed up its performance. However, its scalability and performance are rarely discussed in modern multicore environments. Existing algorithms assume a ...
Highlights:
  - Multi-threading has not yet been considered to speed up set similarity joins.
  - ...
- research-article, July 2022
Real-time edge computing on multi-processes and multi-threading architectures for deep learning applications
Microprocessors & Microsystems (MSYS), Volume 92, Issue C
https://doi.org/10.1016/j.micpro.2022.104554
Abstract: As the computing power of embedded system hardware devices continues to grow, more and more deep learning models have been gradually transplanted into edge devices. Accordingly, a variety of application scenarios have been developed ...
- Article, January 2023
QR Factorization Using Malleable BLAS on Multicore Processors
High Performance Computing. ISC High Performance 2022 International Workshops, Pages 176–189
https://doi.org/10.1007/978-3-031-23220-6_12
Abstract: We demonstrate that significant performance benefits can be obtained via the exploitation of malleability in a framework designed to implement portable and high-performance BLAS-like kernels. For this purpose, we integrate thread-level ...
- research-article, February 2022
Parallel best-first search algorithms for planning problems on multi-core processors
The Journal of Supercomputing (JSCO), Volume 78, Issue 3, Pages 3122–3151
https://doi.org/10.1007/s11227-021-03986-z
Abstract: The multiplication of computing cores in modern processor units permits revisiting the design of classical algorithms to improve computational performance in complex application domains. Artificial Intelligence planning is one of those ...
- research-article, January 2022
Cronista: A multi-database automated provenance collection system for runtime-models
Information and Software Technology (INST), Volume 141, Issue C
https://doi.org/10.1016/j.infsof.2021.106694
Abstract: Context: Decision making by software systems that face uncertainty needs tracing to support understandability, as accountability is crucial. While logging has been essential to support explanations and understandability of ...
Highlights:
  - Cronista automatically logs the provenance of changes to a system’s runtime model.
- research-article, October 2021
Parallel and distributed association rule mining in life science: A novel parallel algorithm to mine genomics data
Information Sciences: an International Journal (ISCI), Volume 575, Issue C, Pages 747–761
https://doi.org/10.1016/j.ins.2018.07.055
Highlights:
  - A parallel algorithm for Association rule mining.
  - Association rule mining of ...
Association rule mining (ARM) is largely employed in several scientific areas and application domains, and many different algorithms for learning association rules from databases have been introduced. Despite the presence of many ...
- research-article, September 2021
A mathematical framework for design discovery from multi-threaded applications using neural sequence solvers
Innovations in Systems and Software Engineering (SPISSE), Volume 17, Issue 3, Pages 289–307
https://doi.org/10.1007/s11334-021-00393-8
Abstract: Comprehending existing multi-threaded applications effectively is a challenge without proper assistance. Research has been proposed to mine programs to extract aspects of high-level design but not much to reverse-engineer the concurrent design ...
- Article, August 2021
Drill Pipe Counting Method Based on Local Dense Optical Flow Estimation
Abstract: To solve the problem of drill pipe counting in coal mines, we propose a drill pipe counting method based on local dense optical flow (LDOF) estimation. Compared to the general tracking method, we provide a new perspective for resolving the problem ...
- Article, June 2021
Locality: The 3rd Wall and the Need for Innovation in Parallel Architectures
Abstract: In the past we have seen two major “walls” (memory and power) whose vanquishing required significant advances in architecture. This paper discusses evidence of a third wall dealing with data locality, which is prevalent in data intensive ...
- research-article, May 2021
An intelligent memory caching architecture for data-intensive multimedia applications
Multimedia Tools and Applications (MTAA), Volume 80, Issue 11, Pages 16743–16761
https://doi.org/10.1007/s11042-020-08805-w
Abstract: With the rapid developments in cloud computing and mobile networks, multimedia content can be accessed conveniently. Recently, some novel intelligent caching-based approaches have been proposed to improve the memory architectures for multimedia ...
- research-article, April 2021
Multiple-tasks on multiple-devices (MTMD): exploiting concurrency in heterogeneous managed runtimes
Michail Papadimitriou, Eleni Markou, Juan Fumero, Athanasios Stratikopoulos, Florin Blanaru, Christos Kotselidis
VEE 2021: Proceedings of the 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, Pages 125–138
https://doi.org/10.1145/3453933.3454019
Modern commodity devices are nowadays equipped with a plethora of heterogeneous devices serving different purposes. Being able to exploit such heterogeneous hardware accelerators to their full potential is of paramount importance in the pursuit of ...
- research-article, March 2020
Programming parallel dense matrix factorizations with look-ahead and OpenMP
Cluster Computing (KLU-CLUS), Volume 23, Issue 1, Pages 359–375
https://doi.org/10.1007/s10586-019-02927-z
Abstract: We investigate a parallelization strategy for dense matrix factorization (DMF) algorithms, using OpenMP, that departs from the legacy (or conventional) solution, which simply extracts concurrency from a multi-threaded version of basic linear ...
- research-article, October 2019
Combining sequentialization-based verification of multi-threaded C programs with symbolic Partial Order Reduction
International Journal on Software Tools for Technology Transfer (STTT), Volume 21, Issue 5, Pages 545–565
https://doi.org/10.1007/s10009-019-00507-5
Abstract: Sequentialization has been shown to be an effective symbolic verification technique for safety properties in multi-threaded C programs using POSIX threads. The tool Lazy-CSeq, which applies a lazy sequentialization scheme, demonstrated its ...
- research-article, August 2019
AggrePlay: efficient record and replay of multi-threaded programs
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Pages 567–577
https://doi.org/10.1145/3338906.3338959
Deterministic replay presents challenges and often results in high memory and runtime overheads. Previous studies deterministically reproduce program outputs often only after several replay iterations or may produce a non-deterministic sequence of ...