- research-article, July 2021
Simulation of 3D centimeter-scale continuum tumor growth at sub-millimeter resolution via distributed computing
Computers in Biology and Medicine (CBIM), Volume 134, Issue C
https://doi.org/10.1016/j.compbiomed.2021.104507
Abstract: Simulation of cm-scale tumor growth has generally been constrained by the computational cost to numerically solve the associated equations, with models limited to representing mm-scale or smaller tumors. While the work has proven ...
Highlights: Simulation of clinically-relevant tumor growth is constrained by computational cost.
- research-article, November 2018
Automatic annotation of tasks in structured code
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Article No.: 31, Pages 1–13
https://doi.org/10.1145/3243176.3243200
This paper describes the design and implementation of a suite of static analyses and code generation techniques to annotate programs with OpenMP pragmas for task parallelism. These techniques approximate the ranges covered by memory regions, bound ...
Maximizing system utilization via parallelism management for co-located parallel applications
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Article No.: 14, Pages 1–14
https://doi.org/10.1145/3243176.3243199
With an increasing number of cores and memory controllers in multiprocessor platforms, co-location of parallel applications is gaining in importance. Key to achieving good performance is allocating the proper number of threads to co-located applications. ...
- research-article, September 2018
Taskminer: automatic identification of tasks
SBLP '18: Proceedings of the XXII Brazilian Symposium on Programming Languages, Pages 11–18
https://doi.org/10.1145/3264637.3264639
This paper presents TaskMiner, a tool that automatically finds task parallelism in C code. TaskMiner solves classic problems of irregular parallelism, such as finding the memory ranges accessed by tasks, removing spurious static dependencies, detecting ...
- research-article, August 2018
Long-time simulations with complex code using multiple nodes of Intel Xeon Phi Knights Landing
Journal of Computational and Applied Mathematics (JCAM), Volume 337, Issue C, Pages 18–36
https://doi.org/10.1016/j.cam.2017.12.050
Abstract: Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods resulting in complex codes as well as large numbers of simulations for analysis like parameter studies and ...
Highlights: The Intel Xeon Phi Knights Landing is a many-core processor with 68 cores. ...
- research-article, June 2018
Loop unrolling effect on parallel code optimization
ICFNDS '18: Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, Article No.: 7, Pages 1–6
https://doi.org/10.1145/3231053.3231060
Parallel code optimization has many challenges to improve the code performance. Loop unrolling is an optimization technique applied to the loops to reduce the frequency of branches. This optimization is a useful technique to enhance the performance of ...
- research-article, November 2017
DataRaceBench: a benchmark suite for systematic evaluation of data race detection tools
SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 11, Pages 1–14
https://doi.org/10.1145/3126908.3126958
Data races in multi-threaded parallel applications are notoriously damaging while extremely difficult to detect. Many tools have been developed to help programmers find data races. However, there is no dedicated OpenMP benchmark suite to systematically ...
- research-article, November 2017
An efficient MPI/openMP parallelization of the Hartree-Fock method for the second generation of Intel® Xeon Phi™ processor
SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 39, Pages 1–12
https://doi.org/10.1145/3126908.3126956
Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are ...
- research-article, October 2017
An enhanced parallel version of RSA public key crypto based algorithm using openMP
SIN '17: Proceedings of the 10th International Conference on Security of Information and Networks, Pages 37–42
https://doi.org/10.1145/3136825.3136866
Due to increased data movement and information exchange over internet and web, preserving data confidentiality and security has emerged as a prime concern for the end users. From bank transactions to document verification portals, from government ...
- research-article, July 2017
OpenMP 4 Fortran Modernization of WSM6 for KNL
PEARC '17: Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact, Article No.: 12, Pages 1–8
https://doi.org/10.1145/3093338.3093387
Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well established and increasingly supported vehicle for portable parallelization. As ...
- research-article, October 2014
A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling
ACM Transactions on Mathematical Software (TOMS), Volume 41, Issue 1, Article No.: 3, Pages 1–27
https://doi.org/10.1145/2629641
We present a parallel sparse direct solver for multicore architectures based on Directed Acyclic Graph (DAG) scheduling. Recently, DAG scheduling has become popular in advanced Dense Linear Algebra libraries due to its efficient asynchronous parallel ...
- Article, August 2013
Solving a least-squares problem with algorithmic differentiation and OpenMP
Euro-Par'13: Proceedings of the 19th international conference on Parallel Processing, Pages 763–774
https://doi.org/10.1007/978-3-642-40047-6_76
Least-squares problems occur often in practice, for example, when a parametrized model is used to describe a behavior of a chemical, physical or an economic application. In this paper, we describe a method for solving least-squares problems that are ...
- research-article, June 2013
Portable mapping of openMP to multicore embedded systems using MCA APIs
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, Pages 153–162
https://doi.org/10.1145/2465554.2465569
Multicore embedded systems are being widely used in telecommunication systems, robotics, medical applications and more. While they offer a high-performance with low-power solution, programming in an efficient way is still a challenge. In order to exploit ...
- research-article, June 2013
Portable mapping of openMP to multicore embedded systems using MCA APIs
LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, Pages 153–162
https://doi.org/10.1145/2491899.2465569
Multicore embedded systems are being widely used in telecommunication systems, robotics, medical applications and more. While they offer a high-performance with low-power solution, programming in an efficient way is still a challenge. In order to exploit ...
Also published in: ACM SIGPLAN Notices, Volume 48, Issue 5
- research-article, June 2013
Scaling large-data computations on multi-GPU accelerators
ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing, Pages 443–454
https://doi.org/10.1145/2464996.2465023
Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programming models, limited device memory sizes and overheads of data transfers between CPU and accelerator memories are among the open challenges that restrict ...
- Article, June 2012
LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms
IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 102–115
https://doi.org/10.1007/978-3-642-30961-8_8
To efficiently exploit high performance computing platforms, applications currently have to express more and more finer-grain parallelism. The OpenMP standard allows programmers to do so since version 3.0 and the introduction of task parallelism. Even ...
- Article, June 2012
Support for thread-level speculation into OpenMP
IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 275–278
https://doi.org/10.1007/978-3-642-30961-8_25
Software-based, thread-level speculation (TLS) systems allow the parallel execution of loops that can not be analyzed at compile time. TLS systems optimistically assume that the loop is parallelizable, and augment the original code with functions that ...
- Article, June 2012
An OpenMP 3.1 validation testsuite
IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 237–249
https://doi.org/10.1007/978-3-642-30961-8_18
Parallel programming models are evolving so rapidly that it needs to be ensured that OpenMP can be used easily to program multicore devices. There is also effort involved in getting OpenMP to be accepted as a de facto standard in the embedded system ...
- Article, June 2012
SPEC OMP2012 -- an application benchmark suite for parallel systems using OpenMP
Authors: Matthias S. Müller, John Baron, William C. Brantley, Huiyu Feng, Daniel Hackenberg, Robert Henschel, Gabriele Jost, Daniel Molka, Chris Parrott, Joe Robichaux, Pavel Shelepugin, Matthijs van Waveren, Brian Whitney, Kalyan Kumaran
IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 223–236
https://doi.org/10.1007/978-3-642-30961-8_17
This paper describes SPEC OMP2012, a benchmark developed by the SPEC High Performance Group. It consists of 15 OpenMP parallel applications from a wide range of fields. In addition to a performance metric based on the run time of the applications the ...
- Article, June 2012
Task-Based execution of nested OpenMP loops
IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 210–222
https://doi.org/10.1007/978-3-642-30961-8_16
In this work we propose a novel technique to reduce the overheads related to nested parallel loops in OpenMP programs. In particular we show that in many cases it is possible to replace the code of a nested parallel-for loop with equivalent code that ...