Keyword: openMP : Search

research-article

Simulation of 3D centimeter-scale continuum tumor growth at sub-millimeter resolution via distributed computing

Computers in Biology and Medicine (CBIM), Volume 134, Issue C, https://doi.org/10.1016/j.compbiomed.2021.104507

Abstract

Simulation of cm-scale tumor growth has generally been constrained by the computational cost to numerically solve the associated equations, with models limited to representing mm-scale or smaller tumors. While the work has proven ...

Highlights

Simulation of clinically-relevant tumor growth is constrained by computational cost.

research-article

Automatic annotation of tasks in structured code

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Article No.: 31, Pages 1–13, https://doi.org/10.1145/3243176.3243200

This paper describes the design and implementation of a suite of static analyses and code generation techniques to annotate programs with OpenMP pragmas for task parallelism. These techniques approximate the ranges covered by memory regions, bound ...

research-article

Maximizing system utilization via parallelism management for co-located parallel applications

PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, Article No.: 14, Pages 1–14, https://doi.org/10.1145/3243176.3243199

With an increasing number of cores and memory controllers in multiprocessor platforms, co-location of parallel applications is gaining in importance. Key to achieving good performance is allocating the proper number of threads to co-located applications. ...

research-article

Taskminer: automatic identification of tasks

SBLP '18: Proceedings of the XXII Brazilian Symposium on Programming Languages, Pages 11–18, https://doi.org/10.1145/3264637.3264639

This paper presents TaskMiner, a tool that automatically finds task parallelism in C code. TaskMiner solves classic problems of irregular parallelism, such as finding the memory ranges accessed by tasks, removing spurious static dependencies, detecting ...

research-article

Long-time simulations with complex code using multiple nodes of Intel Xeon Phi Knights Landing

Journal of Computational and Applied Mathematics (JCAM), Volume 337, Issue C, Pages 18–36, https://doi.org/10.1016/j.cam.2017.12.050

Abstract

Modern partial differential equation (PDE) models across scientific disciplines require sophisticated numerical methods resulting in complex codes as well as large numbers of simulations for analysis like parameter studies and ...

Highlights

The Intel Xeon Phi Knights Landing is a many-core processor with 68 cores.
...

research-article

Loop unrolling effect on parallel code optimization

ICFNDS '18: Proceedings of the 2nd International Conference on Future Networks and Distributed Systems, Article No.: 7, Pages 1–6, https://doi.org/10.1145/3231053.3231060

Parallel code optimization faces many challenges in improving code performance. Loop unrolling is an optimization technique applied to loops to reduce the frequency of branches. This optimization is a useful technique to enhance the performance of ...

research-article

DataRaceBench: a benchmark suite for systematic evaluation of data race detection tools

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 11, Pages 1–14, https://doi.org/10.1145/3126908.3126958

Data races in multi-threaded parallel applications are notoriously damaging while extremely difficult to detect. Many tools have been developed to help programmers find data races. However, there is no dedicated OpenMP benchmark suite to systematically ...

research-article

Public Access

An efficient MPI/openMP parallelization of the Hartree-Fock method for the second generation of Intel® Xeon Phi™ processor

SC '17: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 39, Pages 1–12, https://doi.org/10.1145/3126908.3126956

Modern OpenMP threading techniques are used to convert the MPI-only Hartree-Fock code in the GAMESS program to a hybrid MPI/OpenMP algorithm. Two separate implementations that differ by the sharing or replication of key data structures among threads are ...

research-article

An enhanced parallel version of RSA public key crypto based algorithm using openMP

SIN '17: Proceedings of the 10th International Conference on Security of Information and Networks, Pages 37–42, https://doi.org/10.1145/3136825.3136866

Due to increased data movement and information exchange over the internet and web, preserving data confidentiality and security has emerged as a prime concern for end users. From bank transactions to document verification portals, from government ...

research-article

OpenMP 4 Fortran Modernization of WSM6 for KNL

PEARC '17: Practice and Experience in Advanced Research Computing 2017: Sustainability, Success and Impact, Article No.: 12, Pages 1–8, https://doi.org/10.1145/3093338.3093387

Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well established and increasingly supported vehicle for portable parallelization. As ...

research-article

A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling

ACM Transactions on Mathematical Software (TOMS), Volume 41, Issue 1, Article No.: 3, Pages 1–27, https://doi.org/10.1145/2629641

We present a parallel sparse direct solver for multicore architectures based on Directed Acyclic Graph (DAG) scheduling. Recently, DAG scheduling has become popular in advanced Dense Linear Algebra libraries due to its efficient asynchronous parallel ...

Article

Solving a least-squares problem with algorithmic differentiation and OpenMP

Euro-Par'13: Proceedings of the 19th international conference on Parallel Processing, Pages 763–774, https://doi.org/10.1007/978-3-642-40047-6_76

Least-squares problems occur often in practice, for example, when a parametrized model is used to describe a behavior of a chemical, physical or an economic application. In this paper, we describe a method for solving least-squares problems that are ...

research-article

Portable mapping of openMP to multicore embedded systems using MCA APIs

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, Pages 153–162, https://doi.org/10.1145/2465554.2465569

Multicore embedded systems are being widely used in telecommunication systems, robotics, medical applications and more. While they offer a high-performance, low-power solution, programming them efficiently is still a challenge. In order to exploit ...

research-article

Portable mapping of openMP to multicore embedded systems using MCA APIs

LCTES '13: Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems, Pages 153–162, https://doi.org/10.1145/2491899.2465569

Multicore embedded systems are being widely used in telecommunication systems, robotics, medical applications and more. While they offer a high-performance, low-power solution, programming them efficiently is still a challenge. In order to exploit ...

Also Published in:

ACM SIGPLAN Notices: Volume 48 Issue 5

research-article

Scaling large-data computations on multi-GPU accelerators

ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing, Pages 443–454, https://doi.org/10.1145/2464996.2465023

Modern supercomputers rely on accelerators to speed up highly parallel workloads. Intricate programming models, limited device memory sizes and overheads of data transfers between CPU and accelerator memories are among the open challenges that restrict ...

Article

LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms

IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 102–115, https://doi.org/10.1007/978-3-642-30961-8_8

To efficiently exploit high performance computing platforms, applications currently have to express increasingly fine-grained parallelism. The OpenMP standard has allowed programmers to do so since version 3.0 and the introduction of task parallelism. Even ...

Article

Support for thread-level speculation into OpenMP

IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 275–278, https://doi.org/10.1007/978-3-642-30961-8_25

Software-based, thread-level speculation (TLS) systems allow the parallel execution of loops that cannot be analyzed at compile time. TLS systems optimistically assume that the loop is parallelizable, and augment the original code with functions that ...

Article

An OpenMP 3.1 validation testsuite

IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 237–249, https://doi.org/10.1007/978-3-642-30961-8_18

Parallel programming models are evolving so rapidly that it needs to be ensured that OpenMP can be used easily to program multicore devices. There is also effort involved in getting OpenMP to be accepted as a de facto standard in the embedded system ...

Article

SPEC OMP2012 -- an application benchmark suite for parallel systems using OpenMP

IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 223–236, https://doi.org/10.1007/978-3-642-30961-8_17

This paper describes SPEC OMP2012, a benchmark developed by the SPEC High Performance Group. It consists of 15 OpenMP parallel applications from a wide range of fields. In addition to a performance metric based on the run time of the applications the ...

Article

Task-Based execution of nested OpenMP loops

IWOMP'12: Proceedings of the 8th international conference on OpenMP in a Heterogeneous World, Pages 210–222, https://doi.org/10.1007/978-3-642-30961-8_16

In this work we propose a novel technique to reduce the overheads related to nested parallel loops in OpenMP programs. In particular we show that in many cases it is possible to replace the code of a nested parallel-for loop with equivalent code that ...
