- research-article, May 2022
Extending an asynchronous runtime system for high throughput applications: A case study
Journal of Parallel and Distributed Computing (JPDC), Volume 163, Issue C, Pages 214–231, https://doi.org/10.1016/j.jpdc.2022.01.027
Highlights: Asynchronous Many Task Runtimes effectively maps to Big Data Domain Frameworks.
Current supercomputers are mostly composed of vast numbers of nodes enhanced with accelerators (usually in the form of GPUs). However, having these heterogeneous designs in the forefront have exposed the software toolchains and ...
- research-article, February 2022
A Profile-Based AI-Assisted Dynamic Scheduling Approach for Heterogeneous Architectures
International Journal of Parallel Programming (IJPP), Volume 50, Issue 1, Pages 115–151, https://doi.org/10.1007/s10766-021-00721-2
Abstract: While heterogeneous architectures are increasingly popular with High Performance Computing systems, their effectiveness depends on how efficient the scheduler is at allocating workloads onto appropriate computing devices and how communication and ...
- research-article, November 2021
E.T.: re-thinking self-attention for transformer models on GPUs
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 25, Pages 1–18, https://doi.org/10.1145/3458817.3476138
Transformer-based deep learning models have become a ubiquitous vehicle to drive a variety of Natural Language Processing (NLP) related tasks beyond their accuracy ceiling. However, these models also suffer from two pronounced challenges, that is, ...
- research-article, December 2017
Generating Fine-Grain Multithreaded Applications Using a Multigrain Approach
ACM Transactions on Architecture and Code Optimization (TACO), Volume 14, Issue 4, Article No.: 47, Pages 1–26, https://doi.org/10.1145/3155288
The recent evolution in hardware landscape, aimed at producing high-performance computing systems capable of reaching extreme-scale performance, has reignited the interest in fine-grain multithreading, particularly at the intranode level. Indeed, ...
- research-article, September 2017
HAMR
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 31, Issue 5, Pages 361–374, https://doi.org/10.1177/1094342016672080
As the attention given to big data grows, cluster computing systems for distributed processing of large data sets become the mainstream and critical requirement in high performance distributed system research. One of the most successful systems is ...
- research-article, March 2017
Leveraging access port positions to accelerate page table walk in DWM-based main memory
Domain Wall Memory (DWM) with ultra-high density and comparable read/write latency to DRAM is an attractive replacement for CMOS-based devices. Unlike DRAM, DWM has non-uniform data access latency that is proportional to the number of shift operations. ...
- Article, October 2016
Toward a Parallel Turing Machine Model
Abstract: In the field of parallel computing, the late leader Ken Kennedy, has raised a concern in early 1990s: “Is Parallel Computing Dead?” Now, we have witnessed the tremendous momentum of the “second spring” of parallel computing in recent years. But, ...
- Article, December 2015
Energy efficient multi-level tiling for dense matrix multiplication on many-core architecture
IGSC '15: Proceedings of the 2015 Sixth International Green and Sustainable Computing Conference (IGSC), Pages 1–6, https://doi.org/10.1109/IGCC.2015.7393735
With computing systems marching to exascale and big data era, power consumption has become more and more important for the system design. Energy efficiency is becoming one of the critical dimensions in the computer system design space and has been ...
- research-article, September 2015
FreshBreeze
Procedia Computer Science (PROCS), Volume 51, Issue C, Pages 2573–2582, https://doi.org/10.1016/j.procs.2015.05.365
The DDDAS paradigm, unifying applications, mathematical modeling, and sensors, is now more relevant than ever with the advent of Large-Scale/Big-Data and Big-Computing. Large-Scale-Dynamic-Data (advertised as the next wave of Big Data) includes the ...
- research-article, November 2014
TERAFLUX
- Roberto Giorgi,
- Rosa M. Badia,
- François Bodin,
- Albert Cohen,
- Paraskevas Evripidou,
- Paolo Faraboschi,
- Bernhard Fechner,
- Guang R. Gao,
- Arne Garbade,
- Rahul Gayatri,
- Sylvain Girbal,
- Daniel Goodman,
- Behran Khan,
- Souad Koliaï,
- Joshua Landwehr,
- Nhat Minh Lê,
- Feng Li,
- Mikel Lujàn,
- Avi Mendelson,
- Laurent Morin,
- Nacho Navarro,
- Tomasz Patejko,
- Antoniu Pop,
- Pedro Trancoso,
- Theo Ungerer,
- Ian Watson,
- Sebastian Weis,
- Stéphane Zuckerman,
- Mateo Valero
Microprocessors & Microsystems (MSYS), Volume 38, Issue 8, Pages 976–990, https://doi.org/10.1016/j.micpro.2014.04.001
Scalable architecture for manycore, tera-device computing. Task-parallel programming models combining dataflow and stateful computations. Parallel simulation of large-scale multi-node architectures. Fault detection and recovery for task-...
- book, October 2013
Design Methods and Applications for Distributed Embedded Systems: IFIP 18th World Computer Congress, TC10 Working Conference on Distributed and ... in Information and Communication Technology)
The ever decreasing price/performance ratio of microcontrollers makes it economically attractive to replace more and more conventional mechanical or electronic control systems within many products by embedded real-time computer systems. An embedded real-...
- Article, August 2013
An implementation of the codelet model
Euro-Par'13: Proceedings of the 19th international conference on Parallel Processing, Pages 633–644, https://doi.org/10.1007/978-3-642-40047-6_63
Chip architectures are shifting from few, faster, functionally heavy cores to abundant, slower, simpler cores to address pressing physical limitations such as energy consumption and heat expenditure. As architectural trends continue to fluctuate, we ...
- article, April 2013
StreamTMC: Stream compilation for tiled multi-core architectures
Journal of Parallel and Distributed Computing (JPDC), Volume 73, Issue 4, Pages 484–494, https://doi.org/10.1016/j.jpdc.2012.12.001
Tiled multi-core architectures have become an important kind of multi-core design for its good scalability and low power consumption. Stream programming has been productively applied to a number of important application domains. It provides an ...
- research-article, December 2012
Software Pipelining for Stream Programs on Resource Constrained Multicore Architectures
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 23, Issue 12, Pages 2338–2350, https://doi.org/10.1109/TPDS.2012.41
Stream programming model has been productively applied to a number of important application domains. Software pipelining is an important code scheduling technique for stream programs. However, the multicore evolution has presented a new dimension of ...
- Article, September 2012
Determinacy and Repeatability of Parallel Program Schemata
DFM '12: Proceedings of the 2012 Data-Flow Execution Models for Extreme Scale Computing, Pages 1–9, https://doi.org/10.1109/DFM.2012.10
The concept of "determinism" of parallel programs and parallel systems has received a lot of attention since the dawn of computing, with multiple proposals for formal and informal definitions of deterministic execution. In this paper, we present precise ...
- Article, December 2011
Source Code Partitioning in Program Optimization
ICPADS '11: Proceedings of the 2011 IEEE 17th International Conference on Parallel and Distributed Systems, Pages 56–63, https://doi.org/10.1109/ICPADS.2011.125
Program analysis and program optimization seek to improve program performance. There are optimization techniques which are applied to various scopes such as a source file, function or basic block. Inter-procedural program optimization techniques have ...