- Article, December 2021
Cross Inference of Throughput Profiles Using Micro Kernel Network Method
Abstract: Dedicated network connections are being increasingly deployed in cloud, centralized and edge computing and data infrastructures, whose throughput profiles are critical indicators of the underlying data transfer performance. Due to the cost and ...
- research-article, March 2020
Accelerating Scientific Computing in the Post-Moore’s Era
- Kathleen E. Hamilton,
- Catherine D. Schuman,
- Steven R. Young,
- Ryan S. Bennink,
- Neena Imam,
- Travis S. Humble
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, Article No.: 6, Pages 1–31
https://doi.org/10.1145/3380940
Novel uses of graphical processing units for accelerated computation revolutionized the field of high-performance scientific computing by providing specialized workflows tailored to algorithmic requirements. As the era of Moore’s law draws to a close, ...
- Article, December 2019
Machine Learning Methods for Connection RTT and Loss Rate Estimation Using MPI Measurements Under Random Losses
Abstract: Scientific computations are expected to be increasingly distributed across wide-area networks, and Message Passing Interface (MPI) has been shown to scale to support their communications over long distances. Application-level measurements of MPI ...
- research-article, September 2019
Machine learning based design space exploration for hybrid main-memory design
MEMSYS '19: Proceedings of the International Symposium on Memory Systems, Pages 480–489
https://doi.org/10.1145/3357526.3357544
We develop a machine learning (ML) based design space exploration (DSE) method that builds predictive models for various responses of a hybrid main-memory system. To overcome the challenges associated with latency, capacity, and power of memory systems ...
- research-article, April 2019
Utility-based resource management in an oversubscribed energy-constrained heterogeneous environment executing parallel applications
- Dylan Machovec,
- Bhavesh Khemka,
- Nirmal Kumbhare,
- Sudeep Pasricha,
- Anthony A. Maciejewski,
- Howard Jay Siegel,
- Ali Akoglu,
- Gregory A. Koenig,
- Salim Hariri,
- Cihan Tunc,
- Michael Wright,
- Marcia Hilton,
- Rajendra Rambharos,
- Christopher Blandin,
- Farah Fargo,
- Ahmed Louri,
- Neena Imam
Highlights:
- Heuristics were designed for maximizing the utility earned by parallel tasks.
- ...
The worth of completing parallel tasks is modeled using utility functions, which monotonically decrease with time and represent the importance and urgency of a task. These functions define the utility earned by a task at the time of ...
- research-article, January 2019
Defense strategies and expected capacity of high performance computing infrastructures
ICDCN '19: Proceedings of the 20th International Conference on Distributed Computing and Networking, Pages 143–147
https://doi.org/10.1145/3288599.3288625
We consider high performance computing infrastructures consisting of multiple sites connected over a wide-area network. These sites house heterogeneous computing systems, network elements and local-area connections, and the wide-area network plays a ...
- research-article, November 2018
Sparse Hardware Embedding of Spiking Neuron Systems for Community Detection
ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 14, Issue 4, Article No.: 40, Pages 1–13
https://doi.org/10.1145/3223048
We study the applicability of spiking neural networks and neuromorphic hardware for solving general optimization problems without the use of adaptive training or learning algorithms. We leverage the dynamics of Hopfield networks and spin-glass systems ...
- research-article, May 2018
SHMEMGraph: efficient and balanced graph processing using one-sided communication
CCGrid '18: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pages 513–522
https://doi.org/10.1109/CCGRID.2018.00078
State-of-the-art synchronous graph processing frameworks face both inefficiency and imbalance issues that cause their performance to be suboptimal. These issues include the inefficiency of communication and the imbalanced graph computation/communication ...
- research-article, July 2017
Community detection with spiking neural networks for neuromorphic hardware
NCS '17: Proceedings of the Neuromorphic Computing Symposium, Article No.: 9, Pages 1–8
https://doi.org/10.1145/3183584.3183621
We present results related to the performance of an algorithm for community detection which incorporates event-driven computation. We define a mapping which takes a graph 𝒢 to a system of symmetrically connected, spiking neurons and use spike train ...
- tutorial, May 2017
High-Performance Key-Value Store On OpenSHMEM
CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pages 559–568
https://doi.org/10.1109/CCGRID.2017.49
Recently, there has been a growing interest in enabling fast data analytics by leveraging system capabilities from large-scale high-performance computing (HPC) systems. OpenSHMEM is a popular run-time system on HPC systems that has been used for large-...
- survey, September 2016
Understanding GPU Power: A Survey of Profiling, Modeling, and Simulation Methods
ACM Computing Surveys (CSUR), Volume 49, Issue 3, Article No.: 41, Pages 1–27
https://doi.org/10.1145/2962131
Modern graphics processing units (GPUs) have complex architectures that admit exceptional performance and energy efficiency for high-throughput applications. Although GPUs consume large amounts of power, their use for high-throughput applications ...
- Article, August 2015
Graph 500 in OpenSHMEM
OpenSHMEM 2015: Revised Selected Papers of the Second Workshop on OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies - Volume 9397, Pages 154–163
https://doi.org/10.1007/978-3-319-26428-8_10
This document describes the effort to implement the Graph 500 benchmark using OpenSHMEM based on the MPI-2 one-side version. The Graph 500 benchmark performs a breadth-first search in parallel on a large randomly generated undirected graph and can be ...