- research-article, January 2023
Efficient Phase-Functioned Real-time Character Control in Mobile Games: A TVM Enabled Approach
- Haidong Lan,
- Wenxi Zhu,
- Du Wu,
- Qian Qiu,
- Honglin Zhu,
- Jingjing Zhao,
- Xinghui Fu,
- Liu Wei,
- Jintao Meng,
- Minwen Deng
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 15, Pages 1–9, https://doi.org/10.1145/3545008.3545095
In this paper, we propose a highly efficient computing method for game character control with phase-functioned neural networks (PFNN). The primary challenge to accelerate PFNN on mobile platforms is that PFNN dynamically produces weight matrices with an ...
- research-article, January 2023
DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 17, Pages 1–10, https://doi.org/10.1145/3545008.3545086
Ever-growing CNN size incurs a significant amount of redundancy in model parameters, which in turn, puts considerable burden on hardware. Unstructured pruning is widely used to reduce model sparsity. While, the irregularity introduced by unstructured ...
- research-article, January 2023
MG-GCN: A Scalable multi-GPU GCN Training Framework
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 79, Pages 1–11, https://doi.org/10.1145/3545008.3545082
Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning community, ...
- research-article, January 2023
DRAM Cache Management with Request Granularity for NAND-based SSDs
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 29, Pages 1–10, https://doi.org/10.1145/3545008.3545081
Most flash-based solid-state drives (SSDs) employ an on-board Dynamic Random Access Memory (DRAM) to cache hot data at the SSD page granularity. This can significantly reduce the number of flush operations to the underlying arrays of SSDs given that ...
- research-article, January 2023
NCC: Neighbor-aware Congestion Control based on Reinforcement Learning for Datacenter Networks
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 62, Pages 1–10, https://doi.org/10.1145/3545008.3545074
The challenges of low latency, high throughput datacenter networks create new traffic management problems that require new congestion control mechanisms. Generally, the proposals to solve this problem have focused either on refining existing window-...
- research-article, January 2023
An Online Learning Approach for Client Selection in Federated Edge Learning under Budget Constraint
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 72, Pages 1–11, https://doi.org/10.1145/3545008.3545062
Federated learning (FL) has emerged as a new paradigm that enables distributed mobile devices to learn a global model collaboratively. Since mobile devices (a.k.a, clients) exhibit diversity in model training quality, client selection (CS) becomes ...
- research-article, January 2023
Acuerdo: Fast Atomic Broadcast over RDMA
- Joseph Izraelevitz,
- Gaukas Wang,
- Rhett Hanscom,
- Kayli Silvers,
- Tamara Silbergleit Lehman,
- Gregory Chockler,
- Alexey Gotsman
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 59, Pages 1–11, https://doi.org/10.1145/3545008.3545041
Atomic broadcast protocols ensure that messages are delivered to a group of machines in some total order, even when some of these machines can fail. These protocols are key to making distributed services fault-tolerant, as their total order guarantee ...
- research-article, January 2023
Repair-Optimal Data Placement for Locally Repairable Codes with Optimal Minimum Hamming Distance
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 23, Pages 1–11, https://doi.org/10.1145/3545008.3545038
Modern clustered storage systems increasingly adopt erasure coding to realize reliable data storage at low storage redundancy. Locally Repairable Codes (LRC) are a family of practical erasure codes with high repair efficiency. Among various LRC ...
- research-article, January 2023
Mlog: Multi-log Write Buffer upon Ultra-fast SSD RAID
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 24, Pages 1–11, https://doi.org/10.1145/3545008.3545034
Parity-based RAID suffering from partial-stripe write-penalty has to introduce write buffer to fast absorb and merge incoming writes, and then flush them to RAID array in batch. However, we experimentally observe that the popular buffering mechanism as ...
- research-article, January 2023
Boosting Cross-rack Multi-stripe Repair in Heterogeneous Erasure-coded Clusters
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 22, Pages 1–11, https://doi.org/10.1145/3545008.3545029
Large-scale distributed storage systems have introduced erasure code to guarantee high data reliability, yet inevitably at the expense of high repair costs. In practice, storage nodes are usually divided into different racks, and data blocks in storage ...
- research-article, January 2023
TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 9, Pages 1–11, https://doi.org/10.1145/3545008.3545028
Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph algorithms and machine learning applications. The sparsity of the input and output vectors makes its floating point efficiency in general lower than sparse matrix-...
- research-article, January 2023
Regularizing Sparse and Imbalanced Communications for Voxel-based Brain Simulations on Supercomputers
ICPP '22: Proceedings of the 51st International Conference on Parallel Processing, Article No.: 81, Pages 1–11, https://doi.org/10.1145/3545008.3545019
Inter-process communications form a performance bottleneck for large-scale brain simulations. The sparse and imbalanced communication patterns of human brain make it particularly challenging to design a communication system for supporting large-scale ...