ICPP 2023: Salt Lake City, UT, USA
- Proceedings of the 52nd International Conference on Parallel Processing, ICPP 2023, Salt Lake City, UT, USA, August 7-10, 2023. ACM 2023
Numerics (In-Person)
- Sameer Deshmukh, Rio Yokota, George Bosilca, Qianxiang Ma:
O(N) distributed direct factorization of structured dense matrices using runtime systems. 1-10
Optimization of AI/ML (In Person)
- Georgia Channing, Ria Patel, Paula Olaya, Ariel Keller Rorabaugh, Osamu Miyashita, Silvina Caíno-Lores, Catherine D. Schuman, Florence Tama, Michela Taufer:
Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification. 1
Numerics (In-Person)
- M. Ridwan Apriansyah, Rio Yokota:
Computing the k-th Eigenvalue of Symmetric H2-Matrices. 11-20
- Junqing Lin, Honghe Zhang, Xiaolong Shi, Jingwei Sun, Xianzhi Yu, Jun Yao, Guangzhong Sun:
EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs. 21-30
Compression and Encoding (In Person)
- Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, Jiro Katto:
Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability. 31-40
- Mi Zhang, Qihan Kang, Patrick P. C. Lee:
Minimizing Network and Storage Costs for Consensus with Flexible Erasure Coding. 41-50
- Shui Jiang, Tsung-Wei Huang, Bei Yu, Tsung-Yi Ho:
SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU. 51-61
AI/ML Performance (Remote Session)
- Lixiao Cui, Kedi Yang, Yusen Li, Gang Wang, Xiaoguang Liu:
DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management. 62-71
- Hesheng Sun, Xinyi Chen, Zhuzhong Qian, Zengji Li, Ning Chen, Tuo Cao, Suwei Xu, Yitong Zhou:
BIRP: Batch-aware Inference Workload Redistribution and Parallel Scheme for Edge Collaboration. 72-81
- Yongwen Qiu, Yongmei Lei, Guozheng Wang:
PSRA-HGADMM: A Communication Efficient Distributed ADMM Algorithm. 82-91
- Zhenxing Li, Qiang Cao, Yajie Chen, Wenrui Yan:
CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel. 92-101
- Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu:
OSP: Boosting Distributed Model Training with 2-stage Synchronization. 102-111
- Yuning Zhang, Zao Zhang, Wei Bao, Dong Yuan:
ITIF: Integrated Transformers Inference Framework for Multiple Tenants on GPU. 112-121
Graph Algorithms (In Person)
- Bin Guo, Emil Sekerinski:
Parallel Order-Based Core Maintenance in Dynamic Graphs. 122-131
- Md Abdul Motaleb Faysal, Maximilian H. Bremer, Cy P. Chan, John Shalf, Shaikh Arifuzzaman:
Fast Parallel Index Construction for Efficient K-truss-based Local Community Detection in Large Graphs. 132-141
- Samiran Kawtikwar, Mohammad Almasri, Wen-Mei Hwu, Rakesh Nagi, Jinjun Xiong:
BEEP: Balanced Efficient subgraph Enumeration in Parallel. 142-152
Programming Models (In Person)
- Omri Mor, George Bosilca, Marc Snir:
Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine. 153-162
- Romain Pereira, Adrien Roussel, Patrick Carribault, Thierry Gautier:
Investigating Dependency Graph Discovery Impact on Task-based MPI+OpenMP Applications Performances. 163-172
- Eric Wright, Johannes Doerfert, Shilei Tian, Barbara M. Chapman, Sunita Chandrasekaran:
Implementing OpenMP's SIMD Directive in LLVM's GPU Runtime. 173-182
Applications (Remote Session)
- Peng Wang, Yu Liu, Zhelong Zhao, Ke Zhou, Zhihai Huang, Yanxiong Chen:
Smart Cache Insertion and Promotion Policy for Content Delivery Networks. 183-192
- Haowen Zhang, Jing Li, He Zhao, Tong Zhou, Nianzu Sheng, Hengyu Pan:
BlockPilot: A Proposer-Validator Parallel Execution Framework for Blockchain. 193-202
- Chenyang Jiao, Weihua Zhang, Li Shen:
Communication Optimizations for State-vector Quantum Simulator on CPU+GPU Clusters. 203-212
LSM-Tree Research (Remote Session)
- Zepeng Wang, Shu Yin:
RBC: A bandwidth controller to reduce write-stalls and tail latency. 213-222
- Ziyi Lu, Qiang Cao, Shucheng Wang, Jie Yao, Xiangrui Yang:
PMLDS: An LSM-Tree Direct Managed Storage for Key-Value Stores on Byte-Addressable Devices. 223-232
- Chen Ding, Jian Zhou, Jiguang Wan, Yiqin Xiong, Sicen Li, Shuning Chen, Hanyang Liu, Liu Tang, Ling Zhan, Kai Lu, Peng Xu:
DComp: Efficient Offload of LSM-tree Compaction with Data Processing Units. 233-243
Applications (Remote Session, Part II)
- Jiali Li, Xianzhang Chen, Duo Liu, Ao Ren, Zhaoyang Zeng, Yujuan Tan:
RadarSSD: A Computational Storage for Radar Signal Processing. 244-253
Training (In Person)
- Sixu Hu, Qinbin Li, Bingsheng He:
Communication-Efficient Generalized Neuron Matching for Federated Learning. 254-263
- Jiyao Liu, Xinliang Wei, Xuanzhang Liu, Hongchang Gao, Yu Wang:
Group-based Hierarchical Federated Learning: Convergence, Group Formation, and Sampling. 264-273
- Feiwen Zhu, Michal Futrega, Han Bao, Sukru Burc Eryilmaz, Fei Kong, Kefeng Duan, Xinnian Zheng, Nimrod Angel, Matthias Jouanneaux, Maxmilian Stadler, Michal Marcinkiewicz, Fung Xie, June Yang, Michael Andersch:
FastDimeNet++: Training DimeNet++ in 22 minutes. 274-284
Communication (In Person)
- Thomas Gillis, Ken Raffenetti, Hui Zhou, Yanfei Guo, Rajeev Thakur:
Quantifying the Performance Benefits of Partitioned Communication in MPI. 285-294
- George Katevenis, Manolis Ploumidis, Manolis Marazakis:
Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors. 295-305
- Whit Schonbein, Scott Levy, Matthew G. F. Dosanjh, W. Pepper Marts, Elizabeth Reid, Ryan E. Grant:
Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication. 306-316
System Software (Remote Session)
- Tiannuo Yang, Ruobing Chen, Yusen Li, Xiaoguang Liu, Gang Wang:
CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. 317-326
- Jingrun Zhang, Guangba Yu, Zilong He, Liang Ai, Pengfei Chen:
DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems. 327-336
- Yi Bian, Fangyu Zheng, Yuewu Wang, Lingguang Lei, Yuan Ma, Jiankuo Dong, Jiwu Jing:
AsyncGBP: Unleashing the Potential of Heterogeneous Computing for SSL/TLS with GPU-based Provider. 337-346
- Benran Wang, Hongyang Chen, Pengfei Chen, Zilong He, Guangba Yu:
MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. 347-357
- Xianzhi Zhu, Yongkun Li, Lulu Yao, Zhihao Qi, Yinlong Xu, Pengcheng Wang, Weiguang Wang, Xia Zhu:
On Optimizing Traffic Scheduling for Multi-replica Containerized Microservices. 358-368
- Xinxin Qi, Juan Chen, Yong Dong, Yuan Yuan, Tao Xu, Rongyu Deng, Zekai Li, Kexing Zhou, Zheng Wang:
HighRPM: Combining Integrated Measurement and Software Power Modeling for High-Resolution Power Monitoring. 369-379
Applications (In Person)
- Suneth Dasantha Ekanayake, István Zoltan Reguly, Fabio Luporini, Gihan Ravideva Mudalige:
Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2. 380-391
- Abbas Haghi, Lluc Alvarez, Jordi Fornt, Juan Miguel De Haro Ruiz, Roger Figueras, Max Doblas, Santiago Marco-Sola, Miquel Moretó:
WFAsic: A High-Performance ASIC Accelerator for DNA Sequence Alignment on a RISC-V SoC. 392-401
- Jiechao Gao, Wenpeng Wang, Fateme Nikseresht, Viswajith Govinda Rajan, Bradford Campbell:
PFDRL: Personalized Federated Deep Reinforcement Learning for Residential Energy Management. 402-411
Resource Scheduling and Adaptation (In Person)
- Hengwei Xu, Pengyuan Zhou, Haiyong Xie, Yong Liao:
Mercury: Fast and Optimal Device Placement for Large Deep Learning Models. 412-422
- Suraiya Tairin, Haiying Shen, Zeyu Zhang:
Embracing Uncertainty for Equity in Resource Allocation in ML Training. 423-432
- Ghazanfar Ali, Mert Side, Sridutt Bhalachandra, Nicholas J. Wright, Yong Chen:
Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models. 433-442
Federated Learning (Remote Session)
- Jieling Yu, Ruiting Zhou, Chen Chen, Bo Li, Fang Dong:
ASFL: Adaptive Semi-asynchronous Federated Learning for Balancing Model Accuracy and Total Latency in Mobile Edge Networks. 443-451
- Mengyao Du, Miao Zhang, Lin Liu, Kai Xu, Quanjun Yin:
Credit-based Differential Privacy Stochastic Model Aggregation Algorithm for Robust Federated Learning via Blockchain. 452-461
- Songli Zhang, Zhenzhe Zheng, Fan Wu, Bingshuai Li, Yunfeng Shao, Guihai Chen:
Learning From Your Neighbours: Mobility-Driven Device-Edge-Cloud Federated Learning. 462-471
- Qingyuan Wang, Bin Gao, Zhi Zhou, Fei Xu, Chenghao Ouyang:
DAG-Aware Optimization for Geo-Distributed Data Analytics. 472-481
- YuAng Chen, Yeh-Ching Chung:
Connectivity-Aware Link Analysis for Skewed Graphs. 482-491
- Haishuang Fan, Ming Li, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan:
BitColor: Accelerating Large-Scale Graph Coloring on FPGA with Parallel Bit-Wise Engines. 492-502
Graph-Related Techniques (In Person)
- Andrey Prokopenko, Damien Lebrun-Grandié, Daniel Arndt:
Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs. 503-512
- Qinglin Lu, Xinyu Wang, Wenjing Ma, Yuwen Zhao, Daokun Chen, Fangfang Liu:
GFFT: a Task Graph Based Fast Fourier Transform Optimization Framework. 513-523
- Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran:
ADARNet: Deep Learning Predicts Adaptive Mesh Refinement. 524-534
Memory and Storage (In Person)
- Louis-Claude Canon, Anthony Dugois, Loris Marchal, Etienne Rivière:
Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value Stores. 535-545
- Jong-Hyun Jeong, Myung Kuk Yoon, Yunho Oh, Gunjae Koo:
Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors. 546-555
Networks (Remote Session)
- Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang:
Wrht: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems. 556-565
- Hao Zhang, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fei Dai:
SEECHIP: A Scalable and Energy-Efficient Chiplet-based GPU Architecture Using Photonic Links. 566-575
- Jinbin Hu, Yi He, Jin Wang, Wangqing Luo, Jiawei Huang:
RLB: Reordering-Robust Load Balancing in Lossless Datacenter Networks. 576-584
Scheduling (Remote Session)
- Hehuan Shi, Lin Chen, Ming Lin, Raphael C.-W. Phan:
Scheduling Dependent Batching Tasks. 585-594
- Yicheng Feng, Shihao Shen, Mengwei Xu, Yuanming Ren, Xiaofei Wang, Victor C. M. Leung, Wenyu Wang:
Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds. 595-604
- Diaohan Luo, Tian Yu, Yuewen Wu, Heng Wu, Tao Wang, Wenbo Zhang:
SPLIT: QoS-Aware DNN Inference on Shared GPU via Evenly-Sized Model Splitting. 605-614
- Huadong Li, Hui Liu, Changyuan Liu, Aoqi Chen, Zhaocheng Niu, Junzhao Du:
NeiLatS: Neighbor-Aware Latency-Sensitive Application Scheduling in Heterogeneous Cloud-Edge Environment. 615-624
Inference (In Person)
- Xueyu Hou, Yongjie Guan, Tao Han:
Dystri: A Dynamic Inference based Distributed DNN Service Framework on Edge. 625-634
- Jianfeng Gu, Yichao Zhu, Puxuan Wang, Mohak Chadha, Michael Gerndt:
FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference. 635-644
- Beilei Jiang, Xianwei Cheng, Yuan Li, Jocelyn Zhang, Song Fu, Qing Yang, Mingxiong Liu, Alejandro Olvera:
Output-Directed Dynamic Quantization for DNN Acceleration. 645-654
Compilation and Checkpointing Techniques (In Person)
- Jan Hückelheim, Johannes Doerfert:
ORAQL - Optimistic Responses to Alias Queries in LLVM. 655-664
- Nigel Tan, Jakob Lüttgau, Jack Marquez, Keita Teranishi, Nicolas M. Morales, Sanjukta Bhowmick, Franck Cappello, Michela Taufer, Bogdan Nicolae:
Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication. 665-674
- Masaki Nakata, Shigeyuki Sato, Tomoharu Ugawa:
General-purpose Asynchronous Periodic Checkpointing in Hybrid Memory. 675-684
Memory and Storage (Remote Session)
- Zhenlin Qi, Shengan Zheng, Yifeng Hui, Bowen Zhang, Linpeng Huang:
Conflux: Exploiting Persistent Memory and RDMA Bandwidth via Adaptive I/O Mode Selection. 685-694
- Hang An, Fang Wang, Dan Feng, Xiaomin Zou, Zefeng Liu, Jianshun Zhang:
Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. 695-704
- Weiming Huang, Yajuan Du, Mingyang Liu:
GPU Performance Acceleration via Intra-Group Sharing TLB. 705-714
- Baorong Ding, Mingcong Han, Rong Chen:
DArray: A High Performance RDMA-Based Distributed Array. 715-724
- Hao Zhao, Si Wu, Haifeng Liu, Zhixiang Tang, Xiaochun He, Yinlong Xu:
Toward Optimal Repair and Load Balance in Locally Repairable Codes. 725-735
- Zhigang Cai, Chengyong Tang, Minjun Li, François Trahay, Jun Li, Zhibing Sha, Jiaojiao Wu, Fan Yang, Jianwei Liao:
Re-aligning Across-page Requests for Flash-based Solid-state Drives. 736-745
Optimization of AI/ML (In Person)
- Daegun Yoon, Sangyoon Oh:
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. 746-755
- Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You:
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. 766-775
Numerics (Remote Session)
- Jie Yan, Zhang Yang, Aiqing Zhang, Zeyao Mo:
JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes. 776-785
- Mingzhen Li, Hailong Yang, Shanjun Zhang, Fengwei Yu, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian:
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs. 786-796
- Zhao Liu, Xuesen Chu, Xiaojing Lv, Hanyue Liu, Haohuan Fu, Guangwen Yang:
Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-Core Sunway Supercomputer. 797-806
- Helin Cheng, Wenxuan Li, Yuechen Lu, Weifeng Liu:
HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors. 807-817
- Ran Zhao, Chao Li, Xiaowei Guo, Yi Liu, Sifan Long, Sen Zhang, Yanlong Qiu, Canqun Yang:
An Improved Parallel Overset Grid Method for Fluid Simulation with Moving Boundary. 818-827
- Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs:
JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency. 828-838