Liu J, Chen S and Shen L. (2025). A comprehensive survey on graph neural network accelerators. Frontiers of Computer Science: Selected Publications from Chinese Universities. 19:2. Online publication date: 1-Feb-2025.
Shao Y, Li H, Gu X, Yin H, Li Y, Miao X, Zhang W, Cui B and Chen L. (2024). Distributed Graph Neural Network Training: A Survey. ACM Computing Surveys. 56:8. (1-39). Online publication date: 31-Aug-2024.
Yang Y, Emer J and Sanchez D. (2024). Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA). 10.1109/ISCA59077.2024.00072. 979-8-3503-2658-1. (931-945).
Lin Y, Chen Y, Gobriel S, Jain N, Jha G and Prasanna V. (2024). ARGO: An Auto-Tuning Runtime System for Scalable GNN Training on Multi-Core Processor. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 10.1109/IPDPS57955.2024.00039. 979-8-3503-8711-7. (361-372).
Chen S, Liu J and Shen L. A Survey on Graph Neural Network Acceleration: A Hardware Perspective. Chinese Journal of Electronics. 10.23919/cje.2023.00.135. 33:3. (601-622).
Block C, Gerogiannis G, Mendis C, Azad A and Torrellas J. Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. (1200-1217).
Ai X, Wang Q, Cao C, Zhang Y, Chen C, Yuan H, Gu Y and Yu G. (2024). NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments. Proceedings of the VLDB Endowment. 17:8. (1995-2008). Online publication date: 1-Apr-2024.
Gerogiannis G, Aananthakrishnan S, Torrellas J and Hur I. (2024). HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 10.1109/HPCA57654.2024.00081. 979-8-3503-9313-2. (1012-1028).
Fu Q, Rolinger T and Huang H. (2024). JITSPMM: Just-in-Time Instruction Generation for Accelerated Sparse Matrix-Matrix Multiplication. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 10.1109/CGO57630.2024.10444827. 979-8-3503-9509-9. (448-459).
Wang Q, Chen Y, Wong W and He B. (2023). HongTu: Scalable Full-Graph GNN Training on Multiple GPUs. Proceedings of the ACM on Management of Data. 1:4. (1-27). Online publication date: 8-Dec-2023.
Bakhshalipour M and Gibbons P. (2023). Agents of Autonomy: A Systematic Study of Robotics on Modern Hardware. Proceedings of the ACM on Measurement and Analysis of Computing Systems. 7:3. (1-31). Online publication date: 7-Dec-2023.
Lee K, Gonzales C, Spellings M, Galkin M, Miret S and Kumar N. Towards Foundation Models for Materials Science: The Open MatSci ML Toolkit. Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis. (51-59).
Prieto P, Abad P, Gregorio J and Puente V. Performance Characterization of Popular DNN Models on Out-of-Order CPUs. Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques. (199-210).
Khatua A, Mailthody V, Taleka B, Ma T, Song X and Hwu W. IGB: Addressing The Gaps In Labeling, Features, Heterogeneity, and Size of Public Graph Datasets for Deep Learning Research. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. (4284-4295).
Gerogiannis G, Yesil S, Lenadora D, Cao D, Mendis C and Torrellas J. SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM. Proceedings of the 50th Annual International Symposium on Computer Architecture. (1-15).
Yan M, Zou M, Yang X, Li W, Ye X, Fan D and Xie Y. (2022). Characterizing and Understanding HGNNs on GPUs. IEEE Computer Architecture Letters. 21:2. (69-72). Online publication date: 1-Jul-2022.