Cited By
- Zhang Y, Lu L, Yang Z, Liang Z, Suo S (2025). A load-balanced acceleration method for small and irregular batch matrix multiplication on GPU. Journal of Systems Architecture 160, 103341. doi:10.1016/j.sysarc.2025.103341. Online publication date: Mar-2025.
- Liang H, Deng C, Zhang P, Fang J, Tang T, Huang C (2025). An empirical performance evaluation of SYCL on ARM multi-core processors. CCF Transactions on High Performance Computing 7(1), 1-16. doi:10.1007/s42514-024-00212-z. Online publication date: 14-Feb-2025.
- Han R, Chen J, Garg B, Zhou X, Lu J, Young J, Sim J, Kim H (2024). CuPBoP: Making CUDA a Portable Language. ACM Transactions on Design Automation of Electronic Systems 29(4), 1-25. doi:10.1145/3659949. Online publication date: 21-Jun-2024.