Cited By
- Chen Y, Li W, Zhou H, Yang X, Yin Y (2024). DeInfer: A GPU resource allocation algorithm with spatial sharing for near-deterministic inferring tasks. Proceedings of the 53rd International Conference on Parallel Processing, pp. 701-711. DOI: 10.1145/3673038.3673091. Online publication date: 12-Aug-2024
- Han Z, Zhou R, Xu C, Zeng Y, Zhang R (2024). InSS: An Intelligent Scheduling Orchestrator for Multi-GPU Inference With Spatio-Temporal Sharing. IEEE Transactions on Parallel and Distributed Systems 35:10, pp. 1735-1748. DOI: 10.1109/TPDS.2024.3430063. Online publication date: 1-Oct-2024
- Pang P, Deng G, Bai K, Chen Q, Sun S, Liu B, Xu Y, Yao H, Wang Z, Wang X, Liu Z, Song Z, Yang Y, Ma T, Guo M (2023). Async-Fork: Mitigating Query Latency Spikes Incurred by the Fork-based Snapshot Mechanism from the OS Level. Proceedings of the VLDB Endowment 16:5, pp. 1033-1045. DOI: 10.14778/3579075.3579079. Online publication date: 6-Mar-2023