Cited By
View all- Du JJiang JZheng JZhang HHuang DLu Y(2023)Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUsACM Transactions on Architecture and Code Optimization10.1145/361768920:4(1-22)Online publication date: 26-Oct-2023
- Zhang QLiu YLiu TQian D(2023)CoFB: latency-constrained co-scheduling of flows and batches for deep learning inference service on the CPU–GPU systemThe Journal of Supercomputing10.1007/s11227-023-05183-679:13(14172-14199)Online publication date: 4-Apr-2023
- Zhao MZhao KZhou ZChen X(2022)Edge Resource Autoscaling for Hierarchical Federated Learning Over Public Edge Platforms2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta)10.1109/SmartWorld-UIC-ATC-ScalCom-DigitalTwin-PriComp-Metaverse56740.2022.00123(806-814)Online publication date: Dec-2022