Cited By
- Kim S, Sim E, Shin Y, Cho Y, Baek W (2024). Activation Sequence Caching: High-Throughput and Memory-Efficient Generative Inference with a Single GPU. Proceedings of the 2024 International Conference on Parallel Architectures and Compilation Techniques, pp. 78-90. doi:10.1145/3656019.3676945. Online publication date: 14-Oct-2024.
- Zhou H, Rang W, Chen H, Zhou X, Cheng D (2024). DeepTM: Efficient Tensor Management in Heterogeneous Memory for DNN Training. IEEE Transactions on Parallel and Distributed Systems 35:11, pp. 1920-1935. doi:10.1109/TPDS.2024.3431910. Online publication date: Nov-2024.
- Zhou J, Chen Y, Hong Z, Chen W, Yu Y, Zhang T, Wang H, Zhang C, Zheng Z (2024). Training and Serving System of Foundation Models: A Comprehensive Survey. IEEE Open Journal of the Computer Society 5, pp. 107-119. doi:10.1109/OJCS.2024.3380828. Online publication date: 2024.