Yuxiong He
2020 – today
- 2024
- [c91] Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Connor Holmes, Cheng Li, Yuxiong He: DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. AAAI 2024: 18490-18498
- [c90] Zhewei Yao, Xiaoxia Wu, Cheng Li, Stephen Youn, Yuxiong He: Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation. AAAI 2024: 19377-19385
- [c89] Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Xiaoxia Wu, Connor Holmes, Zhewei Yao, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He: ZeRO++: Extremely Efficient Collective Communication for Large Model Training. ICLR 2024
- [c88] Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminabadi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. IPDPS (Workshops) 2024: 1206-1208
- [c87] Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Reza Yazdani Aminadabi, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. PODC 2024: 121-130
- [c86] Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song: Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs. USENIX ATC 2024: 699-713
- [i48] Connor Holmes, Masahiro Tanaka, Michael Wyatt, Ammar Ahmad Awan, Jeff Rasley, Samyam Rajbhandari, Reza Yazdani Aminabadi, Heyang Qin, Arash Bakhtiari, Lev Kurilenko, Yuxiong He: DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference. CoRR abs/2401.08671 (2024)
- [i47] Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song: FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design. CoRR abs/2401.14112 (2024)
- [i46] Guanhua Wang, Olatunji Ruwase, Bing Xie, Yuxiong He: FastPersist: Accelerating Model Checkpointing in Deep Learning. CoRR abs/2406.13768 (2024)
- [i45] Jaeseong Lee, Seung-won Hwang, Aurick Qiao, Daniel F. Campos, Zhewei Yao, Yuxiong He: STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning. CoRR abs/2409.06211 (2024)
- 2023
- [j17] Reza Yazdani Aminabadi, Olatunji Ruwase, Minjia Zhang, Yuxiong He, José-María Arnau, Antonio González: SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks. ACM Trans. Embed. Comput. Syst. 22(2): 30:1-30:23 (2023)
- [c85] Minjia Zhang, Uma-Naresh Niranjan, Yuxiong He: Revisiting the Efficiency-Accuracy Tradeoff in Adapting Transformer Models via Adversarial Fine-Tuning. ECAI 2023: 3026-3033
- [c84] Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He: Scaling Vision-Language Models with Sparse Mixture of Experts. EMNLP (Findings) 2023: 11329-11344
- [c83] Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He: Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam. ICLR 2023
- [c82] Syed Zawad, Cheng Li, Zhewei Yao, Elton Zheng, Yuxiong He, Feng Yan: DySR: Adaptive Super-Resolution via Algorithm and System Co-design. ICLR 2023
- [c81] Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He: Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases. ICML 2023: 37524-37539
- [c80] Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele: A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training. ICS 2023: 203-214
- [c79] Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao: HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. ICS 2023: 324-335
- [c78] Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. IPDPS 2023: 996-1006
- [i44] Xiaoxia Wu, Cheng Li, Reza Yazdani Aminabadi, Zhewei Yao, Yuxiong He: Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases. CoRR abs/2301.12017 (2023)
- [i43] Siddharth Singh, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He, Abhinav Bhatele: A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training. CoRR abs/2303.06318 (2023)
- [i42] Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, Yuxiong He: Scaling Vision-Language Models with Sparse Mixture of Experts. CoRR abs/2303.07226 (2023)
- [i41] Zhewei Yao, Cheng Li, Xiaoxia Wu, Stephen Youn, Yuxiong He: A Comprehensive Study on Post-Training Quantization for Large Language Models. CoRR abs/2303.08302 (2023)
- [i40] Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda: MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023)
- [i39] Chengming Zhang, Shaden Smith, Baixi Sun, Jiannan Tian, Jonathan Soifer, Xiaodong Yu, Shuaiwen Leon Song, Yuxiong He, Dingwen Tao: HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs. CoRR abs/2304.07334 (2023)
- [i38] Pareesa Ameneh Golnari, Zhewei Yao, Yuxiong He: Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important? CoRR abs/2305.09847 (2023)
- [i37] Guanhua Wang, Heyang Qin, Sam Ade Jacobs, Connor Holmes, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He: ZeRO++: Extremely Efficient Collective Communication for Giant Model Training. CoRR abs/2306.10209 (2023)
- [i36] Xiaoxia Wu, Zhewei Yao, Yuxiong He: ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats. CoRR abs/2307.09782 (2023)
- [i35] Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He: DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales. CoRR abs/2308.01320 (2023)
- [i34] Fengxiang Bie, Yibo Yang, Zhongzhu Zhou, Adam Ghanem, Minjia Zhang, Zhewei Yao, Xiaoxia Wu, Connor Holmes, Pareesa Ameneh Golnari, David A. Clifton, Yuxiong He, Dacheng Tao, Shuaiwen Leon Song: RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model. CoRR abs/2309.00810 (2023)
- [i33] Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He: DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention. CoRR abs/2309.14327 (2023)
- [i32] Sam Ade Jacobs, Masahiro Tanaka, Chengming Zhang, Minjia Zhang, Shuaiwen Leon Song, Samyam Rajbhandari, Yuxiong He: DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models. CoRR abs/2309.14509 (2023)
- [i31] Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan A. Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, Rao Kotamarthi, Venkatram Vishwanath, Arvind Ramanathan, Sam Foreman, Kyle Hippe, Troy Arcomano, Romit Maulik, Maxim Zvyagin, Alexander Brace, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, Carla M. Mann, Michael W. Irvin, J. Gregory Pauloski, Logan T. Ward, Valérie Hayot-Sasson, Murali Emani, Zhen Xie, Diangen Lin, Maulik Shukla, Ian T. Foster, James J. Davis, Michael E. Papka, Thomas S. Brettin, Prasanna Balaprakash, Gina Tourassi, John Gounley, Heidi A. Hanson, Thomas E. Potok, Massimiliano Lupo Pasini, Kate Evans, Dan Lu, Dalton D. Lunga, Junqi Yin, Sajal Dash, Feiyi Wang, Mallikarjun Shankar, Isaac Lyngaas, Xiao Wang, Guojing Cong, Pei Zhang, Ming Fan, Siyan Liu, Adolfy Hoisie, Shinjae Yoo, Yihui Ren, William Tang, Kyle Felker, Alexey Svyatkovskiy, Hang Liu, Ashwin M. Aji, Angela Dalton, Michael J. Schulte, Karl Schulz, Yuntian Deng, Weili Nie, Josh Romero, Christian Dallago, Arash Vahdat, Chaowei Xiao, Thomas Gibbs, Anima Anandkumar, Rick Stevens: DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. CoRR abs/2310.04610 (2023)
- [i30] Zhewei Yao, Reza Yazdani Aminabadi, Stephen Youn, Xiaoxia Wu, Elton Zheng, Yuxiong He: ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers. CoRR abs/2310.17723 (2023)
- [i29] Xiaoxia Wu, Haojun Xia, Stephen Youn, Zhen Zheng, Shiyang Chen, Arash Bakhtiari, Michael Wyatt, Reza Yazdani Aminabadi, Yuxiong He, Olatunji Ruwase, Leon Song, Zhewei Yao: ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks. CoRR abs/2312.08583 (2023)
- 2022
- [c77] Minjia Zhang, Uma-Naresh Niranjan, Yuxiong He: Adversarial Data Augmentation for Task-Specific Knowledge Distillation of Pre-trained Transformers. AAAI 2022: 11685-11693
- [c76] Xiao Zhang, Yuxiong He, Youhuai Wang, Xiaoming Chen, Shi Jin, Ye Liang: Task Offloading Based on GRU Model in IoT. EITCE 2022: 151-156
- [c75] Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He: 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. HIPC 2022: 272-281
- [c74] Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. ICML 2022: 18332-18346
- [c73] Conglong Li, Minjia Zhang, Yuxiong He: The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models. NeurIPS 2022
- [c72] Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He: XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient. NeurIPS 2022
- [c71] Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. NeurIPS 2022
- [c70] Reza Yazdani Aminabadi, Samyam Rajbhandari, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Olatunji Ruwase, Shaden Smith, Minjia Zhang, Jeff Rasley, Yuxiong He: DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. SC 2022: 46:1-46:15
- [c69] Minjia Zhang, Wenhan Wang, Yuxiong He: GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning. WSDM 2022: 1395-1405
- [i28] Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He: DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. CoRR abs/2201.05596 (2022)
- [i27] Shaden Smith, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu, Shrimai Prabhumoye, George Zerveas, Vijay Korthikanti, Elton Zheng, Rewon Child, Reza Yazdani Aminabadi, Julie Bernauer, Xia Song, Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro: Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model. CoRR abs/2201.11990 (2022)
- [i26] Minjia Zhang, Uma-Naresh Niranjan, Yuxiong He: ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise. CoRR abs/2201.12469 (2022)
- [i25] Yucheng Lu, Conglong Li, Minjia Zhang, Christopher De Sa, Yuxiong He: Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam. CoRR abs/2202.06009 (2022)
- [i24] Xiaoxia Wu, Zhewei Yao, Minjia Zhang, Conglong Li, Yuxiong He: Extreme Compression for Pre-trained Transformers Made Simple and Efficient. CoRR abs/2206.01859 (2022)
- [i23] Zhewei Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He: ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers. CoRR abs/2206.01861 (2022)
- [i22] Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu: Compressing Pre-trained Transformers via Low-Bit NxM Sparsity for Natural Language Understanding. CoRR abs/2206.15014 (2022)
- [i21] Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He: DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. CoRR abs/2207.00032 (2022)
- [i20] Yuxin Ma, Ping Gong, Jun Yi, Zhewei Yao, Minjie Wang, Cheng Li, Yuxiong He, Feng Yan: BiFeat: Supercharge GNN Training via Graph Feature Quantization. CoRR abs/2207.14696 (2022)
- [i19] Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He: Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers. CoRR abs/2211.11586 (2022)
- [i18] Conglong Li, Zhewei Yao, Xiaoxia Wu, Minjia Zhang, Yuxiong He: DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing. CoRR abs/2212.03597 (2022)
- 2021
- [c68] Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. ICML 2021: 10118-10129
- [c67] Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu: NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM. NeurIPS 2021: 1818-1830
- [c66] Heyang Qin, Samyam Rajbhandari, Olatunji Ruwase, Feng Yan, Lei Yang, Yuxiong He: SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement. NeurIPS 2021: 20531-20544
- [c65] Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He: ZeRO-infinity: breaking the GPU memory wall for extreme scale deep learning. SC 2021: 59
- [c64] Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He: ZeRO-Offload: Democratizing Billion-Scale Model Training. USENIX ATC 2021: 551-564
- [i17] Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, Yuxiong He: ZeRO-Offload: Democratizing Billion-Scale Model Training. CoRR abs/2101.06840 (2021)
- [i16] Hanlin Tang, Shaoduo Gan, Ammar Ahmad Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He: 1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed. CoRR abs/2102.02888 (2021)
- [i15] Conglong Li, Ammar Ahmad Awan, Hanlin Tang, Samyam Rajbhandari, Yuxiong He: 1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed. CoRR abs/2104.06069 (2021)
- [i14] Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, Yuxiong He: ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning. CoRR abs/2104.07857 (2021)
- [i13] Conglong Li, Minjia Zhang, Yuxiong He: Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training. CoRR abs/2108.06084 (2021)
- [i12] Young Jin Kim, Ammar Ahmad Awan, Alexandre Muzio, Andrés Felipe Cruz-Salinas, Liyang Lu, Amr Hendy, Samyam Rajbhandari, Yuxiong He, Hany Hassan Awadalla: Scalable and Efficient MoE Training for Multitask Multilingual Models. CoRR abs/2109.10465 (2021)
- [i11] Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu: NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM. CoRR abs/2110.15766 (2021)
- 2020
- [j16] Abdulaziz Almaslukh, Amr Magdy, Ahmed M. Aly, Mohamed F. Mokbel, Sameh Elnikety, Yuxiong He, Suman Nath, Walid G. Aref: Local trend discovery on real-time microblogs with uncertain locations in tight memory environments. GeoInformatica 24(2): 301-337 (2020)
- [j15] Yang You, Yuxiong He, Samyam Rajbhandari, Wenhan Wang, Cho-Jui Hsieh, Kurt Keutzer, James Demmel: Fast LSTM by dynamic decomposition on cloud and distributed systems. Knowl. Inf. Syst. 62(11): 4169-4197 (2020)
- [c63] Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He: DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters. KDD 2020: 3505-3506
- [c62] Minjia Zhang, Yuxiong He: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. NeurIPS 2020
- [c61] Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He: ZeRO: memory optimizations toward training trillion parameter models. SC 2020: 20
- [c60] Conglong Li, Minjia Zhang, David G. Andersen, Yuxiong He: Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination. SIGMOD Conference 2020: 2539-2554
- [i10] Hanlin Tang, Shaoduo Gan, Samyam Rajbhandari, Xiangru Lian, Ji Liu, Yuxiong He, Ce Zhang: APMSqueeze: A Communication Efficient Adam-Preconditioned Momentum SGD Algorithm. CoRR abs/2008.11343 (2020)
- [i9] Minjia Zhang, Yuxiong He: Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping. CoRR abs/2010.13369 (2020)
2010 – 2019
- 2019
- [j14] Yu Su, Xiaoqi Ren, Shai Vardi, Adam Wierman, Yuxiong He: Communication-Aware Scheduling of Precedence-Constrained Tasks. SIGMETRICS Perform. Evaluation Rev. 47(2): 21-23 (2019)
- [c59] Minjia Zhang, Yuxiong He: GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine. CIKM 2019: 1673-1682
- [c58] Connor Holmes, Daniel Mawhirter, Yuxiong He, Feng Yan, Bo Wu: GRNN: Low-Latency and Scalable RNN Inference on GPUs. EuroSys 2019: 41:1-41:16
- [c57] Yang You, Yuxiong He, Samyam Rajbhandari, Wenhan Wang, Cho-Jui Hsieh, Kurt Keutzer, James Demmel: Fast LSTM Inference by Dynamic Decomposition on Cloud Systems. ICDM 2019: 748-757
- [c56] Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Elton Zheng, Olatunji Ruwase, Jeff Rasley, Jason Li, Junhua Wang, Yuxiong He: Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft. OpML 2019: 5-7
- [c55] Jonathan Soifer, Jason Li, Mingqin Li, Jeffrey Zhu, Yingnan Li, Yuxiong He, Elton Zheng, Adi Oltean, Maya Mosyak, Chris Barnes, Thomas Liu, Junhua Wang: Deep Learning Inference Service at Microsoft. OpML 2019: 15-17
- [i8] Samyam Rajbhandari, Harsh Shrivastava, Yuxiong He: AntMan: Sparse Low-Rank Compression to Accelerate RNN inference. CoRR abs/1910.01740 (2019)
- [i7] Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He: ZeRO: Memory Optimization Towards Training A Trillion Parameter Models. CoRR abs/1910.02054 (2019)
- [i6] Reza Yazdani, Olatunji Ruwase, Minjia Zhang, Yuxiong He, José-María Arnau, Antonio González: LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory. CoRR abs/1911.01258 (2019)
- 2018
- [j13] Farshid Farhat, Diman Zad Tootaghaj, Yuxiong He, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das: Stochastic Modeling and Optimization of Stragglers. IEEE Trans. Cloud Comput. 6(4): 1164-1177 (2018)
- [j12] Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni: Efficient Deep Neural Network Serving: Fast and Furious. IEEE Trans. Netw. Serv. Manag. 15(1): 112-126 (2018)
- [c54] Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li: Learning Intrinsic Sparse Structures within Long Short-Term Memory. ICLR (Poster) 2018
- [c53] Minjia Zhang, Wenhan Wang, Xiaodong Liu, Jianfeng Gao, Yuxiong He: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models. NeurIPS 2018: 6311-6322
- [c52] Minjia Zhang, Samyam Rajbhandari, Wenhan Wang, Yuxiong He: DeepCPU: Serving RNN-based Deep Learning Models 10x Faster. USENIX ATC 2018: 951-965
- [c51] Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He: Better Caching in Search Advertising Systems with Rapid Refresh Predictions. WWW 2018: 1875-1884
- [i5] Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He: Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models. CoRR abs/1806.04189 (2018)
- [i4] Minjia Zhang, Yuxiong He: Zoom: SSD-based Vector Search for Optimizing Accuracy, Latency and Memory. CoRR abs/1809.04067 (2018)
- 2017
- [j11] Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, Sameh Elnikety: Obtaining and Managing Answer Quality for Online Data-Intensive Services. ACM Trans. Model. Perform. Evaluation Comput. Syst. 2(2): 11:1-11:31 (2017)
- [c50] Samyam Rajbhandari, Yuxiong He, Olatunji Ruwase, Michael Carbin, Trishul M. Chilimbi: Optimizing CNNs on Multicores for Scalability, Performance and Goodput. ASPLOS 2017: 267-280
- [c49] Conglong Li, David G. Andersen, Qiang Fu, Sameh Elnikety, Yuxiong He: Workload analysis and caching strategies for search advertising systems. SoCC 2017: 170-180
- [c48] Xinning Hui, Zhihui Du, Jason Liu, Hongyang Sun, Yuxiong He, David A. Bader: When Good Enough Is Better: Energy-Aware Scheduling for Multicore Servers. IPDPS Workshops 2017: 984-993
- [c47] Md. Enamul Haque, Yuxiong He, Sameh Elnikety, Thu D. Nguyen, Ricardo Bianchini, Kathryn S. McKinley: Exploiting heterogeneity for tail latency and energy efficiency. MICRO 2017: 625-638
- [c46] Jeff Rasley, Yuxiong He, Feng Yan, Olatunji Ruwase, Rodrigo Fonseca: HyperDrive: exploring hyperparameters with POP scheduling. Middleware 2017: 1-13
- [c45] Arpan Gujarati, Sameh Elnikety, Yuxiong He, Kathryn S. McKinley, Björn B. Brandenburg: Swayam: distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency. Middleware 2017: 109-120
- [c44] Bob Goodwin, Michael Hopcroft, Dan Luu, Alex Clemmer, Mihaela Curmei, Sameh Elnikety, Yuxiong He: BitFunnel: Revisiting Signatures for Search. SIGIR 2017: 605-614
- [c43] Tim Kaler, Yuxiong He, Sameh Elnikety: Optimal Reissue Policies for Reducing Tail Latency. SPAA 2017: 195-206
- [i3] Wei Wen, Yuxiong He, Samyam Rajbhandari, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li: Learning Intrinsic Sparse Structures within Long Short-term Memory. CoRR abs/1709.05027 (2017)
- 2016
- [j10] Amr Magdy, Mohamed F. Mokbel, Sameh Elnikety, Suman Nath, Yuxiong He: Venus: Scalable Real-Time Spatial Queries on Microblogs with Adaptive Load Shedding. IEEE Trans. Knowl. Data Eng. 28(2): 356-370 (2016)
- [j9] Seung-won Hwang, Saehoon Kim, Yuxiong He, Sameh Elnikety, Seungjin Choi: Prediction and Predictability for Search Query Acceleration. ACM Trans. Web 10(3): 19:1-19:28 (2016)
- [c42] Myeongjae Jeon, Yuxiong He, Hwanju Kim, Sameh Elnikety, Scott Rixner, Alan L. Cox: TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services. ASPLOS 2016: 129-141
- [c41] Amr Magdy, Ahmed M. Aly, Mohamed F. Mokbel, Sameh Elnikety, Yuxiong He, Suman Nath, Walid G. Aref: GeoTrend: spatial trending queries on real-time microblogs. SIGSPATIAL/GIS 2016: 7:1-7:10
- [c40] Jing Li, Kunal Agrawal, Sameh Elnikety, Yuxiong He, I-Ting Angelina Lee, Chenyang Lu, Kathryn S. McKinley: Work stealing for interactive services to meet target latency. PPoPP 2016: 14:1-14:13
- [c39] Feng Yan, Yuxiong He, Olatunji Ruwase, Evgenia Smirni: SERF: efficient scheduling for fast deep neural network serving via judicious parallelism. SC 2016: 300-311
- 2015
- [j8] Taesung Lee, Jin-Woo Park, Sanghoon Lee, Seung-won Hwang, Sameh Elnikety, Yuxiong He: Processing and Optimizing Main Memory Spatial-Keyword Queries. Proc. VLDB Endow. 9(3): 132-143 (2015)
- [c38] Md. Enamul Haque, Yong Hun Eom, Yuxiong He, Sameh Elnikety, Ricardo Bianchini, Kathryn S. McKinley: Few-to-Many: Incremental Parallelism for Reducing Tail Latency in Interactive Services. ASPLOS 2015: 161-175
- [c37] Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, Sameh Elnikety: Measuring and Managing Answer Quality for Online Data-Intensive Services. ICAC 2015: 167-176
- [c36] Feng Yan, Olatunji Ruwase, Yuxiong He, Trishul M. Chilimbi: Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems. KDD 2015: 1355-1364
- [c35] A. Hasan Mahmud, Yuxiong He, Shaolei Ren: BATS: Budget-Constrained Autoscaling for Cloud Performance Optimization. MASCOTS 2015: 232-241
- [c34] Jeong-Min Yun, Yuxiong He, Sameh Elnikety, Shaolei Ren: Optimal Aggregation Policy for Reducing Tail Latency of Web Search. SIGIR 2015: 63-72
- [c33] Saehoon Kim, Yuxiong He, Seung-won Hwang, Sameh Elnikety, Seungjin Choi: Delayed-Dynamic-Selective (DDS) Prediction for Reducing Extreme Tail Latency in Web Search. WSDM 2015: 7-16
- [r1] Kishwar Ahmed, Shaolei Ren, Yuxiong He, Athanasios V. Vasilakos: Online Resource Management for Carbon-Neutral Cloud Computing. Handbook on Data Centers 2015: 607-630
- [i2] Jaimie Kelley, Christopher Stewart, Nathaniel Morris, Devesh Tiwari, Yuxiong He, Sameh Elnikety: Measuring and Managing Answer Quality for Online Data-Intensive Services. CoRR abs/1506.05172 (2015)
- 2014
- [j7] Sherif Sakr, Sameh Elnikety, Yuxiong He: Hybrid query execution engine for large attributed graphs. Inf. Syst. 41: 45-73 (2014)
- [j6] Hongyang Sun, Yuxiong He, Wen-Jing Hsu, Rui Fan: Energy-efficient multiprocessor scheduling for flow time and makespan. Theor. Comput. Sci. 550: 1-20 (2014)
- [c32] Amr Magdy, Mohamed F. Mokbel, Sameh Elnikety, Suman Nath, Yuxiong He: Mercury: A memory-constrained spatio-temporal real-time search on microblogs. ICDE 2014: 172-183
- [c31] Amr Magdy, Ahmed M. Aly, Mohamed F. Mokbel, Sameh Elnikety, Yuxiong He, Suman Nath: Mars: Real-time spatio-temporal queries on microblogs. ICDE 2014: 1238-1241
- [c30] Myeongjae Jeon, Saehoon Kim, Seung-won Hwang, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner: Predictive parallelization: taming tail latencies in web search. SIGIR 2014: 253-262
- [c29] A. Hasan Mahmud, Yuxiong He, Shaolei Ren: BATS: budget-constrained autoscaling for cloud performance optimization. SIGMETRICS 2014: 563-564
- [c28] Shaolei Ren, Yuxiong He, Kathryn S. McKinley: A Theoretical Foundation for Scheduling and Designing Heterogeneous Processors for Interactive Applications. DISC 2014: 152-166
- 2013
- [j5] Mohamed Sarwat, Sameh Elnikety, Yuxiong He, Mohamed F. Mokbel: Horton+: A Distributed System for Processing Declarative Reachability Queries over Partitioned Graphs. Proc. VLDB Endow. 6(14): 1918-1929 (2013)
- [c27] Jinhan Kim, Sameh Elnikety, Yuxiong He, Seung-won Hwang, Shaolei Ren: QACO: exploiting partial execution in web servers. CAC 2013: 12:1-12:10
- [c26] Zhihui Du, Ramin Yahyapour, Yuxiong He, Nectarios Koziris, Bilha Mendelson, Veronika Sonigo, Achim Streit, Andrei Tchernykh: Topic 3: Scheduling and Load Balancing (Introduction). Euro-Par 2013: 65
- [c25] Myeongjae Jeon, Yuxiong He, Sameh Elnikety, Alan L. Cox, Scott Rixner: Adaptive parallelism for web search. EuroSys 2013: 155-168
- [c24] Shaolei Ren, Yuxiong He, Sameh Elnikety, Kathryn S. McKinley: Exploiting Processor Heterogeneity in Interactive Services. ICAC 2013: 45-58
- [c23] Mingyuan Xia, Nan Zhu, Sameh Elnikety, Xue Liu, Yuxiong He: Performance Inconsistency in Large Scale Data Processing Clusters. ICAC 2013: 297-302
- [c22] Kaiqi Xiong, Yuxiong He: Power-efficient resource allocation in MapReduce clusters. IM 2013: 603-608
- [c21] Zhihui Du, Hongyang Sun, Yuxiong He, Yu He, David A. Bader, Huazhe Zhang: Energy-Efficient Scheduling for Best-Effort Interactive Services to Achieve High Response Quality. IPDPS 2013: 637-648
- [c20] Shaolei Ren, Yuxiong He: COCA: online distributed resource management for cost minimization and carbon neutrality in data centers. SC 2013: 39:1-39:12
- [c19] Juan Mendivelso, Sunghwan Kim, Sameh Elnikety, Yuxiong He, Seung-won Hwang, Yoan J. Pinzón: Solving Graph Isomorphism Using Parameterized Matching. SPIRE 2013: 230-242
- 2012
- [c18] Sherif Sakr, Sameh Elnikety, Yuxiong He: G-SPARQL: a hybrid engine for querying large attributed graphs. CIKM 2012: 335-344
- [c17] Yuxiong He, Sameh Elnikety, James R. Larus, Chenyu Yan: Zeta: scheduling interactive services with partial execution. SoCC 2012: 12
- [c16] Yuxiong He, Zihao Ye, Qiang Fu, Sameh Elnikety: Budget-based control for interactive services with adaptive execution. ICAC 2012: 105-114
- [c15] Shaolei Ren, Yuxiong He, Fei Xu: Provably-Efficient Job Scheduling for Energy and Fairness in Geographically Distributed Data Centers. ICDCS 2012: 22-31
- [c14] Mohamed Sarwat, Sameh Elnikety, Yuxiong He, Gabriel Kliot: Horton: Online Query Execution Engine for Large Distributed Graphs. ICDE 2012: 1289-1292
- 2011
- [c13] Yuxiong He, Sameh Elnikety: Position Paper: Embracing Heterogeneity - Improving Energy Efficiency for Interactive Services on Heterogeneous Data Center Hardware. AI for Data Center Management and Cloud Computing 2011
- [c12] Yuxiong He, Sameh Elnikety: Scheduling for data center interactive services. Allerton 2011: 1170-1181
- [c11] Yuxiong He, Sameh Elnikety, Hongyang Sun: Tians Scheduling: Using Partial Processing in Best-Effort Applications. ICDCS 2011: 434-445
- [c10] Yuxiong He, Jie Liu, Hongyang Sun: Scheduling Functionally Heterogeneous Systems with Utilization Balancing. IPDPS 2011: 1187-1198
- [c9] Hongyang Sun, Yuxiong He, Wen-Jing Hsu: Speed Scaling for Energy and Performance with Instantaneous Parallelism. TAPAS 2011: 240-251
- 2010
- [j4] Yuxiong He, Hongyang Sun, Wen-Jing Hsu: Improved results for scheduling batched parallel jobs by using a generalized analysis framework. J. Parallel Distributed Comput. 70(2): 173-182 (2010)
- [c8] Yuxiong He, Charles E. Leiserson, William M. Leiserson: The Cilkview scalability analyzer. SPAA 2010: 145-156
- [i1] Hongyang Sun, Yuxiong He, Wen-Jing Hsu: Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan. CoRR abs/1010.4110 (2010)
2000 – 2009
- 2008
- [j3] Kunal Agrawal, Charles E. Leiserson, Yuxiong He, Wen-Jing Hsu: Adaptive work-stealing with parallelism feedback. ACM Trans. Comput. Syst. 26(3): 7:1-7:32 (2008)
- [j2] Yuxiong He, Wen-Jing Hsu, Charles E. Leiserson: Provably Efficient Online Nonclairvoyant Adaptive Scheduling. IEEE Trans. Parallel Distributed Syst. 19(9): 1263-1279 (2008)
- 2007
- [c7] Yuxiong He, Hongyang Sun, Wen-Jing Hsu: Adaptive Scheduling of Parallel Jobs on Functionally Heterogeneous Resources. ICPP 2007: 43
- [c6] Kunal Agrawal, Yuxiong He, Wen-Jing Hsu, Charles E. Leiserson: Adaptive Scheduling with Parallelism Feedback. IPDPS 2007: 1-7
- [c5] Yuxiong He, Wen-Jing Hsu, Charles E. Leiserson: Provably Efficient Online Non-clairvoyant Adaptive Scheduling. IPDPS 2007: 1-10
- [c4] Kunal Agrawal, Yuxiong He, Charles E. Leiserson: Adaptive work stealing with parallelism feedback. PPoPP 2007: 112-120
- 2006
- [c3] Kunal Agrawal, Yuxiong He, Charles E. Leiserson: An Empirical Evaluation of Work Stealing with Parallelism Feedback. ICDCS 2006: 19
- [c2] Yuxiong He, Wen-Jing Hsu, Charles E. Leiserson: Provably Efficient Two-Level Adaptive Scheduling. JSSPP 2006: 1-32
- [c1] Kunal Agrawal, Yuxiong He, Wen-Jing Hsu, Charles E. Leiserson: Adaptive scheduling with parallelism feedback. PPoPP 2006: 100-109
- 2004
- [j1] Bu-Sung Lee, Wing-Keong Woo, Chai Kiat Yeo, Teck Meng Lim, Bee-Hwa Lim, Yuxiong He, Jie Song: Secure communications between bandwidth brokers. ACM SIGOPS Oper. Syst. Rev. 38(1): 43-57 (2004)
last updated on 2024-10-14 23:32 CEST by the dblp team
all metadata released as open data under CC0 1.0 license