Yang Zhou

yangzhou.rpc@gmail.com


GitHub | CV | Bio | Scholar

About

I am an Assistant Professor at UC Davis CS. My office is in Kemper 2127. I am equally interested in core systems and ML systems research, e.g., efficient LLMs, GPU communication, and heterogeneous computing. I am currently working on UCCL for GPU communication, in close collaboration between Davis and Berkeley.

I was a PostDoc at UC Berkeley during 2024-2025, advised by Ion Stoica on GPU communication. I finished my PhD in Computer Science at Harvard University in 2024, advised by Minlan Yu and James Mickens on network-application co-design. I received my B.S. in Computer Science at Peking University in 2018, advised by Tong Yang on probabilistic data structures and streaming algorithms. I was supported by a Google PhD Fellowship in Systems and Networking (see my app materials).

Updates

  • Feb 2026, check out the llm-d blog post on how they use UCCL to achieve resilient networking for KV cache transfer!
  • Jan 2026, LEANN and FCP to appear at MLSys’26!
  • Dec 2025, UCCL-EP for portable expert-parallel communication (AMD, Broadcom, AWS EFA)!
  • Dec 2025, UCCL talk at UIUC
  • Nov 2025, BlendServe to appear at ASPLOS’26!

Students

Current

  • Shuang Ma (2025-now, PhD), from USTC
  • Yihan Zhang (2025-now, PhD), from UIUC

Mentored

  • Zhongjie Chen (2024-now), Tsinghua University PhD
  • Xuanlin Jiang (2024), Peking University undergrad → Harvard PhD
  • Matt Kiley (2023), Harvard College undergrad → Clockwork Systems
  • Yunxi Shen (2023-2024), Tsinghua University undergrad → Cornell PhD
  • Xingyu Xiang (2023-now), Peking University undergrad → Harvard PhD
  • Zezhou Wang (2022), Peking University undergrad → University of Washington PhD

Publications

2026

  • Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP
    Yilong Zhao, Xiaonan Nie, Kan Zhu, Shuang Ma, Zhichao Lai, Hongxiang Hao, Yang Zhou, Baris Kasikci, Ion Stoica
    MLSys 2026. The Conference on Machine Learning and Systems

  • LEANN: a Low-Storage Vector Index
    Yichuan Wang, Shu Liu, Zhifei Li, Yongji Wu, Ziming Mao, Yilong Zhao, Xiao Yan, Zhiying Xu, Yang Zhou, Ion Stoica, Sewon Min, Matei Zaharia, Joseph E. Gonzalez
    MLSys 2026. The Conference on Machine Learning and Systems
    ICML 2025 Workshop on Vector Databases
    [Arxiv June 2025] [code]

  • BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
    Yilong Zhao*, Shuo Yang*, Kan Zhu, Lianmin Zheng, Baris Kasikci, Yifan Qiao, Yang Zhou, Jiarong Xing, Ion Stoica
    ASPLOS 2026
    [Arxiv Nov 2024]

2025

  • UCCL-EP: Portable Expert-Parallel Communication
    Ziming Mao, Yihan Zhang, Chihan Cui, Zhen Huang, Kaichao You, Zhongjie Chen, Zhiying Xu, Zhenyu Gu, Scott Shenker, Costin Raiciu, Yang Zhou, Ion Stoica
    [Arxiv Dec 2025]

  • UCCL: an Efficient Collective Communication Library for GPUs
    Yang Zhou*, Zhongjie Chen*, Ziming Mao, ChonLam Lao, Shuo Yang, Pravein Govindan Kannan, Jiaqi Gao, Yilong Zhao, Yongji Wu, Kaichao You, Fengyuan Ren, Zhiying Xu, Costin Raiciu, Ion Stoica
    [Arxiv April 2025] [homepage] [slides] [code]
    Featured in IBM/Red Hat/Google llm-d, Nvidia NIXL

  • ShadowServe: Interference-Free KV Cache Fetching for Distributed Prefix Caching
    Xingyu Xiang, Raj Joshi, Yuhan Liu, Jiayi Yao, Chenxingyu Zhao, Junchen Jiang, Yang Zhou, Eddie Kohler, Minlan Yu
    [Arxiv Sep 2025]

  • Towards Efficient and Practical GPU Multitasking in the Era of LLM
    Jiarong Xing, Yifan Qiao, Simon Mo, Xingqi Cui, Gur-Eyal Sela, Yang Zhou, Joseph Gonzalez, Ion Stoica
    [Arxiv Aug 2025] [code]

  • Locality-Aware Fair Scheduling in LLM Serving
    Shiyi Cao*, Yichuan Wang*, Ziming Mao, Pin-Lun Hsu, Liangsheng Yin, Tian Xia, Dacheng Li, Shu Liu, Yineng Zhang, Yang Zhou, Ying Sheng, Joseph Gonzalez, Ion Stoica
    [Arxiv Jan 2025]

  • Toward Interference-Aware Scheduling for Serverless Functions via eBPF and Meta-Learning
    Yifan Zhang, Jianchang Su, Zixu Shen, Yang Zhou, Wei Zhang
    PACMI 2025. Workshop on Practical Adoption Challenges of ML for Systems
    [paper]

  • Rethinking RPC Communication for Microservices-based Applications
    Xiangfeng Zhu, Yang Zhou, Yuyao Wang, Xiangyu Gao, Arvind Krishnamurthy, Sam Kumar, Ratul Mahajan, Danyang Zhuo
    HotOS 2025. The ACM SIGOPS 20th Workshop on Hot Topics in Operating Systems
    [paper]

  • NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference
    Xuanlin Jiang, Yang Zhou, Shiyi Cao, Ion Stoica, Minlan Yu
    MLSys 2025. The Conference on Machine Learning and Systems
    [paper] [slides] [code]

  • eTran: Extensible Kernel Transport with eBPF
    Zhongjie Chen, Qingkai Meng, ChonLam Lao, Yifan Liu, Fengyuan Ren, Minlan Yu, Yang Zhou
    NSDI 2025. USENIX Symposium on Networked Systems Design and Implementation
    [paper] [code]

2024

  • ConServe: Harvesting GPUs for Low-Latency and High-Throughput Large Language Model Serving
    Yifan Qiao, Shu Anzai, Shan Yu, Haoran Ma, Shuo Yang, Yang Wang, Miryung Kim, Yongji Wu, Yang Zhou, Jiarong Xing, Joseph Gonzalez, Ion Stoica, Harry Xu
    [Arxiv Oct 2024]

  • Post-Training Sparse Attention with Double Sparsity
    Shuo Yang, Ying Sheng, Yilong Zhao, Joseph Gonzalez, Yang Zhou, Ion Stoica, Lianmin Zheng
    [Arxiv Aug 2024]

PhD and prior work

  • SmartNIC Security Isolation in the Cloud with S-NIC
    Yang Zhou, Mark Wilkening, James Mickens, Minlan Yu
    EuroSys 2024. European Conference on Computer Systems
    [paper] [slides] [code]

  • DINT: Fast In-Kernel Distributed Transactions with eBPF
    Yang Zhou*, Xingyu Xiang*, Matthew Kiley, Sowmya Dharanipragada, Minlan Yu
    NSDI 2024. USENIX Symposium on Networked Systems Design and Implementation
    [paper] [slides] [talk] [code]

  • Electrode: Accelerating Distributed Protocols with eBPF
    Yang Zhou*, Zezhou Wang*, Sowmya Dharanipragada, Minlan Yu
    NSDI 2023. USENIX Symposium on Networked Systems Design and Implementation
    [paper] [slides] [talk] [code]

  • Carbink: Fault-Tolerant Far Memory
    Yang Zhou, Hassan Wassel, Sihang Liu, Jiaqi Gao, James Mickens, Minlan Yu, Chris Kennelly, Paul Turner, David Culler, Hank Levy, Amin Vahdat
    OSDI 2022. USENIX Symposium on Operating Systems Design and Implementation
    [paper] [slides] [talk]

  • Evolvable Network Telemetry at Facebook
    Yang Zhou, Ying Zhang, Minlan Yu, Guangyu Wang, Dexter Cao, Eric Sung, Starsky Wong
    NSDI 2022. USENIX Symposium on Networked Systems Design and Implementation
    [paper] [slides] [talk]

  • On the Evolutionary of Bloom Filter False Positives - An Information Theoretical Approach to Optimizing Bloom Filter Parameters
    Zhuochen Fan, Gang Wen, Zhipeng Huang, Yang Zhou, Qiaobin Fu, Tong Yang, Alex X. Liu, Bin Cui
    IEEE Transactions on Knowledge and Data Engineering (TKDE) 2022
    [paper] [Code]

  • Pyramid Family: Generic Frameworks for Accurate and Fast Flow Size Measurement
    Yuanpeng Li, Xiang Yu, Yilong Yang, Yang Zhou, Tong Yang, Zhuo Ma, Shigang Chen
    IEEE/ACM Transactions on Networking (TON) 2021
    [paper] [Code]

  • Fast and Accurate Stream Processing by Filtering the Cold.
    Tong Yang, Jie Jiang, Yang Zhou, Long He, Jinyang Li, Bin Cui, Steve Uhlig, Xiaoming Li
    VLDB Journal 2019
    [paper] [Code]

  • Adaptive Measurements using One Elastic Sketch.
    Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, Steve Uhlig
    IEEE/ACM Transactions on Networking (TON) 2019
    [paper] [Code]

  • Cold Filter: A Meta-Framework for Faster and More Accurate Stream Processing.
    Yang Zhou, Tong Yang, Jie Jiang, Bin Cui, Minlan Yu, Xiaoming Li, Steve Uhlig
    SIGMOD 2018. ACM SIGMOD International Conference on Management of Data
    [paper] [slides] [Code]

  • Elastic Sketch: Adaptive and Fast Network-wide Measurements.
    Tong Yang, Jie Jiang, Peng Liu, Qun Huang, Junzhi Gong, Yang Zhou, Rui Miao, Xiaoming Li, Steve Uhlig
    SIGCOMM 2018. ACM SIGCOMM International Conference on Data Communications
    [paper] [slides] [talk] [Code]

  • Accelerating Network Measurement in Software.
    Yang Zhou, Omid Alipourfard, Minlan Yu, Tong Yang
    SIGCOMM CCR 2018 July issue, ACM SIGCOMM Computer Communication Review
    [paper] [Code]

  • A Comparison of Performance and Accuracy of Measurement Algorithms in Software.
    Omid Alipourfard, Masoud Moshref, Yang Zhou, Tong Yang, Minlan Yu
    SOSR 2018. ACM Symposium on SDN Research
    [paper]

  • Accurate Per-Flow Measurement with Bloom Sketch.
    Yang Zhou, Hao Jin, Peng Liu, Haowei Zhang, Tong Yang, Xiaoming Li
    INFOCOM 2018 Workshops. IEEE International Conference on Computer Communications
    [paper] [Code]

  • Single Hash: Use One Hash Function to Build Faster Hash Based Data Structures.
    Xiangyang Gou, Chenxingyu Zhao, Tong Yang, Lei Zou, Yang Zhou, Yibo Yan, Xiaoming Li, Bin Cui
    BigComp 2018. IEEE International Conference on Big Data and Smart Computing
    [paper]

  • Pyramid Sketch: a Sketch Framework for Frequency Estimation of Data Streams.
    Tong Yang, Yang Zhou, Hao Jin, Shigang Chen, Xiaoming Li
    VLDB 2017. International Conference on Very Large Data Bases
    [paper] [Code]

  • One Memory Access Sketch: a More Accurate and Faster Sketch for Per-flow Measurement.
    Yang Zhou, Peng Liu, Hao Jin, Tong Yang, Shoujiang Dang, Xiaoming Li
    Globecom 2017. IEEE Global Communications Conference
    [paper] [Code]

  • ABC: a Practicable Sketch Framework for Non-uniform Multisets.
    Junzhi Gong, Tong Yang, Yang Zhou, Dongsheng Yang, Shigang Chen, Bin Cui, Xiaoming Li
    BigData 2017. IEEE International Conference on Big Data
    [paper]

*: co-primary authors

Teaching

  • ECS 289D Seminar in Operating Systems: Datacenter Systems for LLMs: Fall 2025

Service

  • Organizer:
    • Co-Chair for SIGCOMM Artifact Evaluation 2024
  • Program Committee:
    • EuroSys 2026
    • ASPLOS 2026
    • NSDI 2026
    • OSDI 2025
    • SIGCOMM Poster/Demo 2023, 2024
    • SIGCOMM Workshop on eBPF and Kernel Extensions 2024, 2025

Miscellaneous

  • I ski and travel; see my photos from:
    • Japan
    • China
    • White Mountains
    • Ecuador
    • Zion
    • Acadia
    • Salem
    • Puerto Rico
    • Mount Rainier
    • Olympic National Park
    • Great Smoky Mountains
    • Bryce Canyon
    • Capitol Reef
    • Utah 12
    • Death Valley
    • Crater Lake
    • Rocky Mountain
    • Mount Evans
    • Mount Baker
    • Lake Tahoe
    • Yosemite
    • Point Reyes
    • Grand Teton
    • Yellowstone
    • Everglades
    • Key West
    • Lassen Volcanic
    • Kenai Fjords
    • Denali
    • Monument Valley
    • Arches
    • Canyonlands
    • Alta
    • Snowbird
  • I did long-distance biking
    • I cycled 700 miles around Taiwan in ten days, see my photos.
  • I did long-distance running
    • I finished a half-marathon in 1h50min.
  • I was pretty fond of physics during high school and college.

Last updated Feb 11, 2026