Showing 1–9 of 9 results for author: Lentz, M

Searching in archive cs.
  1. arXiv:2407.04656 [pdf, other]

    cs.DC cs.LG

    Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

    Authors: Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

    Abstract: Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as training scales. The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing con…

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.00467 [pdf, other]

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2402.12280 [pdf, other]

    cs.CL cs.AI

    Adaptive Skeleton Graph Decoding

    Authors: Shuowei Jin, Yongji Wu, Haizhong Zheng, Qingzhao Zhang, Matthew Lentz, Z. Morley Mao, Atul Prakash, Feng Qian, Danyang Zhuo

    Abstract: Large language models (LLMs) have seen significant adoption for natural language tasks, owing their success to massive numbers of model parameters (e.g., 70B+); however, LLM inference incurs significant computation and memory costs. Recent approaches propose parallel decoding strategies, such as Skeleton-of-Thought (SoT), to improve performance by breaking prompts down into sub-problems that can b…

    Submitted 19 February, 2024; originally announced February 2024.

  4. arXiv:2401.12230 [pdf, other]

    cs.DC cs.LG

    Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

    Authors: Yao Lu, Song Bian, Lequn Chen, Yongjun He, Yulong Hui, Matthew Lentz, Beibin Li, Fei Liu, Jialin Li, Qi Liu, Rui Liu, Xiaoxuan Liu, Lin Ma, Kexin Rong, Jianguo Wang, Yingjun Wu, Yongji Wu, Huanchen Zhang, Minjia Zhang, Qizhen Zhang, Tianyi Zhou, Danyang Zhuo

    Abstract: In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges like escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computin…

    Submitted 17 January, 2024; originally announced January 2024.

  5. arXiv:2304.07349 [pdf, other]

    cs.NI cs.OS

    Remote Procedure Call as a Managed System Service

    Authors: Jingrong Chen, Yongji Wu, Shihan Lin, Yechen Xu, Xinhao Kong, Thomas Anderson, Matthew Lentz, Xiaowei Yang, Danyang Zhuo

    Abstract: Remote Procedure Call (RPC) is a widely used abstraction for cloud computing. The programmer specifies type information for each remote procedure, and a compiler generates stub code linked into each application to marshal and unmarshal arguments into message buffers. Increasingly, however, application and service operations teams need a high degree of visibility and control over the flow of RPCs b…

    Submitted 14 April, 2023; originally announced April 2023.

    Comments: NSDI 2023

  6. arXiv:2207.00592 [pdf, other]

    cs.DC cs.NI

    Dissecting Service Mesh Overheads

    Authors: Xiangfeng Zhu, Guozhen She, Bowen Xue, Yu Zhang, Yongsu Zhang, Xuan Kelvin Zou, Xiongchun Duan, Peng He, Arvind Krishnamurthy, Matthew Lentz, Danyang Zhuo, Ratul Mahajan

    Abstract: Service meshes play a central role in the modern application ecosystem by providing an easy and flexible way to connect different services that form a distributed application. However, because of the way they interpose on application traffic, they can substantially increase application latency and resource consumption. We develop a decompositional approach and a tool, called MeshInsight, to system…

    Submitted 2 July, 2022; originally announced July 2022.

  7. arXiv:2205.04713 [pdf, other]

    cs.LG cs.DB cs.DC

    Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures

    Authors: Yongji Wu, Matthew Lentz, Danyang Zhuo, Yao Lu

    Abstract: With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge…

    Submitted 3 August, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

  8. arXiv:2011.08069 [pdf, other]

    cs.CR cs.CY cs.SI q-bio.PE

    Reconciling Security and Utility in Next-Generation Epidemic Risk Mitigation Systems

    Authors: Pierfrancesco Ingo, Nichole Boufford, Ming Cheng Jiang, Rowan Lindsay, Matthew Lentz, Gilles Barthe, Manuel Gomez-Rodriguez, Bernhard Schölkopf, Deepak Garg, Peter Druschel, Aastha Mehta

    Abstract: Epidemics like the recent COVID-19 require proactive contact tracing and epidemiological analysis to predict and subsequently contain infection transmissions. The proactive measures require large scale data collection, which simultaneously raise concerns regarding users' privacy. Digital contact tracing systems developed in response to COVID-19 either collected extensive data for effective analyti…

    Submitted 9 May, 2024; v1 submitted 16 November, 2020; originally announced November 2020.

  9. arXiv:2001.08840 [pdf, other]

    cs.CR

    SeCloak: ARM Trustzone-based Mobile Peripheral Control

    Authors: Matthew Lentz, Rijurekha Sen, Peter Druschel, Bobby Bhattacharjee

    Abstract: Reliable on-off control of peripherals on smart devices is a key to security and privacy in many scenarios. Journalists want to reliably turn off radios to protect their sources during investigative reporting. Users wish to ensure cameras and microphones are reliably off during private meetings. In this paper, we present SeCloak, an ARM TrustZone-based solution that ensures reliable on-off control…

    Submitted 23 January, 2020; originally announced January 2020.