Author: Xu, Mengwei

short-paper

WiP: Efficient LLM Prefilling with Mobile NPU

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models, June 2024, Pages 33–35, https://doi.org/10.1145/3662006.3662066

Large language models (LLMs) play a crucial role in various Natural Language Processing (NLP) tasks, prompting their deployment on mobile devices for inference. However, a significant challenge arises due to high waiting latency, especially for long ...

research-article

Large Language Models on Mobile Devices: Measurements, Analysis, and Insights

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models, June 2024, Pages 1–6, https://doi.org/10.1145/3662006.3662059

Deploying large language models (LLMs) inference into mobile devices is cost-efficient for companies, and well addresses the privacy concern of users. However, the limited computation capacity and memory constraints of mobile devices hinder their ...

short-paper

Poster: Efficient and Accurate Mobile Task Automation through Learning from Code

MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, June 2024, Pages 638–639, https://doi.org/10.1145/3643832.3661397

With the emergence and continuous prosperity of large language models (LLMs), artificial intelligence (AI) agents have experienced rapid advancements. Most mobile AI agents merely imitate human operations, executing actions based on the human user ...

research-article

Deciphering the Enigma of Satellite Computing with COTS Devices: Measurement and Analysis

ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, May 2024, Pages 420–435, https://doi.org/10.1145/3636534.3649371

In the wake of the rapid deployment of large-scale low-Earth orbit satellite constellations, exploiting the full computing potential of Commercial Off-The-Shelf (COTS) devices in these environments has become a pressing issue. However, understanding this ...

research-article

Mobile Foundation Model as Firmware

ACM MobiCom '24: Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, May 2024, Pages 279–295, https://doi.org/10.1145/3636534.3649361

In the current AI era, mobile devices such as smartphones are tasked with executing a myriad of deep neural networks (DNNs) locally. It presents a complex landscape, as these models are highly fragmented in terms of architecture, operators, and ...

research-article

Towards Energy-efficient Federated Learning via INT8-based Training on Mobile DSPs

WWW '24: Proceedings of the ACM Web Conference 2024, May 2024, Pages 2786–2794, https://doi.org/10.1145/3589334.3645341

AI is making the Web an even cooler place, but also introduces serious privacy risks due to the extensive user data collection. Federated learning (FL), as a privacy-preserving machine learning paradigm, enables mobile devices to collaboratively learn a ...

research-article

Safeguard Privacy for Minimal Data Collection with Trustworthy Autonomous Agents

AAMAS '24: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, May 2024, Pages 1966–1974

Ensuring digital privacy necessitates users giving well-considered consent to online service providers for data usage, creating an unsustainable and error-prone decision load. Software privacy agents can help make data consent decisions on behalf of ...

research-article

FedRDMA: Communication-Efficient Cross-Silo Federated LLM via Chunked RDMA Transmission

EuroMLSys '24: Proceedings of the 4th Workshop on Machine Learning and Systems, April 2024, Pages 126–133, https://doi.org/10.1145/3642970.3655834

Communication overhead is a significant bottleneck in federated learning (FL), which has been exaggerated with the increasing size of AI models. In this paper, we propose FedRDMA, a communication-efficient cross-silo FL system that integrates RDMA into ...

research-article

SoCFlow: Efficient and Scalable DNN Training on SoC-Clustered Edge Servers

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, April 2024, Pages 368–385, https://doi.org/10.1145/3617232.3624847

SoC-Cluster, a novel server architecture composed of massive mobile system-on-chips (SoCs), is gaining popularity in industrial edge computing due to its energy efficiency and compatibility with existing mobile applications. However, we observe that the ...

research-article

Demystifying the QoS and QoE of Edge-hosted Video Streaming Applications in the Wild with SNESet

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 236, Pages 1–29, https://doi.org/10.1145/3626723

Video streaming applications (VSAs) are increasingly being deployed on large-scale edge platforms, which have the potential to significantly improve the quality of service (QoS) and end-user experience (QoE), ultimately maximizing business outcomes. ...

Article

Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors

Service-Oriented Computing, Nov 2023, Pages 67–85, https://doi.org/10.1007/978-3-031-48421-6_6

Intelligent applications heavily rely on deep neural network (DNN) inference services executed on edge devices to fulfill functional prerequisites while safeguarding user data privacy. However, the execution of such DNN services on resource-...

Article

CAN-verify: A Verification Tool For BDI Agents

iFM 2023, Nov 2023, Pages 364–373, https://doi.org/10.1007/978-3-031-47705-8_19

CAN-verify is an automated tool that aids the development, verification, and analysis of BDI agents written in the Conceptual Agent Notation (Can) language. It does not require users to be familiar with verification techniques. CAN-verify supports ...

research-article

Seamless Cross-Edge Service Migration for Real-Time Rendering Applications

IEEE Transactions on Mobile Computing (ITMV), Volume 23, Issue 6, June 2024, Pages 7084–7098, https://doi.org/10.1109/TMC.2023.3331773

Seamless cross-edge migration for real-time rendering applications is challenging. The strong interactive nature of real-time rendering applications demands a downtime lower than 15 ms ...

research-article

Federated Few-Shot Learning for Mobile NLP

ACM MobiCom '23: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, October 2023, Article No.: 63, Pages 1–17, https://doi.org/10.1145/3570361.3613277

Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least ...

research-article

Efficient Federated Learning for Modern NLP

ACM MobiCom '23: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, October 2023, Article No.: 37, Pages 1–16, https://doi.org/10.1145/3570361.3592505

Transformer-based pre-trained models have revolutionized NLP for superior performance and generality. Fine-tuning pre-trained models for downstream tasks often requires private data, for which federated learning is the de-facto approach (i.e., FedNLP)...

research-article

Quantitative modelling and analysis of BDI agents

Software and Systems Modeling (SoSyM) (SPSSM), Volume 23, Issue 2, Apr 2024, Pages 343–367, https://doi.org/10.1007/s10270-023-01121-5

Belief–desire–intention (BDI) agents are a popular agent architecture. We extend conceptual agent notation (Can)—a BDI programming language with advanced features such as failure recovery and declarative goals—to include probabilistic action ...

research-article

Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds

ICPP '23: Proceedings of the 52nd International Conference on Parallel Processing, August 2023, Pages 595–604, https://doi.org/10.1145/3605573.3605589

Co-locating Latency-Critical (LC) and Best-Effort (BE) services in edge-clouds is expected to enhance resource utilization. However, this mixed deployment encounters unique challenges. Edge-clouds are heterogeneous, distributed, and resource-constrained, ...

research-article

A large-scale holistic measurement of crowdsourced edge cloud platform

World Wide Web (WWWJ), Volume 26, Issue 5, Sep 2023, Pages 3561–3584, https://doi.org/10.1007/s11280-023-01201-y

Edge clouds have become a de-facto paradigm to deliver low and stable networks to delay-critical applications such as Web services and AR/VR. A unique form of edge clouds is those crowdsourced from third parties, e.g., idle PCs or workstations. ...

research-article

A Comprehensive Deep Learning Library Benchmark and Optimal Library Selection

IEEE Transactions on Mobile Computing (ITMV), Volume 23, Issue 5, May 2024, Pages 5069–5082, https://doi.org/10.1109/TMC.2023.3301973

Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast inference of on-device DL, DL libraries play a critical role as algorithms and hardware do. Unfortunately, no prior work ever dives deep into the ...

poster

FedAdapter: Efficient Federated Learning for Mobile NLP

ACM TURC '23: Proceedings of the ACM Turing Award Celebration Conference - China 2023, July 2023, Pages 27–28, https://doi.org/10.1145/3603165.3607380

Fine-tuning pre-trained models for downstream tasks often requires private data, for which federated learning is the de-facto approach (i.e., FedNLP). However, FedNLP is prohibitively slow due to the large model sizes and the resultant high network/...
