DOI: 10.1145/3662006.3662059
Research Article

Large Language Models on Mobile Devices: Measurements, Analysis, and Insights

Published: 11 June 2024

Abstract

Deploying large language model (LLM) inference on mobile devices is cost-efficient for companies and addresses users' privacy concerns. However, the limited computation capacity and memory of mobile devices hinder practical deployment. Prior work strives to scale up model size for better accuracy, while there is a lack of systematic understanding of "small" sub-10-billion-parameter LLMs that are already feasible on current commodity devices. To reveal the current landscape of LLMs on mobile devices, we conducted a comprehensive measurement study, deploying 22 models across 4 mobile devices. Our measurements cover accuracy, inference latency, and memory footprint across various input lengths, devices, and execution engines. The observations from these measurements point toward promising directions for efficient LLM deployment on mobile devices.
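To make the measurement dimensions concrete, below is a minimal, hypothetical Python sketch of such a harness: it times a single inference call and records the process's peak resident memory while sweeping input length. StubEngine and the model file name are placeholders for illustration only (a real harness would bind to an execution engine such as llama.cpp or MLC-LLM and evaluate accuracy separately on standard benchmarks); this is not the authors' actual setup.

# Hypothetical measurement sketch; StubEngine stands in for a real
# on-device execution engine and does no actual inference.
import resource
import time
from dataclasses import dataclass

@dataclass
class StubEngine:
    """Placeholder engine; a real one would run prefill then token-by-token decode."""
    model_path: str

    def generate(self, prompt: str, max_new_tokens: int = 64) -> str:
        return "stub output"  # stands in for real decoded text

def measure(engine: StubEngine, prompt: str, max_new_tokens: int = 64) -> dict:
    """Time one inference call and report peak resident set size (Unix only)."""
    start = time.perf_counter()
    engine.generate(prompt, max_new_tokens=max_new_tokens)
    latency_s = time.perf_counter() - start
    peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KiB on Linux
    return {
        "input_words": len(prompt.split()),
        "latency_s": round(latency_s, 4),
        "peak_rss_mb": round(peak_rss_kb / 1024, 1),
    }

if __name__ == "__main__":
    engine = StubEngine(model_path="model-7b-q4.gguf")  # hypothetical quantized model file
    for n_words in (64, 256, 1024):  # sweep over input lengths
        prompt = "word " * n_words
        print(n_words, measure(engine, prompt))

On a real device, the same sweep would typically invoke the engine's native runtime and read memory from the OS (for example, /proc on Android) rather than from the Python process, but the structure of the measurement is the same.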


    Published In

    EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models
    June 2024, 44 pages
    ISBN: 9798400706639
    DOI: 10.1145/3662006

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 June 2024


    Author Tags

    1. Large Language Model
    2. Measurement Study
    3. Mobile Devices

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    MOBISYS '24
