DOI: 10.1145/3662006.3662059
Research Article

Large Language Models on Mobile Devices: Measurements, Analysis, and Insights

Published: 11 June 2024

Abstract

Deploying large language model (LLM) inference on mobile devices is cost-efficient for companies and addresses users' privacy concerns. However, the limited computation capacity and memory of mobile devices hinder practical deployment. Prior work strives to scale up model size for better accuracy, while there is a lack of systematic understanding of "small", sub-10-billion-parameter LLMs that are already feasible on current commodity devices. To better reveal the current landscape of LLMs on mobile devices, we conducted a comprehensive measurement study, deploying 22 models across 4 mobile devices. Our measurements cover accuracy, inference latency, and memory footprint across various input lengths, devices, and execution engines. The observations from these measurements point toward promising directions for efficient LLM deployment on mobile devices.
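The measurement methodology the abstract describes (sweeping input lengths and recording end-to-end inference latency and memory footprint per device and execution engine) can be approximated with a small driver script. The following is a minimal sketch, assuming a hypothetical command-line engine binary; `./llama-engine`, its flags, the model path, and the prompt construction are placeholders for illustration, not the paper's actual harness or any specific engine's real CLI.

```python
import csv
import resource
import shlex
import subprocess
import time

# Hypothetical engine invocation -- substitute the real CLI of whichever
# on-device engine is under test (e.g., llama.cpp, MLC-LLM, or mllm).
ENGINE_CMD = "./llama-engine -m {model} -p {prompt} -n {gen_tokens}"

PROMPT_LENGTHS = [64, 128, 256, 512, 1024]  # input lengths to sweep
GEN_TOKENS = 64                             # fixed decode length per run

def make_prompt(n_tokens: int) -> str:
    # Crude stand-in: repeat a short word so the prompt is roughly n_tokens long.
    return " ".join(["hello"] * n_tokens)

def run_once(model_path: str, n_prompt: int) -> dict:
    cmd = ENGINE_CMD.format(model=shlex.quote(model_path),
                            prompt=shlex.quote(make_prompt(n_prompt)),
                            gen_tokens=GEN_TOKENS)
    start = time.perf_counter()
    # Wall-clock time of the whole invocation, including model load.
    subprocess.run(shlex.split(cmd), check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    latency_s = time.perf_counter() - start
    # Peak resident set size (KiB on Linux/Android) of the largest terminated
    # child so far; run one engine invocation per process for an exact per-run figure.
    peak_rss_kib = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return {"prompt_tokens": n_prompt,
            "gen_tokens": GEN_TOKENS,
            "latency_s": round(latency_s, 3),
            "peak_rss_kib": peak_rss_kib}

if __name__ == "__main__":
    rows = [run_once("models/llama-2-7b-q4.gguf", n) for n in PROMPT_LENGTHS]
    with open("bench.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```

A real harness would separate model-load, prefill, and per-token decode timings and read the engine's own counters where available; this sketch only illustrates the shape of the sweep across input lengths.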


Published In

EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models
June 2024, 44 pages
ISBN: 9798400706639
DOI: 10.1145/3662006

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Large Language Model
2. Measurement Study
3. Mobile Devices

Conference

MOBISYS '24
