DOI: 10.1145/3662006.3662059

Large Language Models on Mobile Devices: Measurements, Analysis, and Insights

Published: 11 June 2024
Abstract

Deploying large language model (LLM) inference on mobile devices is cost-efficient for companies and addresses users' privacy concerns. However, the limited computation capacity and memory of mobile devices hinder practical deployment. Prior work strives to scale up model size for better accuracy, while there is a lack of systematic understanding of "small" sub-10-billion-parameter LLMs that are already feasible on current commodity devices. To better reveal the current landscape of LLMs on mobile devices, we conducted a comprehensive measurement study, deploying 22 models across 4 mobile devices. Our measurements focus on accuracy, inference latency, and memory footprint across various input lengths, devices, and execution engines. The observations from these measurements point toward promising directions for efficient LLM deployment on mobile devices.
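To make the measured dimensions concrete, the sketch below shows how one might record time-to-first-token (prefill latency), decode throughput, and peak memory for a single prompt. It is only an illustration of the kind of harness such a study implies: it assumes the llama-cpp-python bindings on a Linux host, and the model path, context length, and prompt are hypothetical placeholders rather than the paper's actual setup (which runs 22 models on 4 mobile devices across several execution engines).

```python
# Sketch of a per-request measurement harness (assumed API: llama-cpp-python;
# model path and prompt are illustrative placeholders, not from the paper).
import time
import resource
from llama_cpp import Llama

MODEL_PATH = "models/llama-2-7b.Q4_0.gguf"  # hypothetical 4-bit quantized model
PROMPT = "Explain why on-device LLM inference helps preserve user privacy."

llm = Llama(model_path=MODEL_PATH, n_ctx=512, verbose=False)

start = time.perf_counter()
first_token_at = None
n_tokens = 0
# Streaming completion: the first chunk arrives only after the prompt has been
# processed (prefill); the remaining chunks reflect decode speed.
for _ in llm(PROMPT, max_tokens=64, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter() - start
    n_tokens += 1
total = time.perf_counter() - start

prefill = first_token_at if first_token_at is not None else total
decode = max(total - prefill, 1e-9)
peak_rss_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024  # KB -> MB on Linux

print(f"time to first token: {prefill:.2f} s")
print(f"decode throughput:   {max(n_tokens - 1, 0) / decode:.1f} tokens/s")
print(f"peak RSS:            {peak_rss_mb:.0f} MB")
```

The same three quantities (prefill latency, decode throughput, peak resident memory) would be swept over input lengths, devices, and execution engines to reproduce the measurement grid the abstract describes.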



      Published In

      EdgeFM '24: Proceedings of the Workshop on Edge and Mobile Foundation Models
      June 2024
      44 pages
      ISBN:9798400706639
      DOI:10.1145/3662006


      Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. Large Language Model
      2. Measurement Study
      3. Mobile Devices

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      MOBISYS '24
