DOI: 10.1145/3653081.3653102

Retrieval-Augmented Generation with Quantized Large Language Models: A Comparative Analysis

Published: 03 May 2024

Abstract

Large language models have demonstrated emergent intelligence and the ability to handle a wide array of tasks. However, their reliability in terms of factual accuracy and timely knowledge acquisition remains a challenge. Researchers have explored retrieval-augmented generation methods to improve factuality and specificity in knowledge-intensive tasks. This paper discusses their practical application in industrial settings, particularly in assisting design personnel with navigating complex standards and quality manuals. Using an open-source model with 6 billion parameters, the study employs quantization for local deployment, addressing computational constraints. The retrieval-augmented generation framework is analyzed, with emphasis on the integration of document parsing, vector databases, and text embedding models. Experimental results compare models at different quantization levels, revealing trade-offs among response time, model size, and performance metrics. The findings suggest that 4-bit integer quantization is optimal for standards-document retrieval and question answering, and highlight practical considerations for CPU inference. The paper concludes with insights into hyper-parameter tuning, model comparisons, and future optimizations for deploying large language models on edge devices.
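As a rough illustration of the pipeline the abstract describes, the sketch below embeds parsed document chunks, builds a vector index, retrieves the most relevant chunks for a question, and assembles a grounded prompt for a locally deployed, quantized model. The embedding model name, example chunk texts, retrieval depth, and prompt template are illustrative assumptions rather than the paper's exact configuration, and the final generation call is left as a comment because it depends on the chosen quantized-model runtime.

```python
# Minimal retrieval-augmented generation sketch (illustrative; model names,
# chunk texts, and the prompt template are assumptions, not the paper's setup).
import faiss                                    # vector similarity search
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Embed parsed document chunks and build a vector index.
embedder = SentenceTransformer("BAAI/bge-small-zh-v1.5")   # assumed embedding model
chunks = [
    "Clause 4.2: surface roughness of machined parts shall not exceed Ra 1.6.",
    "Clause 5.1: weld seams must be inspected per section 7 of the quality manual.",
    "Clause 6.3: dimensional tolerances follow ISO 2768-m unless stated otherwise.",
]  # placeholder text standing in for parsed standards / quality-manual pages
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])     # inner product == cosine on normalized vectors
index.add(np.asarray(vectors, dtype="float32"))

# 2. Retrieve the top-k chunks most similar to the user's question.
question = "What surface roughness does the standard allow for machined parts?"
q_vec = embedder.encode([question], normalize_embeddings=True)
_, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)
context = "\n".join(chunks[i] for i in ids[0])

# 3. Assemble a grounded prompt for the locally deployed quantized LLM
#    (e.g. a 4-bit model running on CPU); the generation call itself depends
#    on the chosen runtime and is omitted here.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)
```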

    Published In

    IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
November 2023, 902 pages
ISBN: 9798400716485
DOI: 10.1145/3653081

    Publisher

    Association for Computing Machinery

    New York, NY, United States
