DOI: 10.1145/3653081.3653102

Retrieval-Augmented Generation with Quantized Large Language Models: A Comparative Analysis

Published: 03 May 2024

Abstract

Large language models have demonstrated emergent intelligence and the ability to handle a wide array of tasks. However, their reliability in terms of factual accuracy and timely knowledge acquisition remains a challenge. Researchers have explored retrieval-augmented generation methods to improve factuality and specificity in knowledge-intensive tasks. This paper discusses their practical application in industrial settings, particularly in assisting design personnel with navigating complex standards and quality manuals. Using an open-source model with 6 billion parameters, the study employs quantization for local deployment, addressing computational constraints. The retrieval-augmented generation framework is analyzed, with emphasis on the integration of document parsing, vector databases, and text embedding models. Experimental results compare models at different quantization levels, revealing trade-offs among response time, model size, and performance metrics. The findings suggest that 4-bit integer quantization is optimal for standards-document retrieval and question answering, and highlight practical considerations for CPU inference. The paper concludes with insights into hyper-parameter tuning, model comparisons, and future optimizations for deploying large language models on edge devices.
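As a rough illustration of the pipeline the abstract describes, the sketch below embeds parsed document chunks, builds a vector index, retrieves the most relevant chunks for a question, and assembles a grounded prompt for a locally deployed, quantized model. The embedding model name, example chunk texts, retrieval depth, and prompt template are illustrative assumptions rather than the paper's exact configuration, and the final generation call is left as a comment because it depends on the chosen quantized-model runtime.

```python
# Minimal retrieval-augmented generation sketch (illustrative; model names,
# chunk texts, and the prompt template are assumptions, not the paper's setup).
import faiss                                    # vector similarity search
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Embed parsed document chunks and build a vector index.
embedder = SentenceTransformer("BAAI/bge-small-zh-v1.5")   # assumed embedding model
chunks = [
    "Clause 4.2: surface roughness of machined parts shall not exceed Ra 1.6.",
    "Clause 5.1: weld seams must be inspected per section 7 of the quality manual.",
    "Clause 6.3: dimensional tolerances follow ISO 2768-m unless stated otherwise.",
]  # placeholder text standing in for parsed standards / quality-manual pages
vectors = embedder.encode(chunks, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])     # inner product == cosine on normalized vectors
index.add(np.asarray(vectors, dtype="float32"))

# 2. Retrieve the top-k chunks most similar to the user's question.
question = "What surface roughness does the standard allow for machined parts?"
q_vec = embedder.encode([question], normalize_embeddings=True)
_, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)
context = "\n".join(chunks[i] for i in ids[0])

# 3. Assemble a grounded prompt for the locally deployed quantized LLM
#    (e.g. a 4-bit model running on CPU); the generation call itself depends
#    on the chosen runtime and is omitted here.
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)
```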

    Published In

    IoTAAI '23: Proceedings of the 2023 5th International Conference on Internet of Things, Automation and Artificial Intelligence
November 2023, 902 pages
ISBN: 9798400716485
DOI: 10.1145/3653081

    Publisher

    Association for Computing Machinery

    New York, NY, United States
