demonstration

Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface

Authors:

Rahul SundarAuthors Info & Claims

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems

Article No.: 58, Pages 1 - 5

https://doi.org/10.1145/3639856.3639916

Published: 17 May 2024 Publication History

Abstract

Retrieving answers in a quick and low cost manner without hallucinations from a combination of structured and unstructured data using Language models is a major hurdle. This is what prevents employment of Language models in knowledge retrieval automation. This becomes accentuated when one wants to integrate a speech interface on top of a text based knowledge retrieval system. Besides, for commercial search and chat-bot applications, complete reliance on commercial large language models (LLMs) like GPT 3.5 etc. can be very costly. In the present study, the authors have addressed the aforementioned problem by first developing a keyword based search framework which augments discovery of the context from the document to be provided to the LLM. The keywords in turn are generated by a relatively smaller LLM and cached for comparison with keywords generated by the same smaller LLM against the query raised. This significantly reduces time and cost to find the context within documents. Once the context is set, a larger LLM uses that to provide answers based on a prompt tailored for Q&A. This research work demonstrates that use of keywords in context identification reduces the overall inference time and cost of information retrieval. Given this reduction in inference time and cost with the keyword augmented retrieval framework, a speech based interface for user input and response readout was integrated. This allowed a seamless interaction with the language model.

References

[1]

[n. d.]. OpenAI Platform — platform.openai.com. https://platform.openai.com/docs/libraries. [Accessed 26-09-2023].

[2]

[n. d.]. SpeechRecognition — pypi.org. https://pypi.org/project/SpeechRecognition/. [Accessed 26-09-2023].

[3]

1975. A vector space model for automatic indexing. Commun. ACM 18, 11 (1975), 613–620.

[4]

Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems 30, 1-7 (1998), 107–117.

[5]

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901.

[6]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arxiv:1810.04805 [cs.CL]

[7]

Norbert Fuhr. 1992. Probabilistic models in information retrieval. The computer journal 35, 3 (1992), 243–255.

[8]

Maarten Grootendorst. 2020. KeyBERT: Minimal keyword extraction with BERT.https://doi.org/10.5281/zenodo.4461265

[9]

Jiafeng Guo, Yixing Fan, Qingyao Ai, and W. Bruce Croft. 2016. A Deep Relevance Matching Model for Ad-hoc Retrieval. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. ACM. https://doi.org/10.1145/2983323.2983769

Digital Library

[10]

Yucheng Li. 2023. Unlocking Context Constraints of LLMs: Enhancing Context Efficiency of LLMs with Self-Information-Based Content Filtering. arxiv:2304.12102 [cs.CL]

[11]

Potsawee Manakul, Adian Liusie, and Mark JF Gales. 2023. Selfcheckgpt: Zero-resource black-box hallucination detection for generative large language models. arXiv preprint arXiv:2303.08896 (2023).

[12]

Stephen Robertson, Hugo Zaragoza, 2009. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 3, 4 (2009), 333–389.

Digital Library

[13]

Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information processing & management 24, 5 (1988), 513–523.

[14]

Oguzhan Topsakal and Tahir Cetin Akinci. 2023. Creating Large Language Model Applications Utilizing LangChain: A Primer on Developing LLM Apps Fast. In International Conference on Applied Engineering and Natural Sciences, Vol. 1. 1050–1056.

[15]

Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. [n. d.]. Finetuned Language Models are Zero-Shot Learners. In International Conference on Learning Representations.

[16]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. A Survey of Large Language Models. arxiv:2303.18223 [cs.CL]

Index Terms

Keyword Augmented Retrieval: Novel framework for Information Retrieval integrated with speech interface
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Recommendations

Information Retrieval eXperience (IRX): Towards a Human-Centered Personalized Model of Relevance
WI-IAT '10: Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology - Volume 03

We approach Information Retrieval (IR) from a User eXperience (UX) perspective. Through introducing a model for Information Retrieval eXperience (IRX), this paper operationalizes a perspective on IR that reaches beyond topicality. Based on a document's ...
Structural feedback for keyword-based XML retrieval
ECIR'06: Proceedings of the 28th European conference on Advances in Information Retrieval

Keyword-based queries are an important means to retrieve information from XML collections with unknown or complex schemas. Relevance Feedback integrates relevance information provided by a user to enhance retrieval quality. For keyword-based XML queries,...
Enhancing keyword-based botanical information retrieval with information extraction
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Keyword-based retrieval matches search terms and documents via term co-occurrence. Such an approach does not allow matching based on the specific plant characteristic descriptions that are often used in botanical text retrieval. This study applies ...

Comments

Information & Contributors

Information

Published In

cover image ACM Other conferences

AIMLSystems '23: Proceedings of the Third International Conference on AI-ML Systems

October 2023

381 pages

ISBN:9798400716492

DOI:10.1145/3639856

Copyright © 2023 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2024

Check for updates

Author Tags

Qualifiers

Demonstration
Research
Refereed limited

Conference

AIMLSystems 2023

AIMLSystems 2023: The Third International Conference on Artificial Intelligence and Machine Learning Systems

October 25 - 28, 2023

Bangalore, India

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
53
Total Downloads

Downloads (Last 12 months)53
Downloads (Last 6 weeks)6

Reflects downloads up to 13 Jan 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

HTML Format

View this article in HTML Format.

Media

Figures

Other

Tables

View Table of Contents