Retrieving Multimodal Information for Augmented Generation: A Survey

Zhao, Ruochen; Chen, Hailin; Wang, Weishi; Jiao, Fangkai; Do, Xuan Long; Qin, Chengwei; Ding, Bosheng; Guo, Xiaobao; Li, Minzhi; Li, Xingxuan; Joty, Shafiq

Computer Science > Computation and Language

arXiv:2303.10868 (cs)

[Submitted on 20 Mar 2023 (v1), last revised 1 Dec 2023 (this version, v3)]

Abstract: As Large Language Models (LLMs) become popular, an important trend has emerged of using multimodality to augment LLMs' generation ability, enabling LLMs to better interact with the world. However, there is no unified view of at which stage, and how, different modalities should be incorporated. In this survey, we review methods that assist and augment generative models by retrieving multimodal knowledge, in formats ranging from images, code, tables, and graphs to audio. Such methods offer a promising solution to important concerns such as factuality, reasoning, interpretability, and robustness. By providing an in-depth review, this survey aims to give scholars a deeper understanding of these methods' applications and to encourage them to adapt existing techniques to the fast-growing field of LLMs.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2303.10868 [cs.CL]
	(or arXiv:2303.10868v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2303.10868

Submission history

From: Ruochen Zhao
[v1] Mon, 20 Mar 2023 05:07:41 UTC (7,134 KB)
[v2] Sun, 8 Oct 2023 03:26:35 UTC (7,427 KB)
[v3] Fri, 1 Dec 2023 02:58:09 UTC (7,427 KB)