A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers

Huang, Kaiyu; Mo, Fengran; Li, Hongliang; Li, You; Zhang, Yuanchi; Yi, Weijian; Mao, Yulong; Liu, Jinchen; Xu, Yuzhuang; Xu, Jinan; Nie, Jian-Yun; Liu, Yang

arXiv:2405.10936 (cs)

[Submitted on 17 May 2024]

Abstract: The rapid development of Large Language Models (LLMs) has demonstrated remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. To mitigate potential discrimination and enhance the overall usability and accessibility for diverse language user groups, it is important to develop language-fair technology. Despite the breakthroughs of LLMs, the multilingual scenario remains insufficiently investigated, and a comprehensive survey that summarizes recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the utilization of LLMs in the multilingual scenario. We first rethink the transitions between previous and current research on pre-trained language models. Then we introduce several perspectives on the multilingualism of LLMs, including training and inference methods, model security, multi-domain with language culture, and usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. In addition, we highlight future research directions that aim at further enhancing LLMs with multilingualism. The survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.

Comments:	54 pages, Work in Progress
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.10936 [cs.CL]
	(or arXiv:2405.10936v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2405.10936

Submission history

From: Kaiyu Huang
[v1] Fri, 17 May 2024 17:47:39 UTC (690 KB)