
FEDS-ICL: Enhancing translation ability and efficiency of large language model by optimizing demonstration selection

Published: 25 September 2024

Abstract

Large language models (LLMs) that exhibit a remarkable ability for in-context learning (ICL) with bilingual demonstrations have been recognized as a potential solution for machine translation. However, the process of selecting these demonstrations from vast datastores is notoriously time-consuming and inefficient. Moreover, the strategies for designing effective in-context demonstrations are not well established. To address these critical gaps, we introduce a novel Fast and Effective approach for Demonstration Selection in In-Context Learning (FEDS-ICL) tailored to LLMs. Our method is designed primarily to enhance the translation efficiency and accuracy of LLMs. Our approach revolutionizes demonstration selection by designing a new product quantization technique that rapidly extracts neighboring target tokens from a strategically curated subset of sentences. This method departs significantly from the conventional exhaustive search across entire datastores, leading to a remarkable increase in speed. Furthermore, FEDS-ICL pioneers an innovative template design for in-context demonstrations, specifically crafted to amplify the translation capabilities of multilingual LLMs. In experiments, we compare FEDS-ICL with various existing methods across diverse language pairs on ten different LLMs. The results reveal up to a 2.1-fold increase in selection speed and an impressive enhancement in translation accuracy, outperforming existing baselines by up to 2.0 BLEU points across the ten LLMs. The ablation study shows that the proposed product quantization and multi-view demonstrations effectively enhance the efficiency and accuracy of LLMs in machine translation. The analysis of the robustness of FEDS-ICL shows that incorporating a greater number of demonstrations reveals a positive correlation between the quantity of contextually rich demonstrations and the translation quality of LLMs. These advancements position FEDS-ICL as a transformative methodology in the domain of machine translation and pattern analysis, marking a significant leap towards more efficient and precise machine translation.
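
To make the retrieval step concrete, the following is a minimal sketch of product-quantization-based approximate nearest-neighbor search over a strategically curated subset of sentence embeddings, implemented with the FAISS library. It is not the authors' implementation: the embedding dimension, index parameters, and variable names are illustrative assumptions.

```python
# Illustrative sketch only: PQ-based approximate nearest-neighbor retrieval of
# demonstration sentences with FAISS. All parameters are assumptions, not the
# settings used by FEDS-ICL.
import numpy as np
import faiss

d = 768        # assumed sentence-embedding dimension
nlist = 1024   # number of coarse (IVF) clusters
m = 64         # number of PQ sub-quantizers (d must be divisible by m)
nbits = 8      # bits per PQ code

# Embeddings of a curated subset of source sentences (not the full datastore).
subset_embeddings = np.random.rand(100_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                    # coarse quantizer
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, nbits)
index.train(subset_embeddings)                      # learn IVF centroids and PQ codebooks
index.add(subset_embeddings)                        # compress and store the subset
index.nprobe = 16                                   # coarse clusters probed per query

# Retrieve the k nearest candidate demonstrations for one query embedding.
query = np.random.rand(1, d).astype("float32")
distances, neighbor_ids = index.search(query, 8)    # ids index into the curated subset
```

Because the index stores compressed PQ codes and probes only a few coarse clusters per query, the search avoids the exhaustive scan over the entire datastore that the abstract identifies as time-consuming.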

Highlights

We explore how to enhance the translation ability and efficiency of large language models.
A new product quantization technique accelerates the selection of in-context demonstrations.
An innovative template design for in-context demonstrations in machine translation (an illustrative prompt sketch follows this list).
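
As a companion to the retrieval sketch above, here is a minimal example of how retrieved bilingual demonstrations might be assembled into a few-shot translation prompt. The instruction wording, language names, and function name are illustrative assumptions and do not reproduce the multi-view template proposed in the paper.

```python
# Illustrative sketch only: building a few-shot translation prompt from
# retrieved (source, target) demonstration pairs. Not the FEDS-ICL template.
def build_translation_prompt(demos, source_sentence, src_lang="German", tgt_lang="English"):
    lines = [f"Translate the following {src_lang} sentences into {tgt_lang}."]
    for src, tgt in demos:
        lines.append(f"{src_lang}: {src}")
        lines.append(f"{tgt_lang}: {tgt}")
    # The test sentence comes last; the LLM continues generation after the target tag.
    lines.append(f"{src_lang}: {source_sentence}")
    lines.append(f"{tgt_lang}:")
    return "\n".join(lines)


# Example usage with two hypothetical demonstrations.
demos = [("Guten Morgen.", "Good morning."), ("Wie geht es dir?", "How are you?")]
prompt = build_translation_prompt(demos, "Das Wetter ist heute schön.")
```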

Information

Published In

Information Processing and Management: an International Journal, Volume 61, Issue 5
Sep 2024
850 pages

Publisher

Pergamon Press, Inc.

United States

Publication History

Published: 25 September 2024

Author Tags

  1. Natural language processing
  2. Large language model
  3. Machine translation
  4. In-context learning

Qualifiers

  • Research-article
