ChatGPT vs state-of-the-art models: a benchmarking study in keyphrase generation task

Published in: Applied Intelligence

Abstract

Transformer-based language models, including ChatGPT, have demonstrated exceptional performance in various natural language generation tasks. However, there has been limited research evaluating ChatGPT’s keyphrase generation ability, which involves identifying informative phrases that accurately reflect a document’s content. This study seeks to address this gap by comparing ChatGPT’s keyphrase generation performance with state-of-the-art models, while also testing its potential as a solution for two significant challenges in the field: domain adaptation and keyphrase generation from long documents. We conducted experiments on eight publicly available datasets spanning scientific, news, and biomedical domains, analyzing performance across both short and long documents. Our results show that ChatGPT outperforms current state-of-the-art models on all tested datasets and in all experimental settings, generating high-quality keyphrases that adapt well to diverse domains and document lengths.
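Benchmarks of this kind typically score a model's predicted keyphrases against a document's gold keyphrases using exact-match F1@k after light normalization. The sketch below illustrates that common protocol; the function names and the normalization choices (lowercasing, whitespace collapsing) are illustrative assumptions, not the paper's exact evaluation code.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivial surface variants match."""
    return " ".join(phrase.lower().split())

def f1_at_k(predicted: list[str], gold: list[str], k: int = 5) -> float:
    """Exact-match F1@k: compare the top-k predictions against the gold set."""
    preds = [normalize(p) for p in predicted[:k]]
    gold_set = {normalize(g) for g in gold}
    matches = sum(1 for p in preds if p in gold_set)
    if matches == 0:
        return 0.0
    precision = matches / len(preds)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Example: 2 of the top-5 predictions match the 3 gold keyphrases.
predicted = ["Keyphrase Generation", "ChatGPT", "transformers", "benchmark", "news"]
gold = ["keyphrase generation", "chatgpt", "domain adaptation"]
print(f1_at_k(predicted, gold, k=5))  # precision 0.4, recall 2/3 -> F1 = 0.5
```

Published evaluations often also apply stemming before matching and report F1@5 and F1@M separately for present and absent keyphrases; this sketch keeps only the core matching step.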




Data Availability

The datasets supporting the conclusions of this article are all publicly available and openly published for research purposes. Links to these datasets can be found in Section 3.1 of this paper. The availability of these datasets ensures transparency and allows for the reproducibility of the research findings.

Notes

  1. https://huggingface.co/datasets/midas/inspec

  2. https://huggingface.co/datasets/midas/kp20k

  3. https://huggingface.co/datasets/midas/nus

  4. https://huggingface.co/datasets/midas/semeval2010

  5. https://huggingface.co/datasets/midas/kptimes

  6. https://huggingface.co/datasets/midas/duc2001

  7. https://huggingface.co/datasets/taln-ls2n/kpbiomed

  8. https://github.com/boudinfl/ake-datasets

  9. https://huggingface.co/bloomberg/KeyBART


Acknowledgements

We would like to express our gratitude to Debanjan Mahata, who served as our mentor and introduced us to the field of NLP and keyphrase extraction. His guidance and patience have been invaluable throughout this research project, and we are grateful for his mentorship and support.

We would also like to thank Alejandro Pérez and Sergio Gago for providing the computational resources that were essential for developing and testing the ideas presented in this paper. Their generosity and support have been instrumental in the success of this research project.

Additionally, we acknowledge that an earlier version of this manuscript was made available as a preprint on arXiv https://arxiv.org/abs/2304.14177. We appreciate the broader research community for engaging with this preliminary version.

Finally, we would like to acknowledge the countless individuals and organizations who have contributed to the field of NLP and Keyphrase Extraction, as their work has provided the foundation for this research. We are grateful for their ongoing efforts and dedication to advancing this field, and we hope that this paper will contribute to their ongoing work.

Thank you all for your contributions and support.

Author information

Authors and Affiliations

Authors

Contributions

Roberto Martínez-Cruz played a pivotal role in shaping the study’s conception and design. His expertise was instrumental in the development of the study’s methodology and the analytical processing of the data. Alvaro J. López-López and José Portela were integral to the data gathering and compilation efforts. Additionally, their insightful critiques significantly enriched the study’s intellectual depth. Each author contributed diligently to the manuscript’s creation, and collectively, they have given their consent for the final manuscript to be published.

Corresponding author

Correspondence to Roberto Martínez-Cruz.

Ethics declarations

Competing interests

The authors declare no competing interests related to this study. All methodologies, analyses, and interpretations of data were conducted independently and without influence from external entities. This research was purely academic and aimed at contributing to the existing body of knowledge on keyphrase generation.

Ethical and Informed Consent

This research did not involve any human participants, data, or tissues, and therefore did not require ethical approval or informed consent. All datasets utilized in this study are publicly available and were openly published for research purposes, adhering to all applicable ethical standards.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Martínez-Cruz, R., López-López, A.J. & Portela, J. ChatGPT vs state-of-the-art models: a benchmarking study in keyphrase generation task. Appl Intell 55, 50 (2025). https://doi.org/10.1007/s10489-024-05901-4
