Abstract
In Unsupervised Keyphrase Extraction (UKE) tasks, candidate phrases are ranked based on their similarity to the document embedding. However, this method assumes that every document focuses on only one topic, which makes it difficult to distinguish the significance of candidate keyphrases across different topics. Hence, a method for acquiring diversified topic information is needed to obtain accurate keyphrases. In this paper, we propose a new unsupervised keyphrase extraction method (MSFFUKE) that utilizes multi-granularity semantic feature fusion. We first group phrases into clusters through granulation, calculate the semantic similarity between each phrase and each cluster, and take the mean to obtain the semantic features of topic granularity. Then, we obtain semantic features of phrase granularity based on the degree centrality of candidate phrases in the graph structure. Finally, we integrate the semantic features of different granularities to rank candidate phrases. We evaluate our model on three public benchmarks (Inspec, DUC 2001, SemEval 2010) and compare it with the most advanced models currently available. The results demonstrate that our model performs better than most models and generalizes well when processing input documents from various domains and of different lengths. An ablation study further indicates that both topic granularity semantic features and phrase granularity semantic features are crucial for unsupervised keyphrase extraction tasks.
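The three steps described in the abstract (topic-granularity features from clusters, phrase-granularity features from degree centrality, and fusion for ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the graph construction, or the fusion rule, so the tiny k-means, the similarity threshold used to draw edges, the min-max normalization, the weighted sum with `alpha`, and all function names here are assumptions made for illustration only. Phrase embeddings are assumed to be precomputed, non-zero vectors.

```python
import numpy as np

def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def kmeans(x, k, iters=20, seed=0):
    """Tiny k-means; an assumed stand-in for the paper's granulation step."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers

def min_max(v):
    """Scale a vector to [0, 1]; a constant vector maps to all zeros."""
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def rank_phrases(phrases, emb, k=2, alpha=0.5, threshold=0.5):
    """Rank phrases by a fused topic-level and phrase-level score (hypothetical)."""
    emb = np.asarray(emb, dtype=float)

    # Topic granularity: mean similarity of each phrase to every cluster centre.
    centers = kmeans(emb, k)
    topic = cosine_matrix(emb, centers).mean(axis=1)

    # Phrase granularity: degree centrality in a graph whose edges join
    # phrases with cosine similarity >= threshold (assumed edge rule).
    adj = cosine_matrix(emb, emb) >= threshold
    np.fill_diagonal(adj, False)
    degree = adj.sum(axis=1) / max(len(phrases) - 1, 1)

    # Fusion: weighted sum of the normalized features (assumed fusion rule).
    score = alpha * min_max(topic) + (1 - alpha) * min_max(degree)
    order = np.argsort(-score)
    return [(phrases[i], float(score[i])) for i in order]
```

Calling `rank_phrases(["neural network", "deep learning", "tax law"], embeddings)` with one embedding row per phrase returns the phrases with their fused scores, highest first. The parameter `alpha` controls how much weight the topic-level feature receives relative to the phrase-level feature.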
Acknowledgements
This work was supported by the Major Program of the National Natural Science Foundation of China (Grant Nos. 61876001, 61876157), the National Social Science Foundation of China (Grant No. 18ZDA032), and the Natural Science Foundation for the Higher Education Institutions of Anhui Province of China (KJ2021A0039).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, J., Hu, H., Zhao, S., Zhang, Y. (2023). Unsupervised KeyPhrase Extraction Based on Multi-granular Semantics Feature Fusion. In: Campagner, A., Urs Lenz, O., Xia, S., Ślęzak, D., Wąs, J., Yao, J. (eds) Rough Sets. IJCRS 2023. Lecture Notes in Computer Science, vol 14481. Springer, Cham. https://doi.org/10.1007/978-3-031-50959-9_21
DOI: https://doi.org/10.1007/978-3-031-50959-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50958-2
Online ISBN: 978-3-031-50959-9
eBook Packages: Computer Science, Computer Science (R0)