DOI: 10.5555/2018936.2018952

Automatic keyphrase extraction by bridging vocabulary gap

Published: 23 June 2011

Abstract

Keyphrase extraction aims to select a set of terms from a document that serve as a short summary of it. Most methods extract keyphrases according to their statistical properties in the given document. Appropriate keyphrases, however, are not always statistically significant and may not even appear in the document. This creates a large vocabulary gap between a document and its keyphrases. In this paper, we treat a document and its keyphrases as descriptions of the same object written in two different languages. By regarding keyphrase extraction as a problem of translating from the language of documents to the language of keyphrases, we use word alignment models from statistical machine translation to learn translation probabilities between the words in documents and the words in keyphrases. Given a new document, we suggest keyphrases according to the learned translation model. The suggested keyphrases are not necessarily statistically frequent in the document, which indicates that our method is more flexible and reliable. Experiments on news articles demonstrate that our method outperforms existing unsupervised methods on precision, recall and F-measure.
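
The translation view described in the abstract can be illustrated with a small sketch. The abstract does not say which word alignment model is used, so the code below assumes IBM Model 1 (without a NULL word) trained by expectation-maximization. The function names, the toy data, and the scoring rule (averaging the translation probability over the tokens of the new document) are all illustrative assumptions, not the paper's actual procedure.

```python
def train_model1(pairs, iterations=10):
    """Estimate t[(k, d)] = P(keyphrase word k | document word d) with EM.

    `pairs` is a list of (document_words, keyphrase_words) tuples.
    Assumption: IBM Model 1 without a NULL source word.
    """
    key_vocab = {k for _, keys in pairs for k in keys}
    # Uniform initialisation over every co-occurring (k, d) pair.
    t = {(k, d): 1.0 / len(key_vocab)
         for doc, keys in pairs for k in keys for d in doc}
    for _ in range(iterations):
        count = {}   # expected count of d being aligned to k
        total = {}   # expected count of d being aligned to anything
        for doc, keys in pairs:
            for k in keys:
                # E-step: spread one unit of mass for k over the document words.
                z = sum(t[(k, d)] for d in doc)
                for d in doc:
                    c = t[(k, d)] / z
                    count[(k, d)] = count.get((k, d), 0.0) + c
                    total[d] = total.get(d, 0.0) + c
        # M-step: renormalise so that the probabilities sum to 1 for each d.
        t = {(k, d): c / total[d] for (k, d), c in count.items()}
    return t


def suggest_keywords(t, doc, top_n=3):
    """Rank keyphrase words for a new document.

    Assumed scoring rule: score(k) = (1 / |doc|) * sum over tokens d of P(k | d).
    Document words never seen in training contribute zero.
    """
    key_vocab = {k for k, _ in t}
    scores = {k: sum(t.get((k, d), 0.0) for d in doc) / len(doc)
              for k in key_vocab}
    return sorted(scores, key=lambda k: (-scores[k], k))[:top_n]


# Toy (document, keyphrase) pairs, invented purely for illustration.
pairs = [
    (["apple", "iphone", "launch"], ["technology"]),
    (["apple", "fruit", "juice"], ["food"]),
    (["iphone", "android", "phone"], ["technology"]),
    (["fruit", "salad", "recipe"], ["food"]),
]
model = train_model1(pairs)
suggestions = suggest_keywords(model, ["android", "phone", "review"])
print(suggestions)
```

In this toy setting a suggested keyword does not have to occur in the new document at all, which is the vocabulary-gap point the abstract makes. A realistic setup would need far more training pairs and some handling of multi-word keyphrases, neither of which is attempted here.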


Published In

CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning
June 2011
270 pages
ISBN:9781932432923

Publisher

Association for Computational Linguistics

United States
