Automatic keyphrase extraction by bridging vocabulary gap

Authors:

Maosong Sun

CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning

Pages 135 - 144

Published: 23 June 2011

Abstract

Keyphrase extraction aims to select a set of terms from a document that serve as a short summary of it. Most methods extract keyphrases according to their statistical properties in the given document. Appropriate keyphrases, however, are not always statistically significant, and may not even appear in the document. This creates a large vocabulary gap between a document and its keyphrases. In this paper, we treat a document and its keyphrases as descriptions of the same object written in two different languages. By regarding keyphrase extraction as a problem of translating from the language of documents to the language of keyphrases, we use word alignment models from statistical machine translation to learn translation probabilities between the words in documents and the words in keyphrases. Given a new document, we suggest keyphrases according to the translation model. The suggested keyphrases are not necessarily statistically frequent in the document, which indicates that our method is more flexible and reliable. Experiments on news articles demonstrate that our method outperforms existing unsupervised methods on precision, recall and F-measure.
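To make the translation view concrete, here is a minimal sketch of the idea in Python. It is not the authors' implementation. It assumes an IBM Model 1 style aligner (without a NULL word) trained by EM on (document, keyphrase) pairs, and it assumes that candidates are ranked by averaging the learned translation probabilities over the words of a new document. The function names `train_ibm1` and `suggest` and the toy corpus are hypothetical.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t(k | d) with EM, in the style of IBM Model 1 (no NULL word).

    pairs: list of (doc_words, key_words) token lists (hypothetical toy format).
    Returns (t, key_vocab); t[(k, d)] approximates P(keyphrase word k | document word d).
    """
    key_vocab = sorted({k for _, keys in pairs for k in keys})
    uniform = 1.0 / len(key_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform translation table
    for _ in range(iterations):
        count = defaultdict(float)  # expected alignments of k to d
        total = defaultdict(float)  # expected alignments involving d
        for doc, keys in pairs:
            for k in keys:
                # E-step: spread one unit of k over the words of its document
                norm = sum(t[(k, d)] for d in doc)
                for d in doc:
                    frac = t[(k, d)] / norm
                    count[(k, d)] += frac
                    total[d] += frac
        # M-step: renormalize so that t(. | d) sums to 1 for each document word
        t = defaultdict(float, {kd: c / total[kd[1]] for kd, c in count.items()})
    return t, key_vocab


def suggest(doc, t, key_vocab, top_n=3):
    """Rank keyphrase words by their average translation probability from doc words.

    Uniform weighting of document words is an assumption of this sketch.
    """
    n = len(doc) or 1
    scores = {k: sum(t.get((k, d), 0.0) for d in doc) / n for k in key_vocab}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy training pairs: document tokens and the words of their keyphrases.
pairs = [
    (["stock", "market", "fell", "sharply"], ["finance"]),
    (["stock", "prices", "rose"], ["finance"]),
    (["team", "won", "match"], ["sports"]),
    (["team", "lost", "final", "match"], ["sports"]),
    (["football", "club", "stock", "listing"], ["finance", "sports"]),
]

t, key_vocab = train_ibm1(pairs)
print(suggest(["stock", "rose"], t, key_vocab, top_n=1))
```

On this toy data the top suggestion for the input `["stock", "rose"]` is `finance`, a word that never occurs in the input itself, which is the vocabulary gap the abstract describes.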

Index Terms

Automatic keyphrase extraction by bridging vocabulary gap
1. Applied computing
  1. Arts and humanities
    1. Language translation
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

CoNLL '11: Proceedings of the Fifteenth Conference on Computational Natural Language Learning

June 2011

270 pages

ISBN:9781932432923

Program Chairs: Sharon Goldwater (University of Edinburgh, United Kingdom), Christopher Manning (Stanford University)

Publisher

Association for Computational Linguistics

United States

Qualifiers

Research-article
