DOI: 10.5555/1699510.1699544

Clustering to find exemplar terms for keyphrase extraction

Published: 06 August 2009

    Abstract

    Keyphrases are widely used as a brief summary of documents. Since manual assignment is time-consuming, various unsupervised ranking methods based on importance scores have been proposed for keyphrase extraction. In practice, the keyphrases of a document should not only be statistically important in the document but also provide good coverage of it. Based on this observation, we propose an unsupervised method for keyphrase extraction. First, the method finds exemplar terms by leveraging clustering techniques, which guarantees that the document is semantically covered by these exemplar terms. Then the keyphrases are extracted from the document using the exemplar terms. Our method outperforms a state-of-the-art graph-based ranking method (TextRank) by 9.5% in F1-measure.
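
The two-stage idea in the abstract (cluster terms to find exemplars, then extract phrases that contain an exemplar) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the stopword list, the co-occurrence window, cosine similarity as the relatedness measure, a small k-medoids style loop as the clustering technique, and the rule that a keyphrase is a run of consecutive non-stopword tokens containing an exemplar.

```python
import re
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "on", "for",
             "is", "are", "by", "with", "as", "be", "from", "that", "this"}

def tokenize(text):
    """Lowercase words plus punctuation marks (punctuation ends a phrase)."""
    return re.findall(r"[a-z]+|[.,;:!?]", text.lower())

def is_candidate(token):
    return token.isalpha() and token not in STOPWORDS

def cooccurrence_vectors(tokens, window=3):
    """Map each candidate term to counts of candidates within `window` tokens."""
    vectors = defaultdict(Counter)
    for i, term in enumerate(tokens):
        if not is_candidate(term):
            continue
        vec = vectors[term]  # ensures every candidate gets an entry
        for other in tokens[max(0, i - window): i + window + 1]:
            if other != term and is_candidate(other):
                vec[other] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def find_exemplars(vectors, k=3, iterations=10):
    """k-medoids style loop (an assumed stand-in for the paper's clustering)."""
    terms = sorted(vectors)
    sim = {(a, b): 1.0 if a == b else cosine(vectors[a], vectors[b])
           for a in terms for b in terms}
    # Deterministic start: the k terms with the most co-occurrences.
    exemplars = sorted(terms, key=lambda t: (-sum(vectors[t].values()), t))[:k]
    for _ in range(iterations):
        clusters = defaultdict(list)
        for t in terms:
            # Assign each term to its most similar exemplar (itself on ties).
            best = max(exemplars, key=lambda e: (sim[t, e], e == t))
            clusters[best].append(t)
        # The new exemplar of a cluster is its most central member.
        updated = sorted(max(members, key=lambda c: sum(sim[c, m] for m in members))
                         for members in clusters.values())
        if updated == sorted(exemplars):
            break
        exemplars = updated
    return exemplars

def extract_keyphrases(tokens, exemplars):
    """Keep runs of consecutive candidate tokens that contain an exemplar."""
    phrases, run = [], []
    for tok in tokens + ["."]:  # sentinel flushes the final run
        if is_candidate(tok):
            run.append(tok)
            continue
        if any(t in exemplars for t in run):
            phrase = " ".join(run)
            if phrase not in phrases:
                phrases.append(phrase)
        run = []
    return phrases

sample = ("Keyphrase extraction ranks candidate terms. Clustering groups related "
          "terms, and exemplar terms cover the document.")
sample_tokens = tokenize(sample)
sample_exemplars = find_exemplars(cooccurrence_vectors(sample_tokens), k=2)
print(sample_exemplars, extract_keyphrases(sample_tokens, sample_exemplars))
```

Choosing one exemplar per cluster is what provides coverage in this sketch: every candidate term is assigned to some exemplar, so each group of related terms is represented among the extracted phrases.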




    Published In

    EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1
    August 2009
    505 pages
    ISBN: 9781932432596

    Publisher

    Association for Computational Linguistics

    United States


    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 73 of 234 submissions, 31%


    Cited By

    • (2023) TechPat: Technical Phrase Extraction for Patent Mining. ACM Transactions on Knowledge Discovery from Data 17:9 (1-31). DOI: 10.1145/3596603. Online publication date: 15-Jun-2023
    • (2022) WAIN: Automatic Web Application Identification and Naming Method. Proceedings of the 13th Asia-Pacific Symposium on Internetware (37-44). DOI: 10.1145/3545258.3545271. Online publication date: 11-Jun-2022
    • (2021) Unsupervised Keyword Combination Query Generation from Online Health Related Content for Evidence-Based Fact Checking. The 23rd International Conference on Information Integration and Web Intelligence (267-277). DOI: 10.1145/3487664.3487701. Online publication date: 29-Nov-2021
    • (2021) Review on adopting concept extraction in weak signals detection in competitive intelligence. The 7th Annual International Conference on Arab Women in Computing in Conjunction with the 2nd Forum of Women in Research (1-8). DOI: 10.1145/3485557.3485560. Online publication date: 25-Aug-2021
    • (2020) Unsupervised Keyword Extraction Methods Based on a Word Graph Network. International Journal of Ambient Computing and Intelligence 11:2 (68-79). DOI: 10.4018/IJACI.2020040104. Online publication date: 1-Apr-2020
    • (2019) SemKeyphrase: An Unsupervised Approach to Keyphrase Extraction from MOOC Video Lectures. IEEE/WIC/ACM International Conference on Web Intelligence (303-307). DOI: 10.1145/3350546.3352535. Online publication date: 14-Oct-2019
    • (2019) Inferring Search Queries from Web Documents via a Graph-Augmented Sequence to Attention Network. The World Wide Web Conference (2792-2798). DOI: 10.1145/3308558.3313746. Online publication date: 13-May-2019
    • (2019) Keyphrase Extraction from Disaster-related Tweets. The World Wide Web Conference (1555-1566). DOI: 10.1145/3308558.3313696. Online publication date: 13-May-2019
    • (2019) Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents. The World Wide Web Conference (2551-2557). DOI: 10.1145/3308558.3313642. Online publication date: 13-May-2019
    • (2019) NamedKeys. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (328-337). DOI: 10.1145/3307339.3342147. Online publication date: 4-Sep-2019
