
Corpus-based Set Expansion with Lexical Features and Distributed Representations

Published: 18 July 2019

Abstract

Corpus-based set expansion refers to mining "sibling" entities of some given seed entities from a corpus. Previous works are limited to using either textual context matching or semantic matching to fulfill this task. Neither matching method takes full advantage of the rich information in free text. We present CaSE, an efficient unsupervised corpus-based set expansion framework that leverages lexical features as well as distributed representations of entities for the set expansion task. Experiments show that CaSE outperforms state-of-the-art set expansion algorithms in terms of expansion accuracy.
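
The idea of combining the two signals can be illustrated with a small sketch. This is not the CaSE scoring function, which the abstract does not specify. It assumes, for illustration only, that each entity has a set of textual context patterns and a dense vector, that lexical similarity is measured with Jaccard overlap, that semantic similarity is measured with cosine similarity, and that the two are mixed with a hypothetical weight `alpha`. The entities, patterns, and vectors are toy data.

```python
import math


def jaccard(a, b):
    """Overlap between two sets of context features (assumed lexical signal)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity between two dense vectors (assumed semantic signal)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def expand(seeds, contexts, embeddings, alpha=0.5, top_k=3):
    """Rank non-seed entities by a weighted mix of both similarities.

    alpha is a hypothetical mixing weight, not a parameter from the paper.
    """
    scores = {}
    for cand in contexts:
        if cand in seeds:
            continue
        # Average similarity of the candidate to every seed entity.
        lexical = sum(jaccard(contexts[cand], contexts[s]) for s in seeds) / len(seeds)
        semantic = sum(cosine(embeddings[cand], embeddings[s]) for s in seeds) / len(seeds)
        scores[cand] = alpha * lexical + (1 - alpha) * semantic
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy corpus statistics: context patterns and 2-d vectors per entity.
contexts = {
    "Texas": {"state of __", "__ governor", "moved to __"},
    "Ohio": {"state of __", "__ governor"},
    "Florida": {"state of __", "moved to __", "__ beaches"},
    "Toyota": {"__ recalled cars", "drives a __"},
}
embeddings = {
    "Texas": [0.9, 0.1],
    "Ohio": [0.8, 0.2],
    "Florida": [0.85, 0.15],
    "Toyota": [0.1, 0.9],
}

seeds = ["Texas", "Ohio"]
ranked = expand(seeds, contexts, embeddings)
print(ranked)
```

With these toy inputs, the sibling entity "Florida" shares context patterns with the seeds and lies close to them in vector space, so it ranks ahead of "Toyota", which matches on neither signal.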


    Published In

    SIGIR'19: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2019
    1512 pages
    ISBN:9781450361729
    DOI:10.1145/3331184
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information retrieval
    2. set expansion
    3. text mining

    Qualifiers

    • Short-paper

    Conference

    SIGIR '19
    Acceptance Rates

    SIGIR'19 Paper Acceptance Rate 84 of 426 submissions, 20%;
    Overall Acceptance Rate 792 of 3,983 submissions, 20%
