DOI: 10.1145/3097983.3098185
Research article

Automatic Synonym Discovery with Knowledge Bases

Published: 13 August 2017

Abstract

Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input and find other names that are synonymous, ignoring the fact that a name string can often refer to multiple entities (e.g., "apple" could refer to either Apple Inc. or the fruit). Moreover, most existing methods require training data manually created by domain experts to construct supervised learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually curated synonyms stored for each entity in a knowledge base not only form a set of name strings that disambiguate one another's meaning, but can also serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, that integrates two kinds of mutually complementary signals for synonym discovery: distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other during training. At inference, both signals are used to discover synonyms for the given entities. Experimental results demonstrate the effectiveness of the proposed framework.
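
To make the idea of distant supervision from a knowledge base more concrete, the following is a minimal sketch and not the DPE implementation. It assumes a hypothetical toy knowledge base (`TOY_KB`), labels string pairs as synonymous when they belong to the same entity, and combines a distributional score (cosine similarity of assumed embedding vectors) with an assumed pattern score through a simple weighted sum. The function names, the weighting scheme, and the data are all illustrative assumptions.

```python
from itertools import combinations
import math

# Hypothetical toy knowledge base: entity ID -> manually curated synonym strings.
TOY_KB = {
    "apple_inc": ["Apple Inc.", "Apple Computer"],
    "apple_fruit": ["apple", "Malus domestica"],
}


def distant_pairs(kb):
    """Label string pairs from the KB alone: same entity -> 1, different -> 0."""
    pairs = []
    # Positive pairs: two names stored under the same entity.
    for names in kb.values():
        pairs += [(a, b, 1) for a, b in combinations(names, 2)]
    # Negative pairs: names drawn from two different entities (a simplification).
    for (_, names_a), (_, names_b) in combinations(kb.items(), 2):
        pairs += [(a, b, 0) for a in names_a for b in names_b]
    return pairs


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_score(vec_a, vec_b, pattern_score, weight=0.5):
    """Assumed combination: weighted sum of distributional and pattern scores."""
    return weight * cosine(vec_a, vec_b) + (1 - weight) * pattern_score
```

Calling `distant_pairs(TOY_KB)` yields labeled pairs without any manual annotation, which is the sense in which the supervision is "distant". The abstract describes the two signals as jointly optimized during training; the fixed weighted sum here only illustrates how two scores might be merged at inference time.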



    Published In

    KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2017
    2240 pages
    ISBN:9781450348874
    DOI:10.1145/3097983
    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. distant supervision
    2. distributional based approach
    3. knowledge base
    4. pattern based approach
    5. synonym discovery

    Qualifiers

    • Research-article

    Conference

    KDD '17

    Acceptance Rates

    KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 276
    • Downloads (last 6 weeks): 50
    Reflects downloads up to 30 Aug 2024

