Research article. DOI: 10.1145/3243082.3243118

Estimating Similarity Among Entities Aided by the Web when Only the Entity Name is Available

Published: 16 October 2018

    Abstract

    Estimating the similarity between entity names plays an important role in several tasks, such as entity resolution and recommendation. Identifying the similarity between entity names, such as the titles of scientific articles, may not be feasible through direct comparison or knowledge-based similarity approaches. As a vast source of data, the Web can aid in this similarity check. In this work, we propose a method to calculate the similarity between two textual names, based on features inferred from data obtained from the Web and with the aid of genre terms. Experiments show that the method is able to assess the similarity between names even when they share no terms in common.
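
The general idea of Web-aided name similarity can be illustrated with a minimal sketch. This is not the authors' method, whose features are not described in the abstract. It swaps in a simple, well-known stand-in: the cosine similarity between bag-of-words vectors built from search-result snippets. The `fetch_snippets` callable, the `FAKE_SNIPPETS` data, and the way genre terms are appended to the query are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def term_vector(snippets):
    """Build a term-frequency vector from a list of text snippets."""
    tokens = []
    for snippet in snippets:
        tokens.extend(re.findall(r"[a-z0-9]+", snippet.lower()))
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def name_similarity(name_a, name_b, fetch_snippets, genre_terms=()):
    """Compare two names through the Web context retrieved for each one.

    fetch_snippets is a hypothetical callable: query string -> list of snippets.
    genre_terms are appended to each query to focus the search (an assumption
    about how such terms might be used, not taken from the paper).
    """
    query_a = " ".join([name_a, *genre_terms])
    query_b = " ".join([name_b, *genre_terms])
    return cosine(term_vector(fetch_snippets(query_a)),
                  term_vector(fetch_snippets(query_b)))


# Hypothetical stand-in for a search engine, so the sketch runs offline.
FAKE_SNIPPETS = {
    "WWW": [
        "international conference on the web held annually",
        "web research papers and proceedings",
    ],
    "World Wide Web Conference": [
        "annual international conference on web research",
        "proceedings of the web conference",
    ],
}


def fake_search(query):
    """Return canned snippets for the first known name that prefixes the query."""
    for name, snippets in FAKE_SNIPPETS.items():
        if query.startswith(name):
            return snippets
    return []


# Direct comparison of the two names finds no shared terms.
direct = cosine(term_vector(["WWW"]), term_vector(["World Wide Web Conference"]))
# Comparison through the retrieved context finds overlap.
web_based = name_similarity("WWW", "World Wide Web Conference",
                            fake_search, genre_terms=("conference",))
```

Here the direct comparison scores zero because the two names share no tokens, while the snippet-based score is positive because the retrieved contexts overlap. In practice `fake_search` would be replaced by a real search client, and stop-word removal or stemming would likely be applied before building the vectors.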


    Cited By

    • (2019) An item-item similarity approach based on linked open data semantic relationship. Proceedings of the 25th Brazilian Symposium on Multimedia and the Web, 425-432. DOI: 10.1145/3323503.3349547. Online publication date: 29-Oct-2019

    Published In

    WebMedia '18: Proceedings of the 24th Brazilian Symposium on Multimedia and the Web
    October 2018
    437 pages
    ISBN:9781450358675
    DOI:10.1145/3243082
    © 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Data Integration
    2. Entity Resolution
    3. Similarity among Entities
    4. Web Text Analysis

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WebMedia '18: Brazilian Symposium on Multimedia and the Web
    October 16 - 19, 2018
    Salvador, BA, Brazil

    Acceptance Rates

    WebMedia '18 Paper Acceptance Rate 37 of 111 submissions, 33%;
    Overall Acceptance Rate 270 of 873 submissions, 31%
