DOI: 10.1145/3428757.3429111
research-article

Tailored Graph Embeddings for Entity Alignment on Historical Data

Published: 27 January 2021

Abstract

In the domain of Dutch cultural heritage, various data sets describe different aspects of life during the Dutch Golden Age. These data sets, in the form of RDF graphs, use different standards and contain noise in the values of literal nodes, such as misspelled names and uncertain dates. The Golden Agents project aims to answer queries about the Dutch Golden Age using these distributed and independently maintained data sets. One of the many problems in this project is the identification of persons who occur in multiple data sets under different URIs. This paper addresses that specific problem by generating a linkset, i.e. a set of pairs of URIs that are judged to represent the same person. We use domain knowledge in the application of an existing node context generation algorithm, whose output serves as input for GloVe, an algorithm originally designed for embedding words. The resulting embedding is then used to train a classifier on pairs of URIs that are known duplicates and non-duplicates. Using just the cosine similarity between URI pairs in embedding space for prediction, we obtain a simple classifier with an F½-score of around 0.85, even when very few training examples are provided. On larger training sets, more complex classifiers are shown to reach an F½-score of up to 0.88.
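The simplest classifier mentioned in the abstract, a cosine-similarity cut-off scored with an F½ measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the URIs, the two-dimensional vectors, and the threshold of 0.8 are hypothetical, and the sketch assumes that "F½" denotes the F-beta score with beta = 0.5, which weights precision more heavily than recall.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def predict_duplicates(embedding, pairs, threshold):
    """Label a URI pair as a duplicate when its similarity reaches the threshold."""
    return [
        cosine_similarity(embedding[a], embedding[b]) >= threshold
        for a, b in pairs
    ]


def f_beta(y_true, y_pred, beta=0.5):
    """F-beta score; beta < 1 favours precision (assumed meaning of F½)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy embedding: URI -> vector (real embeddings would come from GloVe).
embedding = {
    "ex:person/a1": [1.0, 0.0],
    "ex:person/a2": [0.9, 0.1],
    "ex:person/b1": [0.0, 1.0],
}
pairs = [("ex:person/a1", "ex:person/a2"), ("ex:person/a1", "ex:person/b1")]
known_labels = [True, False]  # hypothetical ground truth for the two pairs

predicted = predict_duplicates(embedding, pairs, threshold=0.8)
score = f_beta(known_labels, predicted, beta=0.5)
```

In practice the threshold would be tuned on the known duplicate and non-duplicate pairs rather than fixed by hand.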

Cited By

  • (2022) A Computational Framework for Organizing and Querying Cultural Heritage Archives. Journal on Computing and Cultural Heritage 15:3 (1-25). DOI: 10.1145/3485843. Online publication date: 18-Feb-2022

Published In

iiWAS '20: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services
November 2020
492 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • Johannes Kepler University, Linz, Austria

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cultural Heritage
  2. Embedding
  3. Entity Alignment
  4. RDF

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

iiWAS '20
