DOI: 10.1145/2187836.2187899
research-article

Cross-lingual knowledge linking across wiki knowledge bases

Published: 16 April 2012

Abstract

Wikipedia has become one of the largest knowledge bases on the Web, attracting 513 million page views per day in January 2012. However, one critical issue for Wikipedia is that the number of articles varies widely across languages. For example, the English Wikipedia has reached 3.8 million articles, while the Chinese Wikipedia still has fewer than half a million, and there are only 217 thousand cross-lingual links between articles in the two languages. On the other hand, there are more than 3.9 million Chinese wiki articles on Baidu Baike and Hudong.com, two popular Chinese encyclopedias. One important question is how to link knowledge entries distributed across different knowledge bases. Doing so would greatly enrich the information in online knowledge bases and benefit many applications. In this paper, we study the problem of cross-lingual knowledge linking and present a linkage factor graph model. Features are defined according to several observations. Experiments on the Wikipedia data set show that our approach achieves a precision of 85.8% with a recall of 88.1%. The approach found 202,141 new cross-lingual links between English Wikipedia and Baidu Baike.


    Published In

    WWW '12: Proceedings of the 21st international conference on World Wide Web
    April 2012
    1078 pages
    ISBN:9781450312295
    DOI:10.1145/2187836

    Sponsors

    • Univ. de Lyon: Universite de Lyon

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cross-lingual
    2. knowledge linking
    3. knowledge sharing
    4. wiki knowledge base

    Qualifiers

    • Research-article

    Conference

    WWW 2012
    Sponsor:
    • Univ. de Lyon
    WWW 2012: 21st World Wide Web Conference 2012
    April 16 - 20, 2012
    Lyon, France

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2023) OAG: Linking Entities Across Large-Scale Heterogeneous Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering 35:9 (9225-9239). DOI: 10.1109/TKDE.2022.3222168. Online publication date: 1-Sep-2023
    • (2021) Wikipedia Beyond the English Language Edition. Proceedings of the ACM on Human-Computer Interaction 5:CSCW1 (1-39). DOI: 10.1145/3449129. Online publication date: 22-Apr-2021
    • (2021) OAG_know: Self-supervised Learning for Linking Knowledge Graphs. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2021.3090830. Online publication date: 2021
    • (2020) Research of Knowledge Graph Technology and its Applications in Agricultural Information Consultation Field. 2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC) (1-4). DOI: 10.1109/IPCCC50635.2020.9391515. Online publication date: 6-Nov-2020
    • (2020) In favour of or against multi-lingual Q&A sites? Exploring the evidence from user and knowledge perspectives. Behaviour & Information Technology 40:13 (1390-1405). DOI: 10.1080/0144929X.2020.1752308. Online publication date: 22-Apr-2020
    • (2020) A survey on the development status and application prospects of knowledge graph in smart grids. IET Generation, Transmission & Distribution 15:3 (383-407). DOI: 10.1049/gtd2.12040. Online publication date: 3-Dec-2020
    • (2019) XLORE2: Large-scale Cross-lingual Knowledge Graph Construction and Application. Data Intelligence 1:1 (77-98). DOI: 10.1162/dint_a_00003. Online publication date: Mar-2019
    • (2019) OAG. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2585-2595). DOI: 10.1145/3292500.3330785. Online publication date: 25-Jul-2019
    • (2019) TriTag-NFPF: Knowledge Denoising for Chinese Encyclopedia based on Triple Tag-Constructed Potential Function. IEEE Access 7 (107413-107427). DOI: 10.1109/ACCESS.2019.2933249. Online publication date: 2019
    • (2019) Neural Article Pair Modeling for Wikipedia Sub-article Matching. Machine Learning and Knowledge Discovery in Databases (3-19). DOI: 10.1007/978-3-030-10997-4_1. Online publication date: 18-Jan-2019
