Article
DOI: 10.1007/978-3-642-35176-1_29

A machine learning approach for instance matching based on similarity metrics

Published: 11 November 2012

Abstract

The Linking Open Data (LOD) project is an ongoing effort to construct a global data space, i.e., the Web of Data. One important part of this project is to establish owl:sameAs links among structured data sources. Such links indicate equivalent instances that refer to the same real-world object. The problem of discovering owl:sameAs links between pairs of data sources is called instance matching. Most existing approaches to this problem rely on the quality of prior schema matching, which is not always good enough in the LOD scenario. In this paper, we propose a schema-independent instance-pair similarity metric based on several general descriptive features. We transform the instance matching problem into a binary classification problem and solve it with machine learning algorithms. Furthermore, we employ transfer learning methods that use the existing owl:sameAs links in LOD to reduce the demand for labeled data. We carry out experiments on several OAEI 2010 datasets. The results show that our method performs well on real-world LOD data and outperforms the participants of OAEI 2010.
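
The idea of casting instance matching as binary classification over schema-independent similarity features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the method described in the paper: the three features (token Jaccard, best fuzzy string match, share of identical values), the hand-written logistic regression used as a stand-in classifier, the helper names (`literals`, `pair_features`, `train_logistic`, `is_match`), and the toy instances and property names, which are invented.

```python
import math
from difflib import SequenceMatcher

# An instance is modeled as a dict: property name -> list of literal values.
# Property names are never compared, so no schema alignment is required.

def literals(instance):
    """Collect all literal values of an instance, lowercased."""
    return [str(v).lower() for values in instance.values() for v in values]

def pair_features(a, b):
    """Map an instance pair to a similarity vector (illustrative features only)."""
    la, lb = literals(a), literals(b)
    ta = {tok for v in la for tok in v.split()}
    tb = {tok for v in lb for tok in v.split()}
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    # Best fuzzy match between any literal of a and any literal of b.
    best = max((SequenceMatcher(None, x, y).ratio() for x in la for y in lb),
               default=0.0)
    # Share of distinct values of the smaller instance found verbatim in the other.
    sa, sb = set(la), set(lb)
    exact = len(sa & sb) / min(len(sa), len(sb)) if sa and sb else 0.0
    return [jaccard, best, exact]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=2000, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in classifier)."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def is_match(model, a, b):
    """Classify a pair as 'same real-world object' (True) or not (False)."""
    w, bias = model
    x = pair_features(a, b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias) >= 0.5

# Invented toy instances from two sources that use different property names.
berlin_a = {"rdfs:label": ["Berlin"], "country": ["Germany"]}
berlin_b = {"name": ["Berlin"], "locatedIn": ["Germany"]}
paris_a = {"rdfs:label": ["Paris"], "country": ["France"]}
paris_b = {"title": ["Paris"], "nation": ["France"], "population": ["2100000"]}
rome_a = {"label": ["Rome"], "country": ["Italy"]}
rome_b = {"name": ["Rome"], "in": ["Italy"]}

# Labeled pairs: 1 = an owl:sameAs link holds, 0 = it does not.
pairs = [(berlin_a, berlin_b, 1), (paris_a, paris_b, 1),
         (berlin_a, paris_b, 0), (paris_a, berlin_b, 0)]
X = [pair_features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
model = train_logistic(X, y)

print(is_match(model, rome_a, rome_b))
print(is_match(model, rome_a, berlin_b))
```

Because the feature vector describes a pair rather than a single instance, the same classifier can in principle be applied to pairs drawn from sources whose schemas were never aligned; how well such a model carries over to other sources is the question the transfer learning part of the abstract addresses.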




    Published In

    ISWC'12: Proceedings of the 11th international conference on The Semantic Web - Volume Part I
    November 2012
    673 pages
    ISBN:9783642351754

    Sponsors

    • Elsevier
    • Yahoo! Research
    • Artificial Intelligence Journal
    • ORACLE
    • IBM

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Author Tags

    1. instance matching
    2. linking open data
    3. machine learning
    4. similarity metric
    5. transfer learning

    Qualifiers

    • Article

    Cited By

    • (2020) An Overview of End-to-End Entity Resolution for Big Data. ACM Computing Surveys 53:6 (1-42). DOI: 10.1145/3418896. Online publication date: 6-Dec-2020
    • (2019) Contextual Entity Disambiguation in Domains with Weak Identity Criteria. Proceedings of the 10th International Conference on Knowledge Capture (259-262). DOI: 10.1145/3360901.3364440. Online publication date: 23-Sep-2019
    • (2019) OAG. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2585-2595). DOI: 10.1145/3292500.3330785. Online publication date: 25-Jul-2019
    • (2019) An improved method of locality-sensitive hashing for scalable instance matching. Knowledge and Information Systems 58:2 (275-294). DOI: 10.1007/s10115-018-1199-5. Online publication date: 1-Feb-2019
    • (2018) Avatar. Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning (1-10). DOI: 10.1145/3209889.3209892. Online publication date: 15-Jun-2018
    • (2018) A statistically-based ontology matching tool. Distributed and Parallel Databases 36:1 (195-217). DOI: 10.1007/s10619-017-7206-0. Online publication date: 1-Mar-2018
    • (2018) Active instance matching with pairwise constraints and its application to Chinese knowledge base construction. Knowledge and Information Systems 55:1 (171-214). DOI: 10.1007/s10115-017-1076-7. Online publication date: 1-Apr-2018
    • (2017) ScLink. Journal of Intelligent Information Systems 48:3 (519-551). DOI: 10.1007/s10844-016-0426-3. Online publication date: 1-Jun-2017
    • (2016) Multi-Source Uncertain Entity Resolution at Yad Vashem. Proceedings of the 2016 International Conference on Management of Data (807-819). DOI: 10.1145/2882903.2903737. Online publication date: 26-Jun-2016
    • (2016) Automatic Key Selection for Data Linking. Knowledge Engineering and Knowledge Management (3-18). DOI: 10.1007/978-3-319-49004-5_1. Online publication date: 19-Nov-2016
