survey

Large-scale Semantic Integration of Linked Data: A Survey

Published: 13 September 2019
  • Abstract

    A large number of published datasets (or sources) that follow Linked Data principles are currently available, and this number is growing rapidly. However, the major target of Linked Data, i.e., linking and integration, is not easy to achieve. In general, information integration is difficult because (a) datasets are produced, kept, or managed by different organizations using different models, schemas, or formats, (b) the same real-world entities or relationships are referred to with different URIs or names and in different natural languages, (c) datasets usually contain complementary information, (d) datasets can contain data that are erroneous, out-of-date, or conflicting, (e) datasets even about the same domain may follow different conceptualizations of the domain, and (f) everything can change (e.g., schemas, data) as time passes. This article surveys the work that has been done in the area of Linked Data integration: it identifies the main actors and use cases, analyzes and factorizes the integration process according to various dimensions, and discusses the methods that are used in each step. Emphasis is placed on methods that can be used for integrating several datasets. Based on this analysis, the article concludes with directions that are worth further research.

    Supplementary Material

    a103-mountantonakis-suppl.pdf (mountantonakis.zip)
    Supplemental movie, appendix, image, and software files for Large-scale Semantic Integration of Linked Data: A Survey

    References

    [1]
    A. Abello, O. Romero, T. B. Pedersen, R. Berlanga, V. Nebot, M. J. Aramburu, and A. Simitsis. 2015. Using semantic web technologies for exploratory OLAP: A survey. IEEE Trans. Knowl. Data Eng. 27, 2 (2015), 571--588.
    [2]
    M. Acosta, E. Simperl, F. Flöck, and M. Vidal. 2017. Enhancing answer completeness of SPARQL queries via crowdsourcing. Web Semantics: Sci. Serv. Agents World Wide Web 45 (2017), 41--62.
    [3]
    Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo, and Edna Ruckhaus. 2011. ANAPSID: An adaptive query processing engine for SPARQL endpoints. In Proceedings of the International Semantic Web Conference (ISWC’11). Springer, 18--34.
    [4]
    Maribel Acosta, Amrapali Zaveri, Elena Simperl, Dimitris Kontokostas, Fabian Flöck, and Jens Lehmann. 2018. Detecting linked data quality issues via crowdsourcing: A dbpedia study. Semantic Web 9, 3 (2018), 303--335.
    [5]
    Grigoris Antoniou and Frank Van Harmelen. 2004. A Semantic Web Primer. MIT Press.
    [6]
    Ciro Baron Neto, Kay Müller, Martin Brümmer, Dimitris Kontokostas, and Sebastian Hellmann. 2016. LODVader: An interface to LOD visualization, analytics and discovERy in real-time. In Proceedings of the Conference on the World Wide Web (WWW’16). 163--166.
    [7]
    Sonia Bergamaschi, Silvana Castano, and Maurizio Vincini. 1999. Semantic integration of semistructured and structured data sources. ACM SIGMOD Rec. 28, 1 (1999), 54--59.
    [8]
    Tim Berners-Lee and Mark Fischetti. 2001. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. DIANE Publishing Company.
    [9]
    Tim Berners-Lee, James Hendler, Ora Lassila et al. 2001. The semantic web. Sci. Amer. 284, 5 (2001), 28--37.
    [10]
    Nikos Bikakis and Timos K. Sellis. 2016. Exploration and visualization in the web of big linked data: A survey of the state of the art. In Proceedings of the International Joint Conference on Extending Database Technology and International Conference on Database Theory (EDBT/ICDT’16), Vol. 1558.
    [11]
    N. Bikakis, C. Tsinaraki, N. Gioldasis, I. Stavrakantonakis, and S. Christodoulakis. 2013. The XML and semantic web worlds: Technologies, interoperability and integration: a survey of the state of the art. In Semantic Hyper/Multimedia Adaptation. Springer, 319--360.
    [12]
    Christian Bizer, Tom Heath, and Tim Berners-Lee. 2009. Linked data-the story so far. International Journal on Semantic Web and Information Systems, Tom Heath, Martin Hepp, and Christian Bizer (Eds.), 5, 3 (2009), 1–22.
    [13]
    Christian Bizer and Andreas Schultz. 2010. The R2R framework: Publishing and discovering mappings on the web. In Proceedings of the 1st International Conference on Consuming Linked Data. 97--108.
    [14]
    Christian Bizer, Andreas Schultz, David Ruiz, and Carlos R. Rivero. 2012. Benchmarking the performance of linked data translation systems. In Proceedings of the Conference on Linked Data on the Web (LDOW’12).
    [15]
    Christoph Böhm, Gerard de Melo, Felix Naumann, and Gerhard Weikum. 2012. LINDA: Distributed web-of-data-scale entity matching. In Proceedings of the 21st ACM ACM International Conference on Information and Knowledge Management Conference (CIKM’12). 2104--2108.
    [16]
    Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the ACM SIGMOD Conference. 1247--1250.
    [17]
    Angela Bonifati, Fabiano Cattaneo, Stefano Ceri, Alfonso Fuggetta, Stefano Paraboschi et al. 2001. Designing data marts for data warehouses. ACM Trans. Softw. Eng. Methodol. 10, 4 (2001), 452--483.
    [18]
    Dan Brickley, Matthew Burgess, and Natasha Noy. 2019. Google dataset search: Building a search engine for datasets in an open web ecosystem. In Proceedings of the World Wide Web Conference. ACM, 1365--1375.
    [19]
    Qingqing Cai and Alexander Yates. 2013. Large-scale semantic parsing via schema matching and lexicon extension. In Proceedings of the Association for Computational Linguistics (ACL’13). 423--433.
    [20]
    Alison Callahan, José Cruz-Toledo, Peter Ansell, and Michel Dumontier. 2013. Bio2RDF release 2: Improved coverage, interoperability and provenance of life science linked data. In Proceedings of the Extended Semantic Web Conference. Springer, 200--212.
    [21]
    Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, Daniele Nardi, and Riccardo Rosati. 1998. Information integration: Conceptual modeling and reasoning support. In Proceedings of 3rd International Conference on Cooperative Information Systems (IFCIS’98). 280--289.
    [22]
    Diego Calvanese, Domenico Lembo, and Maurizio Lenzerini. 2001. Survey on methods for query rewriting and query answering using views. Integrazione, Warehousing e Mining Di Sorgenti Eterogenee 25 (2001).
    [23]
    Silvana Castano, Alfio Ferrara, Stefano Montanelli, and Gaia Varese. 2011. Ontology and instance matching. In Knowledge-driven Multimedia Information Extraction and Ontology Evolution. Springer, 167--195.
    [24]
    M. Cheatham and P. Hitzler. 2013. String similarity metrics for ontology alignment. In Proceedings of the International Semantic Web Conference (ISWC’13). Springer, 294--309.
    [25]
    Diego Collarana, Mikhail Galkin, Ignacio Traverso-Ribón, Christoph Lange, Maria-Esther Vidal, and Sören Auer. 2017. Semantic data integration for knowledge graph construction at query time. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC’17). IEEE, 109--116.
    [26]
    Diego Collarana, Mikhail Galkin, Ignacio Traverso-Ribón, Maria-Esther Vidal, Christoph Lange, and Sören Auer. 2017. MINTE: Semantically integrating RDF graphs. In Proceedings of International Conference on Web Intelligence, Mining and Semantics (WIMS’17). ACM, 22.
    [27]
    Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, and Nigel Shadbolt. 2010. SPARQL query rewriting for implementing data integration over linked data. In Proceedings of the International Joint Conference on Extending Database Technology and International Conference on Database Theory (EDBT/ICDT’10). ACM, 1--11.
    [28]
    Isabel F. Cruz, Flavio Palandri Antonelli, and Cosmin Stroe. 2009. AgreementMaker: Efficient matching for large real-world schemas and ontologies. Proc. VLDB Endow. 2, 2 (2009), 1586--1589.
    [29]
    A. Dadzie and M. Rowe. 2011. Approaches to visualising linked data: A survey. Semantic Web 2, 2 (2011), 89--124.
    [30]
    Mathieu d’Aquin and Enrico Motta. 2011. Watson, more than a semantic web search engine. Semantic Web 2, 1 (2011), 55--63.
    [31]
    Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki, and Tzanina Saveta. 2016. Instance matching benchmarks in the era of Linked Data. Web Semantics: Sci. Serv. Agents World Wide Web 39 (2016), 1--14.
    [32]
    Jeremy Debattista, Sören Auer, and Christoph Lange. 2016. Luzzu—A methodology and framework for linked data quality assessment. J. Data Info. Qual. 8, 1 (2016), 4.
    [33]
    Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. 2012. ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In Proceedings of the Conference on the World Wide Web (WWW’12). ACM, 469--478.
    [34]
    Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng, Pavan Reddivari, Vishal Doshi, and Joel Sachs. 2004. Swoogle: A search and metadata engine for the semantic web. In Proceedings of the Conference on Information and Knowledge Management (CIKM’04). ACM, 652--659.
    [35]
    Li Ding, Vassilios Peristeras, and Michael Hausenblas. 2012. Linked open government data {Guest editors’ introduction}. IEEE Intell. Syst. 27, 3 (2012), 11--15.
    [36]
    Renata Queiroz Dividino, Thomas Gottron, Ansgar Scherp, and Gerd Gröner. 2014. From changes to dynamics: Dynamics analysis of linked open data sources. In Proceedings of the European Semantic Web Conference (ESWC’14). Retrieved from CEUR-WS.org.
    [37]
    Warith Eddine Djeddi and Mohamed Tarek Khadir. 2014. A novel approach using context-based measure for matching large-scale ontologies. In Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery (DaWaK’14). Springer, 320--331.
    [38]
    Martin Doerr. 2003. The CIDOC conceptual reference module: An ontological approach to semantic interoperability of metadata. AI Mag 24, 3 (2003), 75.
    [39]
    X. Dong, E. Gabrilovich, G. Heitz, W. Horn, Ni Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD’14). ACM, 601--610.
    [40]
    Xin Luna Dong, Barna Saha, and Divesh Srivastava. 2012. Less is more: Selecting sources wisely for integration. In Proceedings of the VLDB Endowment, Vol. 6. VLDB Endowment, 37--48.
    [41]
    X. L. Dong and D. Srivastava. 2015. Big data integration. Synth. Lect. Data Manage. 7, 1 (2015), 1--198.
    [42]
    Vasilis Efthymiou, Kostas Stefanidis, and Vassilis Christophides. 2016. Minoan ER: Progressive entity resolution in the web of data. In Proceedings of the 19th International Conference on Extending Database Technology.
    [43]
    Khadija M. Elbedweihy, Stuart N. Wrigley, Paul Clough, and Fabio Ciravegna. 2015. An overview of semantic search evaluation initiatives. Web Semantics: Sci. Serv. Agents World Wide Web 30 (2015), 82--105.
    [44]
    Mohamed Ben Ellefi, Zohra Bellahsene, Stefan Dietze, and Konstantin Todorov. 2016. Dataset recommendation for data linking: An intensional approach. In Proceedings of the International Semantic Web Conference. Springer, 36--51.
    [45]
    Kemele M. Endris, Mikhail Galkin, Ioanna Lytra, Mohamed Nadjib Mami, Maria-Esther Vidal, and Sören Auer. 2017. MULDER: Querying the linked data web by bridging RDF molecule templates. In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA’17). Springer, 3--18.
    [46]
    Ivan Ermilov, Sören Auer, and Claus Stadler. 2013. Csv2rdf: User-driven CSV to RDF mass conversion framework. In Proceedings of the Conference on Information Systems Engineering and Management (ISEM’13), Vol. 13. 4--6.
    [47]
    I. Ermilov, J. Lehmann, M. Martin, and S. Auer. 2016. LODStats: The data web census dataset. In Proceedings of the International Semantic Web Conference (ISWC’16). Springer, 38--46.
    [48]
    Pavlos Fafalios, Thanos Yannakis, and Yannis Tzitzikas. 2016. Querying the web of data with SPARQL-LD. In International Conference on Theory and Practice of Digital Libraries (TPDL’16). Springer, 175--187.
    [49]
    Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of dbpedia, freebase, opencyc, wikidata, and yago. Semantic Web 9, 1 (2018), 77--129.
    [50]
    Michael Färber, Basil Ell, Carsten Menne, and Achim Rettinger. 2015. A comparative survey of dbpedia, freebase, opencyc, wikidata, and yago. Semantic Web J. 1, 1 (2015), 1--5.
    [51]
    J. D. Fernández, W. Beek, M. A. Martínez-Prieto, and M. Arias. 2017. LOD-a-lot. In Proceedings of the International Semantic Web Conference (ISWC’17). Springer, 75--83.
    [52]
    J. D. Fernández, M. A. Martínez-Prieto, C. Gutiérrez, A. Polleres, and M. Arias. 2013. Binary RDF representation for publication and exchange (HDT). Web Semantics: Sci. Serv. Agents World Wide Web 19 (2013), 22--41.
    [53]
    Valeria Fionda, Giuseppe Pirrò, and Claudio Gutierrez. 2015. NautiLOD: A formal language for the web of data graph. ACM Trans. Web 9, 1 (2015), 5.
    [54]
    Annika Flemming. 2010. Quality characteristics of linked data publishing datasources. Master’s Thesis, Humboldt-Universität of Berlin (2010).
    [55]
    Giorgos Flouris, Dimitris Manakanatas, Haridimos Kondylakis, Dimitris Plexousakis, and Grigoris Antoniou. 2008. Ontology change: Classification and survey. Knowl. Eng. Rev. 23, 2 (2008), 117--152.
    [56]
    Christian Fürber and Martin Hepp. 2011. Swiqa-a semantic web information quality assessment framework. In Proceedings of the European Conference on Information Systems (ECIS’11), Vol. 15. 19.
    [57]
    D. Gerber, D. Esteves, J. Lehmann, L. Bühmann, R. Usbeck, A. N. Ngomo, and R. Speck. 2015. DeFacto-temporal and multilingual deep fact validation. Web Semantics: Sci. Serv. Agents World Wide Web 35 (2015), 85--101.
    [58]
    José M Giménez-Garcıa, Harsh Thakkar, and Antoine Zimmermann. 2016. Assessing trust with pagerank in the web of data. In Proceedings of the 3rd International Workshop on Dataset PROFIling and fEderated Search for Linked Data.
    [59]
    Hugh Glaser, Afraz Jaffri, and Ian Millard. 2009. Managing co-reference on the semantic web. (2009).
    [60]
    Olaf Görlitz and Steffen Staab. 2011. Splendid: Sparql endpoint federation exploiting void descriptions. In Proceedings of the International Conference on Consuming Linked Data (COLD’11). 13--24.
    [61]
    P. Groth, A. Loizou, A. J. G. Gray, C. Goble, L. Harland, and S. Pettifer. 2014. API-centric linked data integration: The open PHACTS discovery platform case study. Web Semantics: Sci. Serv. Agents World Wide Web 29 (2014), 12--18.
    [62]
    Tobias Grubenmann, Abraham Bernstein, Dmitry Moor, and Sven Seuken. 2017. Challenges of source selection in the WoD. In Proceedings of the International Semantic Web Conference. Springer, 313--328.
    [63]
    Christophe Guéret, Paul Groth, Claus Stadler, and Jens Lehmann. 2012. Assessing linked data mappings using network measures. In The Semantic Web: Research and Applications. Springer, 87--102.
    [64]
    Yuanbo Guo, Zhengxiang Pan, and Jeff Heflin. 2005. LUBM: A benchmark for OWL knowledge base systems. Web Semantics: Sci. Serv. Agents World Wide Web 3, 2--3 (2005), 158--182.
    [65]
    A. Halevy, A. Rajaraman, and J. Ordille. 2006. Data integration: the teenage years. In Proceedings of the Conference on Very Large Data Bases (VLDB’06). 9--16.
    [66]
    Andreas Harth, Craig A. Knoblock, Steffen Stadtmüller, Rudi Studer, and Pedro Szekely. 2013. On-the-fly integration of static and dynamic linked data. In Proceedings of the International Conference on Consuming Linked Data (COLD’13). Citeseer, 1613--0073.
    [67]
    Olaf Hartig. 2009. Provenance information in the web of data. In Proceedings of the Workshop on Linked Data on the Web (LDOW’09).
    [68]
    Olaf Hartig. 2013. An overview on execution strategies for Linked Data queries. Datenbank-Spektrum 13, 2 (2013), 89--99.
    [69]
    Olaf Hartig. 2013. SQUIN: A traversal-based query execution system for the web of linked data. In Proceedings of the ACM Special Interest Group on Management of Data (SIGMOD’13). ACM, 1081--1084.
    [70]
    Olaf Hartig and Jun Zhao. 2009. Using web data provenance for quality assessment. In Proceedings of the International Workshop on the role of Semantic Web in Provenance Management (SWPM’09). 29--34.
    [71]
    Olaf Hartig and Jun Zhao. 2010. Publishing and consuming provenance metadata on the web of linked data. In Provenance and Annotation of Data and Processes. Springer, 78--90.
    [72]
    A. Hogan, A. Harth, J. Umbrich, S. Kinsella, A. Polleres, and S. Decker. 2011. Searching and browsing linked data with swse: The semantic web search engine. Web Semantics: Sci. Serv. Agents World Wide Web 9, 4 (2011), 365--401.
    [73]
    A. Hogan, J. Umbrich, A. Harth, R. Cyganiak, A. Polleres, and S. Decker. 2012. An empirical survey of linked data conformance. Web Semantics: Sci. Serv. Agents World Wide Web 14 (2012), 14--44.
    [74]
    K. Hose, R. Schenkel, M. Theobald, and G. Weikum. 2011. Database foundations for scalable RDF processing. In Proceedings of the International Conference on Reasoning Web: Semantic Technologies for the Web of Data. Springer-Verlag, 202--249.
    [75]
    Filip Ilievski, Wouter Beek, Marieke van Erp, Laurens Rietveld, and Stefan Schlobach. 2016. LOTUS: Adaptive text search for big linked data. In Proceedings of the International Semantic Web Conference. Springer, 470--485.
    [76]
    A. Isaac and B. Haslhofer. 2013. Europeana linked open data—data.europeana.eu. Semantic Web 4, 3 (2013), 291--297.
    [77]
    Krzysztof Janowicz, Pascal Hitzler, Benjamin Adams, Dave Kolas, I. I. Vardeman et al. 2014. Five stars of linked data vocabulary use. Semantic Web 5, 3 (2014), 173--176.
    [78]
    Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau. 2011. Logmap: Logic-based and scalable ontology matching. In Proceedings of the International Semantic Web Conference. Springer, 273--288.
    [79]
    Tobias Käfer, Ahmed Abdelrahman, Jürgen Umbrich, Patrick O’ Byrne, and Aidan Hogan. 2013. Observing linked data dynamics. In Proceedings of the Extended Semantic Web Conference. Springer, 213--227.
    [80]
    Maulik R. Kamdar and Mark A. Musen. 2017. PhLeGrA: Graph analytics in pharmacology over the web of life sciences linked open data. In Proceedings of the World Wide Web Conference. 321--329.
    [81]
    Zoi Kaoudi and Ioana Manolescu. 2015. RDF in the clouds: A survey. VLDB J. 24, 1 (2015), 67--91.
    [82]
    Michel Klein. 2001. Combining and relating ontologies: An analysis of problems and solutions. In Proceeding sof the International Joint Conferences on Artificial Intelligence (OIS@ IJCAI’01).
    [83]
    T. Knap, J. Michelfeit, J. Daniel, P. Jerman, D. Rychnovskỳ, T. Soukup, and M. Nečaskỳ. 2012. ODCleanStore: A framework for managing and providing integrated linked data on the web. In Proceedings of the International Conference on Web Information Systems Engineering (WISE’12). Springer, 815--816.
    [84]
    Craig A Knoblock and Pedro Szekely. 2015. Exploiting semantics for big data integration. AI Magazine 36, 1 (2015).
    [85]
    Haridimos Kondylakis and Dimitris Plexousakis. 2013. Ontology evolution without tears. Web Semantics: Sci. Serv. Agents World Wide Web 19 (2013), 42--58.
    [86]
    Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation of linked data quality. In Proceedings of the 23rd World Wide Web Conference (WWW’14). ACM, 747--758.
    [87]
    Dimitris Kontokostas, Amrapali Zaveri, Sören Auer, and Jens Lehmann. 2013. Triplecheckmate: A Tool for Crowdsourcing the Quality Assessment of Linked Data. Springer, 265--272.
    [88]
    Christina Lantzaki, Panagiotis Papadakos, Anastasia Analyti, and Yannis Tzitzikas. 2017. Radius-aware approximate blank node matching using signatures. Knowl. Info. Syst. 50, 2 (2017), 505–542.
    [89]
    Wangchao Le, Songyun Duan, Anastasios Kementsietsidis, Feifei Li, and Min Wang. 2011. Rewriting queries on SPARQL views. In Proceedings of the 20th International Conference on World Wide Web. ACM, 655--664.
    [90]
    Jens Lehmann, Robert Isele, et al. 2015. DBpedia--a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6, 2 (2015), 167--195.
    [91]
    Luiz André P. Paes Leme, Giseli Rabello Lopes, Bernardo Pereira Nunes, Marco Antonio Casanova, and Stefan Dietze. 2013. Identifying candidate datasets for data interlinking. In Proceedings of the International Conference on Web Engineering (ICWE’13). Springer, 354--366.
    [92]
    Maurizio Lenzerini. 2002. Data integration: A theoretical perspective. In Proceedings of the 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 233--246.
    [93]
    Xian Li, Xin Luna Dong, Kenneth Lyons, Weiyi Meng, and Divesh Srivastava. 2012. Truth finding on the deep web: Is the problem solved? In Proceedings of the VLDB Endowment, Vol. 6. VLDB Endowment, 97--108.
    [94]
    Vanessa Lopez, Christina Unger, Philipp Cimiano, and Enrico Motta. 2013. Evaluating question answering over linked data. Web Semantics: Sci. Serv. Agents World Wide Web 21 (2013), 3--13.
    [95]
    Konstantinos Makris, Nikos Bikakis, Nektarios Gioldasis, and Stavros Christodoulakis. 2012. SPARQL-RW: Transparent query access over mapped RDF data sources. In Proceedings of the 15th International Conference on Extending Database Technology (EDBT’12). ACM, 610--613.
    [96]
    Pablo N. Mendes, Hannes Mühleisen, and Christian Bizer. 2012. Sieve: Linked data quality assessment and fusion. In Proceedings of the Joint International Conference on Extending Database Technology and International Conference on Database Theory (EDBT/ICDT’12). ACM, 116--123.
    [97]
    Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. Arxiv Preprint Arxiv:1301.3781.
    [98]
    N. Minadakis, Y. Marketakis, H. Kondylakis, G. Flouris, M. Theodoridou, M. Doerr, and G. de Jong. 2015. X3ML framework: An effective suite for supporting data mappings. In Proceedings of the Workshop on Extending, Mapping and Focusing the CRM at the International Conference on Theory and Practice of Digital Libraries (EMF-CRM@ TPDL’15). 1--12.
    [99]
    Paolo Missier, Khalid Belhajjame, and James Cheney. 2013. The W3C PROV family of specifications for modelling provenance metadata. In Proceedings of the 16th International Conference on Extending Database Technology (EDBT’13). ACM, 773--776.
    [100]
    Gabriela Montoya. 2016. Answering SPARQL Queries Using Views. Ph.D. Dissertation. Université de Nantes.
    [101]
    Gabriela Montoya, Luis-Daniel Ibáñez, Hala Skaf-Molli, Pascal Molli, and Maria-Esther Vidal. 2014. Semlav: Local-as-view mediation for sparql queries. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XIII. Springer, 33--58.
    [102]
    Camilo Morales, Diego Collarana, Maria-Esther Vidal, and Sören Auer. 2017. MateTee: A semantic similarity metric based on translation embeddings for knowledge graphs. In Proceedings of the International Conference on Web Engineering (ICWE’17). Springer, 246--263.
    [103]
    L. Moreau, B. Ludäscher, I. Altintas, R. S. Barga, S. Bowers, S. Callahan, G. Chin Jr, B. Clifford et al. 2008. The first provenance challenge. Concurr. Comput.: Pract. Exper. 20, 5 (2008), 409--418.
    [104]
    M. Mountantonakis, N. Minadakis, Y. Marketakis, P. Fafalios, and Y. Tzitzikas. 2016. Quantifying the connectivity of a semantic warehouse and understanding its evolution over time. Int. J. Semantic Web Info. Syst. 12, 3 (2016), 27--78.
    [105]
    Michalis Mountantonakis and Yannis Tzitzikas. 2016. On measuring the lattice of commonalities among several linked datasets. Proc. VLDB Endow. 9, 12 (2016), 1101--1112.
    [106]
    Michalis Mountantonakis and Yannis Tzitzikas. 2018. High performance methods for linked open data connectivity analytics. Information 9, 6 (2018).
    [107]
    Michalis Mountantonakis and Yannis Tzitzikas. 2018. Scalable methods for measuring the connectivity and quality of large numbers of linked datasets. J. Data Info. Qual. 9, 3 (2018), 15.
    [108]
    Markus Nentwig, Michael Hartung, Axel-Cyrille Ngonga Ngomo, and Erhard Rahm. 2017. A survey of current link discovery frameworks. Semantic Web 8, 3 (2017), 419--436.
    [109]
    Markus Nentwig, Tommaso Soru, Axel-Cyrille Ngonga Ngomo, and Erhard Rahm. 2014. LinkLion: A link repository for the web of data. In Proceedings of the European Semantic Web Conference (ESWC’14). Springer, 439--443.
    [110]
    DuyHoa Ngo, Zohra Bellahsene, and R Coletta. 2016. Overview of YAM++—(not) Yet Another Matcher for ontology alignment task. Web Semantics: Science, Services and Agents on the World Wide Web 41 (2016), 30–49.
    [111]
    Axel-Cyrille Ngonga Ngomo and Sören Auer. 2011. Limes-a time-efficient approach for large-scale link discovery on the web of data. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI’11). 2312--2317.
    [112]
    Andriy Nikolov, Mathieu d’ Aquin, and Enrico Motta. 2011. What should i link to? identifying relevant sources and classes for data linking. In Proceedings of the Joint International Semantic Technology Conference. Springer, 284--299.
    [113]
    N. F. Noy. 2004. Semantic integration: A survey of ontology-based approaches. ACM SIGMOD Rec. 33, 4 (2004), 65--70.
    [114]
    Peter Ochieng and Swaib Kyanda. 2018. A statistically based ontology matching tool. Distrib. Parallel Databases 36, 1 (2018), 195–217.
    [115]
    D. Oguz, B. Ergenc, S. Yin, O. Dikenelli, and A. Hameurlain. 2015. Federated query processing on linked data: A qualitative survey and open challenges. Knowl. Eng. Rev. 30, 5 (2015), 545--563.
    [116]
    Eyal Oren, Renaud Delbru, Michele Catasta, Richard Cyganiak, Holger Stenzhorn, and Giovanni Tummarello. 2008. Sindice.com: A document-oriented lookup index for open linked data. Int. J. Metadata Semantics Ontol. 3, 1 (2008), 37--52.
    [117]
    Lorena Otero-Cerdeira, Francisco J. Rodríguez-Martínez, and Alma Gómez-Rodríguez. 2015. Ontology matching: A literature review. Expert Syst. Appl. 42, 2 (2015), 949--971.
    [118]
    Jeff Pasternack and Dan Roth. 2013. Latent credibility analysis. In Proceedings of the World Wide Web Conference (WWW’13). ACM, 1009--1020.
    [119]
    Heiko Paulheim. 2017. Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8, 3 (2017), 489--508.
    [120]
    Peng Peng, Lei Zou, M Tamer Özsu, Lei Chen, and Dongyan Zhao. 2016. Processing SPARQL queries over distributed RDF graphs. VLDB J. 25, 2 (2016), 243--268.
    [121]
    Petar Petrovski, Volha Bryl, and Christian Bizer. 2014. Integrating product data from websites offering microdata markup. In Proceedings of the 23rd International Conference on World Wide Web. ACM, 1299--1304.
    [122]
    B. Quilitz and U. Leser. 2008. Querying distributed RDF data sources with SPARQL. In Proceedings of the European Semantic Web Conference (ESWC’08). Springer, 524--538.
    [123]
    Erhard Rahm. 2016. The Case for holistic data integration. In Proceedings of the East European Conference on Advances in Databases and Information Systems (ADBIS’16). Springer, 11--27.
    [124]
    Thomas Rebele, Fabian Suchanek, Johannes Hoffart, Joanna Biega, Erdal Kuzey, and Gerhard Weikum. 2016. YAGO: A multilingual knowledge base from wikipedia, wordnet, and geonames. In Proceedings of the International Semantic Web Conference (ISWC’16). Springer, 177--185.
    [125]
    Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor, and Divesh Srivastava. 2015. Finding quality in quantity: The challenge of discovering valuable sources for integration. In Proceedings of the Conference on Innovative Data Systems Research (CIDR’15).
    [126]
    L. Rietveld, W. Beek, and S. Schlobach. 2015. LOD lab: Experiments at LOD scale. In Proceedings of the International Semantic Web Conference (ISWC’15). Springer, 339--355.
    [127]
    Petar Ristoski, Christian Bizer, and Heiko Paulheim. 2015. Mining the web of linked data with rapidminer. Web Semantics: Sci. Serv. Agents World Wide Web 35 (2015), 142--151.
    [128]
    P. Ristoski and H. Paulheim. 2016. Rdf2vec: Rdf graph embeddings for data mining. In Proceedings of the International Semantic Web Conference (ISWC’16). Springer, 498--514.
    [129]
    Muhammad Saleem, Ali Hasnain, and Axel-Cyrille Ngonga Ngomo. 2018. Largerdfbench: A billion triples benchmark for sparql endpoint federation. J. Web Semantics (2018).
    [130]
    Muhammad Saleem, Yasar Khan, Ali Hasnain, Ivan Ermilov, and Axel-Cyrille Ngonga Ngomo. 2016. A fine-grained evaluation of SPARQL endpoint federation systems. Semantic Web 7, 5 (2016), 493--518.
    [131]
    Muhammad Saleem and Axel-Cyrille Ngonga Ngomo. 2014. Hibiscus: Hypergraph-based source selection for sparql endpoint federation. In Proceedings of the European Semantic Web Conference. Springer, 176--191.
    [132]
    Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena F. Deus, and Manfred Hauswirth. 2013. Daw: Duplicate-aware federated query processing over the web of data. In Proceedings of the International Semantic Web Conference (ISWC’13). Springer, 574--590.
    [133]
    M. Saleem, S. Padmanabhuni, A. N. Ngomo, A. Iqbal, J. S. Almeida, S. Decker, and H. F. Deus. 2014. TopFed: TCGA tailored federated query processing and linking to LOD. J. Biomed. Semantics 5, 1 (2014), 1.
    [134]
    Cristina Sarasua, Elena Simperl, and Natalya F. Noy. 2012. Crowdmap: Crowdsourcing ontology alignment with microtasks. In Proceedings of the International Semantic Web Conference. Springer, 525--541.
    [135]
    Cristina Sarasua, Steffen Staab, and Matthias Thimm. 2017. Methods for intrinsic evaluation of links in the web of data. In Proceedings of the European Semantic Web Conference. Springer, 68--84.
    [136]
    Max Schmachtenberg, Christian Bizer, and Heiko Paulheim. 2014. Adoption of the linked data best practices in different topical domains. In Proceedings of the International Semantic Web Conference (ISWC’14). Springer, 245--260.
    [137]
    Michael Schmidt, Olaf Görlitz, Peter Haase, Günter Ladwig, Andreas Schwarte, and Thanh Tran. 2011. Fedbench: A benchmark suite for federated semantic data query processing. In Proceedings of the International Semantic Web Conference (ISWC’11). 585--600.
    [138]
    Falk Scholer, Diane Kelly, and Ben Carterette. 2016. Information retrieval evaluation using test collections. Info. Retrieval J. 19, 3 (2016), 225--229.
    [139]
    Andreas Schultz, Andrea Matteini, Robert Isele, Pablo N. Mendes, Christian Bizer, and Christian Becker. 2012. LDIF-a framework for large-scale Linked Data integration. In Proceedings of the 21st International World Wide Web Conference, Developers Track (WWW’12).
    [140]
    Andreas Schwarte, Peter Haase, Katja Hose, Ralf Schenkel, and Michael Schmidt. 2011. Fedx: Optimization techniques for federated query processing on linked data. In Proceedings of the International Semantic Web Conference. Springer, 601--616.
    [141]
    Juan F. Sequeda, Marcelo Arenas, and Daniel P. Miranker. 2012. On directly mapping relational databases to RDF and OWL. In Proceedings of the 21st International Conference on World Wide Web. ACM, 649--658.
    [142]
    Juan F. Sequeda, Syed Hamid Tirmizi, Oscar Corcho, and Daniel P. Miranker. 2011. Survey of directly mapping SQL databases to the semantic web. Knowl. Eng. Rev. 26, 4 (2011), 445--486.
    [143]
    Nigel Shadbolt, Tim Berners-Lee, and Wendy Hall. 2006. The semantic web revisited. IEEE Intell. Syst. 21, 3 (2006), 96--101.
    [144]
    Pavel Shvaiko and Jérôme Euzenat. 2013. Ontology matching: State of the art and future challenges. IEEE Trans. Knowl. Data Eng. 25, 1 (2013), 158--176.
    [145]
    Karen Smith-Yoshimura. 2018. Analysis of 2018 international linked data survey for implementers. Code4Lib J. 42 (2018).
    [146]
    Stefano Spaccapietra and Christine Parent. 1994. View integration: A step forward in solving structural conflicts. IEEE Trans. Knowl. Data Eng. 6, 2 (1994), 258--274.
    [147]
    Stefano Spaccapietra, Christine Parent, and Yann Dupont. 1992. Model independent assertions for integration of heterogeneous schemas. VLDB J. 1, 1 (1992), 81--126.
    [148]
    Fabian M. Suchanek, Serge Abiteboul, and Pierre Senellart. 2011. PARIS: Probabilistic alignment of relations, instances, and schema. Proc. VLDB Endow. 5, 3 (2011), 157--168.
    [149]
    Ignacio Traverso, Maria-Esther Vidal, Benedikt Kämpgen, and York Sure-Vetter. 2016. GADES: A graph-based semantic similarity measure. In Proceedings of the 12th International Conference on Semantic Systems. ACM, 101--104.
    [150]
    Priyansh Trivedi, Gaurav Maheshwari, Mohnish Dubey, and Jens Lehmann. 2017. LC-QuAD: A corpus for complex question answering over knowledge graphs. In Proceedings of the International Semantic Web Conference. Springer, 210--218.
    [151]
    Yannis Tzitzikas et al. 2013. Integrating heterogeneous and distributed information about marine species through a top level ontology. In Proceedings of the International Conference on Metadata and Semantics Research (MTSR’13). Springer, 289--301.
    [152]
    Yannis Tzitzikas, Mary Kampouraki, and Anastasia Analyti. 2013. Curating the specificity of ontological descriptions under ontology evolution. J. Data Semantics (2013), 1--32.
    [153]
    Yannis Tzitzikas, Nikos Manolis, and Panagiotis Papadakos. 2017. Faceted exploration of RDF/S datasets: A survey. J. Intell. Info. Syst. 48, 2 (2017), 329--364.
    [154]
    Yannis Tzitzikas, Yannis Marketakis et al. 2017. Towards a global record of stocks and fisheries. In Proceedings of the 8th International HAICTA Conference. 328--340.
    [155]
    Yannis Tzitzikas and Carlo Meghini. 2003. Ostensive automatic schema mapping for taxonomy-based peer-to-peer systems. In Proceedings of the International Workshop on Cooperative Information Agents. Springer, 78--92.
    [156]
    Y. Tzitzikas, N. Minadakis, Y. Marketakis, P. Fafalios, C. Allocca, M. Mountantonakis, and I. Zidianaki. 2014. Matware: Constructing and exploiting domain specific warehouses by aggregating semantic data. In Proceedings of the European Semantic Web Conference (ESWC’14). Springer, 721--736.
    [157]
    Yannis Tzitzikas, Nicolas Spyratos, and Panos Constantopoulos. 2005. Mediators over taxonomy-based information sources. VLDB J. 14, 1 (2005), 112--136.
    [158]
    J. Urbani, S. Kotoulas, J. Maassen, F. Van Harmelen, and H. Bal. 2012. WebPIE: A web-scale parallel inference engine using MapReduce. Web Semantics: Sci. Serv. Agents World Wide Web 10 (2012), 59--75.
    [159]
    Pierre-Yves Vandenbussche, Jürgen Umbrich, Luca Matteis, Aidan Hogan, and Carlos Buil-Aranda. 2017. SPARQLES: Monitoring public SPARQL endpoints. Semantic Web 8, 6 (2017), 1049--1065.
    [160]
    Panos Vassiliadis and Timos Sellis. 1999. A survey of logical models for OLAP databases. ACM SIGMOD Rec. 28, 4 (1999), 64--69.
    [161]
    Julius Volz, Christian Bizer, Martin Gaedke, and Georgi Kobilarov. 2009. Silk—A link discovery framework for the web of data. In Proceedings of the World Wide Web Workshop on Linked Data on the Web.
    [162]
    D. Vrandečić and M. Krötzsch. 2014. Wikidata: A free collaborative knowledgebase. Commun. ACM 57, 10 (2014), 78--85.
    [163]
    Andreas Wagner, Peter Haase, Achim Rettinger, and Holger Lamm. 2014. Entity-based data source contextualization for searching the web of data. In Proceedings of the European Semantic Web Conference. Springer, 25--41.
    [164]
    M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, G. Appleton, M. Axton, A. Baak, N. Blomberg et al. 2016. The FAIR Guiding principles for scientific data management and stewardship. Sci. Data 3 (2016).
    [165]
    Ramana Yerneni, Chen Li, Jeffrey Ullman, and Hector Garcia-Molina. 1999. Optimizing large join queries in mediation systems. In Proceedings of the 7th International Conference on Database Theory (ICDT’99). 348--364. http://dl.acm.org/citation.cfm?id=645503.656258.
    [166]
    S. Yumusak, E. Dogdu, H. Kodaz, A. Kamilaris, and P. Vandenbussche. 2017. SpEnD: Linked data SPARQL endpoints discovery using search engines. IEICE Trans. Info. Syst. 100, 4 (2017), 758--767.
    [167]
    Amrapali Zaveri, Dimitris Kontokostas, Mohamed A. Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer, and Jens Lehmann. 2013. User-driven quality evaluation of DBpedia. In Proceedings of the European Conference on Semantic Technologies and Artificial Intelligence (SEMANTiCS’13). ACM, 97--104.
    [168]
    Amrapali Zaveri, Anisa Rula, Andrea Maurino, Ricardo Pietrobon, Jens Lehmann, and Sören Auer. 2015. Quality assessment for linked data: A survey. Semantic Web 7, 1 (2015), 63--93.
    [169]
    Linhong Zhu, Majid Ghasemi-Gol, Pedro Szekely, Aram Galstyan, and Craig A. Knoblock. 2016. Unsupervised entity resolution on multi-type graphs. In Proceedings of the International Semantic Web Conference. Springer, 649--667.

    Published In

    ACM Computing Surveys  Volume 52, Issue 5
    September 2020
    791 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/3362097
    • Editor:
    • Sartaj Sahni
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 13 September 2019
    Accepted: 01 July 2019
    Revised: 01 May 2019
    Received: 01 July 2018
    Published in CSUR Volume 52, Issue 5

    Author Tags

    1. Data integration
    2. RDF
    3. big data
    4. data discovery
    5. semantic web

    Qualifiers

    • Survey
    • Research
    • Refereed

    Funding Sources

    • Hellenic Foundation for Research and Innovation (HFRI) and the General Secretariat for Research and Technology (GSRT)

    Cited By

    • (2024) Data Fusion for Destination Success. Marketing and Big Data Analytics in Tourism and Events (38-60). DOI: 10.4018/979-8-3693-3310-5.ch003. Online publication date: 3-May-2024
    • (2024) A systematic overview of data federation systems. Semantic Web 15:1 (107-165). DOI: 10.3233/SW-223201. Online publication date: 12-Jan-2024
    • (2024) Semantic Data Integration and Querying: A Survey and Challenges. ACM Computing Surveys 56:8 (1-35). DOI: 10.1145/3653317. Online publication date: 26-Apr-2024
    • (2024) A Semantic Analysis Prototype for Distributed Syrian Government Data. 2024 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS) (692-697). DOI: 10.1109/ICETSIS61505.2024.10459400. Online publication date: 28-Jan-2024
    • (2024) Situational Data Integration in Question Answering systems: a survey over two decades. Knowledge and Information Systems. DOI: 10.1007/s10115-024-02136-0. Online publication date: 18-Jun-2024
    • (2023) Blue Brain Nexus: An open, secure, scalable system for knowledge graph management and data-driven science. Semantic Web 14:4 (697-727). DOI: 10.3233/SW-222974. Online publication date: 24-Apr-2023
    • (2023) NELLIE: Never-Ending Linking for Linked Open Data. IEEE Access 11 (84957-84973). DOI: 10.1109/ACCESS.2023.3300694. Online publication date: 2023
    • (2023) Declarative RDF graph generation from heterogeneous (semi-)structured data. Web Semantics: Science, Services and Agents on the World Wide Web 75:C. DOI: 10.1016/j.websem.2022.100753. Online publication date: 1-Jan-2023
    • (2023) A practical approach to constructing a knowledge graph for soil ecological research. European Journal of Soil Biology 117 (103497). DOI: 10.1016/j.ejsobi.2023.103497. Online publication date: Jul-2023
    • (2023) Eris: efficiently measuring discord in multidimensional sources. The VLDB Journal 33:2 (399-423). DOI: 10.1007/s00778-023-00810-3. Online publication date: 20-Sep-2023
