Abstract
The paper presents a vision of a new paradigm of data integration for the scientific world, where data integration is instrumental in the exploratory studies carried out by research teams. It briefly reviews the technological challenges that must be addressed to carry out the traditional approach to data integration successfully. Three important application scenarios are then described in terms of the main characteristics that heavily influence the data integration process. The first is characterized by the need of large enterprises to combine information from a variety of heterogeneous data sets that are developed autonomously and are managed and maintained independently of one another within the enterprise. The second is characterized by the need of many organizations to combine information from a large number of data sets that are dynamically created, distributed worldwide, and available on the Web. The third is characterized by the need of scientists and researchers to connect each other's research data, as new insight is revealed by connections between diverse research data sets. The paper highlights that the characteristics of the second and third scenarios make the traditional approach to data integration, i.e., the design of a global schema and of mappings between the local schemata and the global schema, infeasible. The focus of the paper is on the data integration problem in the third scenario. A new paradigm of data integration is proposed, based on the emerging new empiricist scientific method, i.e., data-driven research, and on the new data-seeking paradigm, i.e., data exploration. Finally, a generic scientific application scenario is presented to better illustrate the new data integration paradigm, and a concise list of the actions required to carry out the new paradigm of big research data integration successfully is given.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Bartalesi, V., Meghini, C., Thanos, C. (2019). Big Research Data Integration. In: Kotzinos, D., Laurent, D., Spyratos, N., Tanaka, Y., Taniguchi, Ri. (eds) Information Search, Integration, and Personalization. ISIP 2018. Communications in Computer and Information Science, vol 1040. Springer, Cham. https://doi.org/10.1007/978-3-030-30284-9_2
DOI: https://doi.org/10.1007/978-3-030-30284-9_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30283-2
Online ISBN: 978-3-030-30284-9
eBook Packages: Computer Science, Computer Science (R0)