DOI: 10.1007/978-3-030-30796-7_15
Article

Squerall: Virtual Ontology-Based Access to Heterogeneous and Large Data Sources

Published: 26 October 2019

Abstract

The last two decades witnessed a remarkable evolution in data formats, modalities, and storage capabilities. Instead of having to adapt an application's needs to the limited storage options available earlier, one can today choose from a wide array of options to best meet those needs. This has resulted in vast amounts of data available in a variety of forms and formats which, if interlinked and jointly queried, can generate valuable knowledge and insights. In this article, we describe Squerall: a framework that builds on the principles of Ontology-Based Data Access (OBDA) to enable querying disparate, heterogeneous sources using a single query language, SPARQL. In Squerall, the original data is queried on the fly, without prior data materialization or transformation. In particular, Squerall can aggregate and join large data in a distributed manner. Squerall supports five data sources out of the box and can be extended programmatically to cover more sources and incorporate new query engines. The framework provides user interfaces for creating the necessary inputs and for guiding non-SPARQL experts in writing SPARQL queries. Squerall is integrated into the popular SANSA stack and is available as open-source software via GitHub and as a Docker image.
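To make the OBDA idea more concrete, the following Python 3.9+ sketch shows one generic way a SPARQL query can be answered over heterogeneous sources without converting them to RDF first. It is not Squerall's implementation (which the abstract describes as distributed and engine-pluggable); everything here is an illustrative assumption: the query is given as an already-parsed basic graph pattern (a list of triple patterns), the mapping format (source name, subject column, predicate-to-column dictionary) is hypothetical and stands in for declarative mappings, and in-memory lists of dicts stand in for a CSV file and a document store. The sketch groups triple patterns into star-shaped sub-queries by subject, picks a source for each star from the mappings, reads that source directly into variable bindings, and hash-joins the stars on their shared variables. Function names such as `decompose_into_stars` and the example predicates are hypothetical.

```python
from collections import defaultdict

# A triple pattern is (subject, predicate, object); terms starting with "?" are variables.
Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def decompose_into_stars(bgp: list[Triple]) -> dict[str, list[Triple]]:
    """Group triple patterns that share a subject into star-shaped sub-queries."""
    stars: dict[str, list[Triple]] = defaultdict(list)
    for triple in bgp:
        stars[triple[0]].append(triple)
    return dict(stars)


def select_source(star: list[Triple], mappings: list[dict]) -> dict:
    """Return the first (hypothetical) mapping whose predicates cover the whole star."""
    needed = {predicate for _, predicate, _ in star}
    for mapping in mappings:
        if needed.issubset(mapping["predicates"]):
            return mapping
    raise LookupError(f"no mapping covers {sorted(needed)}")


def evaluate_star(star: list[Triple], mapping: dict, sources: dict) -> list[dict]:
    """Read the mapped entity straight from its source (no RDF materialization)
    and turn each matching row into a dict of variable bindings."""
    subject = star[0][0]
    bindings = []
    for row in sources[mapping["source"]]:
        subject_value = row[mapping["subject_column"]]
        if is_var(subject):
            binding = {subject: subject_value}
        elif str(subject_value) == subject:
            binding = {}
        else:
            continue  # constant subject does not match this row
        for _, predicate, obj in star:
            value = row[mapping["predicates"][predicate]]
            if is_var(obj):
                if binding.setdefault(obj, value) != value:
                    break  # same variable would take two different values
            elif str(value) != obj:
                break  # constant object acts as a filter
        else:
            bindings.append(binding)
    return bindings


def join(left: list[dict], right: list[dict]) -> list[dict]:
    """Hash-join two binding lists on their shared variables."""
    if not left or not right:
        return []
    shared = sorted(left[0].keys() & right[0].keys())
    index = defaultdict(list)
    for rb in right:
        index[tuple(rb[v] for v in shared)].append(rb)
    return [{**lb, **rb} for lb in left for rb in index.get(tuple(lb[v] for v in shared), [])]


def answer(bgp: list[Triple], mappings: list[dict], sources: dict) -> list[dict]:
    """Evaluate each star against its selected source, then join the stars."""
    results = None
    for star in decompose_into_stars(bgp).values():
        bindings = evaluate_star(star, select_source(star, mappings), sources)
        results = bindings if results is None else join(results, bindings)
    return results or []


# Hypothetical sources: a CSV-like table and a document collection.
sources = {
    "products.csv": [
        {"id": "p1", "label": "Laptop", "producer": "m1"},
        {"id": "p2", "label": "Phone", "producer": "m2"},
    ],
    "producers_docs": [
        {"_id": "m1", "name": "Acme"},
        {"_id": "m2", "name": "Globex"},
    ],
}
mappings = [
    {"source": "products.csv", "subject_column": "id",
     "predicates": {"rdfs:label": "label", "ex:producer": "producer"}},
    {"source": "producers_docs", "subject_column": "_id",
     "predicates": {"foaf:name": "name"}},
]
bgp = [
    ("?product", "rdfs:label", "?label"),
    ("?product", "ex:producer", "?producer"),
    ("?producer", "foaf:name", "?name"),
]

print([(r["?label"], r["?name"]) for r in answer(bgp, mappings, sources)])
# prints [('Laptop', 'Acme'), ('Phone', 'Globex')]
```

The join across `products.csv` and `producers_docs` happens on the shared variable `?producer`, which is the kind of cross-source join the abstract refers to. A distributed system would replace the in-memory lists and the Python hash join with an engine's own data loading and join operators; this sketch only shows the logical steps.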


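Cited By

  • (2023) SEDAR: A Semantic Data Reservoir for Heterogeneous Datasets. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, pp. 5056-5060. DOI: 10.1145/3583780.3614753. Online publication date: 21-Oct-2023
  • (2023) Declarative RDF graph generation from heterogeneous (semi-)structured data. Web Semantics: Science, Services and Agents on the World Wide Web 75:C. DOI: 10.1016/j.websem.2022.100753. Online publication date: 1-Jan-2023
  • (2022) Ontology-based Data Federation. Proceedings of the 11th International Joint Conference on Knowledge Graphs, pp. 10-19. DOI: 10.1145/3579051.3579070. Online publication date: 27-Oct-2022
  • (2022) OPTIMA: Framework Selecting Optimal Virtual Model to Query Large Heterogeneous Data. Big Data Analytics and Knowledge Discovery, pp. 209-215. DOI: 10.1007/978-3-031-12670-3_18. Online publication date: 22-Aug-2022
  • (2022) Balancing RDF Generation from Heterogeneous Data Sources. The Semantic Web: ESWC 2022 Satellite Events, pp. 264-274. DOI: 10.1007/978-3-031-11609-4_40. Online publication date: 29-May-2022
  • (2021) Enhancing virtual ontology based access over tabular data with Morph-CSV. Semantic Web 12:6, pp. 869-902. DOI: 10.3233/SW-210432. Online publication date: 1-Jan-2021
  • (2021) Chimera: A Bridge Between Big Data Analytics and Semantic Technologies. The Semantic Web – ISWC 2021, pp. 463-479. DOI: 10.1007/978-3-030-88361-4_27. Online publication date: 24-Oct-2021
  • (2020) Semantic Integration of Bosch Manufacturing Data Using Virtual Knowledge Graphs. The Semantic Web – ISWC 2020, pp. 464-481. DOI: 10.1007/978-3-030-62466-8_29. Online publication date: 2-Nov-2020
  • (2020) FunMap: Efficient Execution of Functional Mappings for Knowledge Graph Creation. The Semantic Web – ISWC 2020, pp. 276-293. DOI: 10.1007/978-3-030-62419-4_16. Online publication date: 2-Nov-2020
  • (2020) Semantic Data Integration for the SMT Manufacturing Process Using SANSA Stack. The Semantic Web: ESWC 2020 Satellite Events, pp. 307-311. DOI: 10.1007/978-3-030-62327-2_47. Online publication date: 31-May-2020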


Published In

The Semantic Web – ISWC 2019: 18th International Semantic Web Conference, Auckland, New Zealand, October 26–30, 2019, Proceedings, Part II
Oct 2019
582 pages
ISBN: 978-3-030-30795-0
DOI: 10.1007/978-3-030-30796-7

Publisher

Springer-Verlag, Berlin, Heidelberg



