Abstract
Most NoSQL systems do not require a schema to be declared before a database (DB) is created. This "schemaless" property is important because it provides considerable flexibility when the data are used. However, the absence of a schema is a major obstacle to expressing precise queries on a DB. A new area of research has emerged to allow users of schemaless NoSQL systems to visualize the schema of the data. Research works have proposed schema extraction processes, but these solutions are generally limited. In our previous work (Abdelhedi et al. in Proceedings of the 10th international conference on model-driven engineering and software development, pp 61–71. https://doi.org/10.5220/0010899000003119, 2022), we proposed a logical schema extraction process for a document-oriented NoSQL DB to address the needs of a medical application. In this paper, we extend this process to additional relationship types. To do this, we use the Model Driven Architecture (MDA), which proposes a development method based on metamodeling and the definition of transformation rules. The DB schema is obtained by applying a set of transformation rules to the specifications extracted from the DB. The benefit of our process is that it produces a schema that allows users of a NoSQL DB to build complex and precise queries. This is useful both for computer scientists, who create a large number of complex queries, and for decision makers, who often have difficulty grasping the semantics of the data. Our extraction process was implemented in a medical application.
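To make the idea of rule-based schema extraction concrete, here is a minimal Python sketch. It is not the MDA transformation process described in the paper: the function names, the sample collections, and the naming convention used to detect links (a field `x_id` is assumed to reference a collection named `x`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def type_name(value):
    """Map a Python value to a coarse schema type label."""
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "document"
    if isinstance(value, list):
        return "array"
    if value is None:
        return "null"
    return type(value).__name__


def extract_schema(collections, ref_suffix="_id"):
    """Scan every document and apply two toy rules (assumptions, not the paper's rules):
    rule 1 records the set of types observed for each field;
    rule 2 treats a field such as 'patient_id' as a link to the collection 'patient'.
    """
    schema = {}
    for name, docs in collections.items():
        fields = defaultdict(set)
        links = set()
        for doc in docs:
            for key, value in doc.items():
                fields[key].add(type_name(value))
                # The document's own identifier is not a link to another collection.
                if key != "_id" and key.endswith(ref_suffix):
                    target = key[: -len(ref_suffix)]
                    if target in collections:
                        links.add(target)
        schema[name] = {
            "fields": {k: sorted(v) for k, v in fields.items()},
            "links": sorted(links),
        }
    return schema


# Hypothetical sample data loosely inspired by a medical application.
collections = {
    "patient": [
        {"_id": 1, "name": "A. Martin"},
        {"_id": 2, "name": "B. Durand", "age": 54},
    ],
    "consultation": [
        {"_id": 10, "patient_id": 1, "notes": "follow-up"},
    ],
}

schema = extract_schema(collections)
for collection_name, description in schema.items():
    print(collection_name, description)
```

With a schema of this kind in hand, a user can see that `consultation` documents point to `patient` documents before writing a query that combines the two, which is the sort of help the extraction process is meant to provide.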
Change history
10 February 2023
A Correction to this paper has been published: https://doi.org/10.1007/s42979-023-01700-9
References
Abdelhedi F, Rajhi H, Zurfluh G. Extraction process of the logical schema of a document-oriented NoSQL database. In: Proceedings of the 10th international conference on model-driven engineering and software development, 2022, pp. 61–71. https://doi.org/10.5220/0010899000003119.
Wang L, Wang J, Wang M, Li Y, Liang Y, Xu D. Using internet search engines to obtain medical information: a comparative study. J Med Internet Res. 2012;14(3):e74. https://doi.org/10.2196/jmir.1943.
Wang L, Hassanzadeh O, Zhang S, Shi J, Jiao L, Zou J, Wang C. Schema management for document stores. Proc VLDB Endow. 2015;8:922–33. https://doi.org/10.14778/2777598.2777601.
Baazizi M-A, Lahmar HB, Colazzo D, Ghelli G, Sartiani C. Schema inference for massive JSON datasets. Extend Database Technol. 2017. https://doi.org/10.5441/002/edbt.2017.21.
Baazizi M-A, Colazzo D, Ghelli G, Sartiani C. Parametric schema inference for massive JSON datasets. VLDB J. 2019;28(4):497–521. https://doi.org/10.1007/s00778-018-0532-7.
Frozza AA, dos Santos Mello R, da Costa FS. An approach for schema extraction of JSON and extended JSON document collections. In: IEEE international conference on information reuse and integration (IRI), 2018. pp. 356–63. https://doi.org/10.1109/IRI.2018.00060.
Fruth M, Dauberschmidt K, Scherzinger SJ. Managing schemas for NoSQL document stores. In: IEEE 37th international conference on data engineering (ICDE), 2021. pp. 2693–6. https://doi.org/10.1109/ICDE51399.2021.00306.
Istiqamah AN, Wiharja KRS. A schema extraction of document-oriented database for data warehouse. Int J Inf Commun Technol. 2021;7(2):36–47. https://doi.org/10.21108/ijoict.v7i2.584.
Aftab Z, Iqbal W, Almustafa KM, Bukhari F, Abdullah M. Automatic NoSQL to relational database transformation with dynamic schema mapping. Sci Programm. 2020. https://doi.org/10.1155/2020/8813350.
Chillón AH, Hoyos JR, García-Molina J, Ruiz DS. Discovering entity inheritance relationships in document stores. Knowl Based Syst. 2021;230:107394. https://doi.org/10.1016/j.knosys.2021.107394.
Ruiz DS, Morales SF, Molina JG. Inferring versioned schemas from NoSQL databases and its applications. In: International conference on conceptual modeling, 2015, pp. 467–480. https://doi.org/10.1007/978-3-319-25264-3_35.
ODMS: Operational Database Management Systems. http://www.odbms.org/odmg-standard/. Accessed 10 Apr 2021.
OrientDB. https://orientdb.org/. Accessed 10 Apr 2021.
OMG. MDA-The architecture of choice for a changing world. https://www.omg.org/mda. Accessed 1 Apr 2021.
Laney D. 3D data management: Controlling data volume, velocity and variety. META group research note, 2001.
Idera: Data modeling tools for enterprise-scale data architecture. https://www.idera.com/products/er-studio/enterprise-data-modeling. Accessed 2 June 2021.
Erwin: Erwin Data Modeler. https://www.erwin.com/products/erwin-data-modeler/. Accessed 2 June 2021.
MongoDB. https://www.mongodb.com/products/compass. Accessed 5 Sept 2021.
MongoDB. https://www.mongodb.com/. Accessed 5 Sept 2021.
OMG. https://www.omg.org/. Accessed 1 June 2021.
Eclipse: Eclipse Modeling Framework (EMF) https://www.eclipse.org/modeling/emf/. Accessed 10 Jan 2022.
Eclipse: Ecore Tools. https://www.eclipse.org/ecoretools/doc/index.html. Accessed 20 Jan 2022.
OMG: MOF query/view/transformation. https://www.omg.org/spec/QVT/1.3/About-QVT/. Accessed 20 Jan 2022.
UML. https://www.uml.org/. Accessed 12 Dec 2021.
Baazizi M-A, Colazzo D, Ghelli G, Sartiani C. A type system for interactive JSON schema inference. In: 46th international colloquium on automata, languages, and programming (ICALP), 2019.
OrientDB: basic concepts. https://orientdb.org/docs/3.0.x/datamodeling/Concepts.html. Accessed 10 Jan 2022.
OMG: XML metadata interchange. https://www.omg.org/spec/XMI/. Accessed 20 Jan 2022.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Advances on Model-Driven Engineering and Software Development” guest edited by Luís Ferreira Pires and Slimane Hammoudi.
The original online version of this article was revised: incorrect versions of Figures 2 and 3 were published in the original publication. They have now been corrected.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Abdelhedi, F., Rajhi, H. & Zurfluh, G. Extraction of Semantic Links from a Document-Oriented NoSQL Database. SN COMPUT. SCI. 4, 148 (2023). https://doi.org/10.1007/s42979-022-01578-z
DOI: https://doi.org/10.1007/s42979-022-01578-z