DOI: 10.1007/978-3-030-32065-2_8
Article

Metadata Discovery Using Data Sampling and Exploratory Data Analysis

Published: 28 October 2019

Abstract

Metadata discovery contributes substantially to understanding the semantics of data, the relationships between data, and fundamental data features, for the purposes of data management, query processing, and data integration. It continues to evolve with the help of data profiling and manual annotators, which has produced a range of good-quality data profiling techniques and tools. Although different metadata standards are specified for distinct fields such as finance, biology, experimental physics, and medicine, there is no generic method that discovers metadata automatically or presents it in a unified way. In this paper, we present a technique for discovering and generating metadata for data sources that do not provide explicit metadata. To this end, we apply exploratory data analysis to produce two kinds of metadata, administrative and technical, in order to find similarities between resources with respect to their structures and contents. Our technique was evaluated experimentally. The results show that the technique makes it possible to identify similar data sources and compute their similarity measures.
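The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes well-formed CSV sources, draws a simple random sample of rows, derives a few technical metadata items per column (inferred type, fraction of empty values, distinct count), and compares two sources by the Jaccard similarity of their (column name, inferred type) pairs. All function names, the choice of metadata items, and the similarity measure are assumptions made for illustration. Administrative metadata is not covered.

```python
import csv
import io
import random


def sample_rows(rows, k, seed=0):
    """Draw up to k rows uniformly at random (simple random sampling)."""
    rows = list(rows)
    if len(rows) <= k:
        return rows
    return random.Random(seed).sample(rows, k)


def infer_type(values):
    """Guess a column type from its non-empty sampled values."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def profile(csv_text, k=100):
    """Return {column: {"type", "null_fraction", "distinct"}} from a sample.

    Assumption: the CSV is well formed (header row, no extra fields).
    """
    rows = sample_rows(csv.DictReader(io.StringIO(csv_text)), k)
    columns = list(rows[0].keys()) if rows else []
    meta = {}
    for col in columns:
        # Missing trailing fields come back as None; treat them as empty.
        values = [r[col] or "" for r in rows]
        meta[col] = {
            "type": infer_type(values),
            "null_fraction": sum(v == "" for v in values) / len(values),
            "distinct": len(set(values)),
        }
    return meta


def schema_similarity(meta_a, meta_b):
    """Jaccard similarity over (column name, inferred type) pairs."""
    a = {(c, m["type"]) for c, m in meta_a.items()}
    b = {(c, m["type"]) for c, m in meta_b.items()}
    return len(a & b) / len(a | b) if a | b else 1.0
```

Calling `profile` on two CSV strings and passing the results to `schema_similarity` yields a score between 0 and 1. Comparing column names and types only captures structural similarity; a content-level comparison would need value distributions as well.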


Published In

Model and Data Engineering: 9th International Conference, MEDI 2019, Toulouse, France, October 28–31, 2019, Proceedings
Oct 2019
352 pages
ISBN:978-3-030-32064-5
DOI:10.1007/978-3-030-32065-2

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Data profiling
  2. Metadata management
  3. Discovery
  4. Enrichment

Qualifiers

  • Article
