research-article

Knowledge-Driven Data Ecosystems Toward Data Transparency

Published: 23 December 2021

Abstract

A data ecosystem (DE) offers a keystone-player or alliance-driven infrastructure that enables the interaction of different stakeholders and the resolution of interoperability issues among shared data. However, despite years of research in data governance and management, trustability is still affected by the absence of transparent and traceable data-driven pipelines. In this work, we focus on the requirements and challenges that DEs face when ensuring data transparency. Requirements are derived from data and organizational management, as well as from broader legal and ethical considerations. We propose a novel knowledge-driven DE architecture that provides the pillars for satisfying the analyzed requirements. We illustrate the potential of our proposal in a real-world scenario. Finally, we discuss and rate the potential of the proposed architecture in fulfilling these requirements.

Information

Published In

Journal of Data and Information Quality, Volume 14, Issue 1
March 2022
61 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3505184

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 December 2021
Accepted: 01 May 2021
Received: 01 May 2021
Published in JDIQ Volume 14, Issue 1

Author Tags

  1. Data transparency
  2. data ecosystems
  3. data quality
  4. trustability

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • MUR-PRIN
  • H2020-EU.2.1.1
  • EU H2020
  • CLARIFY
  • German Innovation Fund
  • German Federal Ministry of Education and Research (BMBF)
  • Fraunhofer Cluster of Excellence “Cognitive Internet Technologies” (CCIT)
  • Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy - EXC-2023 Internet of Production

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 552
  • Downloads (last 6 weeks): 51
Reflects downloads up to 14 Oct 2024.

Citations

  • (2024) Understanding the development of public data ecosystems: from a conceptual model to a six-generation model of the evolution of public data ecosystems. SSRN Electronic Journal. DOI: 10.2139/ssrn.4831881. Online publication date: 2024.
  • (2024) Towards FAIR Data Stream Processing Ecosystems. Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, 203–206. DOI: 10.1145/3629104.3672434. Online publication date: 24-Jun-2024.
  • (2024) A Unified Data Ontology for Demand-driven Data Sharing in the DOA-based Data Ecosystem. 2024 IEEE International Conference on Web Services (ICWS), 108–115. DOI: 10.1109/ICWS62655.2024.00030. Online publication date: 7-Jul-2024.
  • (2024) Understanding the development of public data ecosystems: From a conceptual model to a six-generation model of the evolution of public data ecosystems. Telematics and Informatics 94, 102190. DOI: 10.1016/j.tele.2024.102190. Online publication date: Oct-2024.
  • (2024) A CONCEPTUAL FRAMEWORK FOR THE GOVERNMENT BIG DATA ECOSYSTEM (‘datagov.eco’). Data & Knowledge Engineering, 102348. DOI: 10.1016/j.datak.2024.102348. Online publication date: Sep-2024.
  • (2024) A design theory for data quality tools in data ecosystems: Findings from three industry cases. Data & Knowledge Engineering 153, 102333. DOI: 10.1016/j.datak.2024.102333. Online publication date: Sep-2024.
  • (2024) The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems. Data & Knowledge Engineering 151:C. DOI: 10.1016/j.datak.2024.102301. Online publication date: 1-May-2024.
  • (2024) Issues in inter-organizational data sharing. Data & Knowledge Engineering 150:C. DOI: 10.1016/j.datak.2024.102280. Online publication date: 2-Jul-2024.
  • (2024) Industrial data ecosystems and data spaces. Electronic Markets 34:1. DOI: 10.1007/s12525-024-00724-0. Online publication date: 6-Aug-2024.
  • (2024) From the Evolution of Public Data Ecosystems to the Evolving Horizons of the Forward-Looking Intelligent Public Data Ecosystem Empowered by Emerging Technologies. Electronic Government, 402–418. DOI: 10.1007/978-3-031-70274-7_25. Online publication date: 3-Sep-2024.
