
Editorial: Special Issue on Data Transparency—Data Quality, Annotation, and Provenance

Published: 02 February 2022

Introduction

Advances in Artificial Intelligence (AI) and in mobile and Internet technologies have been progressively reshaping our lives over the past few years. Applications of the Internet of Things and cyber-physical systems now touch almost every aspect of our daily lives, including healthcare (e.g., remote patient monitoring environments), leisure (e.g., smart entertainment spaces), and work (e.g., smart manufacturing, asset management). For many of us, social media have become the default way to interact, socialize, and exchange information. AI-powered systems have become a reality and have started to affect our lives in important ways. These systems and services collect huge amounts of data about us and exploit them for various purposes that can affect our lives positively or negatively. Even though most of these systems claim to abide by data protection regulations and ethical principles, data misuse incidents keep making the headlines.
In this new digital world, data transparency for end users is becoming a fundamental consideration when designing, implementing, and deploying a system, service, or software [1, 3, 4]. Transparency allows users to trace how their data are collected, transmitted, stored, processed, exploited, and provisioned. It also allows them to verify how fairly they are treated by the algorithms, software, and systems that affect their lives.
Data transparency is a complex concept that is interpreted and approached in different ways by different research communities and bodies. A comprehensive definition of data transparency is proposed by Bertino et al. as “the ability of subjects to effectively gain access to all information related to data used in processes and decisions that affect the subjects” [2].
The term subject refers to four categories of users, namely data participants, data victims, data users, and data processors (or curators). Each category is concerned with different facets or dimensions of data transparency. Data participants are subjects whose data are collected and processed by the system. Data transparency can tell them how their data are collected, processed, exploited (i.e., for what purposes), and provisioned (to whom, when, and why), and lets them verify how their data and privacy are protected. Data victims are affected by decisions (or recommendations) made by the system, regardless of whether those decisions were based on their own data or on the data of other people or groups. Data transparency provides them with explanations of how decisions were made and of how fairly the system treats them compared to other groups of users. Data users are those who exploit the decisions made by the system and need to assess the trustworthiness of the system and its decisions. Data transparency can provide them with comprehensive information about the provenance and quality of the data the system uses to reach decisions. Data processors (or curators) are those who manage data and are legally required to ensure that data are used in accordance with data protection regulations and ethical norms.
The importance of data transparency has also been underlined by recent data protection regulations and reforms such as the EU General Data Protection Regulation (GDPR) [5] and the California Consumer Privacy Act of 2018 (CCPA or AB-375) [6]. Under these regulations, data controllers are required to inform data subjects before collecting their data, to clearly explain the purpose of the collection, and, upon the data subjects’ request, to explain how their data will be processed (“right to explanation” and “right to non-discrimination”).

In This Special Issue

This special issue collected recent advances, innovations, and practices in software and data engineering for ensuring data transparency across its different facets and dimensions, including record transparency; use, disclosure, and provision transparency; algorithm and decision-process transparency; and law and policy transparency. Eight of the 19 submitted articles were accepted. We organized the accepted articles into two parts.
The papers in this part of the special issue address foundational aspects of data transparency, including data annotation as a key enabler of transparency, annotation challenges and architectures, data quality and provenance, and the potential of blockchain for data transparency.
The article “Automated Annotations for AI Data Transparency,” by Thirumuruganathan et al., explores why annotations are important, and indeed necessary, to ensure the responsible and transparent use of datasets and AI models by and within modern enterprises. Annotations can be exploited for various purposes, such as ensuring policy compliance, monitoring quality, supporting data discovery and search, and informing the users of AI models about the capabilities and potential pitfalls of those models. Recognizing these benefits, the article proposes a reference architecture to automatically compute, manage, and consume annotations on top of an enterprise's data lakes.
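The compute/manage/consume cycle mentioned above can be illustrated with a minimal Python sketch. It is not the reference architecture proposed by Thirumuruganathan et al.: the names `Annotation`, `AnnotationStore`, `pii_columns`, and `null_rate`, the dataset identifiers, and the naive column-name heuristic for spotting personal data are all hypothetical, and an in-memory dictionary stands in for an enterprise data lake.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

Rows = list[dict[str, Any]]  # a dataset, simplified to a list of row dicts


@dataclass(frozen=True)
class Annotation:
    dataset: str  # hypothetical identifier of the annotated dataset
    kind: str     # annotation type, e.g. "pii_columns" or "null_rate"
    value: Any


# An annotator inspects a dataset's rows and returns a (kind, value) pair.
Annotator = Callable[[Rows], tuple[str, Any]]


def pii_columns(rows: Rows) -> tuple[str, Any]:
    """Naive heuristic: flag columns whose names suggest personal data."""
    suspicious = {"name", "email", "phone", "ssn"}
    columns = set().union(*(row.keys() for row in rows))
    return "pii_columns", sorted(c for c in columns if c.lower() in suspicious)


def null_rate(rows: Rows) -> tuple[str, Any]:
    """Simple quality signal: fraction of cell values that are missing (None)."""
    cells = [v for row in rows for v in row.values()]
    return "null_rate", (sum(v is None for v in cells) / len(cells) if cells else 0.0)


@dataclass
class AnnotationStore:
    annotators: list[Annotator]
    by_dataset: dict[str, list[Annotation]] = field(default_factory=dict)

    def compute(self, dataset_id: str, rows: Rows) -> list[Annotation]:
        """Compute and manage: run every annotator and (re)store its results."""
        annotations = [Annotation(dataset_id, *annotate(rows)) for annotate in self.annotators]
        self.by_dataset[dataset_id] = annotations
        return annotations

    def query(self, kind: str, predicate: Callable[[Any], bool]) -> list[str]:
        """Consume: list datasets whose `kind` annotation satisfies `predicate`."""
        return sorted(
            dataset_id
            for dataset_id, annotations in self.by_dataset.items()
            for a in annotations
            if a.kind == kind and predicate(a.value)
        )


# Usage: annotate two toy datasets, then query them for compliance and quality.
store = AnnotationStore(annotators=[pii_columns, null_rate])
store.compute("crm/customers", [
    {"name": "Ada", "email": "ada@example.org", "age": None},
    {"name": "Bob", "email": None, "age": 41},
])
store.compute("sensors/temps", [{"ts": 1, "celsius": 21.5}, {"ts": 2, "celsius": None}])
print(store.query("pii_columns", bool))             # ['crm/customers']
print(store.query("null_rate", lambda r: r > 0.3))  # ['crm/customers']
```

Keeping annotators as plain functions means new annotation types (e.g., a model-card summary) could be registered without changing the store, which mirrors the separation between computing and consuming annotations that the article emphasizes.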
The article “Knowledge-driven Data Ecosystems towards Data Transparency,” by Geisler et al., identifies and explores the data quality and transparency challenges that Data Ecosystems should address, and the requirements they should meet, to be trusted by their users and stakeholders. Data Ecosystems are defined as a “set of networks composed of autonomous actors, which consume, produce, or provide data or other related resources.” They involve datasets, operators to manage those datasets, metadata, and mappings that describe the datasets and ensure their interoperability. The article proposes a novel knowledge-driven data ecosystem architecture, equipped with knowledge-driven services and functions, to meet the identified data transparency requirements and address the associated challenges.
The article “On the Anonymization of Workflow Provenance without Compromising the Transparency of Lineage,” by Belhajjame, addresses the tradeoff between privacy and transparency in scientific workflows that manipulate privacy-sensitive datasets. The article proposes a solution to anonymize the provenance information of a workflow in a way that protects the privacy of data subjects while keeping the provenance information useful for scientists who try to interpret the results obtained by the workflow or verify its validity. The solution is inspired by mature data anonymization techniques, including k-anonymity and l-diversity.
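To make the two anonymization notions concrete, here is a minimal Python sketch of k-anonymity and (distinct) l-diversity checks over a set of records, such as the inputs captured in a workflow's provenance. It does not reproduce Belhajjame's solution, which anonymizes provenance while preserving lineage; the generalization scheme (10-year age buckets, 3-digit ZIP prefixes), the attribute names, and the sample records are assumptions for illustration only.

```python
from collections import defaultdict


def generalize(record: dict, age_width: int = 10, zip_digits: int = 3) -> dict:
    """Hypothetical generalization: bucket ages into ranges and mask ZIP suffixes."""
    low = (record["age"] // age_width) * age_width
    return {
        "age": f"{low}-{low + age_width - 1}",
        "zip": record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits),
        "diagnosis": record["diagnosis"],  # sensitive attribute, left untouched
    }


def equivalence_classes(records: list[dict], quasi_ids: list[str]) -> list[list[dict]]:
    """Group records that share the same values on all quasi-identifiers."""
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in quasi_ids)].append(r)
    return list(classes.values())


def is_k_anonymous(records: list[dict], quasi_ids: list[str], k: int) -> bool:
    """k-anonymity: every equivalence class contains at least k records."""
    return all(len(c) >= k for c in equivalence_classes(records, quasi_ids))


def is_l_diverse(records: list[dict], quasi_ids: list[str],
                 sensitive: str, min_distinct: int) -> bool:
    """Distinct l-diversity: every class has >= min_distinct (the "l") sensitive values."""
    return all(len({r[sensitive] for r in c}) >= min_distinct
               for c in equivalence_classes(records, quasi_ids))


# Hypothetical patient records consumed by one step of a workflow.
raw = [
    {"age": 34, "zip": "47677", "diagnosis": "flu"},
    {"age": 36, "zip": "47602", "diagnosis": "asthma"},
    {"age": 31, "zip": "47678", "diagnosis": "flu"},
    {"age": 52, "zip": "47905", "diagnosis": "diabetes"},
    {"age": 57, "zip": "47909", "diagnosis": "flu"},
]
QI = ["age", "zip"]
anonymized = [generalize(r) for r in raw]

print(is_k_anonymous(raw, QI, k=2))                                # False
print(is_k_anonymous(anonymized, QI, k=2))                         # True
print(is_l_diverse(anonymized, QI, "diagnosis", min_distinct=2))   # True
print(is_l_diverse(anonymized, QI, "diagnosis", min_distinct=3))   # False
```

The coarser the generalization, the easier it is to reach a given k and l, but the less precise the recorded lineage becomes; this is the privacy/transparency tension the article sets out to balance.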
The article “Integration of Blockchain with Connected and Autonomous Vehicles: Vision and Challenge,” by Dargahi et al., explores the benefits and challenges that the use of blockchain would bring to the field of Intelligent Transportation Systems (ITS). Specifically, the article focuses on how blockchain can make ITSs more transparent with respect to how personal data are collected, stored, processed, and shared among the different stakeholders within an ITS. It also discusses the tradeoff between privacy and transparency in blockchain-based intelligent transportation systems and outlines key research directions for ensuring privacy protection.
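Much of the transparency that blockchain promises here rests on tamper-evident, append-only records of who did what with whose data. The following minimal Python sketch shows only that hash-linking idea for a log of personal-data events in an ITS. It has no distribution, consensus, or smart contracts, so it is not a blockchain, and the `AccessLedger` class, actor names, and event labels are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def _digest(body: dict) -> str:
    """SHA-256 over canonical JSON, so equal bodies always hash identically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AccessLedger:
    """Append-only, hash-linked log of personal-data events (a sketch, not a blockchain)."""

    def __init__(self) -> None:
        self.blocks: list[dict] = []

    def append(self, actor: str, action: str, data_subject: str) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else GENESIS
        body = {"index": len(self.blocks), "actor": actor, "action": action,
                "subject": data_subject, "prev": prev}
        self.blocks.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute every hash and check each block points at its predecessor."""
        prev = GENESIS
        for i, block in enumerate(self.blocks):
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["index"] != i or block["prev"] != prev or block["hash"] != _digest(body):
                return False
            prev = block["hash"]
        return True

    def history(self, data_subject: str) -> list[tuple[str, str]]:
        """What a data subject could inspect: every (actor, action) on their data."""
        return [(b["actor"], b["action"]) for b in self.blocks if b["subject"] == data_subject]


ledger = AccessLedger()
ledger.append("vehicle-42", "collect:location", "driver-7")
ledger.append("traffic-authority", "read:location", "driver-7")
ledger.append("insurer", "read:speed", "driver-9")
print(ledger.verify())               # True
print(ledger.history("driver-7"))    # [('vehicle-42', 'collect:location'), ('traffic-authority', 'read:location')]
ledger.blocks[1]["actor"] = "advertiser"  # silently rewrite who accessed the data
print(ledger.verify())               # False
```

Storing data-subject identifiers in clear on such a ledger, as this sketch does, is where the privacy/transparency tradeoff surfaces: anyone who can read the ledger can also see who is being tracked. On a single machine an attacker could also recompute every hash; it is replication across independent parties that makes rewriting history impractical in a real blockchain.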

References

[1]
Elisa Bertino, Shawn Merrill, Alina Nesen, and Christine Utz. 2019. Redefining data transparency: A multidimensional approach. Computer 52, 1 (2019), 16–26.
[2]
Elisa Bertino. 2020. The Quest for Data Transparency. IEEE Secur. Privacy 18, 3 (2020), 67–68.
[3]
Simson L. Garfinkel, Jeanna N. Matthews, Stuart S. Shapiro, and Jonathan M. Smith. 2017. Toward algorithmic transparency and accountability. Commun. ACM 60, 9 (2017), 5.
[4]
Serge Abiteboul and Julia Stoyanovich. 2019. Transparency, fairness, data protection, neutrality: Data management challenges in the face of new regulation. ACM J. Data Inf. Qual. 11, 3 (2019), 15:1–15:9.
[5]
The European Union. 2016. Regulation (EU) 2016/679: General Data Protection Regulation (GDPR). Retrieved August 30, 2021 from https://gdpr-info.eu/.
[6]
The California State Legislature. 2018. The California Consumer Privacy Act (CCPA), 2018. Retrieved August 30, 2021 from https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180AB375.

Cited By

  • (2024) Blockchain, artificial intelligence, and healthcare: the tripod of future—a narrative review. Artificial Intelligence Review 57, 9. DOI: 10.1007/s10462-024-10873-5. Online publication date: 8-Aug-2024.
  • (2023) Distributed Cooperative Coevolution of Data Publishing Privacy and Transparency. ACM Transactions on Knowledge Discovery from Data 18, 1, 1-23. DOI: 10.1145/3613962. Online publication date: 6-Sep-2023.
Published In

Journal of Data and Information Quality, Volume 14, Issue 1
March 2022
61 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3505184

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Data transparency
  2. data provenance
  3. data quality and annotation
  4. privacy
  5. accountability
  6. fairness

Qualifiers

  • Editorial
  • Refereed

Article Metrics

  • Downloads (last 12 months): 329
  • Downloads (last 6 weeks): 48
Reflects downloads up to 10 Nov 2024.
