
Editorial: Special Issue on Data Transparency—Use Cases and Applications

Published: 11 February 2022

Introduction

Advances in Artificial Intelligence (AI) and mobile and Internet technologies have been progressively reshaping our lives over the past few years. The applications of the Internet of Things and cyber-physical systems today touch almost all aspects of our daily lives, including healthcare (e.g., remote patient monitoring environments), leisure (e.g., smart entertainment spaces), and work (e.g., smart manufacturing and asset management). For many of us, social media have become the rule rather than the exception as the way to interact, socialize, and exchange information. AI-powered systems have become a reality and started to affect our lives in important ways. These systems and services collect huge amounts of data about us and exploit it for various purposes that could affect our lives positively or negatively. Even though most of these systems claim to abide by data protection regulations and ethics, data misuse incidents keep making the headlines.
In this new digital world, data transparency for end users is becoming a fundamental aspect to consider when designing, implementing, and deploying a system, service, or software [1, 3, 4]. Transparency allows users to track how their data are collected, transmitted, stored, processed, exploited, and provisioned. It also allows them to verify how fairly they are treated by the algorithms, software, and systems that affect their lives.
Data transparency is a complex concept that is interpreted and approached in different ways by different research communities and bodies. A comprehensive definition of data transparency is proposed by Bertino et al. as “the ability of subjects to effectively gain access to all information related to data used in processes and decisions that affect the subjects” [2].
The term subject refers to four categories of users: data participants, data victims, data users, and data processors (or curators). These categories are concerned with different facets or dimensions of data transparency, as follows. Data participants are subjects whose data are collected and processed by the system. Data transparency can provide this category with information about how their data are collected, processed, exploited (i.e., for what purposes), and provisioned (to whom, when, and why). They can also verify how their data and privacy are protected. Data victims are affected by decisions (or recommendations) made by the system, regardless of whether those decisions were based on their own data or on the data of other people or groups. Data transparency provides this category of users with explanations of how decisions were made and of how fair the system is toward them compared to other groups of users. Data users are those who exploit the decisions made by the system and need to assess the trustworthiness of the system (and its decisions). Data transparency can provide this category of users with comprehensive information about the provenance and quality of the data that the system exploits to reach its decisions. Data processors (or curators) are those who manage data and are legally required to ensure that data are used in accordance with data protection regulations and ethical norms.
The importance of data transparency has also been underlined by recent data protection regulations and reforms such as the EU General Data Protection Regulation (GDPR) [5] and the California Consumer Privacy Act of 2018 (CCPA or AB-375) [6]. Under these regulations, data controllers are required to inform data subjects before collecting their data and to clearly explain the purpose of collecting data and how data will be processed upon data subjects’ requests (“right to explanation” and “right to non-discrimination”).

In This Special Issue

This special issue collects recent advances, innovations, and practices in software and data engineering for ensuring data transparency along its different facets and dimensions, including record transparency; use, disclosure, and provision transparency; algorithm and decision-process transparency; and law and policy transparency. Eight of the 19 submitted articles were accepted. We organized the accepted articles into two parts.
The articles in this part of the special issue address important aspects of data transparency in real use cases and applications, including transparency and fairness in police practices (e.g., recidivism prediction and suspect control), privacy and transparency in financial systems (e.g., loan services), and mining public opinion from social media.
The article “Data Transparency and Fairness Analysis of the NYPD Stop-and-Frisk Program,” by Badr and Sharma, presents an interesting study of how automated AI programs can be analyzed to determine their fairness and transparency, especially when those programs affect people's lives, such as the programs used by police authorities to monitor suspects and predict recidivism, crime, and terrorism. The article presents a predictive analytics method, including bias metrics and bias mitigation techniques, to analyze the public datasets of the New York police authorities and discover whether their arrest practices are fair toward minorities.
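To make the idea of a bias metric concrete, the following is a minimal sketch of one widely used measure, the disparate impact ratio. It is not the authors' method; the field names ("group", "arrested") and the toy records are illustrative assumptions, not the NYPD dataset schema.

```python
# Disparate impact ratio: the positive-outcome rate of a protected group
# divided by that of everyone else. Values well below 1.0 suggest the
# outcome disfavors the protected group.

def disparate_impact(records, protected_group):
    """Ratio of outcome rates: protected group vs. all other groups."""
    def rate(rows):
        return sum(r["arrested"] for r in rows) / len(rows) if rows else 0.0
    protected = [r for r in records if r["group"] == protected_group]
    others = [r for r in records if r["group"] != protected_group]
    base = rate(others)
    return rate(protected) / base if base else float("nan")

stops = [
    {"group": "A", "arrested": 1}, {"group": "A", "arrested": 1},
    {"group": "A", "arrested": 0}, {"group": "A", "arrested": 0},
    {"group": "B", "arrested": 1}, {"group": "B", "arrested": 0},
    {"group": "B", "arrested": 0}, {"group": "B", "arrested": 0},
]
print(disparate_impact(stops, "B"))  # 0.25 / 0.50 = 0.5
```

A common rule of thumb (the "four-fifths rule") flags ratios below 0.8 for review; bias mitigation techniques of the kind the article discusses then rebalance the data or adjust decision thresholds.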
The article “Achieving Transparency Report Privacy in Linear Time,” by Chen et al., addresses the problem of establishing privacy-preserving transparency reports that justify how and why decisions are made by automated decision processes and algorithms and whether those decisions are fair to certain individuals or groups. The article explores the tradeoff between privacy and transparency and proposes a tunable data perturbation technique: it can provide maximum privacy protection for data subjects under given transparency constraints, or maximize a transparency indicator (relative to the justifications published about the decisions made) under a privacy-protection threshold that must be respected.
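The general shape of such a tunable tradeoff can be sketched with Laplace noise applied to the counts published in a report; the parameter epsilon trades privacy (small epsilon, more noise) against report fidelity (large epsilon, less noise). This is only a generic illustration under our own assumptions, not the linear-time mechanism of the article.

```python
import random

def laplace_noise(scale):
    """One Laplace(0, scale) sample: the difference of two independent
    exponentials with mean `scale` is Laplace-distributed with that scale."""
    lam = 1.0 / scale
    return random.expovariate(lam) - random.expovariate(lam)

def perturb_report(counts, epsilon):
    """Noise each published count with scale 1/epsilon; smaller epsilon
    means stronger perturbation (more privacy, less transparency)."""
    scale = 1.0 / epsilon
    return {k: max(0, round(v + laplace_noise(scale))) for k, v in counts.items()}

random.seed(7)
report = {"loans_approved": 120, "loans_denied": 34}
print(perturb_report(report, epsilon=0.5))   # heavily noised counts
print(perturb_report(report, epsilon=50.0))  # close to the true counts
```

Publishing the noised counts lets a report justify aggregate decision patterns while limiting what can be inferred about any single data subject.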
The article “Estimating Degradation of Machine Learning Data Assets,” by Mauri and Damiani, proposes a quantitative technique for measuring the quality of the datasets used by AI models. This is an important problem, since low-quality data should not be used by AI models to compute important decisions concerning people. The article also discusses strategies for alleviating the degradation of datasets and advocates the use of compensation data to counterbalance the damage caused by spurious data in a dataset.
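As a hedged sketch of what a degradation measure might look like (the validity checks, field names, and records below are our assumptions, not the measures defined in the article), one could score a dataset by the share of records failing basic validity checks:

```python
# Hypothetical degradation score: the fraction of records that fail at
# least one validity check (0.0 = pristine, 1.0 = fully degraded).

def degradation_score(rows, checks):
    bad = sum(1 for r in rows if any(not check(r) for check in checks))
    return bad / len(rows) if rows else 0.0

checks = [
    lambda r: r.get("age") is not None and 0 <= r["age"] <= 120,  # plausible age
    lambda r: r.get("income") is None or r["income"] >= 0,        # no negative income
]
rows = [
    {"age": 34, "income": 51000},
    {"age": -5, "income": 0},        # spurious: negative age
    {"age": 70, "income": None},     # missing income is tolerated
    {"age": 200, "income": 12000},   # spurious: implausible age
]
print(degradation_score(rows, checks))  # 2 of 4 rows fail -> 0.5
```

In the spirit of the compensation strategy the article advocates, verified records could then be added until such a score falls back below an acceptable threshold.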
The article “Transparent Aspect-Level Sentiment Analysis Based on Dependency Syntax Analysis and Its Application on COVID-19,” by Wang et al., proposes an integrated framework for analyzing people's sentiments on social media and providing decision makers with a clear picture of public opinion. The article addresses transparency at the data level by proposing indicators of how and why data were collected from social media, and then of how they are stored, processed, and analyzed to infer public opinion.

References

[1] Elisa Bertino, Shawn Merrill, Alina Nesen, and Christine Utz. 2019. Redefining data transparency: A multidimensional approach. Computer 52, 1 (2019), 16–26.
[2] Elisa Bertino. 2020. The quest for data transparency. IEEE Secur. Priv. 18, 3 (2020), 67–68.
[3] Simson L. Garfinkel, Jeanna N. Matthews, Stuart S. Shapiro, and Jonathan M. Smith. 2017. Toward algorithmic transparency and accountability. Commun. ACM 60, 9 (2017), 5.
[4] Serge Abiteboul and Julia Stoyanovich. 2019. Transparency, fairness, data protection, neutrality: Data management challenges in the face of new regulation. ACM J. Data Inf. Qual. 11, 3 (2019), 15:1–15:9.
[5] The European Union. 2016. Regulation (EU) 2016/679: General Data Protection Regulation (GDPR). Retrieved August 30, 2021 from https://gdpr-info.eu/.
[6] The California State Legislature. 2018. The California Consumer Privacy Act (CCPA), 2018. Retrieved August 30, 2021 from https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180AB375.

Cited By

Distributed Cooperative Coevolution of Data Publishing Privacy and Transparency. ACM Transactions on Knowledge Discovery from Data 18, 1 (2023), 1–23. DOI: 10.1145/3613962.

Published In

cover image Journal of Data and Information Quality
Journal of Data and Information Quality  Volume 14, Issue 2
June 2022
150 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/3505186
Issue’s Table of Contents

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Data transparency
  2. data provenance
  3. data quality and annotation
  4. privacy
  5. accountability
  6. fairness

Qualifiers

  • Editorial
  • Refereed
