research-article

Open access

Responsible data management

Authors: Julia Stoyanovich, Serge Abiteboul, H. V. Jagadish, and Sebastian Schelter

Communications of the ACM, Volume 65, Issue 6

Pages 64-74

https://doi.org/10.1145/3488717

Published: 20 May 2022

Abstract

Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.



Index Terms

Responsible data management


Published In

Communications of the ACM, Volume 65, Issue 6

June 2022

98 pages

ISSN: 0001-0782

EISSN: 1557-7317

DOI: 10.1145/3538687

Editor: Andrew A. Chien

Association for Computing Machinery, New York, NY

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


