research-article

The Principles of Data-Centric AI

Published: 25 July 2023

Abstract

Uniting data-centric perspectives and concepts to trace the foundations of DCAI.

Published In

Communications of the ACM  Volume 66, Issue 8
August 2023
106 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3610954
Editor: James Larus
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
