Open access

The Craft and Coordination of Data Curation: Complicating Workflow Views of Data Science

Published: 11 November 2022


Data curation is the process of making a dataset fit-for-use and archivable. It is critical to data-intensive science because it makes complex data pipelines possible, studies reproducible, and data reusable. Yet the complexities of the hands-on, technical, and intellectual work of data curation are frequently overlooked or downplayed. Obscuring the work of data curation not only renders the labor and contributions of data curators invisible but also hides the impact that curators' work has on the later usability, reliability, and reproducibility of data. To better understand the work and impact of data curation, we conducted a close examination of data curation at a large social science data repository, the Inter-university Consortium for Political and Social Research (ICPSR). We asked: What does curatorial work entail at ICPSR, and what work is more or less visible to different stakeholders and in different contexts? And how is that curatorial work coordinated across the organization? We triangulated accounts of data curation from interviews and records of curation in Jira tickets to develop a rich and detailed account of curatorial work. While we identified numerous curatorial actions performed by ICPSR curators, we also found that curators rely on a number of craft practices to perform their jobs. The reality of their work practices defies the rote sequence of events implied by many life cycle or workflow models. Further, we show that craft practices are needed to enact data curation best practices and standards. The craft that goes into data curation is often invisible to end users, but it is well recognized by ICPSR curators and their supervisors. Explicitly acknowledging and supporting data curators as craftspeople is important in creating sustainable and successful curatorial infrastructures.



Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 6, Issue CSCW2
November 2022
8205 pages
This work is licensed under a Creative Commons Attribution-ShareAlike International 4.0 License.


Association for Computing Machinery

New York, NY, United States

Author Tags

  1. coordination
  2. craft
  3. data curation
  4. knowledge infrastructure
  5. social science data
  6. workflows


  • Research-article

Cited By

  • (2024) Machine learning data practices through a data curation lens: An evaluation framework. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (1055-1067). https://doi.org/10.1145/3630106.3658955 Online publication date: 3-Jun-2024
  • (2024) Mining Semantic Relations in Data References to Understand the Roles of Research Data in Academic Literature. Proceedings of the 2023 ACM/IEEE Joint Conference on Digital Libraries (215-227). https://doi.org/10.1109/JCDL57899.2023.00039 Online publication date: 26-Jun-2024
  • (2024) Is the climate getting WARMer? A framework and tool for climate data comparison. Environmental Modelling & Software 171:C. https://doi.org/10.1016/j.envsoft.2023.105879 Online publication date: 27-Feb-2024
  • (2024) An empirical examination of data reuser trust in a digital repository. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24933 Online publication date: 20-Jun-2024
  • (2024) Curating the Chinese ancient book catalogs: Leveraging the dual roles of humanities scholars as experts and users in collaborative practice. Journal of the Association for Information Science and Technology. https://doi.org/10.1002/asi.24894 Online publication date: 14-Apr-2024
  • (2023) Platforms, programmability, and precarity: The platformization of research repositories in academic libraries. New Media & Society (146144482311767). https://doi.org/10.1177/14614448231176758 Online publication date: 6-Jun-2023
  • (2023) “A Patchwork of Data Systems”: Quilting as an Analytic Lens and Stabilizing Practice for Knowledge Infrastructures. Science, Technology, & Human Values (016224392311755). https://doi.org/10.1177/01622439231175535 Online publication date: 25-May-2023
  • (2023) Fostering Research Data Management in Collaborative Research Contexts: Lessons learnt from an ‘Embedded’ Evaluation of ‘Data Story’. Computer Supported Cooperative Work 32:4 (911-949). https://doi.org/10.1007/s10606-023-09467-6 Online publication date: 15-May-2023
  • (2023) Towards a Researcher-in-the-loop Driven Curation Approach for Quantitative and Qualitative Research Methods. New Trends in Database and Information Systems (647-655). https://doi.org/10.1007/978-3-031-42941-5_58 Online publication date: 31-Aug-2023
  • (2022) A Novel Tightly Coupled Information System for Research Data Management. Electronics 11:19 (3196). https://doi.org/10.3390/electronics11193196 Online publication date: 5-Oct-2022
