DOI: 10.1145/3311790.3396636
research-article

Toward a Data Lifecycle Model for NSF Large Facilities

Published: 26 July 2020

    Abstract

    National Science Foundation (NSF) large facilities conduct large-scale physical and natural science research. They include telescopes that survey the entire sky, gravitational wave detectors that look deep into our universe's past, sensor-driven field sites that collect a range of biological and environmental data, and more. The Cyberinfrastructure Center of Excellence (CICoE) pilot project aims to develop a model for a center that facilitates community building, fosters knowledge sharing, and applies best practices in consulting with large facilities on their cyberinfrastructure. To accomplish this goal, the pilot began an in-depth study of how large facilities manage their data over the course of their research. Large facilities are diverse and highly complex in the types of data they capture, the equipment they use, the data processing and analysis they conduct, and their policies on data sharing and use. Because of this complexity, the pilot needed a single lens through which to frame its growing understanding of large facilities and to identify where it could best serve them. The pilot's research revealed common themes across large facilities, which enabled the creation of a data lifecycle model that successfully captures their data management practices. This model has enabled the pilot to organize its thinking about large facilities and to frame its support and consultation efforts around the cyberinfrastructure used at each lifecycle stage. This paper describes the model and discusses how it was applied to disaster recovery planning for a representative large facility, IceCube.
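    The abstract describes using lifecycle stages as a lens for organizing cyberinfrastructure support and for disaster recovery planning. Below is a minimal, hypothetical Python sketch of that idea, not the authors' model. It assumes each stage can be listed with the cyberinfrastructure assets it relies on and whether each asset has a recovery plan, and it reports the stages whose assets are not covered. The stage names, asset names, and the `Stage`, `Asset`, and `recovery_gaps` identifiers are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A piece of cyberinfrastructure (hypothetical example)."""
    name: str
    has_recovery_plan: bool


@dataclass
class Stage:
    """One stage of a data lifecycle and the assets used during it."""
    name: str
    assets: list[Asset] = field(default_factory=list)


def recovery_gaps(lifecycle: list[Stage]) -> dict[str, list[str]]:
    """Map each stage name to the assets it uses that lack a recovery plan.

    Stages whose assets are all covered are omitted from the result.
    """
    gaps: dict[str, list[str]] = {}
    for stage in lifecycle:
        missing = [a.name for a in stage.assets if not a.has_recovery_plan]
        if missing:
            gaps[stage.name] = missing
    return gaps


# Hypothetical stages and assets for illustration only; not the paper's model.
lifecycle = [
    Stage("capture", [Asset("instrument acquisition servers", True)]),
    Stage("processing", [Asset("compute cluster", True), Asset("job database", False)]),
    Stage("storage", [Asset("disk archive", False), Asset("tape archive", True)]),
]

print(recovery_gaps(lifecycle))
```

    Run as written, the sketch reports the processing and storage stages together with their uncovered assets. Grouping assets by stage, rather than keeping one flat inventory, is what allows a gap to be stated in lifecycle terms.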

    Supplemental Material

    Presentation video (MP4 file)

    Information

    Published In

    PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
    July 2020
    556 pages
    ISBN:9781450366892
    DOI:10.1145/3311790
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. cyberinfrastructure
    2. data lifecycle
    3. data management
    4. disaster recovery
    5. large facilities
    6. research computing

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    PEARC '20

    Acceptance Rates

    Overall Acceptance Rate 133 of 202 submissions, 66%

    Article Metrics

    • Downloads (last 12 months): 18
    • Downloads (last 6 weeks): 2

    Cited By

    • (2024) Design Thinking for Human Centric Research Data Systems Engineering. Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 1–7. DOI: 10.1145/3626203.3670542. Online publication date: 17-Jul-2024.
    • (2023) FAIR Data for Large Research Facilities. 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 312–313. DOI: 10.1109/JCDL57899.2023.00073. Online publication date: Jun-2023.
    • (2023) A Survey-Based Evaluation of the Data Engineering Maturity in Practice. Data Management Technologies and Applications, 1–23. DOI: 10.1007/978-3-031-37890-4_1. Online publication date: 23-Jul-2023.
    • (2023) Conceptualizing Data Behavior: Bridging Data-centric and User-centric Approaches. Proceedings of the Association for Information Science and Technology 60, 1, 856–860. DOI: 10.1002/pra2.878. Online publication date: 22-Oct-2023.
    • (2022) What Does Information Science Offer for Data Science Research?: A Review of Data and Information Ethics Literature. Journal of Data and Information Science 7, 4, 16–38. DOI: 10.2478/jdis-2022-0018. Online publication date: 8-Sep-2022.
    • (2022) A New Lifecycle Model Enabling Optimal Digital Curation. Journal of Librarianship and Information Science 56, 1, 241–266. DOI: 10.1177/09610006221125956. Online publication date: 15-Dec-2022.
    • (2022) Broadening Student Participation in Cyberinfrastructure Research and Development. Practice and Experience in Advanced Research Computing 2022: Revolutionary: Computing, Connections, You, 1–2. DOI: 10.1145/3491418.3535175. Online publication date: 8-Jul-2022.
    • (2021) DaLiF: a data lifecycle framework for data-driven governments. Journal of Big Data 8, 1. DOI: 10.1186/s40537-021-00481-3. Online publication date: 14-Jun-2021.