Abstract
This paper describes a formal structure for keeping track of files, source code, scripts, and related material for large-scale Earth science data production. We first describe the environment and processes that govern this configuration management problem. Then, we show that a graph with typed nodes and arcs can describe the derivation of production design and of the produced files and their metadata. The graph provides three useful by-products:
- a hierarchical data file inventory structure that can help system users find particular files,
- methods for creating production graphs that govern job scheduling and provenance graphs that track all of the sources and transformations between raw data input and a particular output file,
- a systematic relationship between different elements of the structure and development documentation.
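
To make the idea of a graph with typed nodes and arcs more concrete, the following is a minimal sketch in Python, not the paper's actual design. The class name `TypedGraph`, the node types `"file"` and `"job"`, the arc types `"input"` and `"output"`, and all file and job names are hypothetical choices made for illustration. It shows how one structure might yield both a job schedule (a production graph) and the full set of sources behind a particular output file (a provenance graph).

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class TypedGraph:
    """Directed graph whose nodes and arcs carry a type label (illustrative only)."""

    def __init__(self):
        self.node_type = {}               # node name -> type label
        self.inputs = defaultdict(list)   # target -> [(source, arc type), ...]

    def add_node(self, name, node_type):
        self.node_type[name] = node_type

    def add_arc(self, source, target, arc_type):
        # Both endpoints must already be declared as typed nodes.
        for name in (source, target):
            if name not in self.node_type:
                raise KeyError(f"unknown node: {name}")
        self.inputs[target].append((source, arc_type))

    def provenance(self, name):
        """Return every node that `name` is derived from, directly or indirectly."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for source, _arc_type in self.inputs.get(current, []):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen

    def schedule(self):
        """Return the job nodes in an order that respects all dependencies."""
        deps = {
            node: {source for source, _ in self.inputs.get(node, [])}
            for node in self.node_type
        }
        order = TopologicalSorter(deps).static_order()
        return [node for node in order if self.node_type[node] == "job"]


# Hypothetical two-step production chain: raw data -> calibrated -> gridded.
g = TypedGraph()
for file_name in ("raw_L0", "calib_coeffs", "L1", "L2"):
    g.add_node(file_name, "file")
for job_name in ("calibrate", "grid"):
    g.add_node(job_name, "job")

g.add_arc("raw_L0", "calibrate", "input")
g.add_arc("calib_coeffs", "calibrate", "input")
g.add_arc("calibrate", "L1", "output")
g.add_arc("L1", "grid", "input")
g.add_arc("grid", "L2", "output")

print(g.schedule())
print(sorted(g.provenance("L2")))
```

In this sketch, the schedule falls out of a topological sort over the same arcs that the provenance query walks backward, which is one way to read the claim that a single graph serves both purposes.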
© 2003 Springer-Verlag Berlin Heidelberg
Barkstrom, B.R. (2003). Data Product Configuration Management and Versioning in Large-Scale Production of Satellite Scientific Data. In: Westfechtel, B., van der Hoek, A. (eds) Software Configuration Management. SCM 2001, SCM 2003. Lecture Notes in Computer Science, vol 2649. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39195-9_9
DOI: https://doi.org/10.1007/3-540-39195-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-14036-8
Online ISBN: 978-3-540-39195-1
eBook Packages: Springer Book Archive