Abstract
Initially developed for the scientific community, Grid computing is now attracting considerable interest in other important areas, such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles and basic techniques behind data management in distributed systems. Then we specify the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. This implies substantially revisiting distributed database techniques, in particular by using peer-to-peer (P2P) techniques.
Additional information
This work was partially funded by ARA “Massive Data” of the French Ministry of Research (project Respire), the European STREP Grid4All project, the CAPES–COFECUB Daad project and the CNPq–INRIA Gridata project.
Cite this article
Pacitti, E., Valduriez, P. & Mattoso, M. Grid Data Management: Open Problems and New Issues. J Grid Computing 5, 273–281 (2007). https://doi.org/10.1007/s10723-007-9081-9