Grid Data Management: Open Problems and New Issues

Journal of Grid Computing

Abstract

Initially developed for the scientific community, Grid computing is now attracting considerable interest in important areas such as enterprise information systems. This makes data management critical, since the techniques must scale up while addressing the autonomy, dynamicity and heterogeneity of the data sources. In this paper, we discuss the main open problems and new issues related to Grid data management. We first recall the main principles behind data management in distributed systems and the basic techniques. We then state precisely the requirements for Grid data management. Finally, we introduce the main techniques needed to address these requirements. Doing so requires revisiting distributed database techniques in major ways, in particular by using P2P techniques.

Author information

Correspondence to Patrick Valduriez.

Additional information

Work partially funded by ARA “Massive Data” of the French Ministry of Research (project Respire), the European STREP Grid4All project, the CAPES–COFECUB Daad project and the CNPq–INRIA Gridata project.

About this article

Cite this article

Pacitti, E., Valduriez, P. & Mattoso, M. Grid Data Management: Open Problems and New Issues. J Grid Computing 5, 273–281 (2007). https://doi.org/10.1007/s10723-007-9081-9

DOI: https://doi.org/10.1007/s10723-007-9081-9