
Personal adaptive clusters as containers for scientific jobs

  • Original Paper
  • Published in: Cluster Computing

Abstract

We describe a system for creating personal clusters in user-space to support the submission and management of thousands of compute-intensive serial jobs to the network-connected compute resources on the NSF TeraGrid. The system implements a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters created dynamically for a user on-demand. The personal clusters then adapt to the prevailing job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. Furthermore, the system allows multiple instances of these personal clusters to be created as containers for individual scientific experiments, allowing the submission environment to be customized for each instance. The version of the system described in this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users then manage their scientific jobs, within each personal cluster, with a single uniform interface using the feature-rich functionality found in these job management environments.
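The abstract describes job proxies that pool remote batch slots into a per-user cluster and migrate between sites as queue-wait estimates change. The sketch below illustrates only that adaptive-migration idea in a simplified form; the class names (Site, JobProxy, PersonalCluster), the wait-time threshold, and the greedy rebalancing rule are illustrative assumptions, not the paper's actual GridShell/Condor-based implementation on the TeraGrid.

```python
from dataclasses import dataclass, field

@dataclass
class Site:
    """A remote compute site to which job proxies are submitted (hypothetical)."""
    name: str
    expected_wait: float                         # estimated queue wait time, in minutes
    pending: list = field(default_factory=list)  # proxies still queued at this site

@dataclass
class JobProxy:
    """A placeholder batch job; once it starts running it donates its slot
    to the user's personal cluster."""
    proxy_id: int
    site: Site

class PersonalCluster:
    """One per-user cluster instance acting as a container for an experiment."""

    def __init__(self, sites, n_proxies):
        self.sites = sites
        self.proxies = []
        # Spread the initial proxy submissions across the available sites.
        for i in range(n_proxies):
            site = sites[i % len(sites)]
            proxy = JobProxy(i, site)
            site.pending.append(proxy)
            self.proxies.append(proxy)

    def rebalance(self, wait_threshold=30.0):
        """Migrate still-pending proxies away from sites whose estimated wait
        has grown beyond the threshold, toward the currently fastest site --
        a simplified version of the adaptive behaviour described above."""
        best = min(self.sites, key=lambda s: s.expected_wait)
        for site in self.sites:
            if site is best or site.expected_wait <= wait_threshold:
                continue
            while site.pending:
                proxy = site.pending.pop()
                proxy.site = best
                best.pending.append(proxy)
                print(f"migrated proxy {proxy.proxy_id}: {site.name} -> {best.name}")

if __name__ == "__main__":
    sites = [Site("site-A", expected_wait=10.0), Site("site-B", expected_wait=45.0)]
    cluster = PersonalCluster(sites, n_proxies=4)
    sites[0].expected_wait = 5.0   # pretend the queue-wait estimates were refreshed
    cluster.rebalance()
```

In the system itself, the job proxies are batch jobs submitted to the TeraGrid sites; once running, they contribute their resources to a personal Condor or Sun Grid Engine cluster, through which the user manages the actual scientific jobs with a single uniform interface.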



Author information

Corresponding author

Correspondence to Edward Walker.


About this article

Cite this article

Walker, E., Gardner, J.P., Litvin, V. et al. Personal adaptive clusters as containers for scientific jobs. Cluster Comput 10, 339–350 (2007). https://doi.org/10.1007/s10586-007-0028-5


Keywords