Personal adaptive clusters as containers for scientific jobs

Walker, Edward; Gardner, Jeffrey P.; Litvin, Vladimir; Turner, Evan L.

doi:10.1007/s10586-007-0028-5

Original Paper
Cluster Computing, Volume 10, pages 339–350 (2007)
Published: 14 June 2007

Abstract

We describe a system for creating personal clusters in user-space to support the submission and management of thousands of compute-intensive serial jobs to the network-connected compute resources on the NSF TeraGrid. The system implements a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters created dynamically for a user on demand. The personal clusters then adapt to the prevailing job load conditions at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. Furthermore, the system allows multiple instances of these personal clusters to be created as containers for individual scientific experiments, allowing the submission environment to be customized for each instance. The version of the system described in this paper allows users to build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Users then manage their scientific jobs within each personal cluster through a single uniform interface, using the feature-rich functionality found in these job management environments.
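The adaptive behavior described above, migrating job proxies to sites expected to provide resources more quickly, can be pictured as a periodic rebalancing pass. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the `Proxy` class, the `rebalance` function, the per-site `expected_wait` estimates (in seconds) and the `margin` threshold are all hypothetical, and how wait times are predicted is left open. Only proxies still waiting in a batch queue are considered, on the assumption that a running proxy is already contributing resources to the personal cluster.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Proxy:
    """A hypothetical job proxy submitted to one site's batch queue."""
    proxy_id: str
    site: str
    running: bool = False  # True once the proxy has joined the personal cluster


def rebalance(
    proxies: List[Proxy],
    expected_wait: Dict[str, float],
    margin: float = 60.0,
) -> List[Tuple[str, str, str]]:
    """Move queued proxies to the site with the lowest expected wait.

    Hypothetical policy: a queued proxy moves only if its current site's
    expected wait exceeds the best site's by more than `margin` seconds,
    which damps back-and-forth migration over small differences.
    Returns one (proxy_id, from_site, to_site) tuple per migration.
    """
    best_site = min(expected_wait, key=expected_wait.get)
    moves = []
    for p in proxies:
        if p.running:
            continue  # already contributing resources; leave it in place
        if expected_wait[p.site] - expected_wait[best_site] > margin:
            moves.append((p.proxy_id, p.site, best_site))
            p.site = best_site  # record the new submission target
    return moves


if __name__ == "__main__":
    proxies = [Proxy("a", "siteA"), Proxy("b", "siteB"), Proxy("c", "siteA", running=True)]
    wait = {"siteA": 3600.0, "siteB": 120.0, "siteC": 900.0}
    print(rebalance(proxies, wait))
```

In this sketch a migration is only recorded; a real system would also have to cancel the queued proxy at the old site and resubmit it at the new one, and would re-estimate wait times between passes.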

Author information

Authors and Affiliations

Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, 78758, USA
Edward Walker
Pittsburgh Supercomputing Center, Pittsburgh, PA, 15213, USA
Jeffrey P. Gardner
High Energy Physics Group, California Institute of Technology, Pasadena, CA, 91125, USA
Vladimir Litvin
Texas Advanced Computing Center, University of Texas at Austin, Austin, TX, 78758, USA
Evan L. Turner

Corresponding author

Correspondence to Edward Walker.

About this article

Cite this article

Walker, E., Gardner, J.P., Litvin, V. et al. Personal adaptive clusters as containers for scientific jobs. Cluster Comput 10, 339–350 (2007). https://doi.org/10.1007/s10586-007-0028-5

Issue Date: September 2007
DOI: https://doi.org/10.1007/s10586-007-0028-5
