Abstract
We describe a system for creating personal clusters in user space to support the submission and management of thousands of compute-intensive serial jobs on the network-connected compute resources of the NSF TeraGrid. The system provides a robust infrastructure that submits and manages job proxies across a distributed computing environment. These job proxies contribute resources to personal clusters that are created dynamically, on demand, for each user. The personal clusters then adapt to the prevailing job load at the distributed sites by migrating job proxies to sites expected to provide resources more quickly. The system also allows multiple personal clusters to be created as containers for individual scientific experiments, so that the submission environment can be customized for each one. The version of the system described in this paper lets users build large personal Condor and Sun Grid Engine clusters on the TeraGrid. Within each personal cluster, users manage their scientific jobs through a single, uniform interface, using the feature-rich functionality of these job management environments.
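The abstract does not say how a site's expected wait is estimated or when a proxy is moved. The sketch below illustrates one plausible form of the migration step under stated assumptions, not the authors' algorithm. Each site is assumed to expose a hypothetical `est_wait_s` estimate. Only proxies still waiting in a batch queue are moved, because running proxies already contribute slots. A hypothetical `threshold_s` damps churn between sites with similar waits. `Site`, `rebalance`, and both parameters are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical view of one TeraGrid site held by the personal-cluster manager."""
    name: str
    est_wait_s: float                            # assumed estimate of queue wait for a new proxy (s)
    queued: list = field(default_factory=list)   # proxies submitted but not yet running
    running: list = field(default_factory=list)  # proxies already contributing slots


def rebalance(sites, threshold_s=600.0):
    """Move queued job proxies to the site expected to start them soonest.

    Running proxies are never moved. A site's queued proxies move only if its
    estimated wait exceeds the best site's by more than threshold_s. In a real
    system each move would cancel the queued batch job and resubmit it at the
    target site. Estimates are not updated after a move in this sketch.
    Returns a list of (proxy, from_site, to_site) tuples.
    """
    moves = []
    best = min(sites, key=lambda s: s.est_wait_s)
    for site in sites:
        if site is best:
            continue
        if site.est_wait_s - best.est_wait_s <= threshold_s:
            continue  # expected saving too small to justify resubmission
        for proxy in site.queued:
            best.queued.append(proxy)
            moves.append((proxy, site.name, best.name))
        site.queued.clear()
    return moves


if __name__ == "__main__":
    sites = [
        Site("siteA", 3600.0, ["p1", "p2"]),
        Site("siteB", 120.0),
        Site("siteC", 400.0, ["p3"], ["p4"]),
    ]
    print(rebalance(sites))
    # -> [('p1', 'siteA', 'siteB'), ('p2', 'siteA', 'siteB')]
```

In this example, siteC's queued proxy stays put because its estimated saving (280 s) falls below the threshold. A greedy "move everything to the fastest site" rule is the simplest choice here. A fuller policy would update estimates as proxies arrive so that one site is not flooded.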
Cite this article
Walker, E., Gardner, J.P., Litvin, V. et al. Personal adaptive clusters as containers for scientific jobs. Cluster Comput 10, 339–350 (2007). https://doi.org/10.1007/s10586-007-0028-5