Abstract
This paper proposes a prediction engine for non-dedicated clusters that estimates the turnaround time of parallel applications, even in the presence of the serial workload of the workstation owner. The prediction engine can be configured to work with three different estimation kernels: a Historical kernel, a Simulation kernel based on analytical models, and an integration of both, named the Hybrid kernel. These estimation methods were integrated into a scheduling system, named CISNE, which can be executed in on-line or off-line mode. The accuracy of the proposed estimation methods was evaluated under different job scheduling policies in both a real and a simulated cluster environment. In both environments, we observed that the Hybrid kernel gives the best results because it combines the ability of a simulation engine to capture the dynamism of a non-dedicated environment with the accuracy of historical methods in estimating application runtime given the state of the resources.
Additional information
This work was supported by the MEyC under Grant No. TIN 2008-05913.
Cite this article
Hanzich, M., Hernández, P., Giné, F. et al. On/Off-Line Prediction Applied to Job Scheduling on Non-Dedicated NOWs. J. Comput. Sci. Technol. 26, 99–116 (2011). https://doi.org/10.1007/s11390-011-9418-5
DOI: https://doi.org/10.1007/s11390-011-9418-5