A resource query interface for network-aware applications

Published: 01 April 1999

Abstract

Networked systems provide a cost-effective platform for parallel computing, but the applications have to deal with the changing availability of computation and communication resources. Network-awareness is a recent attempt to bridge the gap between the realities of networks and the demands of applications. Network-aware applications obtain information about their execution environment and dynamically adapt to enhance their performance. Adaptation is especially important for synchronous parallel applications because a single busy communication link can become the bottleneck and degrade overall performance dramatically. This paper presents Remos, a uniform API that allows applications to obtain relevant network information, and reports on the development of parallel applications in this environment. The challenges in defining a uniform interface include network heterogeneity, diversity and variability in network traffic, and resource sharing in the network and even inside an application. The first implementation of the Remos interface uses SNMP to monitor IP-based networks. This paper reports on our methodology for developing adaptive parallel applications for high-speed networks with Remos and presents experimental results using applications generated by the Fx parallelizing compiler. The results highlight the importance and effectiveness of adaptive parallel computing.
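
The bottleneck effect described in the abstract can be illustrated with a minimal sketch. This is not the Remos interface: the function names, link labels, and bandwidth figures below are hypothetical, chosen only to show why one busy link can dominate a synchronous step. It assumes that a path's available bandwidth equals that of its slowest link and that a synchronous step ends only when the slowest transfer ends.

```python
# Hypothetical illustration only: names and numbers are assumptions,
# not part of the Remos API described in the paper.

def path_bandwidth(path, link_bandwidth):
    """Available bandwidth of a path is that of its slowest link."""
    return min(link_bandwidth[link] for link in path)


def step_time(message_bytes, paths, link_bandwidth):
    """A synchronous step finishes only when the slowest transfer finishes."""
    return max(message_bytes / path_bandwidth(p, link_bandwidth) for p in paths)


# Available bandwidth per link in bytes/second (made-up values):
# three hosts a, b, c attached to one switch s.
links = {"a-s": 12.5e6, "b-s": 12.5e6, "c-s": 12.5e6}
paths = [("a-s", "b-s"), ("b-s", "c-s"), ("a-s", "c-s")]

idle = step_time(1e6, paths, links)   # all links equally available

links["c-s"] = 1.25e6                 # one link becomes busy
busy = step_time(1e6, paths, links)   # every step now waits on that link

print(f"idle step: {idle:.2f} s, busy step: {busy:.2f} s")
```

In this toy setting, an application that could query link availability might respond by excluding host c from the computation, which is the kind of adaptation the abstract motivates.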

Published In

Cluster Computing, Volume 2, Issue 2 (1999), 69 pages

Publisher

Kluwer Academic Publishers, United States

