research-article

Distributed Performance Monitoring: Methods, Tools, and Applications

Published: 01 June 1994

    Abstract

    A method for analyzing the functional behavior and the performance of programs in distributed systems is presented. We use hybrid monitoring, a technique that combines the advantages of software monitoring and hardware monitoring. The paper describes a hardware monitor and a software package (ZM4/SIMPLE) that make our concepts available to programmers, assisting them in debugging and tuning their code. A short survey of related monitor systems highlights the distinguishing features of our implementation. As an application of our monitoring and evaluation system, the analysis of a parallel ray tracing program running on the SUPRENUM multiprocessor is described. It is shown that monitoring and modeling both rely on a common abstraction of a system's dynamic behavior and can therefore be integrated into one comprehensive methodology. This methodology is supported by a set of tools.

      Published In

      IEEE Transactions on Parallel and Distributed Systems  Volume 5, Issue 6
      June 1994
      115 pages

      Publisher

      IEEE Press

      Author Tags

      1. program debugging
      2. SUPRENUM
      3. common abstraction
      4. debugging
      5. distributed processing
      6. distributed systems
      7. dynamic behavior
      8. functional behavior
      9. hybrid monitoring
      10. monitoring
      11. parallel ray tracing program
      12. performance evaluation
      13. performance monitoring
      14. system monitoring
      15. tuning

      Qualifiers

      • Research-article

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 0
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 29 Jul 2024

      Cited By

      • (2023) Combining gprof and event-driven monitoring for analyzing distributed programs: a rough view of NCSA mosaic. Journal of Computer Science and Technology, 11:4 (427-432). DOI: 10.1007/BF02948487. Online publication date: 22-Mar-2023
      • (2020) tpprof. Proceedings of the 17th Usenix Conference on Networked Systems Design and Implementation (1015-1030). DOI: 10.5555/3388242.3388314. Online publication date: 25-Feb-2020
      • (2018) Non-Intrusive In-Situ Requirements Monitoring of Embedded System. ACM Transactions on Design Automation of Electronic Systems, 23:5 (1-27). DOI: 10.1145/3206213. Online publication date: 20-Aug-2018
      • (2014) Enabling Resource Access Visibility for Automated Enterprise Services. Journal of Database Management, 25:2 (1-28). DOI: 10.4018/jdm.2014040101. Online publication date: 1-Apr-2014
      • (2014) Passive performance testing of network protocols. Computer Communications, 51 (36-47). DOI: 10.1016/j.comcom.2014.06.001. Online publication date: 1-Sep-2014
      • (2012) Monitoring service choreographies from multiple sources. Proceedings of the 4th international conference on Software Engineering for Resilient Systems (134-149). DOI: 10.1007/978-3-642-33176-3_10. Online publication date: 27-Sep-2012
      • (2006) An entropy-based algorithm for time-driven software instrumentation in parallel systems. Proceedings of the 20th international conference on Parallel and distributed processing (331-331). DOI: 10.5555/1898699.1898874. Online publication date: 25-Apr-2006
      • (2003) The role of event description in architecting dependable systems. Architecting dependable systems (150-174). DOI: 10.5555/1768179.1768188. Online publication date: 1-Jan-2003
      • (2003) Tools and techniques for performance measurement of large distributed multiagent systems. Proceedings of the second international joint conference on Autonomous agents and multiagent systems (843-850). DOI: 10.1145/860575.860711. Online publication date: 14-Jul-2003
      • (2000) A hierarchical Quality of Service control architecture for configurable multimedia applications. Journal of High Speed Networks, 9:3,4 (153-174). DOI: 10.5555/1293659.1293663. Online publication date: 1-Dec-2000
