Article (DOI: 10.1145/1088149.1088190)

System noise, OS clock ticks, and fine-grained parallel applications

Published: 20 June 2005

Abstract

As parallel jobs get bigger in size and finer in granularity, "system noise" is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP nodes run faster if a processor is intentionally left idle (per node), thus enabling a separation of "system noise" from the computation. Paying a cost in average processing speed at a node for the sake of eliminating occasional process delays is (unfortunately) beneficial, as such delays are enormously magnified when one late process holds up thousands of peers with which it synchronizes.

We provide a probabilistic argument showing that, under certain conditions, the effect of such noise is linearly proportional to the size of the cluster (as is often empirically observed). We then identify a major source of noise to be the indirect overhead of periodic OS clock interrupts ("ticks"), which are used by all general-purpose OSs as a means of maintaining control. This is shown for various grain sizes, platforms, tick frequencies, and OSs. To eliminate such noise, we suggest replacing ticks with an alternative mechanism we call "smart timers". This turns out to also be in line with the needs of desktop and mobile computing, increasing the chances that the suggested change will be accepted.
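
The linear-scaling claim can be illustrated with a minimal sketch. This is not the paper's derivation: it assumes, purely for illustration, that in each compute phase every node is delayed independently with the same small probability p, and that the phase is held up whenever at least one of the n synchronizing nodes is late. Under those assumptions the probability of a delayed phase is 1 - (1 - p)^n, which stays close to n*p as long as n*p is much smaller than 1. The function names and the value of p are hypothetical.

```python
def prob_phase_delayed(p: float, n: int) -> float:
    """Probability that at least one of n nodes is late in a phase.

    Assumes (for illustration only) that each node is delayed
    independently with the same per-phase probability p.
    """
    return 1.0 - (1.0 - p) ** n


def linear_approx(p: float, n: int) -> float:
    """First-order approximation n*p, reasonable only while n*p << 1."""
    return n * p


if __name__ == "__main__":
    p = 1e-6  # hypothetical per-node, per-phase delay probability
    for n in (1, 10, 100, 1_000, 10_000):
        exact = prob_phase_delayed(p, n)
        approx = linear_approx(p, n)
        print(f"n={n:>6}  exact={exact:.3e}  linear={approx:.3e}")
```

In this toy model the chance that a phase is delayed grows roughly tenfold for each tenfold increase in cluster size, until n*p approaches 1 and the curve saturates. The linear regime is the "certain conditions" part of the sketch; the actual conditions used in the paper may differ.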

Published In

ICS '05: Proceedings of the 19th annual international conference on Supercomputing
June 2005
414 pages
ISBN: 1595931678
DOI: 10.1145/1088149
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HPC
  2. modeling system noise
  3. operating systems
  4. smart timers
  5. synchronization
  6. ticks
  7. timer interrupts
  8. timing services

Qualifiers

  • Article

Conference

ICS05: International Conference on Supercomputing 2005
June 20-22, 2005
Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 29
  • Downloads (last 6 weeks): 0
Reflects downloads up to 01 Nov 2024

Cited By

  • (2024) A Stochastic Composite Model to Understand the Impact of Rare, Colossal Interference in HPC Systems. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 1153-1155. DOI: 10.1109/IPDPSW63119.2024.00189. Online publication date: 27-May-2024
  • (2024) Profiling LAMMPS for GPU Disaggregation. 2024 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE), pp. 78-79. DOI: 10.1109/CCECE59415.2024.10667118. Online publication date: 6-Aug-2024
  • (2023) Analysis and Characterization of Performance Variability for OpenMP Runtime. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, pp. 1614-1622. DOI: 10.1145/3624062.3624239. Online publication date: 12-Nov-2023
  • (2023) The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs. IEEE Transactions on Parallel and Distributed Systems 34:2, pp. 623-638. DOI: 10.1109/TPDS.2022.3221085. Online publication date: 1-Feb-2023
  • (2023) Operating System Noise in the Linux Kernel. IEEE Transactions on Computers 72:1, pp. 196-207. DOI: 10.1109/TC.2022.3187351. Online publication date: 1-Jan-2023
  • (2023) Revisiting Performance Evaluation in the Age of Uncertainty. 2023 IEEE 30th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), pp. 23-30. DOI: 10.1109/HiPCW61695.2023.00012. Online publication date: 18-Dec-2023
  • (2023) Lightweight Noise Detection. Performance Analysis of Parallel Applications for HPC, pp. 165-197. DOI: 10.1007/978-981-99-4366-1_7. Online publication date: 19-Jun-2023
  • (2023) Evaluating the Potential of Coscheduling on High-Performance Computing Systems. Job Scheduling Strategies for Parallel Processing, pp. 155-172. DOI: 10.1007/978-3-031-43943-8_8. Online publication date: 15-Sep-2023
  • (2023) Evaluating the Impact of MPI Network Sharing on HPC Applications. Parallel Computational Technologies, pp. 3-18. DOI: 10.1007/978-3-031-38864-4_1. Online publication date: 25-Jul-2023
  • (2022) Leveraging Code Snippets to Detect Variations in the Performance of HPC Systems. IEEE Transactions on Parallel and Distributed Systems 33:12, pp. 3558-3574. DOI: 10.1109/TPDS.2022.3158742. Online publication date: 1-Dec-2022
