DOI: 10.1145/3295816.3295819
research-article

Evaluation of NTP/PTP fine-grain synchronization performance in HPC clusters

Published: 04 November 2018

Abstract

Fine-grain time synchronization is important for addressing several challenges in current and future High Performance Computing (HPC) centers. Among them, (i) co-scheduling techniques for parallel applications with sensitive bulk-synchronous workloads, (ii) performance analysis tools, and (iii) autotuning strategies that exploit State-of-the-Art (SoA) high-resolution monitoring systems are three examples where synchronization within a few microseconds is required. Previous works report custom solutions that reach this performance without incurring the extra cost of dedicated hardware. On the other hand, the benefits of using robust standards that are widely supported by the community, such as the Network Time Protocol (NTP) and the Precision Time Protocol (PTP), are evident. With today's software and hardware improvements to these two protocols and their off-the-shelf integration in SoA HPC servers, expensive extra hardware is no longer required, but an evaluation of their performance in supercomputing clusters is needed. Our results show that NTP can reach an accuracy of 2.6 μs and a precision below 2.7 μs on computing nodes, with negligible overhead. With PTP and low-cost switches (no GPS antenna needed), these values can be bounded below one microsecond. Both protocols are also suitable for data time-stamping in SoA HPC monitoring infrastructures. We validate their performance with two real use cases and quantify scalability and CPU overhead. Finally, we report the software settings and low-cost network configuration used to reach these high-precision synchronization results.
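Both NTP and PTP estimate a node's clock offset from a four-timestamp request/response exchange. The sketch below is illustrative and not taken from the paper: it applies the on-wire offset and delay formulas defined in RFC 5905 (NTPv4), and PTP's delay request-response mechanism relies on an analogous calculation, typically with hardware timestamps. The names `Exchange`, `offset_and_delay`, and `summarize` are hypothetical, and the accuracy/precision definitions used in `summarize` (mean absolute offset and standard deviation of the offsets) are assumptions, since the abstract does not state how the paper defines these metrics.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence, Tuple


@dataclass
class Exchange:
    """One hypothetical client/server timestamp exchange, in microseconds.

    t1: request sent     (client clock)
    t2: request received (server clock)
    t3: reply sent       (server clock)
    t4: reply received   (client clock)
    """
    t1: float
    t2: float
    t3: float
    t4: float


def offset_and_delay(x: Exchange) -> Tuple[float, float]:
    """Return (offset, delay) using the RFC 5905 on-wire formulas.

    offset: estimated server clock minus client clock
    delay:  round-trip network delay, excluding server processing time
    """
    offset = ((x.t2 - x.t1) + (x.t3 - x.t4)) / 2.0
    delay = (x.t4 - x.t1) - (x.t3 - x.t2)
    return offset, delay


def summarize(offsets: Sequence[float]) -> Tuple[float, float]:
    """Assumed metrics: accuracy = mean |offset|, precision = std. dev. of offsets."""
    accuracy = mean(abs(o) for o in offsets)
    precision = pstdev(offsets)
    return accuracy, precision


if __name__ == "__main__":
    # Client 5 us behind server, symmetric 10 us one-way delay, 3 us server processing.
    print(offset_and_delay(Exchange(100.0, 115.0, 118.0, 123.0)))  # (5.0, 20.0)
    # Same clocks, but 12 us forward / 8 us return: offset estimate is off by 2 us.
    print(offset_and_delay(Exchange(100.0, 117.0, 120.0, 123.0)))  # (7.0, 20.0)
```

The formulas assume symmetric forward and return paths: any asymmetry appears as an offset error equal to half the difference between the two one-way delays, as the second example shows. This is one reason hardware timestamping and switch support matter when targeting microsecond or sub-microsecond results.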

        Information

        Published In

        ACM Other conferences
        ANDARE '18: Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems
        November 2018
        36 pages
        ISBN:9781450365918
        DOI:10.1145/3295816
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 04 November 2018

        Author Tags

        1. HPC clusters
        2. MPI
        3. NTP
        4. PTP
        5. fine grain synchronization
        6. power and performance monitoring

        Qualifiers

        • Research-article

        Funding Sources

        • EC
        • E4 Computer Engineering SpA

        Conference

        ANDARE'18

        Acceptance Rates

        Overall Acceptance Rate 3 of 4 submissions, 75%

        Bibliometrics

        Article Metrics

        • Downloads (last 12 months): 18
        • Downloads (last 6 weeks): 3
        Reflects downloads up to 12 Jan 2025.

        Cited By

        • (2023) Synchronizing MPI Processes in Space and Time. Proceedings of the 30th European MPI Users' Group Meeting, pp. 1–11. DOI: 10.1145/3615318.3615325. Online publication date: 11-Sep-2023.
        • (2023) Ethernet-based timing system for accelerator facilities. Computer Networks: The International Journal of Computer and Telecommunications Networking 233:C. DOI: 10.1016/j.comnet.2023.109897. Online publication date: 1-Sep-2023.
        • (2022) The Time Synchronization Problem in data-intense Manufacturing. Procedia CIRP 107, pp. 827–832. DOI: 10.1016/j.procir.2022.05.070. Online publication date: 2022.
        • (2021) Sub-Frame Evaluation of Frame Synchronization for Camera Network Using Linearly Oscillating Light Spot. Sensors 21:18, 6148. DOI: 10.3390/s21186148. Online publication date: 13-Sep-2021.
        • (2021) DiG: enabling out-of-band scalable high-resolution monitoring for data-center analytics, automation and control (extended). Cluster Computing. DOI: 10.1007/s10586-020-03219-7. Online publication date: 7-Jan-2021.
        • (2021) Methodology for Design and Implementation an Efficient HPC Cluster. High Performance Computing, pp. 71–85. DOI: 10.1007/978-3-030-68035-0_6. Online publication date: 3-Feb-2021.
        • (2020) pAElla: Edge AI-Based Real-Time Malware Detection in Data Centers. IEEE Internet of Things Journal 7:10, pp. 9589–9599. DOI: 10.1109/JIOT.2020.2986702. Online publication date: Oct-2020.
        • (2019) Online Anomaly Detection in HPC Systems. 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pp. 229–233. DOI: 10.1109/AICAS.2019.8771527. Online publication date: Mar-2019.
