-
A Dual Digraph Approach for Leaderless Atomic Broadcast (Extended Version)
Authors:
Marius Poke,
Colin W. Glass
Abstract:
Many distributed systems work on a common shared state; in such systems, distributed agreement is necessary for consistency. With an increasing number of servers, these systems become more susceptible to single-server failures, increasing the relevance of fault-tolerance. Atomic broadcast enables fault-tolerant distributed agreement, yet it is costly to solve. Most practical algorithms entail linear work per broadcast message. AllConcur -- a leaderless approach -- reduces the work, by connecting the servers via a sparse resilient overlay network; yet, this resiliency entails redundancy, limiting the reduction of work. In this paper, we propose AllConcur+, an atomic broadcast algorithm that lifts this limitation: During intervals with no failures, it achieves minimal work by using a redundancy-free overlay network. When failures do occur, it automatically recovers by switching to a resilient overlay network. In our performance evaluation of non-failure scenarios, AllConcur+ achieves comparable throughput to AllGather -- a non-fault-tolerant distributed agreement algorithm -- and outperforms AllConcur, LCR and Libpaxos both in terms of throughput and latency. Furthermore, our evaluation of failure scenarios shows that AllConcur+'s expected performance is robust with regard to occasional failures. Thus, for realistic use cases, leveraging redundancy-free distributed agreement during intervals with no failures improves performance significantly.
Submitted 12 December, 2019; v1 submitted 28 August, 2017;
originally announced August 2017.
-
Formal Specification and Safety Proof of a Leaderless Concurrent Atomic Broadcast Algorithm
Authors:
Marius Poke,
Colin W. Glass
Abstract:
Agreement plays a central role in distributed systems working on a common task. The increasing size of modern distributed systems makes them more susceptible to single component failures. Fault-tolerant distributed agreement protocols rely for the most part on leader-based atomic broadcast algorithms, such as Paxos. Such protocols are mostly used for data replication, which requires only a small number of servers to reach agreement. Yet, their centralized nature makes them ill-suited for distributed agreement at large scales. The recently introduced atomic broadcast algorithm AllConcur enables high throughput for distributed agreement while being completely decentralized. In this paper, we extend the work on AllConcur in two ways. First, we provide a formal specification of AllConcur that enables a better understanding of the algorithm. Second, we formally prove AllConcur's safety property on the basis of this specification. Therefore, our work not only ensures operators safe usage of AllConcur, but also facilitates the further improvement of distributed agreement protocols based on AllConcur.
Submitted 16 August, 2017;
originally announced August 2017.
-
AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version)
Authors:
Marius Poke,
Torsten Hoefler,
Colin W. Glass
Abstract:
Many distributed systems require coordination between the components involved. With the steady growth of such systems, the probability of failures increases, which necessitates scalable fault-tolerant agreement protocols. The most common practical agreement protocol, for such scenarios, is leader-based atomic broadcast. In this work, we propose AllConcur, a distributed system that provides agreement through a leaderless concurrent atomic broadcast algorithm, thus, not suffering from the bottleneck of a central coordinator. In AllConcur, all components exchange messages concurrently through a logical overlay network that employs early termination to minimize the agreement latency. Our implementation of AllConcur supports standard sockets-based TCP as well as high-performance InfiniBand Verbs communications. AllConcur can handle up to 135 million requests per second and achieves 17x higher throughput than today's standard leader-based protocols, such as Libpaxos. Thus, AllConcur is highly competitive with regard to existing solutions and, due to its decentralized approach, enables hitherto unattainable system designs in a variety of fields.
Submitted 21 April, 2017; v1 submitted 20 August, 2016;
originally announced August 2016.
-
Performance Evaluation of Unified Parallel C for Molecular Dynamics
Authors:
Kamran Idrees,
Christoph Niethammer,
Aniello Esposito,
Colin W. Glass
Abstract:
Partitioned Global Address Space (PGAS) integrates the concepts of shared memory programming and the control of data distribution and locality provided by message passing into a single parallel programming model. The purpose of allying distributed data with shared memory is to cultivate a locality-aware shared memory paradigm. PGAS is comprised of a single shared address space, which is partitioned among threads. Each thread has a portion of the shared address space in local memory and therefore it can exploit data locality by mainly doing computation on local data. Unified Parallel C (UPC) is a parallel extension of ISO C and an implementation of the PGAS model. In this paper, we evaluate the performance of UPC based on a real-world scenario from Molecular Dynamics.
Submitted 12 March, 2016;
originally announced March 2016.
-
Effective use of the PGAS Paradigm: Driving Transformations and Self-Adaptive Behavior in DASH-Applications
Authors:
Kamran Idrees,
Tobias Fuchs,
Colin W. Glass
Abstract:
DASH is a library of distributed data structures and algorithms designed for running the applications on modern HPC architectures, composed of hierarchical network interconnections and stratified memory. DASH implements a PGAS (partitioned global address space) model in the form of C++ templates, built on top of DART -- a run-time system with an abstracted tier above existing one-sided communication libraries.
In order to facilitate the application development process for exploiting the hierarchical organization of HPC machines, DART allows the placement of the computational units to be reordered. In this paper we present an automatic, hierarchical unit-mapping technique (using an approach similar to the Hilbert curve transformation) to reorder the placement of DART units on the Cray XC40 machine Hazel Hen at HLRS. To evaluate the performance of the new unit mapping, which takes into account the topology of the allocated compute nodes, we perform a latency benchmark for a 3D stencil code. The unit-mapping technique is generic and can be adopted in other DART communication substrates and on other hardware platforms.
Furthermore, high-level features of DASH are presented, enabling more complex automatic transformations and optimizations in the future.
Submitted 4 March, 2016;
originally announced March 2016.
-
Optimized Polynomial Evaluation with Semantic Annotations
Authors:
Daniel Rubio Bonilla,
Colin W. Glass,
Jan Kuper
Abstract:
In this paper we discuss how semantic annotations can be used to introduce mathematical algorithmic information about the underlying imperative code, enabling compilers to produce code transformations that yield better performance. With this approach, not only is good performance achieved, but also better programmability, maintainability and portability across different hardware architectures. To exemplify this, we use polynomial equations of different degrees.
Submitted 11 March, 2016; v1 submitted 4 March, 2016;
originally announced March 2016.
-
ms2: A molecular simulation tool for thermodynamic properties, new version release
Authors:
Colin W. Glass,
Steffen Reiser,
Gábor Rutkai,
Stephan Deublein,
Andreas Köster,
Gabriela Guevara Carrión,
Amer Wafai,
Martin Horsch,
Martin F. Bernreuther,
Thorsten Windmann,
Hans Hasse,
Jadran Vrabec
Abstract:
A new version release (2.0) of the molecular simulation tool ms2 [S. Deublein et al., Comput. Phys. Commun. 182 (2011) 2350] is presented. Version 2.0 of ms2 features a hybrid parallelization based on MPI and OpenMP for molecular dynamics simulation to achieve higher scalability. Furthermore, the formalism by Lustig [R. Lustig, Mol. Phys. 110 (2012) 3041] is implemented, allowing for a systematic sampling of Massieu potential derivatives in a single simulation run. Moreover, the Green-Kubo formalism is extended for the sampling of the electric conductivity and the residence time. To remove the restriction of the preceding version to electro-neutral molecules, Ewald summation is implemented to consider ionic long range interactions. Finally, the sampling of the radial distribution function is added.
Submitted 25 July, 2015;
originally announced July 2015.
-
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
Authors:
Huan Zhou,
Yousri Mhedheb,
Kamran Idrees,
Colin W. Glass,
José Gracia,
Karl Fürlinger,
Jie Tao
Abstract:
A Partitioned Global Address Space (PGAS) approach treats a distributed system as if the memory were shared on a global level. Given such a global view on memory, the user may program applications very much like shared memory systems. This greatly simplifies the tasks of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment, which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.
Submitted 7 July, 2015;
originally announced July 2015.
-
ls1 mardyn: The massively parallel molecular dynamics code for large systems
Authors:
Christoph Niethammer,
Stefan Becker,
Martin Bernreuther,
Martin Buchholz,
Wolfgang Eckhardt,
Alexander Heinecke,
Stephan Werth,
Hans-Joachim Bungartz,
Colin W. Glass,
Hans Hasse,
Jadran Vrabec,
Martin Horsch
Abstract:
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures, and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales which were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multi-center rigid potential models based on Lennard-Jones sites, point charges and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, e.g. for fluids at interfaces, as well as non-equilibrium molecular dynamics simulation of heat and mass transfer.
Submitted 20 August, 2014;
originally announced August 2014.
-
Avoiding Serialization Effects in Data-Dependency aware Task Parallel Algorithms for Spatial Decomposition
Authors:
Christoph Niethammer,
Colin W. Glass,
Jose Gracia
Abstract:
Spatial decomposition is a popular basis for parallelising code. Cast in the frame of task parallelism, calculations on a spatial domain can be treated as a task. If neighbouring domains interact and share results, access to the specific data needs to be synchronized to avoid race conditions. This is the case for a variety of applications, like most molecular dynamics and many computational fluid dynamics codes. Here we present an unexpected problem which can occur in dependency-driven task parallelization models like StarSs: the tasks accessing a specific spatial domain are treated as interdependent, as dependencies are detected automatically via memory addresses. Thus, the order in which tasks are generated will have a severe impact on the dependency tree. In the worst case, a complete serialization is reached and no two tasks can be calculated in parallel. We present the problem in detail based on an example from molecular dynamics, and introduce a theoretical framework to calculate the degree of serialization. Furthermore, we present strategies to avoid this unnecessary problem. We recommend treating these strategies as best practice when using dependency-driven task parallel programming models like StarSs on such scenarios.
Submitted 17 January, 2014;
originally announced January 2014.
-
Stability of xenon oxides at high pressures
Authors:
Qiang Zhu,
Daniel Y. Jung,
Artem R. Oganov,
Colin W. Glass,
Carlo Gatti,
Andriy O. Lyakhov
Abstract:
Xenon, which is quite inert under ambient conditions, may become reactive under pressure. The possibility of formation of stable xenon oxides and silicates in the interior of the Earth could explain the atmospheric missing xenon paradox. Using the ab initio evolutionary algorithm, we predict the thermodynamical stabilization of Xe-O compounds at high pressures (XeO, XeO2 and XeO3 at pressures above 83, 102 and 114 GPa, respectively). Our calculations indicate large charge transfer in these oxides, suggesting that large electronegativity difference and pressure are the key factors favoring the formation of xenon compounds. Xenon compounds in the Earth's mantle, however, cannot directly explain the missing xenon paradox: xenon oxides are unstable in equilibrium with metallic iron in the Earth's lower mantle, while xenon silicates are predicted to spontaneously decompose at all mantle pressures (<136 GPa). This does not preclude Xe atoms from being retained in defects of mantle silicates and oxides.
Submitted 29 November, 2012; v1 submitted 28 November, 2012;
originally announced November 2012.
-
Constrained evolutionary algorithm for structure prediction of molecular crystals: methodology and applications
Authors:
Qiang Zhu,
Artem R. Oganov,
Colin W. Glass,
Harold T. Stokes
Abstract:
Evolutionary crystal structure prediction proved to be a powerful approach for studying a wide range of materials. Here, we present a specifically designed algorithm for the prediction of the structure of complex crystals consisting of well-defined molecular units. The main feature of this new approach is that each unit is treated as a whole body, which drastically reduces the search space and improves the efficiency, but necessitates the introduction of new variation operators described here. To increase diversity of the population of structures, the initial population and part ($\sim$20%) of the new generations are generated using space group symmetry combined with random cell parameters and random positions and orientations of molecular units. We illustrate the efficiency and reliability of this approach by a number of tests (ice, ammonia, carbon dioxide, methane, benzene, glycine and butane-1,4-diammonium dibromide). This approach easily predicts the crystal structure of methane \emph{A} containing 21 methane molecules (105 atoms) per unit cell. We demonstrate that this new approach also has high potential for the study of complex inorganic crystals, using the examples of a complex hydrogen storage material Mg(BH$_4$)$_2$ and elemental boron.
Submitted 17 May, 2012; v1 submitted 20 April, 2012;
originally announced April 2012.
-
Hybrid MPI/StarSs - a case study
Authors:
Jose Gracia,
Christoph Niethammer,
Manuel Hasert,
Steffen Brinkmann,
Rainer Keller,
Colin W. Glass
Abstract:
Hybrid parallel programming models combining distributed and shared memory paradigms are well established in high-performance computing. The classical prototype of hybrid programming in HPC is MPI/OpenMP, but many other combinations are being investigated. Recently, the data-dependency driven, task parallel model for shared memory parallelisation named StarSs has been suggested for usage in combination with MPI. In this paper we apply hybrid MPI/StarSs to a Lattice-Boltzmann code. In particular, we present the hybrid programming model, the benefits we expect, the challenges in porting, and finally a comparison of the performance of MPI/StarSs hybrid, MPI/OpenMP hybrid and the original MPI-only versions of the same code.
Submitted 18 April, 2012;
originally announced April 2012.
-
Static and dynamic properties of curved vapour-liquid interfaces by massively parallel molecular dynamics simulation
Authors:
Martin T. Horsch,
Svetlana K. Miroshnichenko,
Jadran Vrabec,
Colin W. Glass,
Christoph Niethammer,
Martin F. Bernreuther,
Erich A. Müller,
George Jackson
Abstract:
Curved fluid interfaces are investigated on the nanometre length scale by molecular dynamics simulation. Thereby, droplets surrounded by a metastable vapour phase are stabilized in the canonical ensemble. Analogous simulations are conducted for cylindrical menisci separating vapour and liquid phases under confinement in planar nanopores. Regarding the emergence of nanodroplets during nucleation, a non-equilibrium phenomenon, both the non-steady dynamics of condensation processes and stationary quantities related to supersaturated vapours are considered. Results for the truncated and shifted Lennard-Jones fluid and for mixtures of quadrupolar fluids confirm the applicability of the capillarity approximation and the classical nucleation theory.
Submitted 20 October, 2011;
originally announced October 2011.
-
Ionic high-pressure form of elemental boron
Authors:
Artem R. Oganov,
Jiuhua Chen,
Carlo Gatti,
Yanzhang Ma,
Yanming Ma,
Colin W. Glass,
Zhenxian Liu,
Tony Yu,
Oleksandr O. Kurakevych,
Vladimir L. Solozhenko
Abstract:
Boron is an element of fascinating chemical complexity. Controversies have shrouded this element since its discovery was announced in 1808: the new 'element' turned out to be a compound containing less than 60-70 percent of boron, and it was not until 1909 that 99-percent pure boron was obtained. And although we now know of at least 16 polymorphs, the stable phase of boron is not yet experimentally established even at ambient conditions. Boron's complexities arise from frustration: situated between metals and insulators in the periodic table, boron has only three valence electrons, which would favour metallicity, but they are sufficiently localized that insulating states emerge. However, this subtle balance between metallic and insulating states is easily shifted by pressure, temperature and impurities. Here we report the results of high-pressure experiments and ab initio evolutionary crystal structure predictions that explore the structural stability of boron under pressure and, strikingly, reveal a partially ionic high-pressure boron phase. This new phase is stable between 19 and 89 GPa, can be quenched to ambient conditions, and has a hitherto unknown structure (space group Pnnm, 28 atoms in the unit cell) consisting of icosahedral B12 clusters and B2 pairs in a NaCl-type arrangement. We find that the ionicity of the phase affects its electronic bandgap, infrared absorption and dielectric constants, and that it arises from the different electronic properties of the B2 pairs and B12 clusters and the resultant charge transfer between them.
Submitted 18 November, 2009; v1 submitted 16 November, 2009;
originally announced November 2009.
-
Crystal structure prediction using ab initio evolutionary techniques: principles and applications
Authors:
A. R. Oganov,
C. W. Glass
Abstract:
We have developed an efficient and reliable methodology for crystal structure prediction, merging ab initio total-energy calculations and a specifically devised evolutionary algorithm. This method allows one to predict the most stable crystal structure and a number of low-energy metastable structures for a given compound at any P-T conditions without requiring any experimental input. An extremely high success rate has been observed in a few tens of tests done so far, including ionic, covalent, metallic, and molecular structures with up to 40 atoms in the unit cell. We have been able to resolve some important problems in high-pressure crystallography and report a number of new high-pressure crystal structures. Physical reasons for the success of this methodology are discussed.
Submitted 16 November, 2009;
originally announced November 2009.
-
New high-pressure form of boron is significantly ionic
Authors:
Artem R. Oganov,
Jiuhua Chen,
Carlo Gatti,
Yanzhang Ma,
Yanming Ma,
Colin W. Glass,
Zhenxian Liu,
Tony Yu,
Oleksandr O. Kurakevych,
Vladimir L. Solozhenko
Abstract:
The comment of Dubrovinskaia et al. is scientifically flawed. The high-pressure form of boron, discovered by Oganov et al., is indeed new and its bonding has a significant ionic character, as demonstrated in Ref. 1.
Submitted 27 December, 2010; v1 submitted 4 August, 2009;
originally announced August 2009.