4.3.1 Steady State Analysis Experiment Results.
We perform a steady state analysis experiment on each selected Ethereum client, per Section 4.2.1. Every client is observed for two monitoring epochs of 5 hours each, amounting to 10 hours in total. Within each epoch, the metrics of interest are recorded and aggregated over 15-second monitoring intervals.
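As an illustration of this monitoring protocol, the following sketch shows one way the per-interval aggregation could be implemented; the function and its mean-based aggregation are simplifying assumptions for illustration, not ChaosETH's actual code.

```python
from collections import defaultdict

INTERVAL_SECONDS = 15  # length of one monitoring interval

def aggregate(samples):
    """samples: iterable of (unix_timestamp, value) pairs for one metric.

    Groups raw samples into 15-second buckets and returns one aggregated
    data point (here: the mean, as an illustrative choice) per interval.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[int(ts) // INTERVAL_SECONDS].append(value)
    return [sum(values) / len(values) for _, values in sorted(buckets.items())]
```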
Figure 3 depicts the distributions of metrics in GoEthereum and Nethermind. The GoEthereum client exposes 400 metrics in total. ChaosETH analyzes all of them to identify metrics that are suitable for chaos engineering experiments. Because not all of the client's features are activated under the recommended configuration, 164 of these 400 metrics are inactive, meaning that their values never change or cannot be queried. Comparing the distributions of the active metric values across the two monitoring epochs shows that 44 metrics are statistically stable. As mentioned in Section 3.4.1, the steady state analyzer uses the Mann-Whitney U test for distribution comparison. In all of the experiments, we use a significance level of 0.01: when the obtained p-value exceeds this threshold (e.g., a p-value of 0.03), the null hypothesis that the two samples come from the same distribution is not rejected, and the metric is considered stable. In the case of Nethermind, 231 metrics are analyzed by the steady state analyzer, of which 115 are inactive during the experiment. Among the 116 active metrics, 55 are statistically stable and can be used for further experiments.
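For illustration, this stability criterion can be expressed with an off-the-shelf implementation of the test. The sketch below uses SciPy's mannwhitneyu and is an assumed formulation of the analyzer's check, not its actual code.

```python
from scipy.stats import mannwhitneyu

ALPHA = 0.01  # significance level used in all experiments

def is_stable(epoch_a, epoch_b):
    """epoch_a, epoch_b: per-interval values of one metric in the two epochs."""
    _, p_value = mannwhitneyu(epoch_a, epoch_b, alternative="two-sided")
    # p > ALPHA: the null hypothesis that both samples come from the same
    # distribution is not rejected, so the metric is considered stable.
    return p_value > ALPHA
```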
Table 1 displays the evolution of metric samples during the steady state experiment. The first half of each evolution chart (in blue) is based on the data gathered during the first monitoring epoch; the second half (in red) is based on the data of the second monitoring epoch. The last two columns indicate the p-value obtained by applying the Mann-Whitney U test to the two distributions, and the result of the test. For example, the first row in Table 1 shows that the number of account flush operations made by the GoEthereum client regularly spikes during the two monitoring epochs.
An example where the null hypothesis is rejected at significance level 0.01 is the metric json.rpc.requests(count/s) in the Nethermind client. In Table 1, the line chart in the second-to-last row visually confirms that this metric does not evolve in the same way during the two monitoring epochs. At the 0.01 significance level, this metric is not stable enough to describe a client's steady state and is thus excluded from further experiments.
This experiment shows that not all the monitoring metrics provided by an Ethereum client are suitable to describe the client's steady state in a statistically valid manner. Since the experiments are done in production, several factors can affect a metric's stability. First, the node itself is not always stable: other applications running on the node may compete for its resources. Second, the network may not be stable: the node may occasionally encounter network scans or attacks [16]. Last, the behavior of peers varies: when the node randomly connects to new peers with different characteristics, a metric may be influenced.
4.3.2 Chaos Engineering Experiment Results.
From the experiment for RQ1, we know that GoEthereum invokes 10 different types of system calls, accumulating more than 288 million invocations in a 10-hour production run (two monitoring epochs). Interestingly, none of the types of system call invocations has a 100% success rate. We perform chaos engineering by increasing the error rates of these system calls in production. The error rate amplification approach described in Section 3.4.2 produces 15 and 12 realistic error models for the GoEthereum client and the Nethermind client, respectively. For each error model, ChaosETH conducts exactly one chaos engineering experiment.
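To illustrate the idea, the sketch below derives error models from the error rate observed in production; the amplification factors and the cap at 1 are illustrative assumptions, and the exact procedure is the one defined in Section 3.4.2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorModel:
    syscall: str       # e.g., "accept4"
    error_code: str    # e.g., "EAGAIN"
    error_rate: float  # probability of returning the error code

def amplify(syscall, error_code, observed_errors, invocations,
            factors=(10, 100)):
    """Scale the natural error rate up by fixed factors, capping at 1.

    The natural rate comes from production monitoring (errors / invocations),
    which keeps the resulting error models realistic.
    """
    natural_rate = observed_errors / invocations
    return [ErrorModel(syscall, error_code, min(1.0, natural_rate * factor))
            for factor in factors]
```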
Table 2 describes the error models together with the chaos engineering experiments on the selected clients. Every row presents one error injection model, including the target system call (column Syscall), the error code to be injected, and the error rate. The last five columns give the corresponding experiment results, including the total number of injected errors, the number of evaluated metrics, and whether each of the three hypotheses (\(H_N\), \(H_O\), \(H_R\)) is verified or falsified with respect to a metric. The metrics that fail the pre-check phase are excluded from the other phases, since ChaosETH considers them not stable enough for behavior comparison. When the client does not invoke a given type of system call during the experiment, ChaosETH does not inject any error related to that system call, and the corresponding row is omitted from the table.
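The decision procedure behind these columns can be summarized as follows. This is a simplified sketch with hypothetical names, where the deviates predicate stands for a steady-state comparison such as the Mann-Whitney U criterion above.

```python
def verify_hypotheses(client_alive, injection_metrics, validation_metrics,
                      deviates):
    """injection_metrics / validation_metrics: metric name -> samples
    recorded during the error injection phase / validation phase."""
    # H_N: the client survives the injected errors (no crash).
    if not client_alive:
        return {"H_N": False, "H_O": None, "H_R": None}  # "-" in Table 2
    # H_O: at least one pre-checked metric visibly deviates during injection.
    deviated = {m for m, s in injection_metrics.items() if deviates(s)}
    if not deviated:
        return {"H_N": True, "H_O": False, "H_R": None}  # H_R is skipped
    # H_R: every deviated metric returns to its steady state afterwards.
    recovered = {m for m in deviated if not deviates(validation_metrics[m])}
    return {"H_N": True, "H_O": True, "H_R": recovered == deviated}
```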
For the GoEthereum client, ChaosETH conducts 12 chaos engineering experiments. The results show that 5 out of 12 error models crash the GoEthereum client (the rows whose \(H_N\) column is marked with “X”). Of the other 7 error models, 6 have a visible effect on the monitoring metrics (the rows whose \(H_O\) column contains a non-zero value). For example, when ChaosETH uses the error model (accept4, EAGAIN, 0.6), 24 metrics are stable during the pre-check phase. During the error injection phase, 18 of these metrics are observed to deviate from their normal behavior. After the error injection stops, 16 of these 18 metrics return to their normal state after the recovery phase. This confirms that the GoEthereum client is resilient to EAGAIN errors in accept4 with respect to these 16 metrics.
Regarding the Nethermind client, there are 10 chaos engineering experiments in total (second half of Table 2). The results show that two error models, (futex, EAGAIN, 0.05) and (futex, ETIMEDOUT, 0.05), lead the Nethermind client to a crash. Seven error models cause a visible effect on at least one metric during the error injection phase. The error model (recvfrom, EAGAIN, 0.549) does not have a visible impact on any of the 48 metrics that pass the pre-check. In this case, ChaosETH does not check the \(H_R\) hypothesis, because no metric deviates from its steady state even during the error injection phase.
This experiment has four main outcome categories, each with a different meaning for Ethereum developers.
Crash ( \(H_N\) =X). The client directly crashes because of the injected errors. This is a severe case: it means that an Ethereum node disappears from the distributed consensus and validation process. As the client crashes, the hypotheses \(H_O\) and \(H_R\) cannot be tested and are marked as “-” in Table 2. For example, ChaosETH detects that the GoEthereum client directly crashes when an EAGAIN error code is injected into the system call write. Since the error code EAGAIN in Linux means that the target resource is temporarily unavailable, crashing is an over-reaction; the client should consider implementing a classical retry mechanism instead of crashing directly.
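For illustration, such a retry mechanism could look as follows. This is a minimal sketch in Python (GoEthereum itself is written in Go), and the retry count and backoff parameters are illustrative.

```python
import errno
import os
import time

def write_with_retry(fd, data, retries=5, backoff=0.05):
    """Retry a write when it fails with EAGAIN, instead of treating the
    temporarily unavailable resource as a fatal error."""
    for attempt in range(retries):
        try:
            return os.write(fd, data)
        except OSError as e:
            if e.errno != errno.EAGAIN:
                raise  # only EAGAIN signals a transient condition here
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise TimeoutError("write still failing with EAGAIN after retries")
```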
Invisible effect ( \(H_N\) = \(\checkmark\) and \(H_O\) =0). In some cases, no visible effect is detected during the error injection phase. For example, the Nethermind chaos experiment using error model (recvfrom, EAGAIN, 0.549) reveals such a situation. In this experiment, ChaosETH injects 51,715 errors into invocations of the system call recvfrom. During the error injection phase, none of the 48 metrics shows abnormal behavior. This indicates that the Nethermind client seems to function normally when a system call invocation to recvfrom returns an EAGAIN error code, which can potentially signify resilience. However, we cannot exclude that the client state is corrupted in an invisible manner, because we do not have a provably perfect steady state oracle. Since ChaosETH does not capture anything abnormal during the error injection phase, the verification of hypothesis \(H_R\) is skipped. Overall, the presence of such invisible effect cases is good with respect to consistency: without steady state pre-checking and observability hypothesis checking, developers might falsely believe that the client state is valid according to the monitored metrics.
Long-term effect ( \(H_N\) = \(\checkmark\) , \(H_O\) = \(\checkmark\) , and \(H_R\) =X). For some of the error models, the client under experiment does not crash; however, some metrics deviate from their steady state during the error injection phase and do not recover within the given recovery phase. For instance, the experiment with error model (accept4, EAGAIN, 1) on the Nethermind client belongs to this category. During the error injection phase, the metrics eth66get_block_headers_received/s, local_receive_message_timeout_disconnects/s, process_private_memory/s, and process_virtual_memory/s deviate from their normal behavior. However, after the recovery phase, only the metrics process_private_memory/s and process_virtual_memory/s recover to the steady state. The other two metrics stay abnormal during the validation phase. This means either that the client needs more time to recover from the injected errors or that the injected errors have led the client into a stalled or corrupted state. Overall, such cases show that ChaosETH gives Ethereum developers insights about the timespan of recovery.
Resilient case ( \(H_N\) = \(\checkmark\) , \(H_O\) = \(\checkmark\) , and \(H_R\) = \(\checkmark\) ). Certain error models do not crash the client and, moreover, yield visible evidence of resilience: after the error injection stops, the monitoring metrics recover to their steady state. This indicates that the target client is equipped with an effective, graceful error-handling mechanism that brings the client back to normal after errors. For example, during the chaos engineering experiment using error model (connect, EINPROGRESS, 0.8) on the GoEthereum client, the injected errors do not crash the client, so the \(H_N\) hypothesis holds. During the error injection phase, the metric geth.txpool.slots.gauge/s no longer matches the steady state. When the error injection stops, the client's behavior related to the transaction pool slots is restored during the recovery phase. During the validation phase, ChaosETH checks the metric again and confirms that geth.txpool.slots.gauge/s has recovered to its steady state. By inspecting the client logs, we confirm that the client has indeed resumed downloading, sharing, and verifying Ethereum blocks.
4.3.3 Benchmarking Ethereum Clients.
We cannot strictly compare the considered clients based on the results of RQ2, because the error models differ across clients. To overcome this, we have introduced in Section 4.2.3 the idea of testing the clients under a meaningful common error model.
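One plausible way to derive a common error model is to intersect the clients' realistic error models on the (system call, error code) pair, as sketched below. This reuses the hypothetical ErrorModel type from the amplification sketch above and is an assumption made for illustration; the actual selection procedure is specified in Section 4.2.3.

```python
def common_error_models(models_a, models_b):
    """Keep (syscall, error code) pairs present in both clients' error
    models, with the lower of the two error rates so that the rate stays
    realistic for both clients."""
    index_b = {(m.syscall, m.error_code): m for m in models_b}
    common = []
    for m in models_a:
        match = index_b.get((m.syscall, m.error_code))
        if match is not None:
            common.append(ErrorModel(m.syscall, m.error_code,
                                     min(m.error_rate, match.error_rate)))
    return common
```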
ChaosETH identifies four common error models for the selected clients. The results of this resilience benchmarking experiment are summarized in Table 3. Each row in the table presents the verification of the three hypotheses for both clients, according to a set of client metrics. Only the metrics that pass the pre-check phase are selected for hypothesis verification. This table is interesting in the following three aspects:
First, regarding the \(H_N\) hypothesis (absence of crash), the results show that both the GoEthereum client and the Nethermind client crash under the same specific error models: both clients are crashed by futex system call invocation errors with codes EAGAIN and ETIMEDOUT. Overall, neither client is strictly more robust than the other with respect to crashing.
Second, focusing on the \(H_O\) hypothesis (observability), when the error model (accept4, EAGAIN, 1) is used for experiments, both the GoEthereum client and the Nethermind client exhibit abnormal behavior in their metrics. For the GoEthereum client, 9 metrics become abnormal during the error injection phase; for the Nethermind client, 6 metrics deviate from the steady state. This is evidence that the metrics capture the client's internal state and that not all clients have the same observability.
Third, considering the \(H_R\) hypothesis, ChaosETH successfully identifies resilient cases for the two Ethereum clients. ChaosETH shows that the GoEthereum client is resilient to error model (accept4, EAGAIN, 1) with respect to metrics geth.p2p.peers.gauge/s and geth.txpool.reheap.timer/s. For the same error model, the Nethermind client is resilient with respect to metrics nethermind_mod_exp_precompile/s, nethermind_state_db_reads/s, and nethermind_useless_peer_disconnects/s. As opposed to toy examples with perfect oracles, assessing the behavior of real-world software through monitoring yields multiple shades of resilience.