Ethereum is moving towards a major upgrade that aims to make the network more sustainable, with the transition from Proof-of-Work (PoW) to Proof-of-Stake (PoS), and more scalable, with the introduction of data sharding. This process started with the deployment of the Beacon Chain in December 2020, and the next step, called the Merge, is expected to happen later this year. In this article we look at how far the Ethereum 2 ecosystem has progressed in this transition and how ready it is to move to the next level.
For this purpose, we performed a comprehensive study of the Ethereum 2 Consensus Layer (CL) clients on different hardware, with multiple setup configurations and for several use cases. The study took several months and consumed almost 30,000 CPU hours (3.4 CPU years), during which we collected over 735 million data points, from which we extracted almost 150 million to plot about a thousand figures showing how the different CL clients perform. Moreover, we publicly release all these figures for free to the community; you can find them here.
We measured how the CL clients perform on three different hardware platforms: a standard node, a fat node and a Raspberry Pi 4b. The hardware configuration of each type of node is given in Table 1.
We also tested the default configuration as well as the all-topics configuration in which the clients listen to all subtopics on the GossipSub layer. The clients and the versions we tested were:
For fully detailed information about all the experiments we performed and how we ran them, please check this document.
In the following, we discuss some of our findings. We collected so much data that it is simply impossible to discuss all the details in this post (that will be the role of an upcoming scientific article). However, we will try to go over some of the most important points. First, we plot the CPU usage (Figure 1) and the memory consumption (Figure 2) of the different CL clients. We see that most CL clients make fair use of the CPU on the standard node.
In terms of memory, they all exhibit different behaviors. Nimbus has the lowest memory consumption in this experiment. Lighthouse's memory consumption increased continuously until the client crashed and we had to restart it. This could be some kind of memory leak; our team discussed this with the Lighthouse team and they are investigating the issue. We also had a few issues at the beginning setting up the memory for the Java Virtual Machine (JVM) for Teku, which explains the odd pattern at the start of the run. Please note that there is a point at which most clients start to behave differently, both in terms of CPU usage and memory consumption; this point is marked with the red dotted line in the CPU figure. This line marks the transition to the Altair hard fork. From Altair onwards, the CL clients need to track sync committees as well as handle other significant changes, such as updated slashing conditions, which explains the change in resource consumption.
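As a rough illustration of how this kind of per-client resource data can be collected, the sketch below samples the CPU and memory usage of a running client process with the psutil library. It is a minimal example, not our actual monitoring tooling, and the process name "lighthouse" is only a placeholder.

```python
# Minimal sketch of per-process resource sampling with psutil.
# Illustrative only; not the monitoring setup used in the study.
import time
import psutil

def find_client(name: str) -> psutil.Process:
    """Return the first running process whose name contains `name`."""
    for proc in psutil.process_iter(["name"]):
        if name in (proc.info["name"] or ""):
            return proc
    raise RuntimeError(f"no process matching {name!r} found")

def sample(proc: psutil.Process, interval: float = 5.0):
    """Yield (cpu_percent, rss_bytes) tuples every `interval` seconds."""
    proc.cpu_percent(None)          # prime the CPU counter
    while True:
        time.sleep(interval)
        with proc.oneshot():        # batch the underlying /proc reads
            yield proc.cpu_percent(None), proc.memory_info().rss

if __name__ == "__main__":
    client = find_client("lighthouse")   # placeholder process name
    for cpu, rss in sample(client):
        print(f"cpu={cpu:.1f}%  rss={rss / 2**20:.0f} MiB")
```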
If we look at disk usage (Figure 3), we notice that there is an important difference in how much storage the different clients require. For instance, Lighthouse takes over three times the storage of Teku, which is the CL client that requires the least storage, followed by Nimbus.
In terms of disk write operations (Figure 4), most clients, and in particular Prysm, have a higher number of disk write operations per second at the beginning of the syncing process, and this decreases gradually until it plateaus after Altair.
This is explained by the fact that at the beginning of the Beacon chain the number of validators was significantly lower than today, so slot processing for those epochs was much faster. The two figures are correlated: in Figure 3, the increase in disk usage is rather slow in the first half and accelerates in the second half, as the number of validators and attestations in the network grows.
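For reference, this kind of disk-write metric can be approximated with a simple sampling loop. The sketch below, again only an illustrative approximation of our setup, derives write operations per second by differencing psutil's system-wide disk I/O counters over a fixed interval (so it measures the whole machine, not a single client process).

```python
# Hedged sketch: approximate disk write operations per second by
# differencing system-wide I/O counters over a fixed interval.
import time
import psutil

def write_iops(interval: float = 10.0) -> float:
    """Approximate system-wide disk write operations per second."""
    before = psutil.disk_io_counters()
    time.sleep(interval)
    after = psutil.disk_io_counters()
    return (after.write_count - before.write_count) / interval

if __name__ == "__main__":
    while True:
        print(f"{write_iops():.1f} write ops/s")
```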
In terms of disk read operations per second, we discovered a pattern that we have discussed with the different teams but that we have not managed to fully understand yet. There is a point at which the disk read operations increase dramatically, and this is observed in most of the CL clients, except for Teku. This is not really an issue, because it does not affect performance, but we are still trying to understand why it happens. The client that exhibits this behavior most clearly is Prysm. We have shared these findings with the Prysmatic team and they are investigating the pattern. What is stranger is that it happens at virtually the same time for most clients, around slot 900K, as shown in Figure 5.
We repeated the experiment on the fat nodes and saw almost the same behavior (Figure 6), but this time the pattern started much later, around slot 2M. We noticed it also happens on the Raspberry Pi, but much earlier. The fact that the phenomenon occurs later on nodes with more memory makes us believe that there is some kind of in-memory caching process going on: when the client reaches its buffering limit, it starts generating a significant amount of disk reads.
For most of the experiments we did, we had to sync the CL clients. We tried both syncing from genesis and syncing from a checkpoint. This is somewhat controversial: many have argued that syncing speed from genesis should not be a relevant measurement, while others say that, weak subjectivity aside, syncing from genesis is still the most common practice in the field. Since we tested both, we present the results. In Figure 7 we show the syncing speed of the CL clients. As we can see, Nimbus is the fastest-syncing of all the open-source CL clients.
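As a concrete illustration, syncing progress can be tracked from outside a client through the standard Beacon node API. The sketch below polls GET /eth/v1/node/syncing and derives an approximate slots-per-second rate; the local URL, port and polling interval are assumptions that depend on the client and setup.

```python
# Hedged sketch: track syncing speed via the standard Beacon node API.
import time
import requests

BEACON_API = "http://localhost:5052"   # assumed local HTTP API endpoint

def head_slot() -> int:
    """Return the current head slot reported by the beacon node."""
    resp = requests.get(f"{BEACON_API}/eth/v1/node/syncing", timeout=10)
    resp.raise_for_status()
    return int(resp.json()["data"]["head_slot"])

def slots_per_second(interval: float = 60.0) -> float:
    """Approximate syncing speed as slots processed per second."""
    start = head_slot()
    time.sleep(interval)
    return (head_slot() - start) / interval

if __name__ == "__main__":
    while True:
        print(f"syncing at ~{slots_per_second():.2f} slots/s")
```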
We also tested Grandine, a closed-source CL client that seems to have put significant effort into parallelizing certain aspects of consensus processing. This can be observed in Grandine's high processing speed, shown in Figure 8, which makes it sync faster than all the other clients. Please note the similar pattern to the disk write operations (Figure 4): slots are processed much faster at the beginning and this rapidly decreases until it plateaus, which is again related to the low number of validators in the network at the beginning of the Beacon chain.
In terms of network bandwidth utilization, we analyzed how many peers the CL clients connect to, and we can see that there are significant differences (Figure 9). Lodestar connects to only 25 peers, while Nimbus is the most flexible, with a wide range between 100 and 150 peers and a mean of around 125 peers. Prysm oscillates between 40 and 60 peers. Lighthouse and Teku also show very stable peering strategies. It is worth mentioning that we used the default peer-limit configuration for all clients. We chose to do this because the CL client teams explained that their clients were “optimized” for those numbers, and changing them could impact performance.
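The peer counts themselves are exposed through the standard Beacon node API; a minimal sketch of polling them is shown below (the local URL is an assumption about the setup).

```python
# Hedged sketch: read the connected peer count from the standard Beacon API.
import requests

BEACON_API = "http://localhost:5052"   # assumed local HTTP API endpoint

def connected_peers() -> int:
    """Return the number of currently connected peers."""
    resp = requests.get(f"{BEACON_API}/eth/v1/node/peer_count", timeout=10)
    resp.raise_for_status()
    return int(resp.json()["data"]["connected"])

if __name__ == "__main__":
    print(f"connected peers: {connected_peers()}")
```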
When we compare this with the network activity (Figure 10), the results are a bit different from what we expected. Nimbus has the largest number of peers but consumes less bandwidth than most other clients. Lodestar has a low bandwidth consumption, which is consistent with its low peer count. Teku, instead, consumes more bandwidth than all the other clients, and this is particularly visible when enabling the all-topics option. We have shared these findings with the Teku team and they are investigating this high network activity. The Teku team has reported a 25% reduction in CPU and a 38% reduction in outgoing bandwidth since the update to v22.5.1. Lodestar also showed a strange pattern on all-topics, in which it sends less data than in the default mode. The Lodestar team is investigating this issue.
We also studied how the CL clients behave in archival mode (only four support it). Archival mode is a configuration in which the client stores many intermediate states in order to be able to easily answer queries about any point in the history of the blockchain. This is particularly useful when setting up a block explorer or a similar kind of API. We sent thousands of requests to the four CL clients and recorded their response times, as shown in Figure 11. The slowest client in this regard was Prysm, because it only has partial support for the standard API and translates queries to another interface over gRPC, which adds a certain overhead. The fastest of all clients for answering queries in archival mode was by far Teku, with very fast and stable response times. Talking with the Teku team, they explained to us that they have developed a new, extremely fast tree-structured state storage to answer queries quickly, which is particularly interesting for the Infura team.
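To give an idea of how such response times can be measured, the sketch below times historical-state queries against the standard Beacon API. The endpoint, slot range and URL are illustrative assumptions, not our exact request mix.

```python
# Hedged sketch: time historical-state queries against an archival beacon node.
import random
import time
import requests

BEACON_API = "http://localhost:5052"   # assumed local HTTP API endpoint

def timed_state_query(slot: int) -> float:
    """Return the response time (seconds) of one historical-state query."""
    url = f"{BEACON_API}/eth/v1/beacon/states/{slot}/finality_checkpoints"
    start = time.perf_counter()
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    return time.perf_counter() - start

if __name__ == "__main__":
    latencies = [timed_state_query(random.randrange(1, 3_000_000))
                 for _ in range(100)]
    print(f"mean response time: {sum(latencies) / len(latencies):.3f} s")
```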
Finally, we tested running the clients on a Raspberry Pi device. Syncing from genesis on such a small device can take a long time. Happily, checkpoint syncing works, and all CL clients perform fairly well on these low-power devices. Figure 12 shows the memory usage of the CL clients on the Raspberry Pi.
In our journey through this exhaustive evaluation we found multiple strong points as well as room for improvement in all Ethereum CL clients. Our objective with this study was threefold: i) to provide useful feedback to the CL client teams; ii) to give the Ethereum Foundation a vast amount of data measuring the readiness of the Ethereum 2 CL clients; iii) and last but not least, to provide staking pool operators with all the information they need to guide decisions about CL client deployment.
The evaluation presented above gives empirical data about how the Ethereum CL clients perform under different scenarios. However, there are many other aspects that also play an important role when deploying software on an operational platform. Please note that some of those aspects can be subjective, and others might not agree with our conclusions. In what follows, we try to cover some of those aspects and, together with the measured empirical data, discuss the strengths and points for improvement of the Ethereum CL clients.
For our team it was clear that Prysm is the client with the best user experience. It is extremely easy to use and to deploy in its default configuration. The Prysmatic team has gone to great lengths to improve the user experience. On the other hand, Prysm has room for optimization in several aspects. The documentation portal could be improved; the search function is not very intuitive. Moreover, Prysm uses gRPC, and the standard HTTP API is redirected to gRPC, which has shown low performance and even crashed when handling multiple requests at the same time. Syncing in archival mode also took much longer than with other clients.
Lighthouse is the client with the most complete API. It implements almost all of the standard Eth2 Beacon node API, which is why it is the one we used to check CL client rewards and other relevant data for our study. On the other hand, the client seems to have some kind of memory leak while syncing from genesis, and it has the highest storage consumption of all clients. Its memory and I/O management should be reviewed.
Teku seems to be one of the most stable clients, with very complete documentation in which it is easy to find any execution option and command-line flag. It is also the client with the lowest storage needs, and its archival mode has the fastest API response time of all clients. However, despite its fast response times, syncing in archival mode takes a long time (more than three weeks). Also, it is not always easy to set up the JVM correctly to avoid memory issues.
Nimbus is the client with the lowest CPU and memory requirements across all platforms, as well as the fastest-syncing open-source client. It is clearly the client best suited to run on low-power devices, but it also performs well on more powerful servers. On the other hand, its compilation and deployment are not as user-friendly as other clients'. Also, the fact that the Beacon node and the validator client run in the same executable could be viewed as a feature but also as a disadvantage, as it is sometimes useful to stop the validator client while keeping the Beacon node alive. The Nimbus team mentioned that they are working on a standalone validator client.
Lodestar is one of the latest CL clients to join the race, and it is really commendable to see that the software supports most of the features that the other clients offer. It also shows fairly low resource consumption. However, Lodestar is not always easy to compile and deploy (except when using Docker); we found multiple outdated instructions in the documentation, such as the required Node.js version, among others. It is also the slowest client to sync from genesis, and it does not offer an archival mode.
Grandine was by far the fastest client to sync of all those we tested. It seems to have a great parallelization strategy that definitely outperforms the other clients while syncing from genesis. However, it is not clear how much this speed matters for performance once the client is synced. Many features are still in beta. Clearly the biggest drawback of this client is that it has not been open-sourced yet.
In this article we have shown multiple aspects of all the Ethereum CL clients tested under different conditions. We have exposed their strengths as well as discussed some points for improvement. After all these experiments, it seems clear that the different CL client teams have focused on different aspects, users and use cases, and they excel in different areas.
Perhaps the most important conclusion to highlight is that all Ethereum 2 CL clients run well on different hardware platforms and configurations. They showcase the strong software diversity that Ethereum has, which is hard to find anywhere else in the blockchain ecosystem. Overall, our evaluation demonstrates that the efforts of all the CL client implementation teams and researchers involved have pushed the Ethereum ecosystem one step closer to a more sustainable and scalable blockchain technology.
This research work was mainly funded by the Lido research grants. We would also like to acknowledge the huge support from the Ethereum Foundation research team and their interest in this work. We also thank all the CL implementation teams, who helped us deploy and test their software under so many configurations and were extremely helpful during the analysis and discussion of these results. Finally, we would like to thank all the other researchers with whom we have discussed this work; their feedback has been highly valuable.
If you liked this work and want to support us, you can contribute to our Gitcoin Grant, visit our Website and follow us on Twitter.