When it comes to running a validator in the Ethereum ecosystem, especially after The Merge, it is important to measure its performance, as this directly impacts the rewards the validator obtains. We have therefore analyzed the rewards validators obtain in order to gain some insight into their performance in the network.
From a software perspective, running a validator in the Ethereum ecosystem nowadays requires two different clients. The execution layer (EL) client is in charge of creating and validating blocks in the execution layer. Every time a new block is proposed and validated, the block proposer (a miner during the Proof-of-Work (PoW) era) collects the priority fees (tips) of the included transactions as a reward, while the base fee has been burned since EIP-1559. During the PoW era, the miner also collected a fixed block reward, which was how new ETH was issued.
The consensus layer (CL) client is in charge of operating a node in the Consensus Layer (also known as the Beacon Chain). Validators run on top of this node, which interacts with the rest of the Beacon Chain network. In the CL, a validator can earn rewards by attesting, proposing new blocks and participating in sync committees.
Both the Execution Layer and the Consensus Layer have a protocol specification, which describes how each node should interact with the rest of the nodes in the network. Several teams maintain independent implementations of these specifications, each providing everything needed to run a node in its layer. In the Consensus Layer, there are 5 main implementations:
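The relative weight of these three duties is fixed by the protocol. As a minimal sketch, the snippet below uses the incentive weights defined in the Altair consensus specifications (constant names as in the spec) to show what share of an ideal validator's CL reward each duty represents. It ignores penalties, inactivity leaks and the EL priority fees; the duty_shares helper is our own illustrative name.

```python
# Incentive weights introduced in the Altair consensus specs (names as in the spec).
TIMELY_SOURCE_WEIGHT = 14
TIMELY_TARGET_WEIGHT = 26
TIMELY_HEAD_WEIGHT = 14
SYNC_REWARD_WEIGHT = 2
PROPOSER_WEIGHT = 8
WEIGHT_DENOMINATOR = 64


def duty_shares() -> dict:
    """Fraction of the ideal CL reward allocated to each validator duty."""
    attesting = TIMELY_SOURCE_WEIGHT + TIMELY_TARGET_WEIGHT + TIMELY_HEAD_WEIGHT
    return {
        "attesting": attesting / WEIGHT_DENOMINATOR,
        # Paid to the proposer for including attestations and sync aggregates.
        "proposing": PROPOSER_WEIGHT / WEIGHT_DENOMINATOR,
        "sync_committee": SYNC_REWARD_WEIGHT / WEIGHT_DENOMINATOR,
    }


if __name__ == "__main__":
    for duty, share in duty_shares().items():
        print(f"{duty}: {share:.1%}")
```

Attesting dominates the total, which is why proposals and sync committee membership, being rare, mostly show up as short-term swings.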
Lighthouse
Lodestar
Prysm
Nimbus
Teku
Before The Merge, these two clients (EL and CL) ran separately, as both chains ran in parallel but independently. After The Merge, both clients work together, with the CL client taking control and driving the EL client through the Engine API. Also, the PoW block reward in the EL has been removed, and new ETH is now issued only through rewards in the CL.
From a protocol perspective, running a validator requires depositing a minimum of 32 ETH, which, at the time of writing this blog post (November 2022), is equivalent to over 40K USD. As this amount may not be affordable to everyone, the idea of Staking Pools appeared. Mirroring the idea of Mining Pools, anyone can join a Staking Pool, deposit any amount of ETH and obtain a reward proportional to that deposit. The 32 ETH needed to activate a validator is thus reached by aggregating the deposits of many small stakers.
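A pool's payout rule can be sketched in a few lines. This minimal example assumes a single validator funded entirely by three stakers and ignores the fees or commissions real pools charge; split_rewards and the staker names are hypothetical. Fractions keep the shares exact.

```python
from fractions import Fraction

VALIDATOR_DEPOSIT_ETH = 32  # deposit needed to activate one validator


def split_rewards(deposits, reward):
    """Split one validator's reward among stakers in proportion to their deposits.

    deposits: mapping staker -> deposited ETH (Fraction); must add up to 32 ETH.
    reward:   total reward earned by the validator (Fraction).
    Pool fees and commissions are ignored.
    """
    total = sum(deposits.values(), Fraction(0))
    if total != VALIDATOR_DEPOSIT_ETH:
        raise ValueError(f"deposits add up to {total} ETH, expected {VALIDATOR_DEPOSIT_ETH}")
    return {staker: reward * amount / total for staker, amount in deposits.items()}


shares = split_rewards(
    {"alice": Fraction(16), "bob": Fraction(10), "carol": Fraction(6)},
    reward=Fraction(1, 10),  # hypothetical 0.1 ETH reward
)
print(shares)
```

Alice, who provided half of the 32 ETH, receives half of the reward (1/20 ETH).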
Since The Merge, mining no longer exists; instead, a pseudo-randomly selected validator proposes the new block at every slot. Inside the proposed block, there is one field that can be configured when launching the CL client: the graffiti. This is a free-form 32-byte field, usually holding a short text string. Some validators configure a graffiti that contains information about the EL and/or CL clients running on their node, which makes it possible to associate a validator with its CL and EL client.
For instance, in some blocks the graffiti contains two initials after the “RP” prefix: the first refers to the EL client and the second to the CL client. When there is only one initial, only the CL client can be identified.
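A minimal sketch of extracting client hints from a block's graffiti follows. It assumes the graffiti is read through the standard Beacon API, which returns the 32-byte field as a 0x-prefixed hex string, and that the “RP” prefix is followed by one or two initials as described above. The initial-to-client tables (EL_INITIALS, CL_INITIALS) and the function names are hypothetical, written for illustration rather than taken from our tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical initial-to-client tables, for illustration only; verify the
# convention actually used in the graffiti before relying on them.
EL_INITIALS = {"B": "Besu", "E": "Erigon", "G": "Geth", "N": "Nethermind"}
CL_INITIALS = {"L": "Lighthouse", "N": "Nimbus", "P": "Prysm", "T": "Teku"}

# "RP", an optional dash, then one or two capital initials ending a word.
RP_PATTERN = re.compile(r"^RP-?([A-Z]{1,2})\b")


def decode_graffiti(graffiti_hex: str) -> str:
    """Decode a block's 32-byte graffiti, given as a 0x-prefixed hex string."""
    hex_digits = graffiti_hex[2:] if graffiti_hex.startswith("0x") else graffiti_hex
    raw = bytes.fromhex(hex_digits)
    if len(raw) != 32:
        raise ValueError(f"graffiti must be 32 bytes, got {len(raw)}")
    # Strip the zero-byte padding; replace invalid UTF-8 since the field is free-form.
    return raw.rstrip(b"\x00").decode("utf-8", errors="replace")


def clients_from_graffiti(graffiti: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (el_client, cl_client), with None where a client is not identified."""
    match = RP_PATTERN.match(graffiti)
    if match is None:
        return None, None
    initials = match.group(1)
    if len(initials) == 2:  # EL initial followed by CL initial
        return EL_INITIALS.get(initials[0]), CL_INITIALS.get(initials[1])
    return None, CL_INITIALS.get(initials)  # single initial: CL client only


raw_field = "0x" + b"RP-GL example".ljust(32, b"\x00").hex()
print(clients_from_graffiti(decode_graffiti(raw_field)))  # ('Geth', 'Lighthouse')
```

Graffiti that does not follow the convention simply yields (None, None), so such validators stay unclassified instead of being guessed.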
As mentioned previously, some validators configure a specific graffiti when proposing a block. From these graffiti we were able to identify the CL client of a large number of validators. We then grouped validators by CL client and calculated the aggregate rewards over 20K epochs (from epoch 133875 to 153875). We can see that all clients obtain a similar reward every week, with an average variance of 0.003%.
The maximum difference happens in week 95, with a 7.9% difference between the highest (Teku) and lowest (Prysm) aggregate reward. For this set of validators, taking the average reward per client across all weeks, the validators running on a Teku node earned the most, with 0.2147 ETH, while the validators running on a Nimbus node earned the least, with 0.2097 ETH. The gap between these two values is only around 2.3% of the average across all clients.
Comparing each client's average weekly reward to the global average, Teku obtains roughly 1.24% above and Nimbus roughly 1.08% below the average across all clients. This suggests that, although there may be some reward differences in the short term, all clients should achieve similar performance and rewards in the long term.
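The weekly grouping can be sketched as follows, assuming per-epoch reward records tagged with the client inferred from graffiti. An epoch is 32 slots of 12 s (384 s), so a day has 225 epochs and a week 1575. We count weeks from genesis as epoch // 1575; this places epoch 133875 exactly at the start of week 85 and appears consistent with the week numbers quoted here, but it is our assumption. The function name and record layout are hypothetical.

```python
from collections import defaultdict

EPOCHS_PER_DAY = 225                  # 32 slots x 12 s = 384 s per epoch
EPOCHS_PER_WEEK = 7 * EPOCHS_PER_DAY  # 1575 epochs


def weekly_reward_per_validator(records):
    """Average CL reward per validator, per week and per client.

    records: iterable of (epoch, validator_index, client, reward_eth) tuples,
    with the client taken from the validator's graffiti.
    """
    totals = defaultdict(lambda: defaultdict(float))
    members = defaultdict(lambda: defaultdict(set))
    for epoch, index, client, reward in records:
        week = epoch // EPOCHS_PER_WEEK  # weeks counted from genesis (assumption)
        totals[week][client] += reward
        members[week][client].add(index)
    return {
        week: {client: total / len(members[week][client]) for client, total in clients.items()}
        for week, clients in totals.items()
    }
```

Dividing by the number of distinct validators matters because the graffiti-identified groups differ in size from client to client.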
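The percentages above can be checked with a short worked calculation. Here $C$ is the set of clients and $\bar{r}_c$ is client $c$'s average weekly per-validator reward; we assume the baseline $\bar{r}$ is the unweighted mean across clients. Inverting Teku's +1.24% and Nimbus's -1.08% recovers the same implied baseline, and the Teku/Nimbus gap relative to it matches the roughly 2.3% quoted above.

```latex
\bar{r} = \frac{1}{|C|} \sum_{c \in C} \bar{r}_c,
\qquad
\Delta_c = \frac{\bar{r}_c - \bar{r}}{\bar{r}} \times 100\%

\bar{r} \approx \frac{0.2147}{1.0124} \approx \frac{0.2097}{0.9892} \approx 0.212\ \text{ETH},
\qquad
\frac{0.2147 - 0.2097}{0.212} \approx 2.36\%
```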
With the launch of the Beacon Chain in 2020, the Ethereum Foundation started the Client Incentive Program (CIP), granting each of these teams a certain amount of ETH so that they could run their own validators. With this grant, each team was able to activate 144 validators (except Lodestar, which received funds to activate up to 72 validators). In this way, the Ethereum Foundation encouraged each team to keep developing and upgrading its software following the Beacon Chain upgrades, thereby maintaining client diversity.
As described before, each of the main development teams was able to activate 144 validators (72 for Lodestar). Each of these validators runs on a node with the corresponding CL client: Prysm, Lighthouse, Nimbus, Teku and Lodestar. Therefore, by measuring the rewards these groups of validators obtain, we are also measuring the performance of each of these clients. We measured the rewards each group obtained over more than 40K epochs (from epoch 110875 to 153875) and divided them by the number of validators in each group to obtain the average CL reward per validator.
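The normalization step is simple but essential, since the Lodestar group is half the size of the others. A minimal sketch follows, assuming every validator in a group was active for the whole period. The group totals in the usage line are made-up numbers, and average_validator_reward is a hypothetical helper.

```python
# Validators each team could activate under the Client Incentive Program.
CIP_VALIDATORS = {"Lighthouse": 144, "Lodestar": 72, "Nimbus": 144, "Prysm": 144, "Teku": 144}


def average_validator_reward(group_totals):
    """Divide each team's total CL reward (ETH) by its number of validators.

    Assumes every validator in a group was active for the whole period.
    """
    return {client: total / CIP_VALIDATORS[client] for client, total in group_totals.items()}


# Hypothetical totals: Lodestar earns half as much in total, but it also runs
# half as many validators, so the per-validator averages are equal.
print(average_validator_reward({"Lodestar": 36.0, "Prysm": 72.0}))  # {'Lodestar': 0.5, 'Prysm': 0.5}
```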
Looking at each week separately, the maximum variance in rewards happens in week 72, with a value of 0.013%: during that week, validators in the Lodestar pool earned 17% less than validators in the Prysm pool. In general, however, the average variance is around 0.005%, which is almost insignificant. Taking the average of all clients as the baseline, the largest positive weekly difference for a client was +8.9% (Teku, week 79) and the largest negative one was -8.1% (Nimbus, week 74). When we look at the average difference across all weeks, the differences are even smaller, +0.7% and -0.7%, as we can see in the next figure.
In the end, the average validator reward over the whole 26 weeks ranges from 0.713 ETH for Lodestar on the low side to 0.724 ETH for Prysm on the high side. That is a difference of only 0.011 ETH over a 6-month period or, at the current price, roughly 30 USD a year (the price of a small dinner). In other words, although in some weeks one client may perform better than the others, probably due to more block proposals and/or sync committee duties, over a long period of time all clients should return roughly the same amount of CL rewards.
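Finding the largest weekly deviations amounts to comparing each client against that week's cross-client average. This minimal sketch assumes the input maps week numbers to per-client average rewards and that the baseline is the unweighted mean of the clients in that week; weekly_extremes is a hypothetical helper.

```python
from statistics import mean


def weekly_extremes(weekly):
    """Largest positive and negative weekly deviations from the client average.

    weekly: mapping week -> {client: average reward per validator}.
    Returns two (deviation_pct, week, client) tuples: (max, min).
    """
    deviations = []
    for week, per_client in weekly.items():
        baseline = mean(per_client.values())  # unweighted mean across clients
        for client, reward in per_client.items():
            deviations.append(((reward - baseline) / baseline * 100, week, client))
    return max(deviations), min(deviations)
```

Averaging the same per-week deviations over all weeks, instead of taking the extremes, gives the much smaller long-run differences.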
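The yearly figure follows from simple arithmetic. We assume an ETH price of about 1250 USD, the price implied by "32 ETH is over 40K USD" earlier in this post, and treat 26 weeks as half a year.

```python
# Figures quoted in the text.
LOW_ETH = 0.713   # Lodestar, average reward per validator over ~26 weeks
HIGH_ETH = 0.724  # Prysm
# Assumption: ETH price implied by "32 ETH is over 40K USD" (November 2022).
ETH_USD = 40_000 / 32  # 1250 USD

diff_per_half_year = HIGH_ETH - LOW_ETH          # about 0.011 ETH
diff_per_year_usd = 2 * diff_per_half_year * ETH_USD
print(f"{diff_per_half_year:.3f} ETH per 6 months, about {diff_per_year_usd:.1f} USD per year")
```

The result, about 27.5 USD, is consistent with the "roughly 30 USD a year" estimate.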
Focusing on the block reward side of the protocol, we have also scored the block proposals produced by each CL client. Proposing a block returns a high reward, so it is interesting to estimate whether one client creates blocks with higher rewards; for this, we used a scoring system. For this part of the study, we used a custom fork of Vouch to obtain the block proposals.
The block reward a proposer obtains depends on the new attestations included in the block: the more new attestations, the higher the reward. The total block reward also depends on the balance of the attesting validators. To measure the performance of the block proposer, we removed the attesting balance from the formula and only measured how many attestations were included in the block, which should give an objective hint of how well the block would perform.
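A balance-free score of this kind can be sketched as a count of attester votes that the candidate block would add to the chain. This is a simplified illustration under our own assumptions, not the scoring code of Vouch or of our fork: a vote is identified by (slot, committee index, validator index), differences in attestation data such as head votes are ignored, and Attestation and block_score are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    slot: int                # slot the attestation votes for
    committee_index: int     # committee that produced it
    attesters: frozenset     # validator indices aggregated into it


def block_score(candidate, already_included):
    """Count attester votes in a candidate block that are new to the chain.

    candidate:        list of Attestation objects proposed for inclusion.
    already_included: set of (slot, committee_index, validator_index) votes
                      already on chain. Attesting balances are ignored.
    """
    seen = set(already_included)  # copy so the caller's set is untouched
    score = 0
    for att in candidate:
        for validator in att.attesters:
            vote = (att.slot, att.committee_index, validator)
            if vote not in seen:  # duplicates across aggregates count once
                seen.add(vote)
                score += 1
    return score


candidate = [
    Attestation(slot=10, committee_index=0, attesters=frozenset({1, 2, 3})),
    Attestation(slot=10, committee_index=0, attesters=frozenset({3, 4})),
]
print(block_score(candidate, already_included={(10, 0, 1)}))  # 3
```

After a missed slot, the pool also holds the votes meant for the missed block, so there are more new votes to count and the score rises.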
All 5 clients ran on Ethereum mainnet on the same machine, located in Central Europe, with the following hardware configuration:
AMD Ryzen 9 5950X (16 cores, 32 threads)
128 GB RAM
7 TB NVMe disk
Therefore, all clients were running under the same conditions, to make this study as fair as possible.
For this part of the study, we measured block proposals on mainnet for around 17K epochs (several epoch ranges between epochs 120340 and 162491). In general, the average block score of each client is very similar, with a maximum average score of 11754 (Nimbus) and a minimum average score of 11657 (Lodestar). This difference represents only 0.82% of the average score. Using the average across all clients as a reference, Lodestar falls only 0.37% below and Nimbus sits 0.45% above. These differences in block score are almost insignificant.
The above case shows block scores under regular chain conditions, but there are moments when the chain temporarily behaves differently, for example when a block is missed. In that scenario, there are extra attestations pending inclusion (those that were supposed to be included in the missed block). We have therefore measured the block score after a missed block, when the proposer is able to include more attestations than usual.
First, the average block score of the 5 clients is higher now: around 18000, compared to 11750 under normal conditions. As more attestations are pending inclusion, higher rewards can be achieved.
Second, the average block scores of the clients are no longer as similar. In this case, Lighthouse achieves an average block score of 20285, around 12% above the average across all clients. The difference between the highest average score (Lighthouse) and the lowest (Nimbus) is 3389, a difference of around 18%. We therefore see a significant difference in block scores after a missed block. However, missed blocks are infrequent, so in the long run this difference smooths out with all the other rewards.
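The quoted figures can be cross-checked with a few lines of arithmetic. The Nimbus average follows directly from the spread. The all-client average is only implied from the rounded "around 12%", so it is approximate, and we assume the percentage spread is taken relative to that average.

```python
LIGHTHOUSE_AVG = 20285    # highest average block score after a missed block
SPREAD = 3389             # highest minus lowest (Nimbus) average score
LIGHTHOUSE_ABOVE = 0.12   # "around 12%" above the all-client average, approximate

nimbus_avg = LIGHTHOUSE_AVG - SPREAD                    # 16896
implied_avg = LIGHTHOUSE_AVG / (1 + LIGHTHOUSE_ABOVE)   # about 18112
spread_pct = SPREAD / implied_avg * 100                 # about 18.7%
print(f"Nimbus: {nimbus_avg}, implied average: {implied_avg:.0f}, spread: {spread_pct:.1f}%")
```

The implied average of roughly 18100 agrees with the "around 18000" figure for blocks following a missed slot.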
During this study, we analyzed how much CL reward each CL client obtains, using three different approaches: i) a graffiti-based analysis, ii) an entity-based analysis and iii) a block proposal scoring system. From the analyzed data, we can highlight the following observations:
Looking at short periods of time (a week), one can find significant reward differences across CL clients (up to 17% was observed), but these are likely related to sync committee duties, block proposals and other network conditions.
When comparing CL rewards over a long period of time (e.g., six months), the difference across CL clients is negligible, between -0.7% and +0.7% of the global average.
There are differences in block proposal rewards after a missed block, where score differences reach up to 18%. It would be interesting to investigate how the different CL clients handle those situations.
In conclusion, the large differences in rewards across clients are mostly transient and related to changing system conditions. This confirms that, even though one client may temporarily obtain more rewards than others (due to several block proposals, sync committee duties or network disruptions), measured over a long period of time the obtained rewards are roughly the same for all CL clients.
This is a good result, as it demonstrates that all consensus layer clients are capable of running a validator with similar output. This should lead to higher client diversity, which in turn should make the network more robust.