In this post we analyze the performance of different Ethereum staking pools, operators, and solo stakers during The Merge (the migration from Proof of Work to Proof of Stake), covering the 12 hours before and 12 hours after it happened.
We start with some context on what The Merge is, and then we introduce ethereumpools.info, the real-time monitoring tool whose data was used for this analysis. After that, we present and analyze multiple plots showing how this transition affected different entities, followed by the methodology we used. To conclude, we present our conclusions.
On 15 September 2022 at 08:42:59 UTC, at block 15537394, something changed in the Ethereum blockchain forever. Proof of Work ⛏️ was switched off in favor of Proof of Stake 🌱, making Ethereum greener and reducing its energy consumption by 99.988%, among other implications.
This put an end to several years of research and development, finally delivering the Gasper consensus protocol to a production environment securing billions of dollars in assets. Note that while this was a huge milestone for the Ethereum blockchain, it's just one upgrade on the roadmap, which continues with withdrawals, proto-danksharding, PBS, account abstraction, and single secret leader election, among others.
And you may think, why is this transition from PoW to PoS referred to as The Merge? Well, it’s actually a merge (or fusion if you are a Dragon Ball Z fan) because two chains were actually merging together:
On one side we had Ethereum Mainnet, live since July 2015, what everyone calls Ethereum.
On the other we had the Ethereum Beacon Chain, launched on the 1st of December 2020, which didn't process any transactions until The Merge.
This event was huge and was followed by thousands of people around the globe, with live streams, and local parties in multiple capitals. Even Google acknowledged it with a funny animation when searching for The Merge. If these 🐻🐼 emojis mean something to you, you know what we are talking about.
The Merge was a transition that had to be monitored carefully, since the network couldn't stop; some people even compared it to changing the engine of an aircraft mid-flight without the passengers noticing. Note that the Ethereum blockchain is used by hundreds of thousands of people and secures billions of dollars in value, so it can't just stop.
On top of that, there was added complexity. The Ethereum blockchain is defined by a set of common execution and consensus specifications, but there are a total of 9 different teams implementing its software, with more than a dozen combinations that need to be tested. It was not an easy task, and parithosh_j can tell you more about that. We call this client-diversity:
geth, erigon, besu, and nethermind on the execution side
lighthouse, teku, prysm, nimbus, and lodestar on the consensus side.
You may wonder why there are so many software implementations of Ethereum. There are multiple reasons behind this, but the main ones are:
They are written in different programming languages with different use cases in mind
If a software bug affects a given client, the network can continue working. This adds a new level of redundancy, similar to avionics systems in the aerospace industry.
Moreover, we have the so-called operator-diversity, meaning that this software is operated by different entities such as companies, exchanges, operators, and solo stakers, each one in a different location, with different hardware.
At ethereumpools we have spent almost a year analyzing in real time the metrics of groups of validators belonging to whales, exchanges, pools, operators, and even some small stakers. Our main goal is to monitor their performance and alert them if we detect problems. Over this time, we have publicly alerted on issues with ChorusOne, Whales, Figment, StakeWise, P2P, Lido, Poloniex, BitcoinSuisse, and Stakefish, among others. We also like writing about the technicalities behind staking and talking about rewards and penalties, among other topics.
We want to hold them accountable for their performance, using publicly available data present in the beacon chain.
These are some of the consensus metrics that we look at to detect possible issues:
Source and target votes, see Casper FFG.
Votes for LMD GHOST, the fork-choice rule, aka head votes.
Block proposal duties, and check the proposed and missed blocks.
Slashings.
Earned and lost rewards.
For example, this is one of the metrics we monitor in real time, which measures the rewards/penalties in peta-wei that each pool earns per active validator every epoch, which lasts around six minutes. Since it's a normalized metric, we can use it to compare how capital efficient each pool is.
We can use it to extrapolate the yearly APR% that each pool would get in consensus rewards. Since a year has 82125 epochs and each validator gets on average 0.0175 peta-wei per epoch, one validator would earn 1.43718 Eth/year, or 4.49%/year. This number doesn't include execution rewards, and it decreases as the number of active validators increases.
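The back-of-the-envelope extrapolation above can be reproduced as follows (a minimal sketch; the 0.0175 peta-wei figure is the observed per-epoch average, and 82125 comes from 225 epochs/day over 365 days):

```python
# Extrapolate yearly consensus APR from the per-epoch normalized reward.
EPOCHS_PER_YEAR = 225 * 365        # 225 epochs/day * 365 days = 82125
REWARD_PER_EPOCH_PWEI = 0.0175     # avg peta-wei per validator per epoch
PWEI = 10**15                      # 1 peta-wei = 10^15 wei
WEI_PER_ETH = 10**18

yearly_eth = EPOCHS_PER_YEAR * REWARD_PER_EPOCH_PWEI * PWEI / WEI_PER_ETH
apr = yearly_eth / 32 * 100        # each validator stakes 32 ETH

print(f"{yearly_eth:.5f} ETH/year -> {apr:.2f}% APR")
# 1.43719 ETH/year -> 4.49% APR
```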
But what did these graphs look like during The Merge transition?
Did professional operators perform better than solo stakers?
Did any pool have issues?
Was any pool offline?
We are here to answer these questions, extending the data available on ethereumpools.info with extra metrics, plots, and explanations.
Hereafter, we present a set of 24-hour plots, covering 12 hours before and 12 hours after The Merge. Our goal is to see which pools had issues during the transition, using several metrics to back our thesis. We divide the analysis into different groups:
1️⃣ Well known exchanges and pools.
2️⃣ Lido operators.
3️⃣ Solo stakers and whales.
The emoji 🐼 shown in the following diagrams denotes the boundary between PoW and PoS, or more precisely block 15537394, which was proposed in the early morning of 15 September 2022, UTC time.
Each plot shows the performance 12 hours before/after The Merge with the following metrics:
Percent of incorrect source votes for each epoch. A pool with 1000 validators and 100 validators voting wrong to source will have a 10% of wrong votes. The ideal value is 0%.
Percent of incorrect head votes for each epoch. A pool with 1000 validators and 100 validators voting wrong to the head of the blockchain will have a 10% of wrong votes. The ideal value is 0%.
Number of proposed blocks per epoch. Out of the 32 blocks that are proposed in an epoch (one per slot), indicates how many a given pool proposed. The ideal value depends on the number of validators the pool controls.
Number of missed block proposals per epoch. Amount of blocks the pool was due to propose but missed. The ideal value is 0.
Earned rewards per epoch per active validator. Amount of rewards that the pool earns per active validator on each epoch, in peta-wei. This is a normalized metric that can be compared across different pools. The ideal value depends on the amount of active validators and varies with block proposals. Only consensus rewards are considered.
Lost rewards per epoch per active validator. Amount of rewards that the pool is missing every epoch for each active validator. The ideal value is 0 peta-wei.
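As a minimal sketch of how such normalized per-epoch metrics can be derived, consider the following. The field names and the `EpochPoolStats` container are hypothetical; the real pipeline uses eth-metrics against beacon chain data:

```python
from dataclasses import dataclass

@dataclass
class EpochPoolStats:
    # Hypothetical per-epoch aggregate for one pool's validators
    active_validators: int
    wrong_source_votes: int
    wrong_head_votes: int
    rewards_wei: int      # total consensus rewards earned this epoch
    penalties_wei: int    # total consensus rewards lost this epoch

PWEI = 10**15  # 1 peta-wei = 10^15 wei

def epoch_metrics(s: EpochPoolStats) -> dict:
    """Normalized per-epoch metrics, comparable across pools of any size."""
    n = s.active_validators
    return {
        "pct_wrong_source": 100 * s.wrong_source_votes / n,
        "pct_wrong_head": 100 * s.wrong_head_votes / n,
        "earned_pwei_per_validator": s.rewards_wei / n / PWEI,
        "lost_pwei_per_validator": s.penalties_wei / n / PWEI,
    }

# Example: a pool with 1000 validators, 100 of them voting wrong to source
m = epoch_metrics(EpochPoolStats(1000, 100, 50, 17_500 * 10**15, 10**15))
print(m["pct_wrong_source"])          # 10.0 -> the 10% from the text above
print(m["earned_pwei_per_validator"]) # 17.5 peta-wei per validator
```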
We can observe a small increase in wrong votes, but it's almost negligible. On the other hand, just one block proposal was missed; since Coinbase proposes 6-8 blocks per epoch, this is negligible too. Rewards stayed constant and there were almost no penalties.
We can observe that around 20% of StakeWise validators voted wrong to source and head, but it was quickly fixed, within an hour. No block proposals were missed; the rewards decreased a bit for some minutes and the penalties also increased.
Beyond performance, note how the variance of the earned rewards is higher than for Coinbase, but this is expected. It comes down to block proposals: since StakeWise has fewer validators, it proposes blocks less frequently, so each proposal shows up as a larger per-validator spike.
We can observe a similar pattern to StakeWise's, but with fewer validators affected. Rewards and penalties were impacted, but for less than an hour.
Quite interestingly, Bloxstaking performed great during The Merge transition itself, but 9 hours later 40% of their validators had an issue for a short period of time. This led to a missed proposal and some missed rewards, but it was quickly fixed.
Since Lido has around 30 different operators controlling its validators, we will share just a few examples of their performance: some good, and others not so good.
Stakely validators had no issues during The Merge transition. We can observe a small but negligible increase in missed votes. There were no missed blocks, and rewards stayed constant. Note that the spikes in rewards are due to block proposals.
We can clearly see how Stakin had some issues after The Merge. Wrong source votes increased up to 18%, and 40% of their votes to the head of the blockchain were wrong, which can be an indicator that their beacon node was having trouble staying in sync with the latest head.
It's also clear how the rewards decreased from a moving average of 0.016 peta-wei per epoch to 0.012. Beyond this, some of their validators were slightly penalised. Last but not least, we can observe a missed proposal a few hours later.
For P2POrg we can see that the wrong votes increased a bit, but within reasonable limits when compared to other pools. However, the amount of incorrect head votes increased notably, to almost 20%, for a few hours after The Merge. We can also see two missed proposals and a clear increase in penalties.
First and foremost, note that RocketPool is not a single entity. We group here all the solo stakers that run validators for the pool, so the metrics we present are an average across all of them.
We can observe how after The Merge, there was a spike to 10% in the number of incorrect source votes. It's not as severe as Stakin's (Lido), but enough to see a small impact on rewards. Note that there were no missed proposals after The Merge.
We group here all validators that we estimate run on DappNode, based on the graffiti they use. These are thousands of validators running across the world on different hardware and maintained by non-professional operators.
We can observe that The Merge had almost no impact on the wrong votes, which stayed at a similar level as before. Without being perfect, they performed better than some Lido operators or well-known staking pools, as shown above. No blocks were missed, and the impact on rewards was negligible.
Similar to the previous group, here we group all validators that run Avado, as identified by their graffiti. We can see that after The Merge there was a spike in the amount of wrong votes, reaching 22% and then stabilizing around 20%. We can clearly see how the rewards were impacted a bit, and a few hours later there were 3 missed block proposals.
All data is fetched directly from the beacon chain, using Teku as consensus client and Geth as execution client.
All metrics are calculated on different groups of validators every epoch, which translates to datapoints every six minutes.
Validators are grouped per exchange, pool, or solo staker according to different criteria. Some operators publicly recognise the validators they run. In other cases we rely on onchain analytics and different heuristics to label them. For solo stakers such as DappNode and Avado, we identify them by the graffiti they use.
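The graffiti-based grouping can be sketched like this. This is illustrative only: the sample graffiti strings and label table below are hypothetical, and the real heuristics live in eth-metrics:

```python
# Hypothetical (validator_index, graffiti) pairs collected from proposed
# blocks; the graffiti string is set freely by the validator software.
proposals = [
    (1001, "DAppNodePackage 0.2.8"),
    (1002, "Avado DVT"),
    (1003, ""),  # no identifying graffiti -> left unlabeled
]

# Case-insensitive substring -> group label
LABELS = {"dappnode": "DappNode", "avado": "Avado"}

def label_validator(graffiti):
    """Return a group label if the graffiti matches a known pattern."""
    g = graffiti.lower()
    for needle, label in LABELS.items():
        if needle in g:
            return label
    return None

groups = {}
for index, graffiti in proposals:
    label = label_validator(graffiti)
    if label is not None:
        groups.setdefault(label, []).append(index)

print(groups)  # {'DappNode': [1001], 'Avado': [1002]}
```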
The eth-metrics software is used to calculate all the metrics, and the results were exported to CSV through Grafana for analysis.
Python was used for the data analysis and plots. Note that the % of missed votes is a 10-epoch moving average.
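The 10-epoch moving average can be computed with pandas, for example. This is a sketch with made-up per-epoch values; the column name is hypothetical:

```python
import pandas as pd

# Hypothetical per-epoch series of % wrong votes for one pool
df = pd.DataFrame(
    {"pct_wrong_votes": [0, 0, 20, 20, 0, 0, 0, 0, 0, 0, 0, 0]}
)

# 10-epoch moving average, as used for the plots; min_periods=1 so the
# first epochs (with fewer than 10 datapoints) still get a value.
df["ma_10"] = df["pct_wrong_votes"].rolling(window=10, min_periods=1).mean()

print(df["ma_10"].iloc[-1])  # mean of the last 10 epochs -> 4.0
```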
All these data and more can be accessed live at ethereumpools.info.
Disclaimer:
While the analysis was done with strict scientific rigor, bear in mind that some of the validator labeling might be wrong or incomplete. Note that it was cross-checked with other members of the community, but it's impossible to have a 100% accurate estimation.
We are an independent entity, not affiliated with any of the companies, operators, or pools analysed here. If you think any data might be wrong, feel free to challenge us.
We presented an analysis of how different staking entities performed during the 12 hours before and after The Merge. While this is an interesting analysis, don't use it to judge a given entity. Real-time data with the latest performance can be found at ethereumpools.info. Bad performance during this transition means nothing in the long term.
It's interesting to see that groups of solo stakers such as DappNode have really good performance, taking into account that their nodes are not operated by professionals in the cloud, but by individuals usually running the setup at their homes.
On the other hand, we can see that even professional operators like the ones Lido has can have issues that take hours to be resolved.
The Merge went great, and you can refer to official sources for a different analysis at the client level rather than the pool level.
Needless to say, there were no slashings during The Merge, something that some people were really afraid of.
Performance over short periods of time is cool, but what really matters is consistent okay-ish performance month after month. Bear this in mind when setting your KPIs, especially if you are a solo staker.
Solo stakers performed great during The Merge, especially when compared to some professional operators.
For more on real-time Ethereum validator monitoring, visit ethereumpools.info.
Al. Rev.