Modelling Rated_v0’s output on validator effectiveness

March 4th, 2022

In this post, we present a series of post-hoc analyses of the efficacy of the Rated_v0 model, first as a *descriptor* and then as a *predictor* of validator performance. We segment the analyses into several cohorts to test the robustness of the approach across different frames of reference.

To reflect the changes that *Altair* introduced to consensus rules and to the distribution of rewards among validators, we have adjusted the components of the validator effectiveness model (namely uptime/participation rate and correctness). For more detail on what these changes entail, please see "How the Rated_v0 model works".

For the purpose of this exercise, and to embed some determinism in the groupings, we have grouped validators by common deposit address.
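As a rough illustration of this grouping step, the sketch below collects validator indices under the eth1 address that funded them. The `deposits` sample and the `group_by_deposit_address` helper are hypothetical; real inputs would come from deposit contract data, and the sketch assumes each index maps to exactly one deposit address.

```python
from collections import defaultdict

# Hypothetical sample of (validator_index, eth1_deposit_address) pairs; values are made up.
deposits = [
    (0, "0xaaa"),
    (1, "0xaaa"),
    (2, "0xbbb"),
    (3, "0xaaa"),
    (4, "0xccc"),
]


def group_by_deposit_address(pairs):
    """Map each eth1 deposit address to the sorted list of validator indices it funded."""
    groups = defaultdict(list)
    for index, address in pairs:
        groups[address].append(index)
    return {address: sorted(indices) for address, indices in groups.items()}


operators = group_by_deposit_address(deposits)
print(operators)  # {'0xaaa': [0, 1, 3], '0xbbb': [2], '0xccc': [4]}
```

Grouping by deposit address is deterministic but coarse: one entity can use several addresses, and one address can be shared, so the groups approximate operators rather than identify them.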

The first frame we slice for is the validator index level, looking only at the 21,022 indices that have participated on the Beacon Chain since genesis. We do this to ensure an apples-to-apples comparison when measuring how well the proposed effectiveness metric describes rewards.

**Time period:** Genesis till Jan 6, 2022

**Validator indices:** 21,022

**Result 1:** There appears to be a linear relationship, but we have to filter out the outliers to make it more legible. These outliers could be attributable to sync committee participation (post-Altair) or to collecting a slashing reward as a proposer.

**Result 2:** Here we have filtered out the 11 rightmost indices on the `sum_earnings` scale. The result is a strong linear relationship between total validator earnings and validator effectiveness.

**Result 3:** Digging a bit deeper, we see a strong correlation between validator effectiveness and earnings, though not as strong as that of correctness and uptime. This suggests that, although the proposed effectiveness metric is a good descriptor of rewards, it lags some of its components.

Given that we are designing the effectiveness score for groupings of validator indices, in order to reflect operator effectiveness, we repeat the analysis above, but this time with the validator indices grouped per operator address (i.e. the eth1 deposit address). We also split the analysis into two parts (pre- and post-Altair) to reflect the adjustment we have made to the model.

**Time period:** Genesis to Oct 29, 2021

**Validator indices:** 21,022

**Operator addresses:** 2,218

**Result 1:** The metric we index for here, with respect to rewards earned, is `earnings_per_val`. We do this to control for varying validator index ownership. We observe that the same strong linear relationship holds. [R^2 = 99%]

**Result 2:** In this frame of analysis, the correlations between `validator_effectiveness` and `earnings_per_validator` have increased, indicating that the model is indeed better calibrated to describe the performance of groups of indices than of individual indices. Correctness (the average of Head and Target accuracy) is still more highly correlated with `earnings_per_val`.

With the Altair upgrade, the consensus rules regarding rewards changed (see here for an abbreviated version of the changes that took effect). In response, we adjusted the way we score correctness and participation/uptime.

In this frame of analysis, we look at all operator deposit addresses active post-Altair that correspond to more than 200 indices each.

We do this because sync committee participation introduces a lot of noise into `earnings_per_val`; smaller operators whose indices have participated in sync committees earn an outsized per-index reward that is not necessarily indicative of their performance. Given that Altair and sync committee rewards have only been live since Oct 29, 2021, the relative shortness of the time period further exaggerates this noise. We expect that as the post-Altair sample gets larger, the model’s descriptive power will strengthen further.
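The sketch below shows the size filter and why it helps. A one-off reward adds `reward / n_indices` to `earnings_per_val`, so the same windfall distorts a small operator far more than a large one. The operator sizes, the 0.1 ETH reward, and both helper names are hypothetical; only the "more than 200 indices" threshold comes from the post.

```python
# Hypothetical mapping of deposit address -> validator indices; sizes are illustrative.
operators = {
    "0xaaa": list(range(0, 250)),    # 250 indices
    "0xbbb": list(range(250, 260)),  # 10 indices
    "0xccc": list(range(260, 661)),  # 401 indices
}
MIN_INDICES = 200  # the post keeps operators with MORE than 200 indices


def keep_large_operators(groups, min_indices=MIN_INDICES):
    """Return only the operators whose index count strictly exceeds min_indices."""
    return {addr: idx for addr, idx in groups.items() if len(idx) > min_indices}


def per_index_bump(one_off_reward_eth, n_indices):
    """How much a single one-off reward (e.g. a sync committee stint) shifts earnings_per_val."""
    return one_off_reward_eth / n_indices


large = keep_large_operators(operators)
print(sorted(large))  # ['0xaaa', '0xccc']

# Hypothetical 0.1 ETH one-off reward: the smaller the operator, the larger the distortion.
for addr, idx in operators.items():
    print(addr, len(idx), round(per_index_bump(0.1, len(idx)), 5))
```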

**Time period:** Oct 29, 2021 till Jan 7, 2022

**Validator indices:** 145,941

**Operator addresses:** 115

**Result 1:** We observe that the same strong linear relationship holds. [R^2 = 99%]

**Result 2:** Although the correlation between `validator_effectiveness` and `earnings_per_val` has weakened compared to the pre-Altair frame of analysis, `validator_effectiveness` now appears to be more highly correlated with rewards than any of its components.

For completeness, we also ran the numbers on the performance of the overall operator effectiveness score post-Altair while keeping the pre-Altair definitions of its components unchanged.

**Time period:** Oct 29, 2021 till Jan 7, 2022

**Validator indices:** 145,941

**Operator addresses:** 115

**Result 1:** Surprisingly, we observed results nearly identical to those of the post-Altair-adjusted model presented above [R^2 = 99%]. This is very encouraging, especially as two of our design goals from the outset were Generalisability and Permanence.

**Result 2:** Digging into the components of the effectiveness model, we observe that the unadjusted effectiveness score correlates with rewards as strongly as the post-Altair-adjusted score does. At the same time, however, the descriptive power of the model’s components (correctness in particular) weakened somewhat.

In this frame of analysis, we take all validator indices grouped by operator address, once again keeping only operators with more than 200 indices mapped to them (for the same reasons as above). As we are testing the predictive power of the model, we now set an arbitrary cutoff at epoch 67,500 (3/4 of our sample) and regress each operator’s realised earnings after the cutoff on its aggregate validator effectiveness before the cutoff. In effect, this tests how well validator effectiveness can predict future earnings.

**Time period:** epoch [0 to 67,500] for validator_effectiveness and epoch [67,500 to ~90,000] for earnings_per_validator

**Validator indices:** 114,672

**Operator addresses:** 115

**Result 1:** We observe that a strong linear relationship holds, with a correlation of 78% between validator_effectiveness (pre-cutoff) and earnings_per_validator (post-cutoff). [R^2 = 99%]

**Result 2:** Running the regression, we observe a very high R^2 (goodness of fit) and, commensurately, a very low p-value (statistical significance). We can therefore conclude that the v0 effectiveness score **is a very good predictor of future reward potential.**

In this post, we have shown the following:

- The Rated_v0 model for validator effectiveness is a very good descriptor of performance, both at the individual index level and the operator level.
- The Rated_v0 model for validator effectiveness is optimised for larger groupings of validator indices (> 200).
- The Rated_v0 model for validator effectiveness is a very good predictor of future earnings.

To add some colour to our findings about performance and rewards, we aggregated the results of our modelling on the whole sample (the first 400 days of Beacon Chain Mainnet) in the following table. It shows that as the number of indices and/or the ETH/USD rate grows, even small changes in validator and operator effectiveness can have an outsized impact on rewards earned.
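The scaling argument can be checked with simple arithmetic. The sketch assumes, hypothetically, that earnings scale proportionally with effectiveness (in line with the near-linear relationship above), so the extra income is the effectiveness gain times the baseline reward per index, times the number of indices, times the ETH/USD rate. The 1.5 ETH baseline, index counts, and prices are illustrative, not figures from the table.

```python
def extra_rewards_usd(delta_effectiveness, baseline_eth_per_index, n_indices, eth_usd):
    """Extra USD earned from an effectiveness gain.

    Hypothetical assumption: earnings scale proportionally with effectiveness, so a gain of
    delta_effectiveness (e.g. 0.01 for one percentage point) adds that fraction of the
    baseline per-index reward.
    """
    return delta_effectiveness * baseline_eth_per_index * n_indices * eth_usd


# All inputs below are illustrative, not figures from the post's table.
base = extra_rewards_usd(0.01, 1.5, 1_000, 3_000)
print(f"{base:,.0f}")  # 45,000
print(f"{extra_rewards_usd(0.01, 1.5, 2_000, 3_000):,.0f}")  # twice the indices: 90,000
print(f"{extra_rewards_usd(0.01, 1.5, 1_000, 6_000):,.0f}")  # twice the ETH/USD rate: 90,000
```

Because the gap is a product, doubling either the number of indices or the ETH/USD rate doubles the value of the same one-point effectiveness improvement.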

We expect that as the Merge rolls out and the net return on staked ETH increases, both the variability of observed performance outcomes and the impact of performance improvements on realised rewards will increase.

Perhaps the most interesting result in this post is that the initial model’s descriptive power remained unchanged after the Altair upgrade came into effect. This means that, even without adjusting the definitions of correctness and participation/uptime to reflect the changes in consensus, the validator effectiveness methodology we proposed continued to perform.

While this result satisfies the design goals of **Generalisability** and **Permanence**, it somewhat compromises **Specificity**. Given that the results the two models produced are nearly identical, we are more inclined to stick with the updated model, as we feel there is power in correctly defining its components.

That said, the fact that the pre-Altair model of effectiveness held up well as rewards accounting changed has important downstream implications for production applications. Had the initial model we proposed been used as an input to financial products (e.g. insurance, derivatives), the downstream distortion introduced by the Altair upgrade would have been minimal to non-existent, making those products more robust and resilient in turn.

**Let’s Rate!** 🍬

*Follow Rated on Twitter* 👉 @ratedw3b

*Join the conversation* 👉 discord.gg/QHNCYmBPGq


Verification

This entry has been permanently stored onchain and signed by its creator.

Arweave Transaction

CK9EyqW-HiIMs1E…3yzpuysfEAokHkI

Author Address

0x63deD17a9246816…F44Ff4EcE5409f6

Content Digest

2byJU9p9kn5T840…nGaNPH_zweUaKEc