Modelling Rated_v0’s output on validator effectiveness

Context

In this post, we present a series of post-hoc analyses we ran on the efficacy of the Rated_v0 model, first as a descriptor and then as a predictor of validator performance. We segment the analyses into several cohorts in order to test the robustness of the approach in different frames of reference.

In order to reflect the changes that Altair has introduced to consensus rules and rewards distribution among validators, we have made changes to the components of the validator effectiveness model (namely uptime/participation rate and correctness). To dig deeper into what these changes entail, please see How the Rated_v0 model works.

For the purposes of this exercise, and to embed some determinism in the groupings, we have grouped validators by deposit address commonality.
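As a minimal illustration of this grouping step (with a hypothetical DataFrame and column names of our choosing, not the production schema):

```python
import pandas as pd

# Hypothetical per-index frame; addresses and values are illustrative only.
validators = pd.DataFrame({
    "validator_index": [0, 1, 2, 3, 4],
    "deposit_address": ["0xaaa", "0xaaa", "0xaaa", "0xbbb", "0xbbb"],
    "sum_earnings": [1.9, 1.8, 1.7, 2.0, 1.6],            # ETH earned per index
    "validator_effectiveness": [0.97, 0.95, 0.93, 0.98, 0.90],
})

# Group indices by their eth1 deposit address to approximate an operator,
# then normalise earnings by index count (earnings_per_val) so operators
# with many indices are comparable to those with few.
operators = validators.groupby("deposit_address").agg(
    n_indices=("validator_index", "count"),
    sum_earnings=("sum_earnings", "sum"),
    validator_effectiveness=("validator_effectiveness", "mean"),
)
operators["earnings_per_val"] = operators["sum_earnings"] / operators["n_indices"]
print(operators)
```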

Genesis validator effectiveness [all time]

The first frame we slice for is at the validator index level, looking only at the 21,022 indices that have participated on the Beacon Chain since genesis. We do this to make an apples-to-apples comparison when we measure how well the proposed effectiveness metric describes rewards.

Time period: Genesis till Jan 6, 2022
Validator indices: 21,022

Figure 1: Scatterplot of validator effectiveness and validator earnings, at the index level [Genesis to Jan 6, 2022]

Result 1: There appears to be a linear relationship, but we have to filter out the outliers to make it more legible. These could be attributable to sync committee participation (post-Altair) and to collecting a slashing reward as a proposer.

Figure 2: Cleaned-up scatterplot of validator effectiveness and validator earnings, at the index level [Genesis to Jan 6, 2022]

Result 2: Here we have filtered out the 11 rightmost indices on the sum_earnings scale. The result is a strong linear relationship between total validator earnings and validator effectiveness.
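For reference, the clean-up amounts to dropping the largest few earners before re-plotting; a sketch assuming a per-index DataFrame with a hypothetical sum_earnings column:

```python
import numpy as np
import pandas as pd

# Hypothetical per-index earnings in ETH; values are illustrative only.
rng = np.random.default_rng(42)
df = pd.DataFrame({"sum_earnings": rng.normal(1.8, 0.2, 21_022)})

# Drop the 11 rightmost indices on the sum_earnings scale, then re-plot.
cleaned = df.drop(df.nlargest(11, "sum_earnings").index)
```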

Table 1: Correlation matrix of key results [Genesis to Jan 6, 2022]

Result 3: Digging a bit deeper, we see a strong correlation between validator effectiveness and earnings, though not as strong as those of correctness and uptime. This hints at the fact that, while a good descriptor of rewards, the proposed effectiveness metric lags some of its components.
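The correlation matrix itself is straightforward to reproduce with pandas; a sketch with hypothetical column names and synthetic values (the aggregation below is a placeholder, not the Rated_v0 formula):

```python
import numpy as np
import pandas as pd

# Synthetic per-index metrics; column names are illustrative, not the production schema.
rng = np.random.default_rng(7)
n = 21_022
base = rng.uniform(0.90, 1.00, n)
df = pd.DataFrame({
    "uptime": np.clip(base + rng.normal(0, 0.01, n), 0, 1),
    "correctness": np.clip(base + rng.normal(0, 0.01, n), 0, 1),
    "sum_earnings": 2.0 * base + rng.normal(0, 0.05, n),   # ETH
})
# Placeholder aggregation of the components into a single score.
df["validator_effectiveness"] = (df["uptime"] + df["correctness"]) / 2

# Pairwise Pearson correlations between the score, its components, and earnings.
print(df.corr())
```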

Operator effectiveness [pre-Altair]

Given that we are designing the effectiveness score for groupings of validator indices in order to reflect operator effectiveness, we perform the same analysis as above, but this time we look at the validator indices grouped per operator address, i.e. the eth1 deposit address. We also break the analysis into two parts (pre- and post-Altair) to reflect the adjustment we have made to the model.

Time period: Genesis to Oct 29, 2021
Validator indices: 21,022
Operator addresses: 2,218

Figure 3: Scatterplot of validator effectiveness and validator earnings, at the operator deposit address level [Genesis to Oct 29, 2021]

Result 1: The metric we index against rewards earned here is earnings_per_val. We do this to control for varying validator index ownership. We observe that the same strong linear relationship holds [R^2 = 99%].

Table 2: Correlation matrix of key results [Genesis to Oct 29, 2021]

Result 2: In this frame of analysis, the correlation between validator_effectiveness and earnings_per_validator has increased, indicating that the model is indeed better calibrated to describe the performance of groups of indices than that of individual indices. Correctness (the average of Head and Target accuracy) is still more highly correlated with earnings_per_val.

Operator effectiveness [post-Altair, post-adjustment]

With the Altair upgrade, consensus rules regarding rewards changed (see here for an abbreviated version of the changes that took effect). In response, we adjusted the way we score for correctness and participation/uptime.

In this frame of analysis, we look at the group of all operator deposit addresses active post-Altair that correspond to more than 200 indices per address.

We do this because sync committee participation introduces a lot of noise into earnings_per_val; operators with fewer corresponding indices that have participated in sync committees earn an outsized reward that is not necessarily indicative of their performance. Given that Altair and sync committee rewards have only been live since Oct 29, 2021, the relative shortness of the time period further exaggerates the noise. We expect that as the post-Altair sample grows, the model's descriptive power will strengthen further.
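A sketch of this filter, assuming an operator-level frame with a hypothetical n_indices column:

```python
import pandas as pd

# Hypothetical operator-level frame; addresses and values are illustrative only.
operators = pd.DataFrame({
    "deposit_address": ["0xaaa", "0xbbb", "0xccc"],
    "n_indices": [1_500, 80, 320],
    "earnings_per_val": [0.031, 0.045, 0.030],  # ETH over the post-Altair window
}).set_index("deposit_address")

# Keep only operators with more than 200 mapped indices, where sync committee
# rewards are averaged out enough to stop dominating earnings_per_val.
large_operators = operators[operators["n_indices"] > 200]
print(large_operators)
```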

Time period: Oct 29, 2021 till Jan 7, 2022
Validator indices: 145,941
Operator addresses: 115

Figure 4: Scatterplot of validator effectiveness and validator earnings, at the operator deposit address level; post-adjustment [Oct 29, 2021 till Jan 7, 2022]

Result 1: We observe that the same strong linear relationship holds. [R^2 = 99%]

Table 3: Correlation matrix of key results; post-adjustment [Oct 29, 2021 till Jan 7, 2022]

Result 2: Although the correlation between validator_effectiveness and earnings_per_val has weakened compared to the pre-Altair frame of analysis, validator_effectiveness is now more highly correlated with rewards than any of its components.

Operator effectiveness [post-Altair, pre-adjustment]

For completeness, we ran the numbers on the performance of the overall operator effectiveness score post-Altair, while keeping the pre-Altair definitions of its components unchanged.

Time period: Oct 29, 2021 till Jan 7, 2022
Validator indices: 145,941
Operator addresses: 115

Figure 5: Scatterplot of validator effectiveness and validator earnings, at the operator deposit address level; pre-adjustment [Oct 29, 2021 till Jan 7, 2022]

Result 1: Surprisingly, we observed results nearly identical to the post-adjustment ones presented above [R^2 = 99%]. This is very encouraging, especially as two of our design goals starting out were Generalisability and Permanence.

Table 4: Correlation matrix of key results; pre-adjustment [Oct 29, 2021 till Jan 7, 2022]

Result 2: Digging into the components of the effectiveness model, we observe that the unadjusted score correlates with earnings just as strongly as the post-Altair adjusted one. At the same time, however, the descriptive power of the model's components (correctness in particular) somewhat weakened.

Operator effectiveness predictive power

In this frame of analysis we take all validator indices grouped by operator address, once again filtering for operators with more than 200 indices mapped to them (for the same reasons as above). As we are testing for the predictive power of the model, we take an arbitrary cutoff at epoch 67,500 (3/4 of our sample) and regress the aggregate validator effectiveness pre-cutoff against realised earnings post-cutoff. With this we are effectively surveying how well validator effectiveness can predict future earnings.
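A sketch of this in/out-of-sample setup, with synthetic per-operator aggregates split at the epoch 67,500 cutoff (scipy's linregress stands in for whatever regression tooling was actually used):

```python
import numpy as np
from scipy.stats import linregress

# Synthetic per-operator aggregates; values are illustrative only.
rng = np.random.default_rng(3)
n_ops = 115
eff_pre_cutoff = rng.uniform(0.90, 0.99, n_ops)  # epochs 0 to 67,500
# Assume post-cutoff earnings scale roughly linearly with pre-cutoff effectiveness.
earnings_post_cutoff = 0.1 * eff_pre_cutoff + rng.normal(0, 0.002, n_ops)  # epochs 67,500 to ~90,000

# Regress realised out-of-sample earnings on in-sample effectiveness.
fit = linregress(eff_pre_cutoff, earnings_post_cutoff)
print(f"correlation = {fit.rvalue:.2%}, R^2 = {fit.rvalue**2:.2%}, p-value = {fit.pvalue:.2e}")
```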

Time period: epoch [0 to 67,500] for validator_effectiveness and epoch [67,500 to ~90,000] for earnings_per_validator
Validator indices: 114,672
Operator addresses: 115

Figure 6: Scatterplot of validator effectiveness and validator earnings, at the operator deposit address level [in/out of sample regression analysis]

Result 1: We observe that a strong linear relationship holds, with a correlation of 78% between validator_effectiveness (pre-cutoff) and earnings_per_validator (post-cutoff). [R^2 = 99%]

Table 5: Regression results [in/out of sample analysis]

Result 2: Running the regression, we observe a very high R^2 (goodness of fit) and, commensurately, a very low p-value (statistical significance). We can therefore conclude that the v0 effectiveness score is a very good predictor of future rewards potential.

Conclusion

Through the exercise presented in this post, we have shown the following:

  • The Rated_v0 model for validator effectiveness is a very good descriptor of performance, both at the individual index level and the operator level.
  • The Rated_v0 model for validator effectiveness is optimised for larger groupings of validator indices (> 200).
  • The Rated_v0 model for validator effectiveness is a very good predictor of future earnings.

To add some colour to our findings about performance and rewards, we aggregated the results of our modelling on the whole sample (the first 400 days of Beacon Chain Mainnet) in the table below. From there it is evident that as the number of indices and/or the ETH/USD rate grows, even small changes in validator and operator effectiveness can have an outsized impact on rewards earned.

We expect that as the Merge comes into effect and the net return on staked ETH increases, both the variability of observed performance outcomes and the impact of performance improvements on realised rewards will increase.

Table 6: Sensitivity analysis on how changes in validator effectiveness impact operator rewards [first 400 days of Beacon Chain Mainnet]
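To make the mechanics behind a table like this concrete (the inputs below are assumptions for illustration, not values from Table 6): if rewards scale roughly linearly with effectiveness, as the regressions above suggest, then the USD impact of an effectiveness improvement grows multiplicatively with index count and the ETH/USD rate.

```python
# All inputs are illustrative assumptions, not figures from Table 6.
n_indices = 1_000            # indices mapped to an operator
earnings_per_val = 2.0       # ETH earned per index over the period
eth_usd = 3_000.0            # ETH/USD rate
delta_effectiveness = 0.01   # a one percentage point improvement

# Under a linear rewards-vs-effectiveness assumption, the incremental
# USD earned over the period scales with all three factors at once.
usd_impact = delta_effectiveness * n_indices * earnings_per_val * eth_usd
print(f"~${usd_impact:,.0f} over the period")
```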

Perhaps the most interesting of all the results presented in this post is the fact that the initial model's descriptive power remained unchanged as the Altair upgrade came into effect. This means that even without adjusting the definitions of correctness and participation/uptime to reflect the changes in consensus, the validator effectiveness methodology we proposed continued to perform.

While this result satisfies the design goals of Generalisability and Permanence, it somewhat compromises Specificity. Given that the two models produce nearly identical results, we are more inclined to stick with the updated model, as we feel there is power in correctly defining its components.

Be that as it may, the fact that the pre-Altair model of effectiveness held up well as the nature of rewards accounting changed has important downstream implications when considering in-prod applications. If the initial model we proposed worked as an input in financial products (e.g. insurance, derivatives etc.), the downstream distortion introduced by the Altair upgrade would be minimal to non-existent, making those products more robust and resilient in return.

Let’s Rate! 🍬

Follow Rated on Twitter 👉 @ratedw3b
Join the conversation 👉 discord.gg/QHNCYmBPGq
