Title Image by MMT from Pixabay
For a while now, we’ve been looking at client diversity, and admittedly sometimes beating the drum pretty hard, to make sure everyone understands what it is and why we care…
Sites like clientdiversity.org have done a good job of socialising some of the data, and a lot of people are sitting up and taking notice, which is awesome.
The gap, as always with these kinds of things, is that a lot of the data is ‘opt-in’: people go out of their way to volunteer it, and that’s how we got to the level of data we have today. Even so, there’s still a large gap, with over 30% of execution layer clients that we don’t actually know about.
With the consensus layer, tools like blockPrint from sigp have helped gather data, and that data seems to be fairly reliable. The remaining gap is on the execution layer side, which is also where (in theory) the largest disparity in diversity currently sits, with geth making up an estimated 63% (very, very rough) of all execution layer clients.
Recently, there was a proposal to try to surface some of this information. Basically, the idea is that clients can pack information into spare graffiti space, and because that space is already available, no hard fork is needed.
Obviously we don’t want to stop people being able to set graffiti, but at the same time the feeling is that this would be super useful information to have, if people have space available for us to use.
In teku land, we’ve taken the position that this data should be gathered in an ‘opt-out’ fashion, because we really do believe it’s important information, so as long as there’s space available in proposal graffiti, we’re going to try to pack some information.
If a user sets no graffiti, this might then appear something like `TKcdcb1773GE87246f3c`, and admittedly this is not super pretty, so let’s unpack it a little…
Basically, this string has two sections, “TKcdcb1773” and “GE87246f3c”. The first section starts with “TK”, which indicates teku was the consensus-layer client, and the data after TK is the git hash of the client. Similarly, the second part starts with “GE”, which indicates geth, and the rest is the version of geth in use. Each client will use its own two-character code.
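To make that layout concrete, here’s a rough sketch of how a reader (an explorer, say) might split such a string back into its parts. It is not taken from any client; the function name `parse_identifier`, the `ClientId` type, and the assumption that codes are two uppercase letters followed by lowercase hex are all mine, purely for illustration.

```python
import re
from typing import List, NamedTuple


class ClientId(NamedTuple):
    code: str    # two-character client code, e.g. "TK" or "GE"
    commit: str  # hex characters after the code (may be empty)


# Assumption: a code is two uppercase letters, followed by zero or more
# lowercase hex characters. Lowercase hex keeps the split unambiguous.
_SECTION = re.compile(r"([A-Z]{2})([0-9a-f]*)")
_WHOLE = re.compile(r"(?:[A-Z]{2}[0-9a-f]*)+")


def parse_identifier(text: str) -> List[ClientId]:
    """Split an identifier like 'TKcdcb1773GE87246f3c' into its sections."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"not a client identifier: {text!r}")
    return [ClientId(code, commit) for code, commit in _SECTION.findall(text)]


if __name__ == "__main__":
    for section in parse_identifier("TKcdcb1773GE87246f3c"):
        print(section.code, section.commit)
```

The same function also handles the shortest form, so `parse_identifier("TKGE")` gives two sections with empty commit strings.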
If a user specifies a string in graffiti, and we have space left for the full identifier, we’ll go ahead and add it. If we do run out of room, we’ll start reducing the length of the git hash to the point where we may end up with just the client types (TKGE).
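As a sketch of that fitting logic (not teku’s actual code), something like the following would do it. The graffiti field in a beacon block is 32 bytes; everything else here is an assumption for illustration: the name `pack_graffiti`, the hash lengths tried (8, 4, 2, then none), placing the identifier directly after the user’s text with no separator, and padding with zero bytes.

```python
GRAFFITI_BYTES = 32  # size of the graffiti field in a beacon block


def pack_graffiti(
    user_graffiti: str,
    cl_code: str,
    cl_commit: str,
    el_code: str,
    el_commit: str,
    hash_lengths=(8, 4, 2, 0),  # assumption: the steps a client might try
) -> bytes:
    """Append the longest client identifier that fits after the user's graffiti."""
    user = user_graffiti.encode("utf-8")
    if len(user) > GRAFFITI_BYTES:
        raise ValueError("graffiti is longer than 32 bytes")

    free = GRAFFITI_BYTES - len(user)
    for n in hash_lengths:
        # Shorten both git hashes to n hex characters and see if it fits.
        ident = f"{cl_code}{cl_commit[:n]}{el_code}{el_commit[:n]}".encode("ascii")
        if len(ident) <= free:
            return (user + ident).ljust(GRAFFITI_BYTES, b"\x00")

    # Not even the bare client codes fit: leave the user's graffiti untouched.
    return user.ljust(GRAFFITI_BYTES, b"\x00")


if __name__ == "__main__":
    print(pack_graffiti("", "TK", "cdcb1773", "GE", "87246f3c"))
    print(pack_graffiti("x" * 28, "TK", "cdcb1773", "GE", "87246f3c"))
```

With no user graffiti, the full 20-character identifier goes in; with 28 bytes of user graffiti, only `TKGE` is left; beyond that, nothing is added at all.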
Some people may prefer that version information isn’t surfaced, for security reasons etc., so it’s possible to specify in teku that you just want the client codes without version information. If people could use that at a minimum, it’d be very, very helpful in gathering client diversity information. If people do feel super strongly about it, it’s possible to disable this reporting completely. Please be aware that the data is used just for diversity information, so it’d be very helpful if you can leave at least the client types being reported when blocks are produced.
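If you fill more than 28 bytes of graffiti, that’s totally fine. The graffiti field is 32 bytes, so at that point there’s no longer room for even the 4 bytes of client codes, and we won’t overwrite your graffiti to gather this data. We appreciate that it’s ultimately yours to do with as you will!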
Much like the data gathered currently, this is purely an effort to understand network diversity. It’s of course possible for people to falsely report for whatever reason, but if nodes report honestly, we’d get a much clearer overall picture of network health in terms of diversity, and maybe we can reduce that 30% or more that is unreported…
Because this is still being implemented by various teams, it’s possible that only the CL will be reporting at first, but over the coming weeks it’s likely this data will become a lot more common in graffiti. My hope is that explorers will choose to display the information in some way and filter it out of the graffiti, but I’m not sure if there are plans around that, or if it’s just something I think would be a good idea :)
Hopefully this helps shed some light on the mystery of this new data appearing in your proposals!