As investors grapple with an increasingly heterogeneous blockchain landscape of over 100 different solutions, the nuances between chains can be difficult to parse out. While we won’t tackle that monster today, we’ll explore one of the most popular trends in DeFi today - the modular blockchain, a term popularized by Celestia. As opposed to a “monolithic” blockchain, whose nodes are responsible for data availability, execution and consensus, a modular blockchain is one in which data availability and consensus components are separated from execution. We’ll explore the current construction of blockchains, blockchains using Celestia, rollups, rollups utilizing Celestia and sovereign rollups. Strap in. This one gets a bit technical.
BTC Protocol Overview
· BTC node receives BTC transaction from peer
· Node verifies signature and checks transaction against consensus rules
· If validity fails, node drops the TX. If all checks out well, node adds transaction to mempool
· Miners then create candidate blocks and fill them with TXs from mempool
· Some miner eventually finds a valid nonce for their candidate block
· The miner passes the solution to its peers, who check the validity of the block and then begin extending the chain by building on this newly solved block
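The flow above can be sketched in a few lines of Python. This is a toy illustration only: the `Tx` type, the hash-based `toy_sign` stand-in for a real signature check, and the string-prefix difficulty rule are all assumptions made for the example, not how Bitcoin actually encodes signatures or targets.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    sender: str
    payload: str
    signature: str  # toy stand-in for a real digital signature


def toy_sign(sender: str, payload: str) -> str:
    # Hypothetical "signature": just a hash of sender and payload.
    # Real nodes verify public-key signatures instead.
    return hashlib.sha256(f"{sender}|{payload}".encode()).hexdigest()


def accept_into_mempool(tx: Tx, mempool: list) -> bool:
    """Add the TX to the mempool if it passes the toy validity check."""
    if tx.signature != toy_sign(tx.sender, tx.payload):
        return False  # validity fails: the node drops the TX
    mempool.append(tx)
    return True


def block_is_valid(header: str, nonce: int, prefix: str = "000") -> bool:
    """Peers re-hash the candidate block to check the claimed solution."""
    digest = hashlib.sha256(f"{header}|{nonce}".encode()).hexdigest()
    return digest.startswith(prefix)


def mine(header: str, prefix: str = "000") -> int:
    """Try nonces one by one until the block hash meets the toy difficulty."""
    nonce = 0
    while not block_is_valid(header, nonce, prefix):
        nonce += 1
    return nonce
```

Finding the nonce takes many hash attempts, while checking it takes one, which is why peers can cheaply verify a solved block before building on it.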
In this process nodes are responsible for carrying out a variety of tasks:
1) Data Availability - nodes receive and store every transaction in the system in their local storage, and ensure that those transactions are available to any other peer in the network who may request them
2) Validity Enforcement / Execution - nodes execute new transactions immediately upon receipt to check their validity against protocol rules. Additionally, in a VM blockchain like Ethereum, nodes execute the transactions in a block sequentially to compute the new network state.
3) Consensus - nodes collectively agree upon which transactions to include in a new block, and the order in which they will be stacked. The nodes then attest to the block by placing some sort of economic stake against the block’s integrity.
We refer to blockchains where nodes perform these multifaceted roles as monolithic chains. In contrast, Celestia is the first implementation of a true modular blockchain. A modular blockchain is one in which data availability and consensus components are separated from execution.
So how does this compare to rollups, which also offload execution to an outside system?
The majority of rollups in existence today use Ethereum as the data availability layer. In an Arbitrum transaction for example, the call data is submitted to the ETH mainchain just like an ordinary transaction. However, there is a specific opcode embedded which tells the validators not to run the call data through the EVM. The transaction is picked up and inserted into an Ethereum block, but the validators won’t execute it and transition the Ethereum global state against it.
Execution occurs on a completely separate blockchain with its own group of validators (the rollup chain), who monitor ETH L1 and download these transactions once they are posted. The validators then apply these transactions to the rollup state to transition it to a new state. There is no threat of rollup validators arriving at divergent states when applying these transactions, since the transactions themselves have already been ordered during ETH consensus.
With Celestia, it is possible to bypass the Ethereum network entirely. The role that Ethereum plays (providing DA + ordering the data via its consensus process) could instead be filled by the Celestia blockchain which, because of specific design choices, can do so in a cheaper and more effective manner. However, it is not a zero-sum game between these two protocols. As we will explore in later paragraphs, there are scaling benefits to designs that combine the Ethereum base layer and the Celestia blockchain. How a rollup chooses to integrate involves tradeoffs and is ultimately a design decision.
How Celestia works:
- Celestia node receives rollup transaction (rollup transaction is submitted by a rollup node)
- Celestia node ensures appropriate fees were paid
- Nodes order the data (ie produce some sort of time sequencing for the TXs)
- Attest to the data by collectively signing off on the block’s integrity
- Compartmentalize the data by namespace, where each namespace corresponds to a particular rollup that plugs into Celestia
And that’s it! Just receive transactions, and order them into blocks. Fittingly, Celestia was called LazyLedger before rebranding, a nod to the intentionally minimal functionality that the protocol affords.
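As a rough sketch of the steps listed above, the block-building side could look like the following. The `build_block` function, the `(rollup_id, fee, blob)` tuple shape and the `min_fee` parameter are hypothetical names chosen for illustration; the point is only that the node checks fees, fixes an order and groups opaque data per rollup, without ever interpreting it.

```python
from collections import defaultdict


def build_block(submissions, min_fee=1):
    """Order fee-paying submissions and group them per rollup.

    `submissions` is a list of (rollup_id, fee, blob) tuples in arrival order.
    The blob is treated as opaque bytes: nothing here executes or parses it.
    """
    # Keep only submissions that paid enough, preserving arrival order.
    ordered = [(rollup_id, blob) for rollup_id, fee, blob in submissions
               if fee >= min_fee]

    # Compartmentalize the ordered data by the rollup it belongs to.
    by_rollup = defaultdict(list)
    for rollup_id, blob in ordered:
        by_rollup[rollup_id].append(blob)

    return {"ordered": ordered, "by_rollup": dict(by_rollup)}
```

A rollup node only needs the list stored under its own identifier, already in the agreed order.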
What happens next with execution is where things get interesting.
One of the interesting properties of Celestia is the versatility it affords for execution. As demonstrated above, Celestia provides a protocol for ordering transactions into some sort of time sequencing, then applying consensus to these transactions. While seemingly trivial, its importance cannot be overstated. Now that we have a) a group of transactions within the block b) a collective view on the order in which these transactions appeared, any rollup node can apply transactions to its initial state in the order produced by Celestia, and compute the new state.
To stress this point further, the Celestia protocol is completely blind to the data within the transactions it is putting into blocks. It’s the rollup’s job to decide how to interpret the data embedded within these transactions (i.e. the rollup itself establishes the protocol rules).
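To make this concrete, here is a minimal sketch of a rollup node applying ordered data to its state. The transfer format (`sender,receiver,amount`) and the `apply_transfers` function are invented for this example; a real rollup would define its own encoding and rules. What matters is that the rules live entirely in the rollup node, and that every node replaying the same ordered data arrives at the same state.

```python
def apply_transfers(state, ordered_blobs):
    """Apply ordered blobs to a balance map using rollup-defined rules.

    Each blob is expected to be b"sender,receiver,amount". Anything that
    does not parse, or that overspends, is skipped rather than applied.
    """
    new_state = dict(state)
    for blob in ordered_blobs:
        try:
            sender, receiver, amount = blob.decode().split(",")
            amount = int(amount)
        except (ValueError, UnicodeDecodeError):
            continue  # the data layer accepted it, but the rollup ignores it
        if amount <= 0 or new_state.get(sender, 0) < amount:
            continue  # invalid under this rollup's own rules
        new_state[sender] -= amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state
```

Feeding the same blobs in a different order can produce a different final state, which is why agreeing on the order first is what makes independent execution safe.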
There are two ways in which these rollups can configure their architecture to plug into Celestia:
1) Sovereign Rollups
2) Settlement Enforced Rollups (CEVMOS, Celestiums, etc)
Here’s how they work:
Notice the difference here in terms of fraud dispute resolution. In a traditional ETH-based rollup, the fraud proof would be submitted to a smart contract that “bridges” the rollup chain and the ETH L1 chain, not the other rollup nodes. All of the protocol rules, which detail what is considered a valid state transition, are transcribed into the smart contract at the rollup chain’s inception. This smart contract will receive and interpret the fraud proof, and determine the canonical outcome of the rollup’s state (rolling back the chain if found to be invalid). Thus, the other rollup validators play no role in the fraud dispute resolution; rather this smart contract acts as judge and jury.
In the above sections we described the implementation details of what a “sovereign” rollup might look like, i.e. one in which nodes ultimately determine the canonical state of the ledger, as opposed to the smart contract that embeds the rollup’s consensus rules. But what if this sovereignty is an undesirable property? One of the key value propositions of an ETH-based rollup is that there’s transaction enforcement and fraud resolution underwritten by the ETH L1 validators directly, an ecosystem with hundreds of billions of economic stake. Thus, there are incredibly strong assurances attached to the integrity of blocks (and transitively, transactions within a block) produced by these validators. If we remove interaction with the ETH network, while we may save on fees, we also lose these strong economic assurances on the validity of transactions within our rollup.
With a sovereign rollup, the assurances of the chain are only as strong as the majority of nodes’ desire to adhere to them (which, during the early stages of a network, is quite low, as forks are often desired). This principle holds true for all L1s, but as the network grows in market capitalization, size and scope, a greater collection of factions and stakeholders materializes, many of whom have conflicting viewpoints on direction and decisions. As the economic footprint becomes increasingly diverse, protocol rules tend to ossify over time because getting an overwhelming majority of nodes to agree on any decision becomes increasingly difficult. Rollups will need to consider this when figuring out what kind of Celestia-based architecture model they want to conform to. If priority rests on extremely low transaction fees + ability to “move fast and break things”, the sovereign chain model seems more applicable. Likewise, if users are willing to stomach higher transaction fees in exchange for stronger underwriting of protocol rule enforcement, a Celestium architecture model seems more applicable.
Settlement Enforced Rollups
A settlement rollup is one in which the rollup nodes lack sovereignty over the execution environment. In other words, settlement is enforced by a smart contract that bridges the “rollup chain” with some other “higher order” chain. Settlement based rollups are the common framework most readers are likely familiar with, as this is the approach taken by Arbitrum, Optimism etc with the rollup settling to Ethereum.
While the full implementation details are out of the scope of this article, the rollup has its own chain that validators will append blocks to. As validators on the rollup add blocks (transactions) to the chain, the state of the rollup changes. Periodically, the validators will send the most recent merkle hash of the rollup’s state to an ETH-based contract that “bridges” the rollup and Ethereum. The contract will “optimistically” accept this new state root as valid, and a 7-day challenge window will subsequently begin. If no challenges occur during the dispute window, then the contract will ossify the update into permanency. This action effectively anchors the state of the rollup into the state of the L1.
However, if the rollup validator submitted an invalid state update, then anyone in the system can attempt to roll back this fraudulent update by issuing a fraud proof. In the context of a court case, we could say that the fraud proof issuer is the plaintiff, the validator is the defendant, and the ETH-based settlement contract acts as the judge. The ETH contract will “evaluate the evidence” by re-executing the state transition itself, taking the initial state and applying the transactions (this data is provided by the fraud proof), transitioning the state from S to S’. If S’ matches the state root posted by the validator, then the validator is “acquitted”, and the fraud disputer is slashed. If S’ does not match the validator’s proposed state root, the validator is found “guilty” and is slashed.
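The dispute logic above can be sketched as a small function. This is an illustration under simplifying assumptions: the state is a plain balance map, `state_root` hashes canonical JSON instead of computing a real merkle state root, and `adjudicate` is a hypothetical name for what the settlement contract does when it re-executes the disputed transition.

```python
import hashlib
import json


def state_root(state):
    # Stand-in for a merkle state root: hash of the canonically ordered state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def apply_tx(state, tx):
    """Apply one (sender, receiver, amount) transfer; invalid ones are no-ops."""
    sender, receiver, amount = tx
    new_state = dict(state)
    if new_state.get(sender, 0) >= amount > 0:
        new_state[sender] -= amount
        new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def adjudicate(pre_state, txs, claimed_root):
    """Re-execute S -> S' and return which party gets slashed."""
    state = pre_state
    for tx in txs:
        state = apply_tx(state, tx)
    # If S' matches the validator's posted root, the challenge was wrong.
    return "challenger" if state_root(state) == claimed_root else "validator"
```

The judge never needs to trust either party: it only needs the starting state, the transactions and the rules.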
Historically, Ethereum has been the dominant L1 to which rollups settle. However, due to the high costs and limitations of ETH L1 gas, Celestia is in the process of developing an alternative L1 environment to which rollups can settle. This alternative is called CEVMOS, and is a derivative of EVMOS. For those unfamiliar with EVMOS, think of it as a clone of Ethereum built as a Cosmos SDK blockchain. EVMOS is EVM compatible and IBC enabled. CEVMOS is a partnership spearheaded by the Celestia and EVMOS teams:
“The [EVMOS] settlement chain leverages Celestia as the data availability layer and Evmos functionality (smart contracts, interoperability, composability, and shared security) to provide a fully EVM-equivalent stack for interoperable smart contracts on Cosmos and the EVM ecosystem. Unlike mainnet Ethereum, the chain will be optimized solely for rollups.”
Celestiums are most akin to the concept of Validiums. They use Celestia for data availability/consensus of data, the rollup environment for execution, and Ethereum for settlement. While Ethereum may not be the most economically optimized settlement layer due to its scarce block space, rollups that build on this model are essentially prioritizing security + composability over cost, which is a valid tradeoff to make in many instances.
As we touched on earlier, a vanilla rollup like Arbitrum uses ETH L1 as the data availability layer. This makes settlement trivial because the L1 contract (which underwrites settlement) already has ready access to the underlying transactions, since they are posted on L1 as call data.
The problem with Celestiums is that the transaction data does not rest on ETH L1, but rather is stored on Celestia. This creates issues during the settlement process because the ETH contract handling settlement does not have access to those transactions, since they exist on a separate siloed blockchain. So, we introduce something called the Quantum Gravity Bridge (QGB) between the Celestia chain and the Ethereum chain as a solution.
Every time a new Celestia block is produced, an attestation is produced by the Celestia validators and transmitted through the bridge onto Ethereum. This attestation is quite simple: it’s just a merklized tree of all transactions across all blocks since Genesis, collectively signed by the Celestia validators:
As new Celestia blocks are appended to the chain, the validators will recompute the new TX root, sign the attestation, then submit it through the QGB
With all the data being fed to the ETH contract via the QGB, it looks somewhat akin to a light node, where only block headers are stored, but not the actual data behind those blocks (though in Celestia’s case, it's only storing a root hash of all those block headers).
When the rollup validator attempts to “optimistically” update the rollup, they will post the most current state root (SRn), their proposed state root (SRn+1), and a merkle proof of inclusion of the TXs that are being applied during SRn→SRn+1. The purpose here is that if a fraud dispute were to arise, the ETH contract can hash down the merkle proof of inclusion for all the TXs in the batch into a single hash. That root hash should match the one sent by the Celestia validators through the QGB. This ensures that the TXs applied from SRn → SRn+1 are actually transactions included in the Celestia chain. Again, this process is needed because these are two siloed blockchains.
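The inclusion check described above can be sketched as follows. This is a minimal illustration using a plain binary SHA-256 merkle tree; the tree layout (duplicating the last node on odd levels) and the helper names `build_levels`, `make_proof` and `verify_proof` are assumptions for the example, not the actual format used by Celestia or the QGB.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of the tree, from hashed leaves up to the root."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(leaves, index):
    """Collect the sibling hash at each level for the leaf at `index`."""
    proof = []
    for level in build_levels(leaves)[:-1]:  # every level except the root
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling_is_left)
        index //= 2
    return proof


def verify_proof(leaf, proof, root):
    """Hash the leaf up through its siblings and compare with the known root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The verifier only needs the root it already trusts plus a handful of sibling hashes, which is what lets a contract check inclusion without holding the underlying data.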
With all the data readily at the disposal of the ETH contract, proper settlement can now be ensured!
Light Clients Recap
As a recap, when Celestia block producers produce a block, they are gathering outstanding transactions, hashing the formatted TX data, grouping these hash digests into pairs, then subsequently hashing the resultant hashes until they compile down to a single root, which we call a merkle root.
By possessing the transaction merkle root, we are attesting to the integrity of the hashed data structure that the root encapsulates. In the above diagram, if we tried to swap out TX1 for a different transaction after creating the merkle root, we would have a totally different merkle root digest and be able to tell that the original input data had been tampered with.
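A short sketch of this pairwise hashing, and of how swapping a single transaction changes the root, is shown below. The `merkle_root` function and the choice to duplicate the last hash on odd-sized levels are assumptions made for illustration rather than Celestia's exact construction.

```python
import hashlib


def merkle_root(txs):
    """Hash each TX, then hash pairs of digests until one root remains."""
    level = [hashlib.sha256(tx).digest() for tx in txs]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()


original = [b"TX1", b"TX2", b"TX3", b"TX4"]
tampered = [b"TX1-swapped", b"TX2", b"TX3", b"TX4"]

# Changing any single input yields a different root digest.
roots_differ = merkle_root(original) != merkle_root(tampered)
```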
The problem here is that with just the merkle root alone, one cannot “unroll” the sequence to back out into all the independent transactions, due to properties of how a hash function works. The only possibility is cross-checking the integrity of the merkle root structure if one has access to all the independent transaction data present within the tree.
The problem is that if a client has to download every TX that flows through the system in order to be able to verify the integrity of a block, that client inherently becomes a full node. Full nodes are undesirable because the infrastructure costs to operate are too expensive and tedious for the average user.
Light clients are nodes that only download block headers; they do not download every single transaction within the block. They are trusting other full nodes in the system that these merkle roots, and the transactions behind the root, are available and valid.
Why is it important that all transactions in a Celestia block are available anyway? Well, without being able to see the transactions behind the merkle root of a Celestia block header, rollup validators can’t apply these transactions to the current state in order to transition it to state n+1. In essence, the network freezes up if there are no transactions to apply.
Data Availability and Data Withholding Attacks
So what are the shortcomings of this model? Well, everything described above is only applicable to full nodes. Full nodes can verify the integrity of a transaction root produced by a Celestia validator because they download every TX posted to Celestia and can run the resulting computation themselves in order to ensure that all the transactions in a Celestia block are indeed available in the network. Light nodes are out of luck; they are simply downloading block headers from the Celestia validators and hoping that the transactions are actually in there. Even worse, because they are following headers only, if Celestia validators continue to extend the chain and add more invalid headers, light clients will unknowingly fork away from honest full nodes who refuse to follow such headers due to their invalidity!
Using an example: suppose a Celestia block producer creates a new block which is subsequently validated through consensus. There are 8 transactions in the block. A Celestia full node will attempt to download all 8 transactions, compile them down into a single merkle root, then check its locally computed merkle root against the one posted by the Celestia block producers. Suppose however the full node only sees 7 transactions… This means that the full node cannot guarantee the integrity of the merkle root that was published via consensus by the Celestia block producers!
However, light clients will see the block header published by the block producers, which claims to contain 8 TXs, and basically trust that those transactions are there. Now, a full node could try to “alert” the light client that there are only 7 transactions in the block. This creates a dilemma for the light client: who is in the right?
The only way for the full node to “prove” that 1 of the transactions is missing is to send the light client all 7 transactions that it knows about. However, who is to say that the 8th transaction isn’t actually present, and that the full node simply didn’t hear about it because of a networking issue on its end?
The light client could then ask the Celestia validators for the 8th transaction in question. If they provide it, great: the light client computes the merkle root locally and cross-checks it against the one produced in consensus. On the other hand, if the light client asks the Celestia validators for the 8th TX and they can’t provide it…then it knows the block producer is acting maliciously.
But a bigger challenge arises when Celestia validators start performing selective data withholding attacks. Referring to our example above, here’s how the attack would look:
In summary the problem with this model is twofold:
1) It requires the light node to download all the data behind a block to check for itself whenever a dispute about the integrity of a block header arises. This effectively turns the light node into a full node, the whole thing we wanted to avoid in the first place.
2) It’s impossible to pinpoint with certainty the malicious actor in a data withholding attack.
Data Availability Sampling
Data availability (DA) sampling is a cryptographic technique that allows light nodes to generate near full node security properties, without having to download the entirety of a block, solving our challenge above. While research into DA sampling has been an ongoing effort for a few years, Celestia is the first blockchain to implement it directly into the protocol, and as we will see momentarily, incorporates its features into how blocks are constructed from a protocol level.
The secret sauce to DA sampling involves erasure coding. At a high level, erasure coding is a form of redundancy that is appended to a piece of data. Its defining characteristic is that, if part of the data gets lost, we can use the redundant data to reconstruct the original. Even cooler, it doesn’t matter which pieces are lost: as long as the fraction that survives stays above the code’s tolerance threshold, the full original data can be reconstructed!
A useful example of erasure coding is in CD-ROMs; if the disc gets scratched and the original data gets corrupted, we can use the erasure padding to recreate the original data, and business can continue as usual.
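To see why any large enough subset of pieces is sufficient, here is a toy Reed-Solomon-style code built from polynomial interpolation. It is a one-dimensional sketch over a small prime field, with hypothetical helper names (`encode`, `recover`) and a field size picked for readability. It is not Celestia's 2D scheme, and it needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 257  # small prime field, large enough to hold byte values 0..255


def _interpolate_at(points, x):
    """Evaluate at x the unique polynomial through `points` (Lagrange, mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Extend k data symbols to n shares (n <= P); the first k are the data."""
    points = list(enumerate(data))
    return [(x, _interpolate_at(points, x)) for x in range(n)]


def recover(shares, k):
    """Rebuild the k original symbols from any k surviving shares."""
    points = shares[:k]
    return [_interpolate_at(points, x) for x in range(k)]


data = [72, 105, 33, 7]
shares = encode(data, 16)                           # 4 symbols -> 16 shares
survivors = [shares[i] for i in (15, 3, 9, 12)]     # any 4 of the 16
restored = recover(survivors, len(data))
```

Four data symbols define a polynomial of degree at most three, and any four points on that polynomial pin it down again, so which twelve shares were lost makes no difference.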
If this sounds too good to be true, that is because it comes with a cost. There is a direct relationship between the amount of erasure padding that must be appended and how much recoverability it allows.
Code rate is simply the ratio of original data to total data after erasure coding. A 1/2 code rate doubles the size of our total data structure, a 1/3 code rate triples it, and so on.
To compute tolerance, i.e. how much of the encoded data we can afford to lose and still reconstruct the whole packet, we use the formula: tolerance = 1 - (original data size / data size after erasure coding)
Celestia uses 2D Reed-Solomon erasure codes with a 1/4 code rate. What this means is that validators will erasure-code blocks prior to publishing them on chain. A 1MB block of original data will turn into a 4MB block after padding. The increase in data size sucks, but it means that as long as we have at least 25% of the encoded block data, we can recover the remaining 75%. It doesn’t matter which 25% we hold; any will work.
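The arithmetic behind these figures is simple enough to check directly. The two helper functions below are hypothetical names that restate the formulas from the text, using exact fractions to avoid rounding noise.

```python
from fractions import Fraction


def encoded_size(original_size, code_rate):
    """Total size after padding: original size divided by the code rate."""
    return original_size / code_rate


def loss_tolerance(code_rate):
    """Fraction of the encoded data that can be lost: 1 - (original / encoded)."""
    return 1 - code_rate


quarter = Fraction(1, 4)
size_mb = encoded_size(1, quarter)      # a 1MB block at a 1/4 code rate
tolerance = loss_tolerance(quarter)     # share of the encoded block we can lose
```

Here `size_mb` comes out to 4 and `tolerance` to 3/4, matching the 1MB to 4MB expansion and the 75% loss tolerance described above.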
Data sampling in a nutshell is the light client randomly selecting a piece of transaction data from the block, and asking the Celestia validator to send it to them. In the event that a validator cannot send us a piece of data that we request, we know immediately that data has been withheld from the block.
Suppose the validator creates a block header that attests to 1000 transactions inside it. However, he withholds 1 transaction, so in reality there are only 999. In a naïve model, a light node would randomly sample the block X number of times. Let’s say that threshold is 5 samples.
With a single sample, there is a 999/1000 chance that the validator can “trick” us into believing he has all the data available. Thus, the light node needs to repeat this sampling process over and over until the chance that the validator is withholding data without being caught becomes statistically negligible. To get to a 99.9999% confidence level, it would have to run approximately 13,800 samples. Better off just downloading the entire block…
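However, when we bake in the concept of erasure coding, we can drastically reduce that value. With a 1/4 code rate, a validator has to withhold more than 75% of the encoded block to make it unrecoverable, so each random sample succeeds with probability x of at most 0.25. The chance P that all k samples succeed anyway is:
P = x^k
1 - 0.25^k ≥ 99.9999%
k ≥ log(0.000001) / log(0.25) ≈ 9.97, so k = 10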
This is a reduction of three orders of magnitude: now, light nodes can get the same confidence level with just 10 data samples!
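Both sample counts quoted above can be reproduced with one small function. `samples_needed` is a hypothetical helper that solves 1 - p^k ≥ confidence for k, where p is the chance that a single sample fails to expose the withholding.

```python
import math


def samples_needed(p_single_pass, confidence):
    """Smallest k such that 1 - p_single_pass**k >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(p_single_pass))


# Naive case: 1 of 1000 TXs withheld, so a single sample passes 999/1000 of the time.
naive = samples_needed(0.999, 0.999999)

# Erasure-coded case: a single sample passes at most 25% of the time.
with_erasure_coding = samples_needed(0.25, 0.999999)
```

The naive count lands at roughly 13,800 while the erasure-coded count is 10, and the second figure does not depend on how many transactions the block holds.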
The most important point to notice here is that this sampling value is constant. It doesn’t matter if our block size only holds 100 transactions or 100 million transactions…either way 10 samples will satisfy our requirement. Because of this relationship, there is an incentive to maximize Celestia block sizes as much as we can. However, as we will discuss momentarily, there are some bottlenecks to this scaling factor which force us to find a ‘sweet spot’ in terms of block size; it can’t just increase in perpetuity.
Intrinsically baked into this P=x^k equation is the assumption that we maintain a 75% tolerance threshold, i.e. we collectively have access to at least 25% of the original data. But if our light nodes are only downloading 10 TXs apiece, that means we need a lot of nodes performing this simultaneously. Here’s how it all works:
Let’s work through an example. Assume a Celestia block has 10,000 transactions in it. Let’s say my light client wants a 99.9999% probability threshold of data availability guarantee. So it will need to run at least 10 samples. Now, suppose every other light client in the network went offline simultaneously and my light node was the only one sampling this block. Collectively, the nodes will have only 10/10,000 = 0.1% of the block. This is lower than the 25% minimum we need to rebuild the entire block structure. Uh oh.
The problem we face here is that when our node initially sampled, it was operating under the assumption that a bunch of other nodes “had its back” and would be performing the sampling process as well, which allowed it to relax the total # of samples it needed to run. But without this support, 10 is no longer sufficient.
This is one of the first bottlenecks to scaling in the Celestia system: for us to lean into this O(1), constant-size sampling, there MUST be a large number of other nodes performing the process alongside us; enough so that we collectively retrieve at least 25% of the total block data.
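A back-of-the-envelope estimate shows how many light nodes that implies for the example above. The sketch assumes every node samples independently and uniformly at random, and it only tracks the expected fraction of the block covered, so treat it as a rough model rather than a security bound. The function names are hypothetical.

```python
import math


def expected_coverage(num_nodes, samples_per_node, total_pieces):
    """Expected fraction of pieces sampled by at least one node."""
    missed_by_one = 1 - samples_per_node / total_pieces
    return 1 - missed_by_one ** num_nodes


def nodes_needed(target_fraction, samples_per_node, total_pieces):
    """Smallest node count whose expected coverage reaches the target."""
    missed_by_one = 1 - samples_per_node / total_pieces
    return math.ceil(math.log(1 - target_fraction) / math.log(missed_by_one))


# The example above: 10,000 pieces, 10 samples per node, 25% needed collectively.
lone_node = expected_coverage(1, 10, 10_000)
required = nodes_needed(0.25, 10, 10_000)
```

A single node covers about 0.1% of the block, and under these assumptions it takes just under 300 nodes sampling the same block before expected coverage reaches 25%. Larger blocks push that number up, which is why block size and light node count have to grow together.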
The second bottleneck to increasing the Celestia block size is we have to make it reasonable enough for some small number of nodes to run full node clients. We don’t need everyone to run a full node, but we do need a few. All light client DA sampling can do is give the light node confidence that the data is available somewhere. But it doesn't deliver knowledge of what the data actually is, meaning a light node can't compute the current state or apply state transitions. Only full nodes can perform this task.
This is quite an interesting relationship because the more nodes that engage with the network, the more scalable it becomes. This stands in contrast to most blockchains of the present, where a marginal increase in nodes does not contribute to system scalability, but actually makes block production more expensive.
Overall, the modular blockchain stack introduces certain competitive advantages over its monolithic counterparts, as well as semi-modular stacks like ETH-based rollups. Let’s briefly summarize some of the primary advantages discussed in this article:
Scalability – In a monolithic blockchain, consensus and execution are performed by the same validator entity. This bottlenecks throughput severely because in a distributed system we want a lot of redundancy at the node level. And because blockchains are globally replicated, distributed state machines, the more complexity of execution we push onto this global state machine, the more costly and complex it becomes for the system to maintain synchronicity.
Expense- Rollups are a nice scalability improvement because they remove computation (an expensive resource) off the base layer nodes and push it on some external system. Rather than having thousands of computers execute complex state updates and then globally coordinate and reach consensus, we only need to have a small group of super computers doing that.
Rollups are a huge improvement over naïve L1 execution but still pose challenges because in order to solve the data availability problem, the call data still needs to be posted to L1. Archiving data on Ethereum is much cheaper than performing execution, but because it's still inherently competing for highly scarce block space on the L1, it’s a non-trivial cost. This is why ETH-based rollup fees are still on the order of $2-3.
Celestia solves the problem of data availability by having rollup transactions posted to Celestia chain, which will only ever be a data availability layer, never anything execution related. This means that transactions posted to Celestia will never be in competition with other “high profile” transactions that are performing compute onchain and thus paying orders of magnitude higher fees. Additionally, because nodes aren’t performing dual roles as they are in monolithic L1s, they don’t need to have high performance processors; instead, they can optimize their hardware around storage and bandwidth only.
Sovereignty - Additionally, rollups built on Celestia maintain sovereignty over their environment. This is not the case for a monolithic-based rollup where security is underwritten by the base layer. Take optimistic rollups on ETH for example. In theory we can have a single rollup block producer and so long as there is one honest party re-executing the transactions alongside the validator and cross-checking their work, the system can be secure. But this only works because if there’s a faulty state transition, the watcher will issue a fraud proof to the ETH-based smart contract, which will determine the canonical view of the rollup chain. And the consensus rules as to what determines a state transition as “valid” must be transcribed into the smart contract from the beginning of the rollup chain. This is troublesome when you want to change the consensus rules (say to implement a soft fork or a hard fork) because the smart contract is immutable. So you are faced with two options:
1) A multisig that allows the devs to upgrade the contract
2) A DAO, which runs via onchain voting and can be subject to gaming and exploitation
Contrast this with how validity works in L1 blockchains, where there is typically no token voting that determines who controls the chain; instead, offchain governance gives nodes/users control over the network (nodes follow the chain header they consent to). This allows for unrestricted consensus forks and consensus rule changes.
Data Availability Sampling – handles the problem of light nodes’ interdependence on full nodes. By introducing data availability sampling techniques at the protocol level, light nodes are able to get strong assurances of data availability while simultaneously scaling the chain. This is a huge win because most participants in a blockchain network will be operating light clients, as full nodes require capital intensive hardware and bandwidth, and can be costly and difficult from a skill level to properly maintain.
Written by Joe Kendzicky - @jkendzicky