Celestia 101 - Modular Components for Permissionless Innovation
0xbd05
April 13th, 2022

Preface

I was drawn to the concept of Celestia when I first saw it. I thought it would bring about a massive paradigm shift. The multi-chain ecosystem became incredibly clear.

What appeals to me most is that Celestia, as a very open component, will bring permissionless innovation. From the evolution of biology to the development of science and technology, every system has open components and building blocks that increase the network's connectivity and let it produce more complex innovations. Celestia is such a component: it will promote a further explosion of innovations and components, like a positive feedback loop, driving a more prosperous and more complex evolution of the entire blockchain system.

1. What is Celestia

Celestia is the first modular blockchain network. Unlike traditional monolithic blockchains, it does not execute any transactions; it only provides a consensus data network.

It provides one particular service: Data Availability (DA). By separating out the DA layer, consensus and execution are decoupled, and the blockchain becomes more modular.

Before going further, we first need to understand a few concepts.

2. Consensus and Execution

Blockchain is essentially a distributed network that runs State Machine Replication (SMR). SMR has three primary stages: data, consensus, and execution; the blockchain is also divided into these three layers. The key to creating currency on the Internet is to introduce a consensus system that cannot be interfered with by outsiders. The solution proposed by Satoshi is to introduce the "Nakamoto Consensus" so that people worldwide can maintain and operate Bitcoin.

Here's how Bitcoin's consensus protocol works:

  • Bitcoin nodes receive transactions from peers.
  • Nodes verify signatures and check transactions against consensus rules.
  • If validation fails, the node rejects the transactions. If it passes, the node adds them to its memory pool (mempool).
  • Miners then create candidate blocks and populate them with transactions from the mempool.
  • Under the PoW mechanism, some miner eventually finds a valid nonce for their candidate block.
  • The block is broadcast; nodes check its validity and extend the chain by building on this new block.
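
To make the mining and validation steps concrete, here is a minimal Python sketch of a nonce search. It is an illustration under simplifying assumptions, not Bitcoin's actual implementation: the header is an arbitrary byte string, `difficulty_bits` is a hypothetical difficulty parameter, and real Bitcoin serializes an 80-byte header and encodes its target differently.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> bytes:
    """Double SHA-256 of the header with the nonce appended (simplified)."""
    payload = header + nonce.to_bytes(4, "little")
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()


def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """A nonce is valid when the hash, read as an integer, is below the target."""
    target = 1 << (256 - difficulty_bits)
    return int.from_bytes(block_hash(header, nonce), "big") < target


def mine(header: bytes, difficulty_bits: int, max_nonce: int = 2**32):
    """Try nonces in order and return the first valid one, or None."""
    for nonce in range(max_nonce):
        if is_valid(header, nonce, difficulty_bits):
            return nonce
    return None


if __name__ == "__main__":
    header = b"candidate block built from mempool transactions"
    nonce = mine(header, difficulty_bits=12)
    print("nonce:", nonce, "valid:", is_valid(header, nonce, 12))
```

With 12 difficulty bits the search needs about 4,096 attempts on average, and every extra bit doubles the expected work. Checking a nonce that someone else found takes a single call to `is_valid`, which is why other nodes can cheaply verify a broadcast block.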

During this process, the nodes perform various tasks:

  1. Data Availability (DA): Nodes receive every transaction in the network, store it locally, and ensure that those transactions are available to any other node that may request them.
  2. Execution: Nodes check the validity of a new transaction against the protocol rules after receiving it. They also execute the transactions in a block sequentially to compute the new network state.
  3. Consensus: Nodes jointly agree on which transactions will be included in the new block and the chronological order in which they will be stacked (transaction ordering). Nodes then attest to the block by putting some economic weight behind its integrity.
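
The split between ordering and execution can be illustrated with a toy replicated state machine in Python. This is only a sketch: the balance map, the `(sender, receiver, amount)` transaction format, and the function names are assumptions made for the example, not part of any real protocol.

```python
def apply_tx(state: dict, tx: tuple) -> dict:
    """Execution: validate one transfer and return the resulting state."""
    sender, receiver, amount = tx
    if amount <= 0 or state.get(sender, 0) < amount:
        return state  # invalid transaction, state is unchanged
    new_state = dict(state)
    new_state[sender] -= amount
    new_state[receiver] = new_state.get(receiver, 0) + amount
    return new_state


def execute_block(state: dict, ordered_txs: list) -> dict:
    """Apply transactions in the order that consensus agreed on."""
    for tx in ordered_txs:
        state = apply_tx(state, tx)
    return state


if __name__ == "__main__":
    genesis = {"alice": 10, "bob": 0}
    block = [("alice", "bob", 7), ("bob", "carol", 5), ("alice", "bob", 9)]
    # Two replicas that see the same data in the same order agree on the state.
    print(execute_block(genesis, block))
    # A different order produces a different state.
    print(execute_block(genesis, list(reversed(block))))
```

Every replica that has the same data (DA) in the same order (consensus) computes the same state (execution). Changing only the order already changes the result, which is why ordering has to be agreed on before anyone executes.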

A modular blockchain is a blockchain that separates the DA layer, the execution layer, and the consensus layer. Celestia acts as the DA layer to verify the integrity of the data. For validators, this occurs during consensus. For non-consensus nodes, this happens when blocks pass consensus and are propagated throughout the network.

In a monolithic blockchain, the work of all three layers is done by a single network: everything from data verification to transaction execution is handled by the same nodes. Since the blockchain is a globally distributed replicated state machine, the more complex the execution we push onto this global state machine, the higher the cost and complexity of keeping the system synchronized.

Rollups solve part of the problem: they separate out the execution layer to handle complex transactions and use Ethereum as their DA and consensus layer. Publishing data (as "calldata") on Ethereum L1 is much cheaper than executing on L1, but it still competes for the highly scarce block space on L1, so it still costs a lot.

Two problems will result:

  1. The cost of calldata is fixed, at least for now. No matter how powerful L2 performance is, the cost of L1 calldata (16 gas per byte) cannot be changed, which is why rollup costs are still very high today.
  2. L1 is limited by resource pricing. As long as data is written to L1 block space, gas costs will remain a problem for users.

We can solve this problem with Celestia: Celestia provides DA and finally returns the verification result to Ethereum (of course, this is only one of the use cases of Celestia, which will be described in Section 7).

3. The dilemma of Scaling

There are usually two aspects to blockchain scaling: scalable verification of computation and scalable verification of data availability. In simple terms, scaling means increasing the number of transactions processed without increasing the cost of verification.

Nodes essentially limit how many transactions, and how much gas, the blockchain can process per second. Since the hardware's processing power is limited, we can roughly estimate the throughput of a public chain.

This is true of all blockchain networks (except Solana). When demand exceeds supply, transaction fees go up. A blockchain can only guarantee storage and computation, not cheap transaction fees, which can skyrocket once network capacity is hit. Solana avoids this issue because it does not try to keep the cost of running a full node low, so supply stays ahead of demand; of course, something is sacrificed in exchange.

Bitcoin blocks are limited to 4 million weight units (at most about 4 MB, and usually much less). The block size cannot simply be increased because of the following chain of effects: larger blocks -> a higher threshold for running full nodes -> fewer full nodes and more light nodes in the network -> network security can no longer be guaranteed, and the network becomes more centralized.

Satoshi Nakamoto proposed the concept of SPV (Simplified Payment Verification) when designing Bitcoin: the network can be used without running a full node. A light node downloads only the block headers, not all the data. Executing transactions and verifying the integrity of the data is left to full nodes, which download everything.

Reducing the number of full nodes weakens decentralization because light nodes rely on an honest majority assumption. By default, light nodes believe that the transactions behind a block are valid.

Resisting data withholding attacks is the main security gap between light and full nodes.

Data Withholding Attacks:

When miners produce a block, they hash the formatted TX data. These hash digests are grouped into pairs and hashed again, and the process repeats until a single hash remains, called the Merkle root. With the Merkle root, we can prove the integrity of the hashed data structure.
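
The pairing process described above can be sketched in a few lines of Python. This is a simplified illustration: it uses a single SHA-256 per node and duplicates the last hash when a level has an odd number of entries, whereas real chains differ in details such as double hashing and byte order.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of the input."""
    return hashlib.sha256(data).digest()


def merkle_root(txs: list) -> bytes:
    """Hash the transactions, then hash pairs until one root remains."""
    if not txs:
        raise ValueError("need at least one transaction")
    level = [h(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    txs = [b"tx0", b"tx1", b"tx2"]
    print(merkle_root(txs).hex())
```

Changing any single transaction changes the root, so the 32-byte root in the block header commits to the whole transaction list.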

There is a problem here: because a hash function is a one-way function, the Merkle root alone cannot be unrolled to recover all the TXs. We can only check whether the structure is intact. A light node only downloads block headers and does not download all transactions.

By default, light nodes trust the full nodes in the network that the transactions behind these Merkle roots are available and valid. As a result, if a block header is invalid (for example, when light nodes are misled by malicious nodes), light nodes may unknowingly fork away from honest full nodes.

Verifying that data is in a block is easy. Proving that data is not in a block is difficult. Therefore, DA is guaranteed by the full node downloading all data. Because full nodes do not assume that the consensus is honest (light nodes have this assumption and only download block headers), malicious consensus can never deceive full nodes into accepting invalid blocks.
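
The "easy" direction, proving that a transaction is in a block, can be sketched as a Merkle inclusion proof. The helper names below are hypothetical and the tree layout is simplified (single SHA-256, last hash repeated on odd levels); the point is only that the verifier needs the transaction, a handful of sibling hashes, and the root from the header.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(txs: list) -> list:
    """Return every level of the tree, from leaf hashes up to the root."""
    levels = [[h(tx) for tx in txs]]
    while len(levels[-1]) > 1:
        cur = list(levels[-1])
        if len(cur) % 2 == 1:
            cur.append(cur[-1])  # pad odd levels by repeating the last hash
        levels[-1] = cur
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def make_proof(levels: list, index: int) -> list:
    """Collect the sibling hash at each level, with its side."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(tx: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the tx and the siblings."""
    node = h(tx)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


if __name__ == "__main__":
    txs = [b"tx0", b"tx1", b"tx2"]
    levels = build_levels(txs)
    root = levels[-1][0]
    print(verify(b"tx2", make_proof(levels, 2), root))
```

The proof grows with the logarithm of the number of transactions. Nothing comparable exists for showing that the data behind a root has been published at all, which is the gap that data availability checks have to fill.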

However, to keep the blockchain decentralized, we cannot raise the threshold for operating a full node. And in current blockchain networks, a marginal increase in nodes does not contribute to the scalability of the system and actually makes producing blocks more expensive.

This is the blockchain trilemma, which modularizing the monolithic architecture can address well.

4. Celestia - Data Availability Sampling Light Node

Recommended reading: Principles of Sampling

Celestia is designed to provide consensus and data availability without executing any transactions, unlike most other blockchains. Likewise, Celestia light nodes do not verify transactions (whether this transaction can be executed or not). They only check each block for consensus and whether the block data is available to the network.

Data Availability Sampling (DAS) is a cryptographic technique that addresses the above challenges by giving light nodes security guarantees close to those of full nodes without downloading the entire block.

While research into DA sampling has been going on for several years, Celestia is the first blockchain to implement it directly into the protocol.

A brief explanation:

DAS mainly relies on erasure coding. Erasure coding expands data with redundancy so that all of the data can be reconstructed from only part of it.

One use case for erasure codes is in CD-ROMs. If the disc is scratched and part of the original data is damaged, the erasure coding lets us recover the entire data.
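
As a rough illustration of the idea, the sketch below extends `k` values to `2k` shares by evaluating a polynomial over a prime field, in the style of Reed-Solomon codes, and then rebuilds the original from any `k` surviving shares. The field size, function names, and share layout are assumptions chosen for readability; production systems, including Celestia, use more elaborate encodings.

```python
P = 2**31 - 1  # a prime modulus; all data values must be smaller than P


def lagrange_eval(points: list, x: int) -> int:
    """Evaluate at x the unique polynomial through the given (x, y) points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: list) -> list:
    """Extend k values to 2k shares; the first k shares are the data itself."""
    k = len(data)
    points = list(enumerate(data))
    return [lagrange_eval(points, x) for x in range(2 * k)]


def recover(shares: dict, k: int) -> list:
    """Rebuild the original k values from any k surviving {index: value} shares."""
    if len(shares) < k:
        raise ValueError("not enough shares to reconstruct")
    points = list(shares.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


if __name__ == "__main__":
    data = [5, 17, 42, 99]
    shares = encode(data)
    surviving = {i: shares[i] for i in (1, 4, 6, 7)}  # half of the shares are lost
    print(recover(surviving, k=4))
```

Because any `k` of the `2k` shares are enough, someone who wants to make the data unrecoverable has to hide more than half of the shares in this one-dimensional sketch. Hiding that much is far easier to notice through random sampling than hiding a single share.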

Celestia applies erasure coding to blocks, allowing light nodes to randomly sample pieces of data belonging to a block and obtain a probabilistic guarantee that the rest of the data is available. The number of nodes participating in the sampling process is vital for this guarantee. Block producers propagate the headers to the network, and each light node requests random pieces of the data that the header commits to. If all of the requested samples are returned, the node can conclude with high probability that the entire block is available.

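With erasure coding, the whole block can be recovered as long as about 75% of the erasure-coded data is available, so light nodes do not need to download everything to be confident that 100% of the data is available. As long as there are enough light nodes to reach a probability threshold of 99.9999%, each node only needs to sample 16 pieces of data to verify availability (see the recommended reading above for the specific principle). This reduces the amount of data a single node must download to roughly the square root of the block size.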
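
The arithmetic behind the figure of 16 quoted above can be sketched as follows. The calculation assumes that an attacker must withhold at least 25% of the erasure-coded block to make it unrecoverable, and that each sample is an independent uniform draw; both are modeling assumptions for this example, and the function names are hypothetical.

```python
import math


def detection_confidence(samples: int, withheld_fraction: float = 0.25) -> float:
    """Probability that at least one of `samples` random draws hits withheld data."""
    return 1.0 - (1.0 - withheld_fraction) ** samples


def samples_needed(target: float, withheld_fraction: float = 0.25) -> int:
    """Smallest number of samples whose detection confidence reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - withheld_fraction))


if __name__ == "__main__":
    for s in (1, 4, 8, 16, 32):
        print(s, round(detection_confidence(s), 6))
    print("samples for 99.9999%:", samples_needed(0.999999))
```

Under these assumptions a single node taking 16 samples detects withholding with a probability of about 99%. The confidence rises quickly with each additional sample, and many light nodes sampling independently push the network-wide guarantee much higher.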

Because Celestia nodes do not execute transactions, transactions posted to Celestia never have to pay high fees to compete for computing power as on other blockchains. Furthermore, since nodes do not perform dual roles as in a monolithic L1, they do not require high-performance processors, and a light client such as a mobile phone can complete the verification of 16 samples. This enables light nodes and honest full nodes to follow the same chain, maximizing what light node verification can achieve.

As mentioned above, in traditional blockchains, a marginal increase in nodes does not contribute to the system's scalability and increases the cost of block production; not so with Celestia. The key to DAS is that the more data is sampled, the more data can be confirmed as available.

In Celestia, it is safe to increase the block size (which can provide higher TPS) as the number of nodes participating in the data sampling increases.

5. The Birth of Great Ideas

The idea of Celestia can be traced back to the Bitcoin white paper, in which Satoshi mentioned that light nodes could become more secure if full nodes sent an "alert" to light nodes when they found an invalid block.

“One strategy to protect against this would be to accept alerts from network nodes when they detect an invalid block, prompting the user’s software to download the full block and alerted transactions to confirm the inconsistency” — Satoshi Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System.

Vitalik also came up with this concept in his early years when he designed Plasma. The original Plasma paper described a mechanism to build a "blockchain tree". Each node in the tree will represent a unique blockchain connected to its parent node, all of which are arranged in a hierarchy of chains with the Data Availability Layer (DA) at its core.

By: The Celestia Thesis - Rain and Coffee

Great ideas are always built on "adjacent possibilities", and Celestia finally did it.

6. Cost

Recommended reading: Ethereum Rollup Call Data Pricing Analysis

L2 costs are divided into two parts:

Fixed cost:

  • Proof cost (in the case of ZK rollups) = varies, typically depending on the rollup provider
  • State write cost = 20,000 gas
  • Ethereum base transaction cost = 21,000 gas

Variable costs:

  • Call Data: 16 gas per byte of data.
    (A rollup collects data from multiple transactions into a batch transaction published to Ethereum. The batch transaction includes the aggregated transaction data as calldata, i.e. data that is published to Ethereum but not executed directly)
  • L2 Gas: Usually quite cheap.

The current limitation on L2 is the Call Data cost, which remains stuck at 16 gas/byte.
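
To see how the fixed and variable costs listed above combine, here is a small worked example in Python. The batch size, bytes per transaction, and proof cost are made-up inputs, L2 gas is left out, and every calldata byte is charged the full 16 gas (in practice zero bytes are cheaper), so treat the output as an order-of-magnitude illustration only.

```python
CALLDATA_GAS_PER_BYTE = 16   # variable cost quoted above
STATE_WRITE_GAS = 20_000     # fixed cost quoted above
BASE_TX_GAS = 21_000         # fixed cost quoted above


def batch_gas(num_txs: int, bytes_per_tx: int, proof_gas: int = 0,
              calldata_gas_per_byte: int = CALLDATA_GAS_PER_BYTE) -> int:
    """Total L1 gas for one rollup batch: fixed part plus calldata part."""
    fixed = BASE_TX_GAS + STATE_WRITE_GAS + proof_gas
    variable = num_txs * bytes_per_tx * calldata_gas_per_byte
    return fixed + variable


def gas_per_tx(num_txs: int, bytes_per_tx: int, **kwargs) -> float:
    """Average L1 gas that each rolled-up transaction has to cover."""
    return batch_gas(num_txs, bytes_per_tx, **kwargs) / num_txs


if __name__ == "__main__":
    # Hypothetical batch: 100-byte transactions and a 500,000 gas proof.
    for n in (100, 1_000, 10_000):
        print(n, gas_per_tx(n, 100, proof_gas=500_000))
```

As the batch grows, the fixed part is spread over more transactions and the per-transaction cost approaches `bytes_per_tx * 16` gas. That floor is the calldata cost that batching alone cannot remove.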

Ethereum is also solving this problem:

EIP-4488: Reduce the calldata gas cost from 16 gas per byte to 3 gas per byte

EIP-4844: Shard Blob Transactions, which reduce rollup data costs by letting L2s post data in a new blob transaction format instead of calldata

I am very optimistic about Danksharding. Celestia is diversified, and it is not a zero-sum game with Ethereum. I think Celestia will be the most secure and efficient Validium solution.

I don't know what the end state will look like; I can only imagine as many of the possibilities as I can. In the end, I tell myself that I have no reason not to embrace this new thing, because Celestia will bring more possibilities for innovation and combination.

7. Possible combinations

Peter's tweet inspired me.

Celestia centric:

  • Sovereign Rollup: Built on Celestia, it has its own P2P network and can be an optimistic rollup (OPR) or a ZK rollup (ZKR).
    There is no need to publish proofs and data to an L1 contract; they are verified within an independent P2P network. This places high demands on ecosystem growth and developer experience.
  • Settlement Rollup: A rollup built on a settlement layer. Cevmos is a good settlement layer that is well compatible with the Ethereum ecosystem. A modular settlement layer is usually restricted to running only rollup execution environments, to prevent other computation from competing for space.
  • Celestium: Through the Quantum Gravity Bridge, Celestia serves as an off-chain DA solution for Ethereum rollups, which is currently the safest low-cost Ethereum Validium design I've seen. The rollup publishes its batches to Celestia, and Celestia publishes the verification result to Ethereum through the Quantum Gravity Bridge.

8. Advantages of Modular Innovation

1. Efficient technical iteration

If you look at the history of EIPs, many are related to execution. But each upgrade requires upgrading consensus and execution together, so development is slow. In a modular blockchain, Celestia decouples consensus from execution, making technology upgrades more convenient, encouraging experimentation, and paving the way for innovation.

Modularity opens the door to permissionless innovation.

2. Sovereignty

Because the Celestia network itself is only responsible for verifying data integrity and does not involve a complete consensus mechanism, the rollup on Celestia is essentially a self-sovereign blockchain. Nodes are free to hard/soft fork their software.

On an L1, a fork means forking both the execution and consensus layers. If a rollup on Ethereum is vulnerable or attacked, it must be redeployed, or the entire network must fork, to complete the state update. But Celestia allows chains to fork without fear of losing security because the DA layer used after the fork is the same.

Updates become easier, technology iterates faster, and the execution layer can focus on optimizing the environment and rate of execution.

3. Easy to deploy

Deploying a chain requires establishing consensus and incentivizing nodes to join the network, which requires high resources and costs. With the development of PoS, tools such as the Cosmos SDK make it easier to create new blockchains, but developers still need to find validators to join.

Optimint, introduced by Celestia, will help developers deploy chains more efficiently because Celestia provides consensus and security.

4. Cross-chain interoperability

When multiple chains adopt the same DA layer, trust-minimized bridges become possible between blockchains in the same cluster. This improves the security with which those blockchains communicate with each other.

Celestia combines the open ecology of Cosmos and the shared security of Ethereum, providing the possibility of multi-chain openness and shared security.

Future

We are on the verge of a massive paradigm shift. The crypto ecosystem is developing and iterating in an unprecedented way, and all the efforts and attempts toward this goal are nearing the end of a winding road. Celestia brings more possibilities to the network ecology, and the evolution of the network requires such components to support more possibilities.

Before entering the door that Celestia opened for us, it was hard to see how many doors there were behind it for me to explore - the space of adjacent possibilities expanded.

Maybe Celestia is not the best solution, but it will bring more innovation to the ecology.

I hope this article can help you.

Thanks for reading, and if you like my articles, feel free to chat with me on Twitter.
