Celestia - An Introduction

Execution, Consensus and Data Availability

Celestia is a modular blockchain that decouples the consensus and execution layers, which are traditionally intertwined in monolithic blockchains. Celestia's architecture is not just an incremental change but a fundamental rethinking of how blockchains are structured, aiming to serve as a foundational layer for both L1s and L2s.

To better understand how Celestia decouples consensus and execution, it’s important to have a basic understanding of blockchain stacks:

Consensus allows users of the network to reach an agreement on the state of the blockchain. It involves establishing a common understanding of the transaction history and the order in which transactions have occurred.

Execution refers to the interpretation of transactions and their subsequent impact on the blockchain's state. This process is crucial for the accurate update and maintenance of the blockchain.

Data availability refers to the assurance that data is present and accessible within the network, ensuring that all participants can verify transactions and maintain the integrity of the blockchain. We’ve already discussed the importance of data availability for scaling in a previous article [link EigenDA article], but here’s a short rundown:

Data availability means that any node in the network can access and check the transaction data behind each block. Whenever a new block is created, all of its data must be published and made accessible to the other participants in the network. Nodes need to be able to download the block's data and re-execute the transactions to confirm that they are valid.

In monolithic blockchains, data availability is usually not an issue, because full nodes download all the transaction data in each block. If a node was able to download all the data, the data is by definition available. However, requiring each node to download all the data is inefficient and poses a scaling challenge.

Celestia's core focus is on efficiently ordering transactions and ensuring data availability, leaving execution to the chains built on top of it. Because availability can be checked without downloading all block data, those chains can scale more efficiently.

Monolithic vs Modular:

Monolithic Architecture: Currently, most chains are monolithic in nature, where consensus and state execution are concurrent processes handled by the same set of validators. This architecture, while providing a robust framework, has inherent scalability issues and often leads to high transaction fees. The need for full nodes to download and execute all transactions creates a bottleneck, limiting the blockchain's ability to scale efficiently.

Modular Architecture (Celestia’s Approach): In contrast to the monolithic model, Celestia adopts a modular architecture. This innovative structure separates the execution and consensus layers into distinct, specialized components. This separation not only allows for greater flexibility in blockchain development but also enables the creation of chains that are optimized for specific applications or use cases. Celestia’s focus is on ensuring data availability and maintaining consensus, a streamlined approach that enhances overall network efficiency.

Mechanics

Celestia operates as a flexible data availability layer that uses data availability sampling, supported by light clients and fraud proofs. It comprises two main elements: a Proof of Stake (PoS) blockchain, built on the Cosmos SDK and a modified version of the Tendermint consensus protocol, and a network of light clients, referred to as the Halo Network.

The Celestia blockchain is maintained by a group of validators operating full nodes, which are responsible for downloading the complete block data. This system is specifically designed for data availability, ensuring these full nodes function with maximum efficiency and scalability.

Rollups use Celestia to store and publish their transaction data. Celestia uses data availability sampling (DAS) to give all nodes in the network assurance that the complete data of a block is available, particularly light nodes that don't download the entire block. With DAS, a light node confirms availability by randomly sampling small segments of the block data instead of downloading the whole block. Once enough light nodes have sampled data, there is a high assurance that the data is available.
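To get a feel for how quickly sampling builds confidence, here is a back-of-the-envelope sketch. It rests on assumptions that are not spelled out in this article: the block data is arranged as a k × k square and extended to 2k × 2k, and a block producer must withhold at least (k + 1)² of the extended chunks to make the block unrecoverable. The function name and parameters are purely illustrative and not part of any Celestia API.

```python
from fractions import Fraction


def max_undetected_probability(k: int, samples: int) -> Fraction:
    """Upper bound on the chance that one light node, drawing `samples`
    distinct random chunks, sees only available chunks even though the
    block cannot be reconstructed.

    Assumption: a k x k square extended to 2k x 2k, with the minimum
    (k + 1)^2 chunks withheld.
    """
    total = (2 * k) ** 2           # chunks in the extended square
    withheld = (k + 1) ** 2        # fewest chunks that must be hidden
    available = total - withheld   # chunks the adversary still serves
    if samples > available:
        return Fraction(0)         # some sample must hit a hidden chunk
    p = Fraction(1)
    for i in range(samples):       # sampling without replacement
        p *= Fraction(available - i, total - i)
    return p
```

Under these assumptions, each sample has roughly a one-in-four chance of hitting a withheld chunk, so with k = 64 a single light node taking 16 samples would be fooled less than 1% of the time, and the odds shrink further as more light nodes sample independently.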

In Celestia, validators are responsible for downloading all the block data. Light clients, which have limited storage and processing capabilities, typically store and validate only the block headers, and access just a small, randomly selected segment of each block's data. The headers contain commitments such as Merkle roots, which act as a digital fingerprint for the entire block data, effectively summarizing the list of all transactions. Since numerous light clients check different segments at random, it is highly likely that the entire dataset is available. When needed, light clients can also publish their portions of block data to full nodes for further verification, particularly when a full node lacks the complete data or wishes to confirm its accurate storage and publication.

To support DAS, Celestia uses a two-dimensional Reed-Solomon encoding scheme. Reed-Solomon encoding is a form of erasure coding, which is crucial for data recovery and integrity. Erasure coding adds redundancy to the data by expanding 'n' chunks of data into 'k' chunks, where 'k' is greater than 'n'. With a Reed-Solomon code, the original data can be recovered from any 'n' of the 'k' chunks, so the full data survives even if some chunks are lost or corrupted.
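The idea of expanding 'n' chunks into 'k' chunks can be illustrated with a toy, one-dimensional Reed-Solomon-style code. This is a minimal sketch, not Celestia's implementation: it assumes a small prime field (257) and uses Lagrange interpolation, whereas production codecs work over binary fields and use much faster algorithms. The function names are hypothetical.

```python
P = 257  # toy prime field; assumption for illustration, every byte 0..255 fits


def interpolate_at(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points)
    that passes through `points`, with all arithmetic done mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, k):
    """Expand n data values into k shares (k > n, k <= P).
    The first n shares are the original data; the rest are parity."""
    points = list(enumerate(data))
    return [interpolate_at(points, x) for x in range(k)]


def recover(shares, n):
    """Rebuild the n original values from any n surviving shares.
    `shares` maps share index -> share value."""
    points = list(shares.items())[:n]
    return [interpolate_at(points, x) for x in range(n)]


data = [72, 105, 33]                  # n = 3 original values
shares = encode(data, 6)              # k = 6 shares in total
surviving = {1: shares[1], 4: shares[4], 5: shares[5]}  # three shares lost
restored = recover(surviving, len(data))                # equals `data`
```

Any three of the six shares are enough here, which is the property that lets the network rebuild a block even when a large fraction of its chunks is missing.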

This process involves dividing the block data into multiple chunks, arranging these chunks into a matrix, and then expanding this matrix with additional parity data for redundancy and error correction through Reed-Solomon encoding. The result is an extended matrix. Merkle roots are computed for both the rows and columns of this extended matrix, and the Merkle root of these roots is then included in the block header as a commitment to the block data.
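The commitment step described above can be sketched as follows. This is a simplified illustration that assumes plain SHA-256 binary Merkle trees and takes an already extended matrix as input; Celestia's actual tree construction differs (it uses namespaced Merkle trees), and the helper names here are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root of a simple binary Merkle tree over a non-empty list of
    byte strings. An odd node at the end of a level is promoted unchanged."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # domain-separate leaves
    while len(level) > 1:
        nxt = [h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def data_root(extended_square):
    """Merkle root over all row roots followed by all column roots."""
    row_roots = [merkle_root(row) for row in extended_square]
    col_roots = [merkle_root(list(col)) for col in zip(*extended_square)]
    return merkle_root(row_roots + col_roots)


# Placeholder 4 x 4 "extended" square; real chunks would be block data
# plus Reed-Solomon parity.
square = [[bytes([r, c]) for c in range(4)] for r in range(4)]
root = data_root(square)  # 32-byte commitment that would go in the header
```

Because every chunk feeds into one row root and one column root, changing any single chunk changes the final root, which is what lets the header act as a fingerprint of the whole block.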

For verifying data availability, light nodes engage in a process where they randomly select coordinates in the extended matrix and request data chunks and corresponding Merkle proofs from full nodes. Successful receipt and validation of these data chunks by light nodes suggest a high likelihood that all block data is available.
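A minimal simulation of this sampling loop is shown below. It is a sketch under several assumptions: chunks are committed with plain SHA-256 Merkle trees per row, the square width is a power of two, and the light node already holds trusted row roots from the block header. The `FullNode` class and `sample_block` function are hypothetical and do not mirror Celestia's networking API.

```python
import hashlib
import random


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """All levels of a Merkle tree; len(leaves) must be a power of two."""
    levels = [[h(b"\x00" + leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(b"\x01" + prev[i] + prev[i + 1])
                       for i in range(0, len(prev), 2)])
    return levels


def prove(levels, index):
    """Sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify(root, leaf, index, proof):
    node = h(b"\x00" + leaf)
    for sibling in proof:
        pair = sibling + node if index % 2 else node + sibling
        node = h(b"\x01" + pair)
        index //= 2
    return node == root


class FullNode:
    """Serves chunks with proofs; `withheld` models hidden coordinates."""

    def __init__(self, square, withheld=frozenset()):
        self.square, self.withheld = square, withheld
        self.trees = [build_tree(row) for row in square]

    def row_roots(self):
        return [tree[-1][0] for tree in self.trees]

    def get_chunk(self, r, c):
        if (r, c) in self.withheld:
            return None
        return self.square[r][c], prove(self.trees[r], c)


def sample_block(node, row_roots, width, samples, rng):
    """Light-node check: True only if every sampled chunk arrives
    and its Merkle proof matches the trusted row root."""
    for _ in range(samples):
        r, c = rng.randrange(width), rng.randrange(width)
        response = node.get_chunk(r, c)
        if response is None:
            return False
        chunk, proof = response
        if not verify(row_roots[r], chunk, c, proof):
            return False
    return True
```

In this toy model a light node accepts a block only when all of its samples come back with valid proofs; a missing or mismatching chunk is enough to reject it.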

Furthermore, in what is known as network gossiping, any valid data chunk received by a light node is shared across the network. If enough unique chunks are collected by the light nodes, honest full nodes can reconstruct the entire block.

If you’re curious what the Celestia Halo or consensus node maps look like, you can head over to our maps explorer and see for yourself: https://validao.xyz/#maps

ValiDAO scans and keeps an up-to-date map of Celestia's Halo and consensus networks