Exploring Blockchain Data Availability and Blobspace with a Deep Dive into Solutions and Strategies

✅ Follow me: X: @Younsle1 Warpcast: @zer0luck

TL;DR 👀

  • Data availability in blockchain networks is crucial for ensuring accurate and complete information.

  • Full nodes offer the highest security by downloading and verifying all transactions, yet require significant resources.

  • Light clients, with a lower level of security, download only block headers, assuming all transactions are valid.

  • The data availability problem relates to ensuring that all new block data is genuinely published on the network.

  • Light clients depend on full nodes for fraud prevention against invalid transactions, necessitating a mechanism for block producers to publish all data.

  • Services like EigenDA, Celestia, and Arbitrum AnyTrust offer decentralized DA solutions built on different primitives: customizable slashing, dedicated consensus, and Data Availability Committees (DACs) for detecting withheld or manipulated data.

  • Danksharding simplifies sharding by focusing only on data storage and availability, organizing block data into large 'blobs' so that nodes need minimal effort to verify that the data is available.

  • Full Danksharding extends this across the entire network: blobs are erasure-coded into a 2D grid so that nodes can verify data availability efficiently by sampling rather than downloading everything.

Data Availability Problem

Let's use an analogy to understand why data availability is essential. 👇👇

Explaining Data Availability Through a Library Book Lending System

Explaining Data Availability Through a Library Book Lending System -DALL·E 2-

Libraries provide the service of storing and lending a vast collection of books. This is like how each node in a blockchain network stores and makes transaction data accessible.

Searching for Books Directly

  • Users visiting the library to find books is like operating a full node. By searching directly, one can ensure the information is accurate and complete.

  • However, not all users have the time or resources to visit the library.

Using the Online Catalog

  • Most people opt for the online catalog for convenience and timesaving. This is like light clients or rollup validators querying data in a blockchain network.

  • While the online catalog is fast and convenient, one must trust that the library's database is accurate and current.

Problem with Data Availability

  • Users might rely on incorrect data if the library's online system shows errors or provides manipulated information. This mirrors the circulation of inaccurate or manipulated transaction data in a blockchain.

  • An alternative method must be used to verify whether the online system's data is current and accurate.

Data Verification and Validation

  • To verify the accuracy of the library system, users can refer to catalogs from other libraries, book review sites, and other sources.

  • This is like cross-verifying data among multiple nodes in a blockchain. Checking various sources is crucial to ensure the accuracy of information.


Blockchain Node Processing Structures

Blockchain Node (Full, Light)

In blockchain, each block is composed of two main components 👇

  1. Block Header

    1. This block metadata consists of basic information about the block, including the Merkle root of transactions.
  2. Transaction Data

    1. Occupying most of the block, this consists of the actual transactions.
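
To make the two components above concrete, here is a minimal sketch in Python (my own illustration, not taken from any client implementation): the header holds only metadata, including a Merkle root committing to the transactions, while the block body carries the full transaction data.

```python
import hashlib
from dataclasses import dataclass

def merkle_root(items: list[bytes]) -> bytes:
    """Compute a simple binary Merkle root over raw transaction bytes."""
    level = [hashlib.sha256(item).digest() for item in items] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:                  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

@dataclass
class BlockHeader:            # small metadata: what light clients download
    parent_hash: bytes
    tx_root: bytes            # Merkle root committing to all transactions

@dataclass
class Block:                  # header + full transaction data: what full nodes download
    header: BlockHeader
    transactions: list[bytes]

txs = [b"alice->bob:5", b"bob->carol:2"]
block = Block(BlockHeader(parent_hash=b"\x00" * 32, tx_root=merkle_root(txs)), txs)
```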

Two Types of Nodes in the Network

  1. Full Node

    1. This type of node downloads and verifies all transactions on the blockchain.

    2. Although it requires significant resources and storage space, it is the most secure as it cannot be tricked into accepting blocks with invalid transactions.

  2. Light Client Node

    1. One can operate a light client if one lacks the resources to run a full node on a computer.

    2. Light clients do not download or validate transactions.

    3. Instead, they only download the block headers and assume that only valid transactions are included, making light clients less secure than full nodes.

Analysis of Data Availability Problem

The term "data availability problem" refers to a challenge encountered in various blockchain scaling strategies.

Q. How can a node be sure that all new block data has been genuinely published on the network?

👉 The dilemma lies in the possibility that if the block producer does not disclose all block data, no one can detect whether the block contains any malicious transactions.

Transaction Validity Detection Problem in Light Nodes

Q. How can Light Clients indirectly verify the Validity of all transactions in a block?

👉 Instead of verifying transactions, light clients can rely on full nodes to send Fraud Proofs if a block contains invalid transactions. This serves as evidence that certain transactions within the block are invalid.

Full Node -> (action: Fraud Proof) -> Light Node

One issue is that full nodes need access to the block's transaction data to create a Fraud Proof.

Suppose the block producer publishes only the block header without the transaction data. In that case, full nodes cannot verify the validity of transactions or generate fraud proofs for invalid ones.

Block producers need to publish all block data, but there needs to be a mechanism to enforce this requirement.

Block Producer -Post-> (Block Header only, TxData withheld?) => Full Node cannot generate a Fraud Proof for invalid TxData it cannot see
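
As a rough illustration of why the full data matters (a sketch of mine, not the actual protocol), a fraud proof can be modeled as the offending transaction plus a Merkle inclusion proof against the header's transaction root; a full node can only assemble that proof if the block body was actually published:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(tx: bytes, proof: list, tx_root: bytes) -> bool:
    """Walk a Merkle inclusion proof: each step gives a sibling hash and which side it is on."""
    node = sha256(tx)
    for sibling, side in proof:
        node = sha256(sibling + node) if side == "left" else sha256(node + sibling)
    return node == tx_root

def accept_fraud_proof(invalid_tx: bytes, inclusion_proof: list, header_tx_root: bytes) -> bool:
    # A fraud proof bundles the offending transaction with its inclusion proof, both of
    # which a full node can only build from the published block body. If the body was
    # withheld, the proof cannot be constructed and the light client has nothing to check.
    return verify_inclusion(invalid_tx, inclusion_proof, header_tx_root)
```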

Comparison of Celestia and EigenDA Services for Detecting Manipulated Data in Blockchain

EigenDA DA Layer (ReStaking Mechanism)

EigenLayer
  1. Custom Decentralization

    1. Native stakers can support services that demand high decentralization (e.g., settings allowing only native stakers to participate).

    2. Supports multi-party alignment and secret sharing, with censorship resistance achieved through MPC (Multi-Party Computation) across multiple nodes.

  2. Customizable Slashing

    1. Provides economic security through a variety of slashing mechanisms.

    2. Specifies slashing conditions in the slashing contracts deployed for each service.

    3. By choosing specific services and restaking, stakers accept the risks according to the rules of the slashing contract.

    4. Restakers receive additional rewards for their staked ETH, and validators gain extra income from their validation work.

    5. In case of disputes, EigenLayer uses the service's slashing contract to verify malicious actions by stakers and carry out slashing.

Celestia's DA Layer

Celestia

A rollup can use Ethereum or another blockchain such as Celestia as its DA provider. These providers run consensus mechanisms to ensure data correctness and to surface any issues that need to be handled.

The more robust the consensus layer within the DA Layer, the higher the reliability of the data.

The DA layer provides transaction data to validators so they can verify the chain state. Since each DA provider guarantees validity and availability in a different way, it is up to each chain to decide which provider best suits its objectives.

Data Availability Committee (DAC)

Key Elements of AnyTrust

  1. Data Availability Committee (DAC)

    1. Assumes that at least two members out of N in the committee are honest.

    2. If N-1 members promise data access, at least one will be an honest member who provides the data.

  2. Keysets

    1. Specifies the public keys of committee members and the number of signatures required for valid data availability certificates.

    2. Keysets are identified by hashes.

    3. The L1 KeysetManager contract maintains a list of currently valid keysets.

  3. Data Availability Certificates (DACert)

    1. Includes the data block's hash, expiration time, and signed proof from N-1 committee members.

    2. Based on the 2-of-N trust assumption, a DACert proves that a data block is accessible by at least one honest committee member until expiration.

  4. Data Availability Servers (DAS)

    1. Committee members operate DAS software.

    2. Provides two types of APIs: Sequencer API and REST API.

    3. Data can be stored in local files, Badger databases, Amazon S3, etc.

  5. Sequencer-Committee Interaction

    1. The Arbitrum sequencer sends data batches to committee members via RPC.

    2. After collecting sufficient signatures, the sequencer generates a DACert and publishes it on the L1 inbox contract.

How It Works

AnyTrust offers two methods: either the Arbitrum sequencer directly publishes data blocks to the L1 chain, or it publishes a DACert proving data accessibility.

L2 software verifies the Validity of DACerts and reads the data block if it is valid.

If the collection of signatures fails, the sequencer can "fall back to roll up," directly publishing the complete data on the L1 chain.
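
A simplified sketch of that flow under the 2-of-N AnyTrust assumption (illustrative only; the real Arbitrum implementation differs in its data formats and signature scheme):

```python
import hashlib
import time
from dataclasses import dataclass

@dataclass
class DACert:
    data_hash: bytes       # hash of the data batch the committee promises to serve
    expiry: int            # unix timestamp until which the data must stay available
    signers: set[str]      # committee members whose signatures were collected

def post_batch(batch: bytes, committee: dict, retention_secs: int = 14 * 86400):
    """Illustrative sequencer logic: issue a DACert if at least N-1 members sign,
    otherwise fall back to publishing the full batch on the L1 chain."""
    data_hash = hashlib.sha256(batch).digest()
    # `committee` maps a member name to an object with a hypothetical sign() method.
    signers = {name for name, member in committee.items() if member.sign(data_hash)}
    threshold = len(committee) - 1     # N-1 signatures => at least one honest signer (2-of-N)
    if len(signers) >= threshold:
        return ("dacert", DACert(data_hash, int(time.time()) + retention_secs, signers))
    return ("rollup_fallback", batch)  # post the complete data directly on L1 instead
```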

Integration of Celestia with Arbitrum Rollups


This integration involves using Celestia as a modular Data Availability (DA) solution in Arbitrum Sepolia for deploying the Orbit chain.

Key features include Nitro fraud proofs and Ethereum fallback functionality.

Celestia performs as the first modular DA integrated with the Nitro stack.

Allows for easy deployment of high-throughput Arbitrum Orbit chains, as simple as deploying smart contracts.

Following additional testing and audits, the mainnet launch was planned for early 2024.

Solutions for Enforcing Transaction Data by Block Creators

Light nodes need a way to verify for themselves that a block's transaction data has been published on the network, without relying on full nodes to confirm this.

Requiring light nodes to download the entire block to verify its publication contradicts the purpose of light nodes, so it's crucial to first identify where the Data Availability problem is most pertinent.

What Should Be the Focus in Addressing Data Availability Issues?

Considering Block Size Increase

Keeping the block size artificially limited keeps resource requirements low enough that full nodes can operate and verify the full chain. This constraint is crucial for enabling full nodes to function effectively.

Impacts of Increasing Block Size Limit

Increasing the block size limit may reduce the number of full nodes independently verifying the chain, leading to greater reliance on relatively less secure light client nodes.

This could negatively impact decentralization, especially if it becomes easier for block creators to alter protocol rules or for light clients to mistakenly incorporate malicious transactions.

Supporting Fraud Proofs for Light nodes

Adding Fraud Proof support for light clients is crucial. Still, it's also necessary for light clients to have the capability to confirm whether all data in a block has been published.

Scalability and Efficiency of Rollups

Optimistic Rollups are a newer scaling strategy built on the concept of rollups, which can be thought of as a form of sharding using side chains.

These side chains have dedicated block producers capable of transferring assets to different chains.

Reasons for Enhanced Scalability of Rollups

  1. Reduced Burden on the Ethereum Chain

    1. Ethereum validators and nodes do not need to execute every rollup transaction; they only verify the security and validity of these transactions, which improves scalability.

  2. Interaction Between Rollup Sequencers and Ethereum

    1. Contracts on Ethereum constrain which actions rollup sequencers can perform and how assets move between the rollup and L1.

Utilization of Data Availability in Rollups

Rollups use the base blockchain as a data availability layer, essentially just as a place to dump their transaction data.

All transaction processing and computations are designed to occur within the rollup.

The blockchain need not perform actual computations but should at least arrange transactions into blocks and ensure the data availability of transactions.

What if a block producer includes invalid transactions in a block and steals all the funds on the side chain?

A Fraud Proof can be used for detection, but users of the side chain need a way to verify that all data for every block on the side chain has indeed been published.

Solution: Ethereum-based rollups can resolve this by publishing full rollup blocks on the Ethereum chain and relying on Ethereum for data availability.

Ethereum acts as a data availability layer for dumping data, a strategic structure for utilizing Ethereum data.

Differences in Block Detection Methods between ZK Rollup and Optimistic Rollup

  1. Optimistic Rollup

    1. L1 contracts optimistically treat all submitted batches as valid but impose a delay before assets can be withdrawn from the rollup.

    2. During this delay, anyone can submit Fraud Proof if the sequencer commits improper transactions.

  2. ZK Rollup

    1. Like Optimistic Rollup, but instead of using Fraud Proofs to detect invalid blocks, it uses Zero-Knowledge (ZK) cryptographic proofs called Validity Proofs to demonstrate the Validity of the blocks.

The Need for Data Availability in ZK Rollup

Even if a block producer creates a valid block and proves it with Validity Proofs, if the block's data is not made public, users cannot know the state and balance of the blockchain and, therefore, cannot interact with the chain. This highlights the ongoing need for data availability.

Solutions for Blockchain Throughput via Sharding and Data Availability

In the sharding process, the network is divided into multiple chains, each with its own block producers.

Each shard is equipped to communicate with others for transferring tokens.

The essence of sharding is dividing network block producers, so instead of processing all transactions in one place, each shard handles only a portion of the transactions.

The roles of nodes are divided such that, in a sharded blockchain, a node typically runs as a full node for one or a few shards and as a light node for the remaining shards.

Operating full nodes across all shards contradicts the primary objective of sharding, which is to distribute network resources across multiple nodes.

Sharding -> Block Producer (BP)
	-> (BP1) -> TX'(A)
	-> (BP2) -> TX'(B)
	-> (BP3) -> TX'(C)
	-> (BP4) -> TX'(D)
	-> (BPn) -> TX'(N)

Full Node -> [TX'(A)+TX'(B)+TX'(C)+TX'(D)+TX'(N)] 
... Remaining TX' -> Light Node

Q. What happens if a shard's block producer is malicious and starts accepting invalid transactions?

With only a few block producers in each shard, it becomes easier to launch attacks.

Because block producers are spread across multiple shards, each shard is secured by only a fraction of them, which is a crucial point to consider in the security of sharded systems.

Analysis of EIP-4844 for Enhancing Scalability and Efficiency of Rollups

Blob

Typically, a Blob in databases and other storage systems stores large binary data such as images, videos, or audio files. It's a container for raw data that cannot be easily represented in text or numbers.

A key component of EIP-4844 is "blob-carrying transactions".

These transactions carry data chunks called blobs, limited in number per block (early drafts allowed up to 16; the parameters that shipped with EIP-4844 target 3 and allow up to 6), which are stored as sidecars on the consensus layer.

Blobs can contain all types of data, including smart contract bytecode, NFTs, application data, etc.

EIP-4844 Spec: Blob Life Cycle

Blob Process

Initial Creation Process

The lifecycle of a blob begins when a rollup sequencer signs and broadcasts a new type of transaction called a blob-carrying transaction.

The transaction includes sender, receiver, nonce, gas fees, etc.

New elements in the transaction:

max_fee_per_blob_gas: the maximum price the sender is willing to pay per unit of blob gas.

blob_versioned_hashes: a list of versioned hashes of the blobs (a transaction can reference multiple blobs).

Notably, the actual blob data is not included in the execution-layer transaction itself.

The execution layer only sees a reference to the blob through the blob_versioned_hashes field.

The data operates as a sidecar, being received, shared, and verified in the consensus layer.
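
Putting the pieces together, a simplified sketch of such a transaction might look as follows; the field names max_fee_per_blob_gas and blob_versioned_hashes come from EIP-4844, while the surrounding structure is illustrative:

```python
from dataclasses import dataclass

@dataclass
class BlobCarryingTx:                   # simplified view of an EIP-4844 (type-3) transaction
    sender: str
    receiver: str
    nonce: int
    max_fee_per_gas: int
    max_fee_per_blob_gas: int           # bid for blob gas, separate from the regular fee market
    blob_versioned_hashes: list[bytes]  # references to the blobs; the data itself is not here

@dataclass
class BlobSidecar:                      # carried, shared, and verified on the consensus layer
    blobs: list[bytes]                  # the actual ~125 KB data payloads
    kzg_commitments: list[bytes]        # one 48-byte KZG commitment per blob
```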

Blob Composition Process

Blob Data Capacity

According to the EIP-4844 standard, a blob can contain up to 125 KB of data.

Data Chunk Division:

Data is divided into 4096 chunks of 32 bytes each. Any shortfall in data is filled with zeros.
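
A minimal sketch of that chunking step (simplified; the actual spec additionally requires each 32-byte chunk, read as an integer, to be below the BLS field modulus, which is why the usable capacity is slightly under the raw 128 KB):

```python
FIELD_ELEMENTS_PER_BLOB = 4096
BYTES_PER_FIELD_ELEMENT = 32

def to_chunks(data: bytes) -> list[bytes]:
    """Split data into 4096 chunks of 32 bytes, zero-padding any shortfall."""
    max_len = FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT
    assert len(data) <= max_len, "payload does not fit into a single blob"
    padded = data.ljust(max_len, b"\x00")
    return [padded[i:i + BYTES_PER_FIELD_ELEMENT]
            for i in range(0, max_len, BYTES_PER_FIELD_ELEMENT)]
```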

Polynomial Equation Calculation

Each chunk is treated as a number, and a polynomial $P(x)$ is calculated, where $w$ satisfies $w^{4096} = 1$.

Optimized Calculation

Using $w$ allows for speed enhancements through optimization techniques.

$P(w^i)$ corresponds to the $i$-th chunk.

Secret Point Evaluation

$P(x)$ is evaluated at a very specific and secret point $s$, which is not disclosed.

Blob Commitment Creation

The value of $P(s)$ acts as the blob's commitment, effectively functioning as a hash function: any alteration in the data changes the commitment completely.

Version Commitment

The commitment then undergoes a process to become versioned: the versioned hash consists of a version byte 0x01 followed by the last 31 bytes of the SHA-256 hash of $P(s)$.
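
Expressed in code (following the formula above; the constant name mirrors the EIP-4844 spec):

```python
import hashlib

VERSIONED_HASH_VERSION_KZG = b"\x01"

def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """Version byte 0x01 + last 31 bytes of SHA-256(commitment), per EIP-4844."""
    assert len(commitment) == 48        # a KZG commitment is a 48-byte G1 point
    return VERSIONED_HASH_VERSION_KZG + hashlib.sha256(commitment).digest()[1:]
```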

Use in Blockchain

The final versioned commitment is used in the EVM as a 32-byte value. It can be utilized in smart contracts for various types of ZK technologies.

DankSharding Implementation

With Danksharding implemented, blobs could contain more than 4096 chunks, but they start with 4096 for now.

Blob Expiration Process

According to the Deneb fork p2p spec, nodes provide requested blob data for approximately 18 days or 4096 epochs. Afterward, they clean the data from their hard drives and stop serving it.

https://eth2book.info/capella/part4/history/deneb/

For blob expiration

A cap is set on additional storage requirements, using 125KB per blob.

With 3 blobs per block as the target (the parameter that shipped with EIP-4844), old blobs are deleted as new ones come in, keeping the additional data storage under roughly 50 GB.
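
As a rough back-of-the-envelope check using the figures above (125 KB per blob, 32 slots per epoch, a 4096-epoch retention window, and the 3-blob target):

$$ 3\ \text{blobs/slot} \times 125\ \text{KB} \times 32\ \text{slots/epoch} \times 4096\ \text{epochs} \approx 49\ \text{GB} $$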

Discarding data may seem counterintuitive for a blockchain.

Data availability does not necessarily mean data storage.

Expecting scalability with a model where everyone stores everything forever for free is unrealistic.

At the protocol level, Ethereum only guarantees that the data stays available long enough for anyone who needs it to download and store it, which is all that rollups actually require.

Expired blobs don't mean the data is lost forever; it can still be retrieved through means external to the protocol.

Some methods are more decentralized than others, and especially in storage, a 1-of-n trust assumption applies - only one node needs to be honest to deliver the stored blob.

Thanks to the blob's versioned hash, users cannot be tricked into accepting incorrect blob data obtained through these external channels.

Security-Oriented Blob Structuring

While simply committing to data could be done with SHA-256(data), using polynomials enables technologies like erasure coding, polynomial commitment schemes, and data availability sampling.

Erasure Coding

Erasure Coding Diagram

Erasure coding is a method that enhances data availability by transforming data into a polynomial and extending it with redundant evaluations.

Transformation of Original Data into a Polynomial

For example, consider the string "DEAD". Assign numerical values to each character: 'D' = 1, 'E' = 2, 'A' = 3, giving the numeric list [1, 2, 3, 1]. From these four values, create a degree-3 polynomial $P(x)$ that passes through the original data points, i.e. $P(1) = 1$, $P(2) = 2$, $P(3) = 3$, $P(4) = 1$.

Polynomial Expansion for Additional Points

Evaluate $P(x)$ at additional points, say $x = 5, 6, 7, 8$, to generate four extra values $P(5), P(6), P(7), P(8)$ as the expanded (redundant) data.

Data Recovery

From the total of eight points (the four original values plus the four extra evaluations), any four points are enough to reconstruct the polynomial and recover the remaining data.

For instance, knowing only the values at positions 1, 3, 6, and 7 allows recovery of the other four.

This method allows for the recovery of entire data despite partial loss. In actual implementations, modular arithmetic prevents numbers from becoming too large, but the principle remains the same. This ensures data availability and enhances resilience against data loss.
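
Here is a toy version of the scheme above (my own sketch; production systems work over a large finite field and use FFTs rather than naive Lagrange interpolation):

```python
PRIME = 257  # small prime field so numbers stay bounded, as noted above

def lagrange_eval(points: list, x: int) -> int:
    """Evaluate the unique polynomial through `points` at x, over GF(PRIME)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total

original = [(1, 1), (2, 2), (3, 3), (4, 1)]                 # "DEAD" encoded as [1, 2, 3, 1]
extended = [(x, lagrange_eval(original, x)) for x in (5, 6, 7, 8)]   # redundant points

# Any 4 of the 8 points recover the rest, e.g. the points at positions 1, 3, 6, 7:
subset = [original[0], original[2], extended[1], extended[2]]
recovered = [lagrange_eval(subset, x) for x in range(1, 9)]
assert recovered == [lagrange_eval(original, x) for x in range(1, 9)]
```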

Data Availability Proof and Security Mechanism via Erasure Coding

Clients can download a tiny part of a block to verify with a high probability that all block data has been published.

Thanks to the 2x expansion from erasure coding, only 50% of the expanded block needs to be published on the network for the full 100% of the block to be recoverable.

If a malicious block producer wants to withhold even 1% of the block, that 1% can be recovered as long as 50% of the expanded data is available, so they are forced to withhold more than 50% of the block. Clients can use this to verify that no part of the block has been withheld.

If the download of even one chunk fails, the client can reject the block as unusable.

Downloading a single random chunk therefore gives at least a 50% chance of detecting that the block is unavailable when part of it is missing, and each additional sample increases the detection probability further.

This means that a client can verify with a high probability that the entire block has been published by downloading only a part.
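
That intuition can be quantified with a quick sketch (assuming, as above, that an unrecoverable block requires withholding at least half of the expanded data, and treating samples as independent, which slightly understates the real detection probability):

```python
def prob_fooled(samples: int, withheld_fraction: float = 0.5) -> float:
    """Probability that `samples` random chunk downloads all succeed even though
    the producer withheld `withheld_fraction` of the expanded block."""
    return (1.0 - withheld_fraction) ** samples

for k in (1, 5, 10, 20, 30):
    print(f"{k:>2} samples -> fooled with probability {prob_fooled(k):.10f}")
# 30 successful samples leave less than a one-in-a-billion chance that half the block is missing.
```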

KZG (Kate-Zaverucha-Goldberg) Polynomial Commitment

Polynomial Commitment Scheme

This method compresses a high-degree polynomial into a short commitment $C$, acting as a cryptographic hash.

Verifying Polynomial Evaluations

Using the commitment $C$, a verifier can be assured that a polynomial evaluates to a specific value $y$ at a given point $x$ without revealing the entire polynomial.

Efficient Verification

The verifier needs four values to confirm the accuracy of the polynomial evaluation: the commitment $C$, the point $x$, the evaluation $y$, and the proof $\pi$.

Elliptic Curve Properties

The encrypted form of a number $x$ is denoted as $[x]$. Encrypted forms of multiplication and addition maintain their relationship with the operations on unencrypted numbers.

Computations without the Secret $s$

Elliptic curve scalar multiplication, a one-way cryptographic function that allows computations without knowing the underlying value, is used to publish encrypted powers of the secret $s$.

Given the encrypted powers of $s$, one can compute the encrypted form of $P(s)$ without knowing $s$.
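
Putting the pieces together, the verifier's check is the standard KZG pairing identity (not spelled out in the original), where $[a]_1$ and $[a]_2$ denote the number $a$ in encrypted form in the two pairing groups:

$$ e\big(C - [y]_1,\ [1]_2\big) = e\big(\pi,\ [s]_2 - [x]_2\big) $$

Intuitively, this checks that $P(X) - y$ is divisible by $(X - x)$, which holds exactly when $P(x) = y$.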

Trust in the Setup

Trust is required that whoever knows $s$ and generates its encrypted powers discards $s$ after use.

Distributed Trust

A trustworthy setup involves many participants creating and combining parts of the secret. This ensures that the final secret $s$ remains unknown unless all participants conspire.

Ethereum's Contribution to Trust

Ethereum conducted the KZG Ceremony with over a hundred thousand participants to ensure that the secret underpinning the cryptographic protocol remains secure.

Dank Sharding

Sharding

Simplified Version of Sharding Focused on Data Storage and Availability

Instead of dividing the entire blockchain, this approach shards only the data storage, maintaining operations on the main chain.

Large amounts of data can be moved off the main chain into this sharded data space.

Abandoning the concept of "mini blockchains," blocks comprise large "Blobs." These blobs are designed to require minimal effort from each node to verify the availability of all data. The key here is the verification of data availability. Nodes can be assured of the presence of all data without needing to download everything.

This approach helps resolve issues with rollup sequencers withholding data.

For example, just as nodes won't follow a fork containing invalid transactions, they won't follow a fork whose data is being withheld, even if 99% of validators back it.

Disadvantages of Dank Sharding

The process of composing blocks becomes resource-intensive in terms of bandwidth and computation, potentially leading to the centralization of block producers. However, block producer centralization is already occurring due to the Maximal Extractable Value (MEV).

Block producers already have hardware optimized for extracting MEV from Ethereum's block space.

With Danksharding, these specialized block builders include the large data blobs in blocks, while the rest of the network performs only minimal work to keep them in check.

Proto Danksharding (EIP-4844)

Dank Sharding, while significantly simplifying the original data sharding plan, is still complex and time-consuming to deploy.

It utilizes a type of transaction for rollup sequencers called blob-carrying transactions. Since EIP-4844 alone does not allow for actual data scaling, blobs are limited in number and size for now (as all nodes still need to download every blob).

However, it's a significant step as it meets the data requirements of rollups.

When full Dank Sharding is provided, there will be no change from the perspective of rollups, except that blobs will suddenly become much larger.

Full Danksharding

Based on EIP-4844, it affects the entire network, from Ethereum's execution layer to its consensus layer. The core of this technology is that the network as a whole downloads and processes every data blob, while individual nodes verify availability by sampling; this is crucial for maximizing data availability and enhancing the overall stability of the network.

Data Blob Reconstruction

Data blobs start relatively small, reducing the network's burden and allowing more data to be processed efficiently.

These data blobs are organized in a 2D grid structure, which facilitates the reconstruction and verification of data.

Efficiency of Data Reconstruction

In a 2D erasure-coded grid, having at least half of the data points of any row or column allows that entire row or column to be reconstructed.

This ensures data integrity and plays a significant role in reducing network load.

Random Sampling Checks

In full Dank Sharding, random sampling checks verify data integrity. This method effectively confirms that data is correctly stored and processed while minimizing the burden on the network.

Conclusion

  1. Specific Application of Data Availability (DA) in Rollups

    1. In rollups, DA plays a crucial role in ensuring the availability of off-chain data and maintaining the integrity of transactions.

    2. Rollups use DA to store off-chain data and provide evidence for its verification when needed, contributing to reduced throughput on the main layer and increased efficiency.

  2. Cost and Throughput Analysis of the Solution

    1. Rollups using DA typically reduce on-chain data storage costs. However, additional computational and verification costs may arise to ensure data availability.

    2. DA enables rollups to process more transactions quickly, reducing the load on the main layer and increasing the overall network throughput.

  3. Advantages and Disadvantages of the Solution

    1. Advantages

      1. Improved Scalability: Ability to process more transactions quickly.

      2. Cost Reduction: Reduction in on-chain data storage costs.

    2. Disadvantages

      1. Increased Complexity: Additional mechanisms are needed to ensure data availability.

      2. Security Considerations: Additional measures may be required to maintain the security and integrity of off-chain data.

  4. Cost Reduction Effects of EIP-4844

    1. Introduction of Blob-carrying Transactions: EIP-4844 introduces blob-carrying transactions, enabling more cost-efficient storage and management of data.

    2. Gas Cost Reduction: This significantly reduces gas costs for rollups and lowers overall transaction costs.

    3. Increased Rollup Efficiency: Rollups can process more data at lower costs through EIP-4844, enhancing overall network efficiency.

Future Update Directions

The Ethereum-Cancun Upgrade aims to improve the scalability, security, and efficiency of the Ethereum network (introducing the concept of Proto-dank sharding).

The Cancun-Deneb (Dencun) upgrade, initially planned for October 2023, was postponed to the first half of 2024.

It aims to resolve unresolved issues after the Shanghai upgrade (based on 5 EIPs, including EIP-4844, EIP-1153, EIP-4788, and EIP-6780).

  1. Improved Scalability

    1. Introduction of temporary data storage capacity.

    2. Ability to process more transactions.

  2. Reduced Transaction Costs

    1. Cost-efficient data addition through blob-carrying transactions.
  3. Optimized Data Management

    1. Block space optimization and reduced on-chain data storage costs through EIP-1153.
  4. Improved Cross-chain Communication

    1. Enhanced interoperability between blockchain networks through EIP-4788.
  5. Enhanced Security

    1. Reduction of risks associated with the SELFDESTRUCT opcode through EIP-6780.

