Secret Sauce of Sui Part 2: Narwhal and Bullshark - The Secret Sauces of Sui's Consensus Scalability

Transactions that involve owned objects, such as individual payments or minting and transferring NFTs, can be handled independently without strict ordering requirements. Conversely, transactions involving shared objects, such as liquidity pools or order books in decentralized exchanges (DEXs), where multiple users can access and manipulate the same state, require a specific order of execution. Sui leverages this distinction to handle owned and shared objects differently.

For owned objects, Sui utilizes FastPay, as discussed in the previous article, to enable parallel execution and allow these transactions to forgo Sui consensus. In contrast, transactions involving shared objects require sequencing with a consensus protocol. To achieve this, Sui utilizes the horizontally scalable Narwhal DAG-based mempool for data dissemination and employs Bullshark for data ordering.

1. Overview

The core concept behind Narwhal is to establish a high-performance distributed ledger by separating the responsibilities of the mempool (data dissemination) and the consensus layer (data ordering). In traditional mempools, transactions submitted by users to validators require each validator to broadcast the transactions to other validators' mempools. A proposer or leader then selects transactions from its mempool to create a block, which is broadcast again during the consensus mechanism. This process is inefficient because transactions are effectively broadcast twice: once during mempool dissemination and again during block dissemination in the consensus process.

In contrast, DAG-based consensus systems like Narwhal and Bullshark enable each validator's mempool (using Narwhal) to independently disseminate batches of transactions. When a validator receives a transaction batch, it stores the batch as a vertex within a Directed Acyclic Graph (DAG) in its local view. An edge from a vertex to a vertex in the previous round represents a vote for that previous-round vertex. By applying a consensus logic, each validator can interpret its local DAG structure and totally order it without any additional message exchanges. This means the consensus logic built on top of the DAG structure adds no extra communication overhead.

It's important to note that this article provides a general understanding of the concept based mainly on the works referenced in [1], [2], [3], and [4]. We aim to explain it to general readers, so we have omitted technical details such as proofs, claims, lemmas, etc. For those interested in delving deeper into the material, we recommend reading the referenced sources to gain a comprehensive understanding after grasping the foundational concepts outlined here.

2. Narwhal

Narwhal serves as a mempool layer used to construct a round-based DAG at each validator. The DAG is composed of vertices and edges, where each vertex represents a batch of transactions and the outgoing edges of a vertex are references to its parents from the previous round (as illustrated in Figure 1). Validators exchange information about these vertices and references with one another to form a DAG within their respective local views. Narwhal is mainly used to construct the DAG, while a consensus logic built on top of Narwhal is responsible for interpreting it.

Figure 1 shows the components of the round-based DAG, which contains vertices (blue boxes) and outgoing edges (references) represented as green arrows.

As always, we will consider a system of n ≥ 3f + 1 validators, designed to tolerate up to f Byzantine nodes (one-third of the system). In Narwhal, each validator consists of workers and one primary. Workers have two responsibilities (a short illustrative sketch follows the list):

1. Stream batches of transactions to the corresponding workers of other validators (e.g., Worker 1 shares batches with the other validators' Worker 1, as shown in Figure 2),

2. Send digests (hashes) of the batches to their respective primaries. The primary of each validator uses these digests to run the Narwhal mempool protocol and form a round-based DAG in its own local view.
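To make this split concrete, the following is a minimal, hypothetical sketch (not Narwhal's actual implementation) of a worker that batches transactions, streams each full batch to the matching workers of other validators, and hands only the batch digest to its primary. The `Worker` and `Primary` classes and their methods are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Primary:
    """Illustrative primary: collects only digests from its workers and uses
    them (not the raw batches) to build its round-based DAG."""
    digests: list = field(default_factory=list)

    def receive_digest(self, worker_id: int, digest: str) -> None:
        self.digests.append((worker_id, digest))

@dataclass
class Worker:
    """Illustrative worker: batches transactions, streams each full batch to the
    corresponding workers of other validators, and forwards only its digest
    to the local primary."""
    worker_id: int
    primary: Primary
    peer_workers: list = field(default_factory=list)  # same-index workers at other validators
    pending: list = field(default_factory=list)
    stored: dict = field(default_factory=dict)        # digest -> batch
    batch_size: int = 3

    def submit(self, tx: bytes) -> None:
        self.pending.append(tx)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            digest = hashlib.sha256(b"".join(batch)).hexdigest()
            self.stored[digest] = batch
            for peer in self.peer_workers:            # 1) disseminate the full batch
                peer.stored[digest] = batch
            self.primary.receive_digest(self.worker_id, digest)  # 2) digest only

# Usage: a single worker with no peers, for brevity
primary = Primary()
worker = Worker(worker_id=1, primary=primary)
for i in range(3):
    worker.submit(f"tx-{i}".encode())
print(primary.digests)  # one (worker_id, digest) entry once the batch fills up
```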

Figure 2 illustrates the inner workings of each validator, which is composed of workers and a primary.

Thanks to the independent operation of each worker, the data dissemination process can be horizontally scaled by adding more workers to a validator system. The primary remains the bottleneck because each validator has only one primary responsible for running the mempool protocol. However, this trade-off yields significant benefits. The separation of data dissemination from DAG construction allows a primary to form a DAG independently of data dissemination. Additionally, primaries utilize digests of batches from the workers to run the mempool protocol instead of using the entire transaction batches, resulting in significant bandwidth savings.

2.1 Constructing the DAG

To construct a DAG, each validator follows these steps at round r (a simplified sketch of these steps appears after the list):

1. Each validator's primary sends a message to all other validators, including:

     a. Digests of the transaction batches from its workers,

     b. n - f certificates (references) from the previous round (r-1).

2. Upon receipt, a primary (the receiver) validates the message by checking that:

     a. The message belongs to the receiver's current round r,

     b. This is the first message the receiver has received from this sender in round r,

     c. Its workers have stored the batches corresponding to the digests specified in the message.

3. If the checks pass, the receiver casts its vote for the message by sending back its signature.

4. The sender aggregates n - f signatures from different receivers to create a certificate and sends this certificate back to all other primaries. The receivers can then use this certificate, along with the associated message digest, to create a vertex in their respective local DAGs.

5. Once a primary has n - f certified vertices for round r, it can proceed to the next round, r+1.
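Below is a simplified, single-process sketch of one such round, assuming honest validators and standing in for real networking and signatures with direct method calls; the `Header`, `Certificate`, and `Validator` names are illustrative, not Narwhal's actual types.

```python
from dataclasses import dataclass

N, F = 4, 1
QUORUM = N - F  # n - f votes / certificates needed

@dataclass(frozen=True)
class Header:
    author: int
    round: int
    digests: tuple          # digests of this validator's worker batches
    parents: tuple          # certificates from round r - 1

@dataclass(frozen=True)
class Certificate:
    header: Header
    voters: tuple           # who signed (stand-in for aggregated signatures)

class Validator:
    def __init__(self, vid: int):
        self.id = vid
        self.round = 1
        self.dag = {}       # (round, author) -> Certificate
        self.voted = set()  # (round, author) pairs already voted for

    def propose(self, digests, parents) -> Header:
        return Header(self.id, self.round, tuple(digests), tuple(parents))

    def vote(self, header: Header) -> bool:
        # checks 2a and 2b; batch availability (2c) is assumed in this sketch
        ok = header.round == self.round and (header.round, header.author) not in self.voted
        if ok:
            self.voted.add((header.round, header.author))
        return ok

    def add_certificate(self, cert: Certificate) -> None:
        self.dag[(cert.header.round, cert.header.author)] = cert
        # step 5: advance once n - f certified vertices exist for the current round
        if sum(1 for (r, _) in self.dag if r == self.round) >= QUORUM:
            self.round += 1

validators = [Validator(i) for i in range(N)]
for proposer in validators:
    header = proposer.propose(digests=[f"batch-{proposer.id}"], parents=[])   # step 1
    voters = tuple(v.id for v in validators if v.vote(header))[:QUORUM]       # steps 2-3
    if len(voters) >= QUORUM:                                                 # step 4
        cert = Certificate(header, voters)
        for v in validators:
            v.add_certificate(cert)

print([v.round for v in validators])  # all validators have advanced to round 2
```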

Figure 3: The flow of how a validator broadcasts its vertex and adds other validators' vertices to its local DAG.
Figure 4 shows the responsibilities of the workers and the primary of each validator (source: https://www.youtube.com/watch?v=K5ph4-7vvHk)
Figure 5: DAG construction (source: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/)

3. Bullshark

Once we have a DAG, Bullshark can be used on top of it as the ordering logic, adding zero communication overhead. "Zero communication overhead" means that Bullshark does not require validators to exchange additional messages to establish a total order over their vertices. Instead, each validator looks at its local DAG and uses Bullshark as the consensus protocol to interpret the vertices and edges, ultimately determining their order. Even though different validators might see different DAGs due to the unreliability of the network or the presence of Byzantine nodes, Bullshark ensures that honest validators reach the same total order.

Bullshark comes in two versions: an asynchronous version and a partially synchronous version. Our focus here is on the latter because it guarantees bounded memory usage through garbage collection, a feature we'll delve into later.

Bullshark interprets a round-based DAG by splitting it into odd and even rounds. For simplicity, let's consider a scenario with n = 4 and f = 1 (remembering that n ≥ 3f + 1). In each odd round, a predefined vertex known as the "anchor," shown in green in Figure 6, is introduced. The goal is to determine which anchors to commit, and this is where the even rounds become crucial.

Figure 6: Anchors and their causal histories (source: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/)

The edges from an even round that connect to the anchor in the previous odd round act as votes for that anchor. An anchor becomes committed once it receives at least f + 1 votes from the next even round. For instance, in Figure 7, anchor A2 in round 3 is committed because it receives three votes from round 4, surpassing the threshold of f + 1 = 2. Conversely, A1 remains uncommitted as it has only one vote.
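As a toy illustration of this voting rule (under the same n = 4, f = 1 assumption), the local DAG can be viewed as an adjacency map from each even-round vertex to the previous-round vertices it references, and an anchor commits once at least f + 1 of those vertices point to it. The vertex labels below are made up for the example.

```python
F = 1

# Hypothetical local DAG: round-4 vertex -> round-3 vertices it references (strong edges)
edges = {
    ("r4", "v1"): [("r3", "A2"), ("r3", "v2"), ("r3", "v3")],
    ("r4", "v2"): [("r3", "A2"), ("r3", "v2"), ("r3", "v4")],
    ("r4", "v3"): [("r3", "A2"), ("r3", "v3"), ("r3", "v4")],
}

def votes_for(anchor, next_round_vertices, edges):
    """Count how many next-round vertices have a strong edge to the anchor."""
    return sum(1 for v in next_round_vertices if anchor in edges.get(v, []))

anchor = ("r3", "A2")
n_votes = votes_for(anchor, list(edges), edges)
print(n_votes, n_votes >= F + 1)  # 3 True: A2 gathers 3 votes, above the f + 1 = 2 threshold
```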

Figure 7 illustrates how vertices in even rounds vote for the anchors in the previous odd round (source: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/)

However, A1 should not simply be ignored, considering the asynchronous nature of the network: different validators may possess different DAGs in their local views. As illustrated in Figure 8, Validator 2, which is lagging behind Validator 1, commits anchor A1 even though Validator 1 does not. The inconsistency becomes apparent when Validator 2 catches up with Validator 1: Validator 2 will have committed A1 followed by A2, while Validator 1 committed only A2.

Figure 8: The different views of the DAG for different validators (source: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/)

To ensure consensus among validators on the total order of vertices, even with different DAG views, Bullshark applies a simple rule: if a committed future anchor has a path to an old anchor (i.e., the old anchor is in its causal history), then the old anchor must also be committed and ordered before the future anchor. Otherwise, it is considered safe to skip the old anchor. Validators recursively apply this rule until they reach the previously committed anchor. Once anchors are committed, validators can establish a total order over all committed anchors and their causal histories using a deterministic rule, such as topological sorting or ordering by gas price. In Figure 9, the blocks in anchor A3's causal history are depicted as white boxes with green corners.

Figure 9 shows how Bullshark totally orders anchors and their causal histories (source: https://www.youtube.com/watch?v=NGOXVSFzYdI)

Figure 9 provides a visual representation of the commitment process and total ordering. In round 6, the validator commits anchor A3 because it receives three votes, exceeding the threshold of f + 1 = 2 votes. The validator then recursively looks back through the DAG to commit and order older anchors. Since there is no path from A3 to A2, it is considered safe to skip A2. When the validator reaches A1, it immediately commits A1 and orders it before A3, since there exists a path from A3 to A1. The causal history of A3, together with anchors A1 and A3, is then ordered, as shown at the bottom of Figure 9. In this example the look-back reaches round 1; in general, a validator keeps looking back only until it locates the previously committed anchor, since all of that anchor's history has already been ordered.
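A minimal sketch of this recursive look-back, assuming we already have a reachability test over the local DAG; the helper names and the hard-coded paths mirroring the Figure 9 example are illustrative assumptions.

```python
def order_anchors(new_anchor, older_anchors_desc, last_committed, has_path):
    """Walk backwards from a newly committed anchor over older, not-yet-committed
    anchors (newest first). Commit an older anchor if the newer committed anchor
    can reach it; otherwise skip it. Stop at the previously committed anchor.
    Returns anchors in the order they should be executed (oldest first)."""
    to_commit = [new_anchor]
    latest = new_anchor
    for anchor in older_anchors_desc:       # e.g. [A2, A1] when A3 was just committed
        if anchor == last_committed:
            break                           # everything older is already ordered
        if has_path(latest, anchor):
            to_commit.append(anchor)        # must be ordered before `latest`
            latest = anchor
        # else: no path, safe to skip this anchor
    return list(reversed(to_commit))

# Figure 9 example: A3 is committed, there is a path A3 -> A1 but none from A3 to A2
reachable = {("A3", "A1")}
has_path = lambda a, b: (a, b) in reachable
print(order_anchors("A3", ["A2", "A1"], last_committed=None, has_path=has_path))
# ['A1', 'A3'] -- A2 is skipped, A1 is ordered before A3
```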

It's worth noting that it is safe to commit anchors that are reachable by a path from committed future anchors; this follows from quorum intersection. The proof can be found in [2] and [3].

3.1 Garbage Collection and Fairness

In this section, we delve into how Bullshark uses garbage collection to address memory issues and achieve fairness within DAG-based consensus.

Fairness is a desired property in blockchain systems, ensuring that all nodes have the right to contribute to the consensus protocol, even if they operate at a slow pace. The challenge arises when we want to guarantee that every node's vertices are eventually added to the other nodes' DAGs. Due to the asynchronous nature of the network, it is uncertain how long this will take, so every node would have to keep arbitrarily old vertices and certificates available for others to fetch, allowing slower nodes to catch up. As the DAG grows indefinitely, such a strong fairness property leads to unbounded memory in the DAG construction. Hence, we need a way to clean up old, unused history, and this is precisely what garbage collection does.

Garbage collection is a mechanism for recovering unused memory and making it available for future use. In simpler terms, it removes unused historical data from the DAG, freeing up memory resources and keeping memory usage bounded. However, a fundamental contradiction arises between garbage collection and fairness: once old DAG data is garbage collected, we can no longer guarantee that arbitrary historical information will be accessible to slower nodes. Achieving both garbage collection and fairness simultaneously therefore seems impossible.

Bullshark addresses this challenge by relaxing the fairness requirement so that fairness is guaranteed only during the synchronous period, which is why the partially synchronous version of Bullshark is preferred over the asynchronous one. The partially synchronous model consists of two periods: an asynchronous period and a synchronous period. During the asynchronous period, messages may experience delays caused by malicious actors or by the inherent unreliability of the internet. Eventually, however, these attacks or latency issues must stop, allowing the system to recover and resume normal operation. The point at which the system recovers is known as GST (Global Stabilization Time), after which the system operates synchronously. Put simply, in a partially synchronous model, the system behaves asynchronously until GST and synchronously afterwards.

The garbage collection process itself is simple. Validators add timestamps to their vertices when broadcasting them. When an anchor is committed, it is assigned a timestamp computed as the median of the timestamps of its parents (the vertices that the anchor's strong edges point to). As the anchor's history is being ordered, the validator looks back through that history and computes a timestamp for each round as the median of the timestamps of the vertices in that round. Once it reaches a round, call it GCround, whose timestamp is smaller than the anchor's timestamp by more than a predefined time window, GCround and all rounds below it are garbage collected.
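A minimal sketch of this timestamp rule, assuming every vertex already carries a timestamp; the numbers are hypothetical and chosen to match the Figure 10 example discussed below (anchor timestamp 5, time window 2).

```python
from statistics import median

def gc_round(anchor_parent_ts, round_ts, gc_window):
    """Return the highest round that can be garbage collected (along with all
    rounds below it), or None if nothing is old enough.

    anchor_parent_ts: timestamps of the anchor's parents (strong edges);
                      the anchor's timestamp is their median.
    round_ts:         {round_number: [timestamps of that round's vertices]}
                      for the rounds in the anchor's history.
    gc_window:        the predefined time window.
    """
    anchor_ts = median(anchor_parent_ts)
    for r in sorted(round_ts, reverse=True):        # walk back through the history
        if anchor_ts - median(round_ts[r]) > gc_window:
            return r                                # r and all rounds below are collected
    return None

# Hypothetical numbers: the anchor's timestamp is 5 and the window is 2, so
# rounds whose median timestamp falls below 3 are garbage collected.
print(gc_round(anchor_parent_ts=[4, 5, 6], round_ts={1: [1, 2, 2], 2: [3, 3, 4]}, gc_window=2))
# 1 -- round 1 (and below) is garbage collected, round 2 is kept
```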

Figure 10: The process of how Bullshark garbage collects the old history and achieves fairness (source: https://decentralizedthoughts.github.io/2022-06-28-DAG-meets-BFT/)

Figure 10 provides a concrete example of how Bullshark's garbage collection works. In this illustration, the predefined time window is 2, and the timestamp of anchor A (colored green) is calculated to be 5. Consequently, the GCround timestamp threshold is 5 - 2 = 3. Thus, any vertex with a timestamp between 3 and 5 can still be included in the DAG, even if it arrives after its peers. As demonstrated in Figure 10, the vertex from validator 4, marked with a timestamp of 4, can be added to the DAG despite arriving later than the other vertices, while the rounds whose timestamps are below 3 are garbage collected.

However, adding validator 4's vertex alone does not guarantee fairness, because there is no path from the anchor to this vertex; as a result, the vertex would be ignored during total ordering. To solve this fairness issue, the concept of the "weak link" or "weak edge" is introduced. Unlike the edges discussed so far, known as strong edges, which link a vertex in round r to vertices in the immediately previous round (round r-1), weak edges connect a vertex in round r to vertices in rounds below r-1. They are not counted as votes in the consensus logic; instead, their purpose is to create paths to the vertices of slow validators so that honest validators eventually add them to the DAG and include them in the total order. The combination of garbage collection and weak links thus ensures fairness during the synchronous period while keeping memory usage bounded. A sketch of how such weak edges could be chosen follows.
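To illustrate, here is a small, hypothetical sketch of how a new vertex's references could be split into strong and weak edges; the representation and helper name are assumptions, not Bullshark's actual code.

```python
def link_parents(new_round, candidates, gc_round):
    """Split a new vertex's references into strong and weak edges.

    candidates: {(round, author): vertex} of known vertices not yet referenced.
    Strong edges point to round new_round - 1 and count as votes; weak edges point
    to older rounds (above the garbage-collected ones) purely to keep slow
    validators' vertices reachable for total ordering.
    """
    strong, weak = [], []
    for (rnd, author) in candidates:
        if rnd == new_round - 1:
            strong.append((rnd, author))
        elif gc_round < rnd < new_round - 1:
            weak.append((rnd, author))      # late vertex from a slow validator
    return strong, weak

# A round-5 vertex links to round-4 vertices (strong edges) and to validator 4's
# late round-2 vertex (weak edge); round 1 and below are assumed garbage collected.
known = {(4, 1): "v", (4, 2): "v", (4, 3): "v", (2, 4): "late"}
print(link_parents(new_round=5, candidates=known, gc_round=1))
# ([(4, 1), (4, 2), (4, 3)], [(2, 4)])
```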

4. Conclusion

Throughout this article, we delved into Narwhal and Bullshark, which are the fundamental ideas used to handle shared objects in Sui's consensus protocol, named Sui Lutris.

The general idea of Narwhal is to decouple mempool functions from the consensus mechanism. By using Narwhal as a mempool, workers can disseminate batches of transactions at network speed. At the same time, workers forward the digests of those batches to their primary, which uses them to form a round-based DAG. With this design, Narwhal enables horizontal scalability by adding more workers to each validator.

Following Narwhal, we discussed a consensus protocol called Bullshark. Once validators have a DAG constructed with Narwhal, the consensus logic can be built on top of it. Bullshark handles the ordering logic in the consensus layer without sending extra messages between validators. With this clean separation, data can be disseminated at network speed regardless of the ordering speed. Notably, Bullshark provides solutions for garbage collection and fairness during the synchronous period, addressing memory and fairness problems that many DAG-based consensus protocols have faced.

References

[1] Narwhal and Tusk: A DAG-based Mempool and Efficient BFT Consensus

[2] DAG Meets BFT - The Next Generation of BFT Consensus

[3] Bullshark: DAG BFT Protocols Made Practical

[4] Bullshark: The Partially Synchronous Version

[5] All You Need is DAG (DAG Rider)

[6] On BFT Consensus Evolution: From Monolithic to DAG

[7] BFT on a DAG

[8] (Youtube Video) DAG Meets BFT: The Next Generation of BFT Consensus with Alexander Spiegelman | a16z crypto research

[9] (Youtube Video) ConsensusDays 21 / S1.4 / Narwhal and Tusk - Alberto Sonnino
