gmonad: a deep dive into parallel execution

March 18th, 2024

Everyone keeps saying gmonad-gmonad on CT, but what does it really mean? Who understands what Monad is really doing? What is parallel execution, and how does Monad utilizes it? Let’s deep dive here:

Monad is an EVM compatible L1, and expected to have a throughput of 10,000 TPS. So, you’re thinking, just another L1, right? Not really. Monad uses multiple different foundational pillars to be this efficient. We will go through each one of them in this blog.

MonadBFT: this is the consensus mechanism used by Monad for achieving consensus (duh) about the txn ordering under partially synchronous conditions in the presence of Byzantine actors. What the hell is BFT? In very short terms, Byzantine Fault Tolerance (BFT) uses a consensus algorithm that prevents double-spending, censorship and DOS attacks. It requires a certain percent of nodes in the network to agree on a txn before that txn is added to the blockchain. MonadBFT is a derivative of HotStuff, which is a leader-based BFT replication protocol. HotStuff basically enables a correct leader to drive the protocol consensus at the pace of actual network delay (called responsiveness). It is very well explained in the 2019 research paper which you can find here: https://arxiv.org/pdf/1803.05069.pdf

MonadBFT operates in a two-phase pipeline manner, featuring optimistic responsiveness. It typically has a linear communication cost, but this can increase to quadratic in scenarios where a timeout occurs. Similar to other BFT protocols, it involves a process where, in each phase, a leader dispatches a signed message to voters. These voters then forward their signed replies to the subsequent leader. The use of pipelining permits the inclusion of the quorum certificate (QC) or timeout certificate (TC) for a given block `k` within the proposal for the next block k+1. I will not bore you with how the consensus protocol actually work, but in general MonadBFT is a “pipelined consensus mechanism” which goes in rounds. The general idea of going into rounds is basically that for every 3 "bad" nodes, you need at least 1 "good" node. This ensures that the majority of the nodes are reliable. A block is only considered final and part of the chain if the following round's block also gets agreed upon. This makes sure that even if a leader was bad and only sent the proposal to some of the nodes, the block doesn't become final until the next block is also agreed upon by everyone. If a leader acts badly, either by proposing something that doesn't follow the rules or by not proposing to enough nodes, enough nodes will notice and trigger a timeout. This leads to the creation of a TC (Timeout Certificate), alerting the next leader that there was an issue. There is another term called QC, which is basically a Quorum Certificate and is derived by aggregating (via threshold signatures) YES votes from 2f+1validators.
Shared Mempool: Validators have a mempool where user transactions await inclusion in a final block. To efficiently distribute these transactions to other validators, they're fragmented via erasure coding and shared through a broadcast tree, streamlining the process. But, MonadBFT efficiently reaches consensus but transmitting large data blocks strains validator bandwidth. For example, a 5 MB block from 10,000 transactions requires significant bandwidth. To reduce this, block proposals use transaction hashes (32 bytes each) instead of full txns, significantly cutting data transmission size. Validators must have the transactions in their mempool for verification during voting and block finalization. Like pending transaction sharing, submitted transactions are distributed to other validators through erasure coding and a broadcast tree, enhancing efficiency.
Deferred Execution: The main idea here is that Monad has separated out the execution and consensus. This essentially means that the leader suggests a sequence before understanding the final outcome or state, and the nodes that check for correctness cast their votes on whether the block is valid or not, even without knowing details like whether all transactions in the block will be completed successfully without any need to undo them. It’s done by something that is currently limited. When transactions are arranged in a specific order, the final outcome or state becomes predetermined. Discovering the actual state requires executing these transactions, but their outcomes are fixed once the order is set. Using this as a leverage, Monad eliminates the need for nodes to execute transactions before reaching an agreement. The consensus among nodes focuses solely on establishing the correct sequence of transactions. Each node then processes the transactions in block N on its own while simultaneously working on achieving consensus for block N+1.

This approach allows for the allocation of the entire block time to the gas budget, as the only requirement is that transaction execution keeps pace with the consensus process. Moreover, this method is more adaptable to differences in processing times since the execution needs only to match the consensus rhythm on average. At the same time, in MonadBFT, finality is single-slot (1 second), and execution outcome will generally lag by less than 1 second for those using a full node.
Parallel Execution: In Monad, txns are executed in parallel using optimistic execution, which initiates the execution of subsequent transactions before earlier ones in the block are fully completed. This approach can occasionally lead to incorrect executions, although such instances are rare. To address potential discrepancies, Monad carefully monitors the inputs and outputs during transactions. For example, if the outputs from txns 1 do not match the expected inputs for txns 2, Monad re-executes with the correct data, ensuring accuracy by sequentially merging the updated states for each txn. To optimize this process, scheduling plays a crucial role. A straightforward application of optimistic execution might attempt to execute the next transaction as soon as processing resources become available. However, this could lead to inefficiencies, especially when transactions within a block depend on the outcomes of previous transactions, creating long "chains" of dependencies. Executing these dependent transactions in parallel could cause many failures. To circumvent this, Monad employs a static code analyzer to identify these dependencies in advance, strategically scheduling transactions only after their prerequisites are met. This proactive approach minimizes wasted effort by ensuring transactions are executed in an order that respects their inter-dependencies. In scenarios where predicting dependencies is challenging, Monad defaults to a more basic execution strategy, striking a balance between efficiency and practicality in transaction processing.
MonadDb: It is a “custom database” for storing blockchain state. According to Monad, Eth’s MPT data structure approach is sub-optimal because one data structure is embedded into another data structure of a different type. However, Monad implements a Patricia Trie data structure natively, both on-disk and in-memory. What does this mean now?

Monad processes several transactions at the same time. When a transaction requires fetching data from the storage, it's inefficient to halt and wait for this data retrieval to finish. Instead, the system should request the data and then proceed to handle another transaction while the data is being fetched. This approach necessitates the use of asynchronous input/output (async I/O) capabilities for database operations. Traditional key-value databases are not fully equipped for efficient async i/o, although there are ongoing efforts to enhance this functionality. MonadDb, on the other hand, effectively leverages the most recent developments in async I/O support provided by the operating system. This advanced utilization means there's no need to create numerous operating system threads just to manage waiting i/o tasks, allowing for smoother and more efficient asynchronous operations. Let’s understand what is async I/O real quick. Async I/O is basically input/output processing that allows CPUs to continue executing concurrently while communication is in progress. The reason why CPU is used because a CPU can initiate these operations faster than SSDs or Networks if it knows that a certain instructions don’t depend on the result of I/O operation. Here is a quick comparison between different devices provided by the Monad’s documentation:

Wrapping all of it, and thinking about what Monad is building, it really looks that they are looking to get a hyper-scaled EVM compatible L1. Using techniques like pipelining, async I/O, MonadBFT etc, I personally think that they can push past limits and build a great L1. A bigger thing about Monad is the community of Nads and Purple Pepes. They are everywhere, they are excited about Monad launching and they are constantly in the discord server talking, making friends, and making dope memes and driving the content. While we still don’t know how everything will work out when mainnet launch, I am excited about the future of Monad and its team.

That’s all for today. If you like this, feel free to follow my blog and like this post! Thanks my fellow Nads!

Subscribe to naruto11

Receive the latest updates directly to your inbox.

Mint this entry as an NFT to add it to your collection.

Verification

This entry has been permanently stored onchain and signed by its creator.

Arweave Transaction

Uqxp7jemOldpewb…e5llYMD4nkeupQA

Author Address

0x8480D20583A3138…8F17Bf3C01E73b7

Content Digest

8aLc6gopOJXRDmY…Pg_1QeB-xkUcGR8