Tendermint Consensus: The Center of the Cosmos-Chain Universes

October 26th, 2023

Reaching consensus on the order of transactions within a distributed system is one of the fundamental problems that blockchains are trying to solve. Tendermint is one of the novel solutions used in many blockchains as a consensus engine, especially in the Cosmos ecosystem. Tendermint is a Byzantine Fault Tolerant (BFT) consensus protocol used to order transactions in a distributed network to ensure that all honest nodes agree on the same value, even in the presence of adversarial nodes.

Typically, each blockchain uses a different consensus protocol that suits its purpose. Today, our focus lies on understanding the mechanics of Tendermint, which are used intensively in the Cosmos ecosystem and are easy to understand compared with the other protocols. In this article, you will learn how nodes in the network interact to achieve consensus in Tendermint. We hope this article will give all readers an intuition on how blockchains work and other consensus protocols, which will be released later in the following articles.

Introduction

A Byzantine Fault Tolerant (BFT) consensus is a protocol that can withstand malicious behavior from nodes within a network. These malicious nodes are called 'Byzantine' due to their similarity with the classical Byzantine Generals' problem. Tendermint is designed to work under partially synchronous communication, which is BFT and capable of tolerating up to 33% of Byzantine nodes (f<n/3, where n represents the total number of nodes in the network and f denotes the maximum number of Byzantine nodes). In simple terms, consider a scenario where our network gets attacked or unexpectedly experiences an internet or power outage, causing some nodes to be unable to cast any votes or become unresponsive, which could be considered Byzantine behavior. The partially synchronous model expects that such outrages and attacks will eventually stop, allowing the network to quickly recover to normal operation after an attack.

Overview of the protocol

Before delving into how Tendermint works, let's first introduce the key components that are important in the Tendermint protocol so that we can go through the protocol without losing continuity:

Block: A block stores a batch of transactions. It also contains other metadata, such as previous block height, Merkle root of transactions, etc., but let’s ignore it here.
Block height: Block height is a unique, incrementing index specifying each block's position within a blockchain.
Nodes (validator nodes): Nodes are responsible for maintaining a full copy of the historical record of blocks and transactions. They actively engage in voting for block validity and reaching a consensus on a block at each height. Each node can be identified by its public key.
Round: Nodes in the Tendermint protocol might be required to participate in multiple rounds before agreeing on a block at a specific height.
Proposer: A proposer, also known as a leader node, is responsible for proposing a block to other nodes. At each height, validators take turns proposing new blocks in each round. As a result, there will be only one unique proposer, and all nodes will know who the current leader is.

At a high level, Tendermint uses a round-based voting mechanism to confirm a block at each height. Nodes in the Tendermint protocol exchange three types of messages before committing on a block, as follows:

PROPOSAL: A proposal for a block that a leader or proposer is proposing at a given height and round. A PROPOSAL message consists of the proposed block (B), its associated height (h), and a round (r), represented as PROPOSAL(B,h,r).
PREVOTE: PREVOTE is a voting message for the first stage of voting. It consists of the identity of the voter (i), the block it is voting for (B), the corresponding height (h), and the round (r). We denote this message as PREVOTE(i,B,h,r). A node can proceed to the next voting phase only if it receives PREVOTE messages from at least two-thirds of all nodes.
PRECOMMIT: PRECOMMIT is a voting message used in the second voting stage. Similar to the PREVOTE, it includes the node identity (i), the voted block (B), height (h), and round (r), and it is represented as PRECOMMIT(i,B,h,r). A node can commit and store block B locally, then proceed to the next height, h+1, only if it receives at least two-thirds of PRECOMMIT messages for block B from different nodes in the network.

Based on the previously described message exchanges, the protocol can be divided into three stages: propose, prevote, and precommit. Additionally, the protocol relies on three timeouts:

proposeTimeout: This timer activates when a node starts a new round. If the node fails to send a PREVOTE message within the specified time interval (dT), the proposeTimeout is triggered.
prevoteTimeout:: Once a node sends its initial PREVOTE message, the prevoteTimeout timer starts. This timer is triggered if, within the designated time frame (dT), the node doesn't receive supporting PREVOTE messages for a specific block from at least two-thirds of the network nodes.
precommitTimeout: Similar to prevoteTimeout, precommitTimeout is used with PRECOMMIT messages.

These timeouts are needed for a node to move to the next phase in case the node doesn't take any action in a stage.

ref: https://atrium.lib.uoguelph.ca/server/api/core/bitstreams/0816af2c-5fd4-4d99-86d6-ced4eef2fb52/content

Round 0

Assuming we are at the start of height h, each node begins with initial variables as follows:

lockedBlock=nil (the block that gets two-thirds of PREVOTE from other nodes, starting with nil)
lockedRound=-1 (round that associates with the lockedBlock block)
h=h, height
r=0 (round)
s=propose (stage)

The local state of a node can be specified by the tuple (lockedBlock,lockedRound,h,r,s ). In this round, a proposer p proposes a block Bp and broadcasts PROPOSAL(Bp,h,0) with its signature to other non-leader nodes. Upon receiving PROPOSAL(Bp,h,0), a non-leader node checks the validity of transactions in block Bp and confirms that the PROPOSAL was indeed signed by the proposal p. If the check passes, the non-leader node i updates its local variable to (nil,-1,h,0,prevote) and broadcasts PREVOTE(i,Bp,h, 0) to other nodes. If node i doesn’t receive any correct PROPOSAL message within the time interval dT, proposeTimeout is triggered. In such a case, the node sends a special message nil in the PREVOTE message as PREVOTE(i,nil, h, 0) and updates its local stage variable to prevote. This results in the node's local variables becoming (nil -1, h, 0,prevote).

Upon receiving PREVOTE messages from two-thirds or more of the other nodes in support of block Bp, a node updates its local variables to (Bp,0, h,0,precommit) and advances to the second stage. Subsequently, it broadcasts PRECOMMIT(i,Bp,h,0) to signify its support for the block. However, if the node fails to receive the required PRECOMMIT messages for block Bp before the precommitTimeout expires, it proceeds to the precommit stage. At this point, it sends a PRECOMMIT message with a nil value, represented as PRECOMMIT(i, nil, h, 0). Depending on its existing local variables, the node's local state then becomes either (nil,-1, h,0,precommit) or (Bp,0, h,0,precommit). However, note that in round r, a node must pre-commit to the same block for which it cast a pre-vote. For instance, if it casts a pre-vote for nil, it must also pre-commit to nil in that round.

To achieve consensus and commit to the current block Bp, a node waits until it receives at least two-thirds of the PRECOMMIT messages from other nodes supporting block Bp. Once a node decides on block Bp, it proceeds to the next height h+1 with round 0 and resets its local variables. However, if a node fails to reach a decision before the timeout window dT starts the countdown after its first PRECOMMIT message, the precommitTimeout must kick in and cause the node to move to the next round, r=1, resetting its stage to propose.

Note that a node skips to the precommit phase in case it receives two-thirds of PREVOTE for nil and moves to the next round if it receives two-thirds of PRECOMMIT for nil.

The above flow shows how nodes in Tendermint work in the first round (r=0). However, as you can see in the last stage, a node can move to a new round if it doesn’t hear any committed blocks in that round. Thus, let’s explore a bit further how the other round works in general.

Round r

During the propose phase, a proposal checks its local state to identify the most recent block that has successfully passed the prevote stage (lockedBlock), denoted as block B'. This block B' is obtained from the proposer's state (lockedBlock=B',lockedRound=r',h,r,propose ), where r' is the most recent round and less than r. If such a block B' exists, the proposer uses it with the proposer's PROPOSAL message. Alternatively, if there is no such block B' (i.e., lockedBlock=nil), the proposer is free to propose a new block Bp' and broadcast the PROPOSAL message as PROPOSAL(Bp',h,r).

When a node receives a PROPOSAL message for block Bp' during round r, it accepts the proposal under specific conditions:

The node accepts the proposal if its local lockedBlock variable is either nil or equal to Bp' (i.e., B = Bp'). Here, "accept" means that the node broadcasts the PREVOTE message with the corresponding proposed block and moves to the precommit phase.
In cases where the node's lockedBlock variable does not match the proposed block Bp' for the current round, the node considers the proposal under the condition that it receives at least two-thirds of PREVOTE messages in favor of the proposed block. Otherwise, the node waits for the proposeTimeout to be triggered and broadcasts PREVOTE with a nil block value.

This locking mechanism plays a crucial role in ensuring safety within the consensus process; see [1] Section 3.2.3 for more explanation.

Similar to the description for round 0, in round r, nodes follow a structured process:

When a node receives two-thirds of PREVOTE messages supporting the proposed block PROPOSAL(Bp',h,r), it adopts this proposal message as its local variable, broadcasts PRECOMMIT messages, and progresses to the precommit phase. If a node doesn't receive the required two-thirds of PREVOTE messages for block Bp' within the time window, the prevoteTimeout is triggered.
In the precommit stage, a node faces two possibilities: it either commits to block Bp' if it receives a sufficient number of PRECOMMIT messages supporting this block (e.g., from at least two-thirds of other nodes), or it advances to the next round in the event that the precommitTimeout is triggered.

Tendermint's design guarantees that, as long as the number of Byzantine nodes remains below one-third of the network, all honest nodes will eventually decide on a correct proposed block and proceed to the new height. This robustness ensures the integrity of the consensus process, even in the presence of malicious nodes.

Why two stages of voting?

The two stages of voting are required to ensure the consistency of the protocol. A single stage of voting would leave the system vulnerable to potential manipulation by malicious nodes. They could partition the network, causing one set of honest nodes v to commit to a value in round r, while another set w commits to a different value in a subsequent round.

The two stages of voting prevent this issue by introducing a soft commitment mechanism in the first stage. When an honest node receives two-thirds of the PREVOTE messages supporting a particular block, it is locked to that block but not yet committed to it. This soft commitment signals nodes across the network to prepare for the commitment phase.

Subsequently, the node opens for the second confirmation before finalizing its commitment to the block. However, nodes can start anew on a new block proposal if they get sufficient evidence (two-thirds of PREVOTE messages for a new block with a more recent round) indicating that they lag behind other nodes and need to catch up.

Let's explore how the two-stage voting mechanism enhances safety. Consider a one-stage voting protocol involving four nodes: A, B, C, and D, with B being Byzantine and A the proposer.

One-Stage Voting Scenario:

Node A proposes block "X.".
Nodes C and D immediately vote for block "X" upon receiving A's proposal.
Byzantine Node B manipulates the network, allowing only A to receive votes from C and D while isolating C and D from each other's votes.
Node A commits to block “X” after receiving a two-thirds majority.
Nodes B, C, and D timeout and start a new round.
Node B becomes the new proposer and suggests block "Y.".
Without the locked mechanism from the pre-vote phase, nodes C and D freely vote for block "Y", leading to a split in consensus: A commits to "X", while C and D commit to "Y.".

Two-Stage Voting Scenario:

Case 1: Conflict during the Pre-Vote Phase:

Node A proposes block "X.".
Nodes C and D pre-vote for block “X” immediately after receiving the proposal block.
Byzantine Node B disrupts the network similar to the one-stage voting case, isolating C and D from each other's pre-votes.
Node A locks onto block "X" but has not yet committed to the block.
After a timeout, nodes B, C, and D begin a new round.
Node B proposes block "Y.".
Nodes C and D are free to vote on this block because they are not locked on block “X” resulting in B, C, and D pre-voting on block "Y.".
Later, node A learns that the majority of nodes pre-vote for “Y”, resulting in A unlocking from "X", and then locking onto "Y", which prevents the inconsistency like in the one-stage voting scenario.

Case 2: Conflict during Pre-commit Phase:

Similar to the first case, but in the pre-commit phase, for A, C, and D to send pre-commit messages on block “X”, they need to lock on that block; that is, they need to already send their pre-vote message for the block, and everyone already receives each other's message. But since they already lock on block “X”, no matter what B does, it cannot convince C and D to vote for the new block "Y." Two-thirds of the pre-vote messages are required to unlock the locked block, which cannot happen because all honest nodes are already locked on block "X.".

For a more in-depth explanation, [3] is a recommended resource to delve further into this topic.

Bonus

Where the heck does the 33% threshold come from?

Because the 33% threshold has appeared in many blockchain whitepapers and articles, and, of course, in this article. We also mentioned the partially synchronous model, which is a kind of model that introduces this threshold. So let’s sketch a proof here as a bonus for those who don’t know where it comes from. You are free to skip this section.

This proof sketch is inspired by the works cited in [1] and [2]. You might have seen another version of the 33% threshold, which is written as n3f+1, where n is the number of all nodes and f is the number of Byzantine nodes in the network. Hence, we get f<n/3, or 33% of all nodes. We will stick with the ‘n >= 3f+1’ representation, which is quite obvious and easy to illustrate in the following proof.

To allow a consensus protocol consisting of n nodes to continue operating even if there are f Byzantine nodes, an honest node has to take action at one point. Suppose an honest node has waited a long time, but the network has not reached a consensus yet, which can happen because messages from some nodes might go missing or be delayed by malicious nodes. It is intuitive to let the honest node take action once it has heard from n-f nodes (counting itself) at some point in time. Note that the f nodes that stay silent might not necessarily be Byzantine since their messages. The worst scenario that could happen while an honest node is able to avoid getting tricked by the Byzantine node is that f number of Byzantine nodes are able to delay f of honest nodes messages, so the remaining honest nodes receive n-2f messages from the honest nodes and f messages from the Byzantine. Now, the best course of action for an honest node is to make a decision based on the majority (50%) of the vote from n-2f honest nodes and f malicious nodes without being deceived by f Byzantine nodes is

50% of the total number of the nodes that cast the votes > number of Byzantine nodes

(0.5)(remaining honest node + Byzantine nodes)>f

(0.5)(n-2f+f)>f

n>3f

or n3f+1, and this is the 33% threshold. Please note that this is just a sketch, not a rigorous proof. [1] and [2] are the recommended resources for the proof.

References:

[1] https://atrium.lib.uoguelph.ca/items/5459099e-67aa-4a23-83ae-d3471d8d8336

[2] https://github.com/timroughgarden/fob21/blob/main/l/l6.pdf

[3] https://github.com/timroughgarden/fob21/blob/main/l/l7.pdf

[4] https://arxiv.org/pdf/1807.04938.pdf

Subscribe to ContributionDAO

Receive the latest updates directly to your inbox.

Mint this entry as an NFT to add it to your collection.

Verification

This entry has been permanently stored onchain and signed by its creator.

Arweave Transaction

ffRnWb-jQilohz9…KaKZuoOOnT7eFuM

Author Address

0x0b0b191fe5f328a…125989271BBB077

Content Digest

-TLP4crabITcCut…fA5UHb1gqMGnJ4g