Special thanks to the Kasar Labs team for discussion and review.
Blockchains represent a significant advancement in decentralised, permissionless databases, built to be trustless and scalable. Ethereum, as a prime example is a global database operating with over six thousand nodes that constantly read and write to the blockchain. A distributed system of such proportions demands a state architecture that meets three key criteria: it must be immutably verifiable, enable lightning-fast lookups, and scale to handle tens of thousands of writes.
Ethereum achieves this by maintaining a "world state" - a global state that is constantly updated by executing transactions. This world state stores critical information about Ethereum accounts, such as balances and smart contract states. Each interaction, whether it's a transfer, NFT mint, or memecoin snipe involves multiple reads and writes to this state. During these interactions, the system accesses the world state and executes state transformation functions to update the state accordingly.
In this article, we'll discuss the evolution of Ethereum's state architecture, its future, and how we've reimagined state at Madara. We'll explore Madara's Bonsai architecture towards the end, but we recommend reading the preceding sections to establish a solid foundation.
The blockchain “verifiability problem” arises from the need for an architecture that can support thousands of reads and writes, while remaining verifiable even on compute-light machines (light clients). While centralised databases such as SQL are efficient, scalable and battle-tested, they lack the trustless nature that blockchains require.
The primary challenge lies in achieving a balance between performance and verifiability. Blockchains require nodes to validate and store each transaction. Consequently, blockchain state architecture must be designed to be lightweight and easy to verify.
As the blockchain state expands, block verification requires more resources, which state architectures like MPTs address. Here we discuss state architectures implemented by Ethereum and Starknet
In the following examples, we will use the words "bin," "binary," "tree," "trie," "move," and "movie" to illustrate the concept of tries. Additionally, we will present the key-value pairs (bin,10), (binary,8), (move,30), (movie,55), (tree,20), (trie,48) to demonstrate Merkle trees, Merkle Patricia trees, and Bonsai trees.
A trie is an ordered tree data structure used to represent a set of strings over an alphabet set. A trie's architecture allows for the efficient storage of words with common prefixes. This design allows optimised checks on the presence of a word in a dictionary.
Nodes: Each node in a trie represents a prefix of the strings stored. A trie node can have multiple children, each corresponding to a different character.
Character Mapping: Each node contains a mapping from characters to child nodes.
In the adjoining diagram, there are two types of nodes:
Leaf Nodes (Green Nodes):
Signify the end of a branch in a Trie
Have a leaf flag to indicate whether the node is a complete word or a prefix.
“tree” and “trie” are examples of leaf nodes that signal complete words in the dictionary.
Intermediate Nodes (Yellow Nodes):
Have a character mapping to one or more child nodes
“move” represents an intermediate node, a node that maps two child nodes to its existing prefix.
“bin” behaves as a terminal node, representing a full word that exists in the dictionary while also having child nodes of its own.
A Trie’s unique design enables straightforward reads of leaf nodes by encoding the string as the path to the leaf node that stores the string. Hence the time complexity for insertion and search is proportional to the length of the string (O(m)), where m is the length of the string. This is because each operation involves traversing ALL the characters.
Similarly, tries demand a storage complexity of O(n), where n is the total size of all strings stored. However, the height of the trie depends only on the length of the longest string.
A trie’s structure allows efficient retrieval of leaf values, as the string is sufficient to determine its position within the Trie. As the Trie expands, this encoding allows for optimised lookup times; however a trie’s naive limitations can also be identified.
For sparse key sets, a trie may run unnecessarily deep, with many nodes having only one child. Long 'single-child' chains increase read and write times due to the need to traverse the entire tree depth.
PATRICIA (Practical Algorithm To Retrieve Information Coded In Alphanumeric) Tries are a specialised type of trie designed to optimise space usage by collapsing chains of nodes with only one child.
Nodes and Edges: Each node in a PATRICIA Trie represents a common prefix of the strings stored in the tree. Unlike a standard trie, where each edge corresponds to a single character, edges in a PATRICIA tree can represent multiple characters, leading to a more compact structure.
Merging Nodes: In a PATRICIA tree, any node that has only one child is merged with its parent. This merging reduces the number of nodes and the overall height of the tree, making it more space efficient.
A Patricia Trie’s optimisations result in improvements in the storage and compute complexity of lookups and insertions. While the worst case time complexity remains proportional to the length of the string (O(m)), due to the collapsed chains, the actual time can be less than in standard tries.
Similarly, due to a Patricia tries space-efficient design, the number of nodes required to represent a (sparse) dictionary reduce.
Using Patricia tries, we can create a data structure that quickly locates a node's position in any arbitrary tree based on its key. This allows for efficient lookups even in large databases.
With the implementation of a Patricia trie, we address the challenge of rapid reads/writes by enabling access to data storage slots through knowledge of the corresponding keys. However, in trustless environments, it is essential to ensure that the retrieved data is accurate. While Patricia tries can facilitate data retrieval, they require two separate read operations to achieve verification:
Value Retrieval: The first operation retrieves the value associated with a given key from the trie.
Value Validation: The second operation involves the validator traversing the trie again to validate the correctness of the retrieved value.
To validate without a second read, we can use a tree-like data structure that stores hashed values instead. This approach ensures that any change to the leaf node’s value automatically triggers a rehash across the tree, updating the leaf node and all its parent nodes up to the root.
The validation problem is reduced to a simple cryptographic check: if the provided leaf hash can be used to reconstruct the root hash through the Merkle proof, we can be certain that the leaf value is untampered.
This data structure is called a Merkle Tree. It is a tree based structure where each leaf node represents a hash of the underlying data, and each non-leaf node is a hash of its child nodes. This recursive hashing mechanism ensures that any data modification alters the hash of the corresponding node, propagating changes up to the root.
From this, we can derive two important properties:
To verify if all leaf nodes are unchanged between a “trusted” state tree and an “untrusted” state tree, we only need to compare the root node hash.
Merkle proofs verify the inclusion of specific data within the tree
To verify the presence of the leaf node "bin", we compute and return the following hashes:
h(binary)
h(h(move)+h(movie))
h(h(tree)+h(trie))
By calculating these hash values, we can traverse through all sibling nodes, allowing us to reconstruct the hash of the root node, to verify whether the leaf node "bin" is included in the tree. Read more on Merkle Proof verification here and here.
Merkle trees excel at proving data inclusion. To demonstrate the inclusion of a leaf node, a user can simply request a Merkle proof that shows a particular leaf is a child of the state root. Consequently,“Verifying Proof of Data Inclusion” requires only O(log N) time.
However, Merkle trees are inefficient at retrieving the "actual value" at the leaf node. This inefficiency arises because the position of a node is independent of it’s key. Hence the only way to fetch the “actual value” of a leaf node is to traverse the entire tree in a brute force fashion.
Read more about how Ethereum uses Merkle Proofs here.
A Merkle Trees property of enabling efficient checks on inclusion is leveraged in the Bitcoin Network. Light clients can use Merkle proofs to verify the status of a transaction. They do this by requesting a proof that demonstrates the inclusion of a transaction in one of the Merkle trees, with the root of that tree being included in the block header of the main chain.
This achieves Bitcoin’s use-case as a peer-to-peer version of electronic cash. However as Ethereum extends to behave as a global computer, Merkle Proofs are limiting. In particular, while Merkle Proofs enable verifying inclusion of transactions, proving the current state would require multiple, expensive reads.
Ethereum's state architecture must meet two crucial requirements:
Rapid Access: The architecture must support fast reads and writes to the state
Verifiability: Ethereum's state must allow inclusion proofs
To achieve this, Ethereum takes the Merkle tree concept one step further with “Merkle Patricia Trees”.
Trees & Tries: A tree in is a hierarchical data structure consisting of nodes connected by edges, where each node has at most one parent and zero or more children, with a single root node at the top.
A trie is a specialised tree-like structure designed to store strings. Each node in a trie represents one or more characters and paths corresponds to prefixes. A Merkle Patricia Tree derives properties from the Patricia Trie but is categorised as a Tree.
Merkle Patricia Trees, are specialised tree structures that combine the benefits of Merkle Trees and Patricia Tries. In particular, with an MPT, Ethereum gains the data integrity guarantees of a Merkle Tree, alongside the optimised reads/writes achieved through Patricia Tree’s key encoding. Due to these design advantages, MPTs today power Ethereum and several other blockchains.
Node Types: In a Hexary Merkle Patricia Tree, nodes are categorised into three types:
Branch Nodes: These nodes can have up to 16 children, representing the 16-character hexadecimal alphabet. Each branch node contains a hash for each of its child nodes.
Leaf Nodes: These nodes indicate the end of a path in the trie and contain the final segment of the path along with the associated value.
Extension Nodes: Similar to Patricia trees, these nodes optimise the trie by compressing paths for nodes with a single child, reducing the number of nodes required.
Hashing: Similar to Merkle trees, each node in the tree has a unique hash value derived from its content. This ensures that any alteration in the data results in a change in the root hash, making tampering evident.
The MPT is visualised as a tree, however it is stored in a key-value database like RocksDB where each node is serialised and stored as a value, with its hash serving as the key. This allows for efficient retrieval and verification of nodes based on their hash values.
In the above example,
The root node (h1) is a branch node that points to two child nodes (h2 and h3); it’s children are stored as a 17 member array with the first 16 nodes being pointers and the 17th member being the (optional) value
Node h3 is an extension node that points to another node (h6). Extension nodes have a 2 item array, with the first member as the encoded path and the second member as the hash to the child node.
Node h9 is a leaf node and has no children. Leaf nodes have a 2 item array as well, similar to extension nodes, with the first member as the encoded path and the second member as the hash to the child node.
Leaf nodes and Extension nodes are differentiated through the use of a bit prefix, however we’ve omitted this detail to maintain simplicity in the diagram.
Key Lookup: When a read request is made, the system begins by converting the key into its corresponding hexadecimal representation.
Traversal for Insertion: The system traverses the Tree using the hexadecimal string. If a suitable path does not exist (e.g., the key is new), the tree structure may need to be modified:
At each branch node, the tree traverses between 0-9 or 0xA-0xF
allowing the trie to split across upto 16 child nodes.
New nodes may be created if the key diverges from existing prefixes.
Extension nodes may be added to compress the tree further.
To read the value encoded by the string "binary" in a Merkle Patricia Tree, follow this process:
Convert "binary" to a Hexadecimal String:
The string "binary" is converted to its hexadecimal representation: 0x62696e617279
.
Descend the Tree:
Begin at the root branch node. At each branch node, descend to the child node linked to the current index of the hexadecimal string. The traversal follows the sequence of digits in the hexadecimal string: start with 6
, then move to 2
and continue this pattern until reaching the last digit 9
.
Handling Extension Nodes:
When you reach an extension node, you can skip a prefix of the string entirely. This is possible because an extension node indicates that the prefix is common among all its child nodes, eliminating the need to traverse to a specific child node.
6
and then 2
we reach an extension node. Extension nodes bypass unnecessary nodes allowing the traversal to skip 0x696e
and reach the next branch node.Reaching the Leaf Node:
Finally, you will arrive at the leaf node. The key section of the path can be compressed into a single key, allowing you to read the value stored at the leaf node.
The MPT inherits it’s compute complexity from the Patricia Tree’s structure with operations such as insertion, search, and verification proportional to the height of the tree, which is logarithmic in the number of leaf nodes (O(log n)). Similarly, the space complexity is also optimised due to the use of Patricia Trie's collapsing mechanism and the efficient hashing of Merkle Trees. This results in a space complexity that is generally better than standard tries (O(n)).
Hexary Merkle Patricia Trees build short trees of height , creating shorter Merkle Paths than in a Binary Tree. This allows faster computation, as disk accesses are a greater bottleneck than hashing; shorter trees means fewer DB reads.
The Ethereum tree was originally hexary because database accesses are expensive and this would reduce the number of database reads. However, with some clever storage optimisations we can achieve hexary-like database accesses even with a binary tree (see https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751#:~:text=And here is,binary tree implementation.)
By optimising database reads to be hexary-like, we can make BMPTs equally DB efficient as Hexary ones. Binary Trees have only 2 children per node, so BMPT Merkle proofs require 15x less node information to be sent (one sibling vs 15 in a hexary tree), while only being 4x longer than a Hexary tree; ~4x savings.
Noting this, EIP 3102 proposed switching Ethereum to a Binary Merkle Patricia Tree. As “hashing was computationally cheaper than disk accesses”, binary tries looked to maximise hashing by growing the length of the tree while reducing its breadth.
Consequently, with a binary tree we achieve low proof sizes and a more efficient representation in the database.
In general, in an N-element tree with each element having k children,
the average length of a branch is roughly 32 * (k-1) * log(N) / log(k)
plus a few percent for overhead.
32 is the length of a hash; the k-1 refers to the fact that a Merkle proof
needs to provide all k-1 sister nodes at each level, and log(N) / log(k)
is an approximation of the number of levels in the tree
For any N, the expression is minimized at k = 2.
Here’s a table of branch lengths for different k values assuming N = 2**24:
While EIP 3102 was shelved in favor of Ethereum’s “Verge”, it was picked up by several rollups including Starknet, which now operates as a Binary MPT.
Similar to a hexary MPT, a binary MPT is also stored in a key-value database like RocksDB where each node is serialized and stored as a value, with its hash serving as the key. This allows for efficient retrieval and verification of nodes based on their hash values.
In the above example,
The root node (h1) is an extension node that points to its child node (h2). Extension nodes have a 2 item array, with the first member as the encoded path and the second member as the hash to the child node.
The node (h2) is a branch node that points to two child nodes (h3 and h4); it’s children are stored as a 3 member array with the first 2 nodes being pointers and the 3rd member being the (optional) value
Node h12 is a leaf node and has no children. Leaf nodes have a 2 item array as well, similar to extension nodes, with the first member as the encoded path and the second member as the hash to the child node.
Leaf nodes and Extension nodes are differentiated through the use of a bit prefix, however we’ve omitted this detail to maintain simplicity in the diagram.
The structure of a Binary MPT is quite similar to a Hexary MPT, with a simple modification where each node has upto two children. This design results in a longer tree—potentially up to four times longer than its hexary counterpart. However, the benefits of this structure are substantial.
In a BMPT, each branch node has only one sibling instead of fifteen, which achieves a reduction in the size of Merkle proofs, leading to a reduction of up to fifteen times in the data required for these proofs.
Additionally, BMPTs carry forward all the optimizations achieved by combining Merkle and Patricia trees.
A Binary Merkle Patricia Tree is architecturally similar to a Hexary Merkle Patricia Tree, hence it’s reads and writes are performed in a similar fashion, where bits are compared instead of hexadecimal digits for path encoding. The time and space complexity are also similar.
While Merkle Patricia Trees and its binary version has significantly improved Ethereum's state management, challenges persist as the blockchain continues to grow. Today, an Ethereum archive node requires over 12 TB of storage, highlighting the need for more efficient state architectures.
Storage Bloat: As the chain grows, so does the state size, leading to increasing storage requirements.
Read Performance: Taller Merkle paths in growing trees result in slower read times, especially for accessing historical state.
Write Overhead: Updates to the state require traversing and updating the entire path from leaf to root, which becomes more time-consuming as the tree grows.
These challenges have led to the development of stopgap measures such as state snapshots and pruning.
Developed by Consensys, Bonsai reimagines state storage and access rather than incrementally improving upon existing tree structures. This new approach is built on two key insights:
For non-archive nodes, the majority of state reads occur within the most recent 500 blocks.
Optimising for recent state access can yield significant performance improvements and storage savings.
The Bonsai Tree stands apart due to it’s distinct feature sets
O(1) read times
90% reduced storage requirement (under optimal conditions where Bonsai retains all trie logs and minimal snapshots to balance storage and read times)
These metrics can be achieved due to Bonsai's implementation of leaf nodes as "directly accessible key-value pairs in a flat database".
A flat database is a data storage structure that organises data in a two-dimensional format, consisting of rows and columns, where each row represents a record and each column represents a field.
A Bonsai tree is a modified Binary Merkle Patricia Tree that stores leaf nodes in a flat database; hence significantly optimising read times. Unlike traditional Merkle Patricia Trees, where reading data requires traversing from the root node to a leaf, Bonsai enables retrieval with a single disk read.
However, updates or writes to the Bonsai tree are performed similarly to those on a BMPT, requiring traversal of the entire tree. This traversal is necessary to update the hashes for each parent node.
It is important to note that Bonsai stores state trie nodes based on their location within the trie. Therefore, an update to a value does not change its flat-DB key, allowing for O(1) reads to be maintained even after state updates.
While most representations of Ethereum's global state trie use a key-value store indexed by each node's hash, Bonsai stores state trie nodes based on their location within the trie. This allows for direct access to an account's data** from storage using its key**
The Bonsai tree is an modified Binary Merkle Patricia Tree that stores leaf nodes in a flat-database to enable O(1) read times
To present a short example of Bonsai’s state reads and updates, we can refer to the diagram above. Note that the diagram is a simplification and omits certain steps of tree construction.
Reading data from a Bonsai Tree involves retrieving information based on a specific key. Here’s how it works:
Key Lookup: The system converts the data queried into its corresponding flat database key.
Data Retrieval: The system queries the flat database using the calculated key. By skipping the trie traversal, Bonsai Trees can achieve O(1) access times.
To read the value encoded with the key bin
in a Bonsai Tree:
Key to Storage Translation: Bonsai stores data internally with the a calculated hash as the flat-database key, in our example this is h(bin)
Direct Storage Lookup: With the hash, Bonsai directly queries the flat-database and retrieves the data.
Writing data involves updating the state of the Bonsai Tree and the flat database:
Key Conversion: The path key is converted into a binary string.
Traversal for Insertion: The system traverses the Bonsai Tree using the binary string. If a suitable path doesn't exist, the tree structure may need modification.
New nodes may be created if the key diverges from existing prefixes.
Extension nodes may be added to compress the tree further.
Updating the Flat Database: Once the appropriate leaf node is identified or created, the system updates it with the new value
An example of writing data to a Bonsai Tree is as follows.
Identify the Storage Slot: The key that is being written to is bin
, this key is converted to it’s binary equivalent 01100010 01101001 01101110
Modify State Data: For example, let's say the value for this key needs to be updated from 10
to 11
.
Update the Leaf Node: The system locates the leaf node associated with the key bin
in the flat database.
A trie traversal is performed, descending down the Bonsai trie according to the calculated binary key
The final branch node in the path has the leaf node as a child. This leaf node is used to update hashes for parent nodes, whereas the actual value for the leaf node is stored and accessed from the flat-database.
Write New State Data: The new state data (e.g., updated value of 11
) is written to the identified leaf node. The trie structure is updated to reflect this new state.
Calculate New Root Hash: After updating the trie, the system recalculates all parent hashes up to the root hash of the trie.
Thus, to write to a Bonsai Tree, the client must update all parent nodes of that leaf. This results in writes taking as long as Binary Merkle Patricia Trees.
Ethereum’s MPT allows for efficient retrieval of older data by maintaining a link between the current state and its historical versions. Each block in the blockchain contains a root hash of the state trie, which reflects the state of all accounts and contracts at that particular block.
As an MPT preserves older blocks, a user only needs to specify the block they want to query to access older data. Since all states are linked, an older block in the MPT can be fetched, and its state tree can be read. The MPT also allows unchanged storage in newer blocks to reference older blocks, reducing data duplication. Therefore, reading storage from block 175224 may require accessing block 175223, as the state tree of the former points to the latter.
In a Bonsai tree, keys to leaf nodes are derived from their location in the trie rather than from the values of child nodes, as is the case with Merkle Patricia Trees. Consequently, when a leaf node value is updated, the previous value cannot be retained, and Bonsai only stores the latest state, automatically pruning older values with each update.
To access older state data, Bonsai utilizes trielogs, which capture modifications made to leaf nodes within a specific block. Each trielog records updates to the flat-database including both previous and current values. These trielogs are stored in the database with the block hash as the key.
Trielogs represent the differences between two blocks’ states by capturing each modification. To read an older state, a node can use the trielogs to selectively revert or apply changes. By traversing the trielogs from the current block back to the desired block, the node can reconstruct the state at that specific point in time.
For example, to read the state at block 175223, a node would start from the current block and apply the trielogs in reverse order until it reaches block 175223.
A simplified example of Trie Logs is attached below.
Block Hash: 0xabcdef1234567890...
Trielog:
- Account:
- Account address: 0x1234567890abcdef...
- Previous Value: { balance: 100, nonce: 0 , ...}
- Next Value: { balance: 150, nonce: 1 , ...}
- Storage Slot:
- Account address: 0x1234567890abcdef...
- Slot key: 0x0111...
- Previous Value: 0x01
- Next Value: 0x02
- Code:
- Account address: 0x1234567890abcdef...
- Previous Value: 0xaabbccddeeff...
- Next Value: 0x112233445566...
Bonsai’s Flat Database:
As Ethereum's state grows, Merkle paths become taller, leading to slower read times as tree traversal grows complex.
With Bonsai’s architecture, O(1) read times can be achieved by directly accessing leaves in the flat database
Single State Representation:
Bonsai’s prioritises providing “real-time” state over functioning as an archive node.
This design enables the Bonsai tree to store only the latest state representation at any given time, resulting in no connections to previous blocks.
Consequently, this leads to a significantly more compact state representation compared to an MPT.
However, a single state representation means, unlike an MPT, Bonsai trees do not store previous block information in it’s state tree
Trielogs:
With a Single State Representation, Bonsai is still able to serve older block data through the use of Trielogs
Trielogs represent the differences between two blocks’ states by capturing each modification made.
They contain the prior and next versions for each modification, allowing for state changes to be reverted or applied selectively.
Through the use of Trielogs, Bonsai trees can efficiently perform chain reorganisations by “rolling back” or “rolling forward” state.
While enabling transitions between states, a recursive application of Trielogs to attain an older state can be both computationally and storage-wise expensive.
In conclusion, blockchain state architectures have evolved significantly to address the challenges of verifiability, scalability, and performance. From simple Merkle trees to more complex structures like Merkle Patricia Trees (MPTs) and their binary variants, each iteration has brought improvements in efficiency and functionality.
Bonsai as the latest innovation represents a shift in state management. By storing leaf nodes in a flat database, Bonsai achieves O(1) read times and significantly reduces storage requirements. While optimized for recent state access, Bonsai uses trielogs to maintain historical data accessibility.
This evolution in state architecture demonstrates the ongoing efforts to balance the needs for rapid access, verifiability, and efficient storage in blockchain systems, paving the way for more scalable and performant decentralized networks.