[Cover photo by Mali Maeder]
Data structures such as Merkle Trees are critical to how blockchain data integrity and make read/write times quicker.Merkle trees have a few different components: roots, nodes, and leafs. Leafs are located at the base of the structure: they contain the hashes of the original information in the dataset. They are hashed together in pairs to form the first layer of non-leaf nodes.
For example, in the tree above, this bottom most layer is comprised of leafs. Leaf hash a and leaf hash b are hashed together to form node hash AB.
The first layer of nodes originates from the bottom leaf nodes. As we ascend the tree, this process repeats. For example, the hash of the next two paired nodes will become their parent and so on until we reach the top. We call the top node the root.
Merkle trees have some cool properties, such as being able to prove inclusion in log time. Let’s say you need to prove that element a is a valid leaf contained within the tree without having access to the entire tree. We recreate each parent node with the intermediate hashes until we hit the root. To recreate the first step of the tree, we need leaf b. By hashing these leafs together (a & b), we have the first node of three (root included). With ab, we can recreate the parent node by hashing it with the other child node, cd. And once we reach abcd, the last hash we need to provide is efgh before the root is reconstructed. A matching root means that we have successfully proved the leaf is valid and contained within the tree. We know the leaf is contained within the set because hashes are one-way functions whose output will be massive altered with one small change to the preimage (or input), this is called the avalanche effect. This is particularly attractive for blockchain based applications where you need to conserve space and can only store the verifier on-chain. This method is a bottom-up way of navigating the tree.
In brief, we reviewed how Merkle trees are constructed and proved at a high level. The next articles will go in depth on these two topics. Merkle trees are very useful for proving that an element is contained within a large data set, the integrity of the data itself, and compressing the total amount of data needed to be stored. Merkle trees are used as a primitive in blockchains and distributed systems more widely, but even can be useful in the context of smart contracts.
For example, suppose that you needed to whitelist a large number of addresses. By using method one with a valid root, you can prove that the address is contained in the tree without porting it on-chain and spending a ton of precious Ether on the infamous hash map.
Thank you for reading!
Check out Raisin Labs on Twitter: @Raisin_Labs!