In a cryptographic commitment scheme, the sender can commit to a chosen value (or chosen statement) while keeping the value hidden from others. At some point in the future, the sender can reveal the value and the recipient can test whether it matches their commitment. This is useful as the sender can prove they know something long before it is publicly revealed.
If we think of a commitment scheme from an engineering perspective, it empowers a receiver to independently test whether they have received (or computed) the anticipated data from the sender. This works as the recipient can find commitment from a trusted source and can safely request the data from an untrusted source.
Hash functions. One common way to implement a commitment scheme is with a hash function like SHA2 or KECCAK256 (SHA3). Hash functions are. the only primitives that can easily be understood with a visual cue.
There are some easy-to-see properties:
We use the following notation to describe a cryptographic hash function:
h = H(x), where "x" is the pre-image of a hash.
The word image comes from set theory. An image is the set of all possible outputs a function can produce and a pre-image is the set of elements that map to the image. It may sound confusing and don’t worry because it is confusing. I think of pre-image as the input to a function and the image as its corresponding output.
We explore three security properties of a cryptographic hash function:
It may not seem obvious why the above properties are important. This set of slides provides a good deep-dive into the topic. For now, we will take this opportunity to provide a simple example and help explain each property’s goal.
Pre-image resistance can be best described as guessability. Given a publicly disclosed hash, there should be no better strategy to find corresponding pre-image than simply trying to guess it. To find the secret, the attacker’s only option is to perform an exhaustive search which amounts to continuous guessing.
For example, can you guess the pre-image for this hash (SHA256)?
A399677C5B6A183950C553D31F8BD2F560E1819FA2D97203404D091EB99E7A6B
The answer:
guess me bro
There are two aspects of to consider:
The former involves fending against the pigeonhole principle and it is more in the realm of cryptographic protocol design. If we assume the hash function is well-designed and it is not a fundamentally flawed, then we can simply consider the latter issue.
We need to consider “guessing entropy” of a pre-image and how “random it is”. In an ideal world, it is a long sequence of random bits and it is computationally infeasible for any machine to guess it. However, a sequence of random bits is not often useful for many applications and in practice the pre-image is a human readable phrase.
For example, this has led to password policies and best practice guidelines for user-generated passwords. The phrase “guess me bro” is estimated to have ~42 bits of entropy and we can safely assume that many machines today can eventually brute-force it. The implementation of such policies feels more like an art than a science, and as this study highlights most websites do not follow best practices around password policies.
BIP39 defines the seed scheme for cryptocurrency wallets. It recommends a seed should contain 12 or more words, and each word brings about ~11 bits of entropy. The standard is interesting as tries to push for a secure brain wallet such that the seed is human-readable (and memorable) while preventing a machine from guessing the seed and stealing your coins.
If care is not taken, then an attacker can spend time building a reverse lookup table like a rainbow table. This makes it easy as someone has done the hard work of building a hash → pre-image lookup table. Anyone can simply take a hash and find its corresponding pre-image. To fend against lookup tables, we recommend including is a nonce (salt) alongside the pre-image such that:
h = H(“guess me bro”, r), where r is a random number.
If R is publicly known, then an attacker is forced to generate a new lookup table. If R is kept secret (and it is large enough), then it can help make it computationally infeasible to brute force the pre-image.
Second pre-image resistance protects the very concept of a commitment scheme. If the property is broken, then the sender can simply lie about the value they have committed too. It is as simple as the sender committing to value A, but later revealing value B. The recipient cannot distinguish the truth as h(A) === h(B)
.
Rock, Paper and Scissor. Let’s consider this childhood game as it can be implemented as a two-round protocol using hash functions.
If second pre-image resistance is broken, then Bob can win the game by changing his choice from “rock” to “scissors”.
✨🔮 Bob magically computes H_b === H(“rock”, r_b) === H(“scissors”, r’_b) ✨🔮
Thanks to a magic trick, Bob can perform the final move of the game:
Clearly, the magic trick is bad. It should be computationally infeasible for Bob to change his choice from rock to scissors.
Generally, this property must guarantee that it is impossible for an attacker to pick a hash and selectively find a pre-image collision that works to their advantage.
The property collision-resistance is the most extreme case. It should be computationally infeasible to find ANY pair of messages that lead to a collision. Even if the messages are completely random gobbledygook.
It is no longer about targeting a publicly disclosed hash to attack. The attacker can pick any hash and inputs.This property often has theoretical significance as finding a single collision, even if it is meaningless, provides evidence that a second pre-image attack may eventually be possible.
One example we can consider is proof of work in Bitcoin where miners must repeatedly perform hashes until they solve the following puzzle:
- Pick random 'r' and perform hash:
000000000abc... = h = hash(block_header, r)
- Success if first X bits of h are zero. If not, repeat process.
Collision-resistance prevents a miner finding a hash which is computable using two different (and conflicting) blocks such that h === h(block1) === h(block2). If possible, it would undermine the blockchain’s determinism as it is now possible for nodes to compute two different databases using the same blockchain.
In cryptocurrencies, a hash function protects the blockchain’s integrity and there are two components in a block to consider:
Transactions. An ordered list of transactions that represents an update to the database. All transactions are signed by the signers (a hash of the transaction is signed).
Block header. The metadata for a block. It includes a hash of all transactions and its position in the chain. In other networks like Ethereum, it contains more extensive commitments including the current state of the database, transaction receipts, etc.
Given this information, we can consider two interesting problems that hash functions can help solve:
We need a mechanism that intrinsically ties a list of ordered transactions to a single block header. This is to allow the receiver to check the ordering of transactions across blocks and the ordering of transactions within a block.
A hash tree is the way to solve this problem.
Four transaction example. Let’s consider a hash tree from the bottom to the top:
To build the tree, we repeatedly take the a list of items (at the same level), pair the items together and hash the concatenated pair. For example, if we consider the inner node 34, it is a hash of both child nodes h3 and h4. This shrinks the size of the list until there is only a single hash. The root represents a commitment to the entire list of items and the tree structure.
Linking transaction list to a bock header. The merkle tree root (transaction root) is stored as a field in the block header (header.txRoot
). The root hash is 32 bytes and anyone can compute the same root as the tree building process is deterministic. It also preserves the ordering of all items as they must be hashed in the correct order.
This brings us back to the question which is slightly more granular:
How can we verify a single transaction is in a block (as opposed to checking all transactions in a block)?
A proof of membership problem. We want to prove that one transaction was confirmed in the block without sending the entire list of transactions.
The sender can send a branch of the hash tree which includes the single transaction and a list of inner node hashes. The receiver can use this information to compute the merkle tree root’
and compare the computed root with the hash stored in the block header:
root' = check_membership([T1, h2, h34]); // Compute potential root
root' === header.txRoot; // Roots must be equal
Interestingly, the receiver only needs a subset of the data and not the entire list of transactions to check for membership. In fact, the membership proof grows logarithmic in the depth of the tree or put another way it is O(log n). This is ideal for large data structures or in our case, a lot of transactions per block.
A more detailed explanation for merkle trees including the algorithm for insertion and verification can be found here. As well, an interesting historical note is that Satoshi Nakamoto made a mistake in the Merkle tree implementation that led to a severe bug.
Let’s remind ourselves of the problem statement:
How can verify the position of a block in the blockchain?
Hash chain. We need to link a block to its predecessor to form a chain and then it is straight-forward to compute its position (“height”) in the chain. The solution is to construct a hash chain:
h(h(h(h(x)))); // A chain of 4 hashes, where "x" is the start of the chain.
We can take the previous block header’s hash and it include it as a field in the new block header:
header2.previous_block = h(header1)
This is repeated for every new block header and it forms the basis of the block chain.
Why hash the block header? An interesting observation is that we only need to hash the previous block header and we do not need to hash the block in its entirety.
Can you guess why? …
A block header already includes the transaction root which commits to all transactions in the block.
Powerful construction. An attacker cannot change 1-bit of any previous block without breaking the chain of hashes. This protects the integrity of all historical transactions and block headers in the blockchain. As long as we can identify what is the one true blockchain, then we do not need to trust the sender with its integrity. We can independently check it and use it to compute a copy of the database.
This brings us to a pinnacle idea that can help networks like Bitcoin and Ethereum scale to billions of users.
Light clients. We can assume that most users will not want to process the entire blockchain or independently maintain a copy of the database. Especially if a user is interacting with the network on an embedded device (like a mobile phone). We can call this a light client who has minimal computation, bandwidth or storage resources. At the same time, it is ideal if we can keep the principal of public verifiability and help light clients to be convinced about the validity of updates to the database at any given time.
Simplified Payment Verification Mode. The rise of light clients was anticipated by Satoshi Nakamoto and he proposed the concept of SPV. As long as a client can follow the proof of work and keep a copy of the relevant block headers, then we can use the proof of membership (merkle branch) technique to convince a light client user that a transaction was confirmed in the blockchain. For example, this allows a merchant to only release the goods to a user after the payment transaction is confirmed.
Of course, the original SPV mode does not come for free. It introduces a new security assumption that light clients must trust the block producers to only include valid transactions.
Light clients and SPV mode continues to evolve. In Ethereum, the proof of membership concept was extended to facilitate convincing light clients about the database state. For example, we can prove a user’s account balance or the state of a smart contract at a certain block height.
More generally, these techniques fall into a category of research called verifiable computing. We can go beyond simply proving information about the database state and convince a light client that an action (execution) was performed correctly without the need to replicate the same work.
As we will see later, it is possible to convince a light client:
Together, a light client does not need to trust the block producers to enforce the validity of transactions. They only need to trust them to stay online and to continuously propose new blocks which include transactions. As we will see, verifiable computing forms the foundation for layer-2 protocols and rollups.