Thinking in Zero Knowledge

By yush_g and nulven, with special kudos to 0xPARC, gubsheep, Austin Griffith, Veronica Zheng, Kobi Gurkan, Wei Jie Koh, and Vivek B for helping + reviewing!

Article estimated reading time: 1-2 hours (also can watch a simplified recording on YouTube). Let me know if it took you less/more time :)

Motivation: Why ZK

Zero knowledge proofs are a powerful cryptographic primitive, often used for proving pre-images of hashes (see the planet interaction in zkga.me) or anonymizing transactions (see zcash). We envision (and will interactively lead you through) an on-chain application that allows anyone to feel safe expressing their truth. Imagine an application where they could anonymously post on behalf of a verified group, without revealing who they are -- whether journalists are reporting without censorship, students of a university posting anonymous quips, congress members voting anonymously, or employees of a company are reporting harassment. This type of social media is ripe for crypto innovation, and this is a great start.

For this to happen in a credibly neutral way (see Vitalik’s definition), crypto-native code is a natural method by which all the code can be public and a member can feel completely safe in their anonymity. ZK-snarks (and starks, and newer gadgets) allow zero knowledge proofs that prove anonymity, to verify in solidity on chain. For this construction, we need to engineer two functions: to prove group membership on chain, and to post on behalf of that group. We’ll give a naive construction, and then attack each one’s security individually -- showing why ZK is needed, and how one could properly execute such a scheme. The hope of this post is to inspire you all to engineer your own secure ZK protocols, by thinking like a true snarkoor.

The First Construction

Let’s assume there is an off-chain group we want to anonymously replicate on chain. Companies or groups can share a password between them, but have no public connection from that to their identities or wallets. We present a simple scheme, the most naive such implementation that minimally implements both steps on chain without any anonymity or security.

Join: A smart contract stores the hash(hash(password)). You send the hash(password), with a public key. That public key is added to an allowlist for message sending.
Send: In future, can send a signed message with any allowlisted public key.

I recommend pausing for a moment and trying to think of all the security properties this breaks, before reading the three answers. In general, this will be a good way to read the post and fully convince yourself of the need for each building block of machinery.

.

.

.

Anyone can frontrun or resend a previously sent hash(password) with their own public key, and join a group.

This reveals the public key of the sender: when joining the group, when sending each transaction, and when verifying the signature on the message!

The first few messages sent have a small anonymity set on chain, even if the whole group is large.

.

.

.

Traditionally in crypto, revealing a public key has been desirable -- however, here we prioritize privacy and want anonymity for more general functions than just sending money.

To make explicit our goals, we define these properties:

Actions: Users can create groups, join them, and post from them.

Security properties: Groups are permissioned by password, users anonymously join, messages cannot be traced back to a user, and messages cannot be forged.

The Plan to Prove Group Membership On Chain

How do you properly show that your private/public key belongs to a group? Let’s try a natural extension of our original idea, each time attempting to fix one of the issues. At the end, we will have the needed complexity (fans of Gall’s law will recognize this, where we build the minimally complex system from simpler systems that don’t fully work).

Let’s start with the simple contract idea, and discuss how to verify the password to allow users to join an allowlist.

Password security

Password idea 1: Recall our naive construction, a smart contract where you send the hash(password) and your public key, to be added to an allowlist on-chain.

Check if you remember the problems with only this part before moving on!

.

.

.

You prove you have the password and are part of the group, but you lose the uniqueness (anyone can send the hash(password)) and the privacy. Oof.

.

.

.

Let’s focus on the reusable password problem first, then we can handle the public keys. If the problem is reuse, maybe we can solve it with many one-time-use passwords?

Password Idea 2: Additionally send a nonce -- in this case, a unique password per user. As before, send hash(password). Smart contract verifies it matches anyone in the preset list of hash(hash(password))s, then adds your public key to the allowlist as before.

This is almost exactly what we want, and we made our implementation with this, but there are a few subtle improvements still.

.

.

.

This adds an inherent max group size (because there is a finite number of passwords), which in some cases may be the desired design. When there are few allowlisted users, users who send a second message can be associated with their past messages (recall that messages are sent with allowlisted public keys only)! This can be desirable (and is why we implement our actual GitHub repo off of this), but in case we want complete dissociation from the start, let’s search a little more.

.

.

.

An astute blockchain enthusiast may have noticed this unique nonce can still be frontrun. The standard commit reveal with this scheme avoids frontrunning -- this is the same scheme ENS uses. Basically, you publish the hash of this transaction x blocks in advance, and then their actual message will only go through if there is a commit to it already at least x blocks beforehand. As long as the transaction doesn’t linger in the mempool too long, this will mean attackers have to wait for x blocks.

Turns out if you couple the join and post steps cleverly, we don’t actually need this! We’ll elaborate more on that solution later.

Assuming frontrunning is resolved as mentioned, let’s address the multiple password issue first.

Password idea 3: Back to one group password. Each user concatenates a nonce (message number, corresponding to the number of messages sent by anyone in that group prior), and proves it is part of a merkle tree of 1M hash(password | nonce)s (by sending ~13 hashes in total), which the smart contract verifies with the root. The merkle root is on chain, but the entire merkle tree can be off chain or re-computed manually by the user. The user will “rejoin” by using a new nonce, for each message sent. [Note that | means concatenation]

Note this merkle tree limits the number of total messages, but since merkle tree widths vary exponentially with heights, we can easily add more possible nonces for little cost. There’s a particularly devious problem with this solution, relating to the merkle proofs.

.

.

.

If the nonces are consecutive, you might give away the next nonce’s hash in the process of proving the previous message! This is because proving a route up a merkle tree involves showing you know the hash of all the neighbors at each level. Users need to prove they knew the pre-image of the hash... ** (additional detail addressed at bottom)

.

.

.

Folks familiar with ZK will realize the problem of proving a pre-image is canonical, and we try exactly that.

Password idea 4: One group password. Each user concatenates a nonce (message number), and a merkle path of a merkle tree of say 1M hash(password | nonce)s (by sending ~20 hashes in total). They also send a zk proof that they know the pre-image of the hash.

This works! The group on creation starts with the merkle root of the hash(password | nonce)s.

We can do slightly better: probably not worth implementing, but an interesting theoretical exercise. This has one point of failure: the entire system depending on one shared password that might get leaked. Can we have a system of individual passwords that allows reuse, without replay attacks?

Password idea 5: Individual passwords. The contract has all the merkle roots of each hash(password | nonce) for each password and nonce. We send a message nonce based on the number of messages sent from each user so far, and a zk-proof that we know the pre-image, and the merkle root of a claimed merkle path of (password | nonce) is one of the allowed merkle roots in the contract.

One might ask what such a zk proof would look like. Let’s assume there are 26 allowlisted merkle root a...z as public inputs to the circuit, we can send a proof that our generated merkle root, say x, satisfies (a-x)(b-x)(c-x)... == 0. One of these will be (x-x), and the proof will pass.

Multiple points of failure might encourage anonymous password leaking -- we discuss at the bottom how this can be discouraged with staking.***

Public key security

Now that we’ve resolved the password leaking issues, we still have the public key leaking issues to fix. We can assume the join step is taken care of by password idea 2, where users use unique passwords to add a public key to an allowlist (focusing mostly the posting step helps simplify the logic).

Public key idea 1: Recall this from our original idea. After the join step, a smart contract stores each authenticated public key. On the post step, a user posts a message with a transaction from their public key, which the contract checks is in the allowlist.

What are the three places the public key is revealed here?

.

.

.

The public key gets compromised three times here: in the contract’s list after joining, on posting when you prove you’re in the list, and in the sending address.

.

.

.

Let’s focus on the first two leaks, by not having plaintext public keys stored as a list in the contract.

Public key idea 2: On the join step, the contract instead stores hashes of all the allowlisted public keys. On the post step, a message sender sends which hash(public key) they are (which is checked to be in the allowlist), and a zk proof that they know the preimage of that hash (thus not revealing which public key).

Even before thinking about attaching message text, or the sending address leaking the public key, there’s a problem with this hashing.

.

.

.

It is actually possible to brute force all possible preimages of the public key hashes; the anonymity set of all ETH public keys is only ~350M.

.

.

.

One could imagine using random keypairs and not ETH addresses, but ETH addresses are useful to, for instance using their NFTs as a gate to joining a group. How do we avoid brute force-able hashes?

Public key idea 3: On the join step, the contract instead stores hash(public keys | salt). On the post step, a message sender sends which hash(public key | salt) they are (which is checked to be in the allowlist), and a zk proof that they know the preimage of that hash (thus not revealing which public key or salt). The salt can be any random string.

.

.

.

We’ve fixed the contract revealing the public key! Note we are still sending from the public key, so let’s focus on that, and forget the message for a moment.

.

.

.

Public key idea 4: The same as idea 3, but joining and posting is sent via a relayer. For a reminder of idea 3: on the join step, the contract instead stores hash(public keys | salt). On the post step, a message sender sends which hash(public key | salt) they are, and a zk proof that they know the preimage of that hash.

For a relayer, we can use Gas Station Network (how GSN works) and fund the account’s gas via tornado.cash/zcash! We could alternatively harness frontrunners to anonymize transactions like this secret unreleased project if the contract pays for gas. But there is a consequence of allowing someone else to send a transaction only dealing with public keys...

.

.

.

Now, when joining, you could choose any public key that’s not yours instead. Then, you can post a message that seems to come from them!

We can prove knowledge of a private key via a signature or raw calculation of a private key. How can we do this privately, because verifying either would reveal a public key?

.

.

.

Public key idea 5: Same as idea 4, but on the post step we also prove the private key generates the public key in the same zk proof. We also attach the message.

For a reminder of idea 4: We use relayers. On the join step, the contract instead stores hash(public keys | salt). On the post step, a message sender sends which hash(public key | salt) they are, and a zk proof that they know the preimage of that hash.

This solves anonymity of the sender! A plaintext message has one issue...

.

.

.

If another account sees this transaction in the mempool, they can send an identical transaction but change the message text. Thus, the frontrun message can be completely forged! We can’t attach a message signature in plaintext or else we’d reveal our public key, but we have one last zk trick up our sleeves...

.

.

.

Public key idea 6: We use relayers. On the join step, the contract instead stores hash(public keys | salt). On the post step, a message sender sends which hash(public key | salt) they are, and a zk proof that: they know the preimage of a hash (public key | salt), that pre-image is in the allowlist, and that you can produce a message signature that verifies with the same public key. The message checked in the proof is public.

This works! If we reveal which hash we are in the list, we can optionally link ourselves to previous posts if we’d like. Remember the group on creation has either some merkle trees/salted hashes for password verification like an invite key system. To allowlist only certain users, we can set the salt as the password and just have a bunch of hash(public key | password)s pre-stored.

Full Constructions

We’ve done all the dirty work. Now, we just have to put it all together!

Full construction 1:

Create step: The group creator adds a bunch of hash(hash(password))s to a contract.

Join step: Use a zk proof with the password scheme to verify a password, and add hash(public key | salt) to an allowlist.

Post step: A ZK proof shows that you know a public key (and some salt) that hashes to a value on the allowlist, AND verify sign(message) from your private key. Also, attach the message. As before, sent via a relayer.

The exact zk proof input and outputs are in the next top level header, for more details! This protocol means a user must have a new password for each message, but can opt to link their message to a previous instance of a user. We implement this password scheme idea in our repository. To resolve frontrunning, a production instance could simply add commit-reveal, or verify the salt is a password.

Full construction 2:

Create step: The group creator adds a bunch of merkle roots of hash(password | nonce)s.

Join step: Send a zk proof with the password scheme that proves a merkle root, and hash(public key | password salt) uses the same password. The hash(public key | password salt) added to an allowlist.

Post step: A ZK proof shows that you know the public key (and the password salt) that hash to any one allowlisted value, AND verify sign(message) from your private key. Also attach message. As before, sent via a relayer.

This works even better, removing the need for unique passwords and making frontrunning impossible, since the original password is never revealed but needs to be known. This also enables linking to past messages, if one opts to make hash(public key | password salt) a public input to the circuit instead.

Protocol

To prove this could actually be done, we build a general purpose client and set of circom circuits/solidity contracts that implements all the logic from the first construction, and proves in a couple of seconds on chain. It is at https://github.com/nulven/zk-message-board.

This is what our primary joining zk proof looks like:

Private inputs

  • Password (= salt)
  • Private key

Public inputs

  • Our hash(public key | salt)

Main proof constraints

  • Private key generates public key
  • Hash of public key and salt generates the public hash

This is what our primary message zk proof looks like:

Private inputs

  • Password (= salt)
  • Private key
  • Public key | salt
  • sign(message, public key)

Public inputs

  • Message nonce
  • Message
  • All allowed hash(public key | salts)

Main proof constraints

  • sign(message) is correct
  • public key | salt allowed

If trying to avoid front running, one would tack on ‘verify password is valid’ to the last proof’s constraints.

Extensions

Note that after this implementation, there were some great new off-chain constructions like zkmessage.xyz that provide some pretty cool new features (it’s a very elegant construction as well!) -- feel free to reach out to them/me for more info!

WIth this in place, one could extend the given constructions for several more situations. Want to recreate Signal, but where the data, server, and client layers are permissionless and public? Just encrypt each message one more time with a group’s “password” -- now you have private group chats! Want to remove the need for passwords in the first place? Any mathematically verifiable function will do in place of it (say, a specific property of a specially generated API key). Want connections from web2 groups to these? Using InterRep for early proof of concepts like verified Twitter badges is also an exciting way to move group communications onto web3. Worried about gas? Deploy proof of concepts on the xdai sidechain or an EVM-compatible rollup/L2 until fees are low. We’re excited to help anyone who wants to build such an extension -- we think these are particularly interesting constructions.

Implications

Privacy is a difficult topic to meander -- any solution that shields users from government backdoors enables scammers, criminals, and covert communication. Given that end to end secure messaging already exists, we are not worried about the additional implications of releasing this technology; this is simply a credibly neutral version of tech already provided by trusted third parties.

** Password Idea 3: Note that one could alternatively just block off every alternate node in the merkle tree, and make the even nonces necessary in proofs but invalid for messages. This works, but still boils down to proving the pre-image of a hash (in this case by just sending it).

*** Staking: To discourage password leaking, we can have each password user stake money, and if an attacker knows a leaked password, they can tell the contract and receive the stake in exchange for deleting that merkle root. Assuming this amount is more than the value an attacker gains from an unfavorable post, this system is incentivized to be more secure against password leaks (on the social layer, not tech layer).

Subscribe to yush_g
Receive the latest updates directly to your inbox.
Verification
This entry has been permanently stored onchain and signed by its creator.