In Devcon in 14th November 2024, Ethereum foundation, Phantom.zone, and 0xPARC launched a 10 000 USD bounty for their practical Indistinguishability Obfuscation implementation. You can read more about this in obfustopia.io. This text provides a brief explanation of Indistinguishability Obfuscation (iO), an overview of the associated bounty, and a description of how I successfully broke it.
Indistinguishability Obfuscation (commonly abbreviated as iO) is a cryptographic scheme designed to transform a program into a "black box." This means the program can be freely shared and executed while keeping its internal workings entirely hidden. Users can run the program as often as they wish, but they cannot deduce anything about its structure or internal logic.
At first glance, this concept might seem implausible. After all, if you have unrestricted access to the program, wouldn’t testing all possible inputs and analyzing the outputs reveal its functionality? In reality, the input space of most programs is so vast that exhaustive testing is practically impossible.
The most obvious applications of iO is in digital rights management (DRM). For example, a game could be processed through an obfuscator, allowing users to play it freely without reverse-engineering its code.
A less obvious but far more intriguing use case of iO is securely embedding secrets within an obfuscated program. For instance, a private cryptographic key can be hidden inside an iO-protected program. This approach enables the development of smart contracts that have access to the key but will only sign messages meeting specific criteria. This ensures the key remains secure while enforcing strict conditions for its usage. Furthermore, it enables smart contracts to operate outside blockchain environments, broadening their potential applications.
In theory, iO has the potential to serve as the foundation for almost any cryptographic protocol, earning it the nickname of a "god protocol". Its versatility and power make it a revolutionary concept in cryptography. However, this also raises skepticism, as such a powerful tool seems almost too good to be true.
When I first began researching Indistinguishability Obfuscation (iO)—a topic I had no prior knowledge of—I was deeply skeptical about its feasibility. It seemed nearly impossible, even from a purely theoretical standpoint. I had similar doubts when I first encountered zero-knowledge technology. However, after studying it, I became convinced that zero-knowledge proofs are not only theoretically sound but also practically applicable in many cases. While zero-knowledge technology has proven both viable and impactful, I remain unconvinced that iO is achievable in practice. Still, I hold out hope that it is possible, as its realization could unlock groundbreaking advancements in cryptography.
Despite its immense potential, a practical implementation of Indistinguishability Obfuscation has yet to be achieved. Various proposals have been put forward, but each has ultimately been proven flawed or insecure.
That said, there is a theoretical proof demonstrating that iO is possible under well-established cryptographic assumptions. This proof confirms that iO can exist in principle, though it does not guarantee the feasibility of a practical or efficient implementation.
It’s important to note that a fully black-box obfuscator—a tool that could obfuscate any program while revealing nothing beyond its input-output behavior—has been proven impossible. iO, however, is a more narrowly defined and attainable concept. Rather than obscuring all internal details of a program, iO ensures that two programs with identical functionality are indistinguishable when obfuscated.
For the bounty, the authors created a random program comprising 1,024 operations and then obfuscated it, increasing its size to over 200,000 operations. The goal of the bounty was to optimize this obfuscated program into a functionally identical version containing 1,024 operations or fewer. Importantly, the bounty was not about uncovering a hidden secret within the program, deciphering its purpose, or understanding its behavior. Instead, the challenge was purely focused on reducing the program to a more efficient representation. Naturally, once a smaller, optimized version is found, identifying any hidden aspects of the program becomes significantly easier.
The provided program is a straight-line code program, meaning it contains no loops or branching. Execution starts at the first line and proceeds sequentially to the last. The program takes a 64-bit Boolean array as its input (e.g., [true, false, true, false, ...]
), performs mutations on it, and outputs a 64-bit result. This design ensures the program is free from the halting problem and is not Turing complete.
The Obfustopia iO implementation employs a local mixing approach to obfuscate programs, as described in Towards general-purpose program obfuscation via local mixing. While the paper is dense, this summary by Janmajayamall offers a more approachable explanation. My simplified and concise interpretation of the protocol is as follows:
Rather than obfuscating an existing program, the bounty involves generating a random program that is then subject to obfuscation. This program is built using reversible boolean gates, meaning the operations can be undone by applying the same operations again. For example:
v0 = v0 XOR v1
In this case, we have two Boolean variables, v0
and v1
. We apply the XOR operation and store the result back in v0
. The following truth table shows all the possible outcomes:
If you perform this operation twice one after another:
v0 = v0 XOR v1
v0 = v0 XOR v1
nothing happens, this is an identity operation. This can be seen by writing this differently:
v0 = (v0 XOR v1) XOR v1
= v0 XOR v1 XOR v1
= v0 XOR (v1 XOR v1) // XOR operations can be computed in any order
= v0 XOR (FALSE) // XOR with itself is always false (check above truth table)
= v0
This would not be true if we were to use non-reversible gates, such as true
gate:
v0 = true
now, if we try do the operation again, we cannot recover v0
, v0
is always true and we cannot recover the information what the v0
used to be, this is a non-reversible gate. Reversible-gates are not important in terms of trying to crack the circuit, but they are important on how the obfuscated program is made in the first place.
Next, we obfuscate the random program using Local Mixing method. This approach involves taking a small subset of the program and modifying it in a way that preserves its functionality while making it appear different. A simple way to achieve this would be to add a few extra operations that ultimately have no effect, such as inserting two additional v0 = v0 XOR v1
gates. However, this method is not ideal, as an attacker could easily identify and remove these redundant operations. To achieve iO (indistinguishability obfuscation), a more robust form of obfuscation is required.
The local obfuscation consists of two stages:
Inflationary stage: The small subset of program is being made a bit bigger "inflated"
Kneading stage: The local parts of the programs are spread around to the rest of the program, like one would knead a bread.
Once I realized that the bounty is essentially an optimization problem, I quickly came up with a few ideas that I believed could break the obfuscation.
My first thought about attacking an iO protocol was as follows: If I am given an obfuscated program and I can observe its behavior as much as possible and run it as many times as needed, how can the obfuscation be secure? For example, if the obfuscated program is something simple, like a basic operation such as v0 = v1 AND v2
, I could easily run the program with random inputs millions of times and potentially deduce that it’s performing an AND operation.
One counterargument to this is that such an attack would be probabilistic. I might become fairly confident that the program is an AND operation, but I can never be entirely sure unless I exhaustively iterate through all possible inputs and confirm this from the output. Additionally, while the program might appear to be a simple AND operation, with a carefully chosen input, it could behave like an OR gate. To identify this, I would need to be very lucky with my input selection or be able to iterate through every possible input combination. This is known as the Punctured Program strategy, which is further explained in How to Use Indistinguishability Obfuscation: Deniable Encryption, and More. The Punctured Program technique prevents attackers from probabilistically guessing what the program is designed to do.
With the Obfustopia.io bounty, I know it's impossible to iterate over the entire input space (since 64 bits correspond to 1.8446744 * 10^19
unique inputs). However, I can be fairly confident that the randomly generated program won’t behave like a simple program except for one (or a very small subset of) input(s). Since the original program consists of 1,024 operations, I understand that it would be infeasible to guess which 1,024 operations to select in order to replicate the original program’s behavior. While I haven’t calculated the exact number of possible 1,024-operation programs for 64-bit inputs, I suspect it’s far too large to feasibly iterate through.
The obfuscated program looks something like this (but a lot a lot longer):
1: v20 ^= v51
2: v15 ^= !v2
3: v1 ^= !v42
4: v33 ^= (v40 & v51) | (!v40 & !v51)
5: v50 ^= !v32
6: v60 ^= v29
7: v13 ^= v26
8: v46 ^= !v43
9: v48 ^= (v13 & v19) | (!v13 & !v19)
10: v12 ^= v6
11: v29 ^= v22
12: v50 ^= !v32
Here, ^=
denotes the XOR assignment operation. This means we calculate the operation on the right-hand side, XOR it with the variable on the left, and then store the result back in the left-side variable. Similarly, &
represents the AND operation, |
represents the OR operation, and !
represents NOT.
If you look carefully, you’ll notice that the operation v50 ^= !v32
appears twice in the program. Between these two operations, neither v50
nor v32
is used elsewhere. As a result, we can safely remove these operations (as doing the same reversible-gate after another is an identity operation), resulting in the following simplified program:
1: v20 ^= v51
2: v15 ^= !v2
3: v1 ^= !v42
4: v33 ^= (v40 & v51) | (!v40 & !v51)
5: v60 ^= v29
6: v13 ^= v26
7: v46 ^= !v43
8: v48 ^= (v13 & v19) | (!v13 & !v19)
9: v12 ^= v6
10: v29 ^= v22
Now we have a program that is simpler, with only 10 operations, compared to the original which had 12 operations. The Obfustopia.io bounty program contains over 200,000 operations. Could we iteratively apply this kind of simplification to the program and eventually reduce it to 1,024 operations or fewer?
What’s particularly interesting here is that I can take any small subset of consecutive lines in the program and treat it as a separate program. I can then examine the input and output of this subprogram and check if it can be simplified. If simplification is possible, I can apply it and reinsert the simplified version into the original program. Each time I do this, the program becomes simpler. This optimization method is called Peephole Optimization, which is commonly used by various compilers.
This iterative process makes me question the viability of local mixing. An iO program should not allow for the obfuscation to be removed in this manner. The design of an iO program should ensure that every part of the program depends on everything else, making it impossible to remove any portion without disrupting the entire structure—similar to how most cryptosystems function.
I began by writing a Rainbow Table because I knew performance would be crucial for this task. My rainbow table consists of all possible ways to write a program that uses at most three operations and at most four variables. Here's a very stupid, but good enough program that iterates through all possible combinations of three operations for a selected number of variables (four in my case):
function* generateAllGates(numberOfVariables: number) {
for (let a = 0; a < numberOfVariables; a++) {
for (let b = 0; b < numberOfVariables; b++) {
for (let target = 0; target < numberOfVariables; target++) {
for (let gate_i = 0; gate_i < 16; gate_i++) {
yield { a, b, target, gate_i }
}
}
}
}
}
I then replicated this for programs using zero, one, and two operations as well. This resulted in a rainbow table that ended up being over one gigabyte in size.
With this rainbow table, I could efficiently answer queries like: "Given a program with at most four variables for both input and output, what is the most optimal way to write it in terms of the number of operations (given that the most optimal program can be represented with three or less operations)?".
To test this attack method, I generated a simple 4-bit obfuscated program using the tools provided in the bounty. This process was incredibly helpful, as having access to the code that generated the original obfuscation made it much easier to understand the problem and test my own program to ensure I was approaching it correctly.
I then iteratively worked through the program, extracting subsets of 2-10 lines. If the truth table for a given subset was found in my rainbow table, I replaced it with the most optimal representation I had. My program was remarkably successful! I was able to crack all the simple 4 bit programs I generated.
I was still facing one significant challenge. The bounty program operates on 64-bit inputs, while I had only been able to crack 4-bit programs. How could I scale my solution to handle larger programs?
The key insight here was that I didn’t necessarily need to solve the entire 64-bit program at once. Instead, I was solving smaller subprograms, and I could focus on subprograms that utilized only a smaller number of variables. In fact, most of the obfuscated program consists of subprograms that use only a few variables. Additionally, each gate can depend on at most three variables—such as v0 = v1 AND v2
, which involves three variables and is the most complex gate in terms of variable dependencies.
I adapted my program to only consider subprograms that used at most four variables, instead of considering all possible 2-10 line subsets. With this adjustment, I started applying my solution to the actual program. To my surprise, I was able to reduce the original program to fewer than 100,000 gates, achieving over a 50% reduction!
I also realized that a program consisting of operations executed sequentially can be written in multiple ways. Specifically, you can swap the order of rows that don’t depend on each other. For example:
1: v1 ^= v3
2: v60 ^= v29
3: v13 ^= v26
and
1: v1 ^= v3
2: v13 ^= v26
3: v60 ^= v29
are identical programs, but my rainbow table implementation treats them as different programs, as I only compare consecutive lines.
To address this, I implemented a row swapper that scans through the program and randomly swaps rows that don’t depend on each other. After making this adjustment, I ran the attack again and noticed that the optimization progressed even further!
After applying the rainbow table attack and the row shuffler, I realized that I could also target program subsets that involve more than four variables. I came up with a strategy where I could take a subprogram with 5-8 variables and randomly replace one of its variables with another variable from the subset. For example, if I have a program that takes [v0, v1, v2, v3, v4]
, I could map v0
to v4
, reducing the program to only the variables [v1, v2, v3, v4]
. I could then calculate the truth table for both versions of the program and compare them. If they are identical, I know I can remove one of the variables, thereby simplifying the program. This approach reduces the number of variables, but not the number of operations.
However, because the subset now only involves four variables, my rainbow table attack becomes effective. This variable reduction strategy scales better than the rainbow table approach alone, as even with 10 variables, the truth table only has 2^10 = 1024
rows, making it more manageable to be computed.
I didn’t need to increase the number of variables any further because the obfuscation had already been cracked. I managed to reduce the program to just 1 023 operations, which is one less than the original! I packed the simplified program in the format expected for the bounty and started comparing it to the deobfuscated program using the tools included in the bounty.
Checking if two programs are identical is no simple task. In the bounty, this is done by randomly sampling inputs and verifying that both programs produce the same output. I ran this check repeatedly, increasing the number of iterations. I was confident that I had cracked the obfuscation, so I submitted the program to the organizers, who were also convinced that the obfuscation had been broken.
As the solution progressed, I was able to reduce the program even further, ultimately bringing it down to fewer than 800 gates. This means I was able to find a more optimized version of the author's original pseudo-random program.
My biggest issue with the solver was that it was not deterministic, I am randomly shuffling the rows of it and then testing if I can find any matches, and then doing this again and again. This is super inefficient. I also had way more attack ideas to try, but I never needed to program them in.
I have been slowly optimizing and making the deobfuscator better, you can find the most up to date code from GitHub. This code is not the original code that cracked the bounty, but a more optimized and smarter version.
After discussing with the authors of the bounty, they acknowledged that the original obfuscator contained a bug that made the challenge easier to solve. They are working on a new version, which they believe will prevent the type of attacks I employed. Interestingly, the authors of the local mixing paper explicitly address the kind of attack I used, stating in their paper that when the protocol is correctly implemented, such attacks should not be possible.
In a humorous twist, during Devcon, Barry Whitehat asked whether an attack similar to mine could be executed against the program. At the time, he was told it wasn’t possible. Well, it turns out Barry was right.
I am eagerly looking forward to testing their updated obfuscation system. I intend to keep trying to break it, again and again, until I am either proven wrong or we finally achieve a robust Indistinguishability Obfuscation cryptosystem.
That said, it’s essential to remember that just because I cannot break a cryptosystem doesn’t mean it is secure. The Local Mixing approach, for instance, has not been proven secure. Even if implemented correctly, it might still contain vulnerabilities.