Illia Dovhopolyi, Artem Sdobnov, Dmytro Zakharov
The UltraGroth proving system explained here is a version of the protocol initially described in Merlin404's article on HackMD, adapted for real-world use cases. Huge credit to their work, which inspired much of what follows.
We assume you already understand the basics of modern zero-knowledge (ZK) constructions. This article aims to build intuition around the UltraGroth proving system, so deep, formal knowledge of Groth16 isn't strictly necessary. However, familiarity with our previous blogs on the Bionetta framework will be beneficial, as the examples here use Bionetta circuits.
During the development of Bionetta, one of our most significant challenges was optimizing the constraint cost of activation functions like ReLU. Each ReLU operation costs at least $253$ constraints due to the bit decomposition of input signals. Considering that neural networks rely heavily on non-linear activations, this quickly becomes prohibitively expensive. As an example, consider our LeakyReLU implementation:
template LeakyReLUwithCutPrecision(...) {
signal input inp;
signal output out;
var absBits[253] = abs_in_bits(inp);
signal absInBits[253];
var abs = 0;
var pow = 1;
for (var i = 0; i < 253; i++) {
absInBits[i] <-- absBits[i];
// We need to constrain each bit to be either 0 or 1 separately :(
absInBits[i]*(1 - absInBits[i]) === 0;
abs += pow * absInBits[i];
pow += pow;
}
(abs - inp)*(abs + inp) === 0;
// 3 more constraints for max(0, inp) and precision cut
}
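Counting what the snippet above costs per activation (the last three constraints are only referenced in the final comment):
$$
\underbrace{253}_{\text{bit checks}} + \underbrace{1}_{(\mathsf{abs} - \mathsf{inp})(\mathsf{abs} + \mathsf{inp}) = 0} + \underbrace{3}_{\max(0,\, \mathsf{inp})\ \text{and precision cut}} = 257 \text{ constraints.}
$$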
Our "zkLiveness" model for detecting whether the input photo corresponds to a real person (and not to, say, a photo from someone's device) requires $3864$ non-linear activations, totaling at least $977592$ constraints.
Initially, reducing constraints here seemed impossible: we could neither lower the activation count nor simplify the bit checks. Further research showed that one prominent solution was introduced in PlonKish proving systems, where repetitive computation of the same function is verified using lookup tables.
However, we’re sticking with Groth16 because of its other Bionetta-specific benefits, such as “zero”-constraint linear layers. To get the best of both worlds, we developed UltraGroth, a Groth16-based proving system that brings lookups to our circuits 😎
The idea behind lookup tables is straightforward. Rather than checking each bit individually, we can split activation function inputs into larger chunks and verify them against a predefined lookup table containing all values from $0$ to $2^{\mathsf{CHUNK\_SIZE}} - 1$.
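Concretely, following the chunk layout used in the template below with $\mathsf{CHUNK\_SIZE} = 15$, the absolute value of the input is recomposed from $\lfloor 253 / 15 \rfloor = 16$ chunks:
$$
\mathsf{abs} = \sum_{i=0}^{15} c_i \cdot 2^{15 i}, \qquad c_i \in [0,\ 2^{15}),
$$
so instead of $253$ individual bit constraints we only need to show that each chunk $c_i$ belongs to the table $[0,\ 2^{15})$.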
To do this, let’s first rewrite the LeakyReLU function from the previous example so that it splits the input signal into chunks and outputs them, allowing us to range-check them later:
template LookupReLU(CHUNK_NEW, CHUNK_OLD, CHUNK_SIZE) {
assert (CHUNK_NEW <= CHUNK_OLD);
signal input inp;
signal output out;
// Now the ReLU outputs chunks so we can handle range check later
signal output absInChunks[253 \ CHUNK_SIZE];
// Witness-side: rebuild the chunks of |inp| from the same bit helper as before
var absBits[253] = abs_in_bits(inp);
var absChunks[253 \ CHUNK_SIZE];
for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
absChunks[i] = 0;
for (var j = 0; j < CHUNK_SIZE; j++) {
absChunks[i] += absBits[i * CHUNK_SIZE + j] * (1 << j);
}
}
var abs = 0;
var pow = 1;
for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
// We don't need to constrain each bit individually anymore
absInChunks[i] <-- absChunks[i];
abs += pow * absInChunks[i];
pow *= 1<<CHUNK_SIZE;
}
(abs - inp)*(abs + inp) === 0;
// 3 more constraints for max(0, inp) and precision cut
}
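As a quick usage sketch (the parameter values and the preActivation signal below are ours, chosen for illustration, and not taken from the Bionetta codebase):
// Hypothetical instantiation: 15-bit chunks; precision parameters picked arbitrarily
component relu = LookupReLU(16, 32, 15);
relu.inp <== preActivation; // assumed output of the preceding linear layer
// relu.out feeds the next layer, while relu.absInChunks must be copied into the
// global `chunks` vector so the lookup argument below can range-check them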
In the top-level circuit, we combine the chunks from all layers with non-linear activations into a single signal vector. Although each activation produces multiple chunks (CHUNK_FOR_NUM of them), their order doesn't matter here:
var CHUNK_SIZE = 15;
var CHUNK_FOR_NUM = 253 \ CHUNK_SIZE;
var CHUNK_NUM = /* sum calculated based on network layers */;
var CHUNK_BUSY = 0;
signal chunks[CHUNK_NUM];
// Some layer with non-linearities
component layer1Comp = LookupEDLightConv2D(...);
layer1Comp.inp <== image;
// Aggregating chunks from activation functions
for (var i = 0; i < layer_size * CHUNK_FOR_NUM; i++) {
chunks[i + CHUNK_BUSY] <== layer1Comp.chunks[i];
}
CHUNK_BUSY += layer_size * CHUNK_FOR_NUM;
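Once every layer has contributed its chunks, a single component can consume the aggregated vector together with the challenge rand and perform the range check described next. The component name here is hypothetical; its body corresponds to the check shown in the following snippet.
// Hypothetical wiring: one lookup-based range check over all aggregated chunks
component rangeCheck = LookupRangeCheck(CHUNK_NUM, CHUNK_SIZE);
rangeCheck.rand <== rand; // round-2 challenge, see the UltraGroth section below
rangeCheck.chunks <== chunks; // every chunk produced by every activation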
To complete this optimization, we must ensure each aggregated chunk falls within the expected range. This verification is done through a lookup argument.
Consider a lookup table $T$ representing the range $[0;\ 2^{\mathsf{CHUNK\_SIZE}})$. Our goal is to prove that each chunk in the vector $\mathbf{c} = (c_1, \dots, c_{\mathsf{CHUNK\_NUM}})$ exists in $T$. Thus, all chunk values must fall within the specified range.
To constrain this, we first count how many times each value appears among the chunks, creating a multiplicity vector $\mathbf{m} = (m_0, \dots, m_{2^{\mathsf{CHUNK\_SIZE}} - 1})$, where $m_j$ is the occurrence count of value $j$ in $\mathbf{c}$.
Finally, using a random challenge point $r$, we perform the following check:
$$
\sum_{i=1}^{\mathsf{CHUNK\_NUM}} \frac{1}{c_i + r} = \sum_{j=0}^{2^{\mathsf{CHUNK\_SIZE}} - 1} \frac{m_j}{j + r}.
$$
If this equality holds, all chunk values are confirmed to be within the lookup table. Detailed reasoning and proof are available in this paper.
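Here is a toy example of our own, with artificially small sizes: take the table $T = \{0, 1, 2, 3\}$ and chunks $\mathbf{c} = (2, 1, 2)$, so $\mathbf{m} = (0, 1, 2, 0)$. Then for any challenge $r$:
$$
\frac{1}{2 + r} + \frac{1}{1 + r} + \frac{1}{2 + r} = \frac{1}{1 + r} + \frac{2}{2 + r},
$$
whereas an out-of-range chunk such as $c_1 = 5$ would contribute a term $\frac{1}{5 + r}$ that no choice of multiplicities over $T$ can reproduce for a random $r$.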
This is how this check looks inside the Bionetta circuit:
signal input rand;
signal chunks[CHUNK_NUM];
// Aggregating chunks
// Counting m_i coefficients
var m_temp[1<<CHUNK_SIZE];
for (var i = 0; i < CHUNK_NUM; i++) {
m_temp[chunks[i]]++;
}
signal m[1<<CHUNK_SIZE];
m <-- m_temp;
var s1 = 0;
var s2 = 0;
signal inv1[CHUNK_NUM];
signal inv2[1<<CHUNK_SIZE];
signal prod[1<<CHUNK_SIZE];
// Counting s1
for (var i = 0; i < CHUNK_NUM; i++) {
inv1[i] <-- (chunks[i] + rand) != 0 ? 1 / (chunks[i] + rand) : 0;
inv1[i] * (chunks[i] + rand) === 1;
s1 += inv1[i];
}
// Counting s2
for (var i = 0; i < 1<<CHUNK_SIZE; i++) {
inv2[i] <-- (i + rand) != 0 ? 1 / (i + rand) : 0;
inv2[i] * (i + rand) === 1;
prod[i] <== inv2[i] * m[i];
s2 += prod[i];
}
s1 === s2;
The lookup argument introduced above hinges on a single field element $r$ (rand in the code). If the prover were free to choose $r$, it could maliciously pick a value that makes the equality $s_1 = s_2$ hold even when some chunk is outside the table $T$.
UltraGroth prevents this by deriving the challenge from a previously committed part of the witness. We split the private witness into two rounds: the first round holds the signals that do not depend on the challenge (the chunks and the multiplicities $m_j$), while the second round holds the signals that do (the inverses inv1, inv2 and the products prod from the snippet above).
After the prover computes the first-round commitment $[C_1]_1$, the verifier derives the challenge $r = \mathsf{Hash}([C_1]_1)$.
In Groth16, the verifier checks the proof using a single pairing equation:
$$
e([A]_1, [B]_2) = e([\alpha]_1, [\beta]_2) \cdot e\Big(\sum_i z_i [\mathsf{IC}_i]_1,\ [\gamma]_2\Big) \cdot e([C]_1, [\delta]_2),
$$
where $z_i$ are the public inputs and $[\mathsf{IC}_i]_1$ are the corresponding points from the verification key.
In UltraGroth, this equation changes slightly. Instead of a single commitment $[C]_1$, the prover sends two commitments: $[C_1]_1$ and $[C_2]_1$, one per round, each with its own $\delta$. The verifier updates the pairing check to:
$$
e([A]_1, [B]_2) = e([\alpha]_1, [\beta]_2) \cdot e\Big(\sum_i z_i [\mathsf{IC}_i]_1,\ [\gamma]_2\Big) \cdot e([C_1]_1, [\delta_1]_2) \cdot e([C_2]_1, [\delta_2]_2).
$$
Additionally, the verifier does not accept a random value from the prover. Instead, it derives the randomness on its own: $r = \mathsf{Hash}([C_1]_1)$. This derived value is then inserted into the public inputs commitment (i.e., into $\sum_i z_i [\mathsf{IC}_i]_1$) so that it's part of the final verification check.
To make the scheme practical, we modified the fastest open-source Groth16 stack rather than starting from scratch.
https://github.com/rarimo/ultragroth — UltraGroth prover (fork of https://github.com/iden3/rapidsnark).
https://github.com/rarimo/ultragroth-snarkjs — UltraGroth zkey and Solidity verifier generator (fork of https://github.com/iden3/snarkjs).
Note that UltraGroth is compatible with Groth16’s powers-of-tau, so you do not need a new trusted setup.
Let's estimate the savings. Assume the model has $3864$ activations (as in the zkLiveness example above), each with a $253$-bit input. Instead of checking each bit separately, we split inputs into chunks of size $15$ bits.
In the original method, every activation required $253$ constraints to perform a full bit decomposition. Thus, the total number of constraints was simply $3864 \cdot 253 = 977592$ constraints.
In the lookup-based method, the constraint cost comes from two parts. First, we need about $2^{15} = 32768$ constraints to handle the lookup table inverses sum (one inverse per table entry). Second, we split each activation into $\lfloor 253 / 15 \rfloor = 16$ chunks, resulting in an additional $3864 \cdot 16 = 61824$ constraints for the chunk inverses sum. Altogether, the new total constraint number is roughly:
$$
32768 + 61824 = 94592.
$$
Thus, we go from $\color{blue}{977592}$ down to roughly $94592$ constraints. This gives us more than a $10\times$ improvement, and even more for heavier circuits.
Figure 1 shows the number of constraints before and after applying lookup optimizations: the red graph represents the original Groth16 cost rising linearly with the number of activations, while the blue and green graphs correspond to chunk sizes of 14 and 15, respectively. Notice how both optimized graphs stay near the bottom of the chart, growing only slightly even as activations increase from 15000 to 30000.