Implementing Lookups in Groth: The second secret ingredient to reduce Bionetta’s constraints
June 2nd, 2025

Illia Dovhopolyi, Artem Sdobnov, Dmytro Zakharov

Acknowledgement

The UltraGroth proving system explained here is a version of the protocol initially described in Merlin404's article on HackMD, adapted for real-world use cases. Huge credit to their work, which inspired much of what follows.

Prerequisites

We assume you already understand the basics of modern zero-knowledge (ZK) constructions. This article aims to build intuition around the UltraGroth proving system, so deep, formal knowledge of Groth16 isn't strictly necessary. However, familiarity with our previous blogs on the Bionetta framework will be beneficial, as examples here utilize Bionetta circuits.

Activation Functions are too expensive

During the development of Bionetta, one of our most significant challenges was optimizing the constraint cost of activation functions like ReLU. Each ReLU operation costs at least 253 constraints due to the bit decomposition of input signals. Since neural networks rely heavily on non-linear activations, this quickly becomes prohibitively expensive. As an example, consider our LeakyReLU implementation:

template LeakyReLUwithCutPrecision(...) {  
    signal input inp;  
    signal output out;  

    var absBits[253] = abs_in_bits(inp);

    signal absInBits[253];  

    var abs = 0;  
    var pow = 1;  
    for (var i = 0; i < 253; i++) {  
        absInBits[i] <-- absBits[i];  
        // We need to constrain each bit to be either 0 or 1 separately :(
        absInBits[i]*(1 - absInBits[i]) === 0;
        abs += pow * absInBits[i];  
        pow += pow;  
    }
    (abs - inp)*(abs + inp) === 0;

    // 3 more constraints for max(0, inp) and precision cut
}

Our "zkLiveness" model for detecting whether the input photo corresponds to the real person (and not to, say, a photo from someone’s device), requires 38643864 non-linear activations, totaling at least 3864×253=9775923864 \times 253 = 977592 constraints.

Initially, reducing constraints here seemed impossible: we couldn't lower the activation count or simplify the bit-checking. Further research showed that one prominent solution was introduced in PlonKish proving systems, where repetitive computation of the same function is verified using lookup tables.

However, we’re sticking with Groth16 because of its other Bionetta-specific benefits, like “zero-constraint” linear layers. To get the best of both worlds, we developed UltraGroth, a Groth-based proving system that brings lookups to our circuits 😎

Efficient Lookup Argument

The idea behind lookup tables is straightforward. Rather than checking each bit individually, we can split activation function inputs into larger chunks and verify them against a predefined lookup table containing all values from $0$ to $2^{\text{chunk\_size}} - 1$.

To do this, let’s first rewrite the LeakyReLU function from the previous example so that it splits the input signal into chunks and exposes them for a later range check:

template LookupReLU(CHUNK_NEW, CHUNK_OLD, CHUNK_SIZE) {  
    assert (CHUNK_NEW <= CHUNK_OLD);  
  
    signal input inp;  
    signal output out;  
  
    // Now the ReLU outputs chunks so we can handle range check later
    signal output absInChunks[253 \ CHUNK_SIZE];  
  
    var absChunks[253 \ CHUNK_SIZE];

    // Witness-only computation: recover |inp| from its bits using the same
    // abs_in_bits helper as in the previous snippet, then split it into
    // CHUNK_SIZE-bit chunks
    var absBits[253] = abs_in_bits(inp);
    var value = 0;
    var pw = 1;
    for (var i = 0; i < 253; i++) {
        value += pw * absBits[i];
        pw += pw;
    }

    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        absChunks[i] = (value >> (i * CHUNK_SIZE)) & ((1 << CHUNK_SIZE) - 1);
    }
  
    var abs = 0;  
    var pow = 1;  
    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        // We don't need to constrain each bit anymore
        absInChunks[i] <-- absChunks[i];
        abs += pow * absInChunks[i];  
        pow *= 1<<CHUNK_SIZE;  
    }  
    (abs - inp)*(abs + inp) === 0;

    // 3 more constraints for max(0, inp) and precision cut
}

In the top-level circuit, we combine all chunks from layers with non-linear activations into a single signal vector. Although each activation produces multiple chunks (CHUNK_FOR_NUM of them), their order doesn't matter here:

var CHUNK_SIZE = 15;
var CHUNK_FOR_NUM = 253 \ CHUNK_SIZE; // integer division, as in the template above
var CHUNK_NUM = /* sum calculated based on network layers */;
var CHUNK_BUSY = 0;  
signal chunks[CHUNK_NUM];  

// Some layer with non-linearities
component layer1Comp = LookupEDLightConv2D(...);  
layer1Comp.inp <== image;  

// Aggregating chunks from activation functions
for (var i = 0; i < layer_size * CHUNK_FOR_NUM; i++) {  
    chunks[i + CHUNK_BUSY] <== layer1Comp.chunks[i];  
}
CHUNK_BUSY += layer_size * CHUNK_FOR_NUM;
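
Subsequent layers with non-linear activations append their chunks in the same way, using CHUNK_BUSY as a running offset. A minimal sketch for a hypothetical second layer (layer2Comp and layer2_size are illustrative names, and we assume the previous layer exposes its activations via an out signal):

// Another layer with non-linearities (hypothetical example)
component layer2Comp = LookupEDLightConv2D(...);
layer2Comp.inp <== layer1Comp.out;

// Append its chunks right after the ones collected so far
for (var i = 0; i < layer2_size * CHUNK_FOR_NUM; i++) {
    chunks[i + CHUNK_BUSY] <== layer2Comp.chunks[i];
}
CHUNK_BUSY += layer2_size * CHUNK_FOR_NUM;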

To complete this optimization, we must ensure each aggregated chunk falls within the expected range. This verification is done through a lookup argument.

Consider a lookup table $L = (\ell_0, \ell_1, \ldots, \ell_n) = (0, 1, \ldots, 2^{\text{chunk\_size}} - 1)$ representing the range $[0;\ 2^{\text{chunk\_size}} - 1]$. Our goal is to prove that each chunk $c_i$ in the vector $C = (c_0, c_1, \ldots, c_m)$ exists in $L$, i.e., that all chunk values fall within the specified range.

To constrain this, we first count how many times each value appears among the chunks, creating a multiplicity vector $M = (m_0, m_1, \ldots, m_n)$, where $m_i$ is the number of occurrences of value $i$ in $C$.

Finally, using a random challenge point $\mathsf{rand}$, we perform the following check:

$$s_1 = \sum_{i=0}^{m} \frac{1}{\mathsf{rand} - c_i}, \qquad s_2 = \sum_{i=0}^{n} \frac{m_i}{\mathsf{rand} - \ell_i}, \qquad s_1 \stackrel{?}{=} s_2$$

If this equality holds, all chunk values are confirmed to be within the lookup table. Detailed reasoning and proof are available in this paper.
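
To build intuition, here is a toy instance with purely illustrative numbers (a 2-bit table, not the real parameters). Take $L = (0, 1, 2, 3)$ and chunks $C = (1, 3, 1)$, so the multiplicity vector is $M = (0, 2, 0, 1)$. Writing $r$ for $\mathsf{rand}$:

$$s_1 = \frac{1}{r - 1} + \frac{1}{r - 3} + \frac{1}{r - 1} = \frac{2}{r - 1} + \frac{1}{r - 3} = \frac{0}{r - 0} + \frac{2}{r - 1} + \frac{0}{r - 2} + \frac{1}{r - 3} = s_2,$$

so the check passes for any challenge. If instead one chunk were an out-of-range value such as 5, $s_1$ would gain a pole at $r = 5$ that the table side cannot reproduce, and the equality would fail for all but a negligible fraction of challenges.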

This is how this check looks inside the Bionetta circuit:

signal input rand;
signal chunks[CHUNK_NUM];  
  
// Aggregating chunks

// Counting m_i coefficients
var m_temp[1<<CHUNK_SIZE];  
for (var i = 0; i < CHUNK_NUM; i++) {  
   m_temp[chunks[i]]++;  
}  

signal m[1<<CHUNK_SIZE];  
m <-- m_temp;  

var s1 = 0;  
var s2 = 0;  
signal inv1[CHUNK_NUM];  
signal inv2[1<<CHUNK_SIZE];  
signal prod[1<<CHUNK_SIZE];

// Counting s1
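// Note: the circuit uses (chunks[i] + rand) rather than (rand - chunks[i]) as in
// the formula above; since rand is uniformly random, the two are equivalent as
// long as the same sign convention is used for the table side below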
for (var i = 0; i < CHUNK_NUM; i++) {  
   inv1[i] <-- (chunks[i] + rand) != 0 ? 1 / (chunks[i] + rand) : 0;  
   inv1[i] * (chunks[i] + rand) === 1;  
   s1 += inv1[i];  
}  

// Counting s2
for (var i = 0; i < 1<<CHUNK_SIZE; i++) {  
   inv2[i] <-- (i + rand) != 0 ? 1 / (i + rand) : 0;  
   inv2[i] * (i + rand) === 1;  
   prod[i] <== inv2[i] * m[i];  
   s2 += prod[i];  
}  

s1 === s2;

Why UltraGroth?

The lookup argument introduced above hinges on a single field element $\mathsf{rand}$. If the prover were free to choose $\mathsf{rand}$, it could maliciously pick a value for which $\sum_{i=0}^{m} \frac{1}{\mathsf{rand} - c_i} = \sum_{i=0}^{n} \frac{m_i}{\mathsf{rand} - \ell_i}$ holds even when some chunk $c_i$ lies outside the table $L$.

UltraGroth prevents this by deriving the challenge from a previously committed part of the witness. The private witness is split into two rounds: the first round holds all signals that do not depend on the challenge (the chunks and the multiplicities), while the second round holds the signals that can only be computed once the challenge is known (the inverses behind $s_1$ and $s_2$).

After the prover computes the first commitment $C_1$, the challenge is derived as $\mathsf{rand} = \mathrm{Hash}(C_1)$.
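
Schematically, the prover's flow looks like this (commitment details omitted; $w_1$ and $w_2$ denote the two parts of the private witness):

$$C_1 = \mathrm{Commit}(w_1), \qquad \mathsf{rand} = \mathrm{Hash}(C_1), \qquad C_2 = \mathrm{Commit}\big(w_2(\mathsf{rand})\big),$$

where $w_2(\mathsf{rand})$ emphasizes that the second-round signals can only be computed after the challenge is fixed.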

In Groth16, the verifier checks the proof using a single pairing equation:

$$e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C, [\delta]_2) \cdot e(IC, [\gamma]_2)$$

In UltraGroth, this equation changes slightly. Instead of a single commitment $C$, the prover sends two commitments, $C_1$ and $C_2$. The verifier updates the pairing check to:

$$e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C_1, [\delta_1]_2) \cdot e(C_2, [\delta_2]_2) \cdot e(IC, [\gamma]_2)$$

Additionally, the verifier does not accept a random value from the prover. Instead, it derives the randomness on its own: $\mathsf{rand} = \mathrm{Hash}(C_1)$. This derived value is then inserted into the public-input commitment (i.e., into $IC$) so that it becomes part of the final verification check.
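
For reference, in standard Groth16 notation the public-input term is a linear combination of the public inputs $a_i$ over points baked into the verification key, so treating the derived challenge as one more public input is enough to bind it to the proof. A sketch, with $u_i, v_i, w_i$ the QAP polynomials and $\tau$ the secret evaluation point from the setup:

$$IC = \sum_{i \in \mathsf{public}} a_i \left[ \frac{\beta u_i(\tau) + \alpha v_i(\tau) + w_i(\tau)}{\gamma} \right]_1,$$

with one of the $a_i$ slots occupied by $\mathsf{rand}$.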

To make the scheme practical, we modified the fastest open-source Groth16 stack rather than starting from scratch.

Note that UltraGroth is compatible with Groth16’s powers-of-tau, so you do not need a new trusted setup.

How Many Constraints Does This Save?

Let's estimate the savings. Assume the model has $n = 3864$ activations, each with a $b = 253$-bit input. Instead of checking each bit separately, we split inputs into chunks of size $\ell = 15$ bits.

In the original method, every activation required $b$ constraints to perform a full bit decomposition. Thus, the total number of constraints was simply $n \times b = 3864 \times 253 = \color{blue}{977592}$ constraints.

In the lookup-based method, the constraint cost comes from two parts. First, we need $2 \times 2^{\ell}$ constraints to handle the sum of inverses over the lookup table. Second, we split each activation into $\lfloor b / \ell \rfloor$ chunks, resulting in an additional $n \times \lfloor b / \ell \rfloor$ constraints for the sum of chunk inverses. Altogether, the new total constraint count is:

$$2 \times 2^{\ell} + n \times \lfloor b / \ell \rfloor = 2 \times 2^{15} + 3864 \times \lfloor 253 / 15 \rfloor = 65536 + 61824 = \color{green}{127360}$$

Thus, we go from $\color{blue}{977592}$ down to $\color{green}{127360}$ constraints. This gives us more than a $7\times$ improvement, with even larger gains for heavier circuits.
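
More generally, the total cost as a function of the chunk size $\ell$ is

$$f(\ell) = 2 \cdot 2^{\ell} + n \left\lfloor \frac{b}{\ell} \right\rfloor,$$

so every extra bit of chunk size doubles the table term while only slightly shrinking the per-activation term. The sweet spot therefore depends on the number of activations $n$, which is why Figure 1 compares chunk sizes 14 and 15.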

Figure 1 shows the number of constraints before and after applying lookup optimizations: the red graph represents the original Groth16 cost rising linearly with the number of activations, while the blue and green graphs correspond to chunk sizes of 14 and 15, respectively. Notice how both optimized graphs stay near the bottom of the chart, growing only slightly even as activations increase from 15000 to 30000.

Figure 1. Number of constraints before and after applying lookup optimization.

Benchmarking UltraGroth
