Implementing Lookups in Groth: The second secret ingredient to reduce Bionetta’s constraints
June 2nd, 2025

Illia Dovhopolyi, Artem Sdobnov, Dmytro Zakharov

Acknowledgement

The UltraGroth proving system explained here is a version of the protocol initially described in Merlin404's article on HackMD, adapted for real-world use cases. Huge credit to their work, which inspired much of what follows.

Prerequisites

We assume you already understand the basics of modern zero-knowledge (ZK) constructions. This article aims to build intuition around the UltraGroth proving system, so deep, formal knowledge of Groth16 isn't strictly necessary. However, familiarity with our previous blogs on the Bionetta framework will be beneficial, as examples here utilize Bionetta circuits.

Activation Functions are too expensive

During the development of Bionetta, one of our most significant challenges was optimizing the constraint cost of activation functions like ReLU. Each ReLU operation costs at least 253 constraints due to the bit decomposition of input signals. Since neural networks rely heavily on non-linear activations, this quickly becomes prohibitively expensive. As an example, consider our LeakyReLU implementation:

template LeakyReLUwithCutPrecision(...) {  
    signal input inp;  
    signal output out;  

    var absBits[253] = abs_in_bits(inp);

    signal absInBits[253];  

    var abs = 0;  
    var pow = 1;  
    for (var i = 0; i < 253; i++) {  
        absInBits[i] <-- absBits[i];  
        // We need to constrain each bit to be either 0 or 1 separately :(
        absInBits[i]*(1 - absInBits[i]) === 0;
        abs += pow * absInBits[i];  
        pow += pow;  
    }
    (abs - inp)*(abs + inp) === 0;

    // 3 more constraints for max(0, inp) and precision cut
}

Our "zkLiveness" model for detecting whether the input photo corresponds to the real person (and not to, say, a photo from someone’s device), requires 38643864 non-linear activations, totaling at least 3864×253=9775923864 \times 253 = 977592 constraints.

Initially, reducing constraints here seemed impossible: we couldn't lower the activation count or simplify the bit-checking. Further research showed that one prominent solution was introduced in PlonKish proving systems, where repetitive computation of the same function is verified using lookup tables.

However, we’re sticking with Groth16 because of its other Bionetta-specific benefits, like “zero-constraint” linear layers. To get the best of both worlds, we developed UltraGroth, a Groth-based proving system that brings lookups to our circuits 😎

Efficient Lookup Argument

The idea behind lookup tables is straightforward. Rather than checking each bit individually, we can split activation function inputs into larger chunks and verify them against a predefined lookup table containing all values from $0$ to $2^{\text{chunk\_size}} - 1$.

To do this, let’s first rewrite the LeakyReLU function from the previous example so that it splits the input signal into chunks and exposes them for a later range check:

template LookupReLU(CHUNK_NEW, CHUNK_OLD, CHUNK_SIZE) {  
    assert (CHUNK_NEW <= CHUNK_OLD);  
  
    signal input inp;  
    signal output out;  
  
    // Now the ReLU outputs chunks so we can handle range check later
    signal output absInChunks[253 \ CHUNK_SIZE];  
  
    var absChunks[253 \ CHUNK_SIZE];

    // Witness-only computation: recover |inp| from its bits using the same
    // abs_in_bits helper as in the previous snippet, then split it into
    // CHUNK_SIZE-bit chunks
    var absBits[253] = abs_in_bits(inp);
    var value = 0;
    var pw = 1;
    for (var i = 0; i < 253; i++) {
        value += pw * absBits[i];
        pw += pw;
    }

    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        absChunks[i] = (value >> (i * CHUNK_SIZE)) & ((1 << CHUNK_SIZE) - 1);
    }
  
    var abs = 0;  
    var pow = 1;  
    for (var i = 0; i < 253 \ CHUNK_SIZE; i++) {
        // We don't need to constrain each bit anymore
        absInChunks[i] <-- absChunks[i];
        abs += pow * absInChunks[i];  
        pow *= 1<<CHUNK_SIZE;  
    }  
    (abs - inp)*(abs + inp) === 0;

    // 3 more constraints for max(0, inp) and precision cut
}

In the top-level circuit, we combine all chunks from layers with non-linear activations into a single signal vector. Although each activation produces multiple chunks (CHUNK_FOR_NUM of them), their order doesn't matter here:

var CHUNK_SIZE = 15;
var CHUNK_FOR_NUM = 253 \ CHUNK_SIZE; // integer division, as in the template above
var CHUNK_NUM = /* sum calculated based on network layers */;
var CHUNK_BUSY = 0;  
signal chunks[CHUNK_NUM];  

// Some layer with non-linearities
component layer1Comp = LookupEDLightConv2D(...);  
layer1Comp.inp <== image;  

// Aggregating chunks from activation functions
for (var i = 0; i < layer_size * CHUNK_FOR_NUM; i++) {  
    chunks[i + CHUNK_BUSY] <== layer1Comp.chunks[i];  
}
CHUNK_BUSY += layer_size * CHUNK_FOR_NUM;
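
Subsequent layers with non-linear activations append their chunks in the same way, using CHUNK_BUSY as a running offset. A minimal sketch for a hypothetical second layer (layer2Comp and layer2_size are illustrative names, and we assume the previous layer exposes its activations via an out signal):

// Another layer with non-linearities (hypothetical example)
component layer2Comp = LookupEDLightConv2D(...);
layer2Comp.inp <== layer1Comp.out;

// Append its chunks right after the ones collected so far
for (var i = 0; i < layer2_size * CHUNK_FOR_NUM; i++) {
    chunks[i + CHUNK_BUSY] <== layer2Comp.chunks[i];
}
CHUNK_BUSY += layer2_size * CHUNK_FOR_NUM;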

To complete this optimization, we must ensure each aggregated chunk falls within the expected range. This verification is done through a lookup argument.

Consider a lookup table $L = (\ell_0, \ell_1, \ldots, \ell_n) = (0, 1, \ldots, 2^{\text{chunk\_size}} - 1)$ representing the range $[0;\ 2^{\text{chunk\_size}} - 1]$. Our goal is to prove that each chunk $c_i$ in the vector $C = (c_0, c_1, \ldots, c_m)$ exists in $L$, i.e., that all chunk values fall within the specified range.

To constrain this, we first count how many times each value appears among the chunks, creating a multiplicity vector $M = (m_0, m_1, \ldots, m_n)$, where $m_i$ is the number of occurrences of value $i$ in $C$.

Finally, using a random challenge point $\mathsf{rand}$, we perform the following check:

$$s_1 = \sum_{i=0}^{m} \frac{1}{\mathsf{rand} - c_i}, \qquad s_2 = \sum_{i=0}^{n} \frac{m_i}{\mathsf{rand} - \ell_i}, \qquad s_1 \stackrel{?}{=} s_2$$

If this equality holds, all chunk values are confirmed to be within the lookup table. Detailed reasoning and proof are available in this paper.
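
To build intuition, here is a toy instance with purely illustrative numbers (a 2-bit table, not the real parameters). Take $L = (0, 1, 2, 3)$ and chunks $C = (1, 3, 1)$, so the multiplicity vector is $M = (0, 2, 0, 1)$. Writing $r$ for $\mathsf{rand}$:

$$s_1 = \frac{1}{r - 1} + \frac{1}{r - 3} + \frac{1}{r - 1} = \frac{2}{r - 1} + \frac{1}{r - 3} = \frac{0}{r - 0} + \frac{2}{r - 1} + \frac{0}{r - 2} + \frac{1}{r - 3} = s_2,$$

so the check passes for any challenge. If instead one chunk were an out-of-range value such as 5, $s_1$ would gain a pole at $r = 5$ that the table side cannot reproduce, and the equality would fail for all but a negligible fraction of challenges.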

This is how this check looks inside the Bionetta circuit:

signal input rand;
signal chunks[CHUNK_NUM];  
  
// Aggregating chunks

// Counting m_i coefficients
var m_temp[1<<CHUNK_SIZE];  
for (var i = 0; i < CHUNK_NUM; i++) {  
   m_temp[chunks[i]]++;  
}  

signal m[1<<CHUNK_SIZE];  
m <-- m_temp;  

var s1 = 0;  
var s2 = 0;  
signal inv1[CHUNK_NUM];  
signal inv2[1<<CHUNK_SIZE];  
signal prod[1<<CHUNK_SIZE];

// Counting s1
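// Note: the circuit uses (chunks[i] + rand) rather than (rand - chunks[i]) as in
// the formula above; since rand is uniformly random, the two are equivalent as
// long as the same sign convention is used for the table side below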
for (var i = 0; i < CHUNK_NUM; i++) {  
   inv1[i] <-- (chunks[i] + rand) != 0 ? 1 / (chunks[i] + rand) : 0;  
   inv1[i] * (chunks[i] + rand) === 1;  
   s1 += inv1[i];  
}  

// Counting s2
for (var i = 0; i < 1<<CHUNK_SIZE; i++) {  
   inv2[i] <-- (i + rand) != 0 ? 1 / (i + rand) : 0;  
   inv2[i] * (i + rand) === 1;  
   prod[i] <== inv2[i] * m[i];  
   s2 += prod[i];  
}  

s1 === s2;

Why UltraGroth?

The lookup argument introduced above hinges on a single field element $\mathsf{rand}$. If the prover were free to choose $\mathsf{rand}$, it could maliciously pick a value for which $\sum_{i=0}^{m} \frac{1}{\mathsf{rand} - c_i} = \sum_{i=0}^{n} \frac{m_i}{\mathsf{rand} - \ell_i}$ holds even when some chunk $c_i$ lies outside the table $L$.

UltraGroth prevents this by deriving the challenge from a previously committed part of the witness. The private witness is split into two rounds: the first round holds all signals that do not depend on the challenge (the chunks and the multiplicities), while the second round holds the signals that can only be computed once the challenge is known (the inverses behind $s_1$ and $s_2$).

After the prover computes the first commitment $C_1$, the challenge is derived as $\mathsf{rand} = \mathrm{Hash}(C_1)$.
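
Schematically, the prover's flow looks like this (commitment details omitted; $w_1$ and $w_2$ denote the two parts of the private witness):

$$C_1 = \mathrm{Commit}(w_1), \qquad \mathsf{rand} = \mathrm{Hash}(C_1), \qquad C_2 = \mathrm{Commit}\big(w_2(\mathsf{rand})\big),$$

where $w_2(\mathsf{rand})$ emphasizes that the second-round signals can only be computed after the challenge is fixed.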

In Groth16, the verifier checks the proof using a single pairing equation:

$$e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C, [\delta]_2) \cdot e(IC, [\gamma]_2)$$

In UltraGroth, this equation changes slightly. Instead of a single commitment $C$, the prover sends two commitments, $C_1$ and $C_2$. The verifier updates the pairing check to:

$$e(A, B) = e([\alpha]_1, [\beta]_2) \cdot e(C_1, [\delta_1]_2) \cdot e(C_2, [\delta_2]_2) \cdot e(IC, [\gamma]_2)$$

Additionally, the verifier does not accept a random value from the prover. Instead, it derives the randomness on its own: $\mathsf{rand} = \mathrm{Hash}(C_1)$. This derived value is then inserted into the public-input commitment (i.e., into $IC$) so that it becomes part of the final verification check.
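
For reference, in standard Groth16 notation the public-input term is a linear combination of the public inputs $a_i$ over points baked into the verification key, so treating the derived challenge as one more public input is enough to bind it to the proof. A sketch, with $u_i, v_i, w_i$ the QAP polynomials and $\tau$ the secret evaluation point from the setup:

$$IC = \sum_{i \in \mathsf{public}} a_i \left[ \frac{\beta u_i(\tau) + \alpha v_i(\tau) + w_i(\tau)}{\gamma} \right]_1,$$

with one of the $a_i$ slots occupied by $\mathsf{rand}$.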

To make the scheme practical, we modified the fastest open-source Groth16 stack rather than starting from scratch.

Note that UltraGroth is compatible with Groth16’s powers-of-tau, so you do not need a new trusted setup.

How Many Constraints Does This Save?

Let's estimate the savings. Assume the model has $n = 3864$ activations, each with a $b = 253$-bit input. Instead of checking each bit separately, we split inputs into chunks of size $\ell = 15$ bits.

In the original method, every activation required $b$ constraints to perform a full bit decomposition. Thus, the total number of constraints was simply $n \times b = 3864 \times 253 = \color{blue}{977592}$ constraints.

In the lookup-based method, the constraint cost comes from two parts. First, we need $2 \times 2^{\ell}$ constraints to handle the sum of inverses over the lookup table. Second, we split each activation into $\lfloor b / \ell \rfloor$ chunks, resulting in an additional $n \times \lfloor b / \ell \rfloor$ constraints for the sum of chunk inverses. Altogether, the new total constraint count is:

$$2 \times 2^{\ell} + n \times \lfloor b / \ell \rfloor = 2 \times 2^{15} + 3864 \times \lfloor 253 / 15 \rfloor = 65536 + 61824 = \color{green}{127360}$$

Thus, we go from $\color{blue}{977592}$ down to $\color{green}{127360}$ constraints. This gives us more than a $7\times$ improvement, with even larger gains for heavier circuits.
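
More generally, the total cost as a function of the chunk size $\ell$ is

$$f(\ell) = 2 \cdot 2^{\ell} + n \left\lfloor \frac{b}{\ell} \right\rfloor,$$

so every extra bit of chunk size doubles the table term while only slightly shrinking the per-activation term. The sweet spot therefore depends on the number of activations $n$, which is why Figure 1 compares chunk sizes 14 and 15.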

Figure 1 shows the number of constraints before and after applying lookup optimizations: the red graph represents the original Groth16 cost rising linearly with the number of activations, while the blue and green graphs correspond to chunk sizes of 14 and 15, respectively. Notice how both optimized graphs stay near the bottom of the chart, growing only slightly even as activations increase from 15000 to 30000.

Figure 1. Number of constraints before and after applying lookup optimization.

Benchmarking UltraGroth
