Fox is an Ethereum zkRollup built on a ZK-EVM and its original ZK-FOAKS technology. ZK-FOAKS stands for Zero Knowledge - Fast Objective Argument of Knowledge, a proof system that is faster than STARKs, with linear proving time, sublinear verification time, transparency, and built-in proof recursion.
We believe that hardware acceleration will be an essential component in enabling blockchain protocols to achieve the scalability required for practical use. Today, blockchain protocols are limited by their ‘on-chain’ compute and storage capacities, as well as by their networking bandwidth. We believe that by improving ‘off-chain’ hardware and client-side proof generation, we can drastically improve the performance of blockchain networks.
The key reason ZK-Rollups need hardware acceleration is that generating a zero-knowledge proof (ZKP) is an expensive and complex operation, which leads to long proof generation times. On top of that, some operations are inherently unfriendly to ZK (for example, the bitwise operations used in SHA or Keccak), so proving them takes a long time even though they are very cheap on a classical computer. The proof generation process varies with the proof system, but the bottleneck is always either:
Multiplications over large vectors of numbers (field or group elements), specifically variable-base and fixed-base multi-scalar multiplications (MSMs); or
Fast Fourier Transforms (FFTs) and inverse FFTs (although there are techniques for FFT-free proof systems).
In systems with both FFT and MSM, about 70% of the time for generating proofs is spent on MSM, and the rest of the time is dominated by FFT.
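To make the 70/30 split concrete, the following is a rough Amdahl-style estimate in Python. It assumes, as a simplification not stated in the text, that the remaining 30% of proving time is entirely FFT; the function name `overall_speedup` is hypothetical and chosen only for illustration.

```python
def overall_speedup(msm_fraction: float, msm_speedup: float,
                    fft_speedup: float = 1.0) -> float:
    """Estimate the end-to-end proving speedup.

    msm_fraction: share of proving time spent in MSM (0.7 per the text).
    msm_speedup / fft_speedup: how much faster each stage becomes.
    Assumption: everything that is not MSM is FFT.
    """
    fft_fraction = 1.0 - msm_fraction
    new_time = msm_fraction / msm_speedup + fft_fraction / fft_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Accelerating MSM alone gives diminishing returns, because the
    # untouched FFT share caps the total speedup at 1 / 0.3.
    for s in (2, 10, 100):
        print(f"MSM {s}x faster -> proof {overall_speedup(0.7, s):.2f}x faster")
    # Accelerating both stages by the same factor scales the whole proof.
    print(f"both 10x faster -> proof {overall_speedup(0.7, 10, 10):.2f}x faster")
```

Under these assumptions, speeding up MSM alone can never make proving more than about 3.3 times faster, which is why both bottlenecks have to be addressed.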
Both MSM and FFT are slow, but both have ways to improve performance:
MSMs are embarrassingly parallel and can be accelerated by running them on multiple threads. But even on hundreds of cores, if each vector holds 2^25 elements (about 33 million, a conservative size for an application like a zkEVM), the multiplications can still take a long time. The same operations are repeated many times and consume most of the memory available on the device. In short, MSMs require a lot of memory and remain slow even when heavily parallelized.
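The parallel structure described above can be sketched in Python. This is a minimal illustration, not Fox's implementation: the additive group of integers modulo a prime stands in for an elliptic-curve group, and the bucket (Pippenger-style) method shown is one widely used MSM algorithm that the text does not itself prescribe. All names (`P`, `add`, `naive_msm`, `bucket_msm`, `chunked_msm`), the 4-bit window, and the 64-bit scalars are assumptions made for the example.

```python
import random

# Toy stand-in for an elliptic-curve group: integers under addition mod P.
# A real MSM would use curve points and curve addition instead.
P = 2**61 - 1


def add(a: int, b: int) -> int:
    """Group operation of the toy group."""
    return (a + b) % P


def naive_msm(scalars, points) -> int:
    """Reference result: sum of k_i * G_i, one scalar multiplication each."""
    acc = 0
    for k, pt in zip(scalars, points):
        acc = add(acc, (k * pt) % P)
    return acc


def bucket_msm(scalars, points, scalar_bits: int = 64, window: int = 4) -> int:
    """Bucket method: process scalars `window` bits at a time using only
    group additions, from the most significant window down."""
    result = 0
    num_windows = (scalar_bits + window - 1) // window
    mask = (1 << window) - 1
    for w in reversed(range(num_windows)):
        for _ in range(window):              # shift result by `window` doublings
            result = add(result, result)
        buckets = [0] * (1 << window)        # buckets[d] collects points with digit d
        for k, pt in zip(scalars, points):
            digit = (k >> (w * window)) & mask
            if digit:
                buckets[digit] = add(buckets[digit], pt)
        running, window_sum = 0, 0           # computes sum of d * buckets[d]
        for d in range(mask, 0, -1):
            running = add(running, buckets[d])
            window_sum = add(window_sum, running)
        result = add(result, window_sum)
    return result


def chunked_msm(scalars, points, chunks: int = 4) -> int:
    """Split the MSM into independent chunks. Each partial result could be
    computed on its own thread or device; here they run sequentially."""
    n = len(scalars)
    step = max(1, (n + chunks - 1) // chunks)
    total = 0
    for i in range(0, n, step):
        total = add(total, bucket_msm(scalars[i:i + step], points[i:i + step]))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    ks = [rng.getrandbits(64) for _ in range(1000)]
    gs = [rng.randrange(1, P) for _ in range(1000)]
    assert naive_msm(ks, gs) == bucket_msm(ks, gs) == chunked_msm(ks, gs)
```

The chunks never need to communicate until the final sum, which is what makes MSM easy to parallelize. The cost that remains is the sheer number of group additions and the memory for the points and buckets on every worker.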
FFTs rely heavily on frequent shuffling of data at runtime. This makes them hard to accelerate by distributing the load across a computing cluster, as DIZK shows. They also require a lot of bandwidth when run on hardware. Shuffling means that you need to load and unload elements "randomly" from, for example, a >100GB data set onto a hardware chip with 16GB of memory or less. While the operations on the hardware itself are very fast, the time it takes to load and unload data over the network can end up slowing operations down significantly.
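The shuffling can be seen in a small number-theoretic transform (NTT), the finite-field form of the FFT used by proof systems. The Python sketch below is illustrative only: the modulus 998244353 with primitive root 3 is a common textbook choice, not a field taken from the text, and the function names are hypothetical.

```python
MOD = 998244353  # NTT-friendly prime: 119 * 2**23 + 1 (assumed for illustration)
GEN = 3          # a primitive root modulo MOD


def bit_reverse_permute(a: list) -> None:
    """Reorder `a` in place so index i swaps with its bit-reversed index.
    This is the first 'shuffle': neighbours end up far apart."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]


def butterfly_strides(n: int) -> list:
    """Distance between the two elements combined at each stage."""
    return [1 << s for s in range(n.bit_length() - 1)]


def ntt(values, invert: bool = False) -> list:
    """Iterative radix-2 NTT; len(values) must be a power of two <= 2**23."""
    a = list(values)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    bit_reverse_permute(a)
    length = 2
    while length <= n:
        w_len = pow(GEN, (MOD - 1) // length, MOD)   # primitive length-th root
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)         # modular inverse
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                # Butterfly: pairs elements `half` apart; in the last
                # stage that distance is n/2, i.e. half the data set.
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


if __name__ == "__main__":
    # Multiply (1 + 2x) by (3 + 4x) through the transform domain.
    fa, fb = ntt([1, 2, 0, 0]), ntt([3, 4, 0, 0])
    product = ntt([x * y % MOD for x, y in zip(fa, fb)], invert=True)
    print(product)                # coefficients of 3 + 10x + 8x^2
    print(butterfly_strides(8))   # stage strides double each round
```

Every stage touches every element, and the stride doubles each round, so a data set that does not fit in device memory has to be streamed in and out repeatedly. That is the bandwidth problem described above.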
In short: MSMs have predictable memory access and allow a lot of parallelism, but they are still expensive because of the raw computation and the amount of memory required.
FFTs have random memory access, which makes them hardware-unfriendly and inherently difficult to run on distributed infrastructure.
What is the best hardware on which to run highly optimized MSM and FFT algorithms to speed up ZKP generation? Acceleration can be implemented on a variety of hardware: GPUs, FPGAs, or ASICs. But which is the best choice? We expect that the winning players in the market will be companies that focus on GPUs over ASICs or FPGAs.
However, if only one or a few ZK L1s or L2s eventually achieve dominant scale, and ZK proof systems stabilize around a single implementation, then the likelihood of ASICs outperforming FPGAs may be higher, and both may in turn win out over GPUs. But if that happens, it is probably still a few years away.
This hardware acceleration can take two forms: 1) better utilization of existing resources (e.g. CPU, GPU) through improved software and algorithmic implementations, and 2) the development of custom hardware (e.g. FPGA and ASIC) together with novel algorithms suited to its configuration and available resources.
We believe hardware acceleration will be an important part of enabling blockchain protocols to achieve the scalability required for real-world use. Today's blockchain protocols are limited by their "on-chain" computing and storage capacity, as well as by their network bandwidth. We believe we can dramatically improve the performance of blockchain networks by improving "off-chain" hardware and client-side proof generation.