Fox Hardware Implementation Plan

Fox is an Ethereum zkRollup using ZK-EVM and its original ZK-FOAKS technology. ZK-FOAKS stands for Zero Knowledge - Fast Objective Argument of Knowledge; it is faster than STARKs, with linear proof time, sublinear verification time, transparency, and built-in proof recursion.

Hardware implementation refers to GPU/FPGA/ASIC zkMiners. Different ZK proof systems are based on different cryptographic assumptions and target different computational problems. Their implementations are written in different programming languages and run on a variety of hardware. Arithmetic circuit complexity is a reasonable measure for comparing proof systems. The main parameters that affect the complexity of a proof system are circuit depth, circuit width (the number of gates in each "level" of the circuit), nondeterministic witness size, and multiplication complexity, i.e., the number of multiplication gates.
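To make these parameters concrete, a toy model (a hypothetical representation for illustration only, not Fox's actual circuit format) can describe an arithmetic circuit as layers of gates and compute the depth, width, and multiplication complexity directly:

```python
# Toy arithmetic-circuit model (hypothetical, for illustration only).
# Each gate is a tuple: (op, left_input, right_input). Layer 0 reads
# named inputs; later layers reference gate indices in the layer below.

circuit = [
    [("mul", "x0", "x1"), ("add", "x1", "x2")],  # layer 0
    [("mul", 0, 1)],                             # layer 1
]

depth = len(circuit)                              # number of layers
width = max(len(layer) for layer in circuit)      # gates in the widest layer
mul_complexity = sum(1 for layer in circuit
                     for op, _, _ in layer if op == "mul")

print(depth, width, mul_complexity)  # 2 2 2
```

Proof systems differ in which of these metrics dominates their cost, which is why the same circuit can favor different provers.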

Hardware implementation is required to deal with this arithmetic circuit complexity. Fox is trying to design a GPU-friendly algorithm and is likely to use GPUs as generalized hardware acceleration instead of FPGAs/ASICs. An ASIC circuit is "static": it brings composability issues, it requires strong circuit-design expertise, and the developer experience is poor, so ASIC is not the best choice for implementation in this respect. FPGA requires a smaller capital investment and a shorter manufacturing cycle, which makes it somewhat better than ASIC here. Considering that a huge number of Ethereum GPU miners are likely to migrate to zkRollups, GPU may be a better choice than FPGA.

CPU Implementation

The first step of hardware optimization is to write a simple implementation of the operation you would like to accelerate. This helps us accelerate the computation at the algorithm level before committing to any particular hardware.
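As a sketch of such a reference implementation, the snippet below implements a core field operation that ZK provers spend most of their time on: a multiply-accumulate over a prime field. The modulus is illustrative only, not one of Fox's actual parameters:

```python
# Deliberately simple reference implementation of prime-field
# multiply-accumulate, the kind of inner loop a prover accelerates.
# P is a Mersenne prime chosen purely for illustration.

P = 2**61 - 1

def field_mul_acc(a, b):
    """Sum of pairwise products of two equal-length vectors, mod P."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + x * y) % P
    return acc

print(field_mul_acc([1, 2, 3], [4, 5, 6]))  # 32
```

A scalar loop like this is easy to verify for correctness; only once its output is trusted does it make sense to port it to vectorized or parallel hardware.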

Python is often a good place to begin when implementing a new algorithm and the most commonly used package for numerical calculation is NumPy. NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations.
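The same multiply-accumulate can be expressed with NumPy array operations instead of a scalar loop. This sketch uses `dtype=object` so NumPy carries exact Python integers, since the illustrative 61-bit modulus would otherwise risk overflow in fixed-width integer dtypes:

```python
import numpy as np

# Vectorized prime-field multiply-accumulate. The modulus is
# illustrative only, not one of Fox's parameters. dtype=object keeps
# exact big-integer arithmetic at the cost of some NumPy speed.
P = 2**61 - 1

a = np.array([1, 2, 3], dtype=object)
b = np.array([4, 5, 6], dtype=object)

result = int(np.sum(a * b) % P)  # elementwise product, sum, then reduce
print(result)  # 32
```

Matching this result against the scalar reference implementation is a quick sanity check before moving the computation to a GPU.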

GPU Implementation

The graphics processing unit (GPU), as a specialized computer processor, addresses the computational demands of real-time, high-resolution 3D graphics. By 2012, GPUs had evolved into highly parallel multi-core systems that allow efficient manipulation of large blocks of data. This design is more effective than a CPU for algorithms that process large blocks of data in parallel.

Nvidia GPUs can be programmed in CUDA. CUDA is a parallel computing platform and programming model that allows software to use certain types of GPUs for general-purpose processing. It is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.

While GPUs are highly parallel, affordable, and easy to buy, copying data between host and device memory may incur a performance hit due to system-bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine). From a raw-efficiency standpoint, therefore, the GPU is not the best hardware candidate.

FPGA Implementation

FPGA is practical hardware for accelerating zero-knowledge proofs, since it can be both more efficient and lower in power consumption than a GPU.

To define the behavior of the FPGA, the user provides a design in a hardware description language (HDL) or as a schematic design. The most common HDLs are VHDL and Verilog as well as extensions such as SystemVerilog. However, in an attempt to reduce the complexity of designing in HDLs, which have been compared to the equivalent of assembly languages, there are moves to raise the abstraction level through the introduction of alternative languages. For example, the Constructing Hardware in a Scala Embedded Language (Chisel) is an open-source hardware description language (HDL) used to describe digital electronics and circuits at the register-transfer level.

Chisel is based on Scala as an embedded domain-specific language (DSL). Chisel inherits the object-oriented and functional programming aspects of Scala for describing digital hardware. Using Scala as a basis allows describing circuit generators. High quality, free access documentation exists in several languages.

Although Chisel is not yet a mainstream hardware description language, it has been explored by several companies and institutions. The most prominent use of Chisel is an implementation of the RISC-V instruction set, the open-source Rocket chip. Chisel is mentioned by the Defense Advanced Research Projects Agency (DARPA) as a technology to improve the efficiency of electronic design, allowing smaller teams to produce larger designs. Google has used Chisel to develop a Tensor Processing Unit for edge computing. Some developers prefer Chisel because it requires roughly five times less code and is much faster to develop in than Verilog.

ASIC Implementation

An application-specific integrated circuit (ASIC) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use. For example, a chip designed to run in a digital voice recorder or a high-efficiency video codec (e.g. AMD VCE) is an ASIC. ASICs can run at higher clock frequencies, so they are much faster than FPGAs, and at high production volumes their per-unit cost is lower.

Specialized "ASIC" circuits can be designed for different dApps. This is the most traditional way to use zero-knowledge proofs. By customizing the circuit design, the overhead of each dApp becomes smaller. However, since the circuit is "static", it brings composability issues, and since it requires strong circuit-design expertise, the developer experience is poor. ASIC is not the best choice for implementation in this respect.
