ZK Section 9 — ETHSanFrancisco 2022

Introduction

Big data has provided tremendous insights that enable machine learning and neural networks. However, artificial intelligence models need extra caution when applied to real-world scenarios, as they can produce incorrect or misleading results in critical applications. AI model execution can also consume excessive resources, including memory, disk, and CPU/GPU. Worse, in most cases the results cannot be reproduced for verification. These shortcomings have led developers toward a generalized solution known as model quantization. After quantization, an AI model's inference execution becomes reproducible, and with a zero-knowledge proof (ZKP) system, that execution can even be verified easily and quickly, in constant time.

Currently, proving time in zk-SNARKs is too slow for real-world AI applications, and many projects have been working to solve this issue. vCNN[1] proposed a new efficient verifiable convolutional neural network framework to improve proving performance in 2020. The ZEN[2] team then proposed a SIMD compiler optimization that encodes multiple 8-bit integers in large finite field elements without overwhelming extraction cost. Meanwhile, zkCNN[3] created a ZKP scheme that allows the owner of a CNN model to prove to others that the prediction on a data sample was indeed computed by the model, without leaking any information about the model itself. Mystique[4] presented an improved ZK protocol for matrix multiplication that yielded a 7× improvement over the state of the art. In 2021, zk-ml[5] published a demo implementing a zk-SNARK circuit whose proof verified that a private model achieves a certain accuracy on a public dataset. In 2022, circomlib-ml[6] brought a machine learning library to circom, the well-known zk-SNARK language, and zk-mnist[7] demonstrated hand-drawn digit recognition in a blockchain DApp.

AIGC Design

AIGC, also known as AI-Generated Content, has come into fashion recently. These services provide AI-powered, deep-learning-based text-to-image generation. Projects like DALL-E 2, Stable Diffusion, and Midjourney all work on AIGC with slightly different focuses. Cortex Labs believes it would be beneficial to bring these AI advances into the world of Web3.

The three key stakeholders (AI creators, ML engineers, ZKP developers) do not talk to each other at the moment. Yet the combination of AI, ZKP, and blockchain is significant.

AI art is gaining traction and AI films have won major international awards, but AI creators lack guidance for Web3. Meanwhile, machine learning models that generate astonishing results are hard to train, and ML engineers do not understand the methods and requirements for converting an algorithm to a blockchain-friendly format. Lastly, ZKP developers lack visibility into the capabilities of AIGC models and face a performance/ecosystem/adoption gap.

At the ETHSanFrancisco 2022 hackathon, we proposed ZK Section 9 for on-chain AI-generated content (AIGC) with succinct ZK proofs. As of this writing, we have won the following prizes:

  • ETHSanFrancisco Finalist

  • ENS’s Integration Bounty

  • Optimism’s Top 10 Deployed

With ZK Section 9, one can generate an image with an AI model, prove the computation with a ZKP, and submit the output content to a blockchain contract. The blockchain can then easily verify that the image was generated by the composer. For Cortex Labs, ZK Section 9 is the cornerstone of our step toward zkCVM: a complete zkRollup solution for on-chain AI models.

Here is the presentation: https://docs.google.com/presentation/d/1CIxGfM_oySgWkgdNZqh-hoeH8C8zJDxnEpHx_VK6-KQ/.

Machine learning models are huge and divergent: huge for complex models, e.g. FastBERT at ~1800 MFLOPs (millions of floating-point operations); divergent across model types, e.g. NLP, CNN, GAN. ZKPs are hard to generate and operate: ceremonies, contributions, beacons, etc. And do not forget proving speed. Transforming huge and divergent AI models into a ZKP-friendly format is a challenging job. The translator and the subsequent on-chain proof of AI models raise several problems to solve. We have started to overcome these difficulties, though the solution is not yet perfect; more details are introduced in the next section.

Focus

As mentioned at the beginning, translating an AI model into ZKP circuits is challenging for developers. There are several key points to discuss.

Model Quantization

AI quantization has been developed and optimized over the years. Int8 quantization has become a popular approach under limited memory, computing resources, and power. The quantization trend extends beyond machine learning frameworks such as TensorFlow, PyTorch, and TVM to hardware toolchains: for example, Coral's USB Accelerator TPU, and int8-precision inference devices including Intel's DL Boost instructions, NVIDIA TensorRT, Xilinx DNNDK, and many other int8 edge devices for machine learning.

These requirements can be considerable, and much research has investigated them, but there is no common scheme for an all-in-one quantization framework. Many target devices and frameworks support model inference, yet each executor has different quantization goals. Most quantization schemes are not even fully integer-only, since they target acceleration instead.

TVM concentrates on target-device deployment, including quantization. TVM's internal Relay representation is a good starting point for quantized models, and its frontend connects with many ML frameworks, including ONNX and PyTorch.

Hence, we use PyTorch to train the AIGC model and transform it into TVM's Relay IR. Then we quantize the model and dump it to disk.
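For concreteness, here is a minimal sketch of that pipeline in Python. The toy model and file name are illustrative, and TVM's built-in Relay quantize pass stands in for the QAT flow used in the actual project:

```python
import torch
from tvm import relay

# Hypothetical toy generator; any torch.nn.Module can be traced the same way.
model = torch.nn.Sequential(torch.nn.Linear(64, 32 * 32)).eval()
example = torch.randn(1, 64)

# Trace the model and import it into TVM's Relay IR.
scripted = torch.jit.trace(model, example)
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 64))])

# Integer quantization with TVM's built-in Relay pass (a stand-in here for
# the quantization-aware training flow we actually used).
with relay.quantize.qconfig(skip_conv_layers=[]):
    qmod = relay.quantize.quantize(mod, params)

# Dump the quantized IR to disk for the translator to consume.
with open("model_ir.txt", "w") as f:
    f.write(qmod.astext())
```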

Translator

Previously published ZKML papers and projects are mostly model-specific schemes; they vary in their zk-SNARK implementations and cannot be adapted to arbitrary AI models. So we designed a uniform model-to-circom translation procedure:

  • Parse the TVM Relay IR, which is generated from a model framework such as PyTorch or TensorFlow.

  • Construct a temporary model representation, which can be dumped to or loaded from disk.

  • Generate consistent circom code for the zk-SNARK.

Diving into the translator, you will find it is not that simple. The parser must accept arbitrary IR representations dumped from TVM QAT models, and it is hard to balance model universality against error tolerance. For the self-constructed internal symbol, we refer to MxNet's symbol graph representation, which is simple and scales well for model transformations.
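As a rough illustration, the internal symbol could be as simple as the following Python sketch; the class and field names are ours, not the project's actual code, and are modeled loosely on MxNet's symbol graph:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Symbol:
    """One node of the internal graph: an operator plus its input symbols."""
    name: str
    op: str                                   # e.g. "conv2d", "dense", "relu"
    attrs: Dict[str, str] = field(default_factory=dict)
    inputs: List["Symbol"] = field(default_factory=list)

data = Symbol("data", "input", {"shape": "(1, 64)"})
fc = Symbol("fc1", "dense", {"units": "1024"}, inputs=[data])
out = Symbol("act1", "relu", inputs=[fc])

def ops(sym: Symbol) -> List[str]:
    """Post-order walk: the operator sequence a rewrite pass would visit."""
    seen: List[str] = []
    for inp in sym.inputs:
        seen += ops(inp)
    return seen + [sym.op]

print(ops(out))  # ['input', 'dense', 'relu']
```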

Most of the workload lies in the circom generator. First, parse all existing circom ML operators and store the necessary information about each, such as its dependent file path, operator arguments, and inputs/outputs. Then map the final internal symbols to the correct circom operator generators. Symbols and generators are not a one-to-one mapping, so we had to write valid mapping rules for all supported operators. The last step is to inject the appropriate circom code into the template.
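A hedged sketch of the mapping step, with an illustrative registry and template names (the real rules are more involved, since symbols and generators are not one-to-one):

```python
# Illustrative symbol-op -> circom-template registry; the real rules also
# track include paths and input/output wiring per operator.
CIRCOM_OPS = {
    "dense": "Dense",
    "relu": "ReLU",
}

def emit_component(name: str, op: str, args: list) -> str:
    """Render one `component` line of the generated circom main file."""
    template = CIRCOM_OPS[op]
    return f"component {name} = {template}({', '.join(map(str, args))});"

print(emit_component("fc1", "dense", [64, 1024]))
# -> component fc1 = Dense(64, 1024);
```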

Circom ML Library

In this competition, the target of our integer-only quantization is the set of ML circom operators, which play a role like Intel's DL Boost instructions. We referred to cvm-runtime when writing the ML-complete circom operators. The ZKP circom constraints differ slightly from, but mostly follow, the CVM operators' design.

Besides, we implemented a partial inference engine for the circom ML operators, which is necessary for verifying the correctness of the circom procedure.
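For example, a correctness check might run an integer-only NumPy version of an operator and compare it against the witness computed by the corresponding circom circuit. The operator below is a sketch with illustrative shapes and a power-of-two rescale:

```python
import numpy as np

def dense_int(x: np.ndarray, w: np.ndarray, b: np.ndarray, shift: int) -> np.ndarray:
    """Integer-only fully connected layer: accumulate in int64, rescale by 2**shift."""
    acc = x.astype(np.int64) @ w.astype(np.int64) + b.astype(np.int64)
    return (acc >> shift).astype(np.int32)

x = np.array([[1, 2, 3]], dtype=np.int8)
w = np.ones((3, 4), dtype=np.int8)
b = np.zeros(4, dtype=np.int32)
print(dense_int(x, w, b, shift=0))  # [[6 6 6 6]] -- must match the circuit output
```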

ZKP Optimization

ZKP constraints are too numerous to generate a proof within a limited time. Previous papers have designed many optimization methods, such as SIMD, but we did not adopt them due to the hackathon time limit. Instead, our attempts were made in the model translator, reducing the constraints in the generated circom.

Contract Optimization

For the last step, one more problem needs solving: the contract gas limit on the blockchain. An Ethereum block has a maximum gas limit of 8,000K to avoid Turing-halting and block-syncing-delay problems, but that is far from enough for a normal AIGC model generating an image of size 32 × 32. There are several solutions: one is using SIMD, like ZEN, to embed many output elements into one field element; another is truncating the model outputs into parts and deploying separate contracts across many blocks. We tried the latter method.
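A minimal sketch of the splitting idea, with illustrative sizes (each part would then get its own circuit and verifier contract):

```python
# Split the flattened model output into fixed-size parts so each part's
# verifier contract stays under the per-block gas limit; sizes are illustrative.
def split_outputs(outputs, part_size):
    """Yield consecutive slices of the flattened output tensor."""
    for i in range(0, len(outputs), part_size):
        yield outputs[i:i + part_size]

# A 32 * 32 image flattens to 1024 values; verify it in 8 parts of 128 each.
image = list(range(32 * 32))
parts = list(split_outputs(image, 128))
assert len(parts) == 8 and sum(len(p) for p in parts) == 1024
# Each part then gets its own circom circuit and verifier contract deployment.
```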

Future Work

We have made some key innovations in the AIGC design, but they are not enough for the final target. Some points are implemented and some were skipped or left imperfect due to the time limit. Here we list a few scalable works that can be extended in the future.

ZK-Friendly Quantization

In the ZK Section 9 project, quantization is applied with TVM and PyTorch's internal quantization-aware training (QAT) method. QAT targets specific devices such as ML accelerators, so this design does not adapt well to zero-knowledge proofs. We had to create a temporary symbol representation in our translator to apply many fusion passes for ZK-compatible quantization. These passes may include batch-size adjustment, shape adaptation for circom operators, more compact constant fusion, etc.

Existing ML frameworks have different backend representations and implementations. For our unified quantization goal, we hope a consolidated model intermediate representation (IR) emerges. Failing that, we will choose the one with the best universality, sufficient stability, and enough capacity, which points to ONNX. We had previously chosen NNVM because it is the oldest and most stable machine learning backend, but most modern AI frameworks have discarded it and adopted ONNX-compatible schemes for flexibility and capacity.

As for the quantization procedure, many papers have researched this topic. Quantization divides mainly into quantization-aware training (QAT) and post-training quantization (PTQ), according to when and at what cost the quantization happens. Cortex Labs has implemented the PTQ method in MRT and achieved a decent accuracy loss with int32-maximum quantization. In the future, porting the PTQ from MxNet to the ONNX format will be necessary, as will QAT implementations.

Inference Engine

In this project, we built a rough inference engine in Python, written with NumPy plus some MxNet ndarray reference code. This implementation is not sufficient for cross-validation, since it lacks detailed code examination and test data. Luckily, Cortex Labs has already published a well-tested integer-only inference engine, cvm-runtime, written in C++ with CPU, GPU, and formalization versions. In the future, more test cases and operators will be added to this inference engine for security and scalability.

Optimizations

ZKP-generated constraints are naturally enormous, which is a big problem for ZKP applications. The more constraints a circuit has, the more time and computing resources the proving procedure costs. A simple linear layer in an AI model can lead to ~10M constraints, and the subsequent snarkjs proving process consumed almost 200 GB of memory and an hour of proving time on a machine with a 96-core 2.7 GHz CPU and 1.5 TB of memory! Moreover, a naive ResNet has 18 layers, and layer constraints compose multiplicatively rather than additively. So optimizations are imperative, and fortunately previous researchers have put hard work into them.
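As a back-of-the-envelope check of that ~10M figure, assume each multiplication in a fully connected layer contributes roughly one R1CS constraint; the layer shape below is hypothetical:

```python
# Roughly one R1CS constraint per multiplication in an n-by-m linear layer,
# before any optimization. The shape is illustrative.
n_inputs, n_outputs = 1024, 10_000
constraints = n_inputs * n_outputs
print(f"{constraints:,}")  # 10,240,000 -- on the order of the ~10M we observed
```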

SIMD[2]

Embed multiple input elements into one big int256 field element. This makes use of stranded encodings for matrix multiplication.
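A sketch of the packing idea in Python, with an illustrative lane width that leaves headroom so dot-product partial sums cannot spill into neighboring lanes:

```python
# Lane width is illustrative: 8-bit payloads with headroom for accumulation.
LANE_BITS = 32

def pack(values):
    """Pack small unsigned ints into one big integer, one value per lane."""
    acc = 0
    for i, v in enumerate(values):
        acc |= int(v) << (i * LANE_BITS)
    return acc

def unpack(acc, n):
    """Recover n lane values from a packed integer."""
    mask = (1 << LANE_BITS) - 1
    return [(acc >> (i * LANE_BITS)) & mask for i in range(n)]

a = pack([3, 5, 7])
print(unpack(a * 2, 3))  # [6, 10, 14]: one big-int multiply scaled every lane
```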

Sign-Bit Grouping[2]

Use unsigned integer quantization instead of signed integers; it is more R1CS-friendly. ZKP uses elliptic curve cryptography (ECC) to generate a proof, and all operation data are 256-bit unsigned integers. Signed operators can be applied in ZKP, but their constraints are far larger than unsigned ones.
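A sketch of the idea as we understand it from ZEN; the zero-point of 128 is illustrative:

```python
import numpy as np

ZERO_POINT = 128  # illustrative zero-point

def to_unsigned(q_signed: np.ndarray) -> np.ndarray:
    """Map int8 values in [-128, 127] to uint8 values in [0, 255]."""
    return (q_signed.astype(np.int16) + ZERO_POINT).astype(np.uint8)

w = np.array([-128, -1, 0, 127], dtype=np.int8)
print(to_unsigned(w))  # [  0 127 128 255]
# In-circuit, a dot product over the shifted values plus a zero-point
# correction term reproduces the signed result without sign-bit decomposition.
```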

Remainder-Based Verification[2]

Division operators sometimes cannot be avoided in AI models, and division is more complex than addition, subtraction, or multiplication. Use an extra matrix R to store the division remainders, and utilize those remainders to avoid performing the division in-circuit.
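A sketch of the check in Python: the prover supplies the quotient and remainder as extra witnesses, and the circuit then only needs multiplication, addition, and a range check:

```python
def check_division(a: int, b: int, q: int, r: int) -> bool:
    """Constraints a verifier would enforce instead of computing a // b."""
    return a == q * b + r and 0 <= r < b

a, b = 1000, 7
q, r = divmod(a, b)          # computed off-circuit by the prover
assert check_division(a, b, q, r)
```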

Project Architecture

In this section, we briefly describe the project architecture. The GitHub directory structure is as follows:

├── circuits/
│ ├── Arithmetic.circom
│ ├── circomlib/
│ ├── circomlib-matrix/
│ ├── operators/
│ └── util.circom
├── contracts/
├── frontend/
├── integer_only_gan/
├── main.py
├── python/
└── README.md

Model training and quantization to TVM are located in integer_only_gan. The model translator and inference engine are located in python, with main.py as the command-line entry point (print usage with -h). Circom ML operators are located in circuits/operators. The frontend demo is located in frontend. Blockchain contract debugging and deployment code is located in contracts.

Future Applications

Our goal for this competition was AIGC, but the application future is much broader. Just as AI models have boundless room to grow, our project can profoundly expand the capabilities of blockchains:

  • (Shown) Vision models -> AIGC

  • Language models -> chatbot, writing assistant

  • Linear models and decision trees -> fraud detection, Sybil attack prevention

  • Multi-modal models -> recommender systems

And some scenarios are considered for reference:

  • Converting machine learning models to zero-knowledge proofs, enabling people in underdeveloped countries to generate income with on-chain AIGC or ZK-ML-enhanced trustless freelancing.

  • Governance tech in consensus is tricky. Our tools can enhance existing voting and Sybil resistance techniques by allowing an ML-based approach: imagine a self-evolving DAO smart contract powered by a neural network.

  • Wash-trading detectors in DEX.

  • Provable biometric IDs. Like in Worldcoin.

  • AI Oracles which can verify off-chain world data. Like steps, health data, environmental data, etc.

  • AI competitions. People can commit their model weights’ hash first and the inputs can be revealed later.

  • GameFi NPCs. We can have AI characters instead of plain old scripted NPCs.

  • Dynamic tokenomics. With AI-controlled tokenomics, we can have potentially better algo-stablecoins.

  • AI DAOs. AI can participate in a DAO's decision-making.

  • Automated traders. Although on-chain AI will most likely lose in competition with real-world hedge funds, it can be a showcase.

  • Anti-fraud in DeFi. Like in lending protocols and insurance protocols.

  • NFTs. Like what we’ve built in this project.

  • Self-evolving blockchains. Ultimately, ZKML can be used to determine crucial parameters of a blockchain, such as block interval, block size, and block rewards, based on collected data.

More possibilities are in your imagination.

References

  1. vCNN

  2. ZEN with GitHub

  3. zkCNN with GitHub

  4. Mystique

  5. zk-ml(2021)

  6. CircomLib-ML(2022)

  7. zk-mnist(2022) with Blog

  8. proto-neural-zkp(2022)

  9. 2PAI(2022)
