Proving machine learning (ML) model inference via zkSNARKs is poised to be one of the most significant advancements in smart contracts this decade. This development opens up an excitingly large design space, allowing applications and infrastructure to evolve into more complex and intelligent systems.
By adding ML capabilities, smart contracts can be made more autonomous and dynamic, allowing them to make decisions based on real-time on-chain data, rather than just static rules. Smart contracts would be flexible and adaptable to various scenarios, including those that may not have been anticipated when the contract was initially created. In short, ML capabilities will broaden the automation, accuracy, efficiency, and flexibility of any smart contract we put on-chain.
In many ways, it is surprising that smart contracts do not use embedded ML models today given the prominence of ML in a majority of applications outside of web3. This absence of ML was largely due to the high computational cost of running these models on-chain. For example, FastBERT, a compute-optimized language model, uses ~1800 MFLOPS (million floating point operations), which would be infeasible to run directly on the EVM.
When considering the application of ML models on-chain, the primary focus is with the inference phase: applying the model to make predictions over real-world data. In order to have ML-scaled smart contracts, contracts must be able to ingest such predictions, but as we mentioned earlier, it’s infeasible to run the model directly on the EVM. zkSNARKs give us a solution: anyone can run a model off-chain and generate a succinct and verifiable proof showing that the intended model did in fact produce a particular result. This proof can be published on-chain and ingested by smart contracts to amplify their intelligence.
In this piece, we will:
Review the potential applications and use cases for on-chain ML
Explore the emerging projects and infrastructure building at the heart of zkML
Discuss some of the challenges of existing implementations and what the future of zkML could look like
Machine learning (ML) is a subfield of artificial intelligence (AI) that focuses on developing algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data. ML models typically have three primary components:
Training Data: A set of input data that is used to train a machine learning algorithm to make predictions or classify new data. Training data can take many forms, such as images, text, audio, numerical data, or a combination of these.
Model Architecture: The overall structure or design of a machine learning model. It defines the types and number of layers, activation functions, and connections between nodes or neurons. The choice of architecture depends on the specific problem and data being used.
Model Parameters: Values or weights that the model learns during the training process to make predictions. These values are adjusted iteratively through an optimization algorithm to minimize the error between predicted and actual outcomes.
Models are produced and deployed in two phases:
Training Phase: During training, the model is exposed to a labeled dataset and adjusts its parameters to minimize the error between predicted and actual outcomes. The training process typically involves several iterations or epochs, and the accuracy of the model is evaluated on a separate validation set.
Inference Phase: The inference phase is when a trained machine learning model is used to make predictions on new, unseen data. The model takes in input data and applies the learned parameters to generate an output, such as a classification or regression prediction.
zkML today is primarily focused on the inference stage of ML models rather than on the training phase primarily due to the computational complexity of verifying training in-circuit. zkML’s focus on verifying inference is not a limitation though: we expect some very interesting use cases and applications to be produced from the inference phase.
There are four possible scenarios for verified inference:
Private Input, Public Model. A Model Consumer (MC) may wish to keep their inputs private from a Model Provider (MP). For example, a MC may wish to prove the result of a credit-scoring model to a lender without disclosing their personal financial information. This could be done using a pre-commitment scheme and running the model locally.
Public Input, Private Model. A common issue with ML-as-a-Service is that the MP may wish to hide their parameters or weights to protect their IP, and the MC wants to verify that the generated inferences do in fact come from the specified model in an adversarial setting. Think of it this way: a MP has an incentive to run a lighter model to save on costs when serving inferences to a MC. Using a commitment of the model weights on-chain, a MC can audit a private model at any time.
Private Input, Private Model. This scenario arises when the data being used for inference is highly sensitive or confidential, and the model itself is hidden to protect IP. An example of this might include auditing a healthcare model using private patient information. Compositional techniques in zk or use of multi-party computation (MPC) or variations of FHE can be used to service this scenario.
Public Input, Public Model. When all aspects of the model can be public, zkML solves for a different use case: compression and verification of off-chain computation to an on-chain environment. For larger models, it is more cost-effective to verify a succinct zk proof of an inference than to re-run the model themselves.
Verified ML inference opens up a new design space for smart contracts. Some crypto-native applications include:
Verifiable Off-Chain ML Oracles. Continued adoption of generative AI may help push industries to implement signature schemes for their content (e.g. a news publication signing an article or image). Signed data is ZK-ready and makes the data composable and trustworthy. ML models can process this signed data off-chain to make predictions and classifications (e.g. classifying the outcome of an election or a weather event). These off-chain ML oracles could be used to trustlessly settle real-world prediction markets, insurance protocol contracts, and more by verifying the inference and publishing the proof on-chain.
ML-parameterized DeFi applications. Many facets of DeFi could be more automated. For example, a lending protocol could use an ML model to update parameters in real time. Today, lending protocols primarily trust off-chain models run by organizations to determine the collateral factor, LTV, liquidation threshold, etc., but a better alternative could be community-trained open-source models that can be run and verified by anyone.
Automated trading strategies. A common way of showcasing the return profile of a financial model strategy is for the MP to provide various backtests to investors. However, there is no way to verify the strategist is following the model when executing a trade - instead, the investor must trust the strategist is actually following the model. zkML offers a solution where the MP can provide proof of the financial model inference when deploying into a specific position. This could be particularly useful for DeFi managed vaults.
A decentralized, trustless implementation of Kaggle. A protocol or marketplace could be created that allows a MC or other interested party to verify the accuracy of a model without the MP disclosing the model weights. This can be useful for selling models, running competitions around model accuracy, etc.
Decentralized prompt marketplaces for generative AI. Prompt creation for generative AI has evolved into a sophisticated craft where the best output generating prompts are often complex with many modifiers. External parties could be willing to purchase these sophisticated prompts from a creator. zkML can be used in two ways here: 1) verifying the outputs of a prompt to ensure to a potential purchaser that the prompt does in fact create the desired images, and 2) allowing the prompt owner to maintain ownership of the prompt after a purchase by keeping it obfuscated from the purchaser while still generating verified images for them.
Replacing the private key with privacy-preserving biometric authentication. Private key management remains one of the largest frictions in web3 UX. Abstracting the private key via facial recognition or other unique factors is one possible solution from zkML.
Fair airdrops and contributor rewards. An ML model could be used to create detailed personas of users that determine airdrop allocations or contribution rewards based on multiple factors. This can be particularly powerful when combined with an identity solution. In this scenario, one possibility could be having users run an open-source model that evaluates their participation in an application along with higher context participation like governance forum posts to infer an allocation for them. They then provide this proof to the contract to receive their token allocation.
Filtering for web3 social media. The decentralized nature of web3 social applications will result in higher volumes of spam and malicious content. Ideally, a social media platform could use an open-source ML model that is agreed upon by the community and publish proofs of the model’s inference when electing to filter a post. For more on this, check out Daniel Kang’s zkML analysis of the Twitter algorithm here.
Advertising / Recommendations. As a social media user, I may be willing to view personalized advertisements but wish to keep my preferences and interests private from advertisers. I could elect to run a model on my tastes locally that feeds into media applications to serve me content. In doing this, advertisers in this scenario may be willing to pay for the end-user to do this, however it is likely these models would be much less sophisticated than targeted advertising models in production today.
In-game Economic Rebalancing. ML models could be used to dynamically adjust token issuance, supply, burn, voting thresholds, etc. One possible model would be an incentivized contract that rebalances an in-game economy if a certain rebalancing threshold is met and a proof of inference is verified.
New types of on-chain games. Co-operative human versus AI games and other innovative takes on on-chain games could be created where a trustless AI model serves as a non-playable character. Each action the NPC takes is published to the chain with a proof that anyone can verify to determine the correct model is being run. In the case of Modulus Labs’ Leela vs. the World, the verifier wants to ensure the stated 1900 ELO AI is picking the chess moves and not Magnus Carlson. Another example is AI Arena, a Super Smash Brothers style AI fighting game. Players in high stakes competitive environments would want to be sure that there is no interference or cheating involved with their trained models.
The ecosystem for zkML can be broadly broken down into four primary categories:
Model-to-Proof Compilers: Infrastructure that compiles a model from an existing format (e.g. Pytorch, ONNX, etc) into a verifiable computational circuit.
Generalized Proving Systems: Proving systems built to verify an arbitrary computational trace.
zkML-Specific Proving Systems: Proving systems specifically built to verify an ML model’s computational trace.
Applications: Projects working on unique zkML use cases.
When looking at the zkML ecosystem, most attention has been given to creating model-to-proof compilers. Generally, these compilers translate high-level ML models written in Pytorch, Tensorflow, or the like into zk circuits.
EZKL is a library and command-line tool for doing inference for deep learning models in a zk-SNARK. With EZKL, you can define a computational graph in Pytorch or TensorFlow, export it as a ONNX file with some sample inputs in a JSON file, and point EZKL to these files to generate a zkSNARK circuit. With the latest round of performance improvements, EZKL can now prove an MNIST-sized model in about 2 seconds and 700MB of RAM. EZKL has seen some significant early adoption thus far, being used as infrastructure for various hackathon projects to date.
Cathie So’s circomlib-ml library contains various ML circuit templates for Circom. Circuits include some of the most common ML functions. Keras2circom, also built by Cathie, is a python tool that transpiles a Keras model into a Circom circuit using the underlying circomlib-ml library.
LinearA has developed two frameworks for zkML: Tachikoma and Uchikoma. Tachikoma is used to convert neural networks into an integer-only form and generate a computational trace. Uchikoma is a tool that converts TVM's intermediate representation into programming languages that don't support floating point operations. LinearA plans to support Circom, which uses field arithmetic, and Solidity, which uses signed and unsigned integer arithmetic.
Daniel Kang’s zkml is a framework for constructing proofs of ML model execution in ZK-SNARKs based on his work in the Scaling up Trustless DNN Inference with Zero-Knowledge Proofs paper. As of the time of this writing, it is capable of proving an MNIST circuit using around 5GB of memory and around 16 seconds to run.
On the more generalized model-to-proof compilers, there is both Nil Foundation and Risc Zero. Nil Foundation’s zkLLVM is an LLVM-based circuit compiler and is capable of verifying computational models written in popular programming languages such as C++, Rust, and JavaScript/TypeScript, among others. It is generalized infrastructure versus some of the other model-to-proof compilers noted here, but it is still suitable for complex computations like zkML. This can be particularly powerful when combined with their proof market.
Risc Zero has built a general-purpose zkVM tagetting theopen-source RISC-V instruction set, hence supporting existing mature languages such as C++ and Rust, as well as the LLVM toolchain. This allows for seamless integration between host and guest zkVM code, similar to Nvidia's CUDA C++ toolchain, but with a ZKP engine in place of a GPU. Similar to Nil, it is possible to verify the computational trace of an ML model using Risc Zero.
Improvements in proving systems have been the primary enabler in bringing zkML to fruition, specifically the introduction of custom gates and lookup tables. This is primarily due to ML’s reliance on nonlinearities. In short, nonlinearities are introduced through activation functions (e.g. ReLU, sigmoid, and tanh), which are applied to the outputs of linear transformations within a neural network. These nonlinearities are challenging to implement in zk circuits due to limitations around mathematical operation gates. Bitwise decomposition and lookup tables can help solve this problem by precomputing the possible outcomes of a nonlinearity into a lookup table, which interestingly is much more computationally efficient in zk.
Plonkish proof systems tend to be the most popular backends for zkML for this reason. Halo2 and Plonky2 with their table-style arithmetization scheme can handle neural network non-linearities well via lookup arguments. In addition, the former has a vibrant developer tooling ecosystem coupled with flexibility, making it the de facto backend for many projects including EZKL.
Other proof systems have their benefits as well. R1CS-based proof systems include Groth16 for its small proof sizes and Gemini for its handling of extremely large circuits and linear time prover. STARK-based systems like the Winterfell prover/verifier library are also useful especially when implemented via Giza’s tooling that takes a Cairo program’s trace as an input and generates a STARK proof using Winterfell to attest to the correctness of the output.
Some progress has been made in designing efficient proof systems that can handle the complex, circuit-unfriendly operations of advanced ML models. Systems like zkCNN, which is based on the GKR proving system, or the use of compositional techniques in the case of systems like Zator are often more performant than their generalized counterparts as evidenced by the Modulus Labs’ benchmarking report.
zkCNN is a method for proving the correctness of convolutional neural networks using zero knowledge proofs. It uses a sumcheck protocol to prove fast Fourier transforms and convolutions with a linear prover time that is faster than computing the result asymptotically. Several improvements and generalizations have been introduced for interactive proofs, including verifying the convolutional layer, the ReLU activation function, and max pooling. zkCNN is particularly interesting given Modulus Labs’ report on benchmarking where they found it outperformed other generalized proving systems in both proof generation speed and RAM consumption.
Zator is a project aimed at exploring the use of recursive SNARKs to verify deep neural networks. The current constraint for verifying deeper models is fitting the entire computation trace into a single circuit. Zator proposes verifying one layer at a time using recursive SNARKs, which can verify N-step repeated computation incrementally. They use Nova to reduce N instances of computation into a single instance that can be verified at the cost of a single step. With this approach, Zator was able to snark a network with 512 layers, which is as deep as most production AI models today. Proof generation and verifying time for Zator are still too large for mainstream use cases, but their compositional technique is interesting nonetheless.
Much of the focus of zkML given its early stage has been on the above infrastructure. However, there are a few projects working on applications today.
Modulus Labs is one of the most diverse projects within the zkML space, working on both example applications and relevant research. On the application side, Modulus Labs has demonstrated zkML use cases through RockyBot - an on-chain trading bot - and Leela vs. the World - a chess game where all of humanity plays against a verified, on-chain instance of the Leela chess engine. The team has also branched into research, writing The Cost of Intelligence which benchmarked various proving systems for the speed and efficiency across differing model sizes.
Worldcoin is applying zkML in its attempt to make a privacy-preserving proof of personhood protocol. Worldcoin is using custom hardware to process high-resolution iris scans which are inserted into their Semaphore implementation. This can then be used to perform useful operations like membership attestation and voting. They currently use a trusted runtime environment with a secure enclave to verify the camera-signed iris scan, but they ultimately aim to use a ZKP to attest to the correct inference of the neural network for cryptographic-level security guarantees.
Giza is a protocol that enables the deployment of AI models on-chain using a fully trustless approach. It uses a technology stack that includes the ONNX format for representing machine learning models, the Giza Transpiler for converting these models into the Cairo program format, the ONNX Cairo Runtime for executing the models in a verifiable and deterministic way, and the Giza Model smart contract for deploying and executing the models on-chain. While Giza could also fit into the Model-to-Proof compiler category, their positioning as a marketplace for ML models is one of the more interesting applications out there today.
Gensyn is a distributed hardware provisioning network used to train ML models. Specifically, they are engineering a probabilistic audit system based on gradient descent and using model checkpoints to enable a decentralized GPU network to service the training of full-scale models. While their zkML application here is highly specific to their use case - they want to ensure that when a node downloads and trains a piece of a model that they are honest about the model updates - it showcases the power of combining zk and ML.
ZKaptcha is focused on the bot problem within web3, providing captcha services for smart contracts. Their current implementation has the end user essentially producing a proof of human work by completing the captcha, which is verified by their on-chain verifier and accessible by a smart contract via a couple lines of code. Today, they are primarily only relying on zk, but they intend to implement zkML in the future similar to existing web2 captcha services that analyze behaviors like mouse movements to determine if a user is human.
Given how early the zkML market is, a lot of applications have been experimented with at the hackathon level. Projects include AI Coliseum, an on-chain AI competition built using EZKL that uses ZK proofs to validate machine learning outputs, Hunter z Hunter, a photo scavenger hunt using the EZKL library to validate an image classification model’s outputs with halo2 circuits, and zk Section 9, which converted an AI image generating model into a circuit for minting and verification of AI art.
While improvements and optimizations are being made at light-speed, some core challenges remain for the zkML space. These challenges range from technical to practical and include:
Quantization with minimal accuracy loss
Circuit sizes especially when a network is composed of many layers
Efficient proofs for matrix multiplication
Adversarial attacks
Quantization is the process of representing floating-point numbers, which most ML models use to represent model parameters and activations, as fixed-point numbers, which is necessary when dealing with the field arithmetic of zk circuits. The impact of quantization on accuracy for machine learning models depends on the level of precision that is used. Generally, using lower precision (i.e., fewer bits) can result in reduced accuracy because it can introduce rounding and approximation errors. However, there are several techniques that can be used to minimize the impact of quantization on accuracy, such as fine-tuning the model after quantization and using techniques like quantization-aware training. In addition, Zero Gravity - a hackathon project at zkSummit 9 - has shown that alternative neural network architectures developed for edge devices such as the weightless neural network can be useful for avoiding the problems of quantization in-circuit.
Outside of quantization, hardware is another key challenge. Once a machine learning model is properly represented via a circuit, it is cheap and fast to verify proofs of its inferences due to the succinctness property of zk. The challenge here is not on the verifier, but on the prover since RAM consumption and proof generation time scales quickly as the model size grows. Certain proving systems (e.g. GKR-based systems that use sumcheck protocols and layered arithmetic circuits) or compositional techniques (e.g. wrapping Plonky2, which is efficient at proving time but poor at efficient proof sizes for large models, with Groth16, which doesn’t see a growing proof size with complexity of the model) are better suited for handling these issues, but managing tradeoffs is a core challenge for projects building in zkML.
On the adversarial side, there is also still work to be done. For one, if a trustless protocol or DAO elects to implement a model, there is still risk around adversarial attacks during the training phase (e.g. training a model to behave a specific way when it sees an input which could be used to manipulate later inferences). Federated learning techniques and training phase zkML could be one way of minimizing this attack surface.
Another core challenge is the risk of model-stealing attacks when a model is privacy-preserved. While a model’s weights can be obfuscated, it is theoretically possible to reverse engineer the weights given enough input-output pairs. This is mainly a risk for smaller scale models, but it is a risk nonetheless.
Although there have been some challenges with optimizing these models to operate within the constraints of zk, improvements are being made at exponential speed and some anticipate we will reach parity with the broader ML space soon assuming further hardware acceleration. To underscore the pace of these improvements, zkML has gone from 0xPARC’s zk-MNIST demonstration in 2021 showing how a small-scale MNIST image classification model could be performed in a verifiable circuit to Daniel Kang’s paper doing the same for ImageNet-scale models just under a year later. In April of 2022, this ImageNet-scale model was further improved from 79% accuracy to 92% accuracy, and networks as large as GPT-2 are potentially feasible in the near-term albeit with slow proving times today.
We view zkML as a rich and growing ecosystem that looks to scale the capabilities of blockchains and smart contracts to be more flexible, adaptable, and intelligent.
While zkML is still in its early stages of development, it has already begun to show promising results. As the technology evolves and matures, we can expect to see even more innovative use cases for zkML on-chain.
For more information on the zkML space, I highly recommend the zkML Community telegram group and the associated resource Github.
If you are building zkML applications or infrastructure, DM me - we would love to connect with you!