tl;dr: Copy-pasting source code without attribution and making misleading claims about the original work is against the open source ethos and hurts the ecosystem.
Crypto is an ideal environment for open source because it unlocks business models that are consistent with permissively licensed software. We can sustainably build software in the open, leveraging the combined talent and expertise of entire communities, and everyone benefits. Contributors working together can produce amazing results - see Linux, Rust, and the EVM.
But it’s not cynical to say that perverse incentives also exist. There’s the temptation to use others’ work without attribution, to make misleading claims about that work for marketing hype, and to appropriate others’ ideas. In this mode, open source development is a zero-sum game and open source projects are resources to be exploited.
There are norms in open source development designed to protect against this. Anyone can use, modify, or distribute open source code, but they must give credit to its authors and operate in good faith. Participants in an open source ecosystem should be net contributors, not net extractors.
This post is about an instance where that didn’t happen.
Plonky2 and Starky are based on a simple insight: we can make proving systems faster by designing them for the hardware that they run on. We created a feedback loop between engineers and cryptographers to give cryptographers insight into improving hardware performance (64-bit fields, tuned hash function parameters), and engineers an understanding of where protocol parameters could be changed without impacting security.
This work began when we were a tiny startup called Mir, and has continued within Polygon. We open sourced these libraries under a permissive MIT/Apache license because we have benefited enormously from the effort of others and we want to be net contributors. We welcome others building on and modifying Plonky2 - there’s already a growing community of enormously talented people who are using Plonky2 and Starky for their own projects.
Matter Labs, developers of zkSync, recently released a proving system called Boojum that includes a substantial amount of source code that is copy-pasted from performance-critical components of the Plonky2 library. This code is included without the original copyrights or clear attribution to the original authors. As an aside, it’s very difficult to fail to abide by the terms of the MIT/Apache licenses, but this is how to do it.
Above is the original code in the Plonky2 repo, written more than a year ago, and below is the copied code in the Boojum library, released recently.
Below is another example, but truly there are so many instances of copy-pasted code that you can simply look for anything that looks particularly complicated in the Plonky2 repo, search in the Boojum repo, and there’s a good chance that you’ll find it.
They’re the same picture.
Beyond the directly copied code, Boojum is extremely similar to Plonky2. It uses the same strategy of parallel repetition to boost soundness in a small field, similar custom gates to efficiently arithmetize recursive verification, and the same lookup argument developed by our teammate Ulrich Haböck. The MDS matrix and parameters for Poseidon are identical to the parameters discovered by the Polygon Zero team.
To add insult to injury, the founder of Matter Labs claimed that Boojum is more than 10x faster than Plonky2. Wondering how this is possible, given that the performance-critical field arithmetic code is directly copied from Plonky2? You should be.
The claim is based on the Celer benchmarks, but there’s a huge catch: the statement that Plonky2 is proving is 16x larger than the statement that Boojum is proving. The sha256 implementation in the Boojum repo is optimized, while the Plonky2 one is not (the Plonky2 circuit doesn’t even use lookups).
Anyone could simply implement a more efficient arithmetization of sha256 in the Plonky2 library and performance would be equivalent because the proving system is effectively identical. To claim that Boojum is 10x faster than Plonky2 is extremely misleading.
Starky, despite being developed almost two years ago (also not using lookup tables for sha256!), is much faster than Boojum. Yet Matter Labs claims:
The problem is that 1) Boojum is substantially slower than Starky (the C++ version of Starky powers Polygon zkEVM on mainnet) and 2) Boojum isn’t used in production.
The post introducing Boojum never mentions Polygon or many of the Polygon developers who wrote the code used in Boojum. There’s no mention of copy-pasted code, or any indication that the design of Boojum is effectively identical to Plonky2. The closest thing to attribution is buried in the README: “The non-vectorized math implementation largely follows the approach of Plonky2.” If we may say so, “largely follows the approach” is doing a lot of work in this sentence.
It’s great to give credit, and we appreciate the recognition for our optimization of the Poseidon parameters. However, it might not be apparent to the reader that Boojum borrows far more than the Poseidon constants from Plonky2, and in fact that Boojum’s design is nearly identical to Plonky2’s, even to the point of copy-pasted code.
An important part of Plonky2 and Starky’s performance is the Goldilocks field discovered by Hamish Ivey-Law, a researcher at Polygon Labs.
The post introducing Boojum attributes Goldilocks to Mike Hamburg, who also named a field “Goldilocks”, but the two fields have almost nothing in common besides the name. Hamburg’s prime has a Golden Ratio form: it is defined as 2^448 - 2^224 - 1, which is x^2 - x - 1 evaluated at x = 2^224. It’s astronomically larger than the Goldilocks Field used in Plonky2 and is used in a completely different area of cryptography.
The Goldilocks Field in Plonky2 is defined by the prime 2^64 - 2^32 + 1, and is named after the children’s story because it’s “just right” for ZKP systems: it fits in a 64-bit word, is just large enough to accommodate the product of any two 32-bit integers without reduction, and has a multiplicative subgroup of order 2^32, which is what FFT-based provers need. None of these things are true of Hamburg’s field.
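The differences between the two primes are easy to check directly. The following is a minimal Python sketch written for this post, not code taken from Plonky2 or Boojum; the names `P`, `HAMBURG_P`, `two_adicity`, and `reduce128` are illustrative assumptions. It checks the three "just right" properties and shows the kind of cheap reduction that the shape of the 64-bit prime allows.

```python
# Illustrative sketch only; names are hypothetical, not from any library.
P = 2**64 - 2**32 + 1            # Goldilocks prime used in Plonky2
HAMBURG_P = 2**448 - 2**224 - 1  # Hamburg's "Goldilocks" prime


def two_adicity(n: int) -> int:
    """Return the largest k such that 2**k divides n (n must be nonzero)."""
    k = 0
    while n % 2 == 0:
        n //= 2
        k += 1
    return k


def reduce128(x: int) -> int:
    """Reduce 0 <= x < 2**128 modulo P without a general division.

    Uses 2**64 = 2**32 - 1 (mod P) and 2**96 = -1 (mod P).
    """
    lo = x & (2**64 - 1)            # bits 0..63
    mid = (x >> 64) & (2**32 - 1)   # bits 64..95
    hi = x >> 96                    # bits 96..127
    # The final "% P" stands in for the few conditional add/subtracts
    # that a fixed-width implementation would use.
    return (lo + mid * (2**32 - 1) - hi) % P


# 1. Fits in a 64-bit word (Hamburg's prime needs 448 bits).
print(P.bit_length(), HAMBURG_P.bit_length())

# 2. The largest product of two 32-bit integers is still below P.
print((2**32 - 1) ** 2 < P)

# 3. The multiplicative group order P - 1 is divisible by 2**32,
#    while HAMBURG_P - 1 is divisible by 2 only once.
print(two_adicity(P - 1), two_adicity(HAMBURG_P - 1))

# Reduction agrees with ordinary modular arithmetic.
a, b = P - 1, P - 2
print(reduce128(a * b) == (a * b) % P)
```

The reduction works because every 128-bit product splits into a 64-bit low part, a 32-bit middle part, and a 32-bit high part, each of which folds back into a 64-bit range with one multiply-by-constant or one subtraction. A 448-bit prime offers nothing comparable on 64-bit hardware.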
To confuse the two is either intentionally misleading or implies a fundamental misunderstanding of the underlying cryptographic systems and their applications.
It’s reasonable to question why this matters. Isn’t this just a dispute between two competitors in the L2 space? Does it really have broader implications for the open source ecosystem in crypto?
We believe that it does. Before our team was acquired by Polygon Labs, we were a tiny startup, stretching a $2m seed round to cover more than two years of runway. The majority of the development of Plonky2 and Starky occurred when we didn’t have the resources or recognition to push back if a better-funded competitor had appropriated our work without attribution.
Right now, there are many teams that look like this, small but exceptionally talented and producing amazing work. The success of the entire space depends on these teams being incentivized to contribute to the open source ecosystem. Good behavior, respecting the norms of open source development, benefits everyone. Bad behavior harms everyone.
We can make it more specific. We made the source code for Plonky2 available in January of 2022, but we didn’t add a permissive license until August. We hesitated because we were worried about precisely this scenario, that a competitor would take our work, rebrand it, and claim it as their own - there’s precedent for this!
Our hesitation hurt other teams. Many teams delayed building on Plonky2 due to licensing uncertainty and this set back the community that has developed around Plonky2. Fortunately, many are now using our libraries, but this is an example of how ignoring norms around open source development hurts the space.
Respecting these norms leads to greater collaboration, to developers converging around shared implementations that function as public goods. Take the EVM as an example: the EVM wasn’t perfectly designed, but the combined energy and talent of an entire community has built tooling and a developer ecosystem that has allowed it to become the default choice for building smart contracts.
The alternative to open source norms is restrictive licenses and closed source code. This leads to competition and siloed implementations rather than collaboration and public goods. Crypto companies begin to look a lot like traditional web2 monopolies, for whom any tactic is justified if it wins more market share.
We’re nearing the release of Plonky3, a new library that radically improves on the performance of Plonky2 and Starky. Boojum represents a significant amount of work to build a proving system that is already outperformed by the existing Starky library and will be even further behind Plonky3.
However, cultivating a productive environment for open source development is critical. On a personal note, our team has worked incredibly hard to produce work that has benefited the entire space. It’s a bad feeling to see this work being used without recognition or attribution.
Matter Labs/zkSync talk a lot about ethos. They’ve centered their marketing around the Ethereum Ethos, professing to be “scaling the ethos and technology of Ethereum.” They recently released a ZK Credo, centered around three core properties. The first one is integrity, described as “doing the right thing... even when no one else is looking or will ever know.”
Taking others’ work and representing it as your own isn’t consistent with the Ethereum Ethos. The Ethereum community deserves better.
Update 8/4/23 11:18am ET: Updated the graph of Celer benchmarks at their request.