Arweave and Filecoin: a (mostly) technical comparison

Thanks a great deal to Dan MacDonald (@DMacOnArweave) for answering questions about Arweave and reviewing this article.

Getting technical.

True, talking to a shill on Twitter is not the way to get good technical info. Yet the conversation about differences between decentralized storage providers too often ends with oversimplified descriptions akin to: “Arweave is permanent storage and Filecoin is not”.

I think it is an important conversation to have. Lots of seasoned crypto-folk I’ve talked to are skeptical of modern “web3” projects. Yet, even if none of the current projects survive medium-term, the architectural choices they make today will inform the projects to come. Skepticism is no excuse for technical ignorance if we want a decentralized web to become a reality.

The NFT boom brought about awareness of two projects for storing images and metadata: Arweave and Filecoin. Both are decentralized storage providers in the sense that they are run by a network of independent nodes, and knocking out or censoring a handful of the nodes should, in theory, not harm the network as a whole. That’s about where the similarities end, and even that property does not hold for both networks equally. The two are vastly different in their technical choices, but we will compare them side by side to get insight into the ways decentralized storage is implemented today.

NOTE: Yes, there are other storage providers and they’ve seen upticks in usage lately (Sia, Storj) but my bandwidth forces me to leave them aside for a later day.

ALSO NOTE: I am not an expert in this, just a researcher. If you find that I’ve made factual mistakes or omissions, please let me know on Twitter @dayofniagra.

High-level overview

Huobi Digital, in their research paper on web3 storage, categorizes Arweave as “Storage on Network Basis: Data is stored among network, where the network manages and verifies on-chain data storage resources” and Filecoin as “Storage on P2P Network Basis: Data is stored on a peer to peer basis, verified on-chain”. In layman’s terms, one could say that Arweave is a network that manages storage, while Filecoin is a network that implements a protocol for coordinating and verifying peer-to-peer storage deals. But let’s get progressively more specific:

Arweave: The first thing you hear about Arweave is that it is “permanent” storage. It allocates enough funds in its native token to, theoretically, pay for file storage in perpetuity as storage costs go down over the years. This is done by setting up an “endowment” from which fees are disbursed to miners over time. Arweave achieves its goals through a few game-theoretical mechanisms that incentivize replication of data. It also uses clever mechanisms to decrease the amount of work each node has to do.
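The endowment argument rests on a geometric series: if storing a gigabyte costs c per year today and that cost falls by a constant fraction d every year, the total cost of storing it forever converges to c / d. The sketch below is a simplified model under exactly those assumptions. The function names and the example figures ($0.01 per GB-year, 5% yearly decline) are hypothetical, and this is not Arweave's actual pricing formula, which uses its own parameters.

```python
def perpetual_cost(annual_cost: float, decline_rate: float) -> float:
    """Closed form of sum_{t>=0} annual_cost * (1 - decline_rate)**t.

    Assumes (hypothetically) that storage cost falls by a constant fraction
    `decline_rate` every year. Without a decline the sum never converges.
    """
    if not 0 < decline_rate <= 1:
        raise ValueError("decline_rate must be in (0, 1]; otherwise the cost is unbounded")
    return annual_cost / decline_rate


def truncated_cost(annual_cost: float, decline_rate: float, years: int) -> float:
    """Cost of storing the data for a finite number of years under the same model."""
    return sum(annual_cost * (1 - decline_rate) ** t for t in range(years))


if __name__ == "__main__":
    # Hypothetical figures: $0.01 per GB-year today, costs falling 5% per year.
    print(perpetual_cost(0.01, 0.05))       # roughly 0.2 USD per GB, forever
    print(truncated_cost(0.01, 0.05, 100))  # most of that is spent in the first century
```

The takeaway is that a one-time fee can cover "forever" only if costs keep falling: the closer the decline rate gets to zero, the larger the required endowment.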

Filecoin: Filecoin is an incentivization layer over IPFS, the content-addressable file distribution protocol. IPFS only specifies how files and other data are to be discovered and served but provides no incentive to store or serve them. Filecoin amends IPFS by implementing a blockchain on which deals are made with individual providers for storing and serving IPFS-addressable data for a specified period of time. Filecoin invests a lot of resources into research on zero-knowledge proofs, and the network relies on them heavily. One thing to note, though, is that the public IPFS network and Filecoin’s IPFS are two separate networks.

Let’s get technical

The organizations behind the projects

This is not specifically a technical metric, but the organizations are of vastly different sizes and thus manpower. This ripples through many technical choices.

Arweave: According to their LinkedIn and Crunchbase profiles Arweave employs only a few dozen people.

Filecoin: Filecoin consists of a number of organizations working on IPFS, Filecoin itself, ZK research and tooling, clients in different languages, gateways, accelerators and grants. Across all of them, a few hundred people are employed to contribute to Filecoin (not necessarily as developers). In other words, Filecoin’s manpower seems to be greater by at least a factor of 10.

What’s in a block?

Arweave: A transaction in an Arweave block is either a transfer of AR tokens or a file to be stored along with payment for the service. Because they contain the actual files, Arweave blocks get big, and I mean hundreds of megabytes or even gigabytes big. Best just look at this block on the block explorer.

Filecoin: The state of the Filecoin network consists of “deals”, or agreements between network users and miners to store and allow retrieval of data at a fixed price. In a sense, at its core Filecoin is just a simple distributed storage marketplace. So a block contains bid-ask deals for the market, cryptographic proofs submitted by miners that prove they are conforming to the terms of a certain deal (they are storing specified data at a certain time) and, lastly, transactions of the FIL token.
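To make "a deal at a fixed price for a fixed period" concrete, here is a simplified deal record, loosely modeled on the kind of information a Filecoin deal carries (the data's content identifier, client, provider, start and end epoch, price per epoch). The class and field names are our own, collateral and other details are left out, and the CID and addresses in the usage example are hypothetical placeholders. Filecoin measures time in 30-second epochs.

```python
from dataclasses import dataclass

EPOCH_SECONDS = 30                              # one Filecoin epoch
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS  # 2880


@dataclass(frozen=True)
class StorageDeal:
    """Simplified storage deal; field names are illustrative only."""
    piece_cid: str        # content identifier of the data being stored
    piece_size: int       # size in bytes
    client: str           # address of the paying user
    provider: str         # address of the storage provider (miner)
    start_epoch: int      # first epoch the provider must prove storage
    end_epoch: int        # deal ends at this epoch (exclusive, in this sketch)
    price_per_epoch: int  # attoFIL paid per epoch for the whole piece

    def duration_epochs(self) -> int:
        return self.end_epoch - self.start_epoch

    def total_price(self) -> int:
        """Total the client owes over the life of the deal, in attoFIL."""
        return self.duration_epochs() * self.price_per_epoch

    def is_active(self, epoch: int) -> bool:
        """True while the provider must keep proving it stores the piece."""
        return self.start_epoch <= epoch < self.end_epoch


if __name__ == "__main__":
    deal = StorageDeal(
        piece_cid="hypothetical-piece-cid",
        piece_size=32 * 2**30,
        client="hypothetical-client-address",
        provider="hypothetical-provider-address",
        start_epoch=0,
        end_epoch=180 * EPOCHS_PER_DAY,  # roughly six months
        price_per_epoch=10,
    )
    print(deal.total_price())  # 5184000 attoFIL
```

Note that nothing in the record says "forever": once `end_epoch` passes, the provider has no further obligation unless a new deal is made.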

How is a block mined?

Arweave: The algorithm is called Succinct Proofs of Random Access (SPoRA). The basic idea of the algorithm is: a pseudorandom chunk of stored data (computed from the state of the chain) is requested from a miner; if they have the correct chunk, they then run a proof-of-work algorithm to find a desired block hash. This is also how replication is incentivized: the more rare data a miner stores, the greater their chance of mining a block and receiving the rewards. Pretty clever.
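A toy model of the idea described above: derive a pseudorandom chunk index from chain state and a nonce, and only if the miner holds that chunk may it hash for a block. SHA-256, the small in-memory dataset and all function names are our own simplifications; the real SPoRA uses different hash functions, chunk sizes and difficulty rules.

```python
import hashlib


def h(*parts: bytes) -> bytes:
    """SHA-256 over the concatenated parts (a stand-in for the real hash functions)."""
    return hashlib.sha256(b"".join(parts)).digest()


def recall_index(prev_hash: bytes, nonce: int, num_chunks: int) -> int:
    """Pseudorandomly pick a chunk of the dataset from chain state plus a nonce."""
    return int.from_bytes(h(prev_hash, nonce.to_bytes(8, "big")), "big") % num_chunks


def try_mine(prev_hash: bytes, stored: dict[int, bytes], num_chunks: int,
             target: int, max_nonces: int):
    """Return (nonce, chunk_index) for the first winning attempt, or None.

    A miner can only *attempt* a nonce if it holds the recalled chunk, so
    storing more of the dataset means more attempts per unit of time.
    """
    for nonce in range(max_nonces):
        idx = recall_index(prev_hash, nonce, num_chunks)
        chunk = stored.get(idx)
        if chunk is None:
            continue  # chunk not stored locally: this nonce is wasted
        solution = h(prev_hash, nonce.to_bytes(8, "big"), chunk)
        if int.from_bytes(solution, "big") < target:
            return nonce, idx
    return None


def attempts(prev_hash: bytes, stored: dict[int, bytes], num_chunks: int,
             max_nonces: int) -> int:
    """Count how many of the nonces this miner could actually hash."""
    return sum(recall_index(prev_hash, n, num_chunks) in stored for n in range(max_nonces))


if __name__ == "__main__":
    dataset = {i: f"chunk-{i}".encode() for i in range(64)}  # hypothetical 64-chunk dataset
    quarter = {i: c for i, c in dataset.items() if i % 4 == 0}
    prev = b"previous-block-hash"
    print(attempts(prev, dataset, 64, 1000))  # every nonce is usable
    print(attempts(prev, quarter, 64, 1000))  # roughly a quarter as many
```

A miner that stores only a quarter of the data wastes roughly three out of four nonces, which is the economic pressure toward storing more of it.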

Filecoin: Filecoin is essentially a proof-of-stake system weighted not by stake but by committed storage. They describe it like this:

Filecoin is built on a variation of Proof-of-Space. It is also related to Proof of Stake in that instead of only tokens as stake, stake is in the form of proven storage that determines a miner’s probability of mining a block.
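A simplified sketch of "stake in the form of proven storage": each epoch, every miner draws a pseudorandom ticket and wins if it falls below a threshold proportional to its share of the network's storage power. SHA-256 over an epoch counter stands in for the real randomness, and `expected_leaders=5` is a parameter chosen for this sketch; Filecoin's actual consensus uses verifiable randomness and additional rules we skip here.

```python
import hashlib


def ticket(randomness: bytes, miner_id: str) -> float:
    """Map (epoch randomness, miner) to a pseudorandom number in [0, 1)."""
    digest = hashlib.sha256(randomness + miner_id.encode()).digest()
    # Take the top 53 bits so the result is an exact float strictly below 1.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def is_leader(randomness: bytes, miner_id: str, power: int, total_power: int,
              expected_leaders: float = 5.0) -> bool:
    """Win if the ticket is below a threshold proportional to this miner's storage share.

    On average about `expected_leaders` miners win per epoch, as long as no
    single miner's threshold exceeds 1.
    """
    threshold = expected_leaders * power / total_power
    return ticket(randomness, miner_id) < threshold


def simulate(powers: dict[str, int], epochs: int) -> dict[str, int]:
    """Count block wins per miner over many epochs."""
    total = sum(powers.values())
    wins = {m: 0 for m in powers}
    for epoch in range(epochs):
        randomness = epoch.to_bytes(8, "big")  # stand-in for real per-epoch randomness
        for miner, power in powers.items():
            if is_leader(randomness, miner, power, total):
                wins[miner] += 1
    return wins


if __name__ == "__main__":
    # Hypothetical network: one miner with 8% of storage power, 92 miners with 1% each.
    powers = {"big": 8, "small": 1, **{f"m{i}": 1 for i in range(91)}}
    wins = simulate(powers, 1000)
    print(wins["big"], wins["small"])  # "big" wins roughly 8x as often
```

The only difference from a coin-weighted lottery is what goes into `power`: committed, proven storage instead of tokens.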

How does an end-user retrieve stored data?

Arweave: An end user talks to a “gateway”, a server that caches data stored on-chain. There is no built-in incentive to run these. Examples: arweave.net, ar.io. Anybody could connect to an Arweave miner directly, but it’s cumbersome because looking up data on a miner is slow.
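From the user's side, reading through a gateway is a plain HTTP GET. This sketch assumes the common pattern of requesting `https://arweave.net/<transaction id>`, where the ID is the 43-character base64url string identifying a transaction; the helper names are ours, and other gateways may expose different paths.

```python
import re
import urllib.request

# Transaction IDs are 32-byte hashes, base64url-encoded without padding: 43 characters.
TX_ID = re.compile(r"[A-Za-z0-9_-]{43}")


def gateway_url(tx_id: str, gateway: str = "https://arweave.net") -> str:
    """Build the URL a gateway serves a transaction's data from."""
    if not TX_ID.fullmatch(tx_id):
        raise ValueError(f"not a valid Arweave transaction id: {tx_id!r}")
    return f"{gateway.rstrip('/')}/{tx_id}"


def fetch(tx_id: str, gateway: str = "https://arweave.net", timeout: float = 30.0) -> bytes:
    """Download the data; the gateway, not a miner, answers this request."""
    with urllib.request.urlopen(gateway_url(tx_id, gateway), timeout=timeout) as resp:
        return resp.read()
```

Because the gateway is just a parameter, switching to another gateway (or your own) changes one string, which is part of why gateways are useful despite having no built-in incentive.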

Filecoin: The network does not make it easy to upload and retrieve data because the interface is quite low-level and requires making deals directly with storage providers. As with Arweave, one must rely on a service that abstracts away the complexity, for example web3.storage or

After data is submitted to the chain when does it become available for retrieval?

Arweave: Data is written onto the chain when it is accepted into a block, which could take anywhere from a few minutes to never. Yet the gateways are aware of this and will optimistically serve your files while they are waiting to be confirmed.

Filecoin: To quote the docs directly:

It takes up to 24 hours for a storage provider to seal the data.

In practice, as with Arweave, it’s the layer on top of Filecoin that creates your deals, and it will likely choose to serve your files before they are safely stored by a storage provider.
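Both networks end up with the same pattern: the service in front keeps its own copy and serves it right away, while confirmation (mining on Arweave, sealing on Filecoin) happens in the background. A hypothetical sketch of such a service; the class, method names and status values are ours, and real gateways and upload services will differ.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # accepted by the service, not yet mined / sealed
    CONFIRMED = "confirmed"  # the network reports the data as safely stored


class OptimisticCache:
    """Hypothetical upload service that serves data before it is safely stored."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self._status: dict[str, Status] = {}

    def upload(self, key: str, data: bytes) -> None:
        # Keep a local copy and serve it immediately; submitting it to the
        # network is assumed to happen elsewhere, in the background.
        self._data[key] = data
        self._status[key] = Status.PENDING

    def mark_confirmed(self, key: str) -> None:
        # Called once the network reports the data as mined / sealed.
        if key not in self._data:
            raise KeyError(key)
        self._status[key] = Status.CONFIRMED

    def get(self, key: str) -> tuple[bytes, Status]:
        # Readers get the bytes either way, plus a flag saying how safe they are.
        return self._data[key], self._status[key]
```

The trade-off is that, until the status flips, availability depends on the service rather than on the network.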

Smart contracts?

Arweave: Arweave does not have smart contracts in the common meaning of the term. SmartWeave is essentially a stored piece of JS code with some serialized state stored along with it. To submit a transaction, the client needs to find the latest valid state and then compute and save a new one. This means the “contract” always needs to be re-executed (or cached) by the client or by a centralized middleman. A SmartWeave contract has no access to the state of the blockchain.
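In SmartWeave, the contract source typically exports a `handle(state, action)` function that takes the current state and one interaction and returns the next state; a client reconstructs the current state by replaying every interaction in order. Below is a Python transliteration of that replay loop (real SmartWeave contracts are JavaScript), with a hypothetical token contract whose logic and field names are ours.

```python
def handle(state: dict, action: dict) -> dict:
    """Hypothetical token contract: the only supported call is a transfer."""
    inp, caller = action["input"], action["caller"]
    if inp["function"] != "transfer":
        raise ValueError(f"unknown function {inp['function']!r}")
    qty, target = inp["qty"], inp["target"]
    if not isinstance(qty, int) or qty <= 0:
        raise ValueError("qty must be a positive integer")
    balances = dict(state["balances"])  # copy: never mutate earlier states
    if balances.get(caller, 0) < qty:
        raise ValueError("insufficient balance")
    balances[caller] -= qty
    balances[target] = balances.get(target, 0) + qty
    return {**state, "balances": balances}


def evaluate(initial_state: dict, interactions: list[dict]) -> dict:
    """Lazily compute the current state by replaying every interaction in order.

    Every client (or a caching middleman) has to do this itself; the chain
    only stores the code, the initial state and the list of interactions.
    """
    state = initial_state
    for action in interactions:
        try:
            state = handle(state, action)
        except ValueError:
            pass  # in this sketch, invalid interactions are skipped, not fatal
    return state


if __name__ == "__main__":
    init = {"balances": {"alice": 10}}
    log = [
        {"caller": "alice", "input": {"function": "transfer", "target": "bob", "qty": 3}},
        {"caller": "bob", "input": {"function": "transfer", "target": "carol", "qty": 5}},
        {"caller": "bob", "input": {"function": "transfer", "target": "carol", "qty": 2}},
    ]
    print(evaluate(init, log)["balances"])
```

The cost of reading the state grows with the length of the interaction log, which is why caching middlemen appear in practice.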

Filecoin: Filecoin developers are adamant that the network is incomplete without a programmable layer. The Filecoin Virtual Machine (FVM) is that layer and has been released to the network in a non-executing beta state. FVM, which can run any WASM-compiled language as well as Solidity, would allow developers to programmatically manage deals and allocate funds. In my opinion, FVM would make Filecoin a vastly more useful and powerful network than it is now. You can read about all the potential use cases here, but some examples are: perpetual storage akin to Arweave’s, brokering computation over data, L2s, and “data DAOs”, organizations whose goal is preserving data that is important to them. Exciting!

How do both deal with state growth?

Arweave: Nodes get to choose which data/blocks they store, so how much data to store is up to each of them.

Filecoin: I was not able to find any information about this, but considering that Filecoin has very high minimum hardware requirements this should not be a problem for a long time.

How do they ensure sufficient replication?

Arweave: Miners are incentivized to replicate because it increases their chances of mining a future block (see “How is a block mined?” above).

Filecoin: This is not applicable to Filecoin. Only the miners you have paid will store your data, so replication is entirely up to the user of the network. This is expected to change with the upcoming release of FVM, which can manage network state and can be used to customize how deals are created, including things like automatic replication of data across miners.
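To illustrate what "automatic replication" could mean, here is a hypothetical replication policy of the kind a programmable layer might run: given the providers already storing a piece, pick additional distinct providers until a target number of copies is reached. This is plain Python, not FVM code, and the function name and data layout are ours.

```python
def plan_replication(piece_cid: str, active_deals: dict[str, set[str]],
                     providers: list[str], target: int) -> list[str]:
    """Return the providers to make new deals with so that `piece_cid`
    reaches `target` copies on distinct providers.

    `active_deals` maps a piece CID to the providers currently storing it.
    Candidates are tried in the given order (e.g. sorted by price).
    """
    have = active_deals.get(piece_cid, set())
    missing = max(0, target - len(have))
    candidates = [p for p in providers if p not in have]
    if len(candidates) < missing:
        raise RuntimeError("not enough distinct providers to reach the target")
    return candidates[:missing]


if __name__ == "__main__":
    deals = {"hypothetical-piece": {"provider-1"}}
    market = ["provider-1", "provider-2", "provider-3", "provider-4"]
    print(plan_replication("hypothetical-piece", deals, market, target=3))
```

Run periodically (for example when deals expire or a provider fails its proofs), a policy like this turns one-off deals into a standing replication guarantee.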

Can files be censored? (a good and a bad thing)

Arweave: Every miner has their own blacklist which they can choose to add files to. The gateways can also refuse to serve specific files.

Filecoin: I have not found a way that would allow this.

How can one participate in the network?

Arweave: The minimum miner requirements are very modest and, unlike with Filecoin, no collateral is required. Documentation and conversations that I’ve seen say that a good processor, 8 GB of memory and a couple of TB of storage space would get you a minimal setup. Now, that does not guarantee mining success, that’s a different topic, but you’ll be a node on the network. You can also join a mining pool.

Filecoin: Anybody can become a storage or retrieval provider but even Filecoin themselves say it’ll cost hundreds of thousands of dollars in FIL collateral and equipment. Basically Filecoin mining is for data centers and other players with a lot of resources.

Price of storage.

Arweave: Storage pricing is dynamic but fluctuates around a few (2-5) USD per GB.

Filecoin: By improving their ZK proofs Filecoin has been able to bring down storage costs considerably in the past year. This dashboard will give you info on fees. As of writing the cost is a fraction of a cent per GB per year or in the dashboard’s terms “0.0011% the cost of Amazon S3”.
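The two pricing models are not directly comparable: Arweave charges once, while Filecoin charges per GB, per year, per copy. One way to line them up is the break-even horizon, the one-time price divided by the yearly cost of all replicas. In this sketch, $3.50/GB is just the midpoint of the 2-5 USD range above, and $0.002 per GB-year is a hypothetical stand-in for "a fraction of a cent"; the model ignores price changes, retrieval fees and token volatility.

```python
def breakeven_years(one_time_per_gb: float, yearly_per_gb: float, replicas: int = 1) -> float:
    """Years after which paying yearly for `replicas` copies exceeds a one-time fee."""
    if yearly_per_gb <= 0 or replicas < 1:
        raise ValueError("yearly price must be positive and replicas >= 1")
    return one_time_per_gb / (yearly_per_gb * replicas)


if __name__ == "__main__":
    ARWEAVE_USD_PER_GB = 3.5           # midpoint of the 2-5 USD range quoted above
    FILECOIN_USD_PER_GB_YEAR = 0.002   # hypothetical "fraction of a cent"
    for n in (1, 3, 5):
        years = breakeven_years(ARWEAVE_USD_PER_GB, FILECOIN_USD_PER_GB_YEAR, n)
        print(f"{n} replica(s): break-even after about {years:.0f} years")
```

Even with several replicas, the break-even point under these assumptions lands centuries out, which shows how cheap current Filecoin pricing is relative to a one-time "forever" fee.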

“Permanent” storage.

Arweave: Well of course, that’s the whole point.

Filecoin: While Filecoin is not designed for that use case specifically, the developers are intentionally keeping the system agnostic to what end users want. The upcoming FVM is the key feature that would enable things like “permanent” storage.

Notable Features Present and Future.

Arweave:

  • SmartWeave - I have mentioned this feature above but want to reiterate how much faith the Arweave ecosystem seems to be putting into SmartWeave, the network’s smart-contract-like system. The system provides only storage of code and state and pushes computation onto the consumers. This might sound somewhat funky, but it frees the system from the bounds of gas fees and such, allowing larger computational loads (including over Arweave-stored data) to be processed than a conventional system could handle. This, of course, comes with its own downsides. Yet the devs keep churning out DAO and tokenomic infrastructure on top of SmartWeave relentlessly. For an example, take a look at now.arweave.dev.

Filecoin: The Filecoin ecosystem is producing a tremendous amount of work meant to improve or complement the existing features. I will only look at a couple.

  • ZK - zk-SNARKs are the soul of Filecoin’s base layer. But the vision for them goes beyond that base functionality. For example, check out the release announcement for Lurk, a programming language for recursive zk-SNARKs. The blog post goes into some technicalities that are above my head, but one of the things the language would allow is verifiably processing information stored on Filecoin. In other words, SNARKs can provide a safe compute layer over self-custodied data, leading to truly “web3” applications. In theory, at least.

  • Filecoin+ - Filecoin has been in the business of heavily subsidizing storage of data it holds in high esteem. This tests the network, provides income for miners, onboards other organizations and provides marketing for the network, yet drains the funding of Filecoin (the organization). How long Filecoin can keep this up is uncertain.

Both:

  • Storage of structured data - Both Filecoin and Arweave allow storage of structured data which a few products built on top of each are making use of. IPFS uses protocol level abstraction for structured (and unstructured) linked data called IPLD, while Arweave relegated that job completely to the app layer. You can see the results of this use case in Ceramic Network (IPFS with Arweave support coming), which stores streams and structured data, and Kwil which builds an SQL DB on top of Arweave. (Note: I cannot say whether those products are actually good. I am just trying to point out the possibilities.)
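A minimal sketch of the content-addressing idea behind IPLD: each node is serialized deterministically, hashed, and stored under that hash, and links between nodes are just hashes. Real IPLD uses CIDs (a multihash plus a codec such as DAG-CBOR); here plain SHA-256 hex strings stand in for CIDs, and links borrow DAG-JSON's `{"/": ...}` shape. The class and helper names are ours.

```python
import hashlib
import json


class Store:
    """Toy content-addressed block store."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, node: dict) -> str:
        # Deterministic serialization: equal content always gets the same address.
        data = json.dumps(node, sort_keys=True, separators=(",", ":")).encode()
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data
        return key

    def get(self, key: str) -> dict:
        data = self.blocks[key]
        # Anyone can verify a block against its address; no trust in the server needed.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return json.loads(data)

    def resolve(self, key: str, path: list[str]):
        """Walk a path through linked nodes, following {"/": address} links."""
        node = self.get(key)
        for segment in path:
            value = node[segment]
            if isinstance(value, dict) and set(value) == {"/"}:
                value = self.get(value["/"])
            node = value
        return node


def link(key: str) -> dict:
    """Represent a link to another node by its address."""
    return {"/": key}


if __name__ == "__main__":
    s = Store()
    meta = s.put({"name": "Sunset #1", "image": "hypothetical-image-address"})
    coll = s.put({"title": "Hypothetical collection", "items": {"first": link(meta)}})
    print(s.resolve(coll, ["items", "first", "name"]))
```

On IPFS this structure is part of the protocol; on Arweave an app would have to define and enforce an equivalent scheme itself.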

Subjective Reflections

In this closing section I will commit a number of sins, such as comparing the achievements of the two networks and contemplating their viability. After all, while the two are very different, both seem to be pushing towards a similar goal: being base layers for a future decentralized app layer.

Arweave nice: Today I would say Arweave is the more complete product. In its current state it’s a valid product in its own right. Through a lot of clever design, it can already provide convenient, permanent (in theory) storage to an end user. NFTs, family photos, important documents and land deeds are all prime examples of things that you can easily load onto Arweave for posterity. To use business speak, it appears that Arweave has product-market fit.

Arweave ugly: This product-market fit has not yet translated into actual network usage as the network revenue has been tiny and inconsistent.

Arweave ugly: What somewhat taints Arweave’s trustworthiness are the very mechanisms I have described as “clever”, for example its consensus mechanism or its pricing model. “Clever” could mean “smart”, but it could also be replaced with a phrase like “untested by time”. This is probably not lost on exchanges, which I use as a litmus test of a network’s perceived security: the list of exchanges trading Arweave’s token is small, and large US players like Coinbase, Kraken and Gemini are notably missing. (Dan MacDonald, who reviewed this article, argued that calling a twist on PoW “untested” is somewhat odd, and I generally agree, but I will go with my intuition that Arweave is a fairly unique system.)

Filecoin ugly: While Arweave feels like a finished product, Filecoin devs themselves insist that their creation is still at the beginning of its roadmap, and that speed improvements and smart contracts are on their way to add features and make the network flexible and easy to use. It is hard to disagree: the system is complex and difficult to use. All while running on subsidies and incentives of questionable sustainability.

Filecoin nice: On the positive side, with dozens of workstreams focusing on data indexing, data delivery, improved payment models, privacy-preserving data processing and L2s, the final vision that Filecoin paints looks beyond impressive: it would have all of the features of Arweave and many, many more. Most important, in my opinion, is the decision to put a lot of faith in zero-knowledge proofs, which are central to privacy, which in turn is essential for self-sovereignty. I believe privacy is a key missing ingredient that stands between web3 and actual adoption, and Filecoin seems to agree.

Filecoin nice: it’s really really cheap. Maybe unsustainably, but it might be the cheapest storage out there. This and subsidies have grown the size of stored data into hundreds of petabytes. This data is not “permanent” like Arweave’s but it is about 2000x larger than what Arweave is storing as of writing.

Arweave ugly: Let me revisit Arweave’s approach to smart contracts yet again, as I am not certain what to make of it. The base system is awfully primitive and relies on centralized parties to serve the actual state of these so-called contracts. The idea of offloading computation to clients might be genius or it might be a gimmick; I am not at all certain which. On the positive side, a few projects in the app layer are actively trying to mitigate the shortcomings of SmartWeave’s offload-computation-to-clients approach.

Final Words

All of the above is to say that time will tell. Arweave is a simple system that is useful today, and Filecoin is a behemoth of a product (or rather a collection of products) and organization with a long and ambitious roadmap fueled by a closed $200m ICO. Arweave has built its pricing model into the system, whereas Filecoin is aiming for flexibility and customizability at every level. Filecoin’s vision relies on future complex systems such as a WASM smart contract engine, L2s and zero-knowledge proofs. Arweave is building its app/smart contract layer using an easy-to-understand system. Godspeed.

Verification
This entry has been permanently stored onchain and signed by its creator.