Filecoin: Decentralized Data Storage

February 12th, 2023

Online file storage is a crucial matter in our world. We trust big companies like Google and Amazon to store our files on their servers. What prevents them from looking through our files, using them, or monetizing them? What ensures that our files are safe, and that a server holding them won’t catch fire? All of these questions really matter.

Google offers a free drive, limited to 15 GB of storage. Do you think this “free” drive costs Google nothing? That Google offers it out of generosity? Of course not. Google still has to pay the electricity bills to keep that drive running. On top of electricity, it has to buy more storage hardware and insure it, and there is still a chance that the hardware burns or fails in some other way. To guard against that, Google stores your files in more than one place, which multiplies the cost.

Knowing all this, why does Google give you this storage for free? How does it make money? You guessed it: it monetizes your personal data. As the saying goes, “If you are not paying for it, you are the product.” Maybe you don’t have important information on your Google Drive, so you don’t care whether Google uses your data. But you should know that this free product is paid for with your personal information and data.

Google also offers monthly payment plans, which at the time of writing are: $12 per month for 2 TB of storage, $18 per month for 5 TB, and a negotiable enterprise plan that gives you “as much space as you need.” You may think that once you are paying, you are no longer the product. But you still are. You store your personal data in cloud storage without encrypting it yourself, so the provider can read it. Do you think they won’t read it? Or monetize it? Here is what Google’s Privacy Policy lists among the activity data it collects:

“Terms you search for
Videos you watch
Views and interactions with content and ads
Voice and audio information
Purchase activity
People with whom you communicate or share content
Activity on third-party sites and apps that use our services
Chrome browsing history you’ve synced with your Google Account”

Google may then share this information outside the company in the following situations, according to the same policy:

“Meet any applicable law, regulation, legal process, or enforceable governmental request. We share information about the number and type of requests we receive from governments in our Transparency Report.
Enforce applicable Terms of Service, including investigation of potential violations.
Detect, prevent, or otherwise address fraud, security, or technical issues.
Protect against harm to the rights, property or safety of Google, our users, or the public as required or permitted by law.”

As another example, let’s look at Amazon AWS. Its terms of service look much like Google’s, so in this respect there is almost no difference. What about pricing?

Amazon AWS and Google Drive calculate prices differently. With Google Drive, you buy a plan based on how much storage you estimate you will need. With Amazon AWS S3, you pay as you go, which means you pay only for the data you actually store. So how does its price compare with Google Drive’s?

At the time of writing, Amazon AWS S3 charges $0.023 per GB per month for storage on its Ohio servers. This means that storing 2 TB of data on this service costs approximately $46 per month (2,000 GB × $0.023). That’s a very high price.

So, as you can see, Google Drive charges $12 to $18 per month for 2 to 5 TB of storage, while Amazon AWS charges about $46 per month for just 2 TB. These prices can be very high, especially when you know that besides paying them, you and your personal data are still the product.
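To put the quoted prices on a common footing, the sketch below converts them to a price per terabyte per month. It uses only the figures quoted above and assumes decimal units (1 TB = 1,000 GB); the plan labels and function names are just illustrative.

```python
# Prices quoted in this article, normalized to USD per TB per month.
# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1_000

flat_plans = {
    "Google 2 TB plan": (12.00, 2),  # (monthly price in USD, included TB)
    "Google 5 TB plan": (18.00, 5),
}
S3_OHIO_PER_GB = 0.023  # S3 pay-as-you-go rate quoted above, USD per GB-month


def per_tb(monthly_price: float, terabytes: float) -> float:
    """Effective monthly price of one terabyte."""
    return monthly_price / terabytes


def s3_monthly_cost(terabytes: float) -> float:
    """Pay as you go: you are billed only for the gigabytes actually stored."""
    return terabytes * GB_PER_TB * S3_OHIO_PER_GB


if __name__ == "__main__":
    for name, (price, tb) in flat_plans.items():
        print(f"{name}: ${per_tb(price, tb):.2f} per TB")
    print(f"S3 (Ohio): ${per_tb(s3_monthly_cost(2), 2):.2f} per TB, "
          f"${s3_monthly_cost(2):.2f} for 2 TB")
```

One caveat the per-terabyte figures make visible: a flat plan costs the same whether you fill it or not, so its per-terabyte price is only that low when the quota is fully used, while S3 bills only for the data actually stored.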

I personally use Google Drive’s “free” plan, and I am not encouraging anyone to stop using it. I just want you to know some things about data collection and sharing that you probably didn’t know, and to offer you an alternative to centralized cloud storage.

No, the alternative is not Microsoft Azure or any other “centralized” cloud storage, because Microsoft has more or less the same terms of service and prices.

The alternative I’m talking about gives you freedom, ownership, and the power to monetize your own resources. It might be more expensive than the options mentioned above, but if you value your personal information and privacy, you might be happier using it instead of centralized cloud storage.

IPFS

Before diving into Filecoin, let’s start with the Inter-Planetary File System, or IPFS.

IPFS is a peer-to-peer hypermedia protocol designed to preserve and grow humanity’s knowledge by making the web upgradable, resilient, and more open.

What does that mean, exactly? Let’s say you’re doing some research on aardvarks. (Just roll with it; aardvarks are cool! Did you know they can tunnel 3 feet (0.91 m) in only 5 minutes?) You might start by visiting the Wikipedia page on aardvarks at https://en.wikipedia.org/wiki/Aardvark.

When you put that URL in your browser’s address bar, your computer asks one of Wikipedia’s computers, which might be somewhere on the other side of the country (or even the planet), for the aardvark page.

However, that’s not the only option for meeting your aardvark needs! There’s a mirror of Wikipedia stored on IPFS, and you could use that instead. If you use IPFS, your computer asks to get the aardvark page like this:

/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/Aardvark

IPFS knows how to find that sweet, sweet aardvark information by its contents, not its location (more on that, which is called content addressing, below). The “IPFS-ified” version of the aardvark info is represented by that string of letters and numbers in the middle of the URL (bafybeiaysi…), and instead of asking one of Wikipedia’s computers for the page, your computer uses IPFS to ask lots of computers around the world to share the page with you. It can get your aardvark info from anyone who has it, not just Wikipedia.
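The IPFS path above has three parts: the /ipfs/ namespace, the CID that identifies the content, and a path inside that content. Here is a minimal sketch that splits such a path; parse_ipfs_path is a hypothetical helper that only splits strings, without validating the CID or fetching anything.

```python
from typing import NamedTuple


class IpfsPath(NamedTuple):
    namespace: str  # "ipfs" for immutable content, "ipns" for mutable names
    root: str       # the CID (or IPNS name) identifying the content
    subpath: str    # path inside that content, e.g. "wiki/Aardvark"


def parse_ipfs_path(path: str) -> IpfsPath:
    """Split '/ipfs/<cid>/<subpath>' into its parts (no CID validation)."""
    parts = path.strip("/").split("/", 2)
    if len(parts) < 2 or parts[0] not in ("ipfs", "ipns"):
        raise ValueError(f"not an IPFS/IPNS path: {path!r}")
    subpath = parts[2] if len(parts) == 3 else ""
    return IpfsPath(parts[0], parts[1], subpath)


p = parse_ipfs_path(
    "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/Aardvark"
)
print(p.root[:11], p.subpath)  # bafybeiaysi wiki/Aardvark
```

Notice that nothing in the path names a server: the root identifies *what* you want, and any peer that has it can answer.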

And, when you use IPFS, you don’t just download files from someone else—your computer also helps distribute them. When your friend a few blocks away needs the same Wikipedia page, they might be as likely to get it from you as they would from your neighbour or anyone else using IPFS.

IPFS makes this possible not only for web pages but for any kind of file a computer might store, whether it’s a document, an email, or even a database record. IPFS claims to solve several problems with today’s web that stand in the way of the web of tomorrow:

Today’s web is inefficient and expensive: HTTP downloads files from one server at a time, but peer-to-peer IPFS retrieves pieces from multiple nodes at once, enabling substantial bandwidth savings. IPFS makes it possible to distribute high volumes of data efficiently and without duplication.
Today's web can't preserve humanity's history: The average lifespan of a web page is 100 days before it’s gone forever. The medium of our era shouldn’t be this fragile. IPFS makes it simple to set up resilient networks for mirroring data, and thanks to content addressing, files stored using IPFS are automatically versioned.
Today's web is centralized, limiting opportunity: The Internet has turbocharged innovation by being one of the great equalizers in human history—but increasing consolidation of control threatens that progress. IPFS stays true to the original vision of an open, flat web by delivering technology to make that vision a reality.
Today's web is addicted to the backbone: IPFS powers the creation of diversely resilient networks that enable persistent availability—with or without internet backbone connectivity. This means better connectivity for the developing world, during natural disasters, or just when you’re on flaky coffee shop Wi-Fi.

How IPFS works

Here’s what happens when you add a file to IPFS—whether you’re storing that file on your own local node or by a pinning service or IPFS-enabled app.

When you add a file to IPFS, your file is split into smaller chunks, cryptographically hashed, and given a unique fingerprint called a content identifier (CID). This CID acts as a permanent record of your file as it exists at that point in time.
When other nodes look up your file, they ask their peers which nodes are storing the content referenced by the file’s CID. When they view or download your file, they cache a copy and become another provider of your content until their cache is cleared.
A node can pin content in order to keep (and provide) it forever or discard content it hasn’t used in a while to save space. This means each node in the network stores only content it is interested in, plus some indexing information that helps figure out which node is storing what.
If you add a new version of your file to IPFS, its cryptographic hash is different, so it gets a new CID. This means files stored on IPFS are resistant to tampering and censorship—any changes to a file don’t overwrite the original, and common chunks across files can be reused in order to minimize storage costs.
However, this doesn’t mean you need to remember a long string of CIDs—IPFS can find the latest version of your file using the IPNS (Inter-Planetary Name System) decentralized naming system, and DNSLink can be used to map CIDs to human-readable DNS names.
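The chunk, hash, and link steps above can be sketched in a few lines. This is a deliberately simplified model, not the real CID format: actual IPFS builds a UnixFS Merkle DAG, wraps hashes in multihash/CID encodings, and by default uses 256 KiB chunks. Here we assume plain SHA-256, a tiny 4-byte chunk size so the effect is visible, and a toy "root ID" computed over the list of chunk hashes; add_file is a hypothetical name.

```python
import hashlib

CHUNK_SIZE = 4  # toy value so the example stays readable


def chunk(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def add_file(data: bytes, store: dict[str, bytes]) -> str:
    """Store each chunk under its own hash and return a toy root ID for the file."""
    chunk_ids = []
    for piece in chunk(data):
        cid = hashlib.sha256(piece).hexdigest()
        store[cid] = piece  # identical chunks land on the same key (deduplication)
        chunk_ids.append(cid)
    # The root links to its children by hash, so it changes if any chunk changes.
    return hashlib.sha256("".join(chunk_ids).encode()).hexdigest()


store: dict[str, bytes] = {}
v1 = add_file(b"aardvarks dig fast", store)
v2 = add_file(b"aardvarks dig FAST", store)
print(v1 != v2)    # True: the new version gets a new root ID
print(len(store))  # 7: the three chunks both versions share are stored only once
```

Adding the same bytes again always yields the same root ID, which is what lets any node recognize and reuse content it already has.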

Content Identifier (CID)

When we try to access a file or open a web page on the centralized web, we refer to that file or web page by its location address. By contrast, we need to use content addressing if we want to exchange data with other peers on the decentralized web.

As we’ve seen, the centralized web relies on trustworthy authorities to host our data and uses location-based URLs to access it. On the decentralized web, peers can all host each other’s data with a different kind of link that’s more secure, making it easy to trust our neighbours.

A Content Identifier is derived by cryptographically hashing the content, which gives that content a unique identifier. This identifier is much more secure than a location-based address because:

Cryptographic hashes can be derived from the content of the data itself, meaning that anyone using the same algorithm on the same data will arrive at the same hash. If Ada and Grace are both using the same decentralized web protocol, such as IPFS, to share the exact same photo of a kitten, both images will have exactly the same hash. By comparing those hashes and confirming that they’re the same, we can guarantee that every single pixel of those two photos is identical.
Cryptographic hashes are, for all practical purposes, unique. If Grace uses Photoshop to remove a single whisker from that kitty, the updated image will have a new hash. Simply by looking at that hash, even without access to the file itself, it will be easy to tell that the file now contains different data.

Using content identifiers on the decentralized web gives us an advantage. With traditional location addressing, we knew we needed to visit the domain puppies.com to find the content stored as beagle.jpg. If the puppies.com domain were broken for some reason, we’d lose access to that image.

The decentralized web works differently. When we want a specific photo of an adorable pet, we ask for it by its content address (hash). Whom do we ask? The whole network! If Ada is online, we’ll see that she has the content we’re looking for, and we’ll know that it’s exactly the file we need because it has a matching hash. If she goes offline, we may still be able to get the same photo from Grace or another peer.

Since we use hashes to request data on the decentralized web, we can think of a hash as a link, not just a name.
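A minimal sketch of retrieval by content address, assuming a toy address that is just the SHA-256 hex digest of the bytes (real CIDs also encode the hash function and data format). "mallory" is a hypothetical peer that serves the wrong bytes under the right address; the requester catches this simply by re-hashing what it receives, so it never has to trust any particular peer.

```python
import hashlib
from typing import Optional


def content_id(data: bytes) -> str:
    """Toy content address: the SHA-256 hex digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch(wanted: str, peers: dict[str, dict[str, bytes]]) -> Optional[bytes]:
    """Ask each peer for the content; accept only bytes that hash to `wanted`."""
    for storage in peers.values():
        data = storage.get(wanted)
        if data is not None and content_id(data) == wanted:
            return data  # verified: exactly the bytes we asked for
        # missing or tampered data: try the next peer
    return None


kitten = b"<kitten photo bytes>"
kid = content_id(kitten)
peers = {
    "mallory": {kid: b"<something else>"},  # hypothetical peer serving wrong bytes
    "ada": {},                              # offline, or doesn't have the photo
    "grace": {kid: kitten},                 # has the genuine photo
}
print(fetch(kid, peers) == kitten)  # True
```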

Decentralization

Making it possible to download a file from many locations that aren’t managed by one organization:

Supports a resilient internet. If someone attacks Wikipedia’s web servers or an engineer at Wikipedia makes a big mistake that causes their servers to catch fire, you can still get the same web pages from somewhere else.
Makes it harder to censor content. Because files on IPFS can come from many places, it’s harder for anyone (whether they’re states, corporations, or someone else) to block them. The IPFS team hopes IPFS can help people circumvent such blocking when it happens.
Can speed up the web when you’re far away or disconnected. If you can retrieve a file from someone nearby instead of hundreds or thousands of miles away, you can often get it faster. This is especially valuable if your community is networked locally but doesn’t have a good connection to the wider internet. (Well-funded organizations with technical expertise do this today by using multiple data centers or CDNs—content distribution networks. IPFS hopes to make this possible for everyone.)

That last point is where IPFS gets its full name: the Inter-Planetary File System. The IPFS team is striving to build a system that works across places as disconnected or as far apart as planets. While that’s an idealistic goal, it keeps them working and thinking hard, and almost everything they create in pursuit of that goal is also useful here at home.

IPNS

The Inter-Planetary Name System (IPNS) is a system for creating mutable pointers to CIDs known as names or IPNS names. IPNS names can be considered links that can be updated over time while retaining the verifiability of content addressing.

Technically, an IPNS name can point to an arbitrary content path (/ipfs/ or /ipns/), including another IPNS name or DNSLink path. However, it most commonly points to a fully resolved and immutable path, i.e., /ipfs/[CID].

A name in IPNS is the hash of a public key. It is associated with an IPNS record containing the content path (/ipfs/CID) it links to and other information such as the expiration, the version number, and a cryptographic signature signed by the corresponding private key. New records can be signed and published at any time by the private key holder.

IPNS records can point at an immutable or a mutable path. The meaning of the CID in a path depends on the namespace:

/ipfs/<cid>: an immutable content on IPFS (since the CID contains a multihash)
/ipns/<cid-of-libp2p-key>: a mutable, cryptographic IPNS name which corresponds to a libp2p public key.

In other words, an /ipfs/ path always refers to exactly the same content, while an /ipns/ name is a pointer whose target can change over time.

IPNS names are self-certifying. This means an IPNS record contains all the information necessary to certify its authenticity. IPNS achieves this using public and private key pairs:

Each IPNS name corresponds to a key pair.
The IPNS name is a CID with a multihash of the public key.
The IPNS record contains the public key and signature, allowing anyone to verify that the record was signed by the private key holder.

This self-certifying nature gives IPNS several benefits not present in hierarchical and consensus systems such as DNS and blockchain identifiers. Notably, IPNS records can come from anywhere, not just a particular service/system, and it is very fast and easy to confirm that a record is authentic.
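The self-certifying check described above can be sketched as a toy, under loud assumptions. It uses a hand-rolled Schnorr-style signature over integers modulo the prime 2**127 - 1 instead of a real signature scheme (real IPNS keys are typically Ed25519 keys), it hashes the public key with plain SHA-256 instead of building a real CID, and its record keeps only a value and a sequence number, with no expiration. keygen, sign, publish, and is_valid are hypothetical names. It is for illustration only and is not secure.

```python
import hashlib
import secrets
from dataclasses import dataclass

# TOY Schnorr-style signatures modulo a Mersenne prime. NOT secure; illustration only.
P = 2**127 - 1  # prime modulus
N = P - 1       # exponents can be reduced mod P - 1 (Fermat's little theorem)
G = 3           # fixed base


def _int_hash(*parts: bytes) -> int:
    return int.from_bytes(hashlib.sha256(b"|".join(parts)).digest(), "big") % N


def _b(x: int) -> bytes:
    return x.to_bytes(16, "big")  # any value below P fits in 16 bytes


def keygen() -> tuple[int, int]:
    x = secrets.randbelow(N - 1) + 1  # private key
    return x, pow(G, x, P)            # (private key, public key)


def sign(x: int, y: int, msg: bytes) -> tuple[int, int]:
    k = secrets.randbelow(N - 1) + 1  # fresh random nonce per signature
    r = pow(G, k, P)
    e = _int_hash(_b(r), _b(y), msg)
    return r, (k + e * x) % N


def verify(y: int, msg: bytes, sig: tuple[int, int]) -> bool:
    r, s = sig
    e = _int_hash(_b(r), _b(y), msg)
    return pow(G, s, P) == (r * pow(y, e, P)) % P  # checks G^s == r * y^e


@dataclass
class IpnsRecord:
    value: str                 # content path, e.g. "/ipfs/<cid>"
    sequence: int              # version number; a higher number is newer
    public_key: int
    signature: tuple[int, int]


def _payload(value: str, sequence: int) -> bytes:
    return f"{value}|{sequence}".encode()


def ipns_name(public_key: int) -> str:
    """The name is simply a hash of the public key."""
    return hashlib.sha256(_b(public_key)).hexdigest()


def publish(x: int, y: int, value: str, sequence: int) -> IpnsRecord:
    return IpnsRecord(value, sequence, y, sign(x, y, _payload(value, sequence)))


def is_valid(name: str, rec: IpnsRecord) -> bool:
    """Self-certifying: the record alone is enough, no trusted server needed."""
    return ipns_name(rec.public_key) == name and verify(
        rec.public_key, _payload(rec.value, rec.sequence), rec.signature)


x, y = keygen()
name = ipns_name(y)                                 # share this name once
rec = publish(x, y, "/ipfs/<new-cid>", sequence=2)  # later: point it elsewhere
print(is_valid(name, rec))                          # True
```

Anyone holding the name can check a record obtained from any source: the public key must hash to the name, and the signature must match the record's contents.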

DNSLink

DNSLink uses DNS TXT records to map a DNS name, like ipfs.io, to an IPFS address. Because you can edit your DNS records, you can use them to always point to the latest version of an object in IPFS. Since DNSLink uses DNS records, you can assign names, paths, and sub-domains that are easy to type, read, and remember.
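A minimal resolver sketch. By the DNSLink convention, the TXT record lives on the _dnslink subdomain of the domain and has the form dnslink=/ipfs/<cid>. Here a hypothetical dictionary, FAKE_DNS_TXT, stands in for a real DNS query, and <cid-of-site-v2> is a placeholder rather than a real CID.

```python
from typing import Optional

# Hypothetical stand-in for DNS: maps a hostname to its TXT record strings.
FAKE_DNS_TXT = {
    "_dnslink.example.org": ["v=spf1 -all", "dnslink=/ipfs/<cid-of-site-v2>"],
}


def resolve_dnslink(domain: str, txt_lookup=FAKE_DNS_TXT.get) -> Optional[str]:
    """Return the content path from the domain's DNSLink TXT record, if any."""
    for record in txt_lookup(f"_dnslink.{domain}") or []:
        if record.startswith("dnslink="):
            return record[len("dnslink="):]
    return None  # no DNSLink record: the name does not point into IPFS


print(resolve_dnslink("example.org"))  # /ipfs/<cid-of-site-v2>
```

Publishing a new version of a site then means adding the new files to IPFS and editing this one TXT record; the human-readable name stays the same.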

IPFS Users

Many types of users can benefit from IPFS. This includes:

Archivists: Storing archival data using IPFS enables deduplication, clustered persistence, and high performance—empowering you to store the world’s information for future generations.
Service providers: Providing large amounts of data to users? Storing on IPFS could help you slash bandwidth costs thanks to its use of secure, peer-to-peer content delivery.
Researchers: If you’re working with or distributing large datasets, storing that data using IPFS can help speed up performance and unlock decentralized archiving.
Blockchain developers: IPFS content addressing enables you to store large files off-chain and put immutable, permanent links in transactions—timestamping and securing content without having to put the data itself on-chain.
Content creators: IPFS empowers creators to build and share on the decentralized web—whether delivering content free from intermediary control or minting NFTs that stand the test of time.
Offline users: High-latency networks cause major obstacles for those with poor internet infrastructure. Peer-to-peer IPFS offers resilient access to data independent of latency or backbone connectivity.

Filecoin

IPFS is a great idea. People can store their files in a decentralized, secure, private way. And if you have free storage space, you can help the ecosystem grow by storing others’ files on your hard drive.

In a capitalist world, no one likes to work for free. IPFS is great, but if you give others your free storage space, you will want something in return. And what is better than money? So you need a way to monetize your free storage space, one that works better than centralized cloud storage.

Filecoin is a decentralized data storage network built by Protocol Labs, the same team that brought IPFS to the world, that allows users to sell their excess storage on an open platform. It acts as the incentive and security layer for IPFS. Filecoin turns IPFS’ storage system into an “algorithmic market,” where users pay storage providers in Filecoin’s native token, FIL, to store and distribute data on the network.

Filecoin is looking to provide an alternative to traditional online storage providers and protocols. Its technology acts as an incentive layer for the peer-to-peer file transfer system IPFS, which uses hash-addressed content structures to store data instead of centralized servers and IP addresses. This is intended to reduce redundancy, increase permanence, and improve efficiency.

Decentralized data storage should also make storage more efficient, which can lead to lower storage costs. Consider that Amazon AWS S3 charges about $23 per terabyte per month, while Filecoin brings that down to approximately $0.19 per terabyte per month at the time of writing. With FIL priced at $5.5, storing 1 TB of data for one month costs about 0.035 FIL. Now assume a scenario where the FIL price soars to $100 (more than 18 times today’s price, an increase of about 1,700%), which seems far-fetched in the short term. If, in this scenario, storing 1 TB of data for one month costs 0.1 FIL (almost three times today’s 0.035 FIL), then the dollar price of storing that data for one month would be only $10, still less than the roughly $23 per terabyte that Amazon AWS S3 charges right now.

The scenario described above is not impossible, but it is very improbable, especially in the short term. Yet even in this improbable scenario, storing files on Filecoin costs less than centralized cloud storage such as Amazon AWS S3. Decentralized networks can lower costs dramatically because they put existing spare storage to work instead of requiring dedicated infrastructure to be built.
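The arithmetic of this scenario can be checked with a few lines. All inputs are the article's figures at the time of writing; the $100 FIL price and the 0.1 FIL cost are the hypothetical scenario, not forecasts.

```python
S3_PER_TB = 0.023 * 1_000  # USD per TB-month, the S3 Ohio rate quoted earlier
FIL_PER_TB_USD = 0.19      # USD per TB-month on Filecoin (article's figure)
FIL_PRICE_NOW = 5.5        # USD per FIL at the time of writing


def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new / old - 1) * 100


fil_per_tb_now = FIL_PER_TB_USD / FIL_PRICE_NOW          # about 0.035 FIL
scenario_fil_price, scenario_fil_per_tb = 100.0, 0.1     # hypothetical scenario
scenario_usd = scenario_fil_price * scenario_fil_per_tb  # USD per TB-month

print(f"now: {fil_per_tb_now:.3f} FIL per TB-month")
print(f"FIL price up {pct_increase(FIL_PRICE_NOW, scenario_fil_price):.0f}%")
print(f"FIL cost per TB up {pct_increase(0.035, scenario_fil_per_tb):.0f}%")
print(f"scenario: ${scenario_usd:.2f} vs S3 ${S3_PER_TB:.2f} per TB-month")
```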

Filecoin incentivizes IPFS by rewarding storage providers and retrievers for contributing resources to the system. The network also comes with built-in Ethereum integration, allowing developers to access data on Ethereum’s blockchain and interact with its smart contracts.

Filecoin is being developed by Protocol Labs, a development firm founded in 2014 by Juan Benet. Benet and his crew constructed Filecoin and IPFS in tandem, raising a few Seed equity rounds to fund the process. In 2017, the team hosted a token sale to secure funds for Filecoin’s development, which raised around $205 million in one of the largest token offerings at the time.

Protocol Labs seeks to build a fundamental layer for data infrastructure that can be used by both blockchain and traditional providers, like Amazon Web Services (Amazon AWS) and Microsoft Azure. The project plans to achieve this goal by creating a marketplace in which any user who has storage capacity can connect to the network, creating a supply of unused storage both in consumer hardware as well as data centres of existing businesses. The Filecoin team believes this will reduce the storage price in a way similar to how sharing economy companies like Airbnb reduced the price of short-term rentals in marketplaces traditionally dominated by large players with large capital requirements.

Protocol Labs introduced many new technologies in its Filecoin whitepaper that could add value to multiple blockchain projects. Like Bitcoin, Filecoin makes miners prove that they have done work, but this work is useful data storage instead of searching for a random nonce that puts the block hash in a target range. A miner’s work is limited to proving that it has stored data with a specific replication and for a specific duration. This is achieved with two new kinds of proofs: Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt), which we will discuss later.

Proof-of-Replication allows a server to convince a user that some data has been replicated to its own uniquely dedicated physical storage, while Proof-of-Spacetime allows an efficient prover to convince a verifier that it has been storing some data for a specified duration of time.

These proofs let Filecoin solve a core problem of large-scale storage networks made of independent parties: they make it theoretically impossible to falsify storage records to increase mining rewards. Competitors like Siacoin ($SC) and Storj ($STORJ) lack this functionality, though neither has yet attracted enough storage demand for this to become a significant issue.

Miner proofs are used to build a network around three primary methods: put, get, and manage. The put and get methods store data and retrieve it on client request, respectively. The manage method runs the marketplace by matching buy and sell orders and by tracking buyer and seller reputation on the platform.

These methods are executed across two marketplaces, storage and retrieval, which are served by different miners. Protocol Labs believes that miners will often participate in both markets. Storage miners receive put requests and store client data while pledging collateral proportional to the data. They are penalized by losing this collateral in the event of invalid or missing proofs. Storage providers run the manage method together with clients and auditors. Retrieval miners handle get requests and deliver data to clients. While retrieval miners do not need to pledge collateral, they are still paid in FIL, Filecoin’s native token, for the work they perform for the network.
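The storage-market side of put and manage, together with the collateral rule, can be sketched as a toy. The collateral rate, prices, and all class and method names are hypothetical; real Filecoin deals, collateral formulas, and penalties are considerably more involved, and the retrieval market (get) is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

COLLATERAL_PER_TB = 2.0  # hypothetical: FIL pledged per TB of stored client data


@dataclass
class Ask:  # a storage provider's sell order
    provider: str
    price_per_tb: float  # asking price, FIL per TB-month
    free_tb: float       # unused capacity on offer


@dataclass
class Deal:
    client: str
    provider: str
    size_tb: float
    price_per_tb: float
    collateral: float    # locked by the provider for this deal


@dataclass
class StorageMarket:
    asks: list[Ask] = field(default_factory=list)
    deals: list[Deal] = field(default_factory=list)

    def put(self, client: str, size_tb: float, max_price: float) -> Optional[Deal]:
        """'manage': match a client's buy order with the cheapest suitable ask."""
        for ask in sorted(self.asks, key=lambda a: a.price_per_tb):
            if ask.price_per_tb <= max_price and ask.free_tb >= size_tb:
                ask.free_tb -= size_tb
                deal = Deal(client, ask.provider, size_tb, ask.price_per_tb,
                            collateral=size_tb * COLLATERAL_PER_TB)
                self.deals.append(deal)
                return deal
        return None  # no provider is cheap enough or has enough space

    def missed_proof(self, deal: Deal) -> float:
        """Slash the deal's collateral when the provider fails to prove storage."""
        slashed, deal.collateral = deal.collateral, 0.0
        return slashed


market = StorageMarket([Ask("p1", 0.05, 10), Ask("p2", 0.03, 1)])
deal = market.put("alice", size_tb=2, max_price=0.06)
print(deal.provider, deal.collateral)  # p1 4.0 (p2 is cheaper but too small)
```

Making the pledge proportional to the stored data means that the more a provider promises to store, the more it stands to lose by failing to prove it.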

Filecoin was among the first blockchain projects to introduce the concept of a decentralized storage network (DSN). A DSN is a data storage scheme that includes a network of independent storage nodes and clients. The DSN aggregates the storage offered by the independent node operators and coordinates the storage and retrieval of the data.

The aggregation and coordination are decentralized, which removes the need for trusted third parties. Instead, security is achieved through the operating protocols, which coordinate operations and verify data storage and retrieval.

Consensus Mechanisms

IPFS brought peer-to-peer, decentralized storage to the world. Building on it, Protocol Labs created Filecoin as the incentive and security layer of IPFS. Because IPFS is decentralized, Filecoin needs to be decentralized too. The idea of a blockchain, which Satoshi Nakamoto introduced to the world with Bitcoin, helped Protocol Labs build a decentralized system on top of the IPFS protocol.

Filecoin’s blockchain doesn’t differ much from other simple blockchains like Bitcoin and Ethereum. Its most distinctive part is its consensus mechanism, which sets Filecoin apart not only from other blockchains but also from earlier decentralized storage protocols.

Filecoin’s two new proofs build on several earlier kinds of proofs, most of them Proof-of-Storage (PoS) schemes:

Provable Data Possession (PDP): This proof allows users to send data to a server, and later, users can repeatedly check whether the server is still storing the data. PDPs are useful in cloud storage and other storage outsourcing settings. PDPs can be either privately-verifiable or publicly-verifiable and static or dynamic. A wide variety of PDP schemes exist.
Proof-of-Retrievability (PoRet): These proofs are similar to PDPs but also enable extracting the data; that is, they offer retrievability. PDPs let the verifier check that the server is still storing the data, but the server may submit valid PDP proofs yet hold the data hostage and never release it. PoRets solve this problem by making the proofs themselves leak pieces of the data, so that the user can issue some number of challenges and then reconstruct the data from the proofs.
Proof-of-Replication (PoRep): These schemes are another kind of PoS that additionally ensure that the server dedicates unique physical storage to the data. The server cannot pretend to store the data twice while actually deduplicating it. This construction is useful in cloud storage and Decentralized Storage Network settings, where ensuring a proper level of replication is important and where rational servers may create Sybil identities (duplicate identities, usually digital, that one party controls to gain more voting power or for other malicious purposes) and sell their service twice to the same user. PoRep schemes ensure that each replica is stored independently. Some PoRep schemes may also be PoRet schemes.
Proof-of-Work (PoW): These schemes allow the prover to convince the verifier that the server (prover) has spent some resources. The original use case presented this scheme to allow a server to rate-limit usage by asking the user to do some expensive work per request. Since then, PoW schemes have been adapted for use in cryptocurrencies, Byzantine Consensus, and many other systems. Famously, the Bitcoin network expends massive energy in a hashing PoW scheme, used to establish consensus and safely extend the Bitcoin ledger.
Proof-of-Space (PoSpace): These schemes allow the prover to convince the verifier that the server (prover) has spent some storage resources. PoSpace schemes are PoW schemes where the expended resource is not computation (CPU instructions) but rather storage space. In a sense, a PoS scheme is also a PoSpace, since a PoS implies using storage resources.
Proof-of-Spacetime (PoSt): These schemes allow the prover to convince the verifier that the server (prover) has spent some “spacetime” (storage space used over time) resources. This is a PoSpace with a sequence of checks over time. A useful version of PoSt would be valuable, as it could replace other PoW schemes with a storage service.
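Several of these schemes share a challenge-response shape. Below is a minimal sketch of one common building block, a Merkle-tree spot check, assuming the client keeps only the root hash of its data and later challenges the server on a random chunk. It is a generic illustration, not Filecoin's actual proof system, and merkle_levels, prove, and verify are hypothetical names.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(chunks: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first; an odd last node is paired with itself."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Server: return the challenged chunk plus sibling hashes up to the root."""
    path, i = [], index
    for level in merkle_levels(chunks)[:-1]:
        sib = i ^ 1
        path.append(level[sib] if sib < len(level) else level[i])
        i //= 2
    return chunks[index], path


def verify(root: bytes, index: int, chunk: bytes, path: list[bytes]) -> bool:
    """Client: needs only the root hash, not the data."""
    node, i = h(chunk), index
    for sibling in path:
        node = h(node + sibling) if i % 2 == 0 else h(sibling + node)
        i //= 2
    return node == root


data = [f"block-{n}".encode() for n in range(5)]
root = merkle_levels(data)[-1][0]         # the client keeps only this
challenge = secrets.randbelow(len(data))  # verifier picks a random chunk
chunk, path = prove(data, challenge)      # the server must actually hold the data
print(verify(root, challenge, chunk, path))      # True
print(verify(root, challenge, b"forged", path))  # False
```

Because each answer returns the chunk itself, enough challenges would let the client rebuild the data, which is the PoRet idea. A single spot check only catches a server that lost that particular chunk, so verifiers repeat it with fresh random indices.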

Building on these ideas, Filecoin introduced its own Proof-of-Replication (PoRep) and Proof-of-Spacetime (PoSt) constructions to make its storage publicly verifiable.

Proof-of-Replication

Motivations

Consider the following scenarios:

Replication: A user wishes to hire a server to store n independent copies of some data; in other words, the user wants a replication factor of n. PDP and PoRet schemes do not give the user a way to verify that the server is storing these n replicas separately, rather than merely pretending to do so.
Deduplication: A user asks each of n different servers P1, P2, …, Pn in 𝓟 to store some data. With normal PDP and PoRet schemes, the servers could collude and store the data only once instead of n times (once each). When issued a challenge, Pi would only need to retrieve the data from whichever Pj is actually storing it, calculate the proof, and discard the data.
Sybil identities: A setup very similar to deduplication above, but now all servers P1, P2, …, Pn in 𝓟 are secretly just one server, say P1. The others are Sybil identities.
Networks: A set of users and servers come together to form a Decentralized Storage Network (DSN), where all participants simulate a unified service that outsources storage to each individual server. Ideally, each individual server could prove they are storing each replica of data uniquely in a transparent and publicly-verifiable way.

Current PoS schemes do not fully address these scenarios. PDP and PoRet schemes do not prevent a single prover (or group of provers) from deduplicating data across multiple user requests. Users who hold the data can achieve replication with PDP and PoRet schemes by deriving a set of encrypted replicas (replica i encrypted under a key unique to server i) and keeping the mapping and keys secret. However, this is expensive and not transparent. In a Decentralized Storage Network setting, a user could also play the role of a server, gain access to some replication mappings and keys, and thus deduplicate the storage. We must do better.

How does it work?

The formal definitions and step-by-step construction of this scheme are complex and require a solid background in computer science and cryptography, so I simplify the process here. Keep in mind that this simplified version may not answer every question, since it sacrifices some accuracy and precision. Here is how it works.

Basically, the whole idea is a game. The challenger (verifier) issues cryptographic challenges over time, and the prover must answer them with cryptographic proofs. The game guards against three attacks: the Sybil attack, the outsourcing attack, and the generation attack.

The prover must answer correctly at any time within a specified period. She passes the test if she can prove that she is not cheating with any of these attacks. Otherwise, she is considered an adversary and may lose part or all of the collateral she pledged. The image below shows a high-level overview of what happens with an honest prover and with an adversary (attacker) who cannot produce a proof on time.

To make things even harder for the adversary, the data is stored using a special kind of encoding that is very slow to compute but fast to decode. This lets the verifier rule out an attacker who fetches the data from another server and tries to encode it only when challenged: the encoding takes so long that such an attacker cannot answer in time. So if the prover answers (creates a proof) on time, it must have had the encoded data in its storage all along. The encoding also assures the verifier that the prover stores the data sealed, and that nothing leaks from the data.
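The sealing idea can be illustrated with a toy encoding. Assumptions: SHA-256 stands in for the real sealing function, blocks are 32 bytes, and seal and unseal are hypothetical names; Filecoin's actual PoRep uses a far more expensive stacked depth-robust graph (SDR) construction. The sketch shows two properties from the text: every replica depends on its replica ID, so one stored copy cannot answer for two replicas; and encoding is a strict sequential chain, while decoding needs only blocks that are already on disk and could run in parallel.

```python
import hashlib

BLOCK = 32  # bytes per block (the SHA-256 output size)


def _pad(replica_id: bytes, index: int, prev: bytes) -> bytes:
    return hashlib.sha256(replica_id + index.to_bytes(8, "big") + prev).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(data: bytes, replica_id: bytes) -> bytes:
    """Encode block by block; each block depends on the previous SEALED block,
    so the work cannot be parallelized."""
    out, prev = [], b"\x00" * BLOCK
    for i in range(0, len(data), BLOCK):
        block = _xor(data[i:i + BLOCK], _pad(replica_id, i, prev))
        out.append(block)
        prev = block
    return b"".join(out)


def unseal(replica: bytes, replica_id: bytes) -> bytes:
    """Decode: each block needs only sealed blocks already on disk, so all
    blocks could be decoded in parallel (a plain loop is used here for brevity)."""
    out = []
    for i in range(0, len(replica), BLOCK):
        prev = replica[i - BLOCK:i] if i else b"\x00" * BLOCK
        out.append(_xor(replica[i:i + BLOCK], _pad(replica_id, i, prev)))
    return b"".join(out)


data = b"the same client file, stored twice " * 3
r1, r2 = seal(data, b"replica-1"), seal(data, b"replica-2")
print(r1 != r2)                          # True: two replicas, two different byte strings
print(unseal(r1, b"replica-1") == data)  # True
```

In a real system each sequential step is made deliberately costly, so re-sealing the data on demand cannot finish before the proof deadline, while reading the data back stays cheap.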

Proof-of-Spacetime

In a Proof-of-Storage scheme, a user can check if the storage provider is actually storing the expected data when a challenge is issued. However, it doesn’t verify that the data remains stored across a given period of time.

One way to accomplish this would be to challenge the storage provider repeatedly. Of course, this introduces a huge amount of complexity and communication and would become a bottleneck to the Filecoin system since storage providers must submit their proofs to the blockchain network.

Proof-of-Spacetime bypasses this by allowing a verifier to check whether a storage provider has been storing the requested data over a range of time. It does this by requiring the storage provider to:

generate sequential Proofs-of-Storage (in our case, Proof-of-Replication) as a way to measure time;
recursively compose the executions to generate a short proof.

PoSt and PoRep both use zk-SNARKs, making proofs very short and easy to verify.

Smart Contracts

Smart contracts were included to give users access to stateful programs that can validate storage proofs, request the storage and retrieval of data, and spend tokens.

Smart contracts are triggered by certain transactions sent to the ledger. Filecoin has extended the smart contract system with its own blockchain-specific operations, such as proof verification and market operations.

Cross-chain Interactions

Cross-chain interaction is not fully implemented yet, but Filecoin’s developers are working on supporting it through bridges. This would allow other blockchains to use the Filecoin storage system, while also letting Filecoin benefit from the functionality of other blockchain platforms.

Mining on Filecoin

Since the Filecoin mainnet is live, anyone can earn FIL tokens by providing data storage and retrieval services to users across the global network.

The more data a miner stores, the greater its storage power. The more storage power it has, the more likely it is to generate new blocks and win block rewards. Miners can choose to participate in storage mining, retrieval mining, storage power consensus, or all three.

Mining on Filecoin differs from mining on a Proof-of-Work blockchain because Filecoin mining is based on storage power consensus rather than raw computing power. That means the more proven storage you have on the network, the more likely you will win block rewards.

Storage power grows linearly with the amount of storage each miner adds to the network. The number of GPUs a miner owns does not determine its likelihood of winning block rewards. This contrasts with a Proof-of-Work blockchain, where miners compete on raw computing power to win block rewards.
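The linear relationship between storage power and block rewards can be illustrated with a toy simulation. It assumes exactly one winner per epoch, drawn with probability proportional to storage power. Filecoin's real election (Expected Consensus) uses per-miner election tickets and can produce several winning blocks in a single epoch. The miner names and power figures are hypothetical.

```python
import random
from collections import Counter

# Hypothetical miners and their proven storage power in TiB.
POWER_TIB = {"miner_a": 100, "miner_b": 300, "miner_c": 600}


def elect_leader(power: dict[str, int], rng: random.Random) -> str:
    """Pick one block producer with probability proportional to storage power."""
    miners = list(power)
    return rng.choices(miners, weights=[power[m] for m in miners], k=1)[0]


def simulate(power: dict[str, int], epochs: int, seed: int = 0) -> dict[str, float]:
    """Return each miner's observed share of wins over many epochs."""
    rng = random.Random(seed)
    wins = Counter(elect_leader(power, rng) for _ in range(epochs))
    return {m: wins[m] / epochs for m in power}


total = sum(POWER_TIB.values())
for miner, share in simulate(POWER_TIB, epochs=100_000).items():
    print(f"{miner}: power share {POWER_TIB[miner] / total:.2f}, win rate {share:.3f}")
```

Over 100,000 simulated epochs, each miner's win rate lands close to its power share (0.10, 0.30, 0.60). GPUs never enter the weights, which mirrors the point that only proven storage counts.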

Filecoin miners use GPU power only during ElectionPoSt, and only when they hold a winning election ticket. In short, the cheapest way for a miner to gain power on the Filecoin network is to add more useful storage. Mining can also be tested on the testnet, which went live in December 2019.

For small miners who worry about GPU power for the ElectionPoSt, the Filecoin team is researching ways to outsource the zk-SNARK computation to minimize GPU costs for miners.

Codes and Programming

I reviewed parts of the codebase. The code looks solid, and I haven’t found any visible bugs. Much of the development work happens through FIPs (Filecoin Improvement Proposals), the Filecoin counterpart of EIPs (Ethereum Improvement Proposals). Through FIPs, anyone can propose an improvement. Proposals go through a back-and-forth review with a committee of Filecoin developers, and the best ones are adopted to improve Filecoin.

There was a major problem in which miners could cheat by appearing to store files themselves when they were not. This problem was fixed by one of these proposals, FIP-0003.

Filecoin’s test coverage is also tracked on codecov.io, a code-coverage reporting service, and you can see the results below.

In conclusion, Filecoin’s code not only works well but is also constantly upgraded through FIPs. This makes Filecoin a serious competitor to other companies and projects in the same area.

Team and Partners

Team

Unlike most blockchain projects, Filecoin was not founded by an individual or group of individuals. Instead, it comes from a U.S. company called Protocol Labs. Protocol Labs was founded in 2014 by Juan Benet, and long before it became involved with Filecoin, it was involved with creating foundational internet infrastructure technology.

One of its most widely known and used inventions is the Inter-Planetary File System (IPFS), a decentralized web protocol that hopes to replace HTTP. The company continues to research, develop, and deploy network protocols.

Juan has been the founder, CEO, and an engineer at Protocol Labs since 2014. He was previously the founder of Athena and co-founder and CTO of Loki Studios. He graduated in computer science from Stanford University.

Investors

The project had three seed rounds. The first took place on Jul 16, 2014, shortly after Protocol Labs was founded, and raised $120k for Protocol Labs. The other two rounds took place in 2017. Information is available for only one of them, which brought $300k to the company.

On Aug 10, 2017, Filecoin held its first Initial Coin Offering (ICO) round. Seventeen investors participated, with the venture capital firm Andreessen Horowitz, a major crypto investor, as lead investor. This round raised $52 million.

Filecoin held its second ICO round on Sep 7, 2017, an odd phrase, since an offering can hardly be both “second” and “initial.” Filecoin raised $205 million in this round, and the coin price skyrocketed afterward.

Partners

Protocol Labs collaborates with many companies and organizations, including AE.studios, Actyx, Brave Browser, ChainSafe, Dapper, Dover, Fleek, Infinite Scroll, Infura, Internet Archive, Wolfram, and others.

Top Competitors

There are other projects with similar goals to Filecoin, which have already launched their networks. The most well-known and notable of these are Siacoin and Storj.

At the time of writing, Siacoin has over 550 active providers with a total capacity of 5.1 Petabytes, of which nearly 700 Terabytes are in use. These numbers were much higher a year ago and have been falling over time. The image below shows Siacoin’s used storage and storage capacity over the past five years.

Storj has had a functioning cryptocurrency (STORJ) since 2017. The project has 16.4 Petabytes of storage, of which 4.25 Petabytes are in use. Storj has more than 21,000 active nodes securing the network and providing storage.

Also, consider that the centralized players in cloud computing will not give up easily. Amazon S3 is currently the largest file storage platform in the world, but others, such as Microsoft and Alphabet (Google’s parent company), are also working hard to claim market share.

It could be extremely difficult for decentralized options like Filecoin to overtake these centralized giants, which have strong business connections, offer reliable service, and scale easily. They are also an excellent choice for developers who want integration with other services from the same provider (Amazon, Microsoft, or Alphabet).

Today’s centralized data storage capacity is more than 8 Zettabytes, estimated to become more than 16 Zettabytes by 2025.

Filecoin’s storage capacity is more than 14.27 EB, of which more than 500 PB was in use at the time of writing. There are around 6 million unique CIDs, and more than 1,100 unique providers secure the network and provide storage. The image below shows the network’s storage capacity over the past two years. “Network QA Power” shows how much storage is ready in the network, and “Baseline Power” shows the storage capacity Filecoin expects to reach over time.
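To put the figures quoted above side by side, here is a small worked calculation. It uses only the article's snapshot numbers, converted with the decimal prefixes listed in Appendix 1 (1 EB = 1,000 PB, 1 ZB = 1,000,000 PB). Filecoin's "more than 500 PB" is treated as exactly 500 PB, so its utilization is a lower bound.

```python
# Figures quoted in this article, converted to petabytes (decimal prefixes).
PB_PER_EB = 1_000
PB_PER_ZB = 1_000_000

networks = {
    # name: (used storage in PB, total capacity in PB)
    "Siacoin": (0.7, 5.1),                # ~700 TB used of 5.1 PB
    "Storj": (4.25, 16.4),
    "Filecoin": (500, 14.27 * PB_PER_EB),  # "more than 500 PB" of 14.27 EB
}

for name, (used, capacity) in networks.items():
    print(f"{name}: {used / capacity:.1%} of capacity in use")

# Filecoin's capacity relative to the ~8 ZB of global storage capacity.
share = networks["Filecoin"][1] / (8 * PB_PER_ZB)
print(f"Filecoin capacity vs. global storage: {share:.2%}")
```

With these numbers, Siacoin uses about 13.7% of its capacity, Storj about 25.9%, and Filecoin about 3.5%. Filecoin's capacity is roughly 0.18% of the 8 ZB global figure. Filecoin therefore leads on raw capacity but, at the time of writing, leaves most of it unused.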

Now, let’s compare Filecoin (decentralized data storage) with centralized data storage like Google Drive or Amazon AWS S3.

Roadmap

You can find Filecoin’s community roadmap for 2022 in the image below.

Since then, they have met almost all of their deadlines. They even announced the Filecoin Virtual Machine (FVM) and implemented it within the past year. The community maintains a dynamic roadmap for the project. The image below shows its most recent update, which clearly has not been kept up to date.

Tokenomics

The Filecoin token (FIL) is the native crypto token of the Filecoin network, similar to BTC and Ether. Token holders can use FIL to participate and transact in the Filecoin network. In particular, users pay miners in FIL to store or distribute data and retrieve their information. Storage providers also post FIL as collateral to provide a minimum level of guarantee of their service, which gets slashed should a deal with a customer fall through.

Storage Provider

Becoming a Storage Provider on the Filecoin Network is a way to participate in preserving humanity’s most valuable information. It can also be a profitable endeavor. The Filecoin community is home to a fast-growing ecosystem of SPs of all sizes and geographic locations, most of whom have used different strategies and setups to build financially sound businesses.

To understand how this is possible, it is important to understand the economics of the Filecoin Network for SPs. We’ll go through a few basic concepts that SPs should be familiar with as they become active members of the Filecoin economy.

SPs have two main sources of revenue in the Filecoin Network: they earn fees charged to end users for the storage and retrieval of data, and they have a chance to receive block rewards in FIL.

The first is available to all SPs who meet the hardware requirements for participating in the network. These fees are an essential part of the Filecoin economy since they’re determined by each SP and create a market for storage and retrieval that offers competitive opportunities to small and large participants alike.

The second is limited to SPs that participate in the network as consensus nodes. To do so, they must commit at least 10 TiB of storage capacity. In return, they become eligible to produce the next block in the Filecoin Network and earn a reward, much like miners who receive a coinbase transaction in PoW networks. Their chances of receiving this reward are proportional to their share of the total Storage Power available in the network.
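The 10 TiB threshold and the proportional reward chance can be combined into a small expected-value helper. This is a simplified sketch. It treats committed capacity as storage power, whereas Filecoin actually weights capacity as quality-adjusted power. It also assumes a hypothetical total reward pool for the period. `expected_rewards_fil` is our own illustrative function, not part of any Filecoin library.

```python
MIN_CONSENSUS_TIB = 10  # minimum committed capacity for consensus participation


def expected_rewards_fil(sp_tib: float, network_tib: float, total_rewards_fil: float) -> float:
    """Expected block-reward FIL for one SP over a period.

    SPs below the threshold earn nothing from block rewards; otherwise the
    expectation is their share of network power times the rewards issued.
    """
    if sp_tib < MIN_CONSENSUS_TIB:
        return 0.0
    return sp_tib * total_rewards_fil / network_tib


# Hypothetical network: 1,000,000 TiB of power and 100,000 FIL issued in the period.
print(expected_rewards_fil(8, 1_000_000, 100_000))    # 0.0 (below 10 TiB)
print(expected_rewards_fil(500, 1_000_000, 100_000))  # 50.0
```

Storage and retrieval fees, the first revenue source, come on top of this and do not depend on the threshold.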

In terms of costs, SPs have many factors to consider:

purchasing and maintaining performant hardware setups
optimizing sales and marketing operations
power consumption, as well as other expenses and interest payments on borrowed FIL when applicable

Some of these, such as hardware setups and power consumption, can be negotiated.

Another economic concept to understand is collateral. This is simply the amount of FIL that SPs have to stake in order to guarantee that they will act in good faith and that their incentives are aligned with the rest of the economy. If they fail to meet their responsibilities to the network, their collateral is slashed, meaning they lose a portion of the FIL.

Collateral is considered separately from the costs in the P&L since the pledged FIL is paid back after successful storage and deals. However, it is still an important factor to consider since it can be seen as part of an initial investment.
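The distinction between costs and collateral can be sketched as a tiny P&L model. All figures and the `SPPeriod` structure are hypothetical. Amounts are in FIL, and FIL price movements are ignored. Pledged collateral stays out of the profit calculation but still counts toward the capital an SP must put up front.

```python
from dataclasses import dataclass


@dataclass
class SPPeriod:
    """One accounting period for a hypothetical storage provider (amounts in FIL)."""
    deal_fees: float
    block_rewards: float
    operating_costs: float     # hardware upkeep, power, sales, interest, ...
    pledged_collateral: float


def profit(p: SPPeriod) -> float:
    # Collateral is not a cost: it is returned after successful storage and deals.
    return p.deal_fees + p.block_rewards - p.operating_costs


def capital_locked(p: SPPeriod, hardware_outlay: float) -> float:
    # ...but it still has to be funded up front, like the hardware.
    return hardware_outlay + p.pledged_collateral


period = SPPeriod(deal_fees=120.0, block_rewards=300.0,
                  operating_costs=250.0, pledged_collateral=1000.0)
print(profit(period))                          # 170.0
print(capital_locked(period, hardware_outlay=500.0))  # 1500.0
```

Dividing profit by locked capital gives a rough return on the initial investment. That return is the figure collateral affects, even though collateral never shows up as a cost.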

There are three types of collateral:

Initial pledge collateral: The amount of FIL needed to be staked to participate in the Filecoin economy. It is equal to seven days’ worth of Sector fault fees plus one Sector fault detection fee.
Block reward as collateral: The amount of FIL SPs receive for proposing a block in the consensus process. Seventy-five percent of this amount vests over a period of six months and is subject to slashing if a Sector is terminated before its expiration.
SP deal collateral: The amount of FIL that is slashed if a deal is terminated. Higher collaterals indicate higher reliability to potential clients.
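The first two collateral rules above translate directly into arithmetic. This sketch takes the fee amounts as inputs rather than real network values. It also assumes that the 75% portion of a block reward vests linearly over 180 days (about six months) and that the other 25% is available immediately. The article says only that 75% vests over six months, so the linear schedule is our assumption.

```python
def initial_pledge(daily_fault_fee: float, fault_detection_fee: float) -> float:
    """Initial pledge as described above: seven days of Sector fault fees
    plus one Sector fault detection fee (fee values are inputs, not real ones)."""
    return 7 * daily_fault_fee + fault_detection_fee


def vested_block_reward(reward: float, days_elapsed: int, vest_days: int = 180) -> float:
    """FIL from a single block reward that is unlocked after `days_elapsed` days.

    Assumption: 25% is available at once and the remaining 75% vests
    linearly over `vest_days` (about six months).
    """
    immediate = 0.25 * reward
    vesting = 0.75 * reward
    fraction = min(max(days_elapsed, 0), vest_days) / vest_days
    return immediate + vesting * fraction


print(vested_block_reward(20.0, 0))    # 5.0
print(vested_block_reward(20.0, 90))   # 12.5
print(vested_block_reward(20.0, 180))  # 20.0
```

The unvested part is what remains exposed to slashing if a Sector is terminated early.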

Token History & Economics

Within a short timespan in 2017, Filecoin raised $52 million in an advisor pre-sale, followed by a $153 million public offering through the token sale platform CoinList. It raised these funds using a relatively new method known as a Simple Agreement for Future Tokens (SAFT). The SAFT is a legal agreement inspired by the Simple Agreement for Future Equity (SAFE), pioneered by Y Combinator. It gives accredited investors an allocation of tokens once the network is live in return for an upfront investment.

Of the $205 million raised, $52 million was sold to advisors at a rate of $0.75 per FIL token, with an additional discount of between 0% and 30%. Vesting for advisors ranged from one to three years. The remaining $153 million was raised from investors, with FIL tokens priced by a linearly increasing function. The price started at $1.00 per FIL and rose as high as $5.00 per FIL, following the formula price = (USD amount raised) / 40 million, with a hard cap of 200 million FIL sold. Investor tokens from the advisor and token sale rounds vest over variable periods, with a six-month minimum.
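The sale's pricing rule (price = amount raised / 40 million, starting at $1.00) implies how many tokens a given amount buys. This sketch treats the sale as continuous and ignores per-investor discounts and vesting terms. It assumes the price is simply the larger of $1.00 and raised / 40 million. Below $40 million, every dollar buys one FIL. Above that, each extra dollar buys 40M / raised tokens, which integrates to 40M · ln(raised / 40M).

```python
import math

PRICE_DIVISOR = 40_000_000   # price = dollars raised so far / 40 million
FLOOR_PRICE = 1.00           # starting price of $1.00 per FIL


def price_at(raised_usd: float) -> float:
    """Sale price per FIL once `raised_usd` has already been raised."""
    return max(FLOOR_PRICE, raised_usd / PRICE_DIVISOR)


def tokens_sold(total_raised_usd: float) -> float:
    """FIL sold while raising `total_raised_usd`, integrating over the price curve."""
    if total_raised_usd <= PRICE_DIVISOR:
        return total_raised_usd / FLOOR_PRICE
    return PRICE_DIVISOR + PRICE_DIVISOR * math.log(total_raised_usd / PRICE_DIVISOR)


print(price_at(200_000_000))                     # 5.0
print(f"{tokens_sold(153_000_000):,.0f} FIL")
```

Under these assumptions, the formula reaches its $5.00 ceiling at $200 million raised. Raising $153 million would have sold roughly 93.7 million FIL, an average of about $1.63 per token.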

FIL has a max supply of two billion tokens, of which 600 million were pre-allocated at genesis according to the following allocations (percentages based on max supply):

10.5% was allocated to Protocol Labs with a 6-year linear vesting
4.5% was allocated to Protocol Labs team members and contributors with a 6-year linear vesting
7.5% was allocated to 2017 SAFT investors with a 6-month to 3-year linear vesting
2.5% was allocated to future fundraising or ecosystem development
5% was allocated to the Filecoin Foundation with a 6-year linear vesting

The remaining 70% of the total supply is allocated to Filecoin miners and will be released over time to reward providing data storage service, maintaining the blockchain, distributing data, and running applications.
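A quick check of the allocation percentages above, using only the figures in this section (2 billion max supply, five genesis allocations, and 70% for miners). It confirms that the genesis allocations add up to the stated 600 million FIL.

```python
MAX_SUPPLY = 2_000_000_000

# Allocations as percentages of max supply, from the list above.
ALLOCATIONS_PCT = {
    "Protocol Labs": 10.5,
    "Protocol Labs team and contributors": 4.5,
    "2017 SAFT investors": 7.5,
    "Future fundraising / ecosystem": 2.5,
    "Filecoin Foundation": 5.0,
    "Miners (released over time)": 70.0,
}

assert sum(ALLOCATIONS_PCT.values()) == 100.0
for name, pct in ALLOCATIONS_PCT.items():
    print(f"{name}: {MAX_SUPPLY * pct / 100:,.0f} FIL")

genesis = sum(p for n, p in ALLOCATIONS_PCT.items() if not n.startswith("Miners"))
print(f"Pre-allocated at genesis: {MAX_SUPPLY * genesis / 100:,.0f} FIL")
```

The five genesis allocations sum to 30% of supply, or 600,000,000 FIL. The remaining 1.4 billion FIL is left for miners.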

This chart is derived purely from market data and shows the FIL tokens mined by miners since the beginning of the mainnet. It is not a prediction of the future, since the tokens mined in any period depend on market conditions in that period and may vary.

A crucial point about the FIL token price is that it is not set by a centralized authority or by the miners. The whole market, suppliers and demanders alike, decides the price of FIL: miners supply FIL, while users who want to store their files or invest in the token create demand.

References

Filecoin: A decentralized storage network for humanity's most important information, https://filecoin.io/. Accessed 24 February 2022.
Filecoin Spec: Home, https://spec.filecoin.io/#section-intro.implementations-status. Accessed 24 February 2022.
IPFS Powers the Distributed Web, https://ipfs.io/#why. Accessed 24 February 2022.
Filfox - Filecoin explorer, https://filfox.info/en. Accessed 24 February 2022.
“About.” Protocol Labs, https://protocol.ai/about/. Accessed 24 February 2022.
Amazon AWS. “Amazon S3 Simple Storage Service Pricing - Amazon Web Services.” Amazon AWS, https://aws.amazon.com/s3/pricing/. Accessed 23 January 2023.
Amazon AWS. “AWS Product and Service Pricing | Amazon Web Services.” AWS, https://aws.amazon.com/pricing/. Accessed 21 January 2023.
Amazon AWS. “AWS Site Terms - Seattle.” Amazon AWS, 30 September 2022, https://aws.amazon.com/terms/?nc1=f_pr. Accessed 21 January 2023.
Amazon AWS. “Serverless File System | Amazon Elastic File System Pricing | AWS.” Amazon AWS, https://aws.amazon.com/efs/pricing/?did=ap_card&trk=ap_card. Accessed 23 January 2023.
Ashenfelder, Mike. “The Average Lifespan of a Webpage | The Signal.” Library of Congress Blogs, 8 November 2011, https://blogs.loc.gov/thesignal/2011/11/the-average-lifespan-of-a-webpage/. Accessed 21 January 2023.
Beegle, Kaitlin. “Filecoin Community Roadmap - 2022Q1 Release · Discussion #456 · filecoin-project/community.” GitHub, https://github.com/filecoin-project/community/discussions/456. Accessed 23 January 2023.
Benet, Juan. “Juan Benet - Founder, CEO, Engineer - Protocol Labs.” LinkedIn, https://www.linkedin.com/in/jbenetcs/. Accessed 23 January 2023.
Benet, Juan, et al. “Proof of Replication.” Filecoin, 27 July 2017, https://research.filecoin.io/assets/proof-of-replication.pdf. Accessed 23 January 2023.
Buterin, Vitalik. “Ethereum Whitepaper | ethereum.org.” Ethereum.org, 2013, https://ethereum.org/en/whitepaper/. Accessed 23 January 2023.
Coin Bureau. “Filecoin is HOT Right now! But Will You Get BURNED??” YouTube, 27 March 2021, https://www.youtube.com/watch?v=R0aAmh3PMMA. Accessed 24 February 2022.
Coin Bureau. “Filecoin Review 2020: Top Launch To Watch!” YouTube, 5 January 2020, https://www.youtube.com/watch?v=z4aRBC2qsrY. Accessed 24 February 2022.
Coin Bureau. “Filecoin Review: Here's The Lowdown On FIL!!” YouTube, 23 October 2020, https://www.youtube.com/watch?v=I_g6PPWG5P0. Accessed 24 February 2022.
Crunchbase. “Filecoin - Funding, Financials, Valuation & Investors.” Crunchbase, https://www.crunchbase.com/organization/filecoin/company_financials. Accessed 23 January 2023.
Cyber Initiative. “Proof of Replication using Depth Robust Graphs - BPASE '18.” YouTube, 30 January 2018, https://www.youtube.com/watch?v=8_9ONpyRZEI. Accessed 23 January 2023.
Dashboard Starboard. Filecoin Network Health Dashboard - Starboard, https://dashboard.starboard.ventures/dashboard. Accessed 23 January 2023.
File App. “The State of Storage.” Storage Market, https://file.app/. Accessed 23 January 2023.
Filecoin. Filecoin Storage Stats, https://storage.filecoin.io/. Accessed 23 January 2023.
Filecoin. “The Economics of Storage Providers.” Filecoin, 5 April 2022, https://filecoin.io/blog/posts/the-economics-of-storage-providers/. Accessed 24 January 2023.
Filecoin Community. “Filecoin Roadmap by the Filecoin Community.” Miro, https://miro.com/app/board/uXjVOR1oLO8=/?invite_link_id=215641697288. Accessed 30 January 2023.
Filecoin Community. “Home.” YouTube, https://miro.com/app/board/uXjVOR1oLO8=/?invite_link_id=215641697288. Accessed 23 January 2023.
“Filecoin · GitHub.” GitHub, https://github.com/filecoin-project. Accessed 24 February 2022.
“Filecoin News 36.” Filecoin, 16 February 2022, https://filecoin.io/blog/posts/filecoin-news-36/. Accessed 24 February 2022.
“Filecoin price today, FIL to USD live, marketcap and chart.” CoinMarketCap, https://coinmarketcap.com/currencies/filecoin/. Accessed 24 February 2022.
“FIL - Filecoin · portfolio.” weronika zak, 14 February 2022, https://weronikazak.github.io/2021/Filecoin/. Accessed 24 February 2022.
Google. “Google Privacy Policy.” Privacy & Terms, https://policies.google.com/privacy?hl=en. Accessed 21 January 2023.
Hannan, Ed. “What is Kilo, Mega, Giga, Tera, Peta, Exa, Zetta and All That?” TechTarget, https://www.techtarget.com/searchstorage/definition/Kilo-mega-giga-tera-peta-and-all-that. Accessed 23 January 2023.
IPFS. “DNSLink.” IPFS Docs, 26 July 2022, https://docs.ipfs.tech/concepts/dnslink/#publish-content-path. Accessed 22 January 2023.
IPFS. “IPNS (InterPlanetary Name System) and Mutability.” IPFS Docs, 16 January 2023, https://docs.ipfs.tech/concepts/ipns/#how-ipns-works. Accessed 22 January 2023.
IPFS. “What is IPFS?” IPFS Docs, 5 August 2022, https://docs.ipfs.tech/concepts/what-is-ipfs/#decentralization. Accessed 22 January 2023.
Nakamoto, Satoshi. “A Peer-to-Peer Electronic Cash System.” Bitcoin.org, https://bitcoin.org/bitcoin.pdf. Accessed 23 January 2023.
Nanda, Vineet. “Difference between Centralized Data Storage and Distributed Data Storage.” Tutorialspoint, 25 November 2022, https://www.tutorialspoint.com/difference-between-centralized-data-storage-and-distributed-data-storage. Accessed 23 January 2023.
Njogu, Tabitha. “Centralized Data Storage and Distributed Data Storage | Difference Between.” Difference Between, http://www.differencebetween.net/technology/difference-between-centralized-data-storage-and-distributed-data-storage/. Accessed 23 January 2023.
Protocol Labs. “Proofs-of-Replication - Filecoin Research.” YouTube, 14 February 2019, https://www.youtube.com/watch?v=L826rIziNMQ. Accessed 23 January 2023.
ProtoSchool. “DWeb Tutorial | Content Addressing on the Decentralized Web.” ProtoSchool, https://proto.school/content-addressing/. Accessed 21 January 2023.
ProtoSchool. “Multiformats Tutorial | Anatomy of a CID (Lesson 1).” ProtoSchool, https://proto.school/anatomy-of-a-cid/01/. Accessed 21 January 2023.
SiaStats. “Hosts network.” SiaStats.info, https://siastats.info/hosts_network. Accessed 23 January 2023.
Starboard. “Chart: FIL Protocol Circulating Supply / Starboard.” Observable, https://observablehq.com/@starboard/chart-fil-protocol-circulating-supply. Accessed 24 January 2023.
Taylor, Petroc. “Global Datasphere: storage capacity 2025.” Statista, 26 September 2022, https://www.statista.com/statistics/1185900/worldwide-datasphere-storage-capacity-installed-base/. Accessed 23 January 2023.
Walters, Steve. “Filecoin Review: Beginners Guide | Everything You NEED To Know.” The Coin Bureau, 3 January 2020, https://www.coinbureau.com/review/filecoin-fil/. Accessed 24 February 2022.
West, Kanye. “Who originally suggested that 'if you're not paying for the product, you are the product'?” Quora, https://www.quora.com/Who-originally-suggested-that-if-youre-not-paying-for-the-product-you-are-the-product. Accessed 21 January 2023.
“What is Filecoin (FIL) | History, Roadmap, Economics.” Messari, https://messari.io/asset/filecoin/profile/launch-and-initial-token-distribution. Accessed 24 February 2022.

Appendices

Appendix 1

The tables below show the prefixes, symbols, and power of 10 for the scientific terms used for numbers throughout the article.

Subscribe to Arya


Verification

This entry has been permanently stored onchain and signed by its creator.

Arweave Transaction

cGFAzwqqxwLe825…0wqWkAARrqtElww

Author Address

0x2d209040c031d4e…aabf18F52260AB0

Content Digest

m5XB8GpyDWMZd-R…i241x1iQcdhoLgY