IPFS & Arweave: A Winning Combination For Extending NFT Metadata and Provenance

"Artworks most dear have, for countless moons, escaped the grip of oblivion - and such strange, bewitching treasures, borne of genius arcane, shall persist eternally beyond the gloaming of generations yet to come."

Edgar Allan Poe, or maybe just an unprovable AI hallucination

The most precious works of art have been preserved for centuries, and precious art is worth being preserved for centuries to come. During their travel through time, works of art have built up, and continue to build up, a record of historic events, such as changing owners or getting shown at important exhibitions. And they also collect stories along the way: stories the work of art tells us and stories that are told about it.

Digital art (in the form of NFTs) is just as worthy of preservation as traditional art. Imagine you could store a digital work of art in a safe and provable way, together with provable information about its provenance and its history, like previous owners, galleries where it was exhibited, and so on.

And yet, the NFT space is still in the Wild West stage of exploration and experimentation. It has happened, and continues to happen, that an NFT file and its history vanish within a short time because they were not stored in a future-proof way. There is a certain irony here: what better place could there be for storing digital art than a blockchain?

The blockchain storage dilemma

Blockchains are designed to store information provably, reliably, and securely. Invented initially as an infallible ledger for financial transactions, blockchains have since evolved further to store unique, identifiable information: Non-Fungible Tokens, or NFTs.

Therefore, storing related metadata along with an NFT on the blockchain seems a natural choice, but this approach has two problems.

  1. A blockchain grows steadily. New blocks are added, but none are deleted. To make a blockchain grow as slowly as possible, only the most essential information is stored directly on the chain.

  2. Storing large amounts of data on-chain can be prohibitively expensive or, for large files like 4K videos, outright impossible. The entire Bitcoin blockchain is approximately 500 GB, which would store less than 25 hours of 4K video.
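As a rough sanity check of that comparison, here is the arithmetic in a few lines of Python. The 500 GB and the 25 hours come from the text above; counting 1 GB as 10^9 bytes is an assumption made for this sketch.

```python
# Back-of-the-envelope check: which average video bitrate fills 500 GB in 25 hours?
CHAIN_SIZE_BYTES = 500 * 10**9  # ~500 GB from the text (assuming 1 GB = 10^9 bytes)
HOURS = 25                      # figure from the text


def implied_bitrate_mbps(size_bytes: int, hours: float) -> float:
    """Average bitrate in megabits per second at which size_bytes lasts for hours."""
    seconds = hours * 3600
    return size_bytes * 8 / seconds / 1_000_000


bitrate = implied_bitrate_mbps(CHAIN_SIZE_BYTES, HOURS)
print(f"{bitrate:.1f} Mbit/s")  # prints "44.4 Mbit/s"
```

In other words, any 4K stream encoded above roughly 44 Mbit/s would use up the equivalent of the whole chain in under 25 hours.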

Off-chain storage, on the other hand, is cheap and abundant. Web2 has various options for storing any kind of data, but they lack the required properties for storing data provably and in the long term.

  1. In Web2, data is typically identified by location (primarily through URLs, IP addresses, or other address identifiers). When data moves, location identifiers become invalid and must be updated to reflect the new location.

  2. In Web2, data is usually stored in centralized locations, making it heavily reliant on individual data centers or companies that may discontinue services without notice.

If neither on-chain nor off-chain storage meets the needs for reliable, provable storage of NFT metadata, could a hybrid solution solve this problem?

IPFS to the rescue

A solution to both problems of Web2 data storage exists: the InterPlanetary File System (IPFS) addresses data by content rather than by location. IPFS stores immutable chunks of data that are irreversibly bound to a content identifier (CID), which is immune to location changes. You can think of each CID as an unforgeable fingerprint that is unique to each file. Change one byte of the file, and the fingerprint changes.
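The fingerprint idea can be illustrated with a minimal Python sketch. It uses a plain SHA-256 digest as a stand-in for a CID, which is a simplification: real CIDs wrap a hash in additional encoding and depend on how IPFS splits a file into chunks. The function name and the sample data are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest as a simplified stand-in for a content identifier."""
    return hashlib.sha256(data).hexdigest()


original = b"metadata for token 42"
altered = b"metadata for token 43"  # exactly one byte differs

print(fingerprint(original))
print(fingerprint(altered))
print(fingerprint(original) == fingerprint(altered))  # prints "False"
```

The same bytes always produce the same fingerprint, no matter where they are stored, which is what makes the identifier independent of location.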

IPFS also supports distributed information storage, thus solving the problem of centralized storage. And IPFS storage is much cheaper than on-chain storage.

IPFS would, therefore, be an ideal solution for provable, long-term data storage: Once the data is stored immutably on IPFS, its CID can be stored on-chain to tie the data to the corresponding NFT.

However, IPFS has some shortcomings that get in the way of storing digital art metadata.

  1. Files on IPFS have no timestamps. It is impossible to tell when a particular file was added to IPFS. What is the history of a piece of art worth without provable timestamps?

  2. The lack of file metadata makes querying for data on IPFS next to impossible without relying on third-party indexing services.

  3. IPFS does not guarantee permanent storage. IPFS is a peer-to-peer network where nodes can come and go. Data can be distributed across multiple nodes, but once the last of these nodes goes offline, the data is lost.

  4. The only way of keeping data more or less permanently on IPFS is by paying a pinning service to keep your IPFS content alive on their nodes. Just be sure to watch your pinning service regularly. If it goes out of business, you’d better have your data already pinned with another service (that you would have to watch for going out of business again, so the cycle repeats ad infinitum).

These shortcomings must be overcome to create a truly reliable storage of NFT metadata.

Existing solutions on top of IPFS, like Filecoin, strive to make IPFS storage more reliable, but these approaches are onerous for the casual user to set up and maintain. Other services strive to make the process easy, but they don’t offer a complete solution for all kinds of files.

How can we achieve permanent storage, metadata, timestamping, and easy querying?

Enter Arweave

A solution already exists that offers the features missing in IPFS, such as permanent storage, metadata, and easy querying. Arweave is a blockchain-based storage service, meaning a blockchain network secures storage. Arweave data is timestamped, thus allowing users to verify the minimum age of NFT metadata (when data was added to Arweave). This feature is essential for attestation of historical data of NFT art. Finally, Arweave data is trivial to query by tags, an aspect that sets Arweave apart from IPFS and blockchains built for transaction verification rather than storing and retrieving historical data.

Unfortunately, even Arweave has some limitations that get in the way of storing NFT metadata.

  1. Arweave apps default to using Web DNS for routing (despite efforts to establish a new routing standard). DNS records are stored with a single registrar and are ephemeral. Changing or deleting a DNS record can remove access to data.

  2. Arweave does not use content identifiers (CIDs). Its identifiers (TIDs) are not derived from the content they represent. There is no way of proving that a given TID belongs to a given data object.

  3. Being a blockchain, Arweave cannot be self-hosted on typical end-user hardware. If the Arweave network goes down completely (which can happen for many reasons, such as becoming financially unviable), access to the data is lost (or, at least, frozen).

  4. There is a lot of controversy around Arweave’s claims to “permanent” storage. If the network were ever to go down, this promise would be broken. So files uploaded to Arweave need a redundant way to export them and reconnect any relationships (ideally through IPFS).

The disadvantages of IPFS appear to be the advantages of Arweave and vice versa. Can we get the best of both worlds?

IPFS and Arweave should join forces. But how?

It turns out that both systems can be woven together so that all the desired features of a permanent, provable, and affordable storage system come together.

Arweave has

  • permanence,

  • timestamping, and

  • metadata,

and IPFS has

  • content-addressability and

  • distributed serving.

IPFS can, therefore, act as a “frontend,” providing distributed access to content-addressable data in a peer-to-peer fashion. Arweave can act as a storage backend, providing failsafe, on-chain backup storage with metadata, timestamps, and querying capabilities.

At Atomic Form, we have put this approach into practice. Atomic Sign stores NFT metadata on IPFS and in Arweave and links the data together by tagging every data object with information from both Arweave and IPFS.

Here is an excerpt of the tags stored with an object on Arweave:

  • App-Name: Atomic Sign

  • App-Version: v0.1.0

  • IPFS-CID: $IPFS-CID

  • Type: Attestation

  • Topic: Attestations

  • Content-Type: application/json

  • chain-ID: $CHAINID

  • contract: $CONTRACT

  • tokenID: $TOKENID

(If you are curious, see a live example here.)

Without going too much into technical details, the main point is how Atomic Sign connects data from IPFS and Arweave: The simple introduction of the IPFS-CID tag creates the link to the data stored on IPFS, and the three tags chain-ID, contract, and tokenID represent the NFT itself.
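To make the linking idea concrete, here is a minimal Python sketch that assembles such a tag list as name/value pairs. The tag names are taken from the excerpt above; the helper names, the placeholder values, and the data layout are assumptions for illustration, not Atomic Sign's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRef:
    """Hypothetical container for the three values that identify an NFT."""
    chain_id: str
    contract: str
    token_id: str


def build_tags(ipfs_cid: str, token: TokenRef) -> list[dict[str, str]]:
    """Assemble name/value tag pairs that link an Arweave object to IPFS and to an NFT."""
    pairs = [
        ("App-Name", "Atomic Sign"),
        ("App-Version", "v0.1.0"),
        ("IPFS-CID", ipfs_cid),             # link to the same data on IPFS
        ("Type", "Attestation"),
        ("Topic", "Attestations"),
        ("Content-Type", "application/json"),
        ("chain-ID", token.chain_id),       # these three tags identify the NFT
        ("contract", token.contract),
        ("tokenID", token.token_id),
    ]
    return [{"name": name, "value": value} for name, value in pairs]


# Placeholder values only; none of these refer to a real token or file.
tags = build_tags("<ipfs-cid>", TokenRef("1", "<contract-address>", "42"))
for tag in tags:
    print(f'{tag["name"]}: {tag["value"]}')
```

Keeping the CID and the token reference in separate tags means a record can be found from either direction: starting from the file or starting from the NFT.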

Advantages of hybrid data storage

This hybrid solution creates synergies that make it well suited for storing NFT metadata. Several advantages emerge from the combination.

IPFS is a peer-to-peer network in nature, as mentioned earlier. Storage permanence is not IPFS’ main goal. Data stored on IPFS can be replicated to multiple nodes and pinned there to stay online; however, every node owner can decide to shut down their service and take the node offline.

Here is where Arweave comes into play: If an IPFS link breaks because the last node that stored the linked content goes offline, Arweave can be queried for the CID, and the data found there can be used to re-seed and pin the content in IPFS.
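A lookup of this kind could be sketched as follows. The snippet only builds the request body for a tag-based GraphQL query and sends nothing over the network. The gateway URL and the exact query shape are assumptions based on the GraphQL interface that public Arweave gateways offer; the function name is hypothetical.

```python
import json

# Assumption: a public Arweave gateway that exposes a GraphQL endpoint.
GATEWAY_GRAPHQL_URL = "https://arweave.net/graphql"

# Find transactions whose IPFS-CID tag matches the given CID.
QUERY = """
query ($cid: String!) {
  transactions(tags: [{ name: "IPFS-CID", values: [$cid] }], first: 10) {
    edges {
      node {
        id
        tags { name value }
        block { height timestamp }
      }
    }
  }
}
"""


def build_cid_lookup(ipfs_cid: str) -> dict:
    """Build the JSON body for a GraphQL request that looks up data by its IPFS-CID tag."""
    return {"query": QUERY, "variables": {"cid": ipfs_cid}}


payload = build_cid_lookup("<ipfs-cid>")  # placeholder, not a real CID
print(json.dumps(payload, indent=2))
```

Sending this body as a JSON POST request to the gateway would return the matching transaction IDs, whose data could then be downloaded and added to IPFS again. Note that the re-added data only resolves to the original CID if it is added with the same IPFS settings as the first time.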

Content typing and tagging

IPFS does not describe the content behind a link. Unlike traditional file systems that store file metadata like creation date along with each file, IPFS focuses on making data provably accessible through content addressing.

Three problems are associated with that approach when it comes to saving NFT metadata.

  1. Without file metadata, IPFS content is not queryable.

  2. As we will see in the next section, without a timestamp for each stored object, building a verifiably documented NFT history is impossible.

  3. Dapps and apps need to know what type of file they are trying to render. Without this information, the onus is on developers to download a file and identify its MIME type before they can use it. This is something we faced a lot with our hardware displays at Atomic Form.

Arweave tags can fill this gap. Tags can hold any metadata associated with a data object. Atomic Sign’s hybrid approach adds queryable metadata to every IPFS object.
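As a small illustration of the content typing point, an app could read the type from the tags instead of downloading the file first. The following Python sketch assumes that tags arrive as a list of name/value pairs; the function name and the fallback value are made up for this example.

```python
def content_type_from_tags(tags: list[dict[str, str]],
                           default: str = "application/octet-stream") -> str:
    """Return the value of the Content-Type tag, or a generic fallback if it is missing."""
    for tag in tags:
        if tag["name"].lower() == "content-type":
            return tag["value"]
    return default


# Hypothetical tag list, shaped like the excerpt shown earlier in this article.
example_tags = [
    {"name": "App-Name", "value": "Atomic Sign"},
    {"name": "Content-Type", "value": "application/json"},
]

print(content_type_from_tags(example_tags))  # prints "application/json"
print(content_type_from_tags([]))            # prints "application/octet-stream"
```

A display or dapp can then pick a renderer from the returned type before fetching a single byte of the file itself.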

Proof of existence

Without a verifiable timestamp, IPFS data does not reveal when it was uploaded. To verify the history of digital art, it is crucial to know the age of each historical document added to the NFT metadata. Otherwise, there would be no way to verify an NFT’s history.

Atomic Sign connects IPFS uploads to the Arweave blockchain. After connecting the Arweave and IPFS parts of the NFT metadata through Arweave tags, the data is anchored to Arweave’s blockchain, thereby providing proof that the uploaded file has existed at least since the date and time of the upload. Any attempt to add a newer document and claim it as the original one would fail.
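The reasoning behind this can be sketched in Python: if two records claim to be the original, the one anchored in the earlier block wins. The record layout, the field names, and the sample values below are hypothetical; in practice, the timestamps would come from the blocks that contain the Arweave transactions.

```python
from datetime import datetime, timezone


def earliest_record(records: list[dict]) -> dict:
    """Return the record with the oldest block timestamp (Unix seconds)."""
    return min(records, key=lambda record: record["block_timestamp"])


# Two hypothetical records that both claim to be the original document.
records = [
    {"tx_id": "tx-late-claim", "block_timestamp": 1_710_000_000},
    {"tx_id": "tx-original", "block_timestamp": 1_700_000_000},
]

winner = earliest_record(records)
anchored_at = datetime.fromtimestamp(winner["block_timestamp"], tz=timezone.utc)
print(winner["tx_id"], "has existed at least since", anchored_at.isoformat())
```

Because a block timestamp cannot be backdated after the fact, a later upload can never show an earlier anchor than the genuine original.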

Arweave’s blockchain makes the complete history of an NFT tamper-proof.

Richer relationships

When a file is added to IPFS, IPFS creates a cryptographic hash value of the file called content identifier (CID). Files and directories can be linked to each other using these CIDs as a reference. With a given CID, IPFS can quickly look up the corresponding file through a distributed hash table. From there, a user can explore the file’s relationships to other files using the CIDs the file refers to.

Arweave, on the other hand, can connect files through context rather than content. Unlike CIDs, Arweave tags can be chosen freely. So looking up a tag can return one or more documents that carry this tag, independently of how they are linked up in IPFS.

Combining both mechanisms results in a richer data set with more options for querying and traversing it. Moreover, with two complementary querying techniques available, queries can become faster by picking the technique that best fits the query at hand.

Self-sovereignty: Stay in control of your data

While Arweave goes a long way toward solving the problem of permanent storage, it is a blockchain that keeps growing in size. Few people have the resources to keep a complete copy of the Arweave blockchain at home.

You can, however, run an IPFS node on minimal hardware (think Raspberry Pi) to store a copy of your NFT metadata. Content addressing makes this personal copy as valid as any copy on other IPFS nodes. Atomic Sign links the CID of your data to Arweave’s on-chain data for provability.

Conclusion

Combining two different storage technologies has significant advantages over existing concepts. Digital art, and all its history and the stories that surround it, can be saved permanently while remaining verifiable, searchable, and linkable. Although blockchain storage is involved, prices remain affordable.

For anyone who wants to preserve digital art along with all records of provenance, history, and any background information, Atomic Sign is the perfect solution.
