Self-Sovereign Data (SoDa): The New Web3 Data Economy

This is not another post about the metaverse. Well, not exactly… but with the Meta rebrand cementing the concept in the 2021 zeitgeist, there is a less explored and, I believe, critical concept from the same novel, Neal Stephenson’s “Snow Crash”, that has me thinking.

In “Snow Crash”, Hiro Protagonist, the aptly named protagonist of the book, side hustles as a “Stringer for CIC”: essentially a gig economy worker who collects digital intelligence and posts it to a massive data marketplace. Users of the metaverse can then search this library for any information they want.

This concept of a community “dataverse”, where people or corporations freely share data, is almost universal in sci-fi but obviously missing in the real world. While the early internet gave us Wikipedia, it falls far short of a structured, real-time, global database of collective intelligence. All of which raises the question: if such a dataverse is a prerequisite for any sufficiently advanced sci-fi culture, why isn’t there a billion-dollar company built around exactly this paradigm today?

The cynical answer is that this is not how mega-caps have become mega-caps to date. On the contrary, fortunes have been made by building the richest gated internal dataverses possible. The slightly more balanced hypothesis I’d like to posit is that we have lacked the economic incentive to fund it, the organizational structures to manage it, and the technology to build it.

An Introduction to SoDa

The precursor to any dataverse would be a vibrant “data economy”: a fluid market where data is exchanged. Today, internal spending on data dwarfs the market for external data, and most data-related value is concentrated in a few monopolies. The data economy lags far behind the rest of the data industry. Over the past few months, as I have descended the rabbit hole of blockchain and “Web3”, it has become increasingly clear that this is about to change.

Web3, the decentralized internet built on the back of blockchain technology, shifts ownership and power back to users and away from monolithic platforms (e.g., Facebook, Google, Amazon). It is a world where people own portions of the platforms they use, earn money fairly for the content they generate, and are no longer held hostage via their data. This newly freed data will also give rise to a new data economy, which I am calling Self-Sovereign Data, or SoDa for short.

SoDa begins with users claiming more rights over their data from large platforms, but it also extends to the ways organizations will monetize and build around newly available data. While there are already many efforts that let you “get paid” for your data, I believe SoDa goes far beyond this very literal application. Data will emerge as the third major category of assets, next to physical and financial assets. It has the potential to bring the next wave of capital and users over to Web3 and to become the next “killer application” of blockchain, much as DeFi (Decentralized Finance) was the first.

What does decentralization do for data?

There are three ways blockchain and Web3 could transform how we build a vibrant data economy:

  1. Ownership:
    Data, like all digital assets, is incredibly hard to value given that it can be replicated at almost zero cost. Yet fundamental to any economy is a widely accepted mechanism for valuing an asset and tracking its ownership. Fortunately, enforced digital scarcity is core to what made Bitcoin, and every cryptocurrency since, viable. By tokenizing access to data, we can track ownership and lineage and create open markets that help determine fair value.
  2. Organization:
    One of the most striking achievements of Web3 has been the massive shift in how communities self-organize. Building on patterns pioneered in open source software, Web3 organizations can rapidly enlist communities to contribute in small increments. This is driven by tokens, which serve as both payment and shares of ownership and so align incentives directly, without the need for formal employment or contracts. Such organizations could be immensely valuable for creating shared data protocols that reward the curators, maintainers, and contributors needed for success, without creating the threat of lock-in.
  3. Technology:
    One of the largest technical issues with data is that it is siloed, and siloed data tends to diverge in structure, quality and standards. The internet is the ultimate example: a networked array of data silos that leaves information trapped within each application and company. A better model would be a shared database, with tight paradigms for access control, that we all could draw from and contribute to. This would require broad buy-in and incentives (see 1 and 2) and a massively distributed database. As Balaji Srinivasan’s essay “Yes, You May Need a Blockchain” explains, blockchains can behave as exactly that. While there are practical problems around scaling, it is a pattern that could change data engineering dramatically.
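To make the Ownership point concrete, here is a minimal Python sketch of tokenized data access, assuming a toy in-memory ledger rather than any real blockchain or protocol. The class `DataTokenLedger`, its methods, and the rule that one token buys one access are hypothetical illustrations: only a dataset’s owner can mint access tokens (enforced scarcity), tokens can change hands (the basis of an open market), and each dataset records its parents so lineage can be traced.

```python
class DataTokenLedger:
    """Toy, in-memory model of tokenized data access (hypothetical, not a real chain).

    Holding one access token for a dataset entitles the holder to one access;
    only the dataset's owner can mint tokens, and each dataset records the
    datasets it was derived from so its lineage can be traced.
    """

    def __init__(self):
        self.owners = {}    # dataset_id -> owner
        self.parents = {}   # dataset_id -> tuple of parent dataset_ids
        self.balances = {}  # (dataset_id, holder) -> number of access tokens

    def register(self, dataset_id, owner, parents=()):
        for p in parents:
            if p not in self.owners:
                raise KeyError(f"unknown parent dataset {p!r}")
        self.owners[dataset_id] = owner
        self.parents[dataset_id] = tuple(parents)

    def mint(self, dataset_id, caller, amount):
        # Enforced scarcity: only the owner can issue new access tokens.
        if self.owners.get(dataset_id) != caller:
            raise PermissionError("only the dataset owner can mint access tokens")
        key = (dataset_id, caller)
        self.balances[key] = self.balances.get(key, 0) + amount

    def transfer(self, dataset_id, sender, receiver, amount):
        # Tokens are transferable, so access rights can be bought and sold.
        key = (dataset_id, sender)
        if amount <= 0 or self.balances.get(key, 0) < amount:
            raise ValueError("invalid amount or insufficient tokens")
        self.balances[key] -= amount
        rkey = (dataset_id, receiver)
        self.balances[rkey] = self.balances.get(rkey, 0) + amount

    def redeem(self, dataset_id, holder):
        """Burn one access token; returning True stands in for granting access."""
        key = (dataset_id, holder)
        if self.balances.get(key, 0) < 1:
            raise PermissionError("no access token to redeem")
        self.balances[key] -= 1
        return True

    def lineage(self, dataset_id):
        """All ancestors of a dataset, found by walking parent links."""
        seen, stack = [], list(self.parents[dataset_id])
        while stack:
            d = stack.pop()
            if d not in seen:
                seen.append(d)
                stack.extend(self.parents[d])
        return seen


# Usage: alice issues access to raw GPS data; bob publishes a cleaned derivative.
ledger = DataTokenLedger()
ledger.register("raw-gps", owner="alice")
ledger.register("clean-gps", owner="bob", parents=["raw-gps"])
ledger.mint("raw-gps", caller="alice", amount=5)
ledger.transfer("raw-gps", "alice", "carol", 2)  # carol buys two accesses
ledger.redeem("raw-gps", "carol")                # carol uses one of them
```

A production design would keep balances in a smart contract and let trading set the token price; the sketch only shows why scarce, transferable access rights make ownership and lineage trackable.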

We’ll explore examples and applications of these forces later as we dive into the existing SoDa landscape, but let’s first look at how this could work.

An internet-native data layer

Until Bitcoin, payments and cash were not native to the internet. Instead, companies like Stripe built infrastructure to bridge the gap between banks and the internet, and financial applications remained beholden to that underlying infrastructure, which was slow, costly and wrapped in a complex layer of regulation and bureaucracy. Ethereum introduced programmable contracts (smart contracts) with its own token (internet money), creating a modern, composable financial infrastructure. This unlocked an explosion of DeFi applications in the summer of 2020, dubbed “DeFi Summer”, which offered much more attractive financial options and drove massive global user adoption.

While an internet-native financial economy is an absolute prerequisite for a new internet-native data economy, it is not a complete solution in and of itself.

So what then are the building blocks of an internet-native data layer?

In his vision for Ocean Tokens, “From Money Legos to Data Legos”, Trent McConaghy, co-founder of Ocean Protocol, provides an excellent overview of how data can build on top of a structure similar to DeFi. Ocean and other early leaders in the space are beginning to converge around a number of key elements that compose this data layer:

  1. Self-Sovereign IDs: A shared universal system for the identification of people, organizations and devices.
  2. Data Wallets: Interfaces for the secure management of personal data assets.
  3. Protocols for Tokenization & Data Exchanges: Agreed-upon ways of granting configurable access to data through tokens, plus marketplaces for listing those tokens.
  4. Secure Data Enclaves: Neutral compute zones that let machines run processes on data without transferring it or exposing its contents.
  5. Data Oracles: The equivalent of data APIs, bringing off-chain data onto the blockchain where developers and smart contracts can access it.
  6. Data Unions (DAOs): Decentralized autonomous organizations governing a contributory data network.
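The secure data enclave (element 4) is easiest to grasp as an access pattern. The sketch below is a hypothetical Python model, not an enclave: `ToyEnclave`, its `min_records` guard, and the sample records are assumptions for illustration. The data owner approves named computations, a consumer can run only those, and only the computed result, never the raw records, is handed back. Real enclaves enforce this with hardware isolation or cryptography, which a Python object cannot provide.

```python
from statistics import mean


class ToyEnclave:
    """Models the access pattern of a secure enclave, not its security.

    Records stay inside the object, only computations the data owner has
    approved may run, and only their results leave. Real enclaves rely on
    hardware isolation or cryptography; a Python attribute offers neither.
    """

    def __init__(self, records, min_records=3):
        self._records = list(records)    # never returned directly
        self._approved = {}              # name -> approved function
        self._min_records = min_records  # hypothetical guard against single-record leaks

    def approve(self, name, fn):
        # Called by the data owner to whitelist a named computation.
        self._approved[name] = fn

    def run(self, name):
        # Called by the data consumer; only approved code touches the records.
        if name not in self._approved:
            raise PermissionError(f"computation {name!r} not approved by the data owner")
        if len(self._records) < self._min_records:
            raise ValueError("too few records to release a result")
        return self._approved[name](self._records)


# Usage: a retailer learns average spend without ever seeing the transactions.
enclave = ToyEnclave([{"spend": 120.0}, {"spend": 80.0}, {"spend": 100.0}])
enclave.approve("avg_spend", lambda rs: mean(r["spend"] for r in rs))
avg = enclave.run("avg_spend")  # 100.0
```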

It’s useful to consider what a theoretical application of these elements working together might look like.

Snow Crash: A Theoretical Case Study for SoDa

Yours Truly, or “YT” for short, is a courier. She delivers high-value packages around the city, so when she broke her smart skateboard on her last delivery, she knew she needed a new one, ASAP.

She logs on to Sk8!, a skateboard e-commerce site.

The site can pull her universal public profile (1): her delivery addresses, language preferences and “dark mode” HTML customizations. She overrides the standard delivery option, switching it from “next day” to “next hour”. The site then requests access to her relevant browsing history and financial transactions through her wallet (2) in order to offer board recommendations and financing options. She accepts, and the information is provisioned to a secure enclave (4), where Sk8! can run its recommendation algorithms and underwriting models without possessing the underlying data. She buys the board and accepts the financing. When it arrives, she connects her live location data to an oracle network (5), which allows competing delivery apps to see where she is and connect her with her next client. YT is also a member of CourierDAO (6), a data union that pools movement data from couriers around the world and rents access to the dataset on a data exchange (3). Companies like Sk8! license the data to train their AI-driven R&D efforts, and YT gets her cut of that license revenue.
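YT’s “cut” of CourierDAO’s license revenue is ultimately an allocation rule. The story does not say how a data union would split revenue, so this Python sketch simply assumes a pro-rata split by number of records contributed, computed in whole cents; `split_revenue` and the member figures are hypothetical. Leftover cents from rounding go to the members with the largest fractional remainders, so payouts always add up to the revenue exactly.

```python
def split_revenue(revenue_cents, contributions):
    """Split license revenue pro rata by contribution, in whole cents.

    Each member first receives the floor of their exact share; the cents
    left over are handed out one at a time to the largest fractional
    remainders (ties broken by member name), so shares always sum to revenue.
    """
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("no contributions to pay out")
    shares, remainders = {}, []
    for member, n in contributions.items():
        q, r = divmod(revenue_cents * n, total)
        shares[member] = q
        remainders.append((r, member))
    leftover = revenue_cents - sum(shares.values())
    for _, member in sorted(remainders, key=lambda t: (-t[0], t[1]))[:leftover]:
        shares[member] += 1
    return shares


# Usage (hypothetical numbers): $1,000 of license revenue, counted in cents.
payouts = split_revenue(100_000, {"YT": 1200, "courier_b": 3000, "courier_c": 800})
```

Here YT contributed 1,200 of 5,000 records and receives 24,000 cents ($240). Working in integer cents avoids floating-point drift when many small payouts are summed.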

If that virtuous cycle seems too “out of this world”, consider that the Indian government began an ambitious project over a decade ago which has already begun to make a domestic ecosystem like this a reality.

“India Stack”: A Real World Case Study

In 2009, only 17% of India’s massive population was participating in the formal financial system. The hurdles to opening bank accounts and enrolling in digital payment ecosystems or debt markets were too great. The government saw this as a massive limit on the country’s development potential and, to solve the problem, began implementing one of the most ambitious state-led digital transformations of all time.

“India Stack” is a three-layer network covering identity, payments and data sharing. While the data sharing layer is still in the early stages of its rollout, the identity system is responsible for bringing hundreds of millions of people into the banking system. To date, this has raised the share of adults with bank accounts to 80%! Progress that might otherwise have taken decades has happened in 9 years.

For the data sharing layer, they envision a time when consumers will be able to provision their data to a new bank or service provider for a limited time so it can make a decision, after which access will be revoked. “The Internet Country” by Aaryaman Vir and Rahul Sanghi gives an in-depth view of how this came about and the future of the system.
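The time-limited provisioning described above can be sketched as a consent grant that carries a scope, an expiry and a revocation flag. This is a hypothetical Python model, not the actual India Stack consent format; `ConsentGrant`, `grant_access`, and the scope names are illustrative assumptions. Access is allowed only while the grant is unrevoked and unexpired, and only for the provider and data scopes the user approved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ConsentGrant:
    """Hypothetical time-boxed grant of data access from a user to a provider."""
    user: str
    provider: str
    scopes: frozenset
    expires_at: datetime
    revoked: bool = False

    def allows(self, provider, scope, now):
        # Access requires: not revoked, right provider, granted scope, not expired.
        return (not self.revoked
                and provider == self.provider
                and scope in self.scopes
                and now < self.expires_at)

    def revoke(self):
        # The user can withdraw consent before the grant expires.
        self.revoked = True


def grant_access(user, provider, scopes, duration, now):
    return ConsentGrant(user, provider, frozenset(scopes), now + duration)


now = datetime(2021, 11, 1, 12, 0, tzinfo=timezone.utc)
g = grant_access("yt", "new_bank", {"bank_statements"}, timedelta(hours=24), now)
print(g.allows("new_bank", "bank_statements", now + timedelta(hours=1)))   # True
print(g.allows("new_bank", "bank_statements", now + timedelta(hours=25)))  # False: expired
print(g.allows("new_bank", "tax_returns", now))                            # False: scope not granted
g.revoke()
print(g.allows("new_bank", "bank_statements", now))                        # False: revoked
```

Passing `now` in explicitly, rather than reading the clock inside `allows`, keeps the check deterministic and easy to test.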

While the original instantiation of India Stack was not based on blockchains, the government is now developing a strategy to incorporate them. Doing this at a global scale would likely require a decentralized solution from the beginning.

India Stack is first and foremost a financial platform rather than a new data economy, but it clearly demonstrates how intrinsically linked these concepts are and foreshadows the new wave of applications that may be built as DeFi becomes more widely adopted. This real-world example also sets a precedent for learning to “rent” data rather than storing and owning everything, which represents a fundamental shift in development architectures.

So where are we today when it comes to SoDa?

SoDa Landscape

Inspired by Matt Turck’s data landscapes, whose annual releases serve as a visual survey of the growth and evolution of the data industry, I’ve compiled a landscape of the organizations, technologies and products that I believe make up SoDa today. Inclusion is not based on the use of data alone, nor does data need to be the only product or value proposition. Instead, SoDa organizations (in addition to being blockchain-based) fall into one or more of the following groups:

  1. Tooling essential for the use and collection of data in Web3 applications.
  2. Protocols for facilitating the portability of user data between applications.
  3. Blockchains which make data privacy a primary differentiator.
  4. Applications which utilize data beyond blockchain metrics (token prices, etc.)
  5. Applications focused on the monetization of data enabled by blockchain tokenization.
Landscape Notes: 1. Some organizations could be placed in multiple locations based on their products/features; I have placed them where I believe they are strongest. 2. Identity could be a wide landscape in itself; these are a selection of projects that I believe show promise or have a particular focus on data.

Today SoDa has its roots in DeFi, so the largest players (e.g., Chainlink) concentrate on tooling and infrastructure for it. But we are already beginning to see a shift. Insurance applications originally targeted at fellow DeFi applications have grown to include more traditional lines like weather (Arbol) or travel (Koala). With the launch of projects like Sign in with Ethereum from Spruce, users will begin to keep their data in wallets rather than storing it in-application. This will allow users to monetize their data through data unions, or through improved prices and services in the application layer, without needing technical skills. Similarly, while most of the data being put on oracle networks today is DeFi-oriented, demand will grow for traditional third-party data companies to provide more “real world” data (weather, traffic, commerce, movement), opening new revenue streams for these companies. Infrastructure like Helium will also allow low-cost IoT devices to help collect data and transact with each other directly on chain.

Dataverse: So when moon? 

We are still extremely early, and the infrastructure in this space feels like the internet of the 1990s. Network effects are difficult to break, and getting users to adopt Web3 social platforms or e-commerce won’t be easy. Yet as consumers feel what it is like to have a stake in the networks they use, my intuition is that progress will compound rapidly. With some of the largest companies in the world relying on captive data for their moats, there will be no shortage of companies competing to change the game.

The “dataverse” is a big, audacious goal, and our current patterns of thinking lead us to believe that big problems lead to big companies. Even Stephenson conceived of it as controlled by a centralized entity: CIC was the CIA merged with the Library of Congress. But just as few of us believe that Meta will control the metaverse, the dataverse will emerge not from a centralized roadmap but from an ecosystem of projects. It is unlikely to manifest the way we imagine, or to be a one-for-one replacement for any company today. It will be something brand new, and that’s what’s most exciting.


I hope this is a useful introduction to the world of the Web3 data economy. If you’re interested in the space, have projects that we should add to the landscape, or would just like to connect, you can find me on Twitter (@craig_danton).

Thank you to @I_F_H and @Chris_AA for the help getting this out there.

Subscribe to Craig Danton
This entry has been permanently stored onchain and signed by its creator.