If Bitcoin can bring about financial honesty, Web3 more broadly can bring about data honesty.
Back then
-1400s-2009: The era of double-entry accounting. A ledger of debits and credits to keep track of things.
-2009-Present: The era of triple-entry accounting. A ledger of debits and credits plus third-party verification via a shared supercomputer (the blockchain).
In the multi-thousand-year arc of humanity, this is the largest tectonic shift in how we organize ourselves.
Sure, you could trust the Dutch East India Company to deliver goods across the world, but there was always the risk that someone was cooking the books: a stray barrel of whisky here, an extra passenger there.
The larger the incentive, the larger the chance that somewhere in the system an emperor without clothes is manipulating the ledger. Long-Term Capital Management, cough, Lehman Brothers, cough cough, Evergrande.
Today
If all the largest power structures today sit behind opaque databases, how do we know they aren’t cooking the books?
-Twitter putting its thumb on the scales to squash dissenting opinion
-PwC auditing Evergrande with glowing results (and 50mm+ in fees)
It’s all related.
Hashing Analytics Onchain
Our salvation will come from properly configuring the holy trinity of concepts:
Hashing - a mathematical fingerprint that uniquely identifies a piece of digital information (data); change a single bit and the fingerprint changes completely
Analytics - digital information over time that tells a story
Onchain - data stored on a shared ledger database (blockchain)
HAO also happens to be the Mandarin pīnyīn for "good" (hǎo, 好), so there’s a fun mnemonic.
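To make the "fingerprint" idea concrete, here is a minimal Python sketch using SHA-256 from the standard library. SHA-256 is an assumption on our part (it is the hash family Bitcoin uses, but the article doesn’t name one). Strictly speaking, a hash doesn’t *prove* uniqueness; collisions exist in theory, but nobody knows how to find one for SHA-256, which is what makes it usable as a fingerprint.

```python
import hashlib

def fingerprint(data: str) -> str:
    """Return the SHA-256 hex digest (the 'fingerprint') of a UTF-8 string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

original = "Alice sends Bob 1 BTC"
tampered = "Alice sends Bob 2 BTC"

print(fingerprint(original))
print(fingerprint(tampered))

# The same input always yields the same 64-hex-character digest...
print(fingerprint(original) == fingerprint("Alice sends Bob 1 BTC"))  # True
# ...while a one-character change yields an unrelated digest.
print(fingerprint(original) == fingerprint(tampered))  # False
```

Anyone holding the original text can recompute the digest and compare, which is the property the rest of this piece leans on.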
Experiments in hashing analytics onchain have occurred since the dawn of Bitcoin. Some of the earliest transactions on the Bitcoin blockchain contained scripture, binary images, and occasionally more nefarious data.
Recently, the Ordinals concept has gained momentum: inscribing data beyond simple transaction data on the Bitcoin blockchain. Taken further, the entire notion of scaling the blockchain relies on storing hashes of layer 2 chain data on a layer 1 chain backed by higher security.
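One common way to compress a batch of layer 2 transactions into a single hash is a Merkle root. The sketch below is a simplified illustration, not any specific chain’s format: the transaction strings are hypothetical, it uses single SHA-256 (Bitcoin uses double SHA-256), and real rollups differ in exactly what they post to layer 1 (state roots, transaction data, proofs).

```python
import hashlib

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaf hashes together pairwise, level by level, until one root remains.

    On a level with an odd count, the last hash is duplicated (as Bitcoin does).
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    level = leaves
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical batch of layer 2 transactions.
l2_batch = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
leaves = [sha256(tx) for tx in l2_batch]
root = merkle_root(leaves)

# In this sketch, only this 32-byte root would be written to the layer 1 chain.
print(root.hex())
```

Because every level hashes its children, altering any single layer 2 transaction changes the root, so the small layer 1 record is enough to expose tampering anywhere in the batch.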
Of course, most developers thinking about novel data to store on blockchains largely play in Ethereum land. Looser rules and faster, cheaper blockchains lead to more experimentation. That, and shitchains like Hedera and Cardano are nearly impossible to code in. Long live tarted-up JavaScript and the EVM!
Goldilocks
Centralized systems (Web2) writing data to blockchains (Web3) is the answer.
When a user logs in to a Web3 application (called a dApp) they give up much more information about themselves than they realize.
Because Web3 dApps are 99% likely to run on centralized cloud infrastructure like AWS or GCP, all the juicy Web2 analytics are at the fingertips of the cloud owner.
-IP address
-click behavior
-Identifying data (email address, social login, etc.)
But Web3 can be different!
Let’s protect user data, privacy, the right to be forgotten and all that GDPR jazz, but in a Web3 context.
Instead of on-or-off, black-or-white, Web3 (configured properly) allows for selective data access.
For example:
-what is safely stored by a private walled garden (web2)
-what is made public for all to see (web3)
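-and a third wrinkle: can we write encrypted data, or hashes of it, to public blockchains to achieve our Goldilocks? User-owned public storage of private data. (A hash is a one-way fingerprint, not encryption: it proves what the data was without revealing it or letting anyone decrypt it.)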
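A minimal sketch of the "public storage of private data" idea, assuming the simplest possible approach: a salted SHA-256 commitment. The random salt matters because short, guessable values like an email address could otherwise be recovered by hashing candidate guesses. The function names `commit` and `verify` are hypothetical, and the email address is an example value.

```python
import hashlib
import hmac
import secrets

def commit(private_data: str) -> tuple[str, bytes]:
    """Return (commitment_hex, salt).

    The commitment is safe to publish on chain; the salt and data stay with the user.
    """
    salt = secrets.token_bytes(32)  # random salt blocks dictionary guessing of short data
    digest = hashlib.sha256(salt + private_data.encode("utf-8")).hexdigest()
    return digest, salt

def verify(commitment_hex: str, private_data: str, salt: bytes) -> bool:
    """Check a revealed (data, salt) pair against a published commitment."""
    candidate = hashlib.sha256(salt + private_data.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, commitment_hex)  # constant-time comparison

commitment, salt = commit("alice@example.com")
print(commitment)  # this is all the public chain would see
print(verify(commitment, "alice@example.com", salt))    # True
print(verify(commitment, "mallory@example.com", salt))  # False
```

The user decides if and when to reveal the data and salt. Until then, the public record proves that *something* was committed at that time without saying what.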
In Action
Let’s take the use case from the last article - friend.tech leaking 100K+ wallet addresses and associated Twitter handles.
In a properly configured environment (adults in the room) this wouldn’t happen.
-The holder of the sensitive data would keep it walled behind their web2 garden.
-When the owners of the data request access, the holder writes the data to a public blockchain as an encrypted message, or as a hash of it.
-This creates a timestamped, hash-verifiable record of the event.
Assuming the source data is clean, storing it on chain for all time means no one can tamper with it after the fact without being caught.
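The flow above can be sketched as an append-only, hash-chained log. This is a toy stand-in for a public blockchain, assuming that each access event records a pseudonymous owner ID and a hash of the sensitive data (never the data itself). `AccessLog`, its fields, and the example values are all hypothetical, and the timestamp here comes from the holder’s clock, whereas on a real chain the block timestamp would supply it.

```python
import hashlib
import json
import time

GENESIS = "0" * 64

def record_hash(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same record always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AccessLog:
    """Hypothetical append-only log; each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, owner_id: str, data_hash: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"owner_id": owner_id, "data_hash": data_hash,
                "timestamp": int(time.time()), "prev_hash": prev}
        entry = {**body, "hash": record_hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; any after-the-fact edit breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev or record_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

log = AccessLog()
# The holder hashes the sensitive pairing (example values) and logs only the hash.
log.append("user-123", hashlib.sha256(b"wallet=0xA1;handle=@alice").hexdigest())
log.append("user-456", hashlib.sha256(b"wallet=0xB2;handle=@bob").hexdigest())
print(log.verify())  # True

log.entries[0]["owner_id"] = "user-999"  # someone edits history after the fact
print(log.verify())  # False
```

One caveat the toy makes visible: a single party could rewrite the whole chain from the edited entry onward and recompute every hash. It is the public blockchain’s replication across many independent nodes that prevents that, which is why the record belongs on chain rather than in the holder’s own database.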
Tiered Privacy and Data Monetization
Some transactions we want directly on-chain for all to see. Alice sends Bob 1 BTC. We don’t care who Alice is, or who Bob is - we just want a public record that A sent 1 BTC to B.
For all ancillary data (like Alice transferring a real estate title to Bob), we want hashes of the relevant data stored on chain and associated with the transaction: a hashed fingerprint of the property survey, etc.
This is how to create tiered levels of privacy. The USER decides what level of detail they are willing to share and monetize.
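Here is a minimal sketch of how those tiers might be expressed, using the real estate example. Each field gets one of three hypothetical tiers: "public" (plaintext on chain), "hashed" (salted SHA-256 on chain, salt kept by the user), or "private" (kept off chain entirely). The field names, tier labels, and values are illustrative assumptions, and the monetization side isn’t modeled.

```python
import hashlib
import secrets

PUBLIC, HASHED, PRIVATE = "public", "hashed", "private"

def build_onchain_record(fields: dict[str, str], tiers: dict[str, str]) -> tuple[dict, dict]:
    """Split a transaction's fields into (onchain, kept_salts) according to the user's tiers.

    PUBLIC fields go on chain as plaintext, HASHED fields as salted SHA-256 digests,
    and PRIVATE fields (or any field without a tier) are left out entirely.
    """
    onchain: dict[str, str] = {}
    kept_salts: dict[str, str] = {}
    for name, value in fields.items():
        tier = tiers.get(name, PRIVATE)  # default to the most private tier
        if tier == PUBLIC:
            onchain[name] = value
        elif tier == HASHED:
            salt = secrets.token_hex(16)  # fixed-length salt, kept by the user
            onchain[name] = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
            kept_salts[name] = salt
    return onchain, kept_salts

def verify_field(onchain: dict, name: str, value: str, salt: str) -> bool:
    """Later, the user can selectively reveal one HASHED field and its salt."""
    return onchain.get(name) == hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

transfer_fields = {
    "from": "addr_A", "to": "addr_B", "title_id": "PARCEL-42",
    "property_survey": "<survey document contents>", "sale_price": "<price>",
}
transfer_tiers = {"from": PUBLIC, "to": PUBLIC, "title_id": PUBLIC,
                  "property_survey": HASHED, "sale_price": PRIVATE}

onchain, kept = build_onchain_record(transfer_fields, transfer_tiers)
print(onchain)  # from/to/title_id in plaintext, survey as a digest, no sale_price
```

The design choice worth noting is the default: a field the user never classified stays private, so sharing is always opt-in.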
Let’s end the rant here.
In the next article we’ll start fleshing out how we can think about these data markets, and how users can selectively monetize portions of their data exhaust.