Secure Web3 with Trusted Data

Welcome, welcome! I hope you’ve been ushered comfortably into your seat. You’re about to discover why I locked myself in a room to tackle a data problem within the realm of blockchains.

Before we dive in, let me introduce myself. I’m Ryan, a long-time programmer, tinkerer, and product enthusiast. As far back as I can remember, I always wanted to create value for the world through my craft, skills, and authenticity.

In 2022, I founded a company called Usher Labs to pursue this goal. It’s been an interesting ride, and I’ll share more about that journey at the end. For now, let’s focus on the topic at hand.

I won’t delve into why blockchains are fantastic for managing value—you’re probably already well-versed in that. However, it’s worth noting that this technology, while revolutionary, is still quite isolated from the broader world. There’s a long road of technical development ahead before we can seamlessly embed advanced cryptography, such as Zero-Knowledge (ZK) proofs, and blockchain transactions into the very fabric of society.

But why is this the case?

Trust is Quite Elusive in Web3

Due to the financial freedoms and sovereignty that blockchains offer, Web3 has found itself battling its own unforgiving nature. Some have exploited this nature to extract value from participants, whether or not those participants were acting in good faith. Trust has been eroded by firms and individuals seeking to abuse this technology for personal gain. While on the surface this gives Web3 bad optics, its adversarial nature also creates demand for improvements and innovations in securing this new financial system. At the same time, brands need trust to drive the industry forward and enhance the financial security of their users.

The very premise of a blockchain is to decentralise trust. It achieves this through a network of sufficiently non-collusive operators, all transparently validating that an input yielded a particular outcome. Blockchains shift assumed trust from brands or developers to the technology itself. Smart Contracts are the mechanisms that assume this trust. However, users may not know how to code or understand the security dynamics of Smart Contracts, making it difficult for them to trust these contracts directly. Instead, users continue to trust the brands and developers behind the Smart Contracts.

At least we can trust that whatever the developer did will be consistent because, ideally, we trust the blockchain executing the logic.

On a positive note, Web3 standards have evolved to uphold trust among users. Security audits of Smart Contracts that manage digital assets on behalf of users are a testament to this. Essentially, Smart Contract audits allow users to delegate their trust to firms with technical security expertise. The outcome is a report and stamp of approval indicating that the Smart Contract is trustworthy, allowing users to trust the technology rather than the brand.

However, Smart Contracts are evolving. Similar to how static web pages evolved into dynamic web apps from the mid-1990s through the 2000s, Smart Contracts across various blockchains are evolving to leverage data in managing digital assets. This introduces a new problem: data becomes a security risk.

I could have a perfectly secure Smart Contract that depends on a dataset I control, own, and have authority over. How would you know if I’ve tampered with this data for personal gain? Would you be able to tell if I changed a reward’s recipient wallet address to mine? Who’s checking this?

If these questions haven’t piqued your curiosity about the gravity of the problem, consider this analogy:

Imagine two snails racing in a pitch-dark room. You have a flashlight, but it only reveals the snails' current positions, not the paths they've taken. Because it's dark, you can't see if someone else in the room has moved the snails. Without being able to track the snails' trails, you can't be sure if the race has been tampered with.

Now imagine millions of dollars are on the line.

This is essentially the state of data-driven Smart Contracts.

In case my analogy wasn’t enough…

So… Oracles, Right?

If you’re familiar with Web3 services such as Chainlink, then I’ll assume you’re aware of what a Decentralised Oracle Network (DON) is. The very purpose of a DON is to solve the problem I’ve outlined, and in most circumstances, DONs do exactly that. Consider scenarios where Smart Contracts depend on digital or real-world asset pricing data. DONs essentially decentralise trust in the data point indicating the latest price of one asset relative to another, referred to as an asset pair. Similar to a blockchain, this works by establishing a network of substantially non-collusive node operators that transparently source this price data point from their respective data sources and submit it to a Smart Contract. On the blockchain, the price data points submitted by each node operator are aggregated, typically by taking the median value. In other circumstances, the reputation of node operators is factored in to weight the submissions during aggregation. Each DON employs its own protocol on data — a data protocol.

I’d like you to mentally save this concept of data protocols. Traditional DONs are implementations of a data protocol—a set of conditions cryptographically enforced on data to determine validity. Here, the data protocol enforces decentralised sourcing of data.

Ultimately, what’s used on-chain and within Smart Contracts is the final result of many node operators’ attestations to the price of a particular asset pair.
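To make that aggregation step concrete, here’s a minimal sketch in Python of how a data protocol might combine node operators’ price submissions. The node names, prices, and reputation weights are hypothetical, and the reputation-weighted variant is an illustrative simplification rather than any specific DON’s implementation.

```python
from statistics import median

# Hypothetical ETH/USD price submissions from independent node operators.
submissions = {
    "node-a": 3501.25,
    "node-b": 3499.80,
    "node-c": 3502.10,
    "node-d": 3487.00,  # an outlier, e.g. a stale or manipulated source
}

# Plain median: the value reported on-chain stays robust to a minority of outliers.
agreed_price = median(submissions.values())

# Reputation-weighted variant: repeat each submission in proportion to the
# operator's reputation before taking the median (illustrative only).
reputation = {"node-a": 3, "node-b": 3, "node-c": 2, "node-d": 1}
weighted_pool = [
    price
    for node, price in submissions.items()
    for _ in range(reputation[node])
]
weighted_price = median(weighted_pool)

print(agreed_price, weighted_price)  # 3500.525 3501.25
```

Either way, the point is that no single operator’s submission determines the value a Smart Contract consumes.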

However, there is still a problem.

This only works if the data is highly available.

The price of assets is essentially the only type of data that is highly available across many data sources. If all node operators leverage the same data source, it defeats the purpose of the DON entirely.

In many scenarios, especially for the next generation of data-driven protocols, the data used is proprietary, private, and provided by a single highly trusted data source.

CoinDesk has attested to this problem, noting that existing solutions only ensure that data from centralised entities is not tampered with before it makes it on-chain; they don’t make that original data any more (or less) credible.

Chainlink is fully aware of this, and I’m quite a fan of their work — especially regarding their explanation of trust minimisation.

Adapted from Chainlink’s observation of trust-minimisation. The truth alignment spectrum shows how people’s subjective belief in something being true aligns with the level of objective evidence that points to it being true. Ideally, one’s belief in a process outcome should only be as strong as the statistical guarantee that it will happen.

Trust Minimisation

I agree, the language does sound like a paradox. However, I’ll do my best to explain this paradigm.

“Trust” is a subjective belief in the outcome of something based on some knowledge. Where “trust” is assumed (established and maintained) changes based on context. Minimising the trust assumed by technology means making knowledge available that maximises user trust in the outcome of said technology. Regarding data, if I can guarantee the accuracy of my data based on its source, a user will not need to trust that I haven’t tampered with it once the data is presented.

Minimising trust means alleviating the trust assumed by a technology or application. For a platform like Ethereum, the premise is to act as a trustless program operator, where trustless means assuming no trust at all. These programs are referred to as Smart Contracts. While users still trust Smart Contract logic, no trust is assumed by any single operator of the Smart Contract because it’s executed by many operators, collectively known as Ethereum. The aim of this whole endeavour, and of cybersecurity in every technical domain, is to maximise user trust in outcomes. For example, banks, ideally, uphold security to prevent the loss of their customers’ capital. Airports employ border control to maximise customers’ trust in the outcome of their flights.

Cryptography enables us to guarantee knowledge of things to minimise the amount of trust assumed by technology. In essence, it’s a form of transparency. A cryptographic guarantee inherits trustlessness if it is verified on a trustless platform like Ethereum; if a centralised party verifies my cryptographic proof, then that verifier assumes trust in the outcomes. In this regard, cryptographic guarantees verified in a trustless manner give a user access to more knowledge and therefore minimise the trust required to know the outcome of a technology.

I like to refer to this as verifiable transparency — where cryptography allows us to poke holes or completely observe what’s in a black box.
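As a toy illustration of verifiable transparency, consider the sketch below, which uses only the Python standard library and hypothetical values: a data provider publishes a hash commitment to a dataset on a tamper-resistant venue such as a Smart Contract, and any consumer who later receives the dataset can recompute the commitment and detect tampering without taking the provider’s word for it.

```python
import hashlib
import json

def commit(dataset: dict) -> str:
    """Return a deterministic SHA-256 commitment to a dataset."""
    canonical = json.dumps(dataset, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Provider side: publish the commitment somewhere tamper-resistant
# (e.g. a Smart Contract), then deliver the dataset off-chain.
original = {"recipient": "0xAlice", "reward": 100}
published_commitment = commit(original)

# Consumer side: check the delivered dataset against the published commitment.
delivered = {"recipient": "0xMallory", "reward": 100}  # tampered in transit
assert commit(delivered) != published_commitment  # tampering is detectable
assert commit(original) == published_commitment   # untouched data verifies
```

A hash is the bluntest instrument here; signatures or ZK proofs can make stronger, more granular guarantees, but the verification pattern is the same.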

Let’s return to minimising trust. It’s a means of trust management in technology where particular cryptographic guarantees of knowledge can be made about an operation to maximise trust in outcomes. While Chainlink conflates trust minimisation with decentralisation, I consider decentralisation the ultimate form of trust minimisation because it’s fundamental to obtaining trustless outcomes. In the aforementioned circumstance where sensitive data is highly centralised and not highly available, yet is required by data-driven blockchain protocols, decentralisation in data sourcing is fundamentally excessive, expensive, and redundant, and it jeopardises data privacy. The reason is that we must assume trust in an entity at some point in the data lineage, and decentralisation does not help us gain new knowledge about that lineage.

Data lineage is a means of tracing the state of data at various points throughout its pipeline and transportation to a destination system, like a blockchain.

Therefore, where data is not highly available because it is private, custom, or niche, its security is upheld through trust minimisation in data lineage. Here, the entities involved in the data lineage must create cryptographic guarantees (be verifiably transparent) that the data has not been tampered with or censored, and that it is an accurate representation of its original source.
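One minimal way to make a data lineage tamper-evident is to chain a commitment through each stage of the pipeline, so that altering any earlier record invalidates every later one. The stage names and records below are hypothetical, and this is a sketch of the general idea rather than a description of any particular product.

```python
import hashlib
import json

def link(prev_hash: str, record: dict) -> str:
    """Commit to a pipeline stage by hashing its record together with the previous hash."""
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Hypothetical lineage: source -> transform -> on-chain submission.
stages = [
    {"stage": "source", "value": 3500.52, "origin": "exchange-api"},
    {"stage": "transform", "value": 3500.52, "op": "unit-normalisation"},
    {"stage": "submit", "value": 3500.52, "destination": "oracle-contract"},
]

head = "genesis"
for record in stages:
    head = link(head, record)

# Anyone holding the stage records can replay the chain and compare the final
# hash against the one anchored on-chain; any edit to any stage breaks the match.
print(head)
```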

Data flow comparison between a transparent data provider and one that is not. Who would you trust more?

What is “Trusted” Data?

To maximise user trust in the outcome of a data-driven protocol or Web3 application, the original source of data must be highly “trusted”. I’ve touched on this topic before, when explaining verifiable transparency for RWAs, which is also referenced in Truflation’s whitepaper. However, for those new to the concept of highly “trusted” data, let me explain how we can use logic to evaluate whether a data source can be trusted.

Before I dive in, I must confess that the value of measuring trust is becoming more apparent to me. Quickly evaluating whether a data source, Web3 app, data protocol, etc., is trustworthy based on a highly transparent scoring methodology may be necessary. After a short interview with a Web3 degen in my close network, I realised that trust issues are rampant. Investors are afraid to commit to safe and trustworthy staking pools, opting instead to gamble on coin exchanges due to continuous hacks and failures to uphold protocol security. Rekt.news records 4 to 10 hacks and compromises a month over the last three months. Builders are reluctant to create data-driven protocols because they’re assuming a high degree of trust and have no reputation to back it. I want to play a role in solving this. With your support, I aim to offer analytics-backed reports that score trust in the various data-related entities powering our new financial system, based on logical conditions, measurements, and an appropriate analytical methodology. This will come in time.

But I digress.

For now, we use a heuristic to determine whether a data source is trustworthy.

The heuristic interrogates the data source before it is used within a lineage, from the point of sourcing through to verification and utility within blockchains. This way, data consumers and data protocol developers can evaluate whether their data sources are indeed viable for maximising user trust.

“Trusted” Data as a Heuristic

The trustworthiness of data provided by a service entity is predicated on the following conditions:

  1. Primary Purpose and Data Provenance: Is the entity's main function to provide accurate data, and is the data sourced directly—i.e., is this first-party data?

    Entities fitting this criterion include:

    • Government-backed APIs

    • Government-authorised information brokers

    • Private companies in highly regulated industries, such as big banks or large payment processors

    • Private companies specialising in industry-specific data aggregation and delivery

  2. Value Correlation: Is the entity’s data validity and accuracy directly and proportionally correlated with the value of its service?

    This condition differentiates entities prioritising data volume for broad insights from those offering high-quality data crucial for sensitive operations, such as those in financial markets, healthcare, and identity verification.

    • Examples of the former include analytics providers such as Semrush or Similarweb, which source various types of data, from clicks and user engagement to tracking tools to ISP data sold privately. The purpose of these companies is to generalise big data into valuable insights—even if data accuracy is off by X% where X < 50 for any given time range.

    • Examples of the latter include:

      • CoinGecko, our beloved cryptocurrency price and charts application, whose sole responsibility is to serve accurate pricing data.

      • CoreLogic, whose sole responsibility is to serve data related to real estate assets.

  3. Capitalisation and Incentives: Is the entity sufficiently capitalised such that the financial incentive for maintaining data accuracy surpasses the temptation to falsify data?

    This consideration views the capitalisation of an entity as a quantification of reputation and a measure of potential loss in acts of mistrust. The higher the capitalisation, the more people essentially trust the brand/entity to carry out its designed services and outcomes. If bad faith actions are carried out, then there are higher penalties. While big banks and big tech are theoretically examples of this, penalties imposed through regulatory repercussions tend to fall under the radar due to their financial leniency. This is especially highlighted when compared to the misconduct of a node operator on a Proof of Stake blockchain, where penalties are generally half of what is staked as a bond against such behaviour.

    This consideration also plays a significant role in the realm of real-world assets (RWAs). When a Real-World Asset (RWA) token issuer manages funds exceeding their company's valuation, the incentive to exploit vulnerabilities increases, as does the demand for verifiable transparency in their asset management practices. Currently, a standard is in place for RWA issuers to involve third-party licensed auditors. This transparency practice for upholding trust and security is, as Ondo Finance puts it, a matter of disclosure standards where “on-chain finance should meet or exceed those in TradFi”. To democratise RWA issuance and truly tokenise the world of traditional finance, I firmly believe we need to exceed our current standard of transparency and disclosures. This is exacerbated by the fact that:

    • Regulatory guardrails diminish in various jurisdictions where issuers and auditors may operate.

    • Processes and data can be obfuscated from auditors to serve the RWA issuers' interests over their investors.

    • In the worst case, issuers and auditors may collude.

    If you think my concerns are exaggerated, even with awareness of the unforgiving nature of cryptocurrencies, research the various cases of financial accounting scandals, or review documentaries and reports on the matter.

When referencing entities within these preconditions, we are specifically referring to original first-party data publishers. A trustworthy data protocol would transparently leverage and even cryptographically enforce these data sources on the basis that they sufficiently fit the criteria for being “trusted”.

Although it is unlikely that many trusted data sources will meet all preconditions, this framework facilitates human-in-the-loop decision-making regarding the risk profile of a verifiable data pipeline and its designated "trusted" data sources.

Through the analysis of data protocols, pipelines, and Web3 apps within the blockchain ecosystem, I believe that community-led risk assessment will play a pivotal role in endorsing specific data sources as "trusted" providers, thereby reinforcing the integrity and reliability of the data underpinning protocols that the world depends on.

Evaluating “Trusted” Data Sources

A good rule of thumb for evaluating a “trusted” data source is that it fits at least two of the three preconditions.
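As a rough illustration of that two-of-three rule, here’s one way the heuristic could be encoded. The entities, boolean judgements, and threshold are hypothetical placeholders for what would, in practice, be a community-led, analytics-backed assessment.

```python
from dataclasses import dataclass

@dataclass
class DataSourceAssessment:
    name: str
    primary_purpose_and_provenance: bool  # condition 1
    value_correlation: bool               # condition 2
    capitalisation_and_incentives: bool   # condition 3

    def score(self) -> int:
        # Booleans count as 0 or 1, so this tallies satisfied conditions.
        return sum([
            self.primary_purpose_and_provenance,
            self.value_correlation,
            self.capitalisation_and_incentives,
        ])

    def is_trusted(self, threshold: int = 2) -> bool:
        return self.score() >= threshold

# Hypothetical judgements for illustration only.
payment_processor = DataSourceAssessment("large-payment-processor", True, True, True)
affiliate_referrer = DataSourceAssessment("affiliate-referrer", True, False, False)

print(payment_processor.is_trusted())   # True
print(affiliate_referrer.is_trusted())  # False
```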

Companies such as Stripe, Plaid, Mercury, Binance, and Equifax are examples of first-party data providers that meet these conditions. They provide accurate data as their primary function, satisfying the first condition. They handle highly sensitive data directly correlated to their value proposition, meeting the second condition. They are also highly capitalised, aligning their financial incentives with meeting expected outcomes, thereby satisfying the third condition. These companies play critical roles in sensitive operations associated with financial data management, whether it’s underwriting a loan based on credit history or revenue-based financing. There are many other companies that meet this criterion, even outside the realm of traditional finance. As previously mentioned, it will become a community-led effort to surface these entities as trusted data sources for use in data protocols that integrate blockchains into our everyday lives.

Users of any application, service, or Web3 protocol are also entities that fit within this paradigm. Every transaction submitted by a user for the purpose of engaging a social network, a DEX, a lending protocol, etc., and the associated metadata involved in the user’s session, essentially satisfies the first condition. The primary purpose of the data in a user’s session is to satisfy a goal that fundamentally requires accurate data. The series of clicks and user interactions that lead to the creation and submission of a transaction must be accurate for the transaction to be valid. Users also satisfy the third condition depending on the context of their engagement and their role in a given application or service. When a user purchases a service, item, or product, their capitalisation is marked by the transaction volume they’ve engaged in, aligning their incentives with the data provided. However, if a user is a referrer in an affiliate market, this may not be the case, as there is no capitalisation against which to evaluate whether referral metrics are fraudulent. This is why affiliate markets and other marketplaces treat these user roles as “untrusted” entities requiring fraud prevention, detection, and data attribution to evaluate whether to reward these users for their activities.

Operators in various decentralised networks, such as IoT device or compute resource operators in a decentralised physical infrastructure network (DePIN), or node operators in a blockchain, sufficiently satisfy these conditions. These operators are essentially first-party data providers, as their attestations to the validity of the network or the metrics produced are generated entirely through their operation. Furthermore, these operators stake capital either through the purchase of physical equipment and energy for operational uptime or through bonding an allocation with a blockchain. The third condition is only satisfied if this capitalisation is sufficient. While blockchains are capable of detecting tampered data through replicated validation (which allows identity in Proof of Authority chains to stand in for capitalisation, where the penalty is exile), DePINs are still in search of a solution to prevent tampered data.

Bad data in DePINs should cause reputational implications and subsequent financial penalisation. The issue is that there’s a lack of tools to enable this transparency and verifiably manage data lineage without centralised oversight.

Illustration of various projects facilitating data integration, colour-coded to indicate the level of trust in each entity based on the heuristic.

Recap

Blockchains provide cryptographic security, but this security doesn’t extend beyond the blockchain itself. This creates a significant risk for Web3 apps that manage digital assets using external data.

Without transparency, users must trust brands to handle their data and assets, centralising trust and increasing risk.

The data problem in blockchains is that there’s little transparency into where data has been and, therefore, whether it is valid. Traditional Decentralised Oracle Networks (DONs) tackle this by decentralising the source of this data and aggregating the results on a trustless platform, such as a blockchain. This approach is only viable for highly available data like asset pricing data, where many sources provide similar results. While decentralising trust in a network of oracles forming independent attestations to this data is the most widely adopted solution, as seen with Chainlink, novel technologies are emerging that enable a centralised node to incorporate trust minimisation to achieve the same outcome. This latter approach offers more cost efficiencies, better management of private or gated yet highly available data, and better error handling, at the expense of achieving true trustlessness through traditional decentralisation.

This data problem is exacerbated where protocols depend on private, custom, or niche data that is highly centralised and not highly available. Evaluating data validity is done through a set of conditions relevant to the dataset. Fundamentally, valid data is at least tamper- and censorship-resistant and maintains an accurate reflection of the data sourced from a highly trusted entity.

At Usher Labs, we’re fundamentally working on this very problem. Whether for infrastructure projects such as DePINs and Oracles, or for data-driven protocols such as RWAs, minimising trust in the data pipelines between the world and blockchains is a solution to this dilemma. By extending blockchain’s trust and security into systems beyond the blockchain, and by creating verifiable transparency into data lineage and the systems involved in these data pipelines, we offer analysis-backed insights into where trust is truly established and maintained, as well as cryptographically supported data security for digital asset management.

Illustrating how trust in data varies based on whether it is first-party or third-party and its source context. The graph highlights how transparency, especially when verified, minimises the trust required in technological outcomes, thereby enhancing user trust in those outcomes.

Conclusive Note

You may have noticed that I haven’t detailed exactly how the technical solutions we’re actively proving and implementing at Usher Labs work. I’ve left those details out to keep this piece brief; you can learn more by visiting our website. To learn about our journey to data security and verifiable transparency, read our story.

I’m getting familiar with this writing thing and aim to share more research and thoughts in the future. This will include articles and research covering the Web3 technology landscape, data security and analysis, and Usher Labs’ research and solutions roadmap. To keep up to date with Usher Labs and this sort of Web3 research-related content, hit the subscribe button and join us on Discord and Twitter.

Cheers 🤗
