Oracles: Getting Real-World Data into the Blockchain

Welcome! This is a gentle 10-minute introduction to Oracles in web3: how can we feed real-world data into a blockchain?

What an oracle does

Put simply:

"An oracle is a bridge between the blockchain and the real world. They act as on-chain APIs you can query to get information into your smart contracts. This could be anything from price information to weather reports. Oracles can also be bi-directional, used to "send" data out to the real world." (@minimalism, October 2021).

Suppose a smart contract handles bets on a real-world (off-chain, from now on) outcome, for example, who wins the next presidential election. The Oracle would confirm this off-chain result.

Note that a blockchain cannot make API calls on its own. The Oracle solves this.

Yes, the Oracle is an intermediary.


The problems Oracles seek to solve

  • Blockchains are designed to be deterministic. That is why, in principle, we cannot make on-chain API calls; we must go through services such as Chainlink. Decentralization requires that all nodes in a system confirm the same result, with no ambiguity, and common APIs may not meet this no-change requirement. For example, Juan sends Carla the equivalent of $10 in ETH, and the transaction uses an off-chain API to get the ETH/USD price at that moment. The transaction is recorded by one node on the blockchain, and the other nodes must validate it by calling the same API that provided the ETH/USD price. However, that API is dynamic, and its value is likely to change within a minute. If even a single validator calls the API and gets a different value, the transaction will not validate. This is why blockchains need to be deterministic: if we replay every transaction from the first block, we should get exactly the same results (this amazing video was the source of inspiration for this example).
  • The Oracle problem. An Oracle is only as good as the data sources it uses. Relying on a single source of truth would be insecure and would violate the goal of decentralization. An API provided by a private company, for example, could be manipulated or hacked to pursue particular objectives. Even sources such as government agencies or big tech companies can be, and have been, hacked and manipulated (see this and this). This can be addressed by a decentralized oracle that gets its information from multiple data sources: if one source fails, the others continue to provide the required information.
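To make the Juan-and-Carla example concrete, here is a toy Python simulation (not real blockchain code). The ticker function `live_eth_usd_api` is hypothetical and simply drifts by $15 on every call. Validators that re-query it compute different USD values for the same transaction, while validators that read one price already stored on-chain all agree.

```python
from itertools import count

_calls = count()

def live_eth_usd_api() -> float:
    """Hypothetical off-chain ticker: the quote drifts by $15 on every call."""
    return 3050.0 + 15.0 * next(_calls)

def replay_with_api(eth_sent: float) -> float:
    """Each validator recomputes the USD value by calling the API itself."""
    return round(eth_sent * live_eth_usd_api(), 2)

def replay_with_oracle(eth_sent: float, onchain_price: float) -> float:
    """Each validator reads the price an oracle already wrote into a block."""
    return round(eth_sent * onchain_price, 2)

eth_sent = 10.0 / 3050.0   # Juan sends Carla $10 worth of ETH
onchain_price = 3050.0     # price stored on-chain when the transaction happened

api_results = [replay_with_api(eth_sent) for _ in range(5)]
oracle_results = [replay_with_oracle(eth_sent, onchain_price) for _ in range(5)]

print(api_results)     # validators disagree with each other
print(oracle_results)  # every validator gets the same value
```

Only the second approach is deterministic: replaying it any number of times yields one answer.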

How an Oracle works

An Oracle requests data from an off-chain API and writes the result to the blockchain. Other contracts then read the data from the contract where the Oracle stored it.
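A minimal Python simulation of this write-then-read pattern, reusing the election-betting example. Every class, address and value here is hypothetical; the point is that the consumer contract never calls an API and only reads what the authorized oracle wrote.

```python
from typing import Optional

class ResultFeedContract:
    """Hypothetical on-chain contract where one authorized oracle stores a result."""

    def __init__(self, oracle_address: str) -> None:
        self.oracle_address = oracle_address
        self.result: Optional[str] = None

    def write(self, sender: str, value: str) -> None:
        # Only the authorized oracle may write to the feed.
        if sender != self.oracle_address:
            raise PermissionError("only the oracle can write")
        self.result = value

    def read(self) -> str:
        if self.result is None:
            raise LookupError("no result has been written yet")
        return self.result


class BettingContract:
    """Hypothetical consumer: it never calls an API, it only reads the feed."""

    def __init__(self, feed: ResultFeedContract) -> None:
        self.feed = feed

    def settle(self, bets: dict) -> list:
        winner = self.feed.read()
        return sorted(player for player, pick in bets.items() if pick == winner)


def off_chain_oracle_job(feed: ResultFeedContract, oracle_address: str) -> None:
    # Off-chain step: query an external API (stubbed here), then write on-chain.
    api_answer = "Candidate A"  # placeholder for a real API response
    feed.write(oracle_address, api_answer)


feed = ResultFeedContract(oracle_address="0xORACLE")
betting = BettingContract(feed)
off_chain_oracle_job(feed, "0xORACLE")
print(betting.settle({"juan": "Candidate A", "carla": "Candidate B"}))  # ['juan']
```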

To provide the most accurate and secure ETH/USD price possible, MakerDAO collates multiple data sources and takes their median value. It only updates the price if (1) the new price, collected from different external sources, differs by more than 1% from the last price, and (2) the last price update was made more than 6 hours ago. This avoids changing the price when it is not necessary (MakerDAO).
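The update rule described above fits in a few lines of Python. This illustrates the rule exactly as stated in the paragraph (both conditions required); it is not MakerDAO's contract code, and the function name `should_update` and the use of Unix timestamps in seconds are assumptions.

```python
from statistics import median

MAX_DEVIATION = 0.01        # 1% price change threshold
MIN_INTERVAL = 6 * 60 * 60  # 6 hours, in seconds

def should_update(last_price: float, last_update: int, quotes: list, now: int):
    """Return (update?, candidate_price) using the two conditions described above."""
    candidate = median(quotes)  # combine sources; one bad quote cannot drag it far
    moved_enough = abs(candidate - last_price) / last_price > MAX_DEVIATION
    old_enough = now - last_update > MIN_INTERVAL
    return moved_enough and old_enough, candidate

HOUR = 3600
quotes = [3000.0, 3040.0, 3100.0, 2950.0, 3060.0]  # hypothetical source quotes
print(should_update(3000.0, 0, quotes, now=7 * HOUR))  # (True, 3040.0)
print(should_update(3000.0, 0, quotes, now=1 * HOUR))  # (False, 3040.0)
```

Taking the median rather than the mean is what makes the feed robust: a single manipulated source shifts the mean but barely moves the median.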

Once MakerDAO decides to update the ETH/USD price based on the criteria above, it writes the new price onto the blockchain through its price feed contracts, called Medianizers. The name Medianizer presumably comes from the fact that it calculates the median of all price feeds and writes that value. This creates a permanent record of the price on the blockchain, keeping it deterministic and verifiable by all nodes.

An Oracle like this is called a Decentralized Oracle Network (DON). It processes several off-chain sources in a way that addresses both (1) the determinism of blockchains and (2) the Oracle problem, providing end-to-end decentralization.

In what applications can an Oracle be used?

"Major industries benefit from combining oracles and smart contracts including asset prices for finance, weather information for insurance, randomness for gaming, IoT sensors for supply chain, ID verification for government, and much more." (Chainlink).


How to send custom off-chain data to the blockchain

Chainlink, for example, lets you interact with off-chain data through API calls (see here). The service advertises itself as being able to call any API (read here for how to make GET requests):

"Whether your contract requires sports results, the latest weather, or any other publicly available data, the Chainlink contract library provides the tools required for your contract to consume it." (Chainlink documentation).

The exact mechanics of requesting data from an external API, even through Chainlink, are material for another text. The question of how to avoid the Oracle problem remains: what if the data we want to GET is only available from a single source of truth, for example because only one company provides it? In theory, this would violate decentralization and make the blockchain vulnerable.

Here's a very cool contract in Remix that GETs data from an external API using Chainlink.


The Gemini Cryptopedia article “What Is Chainlink in 5 Minutes” is cited extensively in this section.

“Chainlink is a decentralized network of nodes that provide data and information from off-blockchain sources to on-blockchain smart contracts via oracles.” (Cryptopedia, 2021).

Note that Chainlink is not an Oracle per se but a network of oracles, also called a Decentralized Oracle Network (DON). Each oracle might obtain off-chain data from a single source, but Chainlink decides which oracles to rely on using a mechanism that combines (1) reputation (earned by repeatedly providing accurate information) and (2) stake in the protocol (the number of LINK tokens staked by the oracle).

The image below illustrates how the Chainlink network of oracles (the blue nuts in the third column) brings off-chain data from several sources (the first two columns) into smart contracts (on-chain). Note that this also works in the opposite direction, passing on-chain data to off-chain requesters. The entities requesting the data pay Chainlink in LINK tokens. “Prices are set by the Chainlink node operator based on demand for the data they can provide and the current market for that data.” (Cryptopedia, 2021).

Obtained from the Cryptopedia, 2021.

The process in four steps

  1. To request data from Chainlink, a smart contract creates a Requesting Contract.
  2. “The Chainlink protocol registers this request as an ‘event’ and in turn creates a corresponding smart contract (Chainlink Service Level Agreement (SLA) Contract), also on the blockchain, to get this off-chain data.” (Cryptopedia, 2021).
  3. The Chainlink SLA Contract generates three sub-contracts (Cryptopedia, 2021):
    1. Chainlink Reputation Contract. It “checks an oracle provider’s track record to verify its authenticity and performance history — then evaluates and discards disreputable or unreliable nodes.”
    2. Chainlink Order-Matching Contract. It “delivers the Requesting Contract’s request to Chainlink nodes and takes their bids on the request (when the Requesting Contract does not choose a specific set of nodes) — then selects the right number and type of nodes to fulfill the request.”
    3. Chainlink Aggregating Contract. It “takes all the data from the chosen oracles and validates and/or reconciles it for an accurate result.”
  4. The Chainlink Aggregating Contract proceeds to validate the oracles’ data. It “can validate data from a single source and from multiple sources — and it can reconcile data from multiple sources. So, if five nodes deliver one answer from a weather sensor and two other nodes deliver a different answer, the Chainlink Aggregating Contract will know that those two nodes are faulty (or dishonest) and discard their answers. In this manner, Chainlink nodes can validate data from a single source.” (Cryptopedia, 2021).
Obtained from the Cryptopedia, 2021.
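The weather-sensor example in step 4 can be illustrated with a toy aggregation in Python. The Cryptopedia quote does not specify the exact reconciliation rule, so this sketch assumes a simple majority vote over identical answers; the node names and readings are hypothetical.

```python
from collections import Counter

def aggregate(responses: dict) -> tuple:
    """Pick the most common answer and flag nodes that disagree (assumed rule)."""
    tally = Counter(responses.values())
    answer, votes = tally.most_common(1)[0]
    if votes <= len(responses) // 2:
        raise ValueError("no majority answer")
    faulty = sorted(node for node, value in responses.items() if value != answer)
    return answer, faulty

# Five nodes report one temperature from the sensor; two report another.
responses = {f"node{i}": 21.5 for i in range(1, 6)}
responses.update({"node6": 30.0, "node7": 30.0})
print(aggregate(responses))  # (21.5, ['node6', 'node7'])
```

A majority vote suits exact readings like this one; for continuous values such as prices, taking a median is the more common choice.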

Let’s look at the same process from a different angle. Individual Chainlink nodes (themselves oracles, since they seek to provide a blockchain with off-chain data) provide “raw data”, as Chainlink calls it. In the case of Chainlink Price Feeds, this raw data comes from several centralized (Binance, Coinbase) and decentralized (Uniswap, SushiSwap) exchanges that provide, for example, the price of ETH in USD (figure below).

Obtained from Chainlink, 2021.

Next, data aggregators (in the case of Price Feeds, BraveNewCoin and CoinGecko are good examples; see the figure below) collect raw data from across these providers to generate refined datasets. These datasets remove outliers and try to filter out fake or manipulated sources. Note that these aggregators work off-chain; we are still not on the blockchain. This step will most likely not exist in less prominent applications, which go straight to the following step.
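The text does not say how each aggregator filters its data, so the following Python sketch assumes one simple approach: drop quotes more than 2% away from the median, then average what is left. The `FakeEx` source, the 2% threshold and all prices are made up for illustration.

```python
from statistics import mean, median

def refine(quotes: dict, max_dev: float = 0.02) -> tuple:
    """Drop quotes far from the median, then average the rest (assumed method)."""
    mid = median(list(quotes.values()))
    kept = {src: p for src, p in quotes.items() if abs(p - mid) / mid <= max_dev}
    dropped = sorted(set(quotes) - set(kept))
    return mean(list(kept.values())), dropped

quotes = {
    "Binance": 3001.0, "Coinbase": 2999.0,
    "Uniswap": 3003.0, "SushiSwap": 3000.0,
    "FakeEx": 3600.0,  # hypothetical manipulated source
}
print(refine(quotes))  # (3000.75, ['FakeEx'])
```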

Obtained from Chainlink, 2021.

The Data Aggregators already filter out faulty data and add a layer of security, since they obtain data from a decentralized network of sources. To further ensure decentralization, with the main goal of preventing a single faulty data source from making the whole system fail, Chainlink Node Operators source the off-chain data and broadcast it to the whole blockchain. This is the first on-chain step. The Node Operators source the already processed data from several Data Aggregators and take the median value among them, which further ensures that outliers are handled. Below is an image of relevant Node Operators for the Price Feed service.

Obtained from Chainlink, 2021.

Finally, we get to the Chainlink Aggregating Contract, which validates the data, as reviewed in the previous section. We need to arrive at a single data point and make sure it was obtained in a decentralized manner. Chainlink aggregates the responses of the Node Operators, probably by taking the median of the values returned by the nodes that responded to the data request. This aggregation can happen off-chain or on-chain, depending on the raw data source(s) and the cost of integrating the data on-chain.

How many times did we aggregate data (taking the median value in most cases)? Three times in the example above. This is good news for decentralization and security.
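The three rounds of aggregation can be traced with a toy Python pipeline. Every aggregator, node and price below is hypothetical, and the median is assumed at every layer, as the text suggests for most cases. A manipulated exchange quote and a dishonest node both fail to move the final answer.

```python
from statistics import median

# Layer 1 (off-chain): each hypothetical data aggregator takes the median of raw quotes.
raw_quotes = {
    "A": [3001.0, 2999.0, 3003.0],
    "B": [3000.0, 3002.0, 3600.0],  # one manipulated exchange quote
    "C": [2998.0, 3001.0, 3000.0],
}
aggregated = {name: median(q) for name, q in raw_quotes.items()}

# Layer 2: each hypothetical node operator reads several aggregators, takes the median.
node_sources = {
    "node1": ["A", "B", "C"],
    "node2": ["A", "B"],
    "node3": ["B", "C"],
    "node4": ["A", "C"],
}
node_reports = {
    node: median([aggregated[a] for a in sources])
    for node, sources in node_sources.items()
}
node_reports["node5"] = 9999.0  # a dishonest node reporting garbage

# Layer 3: the final answer is the median of all node reports.
final_price = median(list(node_reports.values()))
print(final_price)  # 3001.0
```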

If the aggregation was performed on-chain by the Aggregating Contract, then the FluxAggregator.sol model was most likely used.

The transition from the FluxAggregator Model to the Off-Chain Reporting (OCR) Model

Requesting data from a single source

The ChainlinkClient.sol smart contract (Client, from now on) allows smart contracts to consume data from oracles in the Chainlink network. The Client contract uses the transferAndCall function from the LinkTokenInterface.sol contract to send a request to a known oracle in the Chainlink network. The Client contract initiates this process with a call to the function sendChainlinkRequestTo.

In this way, practically any off-chain API service can be brought on-chain, even by a single Chainlink node. The Client creates a request by providing the address of the oracle, the task the oracle node should perform, and a callback function to which the oracle will send the data. This, however, does not provide the robustness of having several nodes supply data that can be aggregated by the Chainlink Aggregating Contract.

An off-chain oracle node listens for an OracleRequest event emitted on-chain and performs the job, probably by calling an API (remember that there are no on-chain API calls, so this has to happen off-chain). The node then converts the result into blockchain-compatible data and sends it to the on-chain oracle contract by calling the fulfillOracleRequest function. This function, defined in the Oracle.sol contract that node operators use, forwards the result to the Client contract.
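The request-and-fulfill cycle above can be modeled as a toy Python simulation. It is not Chainlink code: the real contracts are written in Solidity, the method names below are snake_case stand-ins loosely named after `sendChainlinkRequestTo`, `transferAndCall` and `fulfillOracleRequest`, and their parameters are simplified assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class OracleRequestEvent:
    """Stand-in for the OracleRequest event emitted on-chain."""
    request_id: int
    job_id: str
    callback: Callable[[int, float], None]


class OracleContract:
    """Toy on-chain oracle contract (the role Oracle.sol plays)."""

    def __init__(self) -> None:
        self.event_log: List[OracleRequestEvent] = []
        self.pending: Dict[int, OracleRequestEvent] = {}
        self._next_id = 0

    def receive_request(self, payment: int, job_id: str, callback) -> int:
        # Reached when LINK is transferred together with the request data.
        if payment <= 0:
            raise ValueError("a LINK payment is required")
        self._next_id += 1
        event = OracleRequestEvent(self._next_id, job_id, callback)
        self.pending[event.request_id] = event
        self.event_log.append(event)  # what off-chain nodes listen to
        return event.request_id

    def fulfill_oracle_request(self, request_id: int, value: float) -> None:
        event = self.pending.pop(request_id)  # each request is fulfilled once
        event.callback(request_id, value)     # forward the result to the client


class ClientContract:
    """Toy requesting contract (the role ChainlinkClient.sol plays)."""

    def __init__(self, oracle: OracleContract) -> None:
        self.oracle = oracle
        self.results: Dict[int, float] = {}

    def send_request(self, job_id: str, payment: int) -> int:
        # Plays the part of sendChainlinkRequestTo + transferAndCall.
        return self.oracle.receive_request(payment, job_id, self.fulfill)

    def fulfill(self, request_id: int, value: float) -> None:
        # Callback the oracle invokes with the off-chain answer.
        self.results[request_id] = value


def run_node_once(oracle: OracleContract, call_api: Callable[[str], float]) -> None:
    """Off-chain node: scan events, call the API, report each answer on-chain."""
    for event in list(oracle.event_log):
        if event.request_id in oracle.pending:
            oracle.fulfill_oracle_request(event.request_id, call_api(event.job_id))


oracle = OracleContract()
client = ClientContract(oracle)
req = client.send_request("eth-usd-job", payment=1)
run_node_once(oracle, call_api=lambda job_id: 3001.0)  # stubbed API response
print(client.results[req])  # 3001.0
```

Note that only one node answers here, which is exactly the single-source weakness the section points out.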

“Chainlink node operators also use LINK to stake in the network — node operators must deposit LINK with Chainlink to demonstrate their commitment to the network and incentivize good service.

The Chainlink Reputation Contract considers the size of a node’s stake (among other criteria) when matching nodes with requests for data. Nodes with a greater stake are therefore more likely to be chosen to fulfill requests (and thus earn LINK tokens for their services). The Chainlink network also punishes faulty or dishonest nodes by taxing their stake of LINK for poor service.” (Cryptopedia, 2021).
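The selection-and-penalty mechanism in the quote can be sketched in Python. The scoring rule (stake multiplied by a reputation score, after a reputation cutoff), the deterministic top-k choice, and the 10% penalty are all illustrative assumptions, not Chainlink's actual parameters; the quote only says stake is one criterion among others and that faulty nodes lose part of their stake. A real system would make high-stake nodes "more likely" to be chosen rather than always chosen.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    name: str
    stake: float       # LINK deposited by the operator
    reputation: float  # assumed score in [0, 1] from past accuracy

def select_nodes(nodes: List[Node], k: int, min_reputation: float = 0.5) -> List[Node]:
    """Discard unreliable nodes, then prefer larger stake weighted by reputation."""
    eligible = [n for n in nodes if n.reputation >= min_reputation]
    return sorted(eligible, key=lambda n: n.stake * n.reputation, reverse=True)[:k]

def penalize(node: Node, fraction: float = 0.10) -> float:
    """Take a share of a faulty node's stake and lower its reputation."""
    penalty = node.stake * fraction
    node.stake -= penalty
    node.reputation = max(0.0, node.reputation - 0.2)
    return penalty

nodes = [
    Node("alpha", stake=1000.0, reputation=0.90),
    Node("bravo", stake=5000.0, reputation=0.30),  # big stake, poor track record
    Node("charlie", stake=2000.0, reputation=0.80),
    Node("delta", stake=500.0, reputation=0.95),
]
chosen = select_nodes(nodes, k=2)
print([n.name for n in chosen])  # ['charlie', 'alpha']

penalty = penalize(chosen[0])    # suppose charlie later reports bad data
print(penalty, chosen[0].stake)  # 200.0 1800.0
```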

Thanks for reading!

Feel free to reach out to me with thoughts, comments, and feedback via Twitter (@espejelomar) or email (espejelomar@protonmail.ch).

Subscribe to Omar Espejel
This entry has been permanently stored onchain and signed by its creator.