Welcome! This is a gentle 10-minute introduction to Oracles on web3: how can we feed real-world data into a blockchain?
"An oracle is a bridge between the blockchain and the real world. They act as on-chain APIs you can query to get information into your smart contracts. This could be anything from price information to weather reports. Oracles can also be bi-directional, used to "send" data out to the real world." (@minimalism, October 2021).
Suppose a smart contract handles betting on a 'real world' (off-chain, from now on) outcome, for example, who wins the next presidential election. The Oracle would confirm this off-chain result.
Note that a blockchain cannot make API calls itself, because every node must be able to reach the same result deterministically. The Oracle fills this gap.
Yes, the Oracle is an intermediary.
An Oracle requests from an off-chain API and writes the result to the blockchain. Other contracts will need to get the data from the contract where the Oracle wrote it.
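The write-then-read pattern described above can be sketched as a small Python simulation. This is only an illustration of the data flow, not real contract code: the class names, the `"election"` key, and the addresses are all hypothetical, and a real deployment would be written in a smart contract language.

```python
from typing import Callable, Dict, List


class OracleFeedContract:
    """Stand-in for the on-chain contract the Oracle writes to (hypothetical)."""

    def __init__(self, oracle_address: str) -> None:
        self.oracle_address = oracle_address
        self._answers: Dict[str, str] = {}

    def submit(self, sender: str, key: str, value: str) -> None:
        # Only the registered Oracle address may write results.
        if sender != self.oracle_address:
            raise PermissionError("only the oracle can write")
        self._answers[key] = value

    def latest(self, key: str) -> str:
        return self._answers[key]


class BettingContract:
    """Consumer contract: it never calls an API, it only reads the feed."""

    def __init__(self, feed: OracleFeedContract) -> None:
        self.feed = feed

    def winners(self, bets: Dict[str, str]) -> List[str]:
        result = self.feed.latest("election")
        return sorted(bettor for bettor, pick in bets.items() if pick == result)


def run_oracle(fetch: Callable[[], str], feed: OracleFeedContract,
               oracle_address: str, key: str) -> None:
    """Off-chain step: call the external API, then write the result on-chain."""
    value = fetch()  # in practice, an HTTP request to an off-chain API
    feed.submit(oracle_address, key, value)


if __name__ == "__main__":
    feed = OracleFeedContract("0xoracle")
    run_oracle(lambda: "candidate_a", feed, "0xoracle", "election")
    betting = BettingContract(feed)
    print(betting.winners({"alice": "candidate_a", "bob": "candidate_b"}))
```

The betting contract depends only on the feed contract's stored value, which is why other contracts must read from the contract where the Oracle wrote the data.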
To provide the most accurate and safe ETH/USD price possible, MakerDAO collates multiple data sources and takes the median value. It only updates the price if (1) the new price, collected from different external sources, differs by more than 1% from the last price; and (2) the last price update was made more than 6 hours ago. This avoids changing the on-chain price when it is not necessary (MakerDAO).
Once MakerDAO decides that it should update the ETH/USD price, based on the criteria outlined above, it writes the new one onto the blockchain through its price feed contracts, called Medianizers. I believe the name Medianizer comes from the fact that it calculates the median value of all price feeds and writes that value. This leaves a permanent record of the price on the blockchain, keeping it deterministic and capable of being validated by all nodes.
An Oracle like this is called a Decentralized Oracle Network (DON). It processes several off-chain sources in a way that addresses both (1) the deterministic nature of blockchains and (2) the Oracle problem, that is, dependence on a single source of truth. This provides end-to-end decentralization.
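As a rough sketch of the rule described above, the following Python snippet takes the median of several quotes and applies the two update criteria exactly as this text states them (joined with "and"). The function and constant names are hypothetical, and the real Medianizer contracts may implement the check differently.

```python
from statistics import median
from typing import Sequence

DEVIATION_THRESHOLD = 0.01     # 1%, as stated in the text
MIN_AGE_SECONDS = 6 * 60 * 60  # 6 hours, as stated in the text


def medianize(prices: Sequence[float]) -> float:
    """Collapse quotes from several sources into a single median price."""
    return median(prices)


def should_update(last_price: float, last_update_ts: int,
                  new_price: float, now_ts: int) -> bool:
    """Apply both criteria from the text: price moved > 1% AND last update > 6h ago."""
    deviation = abs(new_price - last_price) / last_price
    moved_enough = deviation > DEVIATION_THRESHOLD
    old_enough = (now_ts - last_update_ts) > MIN_AGE_SECONDS
    return moved_enough and old_enough


if __name__ == "__main__":
    new_price = medianize([2045.0, 2050.0, 2060.0])
    seven_hours = 7 * 60 * 60
    print(new_price, should_update(2000.0, 0, new_price, seven_hours))
```

Using the median rather than the mean means a single wildly wrong source cannot drag the published price far from what the other sources report.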
In what applications can an Oracle be used?
"Major industries benefit from combining oracles and smart contracts including asset prices for finance, weather information for insurance, randomness for gaming, IoT sensors for supply chain, ID verification for government, and much more." (Chainlink).
"Whether your contract requires sports results, the latest weather, or any other publicly available data, the Chainlink contract library provides the tools required for your contract to consume it." (Chainlink documentation).
The exact mechanics of requesting data from an arbitrary external API, even with Chainlink, would be material for another text. The question of how to avoid the Oracle problem remains: what if the data we want to GET is only available through a single source of truth, because only one company provides it? In theory, this would violate decentralization and make the blockchain vulnerable.
Here's a very cool contract in Remix that GETs data from an external API using Chainlink.
The Gemini Cryptopedia article “What Is Chainlink in 5 Minutes” is cited heavily in this section.
“Chainlink is a decentralized network of nodes that provide data and information from off-blockchain sources to on-blockchain smart contracts via oracles.” (Cryptopedia, 2021).
Note that Chainlink is not an Oracle per se but a network of oracles, also called a Decentralized Oracle Network (DON). Each oracle might obtain its off-chain data from a single source, but Chainlink decides which oracles to rely on through a mechanism that combines (1) reputation (earned by repeatedly providing accurate information) and (2) stake in the protocol (the number of LINK Tokens staked by the oracle).
The image below illustrates how the Chainlink network of oracles (the blue nuts in the third column) introduces off-chain data from several sources (the first two columns) into smart contracts (on-chain). Note that this also works in the opposite direction, passing on-chain data to off-chain requesters. The entities requesting the data pay for it in LINK Tokens. “Prices are set by the Chainlink node operator based on demand for the data they can provide and the current market for that data.” (Cryptopedia, 2021).
Let’s look at the same process from a different angle. Individual Chainlink nodes (themselves oracles, since they provide a blockchain with off-chain data) supply “raw data” (as Chainlink calls it). In the case of Chainlink Price-Feeds, this raw data comes from several centralized (Binance, Coinbase) and decentralized (Uniswap, Sushiswap) exchanges that, for example, provide the price of ETH in USD (figure below).
Then, Data Aggregators (in the case of the Price-Feeds, the figure below, BraveNewCoin and CoinGecko are good examples) collect raw data from across the mentioned providers to generate refined datasets. These datasets remove outliers and try to filter out fake or manipulated sources. Notice that these aggregators work off-chain; we are still not on the blockchain. This step will most likely not exist for less popular data, in which case we would go straight to the following step.
The Data Aggregators already filter out faulty data and add a layer of security, since they obtain data from a decentralized network of sources. To further ensure decentralization, with the main goal of preventing a single faulty data source from making the whole system fail, Chainlink Node Operators work on-chain: they source the off-chain data and broadcast it to the whole blockchain. This is the first on-chain step we have. The Node Operators source the already processed data from several Data Aggregators and take the median value between them. This further ensures that outliers are taken care of. Below is an image of relevant Node Operators for the Price-Feed service.
Finally, we get to the Chainlink Validating Contract that we review in the last section. We need to arrive at a single data point and make sure it was obtained in a decentralized manner. Chainlink aggregates the responses of the Node Operators, probably by taking the median value of the nodes that responded to the data request. This aggregation process could occur off-chain or on-chain, depending on the raw data source(s) and the cost of integrating the data on-chain.
How many times did we aggregate data (taking the median value in most cases)? Three in the above example. This is good news for decentralization and security.
If the aggregation process was performed on-chain by the Validating Contract, then the FluxAggregator.sol model was most likely used.
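To make the three aggregation layers concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not Chainlink's actual logic: the function names are hypothetical, the 5% outlier cutoff is invented for the example, and the median is used at every layer only because the text suggests that is the most likely choice.

```python
from statistics import median
from typing import Sequence


def aggregator_refine(raw_prices: Sequence[float],
                      max_deviation: float = 0.05) -> float:
    """Layer 1 (off-chain Data Aggregator): drop quotes far from the median.

    The 5% cutoff is an assumed value chosen only for this example.
    """
    mid = median(raw_prices)
    kept = [p for p in raw_prices if abs(p - mid) / mid <= max_deviation]
    return median(kept)


def node_answer(aggregator_values: Sequence[float]) -> float:
    """Layer 2 (Node Operator): median across several Data Aggregators."""
    return median(aggregator_values)


def contract_answer(node_answers: Sequence[float]) -> float:
    """Layer 3 (Validating Contract): median across the responding nodes."""
    return median(node_answers)


if __name__ == "__main__":
    # One exchange reports a manipulated price of 2600.
    refined = aggregator_refine([2000.0, 2002.0, 1998.0, 2600.0])
    per_node = node_answer([refined, 2001.0, 1999.0])
    # One node misbehaves and reports 2400.
    final = contract_answer([per_node, 2000.5, 1999.5, 2000.0, 2400.0])
    print(refined, per_node, final)
```

In this run a bad exchange quote is removed at the first layer and a bad node answer is neutralized at the third, so neither reaches the final value.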
The ChainlinkClient.sol (Client, from now on) smart contract allows smart contracts to consume data from oracles in the Chainlink network. The Client contract uses the transferAndCall function from the LinkTokenInterface.sol contract to make a request to a known oracle in the Chainlink network. The Client contract initiates this process with a call to the function
This way we can use practically any off-chain API service, as long as at least one Chainlink node makes it available on-chain. The Client creates a request by providing the address of the particular oracle, the task to be performed by the oracle node, and a callback function to which the oracle will send the data. This, however, does not offer the robustness of having several nodes provide data that can be aggregated by the Chainlink Aggregating Contract.
An off-chain oracle node listens for an OracleRequest event emitted on-chain and performs the job, probably by calling an API (remember there are no on-chain APIs, so this has to be done off-chain). The node then converts the result into blockchain-compatible data and sends it to the on-chain oracle contract using the fulfillOracleRequest function, which is defined in the Oracle.sol contract that node operators use and which forwards the result to the Client contract.
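The request and fulfillment cycle described above can be simulated in plain Python. This is a simplified model under several assumptions: only the names `OracleRequest` and `fulfill_oracle_request` (a snake_case rendering of fulfillOracleRequest) come from the text, every other name is hypothetical, the LINK payment made through transferAndCall is left out, and prices are converted to integer cents as one possible "blockchain-compatible" encoding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class OracleRequest:
    """Event emitted on-chain for off-chain nodes to pick up."""
    request_id: int
    job_id: str
    callback: Callable[[int, int], None]


class OracleContract:
    """Simplified stand-in for the on-chain oracle contract."""

    def __init__(self) -> None:
        self.events: List[OracleRequest] = []
        self._pending: Dict[int, OracleRequest] = {}
        self._next_id = 0

    def oracle_request(self, job_id: str,
                       callback: Callable[[int, int], None]) -> int:
        request_id = self._next_id
        self._next_id += 1
        request = OracleRequest(request_id, job_id, callback)
        self._pending[request_id] = request
        self.events.append(request)  # "emit" the event
        return request_id

    def fulfill_oracle_request(self, request_id: int, data: int) -> None:
        # pop() ensures each request can be fulfilled only once.
        request = self._pending.pop(request_id)
        request.callback(request_id, data)


class ClientContract:
    """Simplified stand-in for a contract built on the Client."""

    def __init__(self, oracle: OracleContract) -> None:
        self.oracle = oracle
        self.price_cents: Optional[int] = None

    def request_price(self) -> int:
        return self.oracle.oracle_request("eth-usd", self.fulfill)

    def fulfill(self, request_id: int, data: int) -> None:
        self.price_cents = data


def run_node(oracle: OracleContract,
             jobs: Dict[str, Callable[[], float]]) -> None:
    """Off-chain node: read events, call the API, send integer data back."""
    for event in list(oracle.events):
        raw = jobs[event.job_id]()  # in practice, an HTTP API call
        oracle.fulfill_oracle_request(event.request_id, int(round(raw * 100)))
    oracle.events.clear()


if __name__ == "__main__":
    oracle = OracleContract()
    client = ClientContract(oracle)
    client.request_price()
    run_node(oracle, {"eth-usd": lambda: 2000.25})
    print(client.price_cents)
```

The client's state changes only when the callback runs, which mirrors the asynchronous nature of the real flow: the request and the answer arrive in separate transactions.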
“Chainlink node operators also use LINK to stake in the network — node operators must deposit LINK with Chainlink to demonstrate their commitment to the network and incentivize good service.
The Chainlink Reputation Contract considers the size of a node’s stake (among other criteria) when matching nodes with requests for data. Nodes with a greater stake are therefore more likely to be chosen to fulfill requests (and thus earn LINK tokens for their services). The Chainlink network also punishes faulty or dishonest nodes by taxing their stake of LINK for poor service.” (Cryptopedia, 2021).
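The effect of stake on node selection, and of penalties on stake, can be illustrated with a toy Python model. It is a deliberately narrow sketch: it models only stake, ignoring the "other criteria" the quote mentions, and the function names, the proportional weighting, and the penalty fraction are all assumptions made for the example rather than Chainlink's documented rules.

```python
import random
from typing import Dict


def selection_weights(stakes: Dict[str, float]) -> Dict[str, float]:
    """Share of total stake per node, used here as the selection probability."""
    total = sum(stakes.values())
    return {node: stake / total for node, stake in stakes.items()}


def pick_node(stakes: Dict[str, float], rng: random.Random) -> str:
    """Choose one node at random, weighted by its stake (assumed rule)."""
    nodes = list(stakes)
    weights = [stakes[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]


def penalize(stakes: Dict[str, float], node: str,
             fraction: float) -> Dict[str, float]:
    """Return new stakes with `fraction` of one node's stake taken away."""
    updated = dict(stakes)
    updated[node] = stakes[node] * (1.0 - fraction)
    return updated


if __name__ == "__main__":
    stakes = {"node_a": 300.0, "node_b": 100.0}
    print(selection_weights(stakes))
    print(pick_node(stakes, random.Random(0)))
    print(selection_weights(penalize(stakes, "node_a", 0.5)))
```

Under this model a penalized node loses stake and therefore also loses future selection probability, so poor service costs it twice.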