The Graph Tutorial: Why The Graph?

I'm assuming you have at least some experience writing basic smart contracts. If not, now is a great time to start.

What actually is The Graph?

From the official docs:

The Graph is a decentralized protocol for indexing and querying data from blockchains, starting with Ethereum. It makes it possible to query data that is difficult to query directly.

In short, The Graph makes it easy to query any kind of data from smart contracts, something that would be frustratingly hard to do otherwise.

Before diving in, let's pin down what the real problem is.

The Problem

Chances are you're coming to web3 from the traditional web2 space, where you built client-server web applications. If so, you have probably found a contract's ability to query its own storage very limiting. In web2, you would write a query of any complexity on the server, run it against the database, and expose the result through an API to the user-facing app.

Take the ubiquitous ERC-20 token contract as an example (abbreviated for simplicity):

contract Token is IERC20 {
    mapping(address => uint256) private _balances;

    mapping(address => mapping(address => uint256)) private _allowances;

    uint256 private _totalSupply;

    string private _name;
    string private _symbol;
    // ... (rest of the contract omitted)
}

Now think about how you would run a moderately complex query against it, such as: list every address, with its current balance, whose balance is greater than 100,000 and that has received a total allowance of more than 10,000.

If this were a database, the task would be fairly easy. You have much more freedom to design a suitable table schema, and you could write a simple SQL query like:

SELECT address, balance FROM Token WHERE balance > 100000 AND total_received_allowance > 10000

or an equivalent MongoDB (NoSQL) query:

db.token.find(
    { 
        balance: { $gt: 100000 }, 
        totalReceivedAllowance: { $gt: 10000 } 
    },
    { address: 1, balance: 1 }
)

And you're done in a couple of lines. Nothing mind-bending.

Now try writing a function in the Token contract above that returns exactly the same results as the database query. You are bound to hit several roadblocks:

  • Even a seemingly simple query is frustratingly hard: Solidity mappings cannot be iterated, so the contract cannot even list the addresses that hold a balance unless you track them separately
  • You may be forced to change your data layout to suit that one query, even if it doesn't suit others
  • Upgrading contracts through proxy patterns adds further storage-layout constraints
  • Complex storage logic can introduce bugs and compromise security

And what about operations like joins, aggregations, relationships between entities, pagination, and non-trivial filters? Damn!

In the end, you'd have to build a dedicated server that indexes and processes data from the blockchain, stores it in a traditional database, and exposes APIs that query this database instead of contract storage directly. This is essentially what applications like Etherscan do. It is costly, time-consuming, and a distraction from your core goal. How can you innovate freely with such a barrier up front?
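To make that cost concrete, here is a minimal sketch, in TypeScript, of the core such a server would need just to answer the query from earlier. It assumes the token's `Transfer` and `Approval` events have already been fetched and decoded into the hypothetical `TransferEvent` and `ApprovalEvent` shapes below, and it keeps state in memory instead of a real database. It also reads "received allowance" as the sum of the latest approvals granted to an address, which is only one possible interpretation.

```typescript
// Hypothetical decoded event shapes. A real indexer would pull raw logs from a
// node (e.g. with the eth_getLogs JSON-RPC method) and decode them first.
// Amounts are plain numbers for brevity; uint256 values really need bigint.
interface TransferEvent {
  kind: "Transfer";
  from: string;
  to: string;
  value: number;
}

interface ApprovalEvent {
  kind: "Approval";
  owner: string;
  spender: string;
  value: number;
}

type TokenEvent = TransferEvent | ApprovalEvent;

// In-memory stand-in for the "traditional database" the server would maintain.
interface TokenState {
  balances: Record<string, number>;
  allowances: Record<string, Record<string, number>>; // owner -> spender -> value
}

interface AccountRow {
  address: string;
  balance: number;
}

const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

// Replays events in chain order to rebuild state the contract cannot list itself.
function indexEvents(events: TokenEvent[]): TokenState {
  const state: TokenState = { balances: {}, allowances: {} };
  for (const ev of events) {
    if (ev.kind === "Transfer") {
      // Mints come from, and burns go to, the zero address; skip its bookkeeping.
      if (ev.from !== ZERO_ADDRESS) {
        state.balances[ev.from] = (state.balances[ev.from] || 0) - ev.value;
      }
      if (ev.to !== ZERO_ADDRESS) {
        state.balances[ev.to] = (state.balances[ev.to] || 0) + ev.value;
      }
    } else {
      // Approval overwrites the previous allowance for this owner/spender pair.
      // Assumes every allowance change emits Approval, which ERC-20 does not
      // require for transferFrom.
      if (!state.allowances[ev.owner]) {
        state.allowances[ev.owner] = {};
      }
      state.allowances[ev.owner][ev.spender] = ev.value;
    }
  }
  return state;
}

// Hand-written equivalent of the SQL query above.
function richApprovedAccounts(state: TokenState, minBalance: number, minAllowance: number): AccountRow[] {
  // Total allowance each spender has received, summed over all owners.
  const received: Record<string, number> = {};
  for (const owner of Object.keys(state.allowances)) {
    const bySpender = state.allowances[owner];
    for (const spender of Object.keys(bySpender)) {
      received[spender] = (received[spender] || 0) + bySpender[spender];
    }
  }
  return Object.keys(state.balances)
    .filter((addr) => state.balances[addr] > minBalance && (received[addr] || 0) > minAllowance)
    .map((addr) => ({ address: addr, balance: state.balances[addr] }));
}

const demoState = indexEvents([
  { kind: "Transfer", from: ZERO_ADDRESS, to: "0xA", value: 250000 },
  { kind: "Transfer", from: "0xA", to: "0xB", value: 50000 },
  { kind: "Approval", owner: "0xB", spender: "0xA", value: 20000 },
]);
// Only 0xA qualifies: balance 200000, received allowance 20000.
console.log(richApprovedAccounts(demoState, 100000, 10000));
```

And this is only the core: fetching logs, handling chain reorganizations, persisting to a real database, and serving an API are all still left to build.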

The Solution

Now you know exactly what the problem is. The Graph solves it by freeing you from setting up and maintaining your own blockchain indexing servers just so you can query your application's data however you want.

The Graph is a decentralized protocol, meaning it is a network of multiple nodes that work together to store data and serve it in response to queries. The network is run by participants with different roles: Indexers, Curators, Delegators, and Developers. This tutorial focuses on Developers, though you can read more about the other roles here.

As a developer, you describe a "subgraph" for the contract(s) you want to query data from. You do this by writing a few configuration files that lay out what data to store and how. The network then indexes this subgraph according to your configuration, and the resulting data becomes available for querying through a GraphQL API endpoint.
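Once a subgraph is indexed, the query from The Problem becomes a single GraphQL request. Here is a sketch, assuming a hypothetical schema with an `Account` entity whose `balance` and `totalReceivedAllowance` fields are `BigInt`s (The Graph exposes a collection of `Account` entities through an `accounts` query field). The code only builds the request body, since the endpoint URL depends on where your subgraph is deployed.

```typescript
// Shape of a standard GraphQL-over-HTTP request body.
interface GraphQLRequest {
  query: string;
  variables: { [name: string]: string | number };
}

// Entity and field names (accounts, balance, totalReceivedAllowance) are
// hypothetical and must match whatever your schema.graphql defines.
// BigInt filter values are sent as decimal strings.
function buildRichAccountsQuery(minBalance: string, minAllowance: string, pageSize: number): GraphQLRequest {
  const query = `
    query RichAccounts($minBalance: BigInt!, $minAllowance: BigInt!, $first: Int!) {
      accounts(
        first: $first
        where: { balance_gt: $minBalance, totalReceivedAllowance_gt: $minAllowance }
      ) {
        id
        balance
      }
    }
  `;
  return { query, variables: { minBalance, minAllowance, first: pageSize } };
}

const richAccountsRequest = buildRichAccountsQuery("100000", "10000", 100);
// This JSON body would be POSTed to the subgraph's GraphQL endpoint.
console.log(JSON.stringify(richAccountsRequest));
```

The `_gt` suffix is one of the filter operators The Graph generates for each field, and `first` (together with `skip`) covers the pagination that was so painful on-chain. Here `id` would hold the address only if your mapping uses the address as the entity ID, which this sketch assumes.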

The three files a developer must define are:

  • Manifest (subgraph.yaml)

    Defines the data sources to index, including the target contract, the block to start indexing from, the events to respond to, and so on.

  • Schema (schema.graphql)

    The GraphQL schema that defines the entities you want to retrieve from the indexed subgraph. This is much like defining well-structured, related models for an API.

  • AssemblyScript Mappings (mapping.ts)

    Code written in AssemblyScript that translates raw data from the data sources (defined in subgraph.yaml) into the structured entities defined in schema.graphql.
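To give a feel for what mapping.ts contains, here is a sketch of a `Transfer` handler. Real mappings are AssemblyScript and use entity and event classes generated by `graph codegen` from your schema and the contract ABI, with amounts as `BigInt`. Here the same load-or-create, update, save pattern is imitated in plain TypeScript with an in-memory store so it runs on its own. The `Account` entity, its `balance` field, and the `TransferEventLike` type are assumptions for illustration; in a real subgraph, the manifest's `eventHandlers` entries route each contract event to a handler such as `handleTransfer`.

```typescript
// Plain snapshot of an entity as the store would persist it.
interface AccountData {
  id: string;
  balance: number;
}

// In-memory stand-in for the entity store that Graph Node manages for you.
const accountStore: { [id: string]: AccountData } = {};

// Imitation of a generated entity class (real ones come from `graph codegen`).
class Account {
  id: string;
  balance: number;

  constructor(id: string) {
    this.id = id;
    this.balance = 0;
  }

  // Returns a fresh copy of the saved entity, or null if it was never saved.
  static load(id: string): Account | null {
    const data = accountStore[id];
    if (data === undefined) {
      return null;
    }
    const account = new Account(data.id);
    account.balance = data.balance;
    return account;
  }

  // Changes are visible to later load() calls only after save().
  save(): void {
    accountStore[this.id] = { id: this.id, balance: this.balance };
  }
}

// Hypothetical event shape, loosely mirroring event.params on generated event classes.
interface TransferEventLike {
  params: { from: string; to: string; value: number };
}

const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";

function loadOrCreateAccount(id: string): Account {
  const existing = Account.load(id);
  return existing !== null ? existing : new Account(id);
}

// Called once per Transfer event, in chain order.
function handleTransfer(event: TransferEventLike): void {
  const p = event.params;
  // Mints come from the zero address, so there is no sender balance to debit.
  if (p.from !== ZERO_ADDRESS) {
    const sender = loadOrCreateAccount(p.from);
    sender.balance -= p.value;
    sender.save();
  }
  // Burns go to the zero address, which is not tracked as an account.
  if (p.to !== ZERO_ADDRESS) {
    const receiver = loadOrCreateAccount(p.to);
    receiver.balance += p.value;
    receiver.save();
  }
}

handleTransfer({ params: { from: ZERO_ADDRESS, to: "0xA", value: 500 } });
handleTransfer({ params: { from: "0xA", to: "0xB", value: 200 } });
// 0xA ends with 300, 0xB with 200.
console.log(Account.load("0xA"), Account.load("0xB"));
```

The handler only describes how one event changes one or two entities; storing, indexing, and serving the results over GraphQL is the network's job.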

What goes into these files defines the entire subgraph and the API available to you.

Check out the next article in the series, Creating a Subgraph, to start building your own subgraphs!

Hope you learned some awesome stuff! 😎

Feel free to catch me here!

Subscribe to Naveen