Greetings everyone! For those of you who do not know me, I am a cofounder of Fungible Systems, a crypto product studio. We help build new and interesting web3 projects and build a lot of open source tooling.
Today I want to walk through some updates around the project we’ve been working on for our residency with the Stacks Foundation. Project Kourier is our attempt at exploring new ways of building blockchain indexing services on top of Stacks.
tl;dr: We’ve been working on a new architecture and set of tools that would make it possible for developers to build out custom views of Stacks-related data (think subgraphs but on Stacks).
Note: you can view an additional (long) breakdown of how this all works in this Loom video.
Data indexers are programs that run next to a blockchain (in our case, the Stacks network). They store and process the raw data from the network, transform it into something useful, and then potentially expose that for folks to use in their apps. Most blockchains have one or many indexing services. Currently, Stacks has one main indexing service and API: Hiro PBC’s stacks-blockchain-api project (repo here). If you’ve built an app on Stacks, you’ve almost certainly used this product.
The Hiro API is a monolithic node.js service that handles a number of different functions:
Below you can see a diagram of how this generally works:
The Stacks node is the software that more or less makes up the Stacks Blockchain -- it’s the software that processes events from its peers and it’s the same thing that miners run if they want to participate in Stacks mining.
The Hiro API is dependent on running a Stacks node alongside the node.js application, as the Stacks node is the source of truth and data for the API. The node will emit JSON payloads for any event that it processes. This could be new Stacks blocks, new Bitcoin blocks, mempool transactions, attachments, etc. Basically anything that happens related to Stacks and the network.
The Hiro API has an event observer which captures these events and saves them to a table in the PostgreSQL database that powers the API. At the same time, it processes the events as they stream in and transforms them into something more useful -- breaking them up into transactions, events, blocks, etc.
The way that the data is emitted from the Stacks node is a bit lower level than what most applications would want to use. This part of the API takes the raw events and processes them into data shapes that are more separate, verbose, and useful in the context of a relational database.
The API exposes many endpoints for different sorts of data that applications might need. Things like account balances, transaction history, fungible token events, non-fungible token events, and more. Much of the code in the API is actually dedicated to complex queries used to expose this more meaningful and human-understandable data.
Finally, the API also has code that will update different sets of data when a chain reorg happens. A chain reorg is when a new block is mined that references a different parent block. You can read more about them here. The longest chain (highest block height) is considered the “canonical” chain. If a chain reorg is detected, the API will follow the blocks back for however long the fork might be, update any transactions or events that might have changed, and set their canonical state relative to the new canonical chain.
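As a rough sketch of that reorg walk (not the Hiro API’s actual code), the snippet below keeps blocks in memory and re-marks the canonical chain when a longer fork shows up. The `Block` shape, the `apply_block` name, and the in-memory `blocks` dict are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Block:
    hash: str
    parent_hash: str
    height: int
    canonical: bool = False


def apply_block(blocks: Dict[str, Block], new_block: Block) -> List[str]:
    """Store new_block and, if it becomes the highest tip, re-mark the canonical chain.

    Returns the hashes of every block whose canonical flag changed.
    Assumes the parents of new_block have already been stored.
    """
    blocks[new_block.hash] = new_block
    tip: Optional[Block] = max(
        (b for b in blocks.values() if b.canonical),
        key=lambda b: b.height,
        default=None,
    )
    if tip is not None and new_block.height <= tip.height:
        return []  # a shorter or equal fork: keep it stored, but not canonical

    # Walk back from the new tip until we reach a block that is already
    # canonical (the fork point) or run out of parents (genesis).
    changed: List[str] = []
    new_branch = set()
    cursor: Optional[Block] = new_block
    while cursor is not None and not cursor.canonical:
        cursor.canonical = True
        new_branch.add(cursor.hash)
        changed.append(cursor.hash)
        cursor = blocks.get(cursor.parent_hash)
    fork_height = cursor.height if cursor is not None else -1

    # Everything above the fork point that is not on the new branch is orphaned.
    for block in blocks.values():
        if block.canonical and block.height > fork_height and block.hash not in new_branch:
            block.canonical = False
            changed.append(block.hash)
    return changed
```

In a real service, the hashes returned here would drive the follow-up work: flipping the canonical flag on the transactions and events that belong to those blocks.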
Project Kourier is the result of needing to build out many bespoke data indexing services to power applications like stacking.club and Gamma. Any sufficiently advanced application is going to have unique data requirements that a generalized API like the Hiro API won’t be able to solve (and likely shouldn’t have to).
At a high level, Project Kourier is an attempt to break out the different services of the Hiro API into lower-level primitives: different microservices that developers can use as-needed depending on what their apps require. Additionally, there are some goals around usage and developer experience that we want to achieve:
This microservice deals solely with capturing and saving raw events from a Stacks node. There are many ways this can be done, but for our initial MVP, we have a simple node.js service that captures the events from the Stacks node and saves them into a PostgreSQL database.
We then hook up an instance of Hasura to provide a GraphQL layer through which consumers can query whatever raw data they might need. Due to the log-like nature of these events, you’re able to sync, stop syncing, and pick up from where you left off very easily.
There are many advantages of doing it this way: we can run many instances of this service, we can heavily cache this data because it’s immutable (does not change over time), and it completely removes the need for consumers to run a Stacks node.
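To make the “pick up from where you left off” idea concrete, here is a minimal sketch of an append-only event log and a resumable sync loop. An in-memory list stands in for the PostgreSQL table and GraphQL layer, and `RawEventLog`, `fetch_after`, `sync`, and the `kind` strings are hypothetical names, not part of any released Kourier code.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class RawEventLog:
    """Append-only log of raw node events, ordered by a sequence number."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, payload: Any) -> int:
        seq = len(self._events) + 1  # sequence numbers start at 1
        self._events.append({"seq": seq, "kind": kind, "payload": payload})
        return seq

    def fetch_after(self, cursor: int, limit: int = 100) -> List[Event]:
        # Events with seq > cursor live at list index >= cursor.
        return self._events[cursor:cursor + limit]


def sync(log: RawEventLog, cursor: int, handle: Callable[[Event], None],
         batch_size: int = 100) -> int:
    """Process every event after `cursor` and return the new cursor."""
    while True:
        batch = log.fetch_after(cursor, batch_size)
        if not batch:
            return cursor
        for event in batch:
            handle(event)
            cursor = event["seq"]
```

A consumer only has to persist the returned cursor. Calling `sync` again later with that value resumes exactly where the previous run stopped, with no duplicates.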
In the future, I could see ways in which proofs could be given alongside this data to ensure it’s 100% genuine.
The primary goal of this project is to remove the reliance on running a Stacks node, and to ensure a way to consistently have access to the raw data that the node emits. Shifting the responsibility of data availability to its own service makes building out higher level APIs and data indexers much simpler -- you no longer need the contextual knowledge and know-how to run a Stacks node. You can simply download all the raw data and process it however much you want.
This microservice stacks (pun intended) on top of the previous one: it fetches and processes the raw data, then transforms it into more useful data. This is very similar to what the Hiro API does as it receives raw events.
This microservice is what the vast majority of consumers would sync from. This service would provide access to all the different types of data that the network produces: blocks, microblocks, mempool transactions, confirmed transactions, attachments, and events (ft, nft, stx, and stacking events). As with the previous microservice, much of this data can be heavily cached.
In addition to the different types of data, the service could also expose the map of canonical block hashes, which clients would use to only sync the data for the current canonical chain. This saves a lot of time and prevents duplicate work.
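A small sketch of how a client might use such a map, assuming it is exposed as a plain `height -> block hash` mapping (that shape and the `plan_sync` name are assumptions): compare it against what has already been synced locally, fetch only what is missing or changed, and drop what is no longer canonical.

```python
from typing import Dict, List, Tuple


def plan_sync(local: Dict[int, str], canonical: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Compare locally synced block hashes against the canonical map.

    Returns (heights_to_fetch, heights_to_drop):
    - fetch: heights that are missing locally or hold a non-canonical hash
    - drop:  heights whose local data belongs to a block that is not canonical
    """
    to_fetch = sorted(h for h, block_hash in canonical.items() if local.get(h) != block_hash)
    to_drop = sorted(h for h, block_hash in local.items() if canonical.get(h) != block_hash)
    return to_fetch, to_drop
```

Heights whose local hash already matches the canonical one are skipped entirely, which is where the time savings come from.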
This concept of only syncing canonical data inspired elements of this pull request by Matt Little (an amazing engineer) in the
BTW h/t to @aulneau who inspired the concept behind this idea a while back. Thank you!
Outside of these two microservices, I can see a set of tooling and software that would allow people to build out robust data indexers. The tooling could help you spin up a new API instance, expose helper functions like filtering data to events or transactions associated with a given trait or contract, or filtering data to events associated with a given function. The possibilities here are really endless.
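The helper functions mentioned above could be as small as the sketch below. The transaction shape (`contract_id`, `function_name`) and the `contract_traits` lookup are simplified assumptions for illustration, not the payload format the Stacks node or any existing API actually uses.

```python
from typing import Any, Dict, Iterable, List, Set

Tx = Dict[str, Any]


def by_contract(txs: Iterable[Tx], contract_id: str) -> List[Tx]:
    """Keep only transactions that touch the given contract."""
    return [tx for tx in txs if tx.get("contract_id") == contract_id]


def by_function(txs: Iterable[Tx], contract_id: str, function_name: str) -> List[Tx]:
    """Keep only calls to one function of one contract."""
    return [
        tx for tx in by_contract(txs, contract_id)
        if tx.get("function_name") == function_name
    ]


def by_trait(txs: Iterable[Tx], contract_traits: Dict[str, Set[str]], trait: str) -> List[Tx]:
    """Keep transactions whose contract implements `trait`.

    `contract_traits` maps a contract id to the set of traits it implements;
    building that map is assumed to happen elsewhere.
    """
    return [
        tx for tx in txs
        if trait in contract_traits.get(tx.get("contract_id"), set())
    ]
```

An indexer would chain helpers like these over the synced data, so that it only stores and transforms the slice it cares about.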
I think of this layer of the stack as the tools which would allow for someone to build a service very similar to The Graph, where you can write subgraphs which process data from these microservices and expose different “views” of that data. At the end of the day, the Hiro API is simply a custom “view” into this data, exposed as a REST API.
There are many products that can be built from a service like this. Here are a few off the top of our heads:
stacking.club is an aggregator for all stacking data on Stacks. If you want to know anything about stacking, you can likely find out about it there. To create aggregate views of data, we needed to be able to query and transform data in custom ways that the Hiro API would not be able to support.
With this new architecture, we’d be able to sync data from this service and transform it much more easily than if we had to rely on the generalized Hiro API service. The new indexer would be able to select only the data it needs: PoX reward data and stacking related transactions.
Another use case for this kind of architecture would be Discord bots or anything else that would use a webhook. You could build out a Cloudflare Worker that watches for new events filtered to a set of contracts (eg marketplace contracts or other NFT related contracts), and fires a webhook to a running Discord bot. You can almost think of these as lambdas for Stacks.
Finally, even Hiro’s API could make use of this architecture. If they had the desire to no longer run Stacks nodes alongside their REST API, they could make use of these microservices to populate the database that powers the REST API. This would allow anyone to run an instance of the Hiro API without needing to run a Stacks node.
Much of our time in the residency has been put towards research and development around our own pain points, and ways in which we could architect this new system. We have been working hard on building out an MVP implementation of this architecture, with hopes of having some open source code to share in the next couple of months.
We are planning on releasing both microservices fully open source, so anyone can run them if they choose, and additionally, we’re hoping to build out a simple service that would sync from these microservices: something like a fungible token metadata indexer, or a webhook tool. Have something you want to see us demo? Please reach out.
We see many potential applications that could be built around data-indexing-as-a-service solutions. We could see a Prisma wrapper for Stacks data, where you’d be able to make use of a highly typed ORM, but for scaffolding out Stacks-based APIs and subgraphs.
Because we have static analysis, we can build out tooling to help developers find the data that is important to their specific application’s needs: finding data related to a given trait (eg all SIP009 tokens, or all stacking-related activity).
Outside of the standard data we’ve come to expect from things like the Hiro API, something we’re very interested in is building an indexing service for all Clarity state changes. There was a PR that was opened in the stacks-blockchain repository that would enable this kind of functionality:
At a high level, if the Stacks node software were to emit events for all Clarity state changes, this would open the door for completely new kinds of indexing services built on Stacks that other chains cannot replicate. You could imagine a service that would give historical data for any Clarity value you wanted.
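To give a feel for what such a service could look like, here is a purely hypothetical sketch. The node does not emit these events today, so the event fields (contract id, key, block height, value) and the `ClarityValueHistory` class are invented for illustration. It records each state change in height order and answers “what was this value at block N?” with a binary search.

```python
import bisect
from collections import defaultdict
from typing import Any, DefaultDict, List, Optional, Tuple

Key = Tuple[str, str]  # (contract id, variable or map key)


class ClarityValueHistory:
    """Per-key history of values, indexed by the block height of each change."""

    def __init__(self) -> None:
        self._heights: DefaultDict[Key, List[int]] = defaultdict(list)
        self._values: DefaultDict[Key, List[Any]] = defaultdict(list)

    def record(self, contract_id: str, key: str, height: int, value: Any) -> None:
        # Assumes changes arrive in non-decreasing block-height order.
        k = (contract_id, key)
        self._heights[k].append(height)
        self._values[k].append(value)

    def value_at(self, contract_id: str, key: str, height: int) -> Optional[Any]:
        """Return the latest value set at or before `height`, or None if unset."""
        k = (contract_id, key)
        heights = self._heights.get(k, [])
        index = bisect.bisect_right(heights, height)
        return self._values[k][index - 1] if index else None
```

A production version would also need to account for reorgs (dropping changes from orphaned blocks), which this sketch leaves out.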
Additionally, we are working hard on a secret public goods product that will make use of some of this new architecture. More on that soon!
Are you interested in building out a platform that uses tooling like this? We’re looking for someone who wants to take this architecture and run with it. We believe it would be an incredibly valuable initiative, and we’d love to see it have a life of its own. Reach out to us if this is you!
Thanks for taking the time to read our deep dive into our residency work. Excited to share more with you as we progress.