Path to Never-Ending Indexing

TLDR

DexGuru has been building a blockchain analytics product since our first MVP in 2021, with a particular focus on optimizing the ETL pipelines for blockchain data so they keep up with growing transaction throughput and scale. We started with a simple script polling data from an Ethereum node, then moved to a more sophisticated Messaging/Workers model for processing decentralized exchange (DEX) events. Now we are switching to https://github.com/blockchain-etl/ethereum-etl as our open-source, community-driven indexer, with additional modules we’ve developed that let us support a wide variety of data sources.

Intro

What is an analytics product on blockchain? It’s an indexer from top to bottom: you are “moving the data” and “changing the schema” along the way until the data is massaged enough to be presented to users. I have been competing in this special kind of computer science olympics for 6 years already, and I can say that the last 3 years, which I have spent on blockchain, have been the most dynamic and the roughest. Sometimes it feels like we are in a never-ending marathon of squeezing more transactions into blocks, making them cheaper, and making chains scale for streaming applications, all of which demands never-ending performance optimization across the whole ETL pipeline, from nodes to OLAP storage to the APIs that serve the frontend.

At DexGuru we started down this path with our first MVP in 2021, where a script simply polled an Ethereum node in a never-ending loop, processing new blocks as they arrived. As the pipeline for processing DEX events grew more comprehensive and we added chains like BNB, we replaced the never-ending loop with a Messaging/Workers model, which scales horizontally up to the limits of the nodes’ and storage’s indexing capacity. In late 2021, I came across https://github.com/blockchain-etl/ethereum-etl for the first time. By then we already had an indexer that could scale, architected as a set of message-based workers combined into an ETL pipeline that sent documents to multiple storages (Elasticsearch, Clickhouse). My evaluation was that https://github.com/blockchain-etl/ethereum-etl looked well architected, and there was even an Airflow integration for it, but we already had a message-orchestrated ETL with all the outputs we needed for the trading terminal. So we kept investing in our proprietary indexer.
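
To make the two stages concrete, here is a minimal sketch of both ideas. It is not our production code: `poll_new_blocks`, `run_workers`, and the callables passed into them are hypothetical names, and an in-process `queue.Queue` with threads stands in for a real message broker and worker fleet.

```python
import queue
import threading
import time
from typing import Callable, List, Optional, Tuple


def poll_new_blocks(
    get_latest_block: Callable[[], int],
    handle_block: Callable[[int], None],
    start_block: int,
    poll_interval: float = 1.0,
    max_polls: Optional[int] = None,
) -> int:
    """Stage 1: poll the node head and handle every block not seen yet.

    Returns the next block number that has not been handled.
    max_polls exists only so the sketch can stop; the real loop never ends.
    """
    next_block = start_block
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = get_latest_block()
        while next_block <= latest:
            handle_block(next_block)
            next_block += 1
        polls += 1
        time.sleep(poll_interval)
    return next_block


def run_workers(
    tasks: "queue.Queue",
    process_block: Callable[[int], None],
    num_workers: int = 4,
) -> List[Tuple[int, Exception]]:
    """Stage 2: drain a queue of block numbers with a pool of workers.

    Returns (block_number, exception) pairs for blocks that failed.
    """
    errors: List[Tuple[int, Exception]] = []

    def worker() -> None:
        while True:
            block_number = tasks.get()
            if block_number is None:  # sentinel: no more work for this worker
                tasks.task_done()
                return
            try:
                process_block(block_number)
            except Exception as exc:  # keep the worker alive and record the failure
                errors.append((block_number, exc))
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after the real work
    tasks.join()
    for thread in threads:
        thread.join()
    return errors
```

In the first stage, throughput is capped by a single loop. In the second, the poller only has to enqueue block numbers, and adding workers raises throughput until the node or the storage becomes the bottleneck.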

One year ago, as we needed to shift our services lineup towards B2B clients, we had to scale our indexer horizontally to cover more types of entities and more chains. We realized that there were solutions out there that followed the same modular architecture as https://github.com/blockchain-etl/ethereum-etl while implementing the features we needed. So we could save the time we were spending on developing our own proprietary indexer if we switched to https://github.com/blockchain-etl/ethereum-etl, or at least made our pipelines compatible with it.

So the decision was made: switch our indexing onto a fork of https://github.com/blockchain-etl/ethereum-etl!

But as a trading terminal, we have very specific tastes in data and an equally specific set of features we demand, so over the year we’ve built the following on top of https://github.com/blockchain-etl/ethereum-etl:

  • near real-time indexation (1 block diff)

  • ERC20 Transfers enriched with balances and USD Prices

  • ERC721/1155 Support for transfers and Balances

  • Data Consistency checker and fixer

  • DEX Inventory Entities support

  • DEX Trades Entities support

  • USD Prices, OHLC

  • DEX Liquidity Events Support

  • Balances Entities Support

  • Account PnL Entities Support

  • Clickhouse Migrations Support

  • Multi-Chain Support Improvements (we index 15 chains)

  • Chain Re-Orgs Support

  • RabbitMQ messaging across workers (https://github.com/blockchain-etl/ethereum-etl instances for different entities)

  • GraphQL API exposing https://github.com/blockchain-etl/ethereum-etl data to support the GuruBlock explorer
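
To give a flavour of one item on this list, chain re-org support, here is a minimal sketch of the usual detection technique: compare the parent hash of an incoming block with the hash already stored for the previous block, then walk back to the fork point. It illustrates the general idea only, not our implementation and not an ethereum-etl API; `BlockHeader`, `find_reorg_start`, `indexed_hashes`, and `fetch_header` are hypothetical names.

```python
from typing import Callable, Dict, NamedTuple, Optional


class BlockHeader(NamedTuple):
    number: int
    hash: str
    parent_hash: str


def find_reorg_start(
    indexed_hashes: Dict[int, str],
    fetch_header: Callable[[int], BlockHeader],
    new_header: BlockHeader,
) -> Optional[int]:
    """Return the lowest block number that must be re-indexed, or None.

    indexed_hashes maps block number -> hash of the block already stored.
    fetch_header returns the node's current canonical header for a number.
    """
    parent_number = new_header.number - 1
    stored_parent = indexed_hashes.get(parent_number)
    if stored_parent is None or stored_parent == new_header.parent_hash:
        # Nothing stored to contradict, or the new block extends our chain.
        return None

    # Walk back until the canonical hash matches what we stored.
    number = parent_number
    while (
        number in indexed_hashes
        and fetch_header(number).hash != indexed_hashes[number]
    ):
        number -= 1
    return number + 1
```

Everything from the returned block number upwards would then be deleted or overwritten and indexed again from the canonical chain. With near real-time indexing at a 1 block difference from the head, a check like this has to run on every new block.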

And now, out of the four products in our lineup, three run entirely off the new “seasoned” https://github.com/blockchain-etl/ethereum-etl indexer. Our journey to this point has been challenging yet immensely rewarding. Given the depth and significance of the problems we've overcome, I intend to dedicate the next month to a detailed exploration of each issue we faced and resolved, one topic at a time. This retrospective will not only chronicle our journey but also serve as a resource for understanding how to evaluate an indexer, and indexing problems, in the future.
