Basin Vaults Tracker: web3.py & GitHub Actions to Parse Contract Events for New Vaults

What is it?

Textile has built an experimental verifiable data layer called Basin, which lets anyone create “vaults” that act as containers for arbitrary data. A vault is created by an EVM-compatible wallet (on Filecoin FVM), granting that account write ownership of the vault. As data is pushed to the vault, it’s registered onchain as an append-only log of all mutations associated with the vault.

We built a simple demonstration of how you can:

  • Parse all vault creation events from the Basin storage smart contract.

  • Fetch vault information from the Basin HTTP API.

  • Store the state for each run, plus a markdown file that summarizes the activity.

  • Run all of this as a daily GitHub Actions workflow.

Check out the source code and output data files, which let you inspect various vaults, view their event CIDs, and extract the raw data pushed to the vault as parquet files:

https://github.com/textileio/demo-vault-tracker

How does it work?

The program is built in Python. It uses web3.py to connect to Filecoin Calibration and parse vault creation events from the logs of the Basin storage contract (at the address 0xaB16d51Fa80EaeAF9668CE102a783237A045FC37). Each log emits information that includes the following (see the example after this list):

  • owner: The address of the vault creator.

  • vault: The keccak hash of the vault’s name, as bytes.

  • block number: The block in which the event/vault was created.
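
For reference, a single decoded PubCreated log from web3.py looks roughly like the following (the values are illustrative, and web3.py actually returns AttributeDict objects rather than plain dicts):

# Roughly the shape of one decoded `PubCreated` log (illustrative values)
{
    "args": {
        "owner": "0x635285e3d83ba6723E7b23840b324b2a7A6532bC",
        "pub": b"...",  # keccak hash of the vault's name, as bytes
    },
    "event": "PubCreated",
    "blockNumber": 1289156,
    "address": "0xaB16d51Fa80EaeAF9668CE102a783237A045FC37",
}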

To retrieve the logs, a block range is specified to get all of the logs during that period, and then a few processing steps happen:

  • Parse the logs for event and block information (described above).

  • Fetch all vaults for the owner addresses seen during the run via the Basin HTTP API.

  • Store the run and vault information in a state.json file.

  • Write summary results to a data markdown file.

These steps run in a GitHub Actions workflow every day. The workflow runs the logic and then commits the changes automatically so that anyone can inspect all vaults and related information about the underlying data on the network.

How did we make it?

Overview

To kickstart the project, cookiecutter and sourcery-ai’s template helped set everything up with pipenv, linting, formatting, type hints with mypy, and pre-commit/push hooks. Here’s the gist of the project’s structure:

.
├── Data.md
├── abi.json
├── state.json
├── .github
│   └── workflows
│       └── daily.yaml
└── vaults_tracker
    ├── __init__.py
    ├── __main__.py
    ├── fetch.py
    └── write.py
  • Data.md: Stores all vaults ever created, with links to the Basin HTTP API that point to the actual data in each vault.

  • abi.json: The Basin storage contract ABI, used with web3.py to connect to the contract.

  • state.json: A simple datastore for each run and cumulative vault information per owner.

  • .github/workflows/daily.yaml: The GitHub actions workflow file that runs the script and commits the data once per day.

  • __main__.py: The entry point for the script.

  • fetch.py: Connects to the Basin storage contract, parses events, and makes calls to the Basin HTTP API.

  • write.py: Writes run information to both the state.json file and Data.md file.

If you’d like to better understand the actual source code and setup steps, clone the repo and read over the README:

git clone https://github.com/textileio/demo-vault-tracker

Fetching data with web3.py

The first step is to instantiate the Web3 class with an RPC URL (Filecoin Calibration), the contract’s ABI, and the contract’s deployed address:

from json import loads
from pathlib import Path

from web3 import Web3

url = "https://rpc.ankr.com/filecoin_testnet"
w3 = Web3(Web3.HTTPProvider(url))

def get_contract_create_events(start_block, end_block):
    # The ABI file is in the root directory
    abi_file = Path(__file__).parent.parent / "abi.json"
    with open(abi_file, "r") as basin_abi:
        abi = loads(basin_abi.read())
		
    # Creation events occur when `PubCreated` is emitted
    new_vault_event = "PubCreated"
    basin_address = Web3.to_checksum_address(
        "0xaB16d51Fa80EaeAF9668CE102a783237A045FC37" # On Filecoin Calibration
    )
		
    # Create a Basin contract connection
    contract = w3.eth.contract(address=basin_address, abi=abi)

In our case, we’re only interested in parsing log data for PubCreated. This is what signals that a new vault has been created, and it includes the owner and vault information. There is a bit of a challenge when using the get_logs method (shown below): each Filecoin Calibration RPC provider implements its own limits for how far in the past you can query, and the block range for retrieving events also has a limit.

Thus, if you want to query a range greater than the block range limit, you have to chunk it up into multiple get_logs queries. The following shows how the result from chunking is used. An events list holds all events that are seen between the start_block and end_block parameters:

def get_contract_create_events(start_block, end_block):
    # ... (continued from above)
    chunks = chunk_block_range(start_block, end_block)
    events = []
    for chunk in chunks:
        new_events = contract.events["PubCreated"].get_logs(
            fromBlock=chunk["start_block"], toBlock=chunk["end_block"]
        )
        if new_events:
            events.append(new_events)
    return events

The chunking process looks at the start_block and end_block, and if the range between them exceeds the 2880-block limit imposed by the RPC provider, it splits the range into multiple batches. The logic below isn’t the most efficient way to accomplish this, but it gets the job done!

def chunk_block_range(start_block, end_block):
    block_range = end_block - start_block
    if block_range > 2880:
        start_chunk = start_block
        end_final = end_block
        chunks = []
        while block_range > 0:
            end_chunk = start_chunk + 2880  # Block range max is 2880
            if end_chunk > end_final:
                end_chunk = end_final
            chunks.append({
              "start_block": start_chunk,
              "end_block": end_chunk
            })
            start_chunk = end_chunk + 1
            block_range = end_final - start_chunk
        return chunks
    # If the range fits within the limit, return it as a single chunk
    return [{"start_block": start_block, "end_block": end_block}]
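
As a quick sanity check, here is how the chunking behaves for a range that exceeds the limit (the block numbers below are made up for illustration):

# Illustrative example: a 7,000-block range is split into three chunks
chunk_block_range(0, 7000)
# [
#     {"start_block": 0, "end_block": 2880},
#     {"start_block": 2881, "end_block": 5761},
#     {"start_block": 5762, "end_block": 7000},
# ]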

Once all of the events are retrieved, we can then parse the data to get the vault owner’s address, a hash of the vault’s name, and the block number it occurred at.

def get_data_from_events(contract_events):
    data = []
    for events in contract_events:
        for event in events:
            args = event["args"]
            owner = args["owner"]
            vault = args["pub"]
            block_num = event["blockNumber"]
            data.append(
                {
                  "owner": owner,
                  "vault_hash": vault.hex(), 
                  "block_num": block_num
                }
            )

    return data

Calling the Basin HTTP API

Now that we have the event data, we know each owner’s address from the event logs. It’s possible that the same owner created more than one vault, so we’ll want to deduplicate these values. In our __main__.py file, we’ll set everything up and call the methods above:

# Note: the start/end block would be defined by application logic

# Get the vault creators from the Basin contract's `PubCreated` event
create_events = get_contract_create_events(start_block, end_block)

# Get data from the logs, including vault creator, vault hash, & block
data = get_data_from_events(create_events)

# Remove duplicate vault creators/owners
owners = {event["owner"] for event in data}

# Get all vaults for each vault creator via Basin API
vaults = []
for owner in owners:
    vault = get_vaults(owner)
    vaults.append({owner: vault})

This gives us a way to fetch all vaults that have been created by the unique set of owners with the get_vaults method.

from requests import get

def get_vaults(address):
    url = "https://basin.tableland.xyz/vaults"
    params = {"account": address}
    response = get(url, params=params)

    if response.status_code == 200:
        vaults = response.json()
        return vaults
    else:
        raise Exception(
            f"Failed to fetch vault data: {response.status_code}",
        )
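
Based on how the response is used later on (and in the state.json example below), the API effectively returns the list of vault names owned by the account. A quick, illustrative usage (the second vault name is made up):

# Fetch all vaults created by a given account
vaults = get_vaults("0x635285e3d83ba6723E7b23840b324b2a7A6532bC")
# e.g., ["wxm_t1.data", "wxm_t2.data"]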

Storing state & writing files

The last step is to take the information from above and store it in a state.json and Data.md file. We’ll do the following in our __main__.py file:

  • Create a snapshot dictionary with the run info (starting block, ending block, and the parsed/formatted events seen during that period).

  • Take the snapshot and index the run by the end_block, which serves as a unique identifier for each run since runs are spaced out over long periods.

  • Write the new state, read the new/full state, and then write to a summary markdown file.

    • Note: The get_saved_state logic just reads the state.json file.

This is what it looks like:

# Store the current snapshot/run
snapshot = {
    "start_block": start_block,
    "end_block": end_block,
    "events_data": data,
}
# Index the run by the `end_block` number
new_run = {end_block: snapshot}

# Write the new run and cumulative vaults files
write_to_state(new_run, vaults)
updated_state = get_saved_state()
write_to_markdown(updated_state, end_block)
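
For context, here is a minimal sketch of what the write.py helpers might look like. This is an assumption about their general shape, not a copy of the actual implementation (the real code keeps cumulative vault information per owner and renders a richer Data.md that links each vault to the Basin HTTP API):

import json
from pathlib import Path

# Assumed file locations, mirroring how abi.json is resolved in fetch.py
STATE_FILE = Path(__file__).parent.parent / "state.json"
DATA_FILE = Path(__file__).parent.parent / "Data.md"


def get_saved_state():
    # Read the current state.json file (runs + cumulative vaults)
    if not STATE_FILE.exists():
        return {"vaults": [], "runs": {}}
    with open(STATE_FILE, "r") as f:
        return json.load(f)


def write_to_state(new_run, vaults):
    # Merge the new run into the existing state and save it
    # (simplified: the real code also merges vault info per owner)
    state = get_saved_state()
    state["runs"].update(new_run)
    state["vaults"] = vaults
    with open(STATE_FILE, "w") as f:
        json.dump(state, f, indent=4)


def write_to_markdown(state, end_block):
    # Write a simple summary table for the latest run to Data.md
    # (JSON object keys are strings, so the run is looked up by str(end_block))
    events = state["runs"][str(end_block)]["events_data"]
    lines = [
        "# Vaults",
        "",
        "| Owner | Vault hash | Block |",
        "| --- | --- | --- |",
    ]
    for event in events:
        lines.append(
            f"| {event['owner']} | {event['vault_hash']} | {event['block_num']} |"
        )
    with open(DATA_FILE, "w") as f:
        f.write("\n".join(lines) + "\n")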

You can check out the source code for the details behind the last couple of steps—there are plenty of inline comments and docs to explain what’s going on. The resulting state file will resemble the following:

{
    "vaults": [
        {
            "0x635285e3d83ba6723E7b23840b324b2a7A6532bC": [
                "wxm_t1.data",
            ...
            ]
        },
        {
            "0x00FEEc1fC91074f5F38a8FC5129dbc4FD204eca6": [
                "bruno2.test",
            ...
            ]
        },
        {
            "0x0c9CE72E9c30a0ebC61f976e8254a4F58D276248": [
                "avichalp.test_01_24_24",
            ...
            ]
        }
    ],
    "runs": {
        "1296221": {
            "start_block": 1076346,
            "end_block": 1296221,
            "events_data": [
                {
                    "owner": "0x635285e3d83ba6723E7b23840b324b2a7A6532bC",
                    "vault_hash": "a49518b3dfc33a3c7091c0821c2b9979b3bfd6f8d3f7f354a1747eedfb19d0c4",
                    "block_num": 1289156
                },
                ...
            ]
        }
    }
}

How can someone use it?

There are a couple of ways anyone can work with this information. You could:

  • Inspect the Events column of the Data.md file (here) and use the Basin CLI or API to retrieve the vault mutation events/CIDs—these can be extracted as parquet files to get the raw data pushed to the vault (see the sketch after this list).

  • Write your own code that runs through similar logic and lets you build an automated index of onchain events with GitHub Actions.
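
If you want to script the second option, here is a rough sketch of fetching a vault’s events over HTTP. The /vaults/{vault}/events route is an assumption (check the Basin HTTP API docs for the exact endpoint), and the vault name is just an example pulled from Data.md:

from requests import get

BASIN_API = "https://basin.tableland.xyz"


def get_vault_events(vault_name):
    # Assumed endpoint: list the event CIDs for a given vault
    response = get(f"{BASIN_API}/vaults/{vault_name}/events")
    response.raise_for_status()
    return response.json()


# Each event should include a CID that can be pulled down (e.g., with the
# Basin CLI or an IPFS gateway) and opened as a parquet file
events = get_vault_events("wxm_t1.data")
print(events)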

Learn more

If you found this interesting, dive into our Discord and let us know! We also have a weekly newsletter called Weeknotes, which features the latest updates about the protocol. You can check it out here.
