Custom Event Tables with BigQuery

I previously wrote about how to use BigQuery to analyse Ethereum datasets. This is quick and easy for datasets that already have event tables, thanks to the hard work of the Blockchain BigQuery team, but what if you want to analyse events that aren’t already broken out into their own tables? There is a global event table, bigquery-public-data.crypto_ethereum.logs, but event data is stored there in its raw, ABI-encoded form, which is not easy to query.
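
To see what ‘raw’ means here, it’s worth peeking at a few rows directly. The logs table is partitioned on block_timestamp, so filtering on a single date keeps the scan (and the bill) small; the date below is just an arbitrary example:

SELECT
  address,
  topics,  -- array of 32-byte hex strings; topics[0] is the event signature hash
  data     -- the ABI-encoded non-indexed arguments, as a hex string
FROM
  `bigquery-public-data.crypto_ethereum.logs`
WHERE
  DATE(block_timestamp) = "2023-01-01"
LIMIT 5;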

Fortunately, thanks to BigQuery’s support for user-defined functions, it’s possible to generate your own event tables that are functionally identical to the ‘official’ ones. Here’s how.

Extracting relevant logs

The logs table is huge - over 2 billion rows and 1TB in size - and if you simply query it directly you’re going to end up paying a lot in BigQuery data charges, particularly if you’re extracting data on multiple events from a single contract, or testing things out. A sensible first step is to extract the event data you care about to a smaller working table. Run the following query, substituting the address of the contract you want to work with and a table name that works for you:

CREATE OR REPLACE TABLE `ens-manager.token.token_events` AS
SELECT
  *
FROM
  `bigquery-public-data.crypto_ethereum.logs`
WHERE
  address = "0xc18360217d8f7ab5e7c516566761ea12ce7f9d72";
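
If you want a quick sanity check that the extraction picked up what you expected, a count over the new (much smaller) table is cheap:

SELECT
  COUNT(*) AS num_logs
FROM
  `ens-manager.token.token_events`;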

Helper Functions

Next, we’re going to need some user-defined functions to do the crucial event-decoding work:

CREATE OR REPLACE FUNCTION `ens-manager.token.get_topic_hash`(abispec STRING) RETURNS STRING LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
return abi.Interface.getEventTopic(abi.EventFragment.fromString(abispec));
""";

CREATE OR REPLACE FUNCTION `ens-manager.token.decode_log`(abispec STRING, data STRING, topics ARRAY<STRING>) RETURNS ARRAY<STRING> LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
var iface = new abi.Interface(["event " + abispec]);
return iface.decodeEventLog(iface.fragments[0], data, topics);
""";

Both of these functions rely on a version of ethers.js that has been built to run as a self-contained JavaScript dependency. The resource is public, so feel free to reference it directly - there’s no need to host your own copy.

get_topic_hash takes an ABI specification such as Transfer(address indexed from, address indexed to, uint256 value) and calculates the 256-bit topic hash for it, which we can then use to filter out the rows we care about from our logs table.
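
As a quick check, you can call it directly; for the standard ERC-20 Transfer event, this should return the familiar topic hash 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef:

SELECT
  `ens-manager.token.get_topic_hash`(
    "Transfer(address indexed from, address indexed to, uint256 value)") AS topic_hash;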

decode_log takes an ABI specification in the same format, and uses it to decode an event log from its data and topics fields, returning an array of decoded fields.
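
You can try it out interactively before building anything persistent - this sketch assumes the token_events working table created earlier:

SELECT
  `ens-manager.token.decode_log`(
    "Transfer(address indexed from, address indexed to, uint256 value)",
    data,
    topics) AS decoded
FROM
  `ens-manager.token.token_events`
WHERE
  topics[SAFE_OFFSET(0)] = `ens-manager.token.get_topic_hash`(
    "Transfer(address indexed from, address indexed to, uint256 value)")
LIMIT 5;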

Finally, we’ll write a table function that uses our other two functions to dynamically generate a table with the events we care about:

CREATE OR REPLACE TABLE FUNCTION `ens-manager.token.decoded_logs`(event STRING) AS (
  SELECT
    transaction_hash,
    block_number,
    block_timestamp,
    block_hash,
    log_index,
    `ens-manager.token.decode_log`(event, data, topics) AS event
  FROM
    `ens-manager.token.token_events`
  WHERE
    topics[SAFE_OFFSET(0)] = `ens-manager.token.get_topic_hash`(event)
);

These three functions are persistent and independent of the logs you’re decoding - so you can define them once and reuse them everywhere.

Building a table of decoded events

Now that we have the prerequisites in place, we can finally build the table of events we want: we simply query decoded_logs, renaming and casting the fields we care about. Here’s an example of decoding a Transfer event:

CREATE OR REPLACE TABLE `ens-manager.token.event_Transfer` AS
SELECT
  transaction_hash,
  block_number,
  block_timestamp,
  block_hash,
  log_index,
  event[OFFSET(0)] AS `from`,
  event[OFFSET(1)] AS `to`,
  CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
  `ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)");

Note how decoded_logs outputs the event fields as an array, in the order they’re specified in the ABI. They’re all output as strings, so we have to cast each element to its correct data type and alias it as appropriate.
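
With the typed table in place, ordinary analytical queries become straightforward. As an illustrative sketch, here’s a daily transfer volume report over the new table:

SELECT
  DATE(block_timestamp) AS day,
  COUNT(*) AS num_transfers,
  SUM(value) AS volume
FROM
  `ens-manager.token.event_Transfer`
GROUP BY
  day
ORDER BY
  day;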

Keeping it up to date

Assuming you want to keep these tables up to date, you can define the functions once, then create a script that executes the initial extraction query followed by the decoded event table queries, and use BigQuery’s support for scheduled queries to run it on a regular basis - daily, for example.
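
A minimal sketch of such a script, assuming the functions defined above already exist - paste it into a scheduled query and BigQuery will execute the statements in order:

-- Step 1: refresh the working table of raw logs for the contract.
CREATE OR REPLACE TABLE `ens-manager.token.token_events` AS
SELECT
  *
FROM
  `bigquery-public-data.crypto_ethereum.logs`
WHERE
  address = "0xc18360217d8f7ab5e7c516566761ea12ce7f9d72";

-- Step 2: rebuild each decoded event table from the refreshed logs.
CREATE OR REPLACE TABLE `ens-manager.token.event_Transfer` AS
SELECT
  transaction_hash,
  block_number,
  block_timestamp,
  block_hash,
  log_index,
  event[OFFSET(0)] AS `from`,
  event[OFFSET(1)] AS `to`,
  CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
  `ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)");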
