I previously wrote about how to use BigQuery to analyse Ethereum datasets. This is quick and easy for datasets that already have event tables thanks to the hard work of the Blockchain BigQuery team, but what if you want to analyse events that aren’t already broken out into their own tables? There is a global event table, bigquery-public-data.crypto_ethereum.logs, but the event data there is in its raw, ABI-encoded form - not easy to query directly.
Fortunately, thanks to BigQuery’s support for user-defined functions, it’s possible to generate your own event tables that are functionally identical to the ‘official’ ones. Here’s how.
The logs table is huge - over 2 billion rows and 1TB in size - and if you simply query it directly you’re going to end up paying a lot in BigQuery data charges, particularly if you’re extracting data on multiple events from a single contract, or testing things out. A sensible first step is to extract the event data you care about to a smaller working table. Run the following query, substituting the address of the contract you want to work with and a table name that works for you:
CREATE OR REPLACE TABLE `ens-manager.token.token_events` AS
SELECT
*
FROM
`bigquery-public-data.crypto_ethereum.logs`
WHERE
address = "0xc18360217d8f7ab5e7c516566761ea12ce7f9d72";
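At this point the working table still holds raw logs: the topics and data columns are just hex-encoded strings. You can see this for yourself with a quick peek (substituting your own table name):
SELECT
  log_index,
  topics,
  data
FROM
  `ens-manager.token.token_events`
LIMIT 5;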
Next, we’re going to need some user-defined functions to do the crucial event-decoding work:
CREATE OR REPLACE FUNCTION `ens-manager.token.get_topic_hash`(abispec STRING) RETURNS STRING LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
return abi.Interface.getEventTopic(abi.EventFragment.fromString(abispec));
""";
CREATE OR REPLACE FUNCTION `ens-manager.token.decode_log`(abispec STRING, data STRING, topics ARRAY<STRING>) RETURNS ARRAY<STRING> LANGUAGE js
OPTIONS (library=["https://storage.googleapis.com/ens-manager.appspot.com/ethers-abi.js"]) AS R"""
var iface = new abi.Interface(["event " + abispec]);
return iface.decodeEventLog(iface.fragments[0], data, topics);
""";
Both of these functions rely on a version of ethers.js that has been built to run as a self-contained JavaScript dependency. The resource is public, so feel free to reference it from your own functions - there’s no need to host a copy yourself.
get_topic_hash takes an ABI specification such as Transfer(address indexed from, address indexed to, uint256 value) and calculates the 256-bit topic hash for it, which we can then use to filter our logs table down to just the rows we care about.
decode_log takes an ABI specification in the same format, and uses it to decode an event log from its data and topics fields, returning an array of decoded fields.
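As a quick sanity check, you can call get_topic_hash directly (from whichever dataset you created it in); for the standard ERC-20 Transfer event it should return the familiar topic hash 0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef:
SELECT
  `ens-manager.token.get_topic_hash`("Transfer(address indexed from, address indexed to, uint256 value)") AS topic_hash;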
Finally, we’ll write a table function that uses our other two functions to dynamically generate a table with the events we care about:
CREATE OR REPLACE TABLE FUNCTION `ens-manager.token.decoded_logs`(event STRING) AS (
SELECT
transaction_hash,
block_number,
block_timestamp,
block_hash,
log_index,
`ens-manager.token.decode_log`(event, data, topics) AS event,
FROM
`ens-manager.token.token_events`
WHERE
topics[SAFE_OFFSET(0)] = `ens-manager.token.get_topic_hash`(event)
);
All three of these are persistent and independent of the event you’re decoding, so you only need to define them once: the two JavaScript functions can be reused anywhere, and the table function will decode any event from your working table.
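Because decoded_logs is an ordinary table function, you can also query it ad hoc without materialising anything. For example, here’s a rough sketch of counting Transfer events per day:
SELECT
  DATE(block_timestamp) AS day,
  COUNT(*) AS transfers
FROM
  `ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)")
GROUP BY
  day
ORDER BY
  day;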
Now that we have the prerequisites in place, we can finally build the table of events we want. We can simply query decoded_logs, renaming and casting the fields we care about. Here’s an example of decoding a Transfer event:
CREATE OR REPLACE TABLE `ens-manager.token.event_Transfer` AS
SELECT
transaction_hash,
block_number,
block_timestamp,
block_hash,
log_index,
event[OFFSET(0)] AS `from`,
event[OFFSET(1)] AS `to`,
CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
`ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)");
Note how decoded_logs outputs the event fields as an array in the order they’re specified in the ABI; they’re all output as strings, so we have to cast each element to its correct data type, and alias it as appropriate.
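The same pattern works for any other event the contract emits. As a sketch, here’s the standard ERC-20 Approval event decoded into a second table (assuming your contract emits it with this signature):
CREATE OR REPLACE TABLE `ens-manager.token.event_Approval` AS
SELECT
  transaction_hash,
  block_number,
  block_timestamp,
  block_hash,
  log_index,
  event[OFFSET(0)] AS owner,
  event[OFFSET(1)] AS spender,
  CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
  `ens-manager.token.decoded_logs`("Approval(address indexed owner, address indexed spender, uint256 value)");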
If you want to keep these tables up to date, you can simply define the functions once, then create a script that runs the extraction query followed by the decoded event table queries, and use BigQuery’s support for scheduled queries to execute it on a regular basis, such as daily.
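As a concrete sketch using the illustrative names from above, the scheduled script just chains those statements together; the function and table function definitions persist, so they don’t need to be part of it:
-- Refresh the working table with the latest logs for the contract...
CREATE OR REPLACE TABLE `ens-manager.token.token_events` AS
SELECT
  *
FROM
  `bigquery-public-data.crypto_ethereum.logs`
WHERE
  address = "0xc18360217d8f7ab5e7c516566761ea12ce7f9d72";

-- ...then rebuild each decoded event table from it.
CREATE OR REPLACE TABLE `ens-manager.token.event_Transfer` AS
SELECT
  transaction_hash,
  block_number,
  block_timestamp,
  block_hash,
  log_index,
  event[OFFSET(0)] AS `from`,
  event[OFFSET(1)] AS `to`,
  CAST(event[OFFSET(2)] AS bignumeric) / 1e18 AS value
FROM
  `ens-manager.token.decoded_logs`("Transfer(address indexed from, address indexed to, uint256 value)");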