on cointbot, my cointegration trader

tl;dr

today i go over a CLI tool and a set of trading bots that i’ve written to detect profitable cryptocurrency pairs to be shorted or longed on trading exchanges.

these statistical algorithmic strategies are based on cointegration, a technique that has been around for a long time in both traditional and decentralized finance.




🧘🏻‍♀️✨ cointegration strategy for pair trading

pair trading is a classic example of a strategy based on mathematical analysis.

put simply, when two or more non-stationary series can be combined to make a stationary series, they are said to be cointegrated.

in other words, this strategy allows you to find evidence of an underlying economic link for a pair of securities (say, A and B) within a timeframe. it also allows you to mathematically model this link, so that you can make trades on it.

💡 a series is said to be stationary when the parameters of the data-generating process do not change over time.


modeling a pair of securities with math

let’s take two random crypto assets, say, A and B futures. let’s model each of their returns by drawing their normal distributions (aka the bell curve).
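as a tiny sketch of that modeling step, here is one way to turn prices into returns and fit a bell curve to them with python's standard library. the prices are made up, and the helper name simple_returns is mine for illustration (not something taken from cointbot):

```python
from statistics import NormalDist


def simple_returns(prices: list[float]) -> list[float]:
    """Period-over-period returns: p[t] / p[t-1] - 1."""
    return [cur / prev - 1 for prev, cur in zip(prices, prices[1:])]


# made-up closing prices, not real market data
prices_a = [100.0, 110.0, 99.0, 108.9]
returns_a = simple_returns(prices_a)

# fit a normal distribution (the bell curve) to the observed returns
bell_a = NormalDist.from_samples(returns_a)
print(bell_a.mean, bell_a.stdev)
```

you would repeat the same thing for B and then compare the two fitted distributions.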

💡 crypto derivatives are financial contracts that derive their values from underlying assets. futures are financial contracts that bet on a cryptocurrency's future price, allowing exposure without purchasing.

if these two series are cointegrated, there exists some linear combination between them varying around a mean. in other words, their combination should be related to the same probability distribution.

cointegration of FLOWUSDT and 1INCHUSDT, generated by cointbot

the beauty of p-values

correlation and cointegration are similar but not the same. for example, correlated series could just diverge together without being cointegrated.

how do we infer cointegration? we do like the scientists do.

a p-value is the probability of obtaining results at least as extreme as the results of a hypothesis test, assuming that the null hypothesis is correct.

a p-value of 0.05 or lower is generally considered statistically significant.

cointegrated series can show very small p-values but still not be correlated.


the trick of pair trading

the coefficients that define stationary combinations of two series are called hedge ratios. in practical terms, the hedge ratio describes the suggested amount of B to buy or sell for every unit of A.

because both securities drift towards and apart from each other, sometimes the distance is high, and sometimes the distance is low.

the magick comes from maintaining a hedged position across A and B. if both go down or up together, you neither make nor lose money. profit comes from their spread reverting to the mean:

  • when the spread is unusually small (A is cheap relative to B), you long A and short B: you expect the spread to become larger.

  • when the spread is unusually large (A is expensive relative to B), you short A and long B: you expect the spread to become smaller.


spread and z-score

we apply a linear regression between these two series to estimate the linear combination coefficient, the hedge ratio (this is the first step of what is known as the engle-granger method). the spread is then simply defined by:

spread = first series - (hedge ratio * second series)
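here is a minimal sketch of that regression in plain python, assuming an ordinary least squares fit with an intercept. the function names (hedge_ratio, spread) and the toy prices are my own illustration, not necessarily how cointbot does it:

```python
from statistics import fmean


def hedge_ratio(first: list[float], second: list[float]) -> float:
    """OLS slope of `first` regressed on `second` (with an intercept)."""
    mean_x, mean_y = fmean(second), fmean(first)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(second, first))
    var = sum((x - mean_x) ** 2 for x in second)
    return cov / var


def spread(first: list[float], second: list[float], ratio: float) -> list[float]:
    """spread = first series - (hedge ratio * second series)."""
    return [y - ratio * x for x, y in zip(second, first)]


# toy prices: A is roughly 2 * B plus a small wiggle
series_b = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
series_a = [20.1, 21.9, 24.1, 25.9, 28.1, 29.9]

ratio = hedge_ratio(series_a, series_b)
spread_ab = spread(series_a, series_b, ratio)
print(round(ratio, 3))
print([round(s, 2) for s in spread_ab])
```

since A was built as about twice B, the estimated hedge ratio lands close to 2, and the spread wiggles around a constant level.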

however, the spread does not give you an immediate signal for trading. it still needs to be normalized into a z-score, which is the number of standard deviations separating the current spread from its mean.
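a rough sketch of that normalization, assuming a rolling window (the window length is a free parameter i picked for the example, and the function name zscores is hypothetical):

```python
from statistics import fmean, pstdev
from typing import Optional


def zscores(spread: list[float], window: int) -> list[Optional[float]]:
    """Rolling z-score: (latest spread - window mean) / window std."""
    out: list[Optional[float]] = []
    for i in range(len(spread)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        chunk = spread[i + 1 - window : i + 1]
        mean, std = fmean(chunk), pstdev(chunk)
        # a flat window has no dispersion, so treat it as "at the mean"
        out.append((spread[i] - mean) / std if std > 0 else 0.0)
    return out


toy_spread = [1.0, 2.0, 3.0, 2.0, 1.0, 5.0]
z = zscores(toy_spread, window=3)
print(z)
```

the first window - 1 entries are None because there is not enough history to compute a mean and a standard deviation yet.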

traders can look at the momentum of the average z-score and take a contrarian approach to generate buy and sell signals. graphically, positive z-scores lie to the right of the mean, and negative z-scores lie to the left of the mean.

here is an example of a strategy:

  • whenever the z-score < -1, you long the spread.

  • whenever the z-score > 1, you short the spread.

  • exit positions when the z-score ~ 0.
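the three rules above can be sketched as a tiny state machine. i am reading "z-score ~ 0" as "the z-score has crossed back through zero", which is my assumption, and next_position is a made-up name:

```python
def next_position(z: float, position: int, entry: float = 1.0) -> int:
    """Target position in the spread: +1 long, -1 short, 0 flat."""
    if position == 0:
        if z < -entry:
            return 1   # spread is unusually low: long the spread
        if z > entry:
            return -1  # spread is unusually high: short the spread
        return 0
    # already in a trade: exit once z has reverted through zero
    if (position == 1 and z >= 0) or (position == -1 and z <= 0):
        return 0
    return position


toy_z = [-0.5, -1.2, -0.8, 0.1, 1.5, 0.4, -0.2]
position, path = 0, []
for value in toy_z:
    position = next_position(value, position)
    path.append(position)
print(path)
```

long the spread means long the first series and short the second (scaled by the hedge ratio); short the spread is the mirror image.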


“there are three types of lies: lies, damn lies, and statistics”

math is awesome, but…

obviously, any trading strategy comes with advantages and shortcomings (pretty much like any flavor of text editor, you know the drill).

here is a simple picture of cointegration:

.you are the master of your own life.

🧘🏾✨ the cointbot package

the cointbot package consists of a CLI and a set of libraries for cointegration pair trading, with support for different market types, parameters, and bot designs.

for example, Bot1 has the following strategy:

1️⃣ search for all possible crypto perpetual derivative contracts in a cex that can be longed or shorted

2️⃣ retrieve their price history for a given timeframe

3️⃣ calculate all the pairs that are cointegrated by looking at p-values smaller than a certain threshold

4️⃣ calculate their spread and their latest z-score signal

5️⃣ backtest to long when z-score < 0

6️⃣ if the asset is hot, confirm tokens to be longed and shorted, within the initial capital

7️⃣ with these close signals, average in limit orders or place market orders
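to make step 3️⃣ a bit more concrete, here is a sketch of the pair scan, assuming the cointegration test is passed in as a function that returns a p-value. hot_pairs, stub_pvalue, and the symbols are all hypothetical. in a real run, something like statsmodels' coint (whose second return value is a p-value) could play the role of pvalue_of, but i am not claiming that is what Bot1 calls:

```python
from itertools import combinations
from typing import Callable

Series = list[float]


def hot_pairs(
    prices: dict[str, Series],
    pvalue_of: Callable[[Series, Series], float],
    plimit: float = 0.05,
) -> list[tuple[str, str, float]]:
    """Every unordered symbol pair whose p-value is below `plimit`."""
    hits = []
    for sym_a, sym_b in combinations(sorted(prices), 2):
        p = pvalue_of(prices[sym_a], prices[sym_b])
        if p < plimit:
            hits.append((sym_a, sym_b, p))
    return sorted(hits, key=lambda hit: hit[2])  # most significant first


def stub_pvalue(a: Series, b: Series) -> float:
    """Stand-in for a real cointegration test, for the demo only."""
    return 0.01 if abs(a[0] - b[0]) < 0.5 else 0.9


toy_prices = {"AAA": [1.0, 2.0], "BBB": [1.0, 2.1], "CCC": [9.0, 3.0]}
pairs = hot_pairs(toy_prices, stub_pvalue)
print(pairs)
```

note that n symbols give n * (n - 1) / 2 pairs, so the scan grows quickly with the number of contracts.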


setting up cointbot

to test cointbot, you will need a testnet account from bybit. if you want to use any other cex, the code is free (or wait until i have time to implement them).

after cloning cointbot, add all the necessary system and trading settings to a .env file, and then install the python package:


✅ you are now all set to explore cointbot:

cointbot CLI

🧘🏿‍♀️✨ fetching a perpetual currency’s data

💡 a perpetual contract is a contract that can be held in perpetuity, i.e., indefinitely until the trader closes their position.

let’s start testing cointbot by running the simplest option, which calls bybit’s API to query the market data for all derivatives (symbols) for a given currency (e.g., USDT):

here is an example of the output:

.fetching all available derivative's data for USDT.

🧘🏼✨ fetching price history for a derivative currency

cute. now let’s get to business and start our cointegration analysis.

the second menu option queries the market price k-lines for all symbols above in a given TIMEFRAME and KLINE-LIMIT, not only printing them to STDOUT but also saving them as JSON to OUTPUTDIR/PRICE_HISTORY_FILE:

💡 in the context of trading, a k-line represents the fluctuation of asset prices in a given time frame. it shows the close price, open price, high price, and low price. if the close price > open price, the k-line has a positive line. otherwise, it is a negative line.
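a small sketch of that definition, using a hypothetical KLine dataclass (the field names are mine for illustration and are not meant to mirror bybit's response schema):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KLine:
    open: float
    high: float
    low: float
    close: float

    @property
    def is_positive(self) -> bool:
        """A positive line closes above its open; otherwise it is negative."""
        return self.close > self.open


up = KLine(open=1.00, high=1.20, low=0.95, close=1.10)
down = KLine(open=1.10, high=1.15, low=0.90, close=1.00)
print(up.is_positive, down.is_positive)
```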

here is an example of output:

.fetching price history for USDT.

💡 bybit employs a dual-price mechanism to prevent market manipulations (when the market price on a futures exchange deviates from the spot price, causing mass liquidation of traders' positions). the dual-price mechanism consists of mark price and last traded price. "mark price" refers to a global spot price index plus a decaying funding basis rate, and it's used as a trigger for liquidation and to measure unrealized profit and loss. "last traded price" is the current market price, anchored to the spot price using the funding mechanism.


🧘🏽‍♀️✨ calculating cointegration for the history data

with the price history data from the previous step, we can now calculate cointegration for each pair of symbols (for the desired PLIMIT, the chosen p-value threshold that defines a "hot" pair):

here is an example of output:

this step will also calculate p-values, hedge ratios, and zero crossings. the resulting pandas DataFrame is then saved at OUTPUTDIR/COINTEGRATION_FILE, sorted by zero_crossing.

💡 in statistics, a zero crossing is a point where the sign of a function changes. in the context of trading, it determines an entry point (using the price in relation to the moving average as a direction confirmation).
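one simple way to count zero crossings, assuming we count sign changes of a series that is already centered on its mean (for example, a z-score). this is my own sketch, and cointbot may count them differently:

```python
def zero_crossings(values: list[float]) -> int:
    """Count sign changes between consecutive non-zero values."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


toy_z = [0.5, -0.2, -0.4, 0.3, 0.0, 0.1, -0.6]
crossings = zero_crossings(toy_z)
print(crossings)
```

a pair whose spread crosses its mean often has reverted often in the past, which is why sorting by this number is a reasonable way to rank candidates.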


🧘🏼‍♀️✨ backtesting a cointegrated pair

did it work?

💡 in the context of crypto trading, backtesting is accomplished by reconstructing, with historical data, trades that would have occurred in the past using rules defined by a given strategy, gauging the effectiveness of the strategy.

select your favorite asset pair from the previous step, and let’s backtest their cointegration by testing the success of the hypothesis (and making some cool plots for their series’ spreads and z-score).
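to give a feel for what a backtest does, here is a toy version that replays the z-score rules over past data and adds up the profit and loss in spread units. it ignores fees, slippage, and funding, the numbers are hand-picked, and the backtest function is a sketch of the idea rather than cointbot's implementation:

```python
def backtest(spread: list[float], zscores: list[float], entry: float = 1.0) -> float:
    """Cumulative PnL, in spread units, of the z-score rules on past data."""
    position, pnl = 0, 0.0
    for t in range(len(spread) - 1):
        z = zscores[t]
        if position == 0:
            if z < -entry:
                position = 1   # long the spread
            elif z > entry:
                position = -1  # short the spread
        elif (position == 1 and z >= 0) or (position == -1 and z <= 0):
            position = 0       # spread reverted: close the trade
        # the position chosen at time t earns the next move in the spread
        pnl += position * (spread[t + 1] - spread[t])
    return pnl


toy_spread = [0.0, -2.0, -1.0, 0.5, 2.5, 1.0, 0.0]
toy_z = [0.0, -1.5, -0.8, 0.3, 1.6, 0.7, 0.0]
result = backtest(toy_spread, toy_z)
print(result)
```

a positive result only says the rules would have worked on this slice of history, not that they will keep working.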

example of output for BNBUSDT vs. ALGOUSDT:

by the way, this command also generates their cointegration plots and backtest data, and saves them at OUTPUTDIR/.

💡 lil tip: if you are starting an entirely new run, clean up the current setup with make clean_data.


🧘🏽‍♂️✨ looking at the top cointegrated pairs

once we have all data from the previous step, we can look at the top cointegrated securities for the given TIMEFRAME and NUMBER:

example of output:

note that this command automatically generates the backtesting data and plots (similar to the previous option).

✅ congrats, you now understand cointegration pair trading. it’s time to move to our trading bots.


🧘🏾‍♀️✨ testing orderbooks websockets

our bot will be connecting to bybit through both REST API and websocket endpoints. let’s start by testing the latter.

to open a websocket subscribed to a cointegration pair (either for spot, linear, or inverse markets), run:


topics for spot market

spot market topics are implemented by the trade_v1_stream() method, which pushes raw data for each trade (API docs here).

after a successful subscription message, the first data message (f: true) consists of the last 60 trades.

after (f: false), only new trades are pushed (at a frequency of 300ms, where the message received has a maximum delay of 400ms).

example of output:

websockets connection for spot

topics for inverse perpetual/futures market

inverse market topics are implemented with orderbook_25_stream(), which fetches the orderbook with a depth of 25 orders per side (API docs here).

after the subscription response, the first response will be the snapshot response, showing the entire orderbook.

the data is ordered by price (starting with the lowest buys). push frequency is 20ms.

example of output:

websockets connection for inverse

topics for USDT linear perpetual

finally, USDT linear market topics are implemented with orderbook_25_stream(), which fetches the orderbook with a depth of 25 orders per side (API docs here).

the first response is the snapshot response, showing the entire orderbook.

the data is ordered by price, starting with the lowest buys and ending with the highest sells. push frequency is 20ms.
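once a snapshot arrives, the first thing you usually want is the top of the book. a minimal sketch, assuming each orderbook entry is a dict with price (as a string), side ("Buy" or "Sell"), and size. that shape is an assumption for the example, so check the API docs for the real payload:

```python
def top_of_book(entries: list[dict]) -> tuple[float, float]:
    """Best bid (highest buy) and best ask (lowest sell) from a snapshot."""
    bids = [float(e["price"]) for e in entries if e["side"] == "Buy"]
    asks = [float(e["price"]) for e in entries if e["side"] == "Sell"]
    return max(bids), min(asks)


# hypothetical snapshot, ordered from the lowest buys to the highest sells
snapshot = [
    {"price": "99.5", "side": "Buy", "size": 3},
    {"price": "100.0", "side": "Buy", "size": 1},
    {"price": "100.5", "side": "Sell", "size": 2},
    {"price": "101.0", "side": "Sell", "size": 4},
]

best_bid, best_ask = top_of_book(snapshot)
mid_price = (best_bid + best_ask) / 2
print(best_bid, best_ask, mid_price)
```

using max and min instead of relying on position keeps the helper correct even if the ordering of the entries changes.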

example of output:

.websockets connection for linear.

🧎🏻‍♀️✨ deploying a cointegration trading bot

all right, we made it. let’s deploy those cuties.

several bots with different strategies are found inside src/bots/. in this article, we will go over the strategy and deployment of Bot1. feel free to explore the other bots there, and if you would like to keep up to date with the new ones i am continuously adding, just star the repo, dunno 🤷🏻‍♀️.

by the way, each bot has its own number and its own configuration settings in the .env file (e.g., BOT_COINS, BOT_MARKET_TYPE, BOT_ORDER_TYPE, BOT_STOP_LOSS, BOT_TRADEABLE_CAPITAL, and others). before the next step, you should check them out (and understand their effects).
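for illustration only, this is roughly how settings like these could be read from the environment in python. the variable names come from the list above, but the value formats (a comma-separated BOT_COINS, numeric stop loss and capital) and every default are my assumptions, and load_bot_settings is a hypothetical helper, so check the repo for the real format:

```python
import os
from typing import Mapping


def load_bot_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Parse the BOT_* settings into python types (defaults are placeholders)."""
    coins = [c.strip() for c in env.get("BOT_COINS", "").split(",") if c.strip()]
    return {
        "coins": coins,
        "market_type": env.get("BOT_MARKET_TYPE", "linear"),
        "order_type": env.get("BOT_ORDER_TYPE", "market"),
        "stop_loss": float(env.get("BOT_STOP_LOSS", "0.15")),
        "tradeable_capital": float(env.get("BOT_TRADEABLE_CAPITAL", "1000")),
    }


# pass a plain dict instead of os.environ to try it without a .env file
settings = load_bot_settings({"BOT_COINS": "BNBUSDT, ALGOUSDT", "BOT_STOP_LOSS": "0.2"})
print(settings)
```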


high-level strategy for Bot1

this is how Bot1 gets set up:

and this is how Bot1 executes, inside a while True loop:

you should check the code (the main class is called BbBotOne), and then spin it up:

for more details on what happens next, check out the cointbot repo 😉.

by the way, you can also have Bot1 running inside a docker container with:


▇ ♄
