today i go over a CLI tool and a set of trading bots that i’ve written to detect profitable cryptocurrency pairs to be shorted or longed on trading exchanges.
the statistical strategy behind them is based on cointegration, a concept that has been around for a long time in both traditional and decentralized finance.
pair trading is a classic example of a strategy based on mathematical analysis.
put simply, when two or more non-stationary series can be combined to make a stationary series, they are said to be cointegrated.
in other words, this strategy allows you to find evidence of an underlying economic link for a pair of securities (say, A and B) within a timeframe. it also allows you to mathematically model this link, so that you can make trades on it.
💡 a series is said to be stationary when the parameters of the data-generating process do not change over time.
let’s take two random crypto assets, say, A and B futures. let’s model each of their returns by drawing their normal distributions (aka the bell curve).
💡 crypto derivatives are financial contracts that derive their values from underlying assets. futures are financial contracts that bet on a cryptocurrency's future price, allowing exposure without purchasing.
if these two series are cointegrated, there exists some linear combination of them that varies around a mean. in other words, that combination should keep following the same probability distribution over time.
correlation and cointegration are similar but not the same. for example, correlated series could just diverge together without being cointegrated.
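to make the difference a bit more tangible, here is a toy python sketch (not part of cointbot). it builds two made-up random walks with different drifts, an assumption chosen only for illustration, and shows that they can be almost perfectly correlated while the gap between them keeps growing. it is not a cointegration test, just a reminder that high correlation says nothing about the distance between two series staying stable.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(n, drift, rng, sigma=0.5):
    """cumulative sum of (drift + gaussian noise): a non-stationary series."""
    level, path = 0.0, []
    for _ in range(n):
        level += drift + rng.gauss(0, sigma)
        path.append(level)
    return path


rng = random.Random(42)
a = random_walk(500, drift=1.0, rng=rng)  # made-up asset A
b = random_walk(500, drift=2.0, rng=rng)  # made-up asset B, drifting faster

corr = pearson(a, b)
gap = [y - x for x, y in zip(a, b)]
early_gap = sum(gap[:20]) / 20    # average distance at the start
late_gap = sum(gap[-20:]) / 20    # average distance at the end

print(f"correlation: {corr:.3f}")
print(f"early gap: {early_gap:.1f}, late gap: {late_gap:.1f}")
```

both series go up together, so the correlation is high, but there is no stable distance to revert to.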
how do we infer cointegration? we do like the scientists do.
a p-value is the probability of obtaining results at least as extreme as the results of a hypothesis test, assuming that the null hypothesis is correct.
a p-value of 0.05 or lower is generally considered statistically significant.
cointegrated series can show very small p-values but still not be correlated.
the coefficients that define stationary combinations of two series are called hedge ratios. in practical terms, the hedge ratio describes the suggested amount of B to buy or sell for every unit of A.
because both securities drift towards and apart from each other, sometimes the distance is high, and sometimes the distance is low.
the magick comes from maintaining a hedged position across A and B. if both go down or up, you neither make nor lose money. profit comes from the spread between them reverting to the mean:
when A and B are far apart (the spread is large), you expect the spread to become smaller: you short the asset that is relatively expensive and long the one that is relatively cheap.
when A and B are close (the spread is small), you expect the spread to become larger: you take the opposite positions.
we apply a linear regression to calculate the spread of these two series, which is simply defined by:
spread = first series - (hedge ratio * second series)
this gives us that linear combination coefficient, the hedge ratio (this is known as the engle-granger method).
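as a rough sketch of the regression step (plain python, not the cointbot implementation), the hedge ratio can be taken as the ordinary least squares slope of the first series regressed on the second. the function names and the toy prices below are made up, and fitting with an intercept is an assumption. the stationarity test on the resulting spread, which is the second half of engle-granger, is not included here.

```python
def hedge_ratio(series_a, series_b):
    """OLS slope of series_a regressed on series_b (intercept included)."""
    n = len(series_a)
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(series_a, series_b))
    var_b = sum((b - mean_b) ** 2 for b in series_b)
    return cov / var_b


def spread(series_a, series_b, ratio):
    """spread = first series - (hedge ratio * second series)."""
    return [a - ratio * b for a, b in zip(series_a, series_b)]


# toy prices: A is built as exactly 2 * B + 1, so the ratio should be ~2
prices_b = [10.0, 10.5, 11.0, 10.8, 11.4, 11.9]
prices_a = [2.0 * b + 1.0 for b in prices_b]

ratio = hedge_ratio(prices_a, prices_b)
print(ratio)
print(spread(prices_a, prices_b, ratio))
```

with real prices the spread will of course not be constant, and that wobble around the mean is exactly what gets traded.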
however, the spread does not give you an immediate signal for trading. the signal still needs to be normalized so it can be treated as a z-score, which is the number of standard deviations separating the current value of the spread from its mean.
traders can look at the momentum of the average z-score and take a contrarian approach to generate buy and sell signals. graphically, positive z-scores lie to the right of the mean, and negative z-scores lie to the left of the mean.
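here is a small sketch of that normalization using only the standard library. it shows two variants: a z-score against the whole series, and a rolling z-score against a trailing window. the function names, the window size, and the toy spread are assumptions for illustration, not what cointbot does internally.

```python
from statistics import mean, pstdev


def zscore(values):
    """z-score of every point against the mean and std of the whole series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def rolling_zscore(values, window):
    """z-score of each point against its trailing window (None while warming up)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        chunk = values[i + 1 - window : i + 1]
        mu, sigma = mean(chunk), pstdev(chunk)
        out.append((values[i] - mu) / sigma if sigma else 0.0)
    return out


toy_spread = [1.0, 2.0, 3.0, 4.0, 5.0]
print(zscore(toy_spread))
print(rolling_zscore(toy_spread, window=3))
```

the rolling variant only looks at past data, which matters for backtesting, since the full-series version quietly uses information from the future.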
here is an example of a strategy:
whenever the z-score < -1, you long the spread.
whenever the z-score > 1, you short the spread.
exit positions when the z-score ~ 0.
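the three rules above can be sketched as a tiny function. the name signal, the returned labels, and the exit_band parameter (my stand-in for "z-score ~ 0") are all assumptions for illustration.

```python
def signal(z, entry=1.0, exit_band=0.1):
    """map a z-score to an action on the spread."""
    if z < -entry:
        return "long_spread"    # spread unusually low: expect it to rise
    if z > entry:
        return "short_spread"   # spread unusually high: expect it to fall
    if abs(z) <= exit_band:
        return "exit"           # spread is back around its mean
    return "hold"               # in between: do nothing


for z in (-1.8, -0.5, 0.05, 0.7, 2.3):
    print(z, signal(z))
```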
math is awesome, but…
obviously, any trading strategy comes with advantages and shortcomings (pretty much like any flavor of text editor, you know the drill).
here is a simple picture of cointegration:
the cointbot package consists of a CLI and a set of libraries for cointegration pair trading, with support for different market types, parameters, and bot designs:
for example, Bot1 has the following strategy:
1️⃣ search for all possible crypto perpetual derivative contracts in a cex that can be longed or shorted
2️⃣ retrieve their price history for a given timeframe
3️⃣ calculate all the pairs that are cointegrated by looking at p-values smaller than a certain threshold
4️⃣ calculate their spread and their latest z-score signal
5️⃣ backtest to long when z-score < 0
6️⃣ if the asset is hot, confirm tokens to be longed and shorted, within the initial capital
7️⃣ with these close signals, average in limit orders or place market orders
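steps 1 to 3 above boil down to testing every possible pair of symbols and keeping the ones under the p-value threshold. a minimal sketch, assuming the price history is a dict of symbol to list of closes, and that the cointegration test is passed in as a function (pvalue_fn is a hypothetical parameter; something like the p-value returned by statsmodels' coint could be plugged in, but that is an assumption and not necessarily what cointbot uses):

```python
from itertools import combinations


def find_hot_pairs(price_history, pvalue_fn, plimit=0.05):
    """return (symbol_a, symbol_b, p_value) for every pair with p < plimit."""
    hot = []
    # combinations() yields each unordered pair exactly once: n * (n - 1) / 2 pairs
    for sym_a, sym_b in combinations(sorted(price_history), 2):
        p = pvalue_fn(price_history[sym_a], price_history[sym_b])
        if p < plimit:
            hot.append((sym_a, sym_b, p))
    # smallest p-value (strongest evidence) first
    return sorted(hot, key=lambda row: row[2])


# made-up symbols and a fake test, only to show the plumbing
toy_history = {"AAA": [1.0], "BBB": [2.0], "CCC": [3.0]}


def fake_pvalue(series_a, series_b):
    return 0.01 if (series_a[0], series_b[0]) == (1.0, 2.0) else 0.5


print(find_hot_pairs(toy_history, fake_pvalue))
```

note that the number of pairs grows quadratically with the number of symbols, so with a few hundred contracts this step is where most of the time goes.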
to test cointbot, you will need a testnet account from bybit. if you want to use any other cex, the code is free (or wait until i have time to implement them).
after cloning cointbot, add all the necessary system and trading settings to a .env file, and then install the python package:
💡 a perpetual contract is a contract that can be held in perpetuity, i.e., indefinitely until the trader closes their position.
let’s start testing cointbot by running the simplest option, which is simply calling bybit’s API to query the market data for all derivatives (symbols) for a given currency (e.g., USDT):
here is an example of the output:
cute. now let’s get to business and start our cointegration analysis.
the second menu option queries the market price k-lines for all symbols above in a given TIMEFRAME and KLINE-LIMIT, not only printing them to STDOUT but also saving them as JSON to OUTPUTDIR/PRICE_HISTORY_FILE:
💡 in the context of trading, a k-line represents the fluctuation of asset prices in a given time frame. it shows the close price, open price, high price, and low price. if the close price > open price, the k-line has a positive line. otherwise, it is a negative line.
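as a tiny illustration of that definition, a k-line can be modeled as a record with four prices. the Kline class and its field names are made up for this sketch and do not mirror bybit's response format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kline:
    open: float
    high: float
    low: float
    close: float

    @property
    def is_positive(self) -> bool:
        """positive line when the close price is above the open price."""
        return self.close > self.open


up = Kline(open=10.0, high=12.0, low=9.5, close=11.0)
down = Kline(open=11.0, high=11.2, low=9.0, close=9.8)
print(up.is_positive, down.is_positive)
```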
here is an example of output:
💡 bybit employs a dual-price mechanism to prevent market manipulations (when the market price on a futures exchange deviates from the spot price, causing mass liquidation of traders' positions). the dual-price mechanism consists of mark price and last traded price. "mark price" refers to a global spot price index plus a decaying funding basis rate, and it's used as a trigger for liquidation and to measure unrealized profit and loss. "last traded price" is the current market price, anchored to the spot price using the funding mechanism.
with the price history data from the previous step, we can now calculate cointegration for each pair of symbols (for the desired PLIMIT, the chosen p-value that defines a "hot" pair):
here is an example of output:
this step will also calculate p-values, hedge ratios, and zero crossings. the resulting pandas DataFrame is then saved at OUTPUTDIR/COINTEGRATION_FILE, sorted by zero_crossing.
💡 in statistics, a zero crossing is a point where the sign of a function changes. in the context of trading, it determines an entry point (using the price in relation to the moving average as a direction confirmation).
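counting zero crossings of a (mean-centered) spread is a quick way to see how often a pair actually reverts. a minimal sketch in plain python, where skipping exact zeros is my own choice and may differ from how cointbot counts them:

```python
def count_zero_crossings(values):
    """count how many times the series changes sign, skipping exact zeros."""
    crossings = 0
    prev_sign = 0
    for v in values:
        sign = (v > 0) - (v < 0)   # 1, -1, or 0
        if sign == 0:
            continue               # a point sitting on zero is not a sign yet
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    return crossings


toy_spread = [0.5, -0.2, -0.1, 0.3, 0.0, -0.4]
print(count_zero_crossings(toy_spread))  # signs: + - - + 0 -  → 3 crossings
```

more crossings within the same timeframe means more round trips through the mean, i.e., more chances to trade.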
did it work?
💡 in the context of crypto trading, backtesting is accomplished by reconstructing, with historical data, trades that would have occurred in the past using rules defined by a given strategy, gauging the effectiveness of the strategy.
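a bare-bones version of that idea, as a sketch and not cointbot's backtester: walk through a historical spread and its z-scores, enter when the z-score is beyond a threshold, exit when it comes back to zero, and add up the profit on the spread. the ±1 entry threshold, the single-unit position, and the toy data are assumptions, and fees, slippage, and funding are ignored.

```python
def backtest(spread, zscores, entry=1.0):
    """replay a z-score mean-reversion rule; return (pnl, number_of_trades)."""
    position = 0   # +1 = long the spread, -1 = short the spread, 0 = flat
    pnl = 0.0
    trades = 0
    for i in range(1, len(spread)):
        # profit or loss from holding the position over the last step
        pnl += position * (spread[i] - spread[i - 1])
        z = zscores[i]
        if position == 0:
            if z < -entry:
                position, trades = 1, trades + 1
            elif z > entry:
                position, trades = -1, trades + 1
        elif (position == 1 and z >= 0) or (position == -1 and z <= 0):
            position = 0   # spread is back at its mean: close the trade
    return pnl, trades


# toy spread with mean 0 and unit scale, so it doubles as its own z-score
toy_spread = [0.0, -2.0, -1.0, 0.0, 2.0, 1.0, 0.0]
print(backtest(toy_spread, toy_spread))  # → (4.0, 2)
```

in the toy run, the rule goes long at -2 and exits at 0, then goes short at +2 and exits at 0, earning 2 on each trade.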
select your favorite asset pair from the previous step, and let’s backtest their cointegration by testing the success of the hypothesis (and making some cool plots for their series’ spreads and z-score).
example of output for BNBUSDT vs. ALGOUSDT:
by the way, this command also generates their cointegration plots and backtest data, and saves them at OUTPUTDIR/.
💡 lil tip: if you are starting an entirely new run, clean up the current setup with make clean_data.
once we have all data from the previous step, we can look at the top cointegrated securities for the given TIMEFRAME and NUMBER:
example of output:
note that this command automatically generates the backtesting data and plots (similar to the previous option).
our bot will be connecting to bybit through both REST API and websocket endpoints. let’s start by testing the latter.
to open a websocket subscribed to a cointegration pair (either for spot, linear, or inverse markets), run:
spot market topics are implemented by the trade_v1_stream() method, which pushes raw data for each trade (API docs here).
after a successful subscription message, the first data message (f: true) consists of the last 60 trades.
after that (f: false), only new trades are pushed (at a frequency of 300ms, where the message received has a maximum delay of 400ms).
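on the consumer side, this means the first message should reset whatever you have cached, and later messages should be appended. a minimal sketch of such a handler, assuming each message is a dict with an "f" flag and a "data" list (a simplified shape i am assuming here, check the API docs for the real payload); TradeBuffer is a hypothetical helper, not part of cointbot:

```python
from collections import deque


class TradeBuffer:
    """keeps the most recent trades pushed by the stream."""

    def __init__(self, maxlen=60):
        self.trades = deque(maxlen=maxlen)

    def on_message(self, message):
        data = message.get("data", [])
        if message.get("f"):
            # first data message: a snapshot of recent trades, so start fresh
            self.trades.clear()
        # snapshot or not, append what arrived; old trades fall off the left
        self.trades.extend(data)
        return len(self.trades)


buffer = TradeBuffer(maxlen=3)
buffer.on_message({"f": True, "data": ["t1", "t2"]})
buffer.on_message({"f": False, "data": ["t3", "t4"]})
print(list(buffer.trades))  # → ['t2', 't3', 't4']
```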
example of output:
inverse market topics are implemented with orderbook_25_stream(), which fetches the orderbook with a depth of 25 orders per side (API docs here).
after the subscription response, the first response will be the snapshot response, showing the entire orderbook.
the data is ordered by price (starting with the lowest buys). push frequency is 20ms.
example of output:
finally, USDT linear market topics are implemented with orderbook_25_stream(), which fetches the orderbook with a depth of 25 orders per side (API docs here).
the first response is the snapshot response, showing the entire orderbook.
the data is ordered by price, starting with the lowest buys and ending with the highest sells. push frequency is 20ms.
example of output:
all right, we made it. let’s deploy those cuties.
several bots with different strategies are found inside src/bots/. in this article, we will go over the strategy and deployment of Bot1. feel free to explore the other bots there, and if you would like to keep up to date with the new ones i am continuously adding, just star the repo, dunno 🤷🏻‍♀️.
by the way, each bot has its own set of configuration settings in the .env file (e.g., BOT_COINS, BOT_MARKET_TYPE, BOT_ORDER_TYPE, BOT_STOP_LOSS, BOT_TRADEABLE_CAPITAL, and others). before the next step, you should check them out (and understand their effects).
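if you want to poke at these settings from python, here is a hedged sketch of reading them from the environment. the setting names come from the list above, but the value formats (a comma-separated BOT_COINS, numeric BOT_STOP_LOSS and BOT_TRADEABLE_CAPITAL) and every example value are placeholders i made up; load_bot_settings is a hypothetical helper and not how cointbot loads its config.

```python
import os


def load_bot_settings(env=None):
    """read the bot settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    coins = [c.strip() for c in env.get("BOT_COINS", "").split(",") if c.strip()]
    return {
        "coins": coins,
        "market_type": env.get("BOT_MARKET_TYPE", ""),
        "order_type": env.get("BOT_ORDER_TYPE", ""),
        "stop_loss": float(env.get("BOT_STOP_LOSS", "0")),
        "tradeable_capital": float(env.get("BOT_TRADEABLE_CAPITAL", "0")),
    }


# placeholder values, only to show the shape of the result
example_env = {
    "BOT_COINS": "AAAUSDT, BBBUSDT",
    "BOT_MARKET_TYPE": "linear",
    "BOT_ORDER_TYPE": "limit",
    "BOT_STOP_LOSS": "0.15",
    "BOT_TRADEABLE_CAPITAL": "1000",
}
print(load_bot_settings(example_env))
```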
this is how Bot1 gets set up:
and this is how Bot1 executes, inside a while True loop:
you should check the code (the main class is called BbBotOne), and then spin it up:
for more details on what happens next, check out the cointbot repo 😉.
by the way, you can also have Bot1 running inside a docker container with: