Explaining NFT Pricing with Machine Learning

This is the final installment in our blog series on using machine learning to accurately price NFTs. You can read the previous entry, on determining what matters for NFT valuation, here.

NFT pricing in volatile and fast-moving markets

CryptoPunks is one of the most popular NFT projects, and also one of the most dynamic, so our pricing needs to respond quickly to changes in the market. But how do we use machine learning models to accurately price CryptoPunks in volatile markets?

To achieve this, we build a “memory” component into our models and cross-validate them in a way that respects the time-dependent structure of the data. Below, we dive into how these approaches help us generate pricing in volatile and fast-moving NFT markets.

The graph below shows the trend in CryptoPunk sale prices from launch date to August 2021. The vertical axis shows prices on a log scale, so a steady upward trend on this axis means that prices for CryptoPunks have increased exponentially over this time period.

Model Response to Market Variables

The blue line and shaded region show the daily mean sale price together with a standard error bound, which reflects uncertainty around this mean value caused by volatility in the market and/or limited transaction data on that particular day. The red dots show the mean prices predicted by our ML model, also with a standard error bound.
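
As a rough illustration of how a daily mean and its standard error bound can be computed, here is a minimal sketch. The function name `daily_mean_with_se` is hypothetical, and whether the chart's statistics are computed on raw or log prices is not stated here, so treat the input simply as "one day's sale prices".

```python
import math
import statistics

def daily_mean_with_se(prices):
    """Return (mean, standard error of the mean) for one day's sale prices."""
    n = len(prices)
    mean = statistics.fmean(prices)
    # With a single sale there is no spread to estimate, so the
    # uncertainty is reported as infinite rather than as zero.
    if n < 2:
        return mean, math.inf
    # Sample standard deviation divided by sqrt(n): fewer sales or a
    # wider spread of prices both produce a wider error bound.
    se = statistics.stdev(prices) / math.sqrt(n)
    return mean, se

mean, se = daily_mean_with_se([1.0, 2.0, 3.0])
```

Days with few transactions get a wide bound under this calculation, which matches the description of the shaded region above.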

The prices predicted by our model use only data from prior days: the model can’t look into the future and must rely on the historical information available at the time of each prediction. We verify the accuracy of our models in a way that respects this time-dependent structure, so that validation closely matches how the models will be used in reality - to predict current and future NFT valuations using past data.
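
One common way to respect the time-dependent structure of the data is walk-forward validation, where every test day is predicted from earlier days only. The sketch below is an illustration under that assumption, not a description of our production pipeline; `walk_forward_splits` and its parameters are hypothetical names.

```python
def walk_forward_splits(n_days, min_train_days, test_days=1):
    """Yield (train_idx, test_idx) pairs in which training days always
    come strictly before the test days."""
    start = min_train_days
    while start + test_days <= n_days:
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + test_days))
        yield train_idx, test_idx
        # Move forward in time; earlier test days join the training set.
        start += test_days

splits = list(walk_forward_splits(n_days=6, min_train_days=3))
for train_idx, test_idx in splits:
    print("train on days", train_idx, "-> predict days", test_idx)
```

Unlike a random shuffle into folds, no split here lets the model see a sale that happened after the day it is asked to price.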

Notice that the average model predictions and actual prices follow each other very closely. The red dots tend to trail the blue line slightly, because it takes time for the model to adjust to trends in the market, but the lag is small. One component that helps us achieve this is the limited memory of our models - we give more consideration to recent events than to events that happened further in the past.

However, by removing past data points we reduce the size of the data used to train the model, which may result in worse performance if the excluded data contain useful information on the relationships between NFT characteristics and prices. We therefore test models with different lengths of “past memory” to identify the right duration.

This ensures that we have enough data to accurately infer NFT prices while excluding information too far in the past that may no longer be relevant as the market changes. The importance of “memory” can vary across projects, meaning that it’s always important to run tests to identify the extent to which an NFT market is “stable” over time or “dynamic” and quickly evolving.
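
To make the memory trade-off concrete, here is a minimal sketch of testing several memory lengths. It swaps in a deliberately simple stand-in model (the mean of the last `window` observations) rather than our actual models, and it uses synthetic series; every name and number in it is an assumption made for illustration.

```python
import statistics

def window_mean_forecast(history, window):
    """Stand-in model: predict the next value as the mean of the last
    `window` observations (its 'memory')."""
    return statistics.fmean(history[-window:])

def walk_forward_error(series, window, start):
    """Median absolute error of one-step-ahead forecasts from day `start`
    onward, where each forecast uses only earlier observations."""
    errors = []
    for t in range(start, len(series)):
        prediction = window_mean_forecast(series[:t], window)
        errors.append(abs(series[t] - prediction))
    return statistics.median(errors)

def best_window(series, candidate_windows):
    """Return the candidate memory length with the lowest error, scoring
    every candidate on the same test days."""
    start = max(candidate_windows)
    return min(candidate_windows,
               key=lambda w: walk_forward_error(series, w, start))

# Synthetic log10 prices (illustrative only, not real sales data).
trending = [0.1 * t for t in range(30)]    # steadily rising market
oscillating = [1.0, -1.0] * 15             # noise around a stable level

print(best_window(trending, [1, 5, 10]))   # 1: short memory tracks the trend
```

On the rising series the shortest memory wins because older observations drag the forecast down, while on the oscillating series a one-day memory does worse than a longer one because it chases noise. That is the "dynamic" versus "stable" distinction described above.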

Explaining NFT pricing with explainable machine learning

In order to validate the machine learning models used to generate NFT prices, sometimes we need to look under the hood and understand why they arrived at a certain price for a particular NFT. Suppose, for example, that a CryptoPunk was given an unusually high valuation by our models. How would we go about understanding the sources of this result?

Enter Shapley Values, applied at the level of individual predictions. In our previous blog post we discussed Shapley Values and how they can help us understand the most influential variables in our models. Here, we’ll discuss how these methods can be used to understand individual model predictions.

The plot below shows the predicted valuation (in log base 10) for an example NFT collectible at some point in the past. The valuation was 0.51, or about 3.24 ETH (= 10^0.51).
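
The conversion from the model's log-scale output back to ETH is a single exponentiation, shown here as a short check of the arithmetic (the variable names are illustrative):

```python
import math

log10_valuation = 0.51                  # model output on the log10 scale
valuation_eth = 10 ** log10_valuation   # back-transform to ETH
print(round(valuation_eth, 2))          # 3.24

# Going the other way recovers the log-scale value.
recovered_log10 = math.log10(valuation_eth)
```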

Predicted Valuation Plot

The next plot decomposes the extent to which each variable contributed to the final predicted valuation, with the values of the input variables in parentheses.

Input Variables Values Plot

In this case, time trend was the most influential variable and pushed the valuation upwards, as the prediction was made towards the end of a “bull market” period. The next most influential input was the last sale price for any NFT within that collection (project.prevPrice), which was slightly lower than would have been expected based on time trend alone, thereby tempering the effect of the time trend and pushing the prediction slightly downwards. Finally, the rarest_accessory variable, which measures the rarity of the rarest attribute of the NFT, contributed positively to the valuation on top of the combined effect of the previous two variables. An additional 17 variables affected this valuation, but to a lesser extent.
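
For readers who want to see what such a decomposition computes, the sketch below calculates exact Shapley values for a single prediction of a tiny toy model by averaging each variable's marginal contribution over all orderings. This is an illustration only: the toy model, its coefficients, the input values, and the all-zeros baseline are made up, and production tooling typically approximates these values rather than enumerating every ordering. The variable names echo the ones discussed above.

```python
from itertools import permutations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction.

    Variables are switched from their baseline value to their actual
    value one at a time; each variable's contribution is its average
    marginal effect over all possible orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous_output = model(current)
        for name in order:
            current[name] = instance[name]
            output = model(current)
            totals[name] += output - previous_output
            previous_output = output
    n_orderings = factorial(len(names))
    return {name: total / n_orderings for name, total in totals.items()}

def toy_model(x):
    """Hypothetical log10 valuation model with one interaction term."""
    return (0.2
            + 0.5 * x["time_trend"]
            + 0.3 * x["prev_price"]
            + 0.1 * x["rarest_accessory"] * x["time_trend"])

# Made-up baseline and inputs, for illustration only.
baseline = {"time_trend": 0.0, "prev_price": 0.0, "rarest_accessory": 0.0}
instance = {"time_trend": 1.0, "prev_price": -0.2, "rarest_accessory": 1.0}

phi = shapley_values(toy_model, instance, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The contributions always sum to the gap between the prediction and the baseline prediction, which is what lets a plot like the one above stack them into the final valuation.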

Decompositions like this help us understand how our ML models reach their predictions, providing insight into which variables influence pricing for different types of NFTs within a collection. 

Accurately pricing NFTs at scale with machine learning

Over the past two months, we took a deep dive into how we use machine learning and crowd intelligence to generate automated and up-to-date NFT pricing and why machine learning models are necessary to achieve reliable valuations at scale.

To briefly recap, our machine learning models ingest historical sales data and NFT metadata, construct features from this information, and use them to generate accurate, reliable pricing. We validate the predictions by examining their accuracy on data not used in the training process and obtain error bounds by comparing our predictions to realized sale prices. Both the predicted prices and the error bounds provide useful information to NFT buyers, sellers, and developers building products on top of the NFT economy.

Machine learning allows us to incorporate information that simpler models do not take into consideration, for example by pooling the sales histories of many NFTs to arrive at a prediction for a single NFT and by utilizing a range of NFT metadata. Much of our research effort has focused on constructing different predictor variables, using automated methods to uncover the most important ones, and iterating to arrive at a lean but powerful model.

As a result, our ML approach excels in predicting the prices of NFTs where others fall short. For instance, let’s take a look at the recent performance of our Cool Cats pricing model. Median Relative Error (MRE) is a measure of how accurate NFT price predictions are: it is the median, across sales, of the absolute difference between predicted and actual price, expressed as a fraction of the actual price. An MRE of 12.75% means that half of the predictions land within 12.75% of the actual sale price. For Cool Cats, an optimized exponentially weighted moving average (EWMA) approach generates an MRE of 65.4%. Upshot’s machine learning model achieved a Cool Cats MRE of 11.6%.
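
As a minimal sketch of the two quantities compared above, the code below computes a median relative error and a simple one-step-ahead EWMA forecast. How the EWMA baseline was tuned and applied to Cool Cats is not described here, so the function names, the smoothing parameter `alpha`, and the sample numbers are all assumptions made for illustration.

```python
import statistics

def median_relative_error(predicted, actual):
    """Median of |predicted - actual| / actual across sales."""
    relative_errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return statistics.median(relative_errors)

def ewma_forecasts(prices, alpha):
    """One-step-ahead EWMA forecasts for prices[1:].

    Each forecast is the smoothed level built from earlier sales only,
    so the baseline never sees the price it is trying to predict.
    """
    forecasts = []
    level = prices[0]
    for price in prices[1:]:
        forecasts.append(level)
        level = alpha * price + (1 - alpha) * level
    return forecasts

# Made-up sale prices in ETH, for illustration only.
sales = [10, 20, 30]
baseline_predictions = ewma_forecasts(sales, alpha=0.5)
baseline_mre = median_relative_error(baseline_predictions, sales[1:])
```

Because it is a median, the MRE is not dominated by a handful of extreme mispricings, which makes it a reasonable summary in a market with occasional outlier sales.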

What does this mean for the broader NFT market? Until now, the value held in NFTs has largely remained under-utilized and stagnant. Accurate NFT pricing enables exciting new possibilities at the intersection of DeFi x NFTs that will unlock the true potential of NFTs as an asset class. We believe improved NFT pricing will increase transparency, reduce information frictions in NFT markets and encourage participation from a broader community of developers - opening the floodgates to a new wave of NFT products and protocols.

If you are interested in building new products enabled by near real-time NFT appraisals, please reach out on Discord or at contact@upshot.xyz.

Stay in the loop!

+ Join us on Discord

+ Follow us on Twitter

+ Subscribe to our newsletter

