Classifying CryptoPunks with TensorFlow, Keras and CPUNKS-10K

CPUNKS-10K is a dataset built from the 10,000 labeled images in the CryptoPunks collection by Larva Labs. The images have been collected, organized, and modified for use in machine learning research by tnn1t1s.eth. The source image files and metadata were designed and created by John Watkinson & Matt Hall.

This article demonstrates how to use the TensorFlow and Keras libraries to train a simple neural network classifier that makes predictions on the CPUNKS-10K dataset. It assumes familiarity with TensorFlow, Keras, and basic concepts in machine learning.

Setup

Import and configure NumPy, pandas, matplotlib, TensorFlow and Keras.

import numpy as np
import pandas as pd

from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.optimizers import Adam

import matplotlib.image as mpimg
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')

# the dataset reader class is assumed to be importable from the package top level
from cpunks10k import cpunks10k

Load the CPUNKS-10K Dataset and Inspect It.

Initialize the CPUNKS-10K dataset reader.

cp = cpunks10k()

Load the data using a small subset of the available labels; the commented-out lines show other label sets you can try.

# labels = ['male', 'female']
# labels = ['alien', 'ape', 'zombie', 'male', 'female']
# labels = ['earring']
labels = ['wildHair']
(X_train, Y_train), (X_test, Y_test) = cp.load_data(labels)

Inspect the data. The shapes should be as expected: 9,000 training images and 1,000 test images, each 24×24×3, with a two-column label array for each split.

(X_train.shape, Y_train.shape), (X_test.shape, Y_test.shape)
(((9000, 24, 24, 3), (9000, 2)), ((1000, 24, 24, 3), (1000, 2)))
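
Before building a model, it can also help to eyeball a few of the training punks and confirm the pixel format. The snippet below is a quick sketch that assumes the images are stored as uint8 RGB values, as described in the next section.

# check the pixel type and range, then show the first five training punks
print(X_train.dtype, X_train.min(), X_train.max())

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, img, label in zip(axes, X_train[:5], Y_train[:5]):
    ax.imshow(img)
    ax.set_title(str(label), fontsize=8)
    ax.axis('off')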

Build a Simple Classifier.

Each image in CPUNKS-10K is a 24×24×3 NumPy array of uint8 values. The network shown below sets up the Input layer based on the image size, adds two Dense hidden layers, and ends with an output layer that holds the predicted probabilities for each of the two labels.

# flatten the 24x24x3 image and pass it through two fully connected layers
input_layer = Input(shape = (24, 24, 3))
x = Flatten()(input_layer)
x = Dense(units = 200, activation = 'relu')(x)
x = Dense(units = 150, activation = 'relu')(x)

# one output unit per label, with softmax to produce class probabilities
output_layer = Dense(units = len(cp.labels),
                     activation = 'softmax')(x)

model = Model(input_layer, output_layer)
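
Printing a summary is a quick way to sanity-check the layer shapes and parameter counts before compiling.

# inspect layer output shapes and parameter counts
model.summary()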

Next, create an optimizer with a learning rate of 0.0005 and compile the model.

opt = Adam(learning_rate=0.0005)  # lr=0.0005 in older Keras versions
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics = ['accuracy'])

Train the Model

Use the model's fit method to train it.

model.fit(X_train,
          Y_train,
          batch_size = 32,
          epochs = 10,
          shuffle = True)
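
If you would rather monitor generalization during training instead of only at the end, fit can hold out a fraction of the training data as a validation set. The variant below, run in place of the call above, is a sketch with a 10% validation split.

# same training call, but with 10% of the training punks held out
# so validation accuracy is reported after every epoch
model.fit(X_train,
          Y_train,
          batch_size = 32,
          epochs = 10,
          shuffle = True,
          validation_split = 0.1)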

Evaluate the Model

The model's evaluate method scores the trained model on the held-out (out-of-sample) test data, returning the loss and accuracy.

model.evaluate(X_test, Y_test)
32/32 [==============================] - 0s 3ms/step - loss: 0.0071 - accuracy: 0.9970
[0.007132469676434994, 0.996999979019165]
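
The evaluate method returns the loss followed by each metric passed to compile, so the scores can be captured directly if you want to use them programmatically.

# capture the test scores; verbose=0 silences the progress bar
test_loss, test_acc = model.evaluate(X_test, Y_test, verbose=0)
print(f'test accuracy: {test_acc:.4f}')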

You will probably find that the model is strikingly accurate for such a simple architecture. Understanding why it performs so well is a great exercise for beginning machine learning practitioners, and it is left to the reader.
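
One place to start that investigation is the label distribution itself: when one class dominates, even a constant majority-class predictor scores well, which puts the accuracy above in context. The check below assumes the Y arrays are one-hot encoded, as their two columns suggest.

# fraction of punks in each label column (columns correspond to cp.labels)
print(cp.labels)
print('train:', Y_train.mean(axis=0))
print('test: ', Y_test.mean(axis=0))

# accuracy a constant majority-class predictor would score on the test set
print('baseline:', Y_test.mean(axis=0).max())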

Trust and Validate

When a model returns surprising results, it is always good to validate them with inspection. For humans, visual inspection is a great first pass.

First, collect all the predictions on the out-of-sample test set and prepare two arrays containing the predicted and actual labels.

labels=np.array(cp.labels)
preds = model.predict(X_test)
preds_single = labels[np.argmax(preds, axis=-1)]
actual_single = labels[np.argmax(Y_test, axis=-1)]
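
Before plotting, a confusion matrix gives a compact numerical view of predicted versus actual labels; pandas, imported earlier, can tabulate it in one line.

# rows are actual labels, columns are predicted labels
pd.crosstab(pd.Series(actual_single, name='actual'),
            pd.Series(preds_single, name='predicted'))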

Now, plot the predicted and actual labels for a random selection of test punks in a grid using matplotlib.

n=10
indices = np.random.choice(range(len(X_test)), n)
fig = plt.figure(figsize=(15,3))
fig.subplots_adjust(hspace=0.4, wspace=0.4)
for i, idx in enumerate(indices):
    img = X_test[idx]
    ax = fig.add_subplot(1, n, i+1)
    ax.axis('off')
    ax.text(0.5, -0.35, 
            f"{idx}:  {str(actual_single[idx])}", 
            fontsize=10, 
            ha='center',
            transform=ax.transAxes)
    ax.text(0.5, 
            -0.70, 
            'pred  = ' + str(preds_single[idx]), 
            fontsize=10,  
            ha='center',
           transform=ax.transAxes)
    ax.imshow(img)
Figure: ten random wildHair predictions, with actual and predicted labels below each punk.

Debugging

After random inspection, you might want to look at the prediction for a single punk, for example to check in on a specific zombie or alien.

The test set in this notebook comprises CryptoPunks 9000 through 9999. We can look up the actual label, the prediction, and the image of Punk 9966 using the code below.

pid = 9966 - 9000
(actual_single[pid],preds_single[pid])
('wildHair', 'wildHair')
plt.imshow(X_test[pid])
Figure: Punk 9966, a punk with wildHair.
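
Rather than checking punks one at a time, you can also list every test punk the model got wrong and inspect those directly.

# indices into X_test where the prediction disagrees with the actual label;
# add 9000 to recover the original punk ids
misses = np.where(preds_single != actual_single)[0]
print(len(misses), 'misclassified punks:', misses + 9000)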

Next Steps

This tutorial was intended to get you up and running with TensorFlow, Keras, and the CPUNKS-10K dataset. One of the key takeaways is that CPUNKS-10K produces non-intuitive, extremely accurate classification results with even the simplest neural network approaches to image classification. This hints at one of the main uses of the dataset: teaching the concepts of model explainability. Why do you think the models trained on CPUNKS-10K are so accurate?
