When working with images in deep learning for generative art, you will almost always use the convolution operator to build networks. Convolution takes an input image and creates a scaled-down, convolved representation of it by sliding a kernel over the image in steps called “strides”; at each step, the kernel’s cells are multiplied elementwise against the image cells beneath them and the products are summed to produce one value in the output matrix.
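To make the mechanics concrete, here is a minimal numpy sketch of a single-channel “valid” convolution; the helper name conv2d_valid and the toy arrays are mine, for illustration only.

import numpy as np

def conv2d_valid(image, kernel, stride=1):
    '''naive single-channel convolution with no padding'''
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # multiply the kernel against the patch beneath it and sum
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((3, 3)) / 9.0            # a simple averaging kernel
print(conv2d_valid(image, kernel).shape)  # (2, 2): the 4x4 input scales down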
The opposite of convolution is called transposed convolution or, as I like to say, devolution. In devolution, the model learns a kernel that upsamples the convolved output matrix back to the shape of the original input; the result is a learned reconstruction, not an exact inverse.
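Continuing the numpy sketch above, the transposed version scatters a scaled copy of the kernel into the output for each input cell, growing the matrix back to its pre-convolution size. Note that it restores the shape, not the original values; training is what makes the reconstruction close. Again, conv2d_transpose_valid is my name for a toy helper, not a library function.

def conv2d_transpose_valid(image, kernel, stride=1):
    '''naive transposed convolution: the adjoint of conv2d_valid'''
    kh, kw = kernel.shape
    out_h = (image.shape[0] - 1) * stride + kh
    out_w = (image.shape[1] - 1) * stride + kw
    out = np.zeros((out_h, out_w))
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # each input value stamps a scaled copy of the kernel
            out[i*stride:i*stride+kh, j*stride:j*stride+kw] += image[i, j] * kernel
    return out

small = conv2d_valid(image, kernel)                 # (2, 2)
print(conv2d_transpose_valid(small, kernel).shape)  # (4, 4): back to the input shape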
I’ve struggled to find simple examples of convolution and devolution that can be applied to arbitrary datasets; most tutorials use MNIST and gloss over the details of adjusting the network to new input sizes. This post walks through the simplest possible example using the cpunks-10k dataset.
The cpunks-10k package can be installed into a virtual environment using pip.
$ pip install cpunks-10k
After installation, you can work with the dataset in a notebook much as you would work with the MNIST dataset that ships with TensorFlow.
import cpunks10k
cp = cpunks10k.cpunks10k()
(X_train, Y_train), (X_test, Y_test), (labels) = cp.load_data()
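It’s worth sanity-checking what came back before going further. The split sizes in the comments are assumptions based on the reshapes used later in this post.

print(X_train.shape)  # expect (9000, 24, 24, 4)
print(X_test.shape)   # expect (1000, 24, 24, 4)
print(len(labels))    # the label names that accompany each punk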
Now, import the minimum set of modules needed for this tutorial (matplotlib is included because we will plot our results).

import numpy as np
import matplotlib.pyplot as plt

import keras
from keras.models import Sequential
from keras.layers import Conv2D, Conv2DTranspose
The cpunks-10k dataset is a set of 10,000 (24, 24, 4) images encoded in four RGBT channels: red, green, blue, and transparency. For this simple example, we want to work with simpler images, that is, greyscale-encoded (24, 24, 1) images. The following code can be used to transform the images to greyscale.
cmap = plt.get_cmap('gray')

def rgb2gray(rgb):
    '''convert an rgb image to greyscale using luma weights found
    when fumbling around on stackoverflow (they are the ITU-R
    BT.601 coefficients)
    '''
    weights = [0.2989, 0.5870, 0.1140]
    return np.dot(rgb[..., :3], weights)
# convert X_train and X_test to greyscale
X_train_g = np.array([rgb2gray(img) for img in X_train])
X_test_g = np.array([rgb2gray(img) for img in X_test])
X_train_g = X_train_g.reshape(9000, 24, 24, 1)
X_test_g = X_test_g.reshape(1000, 24, 24, 1)
# check work; squeeze() drops the trailing channel axis, which imshow rejects
plt.imshow(X_train_g[0].squeeze(), cmap=cmap)
plt.show()
Now, we can build a simple Keras network that applies three convolution operators to create a convolved representation of the images, then uses transposed convolution operators to reconstruct the image from that intermediate representation. In Keras, the Conv2D operator applies filters=N different kernels to the input to create the convolved output representation.
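As a quick throwaway illustration (not part of the tutorial model), a single Conv2D layer with filters=32 maps our (24, 24, 1) input to a (22, 22, 32) output:

demo = Sequential()
demo.add(Conv2D(32, kernel_size=(3, 3), input_shape=(24, 24, 1)))
# a 3x3 kernel with no padding trims one pixel from each edge: 24 -> 22
print(demo.output_shape)  # (None, 22, 22, 32)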
The code below creates the model by adding three layers each of convolution and transposed convolution, plus a final sigmoid convolution that maps the stack of filters back to a single greyscale channel.
input_shape = (24, 24, 1)

model = Sequential()
# convolve: each 3x3 'valid' convolution shrinks the spatial dims 24 -> 22 -> 20 -> 18
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal', input_shape=input_shape))
model.add(Conv2D(16, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal'))
model.add(Conv2D(8, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal'))
# devolve: each transposed convolution grows them back 18 -> 20 -> 22 -> 24
model.add(Conv2DTranspose(8, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal'))
model.add(Conv2DTranspose(16, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal'))
model.add(Conv2DTranspose(32, kernel_size=(3, 3), activation='relu', kernel_initializer='he_normal'))
# map the 32 filters back to a single greyscale channel in [0, 1]
model.add(Conv2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same'))
You can view the network architecture using the summary method.
model.summary()
Model: "sequential_1"
_______________________________________________________________
Layer (type) Output Shape Param #
===============================================================
conv2d_34 (Conv2D) (None, 22, 22, 32) 320
_______________________________________________________________
conv2d_35 (Conv2D) (None, 20, 20, 16) 4624
_______________________________________________________________
conv2d_36 (Conv2D) (None, 18, 18, 8) 1160
_______________________________________________________________
conv2d_transpose_25 (Conv2DT (None, 20, 20, 8) 584
_______________________________________________________________
conv2d_transpose_26 (Conv2DT (None, 22, 22, 16) 1168
_______________________________________________________________
conv2d_transpose_27 (Conv2DT (None, 24, 24, 32) 4640
_______________________________________________________________
conv2d_37 (Conv2D) (None, 24, 24, 1) 289
===============================================================
Total params: 12,785
Trainable params: 12,785
Non-trainable params: 0
Notice in the summary how each convolution shrinks the spatial dimensions (24 to 22 to 20 to 18) and each transposed convolution grows them back (18 to 20 to 22 to 24). Now, compile and train the model. Because the network learns to reconstruct its own input, the greyscale images serve as both the features and the targets.

model.compile(optimizer='adam', loss='binary_crossentropy')

# example hyperparameter values; tune these to taste
num_epochs = 25
batch_size = 32
validation_split = 0.1

model.fit(X_train_g, X_train_g,
          epochs=num_epochs,
          batch_size=batch_size,
          validation_split=validation_split)
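After training, one quick sanity check (not part of the original flow) is to evaluate the reconstruction loss on the held-out test punks:

test_loss = model.evaluate(X_test_g, X_test_g, verbose=0)
print('test reconstruction loss: %.4f' % test_loss)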
You can use the model to “predict” a devolved punk from a punk in the test dataset. The line below creates devolved versions of the first ten punks in the test set.
devolved = model.predict(X_test_g[0:10])
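The result is a batch of reconstructed images with the same shape as the greyscale inputs:

print(devolved.shape)  # (10, 24, 24, 1)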
We can compare the devolved punks against the greyscale punks using a bit of matplotlib trickery.
fig = plt.figure(figsize=(7, 3))
n_rows = 2
n_cols = 5
for i in range(n_cols):
    # top row: original greyscale punks; bottom row: their devolved reconstructions
    ax = fig.add_subplot(n_rows, n_cols, i + 1)
    ax.imshow(X_test_g[i].squeeze(), cmap=cmap)
    ax = fig.add_subplot(n_rows, n_cols, i + 1 + n_cols)
    ax.imshow(devolved[i].squeeze(), cmap=cmap)
plt.subplots_adjust(bottom=-0.2, right=1.4, top=0.9)
plt.show()
An understanding of the transposed convolution operator is critical to building an intuition for the tools we will use to make generative art, e.g. autoencoders, variational autoencoders, and generative adversarial networks. As an exercise, experiment with the network shape, the sizes of the convolution kernels, and the number of epochs used in training to create new sets of punks that exploit the imperfection of the transposed convolution operator to produce aesthetically interesting variants.
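As one hypothetical starting point for the exercise (the layer sizes here are arbitrary, not tested recommendations), a shallower variant with larger 5x5 kernels discards more detail per layer and tends to leave stronger reconstruction artifacts:

variant = Sequential()
# one 5x5 'valid' convolution: 24 -> 20
variant.add(Conv2D(16, kernel_size=(5, 5), activation='relu', kernel_initializer='he_normal', input_shape=(24, 24, 1)))
# one 5x5 transposed convolution: 20 -> 24
variant.add(Conv2DTranspose(16, kernel_size=(5, 5), activation='relu', kernel_initializer='he_normal'))
variant.add(Conv2D(1, kernel_size=(3, 3), activation='sigmoid', padding='same'))
variant.compile(optimizer='adam', loss='binary_crossentropy')
variant.fit(X_train_g, X_train_g, epochs=num_epochs, batch_size=batch_size, validation_split=validation_split)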