How far can on-chain AI go?

GAN: introduction

Here we quote wiki:

A generative adversarial network (GAN) is a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in June 2014. Two neural networks contest with each other in the form of a zero-sum game, where one agent's gain is another agent's loss.

GANs are mainly used to generate images, text, or sound. The most famous application is probably Deepfakes, where the face of a person in a video is replaced with another. A GAN consists of two networks:

  • the generative network (the generator)

  • the discriminative network (the discriminator)

During training, the generator tries to fool the discriminator with its outputs, while the discriminator rejects them. Ideally, both improve until the generator can fool even the best discriminator: a human. The results are quite impressive, as you may have seen with Deepfakes.

However, GANs are a little outdated; people are more likely to use Stable Diffusion now. That should be our next project.

GAN on-chain?

Recently, @VanArman released a fully on-chain collection, byteGANs: little sprites generated with different GAN models.

Bytegans. A fully on-chain GAN collection by VanArman.

Our team was pretty excited to see how he managed to put things on-chain. This is how he did it:

He spent 29.4 ETH to store the different metadata on-chain.

That felt a bit like cheating to us, since you can always store arbitrary files on-chain as long as the gas doesn't exceed the 30M gas limit. That is why we decided to see how far a GAN model can go once it actually runs on-chain.

GAN on-chain!

Although we use the term GAN, only the generator is put on-chain. Specifically, we used DCGAN as our model, since it is much easier to deploy. We encountered several difficulties:

  1. Decimals

  2. Putting GIFs on-chain

  3. The 30M gas limit

1. Decimals

The decimals were pretty hard to deal with. In the end, we chose fixed-point numbers stored in int16 for the calculations. Every number consists of:

  • 1 sign bit (+ or -)

  • 3 integer bits

  • 12 fractional bits (so the precision is 1/4096)

This is easier to implement than floating-point numbers and costs much less gas: a fixed-point multiplication is just one int16 multiplication followed by one division by 4096.

Another point with fixed-point numbers is that you need to watch out for overflow/underflow issues.
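The fixed-point scheme above can be sketched off-chain like this (a minimal Python sketch of the Q3.12 layout implied by the bit counts above; the helper names are our own, and note that Solidity's integer division truncates toward zero while Python's `//` floors, which matters for negative values):

```python
# Sketch of Q3.12 fixed-point arithmetic: 1 sign bit, 3 integer bits,
# 12 fractional bits, so values live in [-8, 8) with precision 1/4096.
SCALE = 1 << 12                                   # 4096
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1  # int16 range

def to_fixed(x: float) -> int:
    """Encode a float as a Q3.12 fixed-point value in int16 range."""
    v = round(x * SCALE)
    if not INT16_MIN <= v <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in Q3.12")
    return v

def from_fixed(v: int) -> float:
    """Decode a Q3.12 value back to a float."""
    return v / SCALE

def fmul(a: int, b: int) -> int:
    """Fixed-point multiply: one integer multiply, one division by 4096."""
    v = (a * b) // SCALE  # multiply in a wider range, then rescale
    if not INT16_MIN <= v <= INT16_MAX:
        raise OverflowError("Q3.12 multiplication overflow")
    return v
```

For example, `fmul(to_fixed(1.5), to_fixed(2.0))` decodes back to 3.0, while multiplying 4.0 by 4.0 overflows the 3 integer bits and raises, which is exactly the kind of check the on-chain final layer needs.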

2. GIFs

It was pretty annoying digging through the various docs on GIF encoding, but we finally found this doc quite useful. Don't forget that 1 byte = int8 here... lots of our bugs came from that. Also, if you want to avoid LZW compression, you should look into this.

When we searched for GIFs online, all we found were memes lol.
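The 1 byte = int8 pitfall can be illustrated off-chain (a minimal Python sketch; the helper name and the 5x5 dimensions are our own illustration, not the contract's code):

```python
import struct

# GIF fields are unsigned bytes (0-255), but code that stores pixel
# values as signed int8 sees anything above 127 wrap to a negative.
def to_unsigned_byte(v: int) -> int:
    """Reinterpret a signed int8 value as the unsigned byte GIF expects."""
    return v & 0xFF

# A pixel channel stored as -1 in int8 is really the byte 0xFF = 255.
assert to_unsigned_byte(-1) == 255

# GIF89a signature plus the logical screen descriptor's width/height,
# which are little-endian unsigned 16-bit integers (here for a 5x5 sprite):
header = b"GIF89a" + struct.pack("<HH", 5, 5)
```

Mixing up the signed and unsigned views of the same byte is exactly the kind of bug that silently corrupts the emitted GIF stream.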

3. 30M Gas limit.

We wanted to create 11x11 sprites just like byteGANs, but we were blocked by the gas limit: on Ethereum, there is a 30M gas limit even for view functions. We reduced the output size and ended up with 5x5 sprites, with RGB values between 0 and 255, rendered over 3 frames. The total gas for a single tokenURI call is 24M.

If you want to run a GAN on-chain with RGB output, the maximum size is around 6x6 or 7x7, since in CNN layers the cost grows roughly with the cube of the output size.
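As a rough sanity check on that cubic scaling (a back-of-the-envelope Python sketch; calibrating purely on the 24M-gas 5x5 measurement and ignoring any fixed overhead is our assumption):

```python
# Naive model: tokenURI gas grows with the cube of the sprite side
# length, calibrated on the measured 24M gas at 5x5.
measured_gas, measured_side = 24_000_000, 5

def estimated_gas(side: int) -> int:
    """Estimate tokenURI gas for a side x side sprite under the cubic model."""
    return round(measured_gas * (side / measured_side) ** 3)
```

Under this model, the 11x11 sprites we originally wanted would need roughly (11/5)^3 ≈ 10.6x the gas of the 5x5 version, far beyond the 30M limit; in practice the fixed overhead per call shifts these numbers somewhat.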

About in-chain art.

Most in-chain art projects use the SVG format. However, SVG requires a specific renderer, and that's why @brotchain was born. @divergenceharri and @divergencearran (now Proof's Director of Product and VP of Engineering!) rendered a 512x512 canvas as a bitmap for their Mandelbrot set. This is pretty much the limit for a pixel image, and they also use a small color map to compress their data.

Brotchain: The first in-chain art collection at 512x512. Every pixel is rendered in-chain.

How far can on-chain AI go?

Finally, let's discuss the different constraints you face when making an on-chain AI generate images:

  1. The 30M gas limit. With CNN-type networks it is difficult to go much further, since the cost grows with the cube of the output size.

  2. Precision. Fixed-point numbers in int16 are enough in most cases, but we did run into overflow/underflow issues at the final layer.

  3. Encoding. This is not a big issue here, since constraint 1 already rules out large outputs, but keep in mind that there is still a practical 512x512 limit even for a single image. You also need to encode the whole result in Base64 so it can be displayed on the different marketplaces.
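The Base64 step in point 3 can be sketched off-chain (a minimal Python sketch of the usual data-URI pattern; the function name and JSON fields are our own illustration, not the deployed contract's code):

```python
import base64
import json

def token_uri(gif_bytes: bytes, name: str) -> str:
    """Wrap raw GIF bytes and metadata into the Base64 data URIs
    marketplaces expect back from a tokenURI call."""
    image = "data:image/gif;base64," + base64.b64encode(gif_bytes).decode()
    meta = json.dumps({"name": name, "image": image})
    return "data:application/json;base64," + base64.b64encode(meta.encode()).decode()
```

On-chain, the same two layers of encoding happen in Solidity, which is part of why the tokenURI call is so expensive.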

It is pretty constrained, to be honest. But changing models might improve this.

If you want to implement one on your own, the code is verified on Etherscan! We hope to see more on-chain AI art in the future.
