Generative Art Basics: Resources, Prompt Engineering, Inspiration...

This is a primer on getting started with making generative art from a newbie’s perspective. In here you will find lots of hyperlinked resources to guides, studies and artists to catalyze your growth and understanding of the space. I have also included summaries of some of the basic tools in a google colab notebook, and some personal thoughts/wisdom in creating a well defined ‘prompt’.

Join the generative art community (disco diffusion discord) to continue spreading the generative love. A lot of resources in here - all properly credited to the original authors when possible. I do my best to share resources and works on my twitter too.

General Notes on the AI Art process:

  • This doc explores creating art through Google Colabs - notebooks allow anybody to write and execute code through the browser (especially well suited to machine learning, data analysis and education) and run them on Google servers
  • Colab Notebooks written by other skilled users are publicly available (like Disco Diffusion) allowing you to create copies and run existing code without requiring the knowledge to write it entirely yourself.
    • You change variables and input your own prompt into the notebook to create new original results
  • This is a lot of trial and error
  • There are no hard rules
  • The outputs become the inputs, re-iterations towards a polished vision

Colab Notebooks:

PyTTI - python text to image notebook

FiLM - Frame Interpolation for Large Images

Jukebox - for generative music

ESRGAN - Image up-scaler

‘Hitchhiker’s Guide’ - A list of many many more notebooks

Loud Disco - Disco diffusion with audio prompts

Lucid Sonic Dreams - Audio reactive visuals from mp3 files

Centipede Diffusion - Latent Diffusion and Disco Diffusion models combined

Hugging face - a depot for GANs (not a notebook)

Key Notebook Features:

Note for using the notebooks: the ‘Runtime’ button on the top will be your friend; here you can:

  • Restart the runtime in case of any crashes/changes
  • Check your runtime type (GPU) and change to High-RAM (if you have an account)
  • Run all cells once you have inputted your desired settings

To make sure everything is working and to get the download process started while you input your variables, it’s a good idea to hit ‘Run all’ as your first step when working in a notebook. Once you have changed your variables, hit the ‘Restart and Run All’ button (solves the ‘reboot for extra speed’ variable too).

Resources:

Disco Diffusion:

Settings/Variables:

In the Setting section of disco there are a lot of variables to play with. Jumping in an changing a bunch of variables without testing often leads to disappointing results. Below are a few key variables that have some of the biggest effects on the output and on RAM usage. Start messing around with these before exploring the more advanced features (pulled heavily from zippy’s guide)

  1. Steps: 50-10000
    1. One of the most important variables, and biggest consumer of memory when set high
    2. Number of iterations, number of times the AI looks at the image and takes a step towards the prompt.
    3. There is a diminishing return for steps, as each step gets smaller in its directional movement. 250-500 is a good range generally, but going higher for detail specific prompts or intentions is possible
  2. Width height:
    1. 512x minimum, needs to be in a multiple of x64
    2. Size directly relates to the render time, so smaller images will be less likely to crash and load faster
    3. An AI Upscaler can be used to take small blurry images and enhance them. Free Upscaler linked here
    4. Dimension correlate to prompt intentions, e.g. art of tall trees would be best suited for 512x1024 dimensions
  3. Clip Guidance Scale: 1500 - 100000
    1. One of the most important variables, tells the AI how strongly to move towards the prompt at each step.
    2. Correlates to image size
    3. High means more specific, but too high for the relative size and the AI will distort the image
  4. TV Scale (total variance denoising): 0 - 1000
    1. Smooths out details of the final image to reduce noise
    2. Set to 0 to turn it off
  5. Range Scale: 0 - 1000
    1. Used to adjust color contrast with low values being higher contrast, and high values being lower contrast (reduced color palette)
    2. Set to 0 to turn it off
  6. Sat(uration) Scale: 0 - 20000
    1. Adjust saturation, higher values = more saturation and vibracy
    2. Set to 0 to turn it off
  7. Cutn_Batches: 1 - 8
    1. The number of ‘areas’ the image is cut into each step
    2. Changes the style, but at the cost of significant render time. Best to leave at 1, unless you are trying to study a specific prompt.
  8. Init Image:
    1. If you want to use an image as a starting point rather than random noise, this is the place
    2. Download the image into your google drive file: AI → Disco Diffusion → Init images
    3. On the left side of the colab notebook, click on the ‘files’ icon
    4. Find the folder init images, right click the image you want to use a select ‘copy path’
    5. Paste that path into the init image section of the settings
  9. Init Scale: 10 - 20000
    1. How much of your initial image will remain unblurred as initial noise for the diffusion
    2. How strongly the diffusion will stick to the original image at each step
    3. Needs to be balanced against CLIP Guidance Scale - too high and there wont be any changes, too low and the original will be unrecognizable (for better or worse)
  10. Skip Steps: any number up to # of steps
    1. Denoising strength starts high and gradually gets lower- the first few steps of any run have so much denoising that 10-15% can be skipped without affecting the final image, while saving some render time
    2. Skip too many steps, and there will be too much noise (poorly defined objects) in the final image
    3. Don’t skip enough steps, and the image will be blown out as CLIP moves too much towards prompts (often resulting in large singular color splotches)
    4. Skipping steps is important if using an initial image, so that it doesn’t get too drowned in the initial noise

Diffusion Models:

Whenever you change which models you are using, you need to restart your runtime (click on Runtime, then any of the restart options, then run again) . Changing models is likely to cause crashes or out of memory problems (CUDA memory error).

In order of fastest/lightest to slowest/most memory consuming:

  • VitB32
  • RN50
  • RN101
  • VitB16
  • RN50x4
  • RN60x16
  • RN50x64
  • ViTL14

Prompt Engineering:

A prompt is the language input that you write in the colab which forms what you want to see. The diffusion models will look at images and try to pull what it sees to create something new to match your prompt. Understanding how these models work can be frustrating, but there are some methods to hone in on clarity and intentional results.

CLIP - stands for Contrastive Language-Image Pre-Training -a transformer model from OpenAI that is used to match text embeddings with image embeddings, it works by scanning the internet for images and their associated captions then taking an average of those images (click here to see what CLIP sees in a search engine

Common Mistakes:

  1. Use singular nouns or specific numbers. E.g. ‘android astronauts’ could be anything from 2 to 1000 individual androids, and would come out messy compared to ‘2 android astronauts’
  2. CLIP can’t extrapolate ideas for you, avoid vague concepts. E.g. ‘animals one million years from now’ would not generate well captioned images. Instead, something like ‘animals of the future’ would be more suitable.
  3. Speak in positives not negatives - a yellow cat on the internet will be captioned with ‘yellow cat’, not as ‘not a brown cat’.

Methods to get specific:

  1. Mention the specific medium you want to see. E.g. ‘a matte painting of___’, ‘photorealistic 8k photography of____’ , ‘woodblock engravings of ____’
  2. Stylize based on artists/ art hubs you like. ‘Trending on artstation’ is a popular variable that pulls styles that are currently trending on artstation'. E.g. ‘___ by Studio Ghibli’, ‘____ by William Blake and Leonard da Vinci’… Pick an artist that has produced works relating to the prompt you wish to see
  3. Think in terms of what has a lot of photos on the internet. E.g. ‘a wizard’ compared to ‘a person who can do magic’.
  4. Try throwing dates in there. Anything on the internet is fair game for the AI.
  5. Emojis work too

Example list of variables to try adding to your prompts: ‘Kodak portra, maximalist, matte drawing, infrared, DeviantArt HD, ISO 200, quantum wavetracing, ukiyo-e, surrealist, 3840x2160, fauvism, masterpiece, Egyptian art, woodcut, made of ___, holographic, aftereffects, sunrays shine upon it, parallax, depth of field, deviantart, 8k 3D’

Prompt engineering guide by Philipuss

Disco Diffusion Modifiers by Harmeet and Stephen Young - explores modifiers to change the style of the prompts, e.g. ‘by Salvador Dali’ , ‘futuresynth’ , ‘steampunk’…

Getting Intentional:

  • e.g. making a music video by matching lyrics to frames
  • entering the words of a song/poem as the prompts for a diffusion video
  • use original images/animations (or the work of others) to see something you already like in another style
    • e.g. this gif I like - ran through Disco with ‘bioluminescence’ as a prompt
  • Golden Ratio:

From this reddit post: The "golden ratio" is ((1+sqrt(5)/2) = approximately 1.618. The table below gives dimensions divisible by 64 (a requirement in DD) that are roughly at this ratio.

Hopefully, this table won't look wonky. I couldn't get the table feature to work.

e.g. 1024x640 - 1.600 ratio

e.g. 1664x1024 - 1.625 ratio

from Matthew McAteer
from Matthew McAteer

Artists:

To follow, for inspiration, news in the space…

SOMNAI - started Disco Diffusion

Gandamu - also pioneering Disco Diffusion

Heaven’s Last Angel - pytti pioneer

Chris Allen (zippy)- pioneering Disco and guide goat

Euclid - share prompts with full transparency (has a discord for it)

Stephen Young - posts great studies on variables

Roope Rainisto - does a lot - great source for inspiration on creative uses

Surea.i - does studies, I like their style

Sportsracer48 - created the pytti notebooks

KaliYuga - resonance pillar, love her style

There are a lot of others I probably missed, but these folks came to the top of my mind. People share and support actively on twitter, so there is no shortage of inspiration:)

Subscribe to Gustaf Palm
Receive the latest updates directly to your inbox.
Verification
This entry has been permanently stored onchain and signed by its creator.