Using artificial intelligence to create images - on your own PC - Daniel Springwalds Blog * Basteln mit Daniel

I had often read that AI can write texts or paint pictures. Until now, I had always assumed that this could only be tried via web services from (commercial) providers, such as GPT-3 or DALL-E.

I was therefore pleasantly surprised that there are also GANs that run on a home computer - provided the graphics card is powerful enough and the required drivers and toolkits, such as CUDA, are installed.

A few years ago, I had already successfully experimented with the related topic of "deep-fakes" with an even older graphics card. So I was in good spirits 😀.

On my first attempt in 2021, I wanted to go "all out" and train a GAN myself. It is important to know that training the model is usually the far more computationally intensive part; using the trained model later requires comparatively little computing power. My PC has a GeForce GTX 1080 Ti with 11 gigabytes of GPU memory. Although this card still performs quite well, it is mediocre at best for AI training. But that wasn't going to deter me, because I thought to myself: if the graphics card doesn't have enough power, you just have to wait a little longer.
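Before starting such an experiment, it can help to check which NVIDIA card and how much GPU memory the system actually reports. The following is only a minimal sketch and not part of the setup used for this post: it assumes the `nvidia-smi` command-line tool that ships with the NVIDIA driver is installed, and the function names `parse_gpu_lines` and `list_nvidia_gpus` are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_lines(csv_text):
    """Parse 'name, memory_in_MiB' lines into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma, in case a GPU name contains one.
        name, _, memory = line.rpartition(",")
        try:
            gpus.append((name.strip(), int(memory.strip())))
        except ValueError:
            continue  # skip lines without a numeric memory value
    return gpus


def list_nvidia_gpus():
    """Return detected NVIDIA GPUs, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # NVIDIA driver tools are not installed on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    for name, mib in list_nvidia_gpus():
        print(f"{name}: {mib / 1024:.1f} GiB of GPU memory")
```

If the script prints nothing, either no NVIDIA card is present or the driver tools are missing, and GPU-based training will most likely not work on that machine.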

A selfie AI of its own

My first idea was: why don't I train an AI to invent selfie photos of me - then I don't have to take my own selfies anymore 😉. Coincidentally, I had been doing just that for the last few years, namely (spurred on by a corresponding smartphone app) regularly taking selfies. As a result, I already had a lot of training material at my disposal.

The appropriate tool for this was a StyleGAN based on TensorFlow. Setting up such a GAN was surprisingly easy - although the first time I realized almost too late that installing all the Python dependencies system-wide "messes up" the PC. It is better to use virtual environments or the Anaconda tool.
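A virtual environment can be created with Python's built-in `venv` module, so that all dependencies end up in a separate folder. The following is a minimal sketch under assumptions: the helper name `create_project_env` and the folder name in the usage note are hypothetical, and which Python and TensorFlow versions a given StyleGAN implementation needs is not covered here.

```python
import venv
from pathlib import Path


def create_project_env(env_dir, with_pip=True):
    """Create an isolated Python environment in env_dir and return its path."""
    env_path = Path(env_dir)
    # with_pip=True bootstraps pip inside the environment, so that packages
    # are installed there instead of into the system-wide Python.
    venv.EnvBuilder(with_pip=with_pip).create(env_path)
    return env_path
```

Calling `create_project_env("stylegan-env")` creates the folder `stylegan-env` in the current directory. After activating the environment, `pip install` only affects that folder, and deleting the folder removes everything again.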

However, the waiting time during training was sobering, even though my PC was calculating at full power (and full fan speed!) day and night. Even after two days, the model could only produce very vague faces:

Selfie StyleGAN after 2 days of training

After two more weeks (!) you could already roughly see where it was going. However, the "glasses/no glasses" difference looked like it might become a problem. The results were starting to get a bit creepy...

Selfie StyleGAN after 2 weeks of training

Another week later, an interesting mix emerged: A few pictures were already pretty close to a real photo, while many still had quite obvious mistakes. The glasses, on the other hand, didn't seem to be a big problem after the first training results.

Selfie StyleGAN after 3 weeks of training

After another week of training, the run began to abort at seemingly random points with numerical calculation errors. This could be remedied by jumping back to an earlier saved state of the model and continuing the training from there. From then on, however, this problem hung over the training like a sword of Damocles and cost even more computing time than before. If you regularly find in the morning that the training stopped in the middle of the night without any result, it is a bit demotivating.
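The manual "jump back and retrain" routine can in principle be automated. The following toy sketch is not the StyleGAN training code and makes several assumptions: the training state is a plain dictionary with a `step` counter, `step_fn` is a hypothetical function that performs one training step and returns the loss, and a calculation error shows up as a loss that is NaN or infinite.

```python
import copy
import math


def train_with_rollback(state, step_fn, total_steps,
                        snapshot_every=100, max_rollbacks=10):
    """Run step_fn until total_steps; on a non-finite loss, resume from
    the last snapshot. Returns the number of rollbacks that were needed.

    state:   dict with everything needed to resume, including state["step"]
    step_fn: callable(state) -> loss, modifies state in place (hypothetical)
    """
    snapshot = copy.deepcopy(state)
    rollbacks = 0
    while state["step"] < total_steps:
        loss = step_fn(state)
        if not math.isfinite(loss):
            rollbacks += 1
            if rollbacks > max_rollbacks:
                raise RuntimeError("training keeps diverging; giving up")
            # Discard the broken state and continue from the last good one.
            state.clear()
            state.update(copy.deepcopy(snapshot))
            continue
        state["step"] += 1
        if state["step"] % snapshot_every == 0:
            snapshot = copy.deepcopy(state)
    return rollbacks
```

A smaller `snapshot_every` loses less work per rollback but spends more time saving. In a real setup the snapshots would be written to disk rather than kept in memory, so that the training also survives a crash of the whole process.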

As a proof of concept and for my own spirit of research, however, the result was enough for me, so I did not start a new training run lasting several weeks. The last successful model was able to create selfies like this one (after a total of 4.5 weeks of training). Still pretty creepy - but also amazing what an AI can learn on its own:

Selfie StyleGAN final result after 4.5 weeks of training

What I find exciting about this is that it is often said that the trained AI models do not allow any conclusions to be drawn about the training material. Instead, they would only "somehow" process the material abstractly and then create completely new things from what they have learned themselves, which would no longer have anything to do with the individual raw data. At least for this type of network, this does not seem to be entirely true from my point of view. In most of the pictures, I can also recognize the background of the picture: Both the whiteboard in the office covered with sticky notes and the blue Ikea shelf and the wallpaper in my study appear again and again recognizably. Which in itself seems logical, because the AI must also be able to generate this component of the image from its trained model.

Generate images by text input

I also find AIs that generate an image based on a text-only description exciting. Depending on the quality of the images, it is sometimes difficult to explain to someone that the computer has generated these images from training data on the basis of purely mathematical-statistical methods - and has not really "understood" what the given text means in terms of content.

Here, too, there are some programs and models that you can run locally and without an Internet connection on your own PC. With most of these AIs, you can also do things like "style transfer", i.e. have an existing image or photo redrawn in the style of a particular artist. Or you can retouch parts of an image by removing them and letting the AI regenerate the missing area.

Here I have dealt exclusively with the "create image from text" function. Each AI was asked to produce the following images:

French fries on the beach with sailboats in the background.
A cat laying on a car in front of a beautiful sunset.
A squirrel at a computer in a server room, with lots of colorful lights.
Nikola Tesla holding a battery on a hill during a thunderstorm.

Optionally, they were output as a painting and/or as a photo - depending on the ability of the AI.
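The prompt variants can be put together with a few lines of code. This is only a sketch: the wording of the two templates is modelled on the image captions further down, the exact prompts that were typed in are an assumption, and the names `STYLE_TEMPLATES`, `SUBJECTS` and `build_prompts` are made up for illustration.

```python
# Templates modelled on the image captions in this post (assumed wording).
STYLE_TEMPLATES = {
    "painting": "A beautiful painting of {subject}",
    "photo": "A photo of {subject}",
}

# The four motifs, written so that they fit after "of".
SUBJECTS = [
    "french fries on the beach with sailboats in the background",
    "a cat laying on a car in front of a beautiful sunset",
    "a squirrel at a computer in a server room, with lots of colorful lights",
    "Nikola Tesla holding a battery on a hill during a thunderstorm",
]


def build_prompts(subjects, styles=("painting", "photo")):
    """Return one prompt per (subject, style) pair, grouped by subject."""
    return [
        STYLE_TEMPLATES[style].format(subject=subject)
        for subject in subjects
        for style in styles
    ]


if __name__ == "__main__":
    for prompt in build_prompts(SUBJECTS):
        print(prompt)
```

For an AI that cannot produce photo-like output, `build_prompts(SUBJECTS, styles=("painting",))` yields only the painting variants.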

VQGAN-CLIP - creates rather abstract images

For my first attempts, I used VQGAN-CLIP, which is based on PyTorch. There are numerous pre-trained models available, such as vqgan_imagenet, wikiart_16384, sflckr or coco.

The results were rather abstract in style - perhaps best described as resembling the motifs of very artistic postcards:

French fries on the beach with sailboats in the background.

A beautiful painting of french fries on the beach with sailboats in the background by VQGAN-CLIP

A cat laying on a car in front of a beautiful sunset.

A beautiful painting of a cat laying on a car in front of a beautiful sunset by VQGAN-CLIP

A squirrel at a computer in a server room, with lots of colorful lights.

A beautiful painting of a squirrel at a computer in a server room, with lots of colorful lights by VQGAN-CLIP

Nikola Tesla holding a battery on a hill during a thunderstorm.

A beautiful painting of a Nikola Tesla holding a battery on a hill during a thunderstorm by VQGAN-CLIP

Latent Diffusion Models (LDM)

The next images were to be produced by Latent Diffusion Models, which are based on PyTorch and taming-transformers. LDMs can produce recognizable motifs, some of which look quite realistic. I find it interesting that some images also have a white, illegible caption or a kind of "Shutterstock" watermark.

Here are the results of the LDMs:

French fries on the beach with sailboats in the background.

A beautiful painting of french fries on the beach with sailboats in the background by Latent Diffusion Models (LDM)

A photo of french fries on the beach with sailboats in the background by Latent Diffusion Models (LDM)

A cat laying on a car in front of a beautiful sunset.

A beautiful painting of a cat laying on a car in front of a beautiful sunset by Latent Diffusion Models (LDM)

A photo of a cat laying on a car in front of a beautiful sunset by Latent Diffusion Models (LDM)

A squirrel at a computer in a server room, with lots of colorful lights.

A beautiful painting of a squirrel at a computer in a server room, with lots of colorful lights by Latent Diffusion Models (LDM)

A photo of a squirrel at a computer in a server room, with lots of colorful lights by Latent Diffusion Models (LDM)

Nikola Tesla holding a battery on a hill during a thunderstorm.

A beautiful painting of a Nikola Tesla holding a battery on a hill during a thunderstorm by Latent Diffusion Models (LDM)

A photo of Nikola Tesla holding a battery on a hill during a thunderstorm by Latent Diffusion Models (LDM)

Dall-E mini

DALL·E mini can also be run free of charge on your own PC. There is also a free online version on huggingface.co.

In my opinion, the generated images tend to be a little less successful than those of the latent diffusion models. The motifs often tend to be abstract or sometimes look like a child's drawing. Every now and then, however, there are almost photorealistic ones, like here with the cats.

French fries on the beach with sailboats in the background.

A beautiful painting of french fries on the beach with sailboats in the background by Dall-E mini

A photo of french fries on the beach with sailboats in the background by Dall-E mini

A cat laying on a car in front of a beautiful sunset.

A painting of a cat laying on a car in front of a beautiful sunset by Dall-E mini

A photo of a cat laying on a car in front of a beautiful sunset by Dall-E mini

A squirrel at a computer in a server room, with lots of colorful lights.

A beautiful painting of a squirrel at a computer in a server room, with lots of colorful lights by Dall-E mini

A photo of a squirrel at a computer in a server room, with lots of colorful lights by Dall-E mini

Nikola Tesla holding a battery on a hill during a thunderstorm.

A beautiful painting of a Nikola Tesla holding a battery on a hill during a thunderstorm by Dall-E mini

Disco Diffusion v5 Turbo

The last AI I tried was Disco Diffusion v5 for Windows.

The themes of the motifs are usually easily recognizable in the pictures, but they often become a bit psychedelic and then remind me of DeepDream: the motifs are worked into the structures of the picture several times, overlapping one another. In my experiments, the results (especially with the photo output) were nowhere near as realistic as those of the LDMs or Dall-E mini. However, some results like the "French Fries" have inspired me quite a bit - especially the paintings look quite artistic in my opinion.

Here are the results for Disco Diffusion v5 Turbo:

French fries on the beach with sailboats in the background.

A beautiful painting of french fries on the beach with sailboats in the background by Disco Diffusion v5 Turbo