One aspect of AI that gets a lot of attention is image generation. This makes sense–it’s flashy; the results are easy to see at a glance; it’s lightly horrifying because image generation is supposed to be a human thing, damn it! Here’s an example from a popular project. This person does not exist:
The latest craze in this area is text-to-image. The neural networks for this are evolving rapidly, and lately have been producing some pretty impressive results.
VQGAN + CLIP “painting of small cabin in the middle of snowy mountains in the winter at night in the style of disney trending on artstation | unreal engine” pic.twitter.com/NORfb5ZEIg
— AK (@ak92501) June 19, 2021
How does this software work? What can its eccentricities tell us about AI? About the future? Join me below the fold for a discussion of Generative Adversarial Networks, or GANs…
The simplest GAN is actually two neural networks pitted against each other. One, the generator, creates the images; the other, the discriminator, judges whether each image looks real or generated. Seems simple enough. But how is such a pair trained? Let’s use the This Person Does Not Exist GAN as an example.
We begin with a dataset of head shots and two blank-slate networks. Then… it’s a bit involved. First we train the discriminator to recognize head shots, just like you’d train any other image classifier. Next, we have the generator make some images, and train the discriminator further by telling it that these are fake. An earlier post gets into what this process looks like. It is as straightforward and as well-understood as these things come. And then we flip the script: we train the generator, using the discriminator’s verdicts as feedback, until its images can fool the discriminator into thinking they’re head shots.
We repeat this whole process again and again and again. At the end, we have an AI that can generate head shots.
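To make that loop concrete, here is a minimal sketch in plain Python. It is a toy of my own construction, not the code behind This Person Does Not Exist: the “images” are single numbers drawn from a bell curve centred on 4, the generator is one learnable shift (b) added to random noise, and the discriminator is a two-parameter logistic scorer (w and c). Those names, the learning rate, and the step count are all assumptions picked for illustration, and the generator update uses the common “fool the discriminator” loss, -log D(G(z)).

```python
import math
import random


def sigmoid(t):
    """Squash a real-valued score into a probability in (0, 1)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)  # stable form for negative scores
    return e / (1.0 + e)


def discriminate(x, w, c):
    """Toy discriminator: estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def discriminator_step(w, c, real, fake, lr):
    """One gradient step on the loss -log D(real) - log(1 - D(fake))."""
    n = len(real)
    grad_w = grad_c = 0.0
    for xr, xf in zip(real, fake):
        d_real = discriminate(xr, w, c)
        d_fake = discriminate(xf, w, c)
        # Gradient w.r.t. the pre-sigmoid score is (D - 1) for a real
        # sample and D for a fake one; the score is w * x + c.
        grad_w += (d_real - 1.0) * xr + d_fake * xf
        grad_c += (d_real - 1.0) + d_fake
    return w - lr * grad_w / n, c - lr * grad_c / n


def generator_step(b, w, c, noise, lr):
    """One gradient step on -log D(G(z)), with toy generator G(z) = z + b."""
    grad_b = 0.0
    for z in noise:
        d_fake = discriminate(z + b, w, c)
        # Chain rule: d(-log D)/d(score) = D - 1, and d(score)/d(b) = w.
        grad_b += (d_fake - 1.0) * w
    return b - lr * grad_b / len(noise)


def train(steps=2000, batch=64, lr=0.05, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates; return (w, c, b)."""
    rng = random.Random(seed)
    w, c, b = 0.0, 0.0, 0.0  # two blank-slate "networks"
    for _ in range(steps):
        real = [rng.gauss(real_mean, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]
        # 1. Teach the discriminator: these are real, those are fake.
        w, c = discriminator_step(w, c, real, fake, lr)
        # 2. Flip the script: nudge the generator toward fooling it.
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        b = generator_step(b, w, c, noise, lr)
    return w, c, b


if __name__ == "__main__":
    w, c, b = train()
    print(f"generator shift b = {b:.2f} (real data is centred on 4.0)")
```

In a toy like this the shift should drift away from 0 and toward the real data’s centre, but GAN training is known to wobble, so don’t read too much into any single run. A real image GAN swaps these three numbers for deep networks with millions of weights; the back-and-forth is the same.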
As is common in this field, narrow-purpose networks tend to perform better. You can compare the two outputs above: the dedicated head shot generator creates images that are almost indistinguishable from the real thing, at least without close inspection. Conversely, the more general text-to-image network, called CLIP + VQGAN, is impressive for what it is, but obviously has a long way to go.
One interesting thing about CLIP + VQGAN is how much better it works when you find the right magical incantations. In the tweet above, you can see that the prompt is “painting of small cabin in the middle of snowy mountains in the winter at night in the style of disney trending on artstation | unreal engine”–this is because adding “trending on artstation” and “unreal engine” steers the network in a more painterly and recognizable direction. Working with it is like trying to extract the right wish from a deaf genie, to paraphrase some tweet I saw. This is because the guts of a neural network are ineffable; it attaches meaning to things in a way that we do not, or at least, in a way that it cannot tell us about. (I’ve covered this in some other posts on this topic, but I love this as an illustration.)
So, where do we go from here? Obviously CLIP + VQGAN won’t be taking over the art world any time soon, but you can see how it makes a great resource for visual artists, just as text generation can be a great resource for writers.
Here’s another good one:
— david gallay (@svengali) July 16, 2021
#LatentVisions is another network that’s doing pretty impressive stuff.
Just for fun, I also asked the text generation AI GPT-3 about GANs. The seed text is the question that opens each paragraph:
What is a GAN? A GAN, or Generative Adversarial Network, is a machine learning model that is used for generating realistic looking images, audio, and text. GANs can be used to create more convincing fake data for training a machine learning model. GANs consist of two neural networks, a generator and a discriminator. The generator attempts to learn how to output realistic looking images. The discriminator attempts to learn how to distinguish between real images and fake ones.
What are the benefits of GANs? GANs have several benefits that make them attractive to researchers and companies. They can be trained with unsupervised learning, and they can generate new images based on existing ones. They can be trained with unsupervised learning, and they can generate new images based on existing ones. They can be trained with a smaller dataset than required for generative models.
What are the drawbacks of GANs? GANs are computationally expensive, requiring two neural networks (one generator and one discriminator) to be trained. They also suffer from unstable gradients and mode collapse. The recent work in this area is interesting and worth looking into.