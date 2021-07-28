Come for the politics, stay for the snark.

Artificial Intelligence & You: A Picture Worth A Dozen Words

One aspect of AI that gets a lot of attention is image generation. This makes sense–it’s flashy; the results are easy to see at a glance; it’s lightly horrifying because image generation is supposed to be a human thing, damn it! Here’s an example from a popular project. This person does not exist:

The latest craze in this area is text-to-image. The neural networks for this are evolving rapidly, and lately have been producing some pretty impressive results.

How does this software work? What can its eccentricities tell us about AI? About the future? Join me below the fold for a discussion of Generative Adversarial Networks, or GANs…

The simplest GAN is actually two neural networks pitted against each other. One, the generator, creates the images; the other, the discriminator, judges whether each image looks real or generated. Seems simple enough. But how are these networks trained? Let’s use the This Person Does Not Exist GAN as an example.

We begin with a dataset of head shots and two blank-slate networks. Then… it’s a bit involved. First we train the discriminator to recognize head shots, just like you’d train any other image classifier. Next, we have the generator make some images, and train the discriminator further by telling it that these are fake. An earlier post gets into what this process looks like. It is as straightforward and as well-understood as these things come. And then we flip the script, training the generator until its images can fool the discriminator into thinking they’re real head shots.

We repeat this whole process again and again and again. At the end, we have an AI that can generate head shots.
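For the curious, here is a minimal sketch of that back-and-forth in plain Python. It is a toy, not how This Person Does Not Exist is actually built: the "images" are single numbers drawn from a bell curve centered on 4.0, the generator is a straight line (a*z + b), and the discriminator is a one-input logistic classifier. The function name, the learning rate, the batch size, and the target distribution are all made up for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically safe logistic function: never calls exp() on a large positive number.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, batch=16, lr=0.05, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Hypothetical toy setup: "real" samples come from N(4.0, 0.5).
    Generator:      fake = a * z + b, with z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated chance x is real
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in noise]

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0,
        # i.e. descend the loss -log D(real) - log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += -(1.0 - d) * x
            grad_c += -(1.0 - d)
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w += d * x
            grad_c += d
        w -= lr * grad_w / batch
        c -= lr * grad_c / batch

        # Generator step ("flip the script"): adjust a and b so the updated
        # discriminator rates the fakes as real, i.e. descend -log D(fake).
        grad_a = grad_b = 0.0
        for z in noise:
            d = sigmoid(w * (a * z + b) + c)
            grad_a += -(1.0 - d) * w * z
            grad_b += -(1.0 - d) * w
        a -= lr * grad_a / batch
        b -= lr * grad_b / batch

    return (a, b), (w, c)
```

Calling train_toy_gan() returns the generator's (a, b) and the discriminator's (w, c). With these made-up settings, b should drift up from 0 toward the real data's center, though it may wobble around it rather than settle. A discriminator this simple can only notice where the numbers sit, not how spread out they are, which is one reason real GANs use much bigger networks.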

As is common in this field, narrow-purpose networks tend to perform better. You can compare the two outputs above: the dedicated head shot generator creates images that are almost indistinguishable from the real thing, at least without close inspection. Conversely, the more general text-to-image network, called CLIP + VQGAN, is impressive for what it is, but obviously has a long way to go.

One interesting thing about CLIP + VQGAN is how much better it works when you find the right magical incantations. In the tweet above, you can see that the prompt is “painting of small cabin in the middle of snowy mountains in the winter at night in the style of disney trending on artstation | unreal engine”. Adding “trending on artstation” and “unreal engine” steers the network in a more painterly and recognizable direction. Working with it is like trying to extract the right wish from a deaf genie, to paraphrase some tweet I saw. This is because the guts of a neural network are ineffable; it attaches meaning to things in a way that we do not, or at least, in a way that it cannot tell us about. (I’ve covered this in some other posts on this topic, but I love this as an illustration.)
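A note on that “|” in the prompt. In several of the publicly shared CLIP + VQGAN notebooks, the pipe separates sub-prompts that are each scored against the image; whether the tweet's author used one of those notebooks is an assumption on my part. Under that assumption, the splitting step looks roughly like this (split_prompt is a hypothetical helper name, not any notebook's actual function):

```python
def split_prompt(prompt, separator="|"):
    """Split a prompt into trimmed, non-empty sub-prompts.

    Assumption: the tool treats `separator` as a boundary between
    sub-prompts, as several shared CLIP + VQGAN notebooks do.
    """
    return [part.strip() for part in prompt.split(separator) if part.strip()]


prompt = (
    "painting of small cabin in the middle of snowy mountains in the winter "
    "at night in the style of disney trending on artstation | unreal engine"
)

# Each sub-prompt would then be compared against the image separately.
for sub_prompt in split_prompt(prompt):
    print(repr(sub_prompt))
```

Read this way, “unreal engine” is its own target for the image to match, while “trending on artstation” rides along with the description of the cabin.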

So, where do we go from here? Obviously CLIP + VQGAN won’t be taking over the art world any time soon, but you can see how it makes a great resource for visual artists, just as text generation can be a great resource for writers.

Miscellany

Here’s another good one:

#LatentVisions is another network that’s doing pretty impressive stuff.

Just for fun, I also asked the text generation AI GPT-3 about GANs. Seed text in bold:

What is a GAN? A GAN, or Generative Adversarial Network, is a machine learning model that is used for generating realistic looking images, audio, and text. GANs can be used to create more convincing fake data for training a machine learning model. GANs consist of two neural networks, a generator and a discriminator. The generator attempts to learn how to output realistic looking images. The discriminator attempts to learn how to distinguish between real images and fake ones.

What are the benefits of GANs? GANs have several benefits that make them attractive to researchers and companies. They can be trained with unsupervised learning, and they can generate new images based on existing ones. They can be trained with unsupervised learning, and they can generate new images based on existing ones. They can be trained with a smaller dataset than required for generative models.

What are the drawbacks of GANs? GANs are computationally expensive, requiring two neural networks (one generator and one discriminator) to be trained. They also suffer from unstable gradients and mode collapse. The recent work in this area is interesting and worth looking into.

Not bad!

    13 Comments

    1.

      West of the Cascades

      What are the drawbacks of GANs? GANs are computationally expensive

      Once Bitcoin is banned, all those mining computers can be repurposed! Pretty interesting and slightly unsettling stuff.

    2.

      VeniceRiley

      Great. Make me an audio conversation of Barack Obama asking Hillary Clinton to send more baby blood in the style of cell phone audio.

      Q conspiracies will get amazing

    4.

      Old School

      @VeniceRiley

      If you want Q conspiracies to get amazing, have Obama claim that Mike Pence finished the last of the previous batch.

    5.

      citizen dave

      Interesting!

       

      Will we eventually get to the place where each of us (in the first world) will be able to imagine any type of art, movie, song, etc.?  A bit off topic, but as a music fan I’ve wondered if there will come a time when you can have an artist play any song in any style, etc.  And any combinations of bands.

      Or will Idiocracy prevail?

    7.

      MattF

      This is a science thread… so I can link to this astonishing post from Derek Lowe describing how plants were ‘persuaded’ to produce a human enzyme that greatly increased the plants’ productivity. I really got that ‘IT’S ALIVE’ vibe from this one.

    8.

      NotMax

      The perfect stocking stuffer! Hours of fun for the whole family! It’s “Infinite Number of Monkeys in a Box.”

      By Marx*.

      *that last gratuitously thrown in to tickle the olds’ nostalgia bone

    10.

      Elizabelle

      Worried.  Bob Odenkirk was hospitalized last night in ABQ after collapsing on the set of Better Call Saul’s last season.  And no news since.

      And just checked TMZ for any update, and Dusty Hill, ZZ Top’s bassist, has just died.  Damn.

      Sorry to be the bearer of bad news here.

    11.

      The Moar You Know

      The fake young lady above, well, that level of fakery has been around for a while and one area in which it’s being used is, of course, ‘simulated’ child porn.  Which is not illegal in the US.

      So far, detailed digital forensic analysis can show the difference.   The human eye cannot.

      Courts are going to get a workout with that shit.

    12.

      WhatsMyNym

      At first glance the photo of the girl looks realistic. At second, it just looks creepy.

