As evidence continues to mount that we are living in a cyberpunk dystopia, I’ve decided to do a series of posts on artificial intelligence. This one is about text generation.
I’m something of an AI skeptic. While AI systems have proven to be very effective in some areas, these are mostly tasks with a narrow scope and a well-defined set of rules, such as playing chess and Go, or determining whether something is a picture of a cat. More complex tasks like driving have proven to be a lot harder. Driving in particular has very ambiguous inputs that are highly context-dependent, two things that AIs have trouble with.
(Let’s please continue reading and not get hung up on the political and ethical roadblocks to self-driving cars.)
AIs have also historically had trouble generating text and images. Not because these are magical tasks that only humans can perform; it’s simply been difficult to make computers good at them. Well, much to my surprise, this may be coming to an end in the next few years. And I don’t mean that in the “self-driving cars are always five years away” sense.
In the last couple of years, computerized content generation has made some remarkable advances. Deepfakes, videos that believably replace one person with somebody else, are proliferating and fairly easy to make. AIs are frighteningly good at generating faces now, too.
But what about text? Google Translate has gotten much more sophisticated lately as Alphabet has refined the neural networks that power it. But is translation more like writing, or more like playing Go? I’d say it’s somewhere in between.
Which brings us to writing. Earlier this year, the nonprofit group OpenAI built a text-generation neural network that was, in their opinion, too frightening to release. I called malarkey, but now that they’ve been gradually releasing more sophisticated versions of their model, I call… less malarkey. What this means for you, and some examples, below the fold.
This neural network, called GPT-2, was trained on 40GB of internet text (which is a lot). I downloaded and set up the newest version of their model to play with it, and the results were impressive. When given a prompt, it will output its best guess of what the next word should be, over and over, until you have something approaching prose. Here are some that I generated, with the seed text in italics.
Just moments ago, a ballistic missile was launched from the Tonghae Satellite Launching Ground in North Korea.
It was tracked as it made its way to its target in Japan.
Japan’s Defense Minister announced this morning that North Korea launched a ballistic missile that flew over Japan into the Sea of Japan.
The missile’s trajectory is the same as last time North Koreans fired a ballistic missile. The North Korean missile launch happened just hours after the UN Security Council voted unanimously to impose additional sanctions on the reclusive nation because of their nuclear and ballistic missile programs.
China, North Korea’s main ally, condemned the launch and called for “urgent efforts by the international community to properly implement the resolution,” according to a statement issued from the Foreign Ministry. The missile launch followed the US strike on Syria, which came after four of its air strikes were launched in retaliation to an alleged chemical weapons attack in the Syrian city of Khan Sheikhoun last week.
Supreme Court Justice Ruth Bader Ginsburg has died, her family reports. She was 88.
Ginsburg suffered her latest bout of complications in early June, her brother-in-law Ron Sachdeva said Tuesday morning. She was in intensive care following her surgery to relieve the pain from her latest bout of pneumonia, Sachdeva said.
“It is with a very heavy heart and with great sorrow that we say goodbye to our beloved Ruth,” Sachdeva told CNN.
She was brought back to the hospital to begin physical therapy in late June after being “sore for quite a while.”
“The treatment included a series of lung functions over the course of several weeks,” Sachdeva said.
A physician who was not authorized to speak publicly and requested anonymity told CNN last month that she was in critical condition.
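The generation loop described above, guess a likely next word, append it, repeat, can be illustrated with a toy model. This is a purely hypothetical bigram sketch for illustration; GPT-2 itself is a large Transformer trained on that 40GB of text, but its sampling loop has the same basic shape:

```python
import random
from collections import defaultdict

def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, seed, length=10, rng=None):
    """Repeatedly sample a likely next word and append it, GPT-2 style."""
    rng = rng or random.Random(0)
    out = seed.split()
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:  # dead end: no word ever followed this one
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

# Tiny made-up corpus, standing in for 40GB of internet text.
model = train_bigrams(
    "the missile was launched from the site and the missile flew over the sea"
)
print(generate(model, "the missile"))
```

The real thing conditions on the entire prompt rather than just the last word, which is why its output hangs together over whole paragraphs instead of a few words at a time.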
Are they perfect? No. Do they always come out this well? No. But, when they do: Are they good enough to trick people who only skim them? Good enough to rile up the gullible or already-convinced? Good enough to generate natural-sounding tweets and website comments? Good enough to run an ongoing confusion & disinformation campaign?
Unfortunately, yes. Add the fact that it’s trivial to generate an endless supply of these, and we can see the problem. It’s already creating content that’s better than some of what comes out of the Russian propaganda mills.
These models are only going to get more sophisticated as time goes on. (Indeed, this model already is more sophisticated; they just haven’t released the whole thing.) Neural networks leave statistical fingerprints all over the text they create, and it’s easy to detect GPT-2 output, but who’s really going to go out of their way to do so?
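One such fingerprint: generated text leans heavily on words the model itself rates as likely. Here is a toy sketch of that idea, a rank-based check in the spirit of detectors like GLTR; the bigram “model” is purely illustrative, not how real detectors score GPT-2:

```python
from collections import defaultdict

def bigram_counts(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def mean_rank(counts, text):
    """Average rank of each word among the model's next-word predictions.
    Text the model itself would produce scores near 1; less predictable,
    more human-like text scores higher."""
    words = text.split()
    ranks = []
    for prev, nxt in zip(words, words[1:]):
        ordered = sorted(counts[prev], key=counts[prev].get, reverse=True)
        ranks.append(ordered.index(nxt) + 1 if nxt in ordered
                     else len(ordered) + 1)
    return sum(ranks) / len(ranks)

model = bigram_counts("the cat sat on the mat the cat sat on the rug "
                      "the dog sat on the mat")
# A sentence built from the model's top picks scores lower (more
# machine-like) than one with surprising word choices.
print(mean_rank(model, "the cat sat on the mat"))   # low average rank
print(mean_rank(model, "the dog sat on the rug"))   # higher average rank
```

The catch is the one the post names: a check like this only catches anything if somebody bothers to run it.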
We aren’t in a world of endlessly-scalable automated propaganda yet… but we’re getting to the point where I can see it on the horizon.
(If you’d like to play with a smaller version of this model yourself, you can try it at this website.)