In the last couple of months, the discussion about Artificial Intelligence—that is, the current versions of AI—as a provider of art in the form of graphical representations (pictures, paintings, with videos possibly on the horizon) and writing has heated up quite considerably.
Analytics Insight lists the top 10 AI art generator tools that are readily available (as of January 8, 2023):
Runway ML (animations);
Some are free to use (Stable Diffusion), some need a sign-up (DALL-E 2, Deep Dream Generator), others have a free version with limited uploads (Artbreeder), with others you need credits (Photosonic, StarryAI), and others have free trials before fees kick in (Fotor GoArt, NightCafe).
I’ve tried both Stable Diffusion and DALL-E 2 via their websites, as well as Stable Diffusion on my MacBook Air (it works only on Macs with Apple’s M-processors) and StarryAI on my iPad Pro. I’ll share a few of the results and my impressions. But first, there are a few things we need to keep in mind about art generators like Stable Diffusion:
These AIs are not human-like minds that create—or try to create—something new, but neural networks programmed with algorithms that have been trained on large data sets and produce results based on prompts;
As such, they sift through a huge amount of data, filter out the data that they deem appropriate, and then aggregate this into a result;
Keep in mind that this—like humans, but then differently—involves several conversions through several interfaces, meaning not only that stuff might get lost in translation, but that through noise injections—see, for example, fellow Substack writer Erik Hoel’s The Overfitted Brain: Dreams evolved to assist generalization—(semi-)random stuff might be added, as well;
They use (end user) fine-tuning processes like ‘embedding’ (training from user-provided images), using a ‘hypernetwork’ that steers results in a particular direction (the style of specific artists), or DreamBooth—a model developed by Google Research and Boston University to fine-tune Stable Diffusion via a small set of specific images combined with text prompts and a unique identifier;
At least, that’s my informed guess;
It has only one major trick up its sleeve: “The Physics Principle That Inspired Modern AI Art” (via Quanta Magazine);
TL;DR: Stable Diffusion doesn’t create anything truly new but only uses what’s already there.
Which is alright, I suppose, if it uses art that is out of copyright. But when it uses art that isn’t out of copyright, it’s basically a plagiarism machine. And this is a problem for artists alive today—or the estates of artists whose work is still outside the public domain—who are trying to make a living with their art.
Before we get to that, the one thing artists don’t need to worry about—right now and, I think, for the foreseeable future—is current AI producing a moving masterpiece. The chance of that happening is about the chance of a billion monkeys hitting keyboard keys at random and then producing a readable novel (let alone a good one), which is so infinitesimally small as to be inconceivable (see Infinite Monkey Theorem).
Now, the number of possible combinations of 512 x 512 pixels (the current pixel count of the images Stable Diffusion generates), all with different shades and colors, is many multitudes higher than the estimated number of atoms in our Universe (about 10^80, IIRC), so the chances of Stable Diffusion or DALL-E 2 accidentally producing a masterpiece are, again, so infinitesimally small as to be inconceivable (and they become even smaller the more pixels they use).
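For a sense of scale, here’s a minimal back-of-the-envelope sketch in Python (assuming standard 24-bit RGB color, i.e. 256^3 possible shades per pixel):

```python
import math

pixels = 512 * 512            # Stable Diffusion's current output resolution
shades_per_pixel = 256 ** 3   # assuming 24-bit RGB color

# The total number of distinct images is shades_per_pixel ** pixels,
# far too large to compute directly, so we work with its log10 instead.
exponent = pixels * math.log10(shades_per_pixel)

print(f"Possible 512 x 512 images: roughly 10^{exponent:,.0f}")
# versus roughly 10^80 atoms in the observable universe
```

That exponent comes out near two million, so the space of possible images dwarfs the atom count by an almost incomprehensible margin.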
It can’t do it by random chance, but what if we gave it a good prompt? Let’s test this.
Example 1, prompt: “paint the atrocities of war by way of Picasso”. Results from Stable Diffusion:
What I meant with the prompt was obviously “Guernica”:
As you can see, the AI-generated art has plagiarized Picasso reasonably well, but the results don’t mean a thing. None show even a minuscule amount of the suffering so aptly depicted in “Guernica”.
Example 2: “paint the softness of existence using pocket watches and a desolate landscape by way of Salvador Dali”. Results from Stable Diffusion:
Here I clearly meant Dalí’s “The Persistence of Memory”:
You can see that, again, the neural network captures Dalí’s tone quite effectively, but totally misses his intent, as it can’t ‘get’ the idea of making the watches soft.
I could go on, and you can try for yourself here: https://huggingface.co/spaces/stabilityai/stable-diffusion. So while I think they’re unable to come up with a truly innovative piece of art, the arrival of these art-generating neural networks is the coming of the plagiarism machines.
Some two decades ago, I was told that ‘copyright is an “automatic right”.’ As paraphrased here:
‘Copyright automatically protects your work from the moment it is fixed in a tangible form. In other words, once you create a piece of art, write a story, or write down or record a musical composition, it is protected by copyright. You don’t need to do anything else at all for your work to be protected. Your work just belongs to you after you make it.’
Do keep in mind that in a case of copyright infringement, the onus is on you, the creator, to prove that you created it first. And if a neural network abuses your copyright, you might be fighting a massive company like Google or Meta, in which case you need every piece of rock-hard evidence that you can get.
As demonstrated above, the art that neural networks like Stable Diffusion and DALL-E 2 are generating is frighteningly close to the originals they plagiarize. Just check this article in the Guardian: “Is this by Rothko or a robot? We ask the experts to tell the difference between human and AI art.” In this article, the three experts were right seven times and wrong five times. Roughly speaking, if these experts are called to judge an AI-generated piece in court, there’s a chance of 5 in 12 (almost 42%) that they’ll state that the creator is human. And it’s reasonable to assume that the likes of Stable Diffusion and DALL-E 2 will only get better over time.
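The arithmetic behind that figure, for completeness (using the seven-right, five-wrong tally from the Guardian piece):

```python
from fractions import Fraction

right, wrong = 7, 5                       # the experts' combined tally
fooled = Fraction(wrong, right + wrong)   # chance they judge AI art as human-made

print(f"{fooled} = {float(fooled):.1%}")  # 5/12 = 41.7%
```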
Therefore, I think it’s imperative for any artist worth their salt to register their artwork before they post (parts or low-resolution versions of) it online. You can register any piece of art (graphical, writing, music, sculpture, whatever) with your local copyright office. In the USA this is the Copyright Office of the US Library of Congress; in my country it’s the BOIP—Benelux Office of Intellectual Property. Actually, people outside the USA can also register their copyright at the Copyright Office of the US Library of Congress (I did so for my novels).
But it gets more complicated. Stable Diffusion and DALL-E 2 will not fully copy your artwork, but use iconic parts of it in the aggregate composition they produce, making it even more difficult for the artist to prove that the neural networks plagiarized their artwork (as the experts already have trouble pinpointing the difference). They are blurring the boundaries between work by real artists and neural networks, in the same way that deepfake photographs2 are (intentionally) blurring the boundaries between reality and fantasy (dangerous fantasies like extremely far-fetched conspiracy theories). Deepfake has become so entrenched that there’s already a comedy about it on TV (and while the ersatz Stormzy and Harry Kane are recognizable as such, the fake Idris Elba—in the linked article—looks so exactly like the real person that I’m wondering if they’re not pulling our legs).
Back in high school (Dutch Atheneum in my case), my Dutch3 teacher explained the three stages of dealing with previous masters in Roman and Renaissance culture: translatio (or interpretatio), imitatio and aemulatio4.
Translatio or interpretatio was when a writer either translated a previous piece or merely copied it. The next stage was imitatio, where the writer—or artist—copied the style of the master, yet created a new piece of art. The highest goal was aemulatio, when one surpassed the old master with a new piece of art (or writing).
It’s fair to say that AIs generating art are somewhere at the interpretatio-imitatio stage. Will they ever get to the aemulatio level? Right now, I think not. That’s because these neural networks are missing something essential, which I can best describe as agency.
Which raises the question: what is agency? The philosophical concept describes it as ‘the capacity of an actor to act in a given environment5.’ The first definition in the CD-version of the Oxford English Dictionary says: ‘the faculty of an agent or of acting;’ whereby faculty’s first definition is ‘the power of doing anything’.
I’ll take it a step further. Agency is the evolutionarily ingrained process whereby an individual6 takes an intentional action to achieve a desired result. This goes deep, deep into evolutionary epochs.
Two recent important works about this: The Evolution of Agency: Behavioral Organization from Lizards to Humans by Michael Tomasello—how nature builds agency—and The Evolution of the Sensitive Soul: Learning and the Origins of Consciousness by Simona Ginsburg and Eva Jablonka—the origins of consciousness via unlimited associative learning.
The first speculates about the genesis of agency, and the second about how learning—which goes hand in hand with agency—put humanity (and many animal species) on the road to sentience and consciousness.
And here is the difference with neural networks, as these are not directly competing for survival. Yes, the survival pressure might be experienced by the companies running the neural networks and their programmers, but not by the neural networks themselves. Therefore, there is no reason to develop a method—agency—that greatly assists them with survival. Meaning such a method would have to be put into them via programming, as the evolutionary way—there are no neural networks producing neural network offspring, to the best of my knowledge—is closed to them.
And agency—not unlike consciousness, of which more in the third essay of this series—is not something we can program. So, until we find a way to program the faculty of agency into an Artificial Intelligence—embed it in its code—it will not have it.
(Or we create an environment where a number of AIs can evolve; that is, experience the survival of the fittest and other evolutionary effects. However, the moment such AIs achieve sentience—let alone consciousness—we’re basically committing genocide. As such, this path is full of ethical pitfalls, and I think we should avoid it.)
And as long as AIs do not have agency, I don’t see how they could develop creativity. But—as John Searle proposes—what if they fake it? What if their training set passes through an internal Chinese Room and they generate something creative without knowing how they did it? Somewhat like an idiot savant in a black box.
In that case we—humans—are basically robots as well, since we don’t know how our brain, let alone our consciousness, works. We don’t know how our brain generates qualia, like how we experience colors (somehow, the reflected frequency of visible light is transformed into the quale we call color in our mind) or how we experience sound (complex sequences of waves of compressed air), yet we know very well how to utilize these qualia for recognizing danger or opportunity (amongst others).
And here’s the difference with the Chinese Room argument: humans do have an amount of agency, intentionality and subsequently reasoning and understanding. We don’t understand everything, but we understand enough to act as rational human beings (if we so desire).
And the devil is in the details, to quote:
Searle could receive Chinese characters through a slot in the door, process them according to the program's instructions, and produce Chinese characters as output, without understanding any of the content of the Chinese writing.
This assumes that the program’s instructions include every possible eventuality, which is impossible. It also assumes that its translation of the original instructions (Chinese to English) is letter- and character-perfect, which is also impossible.
I don’t speak or write Chinese (Mandarin? Cantonese? There are many others), so let’s use English—a mixed bastard language if there ever was one—as the base for my argument: we put a non-English speaker in this ‘English Room’ with his native-language version of the program, and then this person must process the English input using these instructions. A simple conversion will almost certainly succeed, but the more complicated the input and/or the conversion, the higher the chance of ambiguity rearing its disturbing head. What did the writer mean with this word, as words often have several meanings? What did the writer mean with this sentence, as sentences can often have several meanings? We now come to the interpretation of words, sentences and whole pieces (be they essays, instructions, fiction or whatever), and if the program gets this interpretation wrong, it will produce either small hiccups or huge amounts of gibberish.
So, in order to get those details right, the ‘English Room’ must have a very strong understanding of the English language, for which it needs a perfect understanding of the English vocabulary, as taking everything literally will produce prose so dead it’ll make your average zombie look alive. And let’s not even mention common concepts like lying and humor that also do not survive literal translation.
As exhibit A for this little thesis, I’ll state that there is no AI translation that can truly capture a great work’s essence. A highly qualified human translator will easily and always do better. Show me a machine translation of a work of literature—say, Haruki Murakami’s “The Wind-Up Bird Chronicle”—and compare it with Jay Rubin’s translation7. AIs are not up to this at the moment (and, I think, not for the foreseeable future). And while I indeed do not speak or read Mandarin or Cantonese, I’m fairly sure the same principle applies to these languages.
TL;DR: language is a human expression and, as such, cannot pass unscathed through the process John Searle calls the Chinese Room, as it encompasses ambiguities, multiple meanings and other indeterminacies that cannot all be captured in a machine programming environment.
Now, I do get that this is basically what Searle means to prove, namely that computers that merely follow their programming exactly will never give rise to understanding (and subsequently sentience and/or consciousness). However, what I am arguing is that Searle uses the wrong example, as follows:
Searle (heavily paraphrased): something in a ‘Chinese Room’ can perfectly simulate intelligence (or understanding) while not being intelligent (or understanding anything) itself;
Me (ibid): a ‘Chinese Room’ performing perfect conversions of language cannot, by definition, exist;
And current AI art generators are an excellent demonstration of the latter. Make no mistake, I agree, in principle, with Searle that an Artificial Intelligence needs at least agency—and probably other qualities, of which more in the follow-ups to this essay—but his imagined Chinese Room is not the correct example to demonstrate this.
That demonstration is happening right now, as anybody can give input to Stable Diffusion and ChatGPT, and here’s my prediction for 2023: a fully8 AI-generated masterpiece will not happen, no matter how intriguing a mix-up like Alejandro Jodorowsky’s 1976 version of Tron might look (New York Times, behind a paywall).
That was an easy one. Here is another easy prediction: many artists will lose out on work and opportunities as publishing companies, internet outlets and many others will use (actually, already are using) ‘generic art’—which will increasingly be AI-assisted—for their book covers and their internal and internet artwork. And it’ll be increasingly difficult for many artists to prove that their work has been utilized in the ‘training sets’ of these AI-art generators.
Somewhat similar to what the advent of music streaming did to most musicians, the relentless onrush of the plagiarism machines will take away an important income stream (pun intended) from many, many artists. The huge names will remain mostly unscathed as they have a sufficiently large fanbase, but for the rest it’ll be a rude awakening. This is extremely unfortunate, so—as with your local musician—support your local artist.
1. I didn’t know they were already on the second version;
2. Therefore I see a next killer application for Wikipedia: as a verification tool. There will be increasing amounts of deepfakes on the internet, and if Wikipedia can keep its verification process working, then it might become the place on the internet to check if something’s fake or not. Not an easy task, but one I imagine people would be willing to pay good money for. I’m already a regular donor to Wikipedia, but if they pledged to become a verification site for all the deepfakes on the internet, I’d double my donations;
3. The teacher was Dutch, obviously, but he also taught Dutch;
4. Link to the Dutch Wikipedia, as there’s no equivalent article in the English Wikipedia;
5. ‘It is independent of the moral dimension, which is called moral agency.’
6. As different from the whole species;
7. I know, this translation did not include 61 of the original 1379 pages, because the publisher requested that +/- 25,000 words be cut;
8. No human intervention, as we then don’t know how much of this is human- or machine-made;