Copyright in the Time of AI

Another reason to register your copyright

Feb 24, 2025

If you’re a writer, should you register your copyright, especially for a novel?

Officially—to the best of my knowledge and internet research—‘copyright is automatic’; that is:

“A literary work is copyrighted as soon as the words are written down, typed, or otherwise recorded. Therefore, copyright protection for your novel begins as soon as you have written it.”

So far, so good. However, as LegalZoom states: “You cannot sue someone for copyright infringement unless you have registered your copyright.”

Wait what? I thought that this—protection from infringement—was one of the most important aspects of copyright. So I checked the US Copyright Office website:

Registering a work is not mandatory, but for U.S. works, registration (or refusal) is necessary to enforce the exclusive rights of copyright through litigation. Timely registration also allows copyright owners to seek certain types of monetary damages and attorney fees if there is a lawsuit, and also provide a presumption that information on the registration certificate is correct.

TL;DR: your unregistered, ‘automatic’ copyright allows you to reproduce the work (make more copies of it) and distribute these copies by sale or other transfer like rental, lease or lending[1]. But without registration, it does not let you sue over copyright infringement!

Now some may argue that the chances of this happening are so close to zero as to be virtually indistinguishable from it. I disagree, for two reasons:

1. In the extremely unlikely case that a person or a company infringes my copyright, I want to be able to sue them;
2. How do I know my novel(s) have not already been used as training data for chatbots like ChatGPT, Google Gemini, Grok and the like?

With regard to point 2: this has happened before. To refresh your memory (or to inform you, in case you didn’t know): a US court ruled in favour of ‘fair use’ in the case of Authors Guild v. Google in 2015. When Google set up Google Books, it scanned and digitised over twenty million books without permission or license. Google Books allows users to search the scanned text of these literary works and view ‘snippets’ of the text in the user’s search results.

Quote (via the Copyright Alliance):

Under the first fair use factor, the Second Circuit found that Google’s scanning and uploading of entire books for text-searching purposes was “highly transformative” compared to the purpose of the original books because Google’s purpose was to convey information about the books, not to convey the books themselves. Additionally, the court held that there was not enough “amount and substantiality” of the original text used in the project because Google limited the amount of text that it displayed to users, and because the public could not access the complete digitized copies. Notably, the court also found that the limited text made available by Google did not provide a significant market substitute for the underlying works, as the use “augment[ed] public knowledge by making available information about [p]laintiffs’ books without providing the public with a substantial substitute for matter protected.” Therefore, the fourth factor also weighed in favor of a fair use finding.

Which means that ten years later, those authors with their work(s) in Google Books (my first novel Forever Curious included) are now royally screwed. Why?

In these days of companies scraping the internet for training material for their AIs, how high are the chances that Google used these twenty million books as training material for Google Bard and subsequently Google Gemini? Officially, they shouldn’t have, but that didn’t stop Google from filling its Google Books library with twenty million books without asking for permission.

Anyway, I asked Google Gemini directly if it could analyse my novel Forever Curious, and here’s the answer:

Unfortunately, I don't have access to the full content of books available on Google Books. Copyright restrictions prevent me from directly reading and analyzing copyrighted material. Therefore, I cannot provide a detailed analysis of "Forever Curious" by Jetse de Vries at this time.

So I asked: “Was the contents of Google Books part of your training data?”

Large language models like me are trained on a massive dataset of text and code. While that dataset is extensive, it doesn’t include direct access to the full, copyrighted content of every book on Google Books. Training data is gathered from publicly available sources, and copyrighted material is generally excluded.
Think of it this way: I might have learned about common tropes in science fiction novels, or the typical structure of a mystery, from various sources. I might even have encountered snippets of text from books in publicly available contexts (like reviews or articles). But I haven’t “read” every book on Google Books in its entirety as part of my training. That would be a copyright infringement.
So, while the existence of books and general information about them might be part of the dataset (titles, authors, genres, summaries), the full text itself is not directly included. That's why I can’t provide a full analysis of a book I haven’t been given direct access to.

(All emphases are Google Gemini’s, not mine.)

If that is true, then Authors Guild v. Google may have delivered some protection. However, I’m not sure I should believe text from a chatbot, as chatbots do hallucinate. So make of the above what you will, and note the weasel words: “copyrighted material is generally excluded.” (Emphasis mine.)

Therefore—as I did with my ‘forever’ and ‘replicant’ duologies—I registered the copyright of all my material in my ‘consensual reality’ series:

The Three Reflectors of Consensual Reality (first novel);
“Black Swan Hacks” (novelette preceding the second novel);
The Transcendence of Consensual Reality (second novel);
“The Transit Year” (novella preceding the third novel[2]);
The Constructors of Consensual Reality (third novel);
“The Cryptomancers at the Heart of the Galaxy” (novella set in the very far future after the third novel[3]).

The US Copyright Office charges $55 per novel, but you can use the $85 option to register up to 10 works. Well, I only had six[4].

I see this copyright registration as a kind of fire insurance[5]; that is, it doesn’t stop the actual infringement, but hopefully it will give me a payout in case it happens (and I notice it).

Prompt Engineer it so you don’t have to do it yourself

The problem is that it may not happen directly (a person or AI directly copying my work), but rather indirectly as ‘prompt engineering’ becomes ever more widespread. Yes, people are using prompts to get results from chatbots: OpenAI actually provides a guide. So it’s possible to use prompts for novel ideas (or even to write a whole novel).

These ‘prompt ideas’ may very well originate from your novel. Which, if[6] it happens, may lead to a world of trouble:

1. How do you find out?
2. If you find out, how do you prove it?
3. How does it work with AIs?

In the case of the second, a registered copyright gives you material to work with. And then the fight about ‘fair use’ begins. As mentioned in the Wikipedia article “Artificial Intelligence and copyright”:

Works “where the expressive elements are determined by a machine” still remain uncopyrightable.

So good luck if some chatbot ‘hallucinates’ (part of) your novel. For example, in the EU’s AI Act ‘the more general exception only applies if the copyright holder has not opted out’. Wait what? How do I know that I should have ‘opted out’? Shouldn’t OpenAI and their ilk have requested permission from me? Of course, as huge corporations they act first and ~~ask later~~ never ask.

That’s why I have a ‘no scraping’ disclaimer in my Substack intros, and now that I think of it, I will put the same notice in all my novels. Welcome to the modern world.

AI: inspiration portal or a plagiarism machine?

Author’s note: I registered the copyrights mentioned above on Friday, which took the whole day: gathering all the loose chapters into one Pages file per novel (yes, I write in Apple Pages), converting them to PDFs, then submitting them through the US Copyright Office website and to the BOIP (Benelux Office for Intellectual Property; yes, I like to hedge my bets). And I felt good, despite the money spent, when it was done.

I started writing the above piece on Saturday, but felt under the weather for the rest of the weekend (revenge of the dying winter, I suppose). Nevertheless, I feel better now as the weather is warming (Saturday even peaked at a balmy 18 Celsius, dropping to 10 Celsius today) and spring is coming!

Spring has sprung in my neck of the woods.

Welcome to the new subscribers and many thanks for reading!

[1] And perform or display the work publicly (this includes digital audio transmission like audiobooks);

[2] It was intended as the first chapter, but that would have blown up the novel to 140K, so I decided to separate it;

[3] Meant as the final word on that series, but then again you never know…;-)

[4] Slacker!

[5] My deepest sympathies to people around the world who lost their houses to wildfires: I hope you were insured and that the insurance was paid out;

[6] More cynical souls will say: “when it happens”;

The Divergent Panorama
