philosophy

Who Owns a Sentence You Didn't Write?

llm, ai, copyright, authorship, philosophy, ethics, writing
philosophy, digital-ethics

I asked an LLM to write a paragraph about grief. It gave me back something beautiful a metaphor about waves and shorelines, about learning to breathe when the water pulls you under. It was better than anything I could have written on the subject, and I have been grieving long enough to know.

The question is: who owns that paragraph?


The Problem with “Author”

The Western concept of authorship is built on a specific model: a solitary genius, inspired by muses or madness, produces original work from the raw material of their own experience. Copyright law is designed around this figure. It assumes a human with a pen, a piano, a brush.

The LLM does not fit this model. It has no experience. It has no intention. It does not “mean” anything it says. It is a statistical engine that predicts the next token based on patterns in its training data. Data that includes, by conservative estimate, billions of human written texts.

When I prompt an LLM and it produces a paragraph, who wrote it?

Option A: I wrote it. The LLM is a tool, like a typewriter or a paintbrush. The credit (and blame) belong to the person who directed it.

Option B: The LLM wrote it. It is an autonomous generator. The output is machine made, and no human can claim authorship.

Option C: The training data wrote it. Every sentence the LLM produces is a remix of existing human writing. The real authors are the millions of people whose work was scraped without consent.

Option D: Nobody wrote it. The paragraph exists in a legal and philosophical vacuum. Produced, but not authored. It is writing without a writer.


Current copyright law is not equipped for this. The US Copyright Office has ruled that AI-generated work cannot be copyrighted unless a human made “creative modifications.” But what counts as a modification? Is writing a prompt creative enough? Is editing the output? What about the person who spent six hours iterating on prompts, testing and refining until they got what they wanted did they “create” the result?

The lawsuits are already piling up:

  • Getty Images v. Stability AI - over training data scraped without license
  • Authors suing OpenAI - alleging that their copyrighted books were used to train models without permission or compensation
  • The New York Times lawsuit - arguing that ChatGPT reproduces their articles verbatim in some cases

These cases will set precedent. But they are asking the wrong question. They are asking “was this legal?” when they should be asking “what does authorship mean in a world where machines can produce language?”


The Philosophical Vacuum

Let us go deeper. (things she never said…)

If I write: “The sky was the color of a bruise, purple and yellow and sorry for itself” or “your momma so fat that her image takes 2 petabytes of space” that is mine. I wrote it. I felt something and translated it into words.

If I prompt an LLM: “Write a sentence describing a bruised sky” and it produces: “The sky was the color of a bruise, purple and yellow and sorry for itself” whose sentence is it?

I provided the concept. The LLM provided the execution. But the LLM only produced that sentence because it had seen something like it in its training data, perhaps thousands of similar descriptions, statistically averaged into a single competent sentence. The real author, in some sense, is the aggregate of every writer who ever described a sky.

This is where the philosophy gets uncomfortable: all writing is derivative. There is no such thing as pure originality. Every sentence you write is influenced by every sentence you have read. The difference between a human and an LLM is one of degree, not kind. We are both remixing what came before us.

But we feel differently about it. A human who copies another writer is a plagiarist. An LLM that copies is operating as designed. The moral weight is not in the act it is in the agent.


The Practical Reality

Here is where I land, for now:

  1. The tool argument is incomplete. A typewriter does not generate sentences on its own. A camera does not compose the photograph. The LLM is not a traditional tool it is a collaborator that exists in a grey area between instrument and independent agent.

  2. Training data is the real question. The ethical crisis is not about output it is about input. Billions of human hours of writing were consumed without consent to create these models. The people who wrote that data deserve recognition and compensation, even if the legal framework does not currently require it.

  3. Attribution matters. If I publish something generated by an LLM, I have an obligation to say so. Not because the work is less valuable but because the reader deserves to know what they are engaging with. Transparency is the minimum ethical standard.

  4. The “I” is changing. When I write with an LLM, the “I” in the sentence is no longer a single self. It is a hybrid my intention plus the model’s statistical output plus the ghost of every writer in its training data. We do not have a pronoun for this kind of authorship yet. We need one. (I had a your mom joke but this is a serious post maybe next time, writing this just so you know that I had a joke …)


A Modest Proposal

I do not think LLM-generated writing should be uncopyrightable. That would create a perverse incentive a vast ocean of public-domain text that humans cannot compete with, drowning out human authorship entirely.

I do not think LLM generated writing should be fully copyrightable either. That would allow companies to own the output of a system built on unlicensed human labor.

I think we need a third category: call it assisted authorship. If the human contribution is substantial (editing, structuring, conceptualizing), the work is copyrightable with disclosure. If the contribution is minimal (a single prompt, lightly edited), the work enters a new legal zone protected from exact copying but not considered “original” in the traditional sense.

This is messy. It requires courts to judge degrees of human involvement, which is always imprecise. But the alternative pretending the technology does not exist, or forcing it into legal categories it was never designed for is worse.


Coda

I wrote this essay with an LLM. I provided the structure, the arguments, the metaphors, and most of the sentences. The LLM helped me rephrase a few passages, suggested transitions, and caught a logical gap in the “tool argument” section.

Should I disclose that? I just did.

Does it make the essay less mine? I do not think so. But I am not entirely sure. And I think the uncertainty the honest, uncomfortable uncertainty is the closest thing to truth we have right now.

The paragraph about grief, by the way: I deleted it. It was too beautiful to be mine.

Previous Thinkers You Should Know: Guy Debord Next The Art of the Coup: The Classical Coup