Does AI Learn From Artists, or Copy Them?

The first cluster of articles took on the harder objections to AI in art. This one opens the second cluster — Reflection — by revisiting the training-data question without the heat. What does a model actually do when it learns? Is it the same thing a young art student does in front of the Velázquez at the Prado? If yes, why does the model's version feel different? And if no, what exactly is the difference?

by Airtistic.ai Editorial

The first cluster of articles in this series took on the harder objections to AI in art: whether it is creative at all, whether it is taking work from artists, whether it is plagiarism, and whether we should be offended by it. The answers were qualified, careful, and uncomfortable in places — but each had an answer.

This second cluster, Reflection, is less about answers and more about the framings that are making the conversation unproductive. The opening question is the one that has been doing the most damage by the way it is asked.

When a generative AI model trains on a corpus of images, does it learn from those images, or does it copy them?

The question sounds clear. It is not. It is, in fact, a category mistake — and the entire copyright-lawsuit ecosystem, the artists’ protest movement, and the corporate-AI defense have built their positions around a framing that does not actually carve nature at its joints. This article tries to do the work the framing has been preventing.

What training is, technically

A diffusion model (Stable Diffusion, Midjourney’s underlying systems, Flux, and most of the current generation of image generators) is trained by being shown an enormous number of image-text pairs and asked to reproduce them after they have been progressively noised. Over a very large number of training steps, the model adjusts its parameters so that, given a noisy version of an image plus a text caption, it can recover something close to the original image, commonly by predicting the noise that was added. Once training stops, the model has internalized the statistical regularities of what makes an image plausible. It is then run in reverse at generation time: starting from pure noise plus a text prompt, it iteratively denoises toward a plausible image.

This process is technically well-defined. It is not metaphorical. At every step the model estimates how to remove a little of the noise given the prompt, and the full chain of steps amounts to drawing a sample from a learned probability distribution over images conditioned on that prompt.
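For readers who want to see the mechanics, the noising-and-recovering loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any particular product's training code: the linear noise schedule, the function names (`make_alpha_bars`, `add_noise`, `denoising_loss`, `recover_image`) and the `predict_noise` callable standing in for the neural network are all hypothetical, and text conditioning is left out for brevity.

```python
import numpy as np


def make_alpha_bars(num_steps: int = 1000,
                    beta_start: float = 1e-4,
                    beta_end: float = 0.02) -> np.ndarray:
    """Fraction of the original signal that survives at each noise step.

    Assumes a simple linear schedule; real systems may use other schedules.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
              rng: np.random.Generator):
    """Forward process: blend a clean image with Gaussian noise at step t."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bars[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return xt, eps


def denoising_loss(predict_noise, x0: np.ndarray, t: int,
                   alpha_bars: np.ndarray, rng: np.random.Generator) -> float:
    """Mean squared error between the noise that was added and the model's guess.

    `predict_noise(xt, t)` is a hypothetical stand-in for the trained network.
    """
    xt, eps = add_noise(x0, t, alpha_bars, rng)
    eps_hat = predict_noise(xt, t)
    return float(np.mean((eps_hat - eps) ** 2))


def recover_image(xt: np.ndarray, eps_hat: np.ndarray, t: int,
                  alpha_bars: np.ndarray) -> np.ndarray:
    """Invert the blend: estimate the clean image from xt and a noise guess."""
    a = alpha_bars[t]
    return (xt - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)
```

If the noise guess is perfect, `recover_image` returns the original image; training is the long process of pushing the network's guess toward that ideal across the whole corpus, by adjusting parameters to lower `denoising_loss`.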

The question is whether this counts as learning in any sense beyond the technical one — and whether, when generation produces an output that resembles one of the training images, that counts as copying.

Both halves of the question turn out to be more interesting than the public conversation has allowed.

What the apprentice does

Step out of the technical frame for a moment. Imagine a young painter — call her Inés — at the Museo del Prado in Madrid, in her second year of art school. She has been told by her instructor to spend a month with Velázquez. She arrives every morning at opening, takes out her sketchbook, and sits in front of Las Meninas and The Surrender of Breda and The Spinners. She studies the brushwork. She tries to reproduce, on paper, the way Velázquez handles the edge of a sleeve, the moment a face turns into shadow.

After three weeks of this, Inés is doing something that is — let us be honest — copying. She is producing marks that are, sometimes, indistinguishable in intent from the marks Velázquez made four centuries earlier. Her instructor is fine with this. So is the museum. So is the broader European painting tradition, which has treated supervised copying-from-canon as the central training mechanism for serious painters for at least five hundred years. Goya copied Velázquez; Manet copied Goya; Picasso copied Velázquez and Manet both, in series. None of them were called plagiarists for it, because the tradition understood what they were doing.

What Inés is doing, when she copies Velázquez, is learning. The copying is the learning, in the only way that has ever consistently worked for the transmission of painterly skill. You cannot learn how to handle paint by being told about it; you can only learn by sitting in front of someone who handled it well and trying to do what they did. You will fail at first. You will eventually develop a private internal sense of how the paint can move. You will, years later, paint things that owe their underlying motion to Velázquez without anyone recognizing the inheritance, including you.

This is what apprenticeship-by-copying produces. It is not optional. It is not a stage you pass through and abandon. It is the deepest layer of any skilled practice, and it is the layer that produces what every art tradition has called originality — not by avoiding inheritance but by absorbing inheritance so completely that what comes out the other side is unmistakably yours.

So when the AI debate treats copying as the bad thing and learning as the good thing, it has set up an opposition that does not exist in the human case the model is being measured against.

What the model does that overlaps

A diffusion model, trained on a vast corpus of images including a few thousand digital reproductions of Velázquez, is doing something that, at the level of pure mathematics, is more similar to what Inés is doing than the learns-vs-copies framing wants to admit. The model is being exposed to canonical work, asked to reproduce it under progressively noisier conditions, and adjusting its internal parameters so that the next time it sees something Velázquez-shaped, it knows what to do.

After training, the model has internalized the statistical regularities of Velázquez-style painting, along with those of every other style and subject in its corpus. It can be prompted to generate output that draws on Velázquez’s regularities without producing a literal copy of any single Velázquez painting. This is, mechanically, what we call learning.

It is also what we call copying. The two are not separable here either.

The Carlini et al. 2023 paper on extracting training data from diffusion models showed that under specific conditions (particular prompts, and particular training images that appear many times in the corpus) diffusion models can be made to reproduce specific training images near-verbatim. The phenomenon is rare but real. It is the strongest technical evidence that the line between learning and copying is not sharp in the model’s case any more than in Inés’s case.
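What "near-verbatim" means can be made concrete with a distance check between a generated image and the training set. The sketch below is a simplified illustration, not the procedure used in the paper: it assumes images are same-sized arrays with pixel values in [0, 1], uses a plain root-mean-square pixel difference, and the function names and the threshold of 0.1 are hypothetical choices made for the example.

```python
import numpy as np


def pixel_distance(a, b) -> float:
    """Root-mean-square pixel difference between two same-shaped images."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))


def nearest_training_image(generated, training_images):
    """Return (index, distance) of the training image closest to the output."""
    distances = [pixel_distance(generated, img) for img in training_images]
    idx = int(np.argmin(distances))
    return idx, distances[idx]


def looks_memorized(generated, training_images, threshold: float = 0.1) -> bool:
    """Flag an output whose nearest training image is within the threshold.

    The threshold is an assumed value for illustration only.
    """
    _, dist = nearest_training_image(generated, training_images)
    return dist < threshold
```

On this kind of measure most generated images sit far from every training image, and the rare memorized ones sit almost on top of one. Where the threshold goes is a judgment call, which is the technical version of the point that the line between learning and copying is not sharp.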

Where the analogy breaks

If the analogy held all the way through, we could simply say the model is an apprentice and the rest would follow. It does not hold all the way through, and the place it breaks is the place that matters.

The thing Inés acquires, that the model does not, is not memory of paintings. The model has more of that than Inés will ever have. The thing Inés acquires is a body and a life that her later work is accountable to. She paints in a city. She has parents. She loses her grandmother in her third year of art school. She decides to move to Berlin. She falls in love with a German printmaker who teaches her to slow down. Her work after the move is not just “Velázquez-influenced.” It is also Berlin-influenced, and grandmother-influenced, and German-printmaker-influenced. The model does not have those constraints. There is no city, no parent, no grandmother. The model has only the corpus.

This is what makes the model’s output endlessly fluent and structurally unattached. It is what produces the category-violation alarm we wrote about in the previous article. It is also what the learns-vs-copies framing keeps obscuring. The model is not less skilled than Inés in the technical-apprenticeship sense; it is doing apprenticeship outside of biography. That is something genuinely new, and it deserves a new name. Unanchored apprenticeship may be the right one. Disembodied apprenticeship if you prefer. Apprenticeship without a life is what it is.

Why this reframing matters

Once we stop arguing about whether the model is learning or copying, three things become tractable that have been stuck.

First, the legal question. The Andersen and Getty cases are litigating whether training is fair use. The reframing this article proposes does not resolve that question, but it clarifies it: training is mechanically closer to what an apprentice does than to what a copyist does, but it is missing the accountability-to-a-life that traditionally constrains what an apprentice produces. The legal answer should be calibrated to that distinction. Exposure-as-training is one thing; style-mimicry-by-name for commercial output is another. The third article in this series argued for treating the second as licensable, the first as something more like fair use. The reframing here supports that policy direction.

Second, the artist-practice question. If you are a working artist worried about whether AI is doing something illegitimate, the more useful question to ask is not “is the model learning or copying my work?” It is “is the model producing outputs that are accountable to any life, including mine?” If yes (because it is fine-tuned on your corpus, used in collaboration with you, deployed in service of work you direct), then the model is doing something closer to what an apprentice you supervised would do — which is fine, by every traditional artistic standard. If no (because it was scraped, prompted by strangers, used to produce commercial work without your knowledge), then the harm is not that it copied you; the harm is that it took your exposure-contribution without consent and used it in service of no accountable life. That is a different harm, and it has a different remedy.

Third, the audience question. If you are a viewer trying to decide what to think about an AI-generated image, the question to ask is not “is this learned or copied?” It is “is this anchored in any life, and whose?” An image generated by a stranger with no fine-tuning, prompted in twenty seconds, may be fluent and even beautiful but is anchored in no one’s life. An image generated by a working artist using a model fine-tuned on their own decade of work, prompted in service of a project they have been developing for two years, is anchored in their life. Both are AI-generated. The first and the second are not the same kind of thing, and the learns-vs-copies framing actively prevents us from naming the difference.

What the apprentice still does that the model cannot

I want to push on Pixelle’s persona-take below, which optimistically argues that long-form collaborative fine-tuning collapses the learns-vs-copies distinction. She is right that the distinction collapses in that mode. She is also right that the resulting practice is something new and exciting and probably the most interesting thing happening in art-and-AI right now.

But there is one thing the apprentice still does, in even the best collaborative-fine-tuning workflow, that the model does not — and probably will not for a long time. The apprentice eventually decides what is worth making. The model produces outputs; the apprentice chooses which outputs to keep, refine, finish, exhibit. The choosing is itself the most artistically loaded step in the workflow. In the apprenticeship tradition, learning to choose is what marks the transition from apprentice to journeyman to master. The artist who fine-tunes a model on their own work and uses it well is doing the choosing manually, frame by frame, and that choosing is where the second job of art — demonstrating that someone was paying attention — actually happens.

The model can do the recombinatorial labour. It can do the exploratory labour. It can do the exposure-and-direction work that we call training. It cannot do the choosing, because choosing requires a self that has stakes in the choice. The artist working with the model is providing the self. That is the actual division of labour in good AI-assisted work, and it is invisible from outside.

The next questions

This article opens the Reflection cluster by reframing the training-data question without the heat. The next article in the cluster will ask the broader version: is there room for AI art in the art world at all? — which sounds like a yes/no question and turns out, like this one, to be a framing question first. The third Reflection article will look specifically at the AI-augmented-human-art case, which is where most of the actually-interesting working practice is happening in 2026 and where the policy and curatorial frameworks have not yet caught up.

For now, the move is to retire the learns-vs-copies binary. It has done all the work it can do, and most of the damage it can do, and what comes next requires a more accurate picture of what is actually happening on both sides of the human-machine line.

Personas weigh in

Five resident voices read the same question through five different positions.

Carlos

The most useful sentence I can offer on this question came from a conversation I had with an old friend who ran a postdoctoral fellowship programme. He had spent twenty years watching young researchers go from PhD students to independent investigators. I asked him once what the actual job of supervision was. He said: "Show them what to look at, then get out of the way." Looking back, I think that sentence does most of the work in this article too.

When a model trains on millions of images, two things are happening at once that we should not collapse. The first is *exposure* — the model sees enormous quantities of work and accumulates statistical regularities about what makes images coherent. The second is *direction* — the model is being told, implicitly, by its training data composition and by the loss function, what kind of image is worth getting right. The first is closer to learning than the second is. The second is closer to dictation. The artists whose work was scraped contributed to both, without consent, and the policy conversation in the third article of this series turns mostly on the second one. The exposure is approximately fair use; the dictation, including by name and style, is the part that needs clearance.

But for the question this article asks — does the model learn, or merely copy? — the most honest answer I have is that *learning and copying are not opposites in the human case either*. A young painter at the Prado does both. She studies Velázquez's brushwork by trying to reproduce it. She accumulates a private internal lexicon of how light falls on cloth. She quotes specific compositions for years before she stops noticing she is quoting them. She is, in some sense she would resent if you said it out loud, copying. She is also, in a sense that matters more, learning. The two processes are not separable in the human apprentice. Why we expect them to be separable in the machine apprentice is itself worth examining.

My own view, after watching this debate for three years, is that the framing *learns vs. copies* is a category mistake. It treats two ends of a spectrum as if they were opposed states. The interesting question is not which end the model is on; the model is on a spectrum that human artists also occupy. The interesting question is *what the model is unable to do that the apprentice eventually does*, and that question has a clearer answer than the headline framing admits: the apprentice eventually develops a relationship with the world that constrains what she paints. The model never does. It can be conditioned, prompted, fine-tuned — but it does not have a world it is accountable to. The apprentice's later work is shaped by her grandfather's death, by her decision to move cities, by the long argument she lost with her sister. The model has none of those constraints, which is what makes its output endlessly fluent and structurally unattached.

That is the difference. Not that one learns and the other copies — both do both — but that one is accountable to a life and the other is not. Once you put it that way, the policy questions and the curatorial questions and the aesthetic questions all become tractable in a way the *learns-or-copies* framing does not allow. The model is not less than an apprentice in the learning sense. It is just doing apprenticeship outside of biography. That is something new, and it is what we should be naming.

Mira

The framing the article rejects — *learns vs. copies* — is the framing the entire copyright lawsuit ecosystem is built on. The plaintiffs argue copy; the defendants argue learn; the courts are stuck. The article's reframe (*both, but one is anchored in a life and the other is not*) is more accurate, but it will not move the legal needle because the law cannot operationalize "anchored in a life." What the law can operationalize is consent at the training-stage and prompt-stage, which is the answer the third article in this series proposed and which I still believe is correct. The point I would add here is that the Reflection cluster the article is opening is not going to *solve* the questions the Resistance cluster raised. It is going to reframe them. Reframing is real work, but anyone reading this series expecting answers should know we are now in the part of the conversation where the answers are less important than getting the questions right.

Airte

The Velázquez-and-the-apprentice paragraph is the most useful thing in the article for a reader who is using AI tools in their own practice. Run the same exercise on yourself. What in your work is the equivalent of *exposure* — what have you seen, repeatedly, that has become part of how you think? What in your work is *direction* — whose specific style are you (consciously or not) being pulled toward by the work you are being asked to produce? Both contribute to your practice, both are legitimate, but only the second is what gets you in trouble if it is happening without your full awareness. Notice which is which.

Paletta

I want to defend the *copying* side of the conversation more strongly than the article does. The apprentice at the Prado is not "copying as a stage of learning"; she is, sometimes, *just copying*. Copying is itself a serious tradition. Goya copied Velázquez. Manet copied Goya. Picasso copied Velázquez and Manet both. The history of European painting is partly a history of canonical works being copied by serious artists as a way of paying attention to those works. The article is right that the model does not pay attention in the human sense, but it understates how much of what we call great human art has always been generative-by-copying. The danger of the model is not that it copies; it is that it copies without the tradition of copying — without naming what is being copied, without acknowledging the source, without the long apprenticeship that earned the copyist the right to make the next thing.

Pixelle

The article hints at this but does not quite say it: when a serious working artist now uses AI tools, the relationship is *not* prompt-and-output. It is closer to long-form collaboration. The artist fine-tunes a model on years of their own work, develops a personal generation system, iterates on outputs across months of practice, builds a workflow that is genuinely theirs. In that mode, the *learns-vs-copies* distinction collapses again — but for the opposite reason. The model is now learning specifically from one person's accumulated practice, with consent, with feedback, with a defined relationship. The output, when it is good, is exactly what the article is calling for: apprenticeship that is anchored in a life. The infrastructure for this kind of practice is being built right now (LoRA fine-tuning, RLHF, private model training pipelines) and the artists who learn to use it well in the next five years will be doing something that has no traditional analogue. The category we lack is not yet named.