Published on 2024-02-26

"We" being some, though surely not all, of us.

A few years ago, the companies building large language models and image diffusion models scraped a significant portion of the internet for training data, using other people's work to build machines that effortlessly (when you ignore the externalities) create new, derivative works.

Is this a problem? I think it depends on what you think it means for something to be a problem. As AI fans and apologists will point out, the people whose work was scraped were freely offering it to anyone with an internet connection. All the companies did was download publicly available information.

Further, while we like to treat these models as pastiche machines, they aren't literally pastiche machines. If they were, then it'd be much easier to get them on issues of copyright. Creating art using the current generation of AI models feels more like clean-room reverse engineering than "copying" in the more basic sense.

Because of this, I really don't have faith that we collectively will win against AI companies in the courts. I feel the people who think we will are being optimistic.

But that doesn't make it okay, does it?

Copyright is good inasmuch as it helps artists survive. This is a pretty clear-cut case of copyright not helping artists survive. As is often said, the killer app of AI is its use as a bargaining chip in labour disputes. People will get paid less to do more menial work because of these things being made.

That's clearly not okay. Whether or not these things would actually be good and ethical in a hypothetical anarcho-socialist utopia is an interesting topic of a very different, less urgent discussion.

The problem might be that there's enough plausible deniability of wrongdoing in the specific actions AI companies have taken to complicate any serious conversation about it. And because of that, to have a serious conversation about the problems with current-generation AI models, you've got to focus less on what was done and more on the consequences they're having on the world.

Artists will be paid less. Writers will be paid less. Visuals in popular media and public spaces will rapidly become more soulless and homogeneous. The internet will overflow with spam blogs that add nothing of value to human knowledge and burn historic amounts of natural gas. Whether or not the scraping itself was objectively, morally wrong is a question that can be meaningfully debated. These issues feel much more clear-cut.

