ChatGPT's viral Studio Ghibli-style images highlight AI copyright concerns

@return2ozma@lemmy.world · 5 days ago

FaceDeer · 4 days ago

Training doesn’t involve copying anything, so I don’t see why they wouldn’t. You need to copy something to violate copyright.

Phoenixz · 3 days ago

I hate lawyer speak with a passion

Everyone knows what we’re talking about here, what we mean, and so do you

FaceDeer · 3 days ago

And yet if one wishes to ask:

Did they have the right to do that?

That is inherently the realm of lawyer speak because you’re asking what the law says about something.

The alternative is vigilantism and “mob justice.” That’s not a good thing.

@enumerator4829@sh.itjust.works · 4 days ago

There is an argument that training actually is a type of (lossy) compression. You can actually build (bad) language models by using standard compression algorithms to “train”.
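A rough sketch of what that can look like, assuming zlib as the "standard compression algorithm" and a toy corpus (the function names here are made up for illustration, not from any real library): text that resembles the corpus costs fewer extra bytes to compress, and that cost can be used to rank possible next characters.

```python
import zlib

def compressed_size(text: str) -> int:
    """Bytes needed to store `text` after zlib (DEFLATE) compression."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def extra_bytes(corpus: str, continuation: str) -> int:
    """Extra bytes the compressor needs once `continuation` is appended."""
    return compressed_size(corpus + continuation) - compressed_size(corpus)

def predict_next(corpus: str, context: str, alphabet: str) -> str:
    """Pick the character that is cheapest to compress after corpus + context.

    Ties are common because sizes are whole bytes, so this is a very
    crude "model"; min() simply keeps the first of the cheapest candidates.
    """
    return min(alphabet, key=lambda ch: compressed_size(corpus + context + ch))

# Toy "training data" (hypothetical, purely for illustration).
corpus = "the cat sat on the mat. " * 20
alphabet = "abcdefghijklmnopqrstuvwxyz "

familiar = extra_bytes(corpus, "the cat sat on the mat")
unfamiliar = extra_bytes(corpus, "zqxjv kwpyg bfmhd ulrn")
guess = predict_next(corpus, "the cat sat on the ma", alphabet)

print(familiar, unfamiliar, repr(guess))
```

The "model" here is just the corpus plus the compressor, which is why the results are bad, but it shows the link between compressing well and predicting well.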

By that argument, any model contains lossy and unstructured copies of all data it was trained on. If you download a 480p low quality h264-encoded Bluray rip of a Ghibli movie, it’s not legal, despite the fact that you aren’t downloading the same bits that were on the Bluray.

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on. The action of downloading media, regardless of purpose, is piracy. At least, that has been the interpretation for normal people sailing the seas; large companies are of course exempt from filthy things like laws.

FaceDeer · 3 days ago

Stable Diffusion was trained on the LAION-5B image dataset, which as the name implies has around 5 billion images in it. The resulting model was around 3 gigabytes. If this is indeed a “compression” algorithm then it’s the most magical and physics-defying ever, as it manages to compress images to less than one byte each.
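For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope version, assuming the round figures above (5 billion images, a 3 GB model taken as 3 × 10^9 bytes):

```python
# Round figures from the comment above; both are approximations.
images = 5_000_000_000          # roughly 5 billion training images
model_bytes = 3 * 10**9         # roughly 3 gigabytes of model weights

bytes_per_image = model_bytes / images
bits_per_image = bytes_per_image * 8

print(f"{bytes_per_image:.2f} bytes per image")   # 0.60 bytes per image
print(f"{bits_per_image:.1f} bits per image")     # 4.8 bits per image
```

That works out to under five bits of model capacity per training image on average.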

Besides, even if we consider the model itself to be fine, they did not buy all the media they trained the model on.

That is a completely separate issue. You can sue them for copyright violation regarding the actual acts of copyright violation. If an artist steals a bunch of art books to study, then sue him for stealing the art books, but you can’t extend that to say that anything he drew based on that learning is also a copyright violation or that the knowledge inside his head is a copyright violation.

@HereIAm@lemmy.world · edit-2 3 days ago

There’s a difference between lossy and lossless. You can compress anything down to a single bit if you so wish, just don’t expect to get everything back. That’s how lossy compression works.

@yetAnotherUser@discuss.tchncs.de · 3 days ago

It’s perfectly legal to compress something to a single bit and publish it.

Hell, if I take and publish the average color of any copyrighted image, that’s at least 24 bits. That’s lossy compression, yet legal.
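As a sketch of that idea, assuming 8-bit RGB and a made-up 2×2 "image" given as a plain list of pixels (no real image library involved), the whole picture collapses into one 24-bit number:

```python
def average_color(pixels):
    """Average each 8-bit RGB channel over all pixels (integer division)."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) // n
    g = sum(p[1] for p in pixels) // n
    b = sum(p[2] for p in pixels) // n
    return (r, g, b)

def pack_rgb(color):
    """Pack an (r, g, b) triple into a single 24-bit integer."""
    r, g, b = color
    return (r << 16) | (g << 8) | b

# Hypothetical 2x2 image: red, green, blue and white pixels.
pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 255)]

avg = average_color(pixels)
packed = pack_rgb(avg)

print(avg, hex(packed))
```

Nothing of the original image can be recovered from that one value, which is the point about extreme lossy compression.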

@witten@lemmy.world · 4 days ago

“In its suit, the Times alleges that, when prompted by users, ChatGPT sometimes spits out portions of its articles verbatim, or shares key parts of its content, such as findings uncovered through investigations by Times reporters, or product endorsements carefully researched and vetted by Wirecutter, an affiliate site.”

From: https://hls.harvard.edu/today/does-chatgpt-violate-new-york-times-copyrights/

FaceDeer · 4 days ago

In its suit, the Times alleges that

Emphasis added. Of course they’re going to claim their copyright was violated, they don’t have a case otherwise.

OpenAI alleges that the New York Times pulled a bunch of shady shenanigans to get the results they’re claiming.

It remains to be seen how the case will be decided.

@witten@lemmy.world · 3 days ago

Lol did you even read the article you linked? OpenAI isn’t disputing the fact that their LLM spit out near-verbatim NY Times articles/passages. They’re only taking issue with how many times the LLM had to be prompted to get it to divulge that copyrighted material and whether there were any TOS violations in the process.

FaceDeer · 3 days ago

They’re saying that the NYT basically forced ChatGPT to spit out the “infringing” text. Like manually typing it into Microsoft Word and then going “gasp! Microsoft Word has violated our copyright!”

The key point here is that you can’t simply take the statements of one side in a lawsuit as being “the truth.” Obviously the lawyers for each side are going to claim that their side is right and the other side are a bunch of awful jerks. That’s their job, that’s how the American legal system works. You don’t get an actual usable result until the judge makes his ruling and the appeals are exhausted.

@witten@lemmy.world · 3 days ago

If a fact isn’t disputed by either side in a case as contentious as this one, it’s much more likely to be true than not. You can certainly wait for the gears of “justice” to turn if you like, but I think it’s pretty clear to everyone else that LLMs are plagiarism engines.