OpenAI Pleads That It Can’t Make Money Without Using Copyrighted Materials for Free

@flop_leash_973@lemmy.world · 1 year ago

@xelar@lemmy.ml · 1 year ago

Unregulated areas lead to these types of business practices, where people squeeze every last drop of juice out of these opportunities. The cost of these activities will be passed on to the taxpayers.

Chaotic Entropy · 1 year ago

“WE’RE NOT A VIABLE BUSINESS! BWAH!”

Oh. Oh no. Such a shame.

@meliodas_101@lemmy.world · 1 year ago

What kind of pathetic statement is that?

Thurstylark · 1 year ago

Oh, poor baby can’t make money with an illegal business model. How awful.

@masterspace@lemmy.ca · 1 year ago

So search engines shouldn’t exist?

Avid Amoeba · 1 year ago

Perhaps. Or perhaps not in the way they do today. Perhaps if you profit from placing ads among results people actually want, you should share revenue with those results. Cause you know, people came to you for those results and they’re the reason you were able to show the ads to people.

maegul (he/they) · 1 year ago

I mean, their goal and service is to get you to the actual web page someone else made.

What made Google so desirable when it started was that it did an excellent job of getting you to the desired web page and off of google as quickly as possible. The prevailing model at the time was to keep users on the page for as long as possible by creating big messy “everything portals”.

Once Google dropped, with a simple search field and high-quality results, it took off. Of course, they’re now more like their original competitors than their original successful self … but that’s a lesson for us about what capitalistic success actually ends up being about.

The whole AI business model of completely replacing the internet by eating it up for free is the complete Sith lord version of the old portal idea. Whatever you think about copyright, the bottom line is that the deeper phenomenon isn’t just about “stealing” content, it’s about eating it to feed a bigger creature that no one else can defeat.

@masterspace@lemmy.ca · 1 year ago

I really think it’s mostly about getting a big enough data set to effectively train an LLM.

maegul (he/they) · 1 year ago

I really think it’s mostly about getting a big enough data set to effectively train an LLM.

I mean, yes of course. But I don’t think there’s any way in which it is just about that. Because the business model around having and providing services around LLMs is to supplant the data that’s been trained on and the services that created that data. What other business model could there be?

In the case of Google’s AI alongside its search engine, and even ChatGPT itself, this is clearly one of the use cases that has emerged and is actually working relatively well: replacing the internet search engine and giving users “answers” directly.

Users like it because it feels more comfortable, natural and useful, and probably quicker too. And in some cases it is actually better. But it’s important to appreciate how we got here … by the internet becoming shittier, by search engines becoming shittier, all in the pursuit of ad revenue and the corresponding tolerance of SEO slop.

IMO, to ignore the “carnivorous” dynamics here, which I think clearly go beyond ordinary capitalism and innovation, is to miss the forest for the trees. Somewhat sadly, this tech era (approx. MS Windows 95 to now) has taught people that the latest new thing must be a good idea and we should all get on board before it’s too late.

@masterspace@lemmy.ca · 1 year ago

Users like it because it feels more comfortable, natural and useful, and probably quicker too. And in some cases it is actually better. But it’s important to appreciate how we got here … by the internet becoming shittier, by search engines becoming shittier, all in the pursuit of ad revenue and the corresponding tolerance of SEO slop.

No, it legitimately is better. Do you know what Google could never do but that Copilot Search and Gemini Search can? Synthesize one answer from multiple different sources.

Sometimes the answer to your question is inherently not on a single page; it’s split across the old framework docs, the new framework docs, and Stack Overflow questions, and the best a traditional search engine can ever do is maybe get some of the right pieces in front of you some of the time. LLMs will give you a plain-language answer immediately and let you ask follow-up questions and request modifications to your original example.

Yes Google has gotten shitty, but it would never have been able to do the above without an LLM under the hood.

maegul (he/they) · 1 year ago

Sure, but IME it is very far from doing the things that good, well written and informed human content could do, especially once we’re talking about forums and the like where you can have good conversations with informed people about your problem.

IMO, whatever LLMs are doing that older systems can’t isn’t greater than what was lost to ads-driven SEO slop and shitty search.

Moreover, the business interest of LLM companies is clearly in dominating and controlling (as that’s just capitalism and the “smart” thing to do), which means the older human-driven system of information sharing and problem solving is vulnerable to being severely threatened and destroyed … while we could just as well enjoy some hybridised system. But because profit is the focus, and the means of making profit are problematic, we’re in rough waters which I don’t think can be trusted to create a net positive (and haven’t been trustworthy for decades now).

@scarabine@lemmynsfw.com · 1 year ago

Case law has been established in the prevention of actual image and text copyright infringement with Google specifically. Your point is not at all ambiguous. The distinction between a search engine and content theft has been made. Search engines can exist for a number of reasons but one of those criteria is obeisance of copyright law.

@patacon_pisao@lemmy.world · 1 year ago

Wow, I just chatted with a coworker about AI, and I told them it was crazy how it uses copyrighted content to create something supposedly “new,” and they said “well how would we train the AI without it?” I don’t think we should sacrifice copyright laws and originality for the sake of improving profits, while they tell us it’s only to “improve the experience.”

Admiral Patrick · 1 year ago

Yeah! I can’t make money running my restaurant if I have to pay for the ingredients, so I should be allowed to steal them. How else can I make money??

Alternatively:

OpenAI is no different from pirate streaming sites in this regard (loosely: streaming sites are way more useful to humanity). If OpenAI gets a pass, so should every site that’s been shut down for piracy.

ArchRecord · 1 year ago

If OpenAI wants a pass, then just like how piracy services make content freely open and available, they should make their models open.

Give me the weights, publish your datasets, slap on a permissive license.

If you’re not willing to contribute back to society with what you used from it, then you shouldn’t exist within society until you do so.

@CrayonMaster@midwest.social · 1 year ago

Piracy steals from the rich and gives to the poor. ChatGPT steals from the rich and the poor and keeps for itself.

ArchRecord · 1 year ago

and keeps for itself.

Which is why they should be legally compelled to publicize all of their datasets, models, research, and share any profits they’ve made with the works they can get provenance data for, because otherwise, it’s an unfair use of the public sphere of content.

One could very easily argue that adblockers are piracy, and those would be stealing from every social media creator, small blog, and independent news site, but I don’t see many people arguing against that, even though that very well includes people who aren’t wealthy corporations.

The issue isn’t necessarily the use of the copyrighted content, it’s the unfair legal stance taken on who can use the content, and how they are allowed to profit (or not profit) from it.

I’m not saying there are no downsides, but I do feel like a simple black-and-white dichotomy doesn’t properly outline how piracy and generative AI training are relatively similar in terms of who they steal from, and it’s more a matter of what is done with the content after it is taken that truly matters most.

@hddsx@lemmy.ca · 1 year ago

No, they shouldn’t. They should cease to exist.

TimeSquirrel · 1 year ago

Generative AI is not going back into the bag. If not OpenAI, then someone else will control it. So we deal with them the next best way, force them to serve us, the people.

@leftzero@lemmynsfw.com · 1 year ago

Generative AI is not going back into the bag.

It probably will, though, once model collapse sets in.

That’s the irony, really… the more successful it is, the sooner it’ll poison itself to death.

Admiral Patrick · 1 year ago

Then they can either pay for the copyrighted data they want to train on or lobby for copyright to be reined in for everyone. Right now, they’re acting like entitled twats with a shit business model demanding they get a free pass while the rest of us would be bankrupted for downloading a Metallica MP3.

ArchRecord · 1 year ago

I think this better solves the issue.

The problem isn’t necessarily the use of copyrighted works, (although it can be a problem in many ways) it’s the unfair legal determination of who is allowed to do so.

@hddsx@lemmy.ca · 1 year ago

Nobody should profit from copyright violation. Yes, copyright law needs to change, but making money isn’t an exception.

1 year ago

Good luck putting the cat back in the bag.

@Kalysta@lemm.ee · 1 year ago

Well, if everyone whose copyrighted work was used independently sues OpenAI, that cat will be deceased real quick due to bankruptcy.

1 year ago

Fuck copyright. They used GPLv3 code, so why isn’t it open source?

@hddsx@lemmy.ca · 1 year ago

I have cats. Putting them back in a bag or box is easier

@masterspace@lemmy.ca · 1 year ago

K, so Google should be shut down too?

They can’t operate without scraping copyrighted data.

@MoogleMaestro@lemmy.zip · 1 year ago

This is a false equivalency.

Google used to act as a directory for the internet along with other web search services. In court, they argued that the content they scraped wasn’t easily accessible through the searches alone and had statistical proof that the search engine was helping bring people to more websites, not preventing them from going. At the time, they were right. This was the “good” era of Google, a different time period and company entirely.

Since then, Google has parsed even more data, made that data easily available in the Google search results pages directly (avoiding link click-throughs), increased the number of services they provide to the degree that they have a conflict of interest on the data they collect and a vested interest in keeping people “on Google” and off the other parts of the web, and participated in the same bullshit policies that OpenAI started with their Gemini project. Whatever win they had in the 2000s against book publishers, it could be argued that the rights they were “afforded” back in those days were contingent on them being good-faith participants and not competitors. OpenAI and “summary” models that fail to reference sources with direct links, make hugely inaccurate statements, and generate “infinite content” by mashing together letters in the world’s most complicated Markov chain fit in this category.

It turns out, if you’re afforded the rights to something on a technicality, it’s actually pretty dumb to become brazen and assume that you can push these rights to the breaking point.

Admiral Patrick · 1 year ago

Google (and search engines in general) is at least providing a service by indexing and making discoverable the websites they crawl. OpenAI is just hoovering up the data and providing nothing in return. Socializing the cost, privatizing the profits.

@masterspace@lemmy.ca · 1 year ago

Uh, that’s objectively false.

OpenAI also provides ChatGPT as a “free” service, and Google has made billions off of that “free” service they oh so altruistically provide you.

teft · 1 year ago

Google points to your content so others can find it.

OpenAI scrapes your content to use to make more content.

@masterspace@lemmy.ca · 1 year ago

That’s not a meaningful distinction. I spent all day using a Copilot search engine because the answers I wanted were scattered across a bunch of different documentation sites.

It was using the AI models both to interpret my commands (not generation at all) and to publish content only to me specifically.

teft · 1 year ago

I’m talking about the training phase of LLMs. That is the portion that does the scraping and generation of copyrighted data.

You using an already trained LLM to do some searches is not the same thing.

ℍ𝕂-𝟞𝟝 · 1 year ago

Technically it is meaningful, fair use is for specifically things that don’t replace the original in function.

@masterspace@lemmy.ca · 1 year ago

Depends on what the function was. If the function was to drive ad revenue to your site, then sure, if the function was to get information into the public, then it’s not replacing the function so much as altering and updating it.

@BakerBagel@midwest.social · 1 year ago

It’s absolutely a meaningful distinction. Search engines push people to your website, where you can capitalize on your audience however you see fit. LLMs take your content, throw it through the mixer, and sell it back to people. It’s the difference between a movie reviewer explaining a movie and a dude in an alley selling a pirated copy of the movie.

@masterspace@lemmy.ca · 1 year ago

A) An LLM does not inherently sell you anything. Some companies charge you to run and use their LLMs (OpenAI), and some companies publish their LLMs open source for anyone to use (Meta, Microsoft). With neural chips starting to pop up in PCs and phones, pretty soon anyone will be able to run an open-source LLM locally on their machine, completely for free.

B) LLMs still rarely regurgitate the exact same original source. This would be more like someone in the back alley putting on their own performance of the movie and morphing it and adjusting it in real time based on your prompts and comments, which is a lot closer to parody and fair use than blatant piracy.

@foggenbooty@lemmy.world · 1 year ago

This is actually a very good comparison because restaurants use this argument all the time, except for wages:

“I can’t make money running my restaurant if I have to pay a living wage to my servers, so you should pay them with tips. How else can we stay open?”

These businesses that can’t operate profitably like any other business should fail.

@Strawberry@lemmy.blahaj.zone · 1 year ago

If they win, we can just train a CNN on a single 4K HDR movie until it’s extremely overfitted, and then it’s legal to redistribute.

Net_Runner :~$ · 1 year ago

Copyright is a pain in the ass, but Sam Altman is a bigger pain in the ass. Send him to prison and let him rot. Then put his tears in a cup and I’ll drink them.

@atrielienz@lemmy.world · 1 year ago

I do not care. Get a real job.

@affiliate@lemmy.world · 1 year ago

“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

Exactly which “needs” are they trying to meet?

@Thann@lemmy.ml · 1 year ago

slaps roof of coffin

So what would it take to get you in one of these?

@MehBlah@lemmy.world · 1 year ago

Perhaps they should go back to what they were before the greed machine was spun up.

@Blackmist@feddit.uk · 1 year ago

I should just be allowed to take whatever I want from the shops because I don’t have enough money to buy it!

@Treczoks@lemmy.world · 1 year ago

If a company cannot do business without breaking the law, it simply is a criminal organisation. RICO Act, anyone?

@Juice@midwest.social · 1 year ago

Does anyone else hear that? It’s the world’s smallest AI violin playing the saddest song composed by an AI.