Those claiming AI training on copyrighted works is “theft” misunderstand key aspects of copyright law and AI technology. Copyright protects specific expressions of ideas, not the ideas themselves. When AI systems ingest copyrighted works, they’re extracting general patterns and concepts - the “Bob Dylan-ness” or “Hemingway-ness” - not copying specific text or images.
This process is akin to how humans learn by reading widely and absorbing styles and techniques, rather than memorizing and reproducing exact passages. The AI discards the original text, keeping only abstract representations in “vector space”. When generating new content, the AI isn’t recreating copyrighted works, but producing new expressions inspired by the concepts it’s learned.
This is fundamentally different from copying a book or song. It’s more like the long-standing artistic tradition of being influenced by others’ work. The law has always recognized that ideas themselves can’t be owned - only particular expressions of them.
Moreover, there’s precedent for this kind of use being considered “transformative” and thus fair use. The Google Books project, which scanned millions of books to create a searchable index, was ruled legal despite protests from authors and publishers. AI training is arguably even more transformative.
While it’s understandable that creators feel uneasy about this new technology, labeling it “theft” is both legally and technically inaccurate. We may need new ways to support and compensate creators in the AI age, but that doesn’t make the current use of copyrighted works for AI training illegal or unethical.
For those interested, this argument is nicely laid out by Damien Riehl in FLOSS Weekly episode 744. https://twit.tv/shows/floss-weekly/episodes/744
I’ll train my AI on just the bee movie. Then I’m going to ask it “can you make me a movie about bees?” When it spits out the whole movie, I can just watch it or sell it or whatever; it was a creation of my AI, which learned just like any human would! Of course I didn’t even pay for the original copy to train my AI; it’s for learning purposes, and learning should be a basic human right!
That would be like you writing out the bee movie yourself after memorizing it and claiming it is your own idea, or using that as proof that humans memorizing a movie violates copyright. Just because an AI violates copyright by outputting the whole bee movie, it doesn’t mean training the AI on copyrighted stuff violates copyright.
Let’s just punish the AI companies for outputting copyrighted stuff instead of for training on it. Maybe that way they would actually go out of their way to make their LLM intelligent enough to not spit out copyrighted content.
Or, we can just make it so that any output made by an AI that is trained on copyrighted stuff cannot be copyrighted.
There is actually already a website where people just recreated the bee movie by hand so idk it might actually work as a legal argument.
If the solution is making the output non-copyrighted it fixes nothing. You can sell the pirating machine on a subscription. And it’s not like Netflix where the content ends when the subscription ends, you have already downloaded all the not-copyrighted content you wanted, and the internet would be full of non-copyrighted AI output.
Instead of selling the bee movie, you sell a bee movie maker, and a spiderman maker, and a titanic maker.
Sure, file a copyright infringement claim each time you manage to make an AI output copyrighted content. Just run it on a loop and it’s a money making machine. That’s fine by me.
Yeah, because running the AI also has some cost, so you are selling the subscription to run the AI on their server, not its output.
I’m not sure what the legality of selling a bee movie maker is, so you’d have to research that one yourself.
It’s not really a money making machine if you lose more money running the AI on your server farm, but whatever floats your boat. Also, there are already lawsuits based on outputs created from ChatGPT, so it is exactly what is already happening.
Yeah, making sandwiches also costs money! I have to pay my sandwich making employees to keep the business profitable! How do they expect me to pay for the cheese?
EDIT: also, you completely missed my point. The money making machine is the AI, because the copyright owners could just sue them every time it produces copyright-protected material if we decided to take that route, which is what the parent comment suggested.
They should pay for the cheese, I’m not arguing against that, but they should be paying the same amount as a normal human would if they want access to that cheese. No extra fees for access to copyrighted material if you want to use it to train AI vs wanting to consume it yourself.
And I didn’t miss your point. My point was that this is already happening, since people are already suing OpenAI over ChatGPT outputs that they generated themselves, so it’s no longer just a hypothetical. We’ll see whether it is a money making machine for them or whether they will just waste their resources doing that.
Media is not exactly like cheese though. With cheese, you buy it and it’s yours. Media, however, is protected by copyright. When you watch a movie, you are given a license to watch the movie.
When an AI watches a movie, it’s not really watching it, it’s doing a different action. If the license of the movie says “you can’t use this license to train AI, use the other (more expensive) license for such purposes”, then AIs have extra fees to access the content that humans don’t have to pay.
Both humans and AI consume the content, even if they do not do so in the exact same way. I don’t see the need to differentiate that. It’s not like we have any idea of the mechanism by which humans consume content to make the differentiation in the first place.
I don’t think that’s a feasible dream in our current system. They’ll just lobby for it, some senators will say something akin to “art should always have been a hobby, not a profession”, then make adjustments to the current copyright laws so that AI output can be copyrighted.
I am thrilled to see the output you get!
learning should be a basic human right!
Education is a basic human right (except maybe in the USA, in which case it should be one there)
Yeah. A human right.
In the meantime I’ll introduce myself into the servers of large corporations and read their emails, codebase, teams and strategic analysis, it’s just learning!
The joke is of course that “paying for copyright” is impossible in this case. ONLY the large social media companies that own all the comments and content accumulated by their communities have enough data to train AI models. Or sites like stock photo libraries or DeviantArt, which own the distribution rights for the content. That means all copyright arguments practically argue that AI should be owned by big corporations and should be inaccessible to normal people.
Basically the “means of generation” will be owned by the capitalists, since they are the only ones with the economic power to license these things.
That is basically the worst case scenario. Not only will the value of work diminish greatly, the advances in productivity will also be only accessible to big capitalists.
Of course, that is basically inevitable anyway. Why wouldn’t they want this? It’s just sad seeing the stupid morons arguing for this as if they had anything to gain.
It’s just sad seeing the stupid morons arguing for this as if they had anything to gain.
The real money shot here… How did we get to a point where people will argue against common working slave good?
There is a pattern too… Iraq, Afghanistan, israeli genocide, bailouts. Anytime there is money to be made for the regime, we got solid 30% of population working as hard for zealots.
Then 2 decades later, when the two wars failed, we can’t find a single guy around who supported either war 🤡
The same crowd is somehow now shilling that we “shouldn’t invade Ukraine but Israel needs tools to defend themselves”
I’m getting really tired of saying this over and over on the Internet and getting either ignored or pounced on by pompous AI bros and boomers, but this “there isn’t enough free data” claim has never been tested. The experiments that have come close (look up the early Phi and Starcoder papers, or the CommonCanvas text-to-image model) suggested that the claim is false, by showing that a) models trained on small, well-curated datasets can match and outperform models trained on lazily curated large web scrapes, and b) models trained solely on permissively licensed data can perform on par with at least the earlier versions of models trained more lazily (e.g. StarCoder 1.5 performing on par with Code-Davinci). But yes, a social network or other organization that has access to a bunch of data that they own, or have licensed, could almost certainly fine-tune a base LLM trained solely on permissively licensed data to get a tremendously useful tool that would probably be safer and more helpful than ChatGPT for that organization’s specific business, at vastly lower risk of copyright claims or toxic generated content, for that matter.
Thanks for the info. But let’s say you want to train a (future) AI to spot and tag disinformation and misinformation. You’d need to use and curate actual data from social media sites and articles.
If copyright is extended to learning from and analyzing publicly available data, such an AI will only be possible by licensing that data. Which will be monetized to maximize profit, first as some lump sum, then later “per GB” and then later “per use”.
I’m sure open source AI will make do, and for many applications there is enough free data, but I can imagine a lot of cases where there won’t be. Anything that requires “commercially successful” media, articles, newspapers, screenplays, movies, books, social media posts and comments, images, photos, video clips…
We’re basically setting up a world where the intellectual wealth of our civilization is being transformed into a commodity and then will be transferred into the hands of a few rich capitalists.
And even if there is an acceptable amount of free data, if the principle is that data needs to be specifically licensed to learn and train and derive AI works from it - that makes free data use expensive too. It needs to be specifically vetted, and you are still vulnerable to being sued over mistakes or outrageous claims of copyright. Similar to patents, the uncertainty requires higher capitalization for any startup to defend against lawsuits.
Copyright law protects the ability of the copyright holder to make money. The laws were created before AI and now obviously have to be adapted to new technology (like you didn’t really need copyright before the invention of printing). How exactly AI will be regulated is in the end up to society to decide, which most likely will come down to who has the better lobby.
Considering that original works are discarded, it’s strange how effective they are at plagiarizing them
Yep, it’s definitely not possible that nice small businesses like Universal and Sony would sue without an actual case in order to try and crush competitors with costs.
In the same way that a person can learn the material and also use that knowledge to potentially plagiarize it, though. It’s no different in that sense. What is different is the speed of learning and both the speed and capacity of recall. However, it doesn’t change the fundamental truths of OP’s explanation.
Also, when you’re talking specifically about music, you’re talking about a very limited subset of note combinations that will sound pleasing to human ears. Additionally, even human composers commonly struggle to not simply accidentally reproduce others’ work, which is partly why the music industry is filled with constant copyright litigation.
I mean saying they learn is huge kudos to the people that made this tbh
The “you wouldn’t download a car” statement was made against personal cases of piracy, and it got rightfully clowned on. It obviously doesn’t work at all when you use its ridiculousness to defend big ass corporations that try to profit from so much of the stuff they “downloaded”.
Besides, it is not “theft”. It is “plagiarism”. And I’m glad to see that people who try to defend these plagiarism machines, which they attempt to humanise and inflate into something they can never be, get clowned. It warms my heart.
Though I am not a lawyer by training, I have been involved in such debates personally and professionally for many years. This post is unfortunately misguided. Copyright law makes concessions for education and creativity, including criticism and satire, because we recognize the value of such activities for human development. Debates over the excesses of copyright in the digital age were specifically about humans finding the application of copyright to the internet and all things digital too restrictive for their educational, creative, and yes, also their entertainment needs. So any anti-copyright arguments back then were in the spirit of specifically protecting the average person and public-serving non-profit institutions, such as digital archives and libraries, from big copyright owners who would sue and lobby for total control over every file in their catalogue, sometimes in the process severely limiting human potential.
AI’s ingesting of text and other formats is “learning” in name only, a term borrowed by computer scientists to describe a purely computational process. It does not hold the same value socially or morally as the learning that humans require to function and progress individually and socially.
AI is not a person (unless we get definitive proof of a conscious AI, or are willing to grant every implementation of a statistical model personhood). Also, AI is not vital to human development and as such, one could argue, does not need special protections or special treatment to flourish. AI is a product, even more clearly so when it is proprietary and sold as a service.
Unlike past debates over copyright, this is not about protecting the little guy or organizations with a social mission from big corporate interests. It is the opposite. It is about big corporate interests turning human knowledge and creativity into a product they can then use to sell services to - and often to replace in their jobs - the very humans whose content they have ingested.
See, the tables are now turned and it is time to realize that copyright law, for all its faults, has never been only or primarily about protecting large copyright holders. It is also about protecting your average Joe from unauthorized uses of their work. More specifically uses that may cause damage, to the copyright owner or society at large. While a very imperfect mechanism, it is there for a reason, and its application need not be the end of AI. There’s a mechanism for individual copyright owners to grant rights to specific uses: it’s called licensing and should be mandatory in my view for the development of proprietary LLMs at least.
TL;DR: AI is not human, it is a product, one that may augment some tasks productively, but is also often aimed at replacing humans in their jobs - this makes all the difference in how we should balance rights and protections by law.
What do you think “ingesting” means if not learning?
Bear in mind that training AI does not involve copying content into its database, so copyright is not an issue. AI is simply predicting the next token/word based on statistics.
You can train AI on a book and it will give you information from the book - information is not copyrightable. You can read a book and talk about its contents on TV - not illegal if you’re a human, should it be illegal if you’re a machine?
There may be moral issues with training on someone’s hard-gathered knowledge, but there is no legislation against it. Reading books and using that knowledge to provide information is legal. If you try to outlaw automating this process by computers, there will be side effects, such as search engines no longer being able to index data.
Bear in mind that training AI does not involve copying content into its database, so copyright is not an issue.
Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model’s weights afterwards.
You can train AI on a book and it will give you information from the book - information is not copyrightable. You can read a book and talk about its contents on TV - not illegal if you’re a human, should it be illegal if you’re a machine?
Yes, because copyright law is intended to benefit human creativity.
If you try to outlaw automating this process by computers, there will be side effects, such as search engines no longer being able to index data.
Wrong. Search engines retain a minimal amount of the indexed website’s data, and the purpose of the search engine is to generate traffic to the website, providing benefit for both the engine and the website (increased visibility, the opportunity to show ads to make money). Banning the use of copyrighted content for AI training (which uses the entire copyrighted work and whose purpose is to replace the organizations whose work is being used) will have no effect.
What do you mean that search engines retain a minimal amount of the site’s data? Obviously a search engine needs to index all the contents to make them searchable. If you search for keywords within an article, you can find the article, therefore all of it needs to be indexed.
Indexing is nothing more than “presenting data to the algorithm” so it’d be against the law to index a site under your proposed legislation.
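On the indexing question, a minimal sketch of an inverted index (the basic structure behind keyword search) may help. This is a toy under stated assumptions: the two pages, the example.com URLs, and the names `build_index` and `search` are invented here for illustration, and no real search engine is claimed to work exactly like this. It shows that every word of a page is read to build the index, while what this toy stores is a word-to-page mapping rather than the text in reading order.

```python
from collections import defaultdict

# Hypothetical pages, made up for this example.
PAGES = {
    "https://example.com/bees": "The bee returned to the hive.",
    "https://example.com/wolves": "The wolf crossed the snow.",
}

def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            # Strip simple punctuation so "hive." and "hive" match.
            index[word.strip(".,!?\"'")].add(url)
    return index

def search(index, query):
    """Return the URLs of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))

index = build_index(PAGES)
print(search(index, "bee hive"))  # prints {'https://example.com/bees'}
```

Real engines store far more than this (word positions, snippets, cached copies), so the sketch only illustrates the mechanics and does not settle the legal question either way.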
Wrong. The infringement is in obtaining the data and presenting it to the AI model during the training process. It makes no difference that the original work is not retained in the model’s weights afterwards.
This is an interesting take, I’d be inclined to agree, but you’re still facing the problem of how to distinguish training AI from indexing for search purposes. I’m afraid you can’t have it both ways.
AI are people, my friend. /s
But, really, I think people should be able to run algorithms on whatever data they want. It’s whether the output is sufficiently different or “transformative” that matters (and other laws like using people’s likeness). Otherwise, I think the laws will get complex and nonsensical once you start adding special cases for “AI.” And I’d bet if new laws are written, they’d be written by lobbyists to further erode the threat of competition (from free software, for instance).
We have hundreds of years of out of copyright books and newspapers. I look forward to interacting with old-timey AI.
“Fiddle sticks! These mechanical horses will never catch on! They’re far too loud and barely more faster than a man can run!”
“A Woman’s place is raising children and tending to the house! If they get the vote, what will they demand next!? To earn a Man’s wage!?”
That last one is still relevant to today’s discourse somehow!?
The argument that these models learn in a way that’s similar to how humans do is absolutely false, and the idea that they discard their training data and produce new content is demonstrably incorrect. These models can and do regurgitate their training data, including copyrighted characters.
And these things don’t learn styles, techniques, or concepts. They effectively learn statistical averages and patterns and collage them together. I’ve gotten to the point where I can guess what model of image generator was used based on the same repeated mistakes that they make every time. Take a look at any generated image, and you won’t be able to identify where a light source is because the shadows come from all different directions. These things don’t understand the concept of a shadow or lighting, they just know that statistically lighter pixels are followed by darker pixels of the same hue and that some places have collections of lighter pixels.
I recently heard about an AI that scientists had trained to identify pictures of wolves that was working with incredible accuracy. When they went in to figure out how it was telling wolves from dogs like huskies so well, they found that it wasn’t even looking at the wolves at all. 100% of the images of wolves in its training data had snowy backgrounds, so it was simply searching for concentrations of white pixels (and therefore snow) in the image to determine whether or not a picture was of wolves.
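For anyone who wants to poke at the wolf/snow story, here is a minimal sketch of how a model can latch onto a background cue instead of the animal. It is a toy and not the system from that story: the two hand-made binary features, the six training rows, and the names `fit_stump` and `predict` are all made up for illustration. The "model" is a one-feature rule that keeps whichever feature scores best on the training set.

```python
# Hypothetical binary features per image: (snowy_background, wolf_like_build).
FEATURES = ("snowy_background", "wolf_like_build")

# Made-up training set: every wolf photo has snow, no husky photo does,
# while the "build" feature is only a noisy hint.
TRAIN = [
    ((1, 1), "wolf"), ((1, 1), "wolf"), ((1, 0), "wolf"),
    ((0, 0), "husky"), ((0, 1), "husky"), ((0, 0), "husky"),
]

def fit_stump(samples):
    """Return the index of the feature that best predicts 'wolf' on the samples."""
    best_index, best_correct = None, -1
    for index in range(len(FEATURES)):
        correct = sum(
            (feats[index] == 1) == (label == "wolf") for feats, label in samples
        )
        if correct > best_correct:
            best_index, best_correct = index, correct
    return best_index

def predict(feature_index, feats):
    """Classify using only the single chosen feature."""
    return "wolf" if feats[feature_index] == 1 else "husky"

chosen = fit_stump(TRAIN)
print(FEATURES[chosen])         # prints snowy_background
print(predict(chosen, (1, 0)))  # husky photographed in snow: prints wolf
print(predict(chosen, (0, 1)))  # wolf photographed on grass: prints husky
```

The rule is perfect on its own training data and wrong as soon as the background stops lining up with the label, which is the failure the story describes.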
Basing your argument around how the model or training system works doesn’t seem like the best way to frame your point to me. It invites a lot of mucking about in the details of how the systems do or don’t work, how humans learn, and what “learning” and “knowledge” actually are.
I’m a human as far as I know, and it’s trivial for me to regurgitate my training data. I regularly say things that are either direct references to things I’ve heard, or accidentally copy them, sometimes with errors.
Would you argue that I’m just a statistical collage of the things I’ve experienced, seen or read? My brain has as many copies of my training data in it as the AI model, namely zero, but “Captain Picard of the USS Enterprise sat down for a rousing game of chess with his friend Sherlock Holmes, and then Shakespeare came in dressed like Mickey mouse and said ‘to be or not to be, that is the question, for tis nobler in the heart’ or something”. Direct copies of someone else’s work, as well as multiple copyright infringements.
I’m also shit at drawing with perspective. It comes across like a drunk toddler trying their hand at cubism.
Arguing about how the model works or the deficiencies of it to justify treating it differently just invites fixing those issues and repeating the same conversation later. What if we make one that does work how humans do in your opinion? Or it properly actually extracts the information in a way that isn’t just statistically inferred patterns, whatever the distinction there is? Does that suddenly make it different?
You don’t need to get bogged down in the muck of the technical to say that even if you concede every technical point, we can still say that a non-sentient machine learning system can be held to different standards with regards to copyright law than a sentient person. A person gets to buy a book, read it, and then carry around that information in their head and use it however they want. Not-A-Person does not get to read a book and hold that information without consent of the author.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.
Computers think the same way boats swim. Arguing about the difference between hands and propellers misses the point that you don’t want a shrimp boat in your swimming pool. I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy, I care that it ruins the whole thing for the people it exists for in the first place.
I think all the AI stuff is cool, fun and interesting. I also think that letting it train on everything regardless of the creators’ wishes has too much opportunity to make everything garbage. Same for letting it produce content that isn’t labeled or cited.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.
Arguing why it’s bad for society for machines to mechanise the production of works inspired by others is more to the point.
I agree, but the fact that shills for this technology are also wrong about it is at least interesting.
Rhetorically speaking, I don’t know if that’s useless.
I don’t care why they’re different, or that it technically did or didn’t violate the “free swim” policy,
I do like this point a lot.
If they can find a way to do and use the cool stuff without making things worse, they should focus on that.
I do miss when the likes of Cleverbot were just a fun novelty on the Internet.
I’m not the above poster, but I really appreciate your argument. I think many people overcorrect in their minds about whether or not these models learn the way we do, and they miss the fact that they do behave very similarly to parts of our own systems. I’ve generally found that that overcorrection leads to bad arguments about copyright violation and ethical concerns.
However, your point is very interesting (and it is thankfully independent of that overcorrection). We’ve never had to worry about nonhuman personhood in any amount of seriousness in the past, so it’s strangely not obvious despite how obvious it should be: it’s okay to treat real people as special, even in the face of the arguable personhood of a sufficiently advanced machine. One good reason the machine can be treated differently is because we made it for us, like everything else we make.
I think there still is one related but dangling ethical question. What about machines that are made for us but we decide for whatever reason that they are equivalent in sentience and consciousness to humans?
A human has rights and can take what they’ve learned and make works inspired by it for money, or for someone else to make money through them. They are well within their rights to do so. A machine that we’ve decided is equivalent in sentience to a human, though… can that nonhuman person go take what it’s learned and make works inspired by it so that another person can make money through them?
If they SHOULDN’T be allowed to do that, then it’s notable that this scenario is only separated from what we have now by a gap in technology.
If they SHOULD be allowed to do that (which we could make a good argument for, since we’ve agreed that it is a sentient being) then the technology gap is again notable.
I don’t think the size of the technology gap actually matters here, logically; I think you can hand-wave it away pretty easily and apply it to our current situation rather than a future one. My guess, though, is that the size of the gap is of intuitive importance to anyone thinking about it (I’m no different) and most people would answer one way or the other depending on how big they perceive the technology gap to be.
Another good question is why AIs do not mindlessly regurgitate source material. The reason is that they have access to so much copyrighted material. If they were trained on only one book, they would constantly regurgitate material from that one book. Because it’s trained on many (millions) books, it’s able to get creative. So the argument of OpenAI really boils down to: “we are not breaking copyright law, because we have used sufficient copyrighted material to avoid directly infringing on copyright”.
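The one-book point can be tried out with a toy word-level Markov chain. To be clear, this is not how an LLM works internally; it is only a rough sketch of the "little data means regurgitation" intuition. The two one-sentence "books", the function names `train` and `generate`, and the choice of a two-word context are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Two made-up one-sentence "books".
BOOK_A = "the small bee left the hive at dawn and flew over the quiet meadow"
BOOK_B = "the small bee left the garden at noon and flew over the quiet river"

def train(texts, order=2):
    """Map each run of `order` words to the list of words seen right after it."""
    table = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            table[tuple(words[i:i + order])].append(words[i + order])
    return table

def generate(table, seed, length, rng):
    """Extend `seed` word by word, sampling from the recorded continuations."""
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        options = table.get(tuple(out[-order:]))
        if not options:
            break  # no known continuation for this context
        out.append(rng.choice(options))
    return " ".join(out)

seed = ("the", "small")

# Trained on one book, every context has exactly one continuation,
# so the output is the book itself, word for word.
one_book = generate(train([BOOK_A]), seed, 50, random.Random(0))
print(one_book == BOOK_A)  # prints True

# Trained on both books, some contexts have two continuations, so the output
# can be a mix such as a hive at dawn followed by a river.
print(generate(train([BOOK_A, BOOK_B]), seed, 50, random.Random(1)))
```

Even with two books, some runs still reproduce one of them verbatim, so more data lowers the odds of regurgitation in this toy without removing it.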
Even if they learned exactly like humans do, like so fucking what, right!? Humans have to pay EXORBITANT fees for higher education in this country. Arguing that your bot gets socialized education before the people do is fucking absurd.
That seems more like an argument for free higher education rather than restricting what corpuses a deep learning model can train on
Tomato, tomato…
Why not both? Allowing major corps to put even more downward pressure on workers doesn’t help anyone but the rich. LLMs aren’t going to save the world or become sentient.
Devil’s Advocate:
How do we know that our brains don’t work the same way?
Why would it matter that we learn differently than a program learns?
Suppose someone has a photographic memory, should it be illegal for them to consume copyrighted works?
Because we’re talking pattern recognition levels of learning. At best, they’re the equivalent of parrots mimicking human speech. They take inputs and output data based on the statistical averages from their training sets - collaging pieces of their training into what they think is the right answer. And I use the word think here loosely, as this is the exact same process that the Gaussian blur tool in Photoshop uses.
This matters in the context of the fact that these companies are trying to profit off of the output of these programs. If somebody with an eidetic memory is trying to sell pieces of works that they’ve consumed as their own - or even somebody copy-pasting bits from CliffsNotes - then they should get in trouble; the same as these companies.
Given A and B, we can understand C. But an LLM will only be able to give you AB, A(b), and B(a). And they’ve even been just spitting out A and B wholesale, proving that they retain their training data and will regurgitate the entirety of copyrighted material.
The solution is that any AI must always be released under a strong copyleft licence, and possibly to abolish copyright outright, as it has only served the powerful by allowing them to enclose humanity’s common intellectual heritage (see Disney’s looting and enclosing of ancestral children’s stories). If you choose to strengthen the current regime, don’t expect things to improve for you as an irrelevant atomised individual.
I am also not really getting the argument. If I as a human want to learn a subject from a book, I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.
The issue is of course that it’s not at all similar to how humans learn. It needs VASTLY more data to produce something even remotely sensible. Develop AI that’s truly transformative, by making it as efficient as humans are in learning, and the cost of paying for copyright will be negligible.
If I as a human want to learn a subject from a book, I buy it
xD
That’s good.
Dude never heard of a library. I only bought a handful of books during my degree; I would’ve been homeless if I had to buy a copy of every learning source
That was literally in my post. Obviously, in that case the library pays for copyright
Your taxes pay for the library.
If I as a human want to learn a subject from a book, I buy it (or I go to a library that paid for it). If it’s similar to how humans learn, it should cost equally much.
You’re on Lemmy, where people casually say “piracy is morally the right thing to do”, so I’m not sure this argument works on this platform.
I know my way around the Jolly Roger myself. At the same time using copyrighted materials in a commercial setting (as OpenAI does) shouldn’t be free.
Only if they are selling the output. I see it more as them selling access to the service on a server farm, since running ChatGPT is not cheap.
That’s their problem, hands off my material (if I had any).
The usual cycle of tech-bro capitalism would put them currently at the early stage of acquiring market saturation. So it’s unlikely that they are currently charging what they will once they are established and have displaced lots of necessary occupations.
Imagine if you had blinders and earmuffs on for most of the day, and only once in a while were you allowed to interact with certain people and things. Your ability to communicate would be truncated to only what you were allowed to absorb.
I hate to say this, but “let the market decide”: if AI is something the consumer wants/needs, they’ll pay for it; otherwise let it die.
Okay that’s just stupid. I’m really fond of AI but that’s just common greed.
“Free the Serfs?! We can’t survive without their labor!!” “Stop Child labour?! We can’t survive without them!” “40 Hour Work Week?! We can’t survive without their 16 Hour work Days!”
If you can’t make profit yet, then fucking stop.
The argument seen most commonly from people on the fediverse (which I happen to agree with) is really not about what current copyright laws and treaties say / how they should be interpreted, but about how people think things should be (even if it requires changing laws to make it that way).
And it fundamentally comes down to economics - the study of how resources should be distributed. Apart from oligarchs and the wannabe oligarchs who serve as useful idiots for the real oligarchs, pretty much everyone wants a relatively fair and equal distribution of wealth amongst the people (differing between left and right in opinion on exactly how equal things should be, but there is still some common ground). Hardly anyone really wants serfdom or similar where all the wealth and power is concentrated in the hands of a few (obviously it’s a spectrum of how concentrated, but very few people want the extreme position to the right).
Depending on how things go, AI technologies have the power to serve humanity and lift everyone up equally if they are widely distributed, removing barriers and breaking existing ‘moats’ that let a few oligarchs hoard a lot of resources. Or it could go the other way - oligarchs are the only ones that have access to the state of the art model weights, and use this to undercut whatever they want in the economy until they own everything and everyone else rents everything from them on their terms.
The first scenario is a utopia scenario, and the second is a dystopia, and the way AI is regulated is the fork in the road between the two. So of course people are going to want to cheer for regulation that steers towards the utopia.
That means things like:
- Fighting back when the oligarchs try to talk about ‘AI Safety’ meaning that there should be no Open Source models, and that they should tightly control how and for what the models can be used. The biggest AI Safety issue is that we end up in a dystopian AI-fueled serfdom, and FLOSS models and freedom for the common people to use them actually help to reduce the chances of this outcome.
- Not allowing ‘AI washing’ where oligarchs can take humanity’s collective work, put it through an algorithm, and produce a competing thing that they control - unless everyone has equal access to it. One policy that would work for this would be that if you create a model based on other people’s work, and want to use that model for a commercial purpose, then you must publicly release the model and model weights. That would be a fair trade-off for letting them use that information for training purposes.
Fundamentally, all of this is just exacerbating cracks in the copyright system as a policy. I personally think that a better system would look like this:
- Everyone gets a Universal Basic Income paid, and every organisation and individual making profit pays taxes in to fund the UBI (in proportion to their profits).
- All forms of intellectual property rights (except trademarks) are abolished - copyright, patents, and trade secrets are no longer enforced by the law. The UBI replaces it as compensation to creators.
- It is illegal to discriminate against someone for publicly disclosing a work they have access to, as long as they didn’t accept valuable consideration to make that disclosure. So for example, if an OpenAI employee publicly released the model weights for one of OpenAI’s models without permission from anyone, it would be illegal for OpenAI to demote / fire / refuse to promote / pay them differently on that basis, and for any other company to factor that into their hiring decision. There would be exceptions for personally identifiable information (e.g. you can’t release the client list or photos of real people without consequences), and disclosure would have to be public (i.e. not just to a competitor, it has to be to everyone) and uncompensated (i.e. you can’t take money from a competitor to release particular information).
If we had that policy, I’d be okay with AI companies slurping up everything and training model weights.
However, with the current policies, it is pushing us towards the dystopic path where AI companies take what they want and never give anything back.
You should look at the energy cost of AI. It’s not a miracle machine.
I agree that this is a major concern, especially while non-renewable energy is used and until the production processes for computer technology and solar panels become much more of a circular economy. More renewable energy and circular economies, and following the sun for AI training and inference (it isn’t going to be low latency anyway, so if you need AI inference in the northern hemisphere night, just do it on the other side of the world) could greatly decrease the impact.
Not even stealing cheese to run a sandwich shop.
Stealing cheese to melt it all together and run a cheese shop that undercuts the original cheese shops they stole from.
Whatever happened to copying isn’t stealing?
I think the crux of the conversation is whether or not the world is better with ChatGPT. I say yes. We can tackle the disinformation in another effort.
When you copy to consume yourself it’s way different than when you copy to sell the copy for a lower price.
They’re not selling the copy, bruh. They’re selling a technology that very few understand. Smart people pretend they get it, but they don’t. That’s how rare the math is.
So because you don’t understand it, everything it does should be legal?
It’s not rare maths. There are tens of thousands of AI experts. And most CS graduates (millions) have a good understanding of how they work, just not the specifics of the maths.
Yeah, they’re not selling a copy, they are just selling a subscription to a copying machine loaded with the information needed to make a copy. Totally different.
I should start a business of printers and attach a USB with the PNG of a dollar bill. And of course my printers won’t have any government mandated firmware that disables printing fake money.
I’m not printing fake money! It’s my clients! Totally legal.
A perfect analogy.
I don’t feel it is. They aren’t saying that their physical requirements should be free (computers, engineers, programmers, electricity, etc…) which is what is being used for the analogy (cheese, ingredients, etc…).
It would be better to claim “I run a sandwich shop and couldn’t afford to run it if I had to pay for every recipe, idea, and technique I use in the business.”
Now, it’s not as simple as this, and I’m not claiming it is. But this example isn’t anywhere near correct. It’s like the old claim that pirating something is the same as stealing it. The usage of one thing doesn’t equal the loss of something physical.
It’s one of those reasons why laws about this are difficult. Too strict and no one would be able to do “fan”-anything and many other issues (“if it uses AI” takes out many digital tools, etc…), too loose and you don’t really have laws at all.
“No, not like that!”
This process is akin to how humans learn…
I’m so fucking sick of people saying that. We have no fucking clue how humans LEARN. Aka gather understanding aka how cognition works or what it truly is. On the contrary, we can deduce that it probably isn’t very close to human memory/learning/cognition/sentience (or any other buzzwords that are stand-ins for things we don’t understand yet), considering human memory is extremely lossy and tends to infer its own bias, as opposed to LLMs that do neither and religiously follow patterns to a fault.
It’s quite literally a text prediction machine that started its life as a translator (and still does amazingly at that task); it just happens to turn out that general human language is a very powerful tool all on its own.
I could go on and on as I usually do on lemmy about AI, but your argument is literally “Neural network is theoretically like the nervous system, therefore human”. I have no faith in getting through to you people.
Now just if we had all famous people saying stuff like this.
But they won’t. Guess why? Because the “won’t” is what made them famous (and rich).
Even worse, in order to further humanize machine learning systems, they often give them human-like names.