Sad to see you leave (not really, tho’), love to watch you go!
But what data would it be?
Part of the “gobble all the data” perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give you an $892 billion market cap when your model is a genius about only a handful of narrow subjects that you could get deep volunteer support on.
OTOH there’s probably a sane business in narrow, siloed (cheap, efficient, with more bounded expectations) AI products: the reinvention of the “expert system” with clear guardrails, the image generator that only does seaside background landscapes but can’t generate a cat to save its life, the LLM that’s a prettified version of a knowledgebase search and NOTHING MORE.
You’ve highlighted exactly why I also fundamentally disagree with all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d be like literally giving someone access to a public library.
In Spain we trained an AI using a mix of public resources available for AI training and public records (legislation, congress sessions, etc.). And the AI turned out quite good. Obviously not top of the line, but very good overall.
It was a public project, not a private company.