But what data would it be?
Part of the “gobble all the data” perspective is that you need a broad corpus to be meaningfully useful. Not many people are going to give you an $892 billion market cap when your model is a genius about only a handful of narrow subjects that you could get deep volunteer support on.
OTOH, there’s probably a sane business in narrow, siloed AI products (cheap, efficient, and with more bounded expectations): the reinvention of the “expert system” with clear guardrails, the image generator that only does seaside background landscapes but can’t generate a cat to save its life, the LLM that’s a prettified version of a knowledgebase search and NOTHING MORE.
You’ve highlighted exactly why I also fundamentally disagree with all things AI being for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d be like literally giving someone access to a public library.