• @HakFoo@lemmy.sdf.org
      26 days ago

      But what data would it be?

      Part of the “gobble all the data” perspective is that a model needs a broad corpus to be meaningfully useful. Not many people are going to give you an $892 billion market cap when your model is a genius about a handful of narrow subjects that you could get deep volunteer support on.

      OTOH, there’s probably a sane business in narrow, siloed AI products (cheap, efficient, and with more bounded expectations): the reinvention of the “expert system” with clear guardrails, the image generator that only does seaside landscape backgrounds but can’t generate a cat to save its life, the LLM that’s a prettified knowledge base search and NOTHING MORE.

      • @latenightnoir@lemmy.world
        26 days ago

        You’ve highlighted exactly why I also fundamentally disagree with making everything AI for-profit. This should be 100% non-profit and driven purely by scientific goals, in which case using copyrighted data wouldn’t even be an issue in the first place… It’d literally be like giving someone access to a public library.

    • @daniskarma@lemmy.dbzer0.com
      26 days ago

      In Spain we trained an AI using a mix of public resources released specifically for AI training and other public material (legislation, sessions of Congress, etc.). And the AI turned out quite well. Obviously not top of the line, but very good overall.

      It was a public project, not a private company.