• Mubelotix
    2 days ago

    Doesn’t make any sense. Why would you crawl wikipedia when you can just download a dump as a torrent ?
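As context for the dump option mentioned above: Wikimedia publishes database dumps with predictable filenames under dumps.wikimedia.org. A minimal sketch, assuming the standard `<wiki>/<YYYYMMDD>/<wiki>-<YYYYMMDD>-pages-articles.xml.bz2` layout (the helper name and example date are hypothetical):

```python
def dump_url(wiki: str, date: str) -> str:
    """Build the download URL for a wiki's pages-articles dump,
    following the standard dumps.wikimedia.org directory layout."""
    filename = f"{wiki}-{date}-pages-articles.xml.bz2"
    return f"https://dumps.wikimedia.org/{wiki}/{date}/{filename}"

# Example: the English Wikipedia dump for a given (hypothetical) run date.
print(dump_url("enwiki", "20240601"))
```

Torrent files for some dumps are also listed on Wikimedia's "Data dump torrents" meta page, which spreads bandwidth costs across peers instead of hammering Wikimedia's servers.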

    • @mke@programming.dev
      2 days ago

      Apparently the dump doesn’t include media, though there’s ongoing discussion within Wikimedia about changing that. It also seems likely to me that AI scrapers don’t care about externalizing costs onto others if it might mean a competitive advantage (e.g. having the most recent data, or not having to spend time and resources developing dedicated ingestion systems for specific sites).

      • Rose
        2 days ago

        Wanting the most recent data within a reasonable time frame is one thing. AI companies are like “I must have every single article within 5 minutes of it being updated, or I’ll throw my pacifier out of the pram.” No regard for the considerations of the source sites.