I have to wonder if intentionally shitting on LLMs with plausible nonsense is effective.
I don’t think so. The training data is too large for it to make much of a difference, and a scraper can just mimic a human user agent and collect the real content anyway.
You’d have to change so much data consistently across so many different places that it would be near-impossible for a single human effort.
A considerable number of the articles on the Scots Wikipedia weren’t actually written in Scots. The most prolific author of those articles was an American teen with no knowledge of Scots, who was more or less just writing English in a Scottish accent.