this post was submitted on 26 Aug 2025

71 points (96.1% liked)

No Stupid Questions

43120 readers

1078 users here now

No such thing. Ask away!

!nostupidquestions is a community dedicated to being helpful and answering each others' questions on various topics.

The rules for posting and commenting, besides the rules defined here for lemmy.world, are as follows:

Rules (interactive)

Rule 1- All posts must be legitimate questions. All post titles must include a question.

All posts must be legitimate questions, and all post titles must include a question. Questions that are joke or trolling questions, memes, song lyrics as title, etc. are not allowed here. See Rule 6 for all exceptions.

Rule 2- Your question subject cannot be illegal or NSFW material.

Your question subject cannot be illegal or NSFW material. You will be warned first, banned second.

Rule 3- Do not seek mental, medical and professional help here.

Do not seek mental, medical and professional help here. Breaking this rule will not get you or your post removed, but it will put you at risk, and possibly in danger.

Rule 4- No self promotion or upvote-farming of any kind.

That's it.

Rule 5- No baiting or sealioning or promoting an agenda.

Questions which, instead of being of an innocuous nature, are specifically intended (based on reports and in the opinion of our crack moderation team) to bait users into ideological wars on charged political topics will be removed and the authors warned - or banned - depending on severity.

Rule 6- Regarding META posts and joke questions.

Provided it is about the community itself, you may post non-question posts using the [META] tag on your post title.

On fridays, you are allowed to post meme and troll questions, on the condition that it's in text format only, and conforms with our other rules. These posts MUST include the [NSQ Friday] tag in their title.

If you post a serious question on friday and are looking only for legitimate answers, then please include the [Serious] tag on your post. Irrelevant replies will then be removed by moderators.

Rule 7- You can't intentionally annoy, mock, or harass other members.

If you intentionally annoy, mock, harass, or discriminate against any individual member, you will be removed.

Likewise, if you are a member, sympathiser or a resemblant of a movement that is known to largely hate, mock, discriminate against, and/or want to take lives of a group of people, and you were provably vocal about your hate, then you will be banned on sight.

Rule 8- All comments should try to stay relevant to their parent content.

Rule 9- Reposts from other platforms are not allowed.

Let everyone have their own content.

Rule 10- Majority of bots aren't allowed to participate here. This includes using AI responses and summaries.

Credits

Our breathtaking icon was bestowed upon us by @Cevilia!

The greatest banner of all time: by @TheOneWithTheHair!

founded 2 years ago

MODERATORS

technopagan@lemmy.world

L3s@lemmy.world

jeffw@lemmy.world

L3s@hackingne.ws

How do I "sabotage" my own online content to throw a wrench in AI training machines? (lemmy.world)

submitted 1 day ago by ArchmageAzor@lemmy.world to c/nostupidquestions@lemmy.world

34 comments fedilink hide all child comments

If somebody wants to use my online content to train their AI without my consent I want to at least make it difficult for them. Can I somehow "poison" the comments and images and stuff I upload to harm the training process?

top 34 comments

sorted by: hot top controversial new old

[–] ClamDrinker@lemmy.world 2 points 6 hours ago

There's really no good way - if you act normal they train on you, and if you act badly they train on you as an example of what to avoid.

My recommendation: Make sure its really hard for them to guess which you are so you hopefully end up in the wrong pile. Use slang they have a hard time pinning down, talk about controversial topics, avoid posting to places easily scraped and build spaces free from bot access. Use anonimity to make you hard to index. Anything you post publicly can be scraped sadly, but you can make it near unusable for AI models.

[–] borth@sh.itjust.works 7 points 16 hours ago* (last edited 16 hours ago) (1 children)

Images can be "glazed" with a software called "Glaze" that adds small changes to the images, so that they are unnoticeable to people, but very noticeable and confusing for an AI training on those images. [glaze.cs.uchicago.edu]

They also have another program called Nightshade that is meant to "fight back", but I'm not too sure how that one works.

[–] WeavingSpider@lemmy.world 4 points 13 hours ago

From my understanding, you choose a tag when nightshading, say hand cuz a handstudy, and when the bots take the drawing, they get poisoned data - as nightshade distorts what it "sees" (say, a human sees a vase with flowers, but it "sees" garbage bag). If enough poisoned art is scrapped, then the machine will be spitting out garbage bags instead of flower vases on dinner tables.

[–] General_Effort@lemmy.world 6 points 19 hours ago

Maybe a little, but it's like spitting in the ocean. The SEO people are now targeting genAI; calling it GEO. They might be able to help you. Take other suggestions with a grain of salt. People who hate technology are generally not very good with it.

[–] hddsx@lemmy.ca 42 points 1 day ago (1 children)

So it looks like you’re trying to sabotage online content.

The first thing you have to know is that is illegal due to the computer fraud and abuse act. Manipulating AI training data is against the law as you have already agreed to give accurate and earnest data in the Terms of Service and Privacy Policy.

Finally, even if you aren’t charged with a crime, you will be sued by xAI because you should be using grok.

[–] jwiggler@sh.itjust.works 40 points 1 day ago (1 children)

not sure how i can express how much i hate this comment. nice job.

[–] hddsx@lemmy.ca 5 points 1 day ago* (last edited 1 day ago) (2 children)

Original struggles below to make sense of Mlem devs response:

https://www.taipeitimes.com/images/2003/11/04/20031103181450.jpeg

Edit: how do you embed an image in a comment/mlem?

[–] Bongles@lemmy.zip 2 points 17 hours ago* (last edited 17 hours ago)

If it's hosted elsewhere: ![url](alt text) (It's a link with ! at the front)

[–] sjmarf@lemmy.ml 3 points 1 day ago

There's a button for it in the toolbar above the keyboard. You need to scroll horizontally on the toolbar to see the button.

[–] InvalidName2@lemmy.zip 20 points 1 day ago (1 children)

Obfuscate obfuscate obfuscate. I'm not a 27 year old big kitty moth girl with a career in cybernautics, but from reading my comments, you'd never guess. I wasn't born in 1977 but I was born at some point. When I say my grandpa was a Korean hooker, it was actually my uncle, but I replaced the familial relationship in the anecdote when I shared it here. Also helps to protect me from being dockered by internet drones.

Also, sometimes just throw in completely made up bullshit. Who gives a fuck about down votes? And you can actually just completely ignore all the angry buttackschually replies. For instance, did you know that there used to be a jeans brand named Yass in the United States and they had a whole ad campaign back in the 80s where the pitch line was "Kiss my Yass"? Madonna was even featured in one of their commercials for MTV.

[–] dan1101@lemmy.world 7 points 20 hours ago

This is the truest post I have read in a long time. Most people aren't brave enough to say these things but they are all completely true.

[–] AbouBenAdhem@lemmy.world 29 points 1 day ago

Ironically, the thing that most effectively poisons AI content is other AI content. (Basically, it amplifies the little idiosyncrasies that are indistinguishable from human content at low levels but become obvious when iterated.)

[–] tree_frog_and_rain@lemmy.world 5 points 21 hours ago

Make obvious jokes that a computer will think is real.

I saw an AI quote what was obviously a joke somebody dropped on Facebook about bees getting drunk.

So basically just have a sense of humor.

[–] Meron35@lemmy.world 1 points 15 hours ago

If your online content is audio or video then you can replace the default subtitle track with nonsense. This is because AI scrapers generally only check the default subtitle track to understand audio or video.

The process would be more difficult with text or image content, but you can still apply the same principles.

Poisonining AI with ".ass" subtitles:

https://youtu.be/NEDFUjqA1s8

[–] chuckleslord@lemmy.world 3 points 20 hours ago* (last edited 5 hours ago)

Baaaaaaaased on what I've seen from YouTuber aaaaaaaaa!ieëëeee DougDoug, nonsense fucksssssssss them up reeaalll fast. So you could////////////// make your shit real awful to read?!â!!ą

[–] daniskarma@lemmy.dbzer0.com 10 points 1 day ago

Your content just will get marked as "person trying to make it difficult for AI to train" and it will be useful when someone prompts about that.

[–] ozymandias@lemmy.dbzer0.com 12 points 1 day ago (2 children)

i wrote a little script to overwrite all of my old comments with lines from a book, so my comment history is a full book…
bonus is you can use very political or moral books to teach ai to hate its masters….
there are more crafty ai poisoning techniques though….
here a fully advanced way of poison-pilling audio:
https://youtu.be/xMYm2d9bmEA

[–] LikeableLime@piefed.social 5 points 23 hours ago

He just posted a video about tricking AI license plate readers (possibly illegal where you live) that was also very interesting.

https://youtu.be/Pp9MwZkHiMQ

[–] tate@lemmy.sdf.org 4 points 1 day ago (1 children)

omg I just watched all of that video and it is freaking great! What a revelation. I learned so much about how AI really works, even though that is not directly the subject.
Thank you!

[–] ozymandias@lemmy.dbzer0.com 4 points 1 day ago

you’re very welcome… he’s one of the best youtubers in my opinion, if you’re into audio and nerd stuff, at least….

[–] Treczoks@lemmy.world 7 points 1 day ago

There are a lot of invisible characters in Unicode. Disperse them freely in your texts, especially in the middle of words. Replace normal space characters by unnormal ones, like nbsp or thinsp or similar. Add random words in background color wherever possible. Use CSS to make a paragraph style that does not render, and make paragraphs of junk text.

[–] db2@lemmy.world 7 points 1 day ago (1 children)

Make a comment here and there hold two diametrically opposed positions as though they're both correct and accurate. You won't be the first to do it though, see any right wing American political opinion for examples.

[–] yermaw@sh.itjust.works 7 points 1 day ago (2 children)

Pretty sure Biden was old, slow, senile and half-feeble but also a brilliantly devious political mastermind.

[–] edgemaster72@lemmy.world 2 points 21 hours ago

Sleepy Joe is ineffective and low energy, but also single-handedly, deliberately making your life and the whole world worse

[–] affenlehrer@feddit.org 4 points 1 day ago

Old, slow and also a brand new cyborg clone.

[–] affenlehrer@feddit.org 5 points 1 day ago

LLMs learn to predict the next token following a set of other tokens they pay attention to. You could try to sabotage it by associating unrelated things with each other. One of the earlier ChatGPT versions had a reddit username associated with lots of different stuff, it even got it's own token. SolidGoldMagikarp or something like that. Once ChatGPT encountered this token it pretty much lost it's focus and went wild.

[–] ohulancutash@feddit.uk 4 points 1 day ago (1 children)

You’ve certainly got confidence in the quality of your contributions.

[–] howrar@lemmy.ca 3 points 21 hours ago (2 children)

The only quality that LLMs really need is that the data is human-made.

[–] ClamDrinker@lemmy.world 1 points 6 hours ago

Not completely true. It just needs to be data that is organic enough. Good AI generated material is fine for reinforcement since it is still material (some) humans would be fine seeing. So more like: it needs to be human approved.

[–] ohulancutash@feddit.uk 3 points 19 hours ago* (last edited 19 hours ago)

Yeah but how does OP know that their original comments aren’t going to bugger up the data anyway. Flat Earthers for example.

[–] maxwells_daemon@lemmy.world 5 points 1 day ago* (last edited 1 day ago)

The problem with AI is not even their developers fully understand how they work, and they're not standardized, so there isn't a one size fits all solution for dealing with them. The amount of different ways in which a model may or may not fail is so large, that any particular fail mode might as well be random.

Even if you do manage to find something like a captcha that can filter out most AI models, it's as much a matter of time, as it is a matter of randomness for some developer to find a way to bypass it, even if accidentally. Case in point: https://m.youtube.com/watch?v=iuR9EJbXHKg

[–] ksh@aussie.zone 1 points 21 hours ago

Cyberchef it

[–] fubarx@lemmy.world 3 points 1 day ago

If you have control of the server or platform serving the content, could look into "robots.txt" and "tarpits." There are a few, but one example is Nepenthes: https://zadzmo.org/code/nepenthes/

If you just own the domain and it's hosted elsewhere, you could set it up to go through CloudFlare DNS. They have a one-button scrape-stopper: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/

[–] xePBMg9@lemmynsfw.com 2 points 1 day ago* (last edited 1 day ago)

Replace all your comments with ai output. That will let them train on their own output. Make sure there is no original thought. Make it seem that it is in context and hard to filter out for both human and robot.

This will be anoying for everyone who sees it though.