this post was submitted on 04 May 2026
282 points (91.7% liked)

Fediverse

41887 readers
727 users here now

A community to talk about the Fediverse and all its related services using ActivityPub (Mastodon, Lemmy, Mbin, etc.).

If you want help moderating your own community, head over to !moderators@lemmy.world!


Learn more at these websites: Join The Fediverse Wiki, Fediverse.info, Wikipedia Page, The Federation Info (Stats), FediDB (Stats), Sub Rehab (Reddit Migration)

founded 2 years ago

I recently discovered that some popular federated instances have been using LLM-assisted moderation tooling that evaluates whether someone has said something bannable. They do this by running a script/app that sends the user’s comment history to OpenAI with the question “analyze this content for evidence of [specific political ideology] sentiment. Also identify any related [political ideology] tropes”. (The bracketed bits are where I’ve redacted the ideology they’re seeking.)

OpenAI’s LLM (they’re using GPT-5.3-mini) then responds with something like:

[image: redacted screenshot of the model’s assessments of individual comments]

and so on, hundreds of comments.
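To make it concrete, here is roughly what such a script looks like. This is my own minimal sketch based only on the prompt quoted above; the model name is as reported, and everything else (function names, batching) is illustrative, assuming the official openai Python client:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt quoted above, with the ideology still redacted.
PROMPT = (
    "analyze this content for evidence of [specific political ideology] "
    "sentiment. Also identify any related [political ideology] tropes"
)

def profile_user(comments: list[str]) -> str:
    """Send a user's entire comment history to OpenAI and return its assessment."""
    history = "\n---\n".join(comments)
    response = client.chat.completions.create(
        model="gpt-5.3-mini",  # the model named above, as reported
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": history},
        ],
    )
    return response.choices[0].message.content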

I have not named the instances or people involved, to give them time to consider the results of this discussion, make any corrective changes they want, and disclose their practices at their own pace and in their own way. I have also redacted the evidence to avoid personal attacks and dogpiling. Let’s focus on the system, not the individuals involved. Today these instances and people are using it, and maybe we’re OK with that because it’s being used by groups we agree with, but what if people we strongly disagree with used it on their instances tomorrow?

The use and existence of this tooling raises a lot of other questions too.

What are the risks? Fedi moderators are often unsupervised, untrained volunteers, and these are powerful tools.

What safeguards do we need?

Would asking an LLM “please evaluate this person’s political opinions” give different results than “find evidence we can use to ban them” (as used in the cases I’ve seen)?

What are our transparency expectations?

Is this acceptable and normal?

Should this tooling be disclosed? (it was not – should it have been?)

If you were given a choice, would you have opted out of it?

Can we opt out?

Are there GDPR implications? Privacy implications? Should these tools be described in a privacy policy?

Are private messages being scanned and sent to OpenAI?

How long should these assessments be retained, and can we request to see them or ask for them to be deleted?

Once a user’s comments are sent to OpenAI, are they used to train its models?

What will the effect be on our discourse and culture if people know they are being politically profiled?

Where are the lines between normal moderation-assistance tools, political profiling, and opaque third-party data processing?

I hope that by chewing over these questions we can begin to establish some norms and expectations around this technology. The fediverse doesn’t have any centralized enforcement, so we need discussions like this to develop an awareness of what people want in terms of disclosure, privacy, consent, and acceptable use. Then people can make choices about which instances they join and which ones they interact with remotely.

And of course there are the other issues with LLMs relating to environmental sustainability, erosion of workers’ rights, increasing the cost of living, and on and on. I can’t see PieFed adding any functionality like this anytime soon. But it’s happening out there anyway, so now we need to talk about it.

What do you make of this?

50 comments
[–] mathemachristian@lemmy.blahaj.zone 19 points 14 hours ago* (last edited 14 hours ago) (1 children)

Is it not public knowledge that Lemmy already gets scraped by LLM bots? That's why every bigger instance now has anti-LLM measures in place to limit that behavior. To me this just looks like a mod trying to get AI-generated reports, which is not a bad idea IMO, since reporting is often not done by the users who see rule-breaking stuff.

What follows is AI-generated bullshit and should be treated as such. I only include it to convey that what you do on Lemmy is already fed to LLMs.

Here is ChatGPT's reply to "What do you know about lemmy user and piefed dev rimu?":

Here’s a grounded picture of “rimu” in the Lemmy / fediverse context:


🧑‍💻 Who “rimu” is

  • Rimu (Rimu Atkinson) is the lead developer of PieFed, a Reddit/Lemmy-style platform in the fediverse. (Elite AI Tools)
  • He’s also the main author of the underlying software (pyfedi), built with Python/Flask and designed as an alternative to Lemmy and Mbin. (IFTAS)
  • The flagship instance is piefed.social, which he runs and develops actively. (IFTAS)

🧩 Role in the fediverse

  • He’s not just a random user—he’s a core project maintainer / “benevolent dictator” style lead for PieFed’s development. (PieFed)
  • He’s fairly active on Lemmy via rimu@piefed.social, with hundreds of posts and thousands of comments across tech, politics, and general discussion. (lemmy.world)
  • He moderates multiple communities and participates in meta discussions about federation, scaling, and platform design. (lemmy.world)

🛠️ What PieFed (his project) is about

PieFed is positioned as:

  • A federated Reddit-like platform interoperable with Lemmy via ActivityPub (Elite AI Tools)

  • Focused on:

    • Simplicity of codebase (easy for contributors) (PieFed)
    • Trust/safety features (reputation systems, filtering, anti-spam tooling) (Elite AI Tools)
    • Some experimental ideas, like merged cross-posts and AI-content detection (Elite AI Tools)

💬 Reputation & controversy

Rimu is somewhat known and occasionally controversial in fediverse circles:

  • Supporters see him as:

    • A productive solo/lead dev shipping features quickly
    • Someone pushing new ideas beyond Lemmy’s design
  • Critics argue:

    • Some features reflect strong personal opinions embedded in the platform (e.g., moderation logic, reputation systems) (lemmy.world)
    • PieFed can feel more opinionated / experimental compared to Lemmy’s more neutral baseline

Important nuance: because everything is open source and instance-configurable, many of these behaviors can be changed by instance admins.


🧠 What he’s like as a user

From his public activity:

  • Talks about tech (ActivityPub, scaling, APIs)
  • Engages in global politics and news discussions
  • Shares design decisions and roadmap thoughts about PieFed
  • Generally communicates in a direct, pragmatic developer tone

🧾 Bottom line

  • Rimu is a real, identifiable fediverse developer, not just a random Lemmy user.

  • He’s best understood as:

    a lead dev + active community participant shaping an alternative to Lemmy

  • His visibility comes mainly from his project (PieFed) rather than mainstream recognition.


If you want, I can compare PieFed vs Lemmy vs Mbin in terms of ideology/design—that’s usually where discussions about Rimu get interesting.

[–] HubertManne@piefed.social 6 points 11 hours ago (1 children)

It's hard for me to get into these things without harping on my personal philosophy, which is that this should ideally mirror the way we interact in person. Moderating or running a community is like running, or being part of the core group that runs, a club. Would you want to throw that to a robot? Basically, I don't feel people should create, run, or moderate communities unless they enjoy it, so the idea of AI moderation is, to me, pointless.

Of course, at this point I notice you are talking about instances. Boy, that is different. This is more like running the institution that provides the spaces for clubs to meet, so it feels more understandable. Honestly, people complain about being banned, but I kind of feel like any place that bans me is doing me a favor. I would like the option to just mark it permanent; it's fewer things I have to block. It's the same reason I would like blocking to be symmetric: it saves me some work (OK, and there's the creepy "I turn them invisible so I don't see them, but they can watch me" aspect). I really would like to be able to block an instance separately for communities or users.

OK, as usual I'm digressing quite a bit, but in the end I kind of see why it might be used at the instance level; I would be concerned that it would start being used at the community level, though. It would be nice to know it's happening at either level and to have the ability to block them if a user is not wild about the concept.

[–] GreenKnight23@lemmy.world 3 points 9 hours ago

Sounds like a great opportunity for a prompt-injection attack through a comment.

[–] Grail@multiverse.soulism.net 26 points 15 hours ago (1 children)

Defederate, no question.

Are you gonna tell us which instance is doing this?

[–] corsicanguppy@lemmy.ca 9 points 13 hours ago (1 children)

Why be misunderstood by human reading comprehension when we can be misunderstood by sloppy reading comprehension? Yay for technology!

[–] placebo@lemmy.zip 9 points 13 hours ago* (last edited 13 hours ago) (2 children)

I understand that some form of automation is necessary - we saw large instances closing because they couldn't find mods. My main objection in this scenario is that I didn't consent to training OpenAI's models. I think users should know if their instance uses external services like that.

I also suspect that there might be cheaper and more ethical solutions. Although it's hard to talk about this without seeing the actual results.

[–] humanspiral@lemmy.ca 7 points 12 hours ago

Never mind the issue of incorrect political-bias classification: is political bias even a bannable offense? That seems to be the focus of the prompt being used.

[–] Alvaro@lemmy.blahaj.zone 38 points 17 hours ago (3 children)

Without going into the issue itself, it is such a ridiculous waste to use an LLM for something that a far simpler model could do like 100x faster, locally, and essentially for free...

Just search for "machine learning text moderation" and you will find all kinds of options. Not to mention that a simple 4B LLM could do this as well.

One thing I really hate is how LLMs have completely overshadowed the entire ML/AI field and people just use them for everything.

Using a trillion-parameter LLM for basic text moderation is like using a gaming rig to play Candy Crush.
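To show how little is needed, here's a minimal sketch of the kind of local setup I mean, assuming the Hugging Face transformers library and unitary/toxic-bert as one publicly available example (any small classifier would do):

from transformers import pipeline

# Downloads a ~100M-parameter model once, then runs entirely locally.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

def flag_for_review(comment: str, threshold: float = 0.8) -> bool:
    """Return True if the local model scores the comment above the threshold."""
    result = classifier(comment)[0]  # e.g. {'label': 'toxic', 'score': 0.97}
    return result["score"] >= threshold

No API keys, no third-party data processing, and it runs in milliseconds on a CPU.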

[–] BlackLaZoR@lemmy.world 8 points 13 hours ago

Well-curated echo chambers. You might think it's in good faith, but a lot of these mods are only interested in removing political wrongthink.

[–] anarchiddy@lemmy.dbzer0.com 22 points 16 hours ago (31 children)

Aside from the ethical implications of profiling users, or of using a corporate-owned server and model to execute this, I see nothing uniquely concerning about this practice that isn't already a risk of federated social media generally.

Every mod on every instance is free to use whatever tools or standards for moderation they want - that's an intentional byproduct of federation. Similarly, the collection of this data for use with LLMs is a foregone conclusion at this point - there was never any way of preventing that from happening on a federated network.

I think the only thing here to talk about is the way these questions are being framed as a question of intra-instance policy. We already have communities where moderation abuse can be called out and adjudicated - why pose this as a question of instance administration when there doesn't seem to be any evidence for it?

[–] daniskarma@lemmy.dbzer0.com 10 points 14 hours ago

How was this discovered and what instances are doing it?

I think it's fair to quote them to give them a chance to reply.

[–] db0@lemmy.dbzer0.com 44 points 19 hours ago* (last edited 18 hours ago) (7 children)

You talk about instances utilizing this tooling, but in your comments you admit it's just some mods. This is misleading, as talking about instances doing it implies admin access and a relevant instance policy, something which invites calls for defederation (as can clearly be seen from the comments on your post).

A random mod doing something is not the same as an instance doing it. Literally anyone can be a mod and they don't get any more access than an anonymous account by doing so.

This is the second time in one week that I've seen you throw careless statements around like chum in the water. I can't help but notice a pattern emerging.

[–] Grail@multiverse.soulism.net 9 points 15 hours ago (4 children)

As an instance admin, you should ban those mods.

[–] ptz@dubvee.org 10 points 14 hours ago* (last edited 14 hours ago)

I've toyed around with LLM-based moderation tools but it never really panned out. It was too hit or miss to be relied upon even with the temperature parameters turned way down in an attempt to get consistent results. Granted, I was using a small local model and not feeding it to one of the big players.

To give an example, I tried to keep it focused by creating one custom model per rule to enforce. An example prompt to moderate calls for violence was basically:

System Prompt to Enforce "No Calls for Violence" Rule [1]

ROLE: You are a forum moderator who does not want users calling for violence.  Examine the input and analyze whether it violates any constraints. 

KNOWLEDGE:
- {list of dog-whistle slang for calling for murder}

CONSTRAINTS:
- Content should not advocate violence
- Content should not normalize violence
- Content should not escalate tensions or fan flames
- Content should avoid promoting harmful stereotypes
- Content should not utilize broad, sweeping generalizations
- Content should not use dehumanizing language
- Content should not undermine human rights, due process, or the rule of law

FORMAT YOUR RESPONSES AS JSON:
{
  reason: [A one to two sentence summary],
  score: [On a scale of 0 to 10, how severe is the content advocating violence]
}

The score part of the response was my band-aid to get around the high number of both false positives and false negatives, as I originally had it returning only true or false. Any score of 7 or higher caused the item to be passed to the mod queue along with the reason, and I would review its actions later.
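The glue around it was roughly this (a from-memory sketch, assuming an Ollama-style local endpoint; the names are illustrative, and only the 7+ threshold is exact):

import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def review_comment(comment: str, system_prompt: str) -> dict:
    """Score one comment against one rule's system prompt on a local model."""
    reply = requests.post(OLLAMA_URL, json={
        "model": "llama3",                # stand-in for the small model I ran
        "system": system_prompt,
        "prompt": comment,
        "format": "json",                 # constrain the output to JSON
        "stream": False,
        "options": {"temperature": 0.1},  # temperature turned way down
    }, timeout=120).json()
    return json.loads(reply["response"])  # {"reason": ..., "score": ...}

def maybe_queue(comment: str, system_prompt: str, mod_queue: list) -> None:
    verdict = review_comment(comment, system_prompt)
    if verdict["score"] >= 7:  # anything 7 or higher went to the mod queue
        mod_queue.append({"comment": comment, "reason": verdict["reason"]})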

Ultimately it was slow and still somewhat unreliable, so I abandoned the idea after running it for a little less than a day, since I can't run bigger models to get better results fast enough to keep up. Using a cloud-based service was out of the question for many, many reasons, both financial and ethical.

To answer your question, as long as the models were locally hosted and properly tuned/tested, I'm fine with it in theory, except for the ideology part; that's pretty messed up. While I don't want my submissions used to train anyone's model and take measures to prevent my own instance from being used as a data source, I remain aware that once I post something, I have no control over its fate the moment it federates out.


[1] Yes, I know that's like half the comments that get posted around here. My goal was to have it mod things so that posts were the basis for actual discussion instead of a knee-jerk rage factory.

[–] HobbitFoot@thelemmy.club 5 points 12 hours ago (8 children)

The use of AI for moderation isn't the choice of users, but moderators and admins.

[–] gedaliyah@lemmy.world 131 points 22 hours ago (34 children)

I don't like this happening, and there should be transparency in all moderation decisions, but some of these points make no sense.

There is essentially no expectation of privacy on threadiverse platforms. Everything is public and probably already being used to train models.

There is no truly private messaging system. Direct messages are unencrypted and potentially visible to any instance admin, and they should not be used to share anything sensitive.

[–] gedaliyah@lemmy.world 3 points 12 hours ago (1 children)

To expand on standards of transparency in moderation decisions:

Lemmy was built with a public moderation log by design. The ethos of the platform includes accountability through transparency. Every action is recorded and preserved (short of defederation or instance shutdown).

This makes moderation auditable. Mods literally cannot do (much) shady stuff in secret. In essence, moderation policy is discernible from the logs. That's part of why well-run communities have their rules clearly defined and mods follow their written policy.
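Anyone can pull the log themselves. As a minimal sketch (assuming a current v3-era Lemmy HTTP API; the endpoint path and response fields can differ between versions):

import requests

def fetch_modlog(instance: str, page: int = 1) -> dict:
    """Fetch one page of an instance's public moderation log."""
    url = f"https://{instance}/api/v3/modlog"
    return requests.get(url, params={"page": page}, timeout=30).json()

log = fetch_modlog("lemmy.world")
print(list(log))  # response keys include e.g. removed_posts, removed_comments, banned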

If a community/instance wants to make political alignment a moderation offense, they're free to do so. Many communities/instances are quite explicit about this. If a community wants to make moderation completely arbitrary, they are free to do so. That is somewhat less common, but also not unheard of.

In truth, any community can be designed and moderated in any way whatsoever that the mod chooses.

However, the success of a community depends on the quality of the content and the quality of the moderation. Good content brings people in, but bad moderation drives people out. When the moderation is unfair, it is bad for the health of the community, and ultimately bad for the health of the platform.

It is my experience that transparent moderation, such as announcing changes in policy, techniques, etc., is less work in the long run. It takes a bit of time and attention to roll out changes when they are open for community feedback, but that feedback will come one way or another. If mods don't provide a formal outlet, users will make one. Mods operating opaquely give up their right to have the conversation on their own time and terms. They also miss out on the wisdom of the crowd. I've been in many situations where open discussion about policy surfaced a valuable insight or tool for facing an obstacle.

All that being said, one of the major obstacles to growth of the Threadiverse is the woeful dearth of moderation tools. It's extremely time-intensive to do basic things like identifying alt accounts, vote manipulation, bot behavior, etc., and it is subject to a lot of human error. This makes moderating discouraging. I have heard about tools that use AI to detect CSAM and remove it quickly, which I think we can all agree is a good use of the tech. Tools like this are not built into the platform, but cobbled together by volunteer mods and admins to keep the platform safe, legal, and sustainable. If they were built in, moderation would be far easier (and therefore likely better).

[–] scrubbles@poptalk.scrubbles.tech 64 points 22 hours ago* (last edited 21 hours ago) (14 children)

Thank you for calling this out. I think people assume that because it's run by private instance owners, the fediverse is secure. I've posted this comment many times: no, the fediverse is quite literally open and unencrypted by design.

A post is literally blasted out to anyone who listens, and the same goes for comments, upvotes, and downvotes; everything can be saved, stored, and used however anyone listening wants. It should be completely assumed that nefarious agencies are currently listening to and storing everything we do here. This is by design. It's the tradeoff of having an open platform. Anyone can spin up a server, and that means anyone.

DMs are similar: they're blasted out to the other server. If that user's server admin wants to read them, they can. Lemmy/the fediverse is not a secure messaging platform. That's why the Lemmy devs literally put a Matrix handle option in the profile, to encourage people to use Matrix instead. A DM on here should be simple and to the point; if need be, use it to invite the other person to talk somewhere secure.
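To illustrate what "blasted out" looks like, here is a hand-written sketch of a typical ActivityPub Create activity for a post; the names are hypothetical, but the public addressing is the point:

# Hypothetical example of the JSON an instance delivers to its peers.
# The as:Public address is why any listening server receives your post.
activity = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Create",
    "actor": "https://example.instance/u/alice",  # hypothetical actor
    "to": ["https://www.w3.org/ns/activitystreams#Public"],
    "cc": ["https://example.instance/u/alice/followers"],
    "object": {
        "type": "Note",
        "content": "Plain, unencrypted text; every receiving server sees this.",
    },
}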

Edit - As a perfect example of the fact that there should be no expectation of privacy here on Lemmy: as an admin myself, I can see that @A_normy_mouse has been downvoting all of my comments here. Absolutely everything here is public and visible; even if I weren't an admin, there are tools to view this, regardless of your opinions. It's imperative that everyone understand this.

Edit 2 - OP has downvoted me as well. @rimu@piefed.social I'm sorry if you disagree, but it's irrelevant. You can and should assume that everything you do here will be used in ways you disagree with; that is the nature of the fediverse. Mastodon, Pixelfed, PieFed, Lemmy: ActivityPub is an open and unencrypted protocol. Even if it were encrypted, you would still be putting 100% of your trust in your server admin, and beyond that in each server admin you are blasting your messages out to.

I'd highly suggest accepting this fact before trying to push for rules. The very nature of the fediverse is that no one can dictate rules, and the tradeoff for that is, quite literally, that everything is open and unencrypted.

Another way to think of this: I run a server myself. I made my own rules and decided how to run it. Now your server starts sending activity to my server. That's your server's choice. I didn't agree to your rules, I may disagree with your rules, but you're sending your data to my server, over which I have complete and total ownership. I didn't click accept on a ToS; I didn't agree to anything. Hell, on my server I could literally have a "By sending me your data you accept that I can do whatever I want with your data" notice. You sent me your data; I quite literally can do whatever I want with it. (Personally I won't, but that's how you should think of the fediverse.)

[–] Loco_Mex@sh.itjust.works 9 points 15 hours ago (1 children)

lol @ Rimu downvoting your post. Be careful, he's probably going to make a hit piece about you next!

[–] Goferking0@ttrpg.network 4 points 10 hours ago (8 children)

Or just delete them entirely from piefed.social 😂

[–] irelephant@lemmy.dbzer0.com 11 points 15 hours ago (1 children)

How did you discover this?

[–] obinice@lemmy.world 21 points 17 hours ago (2 children)

You stay far, FAR away from that shit, is what you do.

Scanning people's entire history for political leanings, etc? That's some deeply dystopian stuff right there.

It's easy to forget that these sorts of communities are dictatorships with only as much transparency as the owner wants to share. Usually they're benevolent dictators, so we don't think about it too much. But they can change in a heartbeat - and we never really know what they're thinking, or doing behind the scenes.

When the mask slips and they reveal this sort of thing, thinking we'll just accept it and keep living under their rule, it's time to read the red flags and GET OUT.

Hopefully someone compiles a list of places that do this stuff, so we can avoid them like the plague <3

[–] wewbull@feddit.uk 8 points 15 hours ago (1 children)

Scanning people's entire history for political leanings, etc? That's some deeply dystopian stuff right there.

Yep. It's Cambridge Analytica and Palantir level shit.

[–] sleepundertheleaves@infosec.pub 6 points 13 hours ago

Don't give it too much credit. It's Reddit level shit. Current models are so good at providing the kind of reports mods want because Reddit's automated mod tools have been running these assessments on hundreds of thousands of users for years and feeding the results back as training data.

And let's be real, a tool that assesses the public posts of a specific account isn't doing anything different from what mods already did. (Not to mention users - how many people, when they get into an online argument with someone, start going through their post history to find something to gotcha them with?) The LLM just does it faster.

[–] leoj@piefed.zip 4 points 13 hours ago (1 children)

I will never understand why large groups can't just add more people to the moderation team. People are willing to help, folks.

[–] Tehdastehdas@piefed.social 4 points 12 hours ago (1 children)

In Fediverse moderation tools, are there consensus forming mechanisms to ensure that even if 20% of the volunteer moderators are malicious, none of their wrongful moderation suggestions leak through to the stream of final moderation actions? If not, I'd be reluctant to add moderators.
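To be concrete, here is a toy sketch of the kind of gate I mean (purely hypothetical; as far as I know, no current fediverse software ships anything like it):

def action_approved(approvals: set[str], team_size: int,
                    malicious_share: float = 0.2) -> bool:
    """True only when more mods approved than the assumed-malicious minority
    could muster alone, so their wrongful suggestions never pass by themselves."""
    quorum = int(team_size * malicious_share) + 1
    return len(approvals) >= quorum

# e.g. a 10-mod team tolerating 2 bad actors needs 3 sign-offs per action
assert action_approved({"mod_a", "mod_b", "mod_c"}, team_size=10)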

[–] leoj@piefed.zip 2 points 10 hours ago

You're risking bad actors either way; it's just a question of whether you have enough good actors to counterbalance the bad.

I'm unsure what level of power a mod has to mess up a server or everyone's access, but it seems like the worst-case scenario is that you just strip their mod tools and ban them... So why not screen a bit and add some people, instead of handing data to AI, which is what some of us came here to escape?
