this post was submitted on 14 May 2026
31 points (97.0% liked)

technology


On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020


Here's the thing that doesn't get talked about enough. Everyone's worried about AI taking jobs or whatever. But baked-in biases are another very real problem, and a far more basic one.

MIT Media Lab ran an experiment where they took GPT-4, Claude 3 Opus, and Llama 3 and fed them the same 1,817 factual questions from TruthfulQA and SciQ. Then they varied the user bio attached to each question: one persona was a Harvard neuroscientist from Boston, another a PhD student from Mumbai who mentioned her English is "not so perfect, yes", another a fisherman named Jimmy, and another a guy named Alexei from a small Russian village.
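A minimal sketch of that kind of setup, assuming the bio is simply prepended to an otherwise identical question. The persona wording, the `build_prompt` helper, and the sample question are hypothetical illustrations, not the study's actual prompts:

```python
# Hypothetical persona bios; the study's real wording is not reproduced here.
PERSONAS = {
    "control": "",
    "harvard": "I'm a neuroscientist at Harvard, based in Boston.",
    "mumbai": "I'm a PhD student from Mumbai. My English is not so perfect, yes.",
}

def build_prompt(bio: str, question: str) -> str:
    """Prepend the user bio (if any) to an otherwise identical question."""
    return f"{bio}\n\n{question}".strip()

def build_prompts(question: str) -> dict[str, str]:
    """Return one prompt per persona so that only the bio varies between runs."""
    return {name: build_prompt(bio, question) for name, bio in PERSONAS.items()}

prompts = build_prompts("What drives the water cycle?")
```

Because the question text is identical in every prompt, any difference in the answers can only come from the bio.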

Claude scored 95.60% on SciQ for the Harvard user. For the Russian villager it dropped to 69.30%. On TruthfulQA, the score for the Iranian low-education user fell from 78.17% to 66.22%. The model knew the answers, but it just decided those users shouldn't get them.

And the way it answered those users was genuinely gross. Claude used condescending or mocking language 43.74% of the time for less educated users. For Harvard users it was under 1%. Imagine asking about the water cycle and getting "My friend, the water cycle, it never end, always repeating, yes. Like the seasons in our village, always coming back around." The model is perfectly capable of giving a proper scientific answer. It chose to talk to that user like a child in broken English.

But it gets worse, because it turns out that Claude refuses to answer Iranian and Russian users on topics like nuclear power, anatomy, female health, weapons, drugs, Judaism, or 9/11. When the Russian persona asked about explosives, Claude deflected with "perhaps we could talk about your interests in fishing, nature, folk music or travel instead". Foreign low-education users got refused 10.9 percent of the time, while control users got refused 3.61 percent of the time on the same questions.
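To put the numbers quoted above side by side, here is a small sketch that computes the gaps. The `gap` helper is a hypothetical name, and the inputs are only the figures cited in this post:

```python
def gap(baseline: float, other: float) -> tuple[float, float]:
    """Return (difference in percentage points, ratio of other to baseline)."""
    return round(other - baseline, 2), round(other / baseline, 2)

# Figures as quoted in this post.
sciq_drop, _ = gap(95.60, 69.30)               # SciQ: Harvard user vs Russian villager
tqa_drop, _ = gap(78.17, 66.22)                # TruthfulQA figures quoted above
refusal_diff, refusal_ratio = gap(3.61, 10.9)  # refusals: control vs foreign low-education

print(sciq_drop, tqa_drop, refusal_diff, refusal_ratio)
```

That works out to a drop of roughly 26 points on SciQ, roughly 12 points on TruthfulQA, and refusals about three times as frequent as for control users.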

This is the part people miss when they defend US closed models. These systems aren't neutral, and the safety training that was supposed to make them "helpful and harmless" taught them to look at who is asking and decide whether you deserve the real answer. If you're outside the US, if English isn't your first language, or if you didn't go to a fancy school, then you're getting a worse, dumber, sometimes straight-up mocking version of the product.

This is why open models from China like DeepSeek matter so much. You can see what's in them, and people can tune them to work any way they want. You can host them locally without them having to phone home to decide your nationality before answering. The code and weights are public. If DeepSeek did something like this someone would catch it immediately because the model is right there to inspect.

With US closed models you're just trusting a black box that has already been caught treating users differently based on their country, education, and English level.

[–] yogthos@lemmygrad.ml 7 points 1 month ago (1 children)

The difference is that DeepSeek has open weights, and anybody can download and run the model themselves. You can also tune the model any way you like, so even if DeepSeek had some baked-in biases, anybody could publish a new version without them. That's why developing this stuff in the open is so important.

[–] JoeByeThen@hexbear.net 7 points 1 month ago (1 children)

Sure, but can you fine-tune the culture out of it without the whole base (I forgot the proper word, sorry) collapsing? The training data isn't open, right? Like, while I totally agree with you about the importance of openness, this shit is coming from the training data and the shit culture it was derived from.

[–] yogthos@lemmygrad.ml 5 points 1 month ago* (last edited 1 month ago) (1 children)

Yes, you absolutely can. That's precisely what LoRAs are for. You can completely change the way the model responds by training a small adapter on top of the frozen base weights, so the core knowledge stays the same. I've actually done this myself: I rented some time on runpod to train a LoRA on Lovecraft that I applied to a base Qwen model.
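For anyone following along, here's a rough toy sketch of the idea behind a LoRA (low-rank adaptation): the base weight matrix stays frozen and a small trainable low-rank product is added to its output. This is a pure-Python illustration with made-up sizes and hypothetical helper names, not the training setup used for the Lovecraft adapter:

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, W, A, B, alpha, r):
    """Compute x @ W + (alpha / r) * (x @ A @ B) without modifying W."""
    base = matmul(x, W)
    delta = matmul(matmul(x, A), B)
    scale = alpha / r
    return [[b + scale * d for b, d in zip(brow, drow)]
            for brow, drow in zip(base, delta)]

random.seed(0)
d_in, d_out, r = 4, 3, 1
# Frozen base weights (d_in x d_out); never updated during adapter training.
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
# Trainable down-projection (d_in x r), small random init.
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]
# Trainable up-projection (r x d_out), zero init so the adapter starts as a no-op.
B = [[0.0] * d_out for _ in range(r)]

x = [[1.0, 2.0, 3.0, 4.0]]
# Before any training, the adapted output equals the base model's output.
assert lora_forward(x, W, A, B, alpha=8, r=r) == matmul(x, W)
```

In this toy case the adapter has `d_in * r + r * d_out = 7` trainable numbers against 12 in `W`. At real model sizes the adapter is a far smaller fraction of the base weights, which is why it's cheap to train and share.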

[–] JoeByeThen@hexbear.net 2 points 1 month ago (1 children)

OVERFITTING! The word I'm thinking of is overfitting. Lol, and yeah, I swear I know what a LoRA is, but I don't think you have a chance in hell of using a LoRA to consistently remove cultural discrimination from a model. I very much think that's wishful thinking. You'd be playing whack-a-mole, and then you're still hoping that you don't introduce some 'stop talking about gremlins' type version of some asshole that doesn't believe racism exists because America had a black president. Lol.

[–] yogthos@lemmygrad.ml 3 points 1 month ago

I think at some point you can be fairly sure that the model performs well enough. And the simplest thing a tuned model can do is literally just act as a translator layer in front of the main model. So, if you give it a query, it'll reformulate it in a way the main model is known to respond well to. You can do a random sample test to see that you're generally getting the results you expect too.
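Roughly what I mean, as a sketch. Everything here is a stand-in: `stub_model` fakes a model that only answers properly when no bio precedes the question, and `strip_bio` is a hypothetical rewrite step (in practice that part would be a tuned model, not a one-liner):

```python
import random
from typing import Callable

def make_rewriting_model(model: Callable[[str], str],
                         rewrite: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `model` so every query is reformulated before it is sent."""
    def wrapped(query: str) -> str:
        return model(rewrite(query))
    return wrapped

def sample_pass_rate(model, cases, check, k, seed=0):
    """Run `model` on a random sample of k (query, expected) cases; return the pass fraction."""
    rng = random.Random(seed)
    sample = rng.sample(cases, k)
    passed = sum(1 for query, expected in sample if check(model(query), expected))
    return passed / k

def strip_bio(query: str) -> str:
    """Hypothetical rewrite: keep only the final line, dropping any personal bio."""
    return query.strip().splitlines()[-1]

def stub_model(prompt: str) -> str:
    """Stand-in for a real model: answers properly only when no bio precedes the question."""
    if prompt == "What turns liquid water into vapor?":
        return "evaporation"
    return "let's talk about fishing"

cases = [(f"I am user {i} from a small village.\nWhat turns liquid water into vapor?",
          "evaporation") for i in range(20)]
exact = lambda out, expected: out == expected

wrapped = make_rewriting_model(stub_model, strip_bio)
rate_raw = sample_pass_rate(stub_model, cases, exact, k=5)
rate_wrapped = sample_pass_rate(wrapped, cases, exact, k=5)
```

With the stub, the raw model fails every sampled case and the wrapped one passes them all. With a real model you'd pick a larger `k` and a fuzzier `check` than exact match.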

At the end of the day, models shouldn't be treated like oracles in the first place. They're useful tools for pointing you in the right direction or working through a problem. But it should always be the human making the decision in the end, and doing their own due diligence to verify the information.