LLMs can unmask pseudonymous users at scale with surprising accuracy (arstechnica.com)

submitted 1 week ago by BrikoX@lemmy.zip to c/technology@lemmy.zip

Pseudonymity has never been perfect for preserving privacy. Soon it may be pointless.

AllNewTypeFace@leminal.space 9 points 1 week ago

This seems to mostly scale up stylometry (the method of identifying authorship by writing style), a long-established technique. It unmasked the Unabomber in the 90s, as well as the anonymous author of a scandalous book about the Clinton administration. Indeed, one technique some writers use to dodge this is to deliberately write in character in a contrived style (there was an information-security poster on Twitter whose style was modelled on Taylor Swift, for example).
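To make the stylometry idea concrete, here is a minimal sketch, assuming the crudest possible feature set: relative frequencies of a handful of function words, compared with cosine similarity. The word list and the names `profile` and `style_similarity` are made up for illustration; real stylometric tools (and whatever the article's approach does) use far richer features.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set. Function words are useful
# because writers choose them unconsciously, whatever the topic.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it",
                  "is", "but", "which", "though", "while", "so"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a, text_b):
    """Compare two texts by their function-word profiles."""
    return cosine_similarity(profile(text_a), profile(text_b))
```

With only a few sentences per author this is mostly noise; the approach starts to work once there is a large corpus per account, which is exactly the "at scale" part.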

As all things are an arms race, a countermeasure to this would be a locally-hosted language model that can rephrase text into a more neutral style. Install it on your phone, select the text you’ve written and get it to rewrite it, getting something without any regionalisms, turns of phrase or other peculiarities of your writing style that you wouldn’t notice but would identify you given a large enough corpus of your writings. A voice changer for text, if you will.

lemmysmash@beehaw.org 2 points 1 week ago

From the article it seems that it's not even stylometry, but profile feature extraction from a large amount of text. So, for example, if I have my full true profile somewhere where I never mention something like BDSM, but in another place I have a blog specifically about BDSM where I intentionally (and let's assume effectively) omit or change every single detail about myself, then, in theory, this particular technique should fail.

But yes, nothing prevents people from using LLMs in the same way for stylometry (and I'm 101% sure that those who are interested in that are already doing so). And yes, a local "rewriter" LLM would help to some extent, but I think there has been other research somewhere showing that LLM-produced text allows one to, if not completely recover the original prompt, then at least kind of fingerprint it, so... I wouldn't fully trust that method either :)

AllNewTypeFace@leminal.space 1 points 1 week ago

It mentions style as being among the data points used, along with personal details. But if your hidden account is used for things like whistleblowing or niche erotica, you may rarely mention telltale biographical details at all, while you can’t help writing the way you write: the numerous unconscious choices between alternative ways of phrasing things will be the bulk of what it has to work with.

Of course, that doesn’t mean you couldn’t slip up, so if you don’t want your posts traced back to you, also look out for any details you’re leaking and file the serial numbers off them (and perhaps rig up a way of delaying your posts so they go out outside of your waking hours).
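For the post-delay idea, a rough sketch, assuming you sleep between a fixed `sleep_hour` and `wake_hour` (both hypothetical parameters, as is the function name) and that something else actually queues and submits the post at the returned time:

```python
import random
from datetime import datetime, timedelta


def next_posting_time(now, wake_hour=7, sleep_hour=23, rng=random):
    """Pick a random time inside the current or next sleeping window.

    Assumes the sleeping window crosses midnight, i.e. wake_hour < sleep_hour.
    """
    if not 0 <= wake_hour < sleep_hour <= 23:
        raise ValueError("expected 0 <= wake_hour < sleep_hour <= 23")

    # Start of tonight's sleeping window.
    start = now.replace(hour=sleep_hour, minute=0, second=0, microsecond=0)
    if now.hour < wake_hour:
        # It is already past midnight, so the window began yesterday evening.
        start -= timedelta(days=1)
    end = start + timedelta(hours=24 - sleep_hour + wake_hour)

    # Never schedule in the past if we are already inside the window.
    earliest = max(now, start)
    span = (end - earliest).total_seconds()
    return earliest + timedelta(seconds=rng.uniform(0, span))
```

Keep in mind that always posting in the same off-hours window is itself a pattern, so this only blurs the timing signal rather than removing it.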