this post was submitted on 11 Dec 2025
583 points (96.3% liked)

[–] Devial@discuss.online 187 points 1 week ago* (last edited 1 week ago) (44 children)

The article headline is wildly misleading, bordering on being just a straight up lie.

Google didn't ban the developer for reporting the material; they didn't even know he had reported it, because he did so anonymously, and to a child protection org, not to Google.

Google's automated tools correctly flagged the CSAM when he unzipped the data, and his account was subsequently nuked.

Google's only failure here was not unbanning him on his first or second appeal. And whilst that is absolutely a big failure on Google's part, I find it very understandable that, generally speaking, the appeals team won't accept "I didn't know the folder I uploaded contained CSAM" as a valid reason for a ban appeal.

It's also kind of insane how this article somehow makes a bigger deal out of this developer being temporarily banned by Google than it does of the fact that hundreds of CSAM images were freely available online, openly shareable by anyone and to anyone, for god knows how long.

[–] ulterno@programming.dev 3 points 5 days ago* (last edited 5 days ago) (2 children)

Another point: the reason Google's AI is able to identify CSAM at all is that it has that material in its training data, flagged as such.

In that case, it would have detected the training material as a ~100% match.
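(How Google's scanner actually works isn't public, but the "~100% match" intuition lines up with how hash-based matching works in general: known images are reduced to hashes, uploads are hashed the same way and compared against that list, so an exact copy of a known image scores essentially a perfect match. A minimal sketch, using a toy average hash rather than the proprietary systems like PhotoDNA or CSAI Match, and with `KNOWN_HASHES` as a purely hypothetical placeholder blocklist:

```python
# Toy illustration of perceptual-hash matching against a list of known hashes.
# Not Google's actual system; KNOWN_HASHES is a hypothetical placeholder.
from PIL import Image

HASH_SIZE = 8  # 8x8 pixels -> 64-bit hash

def average_hash(path: str) -> int:
    # Downscale to greyscale, then set each bit by comparing a pixel to the mean.
    img = Image.open(path).convert("L").resize((HASH_SIZE, HASH_SIZE))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def match_score(h1: int, h2: int) -> float:
    # 1.0 means identical hashes; an exact copy of a known image scores ~1.0.
    distance = bin(h1 ^ h2).count("1")
    return 1.0 - distance / (HASH_SIZE * HASH_SIZE)

KNOWN_HASHES = {0x8F3C7E1A2B4D6F00}  # hypothetical blocklist entry

def is_flagged(path: str, threshold: float = 0.95) -> bool:
    h = average_hash(path)
    return any(match_score(h, known) >= threshold for known in KNOWN_HASHES)
```

The point of the threshold is that near-duplicates (re-encoded, resized copies) still land close to a known hash, while the original known file itself comes back as effectively a 100% match.)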

What I don't get, though, is how it ended up being openly available: if it were properly tagged, they would presumably have excluded it from the open-sourced data. And now I see that an open-source, openly scrutinisable AI deployment for CSAM detection wouldn't be viable either, for the same reason.

And while some governmental body got a lot of backlash for trying to implement such AI scanning on chat apps, Google gets to do it all it wants, because it's e-mail/GDrive, it's all on their servers, and you can't expect privacy there.


Considering how many stories of people running into problems with this system keep coming up, are there any statistics on legitimate catches using this model? I suspect not, because why would anyone use Google services for this kind of stuff?

[–] arararagi@ani.social 2 points 5 days ago (1 children)

You would think so, but none of these companies actually make their own datasets; they buy from third parties.

[–] ulterno@programming.dev -1 points 5 days ago

I am not sure which point you are replying to.
Could you please specify?
