this post was submitted on 04 Jun 2025

337 points (97.7% liked)

Technology

77816 readers

2570 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

337

Wikimedia Foundation's plans to introduce AI-generated article summaries to Wikipedia (lemmy.dbzer0.com)

submitted 6 months ago* (last edited 6 months ago) by antonim@lemmy.dbzer0.com to c/technology@lemmy.world

142 comments fedilink hide all child comments

I don't know if this is an acceptable format for a submission here, but here it goes anyway:

Wikimedia Foundation has been developing an LLM that would produce simplified Wikipedia article summaries, as described here: https://www.mediawiki.org/wiki/Reading/Web/Content_Discovery_Experiments/Simple_Article_Summaries

We would like to provide article summaries, which would simplify the content of the articles. This will make content more readable and accessible, and thus easier to discover and learn from. This part of the project focuses only on displaying the summaries. A future experiment will study ways of editing and adjusting this content.

Currently, much of the encyclopedic quality content is long-form and thus difficult to parse quickly. In addition, it is written at a reading level much higher than that of the average adult. Projects that simplify content, such as Simple English Wikipedia or Basque Txikipedia, are designed to address some of these issues. They do this by having editors manually create simpler versions of articles. However, these projects have so far had very limited success - they are only available in a few languages and have been difficult to scale. In addition, they ask editors to rewrite content that they have already written. This can feel very repetitive.

In our previous research (Content Simplification), we have identified two needs:

The need for readers to quickly get an overview of a given article or page

The need for this overview to be written in language the reader can understand

Etc., you should check the full text yourself. There's a brief video showing how it might look: https://www.youtube.com/watch?v=DC8JB7q7SZc

This hasn't been met with warm reactions, the comments on the respective talk page have questioned the purposefulness of the tool (shouldn't the introductory paragraphs do the same job already?), and some other complaints have been provided as well:

Taking a quote from the page for the usability study:

"Most readers in the US can comfortably read at a grade 5 level,[CN] yet most Wikipedia articles are written in language that requires a grade 9 or higher reading level."

Also stated on the same page, the study only had 8 participants, most of which did not speak English as their first language. AI skepticism was low among them, with one even mentioning they 'use AI for everything'. I sincerely doubt this is a representative sample and the fact this project is still going while being based on such shoddy data is shocking to me. Especially considering that the current Qualtrics survey seems to be more about how to best implement such a feature as opposed to the question of whether or not it should be implemented in the first place. I don't think AI-generated content has a place on Wikipedia. The Morrison Man (talk) 23:19, 3 June 2025 (UTC)

The survey the user mentions is this one: https://wikimedia.qualtrics.com/jfe/form/SV_1XiNLmcNJxPeMqq and true enough it pretty much takes for granted that the summaries will be added, there's no judgment of their actual quality, and they're only asking for people's feedback on how they should be presented. I filled it out and couldn't even find the space to say that e.g. the summary they show is written almost insultingly, like it's meant for particularly dumb children, and I couldn't even tell whether it is accurate because they just scroll around in the video.

Very extensive discussion is going on at the Village Pump (en.wiki).

The comments are also overwhelmingly negative, some of them pointing out that the summary doesn't summarise the article properly ("Perhaps the AI is hallucinating, or perhaps it's drawing from other sources like any widespread llm. What it definitely doesn't seem to be doing is taking existing article text and simplifying it." - user CMD). A few comments acknowlegde potential benefits of the summaries, though with a significantly different approach to using them:

I'm glad that WMF is thinking about a solution of a key problem on Wikipedia: most of our technical articles are way too difficult. My experience with AI summaries on Wikiwand is that it is useful, but too often produces misinformation not present in the article it "summarises". Any information shown to readers should be greenlit by editors in advance, for each individual article. Maybe we can use it as inspiration for writing articles appropriate for our broad audience. —Femke 🐦 (talk) 16:30, 3 June 2025 (UTC)

One of the reasons many prefer chatGPT to Wikipedia is that too large a share of our technical articles are way way too difficult for the intended audience. And we need those readers, so they can become future editors. Ideally, we would fix this ourselves, but my impression is that we usually make articles more difficult, not easier, when they go through GAN and FAC. As a second-best solution, we might try this as long as we have good safeguards in place. —Femke 🐦 (talk) 18:32, 3 June 2025 (UTC)

Finally, some comments are problematising the whole situation with WMF working behind the actual wikis' backs:

This is a prime reason I tried to formulate my statement on WP:VPWMF#Statement proposed by berchanhimez requesting that we be informed "early and often" of new developments. We shouldn't be finding out about this a week or two before a test, and we should have the opportunity to inform the WMF if we would approve such a test before they put their effort into making one happen. I think this is a clear example of needing to make a statement like that to the WMF that we do not approve of things being developed in virtual secret (having to go to Meta or MediaWikiWiki to find out about them) and we want to be informed sooner rather than later. I invite anyone who shares concerns over the timeline of this to review my (and others') statements there and contribute to them if they feel so inclined. I know the wording of mine is quite long and probably less than ideal - I have no problem if others make edits to the wording or flow of it to improve it.

Oh, and to be blunt, I do not support testing this publicly without significantly more editor input from the local wikis involved - whether that's an opt-in logged-in test for people who want it, or what. Regards, -bɜ:ʳkənhɪmez | me | talk to me! 22:55, 3 June 2025 (UTC)

Again, I recommend reading the whole discussion yourself.

EDIT: WMF has announced they're putting this on hold after the negative reaction from the editors' community. ("we’ll pause the launch of the experiment so that we can focus on this discussion first and determine next steps together")

you are viewing a single comment's thread
view the rest of the comments

[–] antonim@lemmy.dbzer0.com 11 points 6 months ago (3 children)

Looks like the vast majority of people disagree D: I do agree that WP should consider ways to make certain articles more approachable to laymen, but this doesn't seem to be the right approach.

[–] FaceDeer@fedia.io 6 points 6 months ago (2 children)

The vast majority of people in this particular bubble disagree.

I've found that AI is one of those topics that's extremely polarizing, communities drive out dissenters and so end up with little awareness of what the general attitude in the rest of the world is.

[–] miguel@fedia.io 11 points 6 months ago (2 children)

You mean the bubble of people who don't want a factually incorrect, environmentally damaging shortcut to provide a summary that's largely already being done by someone? You're right.

[–] Sandbar_Trekker@lemmy.today 4 points 6 months ago* (last edited 6 months ago)

"environmentally damaging"
I see a lot of users on here saying this when talking about any use case for AI without actually doing any sort of comparison.

In some cases, AI absolutely uses more energy than an alternative, but you really need to break it down and it's not a simple thing to apply to every case.

For instance: using an AI visual detection model hooked up to a camera to detect when rain droplets are hitting the windshield of a car. A completely wasteful example. In comparison you could just use a small laser that pulses every now and then and measures the diffraction to tell when water is on the windshield. The laser uses far less electricity and has been working just fine as they are currently used today.

Compare that to enabling DLSS in a video game where NVIDIA uses multiple AI models to improve performance. As long as you cap the framerates, the additional frame generation, upscaling, etc. will actually conserve electricity as your hardware is no longer working as hard to process and render the graphics (especially if you're playing on a 4k monitor).

Looking at Wikipedia's use case, how long would it take for users to go through and create a summary or a "simple.wikipedia" page for every article? How much electricity would that use? Compare that to running everything through an LLM once and quickly generating a summary (which is a use case where LLMs actually excel at). It's honestly not that simple either because we would also have to consider how often these summaries are being regenerated. Is it every time someone makes a minor edit to a page? Is it every few days/weeks after multiple edits have been made? Etc.

Then you also have to consider, even if a particular use case uses more electricity, does it actually save time? And is the time saved worth the extra cost in electricity? And how was that electricity generated anyway? Was it generated using solar, coal, gas, wind, nuclear, hydro, or geothermal means?

Edit: typo

[–] FaceDeer@fedia.io -4 points 6 months ago (1 children)

What an unbiased view. Got any citations?

[–] tyler@programming.dev -1 points 6 months ago (1 children)

The survey results? Did you read the post?

[–] FaceDeer@fedia.io -3 points 6 months ago

Miguel's claims are:

The summaries are factually inaccurate
Generating the summaries are environmentally damaging.
Summarization is "largely already being done by someone"

There's an anecdote in a talk page about one summary being inaccurate. A talk page anecdote is not a usable citation.

Survey results aren't measuring environmental impact.

An the whole point of AI is to take the load off of someone having to do things manually. Assuming they actually are - even in this thread there are plenty of complaints about articles on Wikipedia that lack basic summaries and jump straight into detailed technical content.

[–] antonim@lemmy.dbzer0.com 6 points 6 months ago (1 children)

The problem is that the bubble here are the editors who actually create the site and keep it running, and their "opposition" is the bubble of WMF staff.

[–] FaceDeer@fedia.io -2 points 6 months ago (2 children)

The problem is that the bubble here are the editors who actually create the site and keep it running

No it isn't, it's the technology@lemmy.world Fediverse community.

[–] antonim@lemmy.dbzer0.com 3 points 6 months ago (1 children)

Have you read my OP or did you just use an AI-generated summary? I copy-pasted several comments from Wikipedia editors and linked a page with dozens, if not a hundred other comments by them, and they're overwhelmingly negative.

[–] FaceDeer@fedia.io -2 points 6 months ago (1 children)

I'm not talking about them at all. I'm talking about the technology@lemmy.world Fediverse community. It's an anti-AI bubble. Just look at the vote ratios on the comments here. The guy you responded to initially said "Finally, a good use case for AI" and he got close to four downvotes per upvote. That's what I'm talking about.

The target of these AI summaries are not Wikipedia editors, it's Wikipedia readers. I see no reason to expect that target group to be particularly anti-AI. If Wikipedia editors don't like it there'll likely be an option to disable it.

[–] antonim@lemmy.dbzer0.com 2 points 6 months ago (1 children)

I’m not talking about them at all.

But it's quite obvious that they were what I was talking about, and you were responding to me. Instead of responding to my actual comment, you deceptively shifted the topic in order to trivialise the situation.

The target of these AI summaries are not Wikipedia editors

Except that the editors will very likely have to work to manage those summaries (rate, correct or delete them), so they definitely will be affected by them. And in general it's completely unacceptable to suggest that the people who have created 99% of the content on Wikipedia should have less of a say on how the website functions than a handful of bureaucrats who ran a survey.

If Wikipedia editors don’t like it there’ll likely be an option to disable it.

Disabling would necessarily mean disabling it wiki-wide, not just for individual editors, in which case the opinions of the editors' "bubble" will be quite relevant.

[–] FaceDeer@fedia.io -1 points 6 months ago (1 children)

No Wikipedia editor has to work on anything, if they don't want to interact with those summaries then they don't have to.

And no, it wasn't quite obvious that that's what you were talking about. You said "Looks like the vast majority of people disagree D:". Since you were directly responding to a comment that had been heavily downvoted by the technology@lemmy.world community it was a reasonable assumption that those were the people you were talking about.

Disabling would necessarily mean disabling it wiki-wide,

No it wouldn't, why would you think that? Wikipedia has plenty of optional features that can be enabled or disabled on a per-user basis.

[–] antonim@lemmy.dbzer0.com 2 points 6 months ago (1 children)

I posted that first reply before the comment had any downvotes at all, and in my first response to you it is even now certainly clear who I'm talking about.

You seem to be more interested in wringing out arguments to defend AI tools than in how to make Wikipedia function better (adding low-quality summaries and shrugging at the negative reactions because the editors don't "really" have to fix any of them is not how WP will be improved). That the editors aren't irrelevant fools in their tiny bubble is something WMF apparently agrees with, because they've just announced they're pausing the A/B testing of the AI summaries they planned.

[–] FaceDeer@fedia.io -2 points 6 months ago (1 children)

And when I saw the reply it had plenty of downvotes already, because this is technology@lemmy.world and people are quick to pounce on anything that sounds like it might be pro-AI. You're doing it yourself now, eyeing me suspiciously and asking if I'm one of those pro-AI people. Since there were plenty of downvotes the ambiguity of your comment meant my interpretation should not be surprising.

It just so happens that I am a Wikipedia editor, and I'm also pro-AI. I think this would be a very useful addition to Wikipedia, and I hope they get back to it when the dust settles from this current moral panic. I'm disappointed that they're pausing an experiment because that means that the "discussion" that will be had now will have less actually meaningful information in it. What's the point in a discussion without information to discuss?

[–] antonim@lemmy.dbzer0.com 2 points 6 months ago (1 children)

You openly declaring yourself as "pro-AI" (which I didn't have to ask for because I remember your numerous previous comments) and opposing yourself to "anti-AI" people really just shows that the discussion is on wrong tracks. The point of being pro or against AI should be to have whatever positive consequences for the stuff people are doing, in this case Wikipedia. The priority should be to be pro-Wikipedia, and then work out how AI relates to that, rather than going the other way around.

And true to your self-description you're only defending AI as a concept, by default, without addressing the actual complaints or elaborating why it would be so desirable, but by accusing others of being anti-AI, as if that's meaningful critique by itself.

Since you claim to be an editor, you could also join the discussion on Village Pump directly and provide your view there. But I believe we've already had a very similar discussion about Wikipedia some time ago, and I think you said you're not actually active there anymore.

[–] FaceDeer@fedia.io -3 points 6 months ago

I'm "pro-AI" in the sense that I don't knee-jerk oppose it.

I do in fact use AI to summarize things a lot. I've got an extension in Firefox that'll do it to anything. It generally does a fine job.

[–] pinball_wizard@lemmy.zip 2 points 6 months ago* (last edited 6 months ago)

How much do you want to bet on the overlap being small?

A bigger question is how much does Wikiemedia Foundation want to bet that their top donors and contributors aren't in this thread...

Edit: Moving my unrelated ramblings to a separate comment.

[–] ptz@dubvee.org 4 points 6 months ago (1 children)

Doesn't it already have simplified versions of most articles at simple.wikipedia.org ?

[–] antonim@lemmy.dbzer0.com 4 points 6 months ago* (last edited 6 months ago)

This is already addressed in the first quote in my post. And no, I'm sure that not even close to most articles have a simple.wikipedia equivalent, or that it actually is adequately simple (e.g. one topic I was interested in recently that Wikipedia didn't really help me with: "The Bernoulli numbers are a sequence of signed rational numbers that can be defined with exponential generating functions. These numbers appear in the series expansion of some trigonometric functions." - that's one whole "simplified" article, and I have no idea what it's saying and it has no additional info or examples).

[–] EncryptKeeper@lemmy.world 3 points 6 months ago

I am pretty rabidly anti-AI in most cases, but the use case for AI that I don’t think is a big negative is the distillation of information for simplification purposes. I am still somewhat against this in the sense that at the end of the day their summarization AI could hallucinate, and since they’ve admitted this is a solution to a problem of scale, then it’s not sensible to assume humans will be able to babysit it.

However… there is some inherent value to the idea that people will end up using AI to summarize Wikipedia using models of dubious quality with an unknown quantity of intentionally pre-trained bias, and therefore there is some inherent value to training your own model to present the information on your site in a way that is the “most free” of slop and bias.