this post was submitted on 10 Jun 2026
1536 points (99.4% liked)

mabeledo@lemmy.world 2 points 1 hour ago

Even the number is a bit misleading. First of all, anyone who has ever done LLM benchmarking knows that it is not an exact science at all. A model can score 99% on one benchmark and fail every single task on another.

But even this particular claim is nuanced. From the original article:

But with Gemini 3, Google’s A.I.-generated answers were more likely to be ungrounded than when the system was based on Gemini 2, meaning the websites they linked to did not completely support the information they provided. In October, correct answers were ungrounded 37 percent of the time. In February, with Gemini 3, that figure rose to 56 percent.

See https://www.nytimes.com/2026/04/07/technology/google-ai-overviews-accuracy.html

Meaning that even when the answer is correct, 56% of the time users cannot fully verify it against the sources the LLM claims to be using.
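
Worth noting that the quoted 37% and 56% figures are conditional on the answer being correct, so on their own they don't tell you what share of all answers is both right and checkable. Here's a rough Python sketch of that arithmetic. The two ungrounded rates come from the quote above; the accuracy value is purely a placeholder I made up for illustration, not a number from the article.

```python
def verifiable_correct_share(accuracy: float, ungrounded_given_correct: float) -> float:
    """Share of ALL answers that are correct and fully backed by their linked sources."""
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= ungrounded_given_correct <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    # P(correct and grounded) = P(correct) * (1 - P(ungrounded | correct))
    return accuracy * (1.0 - ungrounded_given_correct)


# Ungrounded rates among correct answers, as quoted from the article.
UNGROUNDED_OCTOBER = 0.37
UNGROUNDED_FEBRUARY = 0.56  # with Gemini 3

# HYPOTHETICAL accuracy, NOT from the article; swap in a real figure if you have one.
ASSUMED_ACCURACY = 0.90

for label, rate in [("October", UNGROUNDED_OCTOBER), ("February", UNGROUNDED_FEBRUARY)]:
    share = verifiable_correct_share(ASSUMED_ACCURACY, rate)
    print(f"{label}: {share:.1%} of all answers are correct and fully grounded")
```

Whatever accuracy you plug in, the share of answers a user can actually confirm from the links drops between the two measurements, as long as accuracy itself did not rise enough to offset it.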