Discussion
Lauren Weinstein
@lauren@mastodon.laurenweinstein.org  ·  activity timestamp 2 days ago

Being generous, I'd say that about half of #Google #Gemini responses have some error or misunderstanding in them that I can recognize. Something like 25% are completely wrong or totally miss the point. I have no idea how many responses there are that are actually wrong in some respect but I don't have the background knowledge to recognize any errors. USELESS.

Omar Antolín
@oantolin@mathstodon.xyz replied  ·  activity timestamp 14 hours ago

@lauren I think that matches my experience, but it's hard for me to be sure of the precise percentage, since it's definitely wrong often enough that I hardly ever read the AI Overview anymore. I wonder if anyone has done a systematic study of how often it's wrong.

scott f
@scott@carfree.city replied  ·  activity timestamp 14 hours ago

@lauren For news-related queries, Gemini was significantly wrong 76% of the time in a BBC-led study. https://www.bbc.com/mediacentre/2025/new-ebu-research-ai-assistants-news-content

Largest study of its kind shows AI assistants misrepresent news content 45% of the time – regardless of language or territory

An intensive international study was coordinated by the European Broadcasting Union (EBU) and led by the BBC
The Servitor
@TheServitor@sigmoid.social replied  ·  activity timestamp 16 hours ago

@lauren

Been dabbling in these things all along, and the PaLM > Bard > Gemini lineage is hands-down the most inaccurate and hallucinatory. I don't trust them for anything, although I hear the recent Pro versions are improved.

GPT is moderate, depends on domain. Claude is relatively better. Still bad at training-cutoff related issues (forever adding 2024 to searches, things like that).

spaf
@spaf@mstdn.social replied  ·  activity timestamp 17 hours ago

@lauren Over half the time I have given a query to Gemini it has returned an incorrect result.

Perplexity and Claude seem the best, in my testing, although I don't use them extensively.

I have, as a standard part of queries, "Give a URL or preferably a DOI that substantiates your answer." Gemini often ignores this when it is hallucinating an answer.

Bandit
@bandit@indieweb.social replied  ·  activity timestamp 16 hours ago

@spaf @lauren

Every time you use generative AI a puppy dies.

Only being sorta sarcastic.

David Andersen
@dave_andersen@hachyderm.io replied  ·  activity timestamp 17 hours ago

@spaf @lauren That's interesting - I've been having a higher hit rate with it (gemini 3.0-thinking) lately. Are you using it in thinking mode?

This is one I did the other day as an example; all of the things listed in "sources" are valid links, though I wouldn't call all of them high-quality! (In fact, many of them are explicitly low-quality - I would be very grumpy with a student if they tried to pass half of that off as "research").

In some ways, of course, laundering a bunch of low-quality sources is... I don't know if it's _worse_ than fabrication from whole cloth, but it scares me.

[Three images attached; no captions provided by the author]
Karl Auerbach
@karlauerbach@sfba.social replied  ·  activity timestamp 17 hours ago

@spaf @lauren You said "Over half the time I have given a query to Gemini it has returned an incorrect result."

That caused me to wonder: How often would Gemini return a correct result if I were to give it an incorrect query?

BTW, you might find the following paper from 1977 to be illuminating or amusing:

W. R. Bennett Jr., "How Artificial Is Intelligence? The great works of literature and art are not merely rare statistical fluctuations, but are they simply the products of correlation matrices?", American Scientist, Vol. 65, No. 6 (November–December 1977), pp. 694–702. Published by Sigma Xi, The Scientific Research Honor Society.

Stable URL: https://www.jstor.org/stable/27848169

#ai

UkeleleEric
@UkeleleEric@mstdn.social replied  ·  activity timestamp 17 hours ago

@spaf @lauren And the question is WHY are you still using them? LLMs are just complex guessing models, designed to give something that sounds like an answer. It's like trying to tell the time from a stopped clock. The only way you can tell if it's right, is if you find another clock.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren Out of curiosity, what kinds of topics/questions are you asking about? I've not had that experience at all, so I'm curious how we're using it differently.

(Saying you're not interested in engaging is totally fine as well. I'm asking in my personal capacity and not as a Googler; I don't work directly on Gemini, so I couldn't help in that capacity anyway.)

Lauren Weinstein
@lauren@mastodon.laurenweinstein.org replied  ·  activity timestamp 2 days ago

@tim Sidenote: Glad to see there are still some human SREs at G!

So I don't routinely (voluntarily) use any generative AI. My only voluntary use is for specific tests, e.g. https://lauren.vortex.com/2025/11/14/coding-with-gemini and a more recent test where I tried to use Gemini to create a simple test pattern and after an hour of failures gave up and did it in 5 minutes with Gimp manually.

By FAR most of my interactions with Google AI are in Search AI Overviews, which of course are forced onto users unless they take special actions to try to suppress them. The statistics I quoted would mainly apply to AIOs, given that's the part of the AI environment I'm seeing many times a day, typically.

I am assuming that AIOs are in some manner drawing on the same underlying models (more or less) as Gemini. That is just an assumption though, since while I was in an AI team hierarchy during the most recent period I was working inside Google years ago, that predated Gemini, AIOs, etc.

Thanks. -L

Coding with Gemini: Cheerful, Cooperative, and Usually, Wrong.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren interesting! AIO and AI Mode (AIM) are actually the areas I'm now working in (though on the SRE side, so not precisely related to product, and I just started there a couple of weeks before the holiday, so I'm still new to a lot of the Gen AI stuff there).

On the personal side my interactions with AIO and AIM have been relatively shallow, but I still haven't noticed the level of wrongness you describe. I do admit it may be a result of a bias on my part towards seeing the good, though.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren I don't think I'm revealing anything confidential to confirm that AIO/AIM do use the same Gemini foundational models under the hood, but with various "special sauce" to provide better grounding and make it "more Search appropriate" (ideally: less hallucinatory). But of course that can never go to zero.

For what it's worth I can confidently say that folks on both the Search and Gemini sides are absolutely invested in getting right answers, not just any answers.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren While Google these days (as ever, but even more so) is far from perfect, I do believe that the people are, by and large, still the best part, and despite Google's (many) flaws, the fundamental mechanisms of "Search tries to get good search results" are still in place.

Anyway, I don't want to sound like a "Google is great" pitch, or like I'm trying too hard to convince you. But I'm happy to chat more (to the extent that I can within confidentiality restrictions) or relay reports of really bad answers!

