Discussion
Lauren Weinstein
@lauren@mastodon.laurenweinstein.org  ·  activity timestamp 2 days ago

Being generous, I'd say that about half of #Google #Gemini responses have some error or misunderstanding in them that I can recognize. Something like 25% are completely wrong or totally miss the point. I have no idea how many responses there are that are actually wrong in some respect but I don't have the background knowledge to recognize any errors. USELESS.

Omar Antolín
@oantolin@mathstodon.xyz replied  ·  activity timestamp 14 hours ago

@lauren I think that matches my experience, but it's hard for me to be sure of the precise percentage, since it's definitely wrong often enough that I hardly ever read the AI Overview anymore. I wonder if anyone has done a systematic study of how often it's wrong.

scott f
@scott@carfree.city replied  ·  activity timestamp 14 hours ago

@lauren For news-related queries, Gemini was significantly wrong 76% of the time in a BBC-led study. https://www.bbc.com/mediacentre/2025/new-ebu-research-ai-assistants-news-content

Largest study of its kind shows AI assistants misrepresent news content 45% of the time – regardless of language or territory

An intensive international study was coordinated by the European Broadcasting Union (EBU) and led by the BBC
The Servitor
@TheServitor@sigmoid.social replied  ·  activity timestamp 16 hours ago

@lauren

Been dabbling in these things all along, and the PaLM > Bard > Gemini lineage is hands-down the most inaccurate and hallucinatory. I don't trust them for anything, although I hear the recent Pro versions are improved.

GPT is moderate, depends on domain. Claude is relatively better. Still bad at training-cutoff related issues (forever adding 2024 to searches, things like that).

spaf
@spaf@mstdn.social replied  ·  activity timestamp 17 hours ago

@lauren Over half the time I have given a query to Gemini it has returned an incorrect result.

Perplexity and Claude seem the best, in my testing, although I don't use them extensively.

I have, as a standard part of queries, "Give a URL or preferably a DOI that substantiates your answer." Gemini often ignores this when it is hallucinating an answer.

Bandit
@bandit@indieweb.social replied  ·  activity timestamp 16 hours ago

@spaf @lauren

Every time you use generative AI a puppy dies.

Only being sorta sarcastic.

David Andersen
@dave_andersen@hachyderm.io replied  ·  activity timestamp 17 hours ago

@spaf @lauren That's interesting - I've been having a higher hit rate with it (gemini 3.0-thinking) lately. Are you using it in thinking mode?

This is one I did the other day as an example; all of the things listed in "sources" are valid links, though I wouldn't call all of them high-quality! (In fact, many of them are explicitly low-quality - I would be very grumpy with a student if they tried to pass half of that off as "research").

In some ways, of course, laundering a bunch of low-quality sources is... I don't know if it's _worse_ than fabrication from whole cloth, but it scares me.

[Three images attached; no captions provided by the author]
Karl Auerbach
@karlauerbach@sfba.social replied  ·  activity timestamp 17 hours ago

@spaf @lauren You said "Over half the time I have given a query to Gemini it has returned an incorrect result."

That caused me to wonder: How often would Gemini return a correct result if I were to give it an incorrect query?

BTW, you might find the following paper from 1977 to be illuminating or amusing:

W. R. Bennett Jr., "How Artificial Is Intelligence? The great works of literature and art are not merely rare statistical fluctuations, but are they simply the products of correlation matrices?", American Scientist, Vol. 65, No. 6 (November–December 1977), pp. 694–702. Published by Sigma Xi, The Scientific Research Honor Society.

Stable URL: https://www.jstor.org/stable/27848169

#ai

UkeleleEric
@UkeleleEric@mstdn.social replied  ·  activity timestamp 17 hours ago

@spaf @lauren And the question is WHY are you still using them? LLMs are just complex guessing models, designed to give something that sounds like an answer. It's like trying to tell the time from a stopped clock. The only way you can tell if it's right, is if you find another clock.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren Out of curiosity, what kinds of topics/questions are you asking about? I've not had that experience at all, so I'm curious how we're using it differently.

(Saying you're not interested in engaging is totally fine as well. I'm asking in my personal capacity and not as a Googler; I don't work directly on Gemini, so I couldn't help in that capacity anyway.)

Lauren Weinstein
@lauren@mastodon.laurenweinstein.org replied  ·  activity timestamp 2 days ago

@tim Sidenote: Glad to see there are still some human SREs at G!

So I don't routinely (voluntarily) use any generative AI. My only voluntary use is for specific tests, e.g. https://lauren.vortex.com/2025/11/14/coding-with-gemini and a more recent test where I tried to use Gemini to create a simple test pattern and after an hour of failures gave up and did it in 5 minutes with Gimp manually.

By FAR most of my interactions with Google AI are in Search AI Overviews, which of course are forced onto users unless they take special actions to try to suppress them. The statistics I quoted would mainly apply to AIOs, given that's the part of the AI environment I'm seeing many times a day, typically.

I am assuming that AIOs are in some manner drawing on the same underlying models (more or less) as Gemini. That is just an assumption though, since while I was in an AI team hierarchy during the most recent period I was working inside Google years ago, that predated Gemini, AIOs, etc.

Thanks. -L

Coding with Gemini: Cheerful, Cooperative, and Usually, Wrong.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren interesting! AIO and AI Mode (AIM) are actually the areas I'm now working in (though on the SRE side, so not precisely related to product, and I just started there a couple of weeks before the holiday, so I'm still new to a lot of the Gen AI stuff there).

On the personal side my interactions with AIO and AIM have been relatively shallow, but I still haven't noticed the level of wrongness you describe. I do admit it may be a result of a bias on my part towards seeing the good, though.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren I don't think I'm revealing anything confidential to confirm that AIO/AIM do use the same Gemini foundational models under the hood, but with various "special sauce" to provide better grounding and make it "more Search appropriate" (ideally: less hallucinatory). But of course that can never go to zero.

For what it's worth I can confidently say that folks on both the Search and Gemini sides are absolutely invested in getting right answers, not just any answers.

Tim W RESISTS
@tim@union.place replied  ·  activity timestamp 2 days ago

@lauren While Google these days (as ever, but even more so) is far from perfect, I do believe that the people are, by and large, still the best part, and despite Google's (many) flaws, the fundamental mechanisms of "Search tries to get good search results" are still in place.

Anyway, I don't want to sound like a "Google is great" pitch, or like I'm trying too hard to convince you. But I'm happy to chat more (to the extent that I can within confidentiality restrictions) or relay reports of really bad answers!

