Friday, April 12, 2024

Gemini has serious hallucinations (at least when you ask about composers!)

 Asking an LLM for facts isn't great... 

P/C Midjourney. Prompt: Bach, Beethoven, Brahms and Chopin sitting at a
coffee table with a musical score in the background.

Remember our Challenge from March 28...when we looked for names of composers that had associated movements, societies, or foundations?  

The easy example is Richard Wagner.  Immensely famous in his lifetime, his legacy gave rise to the adjective Wagnerian, to describe fans who are enthralled with his work.  Gustav Mahler, for instance, has been called a Wagnerian for the kind of music he composed.  

There are also many Wagner societies and clubs.  There's a Wagner society in Northern California, another in London, and so on.  

In that Challenge, I was curious how we might use Google search AND the power of LLMs to help answer this question.  

After I wrote up the SRS Answer for that Challenge, I had a sudden brainwave.  Why not make a big spreadsheet of all the composers I could find, and then ask an LLM to tell me, for each of these composers, is there a society, club, or foundation associated with their name?  

After an easy search on Wikipedia, I found a long list (4962 composers!) of musicians and composers.  It was simple to pull that into a spreadsheet.  Here are the first few rows... 

And then, I looked for a Google Sheets extension that would connect me to ChatGPT.  (That's an obvious search, and it was straightforward to install the extension.  I used GPT for Sheets and Docs.)  Just for completeness, I did the same thing to find a Sheets extension for Gemini (AI Assist for Gemini).

Then, for both sheets, I wrote a prompt in Column B like this: 

=GPT("does this " & A26 & " have a society or foundation to promote their music? Please give a short answer that includes the name of the society or foundation and give a URL to the website if possible")

 That's a pretty straightforward way to ask ChatGPT a bunch of similar questions.  Here are the top few rows of results: 




Note that the first ChatGPT reply is a bit odd ("Michel van der Aa is not a composer.  He is a Dutch composer..."  -- so which is it?)  

Column B is ChatGPT's reply to the prompt.  

Of the 4962 composers/musicians, ChatGPT responded affirmatively for 1214 of them.  I wrote a little function to extract the URL from each of the responses that had a link to the composer's website, and I found that about half of them were actually valid sites--632 of them worked.  That's not great, but it was a lot better than what I could do by hand.  
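A URL-extraction function like that can be quite small.  Here's a minimal Python sketch (an illustration only, not the actual function used in the spreadsheet).  The regular expression is a deliberate simplification, and the sample replies and the example.org address are made up for the demo.

```python
import re
from typing import Optional

# Matches http(s) URLs and bare "www." addresses.
# This is a simplification, not a full URL grammar.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def extract_url(reply: str) -> Optional[str]:
    """Return the first URL-looking string in an LLM reply, or None."""
    match = URL_PATTERN.search(reply)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the URL.
    return match.group(0).rstrip(".,;:!?")


# Hypothetical replies, invented for illustration.
replies = [
    "Yes, the Example Composer Society promotes the music. "
    "Website: https://example.org/society.",
    "No, there is no known society for this composer.",
]
urls = [extract_url(reply) for reply in replies]
```

Replies with no link come back as None, so counting the non-None entries gives the number of URLs that still need to be checked.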


HOWEVER... 

I did exactly the same thing with Gemini (using the Gemini Sheets extension), but got a VERY different answer.  Here's the top of that spreadsheet.  Notice anything different? 

Gemini's replies

Yeah.  Gemini thinks everyone has a foundation and a website. 

What's strange is that it's like that for most of the rest of the spreadsheet.  Gemini found that 75% of the composers listed in the sheet had a society or foundation to promote their music.  I checked around 150 of them--they're all bogus.  

After spending a bunch of time checking the results, I decided to try and just vet the URLs that Gemini suggested.  

Instead of testing every one of these URLs by hand, I wrote a function to extract them, and then did a simple WHOIS (to check and see if they were actually valid domains).  No surprise, virtually none of them were valid. 
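Here's a rough Python sketch of that kind of bulk check.  One loud caveat: this is not the WHOIS lookup described above.  It swaps in a plain DNS lookup, which is simpler to script, and every function name here is made up for illustration.  A domain that resolves isn't proof that the site belongs to the composer, and a registered domain can fail to resolve, so treat the output as a first-pass filter.

```python
import socket
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Pull the host name out of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def resolves(domain: str) -> bool:
    """True if DNS returns an address for the domain (needs network access)."""
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):
        # OSError covers failed lookups; UnicodeError covers malformed names.
        return False


def split_by_validity(urls, lookup=resolves):
    """Sort URLs into (live, dead) lists according to the lookup function."""
    live, dead = [], []
    for url in urls:
        domain = domain_of(url)
        if domain and lookup(domain):
            live.append(url)
        else:
            dead.append(url)
    return live, dead
```

Calling split_by_validity on the list of extracted URLs returns the ones whose domains resolve and the ones that don't.  The lookup parameter is there so a different check (a real WHOIS query, say) could be dropped in without changing the rest.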

But look at the very first result:  It turns out that Michel van der Aa DOES have a foundation, but it's called DoubleA Foundation, and its URL is https://doublea.net/.  The URL that Gemini gives above (Michelvanderaafoundation.org) is not a valid website.  This is purely hallucinated.   

This is true for the next several thousand URLs that Gemini "found" for us.  The URLs look convincing, but they're just plausible-looking junk.  

Ugh. 


Results 

It's interesting that ChatGPT missed the DoubleA Foundation of Michel van der Aa, AND it hallucinated about 50% of the positive hits.  Still... I was able to learn some useful things.  

But ChatGPT is very picky about the prompt.  In an earlier version of the spreadsheet I asked with the prompt 

"Is there a musical society for the music of <musician>?"  

In the case of musician Gamal Abdel-Rahim, ChatGPT said "No, there's no such society."  

But when I asked with a slightly different prompt: 

"does <musician> have a society or foundation to promote their music? Please give a short answer that includes the name of the society or foundation and give a URL to the website if possible"

The answer completely flipped to "Yes, Gamal Abdel-Rahim does have a society dedicated to promoting his music..."  

That's not especially handy, but it does show how sensitive these things are.  
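If you wanted to hunt for these flips systematically, one approach would be to run both prompts for every name and flag the disagreements.  Here's a minimal Python sketch of that idea.  The ask parameter is a hypothetical stand-in for whatever function sends a prompt to the LLM and returns its reply as text, and the yes/no test is deliberately crude.

```python
from typing import Callable, Iterable, List

# The two prompt wordings quoted above, with a placeholder for the name.
PROMPTS = {
    "short": "Is there a musical society for the music of {musician}?",
    "long": (
        "does {musician} have a society or foundation to promote their music? "
        "Please give a short answer that includes the name of the society or "
        "foundation and give a URL to the website if possible"
    ),
}


def says_yes(reply: str) -> bool:
    """Crude check: treat a reply that starts with 'yes' as affirmative."""
    return reply.strip().lower().startswith("yes")


def find_flips(musicians: Iterable[str], ask: Callable[[str], str]) -> List[str]:
    """Return the musicians whose yes/no answer differs between the prompts."""
    flips = []
    for musician in musicians:
        answers = {
            name: says_yes(ask(template.format(musician=musician)))
            for name, template in PROMPTS.items()
        }
        if len(set(answers.values())) > 1:
            flips.append(musician)
    return flips
```

Any name that lands in the flips list is one where the answer depends on the wording of the question, which is a good signal to go check by hand.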


On the other hand, Gemini hallucinated thousands of results and just made up thousands of URLs, nearly all of which have invalid domain names.  (And it ALSO missed the DoubleA Foundation!)  

Looking on the bright side, ChatGPT at least found a few hundred valid composer/musician societies AND legit links to websites describing them.  I actually found the GPT results to be useful, if slightly buried.  

But after wading through a few hundred totally bogus results from Gemini, I got discouraged.  So much nonsense made up so fast.  


Recommendation

Don't trust ANY LLM for accurate answers to prompts that involve actual people. You really can't trust the results.  

And in particular, don't count on Gemini until they really improve their game.  If accuracy counts, think of another way to do this.  


Keep searching.  



5 comments:

  1. there seems to be an image blocking filter at work…
    not sure who's or why… don't want some sort of deep-fake Bachian thing going on? where is Henk?
    https://i.imgur.com/vN2UYNVl.jpg
    not Wagneresque…
    https://www.youtube.com/watch?v=Zi_XLOBDo_Y

  2. Thanks Dr. Russell for doing these experiments and sharing them with us.

    Out of topic, I noticed that when sharing a YouTube video sometimes the url starts with m.youtube and others with youtu.be

    Why two types of url? Besides the second I can't open on my mobile browser. It only works if I click and then, the url goes to YouTube app. Weird things happening lately on my urls

    Replies
    1. The m. prefix indicates that it's optimized for mobile delivery. The Youtu.be is the default serving mode.

  3. new pet? - perhaps not appropriate
    https://www.instagram.com/explore/search/keyword/?q=blue%20ring%20octopus
