Wednesday, May 3, 2023

Answer: How well do LLMs answer SRS questions?

 Remember this? 

P/C Dall-E. Prompt: [ happy robots answering questions rendered in a ukiyo-e style on a sweeping landscape, cheerful ]


Our Challenge was this:  

1.  I'd like you to report on YOUR experiences in trying to get ChatGPT or Bard (or whichever LLM you'd like to use) to answer your curious questions.  What was the question you were trying to answer?  How well did it turn out?  

Hope you had a chance to read my comments from the previous week.  

On April 21 I wrote about why LLMs are all cybernetic mansplaining--and I mean that in the most negative way possible.  If mansplaining is a kind of condescending explanation about something of which the man has incomplete knowledge (delivered with the mistaken assumption that he knows more about it than the person he's talking to does), then that's what's going on, cybernetically.  

On April 23 I wrote another post about how LLMs seem to know things, but when you question them closely, they don't actually know much at all.  

Fred/Krossbow made the excellent point that it's not clear that Bard is learning.  After asking a question, then asking a follow-up and getting a changed response: "Bard corrected the response. What I now wonder: will Bard keep that correction if I ask later today? Will Bard give the same response to someone else?" 

It's unclear.  I'm sure this kind of memory (and gradual learning) will become part of the LLMs.  But at the moment, it's not happening.   
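A quick sketch may help show why corrections don't stick.  In today's chat systems, the model's weights don't change when you correct it mid-conversation; the only "memory" is the transcript, which gets resent to the model on every turn.  Here's a minimal illustration in Python--assuming the OpenAI client library, with an illustrative model name; the same basic pattern applies to Bard and the rest:

```python
# Why a correction doesn't stick: the model itself is frozen, and the
# only "memory" is the running transcript we resend on every turn.
from openai import OpenAI

client = OpenAI()          # reads OPENAI_API_KEY from the environment
history = []               # the conversation transcript IS the memory

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=history,      # the full transcript goes along every time
    )
    reply = resp.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("Who invented the telegraph?")
chat("Are you sure?  I think that answer is wrong.")   # the correction...
# ...lives only in `history`.  A different user (or tomorrow's session)
# starts with an empty list, so the correction never carries over.
```

The correction exists only in that one transcript; nothing flows back into the model itself.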

And that's a big part of the problem with LLMs: We just don't know what they're doing, why, or how.  

As several people have pointed out, that's true of humans as well.  I have no idea what you (my dear reader) are capable of doing, whether you're learning or not... but I have decades of experience dealing with other humans of your make and model, and I have a pretty good idea about what a human's performance characteristics are.  I don't have anything similar for an LLM.  Even if I spent a lot of time developing one, it might well change tomorrow when a new model is pushed out to the servers.  Which LLM are you talking to now?  

P/C Dall-E. Prompt: [ twenty robots, all slightly different from each other, trying to answer questions in a hyperrealistic style 3d rendering ]

What happens when the fundamental LLM question-answering system changes moment by moment?  

Of course, that's what happens with Google's index.  It's varying all the time as well, and it's why you sometimes get different answers to the same query from day to day--the underlying data has changed.  
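If you want to see this for yourself, it's easy to instrument.  Here's a rough sketch--again assuming the OpenAI Python client, with a placeholder model alias--that asks a question and records which concrete model version actually served the answer:

```python
# A sketch of "same question, different day": ask a question now, ask
# again later, and record which model actually answered each time.
import datetime
from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment

def ask(question: str) -> tuple[str, str]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # an alias -- the provider decides what's behind it
        messages=[{"role": "user", "content": question}],
    )
    # resp.model reports the concrete model version that served the call
    return resp.choices[0].message.content, resp.model

answer, served_by = ask("Why do leaves change color in the fall?")
print(datetime.datetime.now().isoformat(), served_by)
print(answer)
# Run this again next week: the alias may point at a new model, and even
# the same model samples differently from call to call.
```

Run it a week apart and you may find both a different answer and a different model string sitting behind the very same alias.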

And perhaps we'll get used to the constant evolution of our tools.  It's an interesting perspective to have.  

mateojose1 wonders whether, if LLMs are complemented by deep knowledge components (e.g., grafting on Wolfram Alpha to handle the heavy math chores), we'll THEN get citations.  

I think that's part of the goal.  I've been playing around with Scite.ai, an LLM for the scholarly literature (think of it as ChatGPT trained on the contents of Google Scholar).  It's been working really well for me when I ask it questions that are "reasonably scholarly," that is, ones where there are published papers that might address the question at hand.  I've been impressed with the quality of the answers, along with the lack of hallucination AND the presence of accurate citations.  

This LLM (scite.ai) is so interesting that I'll devote an entire post to it soon.  (Note that I'm not getting any funding from them to talk about their service.  I've just been impressed.)  
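To make mateojose1's grafting idea concrete, here's a toy sketch of the routing pattern such systems use: send math-ish questions to a symbolic engine (here, Wolfram|Alpha's Short Answers API, which requires an app ID from developer.wolframalpha.com) and everything else to an LLM, tagging each answer with its source so a citation can be attached.  The routing heuristic and the stubbed-out LLM call are deliberately naive placeholders:

```python
# Toy "grafting" router: math-ish questions go to a symbolic engine,
# everything else to an LLM, and every answer is tagged with its source.
import os
import re
import requests

WOLFRAM_APPID = os.environ["WOLFRAM_APPID"]  # from developer.wolframalpha.com

def looks_mathy(q: str) -> bool:
    # crude heuristic: digits, operators, or a few math keywords
    return bool(re.search(r"\d|[+\-*/^=]|integral|derivative|solve", q, re.I))

def ask_wolfram(q: str) -> str:
    # Wolfram|Alpha's Short Answers API returns a plain-text result
    r = requests.get(
        "https://api.wolframalpha.com/v1/result",
        params={"appid": WOLFRAM_APPID, "i": q},
        timeout=10,
    )
    r.raise_for_status()
    return r.text

def ask_llm(q: str) -> str:
    # stand-in for a real LLM call (see the earlier sketches)
    return "LLM answer for: " + q

def answer(q: str) -> dict:
    if looks_mathy(q):
        return {"answer": ask_wolfram(q), "source": "Wolfram|Alpha"}
    return {"answer": ask_llm(q), "source": "LLM (no citation available)"}

print(answer("What is the integral of x^2 from 0 to 3?"))
print(answer("Why did the Hanseatic League decline?"))
```

Real systems do the routing with learned tool selection rather than a regex, but the shape is the same: the deep-knowledge component supplies a verifiable answer, and the citation comes along with it.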

As usual, remmij has a plethora of interesting links for us to consider.  You have to love remmij's "robots throwing an LLM into space" Dall-E images. Wonderful.  (Worth a click.) 

But I also really agree with the link that points to Beren Millidge's blog post about how LLMs "confabulate not hallucinate."  

This is a great point--the term "hallucination" really means that one experiences an apparent sensory perception of something not actually present.  By contrast, "confabulation" happens when someone is unable to explain or answer a question correctly, but produces a plausible-sounding answer anyway.  The confabulator (that's a real word, BTW) literally doesn't know whether what they're saying is true or not, but goes ahead regardless.  That's much more like what's going on with LLMs.  


Thanks to everyone for their thoughts.  It's been fun to read them the past week.  Sorry about the delay.  I was at a conference in Hamburg, Germany.  As usual, I thought I would have the time to post my reply, but instead I was completely absorbed in what was happening.  As you can imagine, we all spent a lot of time chatting about LLMs and how humans would understand them and grow to use them.  

The consensus was that we're just at the beginning of the LLM arms race--all of the things we worry about (truth, credibility, accuracy, etc.) are being challenged in new and slightly askew ways.  

I feel like one of the essential messages of SearchResearch has always been that we need to understand what our tools are and how they operate.  The ChatGPTs and LLMs of the world are clearly new tools with great possibilities--and we still need to understand them and their limits.  

We'll do our best, here in the little SRS shop on the prairie.  

Keep searching, my friends.  



8 comments:

  1. I've been reading about this some on my own, and am thinking that, though AI chatbots / large language models may one day pose a threat to search engines and may render web search techniques obsolete, that day has not arrived and may not for some time. In other words, being able to both search for and evaluate online information will likely be useful for a while to come.

    What do you think?

    ReplyDelete
    Replies
    1. Trouble is there seems to be a lot of belief that that's not the case. Will it just be evident as we proceed, or will it be an anguished argument? Is it at all relevant to consider how we have been dealing with the media for the past 15 years? Will acceptance differ for images, video, vs text, types of subject matter and context? ... So fascinating.

      Delete
  2. I missed the marathon (luckily, as it ended right in front of where I was staying). Would have been fun to see, though, as I've run my share of marathons, and Hamburg is a classic.

    ReplyDelete
  3. this Henk van Ess piece may be of interest
    https://henkvaness.substack.com/p/chatgpt-3-unlocking-visual-search
    in the context of recent "happenings" in Moscow… discerning truth on the state level -
    https://thegrayzone.com/2023/05/02/ukrainian-banker-cash-drone-terror-russia/
    brain preferences -
    https://www.learnevents.com/learning-insights/imagery-vs-text-which-does-the-brain-prefer/
    it all can be a muddle.
    were you able to get a sense of the German zeitgeist, or was it too insular at the conference?
    have we traveled back to the Memorex 80's? is it real?
    https://www.stevehoffacker.com/2017/12/20/is-it-real-or-is-it-memorex/

    ReplyDelete
  4. Thanks Dr Russell, Remmij and everyone commenting.

    Today is May 4th so good timing for the answer as Star Wars gave us a lot.

I'll try the LLMs suggested. I hope they're available in Mexico.

Another one that looks interesting and good to learn is the one done by Khan Academy with ChatGPT: Khanmigo

    https://youtu.be/hJP5GqnTrNo

    ReplyDelete
  5. the 20 robot image - car? -
    https://www.google.com/search?tbs=sbi:AMhZZisI7xOP5aPMzZfM18pHnIyxN99NFwvDFDPTGe7vTaqbzBPTLAD558vxKqxAZk8GUcLtWCwhzkqUsM88MiC1_1xvSN5a2SQI5baA4lNg2zEDKYH3hk9z9fPjnU0TWY-qq5vlSYS9RYJNUgN2X670way_1nhn_1oGVmECDPzUEq1-f06XawI37Vi2kEKM4utSaWGBRYlstdvk8SJBx5A5F_1nc0i2klCPylae11ULjUHZnmIKGVfbTrNawBAJd6hj4OJfKrT0MKpid0OmJpthvFrLMZMA3ftFJIHSC4psLXGQaWyLzlsD2N_1GkCc1xU0zOyHxH0M5AalBEcV-ykssjrVIRd-4AfC2r0ogmNZbCVV_1up_1vfYwhyTqZudjt4GHFyF1gSsTpD0Ofq8xHZCq4pToJVzaCYpUC8g

    ReplyDelete
  6. Hello People, Anyone using ChatGPT or Bard to find case studies for thought leadership content?

    I tried this in Bard but got no helpful answers.
-What are some real-life historical examples of companies using language in a way different from that of their intended clients?

    ReplyDelete