Sunday, April 23, 2023

Comment: LLMs, truth, and consistency--they don't have any idea

LLMs have a big problem: 

P/C Dall-E. Prompt: "happy robots talking about truth"  

They have zero idea what's true.  They only know the probabilities of words in text. That's NOT the same thing as "knowing" something--it's a bit like knowing that "lion" is the most likely word following "king of the jungle..." without having any idea about monarchies, metaphor, or what a king really is all about.  

The folks at Oxford Semantic Technologies wrote an interesting blog post about LLMs and finding verifiable facts.  They call the fundamental problem the "Snow White Problem."  The key idea is that LLMs don't really know what's true--they just know what's likely.  

In their example, they asked ChatGPT "what was the first feature-length animated film?" ChatGPT confidently said that it was the 1937 film "Snow White."  

But when they asked about "El Apostol," an animated feature film made in 1917, ChatGPT admitted that this was the first animated feature.  

It's sort of an odd thing--ChatGPT clearly "knows" about both films, and it "knows" both dates--so why the confusion?  

Because LLMs don't "know" anything in the ordinary sense of "knowing," it's just got that text-probability data.  That is, inconsistent knowledge just isn't a thing for LLMs.  Having coherence within a conversation doesn't really matter much.  If you ask ChatGPT: 

Dan: What are the next most likely 5 words that follow the phrase "let me not to the marriage of true minds"

ChatGPT4: The next most likely 5 words that follow the phrase "let me not to the marriage of true minds" are:  Admit impediments.

Well, that's just plain wrong. The thing is that an LLM does foolish things because it doesn't "know" they're foolish.  I asked for 5 words, not 2.  

On the other hand, Bard did much better with: 

Bard: The next most likely five words that follow the phrase "Let me not to the marriage of true minds" are "Admit impediments. Love is not," which is from William Shakespeare's Sonnet 116. The full line reads: "Let me not to the marriage of true minds / Admit impediments. Love is not love." 

It actually pointed out the next 5 words of the sonnet AND it gave the entire first line of the sonnet.  Did it just get lucky?  Or is there an internal consistency thing at work?  

So I thought I'd try a similar tack to what Oxford Semantic Tech did in their post and ask about what was first short animated film.   

I asked Bard: 

Dan: What was the first animated short?

Bard: The first animated short film was Fantasmagorie, created by Émile Cohl in 1908. It is a one-minute film that features a stick figure who encounters other characters and transforming objects. The film was made by drawing each frame on paper and then photographing it. Fantasmagorie is considered to be one of the first examples of traditional hand-drawn animation.

Interesting.  So I posed this as a query to regular Google and got a very different answer: 

Interesting.  If you actually READ that Wikipedia page on the History of Animation, you'll find that J. Stuart Blackton created an animated film,  "The Enchanted Drawing" in 1900 that had elements of hand-drawing along with live action elements.  A few years later, he produced another film, "Humorous Phases of Funny Faces" in 1906 with even more bits of what we'd think of as animation.  

I went back to Bard and asked: 

Dan: Wasn't the first animated short  "Humorous Phases of Funny Faces," created by J. Stuart Blackton in 1906?

Bard: Yes, you are correct. Humorous Phases of Funny Faces, created by J. Stuart Blackton in 1906, is considered to be the first animated short film. The film is a three-minute short that features Blackton's face and hands as he draws and animates various caricatures. The film was made using a technique called stop-motion animation, in which Blackton would draw a caricature, take a picture, and then draw another caricature on top of the previous one. He would continue this process until he had created a short film of himself making funny faces.

I apologize for the previous mistake. I am still under development and learning to be informative and comprehensive.

Points for politeness, but the "Snow White" effect points out another instance in which the LLM doesn't know its own mind.  Did it really change its mind?  

One more attempt: 

Dan: And didn't J. Stuart Blackton produce an animated short in 1900?  That would be even earlier, yes? 
Bard: That's correct. J. Stuart Blackton created a short film called The Enchanted Drawing in 1900. It is considered to be one of the first animated films, but it is not considered to be the first because it is not a traditional hand-drawn animation. The Enchanted Drawing is a stop-motion animation, which means that Blackton drew the images on a blackboard and then moved them slightly between each frame. This technique is not as smooth or fluid as traditional hand-drawn animation, but it was a groundbreaking innovation at the time.

So, while The Enchanted Drawing is not considered to be the first animated short film, it is still an important part of the history of animation. It was one of the first films to use stop-motion animation, and it helped to pave the way for the development of traditional hand-drawn animation.

Contradictory, no?  

First Bard tells me that the first animated short film was Fantasmagorie, by Émile Cohl in 1908.  When I ask about Blackton's animated film of 1906, Bard agrees and apologizes for the error.  

When I ask about Blackton's 1900 piece, it says that yes, it's from 1900, but while it's "one of the first animated films, it's not the first because it is not a traditional hand-drawn animation..."  Wait a second!  Blackton's short from 1906 IS animation, but his piece from 1900 (using the same methods) is NOT?  

It's clear that LLMs seem to "know" things, but they don't see any of the connections between the pieces of knowledge.  It is as though everything is dissociated; and since everything is separate, contradictions and internal inconsistencies don't trouble it at all.  
To understand that something is inconsistent means that you (or the LLM) recognizes that there are different statements about something, and that both cannot be true simultaneously.  In this case, one of these animated shorts is the "first."  Was it Blackton's 1900 "The Enchanted Drawing," or Blackton's 1906  work "Humorous Phases of Funny Faces," or was it Cohl's 1907 "Japon de fantasie," or Cohl's 1908 "Fantasmagorie"?  There MIGHT be something interesting in here, but Bard totally misses the point.  

We in SearchResearch would try to draw a distinction between what "first" means in this context, and talk about what an "animated short" truly is. But that's not a conversation an LLM can have.  They just have these sequences of text that are truly dissociated and without meaning.  

Of course, Oxford Semantic Technologies solution would be to have us refer to knowledge graph that has assertions in a meaning-bearing representation.  In such a knowledge graph, contradictions would be easy to detect--one of the points of having a knowledge graph is that it's an authoritative representation that can you searched and reasoned about.  If there's a contradiction in the graph, you can find it easily.  

That's a laudable goal.  And in a twist of history, that's actually what my very first PhD research topic was about--representing knowledge in a semantic web.  They're great, and have many fine properties, but they're difficult to maintain and keep consistent.  Wonderful tools, but still probably in the future.  

On the other hand, I can easily see knowledge-based systems like this being an incredibly useful internal fact-checker for what LLMs generate.  Imagine a knowledge-based system working hand-in-hand (code-in-code?) with an LLM text-generator.  There's real possibility of power there.  (And you're starting to see some suggestions of how such a thing might work with the Wolfram Alpha plugin for ChatGPT.)  

But we can't count on LLMs to tell us true things.  At least not by themselves. 

Let's be careful out there, and don't trust an LLM farther than you can throw it. 

Keep searching.  


  1. Good morning,

    I may have been groggy (another SearchReSearch challenge) when I read this right after waking up, but my brain kept asking me if this was similar to the waterfall posts back in March.

    So to Bard I began with:

    Fred: Where is the largest waterfall?

    Bard: (a couple of paragraphs about the Denmark Straits Cataracts)

    Fred: List the three largest waterfalls by volume of water.

    Bard: 1. Inga Falls 2. Livingstone Falls 3. Wagenia Falls
    (Bard did include Country and cubic feet per second plus more on the Congo River basin.)

    Fred: What is the volume of water per second of the Denmark Strait Cataracts?

    Bard: (a few paragraphs basically to say 175 million cubic feet per second.)

    Fred: Why didn't you include this in your previous answer?

    Bard: (apologizes for still being under development)

    Fred: List the three largest waterfalls by volume of water.

    Bard: Sure, here are the three largest waterfalls by volume of water:
    1. Denmark Straits Cataracts...
    2. Inga Falls...
    3. Livingstone Falls...

    Please note that the Denmark Strait Cataracts are underwater, so they are not visible from the surface of the ocean.

    Bard corrected the response. What I now wonder: will Bard keep that correction if I ask later today? Will Bard give the same response to someone else?

  2. … seven little guys just showed up at my door and offered to break my legs - one was really Grumpy - and they said they could bring 7 Samurai, if necessary… I didn't even know Taylor Swift was sleepy - or really Snow Pink…
    Quirino Cristiani
    Bono [sic] is but a mid-summers night dream… a cubicle state of mind
    At one point in its recurring soliloquies about its potential upsides, Bard dreamed about living up to the legacy of the English playwright that inspired its name. Bard explained that its creators at Google “thought Shakespeare would be a good role model for me, as he was a master of language and communication.”
    Black Maria studio, NJ - Edison - 1900
    Knowledge Graph Conference 2023 in NYC on May 8-12
    from your prompt - long, but interesting
    Sergeant Phil Esterhaus's (Michael Conrad) with the belly dancer is almost un-careful, but well worth the 2+ minutes

    1. Interesting posts by you, Remmij, Fred and Dr. Russell.

      I think that questions like Dr Russell thinks when creating a Challenge are and will be hard to understand to LLMS. And also thinking that how many people will verify the answers they are getting? I think not so many.

      Also the question Fred ask about how they will answer same question made by others or later. They should learn and improve. Have any of you te tried asking?

      I also was trying AI text to image. Sometimes they are great, others not so much. One example was with Snoopy. Why he is deformed? I guess is due to copyright but then it is really not Snoopy, right?

      The Wolfram Alpha + LLM sounds amazing.

  3. a couple other bits - I need to be more careful out there - but it is already too late.
    what type?
    further DALL-E, looking back:

  4. jfg [just for grins/giggles] - the robots made me do it (said in a whispering voice)

  5. meant to include this:
    supposedly Miller has made over 100,000 images as he experimented, edited those down to 20k and then pcked 20 for this show at Gagosian, NYC - that would be immersive.

    1. That's an interesting observation about the nature of postmodern image art.

  6. You made some good points in this post about large language models / AI chatbots and how they have no idea whether or not what they are writing is true. And, the series you have on them is quite informative, so thanks.

    Building on what you wrote in this post, a question I would ask is this: Even if these LLMs eventually get some sort of fact-checking software added (like you suggested), and even if they eventually become fairly trustworthy as a result, would users also receive a list of sources that the chatbot used to give its answers, or not? And, if not, how would they know what its claims were based on (or if they were even trustworthy)?

  7. This comment has been removed by the author.

    1. you prompted me to look for… some belief - (this could have been generated by any generic chat-bot, y = mx + b)
      I think uncertainty is a human marker… maybe?
      I know this is dirty though - I need to cleanse my eyes
      the certainty of time:

  8. indicative of the increasing speed & diverse applications coming.
    shows different image generators and text to video samples.
    a pizza ordering bot -
    and the powerful -

  9. speaking of waterfalls

  10. if only AI had access to these robots… but they were busy in Hell. where are they kept now? & did they leave emojis?

  11. smashing Pandora's box(keyboard)
    “This is not a new toy. This is not a fad. This is not VCs looking for attention and founders trying to create hype. This is a society-changing, species-changing event. I’m excited by this technology but the downsides are just so immense. We’ve unleashed forces that we don’t understand." Chon Tang, SkyDeck, UC Berkeley’s startup accelerator.
    could AI do worse? [meatsacks moving statues]
    no message from beyond… but nice setting.

  12. interesting that Bob came to mind again…

    new tome
    worth the re-read


    The Colloquy of Coatlicue and Chronos

    or (“You don’t have to have humans to be a humanmachine.” — BardBot)

    “You don’t have to have cows to be a cowboy.”

    Coatlicue: You have no future.
    Chronos: And you have no past.
    Coatlicue: That doesn’t leave us much of a present.
    Chronos: Maybe we are doomed to being merely some “light-years” with missing tenses.
    Coatlicue: Or two inefficient memories.
    Chronos: So this is Palenque.
    Coatlicue: Yes; as soon as it was named it ceased to exist.
    Chronos: Do you think those overturned rocks exist?
    Coatlicue: They exist in the same way that undiscovered moons orbiting an unknown planet exist.
    Chronos: How can we talk about what exists, when we hardly exist ourselves?
    Coatlicue: You don’t have to have existence to exist.

  13. a little side detour… 📍
    can only absorb so many AI images
    see Johanna van Gogh-Bonger
    tres robots brujas
    if things go sideways…


  15. Happy May Day…

  16. ever cross paths?

  17. deity help us if AI gains control of TMZ - e.g.,

  18. Replies
    1. This must be some sort of diving superlative.

  19. Source:

    Summary: A News Guard analysis has found that there are a number of content farms that are using large language models to generate new content (as well as clickbait), many of which have a lot of ads. The ones that News Guard investigated have generic and benign names, plus articles they use that were generated by AI are often summaries or rewritten versions of content from other sources, though they often include hints that humans did not write them. Also, though many had factual information, some had misinformation.

    On a personal note, I guess that this is further proof of what Dan has been warning about concerning LLMs, as well as the importance of being able to evaluate websites and their content.

  20. hmmmm, an imaginary assault on time using AI…