SearchReSearch: Comment: LLMs, truth, and consistency--they don't have any idea

Sunday, April 23, 2023

LLMs have a big problem:

P/C Dall-E. Prompt: "happy robots talking about truth"

They have zero idea what's true. They only know the probabilities of words in text. That's NOT the same thing as "knowing" something--it's a bit like knowing that "lion" is the most likely word following "king of the jungle..." without having any idea about monarchies, metaphor, or what a king really is all about.
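To make "they only know the probabilities of words" concrete, here is a minimal sketch of a next-word counter. It is not how a real LLM works internally (those use neural networks trained on enormous corpora); the four-sentence corpus and the function names are made up for illustration. The point is only that a program can pick "lion" as the likely next word without any notion of kings or jungles.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption, purely for illustration).
TOY_CORPUS = [
    "the lion is the king of the jungle",
    "king of the jungle lion roars at dawn",
    "king of the jungle lion sleeps all day",
    "king of the jungle gym on the playground",
]


def count_followers(sentences, context_size):
    """For every run of `context_size` words, count which word comes next."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - context_size):
            context = tuple(words[i:i + context_size])
            followers[context][words[i + context_size]] += 1
    return followers


def most_likely_next(followers, phrase):
    """Return (word, relative frequency) for the most common follower, or None."""
    counts = followers.get(tuple(phrase.split()))
    if not counts:
        return None
    word, hits = counts.most_common(1)[0]
    return word, hits / sum(counts.values())


followers = count_followers(TOY_CORPUS, context_size=4)
print(most_likely_next(followers, "king of the jungle"))
```

In this toy corpus "lion" follows the phrase two times out of three, so it wins. Nothing in the table of counts says anything about whether a lion actually is a king.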

The folks at Oxford Semantic Technologies wrote an interesting blog post about LLMs and finding verifiable facts. They call the fundamental problem the "Snow White Problem." The key idea is that LLMs don't really know what's true--they just know what's likely.

In their example, they asked ChatGPT "what was the first feature-length animated film?" ChatGPT confidently said that it was the 1937 film "Snow White."

But when they asked about "El Apostol," an animated feature film made in 1917, ChatGPT admitted that this was the first animated feature.

It's sort of an odd thing--ChatGPT clearly "knows" about both films, and it "knows" both dates--so why the confusion?

It's because LLMs don't "know" anything in the ordinary sense of "knowing." They've just got that text-probability data. That is, inconsistent knowledge just isn't a thing for LLMs, and coherence within a conversation doesn't really matter much to them. For example, I asked ChatGPT:

Dan: What are the next most likely 5 words that follow the phrase "let me not to the marriage of true minds"
ChatGPT4: The next most likely 5 words that follow the phrase "let me not to the marriage of true minds" are: Admit impediments.

Well, that's just plain wrong: I asked for 5 words, not 2. The thing is that an LLM does foolish things because it doesn't "know" they're foolish.

On the other hand, Bard did much better with:

Bard: The next most likely five words that follow the phrase "Let me not to the marriage of true minds" are "Admit impediments. Love is not," which is from William Shakespeare's Sonnet 116. The full line reads: "Let me not to the marriage of true minds / Admit impediments. Love is not love."

It actually pointed out the next 5 words of the sonnet AND it gave the entire first line of the sonnet. Did it just get lucky? Or is there an internal consistency thing at work?

So I thought I'd try a similar tack to what Oxford Semantic Tech did in their post and ask what the first short animated film was.

I asked Bard:

Dan: What was the first animated short?

Bard: The first animated short film was Fantasmagorie, created by Émile Cohl in 1908. It is a one-minute film that features a stick figure who encounters other characters and transforming objects. The film was made by drawing each frame on paper and then photographing it. Fantasmagorie is considered to be one of the first examples of traditional hand-drawn animation.

Interesting. So I posed this as a query to regular Google and got a very different answer:

Interesting. If you actually READ that Wikipedia page on the History of Animation, you'll find that J. Stuart Blackton created an animated film, "The Enchanted Drawing" in 1900 that had elements of hand-drawing along with live action elements. A few years later, he produced another film, "Humorous Phases of Funny Faces" in 1906 with even more bits of what we'd think of as animation.

I went back to Bard and asked:

Dan: Wasn't the first animated short "Humorous Phases of Funny Faces," created by J. Stuart Blackton in 1906?

Bard: Yes, you are correct. Humorous Phases of Funny Faces, created by J. Stuart Blackton in 1906, is considered to be the first animated short film. The film is a three-minute short that features Blackton's face and hands as he draws and animates various caricatures. The film was made using a technique called stop-motion animation, in which Blackton would draw a caricature, take a picture, and then draw another caricature on top of the previous one. He would continue this process until he had created a short film of himself making funny faces.

I apologize for the previous mistake. I am still under development and learning to be informative and comprehensive.

Points for politeness, but this is another instance of the "Snow White" effect, in which the LLM doesn't know its own mind. Did it really change its mind?

One more attempt:

Dan: And didn't J. Stuart Blackton produce an animated short in 1900? That would be even earlier, yes?

Bard: That's correct. J. Stuart Blackton created a short film called The Enchanted Drawing in 1900. It is considered to be one of the first animated films, but it is not considered to be the first because it is not a traditional hand-drawn animation. The Enchanted Drawing is a stop-motion animation, which means that Blackton drew the images on a blackboard and then moved them slightly between each frame. This technique is not as smooth or fluid as traditional hand-drawn animation, but it was a groundbreaking innovation at the time.

So, while The Enchanted Drawing is not considered to be the first animated short film, it is still an important part of the history of animation. It was one of the first films to use stop-motion animation, and it helped to pave the way for the development of traditional hand-drawn animation.

Contradictory, no?

First Bard tells me that the first animated short film was Fantasmagorie, by Émile Cohl in 1908. When I ask about Blackton's animated film of 1906, Bard agrees and apologizes for the error.

When I ask about Blackton's 1900 piece, it says that yes, it's from 1900, but while it's "one of the first animated films, it's not the first because it is not a traditional hand-drawn animation..." Wait a second! Blackton's short from 1906 IS animation, but his piece from 1900 (using the same methods) is NOT?

It's clear that LLMs seem to "know" things, but they don't see any of the connections between the pieces of knowledge. It is as though everything is dissociated; and since everything is separate, contradictions and internal inconsistencies don't trouble them at all.

To understand that something is inconsistent means that you (or the LLM) recognize that there are different statements about something, and that they cannot all be true simultaneously. In this case, only one of these animated shorts can be the "first." Was it Blackton's 1900 "The Enchanted Drawing," or Blackton's 1906 work "Humorous Phases of Funny Faces," or was it Cohl's 1907 "Japon de fantaisie," or Cohl's 1908 "Fantasmagorie"? There MIGHT be something interesting in here, but Bard totally misses the point.

We in SearchResearch would try to pin down what "first" means in this context, and talk about what an "animated short" truly is. But that's not a conversation an LLM can have. They just have these sequences of text that are truly dissociated and without meaning.

Of course, Oxford Semantic Technologies' solution would be to have us refer to a knowledge graph that has assertions in a meaning-bearing representation. In such a knowledge graph, contradictions would be easy to detect--one of the points of having a knowledge graph is that it's an authoritative representation that can be searched and reasoned about. If there's a contradiction in the graph, you can find it easily.
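Here is a rough sketch of why contradictions are easy to spot once assertions are explicit. It assumes a toy representation: plain (subject, predicate, object) triples plus a hand-picked set of predicates that may hold only one value. The predicate names and the function are hypothetical, and this is not Oxford Semantic Technologies' actual format or API.

```python
from collections import defaultdict

# Assertions as (subject, predicate, object) triples, taken from Bard's answers above.
ASSERTIONS = [
    ("animated short", "first_example", "Fantasmagorie"),
    ("animated short", "first_example", "Humorous Phases of Funny Faces"),
    ("Fantasmagorie", "release_year", 1908),
    ("Humorous Phases of Funny Faces", "release_year", 1906),
    ("The Enchanted Drawing", "release_year", 1900),
]

# Hypothetical rule: these predicates may have only one value per subject.
SINGLE_VALUED = {"first_example", "release_year"}


def find_contradictions(assertions, single_valued):
    """Return {(subject, predicate): [values]} wherever a single-valued
    predicate has been given more than one distinct value."""
    values = defaultdict(set)
    for subject, predicate, obj in assertions:
        if predicate in single_valued:
            values[(subject, predicate)].add(obj)
    return {key: sorted(objs) for key, objs in values.items() if len(objs) > 1}


for (subject, predicate), objs in find_contradictions(ASSERTIONS, SINGLE_VALUED).items():
    print(f"Conflict: {subject} / {predicate} has {len(objs)} values: {objs}")
```

Because "first" can only point at one film, the two claims collide as soon as they sit in the same structure. The LLM's text never forces that collision.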

That's a laudable goal. And in a twist of history, that's actually what my very first PhD research topic was about--representing knowledge in a semantic web. Knowledge graphs are great, and have many fine properties, but they're difficult to maintain and keep consistent. Wonderful tools, but still probably in the future.

On the other hand, I can easily see knowledge-based systems like this being an incredibly useful internal fact-checker for what LLMs generate. Imagine a knowledge-based system working hand-in-hand (code-in-code?) with an LLM text-generator. There's a real possibility of power there. (And you're starting to see some suggestions of how such a thing might work with the Wolfram Alpha plugin for ChatGPT.)
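As a thought experiment, the fact-checker idea might look something like the sketch below. Everything here is an assumption: the LLM is replaced by a stub that returns Bard's first answer, the "knowledge base" is just a dictionary of the release years mentioned in this post, and it takes for granted that all three films count as animated shorts (which is exactly the definitional question a real system would have to settle).

```python
# Hypothetical knowledge base: title -> release year, using dates from this post.
KNOWN_SHORTS = {
    "The Enchanted Drawing": 1900,
    "Humorous Phases of Funny Faces": 1906,
    "Fantasmagorie": 1908,
}


def stub_llm_answer(question):
    """Stand-in for a real LLM call; returns the title Bard first gave."""
    return "Fantasmagorie"


def check_first_claim(title, known):
    """Check a claimed 'first' against the knowledge base.

    Returns (verdict, earlier_title): verdict is 'supported', 'contradicted',
    or 'unverifiable'; earlier_title names the counterexample, if any.
    """
    if title not in known:
        return ("unverifiable", None)
    earliest = min(known, key=known.get)
    if earliest == title:
        return ("supported", None)
    return ("contradicted", earliest)


claim = stub_llm_answer("What was the first animated short?")
verdict, earlier = check_first_claim(claim, KNOWN_SHORTS)
print(claim, verdict, earlier)
```

With this data the checker flags the claim as contradicted and points to "The Enchanted Drawing" as an earlier entry. A title missing from the dictionary comes back as unverifiable rather than wrong, which is about the most a checker like this can honestly say.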

But we can't count on LLMs to tell us true things. At least not by themselves.

Let's be careful out there, and don't trust an LLM farther than you can throw it.

Keep searching.

31 comments:

krossbow, April 24, 2023 at 4:38 AM
Good morning,

I may have been groggy (another SearchReSearch challenge) when I read this right after waking up, but my brain kept asking me if this was similar to the waterfall posts back in March. https://searchresearch1.blogspot.com/2023/03/answer-worlds-largest-waterfall.html

So to Bard I began with:

Fred: Where is the largest waterfall?

Bard: (a couple of paragraphs about the Denmark Straits Cataracts)

Fred: List the three largest waterfalls by volume of water.

Bard: 1. Inga Falls 2. Livingstone Falls 3. Wagenia Falls
(Bard did include Country and cubic feet per second plus more on the Congo River basin.)

Fred: What is the volume of water per second of the Denmark Strait Cataracts?

Bard: (a few paragraphs basically to say 175 million cubic feet per second.)

Fred: Why didn't you include this in your previous answer?

Bard: (apologizes for still being under development)

Fred: List the three largest waterfalls by volume of water.

Bard: Sure, here are the three largest waterfalls by volume of water:
1. Denmark Straits Cataracts...
2. Inga Falls...
3. Livingstone Falls...

Please note that the Denmark Strait Cataracts are underwater, so they are not visible from the surface of the ocean.

Bard corrected the response. What I now wonder: will Bard keep that correction if I ask later today? Will Bard give the same response to someone else?

remmij, April 24, 2023 at 9:02 AM
… seven little guys just showed up at my door and offered to break my legs - one was really Grumpy - and they said they could bring 7 Samurai, if necessary… I didn't even know Taylor Swift was sleepy - or really Snow Pink…
Quirino Cristiani
https://vimeo.com/158798301
https://www.facebook.com/people/Quirino-Cristiani-The-Mystery-of-the-First-Animated-Movies/100035392556374/
Bono [sic] is but a mid-summers night dream… a cubicle state of mind
At one point in its recurring soliloquies about its potential upsides, Bard dreamed about living up to the legacy of the English playwright that inspired its name. Bard explained that its creators at Google “thought Shakespeare would be a good role model for me, as he was a master of language and communication.”
Black Maria studio, NJ - Edison - 1900
https://youtu.be/pe7HSnZotbU
Knowledge Graph Conference 2023 in NYC on May 8-12
https://www.ontotext.com/knowledgehub/fundamentals/what-is-the-semantic-web/
from your prompt - long, but interesting
https://www.youtube.com/watch?v=z5WZhCBRDpU
Sergeant Phil Esterhaus's (Michael Conrad) with the belly dancer is almost un-careful, but well worth the 2+ minutes

remmij, April 24, 2023 at 9:30 AM
a couple other bits - I need to be more careful out there - but it is already too late.
what type?
https://www.wired.com/story/ai-isnt-going-to-reinvent-the-alphabet-anytime-soon/
further DALL-E, looking back:
https://imgur.com/a/wUHXpOH

remmij, April 24, 2023 at 10:32 AM
jfg [just for grins/giggles] - the robots made me do it (said in a whispering voice)
https://imgur.com/a/u4VtzXQ

remmij, April 24, 2023 at 10:38 AM
meant to include this:
supposedly Miller has made over 100,000 images as he experimented, edited those down to 20k and then picked 20 for this show at Gagosian, NYC - that would be immersive.
https://gagosian.com/exhibitions/2023/bennett-miller/

remmij, April 24, 2023 at 1:14 PM
if the barristers are spinning up, things are getting serious… already on an AI list somewhere…
592
870
degree in electrical engineering
Nita Farahany
SLAPLAB

Shakespeare, now BARD - was Will AI?
wiki
Act IV, Scene II

remmij, April 24, 2023 at 1:18 PM
Henk van Ess

mateojose1, April 24, 2023 at 1:40 PM
You made some good points in this post about large language models / AI chatbots and how they have no idea whether or not what they are writing is true. And, the series you have on them is quite informative, so thanks.

Building on what you wrote in this post, a question I would ask is this: Even if these LLMs eventually get some sort of fact-checking software added (like you suggested), and even if they eventually become fairly trustworthy as a result, would users also receive a list of sources that the chatbot used to give its answers, or not? And, if not, how would they know what its claims were based on (or if they were even trustworthy)?

remmij, April 24, 2023 at 7:08 PM
indicative of the increasing speed & diverse applications coming.
shows different image generators and text to video samples.
https://www.youtube.com/watch?v=gQfQiXP9yZA
a pizza ordering bot -
and the powerful -
https://imgur.com/a/tHeo6lm

remmij, April 25, 2023 at 1:27 PM
guardrails
nvidia podcasts

remmij, April 26, 2023 at 8:16 AM
speaking of waterfalls
https://www.getty.edu/art/collection/object/104D03

remmij, April 26, 2023 at 10:20 AM
if only AI had access to these robots… but they were busy in Hell. where are they kept now? & did they leave emojis?
https://www.terranoa.com/en/one-off/fukushima-robots-in-hell-2093

remmij, April 26, 2023 at 12:44 PM
smashing Pandora's box(keyboard)
https://tinyurl.com/ycy43jec
“This is not a new toy. This is not a fad. This is not VCs looking for attention and founders trying to create hype. This is a society-changing, species-changing event. I’m excited by this technology but the downsides are just so immense. We’ve unleashed forces that we don’t understand." Chon Tang, SkyDeck, UC Berkeley’s startup accelerator.
could AI do worse? [meatsacks moving statues]
https://tinyurl.com/mtrz42bs
https://en.wikipedia.org/wiki/Thomas_Fallon
no message from beyond… but nice setting.
https://en.wikipedia.org/wiki/Monolith_(Space_Odyssey)#/media/File:ENS_2001_Monolith_LILA_night.jpg

remmij, April 28, 2023 at 12:16 PM
if things go sideways…

remmij, April 29, 2023 at 10:30 AM
interesting that Bob came to mind again…

new tome
worth the re-read
writings

Coatlicue

The Colloquy of Coatlicue and Chronos

or (“You don’t have to have humans to be a humanmachine.” — BardBot)

“You don’t have to have cows to be a cowboy.”
—Nudie

Coatlicue: You have no future.
Chronos: And you have no past.
Coatlicue: That doesn’t leave us much of a present.
Chronos: Maybe we are doomed to being merely some “light-years” with missing tenses.
Coatlicue: Or two inefficient memories.
Chronos: So this is Palenque.
Coatlicue: Yes; as soon as it was named it ceased to exist.
Chronos: Do you think those overturned rocks exist?
Coatlicue: They exist in the same way that undiscovered moons orbiting an unknown planet exist.
Chronos: How can we talk about what exists, when we hardly exist ourselves?
Coatlicue: You don’t have to have existence to exist.

remmij, April 29, 2023 at 1:37 PM
a little side detour… 📍
can only absorb so many AI images
see Johanna van Gogh-Bonger
letters
tres robots brujas
if things go sideways…

remmij, April 29, 2023 at 3:43 PM
https://futurism.com/blake-lemoine-google-interview

remmij, May 1, 2023 at 10:12 AM
Happy May Day…

https://www.nytimes.com/2023/05/01/technology/ai-google-chatbot-engineer-quits-hinton.html

https://www.technologyreview.com/2023/05/01/1072478/deep-learning-pioneer-geoffrey-hinton-quits-google/?truid=&utm_source=the_algorithm&utm_medium=email&utm_campaign=the_algorithm.unpaid.engagement&utm_content=05-01-2023&mc_cid=dc465b7dad&mc_eid=a2abac3a08

https://aaai.org/working-together-on-our-future-with-ai/

https://twitter.com/geoffreyhinton/status/1635739459764322330?cxt=HHwWtIC8xeiXqLMtAAAA

remmij, May 1, 2023 at 10:21 AM
ever cross paths?
https://research.google/people/jeff/

https://homes.cs.washington.edu/~chambers/bio.html

remmij, May 1, 2023 at 10:29 AM
deity help us if AI gains control of TMZ - e.g.,

https://twitter.com/timnitGebru/status/1334352694664957952

https://www.vox.com/recode/2020/12/4/22153786/google-timnit-gebru-ethical-ai-jeff-dean-controversy-fired

https://twitter.com/JeffDean/status/1334953632719011840

https://en.wikipedia.org/wiki/Timnit_Gebru

https://en.wikipedia.org/wiki/Fei-Fei_Li

remmij, May 1, 2023 at 12:23 PM
you may know someone interested in this…

remmij, May 1, 2023 at 1:47 PM
kept diving, I mean searching

mateojose1, May 1, 2023 at 2:34 PM
Source: https://www.newsguardtech.com/special-reports/newsbots-ai-generated-news-websites-proliferating/

Summary: A NewsGuard analysis found a number of content farms that use large language models to generate new content (as well as clickbait), many of them carrying a lot of ads. The sites NewsGuard investigated have generic, benign names, and their AI-generated articles are often summaries or rewritten versions of content from other sources, though they often include hints that humans did not write them. Many had factual information, but some had misinformation.

On a personal note, I guess that this is further proof of what Dan has been warning about concerning LLMs, as well as the importance of being able to evaluate websites and their content.

remmij, May 2, 2023 at 4:36 PM
hmmmm, an imaginary assault on time using AI…
https://www.instagram.com/stelfiett/
https://www.vox.com/videos/2023/5/2/23708076/ai-artist-stelfie-process-workflow

remmij, May 3, 2023 at 12:41 PM
collecting -
potentially on the upside…
AI, DeepMind - Alpha Fold - shape sifter dataset

DeepMind blog