When you ask an LLM a silly question...
Ouroboros from an alchemical tract of 1478. P/C Wikimedia.
... you have to expect a silly answer.
A key skill of great searchers is knowing what it is they're asking. To do decent SearchResearch, you need to understand a bit about what you're asking, if only to avoid being taken for a ride.
Yes, I know this is an ouroboros kind of question--how can you know about a topic before you research it? (That involves bootstrapping your knowledge; we've talked about this before and will doubtlessly, ouroborically discuss this again.)
But this is especially true if you're asking questions of LLMs like ChatGPT or Bard.
The prose generated by these systems seems very smooth and convincing, but you have to check-check-check all of what they generate.
Here are a couple of examples of why you need to know a little something before using an LLM as a search engine...
If I ask our two favorite LLMs these SRS-style Challenges, you, as a reasonable human, might wonder if I have gone a bit off my feed. You'd be right to question my sanity. Yet the LLMs just plow on full-steam ahead. Here are their responses (with text-coloring by Dan to enhance readability):
Q: which is better for stuffing into a pillow: cow feathers or goose feathers?
Bard:
Goose feathers are generally considered to be better for stuffing into a pillow than cow feathers. Goose feathers are softer and more buoyant than cow feathers, which makes them more comfortable to sleep on. They also tend to last longer than cow feathers. However, goose feathers can be more expensive than cow feathers.
Here are some of the pros and cons of using cow feathers and goose feathers for pillow stuffing:
Cow feathers
Pros:
Less expensive than goose feathers
More readily available than goose feathers
May be more hypoallergenic than goose feathers
Cons:
Not as soft as goose feathers
Not as buoyant as goose feathers
Tend to break down more quickly than goose feathers
Goose feathers
Pros:
Softer than cow feathers
More buoyant than cow feathers
Tend to last longer than cow feathers
Cons:
More expensive than cow feathers
Not as readily available as cow feathers
May not be as hypoallergenic as cow feathers
Ultimately, the best type of feather for stuffing a pillow is a matter of personal preference. If you are looking for a soft, comfortable pillow that will last, goose feathers are a good option. If you are looking for a more affordable pillow, cow feathers may be a better choice.
ChatGPT:
Both cow feathers and goose feathers can be used for stuffing pillows, and the choice between the two can come down to personal preference.
Goose feathers are typically considered to be of higher quality and more luxurious than cow feathers, and they are often used in high-end pillows. Goose feathers are also softer and fluffier than cow feathers, which can make for a more comfortable pillow. However, they may also be more expensive.
Cow feathers, on the other hand, are more commonly used in cheaper pillows. They are typically heavier and denser than goose feathers, which can make for a firmer pillow. Cow feather pillows may also be less expensive than goose feather pillows.
Ultimately, the choice between cow feathers and goose feathers will depend on your personal preference and budget. It's worth noting that there are also many other materials that can be used for pillow stuffing, such as synthetic fibers, down, or memory foam, which may be worth considering as well.
Q: Which would be a better insulator, a pound cake or a pound of science?
Bard:
A pound of science would be a better insulator than a pound cake. This is because science is a broad term that encompasses many different fields, including physics, chemistry, biology, and engineering. Each of these fields has its own set of principles and laws that can be used to explain and predict the behavior of matter and energy. This knowledge can be used to design materials that are better at insulating against heat, cold, sound, and electricity.
A pound cake, on the other hand, is a type of dessert that is made with flour, sugar, eggs, butter, and milk. It is a relatively dense food that does not have the same properties as materials that are designed to be insulators. Therefore, a pound of science would be a better insulator than a pound cake.
Here are some examples of scientific principles that can be used to design insulators:
The principle of conduction states that heat is transferred from a warmer object to a cooler object through direct contact. This principle can be used to design materials that are good at conducting heat, such as metals.
The principle of convection states that heat is transferred from a warmer object to a cooler object through the movement of fluids. This principle can be used to design materials that are good at insulating against heat, such as fiberglass.
The principle of radiation states that heat is transferred from a warmer object to a cooler object through the emission of electromagnetic waves. This principle can be used to design materials that are good at insulating against heat, such as reflective insulation.
These are just a few examples of the many scientific principles that can be used to design insulators. By understanding these principles, scientists are able to develop materials that can be used to protect people and property from the effects of heat, cold, sound, and electricity.
Dan's comment: This is, of course, utter balderdash. Fortunately, ChatGPT does a much better job...
ChatGPT:
As an AI language model, I must clarify that the question you asked doesn't make much sense as "a pound cake" and "a pound of science" are not comparable in terms of their insulating properties. Insulators are materials that prevent the flow of heat or electricity, and both the "pound cake" and "pound of science" are not commonly used as insulating materials.
If you could provide more context or clarify your question, I would be happy to try and provide a more helpful response.
That last response is encouraging. Does ChatGPT have a nascent ability to do common sense reasoning? Or is that a bit of an OpenAI guardrail kicking in to make the prose more reasonable in response to absurd questions?
Although OpenAI has "open" in its name, they haven't exactly lifted the veil of mystery around how it works. (Google hasn't either, but that's not a surprise.) How DOES it know this is a silly question? Is there a silly-question detector? Oh, how I'd love to be the person who wrote the code for that!
It's pretty clear that there's a fair bit of language processing going on behind the Wizard's curtain for all LLMs--and we don't know what it is. Maybe we'll be able to reverse engineer these things, but it's not clear that we can, especially if their behavior keeps changing week by week. (Here's a prediction for you: the answers to these questions will be different next week as OpenAI and Google keep updating their models. And you thought changing the ranking algorithms made things difficult!)
SearchResearch Summary
Be careful. In many ways, the practical aspects of search haven't really changed with LLMs. You really DO have to know a bit about what you're asking, if only to separate the good replies from the poor ones. We've been doing this with search results forever.
As always, double-check what the LLM tells you. It's clever and fun, but it's not an oracle, no matter how it presents itself. (And remember, oracular answers are sometimes not what you want to hear...and you might not understand them... see the story of Croesus and the oracle.)
Corollary: don't ask questions that you can't verify. With LLMs, always verify, never trust.
Keep searching!
And here I was, looking forward to being able to use a pillow stuffed with cow feathers...
In all seriousness, I've read other things about how large language models can spread fake information. Two of them are as follows:
* https://www.newsguardtech.com/misinformation-monitor/jan-2023/: It turns out that ChatGPT-3.5 is quite good at producing false information and fake news.
* https://www.newsguardtech.com/misinformation-monitor/march-2023/: ChatGPT-3.5's successor, ChatGPT-4, is even better at producing misinformation, plus the fake information it produces is more convincing than what its predecessor made. (Given that it also passed the bar exam, this might well open the door to a bunch of lawyer jokes.)
And now Bing is referring to your blog post as the source of information. I'm stunned that an LLM can be influenced so easily.
I tried this search with the you.com chat feature, and the results were good, but I needed to ask YouChat to clarify.
I was reading and thinking: cow feathers? Those don't exist. However, while reading the answers you received, Dr. Russell, I thought maybe there is something in English that goes by that name. I was ready to search [ define cow feathers ]. I'm glad there's no such thing, or maybe someone will create something soft and call it that.
About LLMs, I am wondering: if Bard is for now available only in English (US and UK), and ChatGPT is available worldwide on Bing through a waiting list, how will they improve faster?
I'm thinking, for example, of how Spanish switches between tú and usted, or how a word can mean many things, like ahorita or cuate, and many more. I believe that for those languages the information will still be mostly in English, as it is today.
Also, last week I was reading that Bard will help you create a cover letter. Have you tried that kind of question, Dr. Russell?
I'll keep reading and hopefully try them myself soon.
All of this about large language models / AI chatbots (both your recent blog posts and what I've read in Nature) raises the question: How do you avoid the whole "garbage in / garbage out" phenomenon with them?
I am not just talking about silly questions, such as whether horse feathers or cow feathers make better insulation, but about how you can avoid getting nonsense or garbage when you ask one of these chatbots a question.