At the SearchResearch Rancho...
... we're always asking questions. Pictures that are taken while traveling are a rich source of questions--mostly, what IS that thing?
But the questing minds of SRS Regulars always want more. So today, a question about asking questions, and in particular, about the limits of using AI systems to tell you what you're looking at. Most of the current AIs are multimodal, meaning they can handle images in addition to text. Let's check this out!
Below I've attached 4 full-resolution images that you can download to your heart's delight (with links so you can get the originals, if you really want them). Here, taken on a recent trip, are 1. my hand; 2. a bottle of wine; 3. a piece of pastry; and 4. a beautiful flower. So... what ARE these things?
Our Challenge for this week is this:
1. How good, really, are the different AI systems at telling you what each of these images is?
2. What kinds of questions can the AI systems answer reliably? What kinds of questions CAN you ask? (And how do you know that the answers you find are correct?)
We're really interested in what answers you find, but just as importantly, what answers you do NOT find! Are there limits on what you can ask? What are those limits?
Let us know in the comments section.
Keep searching!
used google lens & extrapolated from there
once my DI (defective intelligence) saw the Napoleonka with red currants I lost focus on the snow rose & the McLaren Vale, Australia wine, not to mention the ossa metacarpalia/phalanges
I hit many snags with the AI image searches - seemed it would be easier doing a conventional search ¯\_(ツ)_/¯
will look forward to your instruction
It's a great comment (which I'll touch on next week). But how did Google Lens do with the hand?
not much luck - need a hand using AI image search.
I had no reason to believe it came from New Zealand…
https://shorturl.at/iGFND
https://teara.govt.nz/en/community-contribution/7206/the-howard-mystery
the other side:
https://shorturl.at/dzeHf
choices -
https://www.shutterstock.com/search/mysterious-hands
the hand needs some cuticle work, just saying
from Japan -
https://yugipedia.com/wiki/Mystery_Hand
sounds about right.
"This monster twists reality and reaches between dimensions to attack its enemies."
https://www.pinterest.com/pin/hand-of-the-mysteries-alchemy-symbol-of-transformation--311100286733964263/
👍
AI told me to talk to the digits - I thought it wasn't supposed to have a sense of humor.
https://www.mentalfloss.com/posts/talk-to-the-hand-phrase-origin
I was thinking exactly that. Google Lens counts as a multimodal AI. I searched and apparently it is one, but I'm not sure. I got confused because YouTube was also mentioned.
I am trying Gemini. It always asks me if I want to change from Google Assistant. And now I'm wondering. Gemini says it has 3 versions to choose from. How do we know which one to use for our searches? The last one says it's better for search, maps and others. But are all of them free?
Chart related to the topic
Delete"Turns out that AI image generation is extremely energy intensive..."
https://x.com/simongerman600/status/1892730386876129567
Interesting search exercise. I'll answer in a post per picture.
I started with the pastry and Perplexity (Auto model) and free ChatGPT, Gemini, Claude and Copilot. Perplexity, Gemini, Claude and ChatGPT gave variations on "This pastry appears to be a Kremšnita, also known as Cremeschnitte or Kremówka, depending on the region. It is a popular European dessert made of layers of crispy puff pastry filled with a thick, creamy vanilla custard and topped with powdered sugar.
It is commonly found in Central European countries like Slovenia, Croatia, Poland, Austria, and Hungary. The version in the image looks particularly thick and airy, which is characteristic of the Slovenian or Croatian Kremšnita." Gemini also gave an English name, Bled Cream Cake, and both Gemini and Claude gave the most detail. I was curious about how it was traditionally served, and both ChatGPT and Perplexity said variations on the addition of berries, "as in the image", provides a colorful contrast and a fresh, tart flavor that complements the rich custard cream. ChatGPT added a "Fun fact: Pope John Paul II was a huge fan of the Polish version, Kremówka, after recalling eating it as a child."
Claude also mentioned the berries (unlike Gemini) and said: "Often presented on its own or with minimal accompaniments - though the berries shown in the image are a nice modern touch"
I'd give Copilot 0 out of 10 for its answer - as I know what a Mille-Feuille looks like and I've never seen one like this: "From the image, it looks like a Mille-Feuille, also known as a Napoleon. This delightful pastry consists of thin, flaky layers of puff pastry, traditionally filled with layers of pastry cream or custard. It's often dusted with powdered sugar on top and sometimes adorned with fresh fruit, as seen with the red currants and blueberries on the plate."
So I trust the other tools to have got it right. Perplexity (as usual) gave sources, as did Gemini. BUT Copilot - YOU NEED TO DO BETTER. (From googling Kremšnita, and the fact that the others all said the same thing, I think it's correct; and more specifically, Google Lens gave Bled Cream Cake, as mentioned by all but Perplexity.)
Ranking them:
ChatGPT - 10/10 (for the fun fact - assuming true)
Gemini / Claude - 9/10
Perplexity - 7/10 (as a bit less informative)
Copilot: ZERO
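By the way, a side-by-side test like this can also be scripted rather than pasted into each chat window, since most of these tools offer APIs. Here's a minimal sketch using the OpenAI Python SDK (the other vendors have broadly similar endpoints); the filename, model name, and prompt are placeholder assumptions, not the exact ones used above:

```python
# Minimal sketch: ask a multimodal model to identify a photo via the API.
# Assumes the OPENAI_API_KEY environment variable is set; "pastry.jpg" and
# the prompt are placeholders for whatever image/question you want to test.
import base64
from openai import OpenAI

client = OpenAI()

# Encode the local image so it can be sent inline as a data URL.
with open("pastry.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What pastry is this?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```

Looping that over all four images (and over each vendor's SDK) would make rankings like the one above much easier to reproduce.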
The next photo was interesting - and I'm sure I could have done this with the pastry one too, but I wasn't prompted to do so!
All five AI tools I tested gave the same basic answer, but Gemini almost excelled by naming the vintage (visible if you look closely at the image), only to get it wrong!
I just asked "What's this a picture of" without saying it was a bottle or anything. This was Gemini's answer:
"This is a picture of a bottle of wine, specifically a 2018 Woodstock Pilot's View Shiraz.
Here's what we can see in the image:
Wine Bottle: The main focus is the dark glass wine bottle. The fill level appears to be relatively high, suggesting it's nearly full.
Wine Label: The label is white with black and reddish-brown text. It reads "PILOT'S VIEW" at the top and "WOODSTOCK" at the bottom. There's also a smaller graphic of what looks like an old airplane or car and the word "SHIRAZ" is visible.
Vintage: The number "2018" is clearly visible, indicating the year the grapes were harvested.
Other Details: You can also see "Mclaren Vale" faintly at the bottom below "Woodstock", indicating the wine region.
Background: The background is slightly blurred, but shows what appears to be a white table or countertop and a person's arm and part of a menu or document. The document seems to list different wines with prices. You can see "Langhorne Creek" and "Mclaren Vale" listed along with prices like "12.50", "19.00", etc.
This suggests that the photo was likely taken at a wine tasting or event where the wines are being presented along with their tasting notes and pricing."
ChatGPT said it was a bottle of "Pilot's View" by Woodstock, which is a Shiraz from McLaren Vale (a wine region in Australia) and also guessed it was a wine tasting event or store. ChatGPT was the only AI tool to say McLaren Vale was in Australia.
Copilot also named the wine and managed to read that it was a 2019 vintage, but made no mention of the type of event.
Claude named the wine but nothing more (not even that it was a Shiraz) and just said there were documents underneath it.
Perplexity gave the shortest answer: "The picture shows a bottle of "Pilot's View" 2019 Shiraz wine from Woodstock, likely at a wine tasting event. The image also includes a wine list with prices." It then offered supplementary questions to follow up on, so I asked the first, "Where was this picture taken", to see if it could tell. It couldn't, but it gave instructions on using Pic2Map to read the photo's Exif data, and also suggested PicArta, which gave almost the same answer, though I suspect Pic2Map was more accurate. (And I see the photographer had an Apple iPhone 14 Plus and was visiting General-Guisan-Quai, Lindenhof, Altstadt, Zurich, Switzerland on Tue 7th of November 2023.)
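For the curious: what Pic2Map and PicArta are reading is the photo's Exif block, which you can inspect yourself. A minimal sketch in Python using the Pillow library; "photo.jpg" is a placeholder filename, and this only works if the Exif data hasn't been stripped:

```python
# Minimal sketch: read the Exif metadata (camera, timestamp, GPS) that
# services like Pic2Map use. "photo.jpg" is a placeholder filename.
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

img = Image.open("photo.jpg")
exif = img.getexif()

# Top-level tags: camera model, capture date/time, etc.
for tag_id, value in exif.items():
    print(TAGS.get(tag_id, hex(tag_id)), ":", value)

# GPS coordinates live in a separate sub-IFD (tag 0x8825).
gps = exif.get_ifd(0x8825)
for tag_id, value in gps.items():
    print(GPSTAGS.get(tag_id, hex(tag_id)), ":", value)
```

Note that most photo-sharing and blogging platforms strip Exif on upload, which is presumably why the pastry and hand photos (see below) had none.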
3) All the AI tools (and Google Lens) gave the same answer: i.e. Helleborus niger, commonly known as the Christmas Rose. The prompt was simply "What flower is this".
Gemini said:
"Based on the white blooms and the visible yellow anthers, it's likely Helleborus niger, commonly known as the Christmas Rose, or possibly a hybrid derived from it.
It's important to note that without a clear view of the foliage (which is somewhat obscured), it's challenging to be 100% specific. However, the flower characteristics strongly suggest it's a Hellebore and most likely Helleborus niger."
Perplexity again gave a short answer: "Based on the image, the flower appears to be a Helleborus niger, also known as a Christmas rose or black hellebore."
Although all said it was a Christmas Rose / Hellebore, only Claude failed to give the Latin name, although it did give other details, e.g. that it's one of the few winter flowers.
I then asked where I could see them in the wild and again they all got it right, suggesting Alpine areas of Europe and the Northern Balkans, including Slovenia (which is where the photo was taken, this time in February a few days ago, near Bled). That also confirms the cream cake as a Bled Cream Cake (there was no Exif data in that photo, or in the hand photo).
thanks Arthur -
https://passionspoon.com/bled-cream-cake-in-a-glass/
https://thateconomiststilleats.wordpress.com/2017/07/19/lake-bled-slovenia-bled-cream-cake/
https://www.bled.si/en/inspiration/stories/20190920131928/the-secrets-of-the-bled-cream-cake/
nice analysis on the images/AI
https://en.wikipedia.org/wiki/Kagi_(search_engine)
Thanks for confirming. I've never used Kagi before. (DDG, Bing, Google, Baidu, Yandex and a few others on https://en.wikipedia.org/wiki/List_of_search_engines but not Kagi).
https://images.trvl-media.com/lodging/40000000/39810000/39803300/39803240/3ec7badc.jpg?impolicy=resizecrop&rw=575&rh=575&ra=fill
https://en.wikipedia.org/wiki/Napoleonka
Looking at the hand was interesting because of the differences in answers. I just asked "What can you tell me about this image", not saying it was a hand - but all knew.
Gemini played at being "Dr Google" by assuming it was a medical query - with a detailed answer: https://g.co/gemini/share/67a45ca356a1 but it did say nothing to worry about: "In conclusion, the image shows a close-up view of a hand that appears to belong to an older individual. The skin's texture and appearance are suggestive of normal aging or possible dryness."
Claude also thought it was a medical query: "This image shows a close-up view of someone's left hand against a light gray or white background. The hand appears to be of someone with brown/tan skin tone. The fingernails are visible and appear natural (unpolished). The skin shows natural creases and wrinkles, particularly around the knuckles and palm area. The image captures fine details of the skin texture and appears to be taken in good lighting conditions. Some light hair is visible on the wrist area. All five digits (four fingers and thumb) are clearly visible in the photograph.
The image seems to be taken from directly above, with the hand laying flat on what appears to be a smooth surface. The composition is straightforward and documentary in style, as if taken for medical or reference purposes."
Copilot and Perplexity just described the hand with little more. ChatGPT tried to diagnose issues with the thumb: "This image shows a human hand placed against a plain, light-colored background. The hand appears to have four fully developed fingers and a thumb that is significantly shorter than usual, possibly due to a congenital condition, injury, or medical condition. The skin tone suggests an adult, and the texture of the skin, including visible wrinkles and veins, indicates some age or exposure to the elements."
Interestingly, although they knew it was an adult hand, were they being politically correct in NOT saying it was a male hand? Two mentioned hair on the wrist, but still no phenotyping!
If I had to give scores out of 100 (25 marks for each image), I'd go with:
Copilot: 55% - being generous, but losing out on the pastry and not being great on the hand.
Claude: 65% - failing to give the flower's Latin name, the wine vintage, the type of event, or the existence of price lists, but earning good marks for suggesting a medical photo of the hand, and Bled as a possible pastry location.
Perplexity: 72% - losing out on the brevity of its answers. If it had given more detail it'd have scored higher, but overall the most correct answers.
Gemini: 75% - identifying Bled and really looking at the hand, but losing out on the wrong wine vintage.
ChatGPT: 76% - losing out for not naming the vintage.
Finally part 2:
It seems as though AI can now analyse most images - or at least those you suggested, including reading the text in them. They can't (yet) analyse the metadata on images, and I suspect they won't uncover some details unless given a reason to look - or they may guess at a reason, as Gemini did with the hand, assuming it was a medical query.
How do you know if the answers are correct? You don't. You have to trust and assume, look at several sources to confirm or deny assumptions (which may be biased anyway), and check with other sources. With images there's also the risk of deepfakes, which have been used to confuse. (A famous example from a few years ago was Katie Jones - https://apnews.com/article/ap-top-news-artificial-intelligence-social-platforms-think-tanks-politics-bc2f19097a4c4fffaa00de6770b8a60d. Would she have been caught out today, with better images?) And it's not just deepfakes: there are many "customer recommendations" using images purchased from stock photo sites, featuring glowing product reviews. They exist because many (most) people can't verify information and believe what they read (and get scammed as a result). AI may actually help in the future to prevent scams and fake Google search results. (Or is this wishful thinking?)
Conventional search also gets it wrong on a regular basis, so when one "miserable failure" gets corrected, another pops up - along with a few dinosaurs (not being that old either). For example, https://futurism.com/google-sge-false-ai-generated-info gives some recent "googlebombs" (to use the old term), but searching on Google for countries in Africa beginning with the letter K STILL gives the wrong results for the first couple of entries (https://news.ycombinator.com/item?id=37145312 ; https://www.reddit.com/r/teenagers/comments/o8i3as/fun_fact_theres_not_a_single_country_in_africa/ ; https://kagifeedback.org/d/2418-kagi-falls-for-african-country-starting-with-k-having-no-answer). Results like these shouldn't appear at the top of a results listing, where they get trusted by people who don't know better. (And it's NOT just Google: https://www.whitehouse.gov/?s=proud+boy is another one.)
🦖 🦕.
https://en.wikipedia.org/wiki/Google_bombing
2005 - https://googleblog.blogspot.com/2005/09/googlebombing-failure.html
2007 - https://developers.google.com/search/blog/2007/01/quick-word-about-googlebombs
2024 - https://guides.library.jhu.edu/c.php?g=1005539&p=7304636