What do you do with too much tea?
[a massive amount of tea pours from a large flowery cup] P/C Gemini
Answer: Spill it, obviously.
I was in the grocery store just before closing, searching for a particular kind of decaf tea that my daughter wanted. I had the thought that I'd try the "Control-F for reality" idea that I wrote about in my previous post.
I took a couple of photos of the tea shelf. Here's what I got, a very full grocery rack of tea.
Since I wanted a decaffeinated tea, I prompted Gemini with:
[which of these teas are decaffeinated?]
Response:
Which is pretty reasonable--it not only tells me which are decaf, but even where they are on the shelf ("located on the far left of the shelf"). Nice.
It EVEN found caffeine-free options that are not explicitly labeled as such (e.g., "chamomile citrus" or "turmeric ginger").
I was thinking this was a smash success.
But you know me--I have to double check these things before I believe them. Pro tip: You should always check too.
That got me to thinking--what would the other AIs do with this simple task? So I tried the same task with ChatGPT, Perplexity, and Claude. Strangely, the results vary widely.
Gemini: found 8 decaf teas
ChatGPT: found 4 decaf teas
Perplexity: found 3 decaf teas (but it warns that "...they should be assumed caffeinated unless their individual packages elsewhere state “decaf” or “caffeine free.”")
Claude: found 1 decaf, 4 caffeine free
So... which is it? Why so much variation?
To compare the different AIs, I thought I'd put all of the results into a spreadsheet so I could see them all side-by-side.
My next prompt was to:
[please make a list of each of the teas on this shelf. Each line of the list should show the name of the company, the name of the tea, and if it's caffeinated or not. Please create a Google sheet with this data.]
It gave me good data, but would NOT put it into a Sheet. (How odd is that? But see below for more info on this...) But it DID give me a CSV block of text with what I was looking for--easy to copy/paste into a new sheet.
Notice that Column C ("Caffeine Status") lists some teas as Caffeine-Free and others as Decaffeinated. I finally noticed that "Decaffeinated" teas have had the caffeine removed, while "Caffeine-Free" teas never had caffeine in the first place; they're all herbal.
BUT... In this spreadsheet, Gemini claims there are 38 different teas, 14 of which are decaf. Interesting! Seconds before, when I asked directly ("which of these teas are decaffeinated?") it only gave me 4 decaf and 4 caffeine-free.
That's pretty funky.
If you ask the question one way you get an answer of 8, but when you ask for the details, you get 14. What's going on here? How did it find an additional 6 decaf teas (10, if you count only the 4 it had strictly labeled "decaf")? Strangely, asking for the teas in CSV form, listed by company and caffeine status, and dropping that into a spreadsheet gives a very different answer than asking directly.
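If you'd rather not count rows by eye, here is a minimal Python sketch that tallies the "Caffeine Status" column from a pasted CSV block. The rows, the "Example Tea Co" company, and the "Company" and "Tea Name" headers are made up for illustration; they are not the actual output from Gemini or any other AI.

```python
import csv
import io
from collections import Counter

# Illustrative rows only. This is NOT the real AI output; the company
# "Example Tea Co" and the first two header names are assumptions.
SAMPLE_CSV = """Company,Tea Name,Caffeine Status
Example Tea Co,Chamomile Citrus,Caffeine-Free
Example Tea Co,Turmeric Ginger,Caffeine-Free
Example Tea Co,Decaf Earl Grey,Decaffeinated
Mighty Leaf,Organic Breakfast,Caffeinated
"""


def count_by_status(csv_text):
    """Return a Counter mapping each caffeine status (lowercased) to its row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Caffeine Status"].strip().lower() for row in reader)


counts = count_by_status(SAMPLE_CSV)

# "Decaffeinated" and "Caffeine-Free" are different things, so keep them
# separate and only add them up when you want "anything without caffeine."
no_caffeine = counts["decaffeinated"] + counts["caffeine-free"]

print("Total teas listed:", sum(counts.values()))
print("By status:", dict(counts))
print("Without caffeine (decaf + caffeine-free):", no_caffeine)
```

Keeping the two categories separate makes it obvious whether a number like "8" means 8 decaf teas or 4 decaf plus 4 caffeine-free.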
So now I thought I'd get the other AIs' answers in a spreadsheet as well.
Here's ChatGPT's sheet:
Notice any differences between the two sheets of Gemini and ChatGPT?
First off, Gemini lists "Mighty Leaf Organic Breakfast" as one of the teas, but ChatGPT misses it. (There are more diffs.)
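Spotting those diffs by eye gets tedious, so here is a rough Python sketch of one way to compare two CSV lists. The sample rows are invented for illustration (only the Mighty Leaf Organic Breakfast example comes from the comparison above), and the "Company" and "Tea Name" headers are assumptions about how the CSV is laid out.

```python
import csv
import io


def tea_keys(csv_text):
    """Return a set of (company, tea name) pairs, lowercased for comparison."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["Company"].strip().lower(), row["Tea Name"].strip().lower())
        for row in reader
    }


# Illustrative rows only, not the full lists either AI produced.
GEMINI_SAMPLE = """Company,Tea Name,Caffeine Status
Mighty Leaf,Organic Breakfast,Caffeinated
Example Tea Co,Chamomile Citrus,Caffeine-Free
"""

CHATGPT_SAMPLE = """Company,Tea Name,Caffeine Status
Example Tea Co,Chamomile Citrus,Caffeine-Free
"""

gemini = tea_keys(GEMINI_SAMPLE)
chatgpt = tea_keys(CHATGPT_SAMPLE)

only_in_gemini = sorted(gemini - chatgpt)   # teas ChatGPT missed
only_in_chatgpt = sorted(chatgpt - gemini)  # teas Gemini missed
in_both = sorted(gemini & chatgpt)

print("Only in Gemini:", only_in_gemini)
print("Only in ChatGPT:", only_in_chatgpt)
print("In both:", in_both)
```

One caution: this matches names exactly (after lowercasing), so if one AI writes "Organic Breakfast" and another writes "Organic Breakfast Tea," they will show up as two different teas. You still have to eyeball the leftovers.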
Comparing the differences in spreadsheets created by each:
That's a very weird result. If you ASK an AI how many teas there are, you get one answer BUT if you ask it to create a spreadsheet, it gives you a much larger number!
EVEN STRANGER... after not working on this blog post for several days, I went back to Gemini, re-uploaded the image and re-asked all of the questions above--including "create a spreadsheet." Voila! Today it knows how to create a Google Sheet. Even better (and weirder), this time it found 58 teas, 18 of which are decaf. That's 20 more teas than last time!
Key insight #1: So... your answer varies from AI to AI AND it varies if you ask it directly ("which teas are decaf?") vs. asking for a CSV list to drop into a spreadsheet. Again, all the results are VERY different.
And--no surprise--there are some errors here. None of the AIs found the Blue Lotus Chai or the Builder's Tea (second shelf from the bottom). If you were doing Control-F for "Blue Lotus Chai," you'd be out of luck.
ALL of this was an odd result, so I went back and took a higher-resolution image of the tea shelves and found that it COULD see the Blue Lotus Chai and Builder's "People's Tea."
Key insight #2: You need to have fairly high-resolution images to get decent results. EVEN SO... you'll get variable results depending on how you ask the question ("just ask" versus "give me a spreadsheet"). In my tests, asking for a spreadsheet gave the more complete answer every time.
Key insight #3: Most of the AIs won't tell you that they're having problems scanning the image for labels. (To their credit, Perplexity and Grokker told me that they "Cannot reliably read and extract every tea name and company from the photo.") But, significantly, neither Gemini nor ChatGPT ever said anything about not having enough resolution to be confident in their results.
And that tells you something: It seems clear that the image analysis in each of the AIs has some internal confidence measure, and they won't show results when the confidence is too low. That makes sense. But to not say anything about uncertainty is just malpractice. At the very least, the AI should say something like "I'm not really sure about these results..."
What to do? I asked both Gemini and ChatGPT a simple validation question:
[were you able to capture all of the teas in the image?]
In both cases the AI was able to look a little bit more carefully. Gemini found the Blue Lotus chai (and a few others that it had missed the first time around)! ChatGPT told me that "I captured most of the front-facing teas, but a few items weren’t fully captured with readable labels, and a couple of my earlier caffeine calls were overconfident because the tea type (black vs rooibos/herbal vs decaf) isn’t legible on some tins." And then it gave me a newly updated spreadsheet which listed 55 different teas.
Note that the interface looked like this...
You might think that this means it found only 14... but you would be wrong. IF you click on the download button (downward pointing arrow in a circle, upper right), you'll find that the full spreadsheet has 55 teas listed, along with the updated assessments about whether they're caffeinated or not.
Bottom line: You really have to work at this to get a good analysis of an image. The different AIs will give you very different answers... and will even work hard, if you ask them to.
SearchResearch Lessons
1. AIs are variable in quality and detail. As you see, different AIs give very different results. Your accuracy and quality will vary by which you use.
2. Beware of asking an AI to make an inference for you. The difference between "decaf" and "caffeine-free" might be subtle, but the AI doesn't know whether that distinction is obvious to you, and it won't necessarily tell you which one it's counting.
3. Ask for all of the details in a spreadsheet if you want to compare or validate the results. Notice that just asking for the tea gave us pretty poor results, while asking for all of the data in a spreadsheet format gave MUCH better results. When you're looking for details on a task like this, request all of the data.
4. Different AIs give you different answers, and will give you different answers if you ask in slightly different ways (including "think harder"). Be cautious.
Keep searching!







