Wednesday, December 24, 2025

SearchResearch (12/24/25): Living in an AI world that kinda, sorta works for OCR

 Let's return to the tea... 


As you recall from our previous episodes, we found a way to use AI to scan an image looking for the names of different teas OR to Control-F for the titles of books on a physical bookshelf.  

There's been a bunch of chatter about this working (or not) in the blogosphere, so I thought I'd re-test my methods.  

So I did.  Guess what I found?  Yeah. It kinda / sorta works.  There are things you can do to improve your results, but it's never 100% reliable.  Still, for lots of uses, that's pretty good.  

Here are today's results and insights.  

Here's my image from today's trial--a nice high-resolution image of a random bookshelf in my house.  (Again, don't judge this collection of texts!)  


This time, I zoomed in a bit to make sure that everything was visible to the maximum extent possible.  

Gemini: This time I asked Gemini to...

[scan this image and make a list of all the book titles and authors you see here. Put them into a spreadsheet format, and put "Can't read" for the book spines you can't read clearly] 

We learned last time that asking for the results in a spreadsheet always ended up returning more and better results than if you just asked for the list of books.  

Note that I ALSO told it to insert a marker ("Can't read") for anything that it was uncertain about.  
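If you copy the spreadsheet reply out as CSV text, a few lines of Python can separate the spines the AI says it read from the ones it flagged.  This is just a sketch: the sample reply and the column names ("Title", "Author") are my assumptions, not what Gemini actually returned.

```python
import csv
import io

# Hypothetical reply text; the column names are an assumption for illustration.
reply = """Title,Author
Macbeth,William Shakespeare
Can't read,Can't read
Prince Caspian,C.S. Lewis
"""

def split_readable(csv_text, marker="Can't read"):
    """Return (readable_rows, unreadable_count) from a CSV reply."""
    readable, unreadable = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # A row counts as unreadable if its title holds the marker text.
        if marker.lower() in (row.get("Title") or "").lower():
            unreadable += 1
        else:
            readable.append(row)
    return readable, unreadable

readable, unreadable = split_readable(reply)
print(f"{len(readable)} readable spines, {unreadable} flagged for a manual look")
```

The flagged count tells you how many spines to go check by eye.  It says nothing about hallucinated titles, which still need a human look.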

A couple of strange things happened.  

First, it did a pretty good job--BUT it hallucinated a couple of book titles and totally missed a few titles.  Here's the bottom part of the spreadsheet it made with my notes on the side... (the top part is totally fine)... 


It's MOSTLY okay (kinda, sorta), but Gemini missed all 3 C.S. Lewis books ("Chronicles of Narnia," "Prince Caspian," and "The Lion, the Witch, and the Wardrobe"), it completely hallucinated the Dostoevsky book (I think it misread C.S. Lewis as Dostoevsky, which is a pretty big miss), it skipped "Macbeth," and it completely hallucinated a book, "The Lost Treasure of Beowulf," which sounds great, but isn't a thing. 

Dang. 

On the other hand, it did place a "Can't read" where the blue-tape-bound book is (which is correct).  

Gemini: Overall, 4 misses and 2 hallucinations out of 35 books (one of which is unreadable).  Roughly 90%.  
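For the curious, here's the arithmetic behind that rough figure, as a small sketch.  How you score it is a judgment call (my assumption here): counting only the missed books gives one number, while also counting each hallucination as an error gives a stricter one.

```python
total_books = 35      # spines in the photo, per the count above
misses = 4            # three C.S. Lewis titles plus "Macbeth"
hallucinations = 2    # the Dostoevsky entry and "The Lost Treasure of Beowulf"

# Lenient score: only books that were skipped count against the AI.
found_rate = (total_books - misses) / total_books

# Stricter score (an assumption about how to count): hallucinations are errors too.
strict_rate = (total_books - misses - hallucinations) / total_books

print(f"Counting only misses: {found_rate:.0%}")
print(f"Counting misses and hallucinations: {strict_rate:.0%}")
```

Either way, it's the same story: mostly right, but not right enough to skip the double-check.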

ChatGPT: Interestingly, when I did the same thing with ChatGPT, I got similar, but slightly different errors: 


In a couple of places it dramatically shortened the title.  In the example above, ChatGPT shortened "The Lion, the Witch, and the Wardrobe" to just "Wardrobe."  (It also shortened "The Oldest Living Confederate Widow Tells All" to just "Tells All," which is a bit much.)  

But overall, a better accuracy rate than Gemini--on this image.  

I played around for a few hours trying to get any of the systems to 100%, but I wasn't able to get much better than 95%... which is... kinda-sorta mostly right.  

As with most AI things, you still have to double-check.  

But there are things you can do to improve the accuracy.  

1. High-resolution images improve your results.  I've seen plenty of OCRs fail because the resolution wasn't high enough.  Heuristic:  If you can't read it when you zoom way in on the text, it's a sure bet that the OCR system can't read it either.  

2. Ask your AI to tell you when it can't read something.  That's the "Can't read" trick from above.  That's a much better way to fail than just silently NOT giving you a result.  

3. Ask for a spreadsheet--it's much better than just getting the list of books.  It's unclear why it should be so, but the "in the form of a spreadsheet" seems to make the AI work harder and more accurately.  

4. But Control-F with a camera works quite well... kinda/sorta.  Here's a real example when I was looking for a particular book in my stacks.  As you can see, it found the book when *I* could not.  (In retrospect this seems obvious, but when you're actually looking for the book, a kind of blindness takes over...) 
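Once you have the list of titles as text, the "Control-F" step itself can be forgiving of OCR quirks.  Here's a minimal sketch using Python's standard difflib module; the transcribed list is made up for illustration, and the helper name control_f and the 0.6 cutoff are my own choices, not anything the AIs provide.

```python
import difflib

# Hypothetical transcription; note the truncated titles, like the ones ChatGPT produced.
transcribed = ["Macbeth", "Prince Caspian", "Wardrobe", "Tells All"]

def control_f(query, titles, cutoff=0.6):
    """Return titles that overlap the query as substrings or are close in spelling."""
    q = query.lower()
    # Substring check in both directions catches shortened titles.
    hits = [t for t in titles if t.lower() in q or q in t.lower()]
    # Fuzzy check catches misspellings and small OCR slips.
    close = difflib.get_close_matches(query, titles, n=3, cutoff=cutoff)
    return hits + [t for t in close if t not in hits]

print(control_f("The Lion, the Witch, and the Wardrobe", transcribed))
print(control_f("Macbth", transcribed))
```

The substring check is what rescues a shortened title like "Wardrobe"; the fuzzy match handles typos in either the query or the transcription.  Neither helps if the AI skipped the book entirely.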




Bottom line for much AI work (especially OCR)--it kinda/sorta works, but you still need to validate what it's telling you.  

And that seems to be the way it is at the end of the year, 2025.  

Until next year, keep searching!  (And check, check, check...)  




Saturday, December 20, 2025

SearchResearch (12/19/25): An experiment in cross-posting podcasts - SearchResearch X Unanticipated Consequences

 An experiment... 




As Regular Readers know, I’m writing another book--this one is about Unanticipated Consequences of the choices we make.  Here is a link to the Substack where I post those Unanticipated thought-pieces. (You can sign up to get the regular forthcoming book updates there.)

This week I wrote a post for the Unanticipated Consequences Substack on the topic of making decisions about future events.

I ask (and answer!) Why is thinking about the future so hard?

After I wrote that post, I thought about making a podcast on the topic.  People keep telling me that would be a good idea.  

So I started narrating it, and have a recording of me doing this, but then I had an even better idea:  What kind of podcast would NotebookLM make of this?  

I poured the Substack post into NotebookLM and--voila!--in a few minutes it generated a 14-minute podcast that I thought was pretty interesting.  

Here's the link to the podcast: 

Dan's podcast about Why is thinking about the future so hard?

I'm really curious what you think about it.  Love it?  Hate it?  Let me know your reaction in the comments below.   

You might recall that I did a similar thing back in September of this year, using NotebookLM to create a video about the Prussian Coffee Sniffers. People were split on the quality of the resulting video.  As you know, the quality of the LLMs is constantly improving.  So... how is it today?   


And in the meantime, keep searching! And let us know what you think of this podcast.  


Wednesday, December 17, 2025

SearchResearch (12/17/25): Control-F for reality--when it works / when it doesn't work

What do you do with too much tea? 

[a massive amount of tea pour from a large flowery cup] 
P/C Gemini

Answer: Spill it, obviously.  

I was in the grocery store just before closing, searching for a particular kind of decaf tea that my daughter wanted.  I had the thought that I'd try the "Control-F for reality" idea that I wrote about in my previous post.  

I took a couple of photos of the tea shelf.  Here's what I got, a very full grocery rack of tea.  


I was looking for a particular kind of decaffeinated tea, so I prompted Gemini with:

      [which of these teas are decaffeinated?] 

Response: 



Which is pretty reasonable--it not only tells me which are decaf, but even where they are on the shelf ("located on the far left of the shelf").  Nice.  

It EVEN found caffeine-free options that are not explicitly labeled as such (e.g., "chamomile citrus" or "turmeric ginger").  

I was thinking this was a smash success.  

But you know me--I have to double check these things before I believe them.  Pro tip: You should always check too.  

That got me to thinking--what would the other AIs do with this simple task?  So I tried the same task with ChatGPT, Perplexity, and Claude.  Strangely, the results are very variable.  

Gemini: found 8 decaf teas 
ChatGPT: found 4 decaf teas 
Perplexity: found 3 decaf teas  (but it warns that "...they should be assumed caffeinated unless their individual packages elsewhere state “decaf” or “caffeine free.”) 
Claude: found 1 decaf, 4 caffeine free 

So... which is it?  Why so much variation?  

To compare the different AIs, I thought I'd put all of the results into a spreadsheet so I could see them all side-by-side.  

My next prompt was to: 

 [please make a list of each of the teas on this shelf. Each line of the list should show the name of the company, the name of the tea, and if it's caffeinated or not. Please create a Google sheet with this data.]  

It gave me good data, but would NOT put it into a Sheet.  (How odd is that?  But see below for more info on this...)  But it DID give me a CSV block of text with what I was looking for--easy to copy/paste into a new sheet.  


(Note the "copy this" icon on the far right of that screencap--looks like a double rectangle: clicking on that copies the CSV text so you really can go to the spreadsheet and paste it.)  Here's that spreadsheet (check out the tabs):  


Notice that Column C ("Caffeine Status") lists some teas as Caffeine-Free and others as Decaffeinated. I finally noticed that "Decaffeinated" teas have had the caffeine removed while "Caffeine-Free" teas never had caffeine in the first place.  They're all herbal and without caffeine at all.  
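If you'd rather count than eyeball, the pasted CSV can be tallied by that status column.  A minimal sketch follows; the rows and brand names are invented samples in the shape described above (company, tea, caffeine status), not the real shelf data.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the header names are assumptions based on the prompt above.
csv_text = """Company,Tea,Caffeine Status
Brand A,Earl Grey,Caffeinated
Brand A,Decaf Earl Grey,Decaffeinated
Brand B,Chamomile Citrus,Caffeine-Free
Brand B,Turmeric Ginger,Caffeine-Free
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
status_counts = Counter(row["Caffeine Status"].strip() for row in rows)

# "Decaffeinated" and "Caffeine-Free" are different things, so keep both counts
# and also report the combined "no caffeine" total.
no_caffeine = status_counts["Decaffeinated"] + status_counts["Caffeine-Free"]

print(f"{len(rows)} teas listed")
for status, count in status_counts.most_common():
    print(f"  {status}: {count}")
print(f"No-caffeine options in total: {no_caffeine}")
```

Keeping the two categories separate makes it obvious when an AI's "decaf" answer is quietly lumping them together.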

BUT... In this spreadsheet, Gemini claims there are 38 different teas, 14 of which are decaf.  Interesting! Seconds before, when I asked directly ("which of these teas are decaffeinated?") it only gave me 4 decaf and 4 caffeine-free.  

That's pretty funky.  

If you ask a question one way you get an answer of 8, but when you ask for the details, you get 14.  What's going on here?  How did it find an additional 6 decaf teas?  And, strangely, when you ask for the teas listed in a CSV form, listed by company and caffeine-status, then drop that into a spreadsheet, you get very different answers.  

So now I thought I'd get the other AIs answers in a spreadsheet as well.

Here's ChatGPT's sheet: 


Notice any differences between the two sheets of Gemini and ChatGPT? 

First off, Gemini lists "Mighty Leaf Organic Breakfast" as one of the teas, but ChatGPT misses it.  (There are more diffs.) 

Comparing the differences in spreadsheets created by each: 


That's a very weird result.  If you ASK an AI how many teas there are, you get one answer BUT if you ask it to create a spreadsheet, it gives you a much larger number!  
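To make a sheet-versus-sheet comparison like this repeatable, you can diff the two lists of tea names with plain Python sets.  This is only a sketch: the two lists are short, made-up stand-ins for the real sheets, and the normalize helper is my own assumption about what counts as "the same tea."

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(name.lower().split())

# Hypothetical excerpts; the real sheets are much longer.
gemini = ["Mighty Leaf Organic Breakfast", "Chamomile Citrus", "Turmeric Ginger"]
chatgpt = ["chamomile citrus", "Turmeric  Ginger"]

gemini_set = {normalize(n) for n in gemini}
chatgpt_set = {normalize(n) for n in chatgpt}

only_gemini = sorted(gemini_set - chatgpt_set)   # teas ChatGPT missed
only_chatgpt = sorted(chatgpt_set - gemini_set)  # teas Gemini missed
both = sorted(gemini_set & chatgpt_set)          # teas both AIs reported

print("Only in Gemini's sheet:", only_gemini)
print("Only in ChatGPT's sheet:", only_chatgpt)
print("In both:", both)
```

Anything that shows up in only one sheet is a good candidate for a second look at the photo.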

EVEN STRANGER... after not working on this blog post for several days, I went back to Gemini, re-uploaded the image and re-asked all of the questions above--including "create a spreadsheet."  Voila!  Today it knows how to create a Google Sheet.  Even better (and weirder), this time it found 58 teas, 18 of which are decaf.  That's 20 more teas than last time!  

Key insight #1:  
So... your answer varies from AI to AI AND it varies if you ask it directly ("which teas are decaf?") vs. asking for a CSV list to drop into a spreadsheet.  Again, all the results are VERY different.

And--no surprise--there are some errors here.  None of the AIs found the Blue Lotus Chai or the Builder's Tea (second shelf from the bottom).   If you were doing Control-F for "Blue Lotus Chai," you'd be out of luck.  

ALL of this was an odd result, so I went back and took a higher-resolution image of the tea shelves and found that it COULD see the Blue Lotus Chai and Builder's "People's Tea."  

Key insight #2: You need to have fairly high-resolution images to get decent results.  EVEN SO... you'll get variable results depending on how you ask the question ("just ask" versus "give me a spreadsheet").  Asking for a spreadsheet always gives a better answer.  

Key insight #3: Most of the AIs won't tell you that they're having problems scanning the image for labels.  (To their credit, Perplexity and Grokker told me that "Cannot reliably read and extract every tea name and company from the photo.")  But, significantly, both Gemini and ChatGPT never said anything about not having enough resolution to be confident in their results.  

And that tells you something: It seems clear that the image analysis done by each of the AIs has some internal confidence measure, and they won't show results when the confidence is too low.  That makes sense.  But to not say anything about uncertainty is just malpractice.  At the very least, the AI should say something like "I'm not really sure about these results..."  

What to do?  I asked both Gemini and ChatGPT a simple validation question:  

[were you able to capture all of the teas in the image?] 

In both cases the AI was able to look a little bit more carefully.  Gemini found the Blue Lotus chai (and a few others that it had missed the first time around)!  ChatGPT told me that "I captured most of the front-facing teas, but a few items weren’t fully captured with readable labels, and a couple of my earlier caffeine calls were overconfident because the tea type (black vs rooibos/herbal vs decaf) isn’t legible on some tins."  And then it gave me a newly updated spreadsheet which listed 55 different teas.  

Note that the interface looked like this... 


You might think that this means it found only 14... but you would be wrong.  IF you click on the download button (downward pointing arrow in a circle, upper right), you'll find that the full spreadsheet has 55 teas listed, along with the updated assessments about whether they're caffeinated or not.  

Bottom line: You really have to work at this to get a good analysis of an image.  The different AIs will give you very different answers... and will even work hard, if you ask them to. 

SearchResearch Lessons 


1. AIs are variable in quality and detail.  As you see, different AIs give very different results.  Your accuracy and quality will vary by which one you use. 

2. Beware of asking an AI to make an inference for you.  The difference between "decaf" and "caffeine-free" might be subtle, but the AI doesn't know that it's opaque to you. 

3. Ask for all of the details in a spreadsheet if you want to compare or validate the results.  Notice that just asking for the tea gave us pretty poor results, while asking for all of the data in a spreadsheet format gave MUCH better results.  When you're looking for details on a task like this, request all of the data. 

4. Different AIs give you different answers, and will give you different answers if you ask in slightly different ways (including "think harder"). Be cautious.  


Keep searching!