Let's return to the tea...
As you recall from our previous episodes, we found a way to use AI to scan an image for the names of different teas, OR to Control-F for the titles of books on a physical bookshelf.
There's been a bunch of chatter about this working (or not) in the blogosphere, so I thought I'd re-test my methods.
So I did. Guess what I found? Yeah. It kinda/sorta works. There are things you can do to improve your use case, but it's never 100% reliable. Still, for lots of uses, that's pretty good.
Here's today's results and insights.
Here's my image from today's trial--a nice high-resolution image of a random bookshelf in my house. (Again, don't judge this collection of texts!)
This time, I zoomed in a bit to make sure that everything was visible to the maximum extent possible.
Gemini: This is what I asked Gemini to do...
[scan this image and make a list of all the book titles and authors you see here. Put them into a spreadsheet format, and put "Can't read" for the book spines you can't read clearly]
We learned last time that asking for the results in spreadsheet format always returned more (and better) results than just asking for a list of the books.
Note that I ALSO told it to insert a marker ("Can't read") for anything that it was uncertain about.
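If you'd rather script this trick than type the prompt into a chat window, here's a minimal Python sketch using Google's google-generativeai SDK. Treat it as a sketch, not gospel: the model name, file name, and API-key handling below are placeholders you'd swap for your own.

```python
# Minimal sketch: send the same bookshelf prompt plus a photo to Gemini.
# Requires: pip install google-generativeai pillow
# The model name and file name are placeholders--use whatever you have access to.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")   # your Gemini API key

prompt = (
    "scan this image and make a list of all the book titles and authors "
    "you see here. Put them into a spreadsheet format, and put \"Can't read\" "
    "for the book spines you can't read clearly"
)

model = genai.GenerativeModel("gemini-1.5-flash")   # any vision-capable Gemini model
bookshelf = Image.open("bookshelf.jpg")             # your high-resolution photo

response = model.generate_content([prompt, bookshelf])
print(response.text)                                # the "spreadsheet" comes back as text
```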
A couple of strange things happened.
First, it did a pretty good job--BUT it hallucinated a couple of book titles and totally missed a few titles. Here's the bottom part of the spreadsheet it made with my notes on the side... (the top part is totally fine)...
It's MOSTLY okay (kinda, sorta), but Gemini missed all 3 C.S. Lewis books ("Chronicles of Narnia," "Prince Caspian," and "The Lion, the Witch, and the Wardrobe") and it skipped "Macbeth." It also hallucinated the Dostoevsky entry (I think it misread C.S. Lewis as Dostoevsky, which is a pretty big miss) and invented a book, "The Lost Treasure of Beowulf," which sounds great, but isn't a thing.
Dang.
On the other hand, it did correctly place a "Can't read" where the blue-tape-bound book is.
Gemini: Overall, 4 misses and 2 hallucinations out of 35 books (one of which is unreadable). Roughly 90% accuracy.
ChatGPT: Interestingly, when I did the same thing with ChatGPT, I got similar, but slightly different errors:
In a couple of places it dramatically shortened the title. In the example above, ChatGPT shortened "The Lion, the Witch, and the Wardrobe" to just "Wardrobe." (It also shortened "The Oldest Living Confederate Widow Tells All" to just "Tells All," which is a bit much.)
But overall, a better accuracy rate than Gemini--on this image.
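(If you want to run the ChatGPT side of the comparison as a script, here's the comparable sketch with OpenAI's Python SDK--again, the model and file names are placeholders, and it assumes your API key is set in the environment.)

```python
# Minimal sketch: the same prompt and photo, sent through OpenAI's Python SDK.
# Requires: pip install openai  (and OPENAI_API_KEY set in your environment)
import base64
from openai import OpenAI

client = OpenAI()

with open("bookshelf.jpg", "rb") as f:                       # your photo
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = (
    "scan this image and make a list of all the book titles and authors "
    "you see here. Put them into a spreadsheet format, and put \"Can't read\" "
    "for the book spines you can't read clearly"
)

response = client.chat.completions.create(
    model="gpt-4o",                                          # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```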
I played around for a few hours trying to get any of the systems to 100%, but I wasn't able to get much better than 95%... which is... kinda-sorta mostly right.
As with most AI things, you still have to double-check.
But there are things you can do to improve the accuracy.
1. High resolution images improve your results. I've seen plenty of OCR attempts fail because the resolution wasn't high enough. Heuristic: If you can't read it when you zoom way in on the text, it's a sure bet that the OCR system can't read it either.
2. Ask your AI to tell you when it can't read something. That's the "Can't read" trick from above. That's a much better way to fail than just silently NOT giving you a result.
3. Ask for a spreadsheet--it's much better than just asking for the list of books. It's unclear why this should be so, but the phrase "in the form of a spreadsheet" seems to make the AI work harder and more accurately.
4. But Control-F with a camera works quite well... kinda/sorta. Here's a real example from when I was looking for a particular book in my stacks. As you can see, it found the book when *I* could not. (In retrospect this seems obvious, but when you're actually looking for the book, a kind of blindness takes over...) There's a small sketch of how you might do that kind of fuzzy lookup on the AI's output just after this list.
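Here's what that fuzzy lookup might look like--a small Python sketch using only the standard library. It assumes you've pasted the model's spreadsheet-style output into a string (the sample listing below is just illustrative, and the real format varies from run to run), then uses difflib to do an approximate "Control-F" over the titles.

```python
# "Control-F with a camera," roughly: fuzzy-search the AI's transcription
# for the title you're hunting. The shelf_listing text is illustrative;
# paste in whatever spreadsheet-style output your AI actually returned.
import csv
import difflib
import io

shelf_listing = """Title,Author
The Lion the Witch and the Wardrobe,C.S. Lewis
Prince Caspian,C.S. Lewis
Macbeth,William Shakespeare
Can't read,Can't read
"""

# Pull out the title column, skipping the header and any "Can't read" markers.
titles = [
    row[0]
    for row in csv.reader(io.StringIO(shelf_listing))
    if row and row[0] not in ("Title", "Can't read")
]

# Fuzzy match, because the transcription may not exactly match your query
# (punctuation, small misspellings, slightly shortened titles).
looking_for = "The Lion, the Witch, and the Wardrobe"
matches = difflib.get_close_matches(looking_for, titles, n=3, cutoff=0.5)
print(matches or "Not on this shelf (or the OCR missed it--double-check!)")
```

The loose cutoff is deliberate: it lets a slightly mangled transcription still surface as a candidate, which you then verify by eye.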
Bottom line for much AI work (especially OCR)--it kinda/sorta works, but you still need to validate what it's telling you.
And that seems to be the way it is at the end of 2025.
Until next year, keep searching! (And check, check, check...)