Let's return to the tea...
As you recall from our previous episodes, we found a way to use AI to scan an image for the names of different teas, OR to Control-F for the titles of books on a physical bookshelf.
There's been a bunch of chatter about this working (or not) in the blogosphere, so I thought I'd re-test my methods.
So I did. Guess what I found? Yeah. It kinda/sorta works. There are things you can do to improve your use case, but it's never 100% reliable. Still, for lots of uses, that's pretty good.
Here's today's results and insights.
Here's my image from today's trial--a nice high-resolution image of a random bookshelf in my house. (Again, don't judge this collection of texts!)
This time, I zoomed in a bit to make sure that everything was visible to the maximum extent possible.
Gemini: This is what I asked Gemini to do...
[scan this image and make a list of all the book titles and authors you see here. Put them into a spreadsheet format, and put "Can't read" for the book spines you can't read clearly]
We learned last time that asking for the results in spreadsheet format always returned more (and better) results than just asking for a list of the books.
Note that I ALSO told it to insert a marker ("Can't read") for anything that it was uncertain about.
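If you'd rather script this trick than type the prompt into a chat window, here's a minimal Python sketch using Google's google-generativeai SDK. Treat it as a sketch, not gospel: the model name, file name, and API-key handling below are placeholders you'd swap for your own.

```python
# Minimal sketch: send the same bookshelf prompt plus a photo to Gemini.
# Requires: pip install google-generativeai pillow
# The model name and file name are placeholders--use whatever you have access to.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")   # your Gemini API key

prompt = (
    "scan this image and make a list of all the book titles and authors "
    "you see here. Put them into a spreadsheet format, and put \"Can't read\" "
    "for the book spines you can't read clearly"
)

model = genai.GenerativeModel("gemini-1.5-flash")   # any vision-capable Gemini model
bookshelf = Image.open("bookshelf.jpg")             # your high-resolution photo

response = model.generate_content([prompt, bookshelf])
print(response.text)                                # the "spreadsheet" comes back as text
```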
A couple of strange things happened.
First, it did a pretty good job--BUT it hallucinated a couple of book titles and totally missed a few titles. Here's the bottom part of the spreadsheet it made with my notes on the side... (the top part is totally fine)...
It's MOSTLY okay (kinda, sorta), but Gemini missed all 3 C.S. Lewis books ("Chronicles of Narnia," "Prince Caspian," and "The Lion, the Witch, and the Wardrobe") and it skipped "Macbeth." It also hallucinated the Dostoevsky entry (I think it misread C.S. Lewis as Dostoevsky, which is a pretty big miss) and invented a book, "The Lost Treasure of Beowulf," which sounds great, but isn't a thing.
Dang.
On the other hand, it did correctly place a "Can't read" where the blue-tape-bound book is.
Gemini: Overall, 4 misses and 2 hallucinations out of 35 books (one of which is unreadable). Roughly 90% accuracy.
ChatGPT: Interestingly, when I did the same thing with ChatGPT, I got similar, but slightly different errors:
In a couple of places it dramatically shortened the title. In the example above, ChatGPT shortened "The Lion, the Witch, and the Wardrobe" to just "Wardrobe." (It also shortened "The Oldest Living Confederate Widow Tells All" to just "Tells All," which is a bit much.)
But overall, a better accuracy rate than Gemini--on this image.
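(If you want to run the ChatGPT side of the comparison as a script, here's the comparable sketch with OpenAI's Python SDK--again, the model and file names are placeholders, and it assumes your API key is set in the environment.)

```python
# Minimal sketch: the same prompt and photo, sent through OpenAI's Python SDK.
# Requires: pip install openai  (and OPENAI_API_KEY set in your environment)
import base64
from openai import OpenAI

client = OpenAI()

with open("bookshelf.jpg", "rb") as f:                       # your photo
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = (
    "scan this image and make a list of all the book titles and authors "
    "you see here. Put them into a spreadsheet format, and put \"Can't read\" "
    "for the book spines you can't read clearly"
)

response = client.chat.completions.create(
    model="gpt-4o",                                          # any vision-capable model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```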
I played around for a few hours trying to get any of the systems to 100%, but I wasn't able to get much better than 95%... which is... kinda-sorta mostly right.
As with most AI things, you still have to double-check.
But there are things you can do to improve the accuracy.
1. High resolution images improve your results. I've seen plenty of OCR attempts fail because the resolution wasn't high enough. Heuristic: If you can't read it when you zoom way in on the text, it's a sure bet that the OCR system can't read it either.
2. Ask your AI to tell you when it can't read something. That's the "Can't read" trick from above. That's a much better way to fail than just silently NOT giving you a result.
3. Ask for a spreadsheet--it's much better than just asking for the list of books. It's unclear why this should be so, but the phrase "in the form of a spreadsheet" seems to make the AI work harder and more accurately.
4. But Control-F with a camera works quite well... kinda/sorta. Here's a real example from when I was looking for a particular book in my stacks. As you can see, it found the book when *I* could not. (In retrospect this seems obvious, but when you're actually looking for the book, a kind of blindness takes over...) There's a small sketch of how you might do that kind of fuzzy lookup on the AI's output just after this list.
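Here's what that fuzzy lookup might look like--a small Python sketch using only the standard library. It assumes you've pasted the model's spreadsheet-style output into a string (the sample listing below is just illustrative, and the real format varies from run to run), then uses difflib to do an approximate "Control-F" over the titles.

```python
# "Control-F with a camera," roughly: fuzzy-search the AI's transcription
# for the title you're hunting. The shelf_listing text is illustrative;
# paste in whatever spreadsheet-style output your AI actually returned.
import csv
import difflib
import io

shelf_listing = """Title,Author
The Lion the Witch and the Wardrobe,C.S. Lewis
Prince Caspian,C.S. Lewis
Macbeth,William Shakespeare
Can't read,Can't read
"""

# Pull out the title column, skipping the header and any "Can't read" markers.
titles = [
    row[0]
    for row in csv.reader(io.StringIO(shelf_listing))
    if row and row[0] not in ("Title", "Can't read")
]

# Fuzzy match, because the transcription may not exactly match your query
# (punctuation, small misspellings, slightly shortened titles).
looking_for = "The Lion, the Witch, and the Wardrobe"
matches = difflib.get_close_matches(looking_for, titles, n=3, cutoff=0.5)
print(matches or "Not on this shelf (or the OCR missed it--double-check!)")
```

The loose cutoff is deliberate: it lets a slightly mangled transcription still surface as a candidate, which you then verify by eye.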
Bottom line for much AI work (especially OCR)--it kinda/sorta works, but you still need to validate what it's telling you.
And that seems to be the way it is at the end of 2025.
Until next year, keep searching! (And check, check, check...)