Friday, January 7, 2022

Image identification is great, when it works.

 Searching by image... 

What plant is this? Can search-by-image help?

... is a fundamental skill for SearchResearchers.  You should know how to use regular Google Search-by-Image (see this for a refresher), and you should know how to use Google Lens (refresher), and you should know about Bing's Search-by-Image, Tineye, and Yandex's search-by-image tool as well.  That's five major search-by-image systems that you should just know how to use.  

The problem with all of these systems is that they do what they can with what they see, and that isn't always enough.  You shouldn't rely on them for proper identification, particularly of plants, dogs, cats, people, and the random marginalia of life.  

Here's what I mean.  

This is a photo I took of a plant growing in a nice bit of Silicon Valley landscaping.  I happen to know what this is. (It's an Arbutus unedo, which we talked about in an SRS post back in 2014.)  I'm curious to see if Google Lens will identify it correctly or not.  I know this is a difficult search because this plant looks superficially like several others.  Here, I'm interested in the plant with the red berries and not the low-growing plant below it.  

Original Photo

As you know, you can select the region of interest in the photo--basically telling the image recognition system what to pay attention to.  I first selected a random bit of leaves, branches, and a corner of one of the fruits.  This turned out not to work well at all.  

Google Lens thinks it's a Toyon bush, which is a reasonable guess, except that the berries are all the wrong size (Toyon berries are much smaller and don't have little hairs).  Luckily, I also know what a Toyon bush looks like, so I know it's wrong.  But if you didn't happen to know what a Toyon was, you might well accept this as an ID.  And you'd be wrong. 

For a second attempt, I focused the area of interest onto just the berries.  This doesn't work well either.   

Now Lens thinks it's a Lychee, which isn't a terrible guess, but up close, they look nothing alike.  This is a basic problem with identification only from an image: it's hard to get the up-close details that matter!  (For what it's worth, this is also a problem for humans trying to identify something from just an image.  The key difference is that the human will tell you "I can't be sure, given just this photo," whereas the algorithm just gives you an ID without any indication that there's a possibility of error.)  

I try again with a broader region to search, and the third time is the charm.  Here I've changed the region of interest to be the berry, a few leaves, and a couple of small branches.  And this time, it gets it right.  Well, kind of.  It calls it a "strawberry tree," which is correct but informal; you have to dig deeper to get anything truly useful for an ID. I would have preferred the answer to be more like this: 
 Arbutus unedo, an evergreen shrub or small tree in the family Ericaceae, native to the Mediterranean region and western Europe. The plant is known for its fruit, which looks a great deal like a strawberry, hence the common name "strawberry tree." However, it is not really related to strawberries at all. 

SearchResearch Lesson about Search-by-image

1. Basically, you have to be careful and check your results.  It's tempting to do a quick search and be done with it, but if you're trying to figure out if it's a potentially deadly flower (or berry, or insect bite...), you'll want to do as you always do and DOUBLE CHECK.  

Note that all of the search-by-image systems (Tineye, Yandex, Bing, Google, etc.) are sensitive to the area of interest--you will get different answers depending on which region of the image you ask about.  

None of them will tell you how certain they are.  Instead, they give you a list ranked by a mysterious relevance operation that probably has less to do with accuracy than with image similarity.  

And it should go without saying, but NEVER TRUST a search-by-image function to identify a mushroom.  Getting a complete ID of a mushroom often requires examining the gills and the spores, frequently with a microscope.  

Be aware of the limits of the tools you use--the skilled SRS Researcher knows the limits!  
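
One way to make the double-check habit concrete is to write down the top answer from each tool and each region you tried, then see whether any single answer wins a clear majority.  The sketch below is purely illustrative: the `consensus` function and its `min_share` threshold are hypothetical names of my own, the labels are typed in by hand (no search tool is being called), and agreement between tools is still not proof of a correct ID.  

```python
from collections import Counter

def consensus(labels, min_share=0.6):
    """Return the most common label if it reaches min_share of the votes, else None.

    labels: top answers copied by hand from each tool or region tried.
    min_share: hypothetical threshold; pick whatever level of agreement you trust.
    """
    # Normalize so "Toyon" and "toyon " count as the same answer.
    cleaned = [label.strip().lower() for label in labels if label.strip()]
    if not cleaned:
        return None
    label, count = Counter(cleaned).most_common(1)[0]
    return label if count / len(cleaned) >= min_share else None

# The three Lens attempts from this post, one per region of interest.
lens_attempts = ["Toyon", "Lychee", "Strawberry tree"]
print(consensus(lens_attempts))  # prints None: no majority, so keep digging
```

A `None` here is the algorithmic version of "I can't be sure, given just this photo," which is exactly the signal these tools don't give you on their own.  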

2. The guidance you give to the algorithm is really important.  Try to select an area of the image that has all of the information you would need to do an identification.  In this case, that would be leaves + fruit + branches.  Or, if you're trying to identify something non-botanical, try to find the "most representative" piece of the image, and not just a random decontextualized fragment.  
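
If you think of the region of interest as a rectangle, "leaves + fruit + branches" just means a box that covers all three.  Here's a small sketch of that idea.  The pixel coordinates and the `union_box` helper are made up for illustration; in Lens you drag the selection by hand, and this isn't a claim about how any of these tools work internally.  

```python
def union_box(boxes):
    """Smallest (left, top, right, bottom) rectangle covering every box given."""
    if not boxes:
        raise ValueError("need at least one box")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))

# Hypothetical pixel boxes for the features a person would use for an ID.
features = {
    "fruit":  (410, 220, 470, 280),
    "leaves": (300, 150, 520, 260),
    "branch": (350, 240, 560, 330),
}
print(union_box(list(features.values())))  # (300, 150, 560, 330)
```

Selecting only the "fruit" box would be the equivalent of my second attempt above: a decontextualized fragment.  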

Search on!  (Cautiously...)  


  1. Hi Dr. Russell.

    Once again, very interesting and helpful. Been there, done that. I have searched sometimes and found some possible answers.

    In some cases I was sure the answer was right. In others, I kept the results and tried to verify them another way. And sometimes, I didn't find a good answer.

    What is interesting to me is that Search by image and Google Lens often give different answers. I ran a search by image on your photo, and strawberry is one of the answers.

    That is why I try both tools as a confirmation method.

  2. Replies
    1. Fascinating! I had no idea that there was a liqueur made of the strawberry fruit. Thanks for the links. I HAVE tasted the fruit, and the taste is fine... but the fruit itself has a somewhat disagreeable mealy texture. I assume one filters all of this out leaving just the great taste. Nice find.

    2. "(in Italian)

      O verde albero italico, il tuo maggio
      è nella bruma: s'anche tutto muora,
      tu il giovanile gonfalon selvaggio
      spieghi alla bora

      — Giovanni Pascoli

      Oh green Italian tree, your May month
      is in the mist: if everything die,
      you, the youthful wild banner
      unfold to the northern wind"

      from wiki, hmmmm, the honey from the blossom is a bitter delicacy
      a Victorian crumble cake
      checked the 'talkies'…