Thursday, February 6, 2025

SearchResearch (2/6/2025): SearchResearch, Search, and Deep Research

I wasn't surprised... 

"Deep Research" as imagined by Google Gemini.
I especially like the researcher writing simultaneously with both hands. 
I can't do that, can you? 

... when "Deep Research" suddenly became the thing all the cool AI kids were talking about.  It's the inevitable next step in "Search" when you live in a world full of LLMs doing their thing.  

What is Deep Research?  

You know how regular web search works. The whole premise of SearchResearch has been to figure out how to do deeper, more insightful research by using the search tools at hand.  

"Deep Research" uses an LLM to do some of that work for us.  Is it the next generation of SRS tools?  

In effect, "Deep Research" is AI-guided search + LLM analysis to make multi-step investigations. The goal is to do some of the SearchResearch heavy lifting for you. 

The key new idea of "Deep Research" is that the LLM AI uses its "reasoning model" to create a multi-step plan that it then executes to find, consolidate, and analyze information for you.  This actually works pretty well because the DR system breaks the large research task down into multiple steps, does a bunch of searches to find relevant documents, then answers the question in the context of those retrieved documents.  It's fairly clever.  
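That plan-then-execute loop can be sketched in a few lines of code. This is a minimal, purely illustrative sketch of the pipeline described above (plan the sub-questions, search for each one, then answer in the context of the retrieved documents) -- every function name and the toy corpus here are my invention, not any actual DR system's internals:

```python
# Hypothetical sketch of a "Deep Research" pipeline: plan -> search -> synthesize.
# In a real system, plan_steps and synthesize would be LLM calls and
# search would hit the live web; here they are toy stand-ins.

def plan_steps(question):
    """Stand-in for the 'reasoning model': break a question into sub-queries."""
    return [f"background on {question}",
            f"evidence about {question}",
            f"counterarguments to {question}"]

def search(query, corpus):
    """Stand-in for a web search: return documents sharing a word with the query."""
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def synthesize(question, documents):
    """Stand-in for LLM generation grounded in the retrieved documents."""
    unique = list(dict.fromkeys(documents))  # de-duplicate, preserving order
    return f"Answer to '{question}', based on {len(unique)} retrieved documents."

def deep_research(question, corpus):
    retrieved = []
    for step in plan_steps(question):       # 1. multi-step plan
        retrieved.extend(search(step, corpus))  # 2. run the searches
    return synthesize(question, retrieved)  # 3. answer in context

corpus = ["bananas grow in tropical climates",
          "stock markets fluctuate daily"]
print(deep_research("bananas", corpus))
```

The clever part is step 3: because the answer is generated in the context of the retrieved documents rather than from the model's memory alone, the output tends to stay anchored to what the searches actually found.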

You'll probably read that this is one of the first uses of "Agent AI" (aka "agentic"), but it's kinda not really that.  I think most people agree that an AI agent is when an AI system uses another system to accomplish a task.  For instance, an AI agent could log into your favorite airline reservation system and buy a flight on your behalf.  

DR "agents" run queries for you and do analyses of texts that they pull off the web.  At the moment, no DR agents can log you into a paywalled system and purchase an article for you.  THAT would be agentic, and (here's a prediction for you) it'll happen.  But what we see today are very simple pulls from resources, not full agency.  

Henk van Ess has done a nice piece of work contrasting 4 different "Deep Research" systems.  To wit: 

a. Gemini Advanced 1.5 Pro with Deep Research

b. OpenAI’s Deep Research (via ChatGPT Pro)

c. Perplexity’s Deep Research Mode 

d. You.com’s Research Feature 

As you might expect in these days of hurry-up-and-copy-what-everyone-else-is-doing, there will be more Deep Research systems in the next few weeks.  (HuggingFace just announced that they "cloned OpenAI's Deep Research system" in just 24 hours.  Reading their blurb on how they did this is interesting.  Even better: here's their Github repo if you want to play with it yourself.)  

I agree with much of Henk's analysis, but here's my take on the current crop of DR systems.  

There are 3 big issues with DR systems from my perspective.  

1. Quite often the text of the analysis includes all kinds of tangential / off-topic stuff.  If you just blast through the text of the analysis, you'll miss that it's full of things that are true, but not really relevant.  The writing looks good, but is fundamentally hollow because it doesn't understand what it's writing about.  

2. If you ask a silly research question, you'll get a silly answer.  As Henk points out in his analysis, asking a DR system about the "banana peel stock market theory" will generate plausible sounding, syntactically correct, but ultimately meaningless garbage.  

Interesting: if you do an ordinary Gemini search with his question: 

 [Is there a correlation between banana peel speckle distribution and stock market fluctuations in countries that don’t grow bananas?]

you'll get a succinct "There is no known correlation between banana peel speckle distribution and stock market fluctuations in countries that don't grow bananas."

But if you ask Gemini 1.5 Pro with Deep Research, you'll get a 2,200-word, vaguely humorous essay with data tables, a section on fungal diseases, and a list of stock market fluctuations.  Even though the summary of the long-form essay is ultimately "Well, based on the available research, the answer is a resounding 'probably not,'" Gemini DR sure takes its sweet time getting there.  

And Gemini handles this better than other DR systems, which generate profoundly idiotic (but polished!) answers. 

Deep point here: If your question doesn't make sense, neither will the answer. 
 

3. You have to be careful about handing your cognitive work off to an AI bot. It's simple to ask a question, get an answer, and learn absolutely nothing in the process.  

This is a caution we've been sounding in SRS since the very beginning, but now it's even more important.  Just because something LOOKS nice doesn't mean that it's correct. And just because you've used the tool to answer a question doesn't mean you actually grok it.    

What's more, if you haven't done the research yourself, you won't understand all of the context of the findings--you'll have none of the subtleties and associations that come with doing the work yourself.  

You do NOT want to be in the position of this guy who was able--with the help of AI--to write a great email, but obviously has no idea how to actually do the thing he proposes to do.  (Link to the Apple Intelligence video.) 


Be careful with the tools you use, amigos.  They can have a profound effect on you, even if you didn't intend for that to happen.


And keep searching--there's so much left to learn.