Thursday, May 15, 2025

Answer: How good are those AI summaries anyway?

 Summaries are THE current AI hotness... 

P/C [slide showing someone using AI for summarization] Gemini (May 1, 2025)

... you see the promotions for summarizing anything and everything with an LLM.  Various AI companies claim to be able to summarize meetings, emails, long boring white papers, financial statements--etc etc etc. 

I have my concerns about the onslaught of excessive summarization, but I'll save that for another day.    

This week we asked a very specific question: 

1. How has your experience of using AI for summarization worked out?  

An obvious first question:  What does it mean to summarize something?  Is it just making the text shorter, or does "summarizing" imply a kind of analysis to foreground the most important parts?  

And, "is there a method of summarizing that works for every kind of content?"  

I don't have a stake in this contest: if just shortening a text works, then I'm all for it.  But I kind of suspect that just shortening a book (rather than rewriting it) won't make for a great summary.  For example, just shortening "Moby Dick" would lose its commentary on race, its critiques of contemporary thought, and its reflections on the nature of knowledge. You know, all the stuff you had to learn about while reading it in high school.  

Summarizing is, I suspect, a grand art, much as creating an explanation is.  When I explain what a text is "about," the explanation will vary a great deal depending on what the purpose of the explanation is and who I'm explaining it to--telling a 10-year-old about Moby Dick isn't the same as telling a 30-year-old.  Those explanations--or those summaries--will be very different. 

So when we prompt an LLM for a summary, it behooves us to provide a bit of context.  At the very least, say who the summary is for and what the point of the summary is.  A little context is your best friend: throw in a "summarize this for a busy PhD student" or an "explain this to my grandma"--it'll make a world of difference. 

To answer this Challenge, I did a bit of experimenting.  

Since I write professionally (mostly technical articles intended for publication in journals and conferences), I have a LOT of samples I can experiment with.  

For instance, I've recently been working on a paper with a colleague on the topic of "how people searched for COVID-19 information during the pandemic."  (Note that this was a 1-sentence summary of the paper. The length of a summary is another dimension to consider. Want a 1-sentence summary of War and Peace?  "It's about Russia.")  

Notice that all tech papers have an abstract, which is another kind of summary intended for the technical reader of the text.  I wrote an abstract for the paper, and thus have something completely human-generated as a North Star.

I took my paper (6100 words long) and asked several LLMs to summarize it with this prompt: 

     [ I am a PhD computer scientist. Please summarize this paper for me. ]

I asked for summaries from Gemini 2.5 Pro, ChatGPT 4o, Claude 3.7 Sonnet, Grok, Perplexity, and NotebookLM.  (Those are links to each of their summaries.)  
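(An aside for the technically inclined: if you'd like to run the same head-to-head test on your own papers, here is a minimal sketch of the harness I have in mind, written in Python.  The ask_model function is a hypothetical placeholder for however you reach each system--an API call, or just pasting the prompt into a chat window--and the filename is made up.)

    # A rough sketch of a side-by-side summary comparison.
    # ask_model() is a hypothetical stand-in for however you reach each system
    # (an API call, or just pasting the prompt into the chat window by hand).

    PROMPT = "I am a PhD computer scientist. Please summarize this paper for me."

    def ask_model(model_name: str, prompt: str, paper_text: str) -> str:
        """Placeholder: send prompt + paper to the named model and return its reply."""
        # Swap in your own API client or a manual copy/paste step here.
        return f"[{model_name} summary would go here]"

    def compare_summaries(paper_text: str, models: list[str]) -> None:
        """Print each model's summary with a word count, for side-by-side eyeballing."""
        for model in models:
            summary = ask_model(model, PROMPT, paper_text)
            print(f"--- {model} ({len(summary.split())} words) ---")
            print(summary[:400])   # just the opening, like the screenshots below

    if __name__ == "__main__":
        paper = open("covid_search_paper.txt").read()   # hypothetical filename
        compare_summaries(paper, ["Gemini 2.5 Pro", "ChatGPT 4o", "Claude 3.7 Sonnet",
                                  "Grok", "Perplexity", "NotebookLM"])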

Here are the top-left sections of each--arrayed so you can take a look at the differences between them.  (Remember you can click on the image to see it at full-resolution.)  


And... 

I took the time to read through each of the summaries, evaluating them for accuracy and for any differences between what we wrote in the paper and what the summaries said.  

The good news is that I didn't find any terrible errors--no hallucinations were obvious.  

But the difference in emphasis and quality was interesting.

The most striking thing is that different summaries put different findings at the top of their "Key Findings" lists.  If you ignore the typographic issues (too many bullets in Gemini, funky spacing and fonts for Perplexity, strange citation links for NotebookLM), you'll see that:  

1. Gemini writes a new summary that reads more like a newspaper account.  It's quite good, listing Key Findings at the top and opening with an excellent synopsis.  Of all the summaries I tested, this was by far the best, primarily for the quality of its synthesis and the clarity of the language it generated. (The summary was 629 words.) 

2. ChatGPT is more prosaic--really a shortening rather than a significant rewriting. It didn't do a summary so much as give an outline of the paper.  It was okay, but to understand a few of the sentences in the summary you need to have read the paper (which is NOT the point of a summary).  Note that ChatGPT's Key Findings are somewhat different from Gemini's.  (432 words) 

3. Claude also has different Main Findings, and brings Methodological Contributions up near the top, which Gemini and ChatGPT do not.  But it did a good job of summarizing the key findings, and wrote good prose about each. (324 words) 

4. Grok buried the Key Findings under Sources and Methods, which is a bit like hiding the cake under the vegetables, though the text itself is decent. It had 4 key findings (the others had more) and a decent, if short, discussion of what it all meant. (629 words)

5. Perplexity is similar, but gets confused when discussing the finding about Query Clusters. It was a bit sketchy on the details and gave a confused story about how clustering was done in the paper.  (I suspect it got tripped up by one of the data tables.)  (256 words) 

6. NotebookLM uses much less formatting to highlight sections of the summary, and includes a bunch of sentence level citations.  (That's what the numbered gray circles are--a pointer to the place where each of the claims originates.)  NLM spent a lot of time up-front discussing the methods and not the outcomes. (1010 words)

Overall, in this particular comparison, Gemini is the clear winner, with ChatGPT and Claude in second place.  Both Perplexity and NotebookLM seem to get lost in their summaries, kind of wandering off topic rather than being brief and to-the-point.  

This brings up a great point--when summarizing a technical article, do you (dear reader) want a structured document with section headings and bullet points?  Or do you want just a block of text to read?   

A traditional abstract is just a block of text that explains the paper.  In fact, the human-generated abstract that I wrote looks like this: 

The COVID-19 pandemic has had a dramatic effect on people’s lives, health outcomes, and their medical information-seeking behaviors.  People often turn to search engines to answer their medical questions. Understanding how people search for medical information about COVID-19 can tell us a great deal about their shifting interests and the conceptual categories of their search behavior.  We explore the public’s Google searches for COVID-19 information using both public data sources as well as a more complete data set from Google’s internal search logs.  Of interest is the way in which shifts in search terms reflect various trends of misinformation outbreaks, the beginning of public health information campaigns, and waves of COVID-19 infections.   This study aims to describe online behavior related to COVID-19 vaccine information from the beginning of the pandemic in the US (Q1 of 2020) until mid-2022. This study analyzes online search behavior in the US from searchers using Google to search for topics related to COVID-19.  We examine searches during this period after the initial identification of COVID-19 in the US, through emergency vaccine use authorizations, various misinformation eruptions, the start of public vaccination efforts, and several waves of COVID-19 infections. Google is the dominant search engine in the US accounting for approximately 89 percent of total search volume in the US as of January, 2022.  As such, search data from Google reflects the major interests of public health concerns about COVID, its treatments, and its issues.  

Interesting, eh?  Although written by a human (me!), the abstract doesn't pull out the Key Findings or Methods (although they're there) in the same way that the AIs do.  

But perhaps the structure of the LLM summaries is better than the traditional pure-text format.  When I asked the LLMs for summaries of other papers (i.e., papers NOT written by me), the "outline-y" and "bullet-point" format actually worked quite well.    

When I used Gemini on papers from a recent technical conference, I found the summaries actually quite useful.  To be clear, what I did was read the AI-generated summary as an "extended abstract," and if the paper looked interesting, I then went to read the paper in the traditional way.  (That is, slowly and carefully, with a pen in hand, marking and annotating the paper as I read.)  

A bigger surprise... 

When I scan a paper for interest, I always read the abstract (that human-generated summary), but I ALSO always look at the figures, since they often contain the heart of the paper.  Yes, it's sometimes a pain to look at big data tables or intricate graphs, but they usually tell you a lot.  The figures alone are often a great summary of the paper as well.  

This is the first figure of our paper.  The caption for Figure 1 tells you that it shows:

Google Trends data comparing searches for “covid mask” and “covid.” This shows searches for “mask” and all COVID-related terms from Jan 1, 2018 until July 7, 2022. Note the small uptick in October for Halloween masks in 2018 and 2019.  This highlights that all searches containing the word masks after March, 2020 were primarily searches for masks as a COVID-19 preventative measure.




Oddly, though, only ChatGPT was able to actually pull the figure out of the PDF file. The other systems claimed that the image data wasn't included in the file--a claim that ChatGPT's success shows to be simply wrong. I actually expected better from Gemini and Claude.   

Although I could convince ChatGPT to extract images from the PDF document, I wasn't able to get it to create a summary that included those figures.  I suspect that a really GREAT summarizer would include the summary text + a key figure or two. 
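If you want to pull the figures out yourself, it's easy with standard tools.  Here's a minimal sketch in Python using the PyMuPDF library (my choice of library, not anything the AI systems use) that extracts every embedded image from a paper:

    # Minimal sketch: extract the embedded images from a PDF with PyMuPDF
    # (imported as "fitz").  Assumes the paper is saved as "paper.pdf".
    import fitz  # pip install pymupdf

    doc = fitz.open("paper.pdf")                  # hypothetical filename
    for page_number, page in enumerate(doc, start=1):
        for img_index, img in enumerate(page.get_images(full=True), start=1):
            xref = img[0]                         # the image's cross-reference id
            info = doc.extract_image(xref)        # raw bytes + file extension
            filename = f"page{page_number}_fig{img_index}.{info['ext']}"
            with open(filename, "wb") as out:
                out.write(info["image"])
            print("wrote", filename)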

SRS Team Discussion 

We had a lively discussion with 19 comments from 5 human posters.  Dre gave us his AI developer's scan of the situation, suggesting that working directly with the AI models AND giving them clean, verified data was the way to go.  

Scott pointed out that transcripts of meetings are often more valuable when summarized.  He reports that AI tools can summarize transcripts quite well, but he finds that "denser" technical materials don't condense as easily.  (Scott: Try Gemini 2.5 Pro with the prompting tips from above.)  

Leigh also runs his models locally and is able to fine-tune them to get to the gist of the articles he scans; he has built his own summarizing workflow as well. 

The ever-reliable remmij poured out some AI-generated wisdom about using LLMs for summarization, including strong support for Scott's point of view, which can best be summarized as "They are Language Models, Not Knowledge Databases."  That is, hallucination is still a threat: be cautious.  (I always always always check my summaries.)  

Ramón chimed in with an attempt to summarize the blog post... and found that the summarizer he used produced a summary that was longer than the original!  Fun... but not really useful!  


SearchResearch Lessons

1. More useful than I thought! A bit to my surprise, I've found the LLM summaries of technical articles to be fairly useful. In particular, I found that Gemini (certainly the 2.5 Pro version) creates good synthetic summaries, not just shortened texts.   

2. Probably not for literature analysis as-is. If you insist on using an LLM to help summarize a text with deeper semantics, be sure to put in a description of who you are (your role, background) and what kind of summary analysis you're looking for (e.g., "a thematic analysis of...").  

3. When you're looking at technical papers, be sure to look at the figures. The AIs don't quite yet have the chops to pull this one off, but I'm sure they'll be able to do it sometime soon.  They just have to get their PDF scanning libraries in place!  

Hope you found this useful.  

Keep searching. (And double check those summaries!)  



Tuesday, May 13, 2025

Special: Why are AIs so bad at diagramming?

 I've been reading about all of the wonderful illustrations that current AI systems can make... 

Generated by Gemini 2.0 Flash (May 13, 2025) 

The prose I've been reading is frankly astonishing and heavy on praise for the remarkable things AIs can create.  And while it's true that seeing a green horse galloping across the heavens in a spacesuit, or a corgi living in a house made of sushi, is astounding, that's not the kind of thing I want or need for my everyday work.  

I often want a fairly simple explanatory diagram of how something works.  

Turns out that I wanted a neat, clean, simple diagram of the positions for each of the 9 players on an American baseball diamond.  How hard could it be?  Answer: VERY hard.  What I wanted was something like this diagram below:  

Image by Wikipedia author Cburnett.  Original.  

But I didn't want all of the measurements and extra annotations, so I thought "What a perfect application for an AI!"  

That's what I thought.  Now I realize that AI graphics creation systems are truly terrible at making diagrams of even simple things.  I gave the prompt: [create a diagram of the positions of the players on a baseball diamond]  This is Gemini 2.5 Pro: 

Gemini 2.5 Pro's diagram of a baseball diamond. (May 13, 2025) 

Seems really, really, really wrong.  There should be 9 players, not 12. I don't see a pitcher anywhere, but I AM surprised to see the "First Baenan" at second base, and the "Convergeder" catching behind the plate.  Those are both hallucinations of players in some bizarre intergalactic baseball game.  


Here's the slightly better diagram made by the earlier (and presumably less capable) Gemini 2.0 Flash. 
 

Kind of right, but the pitcher is in the wrong place, and I don't know what "Fiird Base" is.
P/C Gemini 2.0 Flash.  



But other systems get it even wronger... this is Claude's attempt: 

Claude's attempt to show the player positions on a baseball diamond.  



Wow!  That's truly awful.  But the bigger surprise is that ALL of the AI systems can't make decent diagrams.  

Here's an attempt to draw the player positions on a soccer field: 

A truly strange diagram of soccer players on the field with mysterious arrows and
odd positions like "Staupbol." (Gemini 2.0 Flash)


It gets even weirder when you ask for diagrams that are more difficult to fact-check.  An MD might be astonished by this diagram of the blood circulation system: 

Gemini's diagram of the human blood circulatory system. There's so much wrong here I don't know where to begin.  I'm glad to report that MY "Orchomatury" artery is intact.  If your circulatory system looks like this, please check in with your physician.  

Even diagrams that should have been simple proved beyond the AIs' capability.  When I asked for a timeline of the major events of the US Civil War, Gemini 2.5 just flat-out refused: "Sorry, I can't draw diagrams."  When I pointed out that it had JUST drawn a few diagrams, it gave a long argument about why a timeline is just too complicated.  (And a human circulatory system is simpler?)  

By contrast, Perplexity was happy to give me a diagram of the US Civil War: 

Perplexity's diagram of the US Civil War.  Done not long after I asked it for a baseball diagram.  

I thought to ask Midjourney the same question:  [diagram of the main events of the US Civil War] 

Midjourney's diagram of the main events of the US Civil War (May 13, 2025) The text is illegible. 

I could go on in this vein.  In fact, I DID go on in this vein, trying different prompts and different kinds of diagrams.  I'll spare you the gory details--let's just say it was an entertaining but unproductive couple of hours.  

SearchResearch Lesson 


Do NOT rely on any current AI systems to give you a decent diagram.  If you're asking about something that you don't understand deeply, you're MUCH better off doing a classical web image search.  

Keep Searching! 



Thursday, May 8, 2025

Slightly delayed...

I should have known this would happen... 


... I took the silver flying bird to Copenhagen today, instead of writing the answer to last week's SRS Challenge.  I was looking forward to spending some quality time thinking about AI summaries... but it's too big a topic to knock out in an hour.  

SO... in order to create the bespoke, hand-crafted, lovingly artisanal SRS answer that you expect, I decided to push it out a week to give me a bit of extra time.  

BUT... if you find yourself in Copenhagen tomorrow (Friday, May 9, 2025), I'll be speaking at the Pioneer Centre for Artificial Intelligence Research at 14:00.  Here's the link for you, just in case you're around.  (And if you DO make it to my talk, please stick around afterwards and let me know you're a SearchResearcher.  The passphrase will be "I verify all my searches.")  

Title:  People, AI, and online research

Abstract:  Given all of the press that LLMs have garnered, it’s worthwhile asking if they’re changing the ways people find information.  Are LLMs pulling traffic away from the search engines?  Just as importantly, how do regular people think about the quality of information they get from their favorite AI systems?  One key lesson of my research into the UX of AI systems over the past 30 years is that people don’t really understand what AI is, how it works, or what it means for them. I’ll review some successes and failures of earlier research approaches, what we should learn from these decades of practice at the boundary between human experience and the use of intelligent systems, and where AI systems will change knowledge practices in the future.

 



Keep searching!  


Thursday, May 1, 2025

SearchResearch (5/1/25): How good are those AI summaries anyway? Let us know!

 I don't know about you... 

P/C [slide showing someone using AI for summarization] Gemini (May 1, 2025)

... but while the entire world seems to be having ecstatic paroxysms about the incredible capabilities of generative AI, my experience has been a little less wonderful.  

On more than one occasion I've spent hours trying to track down a factual claim, a citation, or a reference that some AI system made, only to find out that it's bogus or purely hallucinatory.  

That's my experience.  Don't get me wrong--there's a lot of good stuff in the AI that's deployed, but I find myself constantly having to check the output of an AI to make sure that the bad stuff doesn't overwhelm the good stuff.  

For this week, I'd love to hear your stories about using AI and then having to fact-check it, only to discover that things went way off.  

Let's ask a very specific Challenge for this week: 

1. How has your experience of using AI for summarization worked out?  

I really want to hear your stories.  Let me give you one example.  

I asked Gemini 2.0 (Flash) to [summarize this article and give citations to the claims that you make in the summary].  

It seemed to do a good job, but one of the citations was to an article by a famous author in a well-known publication, on a date that was plausible, with a title that is very consistent with everything he's written over the past few years.  

But try as I might, I could NOT find that damn article.  I eventually went to the journal's searchable archive website and found that no article by that name was ever published.  The whole thing wasted a full hour of my time.  

I definitely do NOT want to be in the situation of the lawyers for Mike Lindell, who submitted legal briefs with LLM-hallucinated citations.  

So I want to hear your stories about using AI to summarize other texts.  How's that working out for you?  

Share your AI summarization stories in the comments below so we can all learn from your work.

What's your critical and careful analysis of the quality of AI as a text summarizing tool? 

True stories only, please.  And definitely nothing written by AIs.  

I'll summarize your stories--lovingly, by my human eyes, hands, and brains--and let you know what the SRS crew has to say about this.  


Keep searching.  


Monday, April 28, 2025

A lemon in lemon - update

 In our previous SRS episode.... 

P/C bungaboi89, from his post on r/mildlyinteresting

... we discussed the origins of the "lemon in lemon" and when it was first written about in the 18th century.  

I found this great pic on Reddit and thought I'd share it with you here.  Bungaboi89 gave me permission to use this excellent image of exactly what we were talking about.  

And I thought you might enjoy this.  I think I need a glass of lemonade (a double, please) around now. 

Keep searching.  




Thursday, April 24, 2025

Answer: What's a lemon in a lemon called?

 A lemon in a lemon? 

Excerpt  of still life by Jan Davidsz de Heem, "Breakfast with Wine Glass and Goudse Pipe"
P/C source Wikimedia.


Curiously, I've actually found a perfectly formed lemon (peel and all) growing inside of another lemon.  That was weird, so I looked around a bit and discovered a world of remarkable things.   

Obviously, this led to this week's SearchResearch Challenge:  

1.  What do you call this strange lemon inside of another lemon? 

I'll start by telling you what doesn't work: almost any query with just "lemon" in it.  There are so many articles about lemons--especially recipes--that the term "lemon" is overwhelmed by other content. 

BUT, [lemon inside lemon] does lead you to a fun Reddit post about exactly this (in the subreddit /mildlyinteresting).  That post has a perfect pic of the lemon-in-lemon, exactly like the one I saw.   

Otherwise, this kind of search doesn't work well.  Or at least, there's not a lot of information out there that's about "lemons in lemons."  So we have to shift our strategy.  

I realized this only after trying a lot of variations (e.g., [twin of lemon] [double lemon] [lemon growing inside of lemon]). Despite my best cleverness, all I found were reports of lemon seeds sprouting inside of a lemon fruit.  I've seen a lot of those (I grew up in Los Angeles, where every other home has a lemon tree), but it's not quite the same thing.  They look like this; perhaps you've seen this as well:


That's not it, though... We're looking for a fully-formed lemon fruit on the inside of the lemon.  

Then I saw the comment on this blog by Harry8Dresden who said that he just copied much of the Challenge into ChatGPT 4o.  I tried this with ChatGPT 4o, Claude 3.7, and Gemini 2.0 Flash. (If you try other AIs, let us know in the comments.)  I just copy/pasted this much of the Challenge statement: 


[ This happened to me once... I found a fully formed, perfectly intact lemon completely enclosed within an outer lemon shell.  Silly me, I didn't take a picture, but imagine a double-skinned lemon and you'll have the right idea. 

So, as you'd imagine, I did a little bit of research and found that there is a very specific name for this kind of strange double lemon fruit AND learned that it was well known in the 18th century!   Obviously, this has to lead to a SearchResearch Challenge:  1.  What do you call this strange lemon inside of another lemon? ]


Here, in summary, is what I got back from each:  

ChatGPT: calls this lemon-in-lemon an "inclusion" and points to the French naturalist Antoine-Joseph Dezallier d'Argenville, who, it says, included detailed descriptions and illustrations of citrus fruits exhibiting unusual growth patterns in his 18th-century botanical works.  

That may be, but the cited work actually does NOT contain any of d'Argenville's citrus writings!  Hmm.  d'Argenville wrote extensively about gardens, but NOT about citrus, nor about enclosed fruits.  Correction: he had one brief mention about the origin of Bergamot oranges, but nothing about the oddities of lemon growth.

(More interesting and odder:  I actually ran this query on ChatGPT twice by accident, and got different answers each time.  The first time ChatGPT called it a "supernumerary fruit," which is close, but not quite right. A "supernumerary" is just an extra fruit in an unusual place--not an enclosed fruit.)  In both cases, ChatGPT didn't help much.  


Claude: Says...well... funny you should ask.  Claude initially said that this was "endocarp polyembryony" but then changed its mind to say that this unusual growth is called a "citrus within citrus" or, more specifically, an "endocarp proliferation." (Good thing Claude changed its mind: "polyembryony" refers to multiple embryos developing within a single seed--NOT what we're looking for.)

Except... "citrus within citrus" usually refers to a particular type of citrus being (taxonomically) a child of another citrus variety--example: the Meyer lemon results from a cross of a citron and a pomelo.  Claude interprets this phrase incorrectly. 

And "endocarp proliferation" is the increase in endocarp tissue, the innermost layer of the fruit wall, particularly as it develops and matures.

Claude went on to say that: 

In historical botanical literature, particularly from the 18th century, this was referred to as "Fructus in fructo" (fruit within fruit) or "proliferous citrus."

Meanwhile, "fructus in fructo" translates to "fruit within fruit" and is used to describe the usufruct legal concept, which is a right to enjoy the benefits of property owned by another. It specifically refers to the right to collect the "fruit" (profits) from a property. In this context, "fructus" refers to the product of the property, like crops or animals, or the income generated from it. It has nothing to do with one "fructus" being contained within another.  

Finally, "proliferous citrus" refers to additional fruit that forms on the side of the base fruit.  Many citrus do this, but it's not inside the original fruit--it's not what we're looking for.  

While I learned a lot from Claude, it was mostly off-topic learning.  *I* appreciate the excursions, but most people probably won't.  Not a great reply.  


Gemini: Tells us that an 18th century text "...describes citrum in citro, noting it as a "strange monstrosity" where "one lemon [grows] from the centre of another." 

This Latin phrase, translating to "citrus in citrus" or, more specifically in this context, "lemon in lemon," provides a historical designation for the observed anomaly. 

Gemini also tells us that: 

"In modern botanical terminology, the phenomenon described by the user aligns closely with the concept of internal proliferation or carpelloid formation. Several of the provided research snippets discuss this occurrence in other fruits, particularly bell peppers." 

A quick regular Google search confirms that these terms (internal proliferation and carpelloid formation) actually DO refer to  the development of an abnormal ovule inside a fruit, resulting in a miniature version of the parent fruit rather than seeds. This phenomenon often occurs in peppers (Capsicum annuum). 

P/C Wikimedia. A pepper inside of a pepper, just like the lemons.

Gemini goes on to tell us that: 

"Johann Christoph Volkamer's comprehensive work, Nürnbergische Hesperides, published in the early 18th century, aimed to document the various citrus fruits known at the time, including their cultivation and characteristics."  

So I looked up this book and found that you can download it from Heidelberg University. 

The book bills itself as "Nürnbergische Hesperides, Oder Gründliche Beschreibung Der Edlen Citronat- Citronen- und Pomeranzen-Früchte: Wie solche in selbiger und benachbarten Gegend/ recht mögen eingesetzt/ gewartet/ erhalten und fortgebracht werden" (Or..."Nuremberg Hesperides, or a thorough description of the noble citron, lemon, and bitter orange fruits: how such in the same and neighboring area/ may be properly used, maintained, preserved, and transported.")  

 If you read through the book, you'll eventually find this page: 

Page 173 of Nürnbergische Hesperides. P/C Google Books

Which has this accompanying figure: 


The figure is not 100% clear, but the text is very straightforward. If you use Google Translate (in camera mode) to translate the text, you'll find: 

"When I cut it open, I found two small fruits inside, which had grown a little at the top, but could still be lifted out, and were surrounded by a yellow shell, but inside completely white and thick, without a single mark. When I cut off a space among similar small fruits, and for the third time these small fruits, I found hidden inside reach."  

Now we know: the lemon-in-lemon effect has been known about for a long time, and it's got a very specific modern botanical term: internal proliferation or carpelloid formation.  In the 18th century, it was known as citrum in citro.  (Yes, I did a few more confirmation searches and found other books that used this phrase.)  

2. What famous 18th century explorer knew about this strange lemon? 

Finding this was a bit tricky.  For many historical searches like this I turn to Google Books, limiting my search to books published in the 18th century.  Choose the tools option and limit your search like this: 


And then, the guessing game was on:  How would an 18th century explorer talk about something like this?  (And would it be in English, Spanish, Portuguese, or Chinese? There were a LOT of explorers at the time.) 

I started with queries in English, hoping against hope that I'd find something, maybe something in translation.  

As before, I tried a lot of queries:  [lemon in lemon] [ lemon within lemon] [lemon surrounded by lemon] [citrum in citro] etc etc.  

On the 7th or 8th attempt, I struck gold with [lemon enclosed within lemon] 


The first hit ("New Royal Geographical Magazine") contains the second document, which is Volume 1 of Captain James Cook's Voyage to the Pacific Ocean.  On page 14 Cook writes about a stopover on Tenerife with a few botanical notes: 


... they called it an "...impregnated lemon. It is a perfect and distinct lemon enclosed within another and differing from the outer only in being a little more globular." 

So we find that Captain James Cook wrote about the citrum in citro. 

3. Where did he find these odd botanical mutants?  

Reading through the text, we find that Cook wrote about this odd lemon while his ship was on a pause in the Canary Islands in 1776 near the beginning of his Third Voyage.  This was the Voyage that ended with him being killed in Kealakekua Bay in 1779.

Later, William Bligh (famously of the mutiny) also took his ship, the Bounty, to the Canary Islands in 1788 at the beginning of his shortened round-the-world trip--the trip that ended when the sailors cast him off the ship into South Pacific waters in April 1789.  He also used the same term ("impregnated lemon") to describe the odd fruit.  

You might wonder how Cook could write this book when it was published AFTER his death.  Answer: one of the men who served under Cook on board HMS Resolution, James King, took over as captain on Cook's death, and wrote up the voyages after he returned to England (based on both his and Cook's extensive notes).  


SearchResearch Lessons 

1. Using an AI to generate ideas and leads is a great idea!  But check everything.  As we saw in this example, LLMs don't always speak truthfully or with attention to detail.  ChatGPT and Claude both were seriously wrong.  Gemini did better this time, but always / always / always check what you find.  Use another AI if you have to, but in the end, validate with grounded texts (that are usually easier to find with regular search). 

2. Searches can take time... and reward iteration.  When looking for the 18th century commentary on "lemon in lemon," I had to try a LOT of variations on that query.  With persistence, I was able to find the Captain Cook quote--but I admit that I got fairly lucky.  This is one of those cases when your personal knowledge (in this case, about the way an 18th century sea captain might write) can be incredibly handy.  Arr! 

3. Remember Google Books is a great repository for historical searches.  Especially since you can limit the search to a particular period.  

Keep searching! 

Thursday, April 17, 2025

SearchResearch Challenge (4/17/25): What's a lemon in a lemon called?

 Remarkable things demand attention... 

Excerpt  of still life by Jan Davidsz de Heem, "Breakfast with Wine Glass and Goudse Pipe"
P/C source Wikimedia.


This happened to me once... I found a fully formed, perfectly intact lemon completely enclosed within an outer lemon shell.  Silly me, I didn't take a picture, but imagine a double-skinned lemon and you'll have the right idea.  

So, as you'd imagine, I did a little bit of research and found that there is a very specific name for this kind of strange double lemon fruit AND learned that it was well known in the 18th century!  

Obviously, this has to lead to a SearchResearch Challenge:  

1.  What do you call this strange lemon inside of another lemon? 

2. What famous 18th century explorer knew about this strange lemon? 

3. Where did he find these odd botanical mutants?  

Once you figure it out, let us know the answer AND how you found it!  What clever SRS techniques did you use to arrive at the answer? 

Keep searching! 


Sunday, April 13, 2025

Follow-up: Can you extract and summarize a blog?

 In a moment of curiosity, 



I tried the same task as in our previous post (retrieve the last 10 blog posts from SearchResearch) with Grok and Claude... just for comparison purposes.  

I did the same query, but the results weren't much better.  

Here's the view of the sheet for Claude: 

Claude's results: Red shading means the answer is totally wrong (or a 404 error);
yellow shading means it's around 50% right; green means 100% correct.  This actually isn't bad.


For contrast, here are Grok's answers: 

Grok's results.  Truly terrible.


As we've discussed, it's a good practice to iterate when you search, and the same is true when using LLMs.  

I gave both systems a second try, after giving them both the additional prompt of [be sure to give accurate links to the blog posts.  give only high quality summaries of the pages you find.]  

Both systems said that they would do better.  Bemusingly, Grok said:  "I apologize for the oversight in providing links that may not lead to valid pages. I’ve rechecked each URL by attempting to access them and verifying whether they resolve to actual, relevant blog posts on searchresearch1.blogspot.com."  

But here are the results of the second attempt, Claude first: 

Claude second attempt: About the same (9/10 correct), just a different error.


Despite protestations of "rechecking each URL,"  Grok actually performed worse, getting a solid 100% of the links wrong.  

Grok fails in a spectacular way. Nothing is correct.  


I don't know about you, but I'm worried about the future of Agents when the major LLM providers can't get a simple request correct.  

The irony, of course, is that checking for valid URLs is really simple.  But the AI systems don't do it.  
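For the record, here's roughly how little code it takes.  This is a minimal sketch in Python using the requests library; a HEAD request (falling back to GET for servers that reject HEAD) is enough to tell a live post from a 404:

    # Minimal sketch: check whether each link an LLM hands you actually resolves.
    # Anything that isn't a 200 deserves a closer look.
    import requests

    def check_urls(urls):
        for url in urls:
            try:
                r = requests.head(url, allow_redirects=True, timeout=10)
                if r.status_code == 405:          # some servers reject HEAD requests
                    r = requests.get(url, allow_redirects=True, timeout=10)
                status = r.status_code
            except requests.RequestException as err:
                status = f"error: {err}"
            print(status, url)

    check_urls([
        "https://searchresearch1.blogspot.com/",  # a known-good link, for comparison
        # ...paste the LLM-supplied links here...
    ])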

SearchResearch Lessons 

1. Be very, very, very cautious about trusting LLM output.  Don't trust, but validate.  While LLMs CAN do a lot of great things, they can also make monumental errors.  


But have faith, and keep on searching the way you've learned.