Thursday, July 17, 2025

Answer: So what ARE LLMs good at? What are they bad at?

When do you use one tool versus another...

... that's the basic question: "When do you use regular Google, and when do you use LLMs, for different types of research questions? How do you know when to use each?"

That was the Challenge:

1. How do you know when an LLM AI system will give a good answer to your question? How would you characterize a research question that's really good for AI versus a research question that you'd just use a "regular" search engine for?

I think what I'm looking for is a clear description of when an AI is most likely to give an accurate, high-quality answer. By contrast, I think I know how to say when I'd use a search engine, but it's harder to describe the kinds of questions that I think an AI would do poorly on.

I’d like to be able to tell my student what and when I’d use one tool over another when asking SearchResearch questions. Here’s my summary…

A. When would I use a regular search engine?

Use a search engine when you need facts, sources, and current information.

If your question is a navigational one (finding a particular web site), or is a "what," "where," or "when," then a regular search engine is what you want.

1. Navigating: When I’m navigating to a site that I know exists. (Example: [movie theatre near me] or [auto repair Palo Alto].)

2. Current events: Sometimes you want the latest information or updates on something. You’re usually better off using a search engine since they are constantly updating their index. Often the information will be crawled within a few minutes of your query. This is particularly important for current or breaking news. (Example: [brush fire near San Jose CA])

3. From a particular source: Often I’ll want information from a particular source (usually a source I know and trust). That’s when using the site: operator is incredibly useful. This is one of the true strengths of a search engine. (Example: [site:nytimes.com crypto industry] for articles about crypto from the New York Times.)

4. A particular kind of result: The search engines have specialized tools for finding images, data sets, books, travel information, news, and maps information. While you might be able to get your favorite AI to give you travel directions, I think you’d be MUCH happier with a dedicated mapping app or service like Google Maps.

Overall, there are still a LOT of cases where using a specialized tool is going to work much better than using a generic AI. You, as an expert SearchResearcher, need to know what those tools are, what they're called, and how to use them.

In other words... you still need to know stuff....

B. When would I use an LLM / AI system?

LLMs are really good at tasks that can leverage their vast text training, pulling in language and concepts from many different places and stringing them together. They are really pretty good at answering open-ended questions that require synthesis across a number of information resources.

In some ways, LLMs are good at a large number of the SearchResearch Challenges. A good deal of what we cover here in SRS is how to find information that’s scattered everywhere and pull it together into a coherent whole. That’s a large part of what my book The Joy of Search is all about. (And, incidentally, that’s why I don’t think there will be a Joy of Search Part 2. Maybe a Joy of AI Research?)

Remmij pointed out that the multimodal AIs are pretty good at describing an image, and often very good at identifying what’s in the image. (Although they’re not perfect: check for yourself.)

And, to paraphrase Henk van Ess from his post on this topic:

Use AI when you need analysis, synthesis, or help in creative thinking.

If your question is a "how," "why," or "what if," then an LLM/AI is a great way to explore or explain. AI is especially good at contextual analysis when you provide the files or information yourself, after you've vetted it.
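The rule of thumb above (what / where / when goes to a search engine; how / why / what if goes to an LLM) can be written down as a toy sketch. The function name `suggest_tool` and the first-word matching are my own illustration, not a tested classifier; real questions are messier than their first word.

```python
def suggest_tool(question):
    """Crude first-word heuristic: which tool to try first for a question."""
    words = question.lower().split()
    if not words:
        return "unclear"
    first = words[0].strip("\"'[(")
    # "what if" is exploratory, so check it before the plain "what" case.
    if first == "what" and len(words) > 1 and words[1] == "if":
        return "LLM"
    if first in {"how", "why"}:
        return "LLM"
    if first in {"what", "where", "when"}:
        return "search engine"
    return "unclear"


for q in ["Where is Rosamond Lake?",
          "Why do clock faces use IIII?",
          "What if the lake bed flooded?"]:
    print(q, "->", suggest_tool(q))
```

Anything the heuristic marks "unclear" (navigational queries, "who" questions, and so on) still needs your own judgment.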

As Regular Reader Arthur Weiss pointed out, AIs are good for “exploratory queries where there is no single or simple answer and the research may involve a multi-step processes to answer. For such questions, AI wins (backup by checks using conventional approaches).”

They’re also quite good at taking an idea and helping you flesh out some good brainstorming notions that will help you get your writing kickstarted.

The obvious caution applies here: Do NOT let LLMs do your writing for you. If you want to learn anything, you need to be engaged in the content in a deep way. Letting an AI do your writing is like outsourcing the eating of your dessert: sure, it’s more efficient, but do you get any of the direct experience yourself?

C. Categories of tasks that LLMs generally do NOT do a good job with: We already talked about how bad AIs are at drawing diagrams. What else do they have difficulty with?

1. Complex Multi-step Logical Reasoning or Novel Problem Solving: Example Question: "If there are three people, Alice, Bob, and Carol. Alice is older than Bob. Carol is younger than Bob. Who is the oldest, and who is the youngest?" (While this specific example might be simple enough for some LLMs, scaling it up to many variables or abstract relationships, or requiring true deductive reasoning they haven't seen before, quickly breaks them.)

Why they struggle: LLMs are pattern matchers. They excel at retrieving and synthesizing information from their training data. When faced with a novel problem that requires breaking it down into logical steps and applying general reasoning principles, they often fail because they don't truly "understand" the underlying logic. They can string together plausible-sounding sentences, but the actual logic is usually absent. (People are working on this, but it’s not quite yet at a believable place.)
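For contrast, this is the kind of problem that ordinary deterministic code handles reliably at any scale. Here is a minimal sketch of the Alice / Bob / Carol puzzle using Python's standard-library `graphlib` (Python 3.9+); the helper name `order_by_age` is mine, purely for illustration.

```python
from graphlib import TopologicalSorter


def order_by_age(older_than):
    """Given (older, younger) pairs, return names from oldest to youngest."""
    ts = TopologicalSorter()
    for older, younger in older_than:
        # add(node, *predecessors): the older person must come before the younger.
        ts.add(younger, older)
    return list(ts.static_order())


facts = [("Alice", "Bob"), ("Bob", "Carol")]
ranking = order_by_age(facts)
print("Oldest:", ranking[0], "Youngest:", ranking[-1])
```

If the facts contradict each other (a cycle), `static_order()` raises `graphlib.CycleError`. If the facts don't pin down a single chain, the order returned is only one of several valid ones, so "oldest" may not be uniquely determined.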

2. Providing Real-time, Up-to-the-Minute Information or Future Predictions: As mentioned, current information is NOT the AI strong suit. Example Question: "What were the winning lottery numbers for last night's Mega Millions drawing?" or "What's the latest news on the political situation in <Country X> as of an hour ago?"

Don’t make the mistake of asking an AI for what hours a store is open or when a particular concert will happen—it’s pretty easy for the AI to have out-of-date information.

One study shows that for queries about news, the LLMs can get up to 60% of the facts wrong. (And note that the different LLMs give very different answers.) [CJR article on this]

It's worth knowing this: LLMs have a "knowledge cut-off date." That is, their training data is only as current as the last time they were extensively trained, which can be many months ago. They are often not connected to the live internet in the same way a search engine is, and they cannot predict future events with accuracy. (Again, this is changing—some AIs have live access to the net. But even they’re not super-reliable. But stay tuned, this might well change.)

3. Verifying Facts or Citing Specific, Reliable Sources without Prior Instruction: As you know, LLMs can "hallucinate" information, including fake citations or statistics that sound real but aren't. They don't have an inherent mechanism to verify the factual accuracy of what they generate or to browse and retrieve specific, authenticated sources in real-time. While they can format citations if given the data, they can't reliably find and validate the source material itself without external tools (like Retrieval Augmented Generation, aka RAG). What’s more, I’ve seen a lot of AIs hallucinate citations that look plausible... but are totally wrong.

4. Tasks Requiring Fine-Grained Spatial, Physical, or Visual Understanding: Example Task/Question: "Describe how to reassemble this disassembled complex engine part (without a diagram or image input)" or "If I rotate a square 45 degrees clockwise, then flip it horizontally, what will its final orientation be relative to its original position?" Most LLMs will have a tough time with this.

Why they struggle: LLMs process text. They don't have an inherent understanding of 3D space, physical properties, or visual relationships. While they can describe these concepts if the descriptions are in their training data, they cannot perform novel spatial manipulations or truly "visualize" solutions.
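The square question above is easy to check with a few lines of coordinate arithmetic. This is a minimal sketch under my own assumptions: the square is centered at the origin, "flip horizontally" means mirroring left-to-right (x becomes -x), and the corners are unlabeled. The helper names are hypothetical.

```python
import math


def rotate_clockwise(points, degrees):
    """Rotate 2D points clockwise about the origin."""
    theta = -math.radians(degrees)  # negative angle = clockwise
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in points]


def flip_horizontal(points):
    """Mirror points left-to-right across the vertical axis."""
    return [(-x, y) for x, y in points]


def corner_set(points, digits=6):
    """Round away floating-point noise and ignore corner order."""
    return {(round(x, digits), round(y, digits)) for x, y in points}


square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
rotated = rotate_clockwise(square, 45)
final = flip_horizontal(rotated)

same_as_rotated = corner_set(final) == corner_set(rotated)
same_as_original = corner_set(final) == corner_set(square)
print(same_as_rotated, same_as_original)
```

Under those assumptions the flip changes nothing visible: the result is the same "diamond" you get from the rotation alone, still turned 45 degrees from the original. If the corners were labeled, the flip would swap the left and right labels.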

5. Delivering Highly Personalized, Empathetic, or Professional Advice in Sensitive Domains: Example Question: "I'm feeling really anxious about my job. What should I do to feel better and address my underlying stress?" or "Given my unique financial situation, how should I invest for retirement?"

Be aware that LLMs lack personal experience, consciousness, and genuine empathy. They don't understand the nuances of a person's emotional state or specific circumstances. While they can offer general advice found in their training data (e.g., "exercise helps anxiety"), they are not qualified professionals and their advice should never be taken as a substitute for human medical, legal, financial, or psychological consultation. Their responses are based on patterns, not true understanding or personal connection.

Bottom line: Basically, LLMs are cybernetic mansplainers—you have to check their work. Bear that in mind as you work through all of this.

Keep searching... and checking... and searching...

Wednesday, July 9, 2025

SearchResearch Challenge (7/9/25): So what ARE LLMs good at? What are they bad at?

A student asked me a simple question...

... and I couldn't come up with an answer that was compelling to me (although I think the student was okay with my answer).

The question was: "You said that you use regular Google for some kinds of research questions, and LLMs for other types of research questions. How do you know when to use which?"

I gave the student an answer (because that's what professors do), but I had a little vague feeling in the back of my mind that this wasn't a very good answer.

So I thought I'd ask the collective wisdom and insights of the SearchResearch team. Here's the Challenge for the week:

1. How do you know when an LLM AI system will give a good answer to your question? How would you characterize a research question that's really good for AI versus a research question that you'd just use a "regular" search engine for?

Can you help me think through this Challenge? What kinds of research questions do YOU ask your AI... and have confidence that you'll get a decent answer? (And conversely, what kinds of questions do you NOT ask your favorite AI?)

Remember that a couple of weeks ago I posted about how terrible the various AIs are at generating diagrams? Well, there's one answer about a kind of question to not ask an AI: Don't ask them to create a diagram for you.

Here's Gemini's attempt at creating a diagram of a toaster.

Yeah. I have no idea what any of those parts are aside from the crumb tray. What's a Contreue or a Frerriod?? Maybe this is the way toasters look in a far distant galaxy, but not in any country (or language) on Earth! This toaster would be a disaster in reality.

So there's one part of the answer: asking an AI to create a diagram for you is a truly terrible idea. (And under no circumstances should you ask for a diagram of something you don't really understand.)

Let us know what you discover--post your observations in the comments, and I'll summarize them (and my thoughts) about this next week.

Keep searching.

Wednesday, July 2, 2025

Answer: Mysteries in Zürich?

SearchResearch is about learning...

St. Peterskirche, Zürich

...how to answer those questions that come up and make you wonder. As mentioned, this happens a LOT while traveling, so this week's quest is to learn how to answer those kinds of popup questions.

1. In the picture of St. Peterskirche (above) there's something about the clock tower that struck my eye and made me say, "That's funny..." Does it strike your eye too? Can you find an explanation for it? (No, it's not leaning.)

The thing that caught my eye here is the Roman numerals on the clock face. It's I - II - III ... and then ... IIII.

Surprise! It's NOT IV, as I expected.

As I travel through the world, I'm always looking for things that are not what I expect. In this case, I fully expected the clock face to be I - II - III - IV

As I wandered around central Europe, I found a LOT of clock faces with the IIII numeral.

So what's up with using IIII rather than IV?

I started with the simplest query:

[ IIII on a clock face ]

And found a number of fascinating points. While I saw the IIII and assumed it was a central European thing, this query showed me that it's used on most clock/watch faces that use Roman numerals.

And it's in common use in the US as well:

Grand Central Clock

But there are notable exceptions, such as Big Ben in London, which uses the IV numeral.

If you just look at Images with the query:

[ clock face roman numerals ]

you'll see that the sample is pretty evenly split between IIII and IV.

But when I saw the IIII numeral in central Europe, because I was in an unfamiliar place, I naturally assumed it was a Swiss or central European quirk.

This is a kind of attribute substitution inference. That is, because I was in an unusual place and saw an unusual thing, I inferred that the thing was linked to the place. That is, the use of IIII was a property of my location, and not a sometimes-followed convention around Roman numerals on clocks!

I bring this up because it happens a lot, especially when searching while traveling. The temptation is to do a query like:

* [ Swiss Roman numeral clock face IIII ]

But notice that I've baked into the query the assumption that this is Swiss. But beware: If I ask an LLM a similar question:

[Why do Swiss clock towers use IIII rather than IV on their faces]

... you'll get a similar kind of biased answer. Both Gemini and ChatGPT give a version of an answer that doesn't undo the bias. ChatGPT: "Swiss clock towers use "IIII" instead of "IV" mostly for aesthetic balance and historical tradition."

A better AI answer would be to say something like "The use of IIII vs. IV is seen around the world. It is not a convention that is particular to Switzerland or central Europe." True intelligence anticipates the errors that might come up in an answer--that's what I'd do if answering your question.

But that's not how LLMs work. If you bake in the bias, you'll get the bias back in the answers.

However, the LLMs are good at coming up with explanations about WHY this might be. The summary of the LLM's output is basically:

1. Aesthetics and Visual Balance: The "IIII" provides a better visual balance on the clock face, especially when contrasted with the "VIII" (8) on the opposite side of the dial. The "IIII" also helps divide the clock face into three visually distinct sections (I-IIII, V-VIII, IX-XII), each using a similar visual pattern.
2. Historical Usage: While "IV" became common later, "IIII" was frequently used for "4" in ancient Roman times, especially on sundials. The subtractive notation (like IV for 4, or IX for 9) became more standard after the fall of the Roman Empire, but the older additive style persisted in some applications.
3. Ease of Manufacturing: For early clockmakers who were often casting or cutting out metal numerals, using "IIII" might have simplified the process. Using “IIII” for 4 and “VIIII” for 9 (that is, all additive numerals) would require the use of only three molds. The first “IIII” mold for casting numerals 1 through 4; the second “VIIII” mold to cast 5 through 9; and a third “XII” mold to cast 10 through 12. Adding “IV” and “IX” would have required additional molds and thus would have made for a more inefficient process.
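One way to poke at the manufacturing speculation is to count how many I, V, and X pieces a twelve-hour dial needs under each convention. The sketch below is my own illustration (the function names are hypothetical). It only counts symbols and says nothing about how clockmakers actually worked.

```python
from collections import Counter


def clock_numeral(hour, four="IIII", nine="IX"):
    """Roman numeral for a clock hour (1..12) under a chosen convention."""
    if not 1 <= hour <= 12:
        raise ValueError("clock hours run from 1 to 12")
    units = ["", "I", "II", "III", four, "V", "VI", "VII", "VIII", nine]
    tens = "X" if hour >= 10 else ""
    return tens + units[hour % 10]


def symbol_counts(four, nine):
    """Count the I, V, and X pieces needed for a full dial."""
    dial = "".join(clock_numeral(h, four, nine) for h in range(1, 13))
    return Counter(dial)


for label, four, nine in [("IIII and IX", "IIII", "IX"),
                          ("IV and IX", "IV", "IX"),
                          ("IIII and VIIII", "IIII", "VIIII")]:
    print(label, dict(symbol_counts(four, nine)))
```

The common clock-face style (IIII with IX) comes out to 20 I's, 4 V's, and 4 X's, all multiples of four. The textbook style (IV with IX) needs 17, 5, and 4, and the fully additive style (IIII with VIIII) needs 23, 5, and 3. Whether that tidiness ever mattered to a real clockmaker is exactly the part that lacks an authoritative source.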

It's difficult to find good authoritative citations for any of these arguments. There are LOTS of web pages with speculations, but the most reasonable one I've read is the "Ease of Manufacturing" speculation above. I can easily imagine clockmakers trying to simplify the process.

Plausible, but there's no clear authority for this story (or any others).

So we'll simply agree to keep looking for a deeper explanation, but note in passing that this is a worldwide phenomenon.

2. All around Zürich I kept running across places that had this logo (below). They always seem to be centered around a coffee shop / diner of some kind, but they seem to be much more than just another coffee shop. What are these places? Why would one go there?

A simple search for this text (Zürcher Gemeinschaftszentren, aka GZ) takes you to several pages in the city of Zürich website. The GZ are described as "... a network of 17 community centers in Zurich, Switzerland. They serve as vibrant hubs for local residents, offering a wide range of activities, workshops, cultural events, and spaces for community interaction. These centers are co-funded by the city and are open to the public, providing a welcoming environment for individuals and families."

You can get a sense of the activities at the GZs by looking at their home page: https://gz-zh.ch/ Read around in these pages to get a good sense for the kinds of things they do. (Note that you can set your Chrome browser to automatically translate to your preferred language: Look for a section related to "Languages" or "Translation" within your browser's settings or preferences.)

If you live near one, you'll find a coffee shop that's a place for you and your neighbors to hang out, a lunch place (often with free lunches!), and usually an activity space that's very busy. (The one near me had regular musical performances by good bands!)

You might want to go to meet up with people in the neighborhood or to take advantage of the free activities. It's well worth your time to stop by.

(In my case, I saw this logo outside of a couple of different buildings and had to look up what they were. A fascinating cultural meeting ground for people who live nearby. Check out their YouTube channel to see what kinds of things go on at the GZs.)

3. I was lucky enough to see one of these wee beasties flying over a clump of flowers and sipping nectar, looking for all the world like a hummingbird. But it's NOT a hummingbird! What is it? (I wish this image was mine, but none of my shots turned out. This is from Wikipedia.)

A mysterious flying critter. What is it? P/C Wikimedia

A simple Search-By-Image identifies this as a Hummingbird Hawk-moth (Macroglossum stellatarum). Wikipedia tells us that this is "...species of hawk moth found across temperate regions of Eurasia. The species is named for its similarity to hummingbirds, as they feed on the nectar of tube-shaped flowers using their long proboscis while hovering in the air; this resemblance is an example of convergent evolution."

What a spectacular moth!

In German this is called a Taubenschwänzchen. Searching for this term leads to wonderful videos. Here's one.

I note that there are very similar moths in North America, also members of the Sphingidae family, such as Hyles lineata, also known as the white-lined sphinx and sometimes called a hummingbird moth.

4. You know the quote above that's reputedly by Asimov? ("The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” (I found it!) but “That’s funny…”) I have my doubts. Can you figure out where it actually comes from?

When I'm checking a quote source like this, I ALWAYS look first at the Quote Investigator website. He does a superb job of running down the source of famous / infamous quotations. So my first query was:

[ Quote investigator The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! ]

Note that I did NOT put the quote in quote-marks. (If your quote copy is even slightly wrong it'll take you down a side-path you might not want to travel.)

And, as usual, in this case, the Quote Investigator does a great job. Check out his analysis that concludes with this paragraph:

In conclusion, the ascription to Isaac Asimov remains uncertain. He received credit in 1987 while he was still alive. The attribution appeared in the “fortune” program of the UNIX operating system. The quotation has not yet been found in the writings, interviews, or speeches of Asimov. An interesting precursor occurred in an article by Gordon Rattray Taylor in 1965.

SearchResearch Lessons

1. Pay attention to variations in the world around you. I noticed the strange variant numeral 4 on clock towers and wondered why.

2. When you search, don't bake-in your assumptions (especially about location, country, or other condition). Be very aware that your queries (or questions to LLMs) can easily lead to a bias in the answers you get back. Try to be as generic and broad in scope as needed. If you need to, you can always add in details, but start with your filters open!

3. Sometimes it really helps to know experts in a particular area: my go-to person for finding the original sources of quotes is the Quote Investigator. It pays to take notes... just sayin...

Keep searching!

Wednesday, June 25, 2025

SearchResearch Challenge (6/25/25): Mysteries in Zürich?

One of the best parts of traveling...

St. Peterskirche, Zürich

... is the chance to see the world in new ways. Every time I travel, I always see anew and come across wondrous things that rattle around in my brainpan for months afterward.

As one might say, in a turn of phrase widely attributed to Isaac Asimov,

The most exciting phrase to hear in science, the one that heralds new discoveries, is not “Eureka!” (I found it!) but “That’s funny…”

As you know, I've just returned to Silicon Valley from my sojourn in Zürich. It was a wonderful time, full of fascinating places, cultural traditions, and an immersion in a culture that is not my own, but marvelous to behold.

But, as you might expect, I noticed a few funny things that drew my attention--things that I need to share with you as SRS Challenges.

Can you figure these out? (I don't think they're that hard, but they ARE incredibly interesting.)

A mysterious flying critter. What is it? P/C Wikimedia

4. You know the quote above that's reputedly by Asimov? I have my doubts. Can you figure out where it actually comes from?

As always, please tell us your findings in the comments below. BE SURE to tell us what you did to find your answers.

Keep searching!

Saturday, June 7, 2025

Answer: What's the story of Rosamond Lake?

Rosamond Lake is pretty dull...

P/C Google Maps (image from 2021)

{ Preface: Sorry about the long delay in posting this Answer. It was finals week here in Zürich, and I got rather busy with end-of-term stuff. I'm also planning on returning to Silicon Valley--my home-home--next week, so I won't be posting this coming week either. See you in 2 weeks! }

The lake bed is very, very, very flat--exceedingly dry, and not much goes on there.

The point of this particular post is to show you how to approach this kind of geographic question.

So... what would be interesting about Rosamond Lake? Let's find out, and test my theory that nothing is boring... you just haven't done enough research yet.

1. What is the story (stories?) with Rosamond Lake? What makes it particularly interesting?

When dealing with a geographic place like Rosamond, I'll first search for some aerial views, just so I can see what the place is and to get a sense of context. A great starting point is to review the SRS post from a while ago about access to different Earth images. (What are some good (almost) real-time satellite image sources?)

Here are a few images I found (remember that you can click on each to get the full-size image):

P/C USGS topo maps 1947

P/C Bing Maps

P/C EarthExplorer (ESRI)

P/C Google Earth

P/C Wikimapia

The first thing that strikes me is the presence of large X-shaped marks. If you look closely at the Bing and Google images, you can see the double line heading to the upper right. Here's a side-by-side of the images. Note that all of the images vary slightly--differences in time of day, day of the year, the imaging system used... they all make differences in the final images.

Interestingly, the line shown on the Wikimapia image runs from upper left to lower right. In some of the photos, it's really hard to see. I could only spot it on the WeGo image (below, click to enlarge):

P/C Wego.here

Interesting that it's so hard to see, but clearly there. Is it just a faded road, or what?

This is a job for Google Earth's "Historical Imagery" feature. I brought up Google Earth and pulled the time slider back to 1985 and discovered this very clear X shape on the lake bed:

Rosamond Lake, 1985. P/C Google Earth

Those marks are pretty clearly graded emergency landing strips to serve as backup landing sites for the nearby Edwards Air Force Base, located just to the east of Rosamond Lake. It has its own dry lake bed, Rogers Lake, which as you can see, has a bunch of backup landing strips. (Intriguingly, it also has a compass rose marked in the lake bed as well. See the Wikipedia article about this feature, painted on Rogers Lake in 1985, and a high-res image of the rose.)

Rogers Lake, with Edwards AFB on the left. Note all of the graded paths that act as emergency landing zones. Very similar to Rosamond Lake. P/C Google Earth (1995)

Now we know about the mysterious X on the lake bed--what other stories might we find? More importantly, what's our strategy for finding them?

Here are my search strategies:

#1. Check news. Current news is worth checking, but more importantly check newspaper back issues using either Newspapers.com (or similar) or the Google News Archive (Note, however, that the Google News Archive has a problem actually locating the item you're looking for... you have to scroll around a bit to find the target! It's also not being updated, so it's growing increasingly elderly.)

#2. Search regular search for [ stories "Rosamond Lake" ] -- that will sometimes get you interesting hits.

#3. Check Google Books for "Rosamond Lake."

#4. Check YouTube videos. (You never know what will be in there!)

#5. Check various AIs. I like to use a prompt like this:

[ I am a news reporter who writes about history and science. What stories might I write about the Rosamond Lake in California? Give story ideas along with references to the information behind those story ideas. ]

Note that I'm giving the AI a bit of context ("I am a...") and asking for references. If you have access to Deep Research models, use them to scan a LOT of web pages. As always, check the results.

Using these methods, I found several stories of interest. Many of them have to do with setting endurance records (by both car and plane), with new stories telling of ever newer records year after year.

A. Hard landing: An X-15 landed very hard on Rosamond Lake after engine failure. Unable to dump its load of fuel, it had to land on the dry lakebed, breaking its back in the process. (See: NASA's telling of the story.)

X-15 with broken back after landing hard with full load of fuel on Rosamond Lake. (1959)
P/C NASA

After checking YouTube for this, I found this amazing footage of the X-15 coming in on this hard landing:

Link to video. Note that the narrator tells us that "Crossfield jettisons the rocket fuel..." Although apparently he wasn't able to get rid of all of it, leading to a heavy / hard landing.

B. Piute Ponds. If you look at the images above in the lower left corner, you'll see a place that looks like it's covered in water... and it is! Those are Piute Ponds:

The story is that they were formed when Los Angeles County Sanitation District 14 created a dike to keep its effluent from reaching Rosamond dry lakebed. (If you look at the image above, you can see it wasn't always successful.) The ponds continue to be sustained with input from the District 14 Wastewater treatment plant.

The manmade ponds are the largest freshwater wetlands in Los Angeles County and have become an important bird area in California, with over 200 species of migratory birds such as the Great Blue Heron, the Great Horned Owl, the Black Crowned Night Heron and the Western Snowy Plover. (Desert News; Edwards AFB news; Edwards AFB public info site; Museum of Art and History article)

C. Gold mining. If you look around Rosamond Lake, you'll find several gold mines with intriguing stories. The Tropico Hill mine used to be a clay mine, until someone tested the clay and found substantial gold in it! That's just a few miles west of Rosamond Lake. Looking a bit northward, you'll find Soledad Mountain, a gold mine that's been in service since 1835, and is STILL in use today. (Mojave Desert news)

Soledad Mountain with the Golden Queen Mine (other mines have been incorporated into this one).
P/C Google Maps

D. Energy production. If you cast your aerial map gaze just slightly north you'll see vast rows of solar panels and wind mills. This area is actually one of the largest green energy sites in California.

Energy production near Rosamond Lake
P/C Google Maps

As I said earlier, there are always stories if you dig deep enough. This is just the beginning!

Our Regular SearchResearchers had a few other insights worth sharing...

Krossbow pointed out how sometimes asking a real person can give you an insight that you might not have otherwise discovered.

And remmij found that the rock star Madonna used the lake as a filming/backdrop location. It's an interesting video because you can see a very evocative version of the lake.

Madonna's "Frozen" video.

OR you can see the lake from the point of view of pilots landing there. The Air Force opened up the lake for a fly-in in 2010:

Edwards AFB fly-in at Rosamond Lake.

I don't know if you found these stories "exciting" enough to move your needle towards "Deep Interest," but I hope these methods are all part of your portfolio of SearchResearch techniques.

Keep searching!

Wednesday, May 21, 2025

SearchResearch (5/22/25): What's the story of Rosamond Lake?

California is full of strange and interesting places...

P/C Google Maps (image from 2021)

... sure, you know about the golden beaches and the snowy mountains, and you might even know a bit about the deserts of Southern California. But there's a lot of history hidden away in the dry reaches of the southland.

One of those places is Rosamond Lake, a dry lake bed in SoCal, a part of the Antelope Valley. It's really, really flat and most of the time it's just dry / dry / dry. And that makes it especially useful for something you might not guess.

Our SRS Challenge for this week centers on Rosamond Lake. Can you figure it out?

1. What is the story (stories?) with Rosamond Lake? What makes it particularly interesting?

This isn't too hard, but a kind of fun Challenge. Can you figure out what the story is here? What did you do to figure it out?

Let us know. Next week I'll have some background for you!

Keep searching!

Thursday, May 15, 2025

Answer: How good are those AI summaries anyway?

Summaries are THE current AI hotness...

P/C [slide showing someone using AI for summarization] Gemini (May 1, 2025)

... you see the promotions for summarizing anything and everything with an LLM. Various AI companies claim to be able to summarize meetings, emails, long boring white papers, financial statements--etc etc etc.

I have my concerns about the onslaught of excessive summarization, but I'll save that for another day.

This week we asked a very specific question:

1. How has your experience of using AI for summarization worked out?

An obvious first question: What does it mean to summarize something? Is it just making the text shorter, or does "summarizing" imply a kind of analysis to foreground the most important parts?

And, "is there a method of summarizing that works for every kind of content?"

I don't have a stake in this contest: if just shortening a text works, then I'm all for it. But I kind of suspect that just shortening a book (rather than rewriting it), won't make for a great summary. For example, just shortening "Moby Dick" would lose its commentary on race, critiques of contemporary thought, and the nature of knowledge. You know, all the stuff you had to learn about while reading it in high school.

Summarizing is, I suspect, a grand art, much as creating an explanation is. When I explain what a text is "about," the explanation will vary a great deal depending on what the purpose of the explanation is, and who I'm explaining it to--telling a 10-year-old about Moby Dick isn't the same as telling a 30-year-old. Those explanations--or those summaries--will be very different.

So when we prompt an LLM for a summary, it behooves us to provide a bit of context. At the very least, say who the summary is for and what the point of the summary is. Throw in a "summarize this for a busy PhD student" or an "explain this to my grandma" and it'll make a world of difference.
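
If you ask for summaries often, it can help to make the "who is it for, and why" part a habit. Here's a minimal sketch in Python of one way to do that. The function name `build_summary_prompt` and its parameters are my own invention for illustration (they're not part of any LLM tool), and the wording of the prompt is just one reasonable choice.

```python
from typing import Optional


def build_summary_prompt(audience: str, purpose: str,
                         max_words: Optional[int] = None) -> str:
    """Compose a summarization prompt that states the reader and the goal.

    All names here are hypothetical; this only builds a string that you
    would paste (or send) to whichever LLM you happen to be using.
    """
    parts = [
        f"I am {audience}.",
        f"Please summarize this paper for me so that I can {purpose}.",
    ]
    # Length is another dimension of a summary, so make it optional.
    if max_words is not None:
        parts.append(f"Keep the summary under {max_words} words.")
    return " ".join(parts)


prompt = build_summary_prompt(
    audience="a PhD computer scientist",
    purpose="decide whether to read the full paper",
    max_words=400,
)
print(prompt)
```

Swap in "a curious 10-year-old" as the audience and you'd expect to get a very different summary back, which is the whole point.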

To answer this Challenge, I did a bit of experimenting.

Since I write professionally (mostly technical articles intended for publication in journals and conferences), I have a LOT of samples I can experiment with.

For instance, I've recently been working on a paper with a colleague on the topic of "how people searched for COVID-19 information during the pandemic." (Note that this was a 1-sentence summary of the paper. The length of a summary is another dimension to consider. Want a 1-sentence summary of War and Peace? "It's about Russia.")

Notice that all tech papers have an abstract, which is another kind of summary intended for the technical reader of the text. I wrote an abstract for the paper and thus have something completely human-generated as a North Star.

I took my paper (6100 words long) and asked several LLMs to summarize it with this prompt:

[ I am a PhD computer scientist. Please summarize this paper for me. ]

I asked for summaries from Gemini 2.5 Pro, ChatGPT 4o, Claude 3.7 Sonnet, Grok, Perplexity, and NotebookLM. (Those are links to each of their summaries.)

Here are the top-left sections of each--arrayed so you can take a look at the differences between them. (Remember you can click on the image to see it at full-resolution.)

And...

I took the time to read through each of the summaries, evaluating each, looking for accuracy and any differences between what we wrote in the paper and the summaries.

The good news is that I didn't find any terrible errors--no hallucinations were obvious.

But the difference in emphasis and quality was interesting.

The most striking thing is that different summaries put different findings at the top of their "Key Findings" lists. If you ignore the typographic issues (too many bullets in Gemini, funky spacing and fonts for Perplexity, strange citation links for NotebookLM), you'll see that:

1. Gemini writes a new summary that reads more like a newspaper account. It's quite good, opening with an excellent overview and a list of Key Findings at the very top. Of all the summaries I tested, this was by far the best, primarily for the quality of its synthesis and the clarity of the language it generated. (629 words)
2. ChatGPT is more prosaic--really a shortening rather than significant rewriting. It didn't really do a summary as much as it gave an outline of the paper. It was okay, but to understand a few of the sentences in the summary you need to have read the paper (which is NOT the point of a summary). Note that ChatGPT's Key Findings are somewhat different than Gemini's. (432 words)
3. Claude also has different Main Findings, and moves Methodological Contributions up near the top, which Gemini and ChatGPT do not. But it did a good job of summarizing the key findings, and wrote good prose about each. (324 words)
4. Grok buried the Key Findings under Sources and Methods, which is a bit like hiding the cake under the vegetables, but the text is decent. It had 4 key findings (others had more) and a decent, if short, discussion of what this all meant. (629 words)
5. Perplexity is similar, but gets confused when discussing the finding about Query Clusters. It was a bit sketchy on the details and gave a confused story about how clustering was done in the paper. (I suspect it got tripped up by one of the data tables.) (256 words)
6. NotebookLM uses much less formatting to highlight sections of the summary, and includes a bunch of sentence-level citations. (That's what the numbered gray circles are--a pointer to the place where each of the claims originates.) NLM spent a lot of time up-front discussing the methods and not the outcomes. (1010 words)
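
One quick way to compare the summaries in the list above is to ask how much each one compressed the 6100-word paper. Here's a small Python sketch that does the arithmetic using the word counts I reported; the variable and function names are just mine for illustration. Keep in mind that length says nothing about quality.

```python
# Word counts as reported in the list above; the paper is 6100 words long.
PAPER_WORDS = 6100
summary_words = {
    "Gemini": 629,
    "ChatGPT": 432,
    "Claude": 324,
    "Grok": 629,
    "Perplexity": 256,
    "NotebookLM": 1010,
}


def percent_of_original(summary_len: int, source_len: int) -> float:
    """Return the summary length as a percentage of the source length."""
    return round(100 * summary_len / source_len, 1)


ratios = {
    name: percent_of_original(count, PAPER_WORDS)
    for name, count in summary_words.items()
}

# Print from most compressed (shortest) to least compressed (longest).
for name, pct in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<11} {pct:>5}% of the original length")
```

By this measure, every summary lands somewhere between roughly 4% and 17% of the original, with Perplexity the shortest and NotebookLM the longest.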

Overall, in this particular comparison, Gemini is the clear winner, with ChatGPT and Claude in second place. Both Perplexity and NotebookLM seem to get lost in their summaries, kind of wandering off topic rather than being brief and to-the-point.

This brings up a great point--when summarizing a technical article, do you (dear reader) want a structured document with section headings and bullet points? Or do you want just a block of text to read?

A traditional abstract is just a block of text that explains the paper. In fact, the human-generated abstract that I wrote looks like this:

The COVID-19 pandemic has had a dramatic effect on people’s lives, health outcomes, and their medical information-seeking behaviors. People often turn to search engines to answer their medical questions. Understanding how people search for medical information about COVID-19 can tell us a great deal about their shifting interests and the conceptual categories of their search behavior. We explore the public’s Google searches for COVID-19 information using both public data sources as well as a more complete data set from Google’s internal search logs. Of interest is the way in which shifts in search terms reflect various trends of misinformation outbreaks, the beginning of public health information campaigns, and waves of COVID-19 infections. This study aims to describe online behavior related to COVID-19 vaccine information from the beginning of the pandemic in the US (Q1 of 2020) until mid-2022. This study analyzes online search behavior in the US from searchers using Google to search for topics related to COVID-19. We examine searches during this period after the initial identification of COVID-19 in the US, through emergency vaccine use authorizations, various misinformation eruptions, the start of public vaccination efforts, and several waves of COVID-19 infections. Google is the dominant search engine in the US accounting for approximately 89 percent of total search volume in the US as of January, 2022. As such, search data from Google reflects the major interests of public health concerns about COVID, its treatments, and its issues.

Interesting, eh? Although written by a human (me!), the abstract doesn't pull out the Key Findings or Methods (although they're there) in the same way that the AIs do.

But perhaps the structure of the LLM summaries is better than the traditional pure text format. When I asked LLMs for summaries of other papers (i.e., papers NOT written by me), the "outline-y" and "bullet-point" format actually worked quite well.

When I used Gemini on papers from a recent technical conference, I found the summaries to be quite useful. To be clear, what I did was to read the AI-generated summary as an "extended abstract," and if the paper looked interesting, I then went to read the paper in the traditional way. (That is, slowly and carefully, with a pen in hand, marking and annotating the paper as I read.)

A bigger surprise...

When I scan a paper for interest, I always read the abstract (that human-generated summary), but I ALSO always look for the figures, since they often contain the heart of the paper. Yes, it's sometimes a pain to look at big data tables or intricate graphs, but doing so usually tells you a lot. The figures alone are often a great summary of the paper as well.

This is the first figure of our paper. The caption for Figure 1 tells you that it shows:

Google Trends data comparing searches for “covid mask” and “covid.” This shows searches for “mask” and all COVID-related terms from Jan 1, 2018 until July 7, 2022. Note the small uptick in October for Halloween masks in 2018 and 2019. This highlights that all searches containing the word masks after March, 2020 were primarily searches for masks as a COVID-19 preventative measure.

Oddly, though, only ChatGPT was able to actually pull the figure out of the PDF file. The other systems claimed that the image data wasn't included in the file, although ChatGPT showed that they were woefully misled. I actually expected better from Gemini and Claude.

Although I could convince ChatGPT to extract images from the PDF document, I wasn't able to get it to create a summary that included those figures. I suspect that a really GREAT summarizer would include the summary text + a key figure or two.

SRS Team Discussion

We had a lively discussion with 19 comments from 5 human posters. Dre gave us his AI developer's scan of the situation, suggesting that working directly with the AI models AND giving them clean, verified data was the way to go.

Scott pointed out that transcripts of meetings are often more valuable when summarized. He reports that AI tools can summarize transcripts quite well, but he finds that "denser" technical materials don't condense as easily. (Scott: Try Gemini 2.5 Pro with the prompting tips from above.)

Leigh also runs his models locally and is able to fine-tune them to get to the gist of the articles he scans, and has built his own summarizing workflow as well.

The ever reliable remmij poured out some AI-generated wisdom about using LLMs for summarization, including strong support for Scott's point-of-view that can be best summarized as "They are Language Models, Not Knowledge Databases." That is, hallucination is still a threat: be cautious. (I always always always check my summaries.)

Ramón chimed in with an attempt to summarize the blog post... and found that the summarizer he used produced a summary that was longer than the original! Fun... but not really useful!

SearchResearch Lessons

1. More useful than I thought! A bit to my surprise, I've found the LLM summaries of technical articles to be fairly useful. In particular, I found that Gemini (certainly the 2.5 Pro version) creates good synthetic summaries, not just shortened texts.

2. Probably not for literature analysis as-is. If you insist on using an LLM to help summarize a text with deeper semantics, be sure to put in a description of who you are (your role, background) and what kind of summary analysis you're looking for (e.g., "a thematic analysis of...").

3. When you're looking at technical papers, be sure to look at the figures. The AIs don't quite yet have the chops to pull this one off, but I'm sure they'll be able to do it sometime soon. They just have to get their PDF scanning libraries in place!

Hope you found this useful.

Keep searching. (And double check those summaries!)
