Summaries are THE current AI hotness...
[Image: a slide showing someone using AI for summarization. P/C Gemini (May 1, 2025)]
... you see the promotions for summarizing anything and everything with an LLM. Various AI companies claim to be able to summarize meetings, emails, long boring white papers, financial statements--etc etc etc.
This week we asked a very specific question:
1. How has your experience of using AI for summarization worked out?
An obvious first question: What does it mean to summarize something? Is it just making the text shorter, or does "summarizing" imply a kind of analysis to foreground the most important parts?
And, "is there a method of summarizing that works for every kind of content?"
I don't have a stake in this contest: if just shortening a text works, then I'm all for it. But I kind of suspect that just shortening a book (rather than rewriting it) won't make for a great summary. For example, just shortening "Moby Dick" would lose its commentary on race, its critiques of contemporary thought, and its reflections on the nature of knowledge. You know, all the stuff you had to learn about while reading it in high school.
Summarizing is, I suspect, a grand art, much as creating an explanation is. When I explain what a text is "about," the explanation will vary a great deal depending on what the purpose of the explanation is, and who I'm explaining it to--telling a 10-year-old about Moby Dick isn't the same as telling a 30-year-old. Those explanations--or those summaries--will be very different.
So when we prompt an LLM for a summary, it behooves us to provide a bit of context. At the very least, say who the summary is for and what the point of the summary is. Throw in a "summarize this for a busy PhD student" or an "explain this to my grandma"--it'll make a world of difference.
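If you're scripting this, that context is just part of the prompt string. Here's a minimal sketch of the idea--the phrasings and audiences are purely illustrative, not a recipe:

```python
# A minimal sketch of baking audience and purpose into a summarization
# prompt. The wording is illustrative; anything that names the reader
# and the goal should steer the summary.
def summary_prompt(text: str, audience: str, purpose: str) -> str:
    """Build a summarization prompt that says who it's for and why."""
    return (
        f"Summarize the following for {audience}. "
        f"The point of this summary is {purpose}.\n\n{text}"
    )

# The same text yields very different summaries:
print(summary_prompt("<paper text here>", "a busy PhD student",
                     "deciding whether to read the full paper"))
print(summary_prompt("<paper text here>", "my grandma",
                     "a friendly dinner-table explanation"))
```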
To answer this Challenge, I did a bit of experimenting.
Since I write professionally (mostly technical articles intended for publication in journals and conferences), I have a LOT of samples I can experiment with.
For instance, I've recently been working on a paper with a colleague on the topic of "how people searched for COVID-19 information during the pandemic." (Note that this was a 1-sentence summary of the paper. The length of a summary is another dimension to consider. Want a 1-sentence summary of War and Peace? "It's about Russia.")
Notice that all tech papers have an abstract, which is another kind of summary intended for the technical reader of the text. I wrote an abstract for the paper and thus have something completely human-generated as a North Star.
I took my paper (6100 words long) and asked several LLMs to summarize it with this prompt:
[ I am a PhD computer scientist. Please summarize this paper for me. ]
I asked for summaries from Gemini 2.5 Pro, ChatGPT 4o, Claude 3.7 Sonnet, Grok, Perplexity, and NotebookLM. (Those are links to each of their summaries.)
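If you want to replicate this at home, the comparison is easy to script. Here's a minimal sketch using the OpenAI Python client as one example--each vendor has its own client and model names, and `paper.txt` stands in for a plain-text export of the paper:

```python
# A sketch of how to run the comparison: same paper text, same prompt,
# one model at a time. Shown with the OpenAI Python client only; the
# filenames here are placeholders.
from openai import OpenAI

PROMPT = "I am a PhD computer scientist. Please summarize this paper for me."

client = OpenAI()  # reads OPENAI_API_KEY from the environment

with open("paper.txt") as f:  # hypothetical plain-text export of the paper
    paper = f.read()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{PROMPT}\n\n{paper}"}],
)
summary = response.choices[0].message.content

# The word counts reported below came from a simple count like this:
print(len(summary.split()), "words")
print(summary)
```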
Here are the top-left sections of each--arrayed so you can take a look at the differences between them. (Remember you can click on the image to see it at full-resolution.)
I took the time to read through each of the summaries, evaluating them for accuracy and for any differences between what we wrote in the paper and what the summaries said.
The good news is that I didn't find any terrible errors--no hallucinations were obvious.
But the difference in emphasis and quality was interesting.
The most striking thing is that different summaries put different findings at the top of their "Key Findings" lists. If you ignore the typographic issues (too many bullets in Gemini, funky spacing and fonts for Perplexity, strange citation links for NotebookLM), you'll see that:
1. Gemini writes a new summary that reads more like a newspaper account. It's quite good, opening with an excellent overview and listing Key Findings at the top. Of all the summaries I tested, this was by far the best, primarily for the quality of its synthesis and the clarity of its language. (629 words)
2. ChatGPT is more prosaic--really a shortening rather than a significant rewriting. It didn't do a summary so much as an outline of the paper. It was okay, but to understand a few of the sentences in the summary you need to have read the paper (which is NOT the point of a summary). Note that ChatGPT's Key Findings are somewhat different than Gemini's. (432 words)
3. Claude also has different Main Findings, and brings Methodological Contributions up near the top, which Gemini and ChatGPT do not. But it did a good job of summarizing the key findings, and wrote good prose about each. (324 words)
4. Grok buried the Key Findings below Sources and Methods, which is a bit like hiding the cake under the vegetables, but the text is decent. It had 4 key findings (the others had more) and a decent, if short, discussion of what it all meant. (629 words)
5. Perplexity is similar, but gets confused when discussing the finding about Query Clusters. It was sketchy on the details and told a muddled story about how clustering was done in the paper. (I suspect it got tripped up by one of the data tables.) (256 words)
6. NotebookLM uses much less formatting to highlight sections of the summary, and includes a bunch of sentence level citations. (That's what the numbered gray circles are--a pointer to the place where each of the claims originates.) NLM spent a lot of time up-front discussing the methods and not the outcomes. (1010 words)
Overall, in this particular comparison, Gemini is the clear winner, with ChatGPT and Claude in second place. Both Perplexity and NotebookLM seem to get lost in their summaries, kind of wandering off topic rather than being brief and to-the-point.
This brings up a great point--when summarizing a technical article, do you (dear reader) want a structured document with section headings and bullet points? Or do you want just a block of text to read?
A traditional abstract is just a block of text that explains the paper. In fact, the human-generated abstract that I wrote looks like this:
The COVID-19 pandemic has had a dramatic effect on people’s lives, health outcomes, and their medical information-seeking behaviors. People often turn to search engines to answer their medical questions. Understanding how people search for medical information about COVID-19 can tell us a great deal about their shifting interests and the conceptual categories of their search behavior. We explore the public’s Google searches for COVID-19 information using both public data sources as well as a more complete data set from Google’s internal search logs. Of interest is the way in which shifts in search terms reflect various trends of misinformation outbreaks, the beginning of public health information campaigns, and waves of COVID-19 infections. This study aims to describe online behavior related to COVID-19 vaccine information from the beginning of the pandemic in the US (Q1 of 2020) until mid-2022. This study analyzes online search behavior in the US from searchers using Google to search for topics related to COVID-19. We examine searches during this period after the initial identification of COVID-19 in the US, through emergency vaccine use authorizations, various misinformation eruptions, the start of public vaccination efforts, and several waves of COVID-19 infections. Google is the dominant search engine in the US accounting for approximately 89 percent of total search volume in the US as of January, 2022. As such, search data from Google reflects the major interests of public health concerns about COVID, its treatments, and its issues.
Interesting, eh? Although written by a human (me!), the abstract doesn't pull out the Key Findings or Methods (although they're there) in the same way that the AIs do.
But perhaps the structure of the LLM summaries is better than the traditional pure-text format. When I asked LLMs for summaries of other papers (i.e., papers NOT written by me), the "outline-y," bullet-point format actually worked quite well.
When I used Gemini on papers from a recent technical conference, I found the summaries quite useful. To be clear, what I did was read the AI-generated summary as an "extended abstract," and if the paper looked interesting, I then read the paper in the traditional way. (That is, slowly and carefully, with a pen in hand, marking and annotating the paper as I read.)
A bigger surprise...
When I scan a paper for interest, I always read the abstract (that human-generated summary), but I ALSO always look at the figures, since they often contain the heart of the paper. Yes, it's sometimes a pain to work through big data tables or intricate graphs, but they usually tell you a lot. The figures alone are often a great summary of the paper as well.
This is the first figure of our paper. The caption for Figure 1 tells you that it shows:
Google Trends data comparing searches for “covid mask” and “covid.” This shows searches for “mask” and all COVID-related terms from Jan 1, 2018 until July 7, 2022. Note the small uptick in October for Halloween masks in 2018 and 2019. This highlights that all searches containing the word masks after March, 2020 were primarily searches for masks as a COVID-19 preventative measure.
Oddly, though, only ChatGPT was able to actually pull the figure out of the PDF file. The other systems claimed that the image data wasn't included in the file, although ChatGPT showed that claim to be woefully wrong. I actually expected better from Gemini and Claude.
Although I could convince ChatGPT to extract images from the PDF document, I wasn't able to get it to create a summary that included those figures. I suspect that a really GREAT summarizer would include the summary text + a key figure or two.
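In the meantime, if you want the figures regardless of what the chatbots can do, pulling them out of a PDF yourself is straightforward with a library like PyMuPDF. A minimal sketch, assuming `pip install pymupdf`; "paper.pdf" is a placeholder filename:

```python
# A sketch of extracting every embedded image from a PDF with PyMuPDF.
import fitz  # PyMuPDF's traditional import name

doc = fitz.open("paper.pdf")  # placeholder filename
for page_number, page in enumerate(doc, start=1):
    for img in page.get_images(full=True):
        xref = img[0]                    # the image's PDF cross-reference id
        info = doc.extract_image(xref)   # raw bytes plus format metadata
        filename = f"page{page_number}-img{xref}.{info['ext']}"
        with open(filename, "wb") as out:
            out.write(info["image"])
        print("extracted", filename)
```

You'd still have to pick out which extracted image is the key figure, but at least the raw material is there.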
SRS Team Discussion
We had a lively discussion with 19 comments from 5 human posters. Dre gave us his AI developer's scan of the situation, suggesting that working directly with the AI models AND giving them clean, verified data was the way to go.
Scott pointed out that transcripts of meetings are often more valuable when summarized. He reports that AI tools can summarize transcripts quite well, but he finds that "denser" technical materials don't condense as easily. (Scott: Try Gemini 2.5 Pro with the prompting tips from above.)
Leigh also runs his models locally and is able to fine-tune them to get to the gist of the articles he scans; he has built his own summarizing workflow as well.
The ever-reliable remmij poured out some AI-generated wisdom about using LLMs for summarization, including strong support for Scott's point of view, best summarized as "They are Language Models, Not Knowledge Databases." That is, hallucination is still a threat: be cautious. (I always always always check my summaries.)
Ramón chimed in with an attempt to summarize the blog post... and found that the summarizer he used produced a summary that was longer than the original! Fun... but not really useful!
SearchResearch Lessons
1. More useful than I thought! A bit to my surprise, I've found the LLM summaries of technical articles to be fairly useful. In particular, I found that Gemini (certainly the 2.5 Pro version) creates good synthetic summaries, not just shortened texts.
2. Probably not for literature analysis as-is. If you insist on using an LLM to help summarize a text with deeper semantics, be sure to put in a description of who you are (your role, background) and what kind of summary analysis you're looking for (e.g., "a thematic analysis of...").
3. When you're looking at technical papers, be sure to look at the figures. The AIs don't quite yet have the chops to pull this one off, but I'm sure they'll be able to do it sometime soon. They just have to get their PDF scanning libraries in place!
Hope you found this useful.
Keep searching. (And double check those summaries!)