Wednesday, January 24, 2024

SearchResearch Challenge (1/17/24): How can I get an AI to summarize a document?

We’ve got AI tools all over the place… 

… now, how can we use them effectively? 

An important piece of SearchResearch (or more generally, sensemaking) is finding relevant documents, reading them, and extracting the pieces of knowledge you need to get on with your work.  

I find that I spend roughly half of my research time just browsing, scanning, and reading—it’s what you need to do to understand a topic area enough to do real work. 

But what if we could accelerate the scanning and reading part?  What if we could take a long document and summarize it accurately?  

This is one of the great promises of the current crop of LLMs—they offer the ability to summarize long documents.  

Or do they?  Will they make the long document into a shorter, more focused work?  Or will that tool end up blurring everything into mush?  

The Challenge for this week is to explore how well an LLM can summarize a document in a way that’s useful for us SearchResearchers.  Here’s the Challenge for this week: 

1. How well does your favorite bit of AI technology do at summarizing long texts?  What system do you like for summarizing, and why?  

2. Test out your summarization method on two long-form texts.  First, let’s try that classic gothic tale, Frankenstein by Mary Shelley.  (Here’s a link to the full text that you can use.) How good is that summary?  

3. Second, let’s try your summarization method on a piece of text you might have read but might have slightly forgotten: Chapter 10 of my book, The Joy of Search, "When was oil first discovered in California?" (link to the full text of Chapter 10 here). Did your summary method create a good/useful summary?  

Try using your fave summary method on these two works and let us know what you find.  We’re interested in practical ways of getting a good summary of long texts.  When you do this, think about what your goal is in getting the summary—is it just general interest?  Or do you have a particular question in mind?  When is a summary useful?  (And contrariwise, when does it not work well?)  Or, how do you know your summary is a good one?  

Let us know what you've learned in the comments section.  

Next week I’ll summarize what we’ve learned, and what techniques seem to work best for SearchResearchers.  

Keep searching! 



  1. I didn't know they could help make summaries of long texts. I'll try to find a way to do that.

    1. I searched for how to do it, and then tried it with Google Bard. I still haven't had any success.

      First I tried:
      Can you please summarize this URL? (added Dr. Russell's book link)

      Bard said it still can't do that.

      Can you please summarize this chapter? (added the same link again)

      Bard said, "You need to add ..." I added it. Then Bard said, "I can't find the chapter in your library."

    2. I'm wondering: can an AI go to a site, check it day by day, and give us only the text published on specific days?

      For example, all the Challenges that Dr. Russell posted on June 21?

    3. I haven't tried that particular thing, but I'm sure that some LLM could write the Python code for that operation. Given that it can do that, it wouldn't be hard to have it execute its own code.

    4. Not about this Challenge; it's about Switzerland. Very much so. Did you notice this, Dr. Russell?

    5. This video is interesting and also reminded me of previous Challenges. It's in Spanish, and it's about chairs.

    6. About Switzerland:

      Toblerone to lose Matterhorn logo as the chocolate can no longer be considered ‘Swiss’

      Now I'm wondering what other products have lost their Swiss symbols.

  2. pre-AI synopsis:
    history of CA oil industry image:
    isn't validating a summary a nightmare & time-waster?
    a summary of AI summary generators…
    trying for the overview:
    need summary:

    1. All good stuff. Thanks for these pointers. (I wonder how many students still know about these resources?)

  3. Will you use an LLM-based summarizer to summarize what we've learned? :-)


    I grew up in Los Angeles, in an area with numerous oil wells. This led me to wonder: when and where was oil first discovered in California? I struggled to find a clear answer, as many sources provided different information. I refined my research question to: when was oil first produced as a commercial product in California? I found four major claims:

    1. Charles Alexander Mentry struck oil at Pico No. 4 in Pico Canyon in 1876, with continuous production for over a century.

    2. The Union Mattole Company struck oil in Humboldt County in 1865 and shipped several barrels to San Francisco for refining.

    3. Edward Doheny struck oil near present-day Dodger Stadium in downtown LA in 1892.

    4. Andrés Pico distilled small amounts of oil from Pico Canyon in 1855 but had a very limited market.

    I researched each claim to determine its credibility.

    1. Mentry-1876: Chevron's corporate history and various documents support this claim. Pico No. 4 was the first commercially successful oil well in California and is considered the birthplace of California's oil industry.

    2. Union Mattole Company-1865: Multiple sources, including the California State Parks web site, support this claim. The Union Mattole Company made its first shipment of oil from the Mattole River in Humboldt County to a San Francisco refinery in June 1865.

    3. Doheny-1892: This claim is less credible, as it seems like a tall tale. Many sources repeat the story, but it is difficult to believe that they drilled a well 150 feet deep with a 60-foot eucalyptus tree trunk.

    4. Pico-185

    1. Did you try pasting all of the text of Frankenstein into the window? It wouldn't let me do it, giving me an error that it can "only handle 100,000 characters." Oops.

  5. LLMs in the news: “recent advances in large language models (e.g., ChatGPT) will transform how astronomers interact with data and how astronomy discoveries are made.”

    I can’t assess these, but some of you might be interested:
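A note on the "only handle 100,000 characters" error mentioned in the comments: one common workaround is to split a long text into pieces that fit under the tool's limit, summarize each piece, and then summarize the summaries (often called map-reduce summarization). Below is a minimal Python sketch of that idea, not a recipe from any particular tool. It assumes you supply your own `summarize` function that sends text to whatever LLM you use; that function, like `chunk_text` and `summarize_long_text`, is hypothetical. The 100,000-character default just mirrors the error message quoted above; real limits vary by tool and are often counted in tokens rather than characters.

```python
from typing import Callable


def chunk_text(text: str, max_chars: int = 100_000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip runs of blank lines
        # A paragraph longer than the limit is cut into fixed-size slices.
        for start in range(0, len(para), max_chars):
            piece = para[start:start + max_chars]
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)  # current chunk is full; start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: your own LLM call goes here
    max_chars: int = 100_000,
) -> str:
    """Summarize each chunk, then summarize the joined summaries."""
    chunks = chunk_text(text, max_chars)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    combined = "\n\n".join(summarize(chunk) for chunk in chunks)
    if len(combined) >= len(text):
        # Guard against endless recursion when summaries are not shorter.
        raise ValueError("summaries are not shrinking the text")
    # The joined summaries may still exceed the limit, so recurse.
    return summarize_long_text(combined, summarize, max_chars)


# Stand-in "summarizer" for trying this out: it just keeps the first 20 characters.
demo = "\n\n".join(f"Paragraph {i} " + "x" * 50 for i in range(10))
print(summarize_long_text(demo, lambda s: s[:20], max_chars=150))
# prints "Paragraph 0 xxxxxxxx"
```

Splitting at paragraph breaks keeps most sentences intact, and only a paragraph longer than the limit gets cut mid-text. Whether the final summary is any good is exactly the question this Challenge asks: every pass throws information away, so it's worth spot-checking the result against the original.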