Wednesday, July 19, 2023

Answer: How can we use LLMs to search better?

Magic is, by definition... 


Precision targeting for SearchResearch. P/C by Mikhail Nilov (Pexels link)


... something that you don't understand.  That's why magicians wow us in their presentations. 

I love magic, but I really don't want magical interfaces to magical systems.   


Still, as we saw in our post about Using LLMs to find Amazing Words..., with a little ingenuity, we can do remarkable things.  (Definition of LLM.)

In that post, I illustrated how to use LLMs to find words that end in -core that describe an aesthetic style.  The clever thing that LLMs did in that Challenge was to find related words with a similar aesthetic meaning that do NOT have the -core ending. (Example: "dark academia") 

Our Challenge this week came in two parts:  

1.  Can you find a way to use LLMs (ChatGPT, Bard, Claude, etc.) to answer research questions that would otherwise be difficult to answer?  (As with the Using LLMs to find Amazing Words... example.  If you find such a research task, be sure to let us know what the task is, the LLM you used, and what you did to make it work.)  

As we've seen before, LLMs are currently not great at providing accurate answers.  So we're trying to figure out ways to use LLMs in productive ways.

In last week's commentary post (on Friday), I showed a way to search for keywords and phrases to use for regular Google search.  I've used that method a few times since then, and it's always worked out well.

Short summary:  Don't ask your LLM for specific answers to questions, and REALLY don't ask for citations.  At the moment, LLMs are all too happy to make up fake citations.  

But you SHOULD ask for other terms and phrases you should be searching for in addition to what you've been searching.  The sample prompt pattern I used was: 

     [what are the N most common subtopics related to TOPIC?]  

To use this, just replace the highlighted N with a number (typically 5 - 20), and replace the highlighted TOPIC with a short text description of your topic-to-search-for.  

This is a great way to figure out how to expand your range of ideas. 

2.  Here's an example of this difficult to answer "regular search" task: I wanted to make a list of all the SRS Challenges and Answers (the C&A list) since the beginning of this year.  I used an LLM to help me figure out the process.  Can you figure out what I did?  (I'll tell you now that I learned a bunch in doing this, and it only took me about 10 minutes from start-to-finish. I count that as a major win.)  

The big hint from last week was this:

I broke this task down into 3 steps:

     1. get the list of C&As from the blog into a text file
     2. extract out the Challenges and Answers (getting rid of anything extra) 
     3. then reverse the order of the C&A list

Let me unpack this.  

1. To get a list of ALL the blog posts, I opened the most recent blog post and scrolled to the bottom. (There are other ways to do this.)  It looks like this: 



Then, I just opened all of the twisty triangles for each of the entries for this year to see each of the blog posts.  That listing looks like this: 



Then, I just selected all of that text, hit COPY, and then opened a text editor and pasted the text (unformatted) into a .TXT document.  (I used BBEdit, which is a robust and useful editor, but you can use whatever editor you'd like.)   

In this pic, I've highlighted a bunch of lines that are neither Challenges nor Answers.  We don't want those lines in our final C&A answer. 



Yes, I could manually delete each line, but what if I want to do 1000 lines?  For that process, I asked Bard: 


There are a couple more answers below, but I realized that grep was exactly what I wanted.  It's a command line action that will find lines that match a pattern and extract them.  Perfect!  

Except that there's a small problem--the code snippet shown here doesn't quite work.  It says that I should do: 

      grep -E "Answer|SearchResearch" bbedit.txt 

After playing around for a while, I figured out that the correct expression should be: 

       grep  'Answer\|Challenge' bbedit.txt

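The fix has two parts.  First, the pattern needed to look for Answer and Challenge.  Second, without the -E flag you need the \ character in front of | to tell grep that it means OR (in classical Boolean logic); with -E, a plain | already means OR.  For a pattern like this one, single or double quotes both work, although single quotes are the safer habit because the shell leaves everything inside them alone.  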
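To see the two ways of writing OR side by side, here's a minimal sketch.  The file sample.txt and its contents are made up for illustration (they are not the real blog listing), and the behavior shown is what GNU grep does; I'd expect the grep on a recent Mac to agree, but treat that as an assumption to check on your own machine.

```shell
# Build a tiny stand-in for the pasted blog listing (hypothetical contents).
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
July (2)
Answer: How can you find a free audio book?
SearchResearch Challenge (6/28/23): How can you find a free audio book?
PSA: Read the comments
EOF

# Extended regular expressions (-E): a bare | means OR.
extended=$(grep -E 'Answer|Challenge' "$workdir/sample.txt")

# Basic regular expressions (no -E): OR has to be written as \|.
basic=$(grep 'Answer\|Challenge' "$workdir/sample.txt")

# Both variables now hold the same two lines: the Answer and the Challenge.
printf '%s\n' "$extended"
```

Either form keeps only the lines that mention Answer or Challenge and drops the rest.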

Then, once you run that, you've got the list of C&As.  Excellent.  But they're in the wrong order! Dang!  

Now I want to reverse the order of the lines in the file.  That's a classic programming problem, but I don't want to fool around--I just want to flip the order so that the last line becomes the first line, etc.  

Back to Bard:  



This is great!  I didn't know about the tac command, so I've learned something new. 

But when I try to do: 
    
      tac bbedit.txt 

my macOS Terminal application says that it's not part of the terminal commands.  (A useful thing to know: macOS isn't Linux.  It's a BSD-derived Unix, so GNU utilities like tac aren't included by default.) 
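If you want to know up front whether a command exists on your machine, here's a small sketch.  The helper name have is my own invention for this example; command -v is the standard shell way to ask whether a command can be found.

```shell
# Report whether a command can be found on this system.
# ("have" is a hypothetical helper name, not a standard tool.)
have() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: available"
  else
    echo "$1: not found"
  fi
}

# Check the commands used in this post.
for cmd in grep tail tac; do
  have "$cmd"
done
```

On a stock Mac I'd expect tac to come back as not found, while on most Linux systems all three should be available.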

Time to turn to regular Google and search for: 

      [ tac in MacOS ] 

which points me to a StackExchange page with an even better answer, one that doesn't require tac: 


So the right next step was to do: 

      tail -r bbedit.txt
 
which then gave me the file listing of SRS posts BACKWARDS.  That is, only Challenges and Answers from Jan 1, 2023 - July 5, 2023.  As you can see, each Challenge is followed by its correct answer:  

SearchResearch Challenge (1/4/23): How can I find latest updates on topics of interest?

Answer: How can I find latest updates on topics of interest?

SearchResearch Challenge (1/18/23): Musicians travels--how did they get 

Answer: Musicians travels--how did they get from A to B?

SearchResearch Challenge (2/8/23): What do you call this thing?

Answer: What do you call this thing?

SearchResearch Challenge (2/22/23): World's largest waterfall?

Answer: World's largest waterfall?

SearchResearch Challenge (3/8/23): What do these everyday symbols mean?

Answer: What do these everyday symbols mean?

SearchResearch Challenge (3/22/23): What do you call the sediment that 

Answer: What do you call the sediment that blocks a river flowing to 

SearchResearch Challenge (4/5/23): What's this architecture all about?

Answer: What's this architecture all about?

SearchResearch Challenge (4/19/23): How well do LLMs answer SRS 

Answer: How well do LLMs answer SRS questions?

SearchResearch Challenge (5/31/23): Did they really burn ancient Roman 

Answer: Did they really burn Roman statues?

SearchResearch Challenge (6/14/23): How to find the best AI-powered 

Answer: How to find the best AI-powered search engine of the moment?

SearchResearch Challenge (6/28/23): How can you find a free audio book?

Answer: How can you find a free audio book?
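Putting steps 2 and 3 together, the whole thing can be sketched as one pipeline.  This is an illustration under stated assumptions, not a transcript of what I typed: posts.txt and c_and_a.txt are made-up file names, the sample lines are a short stand-in for the real listing, and reverse_lines is a hypothetical helper that uses tac where it exists and falls back to tail -r otherwise.

```shell
# Hypothetical stand-in for the pasted listing: newest posts first,
# with a few lines that are neither Challenges nor Answers.
workdir=$(mktemp -d)
cat > "$workdir/posts.txt" <<'EOF'
July (2)
Answer: How can you find a free audio book?
SearchResearch Challenge (6/28/23): How can you find a free audio book?
February (4)
Answer: World's largest waterfall?
SearchResearch Challenge (2/22/23): World's largest waterfall?
EOF

# Reverse the order of lines with whichever tool this system has.
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac
  else
    tail -r
  fi
}

# Step 2: keep only Challenges and Answers.  Step 3: flip the order.
grep -E 'Answer|Challenge' "$workdir/posts.txt" | reverse_lines > "$workdir/c_and_a.txt"

cat "$workdir/c_and_a.txt"
```

The result lists the oldest Challenge first, with each Challenge followed by its Answer.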



SearchResearch Lessons

There are lessons here, and surely more to come as we learn more about working with LLMs. 

1. Ask your LLM to help brainstorm ideas for search terms that you might not have thought about.  If you think of the LLM as a reasonably accurate brainstorming partner, you might find it generates some good Google searches you wouldn't have thought about. 

2. Ask your LLM about ways to transform your data.  I've found that it often will suggest things that I once knew, but forgot about (e.g., using the Unix command tail -r to reverse the order of lines in a file).  Sometime soon I'll write about other ways I've used an LLM to help me clean and restructure data files.  In truth, this is my primary use case for LLMs these days--as a research assistant to fix up and analyze data. 

3. Be aware that the details of what your LLM tells you might need a little tweaking.  In the above example, I had to tweak that grep expression (the search terms and the way OR is written) before it did what I wanted.  Often what the LLM tells you is in the right ballpark, but not precisely correct.  (Think of it as a slightly unreliable narrator!)  




Keep Searching! 


2 comments:

  1. Hi Dr Russell!

    Totally unexpected Answer. I thought LLMs would do all the work. My question, just to be sure: do you run the commands tail, grep and tac in the text editor?
