Wednesday, April 19, 2023

SearchResearch Challenge (4/19/23): How well do LLMs answer SRS questions?

This week let's try something different.

P/C Dall-e. Prompt: happy robots answering questions rendered in a ukiyo-e style on a sweeping landscape cheerful


I've written a couple of posts about using LLMs (Large Language Models--e.g., ChatGPT or Bard), and while I believe they can be incredibly useful for many tasks, our focus in SRS is answering the curious questions that come our way.  

This is post #1338.  We started SRS on Jan 30, 2010--689 weeks ago.  That means I've been writing around 2 posts each week for the past 13 years.  Over that time, we've seen features come and go, and we've tried to figure out the best ways to get answers from the internet.  
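
The week count is plain date arithmetic, so it's easy to check without asking an LLM. Here's a minimal Python sketch; the helper name `weeks_between` is my own, and it assumes the date of this post (April 19, 2023) as the end point.

```python
from datetime import date

def weeks_between(start: date, end: date) -> float:
    """Return the number of weeks (possibly fractional) from start to end."""
    return (end - start).days / 7

SRS_START = date(2010, 1, 30)   # first SRS post, per the text above
POST_DATE = date(2023, 4, 19)   # assumption: the date of this post
POST_COUNT = 1338

weeks = weeks_between(SRS_START, POST_DATE)
print(f"{weeks:.1f} weeks")                        # 689.6 weeks
print(f"{POST_COUNT / weeks:.2f} posts per week")  # 1.94 posts per week
```

Change `POST_DATE` to whatever day you run the check on; the answer grows by one-seventh of a week per day.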

Now, everything seems to be up in the air.  It's exciting times as the LLMs descend, becoming nearly oracular in their capabilities, but still with feet of clay.  

But as I've pointed out, they still need lots of checking.  That's still our job--evaluation, critical analysis, and trying to make sense of everything.  

So this week, I'd like to turn this over to you with this Challenge.  

1.  I'd like you to report on YOUR experiences in trying to get ChatGPT or Bard (or whichever LLM you'd like to use) to answer your curious questions.  What was the question you were trying to answer?  How well did it turn out?  

During the week ahead, I'll post little comments on what I'm learning.  I hope you do as well.  

--------- 

Here's an example of one mini-exploration into using an LLM to answer a very SRS-style question.  

Challenge:  I've noticed that when giving directions in New York City, the locals often just give two numbers.  Example: "I'll meet you at 53rd and 7th."  As an outsider, I have no idea what this means.  Clearly one is a street number and the other is an avenue number.  So I asked: 

What is the canonical order when specifying a corner in New York City?  Do people say street-then-avenue, or is it avenue-then-street?

For this SRS Challenge, I asked a couple of humans (both New Yorkers) and a couple of LLMs.  Here's what they told me:  

Human 1: It's always street first, like "61st and 1st". However, since the avenues only go up to 12, and the streets go to 200+, there's not much room for confusion unless you're below 12th St. If you said "3rd and 34th" people would know where you meant. If you said "7th and 10th," people would probably want more clarification.

Human 2: Building addresses in Manhattan have a number followed by the street or avenue, e.g. 25 West 50 Street or 500 Fifth Avenue. Street addresses are numbered east and west from Fifth Avenue. In general there are 100 numbers per avenue. So, 25 West 50 Street is between Fifth and Sixth Avenues.  Avenue addresses are more complicated, e.g. 500 Fifth Avenue is near 42 Street. But the grid is easy to understand with streets numbered from south to north with 20 streets (blocks) to the mile. Avenues are numbered from east to west with about 6 or 7 avenues to the mile. When referring to an intersection, the avenue is usually stated first, e.g. Fifth Avenue and 42 Street but you could also say it the other way.


Bard: In New York City, the canonical order when giving directions is to list the street name first, followed by the avenue name. This is because streets run north-south, and avenues run east-west. So, if you're giving directions to someone, you would say something like, "Go north on 5th Avenue to 42nd Street."

Bing/ChatGPT: According to a post on tripadvisor.com, when giving directions in New York City, you should say the street first if it’s truly an intersection (e.g., “57th and 3rd”). If you want to be dropped off on the avenue, you should say “Third Avenue at 57th.” Technically, there are only a few real combinations like 2nd Street and 1st Avenue / 2nd Avenue and 1st Street. If either of the numbers is greater than 12, it will be clear to a local which is which without the “street” and “avenue” parts, as there are no avenues numbered higher than 12 (in Manhattan). However, to avoid ambiguity you can say “street” and “avenue” where appropriate (e.g., “14th Street and 8th Avenue”). 

Learn more:

     tripadvisor.com; reddit.com; walkingoffthebigapple.com
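
The rule of thumb that Human 1 and Bing/ChatGPT both describe (numbered avenues stop at 12, so any larger number must be a street) can be written down directly. This is only a sketch of that heuristic for Manhattan's numbered grid; the function name `interpret_corner` and the constant `MAX_NUMBERED_AVENUE` are my own labels, not anything official.

```python
from typing import Optional, Tuple

# Per the answers above: no Manhattan avenue is numbered higher than 12.
MAX_NUMBERED_AVENUE = 12

def interpret_corner(a: int, b: int) -> Optional[Tuple[int, int]]:
    """Guess (street, avenue) from two bare numbers like "53rd and 7th".

    Returns None when both readings are plausible, and raises ValueError
    when neither reading fits the rule.
    """
    if a < 1 or b < 1:
        raise ValueError("street and avenue numbers start at 1")
    a_must_be_street = a > MAX_NUMBERED_AVENUE
    b_must_be_street = b > MAX_NUMBERED_AVENUE
    if a_must_be_street and b_must_be_street:
        raise ValueError("neither number can be an avenue under this rule")
    if a_must_be_street:
        return (a, b)
    if b_must_be_street:
        return (b, a)
    if a == b:
        return (a, b)  # both readings name the same pair of numbers
    return None  # either number could be the avenue, so ask for clarification

print(interpret_corner(53, 7))   # (53, 7)
print(interpret_corner(3, 34))   # (34, 3)
print(interpret_corner(7, 10))   # None
```

Note that the sketch says nothing about spoken order, which is exactly the part the humans disagree on; it only captures why the order rarely matters in practice.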


Dan's comment:  You need to know that in NYC, avenues run north/south, while streets run east/west. Note that streets are NUMBERED from south to north, while avenues are given numbers running from east to west.  

Oddly, the two humans disagree with each other.  

Human 1: streets are specified first 

Human 2: avenues are specified first 

And the LLMs give interestingly different answers, but both say "streets first."  

Bard says "streets first," and then gives a spurious reason about why that's the case. 

ChatGPT says "streets first" as well, but then gives more nuance about what to tell your taxi driver if you want to be dropped off on the avenue side of the intersection.  It ALSO gives links to reasonable results.  


Let us know what successes (or failures) you've had when using an LLM to answer one of your questions.  You need not write as much as I did here, but we're super curious to hear what you found.  Are these systems going to replace search engines? 

Keep searching!  


6 comments:

  1. Hello Dr Russell.

    I don't think search engines will go. They will just add a new function. Not all questions need AI and they still have problems as we have been reading.

    I'm also not sure if people will like LLMs. Maybe it's just novelty. We will see. Also interesting will be which one becomes the best one and how much is lost for the others.

    About the Challenge: Bard is not available in Mexico or in Spanish, and ChatGPT on Bing has a waiting list. I'll try signing up with OpenAI or another provider to try them.

    This week I read about people asking LLMs about General Franco and his favorite team in Spain. For questions that need lots of data comparison and summary, AI will be helpful.

  2. I have never actually used the ChatGPT program, nor have I ever used Bard or any other large language model. That said, my one experience with these was using Bing shortly after all the news about ChatGPT came out: It was my understanding, from what I had read, that Bing used it, so I tried it out there.

    To be frank, it was underwhelming: It regularly gave wrong answers to questions I asked (one I distinctly remember was asking who Nebraska's football coach was and being told "Scott Frost," even though he had been fired in September 2022). From that, I concluded that it left a lot to be desired, and that it was not worth using a program that would require me to fact check everything it claimed.

    Aside from that, I have not used any large language model programs.

  3. "This is post #1338. We started SRS on Jan 30, 2010--638 weeks ago."
    Is this information accurate? According to Bing :
    "Q: How many weeks are from Jan 30, 2010 until now
    A: From January 30, 2010 until now, there are approximately 697 weeks.

    Is there anything else I can help you with?

    Q: I don't think it is correct, redo the calculation
    A: I’m sorry but I prefer not to continue this conversation. I’m still learning so I appreciate your understanding and patience.🙏"

    According to Wolfram alpha the correct answer is 689.7

    Back to Bing again, now the answer is 710 weeks.

    Replies
    1. You're absolutely right: it's 689.7 weeks. It's a pure typo in the original text. Will fix it up now. Thanks for pointing this out. (I fact-checked the number of weeks both by asking Google [number of weeks between January 30, 2010 and today] and by using the purpose-built weeks calculator at https://dqydj.com/week-calculator/)

    2. good reminder from Romania not to take things at face value - even DMR things, but I do have to admit to finding some reassurance in mis-calculation - not everything being reduced to numerical lingo.
      nice nod to Wolfram Alpha… they are not "weak"
      today - 689.714285714.

  4. who's going to be left to buy the stuff the ads are pushing?
    still haven't found what I'm searching for… (maybe Bard should have been called Bono…)
    again with the NYT
    Bloomberg
    DAIR
    btw, Dall-e doesn't know Ukiyo-e… yet…
    ukiyo-e
    adachi - they have a contemporary twist too
    darth block
    wondered about the current 'flavors'…
    a sampling
    the big dog - for now
    nvidia
    for the AI eyeballs -
    dall-e styles
    AI — alternative, authoritarian, artistic, allusive, aural, abstract, amorphous, akimbo, available, auntie intell_igence…
    too many As…
    on a side note:
    Yosemite should be spectacular
    and in NYC —
    near the corner of 60th Avenue and 60th Street, Queens
    not Manhattan, but Queens, kinda near Forest Hills
