Friday, November 12, 2010

How many words should be in your search query?

There's always a discussion about how many words you should include in a search query.
Do you want it to be short & sweet, or long and descriptive? The thing is, what works for people (long and descriptive) might not work so well for a search engine.  Here's why...

When you do a search on Google, your words are implicitly AND-ed together.  What that means is that every word you add to your search changes the search, usually making the result set smaller and smaller.  That's even true for words that you might not think are very important (words like "the," "or," "of," "by," etc.).

Here's an example: Suppose you'd like to find out a bit about the history of the musical "Oklahoma."

If your first query is just:  [ Oklahoma ] -- you'll get about 112 million results.  (Don't panic, you don't have to read them all!  Just the first 10 or 12 are all you need to see.  Really!)



When you look at those results, it's pretty clear that they're not on topic.  Or, rather, they're great results for just Oklahoma (the state, the university and the football team).  But we're looking for the musical, right?

So try adding the terms "the musical" to the query:  you now get the query [ Oklahoma the musical ].  But look what happens to the size of your results!  It gets a LOT smaller--now it's down to 5.9 million results.  




And if the results aren't to your liking, then you can add the term "on Broadway" to your query like this:  
[ Oklahoma the musical on Broadway ] and you've made your result set smaller yet again.  Now you're at 4.2 million results.  






The big point to make here is that every time you add a word to your search, you're restricting the set of possible answers.  That is, when you add a word, every page in the result set has to match that word as well as all the others.
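To make the AND-ing idea concrete, here's a minimal sketch in Python.  It is a toy model only, not how Google actually works: the four "documents," the DOCS table, and the build_index and search helpers are all made up for illustration.  Each word maps to the set of documents that contain it, and a query keeps only the documents that contain every word.

```python
# Toy corpus: hypothetical document IDs mapped to their text.
DOCS = {
    1: "oklahoma state university football",
    2: "oklahoma the musical history",
    3: "oklahoma the musical on broadway revival",
    4: "review of the musical oklahoma from 1948",
}


def build_index(docs):
    """Map each word to the set of document IDs that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, query):
    """Return the documents that contain EVERY word in the query (implicit AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        # Each extra word can only keep the set the same size or shrink it.
        results &= index.get(term, set())
    return results


INDEX = build_index(DOCS)
for q in ["oklahoma", "oklahoma the musical", "oklahoma the musical on broadway"]:
    print(q, "->", sorted(search(INDEX, q)))
```

In this toy corpus the 1948 review (document 4) matches the first two queries but drops out as soon as "on broadway" is added, which is exactly the trap described here.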

Usually, that's good!  But if you happen to be looking for something that's NOT in the set, say what you're looking for is really one of the reviews of "Oklahoma," then by adding in "on Broadway," you might have already passed it by.  Here's a graphical illustration of what I mean.

If you're looking for just the right review of "Oklahoma" (say, from 1948), it's very possible that the result will NOT be in the yellow zone (the 4.2 million results).


And you'll have to back up a little bit in order to find that review.

The bottom line is a little subtle, but it's an important idea to keep in mind when searching.  When you make your search LONGER, you're typically making the result set smaller and smaller.  So if you're having trouble finding just what you want, try removing terms that might be sending you down the wrong rabbit hole.  And keep trying.  I'll sometimes do 10 searches in a row, removing words and swapping out one way of asking for another.
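The "remove a term and try again" habit can be sketched the same way.  Again, this is only an illustration under made-up assumptions: the DOCS table and the search and shorter_queries helpers are hypothetical, and a real search engine does far more than exact word matching.

```python
# Toy corpus: hypothetical document IDs mapped to their text.
DOCS = {
    1: "oklahoma state university football",
    2: "oklahoma the musical history",
    3: "oklahoma the musical on broadway revival",
    4: "review of the musical oklahoma from 1948",
}


def search(docs, query):
    """Return IDs of documents that contain every word of the query."""
    terms = query.lower().split()
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.split() for term in terms)
    }


def shorter_queries(query):
    """Yield each query you get by removing exactly one word."""
    words = query.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + words[i + 1:])


query = "oklahoma musical broadway review"
print(query, "->", sorted(search(DOCS, query)))

# The full query finds nothing, so try every one-word-shorter variant.
for q in shorter_queries(query):
    print(q, "->", sorted(search(DOCS, q)))
```

Here the four-word query matches nothing, because no single document mentions both Broadway and a review.  Dropping "broadway" surfaces the 1948 review, and dropping "review" surfaces the Broadway page.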

In my next post, I'll explain why and WHEN you might want to go with longer queries.  But for the moment, keep 'em short and to the point!

Search on!

5 comments:

  1. These recommendations are fine, when you are looking for some issue or subject. But sometimes you have an exact title of an article you are looking for and you just want to know the URL. Then, the most extensive and detailed description works better.

    Replies
    1. Well... that's kind of true, but here's what I see happening a LOT. People will *swear* that the title of the book they're looking for is "Very long list of words that they're sure are in the title" -- but the truth is that often people misremember things. Every extra off-topic word takes you a little farther from the goal. So, IF you have all the words correct, then what you say is true. Problem is, that's extremely rare. Hence my recommendation to start with few words and work towards what you want by adding in additional query terms.

  2. Well, I got here from Google+ cos I'm trying to follow up the lecture on being a better Google searcher. I like the way you pointed out these facts, as they are of real help to online users. Thanks for the tips. Keep up the good work.

  3. Dr Russell,

    Google is not consistent?

    If what you just said is true, why do I get the following results?

    Ltede black -clearlycontacts -justeyewear -lensway -priceme -myshopping
    About 17,400 results (0.35 seconds)

    Ltede black -clearlycontacts -justeyewear -lensway -priceme -myshopping -coastal
    About 196,000 results (0.59 seconds)

    In other words, the addition of -coastal increases the pie here.

    Why?

    Please enlighten.

    Thanks a million.

    Replies
    1. In short, when you keep minus-ing out search terms, eventually you knock the total number of results below a threshold that Google thinks is "too few." When that happens, Google "searches harder" by going deeper into the index, pulling up additional results. Note that it does this only when you've added so many constraints (e.g., additional negated terms) that the results are getting thin. You wouldn't see those results in the "normal" search in any case, so delaying the "searching harder" until you've made a long query makes sense.
