Monday, August 8, 2011

What do you find hard to search? Some answers.

I'm still away on vacation, but I thought I'd collect a few answers that I've received from friends and make a few comments along the way (my comments are in blue-sans-serif-bold).  There will be a new search challenge this coming Wednesday... so be sure to come back then! 

What DO people find hard to search?  


• Entities with very common words as names
Yep.  That's hard.  The trick here is to add another term that helps to discriminate.  Example:  [ ted talks ] rather than just [ ted ] 

• Obscure books/entities hidden by more popular ones that come up under the same search terms
When your results are "polluted" by other intrusive results, that's a call for the minus sign!   Example:  [ salsa -dance ] 

• Finding products with hard-to-describe features (e.g., adapter from two 2-stripe ⅛" headphone plugs to one 3-stripe ⅛" plug)
In this case, try using the simplest description you can.  The trick here is to figure out what language OTHER people use to describe the same thing.

• Ideas that can be described in many different ways (e.g., Has X new project idea already been done by others?)
Truly hard.  The trick is, again, figuring out the common terminology...  

• Ranges of size/capacity (e.g., USB cable 6"-18", backup cell phone battery 2000-4000 mAh, etc.)
Try using the number range operator.  Example:  [ 4..40V battery ] will find all batteries within that voltage range.  Likewise:  [ 2000..4000mAh battery ]


• Chinese city information. 
- I have been searching for areas of Shanghai.  Anglicized Chinese words are inconsistent and the results are very mixed.
True.  Sorry about that.  The underlying conversion from Chinese to English is inconsistent as well.

• Official rules/laws.
- I recently got a parking ticket on a poorly marked street. I wanted to find out what the parking laws were in the city. I found lots of discussion sites but nothing official in the top results.
This is another real problem: how to find the authoritative legal or regulatory pages.  In this case, the governmental organization often does a disservice by tucking its content away in obscure locations, or by putting it into giant blobs (e.g., the entire parking code in one 50 MB PDF file).


•  Things whose semantics depend on word order. For example, the quoted query "convert text to rtf" gives you a lot more about converting RTF to text (admittedly a more common problem).
Absolutely correct.  It's not widely known that word order counts in Google searches!  Try to phrase your searches in the word order you're likely to find on the page.  Example:  [ black and white ] is very different from [ white and black ]

•  There are a lot of what you might call "stale" pages about software or device problems. They aren't really stale, because they pertain to previous releases or older hardware that people still have. But they're irrelevant to newcomers, because new releases come out and the problems of yesteryear get solved, or because the same symptom now has a different cause. These pages are important if that's what you're looking for, but if you're on the current release, they're irrelevant.

For example, suppose you just bought a USB jumping frog and you want to find out how to make it work in Ubuntu. You'll be treated to years of interesting discussion about reverse engineering the frog-jumping protocol, followed by intricate pages describing frog hotplug scripts. It will work if you do it that way, and all of this was exciting and relevant when it was happening. But nowadays USB jumping frogs are natively supported, thanks to that original work, so for an end user it's totally irrelevant. New people find the old stuff, comment on it or link to it, and it becomes newly relevant, even though it's been subsumed. Yet it's not obsolete if you happen to want that stuff because you're running an old version.

Ditto for anything: Java releases, MS Office, IE6 vs everything else, problems with different versions of Roomba vacuum cleaners.
This is a big problem on the web in general--how does one determine the correct date or version of a document?  IF ONLY there were a single date / version convention... or even a reasonable number of them.  Alas, the only date that seems reliable is the date of posting, and that's often not a useful timestamp.  


• Running gives me various activities not related to marathons. If I search for marathon, Marathon Oil comes up high in my search results as well. Ironman is another contender: I'm interested in Ironman the triathlon event, not the character.
Another great example of when you need to add a carefully chosen second or third term.  Just doing a search like [ marathon ] or [ oil ] or even something like [ sun ] just isn't going to give you what you want.  [ marathon race ] is probably better.  

• Looking through 1300 profiles for the proper search terms really takes the biscuit.
Yep.  True!  


• Digging through my huge social network discussions… everything that goes only through Facebook, Google+, or Twitter gets lost after a week. I have to mark the important stuff in blog posts or bookmarks... but I don’t always know in advance what will be important…
Well, once upon a time (like, 1 month ago) we had a "real-time" search feature which would search FB, Twitter and other sources.  Now it's history... BUT... we're working on a new set of ideas here.  G+ won't be without search forever.  Some kind of "rapidly updated content" search tool will exist.  Hang in there. 

I'm still interested in what you find difficult.  So please email me directly if you'd like to tell me a story offline, or add a comment below!  


  1. Searching with terms that contain non-alphanumeric characters, for example PG&E, AT&T, or Google+, seems impossible. It often seems that some "smarts" then searches for att, pge, etc., which might not be what you want. In the same vein, hyphenated words are treated in strange ways (try searching for blue-sans-serif-bold).

  2. You're also correct. Neither Google nor Bing currently index "special characters" such as ( ) ? & % € ¥ etc...

    The hyphen character between words (without spaces) is treated as an indication of sequence. That is, the query [ blue-sans-serif-bold ] is like the query [ "blue sans serif bold" ] but allows for more synonymy in the term replacement, and is not quite as strict about word order.

  3. This isn't exactly related to the 'hard to search' topic, but when I include the word 'pictures' in the search string, I want images to show as part of the search results! (I searched for 'Neapolitan icecream pictures'.)

  4. Mohan -- Do you know about the "Sites with Images" feature? (Look in the left-hand panel. Try clicking on it, then re-doing your search.)