Wednesday, August 31, 2011

Wednesday Search Challenge (August 31, 2011): Easiest way to animate the visualization

There is no end of hard problems to ask.  Most of the really good ones aren't quite suitable for this blog--what is the nature of life?  why is there something rather than nothing?  

So I tend to pick problems that start from some kind of observation about the world.  Someone will ask me a question ("Is synesthesia consistent across people?") or I'll notice something odd about the world ("What is that anode thing in the middle of the street?") and that will lead to a search challenge.  

For today's question I want to pose a more mundane, but in some ways even harder, much more pragmatic question.  (And I ask this in the hope that you'll come up with a solution that I haven't thought of!)  

Remember last week's question about the words kayak and tint?   I thought so.  

When I saw the data curves, I immediately wanted to make an animation of the US vs. New Zealand data showing the maps of both US and NZ growing darker/lighter with the value of each query volume.  

This is the kind of practical question someone might ask a data analyst. Can you make me a cool animation that does... this cool thing?  

So the challenge is this:  Figure out how to make an animation of US vs. NZ that pulses the color of the country from white-to-black with the values of each query.  I'll give you the data so you don't have to hassle it yourself.  

Link to Google Spreadsheet with data
Link to XLS version of data
Link to CSV version of data

Each version is 104 rows of data, one for each week of 2009 and 2010.  The values vary from 0 to 100, with 0 being the lowest query volume.  
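To make the color part of the challenge concrete, here is a minimal sketch of how one weekly value might become a shade of gray.  It assumes that 0 maps to white and 100 maps to black, and the names value_to_gray and frame_colors are made up for illustration.  It only produces colors; drawing the maps and stringing the frames into an animation is still up to you.

```python
def value_to_gray(value, lo=0, hi=100):
    """Map a query-volume value to a hex gray (assumed: lo -> white, hi -> black)."""
    clamped = max(lo, min(hi, value))        # guard against out-of-range values
    fraction = (clamped - lo) / (hi - lo)    # 0.0 at lo, 1.0 at hi
    level = round(255 * (1 - fraction))      # 255 is white, 0 is black
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def frame_colors(us_values, nz_values):
    """Pair up the weekly values: one (US color, NZ color) tuple per frame."""
    return [(value_to_gray(us), value_to_gray(nz))
            for us, nz in zip(us_values, nz_values)]


# Three made-up weeks, not the real spreadsheet data.
for us_color, nz_color in frame_colors([0, 40, 100], [100, 60, 0]):
    print(us_color, nz_color)
```

With the real data you would read the 104 rows from the CSV and hand each pair of colors to whatever drawing tool you prefer.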

Go ahead, show us your animation!  (And let us know how you did it!) 

Search on! 

Monday, August 29, 2011

AGoogleADay-inspired research

 One of the pleasures about writing questions for the AGoogleADay (AGAD) series has been to see reader reactions.  Most often people write in with thanks for a question they hadn’t thought about, or a perspective that searching to solve a question gave them. 

You’re wrong!  On the other hand, some people write with quibbles or questions about the AGAD daily challenge.   About a third of the time they have a legitimate complaint about a correct answer not being accepted by our automatic answer verification code.  (“The answer is 10,000,001 and YOU marked it wrong!”  True.  We should have allowed for a little more leeway in the answer range…) 

We try to fix these as quickly as possible, which can be tricky.  In the question about the “southern-most temple at Machu Picchu” we had folks saying we should also accept answers in Spanish (“Templo del Sol” in addition to “Temple of the Sun”), although my favorite complaint was that we should have also accepted answers in the language of the Incas since it was an Incan site!  If only I knew what that temple was called in classic Quechua…

I’m right!  Most often people write to let us know that their answer was correct, when in fact they hadn’t read the question fully.  This is probably the most common email we get, which worries me slightly.  The people playing AGAD are for the most part pretty careful readers and searchers—such comments always make me review the question to see if we couldn’t have phrased it better.  But often, no, we couldn’t have… they just misread the question and found an answer to a question THEY had in their head, not what we actually asked. 

More recently we ran a question (Aug 28, 2011): “One particular kind of bird migrates farther each year than all others. If you added all the miles an individual bird of this kind flies in an average lifetime, how many times would it have circled the Earth?”

There were three interesting complaints about this question...

1.  Your answer is off by a huge margin!  We had a fair number of people write in to say that the arctic tern (the bird in question) actually flies around 24,000 miles in an annual migration, and not 44,000 (as we say in the answer).  This is a great question about research on the web.  Why? 

When you do the obvious search for [ arctic tern migration ] you’ll find several sources agree with the 44,000 number: -- 44,000 miles / year

But some give a different answer -- 22,000 miles per year:  -- 22,000 miles/year

What's going on with the variation?  (Looks to me as though someone mistakenly copied a km measure for a mile measure...)  

To find an authoritative result, I turned to the Proceedings of the National Academy of Sciences (PNAS, a very well-known, very authoritative journal in biology).  In the article:

     Tracking of Arctic terns Sterna paradisaea reveals longest animal migration
     PNAS 2010 107 (5) 2078-2081; doi:10.1073/pnas.0909493107; February 2, 2010
     Authors: Carsten Egevang, Iain J. Stenhouse, Richard A. Phillips, Aevar Petersen,
            James W. Fox, and Janet R. D. Silk

The authors write:  "Our tracking of 11 Arctic terns fitted with miniature (1.4-g) geolocators revealed that these birds do indeed travel huge distances (more than 80,000 km annually for some individuals)."

Which is actually around 49,700 miles / year round trip. 
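Since the whole disagreement hinges on kilometers versus miles, it's worth checking the conversion yourself.  This is a minimal sketch; the only number taken from the paper is the 80,000 km figure quoted above, and km_to_miles is just an illustrative helper name.

```python
KM_PER_MILE = 1.609344  # exact, by the international definition of the mile


def km_to_miles(km):
    """Convert a distance in kilometers to statute miles."""
    return km / KM_PER_MILE


annual_km = 80_000  # "more than 80,000 km annually for some individuals"
print(f"{annual_km:,} km is about {km_to_miles(annual_km):,.0f} miles")
```

Running the same helper on a number that was reported in km but labeled as miles is a quick way to see how big an error a units slip can produce.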

Why such a long distance?  Not only do they fly pole-to-pole (which WOULD be around 24,900 miles/year), but they also fly back and forth in a daily hunt for food, an activity that basically doubles the distance they fly.  It's a remarkable feat!

The moral (from a SearchResearch perspective) is that errors happen.  They happen all the time.  In this case, it looks like a units conversion problem that happened at one place and was copied along.  (And, FWIW, the Wikipedia entry on this is correct!  Many other sources that look decent are in fact completely wrong.) 

2.  How can you ask questions that need precise answers given that the information is approximate? 

 A few other writers worried about trying to do a computation of “average lifetime migration distance” (which is what we’re asking you to do).  How’s that possible?

Here’s what I did:  If you search for the bird’s lifetime and find it varies (from 20 to 25 years) and then search for estimates of the bird’s migration distance (44,000 to 49,700 miles / year) you can compute upper and lower bounds. 

Upper bound (that is, maximum distance traveled) = 25 * 49,700 = 1,242,500 miles
Lower bound (minimum distance traveled) = 20 * 44,000 = 880,000 miles

If we take the Earth's circumference as 24,901 miles (measured at the equator--note that you get a different number if you measure around the poles)

Upper bound:  1,242,500/24,901 = 49.9 times around the globe
Lower bound:  880,000 /24,901 = 35.3 times around the globe

BUT… unlike other game shows, AGAD accepts a range of answers.  In this case, anything between 35 and 50 would have been okay.
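The bounds are easy to recompute with your own estimates.  Here is a minimal sketch using the lifetime and distance ranges given above; the function name circuits is made up for illustration.

```python
EARTH_CIRCUMFERENCE_MILES = 24_901  # equatorial value used above


def circuits(years, miles_per_year, circumference=EARTH_CIRCUMFERENCE_MILES):
    """How many times a lifetime of migration would circle the Earth."""
    return years * miles_per_year / circumference


lower = circuits(20, 44_000)  # shortest lifetime, shortest annual distance
upper = circuits(25, 49_700)  # longest lifetime, longest annual distance
print(f"between {lower:.1f} and {upper:.1f} times around the globe")
```

Swap in a polar circumference or a different lifespan and you can see how much (or how little) the accepted range would move.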

3.  What about….?  

Several people wrote in with other long-distance fliers.  The muttonbird and the sooty shearwater were also posited as the winner in the “longest migration” flights. 

Well, the term “muttonbird” refers to any of a number of seabirds that are commonly eaten.  One such bird is—surprise surprise!—the sooty shearwater, Puffinus griseus

If you look at the migration routes for both the shearwater and the arctic tern, both birds essentially migrate from pole-to-pole over the course of a year.  So I suspect they both travel essentially the same distance.  It’s possible that shearwaters actually travel farther, but the overall distance will be about the same. 

If we look up a comparable report from the PNAS about shearwater migration distances, we find:   

        PNAS  103(34): 12799-12802. (2006) doi:10.1073/pnas.0603715103
          Authors: Shaffer, S.A.; Tremblay, Y.; Weimerskirch, H.; Scott, D.; Thompson, D.R.;
               Sagar, P.M.; Moller, H.; Taylor, G.A.; Foley, D.G.; Block, B.A. & Costa, D.P.

These authors write:  “… shearwaters fly across the entire Pacific Ocean in a figure-eight pattern while traveling 64,037 km (+/-9,779 km) roundtrip, the longest animal migration ever recorded electronically.”    (Or:  39,790 miles, +/- 6076 miles)  Much less than the arctic tern. 

Note that this was in 2006.  The previous paper about the arctic terns was published in 2010.  Darn it.  Scooped! 

Bottom line for search/research:  As we know, research is a tricky business that needs constant digging to get to the bottom of things.  And, of course, success is often temporary.  I’m sure that authors Shaffer, et al., thought they’d found the longest migratory bird, but then the arctic tern paper comes out a few years later with a longer migration path.  It’s possible there’s still another bird out there that does a longer commute.  But that’s the great thing about science--we constantly update our knowledge as we learn more about the world.  As John Maynard Keynes is said to have said,  “When the facts change, I change my mind. What do you do, sir?”  (I’ll let you do the research into whether or not he actually said that.) 

Friday, August 26, 2011

Answer: What's the relationship between 'kayak' and 'tint' over time?

Okay, okay.  Several people wrote to me to say "huh?"

But bear with me for a moment--I think this is interesting.

I got to thinking about sunspots.  As you know, they vary dramatically in number over an 11-year cycle, the solar cycle.  Here's a diagram from UC Berkeley Solar Physics lab:

But I realized that publishing follows a very similar pattern.  That is, people don't write about everything uniformly at all times.  It's not something you might normally consider, but there is a real cyclical nature to the kinds of ideas and content that are searched for across the year.  Easy examples:  more is queried about flu in the winter than in the summer--same for things like mittens, turkeys, rain and even mice.

This naturally led me to start wondering about more sporting kinds of ideas.  So I started up Google Insights for Search to see what the time-varying interest level is for biking.  
This is the week-by-week quantity of people querying about [ biking ] on the web during 2007 and 2009.  As you can see, there's a huge difference between summer and winter.  (You know that Google Trends and Google Insights for Search both show you the number of people querying on a given query over time. That's where this graph is from.)  

There's another way to look at the time-course of queries, and that's with Google Correlate.  

So I happened to run Google Correlate on the query [ kayak ] and found the following fascinating chart. 

What this shows us is that the interest level in the idea [ kayak ] varies over the year, peaking in summertime, just like [ biking ].  Again, that makes sense.  

But here's what surprised me.  The second highest correlated query is not [ biking ] but.. 

.. tint.  

Really?  When you select tint as the correlated search term, you get a graph that looks just the same as the one above--the correlation is 0.94--a really, really high correlation. 
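If you want to check a correlation like this on data you've exported yourself, the usual measure is the Pearson correlation coefficient.  What follows is a minimal sketch of that textbook formula; I'm assuming (not claiming) that it is comparable to the number Google Correlate reports, and the weekly values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length (at least 2 points)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up weekly interest values for two queries that rise and fall together.
query_a = [20, 35, 60, 90, 70, 40, 25]
query_b = [15, 30, 55, 80, 75, 35, 20]
print(f"r = {pearson(query_a, query_b):.2f}")
```

A value near +1 means the two curves rise and fall together, near -1 means they move in opposite directions, and near 0 means no linear relationship.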

But as we know, correlation is not causation.  So..  What's going on?  Why are these two terms so highly correlated?  

The answer is probably obvious to you, but I did all kinds of analyses trying to figure out what the connection was between these two ideas.  Was it that a special tint is used to make the plastics in kayaks?  Was there some kind of kayak brand called the "Tint"?  What?  

In playing around with the data, I finally noticed something seemingly minor, but an observation that led to a good insight.  

On the Google Insights chart for [ kayak ] I noticed that this query appears often in the US and in New Zealand as well.  Huh!  What's that about?  Obviously, Kiwis are pretty big kayakers as well.  So I did the obvious Google Insights search, but this time limiting my data set to just queries from New Zealand.  I'll put the US chart on top with NZ below:  

Notice anything odd?  They're almost perfectly out-of-sync with each other.  US interest peaks when NZ interest is at a low point, and vice-versa.  

I exported both data sets and combined in a handy spreadsheet program to produce this diagram that puts both data lines onto a single diagram (with slightly higher resolution):  

Again, this makes sense: New Zealand is in the southern hemisphere, so their summer is our winter, etc.  

Maybe that's what's going on with tint as well.  So I repeated the exercise:  US vs. AU for [ tint ].  (I used Australia rather than NZ because there's a higher search volume and the chart is clearer.  Everything still holds for NZ as well, but it's a better diagram.)  
Here's the chart:  

So between the northern and southern hemispheres, searches for [ kayak ] and [ tint ] are almost perfectly anti-correlated--one goes up, the other goes down.  This is equally true for [ kayak ] between north and south, and [ tint ] between north and south.  
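One way to see the "out-of-sync" pattern in numbers is to ask how many weeks you would have to slide one curve to line it up with the other.  This is a minimal sketch on synthetic data: the cosine curves and the peak weeks are invented stand-ins for the real charts, and the function names are hypothetical.

```python
import math


def seasonal_series(n_weeks, peak_week, period=52):
    """Synthetic 0-100 interest curve that peaks once per period."""
    return [50 + 50 * math.cos(2 * math.pi * (w - peak_week) / period)
            for w in range(n_weeks)]


def best_circular_shift(a, b, max_shift):
    """Shift of b (in weeks, wrapping around) that best matches a."""
    n = len(a)

    def mismatch(shift):
        return sum((a[i] - b[(i + shift) % n]) ** 2 for i in range(n))

    return min(range(max_shift), key=mismatch)


north = seasonal_series(104, peak_week=28)  # made-up mid-year peak
south = seasonal_series(104, peak_week=2)   # made-up new-year peak
shift = best_circular_shift(north, south, max_shift=52)
print(f"best alignment after shifting by {shift} weeks")
```

For two curves that are a half-year apart, the best shift comes out at about 26 weeks, which is what "their summer is our winter" looks like in the data.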

I still was thinking that there might be some secret, previously unknown connection between kayak and tint, but then I went back to Google Insights and ran the searches together for just the US (2007-2009).  Here's what I got:  

So while the two terms are correlated, they occur in very different overall volumes.  Okay then, by doing a bit more digging around (e.g., doing an AROUND [ kayak AROUND(9) tint ] search to see if kayak and tint ever occur near each other in any texts), it was pretty easy to show that this is a marvelous summertime correlation, but that there's no evident causation.  

The queries for "tint" in summertime have to do more with sun screens and tinted glass than with watercraft.  And "kayak" in the summertime might also be correlated with sunscreen, but only incidentally.  

Bottom line:  "kayak" and "tint" co-vary over the year, peaking in summertime, and lower in winter--and they do this out-of-sync with their southern hemisphere counterparts.  That is, they both reflect summertime, rather than some secret, previously-unknown link.  

Searching on! 

Thursday, August 25, 2011

Wednesday Search Challenge (August 24, 2011): Finding word relationships

It's sometimes useful to understand why / when / where a specific term is being used.  If you think about writing as a natural activity, that is, as a process that occurs naturally over time and space, you start to realize that text is created in a particular time and place.  People write articles, books, editorials (etc etc) for a particular audience at a particular time.  

If you take that observation to heart, you can think of documents / text almost like a natural, organic process, as though it were weather that's happening.  

The challenge for today is an interesting one:  What's the temporal relationship between the words "kayak" and "tint"?  

I'm tempted to leave it at that--it's a great puzzle just as it stands.  But I'll amplify a bit.  

In this blog we mostly talk about how to find the answer to a specific question.  "What is the between and ?"  or "How much is ?"  

Typically, that involves searching through collections of documents and finding the bits of text that contain information that you need to answer the question.  

But the search challenge of today is somewhat different.  What I'm curious about is how these two words are used over time, and what, if any, relationship there is between them.  

The answer is interesting and surprised me.  It makes perfect sense, but I really hadn't thought about it before.

A big clue:  Consider how and when a word like "kayak" or "tint" is used in the US vs. Australia.  

It's a great puzzlement.  

Search on! 

Tuesday, August 23, 2011

Control-F and other tools for reading online

I seem to have caused a minor disturbance in the Force with my comments to Alexis Madrigal last week about search.  

We were having a conversation about how I go about doing my research at Google (especially "search anthropology") when I mentioned the results of my Control-F study.  The key result (which I wrote about a while ago in SearchResearch) is that around 90% of all English-speaking US internet users do NOT know how to find a specified string on a web page using anything other than visual search.  That is, they don't know about the Edit>Find> function, or how to use Control-F (or CMD-F)... or any other means to determine that the word (or string) is there.  

In other words, 90% of internet searchers don't know how to jump to the location of a desired string on a web page.  What's more, they also cannot prove that a particular string does NOT appear on a web page.  

This is important because this one tool--Control-F, find--changes the way you read long documents. 

If you're searching for any occurrence of a word (say, "iceberg") in a web document, you can quickly find out where in the document that word appears, and how often.  

Suppose I was reading an interesting article about the effect of tsunamis on iceberg creation.  (Apparently when the recent Japanese tsunami hit the Antarctic ice shelf, some massive icebergs were made as a side-effect!) Here's a good article on this:  snapshotted at the moment when I did a Control-F find for "tsunami."  

(You can click on the image to make it large enough to read my annotations.)  

This is the Chrome browser, which has a pretty nice feature in its Find function.  Note the yellow occurrence lines in the scroll bar.  That shows you where the hits for your find term are in the body of the document.  It's easy to see here that there are a bunch:  46 to be exact, and the find function has highlighted number 19.  

Looking at the pattern of hits can tell you a good deal about the document.  Is it rich with hits?  Do the hits all congregate just at the beginning or the end of the document?  (Often the case when you're searching for an author's name, or key words that are used only in introductory text.)  
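What a find function does under the hood is simple enough to sketch.  This is a minimal illustration, not how Chrome actually implements it: it lists every position where a string occurs, so an empty list is your proof that the string does not appear.  The sample sentence is made up.

```python
def find_all(text, needle, ignore_case=True):
    """Return the start offset of every occurrence of needle in text."""
    if not needle:
        raise ValueError("needle must be a non-empty string")
    # Lowercasing keeps offsets intact for ordinary ASCII text (an assumption here).
    haystack = text.lower() if ignore_case else text
    target = needle.lower() if ignore_case else needle
    positions = []
    start = haystack.find(target)
    while start != -1:
        positions.append(start)
        start = haystack.find(target, start + 1)
    return positions


page = "The tsunami reached the ice shelf. Tsunami waves calved icebergs."
hits = find_all(page, "tsunami")
print(len(hits), "hits at offsets", hits)

# Like the marks in the scroll bar: where in the document does each hit fall?
print([round(pos / len(page), 2) for pos in hits])

# An empty result shows the word is NOT on the page.
print(find_all(page, "volcano"))
```

The list of offsets is the raw material for everything described above: the hit count, the current hit, and the pattern of marks along the scroll bar.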

I've written more about the Subtleties of finding text in a document (basically, choose the shortest unique substring).  

It's interesting to consider what the presence of a tool like Control-F does for our ability to read.  As mentioned, you can now prove that a given word doesn't appear in a document.  Handy when you're on a very long document and don't want to waste a lot of time.  You can discover relationships in long documents that are difficult to perceive otherwise.  

I remember reading Jurassic Park on my Mac Duo, back in the days when entire books could be purchased on floppy drives (Voyager, 1992)... 

Using their find function, I was able to see where place names in the text were used, and I found that Cabo Blanco was mentioned at the very opening, and at the very end... only.  Seemed to me like the perfect device for a sequel.  I'm not sure I would have noticed otherwise.  

That's the simplest case I can think of: Using a find command to see structure in the text that's otherwise very difficult to notice.  

More importantly, what other tools should we have, and how would they affect our ability to read?  

A few ideas spring to mind.  This is a short list of some of the reading functions I'd like to see in my editor/browser.  

1.  Concordance function.  Able to count words in the current selection and sort by frequency.  Would be handy to get a gist or summary of what language is being used. 

2.  Nearby repeated words flagger.  One of the most common errors *I* make as a writer is to write something, then return to that section and edit it, using nearly identical words.  It's a big oops when you read back through it.  I'd like my editor to automatically note when I'm using a low-frequency word within some distance of another rare term. 

3.  Statistically improbable phrases.  Given a text (e.g., a long report or a magazine article), I want my browser to be able to highlight the phrases that are low-frequency wrt other writings.  If someone writes a really whacky phrase in the middle of a text, I'd like to be able to see it highlighted.  (Idea derived from Amazon's SIP section of their UI.  Knowing that the phrase "tyrannosaur roared" is in the book tells me something interesting about the book).  
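The first two ideas on this list are easy to prototype.  Here is a minimal sketch under some loud assumptions: words are just runs of letters and apostrophes, and "low-frequency word" is crudely approximated by word length (a real tool would consult a word-frequency table).  The function names are made up for illustration.  The third idea needs a large reference corpus to compare against, so it is left out.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")


def words(text):
    """Split text into lowercase words (a deliberately simple tokenizer)."""
    return WORD_RE.findall(text.lower())


def concordance(text, top=10):
    """Idea 1: count the words in a selection and sort by frequency."""
    return Counter(words(text)).most_common(top)


def nearby_repeats(text, window=20, min_length=6):
    """Idea 2: flag longer words repeated within `window` words of each other.

    Word length stands in for rarity here, which is only a rough assumption.
    Returns (word, earlier_index, later_index) tuples, indexed by word position.
    """
    last_seen = {}
    flagged = []
    for i, word in enumerate(words(text)):
        if len(word) < min_length:
            continue
        if word in last_seen and i - last_seen[word] <= window:
            flagged.append((word, last_seen[word], i))
        last_seen[word] = i
    return flagged


sample = "The remarkable eel drifts. A remarkable larva drifts too."
print(concordance(sample, top=3))
print(nearby_repeats(sample, window=5))
```

Tuning the window and the rarity test is where the real design work would be; too sensitive and the flagger nags about every repeated "because."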

What other kinds of reading-behavior tools would YOU like to see? 
Or, what other kinds of tools do you use NOW to help your online reading?  

Search on! 

Thursday, August 18, 2011

Answer: What is this thing in the middle of the street?

So... what IS an "anode" anyway?

If you do something like:

[ define:anode ]

you'll see it's got something to do with an electrochemical system.  In particular, an anode is an electrode through which electric current flows into a polarized electrical device.  Great.  What's that got to do with an anode in the middle of my street?

I decided the simplest strategy was to do the obvious search:

[ anode in street ]

 which in turn led me to read an article on "Design Information and Guide for Corrosion Control of Steel Used for Underground Installations."  It's not a catchy title, but it gave me a big clue: the article discusses something called "cathodic protection of underground metal" by using a "sacrificial anode."

After poking around on various versions of anode queries for a while (including [ anode in street ] [ anode in pavement ] [ street anode ] and others, none of which seemed useful) I switched strategies to a connected term -- to wit, "cathodic protection."

The query:

[ cathodic protection ]

took me to the Wikipedia article on Cathodic Protection, which nicely summarizes the situation.  A "cathode" is the other part of the structure to be protected from corrosion.  The anode is "sacrificial" in that it gets consumed in the process.  So... what needs protecting underground?

Reading farther down in the Wikipedia article answers the question--"Pipelines are routinely protected by a coating supplemented with cathodic protection."  Which are " anode, or array of anodes buried in the ground...and can be installed in a vertical hole and backfilled with conductive coke (a material that improves the performance and life of the anodes)"

Now we're getting somewhere!

We just have to confirm that this picture is actually of a sacrificial anode used in a cathodic protection system to maintain underground pipes.

The only other thing we know from the picture is that word "Christy."  Sounds like a manufacturer.  So I do the search:

[ anode cathodic protection christy ]

The first result takes me to the website of the Far West Corrosion Company with a page all about the Christy labeled Traffic Valve Box, 8-1/2" I.D. x 12... with a diagram that exactly matches my picture.  It's the right size, it's got the Christy brand mark, and it's labeled with "ANODE."

Found it.  By doing a little reading of web pages about "Cathodic protection anodes" on the Far West website, I learned a great deal about why you need cathodic protection for any metal pipes that are buried in soil that's even slightly salty (which is nearly everywhere).

Moral of the story:  Sometimes a search term is just too generic to be useful.  In this case, "anode" is a great term, but it's hard to figure out the solution to this puzzle without learning that the key concept is "cathodic protection."  Once you know that, the rest of the search process is pretty straightforward.

When searching for a difficult term, consider looking around to find another term (not quite a synonym) that will get you to what you're seeking.

Searching, ever more sophisticatedly, onward!

Wednesday, August 17, 2011

Wednesday Search Challenge (August 17, 2011): What is this thing in the middle of the street?

As you know by now, I'm a curious fellow.  When I see something that's puzzling, I often write the puzzlement down in my little notebook, and then look it up later when I have a spare moment.  Sometimes I'll take a picture of said obscuratum and work from that.  

I guess that makes me a high need-for-cognition personality type.  But you knew that already. 

So I'm walking down the middle of my street the other day when I spotted this: 

 It's in the middle of the street and is about 8 inches in diameter.   

Naturally, I'm curious. 

And since you're reading this blog, so are you.  We're all high NFC personalities.  

Today's search challenge:  What is this thing?  And why is there one in the middle of my street?  What does it do?? 

 (The only other clue I'll give you is that I'm now starting to see them everywhere.  They're pretty ubiquitous.  Now that you're attuned to them, you'll start seeing them too!)  

Search on!

Thursday, August 11, 2011

Answer: What is this creature?

Quick answer:  It's the larval form of a green moray eel! 

Now, here's the real story.  A friend showed me his picture of this amazing animal, but I wasn't able to get a copy of his image.  (Long boring technical failure story elided here.)  So I had to do this search just from my visual memory of the animal.  

I knew it was some kind of eel from the shape of the head.  (And I gave this bit of inside knowledge in my problem statement.)  So I did a fairly obvious image search for 

[ transparent eel ] 

and switched to Google Images to find another example that matched what I'd seen.  It was fairly short work to find that this really WAS an eel and that it was called a leptocephalus ("flat head").  A few quick searches after that confirmed that this was the larval form of a moray eel.  

Answer path:  [ transparent eel ] --> Images -->  which led me to this remarkable paper: “Ecology of Anguilliform Leptocephali:  Remarkable Transparent Fish Larvae of the Ocean Surface Layer” Aqua-BioSci. Monogr. (ABSM), Vol. 2, No. 4, pp. 1-94 (2009)

Disclosure:  This is the paper from which I got those great eel images.  If you have any interest in the way eels work, live, breed and develop (still one of the great mysteries of ichthyology), it's worth a read.  

To quickly summarize:  The leptocephali are the larval form of eels (this picture happens to be of a moray).  All eels go through such a larval stage when they're largely transparent and drift with the plankton.  Oddly enough, muraenid leptocephali (i.e., of the moray eel family) have pectoral fins, unlike the adults, which do NOT have them.  The pectoral fins are resorbed when they transform from leptocephalus to juvenile.

Now, what do they eat?  (And why do so many leptocephalids have such large, fang-like teeth?)  

I learned from another paper (Diet of anguilloid larvae: leptocephali feed selectively on larvacean houses and fecal pellets,  N Mochioka, M Iwamizu, Marine Biology (1996) v: 125, i: 3, p: 447-452) that leptocephalids consumed "...larvacean houses and zooplankton... No trace of the many other phytoplankton or zooplankton, which were found with leptocephali...  On the basis of the importance of larvacean houses in the diet of several species of leptocephalus larvae, it is proposed that the peculiar, large, fang-like teeth of leptocephali are used for feeding, and evolved to pierce and grasp the mucous houses of larvaceans."  

So, to save you the trouble, "larvacean houses" are the discarded rigid skeletons of tunicates that live nearly everywhere in the oceans.  Wikipedia describes the larvacea as "...solitary, free-swimming tunicates found throughout the world's oceans. Like most tunicates, appendicularians are filter feeders. Unlike other tunicates, appendicularians live in the pelagic zone, specifically in the upper sunlit portion of the ocean (photic zone) or sometimes deeper. They are transparent planktonic animals, generally less than 1 centimetre (0.39 in) in body length (excluding the tail)." 

 (Larvacean "house" image from: Arctic Science Journeys, 2002. )

I didn't know that eels even HAD larval stages, let alone that they rely so much on the discarded houses of tunicates.  The zooplankton makes sense, but even simple searches result in really interesting discoveries.  

Search on! 

Wednesday, August 10, 2011

Wednesday Search Challenge (August 10, 2011): Identifying from real life

I think I've already mentioned that I'm an avid scuba diver and have been known to travel long distances to go diving in an especially beautiful place.  I grew up diving in Southern California waters, so I'm entranced by the spectacular beauty of the tropics--corals and small, colorful fish just completely make my day.  

One of the challenges of diving in remote locations is that you'll often come across animals you can't identify.  

Here's one such problem that came to me recently (I've modified it a bit from the original in hopes of making it somewhat simpler.)  

A dive buddy sends me an email with the following attached images asking, "What is this animal?  I saw it while diving in southern Indonesia.  It's a little longer than 1 foot in length, and has a pointy fish-shaped head (look at the second photo, the head is on the lower left). It's really flat and moves through the water like an eel.  But it's mostly transparent!  What IS this beast?" 

Or, more to my tastes--what is it?  Where does it live?  What does it eat?  (etc.... basically, I want to know the natural history of this wonderful creature.)  

This, plus these images, are everything I know... 

Search on! 

Tuesday, August 9, 2011

Other cultures, other languages--you have to search where the information lives

In today's puzzle the correct answer is NOT in English.  That's not terribly surprising since the question is about Pachacuti's sacred city in Peru.  What has been surprising is that several people have written in to comment that the answer is wrong.  Their complaint?  That the answer isn't in English!  

One of the tenets of search is that you've got to go where the data lives.  If you're searching for information about another culture, it often makes sense to search in that language.  Example:  searching for information about dolphin-related Greek temples is pretty straightforward in Greek (using "Translated Foreign Pages" feature), but difficult and slow in English.  

On the other hand, the multiple language problem is a persistent bugaboo.  We call the capital of Italy "Rome," not "Roma" even though the locals refer to it in the second form.  (And in English we have multiply-named entities:  The Battle of Bull Run vs. The Battle of Manassas.)  

It's a basic thing to remember as you search for more difficult things:  There are often many ways to refer to something.  While Google's synonym-system is pretty good, you often know a LOT more about what might be acceptable alternate ways of expressing the key idea.   

Especially in other languages.  Keep this in mind when searching!  

Search on!

Monday, August 8, 2011

What do you find hard to search? Some answers.

I'm still away on vacation, but I thought I'd collect a few answers that I've received from friends and make a few comments along the way (my comments are in blue-sans-serif-bold).  There will be a new search challenge this coming Wednesday... so be sure to come back then! 

What DO people find hard to search?  


• Entities with very common words as names
Yep.  That's hard.  The trick here is to add another term that helps to discriminate.  Example:  [ ted talks ] rather than just [ ted ] 

• Obscure books/entities hidden by more popular ones that come up under the same search terms
When your results are "polluted" by other intrusive results, that's a call for the minus sign!   Example:  [ salsa -dance ] 

• Finding products with hard-to-describe features (e.g., adapter from two 2-stripe ⅛" headphone plugs to one 3-stripe ⅛" plug)
In this case, try using the simplest description you can use.  The trick here is to figure out what language OTHER people use to describe the same thing.

• Ideas that can be described in many different ways (e.g., Has X new project idea already been done by others?)
Truly hard.  The trick is, again, figuring out the common terminology...  

• Ranges of size/capacity (e.g., USB cable 6"-18", backup cell phone battery 2000-4000 mAh, etc.)
Try using the number range operator.  Example:  [ 4..40V battery ]  will find all batteries within that voltage range.  [ 2000..4000mAh battery ] 


• Chinese city information. 
- I have been searching for areas of Shanghai.  Anglicized Chinese words are inconsistent and the results are very mixed.
True.  Sorry about that.  The underlying conversion from Chinese to EN is inconsistent as well.  

• Official rules/laws.
- I recently got a parking ticket on a poorly marked street. I wanted to find out what the parking laws were in the city. I found lots of discussion sites but nothing official on the first level pages.
This is another real problem: how to find the authoritative legal or regulatory pages.  In this case, the governmental organization often does a disservice by tucking their content away in obscure locations, or putting it into giant blobs (e.g., the parking code is all in one giant 50 MB PDF file).  


•  Things whose semantics depend on word order, for example, quoted "convert text to rtf" gives you a lot more about converting RTF to text (admittedly a more common problem).
Absolutely correct.  It's not widely known that word order counts in Google searches!  Try to put your search terms in the word order you're likely to find on the page.  Example:  [ black and white ] is very different than [ white and black ] 

•  There's a lot of what you might call "stale" pages about software or device problems, but which aren't stale because they pertain to previous releases or previous hardware which people still have, but are irrelevant to new people because new releases happen and the problems of yesteryear are now solved. Or because the same symptom now has a different cause. Now, the pages aren't stale and they're important if that's what you're looking for, but if you're on the current release of whatever you're looking for, they're irrelevant.

For example, suppose you just bought a USB jumping frog and you want to look for how to make it work in Ubuntu. You'll be treated to years of interesting discussion about frog-jumping protocol reverse engineering, followed by intricate pages describing frog hotplug scripts. And it will work if you do it that way, and all of this was exciting and relevant when it was happening, but nowadays USB jumping frogs are natively supported, and it's all due to that original work, but for an end user, it's totally irrelevant. So new people find the old stuff, and they comment on it or link to it, and it becomes newly relevant, even though it's been subsumed. But it's not obsolete if you happen to want that stuff, because they're running an old version.

Ditto for anything: Java releases, MS Office, IE6 vs everything else, problems with different versions of Roomba vacuum cleaners.
This is a big problem on the web in general--how does one determine the correct date or version of a document?  IF ONLY there were a single date / version convention... or even a reasonable number of them.  Alas, the only date that seems reliable is the date of posting, and that's often not a useful timestamp.  


• A search for running gives me various activities not related to marathons. If I search for marathon, Marathon Oil comes up high in my search results as well. Ironman is another contender: I'm interested in Ironman the triathlon event, not the character.
Another great example of when you need to add a carefully chosen second or third term.  Just doing a search like [ marathon ] or [ oil ] or even something like [ sun ] just isn't going to give you what you want.  [ marathon race ] is probably better.  

• Looking through 1300 profiles for the proper search terms really takes the biscuit.
Yep.  True!  


• Digging through my huge social network discussions… everything that goes only through Facebook, Google+, or Twitter gets lost after a week. I have to mark the important stuff in blog posts or bookmarks... but I don’t always  know in advance what will be important…
Well, once upon a time (like, 1 month ago) we had a "real-time" search feature which would search FB, Twitter and other sources.  Now it's history... BUT... we're working on a new set of ideas here.  G+ won't be without search forever.  Some kind of "rapidly updated content" search tool will exist.  Hang in there. 
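
Several of the tricks in the answers above (adding a discriminating term, the minus sign, the number range operator, keeping word order) are mechanical enough to write down.  Here's a minimal sketch in Python, just to make the patterns concrete.  The build_query helper and its parameters are made up for illustration; it only assembles the text you would type into the search box and is not any kind of Google API. 

```python
def build_query(terms, exclude=(), number_range=None, unit=""):
    """Assemble a search query string (hypothetical helper, illustration only).

    terms        -- words to search for, kept in the order given
    exclude      -- words to drop from the results with the minus sign
    number_range -- optional (low, high) pair for the number range operator
    unit         -- optional unit suffix attached to the range, e.g. "mAh"
    """
    parts = []
    if number_range is not None:
        low, high = number_range
        # The number range operator is two dots between the bounds.
        parts.append(f"{low}..{high}{unit}")
    # Word order matters, so the terms are never sorted or rearranged.
    parts.extend(terms)
    # Each excluded word gets a leading minus sign.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query(["salsa"], exclude=["dance"]))
print(build_query(["battery"], number_range=(2000, 4000), unit="mAh"))
print(build_query(["marathon", "race"]))
```

The three calls rebuild the queries from the examples above: [ salsa -dance ], [ 2000..4000mAh battery ] and [ marathon race ]. 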

I'm still interested in what you find difficult.  So please email me directly if you'd like to tell me a story offline, or add a comment below!  

Wednesday, August 3, 2011

Wednesday Search Challenge (August 3, 2011): What do YOU find difficult?

I'm actually going on vacation for the next couple of weeks, so I'll be posting a bit more slowly than usual as I travel through a world with intermittent or completely absent internet connections. 

And so I want to post a Search Challenge for YOU to answer over the next week.  (I will be checking in to moderate the discussion....)  

One of the key questions I have in this work is:  what areas are difficult to search?  

We already know that some domains or topics are tough.  Medical search, especially for chronic conditions that require long-term care and management, is a difficult topic. 

We know that searching for travel plans (especially when you coordinate flights, hotels, special deals, meetings with friends...) is tough.  

I'm curious about what YOU find difficult to search out!  Are you an antique-collector?  Do you do extensive research into your genealogy and find that painful? 

I want to know your pain points, what's difficult for you to seek out on the web.  And if you can, give us a quick analysis about why it seems so hard.  (Is it just that there's no good content out there?  Or is it not indexed correctly, or... what?) 

Looking forward to YOUR search challenges!