## Friday, August 26, 2011

### Answer: What's the relationship between 'kayak' and 'tint' over time?

Okay, okay.  Several people wrote to me to say "huh?"

But bear with me for a moment--I think this is interesting.

I got to thinking about sunspots.  As you know, they vary dramatically in number over an 11-year cycle, the solar cycle.  Here's a diagram from the UC Berkeley Solar Physics lab:
But I realized that publishing follows a very similar pattern.  That is, people don't write about everything uniformly at all times.  It's not something you might normally consider, but there is a real cyclical nature to the kinds of ideas and content that are searched for across the year.  Easy examples:  more is queried about flu in the winter than in the summer--same for things like mittens, turkeys, rain and even mice.

This naturally led me to start wondering about more sporting kinds of ideas.  So I started up Google Insights for Search to see what the time-varying interest level is for biking.
This is the week-by-week quantity of people querying about [ biking ] on the web from 2007 through 2009.  As you can see, there's a huge difference between summer and winter.  (You know that Google Trends and Google Insights for Search both show you the number of people querying on a given query over time. That's where this graph is from.)

There's another way to look at the time-course of queries, and that's with Google Correlate.

So I happened to run Google Correlate on the query [ kayak ] and found the following fascinating chart.

What this shows us is that the interest level in the idea [ kayak ] varies over the year, peaking in summertime, just like [ biking ].  Again, that makes sense.

But here's what surprised me.  The second highest correlated query is not [ biking ] but..

.. tint.

Really?  When you select tint as the correlated search term, you get a graph that looks just the same as the one above--the correlation is 0.94--a really, really high correlation.
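For the curious, here's a rough sketch of what a number like 0.94 measures.  It computes a plain Pearson correlation coefficient on two made-up weekly curves that both peak mid-year.  The curves, their names (`kayak_like`, `tint_like`), and the choice of Pearson's r are all assumptions for illustration; this is not real query data, and I'm not claiming it's exactly how Google Correlate scores things.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Synthetic weekly "interest" curves over three years (NOT real query data).
# Week 0 is early January, so -cos() is lowest in winter and peaks mid-year.
weeks = range(156)
kayak_like = [50 - 40 * math.cos(2 * math.pi * w / 52) for w in weeks]
# A smaller curve with the same seasonal shape plus a little weekly wobble.
tint_like = [20 - 8 * math.cos(2 * math.pi * w / 52) + (w % 5) for w in weeks]

r = pearson(kayak_like, tint_like)
print(f"correlation: {r:.2f}")
```

Two series that have nothing to do with each other will still score very high if they both simply follow the seasons.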

But as we know, correlation is not causation.  So..  What's going on?  Why are these two terms so highly correlated?

The answer is probably obvious to you, but I did all kinds of analyses trying to figure out what the connection was between these two ideas.  Was it that a special tint is used to make the plastics in kayaks?  Was there some kind of kayak brand called the "Tint"?  What?

In playing around with the data, I finally noticed something seemingly minor, but an observation that led to a good insight.

On the Google Insights chart for [ kayak ] I noticed that this query appears often in the US and in New Zealand as well.  Huh!  What's that about?  Obviously, Kiwis are pretty big kayakers as well.  So I did the obvious Google Insights search, but this time limiting my data set to just queries from New Zealand.  I'll put the US chart on top with NZ below:

Notice anything odd?  They're almost perfectly out-of-sync with each other.  US interest peaks when NZ interest is at a low point, and vice-versa.

I exported both data sets and combined them in a handy spreadsheet program to produce this diagram, which puts both data lines onto a single chart (with slightly higher resolution):

Again, this makes sense: New Zealand is in the southern hemisphere, so their summer is our winter, etc.
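Here's a small sketch of why a half-year offset shows up as "almost perfectly out-of-sync."  It uses an idealized yearly curve (a cosine, which is my assumption, not the actual exported data) and shifts the peak by 26 weeks for the southern hemisphere.  The function names and peak weeks are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def seasonal(week, peak_week):
    """Idealized yearly interest curve (arbitrary units) peaking at peak_week."""
    return 50 + 40 * math.cos(2 * math.pi * (week - peak_week) / 52)

weeks = range(156)  # three years of weekly points
north = [seasonal(w, peak_week=28) for w in weeks]  # peak around July
south = [seasonal(w, peak_week=2) for w in weeks]   # peak around January

r_opposite = pearson(north, south)
print(f"north vs. south: {r_opposite:.2f}")
```

With exactly half a year between the peaks, the two idealized curves are mirror images, so the correlation comes out strongly negative.  Real data is noisier, so expect something less extreme.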

Maybe that's what's going on with tint as well.  So I repeated the exercise:  US vs. AU for [ tint ].  (I used Australia rather than NZ because there's a higher search volume and the chart is clearer.  Everything still holds for NZ as well, but it's a better diagram.)
Here's the chart:

So between the northern and southern hemispheres, searches for each term are almost perfectly anti-correlated--when one hemisphere goes up, the other goes down.  This is equally true for [ kayak ] between north and south, and [ tint ] between north and south.

I still was thinking that there might be some secret, previously unknown connection between kayak and tint, but then I went back to Google Insights and ran the searches together for just the US (2007-2009).  Here's what I got:

So while the two terms are correlated, they occur in much different overall volumes.  Okay then, by doing a bit more digging around (e.g., doing an AROUND [ kayak AROUND(9) tint ] search to see if kayak and tint ever occur near each other in any texts), it was pretty easy to show that this is a marvelous summertime correlation, but that there's no evident causation.
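It's worth seeing why very different volumes don't get in the way of a high correlation.  The sketch below, again on made-up numbers and assuming a Pearson-style measure, rescales both series and checks that the correlation doesn't budge.  All names and values here are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

weeks = range(104)  # two years of weekly points (synthetic, not real data)
high_volume = [1000 + 800 * math.sin(2 * math.pi * w / 52) + 30 * (w % 7)
               for w in weeks]
# A second series: same seasonal swing plus its own unrelated wobble.
other = [h + 10 * ((w * 7) % 11) for w, h in zip(weeks, high_volume)]

r_before = pearson(high_volume, other)

# Shrink one series to a fiftieth of its volume and quadruple the other.
low_volume = [h / 50 + 3 for h in high_volume]
other_scaled = [o * 4 for o in other]

r_after = pearson(low_volume, other_scaled)
print(f"before: {r_before:.4f}  after: {r_after:.4f}")
```

Correlation only cares about the shape of the ups and downs, not how tall the curves are, which is why a low-volume query can track a high-volume one so closely.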

The queries for "tint" in summertime have to do more with sun screens and tinted glass than with watercraft.  And "kayak" in the summertime might also be correlated with sunscreen, but only incidentally.

Bottom line:  "kayak" and "tint" co-vary over the year, peaking in summertime, and lower in winter--and they do this out-of-sync with their southern hemisphere counterparts.  That is, they both reflect summertime, rather than some secret, previously-unknown link.

Searching on!