The quick answer is this: No.
The rainfall for October 2012 was pretty ordinary—in fact, much less
than just 3 years ago, when the rain fell heaviest in October 2009. (Which goes to show you how imperfect human
memory is: I don’t remember October 2009
as being especially rainy, but when I look back at my journal from that year,
it’s clear it rained a LOT.)
As I showed you in my graphics from
yesterday, I really wanted to get a single, simple chart showing the rainfall
across the past decade of Octobers. It’s
often important to have a clear goal in mind, especially when you start wading
into the sea of data that’s out there. It’s really easy to get sidetracked by
all of the other beautiful charts out there and lose track of what you’re
seeking. (Which is why I hand-sketched
the chart: To make it clear what my goal was.)
Like many readers, I knew about a couple of
sources off the top of my head.
Wunderground is well-known (and is a composite of professional and
amateur weather stations). The data quality is variable, but they often have
data where there’s no other source available.
(For instance, there’s a Wunderground reporting station in my
neighborhood. Awfully handy.) And of course, you could have done the obvious search: [ rainfall SFO data ] or, if you've done this kind of thing before, [ precipitation SFO ].
I also knew that NOAA (the National
Oceanic and Atmospheric Administration, the official government weather
data collection group) would have data.
The problem with NOAA is usually digging down deep enough through their
reams of data to find the one you want.
Then, someone happened to mention
WeatherSpark.com, which has a wonderful set of weather visualizations and the
ability to drill down and select more-or-less what you want.
So I did what any discerning data junkie
would do… I got data from all three so I could compare them.
Data collection: Getting data from Wunderground is pretty
easy. Click on “Local Weather” and find
the “History Data” tab. That will get
you to data for a given date. Once you
find the “Monthly” button, you can find the cumulative precipitation for that
location (KSFO).
If you then
notice the format of the URL for the monthly data report:
http://www.wunderground.com/history/airport/KSFO/2012/10/31/MonthlyHistory.html
You can then change the year to 2011, 2010,
etc., and collect the data.
Pop that into your favorite spreadsheet, and
plot the graph. (Here's the link to my spreadsheet version of these charts.)
That took me only a couple of minutes. Of course, it helped that I knew about Wunderground to start
with.
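The URL trick above can be sketched in a few lines of Python. This is only an illustration: the URL template is the one quoted in this post (the site may have changed its format since), and the function name `october_urls` and the year range are my own assumptions.

```python
# Sketch: build one Wunderground monthly-history URL per October.
# The template is copied from the post; everything else is illustrative.
URL_TEMPLATE = (
    "http://www.wunderground.com/history/airport/"
    "{station}/{year}/10/31/MonthlyHistory.html"
)


def october_urls(station, first_year, last_year):
    """Return a list of (year, url) pairs, one per October, inclusive."""
    return [
        (year, URL_TEMPLATE.format(station=station, year=year))
        for year in range(first_year, last_year + 1)
    ]


if __name__ == "__main__":
    # Print the ten URLs you would otherwise edit by hand.
    for year, url in october_urls("KSFO", 2003, 2012):
        print(year, url)
```

Each URL still has to be opened and the monthly total read off the page; the sketch only saves the retyping.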
Comparison with NOAA: To get “ground truth” data from the
authority, I started my search at NOAA.gov – it didn’t take me too long to
click through their site (which is really the best way to do it—you have to
learn what they call things by reading their pages—they often use very
technical terms that you have to learn along the way).
But it was only a couple more minutes for me
to get to the National Climatic Data Center (NCDC) and find that I could ORDER
(for delivery via email) the data set for SFO.
I ended up on their data-set order page and got an
email from them a few moments later with a link to the SFO data set for 2000 –
2012!
I downloaded the data (which has a ton of
values and cryptic notation), poured it into my spreadsheet, opened it up and…
realized I needed to go read the documentation.
This is professional data, so I really needed to understand what things
like “Daily HGCN” meant and what that number in the HPCP column meant. (Turns out they measure rainfall in 1/100ths
of an inch, so a 25 in that column is really 0.25 inches.)
Fine.
I’m a data guy, so I opened up THAT spreadsheet, filtered out all of the
non-October values, added up the numbers and got the values for a decade of
Octobers. (The one thing I tripped over
was that one of the months had a HUGE rainfall—well over 10,000 inches! What happened? Turns out they use a number of 9999 to denote
“rainfall not measured,” so I had to go back and clean up the data a bit. Not a
big deal, but lesson learned—when there’s something funny in the graph, go
check it out.)
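For readers who would rather script this step than do it in a spreadsheet, here is a minimal sketch of the same filter-and-sum. The HPCP column, the 1/100-inch units, and the 9999 "not measured" code come from the description above; the `DATE` column name, its `YYYYMMDD...` layout, and the function name are assumptions, so check them against the documentation that comes with your own download.

```python
import csv
import io
from collections import defaultdict

MISSING = 9999             # code described above for "rainfall not measured"
HUNDREDTHS_PER_INCH = 100  # HPCP is recorded in 1/100ths of an inch


def october_totals(csv_text):
    """Sum October rainfall per year, in inches, skipping the missing code."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        date = row["DATE"]  # assumed layout: YYYYMMDD followed by a time
        year, month = int(date[:4]), int(date[4:6])
        if month != 10:
            continue  # keep only October readings
        value = int(row["HPCP"])
        # A missing reading still registers the year, but adds no rainfall.
        totals[year] += 0 if value == MISSING else value
    return {
        year: total / HUNDREDTHS_PER_INCH
        for year, total in sorted(totals.items())
    }
```

Leaving out the `MISSING` check gives you the same mystery spike I saw in the first version of the chart.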
Note how similar these graphs are. This makes me feel good. Especially
that both graphs have 0 inches of rainfall in 2002 and 2003.
Comparison with WeatherSpark: Using the WeatherSpark UI (check it out if
you’ve never used it) it’s pretty easy to select a decade’s worth of rainfall
data from SFO. I put it into my cart and
then went to checkout. Surprise! This data costs money! Given that I already had 2 data sets (including
one from the government), I was a little reluctant to spend $15 to get the
data, but I figured I’d be willing to do it for the blog.
So, one credit card transaction later I had
the data. Dropped into my spreadsheet
and… guess what… it’s the same.
Not too surprising I guess, but it’s another
lesson learned.
But the good news is that I feel pretty
confident that these graphs represent what has happened over the past ten
years. Including telling me that this
past October was just ordinary.
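The side-by-side comparison can also be done mechanically once each source has been reduced to one total per year. The sketch below is one hypothetical way to do it; the function name, the dictionary layout, and the 0.05-inch tolerance are all assumptions, not anything the three sites provide.

```python
def disagreements(sources, tolerance=0.05):
    """Return {year: {source_name: inches}} for years where sources differ.

    `sources` maps a source name to a {year: inches} dict. A year is flagged
    when some source has no value for it, or when the gap between the largest
    and smallest value is more than `tolerance` inches.
    """
    all_years = set()
    for totals in sources.values():
        all_years.update(totals)

    flagged = {}
    for year in sorted(all_years):
        readings = {name: totals.get(year) for name, totals in sources.items()}
        values = [v for v in readings.values() if v is not None]
        has_gap = len(values) < len(readings)
        if has_gap or max(values) - min(values) > tolerance:
            flagged[year] = readings
    return flagged
```

An empty result means the sources agree within the tolerance for every year; anything else is a year worth a second look.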
Lessons learned: There was a lot to learn here.
1. Triangulation is a best practice.
Get your data from multiple places.
Here, WeatherSpark just replicated the NOAA data, BUT they also
cleaned it up. In this case, it wasn’t a
big deal, but it could easily have been worth the money for their slicing/dicing
and cleaning of the data.
2. Check your data. That error code
(9999) might have been easy to miss if the numbers had been larger. Look for obvious errors (in this case, a
giant spike in the data), but also just eyeball it. Sometimes you’ll see things
that would escape plotting.
(Example: Suppose they’d used a
-1 to indicate an error. You’d never
see that in the plot because it would look just like a 0 at these scales, but
you could visually pick it out of the data.)
Addendum (11/9/12):
4. Make sure you're answering the right question. A few of our loyal readers answered the question... kind-of... When I was talking with people about this particular problem, it was clear that often they'd slip slightly off the rails and find data about "San Francisco" and never notice that they'd started by searching for data from SFO. "San Francisco" usually refers to the city (a place that's noted for its very different weather patterns than those at SFO). (Hat tip to reader Goon for reminding me of this.)
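The data check from lesson 2 does not have to rely on eyeballing alone. Here is a small sketch that flags both kinds of error code mentioned there: a negative value like -1 and an implausibly large one like 9999. The function name and the `max_plausible` threshold are assumptions; pick a threshold that makes sense for your own units.

```python
def suspicious_values(readings, max_plausible):
    """Return (index, value) pairs for negative or implausibly large readings."""
    return [
        (i, value)
        for i, value in enumerate(readings)
        if value < 0 or value > max_plausible
    ]
```

For example, `suspicious_values([0, 25, -1, 9999, 3], max_plausible=500)` picks out the -1 and the 9999, including the one that would never show up in a plot.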
Search on!
(And stay dry…)
Nice. I was useful!
Another fun challenge as usual and interesting lessons to take from it.
I would also add another note that you should keep in mind what you're searching for. I noticed a few people providing stats for San Francisco rather than just SFO and for annual rainfall rather than October rainfall.
That's right. I'll add this as a comment / update. Good catch.
Hi Dan, I enjoyed your Keynote at the last GAFE New England Summit.
Perhaps you can help a friend sleep :)
Here is a great search question
https://twitter.com/courosa/statuses/256238512108621824
I think I will add this to my next Internet search exam