Monday, May 16, 2011

Synthesizing knowledge with Google Spreadsheets

Sorry about the delay in answering.  I spent last week at the Computer-Human Interaction conference in Vancouver, hanging out with my user-experience colleagues and enjoying Vancouver's flowers and waterfront! 


As you recall from last week, my friend asked:  "Can you tell which of the Fortune 200 companies have been around for more than 100 years?"


It's a great question, but not an easy one.  Here's how I solved it.  I followed much the same process as Fred Deventhal (see his comments from last Wednesday).   Nice job, Fred! 


Another regular reader, Ross Nelson, tried to solve this using Google Squared. That approach almost worked, but fell apart because you don't have quite enough control over how it works. 


My solution... 


After poking around for a while with queries like [ Fortune 500 start date ] and [ Fortune 500 1700..1910 ] I realized that the start date of a company is typically called its "founding" -- as in, AT&T was founded on ....  


Tried that for a while, and got nowhere either.  So. HOW can I get the founding dates?


I knew that for any ONE company I could easily look it up (say, on Wikipedia, or the corporate web site).  But doing that repeatedly would quickly grow old.  How about if I start with a list that's easy to get--just the names of the Fortune 500, from which I'd pull off the top 200.


Changed my query to [ list of  Fortune 500 ]   that worked well and go me to this:


http://money.cnn.com/magazines/fortune/fortune500/2010/full_list/101_200.html


Now I've got the list.  I copied/pasted the top 200 company names into a Google Spreadsheet ("Fortune 200 Spreadsheet")  and then wrote a simple GoogleLookup function to take the company name and lookup (using Google) the "founded" date.  


If you look at column C (the "Date Founded") column, you'll see each formula looks like this: 


=googlelookup(B2, "founded")


This is the Google Spreadsheet Lookup function that searches for the "Founded" property on each of the company names from column B.


As you can see, it actually worked pretty well. I had to go a fix up only a couple of the entries (which returned the date plus a bit of extra text). The fixed-up dates are in column D ("Int version" -- where "Int" stands for "integer").
A little calculation later, and you get the years-as-a-company column (E).


If you look at the right side of the spreadsheet, you'll find two charts I made for fun. The first one "Years as a company") shows the distribution of companies by age. The downward trending slope is what you'd expect if there were companies founded roughly every year that made it to the Fortune 200. So that's not a surprise.


I then made a version with companies grouped by decade using the Spreadsheets 
histogram function.




Search lesson: When you've got a search task that requires looking up a lot of very similar pieces of data, consider using a tool to help out. This one worked pretty well. In future posts I'll talk about more complex ways to do similar kinds of searches over larger, more diverse data sets.


Search on!





No comments:

Post a Comment