As several readers pointed out in the comments, my analysis didn’t REALLY answer the question I’d asked.
They were correct. Thanks for picking up on this.
I found the languages by COUNTRY, then added up ALL the languages spoken in the country. That’s not what I asked for! I wanted the TOTAL number of unique languages spoken in each hemisphere.
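To make the difference between the two counts concrete, here is a minimal Python sketch. The country names and language lists are made-up placeholders, not Factbook data, and the function names are just for illustration.

```python
# Hypothetical placeholder data, NOT real Factbook entries.
languages_by_country = {
    "Country A": ["English", "French"],
    "Country B": ["English", "Spanish"],
    "Country C": ["French"],
}


def summed_count(by_country):
    """Add up the per-country language counts (counts shared languages repeatedly)."""
    return sum(len(langs) for langs in by_country.values())


def unique_count(by_country):
    """Count each language once, no matter how many countries speak it."""
    return len({lang for langs in by_country.values() for lang in langs})


print(summed_count(languages_by_country))  # 5
print(unique_count(languages_by_country))  # 3
```

The first number counts English and French twice each; the second is the count the question actually asked for.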
Okay. Let’s do this again. Better this time.
This time I pulled the list of languages from the CIA World Factbook page on languages of the world.
I copied it from the web page and pasted it as plain text into a .TXT file. This gave me a simple spreadsheet with country names in the left column and columns of languages to the right.
(From the CIA World Factbook; see link above.)
HOWEVER… I now had to do some data cleaning. Many of the language entries had comments (e.g., percentages of speakers of that language in the country) scattered throughout the listing. They’re useful for readers of the language list, but not helpful for the data analyst.
So I spent about 45 minutes cleaning up the data: removing percentages and deciding which small languages to keep. (I mostly didn’t keep them, and just called them all “other.” Sorry, speakers of regional dialects.) I also tried to canonicalize all of the variant spellings of different African languages, but I’m sure I didn’t get them all right. (So there’s a slight overcount of African languages.)
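Part of this cleanup could be automated. The sketch below is one way to strip percentages and parenthetical comments in Python. It assumes entries are separated by commas or semicolons, the sample string is invented for illustration, and `clean_languages` is a hypothetical helper name. It does not attempt the harder jobs of merging small languages into “other” or reconciling variant spellings.

```python
import re


def clean_languages(raw):
    """Split a raw language string into bare language names."""
    # Drop parenthetical comments such as "(official)".
    text = re.sub(r"\([^)]*\)", "", raw)
    # Drop percentages such as "50%" or "2.5%".
    text = re.sub(r"\d+(?:\.\d+)?\s*%", "", text)
    # ASSUMPTION: entries are separated by commas or semicolons.
    names = [part.strip() for part in re.split(r"[,;]", text)]
    # Discard empty pieces left behind by the removals above.
    return [name for name in names if name]


# Invented sample entry, for illustration only.
print(clean_languages("Alpha (official) 50%, Beta 35%; other 4.5%"))
```

A regex pass like this only gets you partway; the judgment calls still need a human looking at the list.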
After about 1 hour total, I had a spreadsheet of countries with a list of languages for each. (I’ve shared these spreadsheets with you so you can see what I did.)
I turned this into a Fusion Table of countries and languages spoken (so I could fuse it with my already existing Fusion Table, Country-LatLong).
Then I exported THAT Fusion Table as a regular spreadsheet so I could easily select the rows that are below the equator (that is, the rows whose latitude has an “S” in it).
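The same “latitude has an S in it” selection can be sketched in Python. This assumes each exported row is a dict with a `latitude` text field such as "12 30 S". The field names and the sample rows are hypothetical placeholders, not the actual spreadsheet contents.

```python
def split_by_hemisphere(rows):
    """Return (southern, northern) rows based on an 'S' in the latitude text."""
    southern = [row for row in rows if "S" in row["latitude"].upper()]
    northern = [row for row in rows if "S" not in row["latitude"].upper()]
    return southern, northern


# Hypothetical placeholder rows, for illustration only.
rows = [
    {"country": "Country A", "latitude": "12 30 S", "languages": ["Alpha"]},
    {"country": "Country B", "latitude": "45 00 N", "languages": ["Beta"]},
    {"country": "Country C", "latitude": "01 15 S", "languages": ["Alpha", "Gamma"]},
]

south, north = split_by_hemisphere(rows)
print([row["country"] for row in south])  # ['Country A', 'Country C']
print([row["country"] for row in north])  # ['Country B']
```

Note that this puts each country in exactly one hemisphere based on a single latitude value, just as the spreadsheet approach does.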
This gave me: https://docs.google.com/spreadsheet/ccc?key=0AhlpTzK9iG-2dDRDbE43SndrN1hOSW5jZ2FrVVEydVE#gid=0
Now we’re getting someplace!
Next I sorted by latitude, then copied all of the languages BELOW the equator into a simple text file (which I then imported into a spreadsheet) so I could remove duplicates easily with a spreadsheet function. In Google Spreadsheets, that’s the UNIQUE function. This step gave me the Master list of Southern Languages.
Did the same thing with the languages ABOVE the equator and created a spreadsheet Master List of Northern Languages.
Once again I used the unique function to identify the unique languages.
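For anyone working outside a spreadsheet, the de-duplication step has a short Python equivalent. This is a sketch, not a reimplementation of the spreadsheet function: it keeps the first occurrence of each exact string, in order, and the input list is an invented placeholder.

```python
def unique_in_order(items):
    """Return the distinct items, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# Invented placeholder column of languages, for illustration only.
column = ["Alpha", "Beta", "Alpha", "Gamma", "Beta"]

distinct = unique_in_order(column)
print(distinct)       # ['Alpha', 'Beta', 'Gamma']
print(len(distinct))  # 3
```

Exact string matching means that variant spellings of the same language still count separately, which is why the cleanup step matters so much.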
Now I can read off the numbers: the Southern Hemisphere has 90 unique languages, while the Northern Hemisphere has 230 unique languages.
Lesson learned: Doing these kinds of analyses can sometimes be tricky, even when you write the question!