Friday, March 23, 2012

Answer: Are there more languages ABOVE or BELOW the Equator?


The question was:

     Are there more languages ABOVE the Equator or BELOW the Equator? 

As I said, a big part of answering questions like this is making your terms very clear.  When we decide to consider only “Official” languages, and not, say, every small language spoken in New Guinea (which has thousands) or every language spoken in central Africa (more thousands), we REALLY simplify the problem. 

What this lets us do is search for a list of “Official languages.”  So just doing a search like [official languages] quickly leads to the Wikipedia page on official languages.  And that's not a bad place to start.  

But by looking around, I found that search ALSO leads to a languages-by-country table on Infoplease with more-or-less the same data in a somewhat more convenient form.  (Another search that works would have been [ list of languages by country ].) 

So I now have a table from Infoplease that looks like this:

To convert it into a form that we can manipulate, I just copied out the text of the table and saved it into a plain text file.  Handily, the copy/paste into a file automatically inserts TABs between the columns.  This will make it simple to import into a spreadsheet.

Now I need to figure out which of these countries have their capital in the northern hemisphere.  So I THEN did a search for [ country capitals latlong ] and found another handy table of country capitals with their associated lat-longs! 

I did the copy/paste to get this data from their web page into my plain text file.  I then imported that into another spreadsheet.  

Now I’ve got two spreadsheets, one with a list of languages-by-country, and another with country-capitals and their lat-longs.
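If you'd rather load the pasted text with a script than import it into a spreadsheet, here is a minimal Python sketch.  It assumes the pasted file has one country per line with a TAB between the columns; the function name and the sample line are mine, for illustration only.

```python
import csv
import io

# Illustrative stand-in for the pasted text file: one line per country,
# columns separated by TABs (an assumption about the pasted layout).
pasted_text = (
    "Afghanistan\tDari(Persian), Pashtu (both official),"
    "other Turkic and minor languages\n"
)


def read_tab_separated(text):
    """Split tab-delimited text into a list of rows, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row]


rows = read_tab_separated(pasted_text)
for country, languages in rows:
    print(country, "|", languages)
```

With a real file you would pass the file's contents instead of the sample string; a spreadsheet import does the same splitting for you.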



From the Lang-by-country spreadsheet I can easily write a spreadsheet function to count the number of languages in each country. 


A typical row looks like this:

    Afghanistan, Dari(Persian), Pashtu (both official),other Turkic and minor languages

The languages of each of the 199 countries in this table are separated by commas.  So, I write a short spreadsheet function to count commas in each row, and (voila!) I have a table of Country / Number-of-languages.  (I’m happy to let the phrase “…other Turkic and minor languages” be counted as 1 extra language.  You'll see it won't matter in the end.)  By this method, Afghanistan comes out with 3 languages: its 2 official languages plus 1 for the catch-all phrase.  Yes, there are many non-official languages as well, but only 2 are official. 
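The comma-counting trick is easy to sketch outside a spreadsheet too.  This is a hypothetical Python equivalent, not the spreadsheet formula itself; the function name is made up, and it assumes each row has the form "Country, language, language, ...", so the number of commas equals the number of language entries.

```python
def count_languages(row: str) -> int:
    """Return the number of commas in a 'Country, lang, lang, ...' row.

    Because the country name comes first, each comma introduces exactly
    one language entry. Catch-all phrases count as one entry.
    """
    return row.count(",")


row = ("Afghanistan, Dari(Persian), Pashtu (both official),"
       "other Turkic and minor languages")
print(count_languages(row))  # 3
```

Note that this shares the weakness of the spreadsheet version: any comma inside a language description would be counted as an extra language.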


In my other spreadsheet I have a table of 200 Country / Capital / Lat-longs. 

Obviously what I want to do is to use the Lat-longs to determine if the country is ABOVE or BELOW the equator, and then just count the number of languages in each group. 

This is exactly what Google Fusion Tables are designed to do.  

A Fusion Table can take two different tables that share a common element (in this case, the column labeled “Country”) and JOIN them together.  That’s 200 copy/paste operations I won’t have to do by hand. 

I created two Fusion Tables, one for each of my two spreadsheets. Using Fusion Tables "Merge" function, I just joined them together on their shared column "Country."  

 
I opened the Country-LatLong table and then FUSED it with the Country-Language-Count table.  This basically merges the two tables so that I now have a new table with columns for Country / Lat-long / Number of languages! 
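For readers without Fusion Tables, the same kind of join can be sketched in a few lines of Python.  This is only an illustration of joining on a shared "Country" key, not how Fusion Tables works internally.  The function name and the lat-long strings are placeholders I made up for the example; the two language counts are the ones discussed in this post.

```python
def join_on_country(latlongs, language_counts):
    """Join two country-keyed dicts, keeping countries present in both.

    Returns a list of (country, latlong, number_of_languages) tuples.
    Country names must match exactly, or the row is dropped.
    """
    merged = []
    for country, latlong in latlongs.items():
        if country in language_counts:
            merged.append((country, latlong, language_counts[country]))
    return merged


# Placeholder data: the lat-long format is an assumption.
latlongs = {
    "Afghanistan": "34 28 N, 69 11 E",
    "Argentina": "36 30 S, 60 00 W",
    "Country X": "10 00 N, 10 00 E",   # hypothetical row with no match
}
language_counts = {"Afghanistan": 3, "Argentina": 5}

for row in join_on_country(latlongs, language_counts):
    print(row)
```

The row for "Country X" disappears because it has no partner in the second table.  That is worth checking for in any merge, since spelling differences between sources ("Burma" vs. "Myanmar", say) silently drop rows.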

Once I have this, it’s easy to sort the table so that all of the countries above the equator (that is, those whose Lat-long entry contains an “N”) come first, and then write a simple =SUM(…) function to add up their language counts. 

Here’s what my table looks like now.  I converted the presence/absence of an “N” in the lat-long into a 1 (for ABOVE) or 0 (for BELOW).
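The flag-and-sum step can also be sketched in Python.  As before, this is a rough equivalent under assumptions: the function names are hypothetical, the lat-long strings are placeholders, and the test is simply whether the entry contains an “N”.

```python
def is_above_equator(latlong: str) -> int:
    """Return 1 if the lat-long entry contains an 'N', else 0."""
    return 1 if "N" in latlong else 0


def sum_by_hemisphere(rows):
    """Sum language counts for rows flagged ABOVE (1) and BELOW (0).

    Each row is a (country, latlong, number_of_languages) tuple.
    """
    above = sum(n for _, latlong, n in rows if is_above_equator(latlong))
    below = sum(n for _, latlong, n in rows if not is_above_equator(latlong))
    return above, below


# Two placeholder rows; the real table has one row per country.
rows = [
    ("Afghanistan", "34 28 N, 69 11 E", 3),
    ("Argentina", "36 30 S, 60 00 W", 5),
]
above, below = sum_by_hemisphere(rows)
print("ABOVE:", above, "BELOW:", below)
```

A plain substring test only works because the lat-long column holds nothing but coordinates; if a country or capital name were in the same cell, an “N” in the name would give a false ABOVE.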

And now we can just read off the answer: There are 546 official languages spoken above the equator, but only 164 spoken below.  That’s a big difference. 

I’ll leave it to you to speculate why that is.  But at least we’ve got a little data to go on.

And, since I was in Fusion Tables, it was trivial to visualize the data by mapping the number of languages onto the map-of-the-world and creating this map.  The darker the green, the more languages spoken there.  As you can see, the north has a LOT more than the south. 



SEARCH LESSON:  Often you’ll find that data is out there on the web, but it’s in an un-handy form.  In this case, the information we wanted wasn’t immediately available.  But with a little data extraction, spreadsheet jockeying and using Fusion Tables to merge two data sets, we can get to what we want fairly quickly. 

Keep your eyes open for these data-merge possibilities.  Once you start looking around, you’ll see all kinds of merging that leads to insights about the world.  

9 comments:

  1. Nice tool demo, exploring areas relatively unexplored by both search engines and researchers. But I am sorry to point out that the question is more complicated than it seems and the answer is flawed: in your demo, if a particular language is spoken in several countries, it is counted many times, without deduplication, which kind of misses the point of counting different languages...

  2. Daniel

    Interesting method, but a couple of problems concern me:

    a) You have Argentina as 5 languages (in one of the images above), but Argentina has only 1, Spanish. French/German/English etc. are not official languages.

    b) I don't think you can simply sum up the # of languages in your method. The question is the # of languages spoken; for example, if Chile and Argentina both speak Spanish, it should be 1 language spoken between the two rather than 2.

    c) Not sure how you adjust for that, but Switzerland has 4 official languages, German/French/Italian/Romansh; the Infoplease link does not make it clear that Romansh is an official language.

    Let me know your thoughts on this

  4. Interesting tools and process - merging does make the world go round, but often muddies insights.
    Language is a virus, and not always a healthy one.
    " A big part of answering questions like this is making your terms very clear."
    "After a few glasses of wine, various complex and learned theories were bandied about, and in the end we boiled it down to one really interesting search problem that we couldn't resolve with a quick Google search."
    ... clearly, the answer to this type of question is: more wine
    another way to "talk" about language:
    GDP

  5. Congratulations on the hard work, but I'm rather surprised. I see that France, which has only one official language (a few regional dialects or languages can be learned in some schools, but they are not spoken by even 0.01% of the population and are not used in everyday life; this is just to safeguard the cultural heritage), appears to have a number equal to or greater than Iran's. Very nice work, but with errors!

  6. Can you share the data you gathered? The Fusion Table, The spreadsheets?

  7. It is not an error. Officially, the state of Argentina recognizes only one language, Spanish, but 2 provinces also recognize other languages. Those are guaraní, qom, moqoít and wichí: the first one in Corrientes and the others in Chaco. So technically you could say that there are 5 officially spoken languages in Argentina.

  8. To Gero

    I am not particularly certain of the situation in Argentina; I know Spanish is the official federal-level language.

    If what you described is the case, then you will need to handle a whole slew of issues about a state-level language being different from a federal language.

    To give some examples:
    a) Italy - Italian, but the northern province recognizes German as a state language.
    b) China - Do you recognize Chinese, Cantonese, Shanghainese and a whole lot of regional languages as well?

    I think the problem becomes unmanageable if you drill down to the state level, and since we are talking about official languages, we need to be completely certain that we are talking about the federal level.

    Daniel, I think it is tremendous work but I do think there are some errors in the analysis.

  9. The same is the case in Spain: within the territories (called Comunidades Autónomas) there are official languages: Euskera in the Basque Country, Catalan in Catalunya and Galician in Galicia. They are actually recognized as official in the Spanish Constitution.

    I can also see the problem with counting the Spanish spoken in Argentina and the Spanish spoken in Chile, Uruguay, Paraguay, etc. as different languages. "Below the equator, Spanish is spoken in X countries" is the correct answer to the question as it was posed. But if we take the question literally (Are there more languages ABOVE or BELOW the Equator?), the one and only answer would be: it depends on what language every person speaks. I mean, if there is an Irishman speaking Gaelic in South Africa, Gaelic is spoken below the equator. So the point is that we don't take the question that literally. And I think that Daniel's answer is correct, although there are other possible answers.

    Sorry for my English. Don't count me as a Spaniard speaking English below the equator.
