Yesterday's challenge was simple:
Can you quickly create a visual representation
of world literacy rates by country?
I know that some of you will find this an easy problem--but in my classes, I find that lots of people don't even begin an analysis like this because they have no idea how long it will take. So the theme of this week's challenge is How to get started in visual data analysis!
This is an interactive world map I generated in a couple of minutes. (Obviously, this is just a static image of it--but when you click on any of those dots, you'll get a pop-up with that country's overall literacy rate, plus the male and female literacy rates, if available.) Note that I'm NOT recommending this as a great visualization method--I'm including it here just to suggest the kinds of things you might do. I'm sure you can do a better job!
There are many ways to get this data: I'm going to quickly outline 4 different methods.
1. Get the data yourself, then convert it. The quick and easy search:
[ literacy rate by country ]
leads to the Wikipedia page List of countries by literacy rate. Given Wikipedia's coverage, I sort of figured it would be there.
And once you have that page, you can copy/paste the entire table into a text editor, and with a few passes, you can convert it into a nice CSV for importing into your favorite spreadsheet or visualization tool for analysis.
You might also notice that the source of the Wikipedia data is largely the World Bank data set. So you could easily go to that site and download their CSV files about literacy rates and work from there.
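If you go the copy/paste route, the "few passes" in a text editor can also be done with a short script. Here's a minimal sketch in Python, assuming the pasted text comes out tab-separated (which is what most browsers give you when you copy a table) and that cells may carry footnote markers like [a] and trailing % signs. The function name and the sample row are made up for illustration; they are not real data.

```python
import csv
import io
import re


def pasted_table_to_csv(pasted_text: str) -> str:
    """Convert tab-separated text pasted from a web table into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in pasted_text.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from the paste
        cells = []
        for cell in line.split("\t"):
            cell = re.sub(r"\[[^\]]*\]", "", cell)  # drop footnote markers such as [a] or [12]
            cell = cell.strip().rstrip("%").strip()  # drop a trailing percent sign
            cells.append(cell)
        writer.writerow(cells)
    return out.getvalue()


# Placeholder input, NOT real literacy data.
sample = "Country\tTotal\tMale\tFemale\nExampleland[a]\t90.5%\t93.0%\t88.0%\n"
print(pasted_table_to_csv(sample))
```

You'd still want to eyeball the result before importing it, since real tables tend to have merged cells and stray header rows that a simple pass like this won't catch.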
2. Work done by others. If instead you suspect that other people have already done this analysis, you could search for:
[ literacy rate analysis ]
And you'll get a bunch of Fusion Tables put together by various people, including one that looks identical to the map I produced yesterday! Here's the working version mapping literacy rates onto an interactive map visualization.
Another way to think about this is to search for Google Fusion Tables specifically.
I assume you know about Google Table Search? Never heard of it? The simplest way to find it is to search for:
[ Google Table Search ]
It will take you to the Google search that specializes in data tables (especially Fusion Tables). When you get there, it looks like this:
As you can see, it's pretty open-ended. But if you do a search for:
[ literacy rate ]
you'll find a lot of interesting tables including many with other visualizations.
3. Google Public Data Explorer. Yet another option is to look for a resource that has collected together lots of data from other sources (and provides a set of visualization tools as well).
Google Public Data Explorer is often triggered when you search for data, for example with a search like:
[ unemployment rate California ]
4. Search for Images of infographics about literacy. An image search that includes the context term "infographic" or "visualization" will often find charts and graphs for you. (But note that you STILL need to check that they used a reputable data source and that they did all of the steps leading up to the visual display!)
Okay... so now we've got the worldwide literacy data. What can we do with it?
This isn't really a blog post about visualization or data mining, but I can't help pointing out a couple of things to you.
You could just pour all of this data into a Fusion Table and create a choropleth map. That's what I've done here--I just imported the country names into one column and the literacy rates into another, then selected "Chart>Map" and got this:
|Overall world literacy rates. Darker green is more literate (98%+). White is NA. The darker the red, the lower the literacy rate in that country.|
I then made a simple chart showing the difference in literacy rate by gender (I just subtracted the female literacy rate from the male literacy rate, since I had both values for each country). Here's that chart. (From this spreadsheet.)
Yeah... I know it's too wide to show accurately. (If you want to see the whole chart, click on the image above or go see the original spreadsheet I made.)
But what struck me was the tail on the far right side. See that? There are some places on earth where the female literacy rate is higher than the male literacy rate.
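If you'd rather do the subtraction in code than in a spreadsheet, here's a rough sketch of the same idea: compute male minus female for each country, sort from largest gap to smallest, and the negative values at the end are the "tail" where female literacy is higher. The function name, the row layout, and the numbers are all placeholders I made up for illustration, not values from the actual data set.

```python
def gender_gap(rows):
    """Return (country, male - female) pairs, largest gap first.

    rows: iterable of (country, male_rate, female_rate); a rate may be
    None when the source has no value for that country.
    """
    gaps = []
    for country, male, female in rows:
        if male is None or female is None:
            continue  # can't compute a gap without both values
        gaps.append((country, round(male - female, 1)))
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


# Placeholder rows, NOT real literacy data.
rows = [
    ("Country A", 90.0, 80.0),
    ("Country B", 70.0, 75.0),
    ("Country C", None, 60.0),
]

gaps = gender_gap(rows)
# Countries in the "tail": female literacy higher than male literacy.
female_higher = [country for country, gap in gaps if gap < 0]
print(gaps)
print(female_higher)
```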
I just used this data (maleLiteracy - femaleLiteracy) and mapped it onto the countries of the world.
|Literacy rates. Red are countries where female literacy is well below male literacy. Dark green are countries with the same literacy rates. Light green is where female literacy is slightly higher than male.|
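For what it's worth, the color scheme in that caption boils down to a small bucketing rule. Here's a sketch of one way to write it, with loud caveats: the 1-point tolerance for "the same" is my assumption (the map tool picked its own breakpoints), the function name is hypothetical, and using white for missing data is borrowed from the overall-literacy map.

```python
def gap_color(gap, tolerance=1.0):
    """Map a (male - female) literacy gap, in percentage points, to a color name.

    tolerance is an assumed cutoff: gaps within +/- tolerance count as "the same".
    """
    if gap is None:
        return "white"        # no data for this country (assumed convention)
    if gap > tolerance:
        return "red"          # female literacy below male literacy
    if gap < -tolerance:
        return "light green"  # female literacy higher than male literacy
    return "dark green"       # roughly the same literacy rates


# Placeholder gaps, NOT real data.
for gap in (12.5, 0.3, -2.0, None):
    print(gap, gap_color(gap))
```

A real map would probably shade the red by how big the gap is rather than using a single bucket, but this is enough to see which countries fall on which side.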
Fascinating. I leave it to you to interpret why that part of the world has more literate males than females (adult population, 15+).
Better yet, I leave it to you to find even more interesting relationships in the data!