Illustration from Tony Hirst showing how to use Refine to clean data.
I just found that Tony Hirst has written a very nice blog post that gives ALL of the details about that data wrangling step.
I admit that I gave it a little bit of short shrift. I said something like "... once you have that page, you can copy/paste the entire table into a text editor, and with a few passes, you can convert it into a nice CSV for importing into your favorite spreadsheet or visualization tool for analysis." That's a little glib, especially since that's what took all of my time.
So I'm really happy to give you the link to Tony's writeups on this.
Actually, he did TWO writeups.
1. How to wrangle the data using Google Spreadsheets.
2. How to wrangle the data using OpenRefine.
Tony shows you a couple of techniques that I know about but chose NOT to write about in the post. (My post was getting too long as it was.) For instance, he shows how to use:
as a way to wrangle a table from a page into Google Spreadsheets.
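If you'd rather script that step than do it in a spreadsheet, here's a rough Python sketch of the same idea: pull the first table out of a page's HTML and turn it into CSV. To be clear, this is not Tony's method and not what I did by hand. The names `TableExtractor`, `table_to_csv`, and the `SAMPLE` data are made up for illustration, and the sketch assumes well-formed HTML where every row and cell has a closing tag.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the rows of the first <table> found in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built, or None
        self._cell = None    # text fragments of the current cell, or None
        self._depth = 0      # number of <table> tags we are currently inside
        self._done = False   # True once the first table has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return the first HTML table in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample input, standing in for a page you have already downloaded.
SAMPLE = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Alpha</td><td>1,234</td></tr>
</table>
"""

print(table_to_csv(SAMPLE))
```

Note that the `csv` module quotes the cell containing a comma, which is exactly the kind of detail that's easy to get wrong when you do the conversion by hand in a text editor.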
But he ALSO shows how to use regular expressions in Google Spreadsheets (a technical way to do find-and-replace operations with sophisticated pattern-matching). I didn't know you could do that! Tip of the hat to Tony!
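To give a flavor of what regular-expression find-and-replace looks like, here's a small sketch using Python's `re` module. It is not the spreadsheet approach Tony describes, and the cell values and the `clean_number` helper are hypothetical, invented just to show the pattern-matching idea.

```python
import re

# Hypothetical messy cells, like those left over after pasting a web table.
raw_cells = ["1,234 km", " 56 km ", "7,890km", "n/a"]


def clean_number(cell):
    """Return the first number in a cell as an int, or None if there is none."""
    # Find-and-replace: delete commas that sit between two digits.
    no_commas = re.sub(r"(?<=\d),(?=\d)", "", cell)
    # Pattern match: grab the first run of digits that remains.
    match = re.search(r"\d+", no_commas)
    return int(match.group()) if match else None


cleaned = [clean_number(c) for c in raw_cells]
print(cleaned)
```

The point is that one pattern handles all the variations (extra spaces, missing spaces, thousands separators) that would otherwise take several manual find-and-replace passes.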
And it's worth reading Tony's second article about how to use OpenRefine. It's a powerful tool for transforming data from one form to another. I'll write about that myself one day. (Or maybe I'll just point you to Tony's writeup!)