Wednesday, August 31, 2022

SearchResearch Challenge (8/31/22): How can you find answers to those mysterious and inarticulable questions you might have?

 Our world is full of mystery... 

DALL-E, "thinking hard in the style of Picasso"

... and every day I find myself thinking about some question or another that pokes at my curiosity.  Often, this sends me searching, with the result that a number of these questions appear in SearchResearch!

As you've probably noticed, not all questions are clear, crisp, and simple to articulate.  Sometimes you need to figure out how to move from an inarticulate sense of a question to something that you can say aloud.  

This step--the conversion from an internal wondering to an externalizable question-about-the-world--is often tricky.  Sometimes we internally censor ourselves before letting the transformation happen, thinking "this question is too dumb" or "I can't figure out how to ask this thing..." 

I've seen this happen with students: they get caught up in something, but don't quite have the language to pursue the thread of interestingness, and so they drop it.  

But SRS exists to help get you out of your internal stuckedness and express your inner, curious child!  

So today's Challenge has a few of these questions that occurred to me over the past week or so.  

The Challenge for us is to figure out how to overcome our lack of language (that is, our inarticulateness), get past this and pursue a search strategy to get some answers.  

1. When I wake up in the middle of the night, my head sometimes feels "fuzzy" or somehow strange and different--a little as though my brain isn't working quite right. I assume that this happens to everyone. If I stay up for a while, it goes away.  And of course, when I awaken properly (at a reasonable time), I don't have this feeling at all.  Challenge:  What is this feeling called?  Is there even a term for it? Does it really happen to everyone?  (Really, does it happen to you too?)  

2. I remember reading a paper many years ago about the psychology behind why people often can't talk very accurately about why they did something.  This comes up most often in psychology studies when people are asked "why did you do that?" and pressed for an explanation.  People will give explanations about why they did something, but they're often not very accurate.  Challenge:  What is this effect called?  Can you find a scholarly article about why people are so bad at giving such explanations?  

3.  I know the word "colleen" is often used to refer to an Irish woman: for instance, "she's a lovely colleen".  Likewise, "sheila" is used in Australian English as a synonym for a woman, while in American English "jane" or "john" (Jane Doe, John Doe) often refers to a generic person.  Challenge: Is there a term for this idea, that some names are used as generic signifiers of categories of people?  Are there other names that are used in this way in English?  (Say, Indian English or Nigerian English.)  

For this suite of Challenges, I'm interested in your answers... but I'm REALLY interested in how you got from "vague idea" (or at least the "vague scribblings of Dan") to something that you could use in an online search. Can you talk about the process you went through?  (And yes, I'm aware of the irony of asking this given Challenge #2 above... still, we have to try!)  

Please let us know your thoughts in the comments below. 


Search on! 


Friday, August 26, 2022

How to find anything #7: How to Find News and Late Breaking Information--Summary

If you recall... 


a while back (August, 2021) we started a series of posts on "How to find anything."  This somewhat outrageous claim promised a series of posts that, when collected together, might form a kind of How-To-Search book organized by topic.  That is, how to search for... something!  

As part of that series, we started working on "How to find News and late breaking information."  Here are links to the first 3 posts on this topic.  

#8.1  - How to find News and late breaking information

#8.2 –How to keep track of your news sources 

#8.3 - Assessing credibility of news sources 

And then, later that month, I buried the summary of these "how to find news" in a post we wrote a while back  that wasn't part of the series.  Oops.  

That was a tactical error: all that work, and then the summary got lost in the midst of the blog.   

So I'm going to fix this accidental hiding.  In this post I re-edit and re-surface that summary here, making it easier to find. Today's post is the seventh in the "How to find anything series" and number 4 in the How to Find News and Late Breaking Information--Summary miniseries.  

In any case, here are the basics of what you should know when searching for reliable and credible news (aka late-breaking information).  


A. When you have a strong response to a story, check it before believing it! 
  Many stories are written to elicit a response, especially political stories.  When you find yourself being outraged, or remarkably pleased, consider yourself manipulated.  

That might be okay, maybe even desirable when you're reading fiction, but when it happens in your news reading, you should pause for a moment and try to read it without the emotion-inducing material.  Here's a made-up example: 

I can't believe Senator Smith voted for this outrageously expensive and immoral funding bill.  He should be barred from the senate for life!  What an irresponsible low-life.  

Now, if you read this without the over-the-top language, you get a very different read: 

... Senator Smith voted for .. this.. funding bill.... 

The rest of that paragraph is opinion.  You should form those opinions for yourself rather than just accepting the writer's point-of-view.  The opinion can be useful information, but when you find yourself reacting strongly to a story, try this "affect-free reading" style and see if you come away with the same information. 


B. Triangulate your sources.  The same story told by different viewers can be very different.  Don't make the mistake of thinking multiple sources means different reporters, different written accounts, or different points-of-view.  It is, unfortunately, all too common for a story (and even a data set) to be copied and repeated, ESPECIALLY in news that comes to you via social media.  Check for duplicates.  


C. Pull from different kinds of sources. Images, videos, long-form stories, news reports... they're all very different.  They have different production cycles, different ways of being edited, and very different impacts on the reader.  A nice article as seen in hardcopy newsprint media is rather different than a 10-second video summary of the news.  Short videos are intentionally punchy, even if they have to distort a few things to get your clicks.  


D. Cultivate a set of sources that you trust. You really should get to know more than your one most-trusted source.  For instance, I tend to listen to and trust NPR radio for accurate reporting.  But I also know about the BBC (in the UK) and other news outlets in the US, each with their own point-of-view.   You really should be able to quickly get to the top 4 or 5 sources that you really trust... and be able to say why you trust them.  


E. Cultivate a set of sources that give you another perspective that you don't agree with.  Filter bubbles are real, but they're mostly bubbles that we create for ourselves.  Don't be a bubblehead!  Think about the set of resources you read all the time and make sure you vary your diet.  (I subscribe to a couple of very conservative news feeds that put articles in my email every day.  It's useful to see how other people think and what they find valuable / believable.)  


F. Understand the background and point-of-view of the source you're reading.  This is true no matter what your source.  Realize that (for instance) the Wall Street Journal tends to be more conservative in their reporting than the New York Times.  Realize also that any good source typically has a suite of different viewpoints within it.  (Beware of any source that doesn't have some built-in diversity--that suggests they're perilously close to having a party line in their reporting.)  


G. Look for reporting that originates close to the story.  Several people have pointed out that reporting on stories with local reporters can be incredibly valuable.  Beware of stories that are filed remotely, without any reportage from the story location.  It's too easy for people to write about what they're told, rather than what they've experienced.  When in doubt, go for direct experience.  


H. Look for an author who has expertise in the subject matter.  Look at the writer's back catalog of stories--have they done this kind of thing before?  What have they done to give you a sense that they know what they're talking about?  (I'm very skeptical of writers who claim to understand the subtle and complex issues of the Middle East if they haven't spent substantial time there.)  


I. Do your own background fact checking.  You'll develop a sixth sense about this over time--you start to understand what basic facts to check in order to credibility-check an article.  If they don't get the basic facts right, then the rest of the article is dubious.  (Example: Check the numbers on a story--did they get those right?  When a place name is mentioned, check it out--does the image of the place agree with what's written about it?)  


J. Do things fit together?  That is, does what you're reading in an article fit with things you've learned before?  If this is a new topic area for you, this might be hard to evaluate.  But the more you learn about a topic, the better you'll be able to make these evaluations.  (And when you're stuck, spend a little time learning about the topic... it will make you a better judge of what you're reading.)  


REALISTICALLY... what will you do?

I don't know about you, but I'm pretty busy.  I don't really have the time and energy to check every single story I read.  You probably don't either.  In truth, almost nobody does.  So, what should you do at minimum?  

Typically, I do these four things for every story: 

1. Pay attention to the emotional level of the story.  If it's hot, I'll re-scan it using the "affect-free reading" trick from above.  Still interesting?  

2. Do a quick Google search to check on some basic element of the story. If it doesn't check out, I'm done.  And don't just do the easy things; check on the slightly harder things to look up, too.  You're a SearchResearcher!  Prove it! 

3. If I don't know the author and/or the network (channel, publication), I'll do a quick search on them (often using the -site: trick to exclude their own self-serving articles).  This will sometimes show up a low-quality site pretty quickly, and if it's a high quality channel, that usually shows up as well.  

4. Browse and read laterally.  That is, don't JUST read the news story and the links they provide--the authors have heavy motivation to give you just corroborating connections.  Instead, don’t spend a lot of time on the page or site until you've first gotten your bearings by looking at what other sites and resources say about the source you're reading at the moment.  
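If you check sources often, the -site: trick from step 3 is easy to script.  Here's a minimal sketch in Python.  The helper names (about_source_query, google_search_url), the network name, and the domain example-news.com are all made up for illustration; the only search syntax it leans on is the -site: exclusion operator mentioned above.

```python
from urllib.parse import urlencode


def about_source_query(name: str, own_domain: str) -> str:
    """Build a query for what OTHER sites say about a news source.

    The -site: operator excludes results from the source's own domain,
    which filters out its self-descriptions.
    """
    return f'"{name}" -site:{own_domain}'


def google_search_url(query: str) -> str:
    """Return a Google search URL carrying the query in the q= parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical source name and domain, for illustration only.
query = about_source_query("Example News Network", "example-news.com")
print(query)
print(google_search_url(query))
```

Paste the printed URL into a browser and you're reading laterally about the source before spending time on its pages.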


This list looks long, but it's usually a pretty quick set of things to do.  Pay attention; check a fact or two; check the publication source; make sure all parts of the story are coherent; read laterally.  I can do this (and you can too!) in less than 1 minute.  


There are obviously a lot more things one can do.   But I hope you'll make these fairly straightforward steps a regular part of the way you search for (and read!) news.  


As always, Search On!  




Wednesday, August 24, 2022

Answer: Horses are native to... where?

  The term native is fraught.... 

White Horse in Field by Helena Lopes (Pexels.com)

... in many conversations these days, but everything comes from somewhere, right?  

The Challenge for this week circles around the question "where did horses originate?"  

I'd read that horses in North America were brought here in the 15th century by Spanish conquistadors.  But then I read that there were horses in North America 10,000 years ago. They went extinct, and could only repopulate the continent with a little transportation help from the Europeans.  (I don't even want to think about traveling with horses on a 15th century sailing vessel!)  

In any case, this whole line of thinking brought up a deep question for me:  Are horses native to North America?  Or exactly where ARE they from? 

This leads to our Search Challenge this week: 

1.  Where did horses (as a species) come from?  That is, where are they native? 

As I said, for our purposes, we'll define "horse" as some version of Equus that developed roughly 5 million years ago.  

I like Adam's approach in the comment thread.  He asked the relevant definitional question, "What does native really mean?"  

     [ definition of native ] 

(Note that I didn't do [ define native ] because I wanted more detail on what a "native species" actually means.  For this post, I'm not interested in the socio-political ramifications of "native.")  

This led to the inevitable Wikipedia article, which tells us that: 

In biogeography, a native species is indigenous to a given region or ecosystem if its presence in that region is the result of only local natural evolution during history.  Every wild organism is known as an introduced species within the regions where it was introduced by humans. 

The notion of nativity is often a blurred concept, as it is a function of both time and political boundaries. Over long periods of time, local conditions change--so a native species might have to move to survive. As a consequence, their distribution is rarely static or confined to a particular geographic location. 

So, a native animal or plant is native (or indigenous) to a place if it evolved there.  Of course, "there" might shift over time as continents, coastlines, mountains, and river deltas come and go.  But for horses, what region did they first occupy? 

The story of horse evolution is long and complex (see this Britannica.com article for details on horse evolution), but the consensus of opinion is that by the time Equus emerges as the main type of horse, it had developed very clearly in North America, having evolved from Pliohippus some 4 million to 4.5 million years ago during the Pliocene. With time, it spread southward into South America and to all parts of the Old World by crossing over on the Bering land bridge by the early Pleistocene (that is, the Pleistocene Epoch:  from about 2,600,000 to 11,700 years ago). The Equus horses (there were many different species) thrived in the Americas throughout the Pleistocene but then, about 10,000 to 8,000 years ago, disappeared from both North and South America.  (See also "A brief history of the horse in America: Horse phylogeny and evolution," Ben Singer, Canadian Geographic magazine. Also: Mihlbachler, Matthew C., et al. "Dietary change and evolution of horses in North America." Science 331.6021 (2011): 1178-1181.)  

When the Bering land bridge was submerged about 10,000 years ago, the watery non-passage prevented any return migration of horses from Asia, and Equus was not reintroduced into its native continent until the Spanish explorers brought horses back from Europe.

So the big surprise to me is that horses are native to North America, but spread into Eurasia, then died out in the Americas around 10,000 - 8,000 years ago.  North America was then re-stocked with descendants of the horses that had crossed the Bering land bridge before it was swamped by the sea around 11,000 years ago.  (Source: a paper from the European Geosciences publication, "Climate of the Past")

2. What other animals are/were native with the early horses? Can you name a few of the megafauna that also lived in the same territory as the horse?  (I'm especially interested in other megafauna that might have interacted with horses.) 

This was relatively straightforward, but since I really didn't know the answer, I wanted to see a bit of the natural history of North America in the Pleistocene.  My query was: 

     [ North American megafauna Pleistocene horse ]

which gave me multiple lists of critters, including a complete list of the different kinds of horses that wandered the plains (stout-legged and stilt-legged horses such as Haringtonhippus or conversidens).  

There were also many other kinds of animals roaming around--horses, camels (as known from a site where horses and camels were trapped and butchered on the spot), the Megalonyx (giant ground sloth 3m long, 1000kg), the American lion, the Glyptotherium (a large rounded tank-like armadillo-like creature), Aiolornis (a giant vulture-like bird), and Woolly mammoths (Mammuthus primigenius).  

While I'm not sure you'd see all of these in the same place at the same time, they were all co-existing with the horses of North America, and would have made for a wild safari.  


SearchResearch Lessons

There's really just one lesson from this week, and you've heard me say this before... 

1. Answering questions needs multiple sources.  Yes, it's pretty easy to find a single source to answer the Equus native point of origin story, but for confidence, I really wanted several sources.  I used Wikipedia, but also Britannica, European Geosciences, the journal Science, and the Canadian Geographic--all very different sources, but all reputable.  I was careful to find very different sources that linked together, but did not duplicate data or text.  

I was also looking for sources that would tell a complete story--not just where they started from, but also why they disappeared... and how/why they were preserved in Eurasia. That was a huge surprise, and I'm glad we were able to find out what really happened.  

Enjoy! 

Wednesday, August 17, 2022

SearchResearch Challenge (8/17/22): Horses are native to... where?

 Everything comes from somewhere.... 

White Horse in Field by Helena Lopes (Pexels.com)

... right?  

The other day I read that horses in North America were brought here in the 1400s by Spanish conquistadors.  As you know, they rode them all across what was once known as Spanish America. 

But then another day I read that there were horses in North America 10,000 years ago. 

What happened?  I know this part of the story--the horses of North America went extinct along with most of the other New World megafauna during the Quaternary extinction event at the Pleistocene-Holocene transition. 

While the causes have been widely debated, their disappearance was rapid. Was it climate change? (Beginning around 12,500 years ago, the grasses characteristic of an open plains ecosystem radically changed.)  Or was it people? Was it just due to overexploitation of large animals by those newly arrived humans? 

In any case, this brought up a deep question:  Are horses native to North America?  Or exactly where ARE they from? 

This leads to our Search Challenge this week: 

1.  Where did horses (as a species) come from?  That is, where are they native? 

For our purposes, we'll define "horse" as some version of Equus that developed roughly 5 million years ago.  Where did THEY develop?  Where are they from? 

The challenge here isn't really to find the information (that part is simple); the Challenge is to figure out what it means to be native (which I take to mean "historically grew and developed in a particular place") and how we know that history about horses. 

Bonus Challenge: 

2. What other animals are/were native with the early horses? Can you name a few of the megafauna that also lived in the same territory as the horse?  (I'm especially interested in other megafauna that might have interacted with horses.) 

What can you find out?  How do you know?  

Enjoy! 

Search on! 



Friday, August 12, 2022

Answer: What's this rusty thing I found in the woods?

  Remember this? 

All photos P/C Dan. July 8, 2022

As you might remember, I went for a run and spotted something large, rusty, and hovering just behind the trees on the side of the road.  This time, I was jogging down a quiet country road in the eastern Pennsylvanian Poconos.  Naturally, I had to stop for a few minutes, take a few pictures, and save my SRS for later.  

As you can see, this is a very large, very rusty, very old drilling rig that was abandoned years ago.  It's at least 60 feet (20 m) tall, and has several large wheels at the bottom. 



And, naturally, my curiosity was piqued:  What was this?  Why is it here?  How long ago?  



(You can go back to the original post to see even more images that I took that day.)  

I grew up in Southern California, so I know that the pictures shown here are all of some kind of drilling rig.  FYI: These images are all from 41°16'52.9"N, 75°19'16.1"W (41.2813694, -75.323322).  
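Many map tools want decimal degrees rather than degrees/minutes/seconds, so it helps to be able to convert between the two.  Here's a small sketch of the arithmetic in Python; the function name dms_to_decimal is my own invention, not something from a mapping library.

```python
def dms_to_decimal(degrees: int, minutes: int, seconds: float, hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    By convention, South latitudes and West longitudes are negative.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# The DMS coordinates quoted in the post.
lat = dms_to_decimal(41, 16, 52.9, "N")
lon = dms_to_decimal(75, 19, 16.1, "W")
print(f"{lat:.6f}, {lon:.6f}")
```

One arc-second is only about 30 meters of latitude, so a decimal pair copied from a map pin can easily differ in its last few digits from one converted this way.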

Those pics form the basis of the SearchResearch Challenges for this week.   

1. What kind of rig is this?  (Is it drilling for water?  Oil? Gas?)  

2. When was this drilling rig first set up?  

3. Who owns this thing?  And what's its current status?  (Obviously, it's not in operating condition--but it might still be a viable well.)  


As mentioned, I had a really good idea that this was an oil drilling rig (I've seen literally thousands of them during my formative years in LA).  So I started with this query:  

    [ oil drilling in the Poconos ]

The first result was a map of oil and gas wells in Pennsylvania (on a site run by American Geosciences Institute).  

When you then click on the interactive map in the center of the page, it takes you to a map view of oil, gas, methane wells on a site run by the Pennsylvania state Department of Environmental Protection (DEP).  The map they produce lets you see this: 

Map adapted from PA state Department of Environmental Protection


We don't really (yet) know what kind of well this is, so we need to start our search broadly--so in the menu on the left, select all types of wells, then “all status,” then “well designation” (conventional and unconventional), then “Submit request.”  You'll see a bunch of new dots appear. Finally, we can zoom into Mountain View Road in this corner of PA.  This is a closeup view, and the blue dot is the well we seek:  

THEN select the “I” (information) tool in the window (look at the menu that's in the upper left corner of the map), and now one can click on the blue dot and see: 





Combing through the data here (in particular, the "Display inspections" report), we find that this is: 

Site ID: 171768

Site Name: JENNIE HAAG TPC 19 OG WELL

“well record says that the well was temporarily abandoned” 

Permit:  37-103-20002 – issued 9-17-59

Owner: Transcontinental Prod Company 

744 Broad St NEWARK, NJ. 07102

Since there's really no other well nearby, and since the Site ID is very clearly the same as the location of my photo, I'm convinced this is the same site.  

I would have sworn that this rig was at least 100 years old, but the data is pretty clear--it's only from 1959!  

Farther down in the search results I also found WellWiki.org -- when you do a search for the duplicate information (using the permit number we found above) on their website, you'll get to:  https://www.wellwiki.org/wiki/37-103-20002 

WellWiki also gives easy access to the last well inspection (done on 2017-03-02), where the report was:  

Inspection of the Jennie Haag (103-20002) well in Greene Twp., Pike County conducted March 2, 2017 at 12:00 pm. The well is in the woods, East of Mountain View road (PA Gas Mapping coordinates and old location plat are accurate). You cannot miss the well because there is still a 60-70’ abandoned rig on location. I walked up to the rig, it was very rusted. Below the rig was a wooded cellar filled with dirt, leaves, wood, and some newer pieces of trash. There was a cut tree stump in the cellar where you would expect the well to be. I lifted it up; no well or monument. No discharge or flow. I dug around a bit and could not find a well. The well record says the well was temporarily abandoned; not plugged. However; A PA Geology report published in 1960 states this well was plugged and abandoned.

That pretty much accounts for the well.  Probably drilled in 1959, but it was a dry hole and abandoned in 1960.  I don't know why the Transcontinental Prod Company just left everything there in the woods--that's still a puzzle.   

I did the obvious few searches to find out more about TPC, but WellWiki.org was the best source of accumulated information.  There I learned that they had 14 wells across Lackawanna, Luzerne, Pike, and Wyoming counties in Pennsylvania, and that they had their office at 744 Broad St, Newark, New Jersey 07102-3802 (which is a rather fancy building in downtown Newark!), but the obvious newspaper checks didn't reveal much.  

Interestingly, in pursuing this, I also found the website MineralAnswers that also aggregates information about oil/gas wells across the country.  It's a paid subscription, so I bought one year’s worth for around $40—anything for the SRS cause!  

Using MineralAnswers, I found all the same information for Transcontinental Prod Company--the only new data is that all of their permits were issued in 1957 – 1959, with no records after that.  What's more, the status of their wells is uniformly "Drilled uncompleted."  They seem to be all gas wells that were dry.  It's not really a surprise when you look at the full map of PA wells--this particular well is well east of the productive regions of the state.  I spent a good deal of time searching for any trace of the company, but with luck like this (14 wells, NONE productive), I'm not surprised they vanished into the sands of time.   Ah, well.  

Map adapted from PA state Department of Environmental Protection


There are clearly a LOT of active wells in Pennsylvania, but another nearby well in Pike county (one that's currently flowing water, but no gas) is the "Walter Hess TPC-5."  Water is always good, but I think they would have preferred oil or gas.  (Same company, different permit: 1958-5-2)


But now I was curious about what an old-fashioned oil drilling rig looked like.  Was this one typical?  Was this really used for drilling back then? 

My search was for:

   [ oil rig 1950s diagram ] 

Leading to a bunch of nice images, including this one of a cable drilling rig that I've modified a bit to clarify the layout of the device. Note that the images above clearly show a cable drilling rig, not a rotary-drilling outfit; that's where the big wheels come from!   

Cable drilling rig diagram adapted from the Elsmere Canyon site about the history of oil drilling

You can see how this rig is the precursor to what I found in the woods... the derrick, the wheels and even the bands running between the wheels.  On the far left (in the little house, #33) is the engine (#20) that powers the whole thing.  Once you see this diagram, the wreck in the woods makes a lot more sense.   (You can see a modern cable drilling rig in this YouTube video.)  

SearchResearch Lessons 


1. Obvious searches can lead to useful sites that are not searchable!  At the beginning of this Challenge, we did a fairly simple search, but then had to spend considerable time searching on the gas/oil well database sites.  This is a reminder that not all information is accessible to Google... 

2. When using database sites, you have to know how to drive their tool.  In the PA DEP interactive web map, you need to figure out not just how to search for ALL well types, but also that you need to click the "I" (information) tool before clicking on the geo-located dot on the map.  The instructions aren't really clear, and that's fairly typical for lots of professional sites.  If you think there's a way to find the information, there probably is... you might need to spend some time exploring the site.  

3. Repeated data isn't additional credibility! (It's just a repeat.)  In this episode we found three sites with well information.  But it's ALL THE SAME information! Most likely, the state DEP site is the original source of the data.  (Why?  Because it's their job to collect and organize that information.)   
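Lesson 3 can be made concrete: what matters is the number of DIFFERENT records, not the number of sites.  Here's a rough sketch in Python, assuming you've copied each site's record into a dict by hand.  The source labels (source_a, etc.) are placeholders, and the field values are just the permit number and owner quoted in this post, retyped with different capitalization and spacing to mimic how aggregators reformat the same data.

```python
import hashlib


def fingerprint(record: dict) -> str:
    """Hash a record after normalizing case and whitespace, ignoring key order."""
    normalized = sorted(
        (key.strip().lower(), " ".join(str(value).lower().split()))
        for key, value in record.items()
    )
    return hashlib.sha256(repr(normalized).encode("utf-8")).hexdigest()


def count_distinct_records(records_by_source: dict) -> int:
    """Number of genuinely different records across all sources."""
    return len({fingerprint(rec) for rec in records_by_source.values()})


# Hand-typed, illustrative records; not an export from any real site.
records = {
    "source_a": {"permit": "37-103-20002", "owner": "Transcontinental Prod Company"},
    "source_b": {"Permit": "37-103-20002", "Owner": "TRANSCONTINENTAL PROD COMPANY"},
    "source_c": {"owner": "Transcontinental  Prod Company", "permit": "37-103-20002"},
}
print(len(records), "sources,", count_distinct_records(records), "distinct record(s)")
```

Three sources that collapse to one fingerprint are one source, as far as credibility goes.  This only catches exact repeats after light normalization; paraphrased or partially copied data would need a fuzzier comparison.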


Hope you enjoyed this Challenge... even if it did take a couple of weeks to get back to this.  (Note that we'll be back on our regular schedule starting next week, August 17.) 
 

Search on! 




Wednesday, August 3, 2022

Answer: What's a large US city with very low population density?

Let's work backwards...   


... and answer the last SRS Challenge first... you remember, the one about finding the largest US city with a very low population density!

I'll answer the previous Challenge ("rusty thing I found in the woods") before next week and get us back on track.  

If you recall, our Challenge was:  "what large US city has the lowest population density as of 2020?"  

There are tables one can find that will tell you one answer, but I'd like you to solve this Challenge in a more direct way--a way that will teach you how to download data directly into a table and then manipulate it yourself.  

Can you do this hands-on data manipulation Challenge? 

Here's what I want you to do:  

1.  Search for a table of the largest US cities by population.  You'll want to find a table with at least 330 entries in it.  

2. Download that table into a spreadsheet. 

3. Compute the population density (if you need to... it might be a column in the data set).  

4. Sort the table by density, and then tell us what the city name is!  
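If you'd rather script steps 3 and 4 than do them in a spreadsheet, the computation is just population divided by land area, followed by a sort.  Here's a minimal sketch in Python.  The City class and the three rows are placeholders I made up for illustration (round numbers, not census figures); substitute the rows from whatever table you downloaded.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    land_area_km2: float

    @property
    def density(self) -> float:
        """People per square kilometer."""
        return self.population / self.land_area_km2


# Placeholder rows for illustration only; not real census figures.
cities = [
    City("City A", 500_000, 250.0),
    City("City B", 300_000, 3_000.0),
    City("City C", 150_000, 100.0),
]

# Sort ascending so the least dense city comes first.
by_density = sorted(cities, key=lambda c: c.density)
for city in by_density:
    print(f"{city.name}: {city.density:.1f} people/km2")
```

Note that the least dense city here is neither the largest nor the smallest by population; it's the one with the most land per person, which is why you need both columns before you sort.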

Your table should look like the one above (hint: I got it from Wikipedia, but you can find your own source if you'd like--the diversity of data sources might be interesting).  

Here's what I did... 

First search for the Wikipedia table.  My query was: 

     [ wikipedia table largest US cities population ] 

which led me to the Wikipedia table List of US cities by population.  It's a classic Wikipedia entry with all the standard data disclaimers and information (e.g., This table lists the 331 incorporated places in the United States (excluding the U.S. territories) with a population of at least 100,000 on July 1, 2021, as estimated by the United States Census Bureau.  The table displays: (a) The city rank by population as of July 1, 2021, as estimated by the United States Census Bureau, (b) The city name, etc etc etc.)  

If you look carefully, you'll see that the columns can be sorted by clicking on the sorting widget at the top of each column in the table.  (You need to recognize that that's what these widgets do--it's part of your SRS visual literacy: recognize and understand what UI widgets are and what they do.)  


If you click on that widget, you can sort by population density: 


And voila, you've got the answer.  That's the fast and easy way to get to the answer.  

BUT... the point of the Challenge is to get you to figure out how to download this table and then manipulate it to get to the same answer.  (That is, the pedagogical point of this Challenge is to learn how to download data tables from the web and then how to work with them.)  

Do you know how to pull data tables off the internet and into your favorite spreadsheet?  

Well, the obvious SRS way to find out is with a search: 

     [ how to import data tables into Google sheets ] 

which will lead you to a number of sources, including this well-written and extensive post by Parul Pandey about Importing HTML tables into Google Sheets or this YouTube video from Teacher's Tech about How to Import Data from Webpages into Google Sheets.  These are excellent resources, and for full details about how to do this, I recommend those pages.  

For our purposes, I'll cut to the chase and point you to my Google Sheet with the population data in it.  This sheet looks like this: 

There are a couple of things to note here.  First, cell A1 has the magic function in it: 

     =ImportHTML (URL, "table", 5) 

which says to import the 5th table of that web page (the Wikipedia link) into the sheet as a table. I had to experiment a little to figure out that it was the 5th table, but I guessed it was #5 on the second try.  It's easy to just keep trying until you get the right table.  (Look at the Wiki page and count down from the top of the page.)  

This imports the 5th table into that location.  If you look back and forth, you'll see it's a complete copy of that data table.  
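To see what "the 5th table" means under the hood, here's a sketch of the same idea in Python using only the standard library: walk the page's HTML, collect each table, and hand back the one at a given position.  This is my own illustration, not how Sheets implements ImportHTML; the names TableExtractor and import_html_table are hypothetical, the tiny page is made up, and the sketch doesn't handle nested tables.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> on a page as a list of rows of cell text.

    Simplified: nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per table seen so far
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html: str, index: int) -> list:
    """Return the index-th table from an HTML string (counting from 1)."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]


# A made-up page with two tables: a navigation box, then the data.
page = """
<table><tr><th>Nav</th></tr></table>
<table>
  <tr><th>City</th><th>Density</th></tr>
  <tr><td>City A</td><td>2,000</td></tr>
</table>
"""
print(import_html_table(page, 2))
```

This also shows why counting tables by eye can be off: small layout or navigation tables count too.  Notice that the cell values come back as text ("2,000"), not numbers.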

That's pretty straightforward.  



Next thing to notice: I put that table into Tab 1 of the sheet ("Datatable import"), and then did all my manipulations on a COPY of the sheet that I made in Tab 2.  If you look at Tab 2 ("Cleanedup data"), you'll see that it's where I did my cleaning up.  This is a good practice to follow--don't muck up your original data set as you're exploring.  

Note that when the data is imported, it's often imported as TEXT data, and perhaps not the numeric data you're seeking.  

In particular, column J is the 2020 population density in people/km2, and it's a text field, not a number.  So I initially wrote this formula to extract the number from column J.  


And that LOOKS right.  But as we know, appearances can be deceiving.  I thought it was right, but when I sorted the column, I noticed that the sequence was bizarre.  I saw patterns in the data that looked like this: 


... which is clearly very wrong.  The problem is that column O is all TEXT... and sorting text like this gives you a sort where 7,681 precedes 706. 
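The same trap exists outside of spreadsheets.  Here's a small Python sketch of the problem and the repair.  The values 7,681 and 706 are the ones mentioned above; the other two are made up, and to_number is a hypothetical helper name.

```python
def to_number(text: str) -> float:
    """Strip thousands separators and convert the text to a number."""
    return float(text.replace(",", "").strip())


densities_as_text = ["7,681", "706", "1,250", "98"]

# Text sort: compares character by character, and "," sorts before "0",
# so "7,681" lands ahead of "706".
print(sorted(densities_as_text))

# Numeric sort: convert each value first, then compare the numbers.
print(sorted(densities_as_text, key=to_number))
```

The first line prints the values in the bizarre order described above; the second prints them from smallest to largest.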

Once I realized that, I fixed up the extraction formula to give a real numeric value.  This is that formula (with the extra  =value(...)  in it):  


NOW I can sort by Column L ("Density Value") and see the right names of cities appear at the top: 



So... there's a quick and easy way (just use the built-in sort on the Wikipedia page), and also a more sophisticated way to download the data to your own sheet.  Of course, this then allows you to do more things with it--say, create a chart like this: 



SearchResearch Lessons


Two key points from today.  

1. Use the ImportHTML function to pull data tables from web pages into Sheets.  Incredibly handy when you need to get your hands on the actual data. 

2. ALWAYS check that your data is what you think it is.  In this story, I mistook what LOOKED like numeric data for numbers, but noticed that the sort order was all messed up.  When I converted those text numbers to actual numbers, sorting suddenly started working.  

I guess the summary is, as always, pay attention to what you're doing.  Keep asking yourself, does this make sense?  And when it doesn't, dive into full-on SRS mode and figure out what's happening.  

Search on!  





P.S.  Fairness requires that I point out that Microsoft Excel has a VERY nice import data function built into it.  It even lets you browse through the tables in the source web page rather than trying to figure out the number of the table to import.  See Importing Data into Excel from the Web.  It's actually very handy.