Friday, September 19, 2014

Answer (delayed)...

Sorry folks... this really hasn't been my week at all.  After coming back from my dive trip, I'm still working through the tail-end of my cold/flu thing, while simultaneously trying to get a bunch of unexpected things done at work.  Usually I can just get up earlier in the morning and get my SRS writing done, but this week I just couldn't quite pull it off.  

Luckily, the weekend is coming, and I'll catch up on this long conversation on Monday.  More maps, more analysis, more information.  

Have a great weekend!  See you then. 

-- Dan 

Wednesday, September 17, 2014

Answer (Part 1) to: Can you find the places Twain mentions in "Around the Equator"?

I have to start off by saying that this really is a complicated and difficult challenge.  But the SRSers rose to the occasion.

Answering this is slightly complicated as well, so I'm going to write this up in two (or three) parts.  

Here's installment #1, which is really a story of how to keep digging in, learning things along the way, and finally coming up with something that works.  

Entity identification in arbitrary text.    

When I sat down to do this Challenge I had an advantage--I already knew about the idea of "entity identification" (aka "named-entity recognition").  The idea is that your computer can scan a text (say, Following the Equator) and automatically identify named entities--the names of cities, rivers, states, countries, mountain ranges, villages, etc.    

Just knowing that this kind of thing exists is a huge help.  All I figured I'd need to do is to find such a service and then use it to pull out all of the entities from the text. 

My plan at this point was just to filter them by kind, merge duplicates, clean the data a bit, and I'd be done.  

But things are never quite this easy.  

My first query was for: 

     [ geo name text entity extraction ] 

which leads to a number of online services that will run an entity extractor over the text.  

The one I tried first, Alchemy, looks like this: 

You can see that I downloaded the full text from Gutenberg onto my personal web server and handed that link to Alchemy.  

I thought that this would be it--that I'd be done in just a few moments.  But no.  Turns out that you can't just hand Alchemy a giant blob of text (like the entire book); you have to do it in 50K chunks.  

That is, I would have to split up the entire book (Twain-full-text-Equator-book.txt) into a bunch of smaller files, and run those one at a time.  

Since the entire book is about 1.1 MB, that means I'd have to create 22 separate files, each with at most 49,999 bytes (22 x 49,999 = 1,099,978 bytes, just under 1.1 MB).  

I happen to know that Unix has a command called split that will do that.  I used the split command to break it up into 22 files and I moved those all back out to my server.  
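For anyone without a Unix shell handy, here's a minimal Python sketch of what something like `split -b 49999 Twain-full-text-Equator-book.txt Twain-part-` does: cut a file into fixed-size byte chunks named with two-letter suffixes (aa, ab, ...). The post doesn't give the exact split invocation, so treat those flags and the `Twain-part-` prefix as assumptions (the prefix matches the part names used below).

```python
import itertools
import string
from pathlib import Path


def suffixes():
    """Yield 'aa', 'ab', ..., 'zz', the same default naming Unix split uses."""
    for a, b in itertools.product(string.ascii_lowercase, repeat=2):
        yield a + b


def split_bytes(src, prefix, chunk_size=49_999):
    """Write src into chunk_size-byte pieces named prefix+'aa', prefix+'ab', ...

    Like `split -b`, this cuts at byte offsets, so a word (or a place name!)
    can be sliced in half at a chunk boundary.
    """
    data = Path(src).read_bytes()
    written = []
    for name, start in zip(suffixes(), range(0, len(data), chunk_size)):
        out = Path(f"{prefix}{name}")
        out.write_bytes(data[start:start + chunk_size])
        written.append(out)
    if len(written) * chunk_size < len(data):
        raise ValueError("too many chunks for two-letter suffixes")
    return written
```

Calling `split_bytes("Twain-full-text-Equator-book.txt", "Twain-part-")` on a file of roughly 1.1 MB gives about 22 parts. The byte-boundary caveat in the docstring is worth remembering later: a name cut in half at a boundary is one way an extractor can miss a place.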

At this point my natural inclination would be to write a program to call the Alchemy API.  The program would basically be something like: 

for each file in Twain-Docs: 
     entities =  Alchemy-Api-Extract-Entities( file ) 
     append entities to end of entitiesListFile 

Which would give me a big file with all of the entities in it.  But I didn't want this to turn into a programming problem, so I looked for a spreadsheet solution.  
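If you did want to take the programming route, here's a minimal Python version of the loop above. `extract_entities` is a hypothetical stand-in for whatever entity-extraction call you use (no Alchemy client code is reproduced here); all it has to do is return (type, name) pairs. Writing one tab-separated row per entity is just a convenient choice for pasting into a spreadsheet later.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity type, entity text), e.g. ("City", "Paris")


def collect_entities(
    chunk_files: Iterable[Path],
    extract_entities: Callable[[str], List[Entity]],  # hypothetical extractor
    out_file: Path,
) -> int:
    """Run the extractor over each chunk and append results to out_file as TSV.

    Returns the number of entity rows written.
    """
    count = 0
    with out_file.open("a", encoding="utf-8") as out:
        for chunk in sorted(chunk_files):  # aa, ab, ... keeps the book's order
            text = chunk.read_text(encoding="utf-8", errors="replace")
            for etype, name in extract_entities(text):
                out.write(f"{etype}\t{name}\n")
                count += 1
    return count
```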

Turns out that Google Spreadsheets has a function that lets you do exactly this.  You can write this into your spreadsheet cell:  

     =ImportXML (url, xpath)  

where url is the URL of the AlchemyAPI and xpath is an expression that says what you're looking for from the result.  

Basically, the URL looks like this (I learned all this by reading the Alchemy documentation):

Let's decrypt this a bit.... 

The first part:

tells Alchemy that I want it to pull out all of the "RankedNamedEntities" in the text file that follows. 

The second part:  apikey=XXXXXXXXXX

tells Alchemy what my secret APIKey is.  (Note that XXXXXXXXXX is not my API key.  You have to fill out the form on Alchemy to get your own.  It's free, but it's how they track how many queries you've done.)  

The third part:  &url=

is the name of the file (neatly less than 50K bytes long) that I want it to analyze.  

Now I make a spreadsheet with 22 of these =ImportXML(longURL, xpath) calls, one for each file.

Here's my spreadsheet (but note that I've hidden my APIKey here).  

You can see the "Alchemy base url:" which is the basic part of the call to Alchemy. 

The "Composed URL" is the thing we hand to ImportXML.  That is, it's basically the: 

     AlchemyBase + analysisAction + APIkey + baseTextFile
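The same composition can be sketched in Python. The base URL and action name below are placeholders (the post hides the real ones), and the assumption is that a `?` separates the action from the query string; `apikey` and `url` are the parameter names shown above. One design difference from plain string concatenation: `urlencode` percent-encodes the text file's own URL, so its `:` and `/` (or any `?` or `&` it might contain) can't be confused with the outer query string.

```python
from urllib.parse import urlencode


def compose_url(base: str, action: str, api_key: str, text_file_url: str) -> str:
    """Build base + action + '?apikey=...&url=...' (placeholder layout)."""
    query = urlencode({"apikey": api_key, "url": text_file_url})
    return f"{base}{action}?{query}"


# Placeholder values only; substitute your own base URL, action, and key.
print(compose_url(
    "https://api.example.invalid/calls/url/",
    "RankedNamedEntities",
    "XXXXXXXXXX",
    "http://example.com/Twain-part-aa",
))
# https://api.example.invalid/calls/url/RankedNamedEntities?apikey=XXXXXXXXXX&url=http%3A%2F%2Fexample.com%2FTwain-part-aa
```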

Remember that the spreadsheet function ImportXML takes two arguments--the first is the URL to call Alchemy (which has the link to the file built into it) and an XPath expression. 

What's XPath?  I did the obvious search to find out.... 

     [ xpath tutorial ] 

and found a nice little intro to XPath.  Turns out that it's a kind of language for reaching into XML data and pulling out the parts that you want.  (It took me about 15 minutes to read up about XPath, and then figure out that all *I* wanted was to pull out the entities from the XML that's being imported.)  In short, all I needed was the XPath expression "//entity" as the second argument.  
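To see what `//entity` selects, here's a small Python sketch using the standard library's ElementTree, which supports a subset of XPath (written relative to the root as `.//entity`). The XML shape here, an `<entity>` with `<type>` and `<text>` children, is an assumed, simplified stand-in for Alchemy's response, not a copy of it.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified response shape for illustration only.
SAMPLE = """
<results>
  <status>OK</status>
  <entities>
    <entity><type>City</type><text>Paris</text></entity>
    <entity><type>Country</type><text>America</text></entity>
  </entities>
</results>
"""


def entities_from_xml(xml_text):
    """Return (type, text) for every <entity> element at any depth."""
    root = ET.fromstring(xml_text.strip())
    # './/entity' is ElementTree's spelling of the XPath '//entity'.
    return [(e.findtext("type"), e.findtext("text")) for e in root.findall(".//entity")]


print(entities_from_xml(SAMPLE))  # [('City', 'Paris'), ('Country', 'America')]
```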

Then, for each of the 22 files I split off from the original text, I created a separate sheet, and cell A1 of each sheet gets the magic ImportXML function.  For instance, A1 on sheet A has an ImportXML function that looks like this: 

   =ImportXML("...Twain-part-aa", "//entity")  

Here's what the sheet looks like after the ImportXML function runs.  This is the Alchemy analysis of Twain-part-aa (that is, the first 50K bytes of the book):  

Looks pretty good, eh? 

I did this same thing 22 times, one analysis for each of the 22 sections of the book (Twain-part-aa through Twain-part-av).  

Then I copy/pasted all of the results into a single (new) tab of the spreadsheet.  I used paste-special>values so I could then do whatever I wanted with them.  

That new page of the spreadsheet looks like this.  

Remember that Alchemy is searching for MANY different kinds of entities (as you can see: HealthCondition, Person, Organization...) 

What we want is just the geographic entities.  This means I can now use the spreadsheet Filter operation.  (Click on cell A1, then click on Data>Filter.  It will pop up a menu with all of the values you can filter on.) 

Here you can see that I've already deselected "Crime"-- so all of the "Crime" entities will be filtered out of the list.  

Once I've filtered the list, I'm nearly done.  I can selectively filter for only the geographic entities I care about (City, Country, GeographicFeature, StateOrCounty...).  And my spreadsheet now looks like this: 

This list now has 567 placenames in it, many of which are duplicates.  To create a new list of only the unique names, I'll use the =Unique (range) function to create another tab in my spreadsheet with the unique names.

This gives me a sheet that looks like this: 

Now we have 283 unique entities. 
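The Filter and =Unique steps boil down to a few lines of Python. This sketch assumes the pasted-together results are (type, name) rows. The set of geographic type names is an assumption too (adjust it to whatever labels your extractor emits), and the de-duplication here is exact-match, followed by an alphabetic sort.

```python
from typing import Iterable, List, Tuple

# Assumed type names; adjust to whatever labels your extractor emits.
GEO_TYPES = {"City", "Country", "GeographicFeature", "StateOrCounty"}


def unique_places(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Filter to geographic types, drop exact duplicates, sort A-Z."""
    places = {name for etype, name in rows if etype in GEO_TYPES}  # Filter + UNIQUE
    return sorted(places, key=str.casefold)                        # alphabetic sort


rows = [
    ("City", "Sydney"),
    ("Person", "Mark Twain"),
    ("City", "Sydney"),
    ("Country", "India"),
    ("HealthCondition", "carbuncle"),
]
print(unique_places(rows))  # ['India', 'Sydney']
```

Note that nothing here catches a wrongly typed entity (a person tagged as a City, say); that still takes a human eye.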

This column (which I sorted into alphabetic order) looks pretty good, although there are a few oddities in it.  ("Ballarat Fly" is an express train to the Australian town of Ballarat.  "Bunder Rao Ram Chunder Clam Chowder" isn't a place name; it's just a funny expression that Alchemy thinks is a place.  And "Ornithorhynchus" isn't a place; it's the scientific (genus) name of the platypus...)  

So we still have some data cleaning to do.

But this is the point at which we need to do some spot checking to see how accurate the process has been.  As is clear, it has included a few extra "place names" that aren't quite right.  This is called a "false positive."  By my count, the false positive rate is around 3% (that is, out of the 283, I found 8 clear mistakes).  

And that makes me wonder, how many "false negatives" are there?  That is, how many place names does Alchemy miss?

There's no good way to do this other than by sampling.  So I chose a section out of the middle of the text (Twain-part-ak, if you're curious) and manually checked it for place names. 

I found about a 5% false negative rate... (including cities that should have been straightforward, like "Goa").  Combined with the roughly 3% false positive rate, that means this approach could be off by something like 8 to 10%.  
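Here's the arithmetic behind those rates as a small Python sketch. The 283 and 8 come from the counts above and the 5% miss rate is the sampled estimate; the big assumption (not a measured result) is that the hand-checked section is representative of the whole book.

```python
def error_estimates(found: int, false_pos: int, miss_rate: float):
    """Return (false-positive rate, estimated true total, estimated misses).

    miss_rate is the fraction of real place names the extractor fails to find,
    estimated by hand-checking a sample section.
    """
    fp_rate = false_pos / found
    true_pos = found - false_pos             # genuine places in our list
    est_total = true_pos / (1 - miss_rate)   # true_pos is (1 - miss_rate) of all places
    return fp_rate, est_total, est_total - true_pos


fp_rate, total, missed = error_estimates(found=283, false_pos=8, miss_rate=0.05)
print(f"{fp_rate:.1%} false positives; ~{total:.0f} real places, ~{missed:.0f} missed")
# 2.8% false positives; ~289 real places, ~14 missed
```

In information-retrieval terms, that's a precision of about 97% (275/283) and an estimated recall of about 95%.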

Still, this isn't bad for a first approximation.  But there's more work to be done. 

In tomorrow's installment, I'll talk about some of the other approaches people used in the Groups discussion.  There are always tradeoffs to make in these kinds of situations, and I'll talk about some of those tomorrow as well.  Creating a map with all this data?  That's Friday's discussion.  See you then. 

Part 2... tomorrow! 

Search on! 

Tuesday, September 16, 2014

I'm teaching a class on Google Books next week (Sept 23rd, 2014)

Want to be a Google Books wizard?  If you're in Mountain View on 9/23/14, you can take my (free) class at the Plex.  

Register by clicking on this link.  

It starts at 6PM, runs till 7:30, with dinner (free!) to follow.  

You should already be a Books user (but I suspect that most of you reading this are)... 

Feel free to pass this along to folks you know in the Mountain View / Palo Alto / Santa Clara / Menlo Park area who might be interested. 

See you then. 

- Dan 

I'm back from vacation... and trying to catch up with your work!

Hi folks.  

I'm now back at home, reading through all of the comments and ideas, all of the back-and-forth everyone's been posting since I left.  Two quick comments spring to mind... 

1.  You guys worked really hard on this!  I see lots of evidence of people putting out ideas, other people testing them, and then other people doing some work, and generally building atop each other's investigations.  This is superb, and exceeds my wildest expectations.  Thanks.  

2.  This is a really hard problem.  I started working on it yesterday, and it's taken me about 4 hours thus far.  (Including dead ends.)  But I see the end, and it will come in at around 5 hours to complete.  (Not counting the writeup.)  It won't be perfect--there will be places mentioned in the text that will be missed--but we should be able to get pretty good accuracy.  Details tomorrow. 

To slightly complicate things, I had a great time on my vacation.  Turns out the resort did have Wifi, but it was a bit spotty; trying to do any real work would have been crazy-making.  

The good news is that the South Pacific was fantastic.  

And the bad news is that the moment I got home I was hammered with a bad case of the flu, so I'm barely functioning.  My solution won't be as clean and beautiful as I would have liked, but it'll be there. 

More tomorrow.  I'll post a few comments in the group today, but the answer will be on Wednesday.  (With no new challenge this week.  You've worked hard enough.  Take a week off yourself!)  

Dan enjoying the surface interval between dives.

Wednesday, September 3, 2014

Search challenge (9/3/14): Can you find the places Twain mentions in "Around the Equator"?

As I mentioned in my last post, I'm about to head out for a few days of SCUBA diving in an exotic, tropical (and undisclosed) location.  Who knows?  I might want to use some of the things I pick up there as future Search Challenges! 

This week's Challenge is one that I've wanted to do for a while, but never quite had the time (or nerve) to post it as a Challenge.  

It's fairly tricky, and will require some new skills on the part of Search Researchers.  But I'm confident that you can do this.

Here's the Search Challenge for today: 

Background:  I remember reading Mark Twain's Following the Equator as a schoolboy and completely enjoying the story.  I was also amazed at all of the places he visited.  I know he made it to Hawai'i and Australia, but he also seemed to visit much of the world... and in 1895.  By ship.  Suppose I want to do his trip over again.  Where all would I have to go?  
Challenge 1:  Can you figure out all of the place names he mentions in the book?  The link above is to the Gutenberg Project's plain-text version of his book.  Can you figure out some way to determine ALL of the place names he mentions? 

Example: The first two paragraphs of the book are... 

"The starting point of this lecturing-trip around the world was Paris, where we had been living a year or two.
We sailed for America, and there made certain preparations.  This took but little time.  Two members of my family elected to go with me.  Also a carbuncle.  The dictionary says a carbuncle is a kind of jewel.  Humor is out of place in a dictionary." 

In these paragraphs he mentions "Paris" and "America."  Those should be the first two entries in your list of placenames.  

Now, can you figure out ALL of the OTHER places he mentions in the course of the text?  

(And yes, I know he mentions a lot of places he doesn't actually visit; that's okay, for our list let's include every place he writes about and not worry about whether or not he actually visited there.)  

Obviously, you don't want to do this by hand.  So the question really is, can you find a way to solve this problem using SearchResearch methods? 

Challenge 2:  In case anyone finishes this early... Can you then create a set of Placemarks on Google Earth to show all of the places mentioned in your list of placenames?  Ideally, you should give us a link to your KML file with all of the places Twain mentions in the book.  
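For anyone attempting Challenge 2, here's a minimal Python sketch that turns (name, latitude, longitude) triples into a KML document of Placemarks. It assumes you've already geocoded each place name somehow (that step isn't shown), and the coordinates in the usage line are rounded values for illustration only. Note that KML writes coordinates as longitude,latitude.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

KML_NS = "http://www.opengis.net/kml/2.2"


def placemarks_kml(places: Iterable[Tuple[str, float, float]]) -> str:
    """Build a KML document with one Placemark per (name, lat, lon)."""
    ET.register_namespace("", KML_NS)  # emit xmlns="..." instead of ns0: prefixes
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lat, lon in places:
        pm = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(pm, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(pm, f"{{{KML_NS}}}Point")
        # KML order is longitude,latitude (altitude optional).
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


kml_text = placemarks_kml([("Paris", 48.86, 2.35), ("Sydney", -33.87, 151.21)])
```

Save `kml_text` to a `.kml` file and it should open as a set of placemarks in Google Earth. Building the XML with ElementTree, rather than by string pasting, means names with `&` or `<` in them are escaped for you.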

This is probably the most sophisticated Challenge I've issued--which is why I'll write up my answer in about 2 weeks.  (Note that I haven't yet solved this myself; but I'm confident that I can.)  

As mentioned, I'll be out-of-town for the next 10 days, so we won't have a Challenge next week (Sept 10).  Instead, I'll write up my solution on Wednesday, Sept 17th.  

I'm also going to be off-the-grid (mostly), so I won't be able to approve your posts to the blog after Thursday.  (Well.. probably.  I will try to check in; but I'm not sure about Wifi coverage where I'm going.)  

So I set up a Google Group for everyone to discuss this Challenge.  For this problem, we can have our discussion in SRS Discusses Around The Equator.  (Click on that link to join the group.)  This way, I won't need to manually approve every comment to the blog (which is what I do now).  

As I said in the Welcome message for the group, this is a no-holds-barred Search Challenge.  If you want to work together, be my guest. You can set up Hangouts to meet and chat about possible solutions, you can swap ideas about how to solve it... Whatever works for you.  

It's a two-week Challenge.  Are you up for it?  Can Team SearchResearch do it?  

Search on! 

Friday, August 29, 2014

Answer: What are these plants?

This week was obviously far too easy.  

Or, the SearchResearch readers have been developing their research skills!  

I'll assume it was the second. In either case, very nice work.  Some people knew the answers off the top of their heads, which goes to show the value of a great social network--you can quickly tap into the collective knowledge base (and superior recognition skills) that your extended personal network has.   This isn't to be undervalued!  As Howard Rheingold illustrates in his new book, Net Smart, there is a value and a quality of participation that links together bloggers, netizens, tweeters, and other online community participants.  This set of people and networks form an online collaborative enterprise that can contribute new knowledge to the world in new ways.  And best of all, it forms a personal knowledge network that you can tap into.  

But we'll talk about that in another post.  Today, let's figure out how to search for the answers to these challenges.  

This week I showed the following three images and asked the obvious question--what are these plants?  Here's what I did to answer each question:  

1.  I found this under a redwood tree in a lawn at one of the Google buildings.  I visited here every day for a week, and took this series of pictures over a couple of days.  It's shady here, but as you can see, it's just the lawn under the canopy of the redwood.  What ARE these things? What's the genus and species name?  

A few people reported success with doing Search-by-Image (and that's a great approach).  But I did a simple series of searches: 

     [ mushroom dissolving ] 

Why that query?  Because this transformation (from left to right in the images) happened over a short period of time (about 2 days).  This was easily the most striking thing about this mushroom.  Sure, mushrooms often fall apart quickly, but the way the edge of the mushroom just... "dissolved"... was remarkable.  So I chose "dissolving" as one of my key search terms.  And sure enough, the first hit was to the Mushroom Appreciation site, where I learned this is Coprinus comatus, the "Shaggy Mane" mushroom, aka "Lawyer's Wig" or "ink caps."  

I then did a search for the binomial name (that is, Coprinus comatus) and found lots of corroborating evidence (and more images that match very closely).  As a few other folks did, I also found a detailed page on Shaggy Mane and liked its level of detail in describing how to identify this particular variety.  

As Mushroom Appreciation writes:  "Like a frightened squid or exploding pen, this mushroom releases a black liquid that is laden with spores. As it matures it will deliquesce, meaning it will appear to melt away until only the stem is left."  (That word, deliquesce, was new to me, so I did a [ define deliquesce ] -- a lovely term meaning "to become liquid"!)  

There's also a section on the Mushroom Appreciation site that gives details about how to identify this mushroom (and possibly similar-appearing mushrooms).  

Apparently this mushroom is also edible, although a bit delicate to prepare.  (And you have to move quickly from "just picked" to "just cooked," as they'll deliquesce not long after you pick them.) 

 { As always, don't eat any mushrooms until you've taken a class in mycology and identification!  It's easy to get really sick or die after eating a mis-identified mushroom. } 

2.  Here's another thing I found sticking up out of the soil in my garden.  This is a particularly well-watered section of the garden--you can see the green beans growing in the background.  Just before I took this picture, the brown parts at the tip were covered in flies.  I know why, because it smelled terrible--a bit like rotting meat--perfect fly attractant.  Unfortunately, I only got one good picture.  I took several, but it was in a somewhat difficult to reach place, and this was the only one in good focus. It's about 5 inches long, and seemingly appeared overnight.  What IS this thing?  (And should I be worried about it?)  

For this, the most salient search clues would seem to be: (a) it smells really bad, and (b) it's growing in my garden.  

I'm going to include "garden" in my search terms because I mostly see mushrooms in lawns, or in woodlands where the places mushrooms grow are fairly stable over time.  Since garden soil is churned up at least twice a year, the mushrooms that grow there would seem to be very different from "ordinary" mushrooms.  

So my first query was: 

     [ stinky mushroom garden ] 

Which gave me the following SERP: 

See that row of images?  This is called "Universal blended Images" (because the algorithm "blends in" image results with the regular search results). 

This kind of thing happens only when there's pretty strong evidence that your search terms are all included with the texts describing these images.  

I was also struck by the appearance of the word "Stinkhorn" on the page several times.  What a strange thing!  

To evaluate this page, I clicked on the row of images to see what was there.  It's a surprising set of mushrooms.  Such shapes and colors!  And all, apparently, stinky.  

When cruising through the images I found a couple that looked very similar to the picture I took.  When I clicked on the first one that seemed very similar, I found myself back on a page titled "Stinkhorns: The Phallaceae and Clathraceae."  

There's that word again:  Stinkhorn.  (And two family names as well, Phallaceae and Clathraceae.)  

I read the MushroomExpert page about Stinkhorns and found an identification key at the bottom of the page.  This makes me feel good about the credibility of the content:  Good botanical guides will have "keys" like this to help you winnow out the various possibilities.  
Here's what their key looks like: 

Start at step one and answer the question.  If it's true, then you know it's Stahelimyces cinctus.  If it's "Not as above," jump to question 2.  Proceed like this, answering questions and following the flowchart.  If you see the "spore slime occurring on the inner surface...", then jump to question #12.  
Sure enough, if you run through their key, you'll find it's a Lysurus mokusin, the "Lantern Stinkhorn."   
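Following a key like this is a small algorithm: at each numbered couplet you answer one question, then either stop at a name or jump to another couplet. Here's a Python sketch of that walk. The three-couplet key, its trait names, and the species labels are all hypothetical placeholders; the real MushroomExpert key is longer and is not reproduced here.

```python
from typing import Dict, Mapping, Tuple, Union

Outcome = Union[int, str]  # int: jump to that couplet; str: an identification

# Hypothetical three-couplet key. Trait names and species labels are
# placeholders, not the real MushroomExpert key.
TOY_KEY: Dict[int, Tuple[str, Outcome, Outcome]] = {
    1: ("trait_a", "Species A (hypothetical)", 2),
    2: ("spore_slime_on_inner_surface", 3, "Species B (hypothetical)"),
    3: ("trait_c", "Species C (hypothetical)", "Species D (hypothetical)"),
}


def run_key(key, observations: Mapping[str, bool], start: int = 1, max_steps: int = 100) -> str:
    """Walk the key from `start`, answering each couplet from observations.

    A trait missing from observations counts as 'not as above' (False).
    """
    step = start
    for _ in range(max_steps):
        trait, if_true, if_false = key[step]
        outcome = if_true if observations.get(trait, False) else if_false
        if isinstance(outcome, str):
            return outcome  # reached a name: done
        step = outcome      # otherwise jump to the next couplet
    raise RuntimeError("key did not reach a name; check it for loops")


print(run_key(TOY_KEY, {"spore_slime_on_inner_surface": True}))
# Species D (hypothetical)
```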

Curiously, for something that smells so bad, it is "... considered to be edible when still in the immature "egg" stage, and is thought to be a delicacy in China. When mature, its foul odor would deter most individuals from attempting consumption..."  

No kidding.  

 3.  While running through the Stanford Industrial Park (where HP headquarters, Varian, Xerox PARC, and a bunch of Silicon Valley research labs are located) I found the bush below covered in red berries.  Each berry is around 1 inch in diameter, and the bushes themselves are used as hedges.  It's an attractive plant, and I can see why you'd plant long stretches of this between buildings.  Oddly, I've also seen this plant grown as a tree with a trunk planted as a decorative planting along sidewalks.  And if I recall correctly, I remember there's some connection with Madrid.  What kind of bush/tree is this?  And what's the connection with Madrid? What's the genus/species name?  

Most SearchResearchers seemed to have ID-ed this by using "Search-by-Image," and that's a fine way to do it.  (The trick seems to have been to crop the image down.) 

But I have to admit to trying a relatively simple description of the most obvious feature: 

     [ strawberry tree ] 

As luck would have it, the first page of results are all about this tree.  I had no idea that it would be THAT easy to identify.  

As everyone seems to have figured out instantly, this is the Arbutus unedo, aka the "Strawberry Tree," that's commonly planted in temperate climates as a reliable hedge or ornamental.  

Interestingly, Arbutus unedo was one of the species described by Carl Linnaeus in Volume One of his 1753 work Species Plantarum, which gave it the name it still bears.  This book was the landmark work that set up the binomial naming scheme we still use (that is, the Genus species names that we give to organisms).  Given that so many plant names have changed over the past 250 years, it's remarkable that Arbutus unedo still has the same name!   

The Wikipedia entry says:  "The fruit is a red berry, 1–2 cm diameter, with a rough surface. The fruit is edible, though many people find it bland and mealy.  The name 'unedo' is explained by Pliny the Elder as being derived from unum edo "I eat one," which may seem an apt response to the flavor."

Fact checking:  For the good of the blog, I ate one of the ripe berries.  (After, of course, checking that I had my identification down correctly.)  And I can report that they are mealy, with a fairly bland flavor.  It was more-or-less a "meh" experience. 

But unlike a real strawberry, the mealiness of the unedo meant that I kept picking little bits of the fruit out of my teeth for hours afterwards.  The fruit is really a composite of many tiny bits, so it was a bit like eating a slightly fruity ball of cornmeal.  You could eat many of these and survive, but you probably wouldn't want to do so.  

To make the final connection, I double-checked Wikipedia's comment about the unedo fruit appearing on the coat of arms of Madrid.  

     [ Arbutus Madrid ] 

leads to many confirming pages, including a site that specializes in heraldry, which confirms that the bear on Madrid's coat of arms is eating the fruit of the Arbutus unedo.  

Search lessons 

1. Sometimes the obvious search is exactly right.  I find people often overcomplicate their searches.  Try the obvious search ([strawberry tree] or [mushroom dissolving]) and you might well be surprised to see that this is the way many people have written about the topic, meaning that your obvious search will lead to the obvious results.  

2.  Search by image is great, especially if you use the cropping trick.  As a few readers found, cropping the image down to just the important parts works remarkably well.  When cropping, choose the parts that you think other photographers will likely focus on.  

3.  The presence of an identification key marks botanical pages as serious works.  If you've read many plant or flower identification sites, the ones with a "key to identification" tend to be pretty serious sites.  Yes, the keys can be intimidating, but a key tells me that someone has gone to a LOT of trouble to help us figure out what kind of plant this is.  You don't just toss off a key in a few minutes--they take a lot of time and effort to create.  Any site that has one (that they've created) is probably a pretty decent reference source.  

Next week... a real challenge--a two week challenge (as I'm going on vacation for the first 2 weeks of September)!  Get your search skills out, and get ready to research! 

Search on! 

Thursday, August 28, 2014

What are those plants? (AND search-for-character now in Google Docs)

It's clear that the SearchResearchers don't need much of my help on this week's challenge.  You've been zipping along quickly and painlessly.  Nice work.  

I'll write up my method tomorrow, although I doubt any of you will be surprised!  

If you remember, a few episodes ago I wrote about How to search for a special symbol, and linked to a handy site for finding special characters (such as Cherokee "U" -- Ꭴ, or the infinity symbol -- ∞).

You might have noticed that this same capability is now in Google Docs.  (See: inserting special characters)  

When you open a Google document, spreadsheet or presentation, just go to the Insert menu and click Special characters.  You'll have multiple ways to find a special character: browse by category, search by keyword (example: arrow), enter the Unicode code point (example: 2195) or (best of all)...  draw the character.
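If you'd rather do the keyword and code-point lookups in code, Python's standard `unicodedata` module covers both. This is a sketch of the same idea, not how Google Docs implements it (and there's no code equivalent here for drawing a character). `search_by_keyword` is a hypothetical helper that simply scans every code point's official Unicode name.

```python
import sys
import unicodedata


def by_code_point(hex_code: str) -> str:
    """'2195' -> the character at U+2195."""
    return chr(int(hex_code, 16))


def search_by_keyword(keyword: str, limit: int = 20):
    """Return up to `limit` (char, name) pairs whose Unicode name contains keyword."""
    keyword = keyword.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # unnamed code points give ""
        if keyword in name:
            hits.append((chr(cp), name))
            if len(hits) >= limit:
                break
    return hits


print(by_code_point("2195"), unicodedata.name(by_code_point("2195")))  # ↕ UP DOWN ARROW
print(unicodedata.lookup("CHEROKEE LETTER U"), unicodedata.lookup("INFINITY"))  # Ꭴ ∞
```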

Here's another example from Google Drive's Google+ page  (I hadn't thought of searching for an emoticon...)  

Another couple of examples--here I draw a contour integral symbol: 

Or, something that lots of teachers might want to search for, in this case I search by name: 

Enjoy searching! 

Wednesday, August 27, 2014

Search Challenge (8/27/14): What are these plants?

When I go out for a run I usually carry my phone or a camera, just in case I find something odd, peculiar, or spectacular.  It's a quick matter to grab the phone and capture the odd plant, animal, street sign, or atmospheric condition that I'd like to understand.  Then, when I get back to my computer I can look up the things I've captured and get a bit more of an inside story about where I'm running.  

This week I found a few amazing things that I'd like your help in identifying.  The Challenge is the same for all three:  What IS this?  And what can you find out about it? 

1.  I found this under a redwood tree in a lawn at one of the Google buildings.  I visited here every day for a week, and took this series of pictures over a couple of days.  It's shady here, but as you can see, it's just the lawn under the canopy of the redwood.  What ARE these things? What's the genus and species name?  

2.  Here's another thing I found sticking up out of the soil in my garden.  This is a particularly well-watered section of the garden--you can see the green beans growing in the background.  Just before I took this picture, the brown parts at the tip were covered in flies.  I know why, because it smelled terrible--a bit like rotting meat--perfect fly attractant.  Unfortunately, I only got one good picture.  I took several, but it was in a somewhat difficult to reach place, and this was the only one in good focus. It's about 5 inches long, and seemingly appeared overnight.  What IS this thing?  (And should I be worried about it?)  

 3.  While running through the Stanford Industrial Park (where HP headquarters, Varian, Xerox PARC, and a bunch of Silicon Valley research labs are located) I found the bush below covered in red berries.  Each berry is around 1 inch in diameter, and the bushes themselves are used as hedges.  It's an attractive plant, and I can see why you'd plant long stretches of this between buildings.  Oddly, I've also seen this plant grown as a tree with a trunk planted as a decorative planting along sidewalks.  And if I recall correctly, I remember there's some connection with Madrid.  What kind of bush/tree is this?  And what's the connection with Madrid? What's the genus/species name? 

As always, you can click on the picture to see the larger, more detailed version of the image.  (And no, there's no useful metadata here in the EXIF of the images.)  

If you would, please let us know HOW you figured out the answers.  What tools did you use (if any)?  What search queries worked for you?  And what sidetracks did you take (and then get out of!).  

I hope these plants are exotic enough for you.  I have to say that I was surprised by each in my searches, and I hope you find this interesting as well.  

Search on! 

Monday, August 25, 2014

Erratum: Leon Czolgosz did NOT live in the Oneida Colony

As you might have noticed by now, I'm not perfect.  In fact, I'll wager that all investigative reporters (and the occasional SearchResearch blogger) make mistakes somewhere along the line.  It's inevitable. 

But an honest writer will try to fix their mistakes--that's what an erratum is all about.  In fact, if the web site you're using as a high-quality reference does NOT have a way to update their materials, you might consider that they're not such a great source.  Good newspapers, good reporters, good books all have some way to fix the record.  

Let me illustrate by example. 

Last week a SearchResearch reader, Joel Meltzer, a former resident of the Oneida Community Mansion House in Oneida, wrote to me to point out that: 

At some point someone misunderstood the fact that ANOTHER presidential assassin was an Oneida Community member, and drew the mistaken conclusion that Czolgosz was a member.  This was then added to the Oneida Community Wikipedia page.  (It has since been removed).  The statement repeated over and over again, is that Czolgosz was "briefly a member" of the community.  No one ever goes into more detail because there is no detail.  It just isn't true. Again, he was just a young boy when the community disbanded and he didn't live in Oneida!

The writer then correctly pointed out that I made this same error in my July 22, 2013 post "Answer: What's the connection between President McKinley's assassin and "free love"?"  

Well, that's an interesting claim... and I wondered what could have happened.  

Luckily, I have pretty good notes about writing that post, so I went back and reconstructed my searches and zeroed in on what went wrong. Here's my reconstruction: 

What went wrong

The question for that week was "What's the connection between President McKinley's assassin and "free love"?"  

In my post, I showed that by searching inside the Google News Archives, it was simple enough to find multiple references to Noyes's use of the phrase "free love."  And then a quick look in Google Books for [ Noyes "free love" ] led me to Without Sin: The Life and Death of the Oneida Community, Spencer Klaw (1994) where you can find that "in the late summer of 1852, in an article in the Circular [the Colony’s newsletter]  he [Noyes] boldly included “Cultivation of Free Love” in a list of principles that the community stood for." 

So he's the guy who gave the notion of "Free Love" some currency. 

Now, when I looked for a connection to the assassin of President McKinley, I wrote:  
"Leon Czolgosz, who shot President McKinley at Pan-American Exposition reception on September 6, 1901.  Czolgosz, a native of Michigan and an avowed radical anarchist ( who hung out with people like Emma Goldman) was, for a short time, a member of the Oneida Colony. "  

Every assertion like that needs to come from somewhere, and a good reporter tracks the origin (aka the provenance) of their facts.  A great reporter keeps notes around for years just to be able to revisit questions of fact and inference.  

In this case, I had read Cults and Terrorism by Frank MacHovec, where he writes: 

"Charles Guiteau, President Garfield's assassin, was a 5 year Oneida member.  Leon Goglsz, for a shorter time, the assassin of President McKinley, was also an Oneida member. (Vowell, 2006)."    (emphasis mine)

That's where I got my information.  I should have been worried when MacHovec spelled the assassin's name incorrectly (it should be Czolgosz, not Goglsz).  I admit that I did not check the reference to Vowell, 2006, but just assumed that MacHovec represented that information accurately. 

Prompted by Joel's question, I pulled out my notes, found the MacHovec citation quickly, and THEN checked (Vowell, 2006), which is Sarah Vowell's book Assassination Vacation (actually published in 2005 by Simon & Schuster, and available in Google Books).    

When I downloaded the book (which, yes, I had to buy in order to scan it completely), I read through every mention of Oneida and every mention of Czolgosz... and none of them assert that Czolgosz was a member of Oneida.  

So... I assume that MacHovec simply misread the book, or combined notes from different sources and misplaced Czolgosz at Oneida.  

Since I want to double-source everything, I looked up the Oneida colony history (from multiple sources) and found that the community dissolved in 1881 (when Czolgosz would have been 8 years old).  The biographies of Czolgosz make it pretty clear that he was working in steel mills from the age of 14; there's just not much possibility that he spent any time at the Oneida Community.  (Reading a few bios of Czolgosz also makes it clear that he really didn't spend any time at Oneida.  Given how much detail these bios have, it's inconceivable that they would have omitted that part of his life.) 

There it is:  Leon Frank Czolgosz, born in 1873, assassin of President McKinley, executed by electric chair in 1901, was never a member of the Oneida Community.  

On the other hand, Charles Guiteau was in the Community for more than five years (he later assassinated President James Garfield), so there is still a story line connecting the ideas.  Note that there's no causal relationship here (free love doesn't lead to becoming an assassin), but it's an interesting accident of history that these stories should cross.  

I'll go edit the original post to link to this.  Erratum duty discharged.  

Search on.  (Carefully!) 

Friday, August 22, 2014

Appendix: Answer: The shortest--and flattest--route there.

Really? These are crazy people. 

Oh yeah... I forgot the historical connection.  

Hannibal.  Elephants.  218 BC.  25,000 soldiers marching from Barca to Roma.  

If you do any searches with Alps, mountain passes, Oulx, etc., you'll find that the southern route is one of those that's proposed as the way Hannibal got his elephants from Spain to Italy.  It's a heck of a walk for an army, especially one that's got 37 elephants.  (That's the number he ended up with.  We don't know how many he started with.)  

And while historians debate exactly which mountain pass they hiked through (with elephants!), it had to be one of these routes. (Another version of the march from Barcelona to Rome.)  

All the other passes are worse!