Thursday, August 16, 2018

Answer: How to find difficult to find web pages? (Part 1)


There are many reasons...   


... why a particular page might be difficult to find.  Sometimes your memory is just plain wrong; sometimes your memory is so generic that you can't recall anything that would let you pick the right page out of thousands of similar pages; sometimes the page really is missing (that is, you get a 404 "page not found" error).  
The Challenges from last week are interesting examples of Difficult-Web-Pages.  Let's talk about how to find these, and why they're tough.  

1.  A while ago I remember reading an article about a famous US author that was having some difficulty editing his own Wikipedia page.  As crazy as it sounds, they wanted to have independent verification of what he was saying.  I found myself wanting to re-read that article so I could refer to it in my writing.  I needed to find it to confirm details.  This was my Challenge:    
Who was the famous US author that was involved in a dispute with Wikipedia over the accuracy of the entry describing his novel?   

I had a very clear memory of this article, but I couldn't remember where I read it.  You might think that the obvious query is something like: 

     [ Wikipedia famous author article challenged ] 

(or something similar).  The problem is that the query isn't specific enough.  There are a LOT of web pages with these terms, so the results are pretty scattered, and they didn't help me find what I was looking for.  We have to find a way to change the query.  


A big part of the problem here is that many of the results are FROM Wikipedia.  (Makes sense, since Wikipedia is a search term.)  In fact, the first 40 results are all from Wikipedia.org.  
What would happen if we excluded the Wikipedia results?  Would that improve our accuracy?
  
As you know, site: lets you search within just one site (e.g.,  [ site:Wikipedia.org ] )  
But how can we exclude a site?  That's easy: Use the minus symbol like this:  
     [ blah blah blah -site:Wikipedia.org ] 

Notice that small MINUS sign (aka a hyphen) in front of the site: operator.  That means to search everywhere on the web, but NOT on this site.  
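
If you build queries like this often, the pattern is easy to capture in a few lines of Python.  This is only a sketch: the helper name build_query is my own invention, and it just assembles the text you would type into the search box.  It doesn't talk to any search engine.

```python
def build_query(terms, exclude_sites=()):
    """Return a query string with a -site: filter for each excluded site.

    Hypothetical helper for illustration only: it formats text and
    never contacts a search engine.
    """
    parts = [terms.strip()]
    for site in exclude_sites:
        # Use a plain ASCII hyphen here; a typographic dash (such as an
        # en dash) may not be treated as the minus operator.
        parts.append("-site:" + site.strip())
    return " ".join(parts)


query = build_query("Wikipedia famous author article challenged",
                    exclude_sites=["wikipedia.org"])
print(query)
```

Passing more than one site in exclude_sites simply adds one -site: term per site.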

When I do this search, my SERP looks like this:  




See that 4th result?  (The one at the bottom of this image.)  This is exactly what I was looking for--a famous US author (Philip Roth) who was in a dispute with Wikipedia over the accuracy of the entry describing one of his novels.  As Roth wrote in his open letter to Wikipedia,  he was told that "...I, Roth, was not a credible source: 'I understand your point that the author is the greatest authority on their own work,' writes the Wikipedia Administrator – 'but we require secondary sources.'"  This dispute went on for a while, and to their credit, Wikipedia repaired the entry, and it stands as an accurate source of information about Roth and his work.  

Other approaches work too.  Reader SpiritualLadder found the answer with the query: 

     [ dispute with author over book Wikipedia entry ] 

And D. Lazar found it with: 

     [ author dispute wikipedia ] 

I tried a few other queries like this (that is, without the -site: operator) and found it to be pretty hit or miss.  If you managed to guess the right words, you'd find the article.  Using the -site: operator gets you to the result pretty quickly. 

Several readers also found their way to the Wikipedia List of Controversies page (which is pretty interesting reading), and then found the article about Philip Roth on that page.  


A black racer rising up out of the grass.
Thanks & P/C Continis on Flickr.
2. See that image above?  That's a black racer snake.  I happened to see one the other day, and I remembered from previous reading that the state of New Jersey had a few articles about snakes in their state, and I remember one about black racers in particular. 
Can you help my fading memory and find an article about the black racer snake that’s published by the state of New Jersey as part of their educational outreach program? 
This was my first query.  


I'm not proud of it, but I wanted to show you that even practiced searchers make mistakes.  

What's wrong here? 

What I'm trying to do is to just search on websites in New Jersey.  I know that the code for New Jersey is .nj  so I used that as the target of the site: operator.  

But I got zero results.  Why?  

Once I'd recovered from the shock of zero results, I realized that no matter what you think of New Jersey, it doesn't have its own top-level-domain name.  (A top-level-domain name is the code at the very end of a site's domain name--e.g., .GOV .MIL  .EDU  .INFO etc.)  

An important lesson for searchers is to look at what's going on and try to debug your process.  What can you learn from this for next time?  In this case, I needed to know what the REAL web name is for the state of New Jersey.  

A quick search for [ official website New Jersey ] tells you that they're part of .GOV -- and their URLs all end in .NJ.GOV!  
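
Here's a small Python sketch of the hostname logic behind that mistake.  The two function names are my own, and this is only an illustration of how domain names nest, not a description of how any search engine implements site: internally.

```python
from urllib.parse import urlsplit


def top_level_domain(url):
    """Return the last dot-separated label of the URL's hostname."""
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def is_under(url, domain):
    """True if the URL's host is `domain` or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


url = "https://www.nj.gov/"
print(top_level_domain(url))    # gov
print(is_under(url, "nj.gov"))  # True
print(is_under(url, "nj"))      # False: "nj" is not where the hostname ends
```

In other words, "nj" sits in the middle of the name, so only a specifier that includes the .gov ending can match it.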

Let's redo that query with the correct site specifier (and a better query).  


This looks more like what I'm seeking.  It's on New Jersey's educational web site, and it's about the black racer snake (Coluber constrictor).  

Now, to find all of the educational content at the NJ.GOV site, I just truncated the URL of the first result.  That is, I went to: https://www.nj.gov/pinelands/infor/educational/   and found a great page full of results...
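
The truncation trick can be sketched in Python as well.  Assume you start from some article URL inside that directory; the file name "example-article.html" below is hypothetical (I'm not quoting the real first result), and parent_urls is a name I made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url):
    """Yield successively shorter versions of url, one path segment at a time."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    while segments:
        segments.pop()  # drop the last path segment
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # keep the trailing slash on directory URLs
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical starting point; only the directory part comes from the post.
start = "https://www.nj.gov/pinelands/infor/educational/example-article.html"
for candidate in parent_urls(start):
    print(candidate)
```

Each printed URL is a candidate worth trying in the browser.  Not every truncated URL will resolve to a real page, so treat the list as things to check rather than guaranteed hits.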





Search Lessons 

In this post, I really wanted to emphasize the way that site: operates.  There are two big lessons here. 

1.  You can use -site: as a way to remove invasive results from your search.  In this case, because we were searching for something about Wikipedia (but not necessarily ON Wikipedia), we used the -site: operator as a way to get rid of the annoying results that were all on Wikipedia.  Use this trick any time you want to remove an entire site from consideration.  Usually, this happens with super popular sites that tend to dominate the results. 

2.  SITE: can take any site specifier, including subdomains and directories.  In this example we just used the subdomain + top-level-domain  .NJ.GOV  -- but we could also do a site: with a directory as well.  Here's an example showing that the Pinelands part of the official web site has around two thousand pages covering a broad range of topics.  (And you can see that Educational content is part of their much larger mission.)   
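
As a rough sketch of lesson 2, here is one way to turn a URL into a site: specifier that keeps the directory.  The function name site_spec is my own, and I'm assuming the search engine accepts a host followed by a path, as in the Pinelands example.

```python
from urllib.parse import urlsplit


def site_spec(url, keep_path=True):
    """Build a site: operator from a URL (host plus optional directory).

    Hypothetical helper for illustration only; it just formats text.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Drop the trailing slash so the specifier ends at the directory name.
    path = parts.path.rstrip("/") if keep_path else ""
    return "site:" + host + path


url = "https://www.nj.gov/pinelands/"
print(site_spec(url))                   # site:www.nj.gov/pinelands
print(site_spec(url, keep_path=False))  # site:www.nj.gov
```

Put the result in front of your search terms to restrict the search to that part of the site, or prefix it with a hyphen to exclude it.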





As I mentioned, this is Part 1 of a series of "Difficult Web Page" search Challenges.  This one wasn't too difficult--Part 2 will be more challenging.  

During the next week or so I'll be doing an occasional additional post about topics that I think you'll be interested in reading.  See you here soon. 

Search on! 

Wednesday, August 8, 2018

SearchResearch Challenge (8/8/18): How to find difficult to find web pages? (Part 1)



Every so often you know a web page exists, but it's tough to put your finger on it.  


This last week I had several search Challenges pop up in my work.  Here are a couple of questions I found myself asking, and was ultimately able to resolve.  Can you?  

This is Part 1 of a series of "Difficult Web Page" search Challenges.  These first two aren't so hard--Part 2 will be more challenging.  Each of these Difficult Web Page SearchResearch Challenges is intended to highlight one particular method for doing your web searches with precision and skill.  


A black racer rising up out of the grass. 
Thanks & P/C Continis on Flickr.


1.  A while ago I remember reading an article about a famous US author that was having some difficulty editing his own Wikipedia page.  As crazy as it sounds, they wanted to have independent verification of what he was saying.  I found myself wanting to re-read that article so I could refer to it in my writing.  I needed to find it to confirm details.  This was my Challenge:    
Who was the famous US author that was involved in a dispute with Wikipedia over the accuracy of the entry describing his novel?   

2. See that image above?  That's a black racer snake.  I happened to see one the other day, and I remembered that the state of New Jersey had a few articles about snakes in their state, and I remember one about black racers in particular. 
Can you help my fading memory and find an article about the black racer snake that’s published by the state of New Jersey as part of their educational outreach program? 

To answer these requires a bit of Search Engine Jedi-level skills.  Can you answer both of these Challenges?  

When you do, be sure to tell us how you did it in the comments!  

Good luck in your quest.  

Search on! 



Thursday, August 2, 2018

Answer: The Mystery of the Salzburg Stream--does it flow uphill?

Water flowing uphill?   

It was a huge surprise to me to see the stream apparently running uphill.  What's up with that?  How would that be possible?  

To remind us of the Challenge, here's a recap.  
Here's the place I was walking (from Google Maps, in 3D mode).  In reality, it's clear that the left part of the picture is uphill from the right part.  


Just below the fortress is a patch of forest, and below that is a meadow.  In the middle of the meadow is a stream that ends in a clump of trees and brush.  
Here's the map of that place.  See the blue line of the stream? That's the stream flowing into the side of the hill.  



Here's the same scene in the satellite view.   It sure looks like the source of a spring, with water flowing from south-to-north, into the side of the hill.  



The Challenge was:  


How is it that this stream in Salzburg is apparently flowing uphill?  What's the real story behind this gently flowing brook? How is this possible?  

What I did... 

When I realized that the water was flowing INTO the mountain (rather than out of the mountain) I was plenty puzzled.  I tried to walk into the bushes to see what was going on, but that's a big thicket (with lots of spiny, pointy plants, which I assume were not planted there by accident).  

So I had to turn to my trusty search engine to figure this one out.  All I knew at this point was that there was an interesting physical phenomenon that needed explaining.  But how to start?  

I already knew that there were places where water apparently flows uphill.  (I now know that these places are tricks of the landscape, where an obscured horizon creates an illusion called a gravity hill, as noted by Jeffrey Dowdy in the comments.)  This wasn't one of those places--this water really was running into the hillside.  

The first thing I did was to check Google Maps.  Here's what I first saw when I looked at the area.  The yellow arrow marks the place where I took the photo.  



On this map, the blue line represents the stream... and by zooming out a bit I found the name "Almkanal" on the stream.  (In the maps above, I had zoomed in a bit too much, and didn't see the label on the stream.)  

Now I did the query for:  

     [ almkanal Salzburg ] 

which quickly leads us to several articles--Almkanal and the Big Cleanup and Almkanal (both from Salzburg.Info, the city's official website), along with Almkanal (from SN.at, the Salzburger Nachrichten, the local news site) and Almkanal of Salzburg (from Atlas Obscura).

From these sites we learn that the Almkanal is a water channel (the oldest subterranean aqueduct) that flows from the Königsseeache (a large river in the Alps) and then straight through a tunnel in the Mönchsberg (a ridge of stone running east-west on the southern edge of the town), "reaching the old town between the cemetery of St. Peter and the fortified lane..."  

The aqueduct was built by an engineer named Albert, who cut a channel through the Mönchsberg in order to access the large amount of water just south of the town.   The channel was completed in 1142.  It's 370 meters long, and as much as 1.2 m wide and 2.2 m high.  

In other words, this is the original water supply for the old city of Salzburg!  It really is an aqueduct that flows into a channel that was dug into the hillside.  More amazingly, it's still in use, and is open to the public (tickets required) for a few weeks each September.  

Enter the St. Peter's cemetery from Kapitelplatz square and you will see a water mill that is powered by the Almkanal. The mill is part of the bakery of the monastery, by far the oldest bakery of Salzburg.  

(You can read more than you might ever want in the publications list ("Publikationen") on the Almkanal website.)  

Interior of the Almkanal.  (P/C Salzburg.info)
  
It's been around for quite some time, so it's not a surprise that there are multiple maps of the project.  Here's one from 1700:  


Map of the Almkanal from 1700.  (P/C Almkanal.at)



And another from 1790--this one shows (at the bottom right) a kind of cross-section of the canal.  You can see that the canal slopes downward, very gently, from the channel that runs all the way to the Königsseeache.  Like all canals, it only flows one way:  downhill.  This was a rather clever bit of engineering!  

And another map (1799) with cross-section at bottom right. The entrance on the far right is the streamhead that I photographed. Note that in this map, north is to the right.  (P/C Almkanal.at)

That entrance into the Mönchsberg was pretty covered up when I found it.  Luckily, it wasn't hard to find a better picture taken in winter, when much of the obscuring vegetation has fallen.  

This is the Bürgermeisterloch ("Mayor's Hole"), where the Almkanal dives under the Mönchsberg to deliver water to the other side of the ridge.  (P/C Salzburg Wiki.)  Those berry vines are vicious.  


Another way to find the answer: 

In some sense, I got lucky that the stream was shown on the Google Map AND that it was such an unusual name.  What would I have done if that hadn't worked out?  

I guess I would have gone with the next obvious query: 

     [ Salzburg stream that goes underground ] 

which gets you to the Atlas Obscura article on the Almkanal.

While looking through the SERP brought up by this search, I also found a lovely YouTube video showing the entrance that I photographed (look at 0:53 in the video below), as well as lots of footage of people crawling through the tunnels.  Remarkable. 

https://www.youtube.com/watch?v=-aHBJBM95vQ&feature=youtu.be 



Yes, the channel is open for a few weeks each year while they clean it out.  (I bet you can figure out how to get tickets if you're in Salzburg in September.)  And... it's still used to grind flour!  


Search Lessons 


(1) Checking the Map--at multiple levels of zoom--is a great idea.  Often you can pick up terms and place names that are useful in your searching.  It certainly paid off here. 

(2) As both Jeffrey and Jon point out in the comments thread, your "initial theory" can often steer you into blind alleyways.  Be careful of things you think you know!  Background knowledge is incredibly useful, but if your search isn't working out, a really important skill is to recognize when your theory is NOT working, and then step back and think about how else you'd approach the problem.  In this case, it's tempting to explore ideas about karst geology (which also features disappearing rivers), but the geology of the Mönchsberg isn't karstic, it's conglomerate, which means you have to look for another explanation.  



Search on!