Wednesday, August 29, 2018

Answer: How to find difficult web pages? (Part 2)


What makes a page difficult to find?  (Part 2)

I was impressed by how well (and how quickly!) SRS readers were able to figure these out.  Some of the search paths were lovely and inspired.  Nice work, Readers!  


Here's what I did.  Let me repeat the two Challenges and then tell you what I did to answer them.  


1.  This happens to me more often than I would like:  Images in my blog will sometimes go missing in action.  This happens when a website disappears, leaving my nice link to their image with a gaping hole.  Perhaps you've seen it on other web sites--the hole looks like this: 

A broken image link leaves behind a hole-in-the-page.  I want to find a replacement image.  One that looks the same as this missing image!
How can I find a replacement image for this hole in my blog?  In other words, can you find this missing image?  This hole-in-the-blog comes from the SRS post of December 14, 2011 and shows a particular remote-control glider.  (In fact, it's one that I built back in the late 1990s.)  
The Challenge for skilled SRS-ers is to (a) figure out what that image looked like, and (b) find that image somewhere else on the internet.  Can you? 

My solution:  I tried opening this image by Control-clicking (right-click on Windows) on the image-hole and then "Open Image in New Tab"--like this: 

  
I wanted to get the URL of the image.  (And yes, I could have done "Copy Link Address," but you'll see why I did it this way in a second...)   The URL for this image is: 

www.carlgoldbergproducts.com/airplanes/gpma0960_01_bg.jpg


This is what you might see if you open this link in a new tab: 


This is a classic "page not found" error.  

If you recall from a few weeks ago, I mentioned that it's handy to use the Wayback Machine browser extension.  This is a Chrome (or FF) extension that pops up when you hit a missing page (or file).  So my display really looked like this: 


If you "click here," it takes you to the Wayback Machine, and if you follow the obvious links forward, you'll get to this page: 



Now I see that the image is from an old site about remote-control gliders.  That makes sense, and it's going to be one of those images, but which one?  

I went back to the Wayback Machine and put in the image URL given above.  Here's what I get from the Wayback Machine: 


Great!  It looks like the image was last saved on Feb 8, 2018. But if you click on that, you get another "missing" image.  Truth is, sometimes you have to work your way back along the timeline to find a real version of this image.  I jumped back to Mar 12, 2014 and found this: 
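If you'd rather not click back through the timeline by hand, the same lookup can be sketched in a few lines of Python.  This is only a sketch: it assumes the Internet Archive's public availability endpoint and the JSON response shape as I understand them, and the function names are my own.  It builds the request URLs and parses a response, but doesn't make the network call itself.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp):
    """Build a request asking for the capture closest to timestamp (YYYYMMDD)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url, "timestamp": timestamp})


def snapshot_url(url, timestamp):
    """Build a direct Wayback Machine link for a given date."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


def closest_snapshot(response_text):
    """Return the closest usable capture URL from a response, or None.

    Captures whose recorded status is not "200" are skipped, since a capture
    of an error page is just another "missing" image.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


image = "www.carlgoldbergproducts.com/airplanes/gpma0960_01_bg.jpg"
print(availability_query(image, "20140312"))
print(snapshot_url(image, "20140312"))
```

Fetching the first printed URL (in a browser, or with any HTTP client) and passing the body to closest_snapshot() would tell you whether a real copy exists near that date.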


But I wasn't quite done yet.  I was wondering if that image had been used somewhere else.  Did this particular glider move from the Carl Goldberg company to some other place?  

To test this out, I did this query, looking for another use of this image name elsewhere on the web: 

     [ inurl:gpma0960_01_bg.jpg ] 

As you know, the inurl: operator searches for any string inside of a URL.  In this case, I was searching for that particular file name.  (Why?  Because I know that people are lazy and usually don't rename images.)  

Unfortunately, that gave me zero results.  

Now what?  

Let's look at the file name in detail.  It's: 

      gpma0960_01_bg.jpg

To me, this looks like a product code ("gpma0960") with a number (01) and a code indicating that it was used in the background (bg).  
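Here's a small Python sketch of that idea: take the file name apart at the underscores and list search terms from most specific to most general.  The function name is hypothetical, and the assumption that underscores separate the meaningful pieces is just my reading of this one file name.

```python
from pathlib import PurePosixPath


def candidate_terms(url_or_name):
    """List search terms for an image, from the full file name down to its first piece."""
    path = PurePosixPath(url_or_name)
    parts = path.stem.split("_")      # "gpma0960_01_bg" -> ["gpma0960", "01", "bg"]
    terms = [path.name]               # start with the full name, extension included
    for end in range(len(parts), 0, -1):
        term = "_".join(parts[:end])  # drop one trailing piece at a time
        if term not in terms:
            terms.append(term)
    return terms


print(candidate_terms("www.carlgoldbergproducts.com/airplanes/gpma0960_01_bg.jpg"))
# ['gpma0960_01_bg.jpg', 'gpma0960_01_bg', 'gpma0960_01', 'gpma0960']
```

The last term in the list is the bare product code, which is the broadest thing you could search for.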

What would happen if we just did an inurl: search for the product code name?  I'd expect to find all kinds of things with that code in the URL.  Here's my next search: 

     [ inurl:gpma0960 ] 

And... we hit the mother lode!  Here's the SERP for this query.  See how the product code appears in all of the URLs.  


This inurl: trick is incredibly useful for finding products, especially those that are no longer in production!  
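If you find yourself running a lot of these, the query is easy to build programmatically.  A minimal sketch, assuming the usual Google search URL with a q parameter; the helper names are mine.

```python
from urllib.parse import urlencode


def inurl_query(term):
    """Build an inurl: query for a string expected to appear in a URL."""
    return f"inurl:{term}"


def search_url(query):
    """Build a Google search link for the query (assumes the standard q parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


for term in ["gpma0960_01_bg.jpg", "gpma0960"]:
    print(inurl_query(term), "->", search_url(inurl_query(term)))
```

Pasting either link into a browser runs the corresponding search from the walkthrough above.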


2.  A while ago I was having dinner at a hole-in-the-wall Turkish restaurant somewhere in Europe and had a fantastic dessert.  It was rich, creamy, simple and wonderful.  I wrote down the name--kaymak--so I could find it again at a place closer to home.  My Challenge was to find a place near me (that is, in Mountain View, California) that sells kaymak.  Can you find a place in Mountain View, CA that sells this fantastic dessert?  
(Note that I do not want clotted cream, nor do I want to buy it online.  I want real kaymak that I can eat today!!)  
For extra credit (and this is the difficult part)--How much does this place in Mountain View sell it for?  


My solution started by searching for: 

     [ kaymak near me ] 

But if you're not in Mountain View, CA (as I am), you could do the equivalent thing with this query: 

     [ kaymak near Mountain View, CA ] 

In this case I included the city and state because there are multiple cities that share our name.  I wanted to be sure to get the right one.  Here's what I see: 


Notice that the first result is a Yelp page that lists places that sell "clotted cream."  That's close, but not quite what I wanted.  I want kaymak!  In this case, I want to turn off the synonyms, so I put the term in double quotes to get exactly that (and only that).  Note the difference between these two SERPs.  


This looks great!  

But oddly, when I open the Olympus Caffe & Bakery web site, I can't find the word kaymak on the page.  This is a case where my Control-F skills didn't pan out.  

Now what?  As you can see, it's not on the page!  


I'm confident that kaymak is here, somewhere.  Where?  

I could start clicking on all of the buttons (e.g. "Cakes/Desserts"), but I went with a more hacker approach, a method that's sometimes handy.  

I went ahead and did a View Source.  It's an option that you can get to like this:  


This will show you the raw HTML, which can be scary, but you can then search for kaymak... Here, I've highlighted the line, which happens to include the price:  $4.50 


If you read HTML, you can see it appears under the "Turkish Breakfast" menu item, which would have taken me a long time to find by clicking on all of the options.  
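The same search can be done on saved page source with a few lines of Python.  This is a sketch under assumptions: the HTML below is made-up markup standing in for the real page (only the "Turkish Breakfast" heading, the word kaymak, and the $4.50 price come from what I saw; the other item is invented), and the function name is hypothetical.

```python
import re


def find_in_source(html, term):
    """Return (line number, line, prices on that line) for each line mentioning term."""
    hits = []
    for number, line in enumerate(html.splitlines(), start=1):
        if term.lower() in line.lower():  # case-insensitive, like Control-F
            prices = re.findall(r"\$\d+(?:\.\d{2})?", line)
            hits.append((number, line.strip(), prices))
    return hits


# Hypothetical markup for illustration; not the restaurant's actual HTML.
sample = """\
<h3>Turkish Breakfast</h3>
<ul>
  <li>Example item <span class="price">$1.00</span></li>
  <li>Kaymak <span class="price">$4.50</span></li>
</ul>
"""

for number, line, prices in find_in_source(sample, "kaymak"):
    print(number, prices, line)
```

On a real page you would save the output of View Source to a file and pass its contents in as html.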

Viewing the source of the page is often a useful method when the page is complex and has a lot of 


As I said, I was impressed by some of the answers in the comments this week.  Well done, team!  


Search Lessons 


1.  Remember the Internet Archive / Wayback Machine when looking for lost pages or images!  It doesn't cover absolutely everything, but it is an invaluable service to the community. 

2. Using INURL: to find other pages with the same text in the URL is often a great way to track down pages that share content with what-you're-seeking.  Don't underestimate the power of inertia:  Webmasters often prefer to keep the URLs of previously existing images and pages when they move (or copy) content.  As a side-effect of this, you can often find content that would otherwise go missing.  

3.  Developer>View Source  ... it gives you access to the ground truth for many pages.   In this case, I was able to find the kaymak entry very quickly, without all of that annoying clicking around in the menus to figure out which category of thing it was hidden under.  

Search on! 

5 comments:

  1. Hello Dr. Russell and everyone.

    The Challenge, as always, was very interesting and let me learn about new products like kaymak, and I picked up new knowledge from the comments and the answer.

    About Q1, in my laptop/tablet, when I open the image link on a new tab, I don't get the Internal Server Error. The url changes going to the main page from them, truncating "airplanes/gpma0960_01_bg.jpg" giving now a site that current first article talks about apps for adults.

    However, today I learned that hovering the mouse over the missing image shows the full URL, including the "airplanes/gpma0960_01_bg.jpg" part. So from there, we can work with the View Source option, searching with Ctrl-F for one word; when redoing the challenge, I used: airplanes. After that I just worked the rest as you did.

    About View Source: I knew about it, and I use it a lot on your blog to check whether other peers have already posted a link in the comments, like the ones Remmij and I use that have HTML hyperlinks. Curiously, it never occurred to me to use it to find the price of the kaymak, so I used kaymak with site: for the Olympus Cafe.

    INURL, is something very helpful that I don't use a lot. And now, I see the potential and how helpful it is.

    Replies
    1. Ramón - you are becoming the search whiz/Jedi/superhéroe —— I hadn't used the 'view source' at all… or inurl: operator
      chowhound
      Dan - a ways from MV… but you travel downtown often…?
      Lokma — under the Brunch menu — $6 -Bal & Kaymak - 1801 Clement Street San Francisco, CA 94121

      (btw - what this Mac shows the font as - Georgia 13.00031442 -… do know how to zoom/magnify…but not sure why the readers would be required to…this should be tiny Tahoma 9.0… )

      fwiw: tried [carl goldberg gliders] search after seeing Carl turn up in the URL…
      Gentle Lady, 1st hit, then checked 'Images'
      Ramón, you might like these…
      cowries
      of knights and snails - interesting story & visuals…

    2. Thanks, Remmij. I am just learning from Dr. Russell, you, and all of our peers who post. I have improved a lot since the Google Power Searching MOOC, through which I had the blessing of finding Dr. Russell and his blog for the first time. That MOOC is also mentioned in Wikipedia's article about Massive open online course

      Yes, you are right, Remmij! I love them. I am sure Dr. Russell loves it too. Shells are so awesome as Ocean and Marine Life. Thanks for sharing.

    3. I just read a new article that is very interesting and a big surprise. As, for example, was the explosive seeds SRS we had in the past. I think you will like it too. The tree that bleeds... metal?
