Tuesday, April 14, 2015

Answer (part 2): Can you find the reference for...

I told you this... would be difficult! 

Remember that the Challenge hanging us up from last week was this:  
1.  Can you find the reference for.... A paper I once read that claimed "the probability of a reader reading a book in the library was a function of the distance of that book from the library catalog."   
As you can tell, this research was done a while ago (back when library science papers were measuring book access in terms of card catalog distances).  I haven't had any luck finding the original paper that made this claim.  Can you?  What's the citation?  

 Regular Reader Hans got the first solid reference for this result with his search... 

He did a search on Google Scholar: 

     [distance reading a book in the library probability library catalog ext:pdf]

and found... 
Library economic metrics: Examples of the comparison of electronic and print journal collections and collection services. Donald W. King, Peter B. Boyce, Carol Hansen Montgomery, Carol Tenopir: Library Trends 51(3) (2003)

in the 3rd position on the SERP.  (Note that "EXT:pdf" is the same as "FILETYPE:pdf".)  

In that article, on page 392, the authors write:  
One indicator of print collection effectiveness is the proximity of the collection to readers (i.e., its accessibility). Every survey we have done comparing distance (in minutes) of readers to the print collection shows the overall use of the library, use of its journal collection, and amount of reading are inversely correlated with the distance to the library. That is, those closer have higher use, although it is found that readers further away from the collection tend to read more when they do visit the library. Evidence of the effect of distance on reading is as follows: 
- 66 percent of the readings are from library print collections when the readers are less than five minutes away; 
- 48 percent of readings are from there when five-to-ten-minutes away; and  
- 34 percent of readings are from there when over ten minutes away

In other words, the graph looks like this:

[Graph: time to the books, in minutes from card catalog]

And then a bit later, Regular Reader Kirk (with support from Luke) found this article:  

Dwyer, J. R. (1979). Public response to an academic library microcatalog. Journal of Academic Librarianship, 5(3).  

Kirk found this article by searching through librarian journals.  In this case, he found it by using LISA (Library and Information Science Abstracts), but he mentions that you can also find it on Google (or Scholar) by searching for: 

     [ "distance from the card catalog" ] 

Once you do that search, you'll find this paper by Dwyer. Here's what it looks like there: 

The URL is http://eric.ed.gov/?id=EJ206855  -- and if you notice the last word of the green text, you'll see the word "ERIC."  It's a large collection of articles from the academic (and especially academic education) world.  You can see the entire list of articles indexed by ERIC by looking at their catalog of journals list.  

Unfortunately, not all of the articles indexed by ERIC are available in full-text.  That means we have to go to another place to find the full-text!  

Luckily, there are database companies that DO provide the full-text of many academic journals.  The big three in this area are Gale, ProQuest and EBSCO.  A quick search for: 

     [ "Journal of Academic Librarianship" full-text ] 

tells us that the "full text is available from EBSCO's Academic Search."  

That is, the journal is available in full-text on EBSCO Host.  This is a great approach IF you know that such a database exists and if you have access to it.  Unfortunately, like many databases, EBSCO Host requires payment to access.  

HOWEVER, I was able to get access to EBSCO Host through my Mountain View library account, which is one VERY good reason why you want to have a library card--you can still get access to articles that aren't available in any other way.  (Notice that it was the 3rd library I checked--my other local libraries didn't have it!)  I just logged in with my library card and clicked on the "Research" button.  In that list I found the EBSCO Host database and clicked on it.  That took me to EBSCO Host, but I noticed that it defaults to "Searching ERIC."  That's a fine database, but it is aimed primarily at educational resources; it is not the "Academic Search" database that provides access to this journal.   

The default is to search only ERIC; you'll want to change this to get to the J. of Academic Librarianship.

Change it like this...

And now you have access to the full-text!  

When I finally found and read the full-text of that paper, I found a very similar set of data about the probability of accessing a book as a function of its distance from the card catalog.  Here's the chart of probability vs. time to the book from Dwyer's paper.  (Note that I had to transform the data pretty seriously to get it to be comparable, including looking up distances between the libraries on the U. Oregon campus in order to determine the amount of time it takes to get between the various campus libraries.)  

[Graph: time to the books, in minutes from card catalog]

What's striking about these graphs is how similar they are, even though the data is from two very different libraries, 24 years apart!  (The second curve has an inexplicable uptick at ~5, but I suspect that is an aberration caused by a small data set; the paper doesn't give the exact count.) 

These curves are pretty similar to this inverse square curve.  (I could do a curve fitting to prove the point, but for this blog, general similarity is enough.)  

[Graph: time to the books, in minutes from card catalog (model)]
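
For anyone who wants to try the curve fitting, here is a minimal sketch in Python using the three King et al. figures quoted above.  Big caveat: the paper gives time bins, not exact times, so the representative minutes below (2.5, 7.5, 12.5) are my own assumed midpoints, and the value for "over ten minutes" is a pure guess.  The function name fit_power_law is made up for this illustration.  It fits percent = a * minutes ** b by ordinary least squares in log-log space; an inverse-square relationship would show up as b close to -2.

```python
import math

# Percent of readings from the print collection, as quoted from King et al. (2003).
# The paper reports time *bins*; the minutes below are ASSUMED representative
# values (bin midpoints, and a guess of 12.5 for "over ten minutes").
observations = [
    (2.5, 66.0),   # less than five minutes away
    (7.5, 48.0),   # five to ten minutes away
    (12.5, 34.0),  # over ten minutes away
]


def fit_power_law(points):
    """Least-squares fit of y = a * x**b, done as a straight line in log-log space."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)    # intercept in log-log space = log(a)
    return a, b


a, b = fit_power_law(observations)
print(f"fitted curve: percent = {a:.1f} * minutes ** {b:.2f}")
```

With only three binned points, the fitted exponent depends heavily on the assumed times, so treat the result as illustrative rather than as a real test of the model.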

There's more to write about this, but it's getting late and I want to send this out to you before tomorrow's new Challenge.  

Excellent job on everyone's part in getting this all figured out.  

Bottom line:  These two articles give a pretty decent case that the probability of access DOES diminish with the distance (or more accurately, the amount of time it takes to get to the book).  However, in order to find this we had to do some fairly clever searching--first to choose the right / workable search terms, and then to get to the second article we had to go to a paid database via our local library.  

Notice that Hans' query: 

     [distance reading a book in the library probability library catalog ext:pdf]

seems unlikely to work.  Why do you need both instances of the word "library" in the search?  (Try taking one out: you'll see that both of them are needed to bring the King paper up to the top.)  

It's because Google search works on word sequences as well as simple words in the search. That is, the phrases "book in the library",  "library probability", and "library catalog" are all part of the text, and therefore match.  

The only way to get this to work is to keep trying with short phrases that you think are likely to be in the text of the perfect article.  Think about it this way--if you were to write an article about this topic, what kinds of likely short phrases would be in that text?  

There might be a shorter query that will find this article, but this query isn't bad at all.  
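
If you want to hunt for a shorter query systematically, one approach is to drop one term at a time and see whether the King paper stays near the top.  The sketch below only builds the query variants and the matching Google Scholar URLs; it doesn't send any requests, so you'd still paste each URL into a browser and check the results yourself.  The helper names (leave_one_out, scholar_url) are made up for this example.

```python
from urllib.parse import urlencode

# Hans' query, split into terms so each one can be dropped on its own.
TERMS = ["distance", "reading", "a", "book", "in", "the", "library",
         "probability", "library", "catalog", "ext:pdf"]


def leave_one_out(terms):
    """Yield (dropped_term, query) pairs, each query missing exactly one term."""
    for i, dropped in enumerate(terms):
        remaining = terms[:i] + terms[i + 1:]
        yield dropped, " ".join(remaining)


def scholar_url(query):
    """Build a Google Scholar search URL for the query (no request is sent)."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


# Show just the two variants that remove one instance of "library".
for dropped, query in leave_one_out(TERMS):
    if dropped == "library":
        print(f"[{query}]")
        print(f"    {scholar_url(query)}")
```

Any term whose removal knocks the target paper off the first page is one the query really needs.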

Nicely done.  

And WRT having to also search EBSCO Host for the full-text of the paper:  Sorry, that's the way it is these days.  Some journals are indexed and text-available via databases like EBSCO, Gale, or ProQuest.  Luckily, all of these providers allow Google to index the text, even though they don't let Google serve up the full-text.  For that, you still have to figure out which database has the full-text, and then figure out a way to get into that system.  Even more luckily, many libraries still provide this kind of access to their patrons.  Thank heavens!  Hurrah for libraries again!  

Search on!  


  1. Dan, Thanks for this clear explanation.
    I have heard from several search gurus, that repeating the most important search terms in a query gives better results (eg http://www.rba.co.uk/search/TopSearchTips.shtml). Can you please elaborate at one time a little bit on how this works? Does this have anything to do with relevance ranking?

  2. This has all been very interesting!

    Among many other things, I loved to learn about the "EXT" shortcut.

    I found out that, although it works pretty much the same as "FILETYPE" and gives pretty much the same results, it is not exactly an alias for it. Nor is it undocumented, as claimed on every guide on Google operators I've read on the web. Here's from the latest version (December 2013) of Google Search Appliance : Search Protocol Reference (this is the official reference guide for the Google Search Appliance):

    "File Extension Filtering
    The query prefix ext: filters the results to include only documents with the specified file extension. No spaces can come between ext: and the type. For example, ext:pdf, which retrieves all documents with the pdf extension.
    You can combine this prefix with the filetype prefix to construct the following types of query : filetype:pdf AND ext:pdf, which retrieves all documents with the Mime type pdf and with the pdf extension.
    You can exclude file types by putting a minus sign before ext, such as -ext:pdf. […]
    Sample usage:
    whitepaper ext:doc OR ext:pdf"

  3. Tip for accessing EBSCO worked fine in Canada as well. The list-of-databases screen gave slightly different results (instead of Academic Search Complete, mine showed Academic Elite or Academic Premier), but I just selected all and got the same result as you did. I added the screenshot of results for my area for the benefit of other Canadians searching.

    Great tip.

  4. Thanks Dr. Russell for the answer. One question: is the article Hans found the one you read some time ago, or is it just on the same topic? I ask because maybe you read it on paper and not everything is online.

    Great tips and knowledge in these challenge. I didn't know about all the sites you mentioned or even about searching for the entire paper. I was reading some and only read the abstract in many cases. Now, I know how to look for the whole thing.

    Thanks Luis for posting more details about "EXT".