But some content can be crawled but not displayed. That is, in some cases, our spider can index the text of a document so you can find it, but when you click on the link, you might not be able to see the original source material.
The thing is, lots of publishers don't provide open access to all of their content. They let the spiders crawl the content (to make it searchable), but then put the content behind a paywall. That makes sense if you're trying to make money: you want searchers to pay for the ability to search and read. But it's kind of a pain if you're just trying to do research.
Sometimes you'll find a document--say, a book like Schirmer Encyclopedia of Film. This is a big (1200 pages) reference work that you might use to do some research on directors, films, genres, production methods, etc. But it's also $528 new, and even used copies typically sell for around $400.
This is a classic reference book, the kind you might find in library reference collections.
Of course, getting access to it online would be incredibly handy--the kinds of search you can do in an online version of a book are very different from what you can do with a hardcopy version. But the only way to get online access is through Gale's system--they have rights to the e-version of the book, and you have to use their system (which costs money) to access it.
Luckily, some of this book is indexed by Google Books, so a search on Books for a quote like this one will find it:
[ "where only the Maltese Falcon (1941) have survived intact" ]
BUT... notice that you have to search in Google Books for that quote. Doing this search in regular Google doesn't find the book. The only way to get access to the complete book is to be part of an institution that has a Gale account that you can use. To get beyond the paywall means that you need to be affiliated with a university, or a really great public library. (And in truth, all great Search Researchers try to develop and maintain such relationships--it's the only way to get access to this content.)
However...sometimes there are other ways to get around the paywall. In our last Search Challenge, you could do a regular Google search for the title of that paper we were interested in:
[ "Public response to an academic library microcatalog" ]
This is a paper by Dwyer that looked interesting, but the first link goes to the public version of ERIC, which doesn't have the full text (at least not via the public web interface).
But the second link looks like this:
If you click on this link, it takes you to the EBSCO Host site, which has a very nice link to libraries near me, like this:
This is great! (Although, as I said in yesterday's post, the Palo Alto library doesn't actually HAVE EBSCO Host, so that link is broken. Luckily, the Mountain View library does, and I have a library card there as well. When one link doesn't work, try the next--be a resilient searcher.)
It's worth knowing about these paid databases, because they sometimes have the only online-available copy of an article.
There are sometimes workarounds... For instance, an author will sometimes publish a paper on a paywalled site. There's a well-known paper titled "Reflections of the environment in memory," which is available through the publisher for $35. However, if you do a search like this, you can find copies that other people have put up on the open web (usually for educational purposes):
[ "reflections of the environment in memory" filetype:pdf ]
I know it's sometimes hard to pay $35 for a paper from the publisher when you don't even know if it's what you're really looking for--so this is a way to see the entire paper without having to break the bank.
Of course, if you find the paper to be what you want, and you end up using it in your research, the right thing to do is to go purchase a legal copy of the paper from the original publisher.
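If you run this kind of search a lot, the pattern in the query above (an exact phrase in double quotes plus a filetype: restriction) is easy to script. Here's a minimal sketch in Python. The function names build_query and search_url are my own, made up for illustration, and I'm assuming the usual https://www.google.com/search?q=... URL form; this only builds the link, it doesn't run the search or fetch results.

```python
from urllib.parse import urlencode


def build_query(phrase, filetype=None):
    """Return an exact-phrase query, optionally limited to one file type.

    Hypothetical helper for illustration; the quotes force an exact-phrase
    match and filetype: restricts results to that kind of file.
    """
    query = f'"{phrase}"'
    if filetype:
        query += f" filetype:{filetype}"
    return query


def search_url(query, base="https://www.google.com/search"):
    """Return a search URL with the query percent-encoded.

    The base URL is an assumption; swap in another search page if needed.
    """
    return f"{base}?{urlencode({'q': query})}"


query = build_query("reflections of the environment in memory", filetype="pdf")
print(query)
print(search_url(query))
```

Paste the printed URL into a browser to run the search. Leaving out the filetype argument gives you the plain quoted-phrase query, like the one used for the Dwyer paper.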
Search Lessons: There are several here...
1. Some online content can be found in slightly different forms than what you might expect. That's the lesson of the Schirmer encyclopedia. If you only looked for the complete book, you might miss all of the different volumes as they exist in Google Books.
2. Sometimes, you just have to search Google Books. Currently, even direct quotes from published books do not appear in a regular Google search--you still have to check Books.Google.com. (In theory this will improve over time, but as of today, you still have to go to Books.)
3. Be sure to notice that some content collections (e.g., EBSCO) direct you to local libraries that have access rights. This is a wonderful service--use it when you can.
4. Sometimes you can do a workaround by searching for a PDF of the article. With luck, you'll find a version of it somewhere on the web. And, as I said, IF you use this article (or even read it end-to-end), you really should go buy the real PDF from the provider.
5. There are a lot of paid databases out there: Learn which ones have what kind of content. Obviously, this is a huge problem; in the future I'll write about the ones I use, and how I learned what's where.
6. Be a part of the university / college / library ecosystem. Having a couple of library cards (especially for libraries that have access to the paid databases) is incredibly valuable. Besides, the librarians frequently know things that can shave hours off your research time. One of their great strengths is knowing what all of the paid databases are and what they contain.
That's it for today.
Coming up, an answer to Rosemary's question about how to think about forming queries. (I'll work on this while in-flight.)