Tuesday, May 5, 2015

Searching within a document, human memory, and other things you'd like Google search to do

A while ago, Regular Reader Rosemary asked a good question that I've been thinking about.  Here's what she wrote.  (I've edited it a bit.)  

Sometimes a challenge is more about a concept than it is about specific words. Dr. Dan says that "your recollection may be how you remembered it but not necessarily written in that context." As you said, this is a good intro to a reference interview.

What I started to see [ in the SRS Challenge 4/8/15 - "Can you find a reference for..." ] was a concept - The Law of Least Effort. As J. Matthews says on Page 36 “The Law of Least Effort is alive and well and operates in the physical library spaces”

It makes sense that using this concept we can find elsewhere a reference to card catalogue locations affecting people’s search efforts.  But the problem is more than just the specific words--it's all of the other words I'd like to be using in my search.  

[ Later, Rosemary amplified on this...]

Maybe we need a “Super Control F.” That is, the ability to have multiple words (including unrelated) or even possibly vague words in the search that can be interpreted in the search box. 

If we had a Google Search Box within the document we would have more tools like the ones I’ve already mentioned, or more. Another thought is to have a way to draw a keyword list out of a document. Nowadays documents often have keywords/tags attached. But historically they often don’t exist, especially for journals/papers. Perhaps there are some tools that now exist.  This challenge got me thinking about how I do things. Perhaps there are better ways. Can Google Search help us search within a document?

Another lesson for me is the use of "references". Each time I looked at or scanned a document, I would then scan the references at the end of the article/book/paper.  With title and author we possibly had a lead, but with this particular challenge, vague keywords made this more difficult. I wonder how we can make the most out of these possible leads besides scanning each one.

There are many great questions in here, but let me make a couple of comments.  

1. Searching for a concept and choosing keywords that can find it.  Rosemary points out that the Principle of Least Effort is somewhat difficult to find, given the words needed to get to it.  This is known as the Vocabulary Problem (or the vocabulary mismatch problem). It's been studied since at least 1981, when my buddy George Furnas first wrote about it.  The basic problem is that, on average, 80% of the time different people (even experts in the same field) will name the same concept with different words. There are usually tens of possible names that can be attributed to the same thing.

In our case, we don't know what the concept is that we're looking for--so how do we begin a reasonable search for it?  

I suspect that this is one of those gumption problems--you just have to keep searching for the concept. 

On the other hand, there is an intuition for a concept that good searchers need to develop.  That is, you have to have the sense that such a concept (such as least effort) would exist, and that you can find it.  

How do we develop such an intuition?  I really don't know.  But I suspect that reading widely will lay the foundation for such intuitions. 

2. Your memory is terrible.  As we've discussed, you might think your memory is great, but the truth is, your memory is fallible in many ways.  In particular, what I remember may or may not be the way I wrote it in my notes, or the way it appears in reality.  (This was the point of the post about "Can you find the reference for...")  My recollection was different in many ways from the reality.  

Being a great searcher means knowing how to back off from "things you know are in the target."  I remember something as having "3 levels" of abstraction--but the reality is that it's 4.  I might remember my friend Grace as having sent me a particular email message, when in reality it was Bob (he's a co-worker of Grace, hence the mistake).  

As you search, and in particular when you get stuck, try backing away from the things you're sure about.  Look for reasonable alternatives (different names, different dates, different places), and see if that doesn't help. 

3. Super Control-F function.  Rosemary wants to be able to search for multiple things at a time.  It turns out that there are extensions for Chrome and Firefox that allow you to search with regular expressions.  A regular expression is basically a pattern that matches pieces of text in the document.  For instance, using a regular expression like  Apple|Facebook|Google  I can search for all three of those company names at the same time.  (Note that I've left off the square brackets and spaces I usually put around a query: in a regular expression, brackets and spaces both mean something.)  If you're using one of the regular expression plug-ins, this is exactly a "Super Control-F" function.
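
If you'd like to see what such a plug-in is doing under the hood, here is a minimal sketch in Python. It is only an illustration, not the code of any particular extension: the function name find_terms and the sample sentence are made up for this example. It joins the terms with | (regular-expression alternation) and reports where each one shows up.

```python
import re

def find_terms(text, terms):
    """Return (matched_text, start_offset) for every occurrence of any term."""
    # Escape each term so punctuation is matched literally, then join the
    # terms with | (alternation). The \b anchors keep "Apple" from matching
    # inside a longer word such as "Applesauce".
    alternation = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return [(m.group(0), m.start()) for m in pattern.finditer(text)]

# Made-up sample text, purely for illustration.
sample = "Google and Facebook compete for ads; Apple mostly sells hardware."
hits = find_terms(sample, ["Apple", "Facebook", "Google"])
for word, offset in hits:
    print(offset, word)
```

The matches come back in document order, which is what you want when you're stepping through a long page one hit at a time.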

4. Want search by synonym.  Interestingly, Rosemary's note prompted me to check for other Chrome extensions--and it turns out there's one that searches for synonyms of the search term (e.g., "car" for "transportation").  It's an interesting idea.  I've got it installed now; we'll see how often I use it.  Thing is, you can have it now.  (You just need to search for it...)  
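
The idea is simple enough to sketch in a few lines of Python. To be clear, this is my own toy version, not how that extension works: the SYNONYMS table is a tiny hand-made stand-in for a real thesaurus, and find_with_synonyms is a hypothetical name.

```python
import re

# Hypothetical, hand-made synonym table. A real tool would consult a
# proper thesaurus rather than a two-entry dictionary.
SYNONYMS = {
    "car": ["automobile", "vehicle", "auto"],
    "effort": ["work", "labor", "exertion"],
}

def find_with_synonyms(text, term):
    """Return every whole-word match of term or one of its listed synonyms."""
    words = [term] + SYNONYMS.get(term.lower(), [])
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up sample sentence.
text = "People will pick the closest catalog to save work; least effort wins."
print(find_with_synonyms(text, "effort"))
```

A search for "effort" here also turns up "work", which is exactly the kind of vocabulary mismatch a plain Control-F would miss.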

5. Want fine-grain search within a document.  I agree with Rosemary that having find capabilities within a long document for things like [ word1 AROUND(3) word2 ] or [ intitle:term1 ] would be very nice.  The good news here is that symbol search (e.g., for characters like /, #, @, ~, etc., things that Google ignores in its search function) works quite nicely with Control-F.  (The bigger problem, and the reason Google doesn't implement such a thing, is that 90% of people don't know about Control-F.  Why build in something else that would be used by even fewer people?)    
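
If you have the text of the document in hand, a rough in-document version of AROUND is not hard to write yourself. The Python sketch below is one way to do it, under an assumption I should flag: it treats AROUND(n) as "at most n words between the two terms, in either order," which is my reading of the operator, not a documented definition. The function name around and the sample sentence are made up for the example.

```python
import re

def around(text, word1, word2, n):
    """Return snippets where word1 and word2 have at most n words between them.

    Assumption: this mimics AROUND(n) as "within n intervening words, in
    either order". The real operator may behave differently.
    """
    tokens = re.findall(r"\w+", text.lower())
    w1, w2 = word1.lower(), word2.lower()
    pos1 = [i for i, tok in enumerate(tokens) if tok == w1]
    pos2 = [i for i, tok in enumerate(tokens) if tok == w2]
    snippets = []
    for i in pos1:
        for j in pos2:
            # abs(i - j) - 1 is the number of words strictly between the two.
            if i != j and abs(i - j) - 1 <= n:
                lo, hi = min(i, j), max(i, j)
                snippets.append(" ".join(tokens[lo:hi + 1]))
    return snippets

# Made-up sample sentence.
doc = ("The card catalog nearest the door got the most use; "
       "the catalog upstairs was ignored.")
print(around(doc, "catalog", "door", 3))
```

Only the first "catalog" is close enough to "door" to count; the second one is five words away, so it is skipped.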

6. Using references at the end of a document.  Is there any better method of working through them than just scanning them?  Alas, I don't know of anything better than taking good notes.  

If you find one, though, let me know!  

In the meantime, Search On! 


  1. Great discussion. I've already located a couple Chrome extensions that I think may help. I'll give them a go. You have given me some ideas and I'm going to do some experimenting to see what kind of results I can achieve (or perhaps what I can't achieve). I'll post any worthwhile results. Thanks for such a detailed response.

    1. Let us know what your experiments reveal! Any good extensions out there that help with the SearchResearch process?

  2. One thing that I have found really useful with using Control-F in Chrome is that you can see all the occurrences of a word in the scroll bar. However, I have often wished that I could open a second find box, and see all the occurrences of a different word in the scroll bar in a different colour at the same time.

    A similar idea for an extension would be one where the terms you searched for in Google to bring you to that document appear highlighted in the text (and scroll bar) when you open the document. A huge part of the search process is opening pages and quickly deciding if they are relevant to you or not, and the quicker you can find the words that brought you to that page, the quicker you can decide if it is relevant.

  3. Some useful extensions:

    MultiHighlighter allows multi-word searching/highlighting on almost any web page. Each term is highlighted with a unique color.

    Quick Find for Google Chrome
    Next gen text search for Chrome and Opera. Port of Firefox Quick Find features + awesome new ones. Search results in one location. Navigate to links in just a few keystrokes.

    Synonym Control-F
    Control-F that also searches through synonyms.

    Google Quick Scroll
    Quick Scroll lets you jump directly to the relevant bits of a Google search result. Google Quick Scroll is a browser extension that helps you find what you are searching for faster.

    Search the current site
    Search any site you're on by the simple click of a button. The search is done not only on the 1 page you are viewing at the time you click as [Ctrl]+[F] does but instead Google is used to "site search" within ALL pages of the site. Search websites the easy way!