Tuesday, October 19, 2010

Scoping in search

Paradoxically, setting a limit is sometimes the most important thing you can do on the way to understanding something big and difficult.

The idea of setting a “scope” (a la programming languages, or even in the more common vernacular sense of the “scope” of an investigation) is often key to being able to see through the clutter to the essentials. 

This is perhaps most easily seen in the way experts search on Google.  If you think about it, every term in a query is really “setting a scope” by saying what term should be counted.  If you have three terms in your query, say [ independence hall Philadelphia ], each word is implicitly focusing the results on documents that have those terms.  That makes sense. 

But in a larger way, setting your scope is a really important part of understanding what you’re really trying to do.  And tools that implicitly define a scope help out a great deal. 

For instance, www.blackwebportal.com is yet another search engine, but this one is for the Black American community.  (And interestingly, it’s not for the black community of any other country, but is strongly limited to the US.)  There are many specialty search engines that are defined by the population group they serve (e.g., Middle Eastern: www.mymena.com), language (e.g., Latvian: www.search.lv), the market they serve (retail, travel, etc.) or interest areas (windsurfing, knitting, robot construction, etc.).  What’s so surprising to me is that almost any interest area, population group, or language you can think of has a search engine to serve its members.  My favorite is the Lolcats search engine (http://rollyo.com/rhianda/lolcats/ -- you thought I was kidding, didn’t you?  Here's an example search for fuzzy lolcatz.). 

You see scoping at work when you decide to use a particular kind of resource—say, when you use Amazon to look for a book, Pubmed to look up medical information, or Youtube to find a video. 

“Scoping” is the choice you make to limit the range of possibilities you’re working from.  That’s a good thing: it’s often the key to being able to see the signal in the noise.  It can be as simple as choosing the language of search (like searching German-only sites when you’d like to find a German-language article), or as sophisticated as knowing how to search only within a specific *kind* of site when you have a strong suspicion that the answer will be there. 

Of course, this goes hand-in-glove with knowing that such kinds of resources exist.  When looking up your family crest, it would be immensely useful to know that there are web sites (and books!) dedicated JUST to describing heraldic devices, some with lovely language that’s hyperspecialized and otherwise archaic (think of the phrase, “lion rampant within a double tressure flory counterflory gules”—that’s not a phrase that would leap to mind in daily conversation). 

The biggest challenge is when the target of your search is so generic (or so little known by you) that it’s hard to figure out how to describe it, let alone choose a scope for your research. 

For instance, yesterday I saw a pretty yellow flower.  Try a search for that! It’s hopeless as it is. 

Good searchers know that they need to add in as much contextual information as possible to limit the range of possibilities.  Where did I see the flower?  In California… in the summer… on the roadside…  All that scoping information helps to limit the range of possibilities.  Pretty soon, you’re onto a page with a table of images to sort through. 
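
Here is a minimal sketch of that narrowing process in Python. The function name `progressive_queries` is my own invention for illustration, and the context terms are just the ones from the flower example; nothing here talks to a real search engine.

```python
def progressive_queries(base_terms, context_terms):
    """Return a list of queries, each adding one more scoping term.

    Hypothetical helper: it only builds query strings so you can see
    how each piece of context narrows the search.
    """
    terms = list(base_terms)
    queries = [" ".join(terms)]
    for extra in context_terms:
        terms.append(extra)  # each added term shrinks the scope
        queries.append(" ".join(terms))
    return queries


if __name__ == "__main__":
    for q in progressive_queries(["yellow", "flower"],
                                 ["California", "summer", "roadside"]):
        print(f"[ {q} ]")
```

Each query in the list is a tighter scope than the one before it, which is exactly what you are doing in your head when you ask "where did I see it, and when?"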

To scope effectively, you need to know what’s possible and available.  You need to know that Intellius.com or Spock.com can deliver a huge amount of information about a person.  In the same way, you can find out the assessed value of a house in Santa Clara Valley by going to the county Assessor’s website, but you can’t find out who actually owns the property.  (For that, you have to physically visit the assessor’s office in downtown San Jose.  Why?  Because there’s a state law that prohibits them from posting the address of any elected official… and keeping a master list of all officials that should be excluded from the list is just too painful.) 

Sorry about the seeming contradiction here, but you need to know what you need to know. 

It goes on and on.  The more you know about what’s out there, the more you can constrain your searches.  This has been true since books became cheap enough to proliferate like textual bunnies. 

Interestingly, the research problem used to be “how to search sufficient resources to make sure you checked all the relevant places.”  Now, the research problem is often “how can you search just the good resources to be sure you haven’t missed the signal in the noise.”

This becomes especially interesting in the rising tide of interest in highly data-driven science. (See, for instance, Chris Anderson’s article in Wired at http://www.wired.com/science/discoveries/magazine/16-07/pb_theory or search for [ site:wired.com Anderson data science ].  Similarly, see Ben Shneiderman’s article on Science 2.0; here the best search is [ shneiderman “science 2.0” filetype:pdf ].  If you don’t do this, you’ll end up paying too much money for an otherwise freely available article.) 
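
The two queries above both lean on scoping operators (`site:` and `filetype:`). As a rough sketch, here is how you might assemble such queries in Python. The helper names `scoped_query` and `search_url` are hypothetical, and the URL pattern is an assumption about how a Google search link is commonly written, not an official API.

```python
from urllib.parse import urlencode


def scoped_query(terms, site=None, phrase=None, filetype=None):
    """Build a query string with optional scoping operators.

    Hypothetical helper for illustration only.
    """
    parts = []
    if site:
        parts.append(f"site:{site}")          # limit results to one site
    parts.extend(terms)
    if phrase:
        parts.append(f'"{phrase}"')           # require the exact phrase
    if filetype:
        parts.append(f"filetype:{filetype}")  # limit results to one file type
    return " ".join(parts)


def search_url(query):
    """Turn a query into a search link (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    q1 = scoped_query(["Anderson", "data", "science"], site="wired.com")
    q2 = scoped_query(["shneiderman"], phrase="science 2.0", filetype="pdf")
    print(q1)
    print(q2)
    print(search_url(q1))
```

Keeping each operator as its own argument makes the scope explicit: you can see at a glance which part of the query says *what* you want and which part says *where* to look.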

While the examples here are all about search, I think this holds more generally for all of our research as well.  Defining your problem and being clear about what you’re trying to accomplish seem like obvious steps. But it’s an ongoing problem in all research: we need to keep reminding ourselves of what the goal is (goals are?).

At least that’s a problem I have.  Maybe you do too.  


Search on! 


1 comment:

  1. Nice post Dan! Having just finished teaching a two-week session with high school age students, I was surprised to discover that they didn't really know the difference between a question you can look up the answer to on Google and one you can't. Why? I think it has to do with scoping... To them, if there's no Wikipedia or Dictionary.com answer, there's not an answer at all. They don't really get how to scope in other ways.

    Glad you wrote about this topic, so I can do a better job scaffolding their question-posing in the future.
